ml-finance-python
python scripts for finance machine learning
git clone https://9o.is/git/ml-finance-python.git
## 01 Working with Market Data: NASDAQ_TotalView-ITCH Order Book

While FIX has a dominant market share, exchanges also offer native protocols. The Nasdaq offers the TotalView-ITCH direct data-feed protocol, which allows subscribers to track individual orders for equity instruments from placement to execution or cancellation.

As a result, it allows for the reconstruction of the order book, which keeps track of the active limit buy and sell orders for a specific security or financial instrument. The order book reveals the market depth throughout the day by listing the number of shares bid or offered at each price point. It may also identify the market participant responsible for specific buy and sell orders unless they are placed anonymously. Market depth is a key indicator of liquidity and of the potential price impact of sizable market orders.
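
The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration rather than the notebook's implementation: the `OrderBook` class, its method names, and the sample orders are hypothetical, and prices are stored as integers with four implied decimal places to avoid floating-point dictionary keys.

```python
from collections import defaultdict


class OrderBook:
    """Toy limit order book for one security (hypothetical helper)."""

    def __init__(self):
        # order reference number -> (side, price, remaining shares)
        self.orders = {}
        # side -> price -> total shares resting at that price (market depth)
        self.depth = {"B": defaultdict(int), "S": defaultdict(int)}

    def add(self, ref, side, price, shares):
        """Register a new limit order and add its shares to the depth."""
        self.orders[ref] = (side, price, shares)
        self.depth[side][price] += shares

    def reduce(self, ref, shares):
        """Remove shares from an order after an execution or cancellation."""
        side, price, remaining = self.orders[ref]
        shares = min(shares, remaining)
        remaining -= shares
        self.depth[side][price] -= shares
        if self.depth[side][price] == 0:
            del self.depth[side][price]  # no liquidity left at this price
        if remaining:
            self.orders[ref] = (side, price, remaining)
        else:
            del self.orders[ref]  # order fully executed or canceled

    def best_bid(self):
        return max(self.depth["B"], default=None)

    def best_ask(self):
        return min(self.depth["S"], default=None)


# Made-up orders; 1_000_000 stands for a price of 100.0000.
book = OrderBook()
book.add(1, "B", 1_000_000, 300)
book.add(2, "B", 999_900, 500)
book.add(3, "S", 1_000_200, 200)
book.add(4, "B", 1_000_000, 100)
book.reduce(1, 300)  # order 1 is fully executed
book.reduce(3, 50)   # order 3 is partially canceled
```

After these updates, `book.depth` holds the shares bid and offered at each price level, and `book.best_bid()` and `book.best_ask()` give the top of the book.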

In addition to matching market and limit orders, the Nasdaq also operates auctions, or crosses, that execute a large number of trades at market opening and closing. Crosses are becoming more important as passive investing continues to grow and traders look for opportunities to execute larger blocks of stock. TotalView also disseminates the Net Order Imbalance Indicator (NOII) for the Nasdaq opening and closing crosses and the Nasdaq IPO/Halt cross.

### Parsing Binary ITCH Messages

The ITCH v5.0 specification declares over 20 message types related to system events, stock characteristics, the placement and modification of limit orders, and trade execution. It also contains information about the net order imbalance before the opening and closing crosses.

The Nasdaq offers samples of daily binary files for several months. The GitHub repository for this chapter contains a notebook, build_order_book.ipynb, that illustrates how to parse a sample file of ITCH messages and reconstruct both the executed trades and the order book for any given tick.

The following table shows the frequency of the most common message types for the sample file used in the book (dated March 29, 2018). The code has since been updated to use a newer sample from March 27, 2019.

| Message type | Order book impact                                                                  | Number of messages |
|:------------:|------------------------------------------------------------------------------------|-------------------:|
| A            | New unattributed limit order                                                       |        136,522,761 |
| D            | Order canceled                                                                     |        133,811,007 |
| U            | Order canceled and replaced                                                        |         21,941,015 |
| E            | Full or partial execution; possibly multiple messages for the same original order |          6,687,379 |
| X            | Modified after partial cancellation                                                |          5,088,959 |
| F            | Add attributed order                                                               |          2,718,602 |
| P            | Trade Message (non-cross)                                                          |          1,120,861 |
| C            | Executed in whole or in part at a price different from the initial display price  |            157,442 |
| Q            | Cross Trade Message                                                                |             17,233 |
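
A frequency table like the one above can be produced by walking the binary file message by message. The sketch below assumes the framing used by the sample files, where each message is preceded by a two-byte big-endian length and the first byte of the message is its type. The function name `count_message_types` and the synthetic payloads are illustrative and not taken from the notebook.

```python
import io
from collections import Counter


def count_message_types(stream):
    """Count messages by type in a length-prefixed ITCH byte stream.

    Assumes each message is preceded by a 2-byte big-endian length and
    that the first byte of the message is its type.
    """
    counts = Counter()
    while True:
        header = stream.read(2)
        if len(header) < 2:
            break  # end of stream
        size = int.from_bytes(header, byteorder="big")
        message = stream.read(size)
        if size == 0 or len(message) < size:
            break  # empty or truncated message
        counts[chr(message[0])] += 1
    return counts


def frame(payload):
    """Prefix a payload with its 2-byte length (used to build test data)."""
    return len(payload).to_bytes(2, byteorder="big") + payload


# Synthetic stream: two 'A' messages and one 'D' message with zeroed bodies.
sample = b"".join(
    frame(p) for p in (b"A" + bytes(35), b"D" + bytes(18), b"A" + bytes(35))
)
counts = count_message_types(io.BytesIO(sample))
```

For a real sample file, the same function could be given any binary file object, for example one returned by `gzip.open(path, "rb")`.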

For each message, the specification lays out the components and their respective lengths and data types. The layout below is that of the Add Order with MPID Attribution message (type `F`):

| Name                    | Offset  | Length  | Value      | Notes                                                                                |
|-------------------------|---------|---------|------------|--------------------------------------------------------------------------------------|
| Message Type            | 0       | 1       | F          | Add Order with MPID Attribution Message                                              |
| Stock Locate            | 1       | 2       | Integer    | Locate code identifying the security                                                 |
| Tracking Number         | 3       | 2       | Integer    | Nasdaq internal tracking number                                                      |
| Timestamp               | 5       | 6       | Integer    | Nanoseconds since midnight                                                           |
| Order Reference Number  | 11      | 8       | Integer    | The unique reference number assigned to the new order at the time of receipt.        |
| Buy/Sell Indicator      | 19      | 1       | Alpha      | The type of order being added. B = Buy Order. S = Sell Order.                        |
| Shares                  | 20      | 4       | Integer    | The total number of shares associated with the order being added to the book.        |
| Stock                   | 24      | 8       | Alpha      | Stock symbol, right padded with spaces                                               |
| Price                   | 32      | 4       | Price (4)  | The display price of the new order. Refer to Data Types for field processing notes.  |
| Attribution             | 36      | 4       | Alpha      | Nasdaq Market participant identifier associated with the entered order               |
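
The offsets and lengths in the table above add up to a 40-byte add-order message with attribution, which maps directly onto a `struct` format string. The following is a minimal sketch, assuming big-endian integers and a `Price (4)` field that is an integer with four implied decimal places. The names `ADD_ORDER_MPID`, `AddOrder`, and `parse_add_order`, as well as the field values in the example, are made up for illustration.

```python
import struct
from collections import namedtuple

# One format code per table row: type, locate, tracking, 6-byte timestamp,
# order reference, side, shares, stock, price, attribution.
ADD_ORDER_MPID = struct.Struct(">cHH6sQcI8sI4s")

AddOrder = namedtuple(
    "AddOrder",
    "message_type stock_locate tracking_number timestamp "
    "order_reference_number buy_sell_indicator shares stock price attribution",
)


def parse_add_order(message):
    """Decode one 40-byte add-order message with attribution."""
    (mtype, locate, tracking, ts, ref,
     side, shares, stock, price, mpid) = ADD_ORDER_MPID.unpack(message)
    return AddOrder(
        message_type=mtype.decode("ascii"),
        stock_locate=locate,
        tracking_number=tracking,
        # struct has no 6-byte integer code, so convert the raw bytes.
        timestamp=int.from_bytes(ts, byteorder="big"),
        order_reference_number=ref,
        buy_sell_indicator=side.decode("ascii"),
        shares=shares,
        stock=stock.decode("ascii").rstrip(),  # drop right padding
        price=price / 10_000,  # assumed: four implied decimal places
        attribution=mpid.decode("ascii").rstrip(),
    )


# Build a synthetic message: a buy order for 100 shares at 188.5000,
# timestamped 09:30:00 (34,200 seconds after midnight, in nanoseconds).
raw = ADD_ORDER_MPID.pack(
    b"F", 13, 0, (34_200_000_000_000).to_bytes(6, byteorder="big"),
    42, b"B", 100, b"AAPL    ", 1_885_000, b"GSCO",
)
order = parse_add_order(raw)
```

Other message types would need their own format strings, derived the same way from their field tables in the specification.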

The notebook [01_build_itch_order_book](01_build_itch_order_book.ipynb) contains the code to:
- download NASDAQ TotalView sample tick data
- parse the messages from the binary source data
- reconstruct the order book for a given stock
- visualize order flow data
- normalize tick data

The code has been updated to use the latest NASDAQ sample file dated March 27, 2019.

Warning: the tick data is around 12 GB in size, and some processing steps can take several hours on a 4-core i7 CPU with 32 GB RAM.

### Regularizing tick data
The trade data is indexed by nanoseconds and is very noisy. The bid-ask bounce, for instance, causes the price to oscillate between the bid and ask prices when trade initiation alternates between buy and sell market orders. To improve the signal-to-noise ratio and the statistical properties of the data, we need to resample and regularize the tick data by aggregating the trading activity.

We typically collect the open (first), high, low, and close (last) prices for the aggregated period, alongside the volume-weighted average price (VWAP), the number of shares traded, and the timestamp associated with the data.
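
As an illustration of this aggregation, the sketch below builds one-minute time bars with pandas. It assumes a trades `DataFrame` with a `DatetimeIndex` and columns named `price` and `shares`; those column names, the `time_bars` helper, and the sample trades are assumptions made for the example and may differ from the notebook.

```python
import pandas as pd


def time_bars(trades, freq="1min"):
    """Aggregate trades into OHLC bars with volume and VWAP per interval."""
    resampled = trades.resample(freq)
    bars = resampled["price"].ohlc()  # columns: open, high, low, close
    volume = resampled["shares"].sum()
    dollar_volume = (trades["price"] * trades["shares"]).resample(freq).sum()
    bars["volume"] = volume
    # VWAP = sum(price * shares) / sum(shares); NaN for intervals without trades
    bars["vwap"] = dollar_volume / volume
    return bars


# Made-up trades spanning two one-minute intervals.
trades = pd.DataFrame(
    {
        "price": [10.00, 10.02, 10.01, 10.05, 10.03],
        "shares": [100, 200, 100, 300, 100],
    },
    index=pd.to_datetime(
        [
            "2019-03-27 09:30:00",
            "2019-03-27 09:30:20",
            "2019-03-27 09:30:59",
            "2019-03-27 09:31:05",
            "2019-03-27 09:31:40",
        ]
    ),
)
bars = time_bars(trades)
```

Each row of `bars` is labeled with the start of its interval, which serves as the timestamp associated with the aggregated data.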

The notebook [02_normalize_tick_data](02_normalize_tick_data.ipynb) illustrates how to normalize noisy tick data using time and volume bars, which use different aggregation methods.
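
Volume bars differ from time bars only in how trades are grouped: by cumulative traded volume rather than by clock time. The following is a rough sketch under the same assumptions as before (a `DatetimeIndex` and hypothetical `price` and `shares` columns). The `volume_bars` helper is illustrative, and because it does not split individual trades across bars, each bar holds only approximately the target volume.

```python
import pandas as pd


def volume_bars(trades, shares_per_bar):
    """Group consecutive trades into bars of roughly equal traded volume."""
    data = trades.assign(
        value=trades["price"] * trades["shares"],
        timestamp=trades.index,
    )
    # A trade that brings cumulative volume exactly to a multiple of
    # shares_per_bar closes the current bar; the next trade opens a new one.
    bar_id = (data["shares"].cumsum() - 1) // shares_per_bar
    grouped = data.groupby(bar_id.to_numpy())
    bars = grouped["price"].ohlc()
    bars["volume"] = grouped["shares"].sum()
    bars["vwap"] = grouped["value"].sum() / bars["volume"]
    bars["timestamp"] = grouped["timestamp"].last()  # time of the last trade
    bars.index.name = "bar"
    return bars


# Made-up trades; with 400 shares per bar the last bar is left incomplete.
trades = pd.DataFrame(
    {
        "price": [10.00, 10.02, 10.01, 10.05, 10.03, 10.04],
        "shares": [100, 200, 100, 300, 100, 50],
    },
    index=pd.to_datetime(
        [
            "2019-03-27 09:30:00",
            "2019-03-27 09:30:20",
            "2019-03-27 09:30:59",
            "2019-03-27 09:31:05",
            "2019-03-27 09:31:40",
            "2019-03-27 09:32:10",
        ]
    ),
)
vbars = volume_bars(trades, shares_per_bar=400)
```

Because busy periods produce more bars than quiet ones, volume bars sample the price series in proportion to trading activity instead of at fixed time intervals.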