ml-finance-python
python scripts for finance machine learning
git clone https://9o.is/git/ml-finance-python.git
notebook.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from quantopian.pipeline import Pipeline\n",
"from quantopian.research import run_pipeline\n",
"from quantopian.pipeline.factors import SimpleMovingAverage, AverageDollarVolume"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Classifiers\n",
"A classifier is a function from an asset and a moment in time to a [categorical output](https://en.wikipedia.org/wiki/Categorical_variable) such as a `string` or `integer` label:\n",
"```\n",
"F(asset, timestamp) -> category\n",
"```\n",
"An example of a classifier producing a string output is the exchange ID of a security. To create this classifier, we import `Fundamentals` and use the [latest](https://www.quantopian.com/tutorials/pipeline#lesson3) attribute of its `exchange_id` column:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from quantopian.pipeline.data import Fundamentals\n",
"\n",
"# Since the underlying data of Fundamentals.exchange_id\n",
"# is of type string, .latest returns a Classifier\n",
"exchange = Fundamentals.exchange_id.latest"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Previously, we saw that the `latest` attribute produced an instance of a `Factor`. In this case, since the underlying data is of type `string`, `latest` produces a `Classifier`.\n",
"\n",
"Similarly, a computation producing the latest Morningstar sector code of a security is a `Classifier`. Here the underlying type is `int`, but the integer encodes a category rather than a numerical quantity, so the result is a classifier rather than a factor. To get the latest sector code, we can use the built-in `Sector` classifier."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from quantopian.pipeline.classifiers.fundamentals import Sector\n",
"morningstar_sector = Sector()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `Sector` is equivalent to `Fundamentals.morningstar_sector_code.latest`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Building Filters from Classifiers\n",
"Classifiers can also be used to produce filters with methods like `isnull`, `eq`, and `startswith`. The full list of `Classifier` methods producing `Filters` can be found [here](https://www.quantopian.com/help#quantopian_pipeline_classifiers_Classifier).\n",
"\n",
"As an example, if we want a filter that selects securities trading on the New York Stock Exchange, we can use the `eq` method of our `exchange` classifier."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"nyse_filter = exchange.eq('NYS')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This filter will return `True` for securities having `'NYS'` as their most recent `exchange_id`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Quantiles\n",
"Classifiers can also be produced from various `Factor` methods. The most general of these is the `quantiles` method, which accepts a bin count as an argument. The `quantiles` method assigns a label from 0 to (bins - 1) to every non-NaN data point in the factor output and returns a `Classifier` with these labels. Labels increase with the factor value, so the highest values receive label (bins - 1). `NaN`s are labeled with -1. Aliases are available for [quartiles](https://www.quantopian.com/help/#quantopian_pipeline_factors_Factor_quartiles) (`quantiles(4)`), [quintiles](https://www.quantopian.com/help/#quantopian_pipeline_factors_Factor_quintiles) (`quantiles(5)`), and [deciles](https://www.quantopian.com/help/#quantopian_pipeline_factors_Factor_deciles) (`quantiles(10)`). As an example, this is what a filter for the top decile of a factor might look like:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()\n",
"top_decile = dollar_volume_decile.eq(9)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's put each of our classifiers into a pipeline and run it to see what they look like."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def make_pipeline():\n",
"    exchange = Fundamentals.exchange_id.latest\n",
"    nyse_filter = exchange.eq('NYS')\n",
"\n",
"    morningstar_sector = Sector()\n",
"\n",
"    dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()\n",
"    top_decile = dollar_volume_decile.eq(9)\n",
"\n",
"    return Pipeline(\n",
"        columns={\n",
"            'exchange': exchange,\n",
"            'sector_code': morningstar_sector,\n",
"            'dollar_volume_decile': dollar_volume_decile\n",
"        },\n",
"        screen=(nyse_filter & top_decile)\n",
"    )"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of securities that passed the filter: 513\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>dollar_volume_decile</th>\n",
" <th>exchange</th>\n",
" <th>sector_code</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"5\" valign=\"top\">2015-05-05 00:00:00+00:00</th>\n",
" <th>Equity(2 [ARNC])</th>\n",
" <td>9</td>\n",
" <td>NYS</td>\n",
" <td>101</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Equity(62 [ABT])</th>\n",
" <td>9</td>\n",
" <td>NYS</td>\n",
" <td>206</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Equity(64 [ABX])</th>\n",
" <td>9</td>\n",
" <td>NYS</td>\n",
" <td>101</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Equity(76 [TAP])</th>\n",
" <td>9</td>\n",
" <td>NYS</td>\n",
" <td>205</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Equity(128 [ADM])</th>\n",
" <td>9</td>\n",
" <td>NYS</td>\n",
" <td>205</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" dollar_volume_decile exchange \\\n",
"2015-05-05 00:00:00+00:00 Equity(2 [ARNC]) 9 NYS \n",
" Equity(62 [ABT]) 9 NYS \n",
" Equity(64 [ABX]) 9 NYS \n",
" Equity(76 [TAP]) 9 NYS \n",
" Equity(128 [ADM]) 9 NYS \n",
"\n",
" sector_code \n",
"2015-05-05 00:00:00+00:00 Equity(2 [ARNC]) 101 \n",
" Equity(62 [ABT]) 206 \n",
" Equity(64 [ABX]) 101 \n",
" Equity(76 [TAP]) 205 \n",
" Equity(128 [ADM]) 205 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result = run_pipeline(make_pipeline(), '2015-05-05', '2015-05-05')\n",
"print('Number of securities that passed the filter: %d' % len(result))\n",
"result.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Classifiers are also useful for describing grouping keys for complex transformations on Factor outputs. Grouping operations such as [demean](https://www.quantopian.com/help#quantopian_pipeline_factors_Factor_demean) and [groupby](https://www.quantopian.com/help#quantopian_pipeline_factors_Factor_groupby) are outside the scope of this tutorial. A future tutorial will cover more advanced uses for classifiers.\n",
"\n",
"In the next lesson, we'll look at the different datasets that we can use in pipeline."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
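The `quantopian.pipeline` calls in `notebook.ipynb` only ran on Quantopian's hosted research platform, so they cannot be executed here. The sketch below is a rough, self-contained illustration of the Classifiers and Building Filters from Classifiers cells. It assumes that a classifier's output at a single timestamp can be modeled as a plain dict from asset to category. It then models `eq`, `isnull`, and `startswith` as functions that turn that dict into a per-asset boolean filter. The function names mirror the `Classifier` methods named in the notebook, but these helpers, the tickers, and the use of `None` for missing values are hypothetical. This is not Quantopian's implementation.

```python
from typing import Dict, Optional

# Hypothetical model of one timestamp of classifier output: asset -> category.
# None stands in for a missing value.
ClassifierOutput = Dict[str, Optional[str]]
FilterOutput = Dict[str, bool]


def eq(classifier: ClassifierOutput, label: str) -> FilterOutput:
    """True where the asset's category equals `label`."""
    return {asset: value == label for asset, value in classifier.items()}


def isnull(classifier: ClassifierOutput) -> FilterOutput:
    """True where the asset has no category."""
    return {asset: value is None for asset, value in classifier.items()}


def startswith(classifier: ClassifierOutput, prefix: str) -> FilterOutput:
    """True where the category is a string beginning with `prefix`."""
    return {
        asset: value is not None and value.startswith(prefix)
        for asset, value in classifier.items()
    }


# Hypothetical exchange IDs, analogous to Fundamentals.exchange_id.latest.
exchange: ClassifierOutput = {"AAA": "NYS", "BBB": "NAS", "CCC": "NYS", "DDD": None}

nyse_filter = eq(exchange, "NYS")
passing = sorted(asset for asset, keep in nyse_filter.items() if keep)
print(passing)  # ['AAA', 'CCC']
```

The key point the sketch shows is that a filter is simply a per-asset True/False derived from each asset's category, which is what `exchange.eq('NYS')` expresses in the notebook.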
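The labeling rule described in the Quantiles cell can be sketched in plain Python. This is a rank-based approximation under stated assumptions, not Quantopian's `quantiles` implementation. Each non-NaN value is labeled `rank * bins // n`, which keeps labels in 0 to (bins - 1) and puts the largest value in the top bin whenever there are at least `bins` values. NaN values get -1. Ties are broken by asset name, so equal values can land in different bins here. The real implementation may place bin edges and ties differently. The function name, asset names, and dollar-volume figures are hypothetical.

```python
import math
from typing import Dict


def quantile_labels(values: Dict[str, float], bins: int) -> Dict[str, int]:
    """Label each asset 0..bins-1 by the rank of its value; NaN gets -1.

    Rank-based sketch: lower values get lower labels, and with at least
    `bins` values the highest value gets label bins - 1.
    """
    # Missing (NaN) values are labeled -1, as the notebook describes.
    labels = {asset: -1 for asset, v in values.items() if math.isnan(v)}
    # Sort the remaining values ascending; ties fall back to asset name.
    ranked = sorted((v, asset) for asset, v in values.items() if not math.isnan(v))
    n = len(ranked)
    for rank, (_, asset) in enumerate(ranked):
        # rank lies in 0..n-1, so rank * bins // n always lies in 0..bins-1.
        labels[asset] = rank * bins // n
    return labels


# Hypothetical average dollar volumes; DDD has no data.
avg_dollar_volume = {
    "AAA": 1.0e6, "BBB": 8.0e6, "CCC": 3.0e6, "DDD": float("nan"),
    "EEE": 5.0e6, "FFF": 2.0e6, "GGG": 7.0e6, "HHH": 4.0e6, "III": 6.0e6,
}

quartiles = quantile_labels(avg_dollar_volume, 4)
top_quartile = sorted(asset for asset, label in quartiles.items() if label == 3)
print(top_quartile)  # ['BBB', 'GGG']
```

In this sketch, selecting `label == 3` from quartiles plays the same role that `dollar_volume_decile.eq(9)` plays for deciles in the notebook.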
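The last markdown cell of the notebook mentions using classifiers as grouping keys for operations such as `demean`. The function below is a minimal sketch of that idea only. It subtracts each group's mean factor value from the values of that group's members, and a sector-code dict stands in for the `Sector` classifier. It assumes every asset belongs to exactly one group and ignores NaN handling and masks. The function name, asset names, and values are hypothetical, and this is not the platform's `demean` implementation.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict


def group_demean(factor: Dict[str, float], groups: Dict[str, Hashable]) -> Dict[str, float]:
    """Subtract each group's mean factor value from the values of its members."""
    totals: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    # First pass: accumulate the sum and count of factor values per group.
    for asset, value in factor.items():
        totals[groups[asset]] += value
        counts[groups[asset]] += 1
    # Second pass: center each value on its own group's mean.
    return {
        asset: value - totals[groups[asset]] / counts[groups[asset]]
        for asset, value in factor.items()
    }


# Hypothetical factor values and sector codes.
factor = {"AAA": 1.0, "BBB": 3.0, "CCC": 10.0, "DDD": 20.0}
sector = {"AAA": 101, "BBB": 101, "CCC": 206, "DDD": 206}

print(group_demean(factor, sector))
# {'AAA': -1.0, 'BBB': 1.0, 'CCC': -5.0, 'DDD': 5.0}
```

After grouping, the values within each sector sum to zero. That is the sense in which the classifier acts as a grouping key rather than as a filter.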