ml-finance-python
python scripts for finance machine learning
git clone https://9o.is/git/ml-finance-python.git
notebook.ipynb
(34539B)
1 {
2 "cells": [
3 {
4 "cell_type": "markdown",
5 "metadata": {
6 "collapsed": true
7 },
8 "source": [
9 "# EventVestor: Earnings Releases\n",
10 "\n",
11 "In this notebook, we'll take a look at EventVestor's *Earnings Releases* dataset, available on the [Quantopian Store](https://www.quantopian.com/store). This dataset spans January 01, 2007 through the current day, and documents quarterly earnings releases.\n",
12 "\n",
13 "### Blaze\n",
14 "Before we dig into the data, we want to tell you about how you generally access Quantopian Store data sets. These datasets are available through an API service known as [Blaze](http://blaze.pydata.org). Blaze provides the Quantopian user with a convenient interface to access very large datasets.\n",
15 "\n",
16 "Blaze provides an important function for accessing these datasets. Some of these sets are many millions of records. Bringing that data directly into Quantopian Research directly just is not viable. So Blaze allows us to provide a simple querying interface and shift the burden over to the server side.\n",
17 "\n",
18 "It is common to use Blaze to reduce your dataset in size, convert it over to Pandas and then to use Pandas for further computation, manipulation and visualization.\n",
19 "\n",
20 "Helpful links:\n",
21 "* [Query building for Blaze](http://blaze.pydata.org/en/latest/queries.html)\n",
22 "* [Pandas-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-pandas.html)\n",
23 "* [SQL-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-sql.html).\n",
24 "\n",
25 "Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrames using:\n",
26 "> `from odo import odo` \n",
27 "> `odo(expr, pandas.DataFrame)`\n",
28 "\n",
29 "### Free samples and limits\n",
30 "One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.\n",
31 "\n",
32 "There is a *free* version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.\n",
33 "\n",
34 "With preamble in place, let's get started:"
35 ]
36 },
37 {
38 "cell_type": "code",
39 "execution_count": 1,
40 "metadata": {
41 "collapsed": false
42 },
43 "outputs": [],
44 "source": [
45 "# import the dataset\n",
46 "from quantopian.interactive.data.eventvestor import earnings_releases\n",
47 "# or if you want to import the free dataset, use:\n",
48 "# from quantopian.interactivedata.eventvestor import earnings_releases_free\n",
49 "\n",
50 "# import data operations\n",
51 "from odo import odo\n",
52 "# import other libraries we will use\n",
53 "import pandas as pd"
54 ]
55 },
56 {
57 "cell_type": "code",
58 "execution_count": 2,
59 "metadata": {
60 "collapsed": false
61 },
62 "outputs": [
63 {
64 "data": {
65 "text/plain": [
66 "dshape(\"\"\"var * {\n",
67 " event_id: ?float64,\n",
68 " asof_date: datetime,\n",
69 " trade_date: ?datetime,\n",
70 " symbol: ?string,\n",
71 " event_type: ?string,\n",
72 " event_headline: ?string,\n",
73 " event_phase: ?string,\n",
74 " fiscal_period: ?string,\n",
75 " calendar_period: ?string,\n",
76 " fiscal_periodend: ?datetime,\n",
77 " currency: ?string,\n",
78 " revenue: ?float64,\n",
79 " gross_income: ?float64,\n",
80 " operating_income: ?float64,\n",
81 " net_income: ?float64,\n",
82 " eps: ?float64,\n",
83 " eps_surprisepct: ?float64,\n",
84 " event_rating: ?float64,\n",
85 " timestamp: datetime,\n",
86 " sid: ?int64\n",
87 " }\"\"\")"
88 ]
89 },
90 "execution_count": 2,
91 "metadata": {},
92 "output_type": "execute_result"
93 }
94 ],
95 "source": [
96 "# Let's use blaze to understand the data a bit using Blaze dshape()\n",
97 "earnings_releases.dshape"
98 ]
99 },
100 {
101 "cell_type": "code",
102 "execution_count": 3,
103 "metadata": {
104 "collapsed": false
105 },
106 "outputs": [
107 {
108 "data": {
109 "text/html": [
110 "139427"
111 ],
112 "text/plain": [
113 "139427"
114 ]
115 },
116 "execution_count": 3,
117 "metadata": {},
118 "output_type": "execute_result"
119 }
120 ],
121 "source": [
122 "# And how many rows are there?\n",
123 "# N.B. we're using a Blaze function to do this, not len()\n",
124 "earnings_releases.count()"
125 ]
126 },
127 {
128 "cell_type": "code",
129 "execution_count": 4,
130 "metadata": {
131 "collapsed": false
132 },
133 "outputs": [
134 {
135 "data": {
136 "text/html": [
137 "<table border=\"1\" class=\"dataframe\">\n",
138 " <thead>\n",
139 " <tr style=\"text-align: right;\">\n",
140 " <th></th>\n",
141 " <th>event_id</th>\n",
142 " <th>asof_date</th>\n",
143 " <th>trade_date</th>\n",
144 " <th>symbol</th>\n",
145 " <th>event_type</th>\n",
146 " <th>event_headline</th>\n",
147 " <th>event_phase</th>\n",
148 " <th>fiscal_period</th>\n",
149 " <th>calendar_period</th>\n",
150 " <th>fiscal_periodend</th>\n",
151 " <th>currency</th>\n",
152 " <th>revenue</th>\n",
153 " <th>gross_income</th>\n",
154 " <th>operating_income</th>\n",
155 " <th>net_income</th>\n",
156 " <th>eps</th>\n",
157 " <th>eps_surprisepct</th>\n",
158 " <th>event_rating</th>\n",
159 " <th>timestamp</th>\n",
160 " <th>sid</th>\n",
161 " </tr>\n",
162 " </thead>\n",
163 " <tbody>\n",
164 " <tr>\n",
165 " <th>0</th>\n",
166 " <td>526391</td>\n",
167 " <td>2007-01-03</td>\n",
168 " <td>2007-01-04</td>\n",
169 " <td>ANGO</td>\n",
170 " <td>Earnings Release</td>\n",
171 " <td>AngioDynamics 2Q Net up 48%</td>\n",
172 " <td>NaN</td>\n",
173 " <td>2Q 07</td>\n",
174 " <td>4Q 06</td>\n",
175 " <td>2006-12-02</td>\n",
176 " <td>$</td>\n",
177 " <td>24.37</td>\n",
178 " <td>14.24</td>\n",
179 " <td>3.0</td>\n",
180 " <td>2.45</td>\n",
181 " <td>0.15</td>\n",
182 " <td>0</td>\n",
183 " <td>1</td>\n",
184 " <td>2007-01-04</td>\n",
185 " <td>26324</td>\n",
186 " </tr>\n",
187 " <tr>\n",
188 " <th>1</th>\n",
189 " <td>196507</td>\n",
190 " <td>2007-01-03</td>\n",
191 " <td>2007-01-04</td>\n",
192 " <td>BLUD</td>\n",
193 " <td>Earnings Release</td>\n",
194 " <td>Immucor Reports 2Q Results</td>\n",
195 " <td>NaN</td>\n",
196 " <td>NaN</td>\n",
197 " <td>NaN</td>\n",
198 " <td>NaT</td>\n",
199 " <td>NaN</td>\n",
200 " <td>0.00</td>\n",
201 " <td>0.00</td>\n",
202 " <td>0.0</td>\n",
203 " <td>0.00</td>\n",
204 " <td>0.00</td>\n",
205 " <td>0</td>\n",
206 " <td>1</td>\n",
207 " <td>2007-01-04</td>\n",
208 " <td>955</td>\n",
209 " </tr>\n",
210 " <tr>\n",
211 " <th>2</th>\n",
212 " <td>180559</td>\n",
213 " <td>2007-01-03</td>\n",
214 " <td>2007-01-03</td>\n",
215 " <td>CALM</td>\n",
216 " <td>Earnings Release</td>\n",
217 " <td>CAL-MAINE FOODS REPORTS 2Q 07 RESULTS</td>\n",
218 " <td>NaN</td>\n",
219 " <td>2Q 07</td>\n",
220 " <td>4Q 06</td>\n",
221 " <td>2006-12-02</td>\n",
222 " <td>$</td>\n",
223 " <td>137.74</td>\n",
224 " <td>24.96</td>\n",
225 " <td>10.5</td>\n",
226 " <td>6.40</td>\n",
227 " <td>0.27</td>\n",
228 " <td>0</td>\n",
229 " <td>1</td>\n",
230 " <td>2007-01-04</td>\n",
231 " <td>16169</td>\n",
232 " </tr>\n",
233 " </tbody>\n",
234 "</table>"
235 ],
236 "text/plain": [
237 " event_id asof_date trade_date symbol event_type \\\n",
238 "0 526391 2007-01-03 2007-01-04 ANGO Earnings Release \n",
239 "1 196507 2007-01-03 2007-01-04 BLUD Earnings Release \n",
240 "2 180559 2007-01-03 2007-01-03 CALM Earnings Release \n",
241 "\n",
242 " event_headline event_phase fiscal_period \\\n",
243 "0 AngioDynamics 2Q Net up 48% NaN 2Q 07 \n",
244 "1 Immucor Reports 2Q Results NaN NaN \n",
245 "2 CAL-MAINE FOODS REPORTS 2Q 07 RESULTS NaN 2Q 07 \n",
246 "\n",
247 " calendar_period fiscal_periodend currency revenue gross_income \\\n",
248 "0 4Q 06 2006-12-02 $ 24.37 14.24 \n",
249 "1 NaN NaT NaN 0.00 0.00 \n",
250 "2 4Q 06 2006-12-02 $ 137.74 24.96 \n",
251 "\n",
252 " operating_income net_income eps eps_surprisepct event_rating \\\n",
253 "0 3.0 2.45 0.15 0 1 \n",
254 "1 0.0 0.00 0.00 0 1 \n",
255 "2 10.5 6.40 0.27 0 1 \n",
256 "\n",
257 " timestamp sid \n",
258 "0 2007-01-04 26324 \n",
259 "1 2007-01-04 955 \n",
260 "2 2007-01-04 16169 "
261 ]
262 },
263 "execution_count": 4,
264 "metadata": {},
265 "output_type": "execute_result"
266 }
267 ],
268 "source": [
269 "# Let's see what the data looks like. We'll grab the first three rows.\n",
270 "earnings_releases[:3]"
271 ]
272 },
273 {
274 "cell_type": "markdown",
275 "metadata": {},
276 "source": [
277 "Let's go over the columns:\n",
278 "- **event_id**: the unique identifier for this event.\n",
279 "- **asof_date**: EventVestor's timestamp of event capture.\n",
280 "- **trade_date**: for event announcements made before trading ends, trade_date is the same as event_date. For announcements issued after market close, trade_date is next market open day.\n",
281 "- **symbol**: stock ticker symbol of the affected company.\n",
282 "- **event_type**: this should always be *Earnings Release/Earnings release*.\n",
283 "- **event_headline**: a brief description of the event\n",
284 "- **event_phase**: the inclusion of this field is likely an error on the part of the data vendor. We're currently attempting to resolve this.\n",
285 "- **fiscal_period**: fiscal period for the reported earnings, such as 1Q 15, 2Q 15, etc.\n",
286 "- **calendar_period**: identifies the calendar period based on the fiscal period end date. E.g. if the fiscal period ends any time after the middle of a given calendar quarter, like 1Q 15, that calendar quarter will be assigned regardless of the fiscal quarter.\n",
287 "- **fiscal_periodend**: the last date for the reported earnings period.\n",
288 "- **currency**: currency used for reporting earnings.\n",
289 "- **revenue**: revenue in millions\n",
290 "- **gross_income**: gross income in millions\n",
291 "- **operating_income**: operating income in millions\n",
292 "- **net_income**: net income in millions\n",
293 "- **eps**: earnings per share, in the reported currency\n",
294 "- **eps_surprisepct**: the meaning of this column is presently uncertain. We're working with our data vendor to resolve this issue.\n",
295 "- **event_rating**: this is always 1. The meaning of this is uncertain.\n",
296 "- **timestamp**: this is our timestamp on when we registered the data.\n",
297 "- **sid**: the equity's unique identifier. Use this instead of the symbol."
298 ]
299 },
300 {
301 "cell_type": "markdown",
302 "metadata": {},
303 "source": [
304 "We've done much of the data processing for you. Fields like `timestamp` and `sid` are standardized across all our Store Datasets, so the datasets are easy to combine. We have standardized the `sid` across all our equity databases.\n",
305 "\n",
306 "We can select columns and rows with ease. Below, we'll fetch all of Apple's entries from 2012."
307 ]
308 },
309 {
310 "cell_type": "code",
311 "execution_count": 5,
312 "metadata": {
313 "collapsed": false,
314 "scrolled": true
315 },
316 "outputs": [
317 {
318 "data": {
319 "text/html": [
320 "<table border=\"1\" class=\"dataframe\">\n",
321 " <thead>\n",
322 " <tr style=\"text-align: right;\">\n",
323 " <th></th>\n",
324 " <th>event_id</th>\n",
325 " <th>asof_date</th>\n",
326 " <th>trade_date</th>\n",
327 " <th>symbol</th>\n",
328 " <th>event_type</th>\n",
329 " <th>event_headline</th>\n",
330 " <th>event_phase</th>\n",
331 " <th>fiscal_period</th>\n",
332 " <th>calendar_period</th>\n",
333 " <th>fiscal_periodend</th>\n",
334 " <th>currency</th>\n",
335 " <th>revenue</th>\n",
336 " <th>gross_income</th>\n",
337 " <th>operating_income</th>\n",
338 " <th>net_income</th>\n",
339 " <th>eps</th>\n",
340 " <th>eps_surprisepct</th>\n",
341 " <th>event_rating</th>\n",
342 " <th>timestamp</th>\n",
343 " <th>sid</th>\n",
344 " </tr>\n",
345 " </thead>\n",
346 " <tbody>\n",
347 " <tr>\n",
348 " <th>0</th>\n",
349 " <td>1385939</td>\n",
350 " <td>2012-01-24</td>\n",
351 " <td>2012-01-25</td>\n",
352 " <td>AAPL</td>\n",
353 " <td>Earnings Release</td>\n",
354 " <td>Apple 1Q 12 Net Jumps to $13B on Higher Revenues</td>\n",
355 " <td>NaN</td>\n",
356 " <td>1Q 12</td>\n",
357 " <td>4Q 11</td>\n",
358 " <td>2011-12-31</td>\n",
359 " <td>$</td>\n",
360 " <td>46333</td>\n",
361 " <td>20703</td>\n",
362 " <td>17340</td>\n",
363 " <td>13064</td>\n",
364 " <td>13.87</td>\n",
365 " <td>38.29</td>\n",
366 " <td>1</td>\n",
367 " <td>2012-01-25</td>\n",
368 " <td>24</td>\n",
369 " </tr>\n",
370 " <tr>\n",
371 " <th>1</th>\n",
372 " <td>1421108</td>\n",
373 " <td>2012-04-24</td>\n",
374 " <td>2012-04-25</td>\n",
375 " <td>AAPL</td>\n",
376 " <td>Earnings Release</td>\n",
377 " <td>Apple 2Q 12 Net Up 94% on Higher Revenues</td>\n",
378 " <td>NaN</td>\n",
379 " <td>2Q 12</td>\n",
380 " <td>1Q 12</td>\n",
381 " <td>2012-03-31</td>\n",
382 " <td>$</td>\n",
383 " <td>39186</td>\n",
384 " <td>18564</td>\n",
385 " <td>15384</td>\n",
386 " <td>11622</td>\n",
387 " <td>12.30</td>\n",
388 " <td>23.74</td>\n",
389 " <td>1</td>\n",
390 " <td>2012-04-25</td>\n",
391 " <td>24</td>\n",
392 " </tr>\n",
393 " <tr>\n",
394 " <th>2</th>\n",
395 " <td>1456685</td>\n",
396 " <td>2012-07-24</td>\n",
397 " <td>2012-07-25</td>\n",
398 " <td>AAPL</td>\n",
399 " <td>Earnings Release</td>\n",
400 " <td>Apple 3Q 12 Net Up 21%</td>\n",
401 " <td>NaN</td>\n",
402 " <td>3Q 12</td>\n",
403 " <td>2Q 12</td>\n",
404 " <td>2012-06-30</td>\n",
405 " <td>$</td>\n",
406 " <td>35023</td>\n",
407 " <td>14994</td>\n",
408 " <td>11573</td>\n",
409 " <td>8824</td>\n",
410 " <td>9.32</td>\n",
411 " <td>-10.21</td>\n",
412 " <td>1</td>\n",
413 " <td>2012-07-25</td>\n",
414 " <td>24</td>\n",
415 " </tr>\n",
416 " <tr>\n",
417 " <th>3</th>\n",
418 " <td>1496807</td>\n",
419 " <td>2012-10-25</td>\n",
420 " <td>2012-10-26</td>\n",
421 " <td>AAPL</td>\n",
422 " <td>Earnings Release</td>\n",
423 " <td>Apple 4Q 12 Net Up 24%</td>\n",
424 " <td>NaN</td>\n",
425 " <td>4Q 12</td>\n",
426 " <td>3Q 12</td>\n",
427 " <td>2012-09-29</td>\n",
428 " <td>$</td>\n",
429 " <td>35966</td>\n",
430 " <td>14401</td>\n",
431 " <td>10944</td>\n",
432 " <td>8223</td>\n",
433 " <td>8.67</td>\n",
434 " <td>-2.03</td>\n",
435 " <td>1</td>\n",
436 " <td>2012-10-26</td>\n",
437 " <td>24</td>\n",
438 " </tr>\n",
439 " </tbody>\n",
440 "</table>"
441 ],
442 "text/plain": [
443 " event_id asof_date trade_date symbol event_type \\\n",
444 "0 1385939 2012-01-24 2012-01-25 AAPL Earnings Release \n",
445 "1 1421108 2012-04-24 2012-04-25 AAPL Earnings Release \n",
446 "2 1456685 2012-07-24 2012-07-25 AAPL Earnings Release \n",
447 "3 1496807 2012-10-25 2012-10-26 AAPL Earnings Release \n",
448 "\n",
449 " event_headline event_phase fiscal_period \\\n",
450 "0 Apple 1Q 12 Net Jumps to $13B on Higher Revenues NaN 1Q 12 \n",
451 "1 Apple 2Q 12 Net Up 94% on Higher Revenues NaN 2Q 12 \n",
452 "2 Apple 3Q 12 Net Up 21% NaN 3Q 12 \n",
453 "3 Apple 4Q 12 Net Up 24% NaN 4Q 12 \n",
454 "\n",
455 " calendar_period fiscal_periodend currency revenue gross_income \\\n",
456 "0 4Q 11 2011-12-31 $ 46333 20703 \n",
457 "1 1Q 12 2012-03-31 $ 39186 18564 \n",
458 "2 2Q 12 2012-06-30 $ 35023 14994 \n",
459 "3 3Q 12 2012-09-29 $ 35966 14401 \n",
460 "\n",
461 " operating_income net_income eps eps_surprisepct event_rating \\\n",
462 "0 17340 13064 13.87 38.29 1 \n",
463 "1 15384 11622 12.30 23.74 1 \n",
464 "2 11573 8824 9.32 -10.21 1 \n",
465 "3 10944 8223 8.67 -2.03 1 \n",
466 "\n",
467 " timestamp sid \n",
468 "0 2012-01-25 24 \n",
469 "1 2012-04-25 24 \n",
470 "2 2012-07-25 24 \n",
471 "3 2012-10-26 24 "
472 ]
473 },
474 "execution_count": 5,
475 "metadata": {},
476 "output_type": "execute_result"
477 }
478 ],
479 "source": [
480 "# get apple's sid first\n",
481 "aapl_sid = symbols('AAPL').sid\n",
482 "aapl_earnings = earnings_releases[('2011-12-31' < earnings_releases['asof_date']) & (earnings_releases['asof_date'] <'2013-01-01') & (earnings_releases.sid==aapl_sid)]\n",
483 "# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.\n",
484 "aapl_earnings.sort('asof_date')"
485 ]
486 },
487 {
488 "cell_type": "markdown",
489 "metadata": {},
490 "source": [
491 "Now suppose we want a DataFrame of all earnings releases with revenue over 30 billion dollars. For those earnings releases, we only want the sid and the asof_date."
492 ]
493 },
494 {
495 "cell_type": "code",
496 "execution_count": 6,
497 "metadata": {
498 "collapsed": false
499 },
500 "outputs": [
501 {
502 "data": {
503 "text/html": [
504 "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
505 "<table border=\"1\" class=\"dataframe\">\n",
506 " <thead>\n",
507 " <tr style=\"text-align: right;\">\n",
508 " <th></th>\n",
509 " <th>sid</th>\n",
510 " <th>asof_date</th>\n",
511 " <th>revenue</th>\n",
512 " </tr>\n",
513 " </thead>\n",
514 " <tbody>\n",
515 " <tr>\n",
516 " <th>510</th>\n",
517 " <td>26503</td>\n",
518 " <td>2013-01-24</td>\n",
519 " <td>8743000</td>\n",
520 " </tr>\n",
521 " <tr>\n",
522 " <th>491</th>\n",
523 " <td>26503</td>\n",
524 " <td>2012-10-26</td>\n",
525 " <td>7593000</td>\n",
526 " </tr>\n",
527 " <tr>\n",
528 " <th>657</th>\n",
529 " <td>26503</td>\n",
530 " <td>2015-04-22</td>\n",
531 " <td>7022000</td>\n",
532 " </tr>\n",
533 " <tr>\n",
534 " <th>474</th>\n",
535 " <td>26503</td>\n",
536 " <td>2012-07-26</td>\n",
537 " <td>6910000</td>\n",
538 " </tr>\n",
539 " <tr>\n",
540 " <th>529</th>\n",
541 " <td>26503</td>\n",
542 " <td>2013-04-22</td>\n",
543 " <td>6803000</td>\n",
544 " </tr>\n",
545 " <tr>\n",
546 " <th>98</th>\n",
547 " <td>7543</td>\n",
548 " <td>2008-02-05</td>\n",
549 " <td>6709983</td>\n",
550 " </tr>\n",
551 " <tr>\n",
552 " <th>342</th>\n",
553 " <td>26503</td>\n",
554 " <td>2010-10-21</td>\n",
555 " <td>6698000</td>\n",
556 " </tr>\n",
557 " <tr>\n",
558 " <th>443</th>\n",
559 " <td>26503</td>\n",
560 " <td>2012-01-27</td>\n",
561 " <td>6610000</td>\n",
562 " </tr>\n",
563 " <tr>\n",
564 " <th>590</th>\n",
565 " <td>7543</td>\n",
566 " <td>2014-02-04</td>\n",
567 " <td>6585044</td>\n",
568 " </tr>\n",
569 " <tr>\n",
570 " <th>563</th>\n",
571 " <td>26503</td>\n",
572 " <td>2013-10-17</td>\n",
573 " <td>6579000</td>\n",
574 " </tr>\n",
575 " <tr>\n",
576 " <th>547</th>\n",
577 " <td>26503</td>\n",
578 " <td>2013-07-19</td>\n",
579 " <td>6572000</td>\n",
580 " </tr>\n",
581 " <tr>\n",
582 " <th>49</th>\n",
583 " <td>7543</td>\n",
584 " <td>2007-08-03</td>\n",
585 " <td>6522637</td>\n",
586 " </tr>\n",
587 " <tr>\n",
588 " <th>362</th>\n",
589 " <td>26503</td>\n",
590 " <td>2011-01-21</td>\n",
591 " <td>6483000</td>\n",
592 " </tr>\n",
593 " <tr>\n",
594 " <th>324</th>\n",
595 " <td>26503</td>\n",
596 " <td>2010-07-22</td>\n",
597 " <td>6454000</td>\n",
598 " </tr>\n",
599 " <tr>\n",
600 " <th>576</th>\n",
601 " <td>7543</td>\n",
602 " <td>2013-11-06</td>\n",
603 " <td>6282166</td>\n",
604 " </tr>\n",
605 " <tr>\n",
606 " <th>417</th>\n",
607 " <td>26503</td>\n",
608 " <td>2011-10-20</td>\n",
609 " <td>6269000</td>\n",
610 " </tr>\n",
611 " <tr>\n",
612 " <th>559</th>\n",
613 " <td>7543</td>\n",
614 " <td>2013-08-02</td>\n",
615 " <td>6255319</td>\n",
616 " </tr>\n",
617 " <tr>\n",
618 " <th>155</th>\n",
619 " <td>7543</td>\n",
620 " <td>2008-08-07</td>\n",
621 " <td>6220000</td>\n",
622 " </tr>\n",
623 " <tr>\n",
624 " <th>457</th>\n",
625 " <td>26503</td>\n",
626 " <td>2012-04-24</td>\n",
627 " <td>6184000</td>\n",
628 " </tr>\n",
629 " <tr>\n",
630 " <th>283</th>\n",
631 " <td>26503</td>\n",
632 " <td>2010-01-20</td>\n",
633 " <td>6082000</td>\n",
634 " </tr>\n",
635 " <tr>\n",
636 " <th>398</th>\n",
637 " <td>26503</td>\n",
638 " <td>2011-07-21</td>\n",
639 " <td>6047000</td>\n",
640 " </tr>\n",
641 " <tr>\n",
642 " <th>182</th>\n",
643 " <td>7543</td>\n",
644 " <td>2008-11-06</td>\n",
645 " <td>5975275</td>\n",
646 " </tr>\n",
647 " <tr>\n",
648 " <th>258</th>\n",
649 " <td>26503</td>\n",
650 " <td>2009-10-15</td>\n",
651 " <td>5974000</td>\n",
652 " </tr>\n",
653 " <tr>\n",
654 " <th>310</th>\n",
655 " <td>26503</td>\n",
656 " <td>2010-04-22</td>\n",
657 " <td>5876000</td>\n",
658 " </tr>\n",
659 " <tr>\n",
660 " <th>485</th>\n",
661 " <td>7543</td>\n",
662 " <td>2012-08-03</td>\n",
663 " <td>5501573</td>\n",
664 " </tr>\n",
665 " <tr>\n",
666 " <th>503</th>\n",
667 " <td>7543</td>\n",
668 " <td>2012-11-05</td>\n",
669 " <td>5406781</td>\n",
670 " </tr>\n",
671 " <tr>\n",
672 " <th>381</th>\n",
673 " <td>26503</td>\n",
674 " <td>2011-04-18</td>\n",
675 " <td>5366000</td>\n",
676 " </tr>\n",
677 " <tr>\n",
678 " <th>519</th>\n",
679 " <td>7543</td>\n",
680 " <td>2013-02-05</td>\n",
681 " <td>5318752</td>\n",
682 " </tr>\n",
683 " <tr>\n",
684 " <th>301</th>\n",
685 " <td>7543</td>\n",
686 " <td>2010-02-04</td>\n",
687 " <td>5292890</td>\n",
688 " </tr>\n",
689 " <tr>\n",
690 " <th>231</th>\n",
691 " <td>26503</td>\n",
692 " <td>2009-07-16</td>\n",
693 " <td>4891000</td>\n",
694 " </tr>\n",
695 " <tr>\n",
696 " <th>...</th>\n",
697 " <td>...</td>\n",
698 " <td>...</td>\n",
699 " <td>...</td>\n",
700 " </tr>\n",
701 " <tr>\n",
702 " <th>21</th>\n",
703 " <td>3149</td>\n",
704 " <td>2007-07-13</td>\n",
705 " <td>42316</td>\n",
706 " </tr>\n",
707 " <tr>\n",
708 " <th>104</th>\n",
709 " <td>3149</td>\n",
710 " <td>2008-04-11</td>\n",
711 " <td>42243</td>\n",
712 " </tr>\n",
713 " <tr>\n",
714 " <th>13</th>\n",
715 " <td>24074</td>\n",
716 " <td>2007-04-26</td>\n",
717 " <td>42156</td>\n",
718 " </tr>\n",
719 " <tr>\n",
720 " <th>625</th>\n",
721 " <td>24</td>\n",
722 " <td>2014-10-20</td>\n",
723 " <td>42123</td>\n",
724 " </tr>\n",
725 " <tr>\n",
726 " <th>629</th>\n",
727 " <td>23052</td>\n",
728 " <td>2014-10-28</td>\n",
729 " <td>42114</td>\n",
730 " </tr>\n",
731 " <tr>\n",
732 " <th>642</th>\n",
733 " <td>3149</td>\n",
734 " <td>2015-01-23</td>\n",
735 " <td>42004</td>\n",
736 " </tr>\n",
737 " <tr>\n",
738 " <th>169</th>\n",
739 " <td>18711</td>\n",
740 " <td>2008-10-28</td>\n",
741 " <td>41723</td>\n",
742 " </tr>\n",
743 " <tr>\n",
744 " <th>391</th>\n",
745 " <td>7538</td>\n",
746 " <td>2011-04-29</td>\n",
747 " <td>41602</td>\n",
748 " </tr>\n",
749 " <tr>\n",
750 " <th>415</th>\n",
751 " <td>22899</td>\n",
752 " <td>2011-10-19</td>\n",
753 " <td>41562</td>\n",
754 " </tr>\n",
755 " <tr>\n",
756 " <th>423</th>\n",
757 " <td>7538</td>\n",
758 " <td>2011-10-28</td>\n",
759 " <td>41525</td>\n",
760 " </tr>\n",
761 " <tr>\n",
762 " <th>135</th>\n",
763 " <td>2673</td>\n",
764 " <td>2008-07-24</td>\n",
765 " <td>41500</td>\n",
766 " </tr>\n",
767 " <tr>\n",
768 " <th>286</th>\n",
769 " <td>3149</td>\n",
770 " <td>2010-01-22</td>\n",
771 " <td>41438</td>\n",
772 " </tr>\n",
773 " <tr>\n",
774 " <th>361</th>\n",
775 " <td>3149</td>\n",
776 " <td>2011-01-21</td>\n",
777 " <td>41377</td>\n",
778 " </tr>\n",
779 " <tr>\n",
780 " <th>266</th>\n",
781 " <td>23998</td>\n",
782 " <td>2009-10-28</td>\n",
783 " <td>41305</td>\n",
784 " </tr>\n",
785 " <tr>\n",
786 " <th>605</th>\n",
787 " <td>42788</td>\n",
788 " <td>2014-04-30</td>\n",
789 " <td>41099</td>\n",
790 " </tr>\n",
791 " <tr>\n",
792 " <th>76</th>\n",
793 " <td>2673</td>\n",
794 " <td>2007-11-08</td>\n",
795 " <td>41078</td>\n",
796 " </tr>\n",
797 " <tr>\n",
798 " <th>499</th>\n",
799 " <td>1091</td>\n",
800 " <td>2012-11-02</td>\n",
801 " <td>41050</td>\n",
802 " </tr>\n",
803 " <tr>\n",
804 " <th>500</th>\n",
805 " <td>11100</td>\n",
806 " <td>2012-11-02</td>\n",
807 " <td>41050</td>\n",
808 " </tr>\n",
809 " <tr>\n",
810 " <th>633</th>\n",
811 " <td>42788</td>\n",
812 " <td>2014-10-29</td>\n",
813 " <td>41048</td>\n",
814 " </tr>\n",
815 " <tr>\n",
816 " <th>645</th>\n",
817 " <td>23052</td>\n",
818 " <td>2015-01-29</td>\n",
819 " <td>40959</td>\n",
820 " </tr>\n",
821 " <tr>\n",
822 " <th>380</th>\n",
823 " <td>22899</td>\n",
824 " <td>2011-04-18</td>\n",
825 " <td>40952</td>\n",
826 " </tr>\n",
827 " <tr>\n",
828 " <th>139</th>\n",
829 " <td>18711</td>\n",
830 " <td>2008-07-28</td>\n",
831 " <td>40569</td>\n",
832 " </tr>\n",
833 " <tr>\n",
834 " <th>591</th>\n",
835 " <td>40430</td>\n",
836 " <td>2014-02-06</td>\n",
837 " <td>40485</td>\n",
838 " </tr>\n",
839 " <tr>\n",
840 " <th>408</th>\n",
841 " <td>7538</td>\n",
842 " <td>2011-07-29</td>\n",
843 " <td>40465</td>\n",
844 " </tr>\n",
845 " <tr>\n",
846 " <th>580</th>\n",
847 " <td>3149</td>\n",
848 " <td>2014-01-17</td>\n",
849 " <td>40382</td>\n",
850 " </tr>\n",
851 " <tr>\n",
852 " <th>673</th>\n",
853 " <td>23112</td>\n",
854 " <td>2015-07-31</td>\n",
855 " <td>40357</td>\n",
856 " </tr>\n",
857 " <tr>\n",
858 " <th>670</th>\n",
859 " <td>23052</td>\n",
860 " <td>2015-07-28</td>\n",
861 " <td>40277</td>\n",
862 " </tr>\n",
863 " <tr>\n",
864 " <th>247</th>\n",
865 " <td>23112</td>\n",
866 " <td>2009-07-31</td>\n",
867 " <td>40205</td>\n",
868 " </tr>\n",
869 " <tr>\n",
870 " <th>9</th>\n",
871 " <td>3149</td>\n",
872 " <td>2007-04-13</td>\n",
873 " <td>40195</td>\n",
874 " </tr>\n",
875 " <tr>\n",
876 " <th>377</th>\n",
877 " <td>7538</td>\n",
878 " <td>2011-02-11</td>\n",
879 " <td>40157</td>\n",
880 " </tr>\n",
881 " </tbody>\n",
882 "</table>\n",
883 "<p>670 rows × 3 columns</p>\n",
884 "</div>"
885 ],
886 "text/plain": [
887 " sid asof_date revenue\n",
888 "510 26503 2013-01-24 8743000\n",
889 "491 26503 2012-10-26 7593000\n",
890 "657 26503 2015-04-22 7022000\n",
891 "474 26503 2012-07-26 6910000\n",
892 "529 26503 2013-04-22 6803000\n",
893 "98 7543 2008-02-05 6709983\n",
894 "342 26503 2010-10-21 6698000\n",
895 "443 26503 2012-01-27 6610000\n",
896 "590 7543 2014-02-04 6585044\n",
897 "563 26503 2013-10-17 6579000\n",
898 "547 26503 2013-07-19 6572000\n",
899 "49 7543 2007-08-03 6522637\n",
900 "362 26503 2011-01-21 6483000\n",
901 "324 26503 2010-07-22 6454000\n",
902 "576 7543 2013-11-06 6282166\n",
903 "417 26503 2011-10-20 6269000\n",
904 "559 7543 2013-08-02 6255319\n",
905 "155 7543 2008-08-07 6220000\n",
906 "457 26503 2012-04-24 6184000\n",
907 "283 26503 2010-01-20 6082000\n",
908 "398 26503 2011-07-21 6047000\n",
909 "182 7543 2008-11-06 5975275\n",
910 "258 26503 2009-10-15 5974000\n",
911 "310 26503 2010-04-22 5876000\n",
912 "485 7543 2012-08-03 5501573\n",
913 "503 7543 2012-11-05 5406781\n",
914 "381 26503 2011-04-18 5366000\n",
915 "519 7543 2013-02-05 5318752\n",
916 "301 7543 2010-02-04 5292890\n",
917 "231 26503 2009-07-16 4891000\n",
918 ".. ... ... ...\n",
919 "21 3149 2007-07-13 42316\n",
920 "104 3149 2008-04-11 42243\n",
921 "13 24074 2007-04-26 42156\n",
922 "625 24 2014-10-20 42123\n",
923 "629 23052 2014-10-28 42114\n",
924 "642 3149 2015-01-23 42004\n",
925 "169 18711 2008-10-28 41723\n",
926 "391 7538 2011-04-29 41602\n",
927 "415 22899 2011-10-19 41562\n",
928 "423 7538 2011-10-28 41525\n",
929 "135 2673 2008-07-24 41500\n",
930 "286 3149 2010-01-22 41438\n",
931 "361 3149 2011-01-21 41377\n",
932 "266 23998 2009-10-28 41305\n",
933 "605 42788 2014-04-30 41099\n",
934 "76 2673 2007-11-08 41078\n",
935 "499 1091 2012-11-02 41050\n",
936 "500 11100 2012-11-02 41050\n",
937 "633 42788 2014-10-29 41048\n",
938 "645 23052 2015-01-29 40959\n",
939 "380 22899 2011-04-18 40952\n",
940 "139 18711 2008-07-28 40569\n",
941 "591 40430 2014-02-06 40485\n",
942 "408 7538 2011-07-29 40465\n",
943 "580 3149 2014-01-17 40382\n",
944 "673 23112 2015-07-31 40357\n",
945 "670 23052 2015-07-28 40277\n",
946 "247 23112 2009-07-31 40205\n",
947 "9 3149 2007-04-13 40195\n",
948 "377 7538 2011-02-11 40157\n",
949 "\n",
950 "[670 rows x 3 columns]"
951 ]
952 },
953 "execution_count": 6,
954 "metadata": {},
955 "output_type": "execute_result"
956 }
957 ],
958 "source": [
959 "# manipulate with Blaze first:\n",
960 "big_earnings = earnings_releases[earnings_releases.revenue > 40000]\n",
961 "# now that we've got a much smaller object (len: ~2167 rows), we can convert it to a pandas DataFrame\n",
962 "df = odo(big_earnings, pd.DataFrame)\n",
963 "df = df[['sid', 'asof_date','revenue']].dropna()\n",
964 "df.sort('revenue',ascending=False)"
965 ]
966 },
967 {
968 "cell_type": "code",
969 "execution_count": null,
970 "metadata": {
971 "collapsed": true
972 },
973 "outputs": [],
974 "source": []
975 }
976 ],
977 "metadata": {
978 "kernelspec": {
979 "display_name": "Python 2",
980 "language": "python",
981 "name": "python2"
982 },
983 "language_info": {
984 "codemirror_mode": {
985 "name": "ipython",
986 "version": 2
987 },
988 "file_extension": ".py",
989 "mimetype": "text/x-python",
990 "name": "python",
991 "nbconvert_exporter": "python",
992 "pygments_lexer": "ipython2",
993 "version": "2.7.10"
994 }
995 },
996 "nbformat": 4,
997 "nbformat_minor": 0
998 }