ml-finance-python
python scripts for finance machine learning
git clone https://9o.is/git/ml-finance-python.git
notebook.ipynb
(29212B)
1 {
2 "cells": [
3 {
4 "cell_type": "markdown",
5 "metadata": {
6 "collapsed": true
7 },
8 "source": [
9 "# EventVestor: Spin-Offs\n",
10 "\n",
11 "In this notebook, we'll take a look at EventVestor's *Spin-Offs* dataset, available on the [Quantopian Store](https://www.quantopian.com/store). This dataset spans January 01, 2007 through the current day, and documents corporate spin-off events.\n",
12 "\n",
13 "### Blaze\n",
14 "Before we dig into the data, a word on how Quantopian Store datasets are generally accessed. They are available through an API service known as [Blaze](http://blaze.pydata.org), which gives the Quantopian user a convenient interface to very large datasets.\n",
15 "\n",
16 "Blaze matters because some of these datasets contain many millions of records, and bringing that much data directly into Quantopian Research is not viable. Blaze lets us provide a simple querying interface and shift the burden to the server side.\n",
17 "\n",
18 "It is common to use Blaze to reduce your dataset in size, convert it to Pandas, and then use Pandas for further computation, manipulation and visualization.\n",
19 "\n",
20 "Helpful links:\n",
21 "* [Query building for Blaze](http://blaze.pydata.org/en/latest/queries.html)\n",
22 "* [Pandas-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-pandas.html)\n",
23 "* [SQL-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-sql.html)\n",
24 "\n",
25 "Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrame using:\n",
26 "> `from odo import odo` \n",
27 "> `odo(expr, pandas.DataFrame)`\n",
28 "\n",
29 "### Free samples and limits\n",
30 "One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.\n",
31 "\n",
32 "There is a *free* version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.\n",
33 "\n",
34 "With preamble in place, let's get started:"
35 ]
36 },
37 {
38 "cell_type": "code",
39 "execution_count": 1,
40 "metadata": {
41 "collapsed": false
42 },
43 "outputs": [],
44 "source": [
45 "# import the dataset\n",
46 "from quantopian.interactive.data.eventvestor import spin_offs\n",
47 "# or if you want to import the free dataset, use:\n",
48 "# from quantopian.data.eventvestor import spin_offs_free\n",
49 "\n",
50 "# import data operations\n",
51 "from odo import odo\n",
52 "# import other libraries we will use\n",
53 "import pandas as pd"
54 ]
55 },
56 {
57 "cell_type": "code",
58 "execution_count": 2,
59 "metadata": {
60 "collapsed": false
61 },
62 "outputs": [
63 {
64 "data": {
65 "text/plain": [
66 "dshape(\"\"\"var * {\n",
67 " event_id: ?float64,\n",
68 " asof_date: datetime,\n",
69 " trade_date: ?datetime,\n",
70 " symbol: ?string,\n",
71 " event_type: ?string,\n",
72 " event_headline: ?string,\n",
73 " spinoff_phase: ?string,\n",
74 " spinoff_name: ?string,\n",
75 " event_rating: ?float64,\n",
76 " timestamp: datetime,\n",
77 " sid: ?int64\n",
78 " }\"\"\")"
79 ]
80 },
81 "execution_count": 2,
82 "metadata": {},
83 "output_type": "execute_result"
84 }
85 ],
86 "source": [
87 "# Let's inspect the schema using the Blaze dshape attribute\n",
88 "spin_offs.dshape"
89 ]
90 },
91 {
92 "cell_type": "code",
93 "execution_count": 3,
94 "metadata": {
95 "collapsed": false
96 },
97 "outputs": [
98 {
99 "data": {
100 "text/html": [
101 "1189"
102 ],
103 "text/plain": [
104 "1189"
105 ]
106 },
107 "execution_count": 3,
108 "metadata": {},
109 "output_type": "execute_result"
110 }
111 ],
112 "source": [
113 "# And how many rows are there?\n",
114 "# N.B. we're using a Blaze function to do this, not len()\n",
115 "spin_offs.count()"
116 ]
117 },
118 {
119 "cell_type": "code",
120 "execution_count": 4,
121 "metadata": {
122 "collapsed": false
123 },
124 "outputs": [
125 {
126 "data": {
127 "text/html": [
128 "<table border=\"1\" class=\"dataframe\">\n",
129 " <thead>\n",
130 " <tr style=\"text-align: right;\">\n",
131 " <th></th>\n",
132 " <th>event_id</th>\n",
133 " <th>asof_date</th>\n",
134 " <th>trade_date</th>\n",
135 " <th>symbol</th>\n",
136 " <th>event_type</th>\n",
137 " <th>event_headline</th>\n",
138 " <th>spinoff_phase</th>\n",
139 " <th>spinoff_name</th>\n",
140 " <th>event_rating</th>\n",
141 " <th>timestamp</th>\n",
142 " <th>sid</th>\n",
143 " </tr>\n",
144 " </thead>\n",
145 " <tbody>\n",
146 " <tr>\n",
147 " <th>0</th>\n",
148 " <td>127421</td>\n",
149 " <td>2007-01-08</td>\n",
150 " <td>2007-01-09</td>\n",
151 " <td>DUK</td>\n",
152 " <td>Spin-off</td>\n",
153 " <td>Duke Energy completes Natural Gas business spi...</td>\n",
154 " <td>Completes</td>\n",
155 " <td>NaN</td>\n",
156 " <td>1</td>\n",
157 " <td>2007-01-09</td>\n",
158 " <td>2351</td>\n",
159 " </tr>\n",
160 " <tr>\n",
161 " <th>1</th>\n",
162 " <td>134268</td>\n",
163 " <td>2007-01-08</td>\n",
164 " <td>2007-01-08</td>\n",
165 " <td>NCR</td>\n",
166 " <td>Spin-off</td>\n",
167 " <td>NCR To Separate Into Two Independent Companies</td>\n",
168 " <td>Proposal</td>\n",
169 " <td>NaN</td>\n",
170 " <td>1</td>\n",
171 " <td>2007-01-09</td>\n",
172 " <td>16389</td>\n",
173 " </tr>\n",
174 " <tr>\n",
175 " <th>2</th>\n",
176 " <td>77960</td>\n",
177 " <td>2007-01-16</td>\n",
178 " <td>2007-01-16</td>\n",
179 " <td>VZ</td>\n",
180 " <td>Spin-off</td>\n",
181 " <td>Verizon to spin off and merge local exchange a...</td>\n",
182 " <td>Board Approval</td>\n",
183 " <td>NaN</td>\n",
184 " <td>1</td>\n",
185 " <td>2007-01-17</td>\n",
186 " <td>21839</td>\n",
187 " </tr>\n",
188 " </tbody>\n",
189 "</table>"
190 ],
191 "text/plain": [
192 " event_id asof_date trade_date symbol event_type \\\n",
193 "0 127421 2007-01-08 2007-01-09 DUK Spin-off \n",
194 "1 134268 2007-01-08 2007-01-08 NCR Spin-off \n",
195 "2 77960 2007-01-16 2007-01-16 VZ Spin-off \n",
196 "\n",
197 " event_headline spinoff_phase \\\n",
198 "0 Duke Energy completes Natural Gas business spi... Completes \n",
199 "1 NCR To Separate Into Two Independent Companies Proposal \n",
200 "2 Verizon to spin off and merge local exchange a... Board Approval \n",
201 "\n",
202 " spinoff_name event_rating timestamp sid \n",
203 "0 NaN 1 2007-01-09 2351 \n",
204 "1 NaN 1 2007-01-09 16389 \n",
205 "2 NaN 1 2007-01-17 21839 "
206 ]
207 },
208 "execution_count": 4,
209 "metadata": {},
210 "output_type": "execute_result"
211 }
212 ],
213 "source": [
214 "# Let's see what the data looks like. We'll grab the first three rows.\n",
215 "spin_offs[:3]"
216 ]
217 },
218 {
219 "cell_type": "markdown",
220 "metadata": {},
221 "source": [
222 "Let's go over the columns:\n",
223 "- **event_id**: the unique identifier for this event.\n",
224 "- **asof_date**: EventVestor's timestamp of event capture.\n",
225 "- **trade_date**: for event announcements made before trading ends, trade_date is the same as event_date. For announcements issued after market close, trade_date is the next market open day.\n",
226 "- **symbol**: stock ticker symbol of the affected company.\n",
227 "- **event_type**: this should always be *Spin-off*.\n",
228 "- **event_headline**: a brief description of the event.\n",
229 "- **spinoff_phase**: values include *Proposal, Board Approval, Updates, Completes*.\n",
230 "- **spinoff_name**: name of the entity being spun off.\n",
231 "- **event_rating**: this is always 1. The meaning of this is uncertain.\n",
232 "- **timestamp**: this is our timestamp on when we registered the data.\n",
233 "- **sid**: the equity's unique identifier. Use this instead of the symbol."
234 ]
235 },
236 {
237 "cell_type": "markdown",
238 "metadata": {},
239 "source": [
240 "We've done much of the data processing for you. Fields like `timestamp` and `sid` are standardized across all our Store Datasets, and `sid` is consistent across all our equity databases, so the datasets are easy to combine.\n",
241 "\n",
242 "We can select columns and rows with ease. Below, we'll fetch Yahoo's 2015 spin-offs."
243 ]
244 },
245 {
246 "cell_type": "code",
247 "execution_count": 5,
248 "metadata": {
249 "collapsed": false,
250 "scrolled": true
251 },
252 "outputs": [
253 {
254 "data": {
255 "text/html": [
256 "<table border=\"1\" class=\"dataframe\">\n",
257 " <thead>\n",
258 " <tr style=\"text-align: right;\">\n",
259 " <th></th>\n",
260 " <th>event_id</th>\n",
261 " <th>asof_date</th>\n",
262 " <th>trade_date</th>\n",
263 " <th>symbol</th>\n",
264 " <th>event_type</th>\n",
265 " <th>event_headline</th>\n",
266 " <th>spinoff_phase</th>\n",
267 " <th>spinoff_name</th>\n",
268 " <th>event_rating</th>\n",
269 " <th>timestamp</th>\n",
270 " <th>sid</th>\n",
271 " </tr>\n",
272 " </thead>\n",
273 " <tbody>\n",
274 " <tr>\n",
275 " <th>0</th>\n",
276 " <td>1827542</td>\n",
277 " <td>2015-01-27</td>\n",
278 " <td>2015-01-28</td>\n",
279 " <td>YHOO</td>\n",
280 " <td>Spin-off</td>\n",
281 " <td>Yahoo to Spin-Off its Alibaba Stake into Newly...</td>\n",
282 " <td>Board Approval</td>\n",
283 " <td>NaN</td>\n",
284 " <td>1</td>\n",
285 " <td>2015-01-28 00:00:00</td>\n",
286 " <td>14848</td>\n",
287 " </tr>\n",
288 " <tr>\n",
289 " <th>1</th>\n",
290 " <td>1903562</td>\n",
291 " <td>2015-07-17</td>\n",
292 " <td>2015-07-20</td>\n",
293 " <td>YHOO</td>\n",
294 " <td>Spin-off</td>\n",
295 " <td>Yahoo! Announces SEC Filing for Planned Spin-O...</td>\n",
296 " <td>Updates</td>\n",
297 " <td>Aabaco Holdings Inc.</td>\n",
298 " <td>1</td>\n",
299 " <td>2015-07-18 00:00:00</td>\n",
300 " <td>14848</td>\n",
301 " </tr>\n",
302 " <tr>\n",
303 " <th>2</th>\n",
304 " <td>1937451</td>\n",
305 " <td>2015-09-28</td>\n",
306 " <td>2015-09-29</td>\n",
307 " <td>YHOO</td>\n",
308 " <td>Spin-off</td>\n",
309 " <td>Yahoo! to Proceed Alibaba Stake Spinoff withou...</td>\n",
310 " <td>Updates</td>\n",
311 " <td>Alibaba Holding Group Ltd.</td>\n",
312 " <td>1</td>\n",
313 " <td>2015-09-29 11:14:35.314487</td>\n",
314 " <td>14848</td>\n",
315 " </tr>\n",
316 " </tbody>\n",
317 "</table>"
318 ],
319 "text/plain": [
320 " event_id asof_date trade_date symbol event_type \\\n",
321 "0 1827542 2015-01-27 2015-01-28 YHOO Spin-off \n",
322 "1 1903562 2015-07-17 2015-07-20 YHOO Spin-off \n",
323 "2 1937451 2015-09-28 2015-09-29 YHOO Spin-off \n",
324 "\n",
325 " event_headline spinoff_phase \\\n",
326 "0 Yahoo to Spin-Off its Alibaba Stake into Newly... Board Approval \n",
327 "1 Yahoo! Announces SEC Filing for Planned Spin-O... Updates \n",
328 "2 Yahoo! to Proceed Alibaba Stake Spinoff withou... Updates \n",
329 "\n",
330 " spinoff_name event_rating timestamp sid \n",
331 "0 NaN 1 2015-01-28 00:00:00 14848 \n",
332 "1 Aabaco Holdings Inc. 1 2015-07-18 00:00:00 14848 \n",
333 "2 Alibaba Holding Group Ltd. 1 2015-09-29 11:14:35.314487 14848 "
334 ]
335 },
336 "execution_count": 5,
337 "metadata": {},
338 "output_type": "execute_result"
339 }
340 ],
341 "source": [
342 "# get yahoo's sid first\n",
343 "yahoo_sid = symbols('YHOO').sid\n",
344 "spinoffs = spin_offs[('2014-12-31' < spin_offs['asof_date']) & \n",
345 " (spin_offs['asof_date'] <'2016-01-01') & \n",
346 " (spin_offs.sid == yahoo_sid)]\n",
347 "# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.\n",
348 "spinoffs.sort('asof_date')"
349 ]
350 },
351 {
352 "cell_type": "markdown",
353 "metadata": {},
354 "source": [
355 "Now suppose we want a DataFrame of `spin_offs`, but only want the `asof_date`, `spinoff_phase`, and `sid` columns."
356 ]
357 },
358 {
359 "cell_type": "code",
360 "execution_count": 6,
361 "metadata": {
362 "collapsed": false
363 },
364 "outputs": [
365 {
366 "data": {
367 "text/html": [
368 "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
369 "<table border=\"1\" class=\"dataframe\">\n",
370 " <thead>\n",
371 " <tr style=\"text-align: right;\">\n",
372 " <th></th>\n",
373 " <th>asof_date</th>\n",
374 " <th>spinoff_phase</th>\n",
375 " <th>sid</th>\n",
376 " </tr>\n",
377 " </thead>\n",
378 " <tbody>\n",
379 " <tr>\n",
380 " <th>0</th>\n",
381 " <td>2007-01-08</td>\n",
382 " <td>Completes</td>\n",
383 " <td>2351</td>\n",
384 " </tr>\n",
385 " <tr>\n",
386 " <th>1</th>\n",
387 " <td>2007-01-08</td>\n",
388 " <td>Proposal</td>\n",
389 " <td>16389</td>\n",
390 " </tr>\n",
391 " <tr>\n",
392 " <th>2</th>\n",
393 " <td>2007-01-16</td>\n",
394 " <td>Board Approval</td>\n",
395 " <td>21839</td>\n",
396 " </tr>\n",
397 " <tr>\n",
398 " <th>3</th>\n",
399 " <td>2007-01-17</td>\n",
400 " <td>Updates</td>\n",
401 " <td>4758</td>\n",
402 " </tr>\n",
403 " <tr>\n",
404 " <th>4</th>\n",
405 " <td>2007-01-19</td>\n",
406 " <td>Updates</td>\n",
407 " <td>13373</td>\n",
408 " </tr>\n",
409 " <tr>\n",
410 " <th>5</th>\n",
411 " <td>2007-01-31</td>\n",
412 " <td>Completes</td>\n",
413 " <td>13373</td>\n",
414 " </tr>\n",
415 " <tr>\n",
416 " <th>6</th>\n",
417 " <td>2007-01-31</td>\n",
418 " <td>Board Approval</td>\n",
419 " <td>4954</td>\n",
420 " </tr>\n",
421 " <tr>\n",
422 " <th>8</th>\n",
423 " <td>2007-02-02</td>\n",
424 " <td>Updates</td>\n",
425 " <td>8326</td>\n",
426 " </tr>\n",
427 " <tr>\n",
428 " <th>9</th>\n",
429 " <td>2007-02-06</td>\n",
430 " <td>Updates</td>\n",
431 " <td>22954</td>\n",
432 " </tr>\n",
433 " <tr>\n",
434 " <th>10</th>\n",
435 " <td>2007-02-27</td>\n",
436 " <td>Proposal</td>\n",
437 " <td>22983</td>\n",
438 " </tr>\n",
439 " <tr>\n",
440 " <th>11</th>\n",
441 " <td>2007-02-27</td>\n",
442 " <td>Proposal</td>\n",
443 " <td>7449</td>\n",
444 " </tr>\n",
445 " <tr>\n",
446 " <th>12</th>\n",
447 " <td>2007-03-02</td>\n",
448 " <td>Updates</td>\n",
449 " <td>3443</td>\n",
450 " </tr>\n",
451 " <tr>\n",
452 " <th>13</th>\n",
453 " <td>2007-03-02</td>\n",
454 " <td>Updates</td>\n",
455 " <td>32880</td>\n",
456 " </tr>\n",
457 " <tr>\n",
458 " <th>14</th>\n",
459 " <td>2007-03-06</td>\n",
460 " <td>Updates</td>\n",
461 " <td>22983</td>\n",
462 " </tr>\n",
463 " <tr>\n",
464 " <th>15</th>\n",
465 " <td>2007-03-07</td>\n",
466 " <td>Completes</td>\n",
467 " <td>8326</td>\n",
468 " </tr>\n",
469 " <tr>\n",
470 " <th>16</th>\n",
471 " <td>2007-03-20</td>\n",
472 " <td>Updates</td>\n",
473 " <td>4954</td>\n",
474 " </tr>\n",
475 " <tr>\n",
476 " <th>17</th>\n",
477 " <td>2007-03-21</td>\n",
478 " <td>Proposal</td>\n",
479 " <td>630</td>\n",
480 " </tr>\n",
481 " <tr>\n",
482 " <th>18</th>\n",
483 " <td>2007-03-26</td>\n",
484 " <td>Updates</td>\n",
485 " <td>22954</td>\n",
486 " </tr>\n",
487 " <tr>\n",
488 " <th>19</th>\n",
489 " <td>2007-03-30</td>\n",
490 " <td>Completes</td>\n",
491 " <td>4954</td>\n",
492 " </tr>\n",
493 " <tr>\n",
494 " <th>20</th>\n",
495 " <td>2007-04-03</td>\n",
496 " <td>Updates</td>\n",
497 " <td>3443</td>\n",
498 " </tr>\n",
499 " <tr>\n",
500 " <th>21</th>\n",
501 " <td>2007-04-03</td>\n",
502 " <td>Updates</td>\n",
503 " <td>32880</td>\n",
504 " </tr>\n",
505 " <tr>\n",
506 " <th>22</th>\n",
507 " <td>2007-04-04</td>\n",
508 " <td>Completes</td>\n",
509 " <td>630</td>\n",
510 " </tr>\n",
511 " <tr>\n",
512 " <th>23</th>\n",
513 " <td>2007-04-04</td>\n",
514 " <td>Proposal</td>\n",
515 " <td>5025</td>\n",
516 " </tr>\n",
517 " <tr>\n",
518 " <th>24</th>\n",
519 " <td>2007-04-05</td>\n",
520 " <td>Completes</td>\n",
521 " <td>3443</td>\n",
522 " </tr>\n",
523 " <tr>\n",
524 " <th>25</th>\n",
525 " <td>2007-04-05</td>\n",
526 " <td>Completes</td>\n",
527 " <td>32880</td>\n",
528 " </tr>\n",
529 " <tr>\n",
530 " <th>26</th>\n",
531 " <td>2007-04-10</td>\n",
532 " <td>Proposal</td>\n",
533 " <td>18027</td>\n",
534 " </tr>\n",
535 " <tr>\n",
536 " <th>28</th>\n",
537 " <td>2007-05-15</td>\n",
538 " <td>Proposal</td>\n",
539 " <td>4010</td>\n",
540 " </tr>\n",
541 " <tr>\n",
542 " <th>29</th>\n",
543 " <td>2007-05-26</td>\n",
544 " <td>Updates</td>\n",
545 " <td>2190</td>\n",
546 " </tr>\n",
547 " <tr>\n",
548 " <th>30</th>\n",
549 " <td>2007-06-01</td>\n",
550 " <td>Board Approval</td>\n",
551 " <td>17080</td>\n",
552 " </tr>\n",
553 " <tr>\n",
554 " <th>31</th>\n",
555 " <td>2007-06-08</td>\n",
556 " <td>Proposal</td>\n",
557 " <td>7679</td>\n",
558 " </tr>\n",
559 " <tr>\n",
560 " <th>...</th>\n",
561 " <td>...</td>\n",
562 " <td>...</td>\n",
563 " <td>...</td>\n",
564 " </tr>\n",
565 " <tr>\n",
566 " <th>1028</th>\n",
567 " <td>2015-07-13</td>\n",
568 " <td>Updates</td>\n",
569 " <td>34575</td>\n",
570 " </tr>\n",
571 " <tr>\n",
572 " <th>1029</th>\n",
573 " <td>2015-07-14</td>\n",
574 " <td>Board Approval</td>\n",
575 " <td>9693</td>\n",
576 " </tr>\n",
577 " <tr>\n",
578 " <th>1030</th>\n",
579 " <td>2015-07-17</td>\n",
580 " <td>Updates</td>\n",
581 " <td>14848</td>\n",
582 " </tr>\n",
583 " <tr>\n",
584 " <th>1031</th>\n",
585 " <td>2015-07-20</td>\n",
586 " <td>Completes</td>\n",
587 " <td>24819</td>\n",
588 " </tr>\n",
589 " <tr>\n",
590 " <th>1032</th>\n",
591 " <td>2015-07-22</td>\n",
592 " <td>Updates</td>\n",
593 " <td>34575</td>\n",
594 " </tr>\n",
595 " <tr>\n",
596 " <th>1034</th>\n",
597 " <td>2015-07-24</td>\n",
598 " <td>Updates</td>\n",
599 " <td>34575</td>\n",
600 " </tr>\n",
601 " <tr>\n",
602 " <th>1035</th>\n",
603 " <td>2015-07-24</td>\n",
604 " <td>Updates</td>\n",
605 " <td>4117</td>\n",
606 " </tr>\n",
607 " <tr>\n",
608 " <th>1036</th>\n",
609 " <td>2015-07-30</td>\n",
610 " <td>Updates</td>\n",
611 " <td>22015</td>\n",
612 " </tr>\n",
613 " <tr>\n",
614 " <th>1037</th>\n",
615 " <td>2015-07-30</td>\n",
616 " <td>Board Approval</td>\n",
617 " <td>18821</td>\n",
618 " </tr>\n",
619 " <tr>\n",
620 " <th>1038</th>\n",
621 " <td>2015-07-31</td>\n",
622 " <td>Updates</td>\n",
623 " <td>13306</td>\n",
624 " </tr>\n",
625 " <tr>\n",
626 " <th>1039</th>\n",
627 " <td>2015-08-03</td>\n",
628 " <td>Completes</td>\n",
629 " <td>9693</td>\n",
630 " </tr>\n",
631 " <tr>\n",
632 " <th>1040</th>\n",
633 " <td>2015-08-03</td>\n",
634 " <td>Proposal</td>\n",
635 " <td>21608</td>\n",
636 " </tr>\n",
637 " <tr>\n",
638 " <th>1041</th>\n",
639 " <td>2015-08-04</td>\n",
640 " <td>Proposal</td>\n",
641 " <td>2248</td>\n",
642 " </tr>\n",
643 " <tr>\n",
644 " <th>1042</th>\n",
645 " <td>2015-08-06</td>\n",
646 " <td>Proposal</td>\n",
647 " <td>32878</td>\n",
648 " </tr>\n",
649 " <tr>\n",
650 " <th>1043</th>\n",
651 " <td>2015-08-06</td>\n",
652 " <td>Updates</td>\n",
653 " <td>47812</td>\n",
654 " </tr>\n",
655 " <tr>\n",
656 " <th>1044</th>\n",
657 " <td>2015-08-18</td>\n",
658 " <td>Completes</td>\n",
659 " <td>18821</td>\n",
660 " </tr>\n",
661 " <tr>\n",
662 " <th>1045</th>\n",
663 " <td>2015-08-27</td>\n",
664 " <td>NaN</td>\n",
665 " <td>22689</td>\n",
666 " </tr>\n",
667 " <tr>\n",
668 " <th>1046</th>\n",
669 " <td>2015-08-31</td>\n",
670 " <td>Updates</td>\n",
671 " <td>1898</td>\n",
672 " </tr>\n",
673 " <tr>\n",
674 " <th>1047</th>\n",
675 " <td>2015-09-01</td>\n",
676 " <td>Updates</td>\n",
677 " <td>4656</td>\n",
678 " </tr>\n",
679 " <tr>\n",
680 " <th>1048</th>\n",
681 " <td>2015-09-04</td>\n",
682 " <td>Updates</td>\n",
683 " <td>21608</td>\n",
684 " </tr>\n",
685 " <tr>\n",
686 " <th>1049</th>\n",
687 " <td>2015-09-08</td>\n",
688 " <td>Board Approval</td>\n",
689 " <td>1936</td>\n",
690 " </tr>\n",
691 " <tr>\n",
692 " <th>1050</th>\n",
693 " <td>2015-09-08</td>\n",
694 " <td>NaN</td>\n",
695 " <td>42176</td>\n",
696 " </tr>\n",
697 " <tr>\n",
698 " <th>1051</th>\n",
699 " <td>2015-09-11</td>\n",
700 " <td>Updates</td>\n",
701 " <td>34067</td>\n",
702 " </tr>\n",
703 " <tr>\n",
704 " <th>1052</th>\n",
705 " <td>2015-09-15</td>\n",
706 " <td>Updates</td>\n",
707 " <td>11498</td>\n",
708 " </tr>\n",
709 " <tr>\n",
710 " <th>1053</th>\n",
711 " <td>2015-09-16</td>\n",
712 " <td>Board Approval</td>\n",
713 " <td>460</td>\n",
714 " </tr>\n",
715 " <tr>\n",
716 " <th>1054</th>\n",
717 " <td>2015-09-22</td>\n",
718 " <td>Proposal</td>\n",
719 " <td>559</td>\n",
720 " </tr>\n",
721 " <tr>\n",
722 " <th>1055</th>\n",
723 " <td>2015-09-28</td>\n",
724 " <td>NaN</td>\n",
725 " <td>2</td>\n",
726 " </tr>\n",
727 " <tr>\n",
728 " <th>1160</th>\n",
729 " <td>2014-07-11</td>\n",
730 " <td>Board Approval</td>\n",
731 " <td>5249</td>\n",
732 " </tr>\n",
733 " <tr>\n",
734 " <th>1187</th>\n",
735 " <td>2015-09-28</td>\n",
736 " <td>Completes</td>\n",
737 " <td>7086</td>\n",
738 " </tr>\n",
739 " <tr>\n",
740 " <th>1188</th>\n",
741 " <td>2015-09-28</td>\n",
742 " <td>Updates</td>\n",
743 " <td>14848</td>\n",
744 " </tr>\n",
745 " </tbody>\n",
746 "</table>\n",
747 "<p>929 rows × 3 columns</p>\n",
748 "</div>"
749 ],
750 "text/plain": [
751 " asof_date spinoff_phase sid\n",
752 "0 2007-01-08 Completes 2351\n",
753 "1 2007-01-08 Proposal 16389\n",
754 "2 2007-01-16 Board Approval 21839\n",
755 "3 2007-01-17 Updates 4758\n",
756 "4 2007-01-19 Updates 13373\n",
757 "5 2007-01-31 Completes 13373\n",
758 "6 2007-01-31 Board Approval 4954\n",
759 "8 2007-02-02 Updates 8326\n",
760 "9 2007-02-06 Updates 22954\n",
761 "10 2007-02-27 Proposal 22983\n",
762 "11 2007-02-27 Proposal 7449\n",
763 "12 2007-03-02 Updates 3443\n",
764 "13 2007-03-02 Updates 32880\n",
765 "14 2007-03-06 Updates 22983\n",
766 "15 2007-03-07 Completes 8326\n",
767 "16 2007-03-20 Updates 4954\n",
768 "17 2007-03-21 Proposal 630\n",
769 "18 2007-03-26 Updates 22954\n",
770 "19 2007-03-30 Completes 4954\n",
771 "20 2007-04-03 Updates 3443\n",
772 "21 2007-04-03 Updates 32880\n",
773 "22 2007-04-04 Completes 630\n",
774 "23 2007-04-04 Proposal 5025\n",
775 "24 2007-04-05 Completes 3443\n",
776 "25 2007-04-05 Completes 32880\n",
777 "26 2007-04-10 Proposal 18027\n",
778 "28 2007-05-15 Proposal 4010\n",
779 "29 2007-05-26 Updates 2190\n",
780 "30 2007-06-01 Board Approval 17080\n",
781 "31 2007-06-08 Proposal 7679\n",
782 "... ... ... ...\n",
783 "1028 2015-07-13 Updates 34575\n",
784 "1029 2015-07-14 Board Approval 9693\n",
785 "1030 2015-07-17 Updates 14848\n",
786 "1031 2015-07-20 Completes 24819\n",
787 "1032 2015-07-22 Updates 34575\n",
788 "1034 2015-07-24 Updates 34575\n",
789 "1035 2015-07-24 Updates 4117\n",
790 "1036 2015-07-30 Updates 22015\n",
791 "1037 2015-07-30 Board Approval 18821\n",
792 "1038 2015-07-31 Updates 13306\n",
793 "1039 2015-08-03 Completes 9693\n",
794 "1040 2015-08-03 Proposal 21608\n",
795 "1041 2015-08-04 Proposal 2248\n",
796 "1042 2015-08-06 Proposal 32878\n",
797 "1043 2015-08-06 Updates 47812\n",
798 "1044 2015-08-18 Completes 18821\n",
799 "1045 2015-08-27 NaN 22689\n",
800 "1046 2015-08-31 Updates 1898\n",
801 "1047 2015-09-01 Updates 4656\n",
802 "1048 2015-09-04 Updates 21608\n",
803 "1049 2015-09-08 Board Approval 1936\n",
804 "1050 2015-09-08 NaN 42176\n",
805 "1051 2015-09-11 Updates 34067\n",
806 "1052 2015-09-15 Updates 11498\n",
807 "1053 2015-09-16 Board Approval 460\n",
808 "1054 2015-09-22 Proposal 559\n",
809 "1055 2015-09-28 NaN 2\n",
810 "1160 2014-07-11 Board Approval 5249\n",
811 "1187 2015-09-28 Completes 7086\n",
812 "1188 2015-09-28 Updates 14848\n",
813 "\n",
814 "[929 rows x 3 columns]"
815 ]
816 },
817 "execution_count": 6,
818 "metadata": {},
819 "output_type": "execute_result"
820 }
821 ],
822 "source": [
823 "# spin_offs.count() returned 1189 rows, well under the 10,000-row limit, so converting to a DataFrame is safe.\n",
824 "df = odo(spin_offs, pd.DataFrame)\n",
825 "df = df[['asof_date','spinoff_phase','sid']]\n",
826 "df = df[df.sid.notnull()]\n",
827 "# When printing a pandas DataFrame, the head 30 and tail 30 rows are displayed. The middle is truncated.\n",
828 "df"
829 ]
830 },
831 {
832 "cell_type": "code",
833 "execution_count": null,
834 "metadata": {
835 "collapsed": true
836 },
837 "outputs": [],
838 "source": []
839 }
840 ],
841 "metadata": {
842 "kernelspec": {
843 "display_name": "Python 2",
844 "language": "python",
845 "name": "python2"
846 },
847 "language_info": {
848 "codemirror_mode": {
849 "name": "ipython",
850 "version": 2
851 },
852 "file_extension": ".py",
853 "mimetype": "text/x-python",
854 "name": "python",
855 "nbconvert_exporter": "python",
856 "pygments_lexer": "ipython2",
857 "version": "2.7.10"
858 }
859 },
860 "nbformat": 4,
861 "nbformat_minor": 0
862 }
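
The `trade_date` rule in the notebook's column list (same day for announcements made before trading ends, next market open day otherwise) can be sketched in plain Python. This is only an illustration, not EventVestor's actual logic: the 4:00 pm close, the `holidays` parameter, and the function names `is_market_day` and `trade_date_for` are all assumptions, and the dataset itself does not expose announcement times.

```python
from datetime import date, datetime, time, timedelta

MARKET_CLOSE = time(16, 0)  # assumed 4:00 pm close; not stated in the notebook


def is_market_day(day, holidays=frozenset()):
    """Treat weekdays outside the (hypothetical) holiday set as open days."""
    return day.weekday() < 5 and day not in holidays


def trade_date_for(announced_at, holidays=frozenset()):
    """Return the first market day on which the announcement can be traded."""
    day = announced_at.date()
    # An announcement before the close on an open day trades the same day.
    if is_market_day(day, holidays) and announced_at.time() < MARKET_CLOSE:
        return day
    # Otherwise roll forward to the next open day.
    day += timedelta(days=1)
    while not is_market_day(day, holidays):
        day += timedelta(days=1)
    return day


# A Friday evening announcement rolls over the weekend to Monday.
print(trade_date_for(datetime(2015, 7, 17, 17, 30)))
# A Tuesday morning announcement keeps the same day.
print(trade_date_for(datetime(2007, 1, 16, 10, 0)))
```

With an assumed after-hours announcement time, the first call is consistent with the Yahoo row in the notebook output, where an `asof_date` of 2015-07-17 is paired with a `trade_date` of 2015-07-20.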