ml-finance-python

python scripts for finance machine learning

git clone https://9o.is/git/ml-finance-python.git

notebook.ipynb

(29212B)


      1 {
      2  "cells": [
      3   {
      4    "cell_type": "markdown",
      5    "metadata": {
      6     "collapsed": true
      7    },
      8    "source": [
      9     "# EventVestor: Spin-Offs\n",
     10     "\n",
     11     "In this notebook, we'll take a look at EventVestor's *Spin-Offs* dataset, available on the [Quantopian Store](https://www.quantopian.com/store). This dataset spans January 01, 2007 through the current day, and documents corporate spin-off events.\n",
     12     "\n",
     13     "### Blaze\n",
     14     "Before we dig into the data, here is how you generally access Quantopian Store datasets. These datasets are available through an API service known as [Blaze](http://blaze.pydata.org). Blaze provides the Quantopian user with a convenient interface to access very large datasets.\n",
     15     "\n",
     16     "Blaze serves an important function for accessing these datasets. Some of them contain many millions of records, so bringing that data directly into Quantopian Research is not viable. Blaze lets us provide a simple querying interface and shift the burden to the server side.\n",
     17     "\n",
     18     "It is common to use Blaze to reduce the size of your dataset, convert it to Pandas, and then use Pandas for further computation, manipulation, and visualization.\n",
     19     "\n",
     20     "Helpful links:\n",
     21     "* [Query building for Blaze](http://blaze.pydata.org/en/latest/queries.html)\n",
     22     "* [Pandas-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-pandas.html)\n",
     23     "* [SQL-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-sql.html)\n",
     24     "\n",
     25     "Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrame using:\n",
     26     "> `from odo import odo`  \n",
     27     "> `odo(expr, pandas.DataFrame)`\n",
     28     "\n",
     29     "### Free samples and limits\n",
     30     "One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you still have access to all the data on the server side; we only limit the size of the responses returned by Blaze.\n",
     31     "\n",
     32     "There is a *free* version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.\n",
     33     "\n",
     34     "With the preamble in place, let's get started:"
     35    ]
     36   },
     37   {
     38    "cell_type": "code",
     39    "execution_count": 1,
     40    "metadata": {
     41     "collapsed": false
     42    },
     43    "outputs": [],
     44    "source": [
     45     "# import the dataset\n",
     46     "from quantopian.interactive.data.eventvestor import spin_offs\n",
     47     "# or if you want to import the free dataset, use:\n",
     48     "# from quantopian.interactive.data.eventvestor import spin_offs_free\n",
     49     "\n",
     50     "# import data operations\n",
     51     "from odo import odo\n",
     52     "# import other libraries we will use\n",
     53     "import pandas as pd"
     54    ]
     55   },
     56   {
     57    "cell_type": "code",
     58    "execution_count": 2,
     59    "metadata": {
     60     "collapsed": false
     61    },
     62    "outputs": [
     63     {
     64      "data": {
     65       "text/plain": [
     66        "dshape(\"\"\"var * {\n",
     67        "  event_id: ?float64,\n",
     68        "  asof_date: datetime,\n",
     69        "  trade_date: ?datetime,\n",
     70        "  symbol: ?string,\n",
     71        "  event_type: ?string,\n",
     72        "  event_headline: ?string,\n",
     73        "  spinoff_phase: ?string,\n",
     74        "  spinoff_name: ?string,\n",
     75        "  event_rating: ?float64,\n",
     76        "  timestamp: datetime,\n",
     77        "  sid: ?int64\n",
     78        "  }\"\"\")"
     79       ]
     80      },
     81      "execution_count": 2,
     82      "metadata": {},
     83      "output_type": "execute_result"
     84     }
     85    ],
     86    "source": [
     87     "# Let's use Blaze's dshape attribute to understand the structure of the data\n",
     88     "spin_offs.dshape"
     89    ]
     90   },
     91   {
     92    "cell_type": "code",
     93    "execution_count": 3,
     94    "metadata": {
     95     "collapsed": false
     96    },
     97    "outputs": [
     98     {
     99      "data": {
    100       "text/html": [
    101        "1189"
    102       ],
    103       "text/plain": [
    104        "1189"
    105       ]
    106      },
    107      "execution_count": 3,
    108      "metadata": {},
    109      "output_type": "execute_result"
    110     }
    111    ],
    112    "source": [
    113     "# And how many rows are there?\n",
    114     "# N.B. we're using a Blaze function to do this, not len()\n",
    115     "spin_offs.count()"
    116    ]
    117   },
    118   {
    119    "cell_type": "code",
    120    "execution_count": 4,
    121    "metadata": {
    122     "collapsed": false
    123    },
    124    "outputs": [
    125     {
    126      "data": {
    127       "text/html": [
    128        "<table border=\"1\" class=\"dataframe\">\n",
    129        "  <thead>\n",
    130        "    <tr style=\"text-align: right;\">\n",
    131        "      <th></th>\n",
    132        "      <th>event_id</th>\n",
    133        "      <th>asof_date</th>\n",
    134        "      <th>trade_date</th>\n",
    135        "      <th>symbol</th>\n",
    136        "      <th>event_type</th>\n",
    137        "      <th>event_headline</th>\n",
    138        "      <th>spinoff_phase</th>\n",
    139        "      <th>spinoff_name</th>\n",
    140        "      <th>event_rating</th>\n",
    141        "      <th>timestamp</th>\n",
    142        "      <th>sid</th>\n",
    143        "    </tr>\n",
    144        "  </thead>\n",
    145        "  <tbody>\n",
    146        "    <tr>\n",
    147        "      <th>0</th>\n",
    148        "      <td>127421</td>\n",
    149        "      <td>2007-01-08</td>\n",
    150        "      <td>2007-01-09</td>\n",
    151        "      <td>DUK</td>\n",
    152        "      <td>Spin-off</td>\n",
    153        "      <td>Duke Energy completes Natural Gas business spi...</td>\n",
    154        "      <td>Completes</td>\n",
    155        "      <td>NaN</td>\n",
    156        "      <td>1</td>\n",
    157        "      <td>2007-01-09</td>\n",
    158        "      <td>2351</td>\n",
    159        "    </tr>\n",
    160        "    <tr>\n",
    161        "      <th>1</th>\n",
    162        "      <td>134268</td>\n",
    163        "      <td>2007-01-08</td>\n",
    164        "      <td>2007-01-08</td>\n",
    165        "      <td>NCR</td>\n",
    166        "      <td>Spin-off</td>\n",
    167        "      <td>NCR To Separate Into Two Independent Companies</td>\n",
    168        "      <td>Proposal</td>\n",
    169        "      <td>NaN</td>\n",
    170        "      <td>1</td>\n",
    171        "      <td>2007-01-09</td>\n",
    172        "      <td>16389</td>\n",
    173        "    </tr>\n",
    174        "    <tr>\n",
    175        "      <th>2</th>\n",
    176        "      <td>77960</td>\n",
    177        "      <td>2007-01-16</td>\n",
    178        "      <td>2007-01-16</td>\n",
    179        "      <td>VZ</td>\n",
    180        "      <td>Spin-off</td>\n",
    181        "      <td>Verizon to spin off and merge local exchange a...</td>\n",
    182        "      <td>Board Approval</td>\n",
    183        "      <td>NaN</td>\n",
    184        "      <td>1</td>\n",
    185        "      <td>2007-01-17</td>\n",
    186        "      <td>21839</td>\n",
    187        "    </tr>\n",
    188        "  </tbody>\n",
    189        "</table>"
    190       ],
    191       "text/plain": [
    192        "   event_id  asof_date trade_date symbol event_type  \\\n",
    193        "0    127421 2007-01-08 2007-01-09    DUK   Spin-off   \n",
    194        "1    134268 2007-01-08 2007-01-08    NCR   Spin-off   \n",
    195        "2     77960 2007-01-16 2007-01-16     VZ   Spin-off   \n",
    196        "\n",
    197        "                                      event_headline   spinoff_phase  \\\n",
    198        "0  Duke Energy completes Natural Gas business spi...       Completes   \n",
    199        "1     NCR To Separate Into Two Independent Companies        Proposal   \n",
    200        "2  Verizon to spin off and merge local exchange a...  Board Approval   \n",
    201        "\n",
    202        "  spinoff_name  event_rating  timestamp    sid  \n",
    203        "0          NaN             1 2007-01-09   2351  \n",
    204        "1          NaN             1 2007-01-09  16389  \n",
    205        "2          NaN             1 2007-01-17  21839  "
    206       ]
    207      },
    208      "execution_count": 4,
    209      "metadata": {},
    210      "output_type": "execute_result"
    211     }
    212    ],
    213    "source": [
    214     "# Let's see what the data looks like. We'll grab the first three rows.\n",
    215     "spin_offs[:3]"
    216    ]
    217   },
    218   {
    219    "cell_type": "markdown",
    220    "metadata": {},
    221    "source": [
    222     "Let's go over the columns:\n",
    223     "- **event_id**: the unique identifier for this event.\n",
    224     "- **asof_date**: EventVestor's timestamp of event capture.\n",
    225     "- **trade_date**: for event announcements made before trading ends, trade_date is the same as the event date. For announcements issued after market close, trade_date is the next market open day.\n",
    226     "- **symbol**: stock ticker symbol of the affected company.\n",
    227     "- **event_type**: this should always be *Spin-off*.\n",
    228     "- **event_headline**: a brief description of the event.\n",
    229     "- **spinoff_phase**: values include *Proposal, Board Approval, Updates, Completes*.\n",
    230     "- **spinoff_name**: name of the entity being spun off.\n",
    231     "- **event_rating**: this is always 1; its meaning is uncertain.\n",
    232     "- **timestamp**: our timestamp for when we registered the data.\n",
    233     "- **sid**: the equity's unique identifier. Use this instead of the symbol."
    234    ]
    235   },
    236   {
    237    "cell_type": "markdown",
    238    "metadata": {},
    239    "source": [
    240     "We've done much of the data processing for you. Fields like `timestamp` and `sid` are standardized across all our Store Datasets, so the datasets are easy to combine. In particular, `sid` is standardized across all our equity databases.\n",
    241     "\n",
    242     "We can select columns and rows with ease. Below, we'll fetch Yahoo's 2015 spin-offs."
    243    ]
    244   },
    245   {
    246    "cell_type": "code",
    247    "execution_count": 5,
    248    "metadata": {
    249     "collapsed": false,
    250     "scrolled": true
    251    },
    252    "outputs": [
    253     {
    254      "data": {
    255       "text/html": [
    256        "<table border=\"1\" class=\"dataframe\">\n",
    257        "  <thead>\n",
    258        "    <tr style=\"text-align: right;\">\n",
    259        "      <th></th>\n",
    260        "      <th>event_id</th>\n",
    261        "      <th>asof_date</th>\n",
    262        "      <th>trade_date</th>\n",
    263        "      <th>symbol</th>\n",
    264        "      <th>event_type</th>\n",
    265        "      <th>event_headline</th>\n",
    266        "      <th>spinoff_phase</th>\n",
    267        "      <th>spinoff_name</th>\n",
    268        "      <th>event_rating</th>\n",
    269        "      <th>timestamp</th>\n",
    270        "      <th>sid</th>\n",
    271        "    </tr>\n",
    272        "  </thead>\n",
    273        "  <tbody>\n",
    274        "    <tr>\n",
    275        "      <th>0</th>\n",
    276        "      <td>1827542</td>\n",
    277        "      <td>2015-01-27</td>\n",
    278        "      <td>2015-01-28</td>\n",
    279        "      <td>YHOO</td>\n",
    280        "      <td>Spin-off</td>\n",
    281        "      <td>Yahoo to Spin-Off its Alibaba Stake into Newly...</td>\n",
    282        "      <td>Board Approval</td>\n",
    283        "      <td>NaN</td>\n",
    284        "      <td>1</td>\n",
    285        "      <td>2015-01-28 00:00:00</td>\n",
    286        "      <td>14848</td>\n",
    287        "    </tr>\n",
    288        "    <tr>\n",
    289        "      <th>1</th>\n",
    290        "      <td>1903562</td>\n",
    291        "      <td>2015-07-17</td>\n",
    292        "      <td>2015-07-20</td>\n",
    293        "      <td>YHOO</td>\n",
    294        "      <td>Spin-off</td>\n",
    295        "      <td>Yahoo! Announces SEC Filing for Planned Spin-O...</td>\n",
    296        "      <td>Updates</td>\n",
    297        "      <td>Aabaco Holdings Inc.</td>\n",
    298        "      <td>1</td>\n",
    299        "      <td>2015-07-18 00:00:00</td>\n",
    300        "      <td>14848</td>\n",
    301        "    </tr>\n",
    302        "    <tr>\n",
    303        "      <th>2</th>\n",
    304        "      <td>1937451</td>\n",
    305        "      <td>2015-09-28</td>\n",
    306        "      <td>2015-09-29</td>\n",
    307        "      <td>YHOO</td>\n",
    308        "      <td>Spin-off</td>\n",
    309        "      <td>Yahoo! to Proceed Alibaba Stake Spinoff withou...</td>\n",
    310        "      <td>Updates</td>\n",
    311        "      <td>Alibaba Holding Group Ltd.</td>\n",
    312        "      <td>1</td>\n",
    313        "      <td>2015-09-29 11:14:35.314487</td>\n",
    314        "      <td>14848</td>\n",
    315        "    </tr>\n",
    316        "  </tbody>\n",
    317        "</table>"
    318       ],
    319       "text/plain": [
    320        "   event_id  asof_date trade_date symbol event_type  \\\n",
    321        "0   1827542 2015-01-27 2015-01-28   YHOO   Spin-off   \n",
    322        "1   1903562 2015-07-17 2015-07-20   YHOO   Spin-off   \n",
    323        "2   1937451 2015-09-28 2015-09-29   YHOO   Spin-off   \n",
    324        "\n",
    325        "                                      event_headline   spinoff_phase  \\\n",
    326        "0  Yahoo to Spin-Off its Alibaba Stake into Newly...  Board Approval   \n",
    327        "1  Yahoo! Announces SEC Filing for Planned Spin-O...         Updates   \n",
    328        "2  Yahoo! to Proceed Alibaba Stake Spinoff withou...         Updates   \n",
    329        "\n",
    330        "                  spinoff_name  event_rating                  timestamp    sid  \n",
    331        "0                          NaN             1        2015-01-28 00:00:00  14848  \n",
    332        "1         Aabaco Holdings Inc.             1        2015-07-18 00:00:00  14848  \n",
    333        "2  Alibaba Holding Group Ltd.              1 2015-09-29 11:14:35.314487  14848  "
    334       ]
    335      },
    336      "execution_count": 5,
    337      "metadata": {},
    338      "output_type": "execute_result"
    339     }
    340    ],
    341    "source": [
    342     "# Get Yahoo's sid first\n",
    343     "yahoo_sid = symbols('YHOO').sid\n",
    344     "spinoffs = spin_offs[('2014-12-31' < spin_offs['asof_date']) &\n",
    345     "                     (spin_offs['asof_date'] < '2016-01-01') &\n",
    346     "                     (spin_offs.sid == yahoo_sid)]\n",
    347     "# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.\n",
    348     "spinoffs.sort('asof_date')"
    349    ]
    350   },
    351   {
    352    "cell_type": "markdown",
    353    "metadata": {},
    354    "source": [
    355     "Now suppose we want a DataFrame of `spin_offs`, but only want the `asof_date`, `spinoff_phase`, and `sid` columns."
    356    ]
    357   },
    358   {
    359    "cell_type": "code",
    360    "execution_count": 6,
    361    "metadata": {
    362     "collapsed": false
    363    },
    364    "outputs": [
    365     {
    366      "data": {
    367       "text/html": [
    368        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
    369        "<table border=\"1\" class=\"dataframe\">\n",
    370        "  <thead>\n",
    371        "    <tr style=\"text-align: right;\">\n",
    372        "      <th></th>\n",
    373        "      <th>asof_date</th>\n",
    374        "      <th>spinoff_phase</th>\n",
    375        "      <th>sid</th>\n",
    376        "    </tr>\n",
    377        "  </thead>\n",
    378        "  <tbody>\n",
    379        "    <tr>\n",
    380        "      <th>0</th>\n",
    381        "      <td>2007-01-08</td>\n",
    382        "      <td>Completes</td>\n",
    383        "      <td>2351</td>\n",
    384        "    </tr>\n",
    385        "    <tr>\n",
    386        "      <th>1</th>\n",
    387        "      <td>2007-01-08</td>\n",
    388        "      <td>Proposal</td>\n",
    389        "      <td>16389</td>\n",
    390        "    </tr>\n",
    391        "    <tr>\n",
    392        "      <th>2</th>\n",
    393        "      <td>2007-01-16</td>\n",
    394        "      <td>Board Approval</td>\n",
    395        "      <td>21839</td>\n",
    396        "    </tr>\n",
    397        "    <tr>\n",
    398        "      <th>3</th>\n",
    399        "      <td>2007-01-17</td>\n",
    400        "      <td>Updates</td>\n",
    401        "      <td>4758</td>\n",
    402        "    </tr>\n",
    403        "    <tr>\n",
    404        "      <th>4</th>\n",
    405        "      <td>2007-01-19</td>\n",
    406        "      <td>Updates</td>\n",
    407        "      <td>13373</td>\n",
    408        "    </tr>\n",
    409        "    <tr>\n",
    410        "      <th>5</th>\n",
    411        "      <td>2007-01-31</td>\n",
    412        "      <td>Completes</td>\n",
    413        "      <td>13373</td>\n",
    414        "    </tr>\n",
    415        "    <tr>\n",
    416        "      <th>6</th>\n",
    417        "      <td>2007-01-31</td>\n",
    418        "      <td>Board Approval</td>\n",
    419        "      <td>4954</td>\n",
    420        "    </tr>\n",
    421        "    <tr>\n",
    422        "      <th>8</th>\n",
    423        "      <td>2007-02-02</td>\n",
    424        "      <td>Updates</td>\n",
    425        "      <td>8326</td>\n",
    426        "    </tr>\n",
    427        "    <tr>\n",
    428        "      <th>9</th>\n",
    429        "      <td>2007-02-06</td>\n",
    430        "      <td>Updates</td>\n",
    431        "      <td>22954</td>\n",
    432        "    </tr>\n",
    433        "    <tr>\n",
    434        "      <th>10</th>\n",
    435        "      <td>2007-02-27</td>\n",
    436        "      <td>Proposal</td>\n",
    437        "      <td>22983</td>\n",
    438        "    </tr>\n",
    439        "    <tr>\n",
    440        "      <th>11</th>\n",
    441        "      <td>2007-02-27</td>\n",
    442        "      <td>Proposal</td>\n",
    443        "      <td>7449</td>\n",
    444        "    </tr>\n",
    445        "    <tr>\n",
    446        "      <th>12</th>\n",
    447        "      <td>2007-03-02</td>\n",
    448        "      <td>Updates</td>\n",
    449        "      <td>3443</td>\n",
    450        "    </tr>\n",
    451        "    <tr>\n",
    452        "      <th>13</th>\n",
    453        "      <td>2007-03-02</td>\n",
    454        "      <td>Updates</td>\n",
    455        "      <td>32880</td>\n",
    456        "    </tr>\n",
    457        "    <tr>\n",
    458        "      <th>14</th>\n",
    459        "      <td>2007-03-06</td>\n",
    460        "      <td>Updates</td>\n",
    461        "      <td>22983</td>\n",
    462        "    </tr>\n",
    463        "    <tr>\n",
    464        "      <th>15</th>\n",
    465        "      <td>2007-03-07</td>\n",
    466        "      <td>Completes</td>\n",
    467        "      <td>8326</td>\n",
    468        "    </tr>\n",
    469        "    <tr>\n",
    470        "      <th>16</th>\n",
    471        "      <td>2007-03-20</td>\n",
    472        "      <td>Updates</td>\n",
    473        "      <td>4954</td>\n",
    474        "    </tr>\n",
    475        "    <tr>\n",
    476        "      <th>17</th>\n",
    477        "      <td>2007-03-21</td>\n",
    478        "      <td>Proposal</td>\n",
    479        "      <td>630</td>\n",
    480        "    </tr>\n",
    481        "    <tr>\n",
    482        "      <th>18</th>\n",
    483        "      <td>2007-03-26</td>\n",
    484        "      <td>Updates</td>\n",
    485        "      <td>22954</td>\n",
    486        "    </tr>\n",
    487        "    <tr>\n",
    488        "      <th>19</th>\n",
    489        "      <td>2007-03-30</td>\n",
    490        "      <td>Completes</td>\n",
    491        "      <td>4954</td>\n",
    492        "    </tr>\n",
    493        "    <tr>\n",
    494        "      <th>20</th>\n",
    495        "      <td>2007-04-03</td>\n",
    496        "      <td>Updates</td>\n",
    497        "      <td>3443</td>\n",
    498        "    </tr>\n",
    499        "    <tr>\n",
    500        "      <th>21</th>\n",
    501        "      <td>2007-04-03</td>\n",
    502        "      <td>Updates</td>\n",
    503        "      <td>32880</td>\n",
    504        "    </tr>\n",
    505        "    <tr>\n",
    506        "      <th>22</th>\n",
    507        "      <td>2007-04-04</td>\n",
    508        "      <td>Completes</td>\n",
    509        "      <td>630</td>\n",
    510        "    </tr>\n",
    511        "    <tr>\n",
    512        "      <th>23</th>\n",
    513        "      <td>2007-04-04</td>\n",
    514        "      <td>Proposal</td>\n",
    515        "      <td>5025</td>\n",
    516        "    </tr>\n",
    517        "    <tr>\n",
    518        "      <th>24</th>\n",
    519        "      <td>2007-04-05</td>\n",
    520        "      <td>Completes</td>\n",
    521        "      <td>3443</td>\n",
    522        "    </tr>\n",
    523        "    <tr>\n",
    524        "      <th>25</th>\n",
    525        "      <td>2007-04-05</td>\n",
    526        "      <td>Completes</td>\n",
    527        "      <td>32880</td>\n",
    528        "    </tr>\n",
    529        "    <tr>\n",
    530        "      <th>26</th>\n",
    531        "      <td>2007-04-10</td>\n",
    532        "      <td>Proposal</td>\n",
    533        "      <td>18027</td>\n",
    534        "    </tr>\n",
    535        "    <tr>\n",
    536        "      <th>28</th>\n",
    537        "      <td>2007-05-15</td>\n",
    538        "      <td>Proposal</td>\n",
    539        "      <td>4010</td>\n",
    540        "    </tr>\n",
    541        "    <tr>\n",
    542        "      <th>29</th>\n",
    543        "      <td>2007-05-26</td>\n",
    544        "      <td>Updates</td>\n",
    545        "      <td>2190</td>\n",
    546        "    </tr>\n",
    547        "    <tr>\n",
    548        "      <th>30</th>\n",
    549        "      <td>2007-06-01</td>\n",
    550        "      <td>Board Approval</td>\n",
    551        "      <td>17080</td>\n",
    552        "    </tr>\n",
    553        "    <tr>\n",
    554        "      <th>31</th>\n",
    555        "      <td>2007-06-08</td>\n",
    556        "      <td>Proposal</td>\n",
    557        "      <td>7679</td>\n",
    558        "    </tr>\n",
    559        "    <tr>\n",
    560        "      <th>...</th>\n",
    561        "      <td>...</td>\n",
    562        "      <td>...</td>\n",
    563        "      <td>...</td>\n",
    564        "    </tr>\n",
    565        "    <tr>\n",
    566        "      <th>1028</th>\n",
    567        "      <td>2015-07-13</td>\n",
    568        "      <td>Updates</td>\n",
    569        "      <td>34575</td>\n",
    570        "    </tr>\n",
    571        "    <tr>\n",
    572        "      <th>1029</th>\n",
    573        "      <td>2015-07-14</td>\n",
    574        "      <td>Board Approval</td>\n",
    575        "      <td>9693</td>\n",
    576        "    </tr>\n",
    577        "    <tr>\n",
    578        "      <th>1030</th>\n",
    579        "      <td>2015-07-17</td>\n",
    580        "      <td>Updates</td>\n",
    581        "      <td>14848</td>\n",
    582        "    </tr>\n",
    583        "    <tr>\n",
    584        "      <th>1031</th>\n",
    585        "      <td>2015-07-20</td>\n",
    586        "      <td>Completes</td>\n",
    587        "      <td>24819</td>\n",
    588        "    </tr>\n",
    589        "    <tr>\n",
    590        "      <th>1032</th>\n",
    591        "      <td>2015-07-22</td>\n",
    592        "      <td>Updates</td>\n",
    593        "      <td>34575</td>\n",
    594        "    </tr>\n",
    595        "    <tr>\n",
    596        "      <th>1034</th>\n",
    597        "      <td>2015-07-24</td>\n",
    598        "      <td>Updates</td>\n",
    599        "      <td>34575</td>\n",
    600        "    </tr>\n",
    601        "    <tr>\n",
    602        "      <th>1035</th>\n",
    603        "      <td>2015-07-24</td>\n",
    604        "      <td>Updates</td>\n",
    605        "      <td>4117</td>\n",
    606        "    </tr>\n",
    607        "    <tr>\n",
    608        "      <th>1036</th>\n",
    609        "      <td>2015-07-30</td>\n",
    610        "      <td>Updates</td>\n",
    611        "      <td>22015</td>\n",
    612        "    </tr>\n",
    613        "    <tr>\n",
    614        "      <th>1037</th>\n",
    615        "      <td>2015-07-30</td>\n",
    616        "      <td>Board Approval</td>\n",
    617        "      <td>18821</td>\n",
    618        "    </tr>\n",
    619        "    <tr>\n",
    620        "      <th>1038</th>\n",
    621        "      <td>2015-07-31</td>\n",
    622        "      <td>Updates</td>\n",
    623        "      <td>13306</td>\n",
    624        "    </tr>\n",
    625        "    <tr>\n",
    626        "      <th>1039</th>\n",
    627        "      <td>2015-08-03</td>\n",
    628        "      <td>Completes</td>\n",
    629        "      <td>9693</td>\n",
    630        "    </tr>\n",
    631        "    <tr>\n",
    632        "      <th>1040</th>\n",
    633        "      <td>2015-08-03</td>\n",
    634        "      <td>Proposal</td>\n",
    635        "      <td>21608</td>\n",
    636        "    </tr>\n",
    637        "    <tr>\n",
    638        "      <th>1041</th>\n",
    639        "      <td>2015-08-04</td>\n",
    640        "      <td>Proposal</td>\n",
    641        "      <td>2248</td>\n",
    642        "    </tr>\n",
    643        "    <tr>\n",
    644        "      <th>1042</th>\n",
    645        "      <td>2015-08-06</td>\n",
    646        "      <td>Proposal</td>\n",
    647        "      <td>32878</td>\n",
    648        "    </tr>\n",
    649        "    <tr>\n",
    650        "      <th>1043</th>\n",
    651        "      <td>2015-08-06</td>\n",
    652        "      <td>Updates</td>\n",
    653        "      <td>47812</td>\n",
    654        "    </tr>\n",
    655        "    <tr>\n",
    656        "      <th>1044</th>\n",
    657        "      <td>2015-08-18</td>\n",
    658        "      <td>Completes</td>\n",
    659        "      <td>18821</td>\n",
    660        "    </tr>\n",
    661        "    <tr>\n",
    662        "      <th>1045</th>\n",
    663        "      <td>2015-08-27</td>\n",
    664        "      <td>NaN</td>\n",
    665        "      <td>22689</td>\n",
    666        "    </tr>\n",
    667        "    <tr>\n",
    668        "      <th>1046</th>\n",
    669        "      <td>2015-08-31</td>\n",
    670        "      <td>Updates</td>\n",
    671        "      <td>1898</td>\n",
    672        "    </tr>\n",
    673        "    <tr>\n",
    674        "      <th>1047</th>\n",
    675        "      <td>2015-09-01</td>\n",
    676        "      <td>Updates</td>\n",
    677        "      <td>4656</td>\n",
    678        "    </tr>\n",
    679        "    <tr>\n",
    680        "      <th>1048</th>\n",
    681        "      <td>2015-09-04</td>\n",
    682        "      <td>Updates</td>\n",
    683        "      <td>21608</td>\n",
    684        "    </tr>\n",
    685        "    <tr>\n",
    686        "      <th>1049</th>\n",
    687        "      <td>2015-09-08</td>\n",
    688        "      <td>Board Approval</td>\n",
    689        "      <td>1936</td>\n",
    690        "    </tr>\n",
    691        "    <tr>\n",
    692        "      <th>1050</th>\n",
    693        "      <td>2015-09-08</td>\n",
    694        "      <td>NaN</td>\n",
    695        "      <td>42176</td>\n",
    696        "    </tr>\n",
    697        "    <tr>\n",
    698        "      <th>1051</th>\n",
    699        "      <td>2015-09-11</td>\n",
    700        "      <td>Updates</td>\n",
    701        "      <td>34067</td>\n",
    702        "    </tr>\n",
    703        "    <tr>\n",
    704        "      <th>1052</th>\n",
    705        "      <td>2015-09-15</td>\n",
    706        "      <td>Updates</td>\n",
    707        "      <td>11498</td>\n",
    708        "    </tr>\n",
    709        "    <tr>\n",
    710        "      <th>1053</th>\n",
    711        "      <td>2015-09-16</td>\n",
    712        "      <td>Board Approval</td>\n",
    713        "      <td>460</td>\n",
    714        "    </tr>\n",
    715        "    <tr>\n",
    716        "      <th>1054</th>\n",
    717        "      <td>2015-09-22</td>\n",
    718        "      <td>Proposal</td>\n",
    719        "      <td>559</td>\n",
    720        "    </tr>\n",
    721        "    <tr>\n",
    722        "      <th>1055</th>\n",
    723        "      <td>2015-09-28</td>\n",
    724        "      <td>NaN</td>\n",
    725        "      <td>2</td>\n",
    726        "    </tr>\n",
    727        "    <tr>\n",
    728        "      <th>1160</th>\n",
    729        "      <td>2014-07-11</td>\n",
    730        "      <td>Board Approval</td>\n",
    731        "      <td>5249</td>\n",
    732        "    </tr>\n",
    733        "    <tr>\n",
    734        "      <th>1187</th>\n",
    735        "      <td>2015-09-28</td>\n",
    736        "      <td>Completes</td>\n",
    737        "      <td>7086</td>\n",
    738        "    </tr>\n",
    739        "    <tr>\n",
    740        "      <th>1188</th>\n",
    741        "      <td>2015-09-28</td>\n",
    742        "      <td>Updates</td>\n",
    743        "      <td>14848</td>\n",
    744        "    </tr>\n",
    745        "  </tbody>\n",
    746        "</table>\n",
    747        "<p>929 rows × 3 columns</p>\n",
    748        "</div>"
    749       ],
    750       "text/plain": [
    751        "      asof_date   spinoff_phase    sid\n",
    752        "0    2007-01-08       Completes   2351\n",
    753        "1    2007-01-08        Proposal  16389\n",
    754        "2    2007-01-16  Board Approval  21839\n",
    755        "3    2007-01-17         Updates   4758\n",
    756        "4    2007-01-19         Updates  13373\n",
    757        "5    2007-01-31       Completes  13373\n",
    758        "6    2007-01-31  Board Approval   4954\n",
    759        "8    2007-02-02         Updates   8326\n",
    760        "9    2007-02-06         Updates  22954\n",
    761        "10   2007-02-27        Proposal  22983\n",
    762        "11   2007-02-27        Proposal   7449\n",
    763        "12   2007-03-02         Updates   3443\n",
    764        "13   2007-03-02         Updates  32880\n",
    765        "14   2007-03-06         Updates  22983\n",
    766        "15   2007-03-07       Completes   8326\n",
    767        "16   2007-03-20         Updates   4954\n",
    768        "17   2007-03-21        Proposal    630\n",
    769        "18   2007-03-26         Updates  22954\n",
    770        "19   2007-03-30       Completes   4954\n",
    771        "20   2007-04-03         Updates   3443\n",
    772        "21   2007-04-03         Updates  32880\n",
    773        "22   2007-04-04       Completes    630\n",
    774        "23   2007-04-04        Proposal   5025\n",
    775        "24   2007-04-05       Completes   3443\n",
    776        "25   2007-04-05       Completes  32880\n",
    777        "26   2007-04-10        Proposal  18027\n",
    778        "28   2007-05-15        Proposal   4010\n",
    779        "29   2007-05-26         Updates   2190\n",
    780        "30   2007-06-01  Board Approval  17080\n",
    781        "31   2007-06-08        Proposal   7679\n",
    782        "...         ...             ...    ...\n",
    783        "1028 2015-07-13         Updates  34575\n",
    784        "1029 2015-07-14  Board Approval   9693\n",
    785        "1030 2015-07-17         Updates  14848\n",
    786        "1031 2015-07-20       Completes  24819\n",
    787        "1032 2015-07-22         Updates  34575\n",
    788        "1034 2015-07-24         Updates  34575\n",
    789        "1035 2015-07-24         Updates   4117\n",
    790        "1036 2015-07-30         Updates  22015\n",
    791        "1037 2015-07-30  Board Approval  18821\n",
    792        "1038 2015-07-31         Updates  13306\n",
    793        "1039 2015-08-03       Completes   9693\n",
    794        "1040 2015-08-03        Proposal  21608\n",
    795        "1041 2015-08-04        Proposal   2248\n",
    796        "1042 2015-08-06        Proposal  32878\n",
    797        "1043 2015-08-06         Updates  47812\n",
    798        "1044 2015-08-18       Completes  18821\n",
    799        "1045 2015-08-27             NaN  22689\n",
    800        "1046 2015-08-31         Updates   1898\n",
    801        "1047 2015-09-01         Updates   4656\n",
    802        "1048 2015-09-04         Updates  21608\n",
    803        "1049 2015-09-08  Board Approval   1936\n",
    804        "1050 2015-09-08             NaN  42176\n",
    805        "1051 2015-09-11         Updates  34067\n",
    806        "1052 2015-09-15         Updates  11498\n",
    807        "1053 2015-09-16  Board Approval    460\n",
    808        "1054 2015-09-22        Proposal    559\n",
    809        "1055 2015-09-28             NaN      2\n",
    810        "1160 2014-07-11  Board Approval   5249\n",
    811        "1187 2015-09-28       Completes   7086\n",
    812        "1188 2015-09-28         Updates  14848\n",
    813        "\n",
    814        "[929 rows x 3 columns]"
    815       ]
    816      },
    817      "execution_count": 6,
    818      "metadata": {},
    819      "output_type": "execute_result"
    820     }
    821    ],
    822    "source": [
    823     "# spin_offs is small (on the order of a thousand rows), so it is safe to convert the whole thing to a DataFrame.\n",
    824     "df = odo(spin_offs, pd.DataFrame)\n",
    825     "df = df[['asof_date','spinoff_phase','sid']]\n",
    826     "df = df[df.sid.notnull()]\n",
    827     "# By default, pandas prints only the first 30 and last 30 rows of a long DataFrame and truncates the middle.\n",
    828     "df"
    829    ]
    830   },
    831   {
    832    "cell_type": "code",
    833    "execution_count": null,
    834    "metadata": {
    835     "collapsed": true
    836    },
    837    "outputs": [],
    838    "source": []
    839   }
    840  ],
    841  "metadata": {
    842   "kernelspec": {
    843    "display_name": "Python 2",
    844    "language": "python",
    845    "name": "python2"
    846   },
    847   "language_info": {
    848    "codemirror_mode": {
    849     "name": "ipython",
    850     "version": 2
    851    },
    852    "file_extension": ".py",
    853    "mimetype": "text/x-python",
    854    "name": "python",
    855    "nbconvert_exporter": "python",
    856    "pygments_lexer": "ipython2",
    857    "version": "2.7.10"
    858   }
    859  },
    860  "nbformat": 4,
    861  "nbformat_minor": 0
    862 }