ml-finance-python

python scripts for finance machine learning

git clone https://9o.is/git/ml-finance-python.git

notebook.ipynb

(34539B)


      1 {
      2  "cells": [
      3   {
      4    "cell_type": "markdown",
      5    "metadata": {
      6     "collapsed": true
      7    },
      8    "source": [
      9     "# EventVestor: Earnings Releases\n",
     10     "\n",
     11     "In this notebook, we'll take a look at EventVestor's *Earnings Releases* dataset, available on the [Quantopian Store](https://www.quantopian.com/store). This dataset spans January 01, 2007 through the current day, and documents quarterly earnings releases.\n",
     12     "\n",
     13     "### Blaze\n",
     14     "Before we dig into the data, we want to tell you about how  you generally access Quantopian Store data sets. These datasets are available through an API service known as [Blaze](http://blaze.pydata.org). Blaze provides the Quantopian user with a convenient interface to access very large datasets.\n",
     15     "\n",
     16     "Blaze provides an important function for accessing these datasets. Some of these sets are many millions of records. Bringing that data directly into Quantopian Research directly just is not viable. So Blaze allows us to provide a simple querying interface and shift the burden over to the server side.\n",
     17     "\n",
     18     "It is common to use Blaze to reduce your dataset in size, convert it over to Pandas and then to use Pandas for further computation, manipulation and visualization.\n",
     19     "\n",
     20     "Helpful links:\n",
     21     "* [Query building for Blaze](http://blaze.pydata.org/en/latest/queries.html)\n",
     22     "* [Pandas-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-pandas.html)\n",
     23     "* [SQL-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-sql.html).\n",
     24     "\n",
     25     "Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrames using:\n",
     26     "> `from odo import odo`  \n",
     27     "> `odo(expr, pandas.DataFrame)`\n",
     28     "\n",
     29     "### Free samples and limits\n",
     30     "One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.\n",
     31     "\n",
     32     "There is a *free* version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.\n",
     33     "\n",
     34     "With preamble in place, let's get started:"
     35    ]
     36   },
     37   {
     38    "cell_type": "code",
     39    "execution_count": 1,
     40    "metadata": {
     41     "collapsed": false
     42    },
     43    "outputs": [],
     44    "source": [
     45     "# import the dataset\n",
     46     "from quantopian.interactive.data.eventvestor import earnings_releases\n",
     47     "# or if you want to import the free dataset, use:\n",
     48     "# from quantopian.interactivedata.eventvestor import earnings_releases_free\n",
     49     "\n",
     50     "# import data operations\n",
     51     "from odo import odo\n",
     52     "# import other libraries we will use\n",
     53     "import pandas as pd"
     54    ]
     55   },
     56   {
     57    "cell_type": "code",
     58    "execution_count": 2,
     59    "metadata": {
     60     "collapsed": false
     61    },
     62    "outputs": [
     63     {
     64      "data": {
     65       "text/plain": [
     66        "dshape(\"\"\"var * {\n",
     67        "  event_id: ?float64,\n",
     68        "  asof_date: datetime,\n",
     69        "  trade_date: ?datetime,\n",
     70        "  symbol: ?string,\n",
     71        "  event_type: ?string,\n",
     72        "  event_headline: ?string,\n",
     73        "  event_phase: ?string,\n",
     74        "  fiscal_period: ?string,\n",
     75        "  calendar_period: ?string,\n",
     76        "  fiscal_periodend: ?datetime,\n",
     77        "  currency: ?string,\n",
     78        "  revenue: ?float64,\n",
     79        "  gross_income: ?float64,\n",
     80        "  operating_income: ?float64,\n",
     81        "  net_income: ?float64,\n",
     82        "  eps: ?float64,\n",
     83        "  eps_surprisepct: ?float64,\n",
     84        "  event_rating: ?float64,\n",
     85        "  timestamp: datetime,\n",
     86        "  sid: ?int64\n",
     87        "  }\"\"\")"
     88       ]
     89      },
     90      "execution_count": 2,
     91      "metadata": {},
     92      "output_type": "execute_result"
     93     }
     94    ],
     95    "source": [
     96     "# Let's use blaze to understand the data a bit using Blaze dshape()\n",
     97     "earnings_releases.dshape"
     98    ]
     99   },
    100   {
    101    "cell_type": "code",
    102    "execution_count": 3,
    103    "metadata": {
    104     "collapsed": false
    105    },
    106    "outputs": [
    107     {
    108      "data": {
    109       "text/html": [
    110        "139427"
    111       ],
    112       "text/plain": [
    113        "139427"
    114       ]
    115      },
    116      "execution_count": 3,
    117      "metadata": {},
    118      "output_type": "execute_result"
    119     }
    120    ],
    121    "source": [
    122     "# And how many rows are there?\n",
    123     "# N.B. we're using a Blaze function to do this, not len()\n",
    124     "earnings_releases.count()"
    125    ]
    126   },
    127   {
    128    "cell_type": "code",
    129    "execution_count": 4,
    130    "metadata": {
    131     "collapsed": false
    132    },
    133    "outputs": [
    134     {
    135      "data": {
    136       "text/html": [
    137        "<table border=\"1\" class=\"dataframe\">\n",
    138        "  <thead>\n",
    139        "    <tr style=\"text-align: right;\">\n",
    140        "      <th></th>\n",
    141        "      <th>event_id</th>\n",
    142        "      <th>asof_date</th>\n",
    143        "      <th>trade_date</th>\n",
    144        "      <th>symbol</th>\n",
    145        "      <th>event_type</th>\n",
    146        "      <th>event_headline</th>\n",
    147        "      <th>event_phase</th>\n",
    148        "      <th>fiscal_period</th>\n",
    149        "      <th>calendar_period</th>\n",
    150        "      <th>fiscal_periodend</th>\n",
    151        "      <th>currency</th>\n",
    152        "      <th>revenue</th>\n",
    153        "      <th>gross_income</th>\n",
    154        "      <th>operating_income</th>\n",
    155        "      <th>net_income</th>\n",
    156        "      <th>eps</th>\n",
    157        "      <th>eps_surprisepct</th>\n",
    158        "      <th>event_rating</th>\n",
    159        "      <th>timestamp</th>\n",
    160        "      <th>sid</th>\n",
    161        "    </tr>\n",
    162        "  </thead>\n",
    163        "  <tbody>\n",
    164        "    <tr>\n",
    165        "      <th>0</th>\n",
    166        "      <td>526391</td>\n",
    167        "      <td>2007-01-03</td>\n",
    168        "      <td>2007-01-04</td>\n",
    169        "      <td>ANGO</td>\n",
    170        "      <td>Earnings Release</td>\n",
    171        "      <td>AngioDynamics 2Q Net up 48%</td>\n",
    172        "      <td>NaN</td>\n",
    173        "      <td>2Q 07</td>\n",
    174        "      <td>4Q 06</td>\n",
    175        "      <td>2006-12-02</td>\n",
    176        "      <td>$</td>\n",
    177        "      <td>24.37</td>\n",
    178        "      <td>14.24</td>\n",
    179        "      <td>3.0</td>\n",
    180        "      <td>2.45</td>\n",
    181        "      <td>0.15</td>\n",
    182        "      <td>0</td>\n",
    183        "      <td>1</td>\n",
    184        "      <td>2007-01-04</td>\n",
    185        "      <td>26324</td>\n",
    186        "    </tr>\n",
    187        "    <tr>\n",
    188        "      <th>1</th>\n",
    189        "      <td>196507</td>\n",
    190        "      <td>2007-01-03</td>\n",
    191        "      <td>2007-01-04</td>\n",
    192        "      <td>BLUD</td>\n",
    193        "      <td>Earnings Release</td>\n",
    194        "      <td>Immucor Reports 2Q Results</td>\n",
    195        "      <td>NaN</td>\n",
    196        "      <td>NaN</td>\n",
    197        "      <td>NaN</td>\n",
    198        "      <td>NaT</td>\n",
    199        "      <td>NaN</td>\n",
    200        "      <td>0.00</td>\n",
    201        "      <td>0.00</td>\n",
    202        "      <td>0.0</td>\n",
    203        "      <td>0.00</td>\n",
    204        "      <td>0.00</td>\n",
    205        "      <td>0</td>\n",
    206        "      <td>1</td>\n",
    207        "      <td>2007-01-04</td>\n",
    208        "      <td>955</td>\n",
    209        "    </tr>\n",
    210        "    <tr>\n",
    211        "      <th>2</th>\n",
    212        "      <td>180559</td>\n",
    213        "      <td>2007-01-03</td>\n",
    214        "      <td>2007-01-03</td>\n",
    215        "      <td>CALM</td>\n",
    216        "      <td>Earnings Release</td>\n",
    217        "      <td>CAL-MAINE FOODS REPORTS 2Q 07 RESULTS</td>\n",
    218        "      <td>NaN</td>\n",
    219        "      <td>2Q 07</td>\n",
    220        "      <td>4Q 06</td>\n",
    221        "      <td>2006-12-02</td>\n",
    222        "      <td>$</td>\n",
    223        "      <td>137.74</td>\n",
    224        "      <td>24.96</td>\n",
    225        "      <td>10.5</td>\n",
    226        "      <td>6.40</td>\n",
    227        "      <td>0.27</td>\n",
    228        "      <td>0</td>\n",
    229        "      <td>1</td>\n",
    230        "      <td>2007-01-04</td>\n",
    231        "      <td>16169</td>\n",
    232        "    </tr>\n",
    233        "  </tbody>\n",
    234        "</table>"
    235       ],
    236       "text/plain": [
    237        "   event_id  asof_date trade_date symbol        event_type  \\\n",
    238        "0    526391 2007-01-03 2007-01-04   ANGO  Earnings Release   \n",
    239        "1    196507 2007-01-03 2007-01-04   BLUD  Earnings Release   \n",
    240        "2    180559 2007-01-03 2007-01-03   CALM  Earnings Release   \n",
    241        "\n",
    242        "                          event_headline event_phase fiscal_period  \\\n",
    243        "0            AngioDynamics 2Q Net up 48%         NaN         2Q 07   \n",
    244        "1             Immucor Reports 2Q Results         NaN           NaN   \n",
    245        "2  CAL-MAINE FOODS REPORTS 2Q 07 RESULTS         NaN         2Q 07   \n",
    246        "\n",
    247        "  calendar_period fiscal_periodend currency  revenue  gross_income  \\\n",
    248        "0           4Q 06       2006-12-02        $    24.37         14.24   \n",
    249        "1             NaN              NaT      NaN     0.00          0.00   \n",
    250        "2           4Q 06       2006-12-02        $   137.74         24.96   \n",
    251        "\n",
    252        "   operating_income  net_income   eps  eps_surprisepct  event_rating  \\\n",
    253        "0               3.0        2.45  0.15                0             1   \n",
    254        "1               0.0        0.00  0.00                0             1   \n",
    255        "2              10.5        6.40  0.27                0             1   \n",
    256        "\n",
    257        "   timestamp    sid  \n",
    258        "0 2007-01-04  26324  \n",
    259        "1 2007-01-04    955  \n",
    260        "2 2007-01-04  16169  "
    261       ]
    262      },
    263      "execution_count": 4,
    264      "metadata": {},
    265      "output_type": "execute_result"
    266     }
    267    ],
    268    "source": [
    269     "# Let's see what the data looks like. We'll grab the first three rows.\n",
    270     "earnings_releases[:3]"
    271    ]
    272   },
    273   {
    274    "cell_type": "markdown",
    275    "metadata": {},
    276    "source": [
    277     "Let's go over the columns:\n",
    278     "- **event_id**: the unique identifier for this event.\n",
    279     "- **asof_date**: EventVestor's timestamp of event capture.\n",
    280     "- **trade_date**: for event announcements made before trading ends, trade_date is the same as event_date. For announcements issued after market close, trade_date is next market open day.\n",
    281     "- **symbol**: stock ticker symbol of the affected company.\n",
    282     "- **event_type**: this should always be *Earnings Release/Earnings release*.\n",
    283     "- **event_headline**: a brief description of the event\n",
    284     "- **event_phase**: the inclusion of this field is likely an error on the part of the data vendor. We're currently attempting to resolve this.\n",
    285     "- **fiscal_period**: fiscal period for the reported earnings, such as 1Q 15, 2Q 15, etc.\n",
    286     "- **calendar_period**: identifies the calendar period based on the fiscal period end date. E.g. if the fiscal period ends any time after the middle of a given calendar quarter, like 1Q 15, that calendar quarter will be assigned regardless of the fiscal quarter.\n",
    287     "- **fiscal_periodend**: the last date for the reported earnings period.\n",
    288     "- **currency**: currency used for reporting earnings.\n",
    289     "- **revenue**: revenue in millions\n",
    290     "- **gross_income**: gross income in millions\n",
    291     "- **operating_income**: operating income in millions\n",
    292     "- **net_income**: net income in millions\n",
    293     "- **eps**: earnings per share, in the reported currency\n",
    294     "- **eps_surprisepct**: the meaning of this column is presently uncertain. We're working with our data vendor to resolve this issue.\n",
    295     "- **event_rating**: this is always 1. The meaning of this is uncertain.\n",
    296     "- **timestamp**: this is our timestamp on when we registered the data.\n",
    297     "- **sid**: the equity's unique identifier. Use this instead of the symbol."
    298    ]
    299   },
    300   {
    301    "cell_type": "markdown",
    302    "metadata": {},
    303    "source": [
    304     "We've done much of the data processing for you. Fields like `timestamp` and `sid` are standardized across all our Store Datasets, so the datasets are easy to combine. We have standardized the `sid` across all our equity databases.\n",
    305     "\n",
    306     "We can select columns and rows with ease. Below, we'll fetch all of Apple's entries from 2012."
    307    ]
    308   },
    309   {
    310    "cell_type": "code",
    311    "execution_count": 5,
    312    "metadata": {
    313     "collapsed": false,
    314     "scrolled": true
    315    },
    316    "outputs": [
    317     {
    318      "data": {
    319       "text/html": [
    320        "<table border=\"1\" class=\"dataframe\">\n",
    321        "  <thead>\n",
    322        "    <tr style=\"text-align: right;\">\n",
    323        "      <th></th>\n",
    324        "      <th>event_id</th>\n",
    325        "      <th>asof_date</th>\n",
    326        "      <th>trade_date</th>\n",
    327        "      <th>symbol</th>\n",
    328        "      <th>event_type</th>\n",
    329        "      <th>event_headline</th>\n",
    330        "      <th>event_phase</th>\n",
    331        "      <th>fiscal_period</th>\n",
    332        "      <th>calendar_period</th>\n",
    333        "      <th>fiscal_periodend</th>\n",
    334        "      <th>currency</th>\n",
    335        "      <th>revenue</th>\n",
    336        "      <th>gross_income</th>\n",
    337        "      <th>operating_income</th>\n",
    338        "      <th>net_income</th>\n",
    339        "      <th>eps</th>\n",
    340        "      <th>eps_surprisepct</th>\n",
    341        "      <th>event_rating</th>\n",
    342        "      <th>timestamp</th>\n",
    343        "      <th>sid</th>\n",
    344        "    </tr>\n",
    345        "  </thead>\n",
    346        "  <tbody>\n",
    347        "    <tr>\n",
    348        "      <th>0</th>\n",
    349        "      <td>1385939</td>\n",
    350        "      <td>2012-01-24</td>\n",
    351        "      <td>2012-01-25</td>\n",
    352        "      <td>AAPL</td>\n",
    353        "      <td>Earnings Release</td>\n",
    354        "      <td>Apple 1Q 12 Net Jumps to $13B on Higher Revenues</td>\n",
    355        "      <td>NaN</td>\n",
    356        "      <td>1Q 12</td>\n",
    357        "      <td>4Q 11</td>\n",
    358        "      <td>2011-12-31</td>\n",
    359        "      <td>$</td>\n",
    360        "      <td>46333</td>\n",
    361        "      <td>20703</td>\n",
    362        "      <td>17340</td>\n",
    363        "      <td>13064</td>\n",
    364        "      <td>13.87</td>\n",
    365        "      <td>38.29</td>\n",
    366        "      <td>1</td>\n",
    367        "      <td>2012-01-25</td>\n",
    368        "      <td>24</td>\n",
    369        "    </tr>\n",
    370        "    <tr>\n",
    371        "      <th>1</th>\n",
    372        "      <td>1421108</td>\n",
    373        "      <td>2012-04-24</td>\n",
    374        "      <td>2012-04-25</td>\n",
    375        "      <td>AAPL</td>\n",
    376        "      <td>Earnings Release</td>\n",
    377        "      <td>Apple 2Q 12 Net Up 94% on Higher Revenues</td>\n",
    378        "      <td>NaN</td>\n",
    379        "      <td>2Q 12</td>\n",
    380        "      <td>1Q 12</td>\n",
    381        "      <td>2012-03-31</td>\n",
    382        "      <td>$</td>\n",
    383        "      <td>39186</td>\n",
    384        "      <td>18564</td>\n",
    385        "      <td>15384</td>\n",
    386        "      <td>11622</td>\n",
    387        "      <td>12.30</td>\n",
    388        "      <td>23.74</td>\n",
    389        "      <td>1</td>\n",
    390        "      <td>2012-04-25</td>\n",
    391        "      <td>24</td>\n",
    392        "    </tr>\n",
    393        "    <tr>\n",
    394        "      <th>2</th>\n",
    395        "      <td>1456685</td>\n",
    396        "      <td>2012-07-24</td>\n",
    397        "      <td>2012-07-25</td>\n",
    398        "      <td>AAPL</td>\n",
    399        "      <td>Earnings Release</td>\n",
    400        "      <td>Apple 3Q 12 Net Up 21%</td>\n",
    401        "      <td>NaN</td>\n",
    402        "      <td>3Q 12</td>\n",
    403        "      <td>2Q 12</td>\n",
    404        "      <td>2012-06-30</td>\n",
    405        "      <td>$</td>\n",
    406        "      <td>35023</td>\n",
    407        "      <td>14994</td>\n",
    408        "      <td>11573</td>\n",
    409        "      <td>8824</td>\n",
    410        "      <td>9.32</td>\n",
    411        "      <td>-10.21</td>\n",
    412        "      <td>1</td>\n",
    413        "      <td>2012-07-25</td>\n",
    414        "      <td>24</td>\n",
    415        "    </tr>\n",
    416        "    <tr>\n",
    417        "      <th>3</th>\n",
    418        "      <td>1496807</td>\n",
    419        "      <td>2012-10-25</td>\n",
    420        "      <td>2012-10-26</td>\n",
    421        "      <td>AAPL</td>\n",
    422        "      <td>Earnings Release</td>\n",
    423        "      <td>Apple 4Q 12 Net Up 24%</td>\n",
    424        "      <td>NaN</td>\n",
    425        "      <td>4Q 12</td>\n",
    426        "      <td>3Q 12</td>\n",
    427        "      <td>2012-09-29</td>\n",
    428        "      <td>$</td>\n",
    429        "      <td>35966</td>\n",
    430        "      <td>14401</td>\n",
    431        "      <td>10944</td>\n",
    432        "      <td>8223</td>\n",
    433        "      <td>8.67</td>\n",
    434        "      <td>-2.03</td>\n",
    435        "      <td>1</td>\n",
    436        "      <td>2012-10-26</td>\n",
    437        "      <td>24</td>\n",
    438        "    </tr>\n",
    439        "  </tbody>\n",
    440        "</table>"
    441       ],
    442       "text/plain": [
    443        "   event_id  asof_date trade_date symbol        event_type  \\\n",
    444        "0   1385939 2012-01-24 2012-01-25   AAPL  Earnings Release   \n",
    445        "1   1421108 2012-04-24 2012-04-25   AAPL  Earnings Release   \n",
    446        "2   1456685 2012-07-24 2012-07-25   AAPL  Earnings Release   \n",
    447        "3   1496807 2012-10-25 2012-10-26   AAPL  Earnings Release   \n",
    448        "\n",
    449        "                                     event_headline event_phase fiscal_period  \\\n",
    450        "0  Apple 1Q 12 Net Jumps to $13B on Higher Revenues         NaN         1Q 12   \n",
    451        "1         Apple 2Q 12 Net Up 94% on Higher Revenues         NaN         2Q 12   \n",
    452        "2                            Apple 3Q 12 Net Up 21%         NaN         3Q 12   \n",
    453        "3                            Apple 4Q 12 Net Up 24%         NaN         4Q 12   \n",
    454        "\n",
    455        "  calendar_period fiscal_periodend currency  revenue  gross_income  \\\n",
    456        "0           4Q 11       2011-12-31        $    46333         20703   \n",
    457        "1           1Q 12       2012-03-31        $    39186         18564   \n",
    458        "2           2Q 12       2012-06-30        $    35023         14994   \n",
    459        "3           3Q 12       2012-09-29        $    35966         14401   \n",
    460        "\n",
    461        "   operating_income  net_income    eps  eps_surprisepct  event_rating  \\\n",
    462        "0             17340       13064  13.87            38.29             1   \n",
    463        "1             15384       11622  12.30            23.74             1   \n",
    464        "2             11573        8824   9.32           -10.21             1   \n",
    465        "3             10944        8223   8.67            -2.03             1   \n",
    466        "\n",
    467        "   timestamp  sid  \n",
    468        "0 2012-01-25   24  \n",
    469        "1 2012-04-25   24  \n",
    470        "2 2012-07-25   24  \n",
    471        "3 2012-10-26   24  "
    472       ]
    473      },
    474      "execution_count": 5,
    475      "metadata": {},
    476      "output_type": "execute_result"
    477     }
    478    ],
    479    "source": [
    480     "# get apple's sid first\n",
    481     "aapl_sid = symbols('AAPL').sid\n",
    482     "aapl_earnings = earnings_releases[('2011-12-31' < earnings_releases['asof_date']) & (earnings_releases['asof_date'] <'2013-01-01') & (earnings_releases.sid==aapl_sid)]\n",
    483     "# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.\n",
    484     "aapl_earnings.sort('asof_date')"
    485    ]
    486   },
    487   {
    488    "cell_type": "markdown",
    489    "metadata": {},
    490    "source": [
    491     "Now suppose we want a DataFrame of all earnings releases with revenue over 30 billion dollars. For those earnings releases, we only want the sid and the asof_date."
    492    ]
    493   },
    494   {
    495    "cell_type": "code",
    496    "execution_count": 6,
    497    "metadata": {
    498     "collapsed": false
    499    },
    500    "outputs": [
    501     {
    502      "data": {
    503       "text/html": [
    504        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
    505        "<table border=\"1\" class=\"dataframe\">\n",
    506        "  <thead>\n",
    507        "    <tr style=\"text-align: right;\">\n",
    508        "      <th></th>\n",
    509        "      <th>sid</th>\n",
    510        "      <th>asof_date</th>\n",
    511        "      <th>revenue</th>\n",
    512        "    </tr>\n",
    513        "  </thead>\n",
    514        "  <tbody>\n",
    515        "    <tr>\n",
    516        "      <th>510</th>\n",
    517        "      <td>26503</td>\n",
    518        "      <td>2013-01-24</td>\n",
    519        "      <td>8743000</td>\n",
    520        "    </tr>\n",
    521        "    <tr>\n",
    522        "      <th>491</th>\n",
    523        "      <td>26503</td>\n",
    524        "      <td>2012-10-26</td>\n",
    525        "      <td>7593000</td>\n",
    526        "    </tr>\n",
    527        "    <tr>\n",
    528        "      <th>657</th>\n",
    529        "      <td>26503</td>\n",
    530        "      <td>2015-04-22</td>\n",
    531        "      <td>7022000</td>\n",
    532        "    </tr>\n",
    533        "    <tr>\n",
    534        "      <th>474</th>\n",
    535        "      <td>26503</td>\n",
    536        "      <td>2012-07-26</td>\n",
    537        "      <td>6910000</td>\n",
    538        "    </tr>\n",
    539        "    <tr>\n",
    540        "      <th>529</th>\n",
    541        "      <td>26503</td>\n",
    542        "      <td>2013-04-22</td>\n",
    543        "      <td>6803000</td>\n",
    544        "    </tr>\n",
    545        "    <tr>\n",
    546        "      <th>98</th>\n",
    547        "      <td>7543</td>\n",
    548        "      <td>2008-02-05</td>\n",
    549        "      <td>6709983</td>\n",
    550        "    </tr>\n",
    551        "    <tr>\n",
    552        "      <th>342</th>\n",
    553        "      <td>26503</td>\n",
    554        "      <td>2010-10-21</td>\n",
    555        "      <td>6698000</td>\n",
    556        "    </tr>\n",
    557        "    <tr>\n",
    558        "      <th>443</th>\n",
    559        "      <td>26503</td>\n",
    560        "      <td>2012-01-27</td>\n",
    561        "      <td>6610000</td>\n",
    562        "    </tr>\n",
    563        "    <tr>\n",
    564        "      <th>590</th>\n",
    565        "      <td>7543</td>\n",
    566        "      <td>2014-02-04</td>\n",
    567        "      <td>6585044</td>\n",
    568        "    </tr>\n",
    569        "    <tr>\n",
    570        "      <th>563</th>\n",
    571        "      <td>26503</td>\n",
    572        "      <td>2013-10-17</td>\n",
    573        "      <td>6579000</td>\n",
    574        "    </tr>\n",
    575        "    <tr>\n",
    576        "      <th>547</th>\n",
    577        "      <td>26503</td>\n",
    578        "      <td>2013-07-19</td>\n",
    579        "      <td>6572000</td>\n",
    580        "    </tr>\n",
    581        "    <tr>\n",
    582        "      <th>49</th>\n",
    583        "      <td>7543</td>\n",
    584        "      <td>2007-08-03</td>\n",
    585        "      <td>6522637</td>\n",
    586        "    </tr>\n",
    587        "    <tr>\n",
    588        "      <th>362</th>\n",
    589        "      <td>26503</td>\n",
    590        "      <td>2011-01-21</td>\n",
    591        "      <td>6483000</td>\n",
    592        "    </tr>\n",
    593        "    <tr>\n",
    594        "      <th>324</th>\n",
    595        "      <td>26503</td>\n",
    596        "      <td>2010-07-22</td>\n",
    597        "      <td>6454000</td>\n",
    598        "    </tr>\n",
    599        "    <tr>\n",
    600        "      <th>576</th>\n",
    601        "      <td>7543</td>\n",
    602        "      <td>2013-11-06</td>\n",
    603        "      <td>6282166</td>\n",
    604        "    </tr>\n",
    605        "    <tr>\n",
    606        "      <th>417</th>\n",
    607        "      <td>26503</td>\n",
    608        "      <td>2011-10-20</td>\n",
    609        "      <td>6269000</td>\n",
    610        "    </tr>\n",
    611        "    <tr>\n",
    612        "      <th>559</th>\n",
    613        "      <td>7543</td>\n",
    614        "      <td>2013-08-02</td>\n",
    615        "      <td>6255319</td>\n",
    616        "    </tr>\n",
    617        "    <tr>\n",
    618        "      <th>155</th>\n",
    619        "      <td>7543</td>\n",
    620        "      <td>2008-08-07</td>\n",
    621        "      <td>6220000</td>\n",
    622        "    </tr>\n",
    623        "    <tr>\n",
    624        "      <th>457</th>\n",
    625        "      <td>26503</td>\n",
    626        "      <td>2012-04-24</td>\n",
    627        "      <td>6184000</td>\n",
    628        "    </tr>\n",
    629        "    <tr>\n",
    630        "      <th>283</th>\n",
    631        "      <td>26503</td>\n",
    632        "      <td>2010-01-20</td>\n",
    633        "      <td>6082000</td>\n",
    634        "    </tr>\n",
    635        "    <tr>\n",
    636        "      <th>398</th>\n",
    637        "      <td>26503</td>\n",
    638        "      <td>2011-07-21</td>\n",
    639        "      <td>6047000</td>\n",
    640        "    </tr>\n",
    641        "    <tr>\n",
    642        "      <th>182</th>\n",
    643        "      <td>7543</td>\n",
    644        "      <td>2008-11-06</td>\n",
    645        "      <td>5975275</td>\n",
    646        "    </tr>\n",
    647        "    <tr>\n",
    648        "      <th>258</th>\n",
    649        "      <td>26503</td>\n",
    650        "      <td>2009-10-15</td>\n",
    651        "      <td>5974000</td>\n",
    652        "    </tr>\n",
    653        "    <tr>\n",
    654        "      <th>310</th>\n",
    655        "      <td>26503</td>\n",
    656        "      <td>2010-04-22</td>\n",
    657        "      <td>5876000</td>\n",
    658        "    </tr>\n",
    659        "    <tr>\n",
    660        "      <th>485</th>\n",
    661        "      <td>7543</td>\n",
    662        "      <td>2012-08-03</td>\n",
    663        "      <td>5501573</td>\n",
    664        "    </tr>\n",
    665        "    <tr>\n",
    666        "      <th>503</th>\n",
    667        "      <td>7543</td>\n",
    668        "      <td>2012-11-05</td>\n",
    669        "      <td>5406781</td>\n",
    670        "    </tr>\n",
    671        "    <tr>\n",
    672        "      <th>381</th>\n",
    673        "      <td>26503</td>\n",
    674        "      <td>2011-04-18</td>\n",
    675        "      <td>5366000</td>\n",
    676        "    </tr>\n",
    677        "    <tr>\n",
    678        "      <th>519</th>\n",
    679        "      <td>7543</td>\n",
    680        "      <td>2013-02-05</td>\n",
    681        "      <td>5318752</td>\n",
    682        "    </tr>\n",
    683        "    <tr>\n",
    684        "      <th>301</th>\n",
    685        "      <td>7543</td>\n",
    686        "      <td>2010-02-04</td>\n",
    687        "      <td>5292890</td>\n",
    688        "    </tr>\n",
    689        "    <tr>\n",
    690        "      <th>231</th>\n",
    691        "      <td>26503</td>\n",
    692        "      <td>2009-07-16</td>\n",
    693        "      <td>4891000</td>\n",
    694        "    </tr>\n",
    695        "    <tr>\n",
    696        "      <th>...</th>\n",
    697        "      <td>...</td>\n",
    698        "      <td>...</td>\n",
    699        "      <td>...</td>\n",
    700        "    </tr>\n",
    701        "    <tr>\n",
    702        "      <th>21</th>\n",
    703        "      <td>3149</td>\n",
    704        "      <td>2007-07-13</td>\n",
    705        "      <td>42316</td>\n",
    706        "    </tr>\n",
    707        "    <tr>\n",
    708        "      <th>104</th>\n",
    709        "      <td>3149</td>\n",
    710        "      <td>2008-04-11</td>\n",
    711        "      <td>42243</td>\n",
    712        "    </tr>\n",
    713        "    <tr>\n",
    714        "      <th>13</th>\n",
    715        "      <td>24074</td>\n",
    716        "      <td>2007-04-26</td>\n",
    717        "      <td>42156</td>\n",
    718        "    </tr>\n",
    719        "    <tr>\n",
    720        "      <th>625</th>\n",
    721        "      <td>24</td>\n",
    722        "      <td>2014-10-20</td>\n",
    723        "      <td>42123</td>\n",
    724        "    </tr>\n",
    725        "    <tr>\n",
    726        "      <th>629</th>\n",
    727        "      <td>23052</td>\n",
    728        "      <td>2014-10-28</td>\n",
    729        "      <td>42114</td>\n",
    730        "    </tr>\n",
    731        "    <tr>\n",
    732        "      <th>642</th>\n",
    733        "      <td>3149</td>\n",
    734        "      <td>2015-01-23</td>\n",
    735        "      <td>42004</td>\n",
    736        "    </tr>\n",
    737        "    <tr>\n",
    738        "      <th>169</th>\n",
    739        "      <td>18711</td>\n",
    740        "      <td>2008-10-28</td>\n",
    741        "      <td>41723</td>\n",
    742        "    </tr>\n",
    743        "    <tr>\n",
    744        "      <th>391</th>\n",
    745        "      <td>7538</td>\n",
    746        "      <td>2011-04-29</td>\n",
    747        "      <td>41602</td>\n",
    748        "    </tr>\n",
    749        "    <tr>\n",
    750        "      <th>415</th>\n",
    751        "      <td>22899</td>\n",
    752        "      <td>2011-10-19</td>\n",
    753        "      <td>41562</td>\n",
    754        "    </tr>\n",
    755        "    <tr>\n",
    756        "      <th>423</th>\n",
    757        "      <td>7538</td>\n",
    758        "      <td>2011-10-28</td>\n",
    759        "      <td>41525</td>\n",
    760        "    </tr>\n",
    761        "    <tr>\n",
    762        "      <th>135</th>\n",
    763        "      <td>2673</td>\n",
    764        "      <td>2008-07-24</td>\n",
    765        "      <td>41500</td>\n",
    766        "    </tr>\n",
    767        "    <tr>\n",
    768        "      <th>286</th>\n",
    769        "      <td>3149</td>\n",
    770        "      <td>2010-01-22</td>\n",
    771        "      <td>41438</td>\n",
    772        "    </tr>\n",
    773        "    <tr>\n",
    774        "      <th>361</th>\n",
    775        "      <td>3149</td>\n",
    776        "      <td>2011-01-21</td>\n",
    777        "      <td>41377</td>\n",
    778        "    </tr>\n",
    779        "    <tr>\n",
    780        "      <th>266</th>\n",
    781        "      <td>23998</td>\n",
    782        "      <td>2009-10-28</td>\n",
    783        "      <td>41305</td>\n",
    784        "    </tr>\n",
    785        "    <tr>\n",
    786        "      <th>605</th>\n",
    787        "      <td>42788</td>\n",
    788        "      <td>2014-04-30</td>\n",
    789        "      <td>41099</td>\n",
    790        "    </tr>\n",
    791        "    <tr>\n",
    792        "      <th>76</th>\n",
    793        "      <td>2673</td>\n",
    794        "      <td>2007-11-08</td>\n",
    795        "      <td>41078</td>\n",
    796        "    </tr>\n",
    797        "    <tr>\n",
    798        "      <th>499</th>\n",
    799        "      <td>1091</td>\n",
    800        "      <td>2012-11-02</td>\n",
    801        "      <td>41050</td>\n",
    802        "    </tr>\n",
    803        "    <tr>\n",
    804        "      <th>500</th>\n",
    805        "      <td>11100</td>\n",
    806        "      <td>2012-11-02</td>\n",
    807        "      <td>41050</td>\n",
    808        "    </tr>\n",
    809        "    <tr>\n",
    810        "      <th>633</th>\n",
    811        "      <td>42788</td>\n",
    812        "      <td>2014-10-29</td>\n",
    813        "      <td>41048</td>\n",
    814        "    </tr>\n",
    815        "    <tr>\n",
    816        "      <th>645</th>\n",
    817        "      <td>23052</td>\n",
    818        "      <td>2015-01-29</td>\n",
    819        "      <td>40959</td>\n",
    820        "    </tr>\n",
    821        "    <tr>\n",
    822        "      <th>380</th>\n",
    823        "      <td>22899</td>\n",
    824        "      <td>2011-04-18</td>\n",
    825        "      <td>40952</td>\n",
    826        "    </tr>\n",
    827        "    <tr>\n",
    828        "      <th>139</th>\n",
    829        "      <td>18711</td>\n",
    830        "      <td>2008-07-28</td>\n",
    831        "      <td>40569</td>\n",
    832        "    </tr>\n",
    833        "    <tr>\n",
    834        "      <th>591</th>\n",
    835        "      <td>40430</td>\n",
    836        "      <td>2014-02-06</td>\n",
    837        "      <td>40485</td>\n",
    838        "    </tr>\n",
    839        "    <tr>\n",
    840        "      <th>408</th>\n",
    841        "      <td>7538</td>\n",
    842        "      <td>2011-07-29</td>\n",
    843        "      <td>40465</td>\n",
    844        "    </tr>\n",
    845        "    <tr>\n",
    846        "      <th>580</th>\n",
    847        "      <td>3149</td>\n",
    848        "      <td>2014-01-17</td>\n",
    849        "      <td>40382</td>\n",
    850        "    </tr>\n",
    851        "    <tr>\n",
    852        "      <th>673</th>\n",
    853        "      <td>23112</td>\n",
    854        "      <td>2015-07-31</td>\n",
    855        "      <td>40357</td>\n",
    856        "    </tr>\n",
    857        "    <tr>\n",
    858        "      <th>670</th>\n",
    859        "      <td>23052</td>\n",
    860        "      <td>2015-07-28</td>\n",
    861        "      <td>40277</td>\n",
    862        "    </tr>\n",
    863        "    <tr>\n",
    864        "      <th>247</th>\n",
    865        "      <td>23112</td>\n",
    866        "      <td>2009-07-31</td>\n",
    867        "      <td>40205</td>\n",
    868        "    </tr>\n",
    869        "    <tr>\n",
    870        "      <th>9</th>\n",
    871        "      <td>3149</td>\n",
    872        "      <td>2007-04-13</td>\n",
    873        "      <td>40195</td>\n",
    874        "    </tr>\n",
    875        "    <tr>\n",
    876        "      <th>377</th>\n",
    877        "      <td>7538</td>\n",
    878        "      <td>2011-02-11</td>\n",
    879        "      <td>40157</td>\n",
    880        "    </tr>\n",
    881        "  </tbody>\n",
    882        "</table>\n",
    883        "<p>670 rows × 3 columns</p>\n",
    884        "</div>"
    885       ],
    886       "text/plain": [
    887        "       sid  asof_date  revenue\n",
    888        "510  26503 2013-01-24  8743000\n",
    889        "491  26503 2012-10-26  7593000\n",
    890        "657  26503 2015-04-22  7022000\n",
    891        "474  26503 2012-07-26  6910000\n",
    892        "529  26503 2013-04-22  6803000\n",
    893        "98    7543 2008-02-05  6709983\n",
    894        "342  26503 2010-10-21  6698000\n",
    895        "443  26503 2012-01-27  6610000\n",
    896        "590   7543 2014-02-04  6585044\n",
    897        "563  26503 2013-10-17  6579000\n",
    898        "547  26503 2013-07-19  6572000\n",
    899        "49    7543 2007-08-03  6522637\n",
    900        "362  26503 2011-01-21  6483000\n",
    901        "324  26503 2010-07-22  6454000\n",
    902        "576   7543 2013-11-06  6282166\n",
    903        "417  26503 2011-10-20  6269000\n",
    904        "559   7543 2013-08-02  6255319\n",
    905        "155   7543 2008-08-07  6220000\n",
    906        "457  26503 2012-04-24  6184000\n",
    907        "283  26503 2010-01-20  6082000\n",
    908        "398  26503 2011-07-21  6047000\n",
    909        "182   7543 2008-11-06  5975275\n",
    910        "258  26503 2009-10-15  5974000\n",
    911        "310  26503 2010-04-22  5876000\n",
    912        "485   7543 2012-08-03  5501573\n",
    913        "503   7543 2012-11-05  5406781\n",
    914        "381  26503 2011-04-18  5366000\n",
    915        "519   7543 2013-02-05  5318752\n",
    916        "301   7543 2010-02-04  5292890\n",
    917        "231  26503 2009-07-16  4891000\n",
    918        "..     ...        ...      ...\n",
    919        "21    3149 2007-07-13    42316\n",
    920        "104   3149 2008-04-11    42243\n",
    921        "13   24074 2007-04-26    42156\n",
    922        "625     24 2014-10-20    42123\n",
    923        "629  23052 2014-10-28    42114\n",
    924        "642   3149 2015-01-23    42004\n",
    925        "169  18711 2008-10-28    41723\n",
    926        "391   7538 2011-04-29    41602\n",
    927        "415  22899 2011-10-19    41562\n",
    928        "423   7538 2011-10-28    41525\n",
    929        "135   2673 2008-07-24    41500\n",
    930        "286   3149 2010-01-22    41438\n",
    931        "361   3149 2011-01-21    41377\n",
    932        "266  23998 2009-10-28    41305\n",
    933        "605  42788 2014-04-30    41099\n",
    934        "76    2673 2007-11-08    41078\n",
    935        "499   1091 2012-11-02    41050\n",
    936        "500  11100 2012-11-02    41050\n",
    937        "633  42788 2014-10-29    41048\n",
    938        "645  23052 2015-01-29    40959\n",
    939        "380  22899 2011-04-18    40952\n",
    940        "139  18711 2008-07-28    40569\n",
    941        "591  40430 2014-02-06    40485\n",
    942        "408   7538 2011-07-29    40465\n",
    943        "580   3149 2014-01-17    40382\n",
    944        "673  23112 2015-07-31    40357\n",
    945        "670  23052 2015-07-28    40277\n",
    946        "247  23112 2009-07-31    40205\n",
    947        "9     3149 2007-04-13    40195\n",
    948        "377   7538 2011-02-11    40157\n",
    949        "\n",
    950        "[670 rows x 3 columns]"
    951       ]
    952      },
    953      "execution_count": 6,
    954      "metadata": {},
    955      "output_type": "execute_result"
    956     }
    957    ],
    958    "source": [
    959     "# manipulate with Blaze first:\n",
    960     "big_earnings = earnings_releases[earnings_releases.revenue > 40000]\n",
    961     "# now that we've got a much smaller object (len: ~2167 rows), we can convert it to a pandas DataFrame\n",
    962     "df = odo(big_earnings, pd.DataFrame)\n",
    963     "df = df[['sid', 'asof_date','revenue']].dropna()\n",
    964     "df.sort('revenue',ascending=False)"
    965    ]
    966   },
    967   {
    968    "cell_type": "code",
    969    "execution_count": null,
    970    "metadata": {
    971     "collapsed": true
    972    },
    973    "outputs": [],
    974    "source": []
    975   }
    976  ],
    977  "metadata": {
    978   "kernelspec": {
    979    "display_name": "Python 2",
    980    "language": "python",
    981    "name": "python2"
    982   },
    983   "language_info": {
    984    "codemirror_mode": {
    985     "name": "ipython",
    986     "version": 2
    987    },
    988    "file_extension": ".py",
    989    "mimetype": "text/x-python",
    990    "name": "python",
    991    "nbconvert_exporter": "python",
    992    "pygments_lexer": "ipython2",
    993    "version": "2.7.10"
    994   }
    995  },
    996  "nbformat": 4,
    997  "nbformat_minor": 0
    998 }