ml-finance-python

python scripts for finance machine learning

git clone https://9o.is/git/ml-finance-python.git

notebook.ipynb

(13552B)


      1 {
      2  "cells": [
      3   {
      4    "cell_type": "markdown",
      5    "metadata": {
      6     "collapsed": true
      7    },
      8    "source": [
      9     "# EventVestor: Shareholder Meetings\n",
     10     "\n",
     11     "In this notebook, we'll take a look at EventVestor's *Shareholder Meetings* dataset, available on the [Quantopian Store](https://www.quantopian.com/store). This dataset spans January 01, 2007 through the current day, and documents companies' annual and special shareholder meeting calendars.\n",
     12     "\n",
     13     "### Blaze\n",
     14     "Before we dig into the data, we want to tell you how you generally access Quantopian Store datasets. These datasets are available through an API service known as [Blaze](http://blaze.pydata.org). Blaze provides the Quantopian user with a convenient interface to access very large datasets.\n",
     15     "\n",
     16     "Blaze provides an important function for accessing these datasets. Some of these sets contain many millions of records. Bringing that data directly into Quantopian Research is just not viable. So Blaze allows us to provide a simple querying interface and shift the burden over to the server side.\n",
     17     "\n",
     18     "It is common to use Blaze to reduce your dataset in size, convert it to Pandas, and then use Pandas for further computation, manipulation and visualization.\n",
     19     "\n",
     20     "Helpful links:\n",
     21     "* [Query building for Blaze](http://blaze.pydata.org/en/latest/queries.html)\n",
     22     "* [Pandas-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-pandas.html)\n",
     23     "* [SQL-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-sql.html).\n",
     24     "\n",
     25     "Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrame using:\n",
     26     "> `from odo import odo`  \n",
     27     "> `odo(expr, pandas.DataFrame)`\n",
     28     "\n",
     29     "### Free samples and limits\n",
     30     "One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.\n",
     31     "\n",
     32     "There is a *free* version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.\n",
     33     "\n",
     34     "With preamble in place, let's get started:"
     35    ]
     36   },
     37   {
     38    "cell_type": "code",
     39    "execution_count": 1,
     40    "metadata": {
     41     "collapsed": false
     42    },
     43    "outputs": [],
     44    "source": [
     45     "# import the dataset\n",
     46     "from quantopian.interactive.data.eventvestor import shareholder_meetings\n",
     47     "# or if you want to import the free dataset, use:\n",
     48     "# from quantopian.interactive.data.eventvestor import shareholder_meetings_free\n",
     49     "\n",
     50     "# import data operations\n",
     51     "from odo import odo\n",
     52     "# import other libraries we will use\n",
     53     "import pandas as pd"
     54    ]
     55   },
     56   {
     57    "cell_type": "code",
     58    "execution_count": 2,
     59    "metadata": {
     60     "collapsed": false
     61    },
     62    "outputs": [
     63     {
     64      "data": {
     65       "text/plain": [
     66        "dshape(\"\"\"var * {\n",
     67        "  event_id: ?float64,\n",
     68        "  asof_date: datetime,\n",
     69        "  symbol: ?string,\n",
     70        "  event_headline: ?string,\n",
     71        "  meeting_type: ?string,\n",
     72        "  record_date: ?datetime,\n",
     73        "  meeting_date: ?datetime,\n",
     74        "  timestamp: datetime,\n",
     75        "  sid: ?int64\n",
     76        "  }\"\"\")"
     77       ]
     78      },
     79      "execution_count": 2,
     80      "metadata": {},
     81      "output_type": "execute_result"
     82     }
     83    ],
     84    "source": [
     85     "# Let's use Blaze's dshape attribute to inspect the columns and their types\n",
     86     "shareholder_meetings.dshape"
     87    ]
     88   },
     89   {
     90    "cell_type": "code",
     91    "execution_count": 3,
     92    "metadata": {
     93     "collapsed": false
     94    },
     95    "outputs": [
     96     {
     97      "data": {
     98       "text/html": [
     99        "8969"
    100       ],
    101       "text/plain": [
    102        "8969"
    103       ]
    104      },
    105      "execution_count": 3,
    106      "metadata": {},
    107      "output_type": "execute_result"
    108     }
    109    ],
    110    "source": [
    111     "# And how many rows are there?\n",
    112     "# N.B. we're using a Blaze function to do this, not len()\n",
    113     "shareholder_meetings.count()"
    114    ]
    115   },
    116   {
    117    "cell_type": "code",
    118    "execution_count": 4,
    119    "metadata": {
    120     "collapsed": false
    121    },
    122    "outputs": [
    123     {
    124      "data": {
    125       "text/html": [
    126        "<table border=\"1\" class=\"dataframe\">\n",
    127        "  <thead>\n",
    128        "    <tr style=\"text-align: right;\">\n",
    129        "      <th></th>\n",
    130        "      <th>event_id</th>\n",
    131        "      <th>asof_date</th>\n",
    132        "      <th>symbol</th>\n",
    133        "      <th>event_headline</th>\n",
    134        "      <th>meeting_type</th>\n",
    135        "      <th>record_date</th>\n",
    136        "      <th>meeting_date</th>\n",
    137        "      <th>timestamp</th>\n",
    138        "      <th>sid</th>\n",
    139        "    </tr>\n",
    140        "  </thead>\n",
    141        "  <tbody>\n",
    142        "    <tr>\n",
    143        "      <th>0</th>\n",
    144        "      <td>9000012933</td>\n",
    145        "      <td>2009-01-02</td>\n",
    146        "      <td>CENT</td>\n",
    147        "      <td>Central Garden &amp; Pet announces Shareholder Mee...</td>\n",
    148        "      <td>Annual Meeting</td>\n",
    149        "      <td>2008-12-19</td>\n",
    150        "      <td>2009-02-09</td>\n",
    151        "      <td>2009-01-03</td>\n",
    152        "      <td>18855</td>\n",
    153        "    </tr>\n",
    154        "    <tr>\n",
    155        "      <th>1</th>\n",
    156        "      <td>9000016639</td>\n",
    157        "      <td>2009-12-21</td>\n",
    158        "      <td>PENX</td>\n",
    159        "      <td>Penford Corp. announces Shareholder Meeting</td>\n",
    160        "      <td>Annual Meeting</td>\n",
    161        "      <td>2009-12-04</td>\n",
    162        "      <td>2010-01-26</td>\n",
    163        "      <td>2009-12-22</td>\n",
    164        "      <td>18082</td>\n",
    165        "    </tr>\n",
    166        "    <tr>\n",
    167        "      <th>2</th>\n",
    168        "      <td>9000016643</td>\n",
    169        "      <td>2009-12-23</td>\n",
    170        "      <td>CCF</td>\n",
    171        "      <td>Chase announces Shareholder Meeting</td>\n",
    172        "      <td>Annual Meeting</td>\n",
    173        "      <td>2009-11-30</td>\n",
    174        "      <td>2010-01-29</td>\n",
    175        "      <td>2009-12-24</td>\n",
    176        "      <td>13810</td>\n",
    177        "    </tr>\n",
    178        "  </tbody>\n",
    179        "</table>"
    180       ],
    181       "text/plain": [
    182        "     event_id  asof_date symbol  \\\n",
    183        "0  9000012933 2009-01-02   CENT   \n",
    184        "1  9000016639 2009-12-21   PENX   \n",
    185        "2  9000016643 2009-12-23    CCF   \n",
    186        "\n",
    187        "                                      event_headline    meeting_type  \\\n",
    188        "0  Central Garden & Pet announces Shareholder Mee...  Annual Meeting   \n",
    189        "1        Penford Corp. announces Shareholder Meeting  Annual Meeting   \n",
    190        "2                Chase announces Shareholder Meeting  Annual Meeting   \n",
    191        "\n",
    192        "  record_date meeting_date  timestamp    sid  \n",
    193        "0  2008-12-19   2009-02-09 2009-01-03  18855  \n",
    194        "1  2009-12-04   2010-01-26 2009-12-22  18082  \n",
    195        "2  2009-11-30   2010-01-29 2009-12-24  13810  "
    196       ]
    197      },
    198      "execution_count": 4,
    199      "metadata": {},
    200      "output_type": "execute_result"
    201     }
    202    ],
    203    "source": [
    204     "# Let's see what the data looks like. We'll grab the first three rows.\n",
    205     "shareholder_meetings[:3]"
    206    ]
    207   },
    208   {
    209    "cell_type": "markdown",
    210    "metadata": {},
    211    "source": [
    212     "Let's go over the columns:\n",
    213     "- **event_id**: the unique identifier for this event.\n",
    214     "- **asof_date**: EventVestor's timestamp of event capture.\n",
    215     "- **symbol**: stock ticker symbol of the affected company.\n",
    216     "- **event_headline**: a brief description of the event.\n",
    217     "- **meeting_type**: types include *annual meeting, special meeting, proxy contest*.\n",
    218     "- **record_date**: record date to be eligible for proxy vote.\n",
    219     "- **meeting_date**: shareholder meeting date.\n",
    220     "- **timestamp**: this is our timestamp on when we registered the data.\n",
    221     "- **sid**: the equity's unique identifier. Use this instead of the symbol."
    222    ]
    223   },
    224   {
    225    "cell_type": "markdown",
    226    "metadata": {},
    227    "source": [
    228     "We've done much of the data processing for you. Fields like `timestamp` and `sid` are standardized across all our Store Datasets, so the datasets are easy to combine.\n",
    229     "\n",
    230     "We can select columns and rows with ease. Below, we'll fetch Tesla's 2013 and 2014 meetings."
    231    ]
    232   },
    233   {
    234    "cell_type": "code",
    235    "execution_count": 5,
    236    "metadata": {
    237     "collapsed": false,
    238     "scrolled": true
    239    },
    240    "outputs": [
    241     {
    242      "data": {
    243       "text/html": [
    244        "<table border=\"1\" class=\"dataframe\">\n",
    245        "  <thead>\n",
    246        "    <tr style=\"text-align: right;\">\n",
    247        "      <th></th>\n",
    248        "      <th>event_id</th>\n",
    249        "      <th>asof_date</th>\n",
    250        "      <th>symbol</th>\n",
    251        "      <th>event_headline</th>\n",
    252        "      <th>meeting_type</th>\n",
    253        "      <th>record_date</th>\n",
    254        "      <th>meeting_date</th>\n",
    255        "      <th>timestamp</th>\n",
    256        "      <th>sid</th>\n",
    257        "    </tr>\n",
    258        "  </thead>\n",
    259        "  <tbody>\n",
    260        "    <tr>\n",
    261        "      <th>0</th>\n",
    262        "      <td>900002592</td>\n",
    263        "      <td>2013-04-17</td>\n",
    264        "      <td>TSLA</td>\n",
    265        "      <td>TESLA MOTORS announces Shareholder Meeting</td>\n",
    266        "      <td>Annual Meeting</td>\n",
    267        "      <td>2013-04-10</td>\n",
    268        "      <td>2013-06-04</td>\n",
    269        "      <td>2013-04-18</td>\n",
    270        "      <td>39840</td>\n",
    271        "    </tr>\n",
    272        "    <tr>\n",
    273        "      <th>1</th>\n",
    274        "      <td>9000012760</td>\n",
    275        "      <td>2014-04-24</td>\n",
    276        "      <td>TSLA</td>\n",
    277        "      <td>Tesla Motors, Inc. announces Shareholder Meeting</td>\n",
    278        "      <td>Annual Meeting</td>\n",
    279        "      <td>2014-04-10</td>\n",
    280        "      <td>2014-06-03</td>\n",
    281        "      <td>2014-04-25</td>\n",
    282        "      <td>39840</td>\n",
    283        "    </tr>\n",
    284        "  </tbody>\n",
    285        "</table>"
    286       ],
    287       "text/plain": [
    288        "     event_id  asof_date symbol  \\\n",
    289        "0   900002592 2013-04-17   TSLA   \n",
    290        "1  9000012760 2014-04-24   TSLA   \n",
    291        "\n",
    292        "                                     event_headline    meeting_type  \\\n",
    293        "0        TESLA MOTORS announces Shareholder Meeting  Annual Meeting   \n",
    294        "1  Tesla Motors, Inc. announces Shareholder Meeting  Annual Meeting   \n",
    295        "\n",
    296        "  record_date meeting_date  timestamp    sid  \n",
    297        "0  2013-04-10   2013-06-04 2013-04-18  39840  \n",
    298        "1  2014-04-10   2014-06-03 2014-04-25  39840  "
    299       ]
    300      },
    301      "execution_count": 5,
    302      "metadata": {},
    303      "output_type": "execute_result"
    304     }
    305    ],
    306    "source": [
    307     "# Get Tesla's sid first\n",
    308     "tesla_sid = symbols('TSLA').sid\n",
    309     "meetings = shareholder_meetings[('2012-12-31' < shareholder_meetings['asof_date']) &\n",
    310     "                                (shareholder_meetings['asof_date'] < '2015-01-01') &\n",
    311     "                                (shareholder_meetings.sid == tesla_sid)]\n",
    312     "# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.\n",
    313     "meetings.sort('asof_date')"
    314    ]
    315   },
    316   {
    317    "cell_type": "markdown",
    318    "metadata": {},
    319    "source": [
    320     "Now suppose we want a DataFrame of the Blaze Data Object above, but only want the `record_date`, `meeting_date`, and `sid` columns."
    321    ]
    322   },
    323   {
    324    "cell_type": "code",
    325    "execution_count": 6,
    326    "metadata": {
    327     "collapsed": false
    328    },
    329    "outputs": [
    330     {
    331      "data": {
    332       "text/html": [
    333        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
    334        "<table border=\"1\" class=\"dataframe\">\n",
    335        "  <thead>\n",
    336        "    <tr style=\"text-align: right;\">\n",
    337        "      <th></th>\n",
    338        "      <th>record_date</th>\n",
    339        "      <th>meeting_date</th>\n",
    340        "      <th>sid</th>\n",
    341        "    </tr>\n",
    342        "  </thead>\n",
    343        "  <tbody>\n",
    344        "    <tr>\n",
    345        "      <th>0</th>\n",
    346        "      <td>2013-04-10</td>\n",
    347        "      <td>2013-06-04</td>\n",
    348        "      <td>39840</td>\n",
    349        "    </tr>\n",
    350        "    <tr>\n",
    351        "      <th>1</th>\n",
    352        "      <td>2014-04-10</td>\n",
    353        "      <td>2014-06-03</td>\n",
    354        "      <td>39840</td>\n",
    355        "    </tr>\n",
    356        "  </tbody>\n",
    357        "</table>\n",
    358        "</div>"
    359       ],
    360       "text/plain": [
    361        "  record_date meeting_date    sid\n",
    362        "0  2013-04-10   2013-06-04  39840\n",
    363        "1  2014-04-10   2014-06-03  39840"
    364       ]
    365      },
    366      "execution_count": 6,
    367      "metadata": {},
    368      "output_type": "execute_result"
    369     }
    370    ],
    371    "source": [
    372     "df = odo(meetings, pd.DataFrame)\n",
    373     "df = df[['record_date', 'meeting_date', 'sid']]\n",
    374     "df"
    375    ]
    376   },
    377   {
    378    "cell_type": "code",
    379    "execution_count": null,
    380    "metadata": {
    381     "collapsed": true
    382    },
    383    "outputs": [],
    384    "source": []
    385   }
    386  ],
    387  "metadata": {
    388   "kernelspec": {
    389    "display_name": "Python 2",
    390    "language": "python",
    391    "name": "python2"
    392   },
    393   "language_info": {
    394    "codemirror_mode": {
    395     "name": "ipython",
    396     "version": 2
    397    },
    398    "file_extension": ".py",
    399    "mimetype": "text/x-python",
    400    "name": "python",
    401    "nbconvert_exporter": "python",
    402    "pygments_lexer": "ipython2",
    403    "version": "2.7.10"
    404   }
    405  },
    406  "nbformat": 4,
    407  "nbformat_minor": 0
    408 }
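
The notebook's filter-and-select steps depend on Quantopian Research (`symbols`, the Blaze-backed `shareholder_meetings` object), so they can't be run elsewhere. Below is a minimal pandas-only sketch of the same logic. It assumes a small hand-built DataFrame, here called `local` (a hypothetical name), whose rows are copied from the outputs shown in the notebook. The sid is hard-coded from that output rather than looked up.

```python
import pandas as pd

# Rows copied from the notebook's displayed outputs; `local` is a
# hypothetical stand-in for the Blaze-backed shareholder_meetings object.
rows = [
    (9000012933, "2009-01-02", "CENT", "2008-12-19", "2009-02-09", 18855),
    (9000012760, "2014-04-24", "TSLA", "2014-04-10", "2014-06-03", 39840),
    (900002592, "2013-04-17", "TSLA", "2013-04-10", "2013-06-04", 39840),
]
columns = ["event_id", "asof_date", "symbol", "record_date", "meeting_date", "sid"]
local = pd.DataFrame(rows, columns=columns)

# Parse the date columns so comparisons are done on timestamps, not strings.
for col in ("asof_date", "record_date", "meeting_date"):
    local[col] = pd.to_datetime(local[col])

# Assumption: sid taken from the notebook output instead of symbols('TSLA').sid.
tesla_sid = 39840

# Same three conditions as the notebook's Blaze expression.
mask = (
    (local["asof_date"] > pd.Timestamp("2012-12-31"))
    & (local["asof_date"] < pd.Timestamp("2015-01-01"))
    & (local["sid"] == tesla_sid)
)

# Sort by asof_date, then keep only the three columns of interest.
result = (
    local[mask]
    .sort_values("asof_date")
    .reset_index(drop=True)[["record_date", "meeting_date", "sid"]]
)
print(result)
```

In the notebook itself the filtering runs server side through Blaze and only the reduced result is converted with `odo`. This sketch does everything locally, which is only reasonable for a handful of rows.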