ml-finance-python

python scripts for finance machine learning

git clone https://9o.is/git/ml-finance-python.git

notebook.ipynb


      1 {
      2  "cells": [
      3   {
      4    "cell_type": "markdown",
      5    "metadata": {
      6     "collapsed": true
      7    },
      8    "source": [
      9     "# EventVestor: Share Repurchases\n",
     10     "\n",
     11     "In this notebook, we'll take a look at EventVestor's *Share Repurchases* dataset, available on the [Quantopian Store](https://www.quantopian.com/store). This dataset spans January 01, 2007 through the current day, and documents actual share repurchase announcements by companies. Note that this is **different** from [Share Buyback Authorizations](https://www.quantopian.com/store/eventvestor/buyback_auth).\n",
     12     "\n",
     13     "### Blaze\n",
     14     "Before we dig into the data, here is how you generally access Quantopian Store datasets. These datasets are available through an API service known as [Blaze](http://blaze.pydata.org). Blaze provides the Quantopian user with a convenient interface to access very large datasets.\n",
     15     "\n",
     16     "Blaze matters because some of these datasets contain many millions of records. Bringing that much data directly into Quantopian Research is not viable, so Blaze lets us provide a simple querying interface and shift the burden to the server side.\n",
     17     "\n",
     18     "A common workflow is to use Blaze to reduce your dataset in size, convert it to Pandas, and then use Pandas for further computation, manipulation, and visualization.\n",
     19     "\n",
     20     "Helpful links:\n",
     21     "* [Query building for Blaze](http://blaze.pydata.org/en/latest/queries.html)\n",
     22     "* [Pandas-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-pandas.html)\n",
     23     "* [SQL-to-Blaze dictionary](http://blaze.pydata.org/en/latest/rosetta-sql.html)\n",
     24     "\n",
     25     "Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrame using:\n",
     26     "> `from odo import odo`  \n",
     27     "> `odo(expr, pandas.DataFrame)`\n",
     28     "\n",
     29     "### Free samples and limits\n",
     30     "One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.\n",
     31     "\n",
     32     "There is a *free* version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.\n",
     33     "\n",
     34     "With the preamble in place, let's get started:"
     35    ]
     36   },
     37   {
     38    "cell_type": "code",
     39    "execution_count": 1,
     40    "metadata": {
     41     "collapsed": false
     42    },
     43    "outputs": [],
     44    "source": [
     45     "# import the dataset\n",
     46     "from quantopian.interactive.data.eventvestor import share_repurchases\n",
     47     "# or if you want to import the free dataset, use:\n",
     48     "# from quantopian.interactive.data.eventvestor import share_repurchases_free\n",
     49     "\n",
     50     "# import data operations\n",
     51     "from odo import odo\n",
     52     "# import other libraries we will use\n",
     53     "import pandas as pd"
     54    ]
     55   },
     56   {
     57    "cell_type": "code",
     58    "execution_count": 2,
     59    "metadata": {
     60     "collapsed": false
     61    },
     62    "outputs": [
     63     {
     64      "data": {
     65       "text/plain": [
     66        "dshape(\"\"\"var * {\n",
     67        "  event_id: ?float64,\n",
     68        "  asof_date: datetime,\n",
     69        "  trade_date: ?datetime,\n",
     70        "  symbol: ?string,\n",
     71        "  event_type: ?string,\n",
     72        "  event_headline: ?string,\n",
     73        "  repurchase_amount: ?float64,\n",
     74        "  repurchase_units: ?string,\n",
     75        "  event_rating: ?float64,\n",
     76        "  timestamp: datetime,\n",
     77        "  sid: ?int64\n",
     78        "  }\"\"\")"
     79       ]
     80      },
     81      "execution_count": 2,
     82      "metadata": {},
     83      "output_type": "execute_result"
     84     }
     85    ],
     86    "source": [
     87     "# Let's use Blaze's dshape attribute to inspect the column names and types\n",
     88     "share_repurchases.dshape"
     89    ]
     90   },
     91   {
     92    "cell_type": "code",
     93    "execution_count": 3,
     94    "metadata": {
     95     "collapsed": false
     96    },
     97    "outputs": [
     98     {
     99      "data": {
    100       "text/html": [
    101        "15509"
    102       ],
    103       "text/plain": [
    104        "15509"
    105       ]
    106      },
    107      "execution_count": 3,
    108      "metadata": {},
    109      "output_type": "execute_result"
    110     }
    111    ],
    112    "source": [
    113     "# And how many rows are there?\n",
    114     "# N.B. we're using a Blaze function to do this, not len()\n",
    115     "share_repurchases.count()"
    116    ]
    117   },
    118   {
    119    "cell_type": "code",
    120    "execution_count": 4,
    121    "metadata": {
    122     "collapsed": false
    123    },
    124    "outputs": [
    125     {
    126      "data": {
    127       "text/html": [
    128        "<table border=\"1\" class=\"dataframe\">\n",
    129        "  <thead>\n",
    130        "    <tr style=\"text-align: right;\">\n",
    131        "      <th></th>\n",
    132        "      <th>event_id</th>\n",
    133        "      <th>asof_date</th>\n",
    134        "      <th>trade_date</th>\n",
    135        "      <th>symbol</th>\n",
    136        "      <th>event_type</th>\n",
    137        "      <th>event_headline</th>\n",
    138        "      <th>repurchase_amount</th>\n",
    139        "      <th>repurchase_units</th>\n",
    140        "      <th>event_rating</th>\n",
    141        "      <th>timestamp</th>\n",
    142        "      <th>sid</th>\n",
    143        "    </tr>\n",
    144        "  </thead>\n",
    145        "  <tbody>\n",
    146        "    <tr>\n",
    147        "      <th>0</th>\n",
    148        "      <td>1113050</td>\n",
    149        "      <td>2007-01-17</td>\n",
    150        "      <td>2007-01-17</td>\n",
    151        "      <td>TESS</td>\n",
    152        "      <td>Buyback Update</td>\n",
    153        "      <td>TESSCO Tech Repurchases $1.7M Shares in 3Q 07 ...</td>\n",
    154        "      <td>1.7</td>\n",
    155        "      <td>$M</td>\n",
    156        "      <td>1</td>\n",
    157        "      <td>2007-01-18</td>\n",
    158        "      <td>11968</td>\n",
    159        "    </tr>\n",
    160        "    <tr>\n",
    161        "      <th>1</th>\n",
    162        "      <td>131345</td>\n",
    163        "      <td>2007-01-17</td>\n",
    164        "      <td>2007-01-18</td>\n",
    165        "      <td>WM</td>\n",
    166        "      <td>Buyback Update</td>\n",
    167        "      <td>Washington Mutual Announces $2.7B Accelerated ...</td>\n",
    168        "      <td>2700.0</td>\n",
    169        "      <td>$M</td>\n",
    170        "      <td>1</td>\n",
    171        "      <td>2007-01-18</td>\n",
    172        "      <td>19181</td>\n",
    173        "    </tr>\n",
    174        "    <tr>\n",
    175        "      <th>2</th>\n",
    176        "      <td>137183</td>\n",
    177        "      <td>2007-01-23</td>\n",
    178        "      <td>2007-01-23</td>\n",
    179        "      <td>RDN</td>\n",
    180        "      <td>Buyback Update</td>\n",
    181        "      <td>Radian Group Repurchased 1.5M shares for $81.1...</td>\n",
    182        "      <td>81.1</td>\n",
    183        "      <td>$M</td>\n",
    184        "      <td>1</td>\n",
    185        "      <td>2007-01-24</td>\n",
    186        "      <td>20276</td>\n",
    187        "    </tr>\n",
    188        "  </tbody>\n",
    189        "</table>"
    190       ],
    191       "text/plain": [
    192        "   event_id  asof_date trade_date symbol      event_type  \\\n",
    193        "0   1113050 2007-01-17 2007-01-17   TESS  Buyback Update   \n",
    194        "1    131345 2007-01-17 2007-01-18     WM  Buyback Update   \n",
    195        "2    137183 2007-01-23 2007-01-23    RDN  Buyback Update   \n",
    196        "\n",
    197        "                                      event_headline  repurchase_amount  \\\n",
    198        "0  TESSCO Tech Repurchases $1.7M Shares in 3Q 07 ...                1.7   \n",
    199        "1  Washington Mutual Announces $2.7B Accelerated ...             2700.0   \n",
    200        "2  Radian Group Repurchased 1.5M shares for $81.1...               81.1   \n",
    201        "\n",
    202        "  repurchase_units  event_rating  timestamp    sid  \n",
    203        "0               $M             1 2007-01-18  11968  \n",
    204        "1               $M             1 2007-01-18  19181  \n",
    205        "2               $M             1 2007-01-24  20276  "
    206       ]
    207      },
    208      "execution_count": 4,
    209      "metadata": {},
    210      "output_type": "execute_result"
    211     }
    212    ],
    213    "source": [
    214     "# Let's see what the data looks like. We'll grab the first three rows.\n",
    215     "share_repurchases[:3]"
    216    ]
    217   },
    218   {
    219    "cell_type": "markdown",
    220    "metadata": {},
    221    "source": [
    222     "Let's go over the columns:\n",
    223     "- **event_id**: the unique identifier for this event.\n",
    224     "- **asof_date**: EventVestor's timestamp of event capture.\n",
    225     "- **trade_date**: for event announcements made before trading ends, trade_date is the same as event_date. For announcements issued after market close, trade_date is the next market open day.\n",
    226     "- **symbol**: stock ticker symbol of the affected company.\n",
    227     "- **event_type**: this should always be *Buyback Update*.\n",
    228     "- **event_headline**: a brief description of the event.\n",
    229     "- **repurchase_amount**: amount of shares (in repurchase_units) repurchased during the reported period.\n",
    230     "- **repurchase_units**: millions of dollars or percent of total shares outstanding.\n",
    231     "- **event_rating**: this is always 1. The meaning of this is uncertain.\n",
    232     "- **timestamp**: our timestamp for when we registered the data.\n",
    233     "- **sid**: the equity's unique identifier. Use this instead of the symbol."
    234    ]
    235   },
    236   {
    237    "cell_type": "markdown",
    238    "metadata": {},
    239    "source": [
    240     "We've done much of the data processing for you. Fields like `timestamp` and `sid` are standardized across all our Store Datasets, so the datasets are easy to combine.\n",
    241     "\n",
    242     "We can select columns and rows with ease. Below, we'll fetch Apple's 2014 share repurchases."
    243    ]
    244   },
    245   {
    246    "cell_type": "code",
    247    "execution_count": 5,
    248    "metadata": {
    249     "collapsed": false,
    250     "scrolled": true
    251    },
    252    "outputs": [
    253     {
    254      "data": {
    255       "text/html": [
    256        "<table border=\"1\" class=\"dataframe\">\n",
    257        "  <thead>\n",
    258        "    <tr style=\"text-align: right;\">\n",
    259        "      <th></th>\n",
    260        "      <th>event_id</th>\n",
    261        "      <th>asof_date</th>\n",
    262        "      <th>trade_date</th>\n",
    263        "      <th>symbol</th>\n",
    264        "      <th>event_type</th>\n",
    265        "      <th>event_headline</th>\n",
    266        "      <th>repurchase_amount</th>\n",
    267        "      <th>repurchase_units</th>\n",
    268        "      <th>event_rating</th>\n",
    269        "      <th>timestamp</th>\n",
    270        "      <th>sid</th>\n",
    271        "    </tr>\n",
    272        "  </thead>\n",
    273        "  <tbody>\n",
    274        "    <tr>\n",
    275        "      <th>0</th>\n",
    276        "      <td>1918241</td>\n",
    277        "      <td>2014-01-27</td>\n",
    278        "      <td>2014-01-27</td>\n",
    279        "      <td>AAPL</td>\n",
    280        "      <td>Buyback Update</td>\n",
    281        "      <td>Apple Repurchases $5.03B Common Stock in 1Q 14</td>\n",
    282        "      <td>5029</td>\n",
    283        "      <td>$M</td>\n",
    284        "      <td>1</td>\n",
    285        "      <td>2014-01-28</td>\n",
    286        "      <td>24</td>\n",
    287        "    </tr>\n",
    288        "    <tr>\n",
    289        "      <th>1</th>\n",
    290        "      <td>1674141</td>\n",
    291        "      <td>2014-02-07</td>\n",
    292        "      <td>2014-02-07</td>\n",
    293        "      <td>AAPL</td>\n",
    294        "      <td>Buyback Update</td>\n",
    295        "      <td>Apple Repurchases $14B Common Stock Since 1Q 1...</td>\n",
    296        "      <td>14000</td>\n",
    297        "      <td>$M</td>\n",
    298        "      <td>1</td>\n",
    299        "      <td>2014-02-08</td>\n",
    300        "      <td>24</td>\n",
    301        "    </tr>\n",
    302        "    <tr>\n",
    303        "      <th>2</th>\n",
    304        "      <td>1918254</td>\n",
    305        "      <td>2014-04-23</td>\n",
    306        "      <td>2014-04-23</td>\n",
    307        "      <td>AAPL</td>\n",
    308        "      <td>Buyback Update</td>\n",
    309        "      <td>Apple Repurchases $23B Common Stock in FY 14 YTD</td>\n",
    310        "      <td>23000</td>\n",
    311        "      <td>$M</td>\n",
    312        "      <td>1</td>\n",
    313        "      <td>2014-04-24</td>\n",
    314        "      <td>24</td>\n",
    315        "    </tr>\n",
    316        "    <tr>\n",
    317        "      <th>3</th>\n",
    318        "      <td>1918258</td>\n",
    319        "      <td>2014-07-22</td>\n",
    320        "      <td>2014-07-22</td>\n",
    321        "      <td>AAPL</td>\n",
    322        "      <td>Buyback Update</td>\n",
    323        "      <td>Apple Repurchases $28B Common Stock in FY 14 YTD</td>\n",
    324        "      <td>5000</td>\n",
    325        "      <td>$M</td>\n",
    326        "      <td>1</td>\n",
    327        "      <td>2014-07-23</td>\n",
    328        "      <td>24</td>\n",
    329        "    </tr>\n",
    330        "    <tr>\n",
    331        "      <th>4</th>\n",
    332        "      <td>1918275</td>\n",
    333        "      <td>2014-10-20</td>\n",
    334        "      <td>2014-10-20</td>\n",
    335        "      <td>AAPL</td>\n",
    336        "      <td>Buyback Update</td>\n",
    337        "      <td>Apple Repurchases $45B Common Stock in FY 14</td>\n",
    338        "      <td>17000</td>\n",
    339        "      <td>$M</td>\n",
    340        "      <td>1</td>\n",
    341        "      <td>2014-10-21</td>\n",
    342        "      <td>24</td>\n",
    343        "    </tr>\n",
    344        "  </tbody>\n",
    345        "</table>"
    346       ],
    347       "text/plain": [
    348        "   event_id  asof_date trade_date symbol      event_type  \\\n",
    349        "0   1918241 2014-01-27 2014-01-27   AAPL  Buyback Update   \n",
    350        "1   1674141 2014-02-07 2014-02-07   AAPL  Buyback Update   \n",
    351        "2   1918254 2014-04-23 2014-04-23   AAPL  Buyback Update   \n",
    352        "3   1918258 2014-07-22 2014-07-22   AAPL  Buyback Update   \n",
    353        "4   1918275 2014-10-20 2014-10-20   AAPL  Buyback Update   \n",
    354        "\n",
    355        "                                      event_headline  repurchase_amount  \\\n",
    356        "0     Apple Repurchases $5.03B Common Stock in 1Q 14               5029   \n",
    357        "1  Apple Repurchases $14B Common Stock Since 1Q 1...              14000   \n",
    358        "2   Apple Repurchases $23B Common Stock in FY 14 YTD              23000   \n",
    359        "3   Apple Repurchases $28B Common Stock in FY 14 YTD               5000   \n",
    360        "4       Apple Repurchases $45B Common Stock in FY 14              17000   \n",
    361        "\n",
    362        "  repurchase_units  event_rating  timestamp  sid  \n",
    363        "0               $M             1 2014-01-28   24  \n",
    364        "1               $M             1 2014-02-08   24  \n",
    365        "2               $M             1 2014-04-24   24  \n",
    366        "3               $M             1 2014-07-23   24  \n",
    367        "4               $M             1 2014-10-21   24  "
    368       ]
    369      },
    370      "execution_count": 5,
    371      "metadata": {},
    372      "output_type": "execute_result"
    373     }
    374    ],
    375    "source": [
    376     "# Get Apple's sid first (symbols() is available in Quantopian Research)\n",
    377     "apple_sid = symbols('AAPL').sid\n",
    378     "buybacks = share_repurchases[('2013-12-31' < share_repurchases['asof_date']) & \n",
    379     "                                (share_repurchases['asof_date'] <'2015-01-01') & \n",
    380     "                                (share_repurchases.sid == apple_sid)]\n",
    381     "# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.\n",
    382     "buybacks.sort('asof_date')"
    383    ]
    384   },
    385   {
    386    "cell_type": "markdown",
    387    "metadata": {},
    388    "source": [
    389     "Now suppose we want a DataFrame of the Blaze Data Object above, but only want the `asof_date`, `repurchase_units`, and `repurchase_amount` columns."
    390    ]
    391   },
    392   {
    393    "cell_type": "code",
    394    "execution_count": 6,
    395    "metadata": {
    396     "collapsed": false
    397    },
    398    "outputs": [
    399     {
    400      "data": {
    401       "text/html": [
    402        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
    403        "<table border=\"1\" class=\"dataframe\">\n",
    404        "  <thead>\n",
    405        "    <tr style=\"text-align: right;\">\n",
    406        "      <th></th>\n",
    407        "      <th>asof_date</th>\n",
    408        "      <th>repurchase_amount</th>\n",
    409        "      <th>repurchase_units</th>\n",
    410        "    </tr>\n",
    411        "  </thead>\n",
    412        "  <tbody>\n",
    413        "    <tr>\n",
    414        "      <th>0</th>\n",
    415        "      <td>2014-01-27</td>\n",
    416        "      <td>5029</td>\n",
    417        "      <td>$M</td>\n",
    418        "    </tr>\n",
    419        "    <tr>\n",
    420        "      <th>1</th>\n",
    421        "      <td>2014-02-07</td>\n",
    422        "      <td>14000</td>\n",
    423        "      <td>$M</td>\n",
    424        "    </tr>\n",
    425        "    <tr>\n",
    426        "      <th>2</th>\n",
    427        "      <td>2014-04-23</td>\n",
    428        "      <td>23000</td>\n",
    429        "      <td>$M</td>\n",
    430        "    </tr>\n",
    431        "    <tr>\n",
    432        "      <th>3</th>\n",
    433        "      <td>2014-07-22</td>\n",
    434        "      <td>5000</td>\n",
    435        "      <td>$M</td>\n",
    436        "    </tr>\n",
    437        "    <tr>\n",
    438        "      <th>4</th>\n",
    439        "      <td>2014-10-20</td>\n",
    440        "      <td>17000</td>\n",
    441        "      <td>$M</td>\n",
    442        "    </tr>\n",
    443        "  </tbody>\n",
    444        "</table>\n",
    445        "</div>"
    446       ],
    447       "text/plain": [
    448        "   asof_date  repurchase_amount repurchase_units\n",
    449        "0 2014-01-27               5029               $M\n",
    450        "1 2014-02-07              14000               $M\n",
    451        "2 2014-04-23              23000               $M\n",
    452        "3 2014-07-22               5000               $M\n",
    453        "4 2014-10-20              17000               $M"
    454       ]
    455      },
    456      "execution_count": 6,
    457      "metadata": {},
    458      "output_type": "execute_result"
    459     }
    460    ],
    461    "source": [
    462     "df = odo(buybacks, pd.DataFrame)\n",
    463     "df = df[['asof_date','repurchase_amount','repurchase_units']]\n",
    464     "df"
    465    ]
    466   }
    467  ],
    468  "metadata": {
    469   "kernelspec": {
    470    "display_name": "Python 2",
    471    "language": "python",
    472    "name": "python2"
    473   },
    474   "language_info": {
    475    "codemirror_mode": {
    476     "name": "ipython",
    477     "version": 2
    478    },
    479    "file_extension": ".py",
    480    "mimetype": "text/x-python",
    481    "name": "python",
    482    "nbconvert_exporter": "python",
    483    "pygments_lexer": "ipython2",
    484    "version": "2.7.12"
    485   }
    486  },
    487  "nbformat": 4,
    488  "nbformat_minor": 0
    489 }
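
The notebook's filter-then-select workflow depends on Quantopian's hosted Blaze service. For readers without that access, here is a minimal pandas-only sketch of the same steps: filter by date range and `sid`, keep three columns, and sort by `asof_date`. It assumes the rows are typed in by hand from the notebook's output tables. `events` and `select_buybacks` are hypothetical names chosen for illustration and are not part of Blaze, odo, or the Quantopian API.

```python
import pandas as pd

# Rows copied by hand from the notebook's output tables (sid 24 is AAPL there).
rows = [
    ("2014-01-27", 5029.0, "$M", 24),
    ("2014-02-07", 14000.0, "$M", 24),
    ("2014-04-23", 23000.0, "$M", 24),
    ("2014-07-22", 5000.0, "$M", 24),
    ("2014-10-20", 17000.0, "$M", 24),
    # One non-Apple row from the first preview, so the sid filter has work to do.
    ("2007-01-23", 81.1, "$M", 20276),
]
events = pd.DataFrame(
    rows,
    columns=["asof_date", "repurchase_amount", "repurchase_units", "sid"],
)
events["asof_date"] = pd.to_datetime(events["asof_date"])


def select_buybacks(frame, sid, start, end):
    """Return rows for `sid` with start < asof_date < end, sorted by date.

    Hypothetical helper: mirrors the notebook's Blaze filter using plain pandas.
    """
    mask = (
        (frame["asof_date"] > pd.Timestamp(start))
        & (frame["asof_date"] < pd.Timestamp(end))
        & (frame["sid"] == sid)
    )
    cols = ["asof_date", "repurchase_amount", "repurchase_units"]
    return frame.loc[mask, cols].sort_values("asof_date").reset_index(drop=True)


apple_2014 = select_buybacks(events, sid=24, start="2013-12-31", end="2015-01-01")
print(apple_2014)
```

The date bounds are exclusive, matching the strict `<` comparisons in the notebook's Blaze expression. In the notebook itself the filtering runs server side and only the reduced result is converted with `odo`, whereas this sketch does everything locally on a tiny frame.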