ml-finance-python

python scripts for finance machine learning
git clone https://9o.is/git/ml-finance-python.git
notebook.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from quantopian.pipeline import Pipeline\n",
    "from quantopian.research import run_pipeline\n",
    "from quantopian.pipeline.factors import SimpleMovingAverage, AverageDollarVolume"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Classifiers\n",
    "A classifier is a function from an asset and a moment in time to a [categorical output](https://en.wikipedia.org/wiki/Categorical_variable) such as a `string` or `integer` label:\n",
    "```\n",
    "F(asset, timestamp) -> category\n",
    "```\n",
    "An example of a classifier producing a string output is the exchange ID of a security. To create this classifier, we import `Fundamentals` and use the [latest](https://www.quantopian.com/tutorials/pipeline#lesson3) attribute of `Fundamentals.exchange_id`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from quantopian.pipeline.data import Fundamentals\n",
    "\n",
    "# Since the underlying data of Fundamentals.exchange_id\n",
    "# is of type string, .latest returns a Classifier\n",
    "exchange = Fundamentals.exchange_id.latest"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Previously, we saw that the `latest` attribute produced a `Factor` when applied to numeric data. Here, since the underlying data is of type `string`, `latest` produces a `Classifier`.\n",
    "\n",
    "Similarly, a computation producing the latest Morningstar sector code of a security is a `Classifier`. Here the underlying type is an `int`, but the integer is a category label rather than a numerical quantity, so the result is a classifier. To get the latest sector code, we can use the built-in `Sector` classifier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from quantopian.pipeline.classifiers.fundamentals import Sector\n",
    "morningstar_sector = Sector()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using `Sector` is equivalent to `Fundamentals.morningstar_sector_code.latest`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Building Filters from Classifiers\n",
    "Classifiers can also be used to produce filters with methods like `isnull`, `eq`, and `startswith`. The full list of `Classifier` methods that produce `Filter`s is in the [`Classifier` API reference](https://www.quantopian.com/help#quantopian_pipeline_classifiers_Classifier).\n",
    "\n",
    "As an example, to select securities trading on the New York Stock Exchange, we can use the `eq` method of our `exchange` classifier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "nyse_filter = exchange.eq('NYS')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This filter returns `True` for securities whose most recent `exchange_id` is `'NYS'`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Quantiles\n",
    "Classifiers can also be produced from various `Factor` methods. The most general of these is the `quantiles` method, which accepts a bin count as an argument. It assigns a label from 0 to `bins - 1` to every non-NaN data point in the factor output and returns a `Classifier` with these labels; `NaN`s are labeled -1. Aliases are available for [quartiles](https://www.quantopian.com/help/#quantopian_pipeline_factors_Factor_quartiles) (`quantiles(4)`), [quintiles](https://www.quantopian.com/help/#quantopian_pipeline_factors_Factor_quintiles) (`quantiles(5)`), and [deciles](https://www.quantopian.com/help/#quantopian_pipeline_factors_Factor_deciles) (`quantiles(10)`). As an example, this is what a filter for the top decile of a factor might look like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()\n",
    "top_decile = (dollar_volume_decile.eq(9))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's put each of our classifiers into a pipeline and run it to see what they look like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def make_pipeline():\n",
    "    exchange = Fundamentals.exchange_id.latest\n",
    "    nyse_filter = exchange.eq('NYS')\n",
    "\n",
    "    morningstar_sector = Sector()\n",
    "\n",
    "    dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()\n",
    "    top_decile = (dollar_volume_decile.eq(9))\n",
    "\n",
    "    return Pipeline(\n",
    "        columns={\n",
    "            'exchange': exchange,\n",
    "            'sector_code': morningstar_sector,\n",
    "            'dollar_volume_decile': dollar_volume_decile\n",
    "        },\n",
    "        screen=(nyse_filter & top_decile)\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of securities that passed the filter: 513\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>dollar_volume_decile</th>\n",
       "      <th>exchange</th>\n",
       "      <th>sector_code</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">2015-05-05 00:00:00+00:00</th>\n",
       "      <th>Equity(2 [ARNC])</th>\n",
       "      <td>9</td>\n",
       "      <td>NYS</td>\n",
       "      <td>101</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Equity(62 [ABT])</th>\n",
       "      <td>9</td>\n",
       "      <td>NYS</td>\n",
       "      <td>206</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Equity(64 [ABX])</th>\n",
       "      <td>9</td>\n",
       "      <td>NYS</td>\n",
       "      <td>101</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Equity(76 [TAP])</th>\n",
       "      <td>9</td>\n",
       "      <td>NYS</td>\n",
       "      <td>205</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Equity(128 [ADM])</th>\n",
       "      <td>9</td>\n",
       "      <td>NYS</td>\n",
       "      <td>205</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             dollar_volume_decile exchange  \\\n",
       "2015-05-05 00:00:00+00:00 Equity(2 [ARNC])                      9      NYS   \n",
       "                          Equity(62 [ABT])                      9      NYS   \n",
       "                          Equity(64 [ABX])                      9      NYS   \n",
       "                          Equity(76 [TAP])                      9      NYS   \n",
       "                          Equity(128 [ADM])                     9      NYS   \n",
       "\n",
       "                                             sector_code  \n",
       "2015-05-05 00:00:00+00:00 Equity(2 [ARNC])           101  \n",
       "                          Equity(62 [ABT])           206  \n",
       "                          Equity(64 [ABX])           101  \n",
       "                          Equity(76 [TAP])           205  \n",
       "                          Equity(128 [ADM])          205  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = run_pipeline(make_pipeline(), '2015-05-05', '2015-05-05')\n",
    "print('Number of securities that passed the filter: %d' % len(result))\n",
    "result.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Classifiers are also useful as grouping keys for more complex transformations on `Factor` outputs. Grouping operations such as [demean](https://www.quantopian.com/help#quantopian_pipeline_factors_Factor_demean) and [groupby](https://www.quantopian.com/help#quantopian_pipeline_factors_Factor_groupby) are outside the scope of this tutorial; a future tutorial will cover more advanced uses of classifiers.\n",
    "\n",
    "In the next lesson, we'll look at the different datasets that we can use in pipeline."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
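The notebook's `Factor.quantiles` cell describes a labeling rule: non-NaN values get labels 0 to `bins - 1`, and NaNs get -1. That rule can be sketched in plain Python without the `quantopian` package. This is a minimal sketch under stated assumptions. `quantile_labels` and `deciles` are hypothetical helpers that work on a single list of values, standing in for one day's cross-section. Bins are assigned by rank so that each bin receives a near-equal number of points. Pipeline's own handling of ties and bin edges may differ.

```python
import math

def quantile_labels(values, bins):
    """Hypothetical helper: label non-NaN values 0..bins-1 by rank, NaN values -1."""
    if bins < 1:
        raise ValueError("bins must be >= 1")
    # Positions of the non-NaN data points, ordered by value (stable for ties).
    ranked = sorted(
        (i for i, v in enumerate(values) if not math.isnan(v)),
        key=lambda i: values[i],
    )
    n = len(ranked)
    labels = [-1] * len(values)  # NaNs keep the -1 label
    for rank, i in enumerate(ranked):
        # Map rank 0..n-1 onto bins 0..bins-1 with near-equal bin sizes.
        labels[i] = rank * bins // n
    return labels

def deciles(values):
    """Alias mirroring quantiles(10)."""
    return quantile_labels(values, 10)

dollar_volume = [5.0, float("nan"), 1.0, 3.0, 2.0, 4.0]
print(quantile_labels(dollar_volume, 5))  # [4, -1, 0, 2, 1, 3]

# Top-decile filter, analogous to dollar_volume_decile.eq(9)
values = [float(v) for v in range(20)]
top_decile = [label == 9 for label in deciles(values)]
print(sum(top_decile))  # 2
```

Ranks are spread across bins with integer division. As a result, bin sizes differ by at most one when the number of values is not a multiple of `bins`. Because `sorted` is stable, tied values keep their original order, so equal inputs can land in adjacent bins.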
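In `make_pipeline`, the argument `screen=(nyse_filter & top_decile)` keeps only the assets for which both filters are `True`. The plain-Python sketch below models one day's cross-section as dictionaries keyed by asset. It shows how `eq`, `startswith`, and `isnull` turn classifier labels into boolean filters, and how `&` combines them before the screen is applied. The helper names, the tickers `AAA` to `DDD`, and the use of `None` for a missing label are all hypothetical. This is not the Pipeline API, which evaluates these expressions for every day between the start and end dates passed to `run_pipeline`.

```python
# Hypothetical one-day cross-section keyed by made-up tickers.
# None stands in for a missing exchange label.
exchange = {"AAA": "NYS", "BBB": "NAS", "CCC": "NYS", "DDD": None}
dollar_volume_decile = {"AAA": 9, "BBB": 9, "CCC": 4, "DDD": -1}

def classifier_eq(labels, value):
    """Stand-in for Classifier.eq: True where the label equals value."""
    return {asset: label == value for asset, label in labels.items()}

def classifier_startswith(labels, prefix):
    """Stand-in for Classifier.startswith: True where a string label begins with prefix."""
    return {asset: isinstance(label, str) and label.startswith(prefix)
            for asset, label in labels.items()}

def classifier_isnull(labels):
    """Stand-in for Classifier.isnull: True where the label is missing."""
    return {asset: label is None for asset, label in labels.items()}

def filter_and(left, right):
    """Element-wise AND, like combining two Filters with `&`."""
    return {asset: left[asset] and right[asset] for asset in left}

def apply_screen(columns, screen):
    """Keep only the assets whose screen value is True, like Pipeline(screen=...)."""
    return {asset: {name: column[asset] for name, column in columns.items()}
            for asset, passed in screen.items() if passed}

nyse_filter = classifier_eq(exchange, "NYS")
top_decile = classifier_eq(dollar_volume_decile, 9)
result = apply_screen(
    {"exchange": exchange, "dollar_volume_decile": dollar_volume_decile},
    filter_and(nyse_filter, top_decile),
)
print(result)  # {'AAA': {'exchange': 'NYS', 'dollar_volume_decile': 9}}
```

`BBB` is in the top decile but not on `NYS`, and `CCC` is on `NYS` but not in the top decile. Only `AAA` passes both filters, mirroring how the notebook's screen drops every row that fails either condition.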