ml-finance-python

python scripts for finance machine learning
git clone https://9o.is/git/ml-finance-python.git
notebook.ipynb

      1 {
      2  "cells": [
      3   {
      4    "cell_type": "code",
      5    "execution_count": 1,
      6    "metadata": {
      7     "collapsed": true
      8    },
      9    "outputs": [],
     10    "source": [
     11     "from quantopian.pipeline import Pipeline\n",
     12     "from quantopian.research import run_pipeline\n",
     13     "from quantopian.pipeline.data.builtin import USEquityPricing\n",
     14     "from quantopian.pipeline.factors import SimpleMovingAverage, AverageDollarVolume"
     15    ]
     16   },
     17   {
     18    "cell_type": "markdown",
     19    "metadata": {},
     20    "source": [
     21     "## Putting It All Together\n",
     22     "Now that we've covered the basic components of the Pipeline API, let's construct a pipeline that we might want to use in an algorithm.\n",
     23     "\n",
     24     "To start, let's first create a filter to narrow down the types of securities coming out of our pipeline. In this example, we will create a filter to select for securities that meet all of the following criteria:\n",
     25     "- Is a primary share\n",
     26     "- Is listed as a common stock\n",
     27     "- Is not a [depositary receipt](http://www.investopedia.com/terms/d/depositaryreceipt.asp) (ADR/GDR)\n",
     28     "- Is not trading [over-the-counter](http://www.investopedia.com/terms/o/otc.asp) (OTC)\n",
     29     "- Is not [when-issued](http://www.investopedia.com/terms/w/wi.asp) (WI)\n",
     30     "- Doesn't have a name indicating it's a [limited partnership](http://www.investopedia.com/terms/l/limitedpartnership.asp) (LP)\n",
     31     "- Doesn't have a company reference entry indicating it's an LP\n",
     32     "- Is not an [ETF](http://www.investopedia.com/terms/e/etf.asp) (has Morningstar fundamental data)\n",
     33     "\n",
     34     "\n",
     35     "#### Why These Criteria?\n",
     36     "Selecting for primary shares and common stock helps us pick a single security for each company. Primary shares are generally a good representative asset of a company, so we select for them in our pipeline.\n",
     37     "\n",
     38     "ADRs and GDRs are issuances in the US equity market for stocks that trade on other exchanges. Depositary receipts frequently carry inherent risk from currency fluctuations, so we exclude them from our pipeline.\n",
     39     "\n",
     40     "OTC, WI, and LP securities are not tradeable with most brokers. As a result, we exclude them from our pipeline.\n",
     41     "\n",
     42     "### Creating Our Pipeline\n",
     43     "Let's create a filter for each criterion and combine them together to create a `tradeable_stocks` filter. First, we need to import the Morningstar `DataSet` as well as the `IsPrimaryShare` builtin filter."
     44    ]
     45   },
     46   {
     47    "cell_type": "code",
     48    "execution_count": 2,
     49    "metadata": {
     50     "collapsed": true
     51    },
     52    "outputs": [],
     53    "source": [
     54     "from quantopian.pipeline.data import Fundamentals\n",
     55     "from quantopian.pipeline.filters.fundamentals import IsPrimaryShare"
     56    ]
     57   },
     58   {
     59    "cell_type": "markdown",
     60    "metadata": {},
     61    "source": [
     62     "Now we can define our filters:"
     63    ]
     64   },
     65   {
     66    "cell_type": "code",
     67    "execution_count": 3,
     68    "metadata": {
     69     "collapsed": true
     70    },
     71    "outputs": [],
     72    "source": [
     73     "# Filter for primary share equities. IsPrimaryShare is a built-in filter.\n",
     74     "primary_share = IsPrimaryShare()\n",
     75     "\n",
     76     "# Equities listed as common stock (as opposed to, say, preferred stock).\n",
     77     "# 'ST00000001' indicates common stock.\n",
     78     "common_stock = Fundamentals.security_type.latest.eq('ST00000001')\n",
     79     "\n",
     80     "# Non-depositary receipts. Recall that the ~ operator inverts filters,\n",
     81     "# turning Trues into Falses and vice versa.\n",
     82     "not_depositary = ~Fundamentals.is_depositary_receipt.latest\n",
     83     "\n",
     84     "# Equities not trading over-the-counter.\n",
     85     "not_otc = ~Fundamentals.exchange_id.latest.startswith('OTC')\n",
     86     "\n",
     87     "# Not when-issued equities.\n",
     88     "not_wi = ~Fundamentals.symbol.latest.endswith('.WI')\n",
     89     "\n",
     90     "# Equities without LP in their name. The .matches method tests each name\n",
     91     "# against a regular expression.\n",
     92     "not_lp_name = ~Fundamentals.standard_name.latest.matches('.* L[. ]?P.?$')\n",
     93     "\n",
     94     "# Equities with a null value in the limited_partnership Morningstar\n",
     95     "# fundamental field.\n",
     96     "not_lp_balance_sheet = Fundamentals.limited_partnership.latest.isnull()\n",
     97     "\n",
     98     "# Equities whose most recent Morningstar market cap is not null have\n",
     99     "# fundamental data and therefore are not ETFs.\n",
    100     "have_market_cap = Fundamentals.market_cap.latest.notnull()\n",
    101     "\n",
    102     "# Filter for stocks that pass all of our previous filters.\n",
    103     "tradeable_stocks = (\n",
    104     "    primary_share\n",
    105     "    & common_stock\n",
    106     "    & not_depositary\n",
    107     "    & not_otc\n",
    108     "    & not_wi\n",
    109     "    & not_lp_name\n",
    110     "    & not_lp_balance_sheet\n",
    111     "    & have_market_cap\n",
    112     ")"
    113    ]
    114   },
    115   {
    116    "cell_type": "markdown",
    117    "metadata": {},
    118    "source": [
    119     "Note that when defining our filters, we used several `Classifier` methods that we haven't yet seen, including `notnull`, `startswith`, `endswith`, and `matches`. Documentation on these methods is available [here](https://www.quantopian.com/help#quantopian_pipeline_classifiers_Classifier).\n",
    120     "\n",
    121     "Next, let's create a filter for the top 30% of tradeable stocks by 20-day average dollar volume. We'll call this our `base_universe`."
    122    ]
    123   },
    124   {
    125    "cell_type": "code",
    126    "execution_count": 4,
    127    "metadata": {
    128     "collapsed": true
    129    },
    130    "outputs": [],
    131    "source": [
    132     "base_universe = AverageDollarVolume(window_length=20, mask=tradeable_stocks).percentile_between(70, 100)"
    133    ]
    134   },
    135   {
    136    "cell_type": "markdown",
    137    "metadata": {},
    138    "source": [
    139     "#### Built-in Base Universe\n",
    140     "\n",
    141     "We have just defined our own base universe to select 'tradeable' securities with high dollar volume. However, Quantopian has several built-in filters that do something similar, the newest of which is the [QTradableStocksUS](https://www.quantopian.com/help#quantopian_pipeline_filters_QTradableStocksUS). The QTradableStocksUS is a built-in pipeline filter that selects a daily universe of stocks. It filters in three passes against a set of criteria to yield the most liquid universe possible without any size constraints. Unlike its predecessors, the [Q500US](https://www.quantopian.com/help#quantopian_pipeline_filters_Q500US) and the [Q1500US](https://www.quantopian.com/help#quantopian_pipeline_filters_Q1500US), the QTradableStocksUS therefore has no size cutoff. More detail on its selection criteria can be found [here](https://www.quantopian.com/posts/working-on-our-best-universe-yet-qtradablestocksus).\n",
    142     "\n",
    143     "To simplify our pipeline, let's replace what we've already written for our `base_universe` with the `QTradableStocksUS` built-in filter. First, we need to import it."
    144    ]
    145   },
    146   {
    147    "cell_type": "code",
    148    "execution_count": 5,
    149    "metadata": {
    150     "collapsed": true
    151    },
    152    "outputs": [],
    153    "source": [
    154     "from quantopian.pipeline.filters import QTradableStocksUS"
    155    ]
    156   },
    157   {
    158    "cell_type": "markdown",
    159    "metadata": {},
    160    "source": [
    161     "Then, let's set our `base_universe` to the `QTradableStocksUS`."
    162    ]
    163   },
    164   {
    165    "cell_type": "code",
    166    "execution_count": 6,
    167    "metadata": {
    168     "collapsed": true
    169    },
    170    "outputs": [],
    171    "source": [
    172     "base_universe = QTradableStocksUS()"
    173    ]
    174   },
    175   {
    176    "cell_type": "markdown",
    177    "metadata": {},
    178    "source": [
    179     "Now that we have a filter `base_universe` that we can use to select a subset of securities, let's focus on creating factors for this subset. For this example, let's create a pipeline for a mean reversion strategy. In this strategy, we'll look at the 10-day and 30-day moving averages (close price). Let's plan to open equally weighted long positions in the 75 securities with the least (most negative) percent difference and equally weighted short positions in the 75 with the greatest percent difference. To do this, let's create two moving average factors using our `base_universe` filter as a mask. Then let's combine them into a factor computing the percent difference."
    180    ]
    181   },
    182   {
    183    "cell_type": "code",
    184    "execution_count": 7,
    185    "metadata": {
    186     "collapsed": true
    187    },
    188    "outputs": [],
    189    "source": [
    190     "# 10-day close price average.\n",
    191     "mean_10 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=10, mask=base_universe)\n",
    192     "\n",
    193     "# 30-day close price average.\n",
    194     "mean_30 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=30, mask=base_universe)\n",
    195     "\n",
    196     "percent_difference = (mean_10 - mean_30) / mean_30"
    197    ]
    198   },
    199   {
    200    "cell_type": "markdown",
    201    "metadata": {},
    202    "source": [
    203     "Next, let's create filters for the top 75 and bottom 75 equities by `percent_difference`."
    204    ]
    205   },
    206   {
    207    "cell_type": "code",
    208    "execution_count": 8,
    209    "metadata": {
    210     "collapsed": true
    211    },
    212    "outputs": [],
    213    "source": [
    214     "# Create a filter to select securities to short.\n",
    215     "shorts = percent_difference.top(75)\n",
    216     "\n",
    217     "# Create a filter to select securities to long.\n",
    218     "longs = percent_difference.bottom(75)"
    219    ]
    220   },
    221   {
    222    "cell_type": "markdown",
    223    "metadata": {},
    224    "source": [
    225     "Let's then combine `shorts` and `longs` to create a new filter that we can use as the screen of our pipeline:"
    226    ]
    227   },
    228   {
    229    "cell_type": "code",
    230    "execution_count": 9,
    231    "metadata": {
    232     "collapsed": true
    233    },
    234    "outputs": [],
    235    "source": [
    236     "securities_to_trade = (shorts | longs)"
    237    ]
    238   },
    239   {
    240    "cell_type": "markdown",
    241    "metadata": {},
    242    "source": [
    243     "Since `base_universe` was used as a mask as we built up to this final filter, when we use `securities_to_trade` as a screen, every output security will be a member of the `QTradableStocksUS`. That universe applies criteria similar to those outlined at the beginning of the lesson (primary shares, non-ETFs, etc.) and favors liquid, high dollar volume securities."
    244    ]
    245   },
    246   {
    247    "cell_type": "markdown",
    248    "metadata": {
    249     "collapsed": true
    250    },
    251    "source": [
    252     "Finally, let's instantiate our pipeline. Since we are planning on opening equally weighted long and short positions later, the only information that we actually need from our pipeline is which securities we want to trade (the pipeline index) and whether to open a long or a short position in each. Let's add our `longs` and `shorts` filters to our pipeline and set our screen to be `securities_to_trade`."
    253    ]
    254   },
    255   {
    256    "cell_type": "code",
    257    "execution_count": 10,
    258    "metadata": {
    259     "collapsed": true
    260    },
    261    "outputs": [],
    262    "source": [
    263     "def make_pipeline():\n",
    264     "    \n",
    265     "    # Base universe filter.\n",
    266     "    base_universe = QTradableStocksUS()\n",
    267     "    \n",
    268     "    # 10-day close price average.\n",
    269     "    mean_10 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=10, mask=base_universe)\n",
    270     "\n",
    271     "    # 30-day close price average.\n",
    272     "    mean_30 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=30, mask=base_universe)\n",
    273     "\n",
    274     "    # Percent difference factor.\n",
    275     "    percent_difference = (mean_10 - mean_30) / mean_30\n",
    276     "    \n",
    277     "    # Create a filter to select securities to short.\n",
    278     "    shorts = percent_difference.top(75)\n",
    279     "\n",
    280     "    # Create a filter to select securities to long.\n",
    281     "    longs = percent_difference.bottom(75)\n",
    282     "    \n",
    283     "    # Filter for the securities that we want to trade.\n",
    284     "    securities_to_trade = (shorts | longs)\n",
    285     "    \n",
    286     "    return Pipeline(\n",
    287     "        columns={\n",
    288     "            'longs': longs,\n",
    289     "            'shorts': shorts\n",
    290     "        },\n",
    291     "        screen=securities_to_trade\n",
    292     "    )"
    293    ]
    294   },
    295   {
    296    "cell_type": "markdown",
    297    "metadata": {},
    298    "source": [
    299     "Running this pipeline will result in a DataFrame containing two columns, `longs` and `shorts`. Each day, the columns will contain boolean values that we can use to decide whether we want to open a long or short position in each security."
    300    ]
    301   },
    302   {
    303    "cell_type": "code",
    304    "execution_count": 11,
    305    "metadata": {
    306     "collapsed": false
    307    },
    308    "outputs": [
    309     {
    310      "data": {
    311       "text/html": [
    312        "<div>\n",
    313        "<table border=\"1\" class=\"dataframe\">\n",
    314        "  <thead>\n",
    315        "    <tr style=\"text-align: right;\">\n",
    316        "      <th></th>\n",
    317        "      <th></th>\n",
    318        "      <th>longs</th>\n",
    319        "      <th>shorts</th>\n",
    320        "    </tr>\n",
    321        "  </thead>\n",
    322        "  <tbody>\n",
    323        "    <tr>\n",
    324        "      <th rowspan=\"5\" valign=\"top\">2015-05-05 00:00:00+00:00</th>\n",
    325        "      <th>Equity(39 [DDC])</th>\n",
    326        "      <td>False</td>\n",
    327        "      <td>True</td>\n",
    328        "    </tr>\n",
    329        "    <tr>\n",
    330        "      <th>Equity(351 [AMD])</th>\n",
    331        "      <td>True</td>\n",
    332        "      <td>False</td>\n",
    333        "    </tr>\n",
    334        "    <tr>\n",
    335        "      <th>Equity(371 [TVTY])</th>\n",
    336        "      <td>True</td>\n",
    337        "      <td>False</td>\n",
    338        "    </tr>\n",
    339        "    <tr>\n",
    340        "      <th>Equity(474 [APOG])</th>\n",
    341        "      <td>False</td>\n",
    342        "      <td>True</td>\n",
    343        "    </tr>\n",
    344        "    <tr>\n",
    345        "      <th>Equity(523 [AAN])</th>\n",
    346        "      <td>False</td>\n",
    347        "      <td>True</td>\n",
    348        "    </tr>\n",
    349        "  </tbody>\n",
    350        "</table>\n",
    351        "</div>"
    352       ],
    353       "text/plain": [
    354        "                                              longs shorts\n",
    355        "2015-05-05 00:00:00+00:00 Equity(39 [DDC])    False   True\n",
    356        "                          Equity(351 [AMD])    True  False\n",
    357        "                          Equity(371 [TVTY])   True  False\n",
    358        "                          Equity(474 [APOG])  False   True\n",
    359        "                          Equity(523 [AAN])   False   True"
    360       ]
    361      },
    362      "execution_count": 11,
    363      "metadata": {},
    364      "output_type": "execute_result"
    365     }
    366    ],
    367    "source": [
    368     "result = run_pipeline(make_pipeline(), '2015-05-05', '2015-05-05')\n",
    369     "result.head()"
    370    ]
    371   },
    372   {
    373    "cell_type": "markdown",
    374    "metadata": {},
    375    "source": [
    376     "In the next lesson, we'll add this pipeline to an algorithm."
    377    ]
    378   }
    379  ],
    380  "metadata": {
    381   "kernelspec": {
    382    "display_name": "Python 2",
    383    "language": "python",
    384    "name": "python2"
    385   },
    386   "language_info": {
    387    "codemirror_mode": {
    388     "name": "ipython",
    389     "version": 2
    390    },
    391    "file_extension": ".py",
    392    "mimetype": "text/x-python",
    393    "name": "python",
    394    "nbconvert_exporter": "python",
    395    "pygments_lexer": "ipython2",
    396    "version": "2.7.12"
    397   }
    398  },
    399  "nbformat": 4,
    400  "nbformat_minor": 1
    401 }
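
The `make_pipeline` cell in the notebook above only runs inside the Quantopian research environment. As a rough sketch of the same ranking logic in plain pandas, the block below computes the 10-day and 30-day close price averages, the percent difference between them, and boolean `longs`/`shorts` columns screened to the securities in either group. The function name `rank_mean_reversion`, the synthetic random-walk prices, and the `SECnn` ticker names are assumptions made for illustration. None of them come from the notebook, and the sketch applies no universe filter such as `QTradableStocksUS`.

```python
import numpy as np
import pandas as pd


def rank_mean_reversion(close, short_window=10, long_window=30, n=75):
    """Return boolean longs/shorts columns for the last day in `close`.

    close: DataFrame of close prices, one row per day, one column per security.
    """
    # Simple moving averages over the most recent rows.
    mean_short = close.tail(short_window).mean()
    mean_long = close.tail(long_window).mean()

    # Same formula as the notebook's percent_difference factor.
    percent_difference = (mean_short - mean_long) / mean_long

    # Shorts: the n largest percent differences (like .top(n)).
    shorts = percent_difference.rank(ascending=False, method="first") <= n
    # Longs: the n smallest percent differences (like .bottom(n)).
    longs = percent_difference.rank(ascending=True, method="first") <= n

    result = pd.DataFrame({"longs": longs, "shorts": shorts})
    # Rough equivalent of screen=(shorts | longs): keep rows in either group.
    return result[result["longs"] | result["shorts"]]


# Synthetic prices (an assumption, not market data): 20 random walks, 40 days.
rng = np.random.default_rng(0)
days = pd.date_range("2015-03-02", periods=40, freq="B")
tickers = ["SEC%02d" % i for i in range(20)]
log_returns = rng.normal(0.0, 0.02, size=(len(days), len(tickers)))
close = pd.DataFrame(
    100 * np.exp(log_returns.cumsum(axis=0)), index=days, columns=tickers
)

result = rank_mean_reversion(close, n=3)
print(result)
```

With `n=3`, the printed frame has six rows: three securities flagged `longs` and three flagged `shorts`. Unlike `run_pipeline`, which returns one block of rows per day, this sketch ranks a single day only.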