ml-finance-python

python scripts for finance machine learning

git clone https://9o.is/git/ml-finance-python.git

notebook.ipynb

(6933B)


      1 {
      2  "cells": [
      3   {
      4    "cell_type": "code",
      5    "execution_count": 1,
      6    "metadata": {
      7     "collapsed": true
      8    },
      9    "outputs": [],
     10    "source": [
     11     "from quantopian.pipeline import Pipeline\n",
     12     "from quantopian.research import run_pipeline\n",
     13     "from quantopian.pipeline.data.builtin import USEquityPricing\n",
     14     "from quantopian.pipeline.factors import SimpleMovingAverage, AverageDollarVolume"
     15    ]
     16   },
     17   {
     18    "cell_type": "markdown",
     19    "metadata": {},
     20    "source": [
     21     "## Masking\n",
     22     "Sometimes we want to ignore certain assets when computing pipeline expressions. There are two common cases where ignoring assets is useful:\n",
     23     "1. We want to compute an expression that's computationally expensive, and we know we only care about results for certain assets. A common example of such an expensive expression is a `Factor` computing the coefficients of a regression ([RollingLinearRegressionOfReturns](https://www.quantopian.com/help#quantopian_pipeline_factors_RollingLinearRegressionOfReturns)).\n",
     24     "2. We want to compute an expression that performs comparisons between assets, but we only want those comparisons to be performed against a subset of all assets. For example, we might want to use the `Factor` method `top` to compute the top 200 assets by earnings yield, ignoring assets that don't meet some liquidity constraint.\n",
     25     "\n",
     26     "To support these two use cases, all `Factor`s and many `Factor` methods accept a `mask` argument, which must be a `Filter` indicating which assets to consider when computing."
     27    ]
     28   },
     29   {
     30    "cell_type": "markdown",
     31    "metadata": {},
     32    "source": [
     33     "### Masking Factors\n",
     34     "Let's say we want our pipeline to output securities with a high or low percent difference, but only among securities with a 30-day average dollar volume above $10,000,000. To do this, let's rearrange our `make_pipeline` function so that we first create the `high_dollar_volume` filter. We can then use this filter as a mask for the moving average factors by passing `high_dollar_volume` as the `mask` argument to `SimpleMovingAverage`."
     35    ]
     36   },
     37   {
     38    "cell_type": "code",
     39    "execution_count": 2,
     40    "metadata": {
     41     "collapsed": false
     42    },
     43    "outputs": [],
     44    "source": [
     45     "# Dollar volume factor\n",
     46     "dollar_volume = AverageDollarVolume(window_length=30)\n",
     47     "\n",
     48     "# High dollar volume filter\n",
     49     "high_dollar_volume = (dollar_volume > 10000000)\n",
     50     "\n",
     51     "# Average close price factors\n",
     52     "mean_close_10 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=10, mask=high_dollar_volume)\n",
     53     "mean_close_30 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=30, mask=high_dollar_volume)\n",
     54     "\n",
     55     "# Relative difference factor\n",
     56     "percent_difference = (mean_close_10 - mean_close_30) / mean_close_30"
     57    ]
     58   },
     59   {
     60    "cell_type": "markdown",
     61    "metadata": {},
     62    "source": [
     63     "Applying the mask to `SimpleMovingAverage` restricts the average close price factors to a computation over the ~2000 securities passing the `high_dollar_volume` filter, as opposed to ~8000 without a mask. When we combine `mean_close_10` and `mean_close_30` to form `percent_difference`, the computation is performed on the same ~2000 securities."
     64    ]
     65   },
     66   {
     67    "cell_type": "markdown",
     68    "metadata": {},
     69    "source": [
     70     "### Masking Filters\n",
     71     "Masks can also be applied to methods that return filters, like `top`, `bottom`, and `percentile_between`.\n",
     72     "\n",
     73     "Masks are most useful when we want to apply a filter in the earlier steps of a combined computation. For example, suppose we want the 50 securities with the highest open price among those in the top 10% by dollar volume. Suppose that we then want the 90th-100th percentile of those 50 securities by close price. We can do this with the following:"
     74    ]
     75   },
     76   {
     77    "cell_type": "code",
     78    "execution_count": 3,
     79    "metadata": {
     80     "collapsed": false
     81    },
     82    "outputs": [],
     83    "source": [
     84     "# Dollar volume factor\n",
     85     "dollar_volume = AverageDollarVolume(window_length=30)\n",
     86     "\n",
     87     "# High dollar volume filter\n",
     88     "high_dollar_volume = dollar_volume.percentile_between(90, 100)\n",
     89     "\n",
     90     "# Top open price filter (high dollar volume securities)\n",
     91     "top_open_price = USEquityPricing.open.latest.top(50, mask=high_dollar_volume)\n",
     92     "\n",
     93     "# Top percentile close price filter (high dollar volume, top 50 open price)\n",
     94     "high_close_price = USEquityPricing.close.latest.percentile_between(90, 100, mask=top_open_price)"
     95    ]
     96   },
     97   {
     98    "cell_type": "markdown",
     99    "metadata": {},
    100    "source": [
    101     "Let's put this into `make_pipeline` and output a pipeline with no columns, screened with our `high_close_price` filter."
    102    ]
    103   },
    104   {
    105    "cell_type": "code",
    106    "execution_count": 4,
    107    "metadata": {
    108     "collapsed": true
    109    },
    110    "outputs": [],
    111    "source": [
    112     "def make_pipeline():\n",
    113     "\n",
    114     "    # Dollar volume factor\n",
    115     "    dollar_volume = AverageDollarVolume(window_length=30)\n",
    116     "\n",
    117     "    # High dollar volume filter\n",
    118     "    high_dollar_volume = dollar_volume.percentile_between(90, 100)\n",
    119     "\n",
    120     "    # Top open price filter (high dollar volume securities)\n",
    121     "    top_open_price = USEquityPricing.open.latest.top(50, mask=high_dollar_volume)\n",
    122     "\n",
    123     "    # Top percentile close price filter (high dollar volume, top 50 open price)\n",
    124     "    high_close_price = USEquityPricing.close.latest.percentile_between(90, 100, mask=top_open_price)\n",
    125     "\n",
    126     "    return Pipeline(\n",
    127     "        screen=high_close_price\n",
    128     "    )"
    129    ]
    130   },
    131   {
    132    "cell_type": "markdown",
    133    "metadata": {},
    134    "source": [
    135     "Running this pipeline outputs 5 securities on May 5th, 2015."
    136    ]
    137   },
    138   {
    139    "cell_type": "code",
    140    "execution_count": 5,
    141    "metadata": {
    142     "collapsed": false
    143    },
    144    "outputs": [
    145     {
    146      "name": "stdout",
    147      "output_type": "stream",
    148      "text": [
    149       "Number of securities that passed the filter: 5\n"
    150      ]
    151     }
    152    ],
    153    "source": [
    154     "result = run_pipeline(make_pipeline(), '2015-05-05', '2015-05-05')\n",
    155     "print('Number of securities that passed the filter: %d' % len(result))"
    156    ]
    157   },
    158   {
    159    "cell_type": "markdown",
    160    "metadata": {},
    161    "source": [
    162     "Note that applying masks in layers as we did above can be thought of as an \"asset funnel\".\n",
    163     "\n",
    164     "In the next lesson, we'll look at classifiers."
    165    ]
    166   }
    167  ],
    168  "metadata": {
    169   "kernelspec": {
    170    "display_name": "Python 2",
    171    "language": "python",
    172    "name": "python2"
    173   },
    174   "language_info": {
    175    "codemirror_mode": {
    176     "name": "ipython",
    177     "version": 2
    178    },
    179    "file_extension": ".py",
    180    "mimetype": "text/x-python",
    181    "name": "python",
    182    "nbconvert_exporter": "python",
    183    "pygments_lexer": "ipython2",
    184    "version": "2.7.11"
    185   }
    186  },
    187  "nbformat": 4,
    188  "nbformat_minor": 0
    189 }
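
The "asset funnel" described in the notebook can be mimicked outside the Quantopian research environment with plain pandas. The sketch below is only an illustration of the masking semantics, not how the pipeline engine is implemented: the tickers and numbers are made up, the `top` helper is a hypothetical stand-in for the `Factor.top(n, mask=...)` method, and the final stage uses a top-1 ranking in place of `percentile_between` to keep the example small.

```python
import pandas as pd

# Hypothetical one-day snapshot; tickers and values are invented for illustration.
snapshot = pd.DataFrame(
    {
        "dollar_volume": [5e6, 2e7, 8e7, 1.5e7, 3e6, 9e7],
        "open": [10.0, 250.0, 40.0, 120.0, 300.0, 75.0],
        "close": [10.5, 248.0, 41.0, 125.0, 310.0, 74.0],
    },
    index=["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
)


def top(values, n, mask):
    """Return a boolean Series that is True for the n largest values
    among the rows where mask is True (hypothetical helper)."""
    winners = values[mask].nlargest(n).index
    return pd.Series(values.index.isin(winners), index=values.index)


# Stage 1: liquidity filter, using the notebook's $10,000,000 threshold.
high_dollar_volume = snapshot["dollar_volume"] > 10000000

# Stage 2: top 2 by open price, ranked only among the liquid names.
top_open_price = top(snapshot["open"], 2, mask=high_dollar_volume)

# Stage 3: highest close price among the survivors of stage 2.
high_close_price = top(snapshot["close"], 1, mask=top_open_price)

# Each layer can only narrow the set passed on by the previous one.
funnel_sizes = [
    int(m.sum()) for m in (high_dollar_volume, top_open_price, high_close_price)
]
survivors = snapshot.index[high_close_price.values].tolist()

print(funnel_sizes)  # [4, 2, 1]
print(survivors)     # ['BBB']
```

Note that `EEE` has the highest open price in the snapshot but never reaches stage 2, because the liquidity mask removes it before the ranking is computed. That is the difference between masking a ranking and intersecting two independently computed filters.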