ml-finance-python

python scripts for finance machine learning
git clone https://9o.is/git/ml-finance-python.git
notebook.ipynb

      1 {
      2  "cells": [
      3   {
      4    "cell_type": "markdown",
      5    "metadata": {
      6     "deletable": true,
      7     "editable": true
      8    },
      9    "source": [
     10     "# Exercises: Hypothesis Testing\n",
     11     "By Christopher van Hoecke and Maxwell Margenot\n",
     12     "\n",
     13     "## Lecture Link\n",
     14     "\n",
     15     "https://www.quantopian.com/lectures/hypothesis-testing\n",
     16     "\n",
     17     "### IMPORTANT NOTE: \n",
     18     "These exercises correspond to the Hypothesis Testing lecture, which is part of the Quantopian lecture series. They expect you to rely heavily on the code presented in that lecture. Copy and paste from the lecture freely when starting on the problems, as doing them from scratch will likely be too difficult.\n",
     19     "\n",
     20     "When you feel comfortable with the topics presented here, see if you can create an algorithm that qualifies for the Quantopian Contest. Participants are evaluated on their ability to produce risk-constrained alpha and the top 10 contest participants are awarded cash prizes on a daily basis.\n",
     21     "\n",
     22     "https://www.quantopian.com/contest\n",
     23     "\n",
     24     "Part of the Quantopian Lecture Series:\n",
     25     "\n",
     26     "* [www.quantopian.com/lectures](https://www.quantopian.com/lectures)\n",
     27     "* [github.com/quantopian/research_public](https://github.com/quantopian/research_public)\n",
     28     "\n",
     29     "----"
     30    ]
     31   },
     32   {
     33    "cell_type": "code",
     34    "execution_count": null,
     35    "metadata": {
     36     "collapsed": true,
     37     "deletable": true,
     38     "editable": true
     39    },
     40    "outputs": [],
     41    "source": [
     42     "# Useful Libraries\n",
     43     "import pandas as pd\n",
     44     "import numpy as np\n",
     45     "import matplotlib.pyplot as plt\n",
     46     "from scipy.stats import t\n",
     47     "import scipy.stats"
     48    ]
     49   },
     50   {
     51    "cell_type": "markdown",
     52    "metadata": {
     53     "deletable": true,
     54     "editable": true
     55    },
     56    "source": [
     57     "# Exercise 1: Hypothesis Testing.\n",
     58     "## a. One tail test. \n",
     59     "\n",
     60     "Using the techniques laid out in lecture, verify if we can state that the returns of TSLA **are greater** than 0.\n",
     61     "- Start by stating the null and alternative hypothesis\n",
     62     "    - Are we dealing with a one or two tailed test? Why? \n",
     63     "- Calculate the mean difference and the test statistic using the formula provided in class. \n",
     64     "    - *Recall: This is a one parameter test, use the appropriate Z-test*\n",
     65     "- Use the `scipy.stats` library to calculate the p-value associated with your test statistic. \n",
     66     "    - Compare your p-value to the chosen $\\alpha$ value, and conclude. \n",
     67     "    \n",
     68     "\n",
     69     "###### Useful Formulas: \n",
     70     "$$ \\text{Test statistic} =  \\frac{\\bar{X}_\\mu - \\theta_0}{s_{\\bar{X}}} = \\frac{\\bar{X}_\\mu - 0}{s/\\sqrt{n}} $$  "
     71    ]
     72   },
     73   {
     74    "cell_type": "code",
     75    "execution_count": null,
     76    "metadata": {
     77     "collapsed": true,
     78     "deletable": true,
     79     "editable": true
     80    },
     81    "outputs": [],
     82    "source": [
     83     "prices1 = get_pricing('TSLA', start_date = '2015-01-01', end_date = '2016-01-01', fields = 'price')\n",
     84     "returns_sample_tsla = prices1.pct_change()[1:]\n",
     85     "\n",
     86     "print 'Tesla return sample mean', returns_sample_tsla.mean()\n",
     87     "print 'Tesla return sample standard deviation', returns_sample_tsla.std()\n",
     88     "print 'Tesla return sample size', len(returns_sample_tsla)"
     89    ]
     90   },
     91   {
     92    "cell_type": "markdown",
     93    "metadata": {
     94     "deletable": true,
     95     "editable": true
     96    },
     97    "source": [
     98     "Write your hypotheses here:"
     99    ]
    100   },
    101   {
    102    "cell_type": "code",
    103    "execution_count": null,
    104    "metadata": {
    105     "collapsed": true,
    106     "deletable": true,
    107     "editable": true
    108    },
    109    "outputs": [],
    110    "source": [
    111     "# Testing\n",
    112     "\n",
    113     "## Your code goes here\n",
    114     "\n",
    115     "## Sample mean difference: \n",
    116     "\n",
    117     "\n",
    118     "## Z- Statistic: \n",
    119     "\n",
    120     "print 't-statistic is:', test_stat\n",
    121     "\n",
    122     "## Finding the p-value for one tail test\n",
    123     "\n",
    124     "print 'p-value is: ', p_val"
    125    ]
    126   },
    127   {
    128    "cell_type": "markdown",
    129    "metadata": {
    130     "deletable": true,
    131     "editable": true
    132    },
    133    "source": [
    134     "## b. Two tailed test. \n",
    135     "\n",
    136     "Using the techniques laid out in lecture, verify if we can state that the returns of TSLA **are equal** to 0.\n",
    137     "- Start by stating the null and alternative hypothesis\n",
    138     "    - Are we dealing with a one or two tailed test? Why? \n",
    139     "- Calculate the mean difference and the test statistic using the formula provided in class. \n",
    140     "    - *Recall: This is a one parameter test, use the appropriate Z-test*\n",
    141     "- Use the `scipy.stats` library to calculate the p-value associated with your test statistic. \n",
    142     "    - Compare your p-value to the chosen $\\alpha$ value, and conclude. "
    143    ]
    144   },
    145   {
    146    "cell_type": "markdown",
    147    "metadata": {
    148     "deletable": true,
    149     "editable": true
    150    },
    151    "source": [
    152     "###### Hypotheses.\n",
    153     "<center>_Your answer goes here_</center>"
    154    ]
    155   },
    156   {
    157    "cell_type": "code",
    158    "execution_count": null,
    159    "metadata": {
    160     "collapsed": true,
    161     "deletable": true,
    162     "editable": true
    163    },
    164    "outputs": [],
    165    "source": [
    166     "## Your code goes here\n",
    167     "\n",
    168     "## Sample mean difference: \n",
    169     "\n",
    170     "\n",
    171     "## Z- Statistic: \n",
    172     "\n",
    173     "print 't-statistic is:', test_stat\n",
    174     "\n",
    175     "## Finding the p-value for two tail test\n",
    176     "\n",
    177     "print 'p-value is: ', p_val"
    178    ]
    179   },
    180   {
    181    "cell_type": "markdown",
    182    "metadata": {
    183     "deletable": true,
    184     "editable": true
    185    },
    186    "source": [
    187     "----"
    188    ]
    189   },
    190   {
    191    "cell_type": "markdown",
    192    "metadata": {
    193     "deletable": true,
    194     "editable": true
    195    },
    196    "source": [
    197     "# Exercise 2: \n",
    198     "## a. Critical Values. \n",
    199     "Find the critical values associated with $\\alpha = 1\\%, 5\\%, 10\\%$ and graph the rejection regions on a plot for a two tailed test. \n",
    200     "\n",
    201     "Useful formulas: \n",
    202     "$$ f = 1 - \\frac{\\alpha}{2} $$ \n",
    203     "\n",
    204     "To find the z-value associated with each f value, use this [z-table](http://www.stat.ufl.edu/~athienit/Tables/Ztable.pdf).   \n",
    205     "*You can read more about how to read z-tables [here](http://www.dummies.com/education/math/statistics/how-to-find-probabilities-for-z-with-the-z-table/)*"
    206    ]
    207   },
    208   {
    209    "cell_type": "code",
    210    "execution_count": null,
    211    "metadata": {
    212     "collapsed": true,
    213     "deletable": true,
    214     "editable": true
    215    },
    216    "outputs": [],
    217    "source": [
    218     "## Your code goes here\n",
    219     "\n",
    220     "# For alpha = 10%\n",
    221     "alpha = 0.1\n",
    222     "f =\n",
    223     "print 'alpha = 10%: f = ', f\n",
    224     "\n",
    225     "# For alpha = 5%\n",
    226     "alpha = 0.05\n",
    227     "f = \n",
    228     "print 'alpha = 5%: f = ', f\n",
    229     "\n",
    230     "# For alpha = 1%\n",
    231     "alpha = 0.01\n",
    232     "f = \n",
    233     "print 'alpha = 1%: f = ', f"
    234    ]
    235   },
    236   {
    237    "cell_type": "code",
    238    "execution_count": null,
    239    "metadata": {
    240     "collapsed": true,
    241     "deletable": true,
    242     "editable": true
    243    },
    244    "outputs": [],
    245    "source": [
    246     "# Plot a standard normal distribution and mark the critical regions with shading\n",
    247     "x = np.linspace(-3, 3, 100)\n",
    248     "norm_pdf = lambda x: (1/np.sqrt(2 * np.pi)) * np.exp(-x * x / 2)\n",
    249     "y = norm_pdf(x)\n",
    250     "\n",
    251     "fig, ax = plt.subplots(1, 1, sharex=True)\n",
    252     "ax.plot(x, y)\n",
    253     "\n",
    254     "# Value for alpha = 10%\n",
    255     "ax.fill_between(x, 0, y, where =  x > ## Your code goes here\n",
    256     "                , label = 'alpha = 10%')\n",
    257     "ax.fill_between(x, 0, y, where = x < ) ## Your code goes here\n",
    258     "\n",
    259     "# Value for alpha = 5%\n",
    260     "ax.fill_between(x, 0, y, where = x > ## Your code goes here\n",
    261     "                , color = 'red', label = 'alpha = 5%')\n",
    262     "ax.fill_between(x, 0, y, where = x < ## Your code goes here\n",
    263     "                , color = 'red')\n",
    264     "\n",
    265     "# Value for alpha = 1%\n",
    266     "ax.fill_between(x, 0, y, where = x > ## Your code goes here\n",
    267     "                , facecolor='green', label = 'alpha = 1%')\n",
    268     "ax.fill_between(x, 0, y, where = x < ## Your code goes here\n",
    269     "                , facecolor='green')\n",
    270     "\n",
    271     "plt.title('Rejection regions for a two-tailed hypothesis test at 90%, 95%, 99% confidence')\n",
    272     "plt.xlabel('x')\n",
    273     "plt.ylabel('p(x)')\n",
    274     "plt.legend();"
    275    ]
    276   },
    277   {
    278    "cell_type": "markdown",
    279    "metadata": {
    280     "deletable": true,
    281     "editable": true
    282    },
    283    "source": [
    284     "## b. Mean T-Test\n",
    285     "Run a T-test on the SPY returns to determine whether the mean return is 0.01.   \n",
    286     "- Find the two critical values for a 90% two tailed $z$-test\n",
    287     "- Use the formula above to run a t-test on the sample data.\n",
    288     "- Conclude about the test results."
    289    ]
    290   },
    291   {
    292    "cell_type": "code",
    293    "execution_count": null,
    294    "metadata": {
    295     "collapsed": true,
    296     "deletable": true,
    297     "editable": true
    298    },
    299    "outputs": [],
    300    "source": [
    301     "# Calculating Critical Values probability\n",
    302     "\n",
    303     "alpha = 0.1\n",
    304     "f = ## Your code goes here\n",
    305     "print f"
    306    ]
    307   },
    308   {
    309    "cell_type": "code",
    310    "execution_count": null,
    311    "metadata": {
    312     "collapsed": true,
    313     "deletable": true,
    314     "editable": true
    315    },
    316    "outputs": [],
    317    "source": [
    318     "data = get_pricing('SPY', start_date = '2016-01-01', end_date = '2017-01-01', fields = 'price')\n",
    319     "returns_sample = data.pct_change()[1:]\n",
    320     "\n",
    321     "# Running the T-test.\n",
    322     "n = len(returns_sample)\n",
    323     "\n",
    324     "test_statistic = ## Your code goes here\n",
    325     "print 't test statistic: ', test_statistic"
    326    ]
    327   },
    328   {
    329    "cell_type": "markdown",
    330    "metadata": {
    331     "collapsed": true,
    332     "deletable": true,
    333     "editable": true
    334    },
    335    "source": [
    336     "## c. Mean p-value test\n",
    337     "Given the returns data above, use the p-value to determine the results of the previous hypothesis test. "
    338    ]
    339   },
    340   {
    341    "cell_type": "code",
    342    "execution_count": null,
    343    "metadata": {
    344     "collapsed": true,
    345     "deletable": true,
    346     "editable": true
    347    },
    348    "outputs": [],
    349    "source": [
    350     "# Running p-value test. \n",
    351     "\n",
    352     "alpha = 0.1\n",
    353     "p_val = ## Your code goes here\n",
    354     "print 'p-value is: ', p_val"
    355    ]
    356   },
    357   {
    358    "cell_type": "markdown",
    359    "metadata": {
    360     "deletable": true,
    361     "editable": true
    362    },
    363    "source": [
    364     "----"
    365    ]
    366   },
    367   {
    368    "cell_type": "markdown",
    369    "metadata": {
    370     "collapsed": true,
    371     "deletable": true,
    372     "editable": true
    373    },
    374    "source": [
    375     "# Exercise 3: Multiple Variables Tests.\n",
    376     "## a. Hypothesis testing on Means.\n",
    377     "- State the hypothesis tests for comparing two means\n",
    378     "- Find the test statistic along with the degrees of freedom for the following two assets. Assume the variances are different (we assume XLF to be a safer buy than MCD). \n",
    379     "- Use the [t-table](https://en.wikipedia.org/wiki/Student%27s_t-distribution#Table_of_selected_values) to conclude about your hypothesis test. *Pick $\\alpha = 10\\%$*\n",
    380     "\n",
    381     "###### Useful Formulas: \n",
    382     "$$ t = \\frac{\\bar{X}_1 - \\bar{X}_2}{(\\frac{s_p^2}{n_1} + \\frac{s_p^2}{n_2})^{1/2}}$$\n",
    383     "$$ t = \\frac{\\bar{X}_1 - \\bar{X}_2}{(\\frac{s_1^2}{n_1} + \\frac{s_2^2}{n_2})^{1/2}}$$\n",
    384     "$$df = \\frac{(\\frac{s_1^2}{n_1} + \\frac{s_2^2}{n_2})^2}{(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)}$$\n",
    385     "\n",
    386     "*Note: one formula for t assumes equal variances, the other does not. Use the right one given the information above.*\n",
    387     "\n",
    388     "\n",
    389     "-----------------"
    390    ]
    391   },
    392   {
    393    "cell_type": "markdown",
    394    "metadata": {
    395     "deletable": true,
    396     "editable": true
    397    },
    398    "source": [
    399     "Write your hypotheses here:"
    400    ]
    401   },
    402   {
    403    "cell_type": "code",
    404    "execution_count": null,
    405    "metadata": {
    406     "collapsed": true,
    407     "deletable": true,
    408     "editable": true
    409    },
    410    "outputs": [],
    411    "source": [
    412     "# Data Collection\n",
    413     "alpha = 0.1\n",
    414     "symbol_list = ['XLF', 'MCD']\n",
    415     "start = '2015-01-01'\n",
    416     "end = '2016-01-01'\n",
    417     "\n",
    418     "pricing_sample = get_pricing(symbol_list, start_date = start, end_date = end, fields='price')\n",
    419     "pricing_sample.columns = map(lambda x: x.symbol, pricing_sample.columns)\n",
    420     "returns_sample = pricing_sample.pct_change()[1:]\n",
    421     "\n",
    422     "\n",
    423     "# Sample statistics\n",
    424     "mu_xlf, mu_mcd = returns_sample.mean()\n",
    425     "s_xlf, s_mcd = returns_sample.std()\n",
    426     "n_xlf = len(returns_sample['XLF'])\n",
    427     "n_mcd = len(returns_sample['MCD'])\n",
    428     "\n",
    429     "test_statistic = ## Your code goes here\n",
    430     "df = ## Your code goes here\n",
    431     "\n",
    432     "print 't test statistic: ', test_statistic\n",
    433     "print 'Degrees of freedom (modified): ', df\n",
    434     "print 'p-value: ', ## Your code goes here"
    435    ]
    436   },
    437   {
    438    "cell_type": "markdown",
    439    "metadata": {
    440     "deletable": true,
    441     "editable": true
    442    },
    443    "source": [
    444     "## b. Hypothesis Testing on Variances. \n",
    445     "- State the hypothesis tests for comparing two variances. \n",
    446     "- Calculate the returns and compare their variances.\n",
    447     "- Calculate the F-test using the variances\n",
    448     "- Check that both values have the same degrees of freedom. "
    449    ]
    450   },
    451   {
    452    "cell_type": "markdown",
    453    "metadata": {
    454     "collapsed": true,
    455     "deletable": true,
    456     "editable": true
    457    },
    458    "source": [
    459     "Write your hypotheses here:"
    460    ]
    461   },
    462   {
    463    "cell_type": "code",
    464    "execution_count": null,
    465    "metadata": {
    466     "collapsed": true,
    467     "deletable": true,
    468     "editable": true
    469    },
    470    "outputs": [],
    471    "source": [
    472     "# Data\n",
    473     "symbol_list = ['XLF', 'MCD']\n",
    474     "start = \"2015-01-01\"\n",
    475     "end = \"2016-01-01\"\n",
    476     "pricing_sample = get_pricing(symbol_list, start_date = start, end_date = end, fields = 'price')\n",
    477     "pricing_sample.columns = map(lambda x: x.symbol, pricing_sample.columns)\n",
    478     "returns_sample = pricing_sample.pct_change()[1:]\n",
    479     "\n",
    480     "# Take returns from above, MCD and XLF, and compare their variances\n",
    481     "\n",
    482     "## Your code goes here\n",
    483     "\n",
    484     "print 'XLF standard deviation is: ', xlf_std_dev\n",
    485     "print 'MCD standard deviation is: ', mcd_std_dev\n",
    486     "\n",
    487     "# Calculate F-test with MCD.std > XLF.std\n",
    488     "\n",
    489     "## Your code goes here\n",
    490     "\n",
    491     "print \"F Test statistic: \", test_statistic\n",
    492     "\n",
    493     "# Degrees of freedom \n",
    494     "df1 = ## Your code goes here\n",
    495     "df2 = ## Your code goes here\n",
    496     "print df1\n",
    497     "print df2\n",
    498     "\n",
    499     "# Calculate critical values. \n",
    500     "from scipy.stats import f\n",
    501     "\n",
    502     "upper_crit_value = f.ppf(0.975, df1, df2)\n",
    503     "lower_crit_value = f.ppf(0.025, df1, df2)\n",
    504     "print 'Upper critical value at a = 0.05 with df1 = {0} and df2 = {1}: '.format(df1, df2), upper_crit_value\n",
    505     "print 'Lower critical value at a = 0.05 with df1 = {0} and df2 = {1}: '.format(df1, df2), lower_crit_value"
    506    ]
    507   },
    508   {
    509    "cell_type": "markdown",
    510    "metadata": {},
    511    "source": [
    512     "---\n",
    513     "\n",
    514     "Congratulations on completing the Hypothesis Testing exercises!\n",
    515     "\n",
    516     "As you learn more about writing trading models and the Quantopian platform, enter the daily [Quantopian Contest](https://www.quantopian.com/contest). Your strategy will be evaluated for a cash prize every day.\n",
    517     "\n",
    518     "Start by going through the [Writing a Contest Algorithm](https://www.quantopian.com/tutorials/contest) tutorial."
    519    ]
    520   },
    521   {
    522    "cell_type": "markdown",
    523    "metadata": {
    524     "collapsed": true,
    525     "deletable": true,
    526     "editable": true
    527    },
    528    "source": [
    529     "*This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. (\"Quantopian\"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company.  In preparing the information contained herein, Quantopian, Inc. has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian, Inc. at the time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.*"
    530    ]
    531   }
    532  ],
    533  "metadata": {
    534   "kernelspec": {
    535    "display_name": "Python 2",
    536    "language": "python",
    537    "name": "python2"
    538   },
    539   "language_info": {
    540    "codemirror_mode": {
    541     "name": "ipython",
    542     "version": 2
    543    },
    544    "file_extension": ".py",
    545    "mimetype": "text/x-python",
    546    "name": "python",
    547    "nbconvert_exporter": "python",
    548    "pygments_lexer": "ipython2",
    549    "version": "2.7.12"
    550   }
    551  },
    552  "nbformat": 4,
    553  "nbformat_minor": 2
    554 }
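
The formulas used in Exercises 1 to 3 can be sketched outside the Quantopian environment as follows. This is a minimal illustration and not the notebook's reference solution. It assumes Python 3 with NumPy and SciPy, and it uses synthetic normally distributed returns in place of `get_pricing` data. The helper names `one_sample_t_test`, `two_tailed_critical_value`, and `welch_t_test` are hypothetical and introduced only for this example.

```python
import numpy as np
from scipy import stats


def one_sample_t_test(returns, theta_0=0.0, tail="two"):
    """Return (test statistic, p-value) for H0: mean == theta_0."""
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    # Standard error of the mean: s / sqrt(n), using the sample (ddof=1) std.
    std_err = returns.std(ddof=1) / np.sqrt(n)
    test_stat = (returns.mean() - theta_0) / std_err
    if tail == "greater":
        # One-tailed test, H_A: mean > theta_0.
        p_val = stats.t.sf(test_stat, df=n - 1)
    else:
        # Two-tailed test, H_A: mean != theta_0.
        p_val = 2 * stats.t.sf(abs(test_stat), df=n - 1)
    return test_stat, p_val


def two_tailed_critical_value(alpha):
    """z-value whose cumulative probability is f = 1 - alpha / 2."""
    return stats.norm.ppf(1 - alpha / 2)


def welch_t_test(x1, x2):
    """Unequal-variance t statistic and its modified degrees of freedom."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    v1 = x1.var(ddof=1) / len(x1)  # s_1^2 / n_1
    v2 = x2.var(ddof=1) / len(x2)  # s_2^2 / n_2
    test_stat = (x1.mean() - x2.mean()) / np.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x1) - 1) + v2 ** 2 / (len(x2) - 1))
    return test_stat, df


# Synthetic daily returns (NOT market data), used only to exercise the functions.
rng = np.random.RandomState(0)
sample_a = rng.normal(0.001, 0.02, 252)
sample_b = rng.normal(0.0005, 0.01, 252)

t_stat, p_two = one_sample_t_test(sample_a)
_, p_one = one_sample_t_test(sample_a, tail="greater")
welch_stat, welch_df = welch_t_test(sample_a, sample_b)

print("t = %.4f, one-tailed p = %.4f, two-tailed p = %.4f" % (t_stat, p_one, p_two))
for alpha in (0.10, 0.05, 0.01):
    print("alpha = %.2f: critical z = %.4f" % (alpha, two_tailed_critical_value(alpha)))
print("Welch t = %.4f, df = %.2f" % (welch_stat, welch_df))
```

In each case, reject the null hypothesis when the p-value falls below the chosen alpha, or equivalently when the absolute test statistic exceeds the two-tailed critical value.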