ml-finance-python

python scripts for finance machine learning

git clone https://9o.is/git/ml-finance-python.git

notebook.ipynb

(9901B)


{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Exercises: Comparing ETFs\n",
    "By Christopher van Hoecke, Maxwell Margenot, and Delaney Mackenzie\n",
    "\n",
    "\n",
    "## Lecture Links:\n",
    "https://www.quantopian.com/lectures/statistical-moments\n",
    "\n",
    "https://www.quantopian.com/lectures/hypothesis-testing\n",
    "\n",
    "### IMPORTANT NOTE:\n",
    "These exercises correspond to the statistical moments and hypothesis testing lectures, which are part of the Quantopian lecture series. They expect you to rely heavily on the code presented in those lectures. Copy and paste from the lectures freely when starting to work on the problems, as trying to do them from scratch will likely be too difficult.\n",
    "\n",
    "When you feel comfortable with the topics presented here, see if you can create an algorithm that qualifies for the Quantopian Contest. Participants are evaluated on their ability to produce risk-constrained alpha, and the top 10 contest participants are awarded cash prizes on a daily basis.\n",
    "\n",
    "https://www.quantopian.com/contest\n",
    "\n",
    "Part of the Quantopian Lecture Series:\n",
    "\n",
    "* [www.quantopian.com/lectures](https://www.quantopian.com/lectures)\n",
    "* [github.com/quantopian/research_public](https://github.com/quantopian/research_public)\n",
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Key Concepts\n",
    "t-statistic formula for unequal variances: $t = \\frac{\\bar{X}_1 - \\bar{X}_2}{\\sqrt{\\frac{s_1^2}{n_1} + \\frac{s_2^2}{n_2}}}$\n",
    "\n",
    "where $\\bar{X}_1$ and $\\bar{X}_2$ are the sample means, $s_1$ and $s_2$ are the sample standard deviations of set 1 and set 2, and $n_1$ and $n_2$ are the numbers of observations in each set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Useful Libraries\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from scipy import stats\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Useful functions\n",
    "def normal_test(X):\n",
    "    # D'Agostino-Pearson test; the null hypothesis is that X is normally distributed\n",
    "    z, pval = stats.normaltest(X)\n",
    "    if pval < 0.05:\n",
    "        print('Values are not normally distributed.')\n",
    "    else:\n",
    "        print('Values are normally distributed.')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Get pricing data for an energy (XLE) and industrial (XLI) ETF\n",
    "# (get_pricing is provided by the Quantopian research environment)\n",
    "xle = get_pricing('XLE', fields='price', start_date='2016-01-01', end_date='2017-01-01')\n",
    "xli = get_pricing('XLI', fields='price', start_date='2016-01-01', end_date='2017-01-01')\n",
    "\n",
    "# Compute returns, dropping the leading NaN produced by pct_change\n",
    "xle_returns = xle.pct_change()[1:]\n",
    "xli_returns = xli.pct_change()[1:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 1: Hypothesis Testing on Variance\n",
    "- Plot the histogram of the returns of XLE and XLI\n",
    "- Check to see if each return stream is normally distributed\n",
    "- If the returns are normally distributed, use the F-test to perform a hypothesis test and decide whether the two assets have the same variance.\n",
    "- If the returns are **not** normally distributed, use the Levene test (in the scipy library) to perform a hypothesis test on variance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Histograms of XLE and XLI returns\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Checking for normality using function above.\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Use the Levene test or the F-test to test the hypothesis of equal variances.\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 2: Hypothesis Testing on Mean\n",
    "\n",
    "Since we know that the variances are not equal, we must use Welch's t-test.\n",
    "- Calculate the mean returns of XLE and XLI.\n",
    "    - Find the difference between the two means.\n",
    "- Calculate the standard deviation of the returns of XLE and XLI\n",
    "- Using the formula given above, calculate the t-test statistic (using $\\alpha = 0.05$) for Welch's t-test to test whether the mean returns of XLE and XLI are different.\n",
    "- Consult the [Hypothesis Testing Lecture](https://www.quantopian.com/lectures#Hypothesis-Testing) to calculate the p-value for this test. Are the mean returns of XLE and XLI the same?\n",
    "\n",
    "\n",
    "- Now use the t-test function for two independent samples from the scipy library. Compare the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Manually calculating the t-statistic\n",
    "# Note that the test also requires information about the degrees of freedom\n",
    "# We will not compute that here\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Alternative: use the scipy library.\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Exercise 3: Skewness\n",
    "- Calculate the mean and median of the two assets\n",
    "- Calculate the skewness using the scipy library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Calculate the mean and median of xle and xli using the numpy library\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Print values of skewness for xle and xli returns\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 4: Kurtosis\n",
    "- Check the kurtosis of the two assets, using the scipy library.\n",
    "- Using the seaborn library, plot the distribution of XLE and XLI returns.\n",
    "\n",
    "Recall:\n",
    "- Kurtosis > 3 is leptokurtic, a highly peaked, narrow deviation from the mean\n",
    "- Kurtosis = 3 is mesokurtic. The most significant mesokurtic distribution is the normal distribution family.\n",
    "- Kurtosis < 3 is platykurtic, a lower-peaked, broad deviation from the mean\n",
    "\n",
    "Note: `scipy.stats.kurtosis` returns excess kurtosis (kurtosis minus 3) by default, so compare its output against 0, or pass `fisher=False` to compare against 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Print value of kurtosis for xle and xli returns\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Distribution plot of XLE returns in red (for Kurtosis of 1.6).\n",
    "# Distribution plot of XLI returns in blue (for Kurtosis of 2.0).\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. (\"Quantopian\"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company.  In preparing the information contained herein, Quantopian, Inc. has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian, Inc. at the time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.*"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
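
The t-statistic formula under Key Concepts in notebook.ipynb can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the lecture's code: the helper names (`sample_mean`, `sample_variance`, `welch_t`) are hypothetical, the two series are synthetic Gaussian draws standing in for the XLE and XLI returns (the notebook's `get_pricing` call only exists in the Quantopian research environment), and the degrees of freedom use the Welch-Satterthwaite approximation, which the exercise itself does not ask for.

```python
import math
import random


def sample_mean(xs):
    """Arithmetic mean of a sequence of numbers."""
    return sum(xs) / float(len(xs))


def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / float(len(xs) - 1)


def welch_t(xs, ys):
    """Return (t, df) for Welch's unequal-variance t-test.

    t follows the notebook's formula: the difference of sample means
    divided by sqrt(s1^2 / n1 + s2^2 / n2).
    """
    n1, n2 = len(xs), len(ys)
    v1 = sample_variance(xs) / n1  # s1^2 / n1
    v2 = sample_variance(ys) / n2  # s2^2 / n2
    t = (sample_mean(xs) - sample_mean(ys)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Synthetic stand-ins for two daily return series with unequal variances
# (assumed values, not real market data).
random.seed(0)
a = [random.gauss(0.0005, 0.015) for _ in range(250)]
b = [random.gauss(0.0005, 0.008) for _ in range(250)]

t_stat, dof = welch_t(a, b)
print("t = %.4f, df = %.1f" % (t_stat, dof))
```

With scipy available, `stats.ttest_ind(a, b, equal_var=False)` should report the same t-statistic along with a two-sided p-value, which is the comparison the last bullet of Exercise 2 asks for.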