ml-finance-python

Python scripts for finance machine learning

git clone https://9o.is/git/ml-finance-python.git

notebook.ipynb

(17989B)


{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# Exercises: Introduction to pandas\n",
    "By Christopher van Hoecke, Maxwell Margenot\n",
    "\n",
    "## Lecture Link:\n",
    "https://www.quantopian.com/lectures/introduction-to-pandas\n",
    "\n",
    "### IMPORTANT NOTE:\n",
    "This homework corresponds to the Introduction to pandas lecture, which is part of the Quantopian Lecture Series, and expects you to rely heavily on the code presented there. Please copy and paste regularly from that lecture when starting to work on the problems, as solving them from scratch will likely be too difficult.\n",
    "\n",
    "Part of the Quantopian Lecture Series:\n",
    "\n",
    "* [www.quantopian.com/lectures](https://www.quantopian.com/lectures)\n",
    "* [github.com/quantopian/research_public](https://github.com/quantopian/research_public)\n",
    "\n",
    "\n",
    "\n",
    "----"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Useful libraries\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# Exercise 1\n",
    "## a. Series\n",
    "Given an array of data, create a pandas Series `s` with a datetime index starting on `2016-01-01`. The index should have daily frequency and the same length as the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "l = np.random.randint(1, 100, size=1000)\n",
    "s = pd.Series(l)\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## b. Accessing Series Elements\n",
    "- Print every other element of the first 50 elements of series `s`.\n",
    "- Find the value associated with the index `2017-02-20`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "## Your code goes here\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## c. Boolean Indexing\n",
    "In the series `s`, print all the values between 1 and 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# Exercise 2: Indexing and Time Series\n",
    "## a. Display\n",
    "Print the first and last 5 elements of the series `s`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "## Your code goes here\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## b. Resampling\n",
    "- Using the `resample` method, downsample the daily data to monthly frequency. Use the `median` method so that each monthly value is the median price of all the days in that month.\n",
    "- Take the daily data and fill in every calendar day, including weekends and holidays, using forward fill."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "symbol = \"CMG\"\n",
    "start = \"2012-01-01\"\n",
    "end = \"2016-01-01\"\n",
    "prices = get_pricing(symbol, start_date=start, end_date=end, fields=\"price\")\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# Exercise 3: Missing Data\n",
    "- Replace all instances of `NaN` using the forward fill method.\n",
    "- Instead of filling, remove all instances of `NaN` from the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# Exercise 4: Time Series Analysis with pandas\n",
    "## a. General Information\n",
    "Print the count, mean, standard deviation, minimum, 25th, 50th, and 75th percentiles, and the maximum of our series `s`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "print(\"Summary Statistics\")\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## b. Series Operations\n",
    "- Get the additive and multiplicative returns of the `data` series below.\n",
    "- Calculate the rolling mean with a 60-day window.\n",
    "- Calculate the rolling standard deviation with a 60-day window."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "data = get_pricing('GE', fields='open_price', start_date='2016-01-01', end_date='2017-01-01')\n",
    "\n",
    "## Your code goes here\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Rolling mean\n",
    "\n",
    "## Your code goes here\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Rolling standard deviation\n",
    "\n",
    "## Your code goes here\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# Exercise 5: DataFrames\n",
    "## a. Indexing\n",
    "Form a DataFrame out of `dict_data` with `l` as its index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# A list (rather than a set) keeps the index labels in a fixed order\n",
    "l = ['fifth', 'fourth', 'third', 'second', 'first']\n",
    "dict_data = {'a': [1, 2, 3, 4, 5], 'b': ['L', 'K', 'J', 'M', 'Z'], 'c': np.random.normal(0, 1, 5)}\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## b. DataFrame Manipulation\n",
    "- Concatenate the following two series to form a DataFrame.\n",
    "- Rename the columns to `Good Numbers` and `Bad Numbers`.\n",
    "- Change the index to be a datetime index starting on `2016-01-01`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "s1 = pd.Series([2, 3, 5, 7, 11, 13], name='prime')\n",
    "s2 = pd.Series([1, 4, 6, 8, 9, 10], name='other')\n",
    "\n",
    "## Your code goes here\n",
    "## Your code goes here\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# Exercise 6: Accessing DataFrame Elements\n",
    "## a. Columns\n",
    "- Check the data type of one of the DataFrame's columns.\n",
    "- Print the values associated with the time range `2013-01-01` to `2013-01-10`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "symbol = [\"XOM\", \"BP\", \"COP\", \"TOT\"]\n",
    "start = \"2012-01-01\"\n",
    "end = \"2016-01-01\"\n",
    "prices = get_pricing(symbol, start_date=start, end_date=end, fields=\"price\")\n",
    "if isinstance(symbol, list):\n",
    "    # Replace the asset column labels with their ticker symbols\n",
    "    prices.columns = [x.symbol for x in prices.columns]\n",
    "else:\n",
    "    prices.name = symbol\n",
    "\n",
    "# Check the type of data returned by these two equivalent column accessors.\n",
    "prices.XOM.head()\n",
    "prices.loc[:, 'XOM'].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "## Your code goes here\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# Exercise 7: Boolean Indexing\n",
    "## a. Filtering\n",
    "- Filter the pricing data from the last question (stored in `prices`) to print only the values where:\n",
    "    - BP > 30\n",
    "    - XOM < 100\n",
    "    - The intersection of both above conditions (BP > 30 **and** XOM < 100)\n",
    "    - The union of the previous composite condition with TOT having no `NaN` values ((BP > 30 **and** XOM < 100) **or** TOT is non-`NaN`).\n",
    "- Add a column for TSLA and drop the column for XOM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Filter the data for prices to only print out values where\n",
    "# BP > 30\n",
    "\n",
    "# XOM < 100\n",
    "\n",
    "# BP > 30 AND XOM < 100\n",
    "\n",
    "# The union of (BP > 30 AND XOM < 100) with TOT being non-NaN\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Add a column for TSLA and drop the column for XOM\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## b. DataFrame Manipulation (again)\n",
    "- Concatenate these DataFrames.\n",
    "- Fill the missing data with 0s."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Concatenate these DataFrames\n",
    "df_1 = get_pricing(['SPY', 'VXX'], start_date=start, end_date=end, fields='price')\n",
    "df_2 = get_pricing(['MSFT', 'AAPL', 'GOOG'], start_date=start, end_date=end, fields='price')\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Fill GOOG missing data with 0\n",
    "\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# Exercise 8: Time Series Analysis\n",
    "## a. Summary\n",
    "- Print out a summary of the `prices` DataFrame from above.\n",
    "- Take the log returns and print the first 10 values.\n",
    "- Print the multiplicative returns of each company.\n",
    "- Normalize and plot the returns from 2014 to 2015.\n",
    "- Plot a 60-day window rolling mean of the prices.\n",
    "- Plot a 60-day window rolling standard deviation of the prices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Print a summary of the 'prices' time series.\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Print the first 10 values of the natural log returns\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Print the multiplicative returns\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Normalize the returns and plot\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Rolling mean\n",
    "## Your code goes here\n",
    "\n",
    "# Rolling standard deviation\n",
    "## Your code goes here\n",
    "\n",
    "# Plotting\n",
    "## Your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Congratulations on completing the Introduction to pandas exercises!\n",
    "\n",
    "As you learn more about writing trading algorithms and the Quantopian platform, be sure to check out the daily [Quantopian Contest](https://www.quantopian.com/contest), in which you can compete for a cash prize every day.\n",
    "\n",
    "Start by going through the [Writing a Contest Algorithm](https://www.quantopian.com/tutorials/contest) tutorial."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "*This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. (\"Quantopian\"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, Quantopian, Inc. has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian, Inc. at the time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.*"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
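One possible approach to Exercise 1 of the notebook above, sketched under the assumption of Python 3 and pandas 1.x or later. A seeded NumPy generator stands in for the notebook's `np.random.randint` call so the output is repeatable; `pd.date_range` builds the daily index, `iloc` slices by position, `loc` looks up a date label, and `Series.between` handles the 1-to-3 filter.

```python
import numpy as np
import pandas as pd

# Seeded stand-in for np.random.randint(1, 100, size=1000);
# integers() also excludes the upper bound, so values fall in 1..99.
rng = np.random.default_rng(0)
data = rng.integers(1, 100, size=1000)

# a. Daily DatetimeIndex that starts on 2016-01-01 and matches the data length.
index = pd.date_range('2016-01-01', periods=len(data), freq='D')
s = pd.Series(data, index=index)

# b. Every other element of the first 50 (positions 0, 2, ..., 48).
every_other = s.iloc[:50:2]
print(every_other)

# b. Label-based lookup of a single date.
value = s.loc[pd.Timestamp('2017-02-20')]
print(value)

# c. Values between 1 and 3; between() includes both endpoints by default.
small = s[s.between(1, 3)]
print(small)
```

Whether "between 1 and 3" should include the endpoints is not stated in the exercise; `s.between(1, 3, inclusive='neither')` (pandas 1.3 or later) excludes them.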
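For Exercise 2b of the notebook above: `get_pricing` only exists in Quantopian's research environment, so this sketch substitutes a hypothetical seeded random walk with one price per business day. Going from daily to monthly is a downsample, and going from trading days to every calendar day is an upsample that needs a fill rule. The `'MS'` (month start) alias is a choice made here; it labels each monthly bin by its first day.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for get_pricing('CMG', ...): a seeded random walk
# with one price per business day (weekends absent, holidays not modeled).
rng = np.random.default_rng(1)
days = pd.bdate_range('2012-01-02', '2015-12-31')
prices = pd.Series(100 + rng.normal(0, 1, len(days)).cumsum(), index=days)

# Downsample: one value per calendar month, the median of that month's prices.
monthly_median = prices.resample('MS').median()
print(monthly_median.head())

# Upsample: insert every calendar day and carry the last known price forward,
# so a Saturday or Sunday repeats the preceding Friday's price.
calendar_daily = prices.resample('D').ffill()
print(calendar_daily.head(10))
```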
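Exercise 3 of the notebook above does not name a dataset, so this sketch contrasts its two options on a small hypothetical series with gaps. Forward fill copies the most recent non-missing value into each `NaN`; `dropna` removes those rows instead, which shortens the series.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series containing missing values.
idx = pd.date_range('2016-01-01', periods=6, freq='D')
x = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 6.0], index=idx)

# Option 1: forward fill each NaN with the last valid value before it.
filled = x.ffill()
print(filled.tolist())   # [1.0, 1.0, 1.0, 4.0, 4.0, 6.0]

# Option 2: drop the NaN entries instead of filling them.
dropped = x.dropna()
print(dropped.tolist())  # [1.0, 4.0, 6.0]
```

A `NaN` at the very start of a series has nothing before it to copy, so it survives `ffill()`.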
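A sketch for Exercise 4 of the notebook above, using a hypothetical random walk in place of the GE open prices. It assumes additive returns mean day-over-day price differences (`diff`) and multiplicative returns mean fractional changes (`pct_change`). Note that `rolling(60)` counts observations, so on trading-day data the window spans 60 trading days rather than 60 calendar days.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the GE open prices: a seeded random walk on 252 business days.
rng = np.random.default_rng(2)
days = pd.bdate_range('2016-01-04', periods=252)
data = pd.Series(30 + rng.normal(0, 0.3, len(days)).cumsum(), index=days)

# a. describe() reports count, mean, std, min, 25%, 50%, 75% and max
#    for any numeric Series (the exercise applies it to s).
print(data.describe())

# b. Additive returns: day-over-day price differences.
add_returns = data.diff().iloc[1:]
# b. Multiplicative returns: fractional changes, P_t / P_(t-1) - 1.
mult_returns = data.pct_change().iloc[1:]

# b. 60-observation rolling statistics; the first 59 entries are NaN
#    because the window is not yet full.
rolling_mean = data.rolling(window=60).mean()
rolling_std = data.rolling(window=60).std()
print(rolling_mean.dropna().head())
print(rolling_std.dropna().head())
```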
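For Exercise 5 of the notebook above, a minimal sketch assuming Python 3.7 or later (so dictionary keys keep their insertion order). The index labels are given as a list, because a set has no defined order. `pd.concat(..., axis=1)` places the two Series side by side as columns, after which the column names and index can be reassigned directly.

```python
import numpy as np
import pandas as pd

# a. DataFrame from a dict of columns, with an explicit ordered index.
l = ['fifth', 'fourth', 'third', 'second', 'first']
dict_data = {'a': [1, 2, 3, 4, 5],
             'b': ['L', 'K', 'J', 'M', 'Z'],
             'c': np.random.normal(0, 1, 5)}
df = pd.DataFrame(dict_data, index=l)
print(df)

# b. Concatenate the two Series as columns (axis=1).
s1 = pd.Series([2, 3, 5, 7, 11, 13], name='prime')
s2 = pd.Series([1, 4, 6, 8, 9, 10], name='other')
numbers = pd.concat([s1, s2], axis=1)

# b. Rename the columns, then replace the 0..5 integer index with daily dates.
numbers.columns = ['Good Numbers', 'Bad Numbers']
numbers.index = pd.date_range('2016-01-01', periods=len(numbers), freq='D')
print(numbers)
```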
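For Exercise 6 of the notebook above, a sketch on a hypothetical two-column price table (the real one comes from `get_pricing`). Selecting one column returns a Series whose `dtype` describes its values, and label slicing on a sorted `DatetimeIndex` includes both endpoint dates.

```python
import numpy as np
import pandas as pd

# Hypothetical price table standing in for the XOM/BP/COP/TOT data.
idx = pd.bdate_range('2012-12-24', '2013-01-18')
prices = pd.DataFrame(np.arange(len(idx) * 2, dtype=float).reshape(-1, 2),
                      index=idx, columns=['XOM', 'BP'])

# A single column comes back as a Series; .dtype gives the type of its values.
print(type(prices.XOM))
print(prices.XOM.dtype)  # float64

# Rows from 2013-01-01 through 2013-01-10, both ends included.
jan = prices.loc['2013-01-01':'2013-01-10']
print(jan)
```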
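For Exercise 7a of the notebook above, a sketch on a small hypothetical table (the TSLA values are made up). pandas combines boolean masks with `&` (and) and `|` (or) rather than Python's `and`/`or`, and each comparison needs its own parentheses because those operators bind more tightly than `>` and `<`.

```python
import numpy as np
import pandas as pd

# Hypothetical prices standing in for the XOM/BP/COP/TOT DataFrame.
idx = pd.date_range('2013-01-01', periods=5, freq='D')
prices = pd.DataFrame({'XOM': [90.0, 95.0, 101.0, 99.0, 102.0],
                       'BP':  [29.0, 31.0, 32.0, 28.0, 33.0],
                       'COP': [55.0, 56.0, 57.0, 58.0, 59.0],
                       'TOT': [np.nan, 50.0, np.nan, 52.0, np.nan]},
                      index=idx)

bp_high = prices[prices.BP > 30]
xom_low = prices[prices.XOM < 100]
both = prices[(prices.BP > 30) & (prices.XOM < 100)]
either = prices[((prices.BP > 30) & (prices.XOM < 100)) | prices.TOT.notnull()]
print(either)

# Add a column (made-up values) and drop XOM; drop() returns a new DataFrame.
prices['TSLA'] = [30.0, 31.0, 32.0, 33.0, 34.0]
prices = prices.drop(columns=['XOM'])
print(prices)
```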
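For Exercise 7b of the notebook above, a sketch with two hypothetical frames whose date ranges only partly overlap, mimicking a ticker with a shorter history. Concatenating along `axis=1` aligns rows on the date index, and dates missing from one frame appear as `NaN` in its columns until they are filled.

```python
import pandas as pd

# Hypothetical price tables; GOOG's history starts one day later.
df_1 = pd.DataFrame({'SPY': [200.0, 201.0, 202.0]},
                    index=pd.date_range('2014-01-01', periods=3, freq='D'))
df_2 = pd.DataFrame({'GOOG': [550.0, 552.0]},
                    index=pd.date_range('2014-01-02', periods=2, freq='D'))

# Side-by-side concatenation on the union of the two date indexes.
combined = pd.concat([df_1, df_2], axis=1)

# Replace GOOG's missing values with 0; combined.fillna(0) would fill every column.
combined['GOOG'] = combined['GOOG'].fillna(0)
print(combined)
```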
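For Exercise 8 of the notebook above, a sketch on a hypothetical two-company price table. Log returns are taken as ln(P_t / P_(t-1)). The exercise does not say how to normalize, so the z-score used here (subtract each column's mean, divide by its standard deviation) is an assumption, one common choice among several.

```python
import numpy as np
import pandas as pd

# Hypothetical two-company price table standing in for `prices` (seeded, business days).
rng = np.random.default_rng(3)
idx = pd.bdate_range('2013-01-01', '2016-12-30')
prices = pd.DataFrame(
    {'XOM': 90 * np.exp(rng.normal(0, 0.01, len(idx)).cumsum()),
     'BP': 40 * np.exp(rng.normal(0, 0.01, len(idx)).cumsum())},
    index=idx)

# Natural log returns; the first row has no previous price, so it is dropped.
log_returns = np.log(prices / prices.shift(1)).dropna()
print(log_returns.head(10))

# Multiplicative returns of each company.
mult_returns = prices.pct_change().dropna()

# z-score each column, then keep 2014 through 2015 for plotting.
norm_returns = (mult_returns - mult_returns.mean()) / mult_returns.std()
window = norm_returns.loc['2014':'2015']
```

In the notebook, `window.plot()` draws the normalized returns with matplotlib, and `prices.rolling(60).mean().plot()` and `prices.rolling(60).std().plot()` apply the 60-day rolling statistics column by column.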