ml-finance-python
python scripts for finance machine learning
git clone https://9o.is/git/ml-finance-python.git
notebook.ipynb
(11668B)
1 {
2 "cells": [
3 {
4 "cell_type": "markdown",
5 "metadata": {
6 "deletable": true,
7 "editable": true
8 },
9 "source": [
10 "# Exercises: Spearman Rank Correlation\n",
11 "\n",
12 "## Lecture Link\n",
13 "\n",
14 "This exercise notebook refers to this lecture. Please use the lecture for explanations and sample code.\n",
15 "\n",
16 "https://www.quantopian.com/lectures#Spearman-Rank-Correlation\n",
17 "\n",
18 "Part of the Quantopian Lecture Series:\n",
19 "\n",
20 "* [www.quantopian.com/lectures](https://www.quantopian.com/lectures)\n",
21 "* [github.com/quantopian/research_public](https://github.com/quantopian/research_public)"
22 ]
23 },
24 {
25 "cell_type": "code",
26 "execution_count": null,
27 "metadata": {
28 "collapsed": false,
29 "deletable": true,
30 "editable": true
31 },
32 "outputs": [],
33 "source": [
34 "import numpy as np\n",
35 "import pandas as pd\n",
36 "import scipy.stats as stats\n",
37 "import matplotlib.pyplot as plt\n",
38 "import math"
39 ]
40 },
41 {
42 "cell_type": "markdown",
43 "metadata": {
44 "deletable": true,
45 "editable": true
46 },
47 "source": [
48 "# Exercise 1: Finding Correlations of Non-Linear Relationships\n",
49 "\n",
50 "## a. Traditional (Pearson) Correlation\n",
51 "\n",
52 "Find the correlation coefficient for the relationship between `x` and `y`."
53 ]
54 },
55 {
56 "cell_type": "code",
57 "execution_count": null,
58 "metadata": {
59 "collapsed": false,
60 "deletable": true,
61 "editable": true
62 },
63 "outputs": [],
64 "source": [
65 "n = 100\n",
66 "x = np.linspace(1, n, n)\n",
67 "y = x**5\n",
68 "\n",
69 "#Your code goes here"
70 ]
71 },
72 {
73 "cell_type": "markdown",
74 "metadata": {
75 "deletable": true,
76 "editable": true
77 },
78 "source": [
79 "## b. Spearman Rank Correlation\n",
80 "\n",
81 "Find the Spearman rank correlation coefficient for the relationship between `x` and `y` using the `stats.rankdata` function and the formula \n",
82 "\n",
83 "$$r_S = 1 - \\frac{6 \\sum_{i=1}^n d_i^2}{n(n^2 - 1)}$$\n",
84 "\n",
85 "where $d_i$ is the difference in rank of the `i`th pair of `x` and `y` values."
86 ]
87 },
88 {
89 "cell_type": "code",
90 "execution_count": null,
91 "metadata": {
92 "collapsed": false,
93 "deletable": true,
94 "editable": true
95 },
96 "outputs": [],
97 "source": [
98 "#Your code goes here"
99 ]
100 },
101 {
102 "cell_type": "markdown",
103 "metadata": {
104 "deletable": true,
105 "editable": true
106 },
107 "source": [
108 "Check your results against SciPy's Spearman rank correlation function, `stats.spearmanr`."
109 ]
110 },
111 {
112 "cell_type": "code",
113 "execution_count": null,
114 "metadata": {
115 "collapsed": false,
116 "deletable": true,
117 "editable": true
118 },
119 "outputs": [],
120 "source": [
121 "# Your code goes here"
122 ]
123 },
124 {
125 "cell_type": "markdown",
126 "metadata": {
127 "deletable": true,
128 "editable": true
129 },
130 "source": [
131 "# Exercise 2: Limitations of Spearman Rank Correlation\n",
132 "\n",
133 "## a. Lagged Relationships\n",
134 "\n",
135 "First, create a series `b` that is identical to `a` but lagged one step (`b[i] = a[i-1]`). Then, find the Spearman rank correlation coefficient of the relationship between `a` and `b`.\n"
136 ]
137 },
138 {
139 "cell_type": "code",
140 "execution_count": null,
141 "metadata": {
142 "collapsed": false,
143 "deletable": true,
144 "editable": true
145 },
146 "outputs": [],
147 "source": [
148 "n = 100\n",
149 "a = np.random.normal(0, 1, n)\n",
150 "\n",
151 "#Your code goes here"
152 ]
153 },
154 {
155 "cell_type": "markdown",
156 "metadata": {
157 "deletable": true,
158 "editable": true
159 },
160 "source": [
161 "## b. Non-Monotonic Relationships\n",
162 "\n",
163 "First, create a series `d` using the relationship $d=10c^2 - c + 2$. Then, find the Spearman rank correlation coefficient of the relationship between `c` and `d`."
164 ]
165 },
166 {
167 "cell_type": "code",
168 "execution_count": null,
169 "metadata": {
170 "collapsed": false,
171 "deletable": true,
172 "editable": true
173 },
174 "outputs": [],
175 "source": [
176 "n = 100\n",
177 "c = np.random.normal(0, 2, n)\n",
178 "\n",
179 "#Your code goes here"
180 ]
181 },
182 {
183 "cell_type": "markdown",
184 "metadata": {
185 "deletable": true,
186 "editable": true
187 },
188 "source": [
189 "# Exercise 3: Real World Example\n",
190 "\n",
191 "## a. Factor and Forward Returns\n",
192 "\n",
193 "Here we'll define a simple momentum factor (model). To evaluate it we'd need to look at how its predictions correlate with future returns over many days. We'll start by just evaluating the Spearman rank correlation between our factor values and forward returns on just one day.\n",
194 "\n",
195 "Compute the Spearman rank correlation between the factor values and the 10-trading-day forward returns on 2015-1-2.\n",
196 "\n",
197 "For help on the pipeline API, see this tutorial: https://www.quantopian.com/tutorials/pipeline"
198 ]
199 },
200 {
201 "cell_type": "code",
202 "execution_count": null,
203 "metadata": {
204 "collapsed": false,
205 "deletable": true,
206 "editable": true
207 },
208 "outputs": [],
209 "source": [
210 "#Pipeline Setup\n",
211 "from quantopian.research import run_pipeline\n",
212 "from quantopian.pipeline import Pipeline\n",
213 "from quantopian.pipeline.data.builtin import USEquityPricing\n",
214 "from quantopian.pipeline.factors import CustomFactor, Returns, RollingLinearRegressionOfReturns\n",
215 "from quantopian.pipeline.classifiers.morningstar import Sector\n",
216 "from quantopian.pipeline.filters import QTradableStocksUS\n",
217 "from time import time\n",
218 "\n",
219 "#MyFactor is our custom factor, based off of asset price momentum\n",
220 "\n",
221 "class MyFactor(CustomFactor):\n",
222 "    \"\"\" Momentum factor \"\"\"\n",
223 "\n",
224 "    inputs = [USEquityPricing.close]\n",
225 "    window_length = 60\n",
226 "\n",
227 "    def compute(self, today, assets, out, close):\n",
228 "        out[:] = close[-1]/close[0]\n",
229 "\n",
230 "universe = QTradableStocksUS()\n",
231 "\n",
232 "pipe = Pipeline(\n",
233 "    columns = {\n",
234 "        'MyFactor' : MyFactor(mask=universe),\n",
235 "    },\n",
236 "    screen=universe\n",
237 ")\n",
238 "\n",
239 "start_timer = time()\n",
240 "results = run_pipeline(pipe, '2015-01-01', '2015-06-01')\n",
241 "end_timer = time()\n",
242 "results = results.fillna(value=0)\n",
243 "\n",
244 "print(\"Time to run pipeline %.2f secs\" % (end_timer - start_timer))\n",
245 "\n",
246 "my_factor = results['MyFactor']"
247 ]
248 },
249 {
250 "cell_type": "code",
251 "execution_count": null,
252 "metadata": {
253 "collapsed": false,
254 "deletable": true,
255 "editable": true
256 },
257 "outputs": [],
258 "source": [
259 "n = len(my_factor)\n",
260 "\n",
261 "asset_list = results.index.levels[1].unique()\n",
262 "prices_df = get_pricing(asset_list, start_date='2015-01-01', end_date='2016-01-01', fields='price')\n",
263 "\n",
264 "# Compute 10-day forward returns, then shift the dataframe back by 10\n",
265 "forward_returns_df = prices_df.pct_change(10).shift(-10)\n",
266 "\n",
267 "# The first trading day is actually 2015-1-2\n",
268 "single_day_factor_values = my_factor['2015-1-2']\n",
269 "\n",
270 "# Because prices are indexed over the total time period, while the factor values dataframe\n",
271 "# has a dynamic universe that excludes hard to trade stocks, each day there may be assets in \n",
272 "# the returns dataframe that are not present in the factor values dataframe. We have to filter down\n",
273 "# as a result.\n",
274 "single_day_forward_returns = forward_returns_df.loc['2015-1-2'][single_day_factor_values.index]\n",
275 "\n",
276 "#Your code goes here"
277 ]
278 },
279 {
280 "cell_type": "markdown",
281 "metadata": {
282 "deletable": true,
283 "editable": true
284 },
285 "source": [
286 "## b. Rolling Spearman Rank Correlation\n",
287 "\n",
288 "Repeat the above correlation for the first 60 days in the dataframe as opposed to just a single day. You should get a time series of Spearman rank correlations. From this we can start getting a better sense of how the factor correlates with forward returns.\n",
289 "\n",
290 "What we're driving towards is known as an information coefficient. This is a very common way of measuring how predictive a model is. All of this plus much more is automated in our open source alphalens library. In order to see alphalens in action you can check out these resources:\n",
291 "\n",
292 "A basic tutorial:\n",
293 "https://www.quantopian.com/tutorials/getting-started#lesson4\n",
294 "\n",
295 "An in-depth lecture:\n",
296 "https://www.quantopian.com/lectures/factor-analysis"
297 ]
298 },
299 {
300 "cell_type": "code",
301 "execution_count": null,
302 "metadata": {
303 "collapsed": false,
304 "deletable": true,
305 "editable": true
306 },
307 "outputs": [],
308 "source": [
309 "rolling_corr = pd.Series(index=None, data=None)\n",
310 "\n",
311 "#Your code goes here"
312 ]
313 },
314 {
315 "cell_type": "markdown",
316 "metadata": {
317 "deletable": true,
318 "editable": true
319 },
320 "source": [
321 "## c. Plotting the Rolling Spearman Rank Correlation\n",
322 "\n",
323 "Plot out the rolling correlation as a time series, and compute the mean and standard deviation."
324 ]
325 },
326 {
327 "cell_type": "code",
328 "execution_count": null,
329 "metadata": {
330 "collapsed": false,
331 "deletable": true,
332 "editable": true
333 },
334 "outputs": [],
335 "source": [
336 "# Your code goes here"
337 ]
338 },
339 {
340 "cell_type": "markdown",
341 "metadata": {
342 "deletable": true,
343 "editable": true
344 },
345 "source": [
346 "---\n",
347 "\n",
348 "Congratulations on completing the Spearman rank correlation exercises!\n",
349 "\n",
350 "As you learn more about writing trading models and the Quantopian platform, enter a daily [Quantopian Contest](https://www.quantopian.com/contest). Your strategy will be evaluated for a cash prize every day.\n",
351 "\n",
352 "Start by going through the [Writing a Contest Algorithm](https://www.quantopian.com/tutorials/contest) tutorial."
353 ]
354 },
355 {
356 "cell_type": "markdown",
357 "metadata": {
358 "deletable": true,
359 "editable": true
360 },
361 "source": [
362 "*This presentation is for informational purposes only and does not constitute an offer to sell, a\n",
363 "solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. (\"Quantopian\"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, Quantopian, Inc. has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian, Inc. at the time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.*"
364 ]
365 }
366 ],
367 "metadata": {
368 "kernelspec": {
369 "display_name": "Python 2",
370 "language": "python",
371 "name": "python2"
372 },
373 "language_info": {
374 "codemirror_mode": {
375 "name": "ipython",
376 "version": 2
377 },
378 "file_extension": ".py",
379 "mimetype": "text/x-python",
380 "name": "python",
381 "nbconvert_exporter": "python",
382 "pygments_lexer": "ipython2",
383 "version": "2.7.12"
384 }
385 },
386 "nbformat": 4,
387 "nbformat_minor": 0
388 }
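
The rank-difference formula quoted in Exercise 1b, $r_S = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, can be sketched in plain Python as below. This is only an illustration under stated assumptions, not the notebook's intended solution: the helper names `ranks` and `spearman_no_ties` and the toy data are hypothetical, the ranking is done by hand rather than with `stats.rankdata`, and the formula is applied only to data without tied values.

```python
def ranks(values):
    """Return 1-based ranks of `values` (hypothetical helper; assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_no_ties(x, y):
    """Apply r_S = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)) to untied data."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    # d_i is the difference between the ranks of the i-th x and y values
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))


# Toy data chosen for illustration; none of it comes from the notebook.
u = [1.0, 2.0, 3.0, 4.0, 5.0]
increasing = [v ** 3 for v in u]             # monotonic increasing in u
decreasing = [-v for v in u]                 # monotonic decreasing in u
non_monotonic = [(v - 2.6) ** 2 for v in u]  # falls, then rises

r_increasing = spearman_no_ties(u, increasing)
r_decreasing = spearman_no_ties(u, decreasing)
r_non_monotonic = spearman_no_ties(u, non_monotonic)
```

On this toy data, any strictly increasing relationship gives a coefficient of 1 and any strictly decreasing one gives -1, however non-linear it is, while the non-monotonic series lands in between (about 0.3 here). That is the limitation Exercise 2b explores.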