notebook.ipynb
1 {
2 "cells": [
3 {
4 "cell_type": "markdown",
5 "metadata": {},
6 "source": [
7 "# Exercises: Confidence Intervals\n",
8 "By Christopher Fenaroli and Delaney Mackenzie\n",
9 "\n",
10 "## Lecture Link:\n",
11 "https://www.quantopian.com/lectures/confidence-intervals\n",
12 "\n",
13 "### IMPORTANT NOTE: \n",
14 "These exercises correspond to the Confidence Intervals lecture, which is part of the Quantopian lecture series. The problems rely heavily on the code presented in that lecture, so copy and adapt it freely when you start on them; working them out from scratch will likely be too difficult.\n",
15 "\n",
16 "When you feel comfortable with the topics presented here, see if you can create an algorithm that qualifies for the Quantopian Contest. Participants are evaluated on their ability to produce risk-constrained alpha and the top 10 contest participants are awarded cash prizes on a daily basis.\n",
17 "\n",
18 "https://www.quantopian.com/contest\n",
19 "\n",
20 "Part of the Quantopian Lecture Series:\n",
21 "\n",
22 "* [www.quantopian.com/lectures](https://www.quantopian.com/lectures)\n",
23 "* [github.com/quantopian/research_public](https://github.com/quantopian/research_public)\n",
24 "\n",
25 "----"
26 ]
27 },
28 {
29 "cell_type": "markdown",
30 "metadata": {},
31 "source": [
32 "## Key Concepts"
33 ]
34 },
35 {
36 "cell_type": "code",
37 "execution_count": null,
38 "metadata": {
39 "collapsed": true
40 },
41 "outputs": [],
42 "source": [
43 "def generate_autocorrelated_data(theta, mu, sigma, N):\n",
44 "    X = np.zeros((N, 1))\n",
45 "    for t in range(1, N):\n",
46 "        X[t] = theta * X[t-1] + np.random.normal(mu, sigma)\n",
47 "    return X\n",
48 "\n",
49 "def newey_west_SE(data):\n",
50 "    ind = range(0, len(data))\n",
51 "    ind = sm.add_constant(ind)\n",
52 "    model = regression.linear_model.OLS(data, ind).fit(cov_type='HAC', cov_kwds={'maxlags': 1})\n",
53 "    return model.bse[0]\n",
54 "\n",
55 "def newey_west_matrix(data):\n",
56 "    ind = range(0, len(data))\n",
57 "    ind = sm.add_constant(ind)\n",
58 "    model = regression.linear_model.OLS(data, ind).fit()\n",
59 "    return sw.cov_hac(model)"
60 ]
61 },
62 {
63 "cell_type": "code",
64 "execution_count": null,
65 "metadata": {
66 "collapsed": true
67 },
68 "outputs": [],
69 "source": [
70 "# Useful Libraries\n",
71 "import numpy as np\n",
72 "import seaborn as sns\n",
73 "from scipy import stats\n",
74 "import matplotlib.pyplot as plt\n",
75 "from statsmodels.stats.stattools import jarque_bera\n",
76 "import statsmodels.stats.sandwich_covariance as sw\n",
77 "from statsmodels import regression\n",
78 "import statsmodels.api as sm"
79 ]
80 },
81 {
82 "cell_type": "markdown",
83 "metadata": {},
84 "source": [
85 "#### Data"
86 ]
87 },
88 {
89 "cell_type": "code",
90 "execution_count": null,
91 "metadata": {
92 "collapsed": true
93 },
94 "outputs": [],
95 "source": [
96 "np.random.seed(11)\n",
97 "POPULATION_MU = 105\n",
98 "POPULATION_SIGMA = 20\n",
99 "sample_size = 50"
100 ]
101 },
102 {
103 "cell_type": "markdown",
104 "metadata": {},
105 "source": [
106 "# Exercise 1: Determining Confidence Intervals\n",
107 "\n",
108 "## a. Mean\n",
109 "\n",
110 "Determine the mean of the following artificial data in `sample`."
111 ]
112 },
113 {
114 "cell_type": "code",
115 "execution_count": null,
116 "metadata": {
117 "collapsed": true
118 },
119 "outputs": [],
120 "source": [
121 "sample = np.random.normal(POPULATION_MU, POPULATION_SIGMA, sample_size)\n",
122 "\n",
123 "#Your code goes here"
124 ]
125 },
126 {
127 "cell_type": "markdown",
128 "metadata": {},
129 "source": [
130 "## b. Standard Deviation\n",
131 "\n",
132 "Determine the standard deviation of the sample."
133 ]
134 },
135 {
136 "cell_type": "code",
137 "execution_count": null,
138 "metadata": {
139 "collapsed": true
140 },
141 "outputs": [],
142 "source": [
143 "#Your code goes here"
144 ]
145 },
146 {
147 "cell_type": "markdown",
148 "metadata": {},
149 "source": [
150 "## c. Standard Error\n",
151 "\n",
152 "Using the standard deviation and `sample_size`, determine the standard error for the sample."
153 ]
154 },
155 {
156 "cell_type": "code",
157 "execution_count": null,
158 "metadata": {
159 "collapsed": true
160 },
161 "outputs": [],
162 "source": [
163 "#Your code goes here"
164 ]
165 },
166 {
167 "cell_type": "markdown",
168 "metadata": {},
169 "source": [
170 "## d. Confidence Intervals\n",
171 "\n",
172 "Using the standard error and mean, determine 95% `(Z = 1.96)`, 90% `(Z = 1.64)`, and 80% `(Z = 1.28)` confidence intervals for the sample. "
173 ]
174 },
175 {
176 "cell_type": "code",
177 "execution_count": null,
178 "metadata": {
179 "collapsed": true
180 },
181 "outputs": [],
182 "source": [
183 "#Your code goes here"
184 ]
185 },
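Parts a through d chain together: the sample mean and standard deviation give the standard error, and the standard error scaled by a Z value gives each interval. The following is a minimal sketch of that chain, not a reference solution. It assumes the sample standard deviation is taken with `ddof=1` (the exercise does not say which convention to use), and the names `z_scores` and `intervals` are illustrative.

```python
import numpy as np

np.random.seed(11)
POPULATION_MU, POPULATION_SIGMA, sample_size = 105, 20, 50
sample = np.random.normal(POPULATION_MU, POPULATION_SIGMA, sample_size)

mean = np.mean(sample)
std = np.std(sample, ddof=1)        # assumption: unbiased (n - 1) estimator
se = std / np.sqrt(sample_size)     # standard error of the mean

# Z values as quoted in the exercise text, keyed by confidence level in percent
z_scores = {95: 1.96, 90: 1.64, 80: 1.28}
intervals = {level: (mean - z * se, mean + z * se)
             for level, z in z_scores.items()}

for level in sorted(intervals, reverse=True):
    low, high = intervals[level]
    print("%d%% CI: (%.2f, %.2f)" % (level, low, high))
```

A higher confidence level uses a larger Z value, so the 95% interval is the widest of the three and the 80% interval the narrowest.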
186 {
187 "cell_type": "markdown",
188 "metadata": {},
189 "source": [
190 "-----"
191 ]
192 },
193 {
194 "cell_type": "markdown",
195 "metadata": {},
196 "source": [
197 "# Exercise 2: Interpreting Confidence Intervals\n",
198 "\n",
199 "Assuming each interval is calculated correctly and the underlying data are independent, if we draw many samples and build a 95% confidence interval from each one, about 95% of those intervals will contain the true mean. Draw 1000 samples and measure how many of their confidence intervals actually contain the true mean."
200 ]
201 },
202 {
203 "cell_type": "code",
204 "execution_count": null,
205 "metadata": {
206 "collapsed": true
207 },
208 "outputs": [],
209 "source": [
210 "n = 1000\n",
211 "correct = 0\n",
212 "samples = [np.random.normal(loc=POPULATION_MU, scale=POPULATION_SIGMA, size=sample_size) for i in range(n)]\n",
213 "\n",
214 "#Your code goes here"
215 ]
216 },
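One way to check the coverage claim is to build an interval from every sample and count how often the known population mean falls inside it. The sketch below is one possible approach under the same assumptions as the notebook's data cell (normal draws, `Z = 1.96`); the `coverage` variable and the use of `ddof=1` are choices made here for illustration.

```python
import numpy as np

np.random.seed(11)
POPULATION_MU, POPULATION_SIGMA, sample_size = 105, 20, 50

n = 1000
correct = 0
for i in range(n):
    s = np.random.normal(loc=POPULATION_MU, scale=POPULATION_SIGMA, size=sample_size)
    se = np.std(s, ddof=1) / np.sqrt(sample_size)   # assumption: ddof=1
    low, high = np.mean(s) - 1.96 * se, np.mean(s) + 1.96 * se
    if low <= POPULATION_MU <= high:
        correct += 1

coverage = float(correct) / n   # float() keeps this correct under Python 2 as well
print("Intervals containing the true mean: %d of %d (%.1f%%)" % (correct, n, 100 * coverage))
```

The observed fraction should land near 95% but will vary from run to run, because each trial is itself a random draw.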
217 {
218 "cell_type": "markdown",
219 "metadata": {},
220 "source": [
221 "----"
222 ]
223 },
224 {
225 "cell_type": "markdown",
226 "metadata": {},
227 "source": [
228 "# Exercise 3: Central Limit Theorem\n",
229 "\n",
230 "## a. Plotting Sample Means - Normal\n",
231 "\n",
232 "Assuming our samples are independent and the underlying distribution has finite variance, the distribution of the sample means should be approximately normal for sufficiently large samples, regardless of the underlying distribution.\n",
233 "\n",
234 "Draw 500 samples of size `sample_size` from the same normal distribution used in Exercise 1, plot the means of each of the samples, and check whether the distribution of the sample means is normal using the `jarque_bera` function (see [here](https://www.quantopian.com/lectures/statistical-moments) for more information on the Jarque-Bera test)."
235 ]
236 },
237 {
238 "cell_type": "code",
239 "execution_count": null,
240 "metadata": {
241 "collapsed": true
242 },
243 "outputs": [],
244 "source": [
245 "n = 500\n",
246 "normal_samples = [np.mean(np.random.normal(loc=POPULATION_MU, scale=POPULATION_SIGMA, size=sample_size)) for i in range(n)]\n",
247 "\n",
248 "#Your code goes here"
249 ]
250 },
251 {
252 "cell_type": "markdown",
253 "metadata": {},
254 "source": [
255 "## b. Plotting Sample Means - Exponential\n",
256 "\n",
257 "Draw 500 samples of size `sample_size` from a new exponential distribution, plot the means of each of the samples, and check to see if the distribution of the sample means is normal."
258 ]
259 },
260 {
261 "cell_type": "code",
262 "execution_count": null,
263 "metadata": {
264 "collapsed": true
265 },
266 "outputs": [],
267 "source": [
268 "n = 500\n",
269 "expo_samples = [np.mean(np.random.exponential(POPULATION_MU, sample_size)) for i in range(n)]\n",
270 "\n",
271 "#Your code goes here"
272 ]
273 },
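To see what the normality check measures, it can help to compute the Jarque-Bera statistic by hand and compare raw exponential draws against means of exponential samples. The sketch below uses a hand-rolled helper, `jarque_bera_stat`, which is an illustrative stand-in and not the `jarque_bera` function from statsmodels that the notebook imports. It relies on the fact that the statistic is asymptotically chi-square with 2 degrees of freedom, whose survival function is `exp(-x / 2)`.

```python
import numpy as np

def jarque_bera_stat(x):
    """Hand-rolled Jarque-Bera statistic and asymptotic p-value (illustrative helper)."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    d = x - x.mean()
    var = np.mean(d ** 2)
    skew = np.mean(d ** 3) / var ** 1.5
    kurt = np.mean(d ** 4) / var ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = np.exp(-jb / 2.0)   # survival function of chi-square with 2 d.o.f.
    return jb, p_value

np.random.seed(11)
POPULATION_MU, sample_size, n = 105, 50, 500

# Raw exponential draws are strongly right-skewed
raw_draws = np.random.exponential(POPULATION_MU, n)
# Means of exponential samples are much closer to normal
expo_means = [np.mean(np.random.exponential(POPULATION_MU, sample_size)) for i in range(n)]

jb_raw, p_raw = jarque_bera_stat(raw_draws)
jb_means, p_means = jarque_bera_stat(expo_means)
print("raw draws:    JB = %.1f, p = %.4g" % (jb_raw, p_raw))
print("sample means: JB = %.1f, p = %.4g" % (jb_means, p_means))
```

The statistic for the sample means should be far smaller than for the raw draws. With samples of only 50 points some skew can remain, so the test may or may not reject normality for the means on a given run.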
274 {
275 "cell_type": "markdown",
276 "metadata": {},
277 "source": [
278 "## c.i Plotting Sample Means - Autocorrelated\n",
279 "\n",
280 "Draw 500 samples of size `sample_size` from a new autocorrelated (dependent) distribution, plot the means of each of the samples, and check to see if the distribution of the sample means is normal."
281 ]
282 },
283 {
284 "cell_type": "code",
285 "execution_count": null,
286 "metadata": {
287 "collapsed": true
288 },
289 "outputs": [],
290 "source": [
291 "n = 500\n",
292 "autocorrelated_samples = [(generate_autocorrelated_data(0.5, 0, 1, sample_size) + POPULATION_MU) for i in range(n)]\n",
293 "autocorrelated_means = [np.mean(autocorrelated_samples[i]) for i in range(n)]\n",
294 "\n",
295 "#Your code goes here"
296 ]
297 },
298 {
299 "cell_type": "markdown",
300 "metadata": {},
301 "source": [
302 "## c.ii Plotting Sample Standard Deviations - Autocorrelated\n",
303 "\n",
304 "Draw 500 samples of size `sample_size` from the same autocorrelated distribution, plot the standard deviations of each of the samples, and check to see if the distribution of the sample standard deviations is normal."
305 ]
306 },
307 {
308 "cell_type": "code",
309 "execution_count": null,
310 "metadata": {
311 "collapsed": true
312 },
313 "outputs": [],
314 "source": [
315 "n = 500\n",
316 "autocorrelated_samples = [(generate_autocorrelated_data(0.5, 0, 1, sample_size) + POPULATION_MU) for i in range(n)]\n",
317 "autocorrelated_stds = [np.std(autocorrelated_samples[i]) for i in range(n)]\n",
318 "\n",
319 "#Your code goes here"
320 ]
321 },
322 {
323 "cell_type": "markdown",
324 "metadata": {},
325 "source": [
326 "----"
327 ]
328 },
329 {
330 "cell_type": "markdown",
331 "metadata": {},
332 "source": [
333 "# Exercise 4: Small Sample Sizes\n",
334 "\n",
335 "## a. Error Due to Small Sample Size\n",
336 "\n",
337 "Run 100 samples of size `small_size` and measure how many of their 95% confidence intervals actually contain the true mean."
338 ]
339 },
340 {
341 "cell_type": "code",
342 "execution_count": null,
343 "metadata": {
344 "collapsed": true
345 },
346 "outputs": [],
347 "source": [
348 "n = 100\n",
349 "small_size = 3\n",
350 "correct = 0\n",
351 "samples = [np.random.normal(loc=POPULATION_MU, scale=POPULATION_SIGMA, size=small_size) for i in range(n)]\n",
352 " \n",
353 "#Your code goes here"
354 ]
355 },
356 {
357 "cell_type": "markdown",
358 "metadata": {},
359 "source": [
360 "## b. T-distribution Correction\n",
361 "\n",
362 "Run 100 samples of size `small_size`, this time accounting for the small sample size using a t-distribution, and measure how many of their 95% confidence intervals actually contain the true mean."
363 ]
364 },
365 {
366 "cell_type": "code",
367 "execution_count": null,
368 "metadata": {
369 "collapsed": true
370 },
371 "outputs": [],
372 "source": [
373 "n = 100\n",
374 "small_size = 5\n",
375 "correct = 0\n",
376 "samples = [np.random.normal(loc=POPULATION_MU, scale=POPULATION_SIGMA, size=small_size) for i in range(n)]\n",
377 "\n",
378 "#Your code goes here"
379 ]
380 },
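The t-distribution correction swaps the fixed `Z = 1.96` for a critical value that depends on the degrees of freedom, `small_size - 1`. The sketch below compares both critical values on the same samples. It is a hedged illustration: it uses 1000 trials instead of the 100 in the exercise so the coverage estimates are steadier, and the counters `z_correct` and `t_correct` are names introduced here.

```python
import numpy as np
from scipy import stats

np.random.seed(11)
POPULATION_MU, POPULATION_SIGMA = 105, 20
n = 1000          # assumption: more trials than the exercise's 100
small_size = 5

z_crit = 1.96
# Two-sided 95% critical value of the t-distribution with small_size - 1 d.o.f.
t_crit = stats.t.ppf(0.975, df=small_size - 1)

z_correct = 0
t_correct = 0
for i in range(n):
    s = np.random.normal(loc=POPULATION_MU, scale=POPULATION_SIGMA, size=small_size)
    err = abs(np.mean(s) - POPULATION_MU)
    se = np.std(s, ddof=1) / np.sqrt(small_size)
    if err <= z_crit * se:
        z_correct += 1
    if err <= t_crit * se:
        t_correct += 1

print("t critical value: %.3f" % t_crit)
print("Z-based coverage: %.1f%%" % (100.0 * z_correct / n))
print("t-based coverage: %.1f%%" % (100.0 * t_correct / n))
```

Because the t critical value is larger than 1.96 here, every interval that covers the mean under Z also covers it under t, so the t-based count can only be equal or higher.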
381 {
382 "cell_type": "markdown",
383 "metadata": {},
384 "source": [
385 "----"
386 ]
387 },
388 {
389 "cell_type": "markdown",
390 "metadata": {},
391 "source": [
392 "# Exercise 5: Dependence\n",
393 "\n",
394 "## a. Error due to Dependence\n",
395 "\n",
396 "Run 100 samples of the following autocorrelated distribution and measure how many of their 95% confidence intervals actually contain the true mean. (Use the helper function `generate_autocorrelated_data(theta, noise_mu, noise_sigma, sample_size)` to generate the samples.)"
397 ]
398 },
399 {
400 "cell_type": "code",
401 "execution_count": null,
402 "metadata": {
403 "collapsed": true
404 },
405 "outputs": [],
406 "source": [
407 "n = 100\n",
408 "correct = 0\n",
409 "theta = 0.5\n",
410 "noise_mu = 0\n",
411 "noise_sigma = 1\n",
412 "\n",
413 "#Your code goes here"
414 ]
415 },
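When observations are positively autocorrelated, the usual standard error understates how much the sample mean varies, so naive intervals are too narrow. The sketch below illustrates this with the notebook's helper, copied here so the block runs on its own. It assumes the series is shifted by `POPULATION_MU` (as the later cells do) so that the true mean is 105, and it uses 1000 trials instead of 100 for a steadier estimate.

```python
import numpy as np

def generate_autocorrelated_data(theta, mu, sigma, N):
    # Same helper as defined near the top of the notebook
    X = np.zeros((N, 1))
    for t in range(1, N):
        X[t] = theta * X[t-1] + np.random.normal(mu, sigma)
    return X

np.random.seed(11)
POPULATION_MU, sample_size = 105, 50
theta, noise_mu, noise_sigma = 0.5, 0, 1
n = 1000   # assumption: more trials than the exercise's 100

correct = 0
for i in range(n):
    X = generate_autocorrelated_data(theta, noise_mu, noise_sigma, sample_size) + POPULATION_MU
    se = np.std(X, ddof=1) / np.sqrt(sample_size)   # naive SE, ignores dependence
    if abs(np.mean(X) - POPULATION_MU) <= 1.96 * se:
        correct += 1

coverage = float(correct) / n
print("Naive 95%% intervals containing the true mean: %.1f%%" % (100 * coverage))
```

The observed coverage should fall noticeably short of 95%, which is the error the remaining parts of this exercise try to correct.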
416 {
417 "cell_type": "markdown",
418 "metadata": {},
419 "source": [
420 "## b. T-distribution Correction\n",
421 "\n",
422 "Run 100 samples from the autocorrelated distribution, this time attempting to account for the autocorrelation using a t-distribution, and measure how many of their 95% confidence intervals actually contain the true mean to see if the correction works."
423 ]
424 },
425 {
426 "cell_type": "code",
427 "execution_count": null,
428 "metadata": {
429 "collapsed": true
430 },
431 "outputs": [],
432 "source": [
433 "n = 100\n",
434 "correct = 0\n",
435 "\n",
436 "#Your code goes here"
437 ]
438 },
439 {
440 "cell_type": "markdown",
441 "metadata": {},
442 "source": [
443 "## c. Newey-West Matrix\n",
444 "\n",
445 "Use the `newey_west_matrix` helper function to compute an adjusted (robust) covariance matrix for a single sample of the autocorrelated data. "
446 ]
447 },
448 {
449 "cell_type": "code",
450 "execution_count": null,
451 "metadata": {
452 "collapsed": true,
453 "scrolled": true
454 },
455 "outputs": [],
456 "source": [
457 "X = generate_autocorrelated_data(theta, noise_mu, noise_sigma, sample_size) + POPULATION_MU\n",
458 "\n",
459 "#Your code goes here"
460 ]
461 },
462 {
463 "cell_type": "markdown",
464 "metadata": {},
465 "source": [
466 "## d. Newey-West Correction\n",
467 "\n",
468 "Run 100 samples of the following autocorrelated distribution, this time accounting for the autocorrelation by using a Newey-West correction on the standard error, and measure how many of their 95% confidence intervals actually contain the true mean to see if the correction works. (Use the helper function `newey_west_SE` to find the corrected standard error.)"
469 ]
470 },
471 {
472 "cell_type": "code",
473 "execution_count": null,
474 "metadata": {
475 "collapsed": true
476 },
477 "outputs": [],
478 "source": [
479 "n = 100\n",
480 "correct = 0\n",
481 "\n",
482 "#Your code goes here"
483 ]
484 },
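The idea behind the Newey-West correction can be sketched without statsmodels by adding Bartlett-weighted autocovariances to the variance of the sample mean. Note that this is not the notebook's `newey_west_SE` helper, which fits an OLS model on a constant and a time index with a HAC covariance. The function `newey_west_se_of_mean` below is a hypothetical, simplified variant applied directly to the mean, and the 1000 trials are an assumption made for a steadier comparison.

```python
import numpy as np

def generate_autocorrelated_data(theta, mu, sigma, N):
    # Same helper as defined near the top of the notebook
    X = np.zeros((N, 1))
    for t in range(1, N):
        X[t] = theta * X[t-1] + np.random.normal(mu, sigma)
    return X

def newey_west_se_of_mean(x, maxlags=1):
    """Bartlett-kernel (Newey-West style) standard error of the sample mean."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    d = x - x.mean()
    long_run_var = np.dot(d, d) / n                       # lag-0 autocovariance
    for k in range(1, maxlags + 1):
        gamma_k = np.dot(d[k:], d[:-k]) / n               # lag-k autocovariance
        long_run_var += 2.0 * (1.0 - k / (maxlags + 1.0)) * gamma_k
    return np.sqrt(long_run_var / n)

np.random.seed(11)
POPULATION_MU, sample_size = 105, 50
theta, noise_mu, noise_sigma = 0.5, 0, 1
n = 1000   # assumption: more trials than the exercise's 100

naive_correct = 0
nw_correct = 0
for i in range(n):
    X = generate_autocorrelated_data(theta, noise_mu, noise_sigma, sample_size) + POPULATION_MU
    err = abs(np.mean(X) - POPULATION_MU)
    naive_se = np.std(X) / np.sqrt(sample_size)
    nw_se = newey_west_se_of_mean(X, maxlags=1)
    naive_correct += int(err <= 1.96 * naive_se)
    nw_correct += int(err <= 1.96 * nw_se)

naive_coverage = float(naive_correct) / n
nw_coverage = float(nw_correct) / n
print("Naive coverage:      %.1f%%" % (100 * naive_coverage))
print("Newey-West coverage: %.1f%%" % (100 * nw_coverage))
```

With a single lag the correction is only partial: coverage should improve over the naive intervals but can still fall short of 95%, since an autoregressive series stays correlated beyond lag 1.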
485 {
486 "cell_type": "markdown",
487 "metadata": {},
488 "source": [
489 "---\n",
490 "\n",
491 "Congratulations on completing the Confidence Intervals exercises!\n",
492 "\n",
493 "As you learn more about writing trading models and the Quantopian platform, enter the daily [Quantopian Contest](https://www.quantopian.com/contest). Your strategy will be evaluated for a cash prize every day.\n",
494 "\n",
495 "Start by going through the [Writing a Contest Algorithm](https://www.quantopian.com/tutorials/contest) tutorial."
496 ]
497 },
498 {
499 "cell_type": "markdown",
500 "metadata": {},
501 "source": [
502 "*This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. (\"Quantopian\"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, Quantopian, Inc. has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian, Inc. at the time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.*"
504 ]
505 }
506 ],
507 "metadata": {
508 "kernelspec": {
509 "display_name": "Python 2",
510 "language": "python",
511 "name": "python2"
512 },
513 "language_info": {
514 "codemirror_mode": {
515 "name": "ipython",
516 "version": 2
517 },
518 "file_extension": ".py",
519 "mimetype": "text/x-python",
520 "name": "python",
521 "nbconvert_exporter": "python",
522 "pygments_lexer": "ipython2",
523 "version": "2.7.10"
524 }
525 },
526 "nbformat": 4,
527 "nbformat_minor": 1
528 }