ml-finance-python

python scripts for finance machine learning
git clone https://9o.is/git/ml-finance-python.git

# Chapter 05: Strategy Evaluation & Portfolio Management

This chapter covers:

- How to build and test a portfolio based on alpha factors using zipline
- How to measure portfolio risk and return
- How to avoid the pitfalls of backtesting
- How to evaluate portfolio performance using pyfolio
- How to manage portfolio weights using mean-variance optimization and alternatives
- How to use machine learning to optimize asset allocation in a portfolio context

## How to build and test a portfolio with `zipline`

In [Chapter 4](../04_alpha_factor_research), we introduced `zipline` to simulate the computation of alpha factors from trailing cross-sectional market, fundamental, and alternative data. Now we will exploit the alpha factors to derive and act on buy and sell signals.

We will postpone optimizing the portfolio weights until later in this chapter and, for now, just assign positions of equal value to each holding.

The code for this section is in the subdirectory [trading_zipline](01_trading_zipline); the notebook [alpha_factor_zipline_with_trades](01_trading_zipline/alpha_factor_zipline_with_trades.ipynb) simulates the trading decisions that build a portfolio based on the simple MeanReversion alpha factor from the last chapter using `zipline`.
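As a rough illustration of what "positions of equal value" means, the sketch below maps factor scores to equal-sized long and short target weights. It is not the notebook's code: the function name `equal_value_weights`, the tickers, and the convention that a higher score is more attractive are all assumptions made for this example.

```python
def equal_value_weights(scores, n_long=2, n_short=2):
    """Turn factor scores into equal-sized long and short target weights."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
    longs = ranked[:n_long]
    # Take the worst-ranked names as shorts, never reusing a long position.
    start = max(0, len(ranked) - n_short)
    shorts = [asset for asset in ranked[start:] if asset not in longs]

    weights = dict.fromkeys(scores, 0.0)
    n_positions = len(longs) + len(shorts)
    if n_positions == 0:
        return weights
    for asset in longs:
        weights[asset] = 1.0 / n_positions
    for asset in shorts:
        weights[asset] = -1.0 / n_positions
    return weights


# Hypothetical factor scores for five made-up tickers.
scores = {"AAA": 1.2, "BBB": -0.4, "CCC": 0.3, "DDD": -1.5, "EEE": 0.0}
weights = equal_value_weights(scores)
```

Each held position receives the same absolute weight, so gross exposure sums to one regardless of how strong an individual signal is.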

## How to measure performance with `pyfolio`

ML is about optimizing objective functions. In algorithmic trading, the objectives are the return and the risk of the overall investment portfolio, typically relative to a benchmark (which may be cash, the risk-free interest rate, or an asset price index like the S&P 500).

### The Sharpe Ratio

The ex-ante Sharpe Ratio (SR) compares the portfolio's expected excess return to the volatility of this excess return, measured by its standard deviation. It measures the compensation as the average excess return per unit of risk taken, and it can be estimated from data.

Financial returns often violate the iid assumption. Andrew Lo has derived the necessary adjustments to the distribution and the time aggregation for returns that are stationary but autocorrelated. This is important because the time-series properties of investment strategies (for example, mean reversion, momentum, and other forms of serial correlation) can have a non-trivial impact on the SR estimator itself, especially when annualizing the SR from higher-frequency data.

- [The Statistics of Sharpe Ratios](https://www.jstor.org/stable/4480405?seq=1#page_scan_tab_contents), Andrew Lo, Financial Analysts Journal, 2002

### The Fundamental Law of Active Management

A high Information Ratio (IR) implies attractive out-performance relative to the additional risk taken. The Fundamental Law of Active Management breaks the IR down into the information coefficient (IC) as a measure of forecasting skill, and the ability to apply this skill through independent bets. It summarizes the importance of playing both often (high breadth) and well (high IC).

The IC measures the correlation between an alpha factor and the forward returns resulting from its signals, and so captures the accuracy of a manager's forecasts. The breadth of the strategy is the number of independent bets an investor makes in a given time period. The IR, also known as the appraisal ratio (Treynor and Black), is approximately the product of the IC and the square root of the breadth.

The fundamental law is important because it highlights the key drivers of outperformance: both accurate predictions and the ability to make independent forecasts and act on these forecasts matter. In practice, estimating the breadth of a strategy is difficult given the cross-sectional and time-series correlation among forecasts.

- [Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk](https://www.amazon.com/Active-Portfolio-Management-Quantitative-Controlling/dp/0070248826) by Richard Grinold and Ronald Kahn, 1999
- [How to Use Security Analysis to Improve Portfolio Selection](https://econpapers.repec.org/article/ucpjnlbus/v_3a46_3ay_3a1973_3ai_3a1_3ap_3a66-86.htm), Jack L Treynor and Fischer Black, Journal of Business, 1973
- [Portfolio Constraints and the Fundamental Law of Active Management](https://faculty.fuqua.duke.edu/~charvey/Teaching/BA491_2005/Transfer_coefficient.pdf), Clarke et al 2002
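The relationship IR ≈ IC × sqrt(breadth) can be sketched in a few lines. The example below estimates the IC as a rank correlation between factor values and forward returns and then applies the law; the helper names and the toy numbers are assumptions, and the simple ranking does not handle ties.

```python
import math


def ranks(values):
    """Rank positions (0 = smallest); ties are not averaged in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def information_coefficient(factor, forward_returns):
    """Rank (Spearman-style) correlation between factor and forward returns."""
    return pearson(ranks(factor), ranks(forward_returns))


def information_ratio(ic, breadth):
    """Fundamental law: IR is approximately IC * sqrt(breadth)."""
    return ic * math.sqrt(breadth)


# Toy cross-section in which the factor orders the assets perfectly.
ic = information_coefficient([0.1, 0.5, -0.2, 0.9], [0.01, 0.02, -0.01, 0.05])
# A modest IC of 0.05 applied to 400 independent bets per year.
ir = information_ratio(0.05, 400)
```

The second call shows why breadth matters: a small forecasting edge produces an IR near one only when it is applied across many independent bets.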

### In- and out-of-sample performance with `pyfolio`

Pyfolio facilitates the analysis of portfolio performance and risk in-sample and out-of-sample using many standard metrics. It produces tear sheets covering the analysis of returns, positions, and transactions, as well as event risk during periods of market stress using several built-in scenarios, and also includes Bayesian out-of-sample performance analysis.
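The in-sample versus out-of-sample comparison can be illustrated without `pyfolio` itself. The following pure-Python sketch computes three common metrics on either side of a split point; the function names, the daily returns, and the split index are all hypothetical, and the metric definitions are standard ones rather than a reproduction of pyfolio's internals.

```python
import math
import statistics


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded wealth curve (<= 0)."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst


def summarize(returns, periods_per_year=252):
    """Annualized return and volatility plus maximum drawdown."""
    growth = math.prod(1.0 + r for r in returns)
    return {
        "annual_return": growth ** (periods_per_year / len(returns)) - 1.0,
        "annual_volatility": statistics.stdev(returns) * math.sqrt(periods_per_year),
        "max_drawdown": max_drawdown(returns),
    }


# Hypothetical daily returns; the first six days play the role of the backtest.
daily = [0.004, -0.002, 0.003, 0.001, -0.006, 0.005, 0.002, -0.001, -0.004, 0.003]
split = 6
in_sample = summarize(daily[:split])
out_of_sample = summarize(daily[split:])
```

A large gap between the two summaries is a first warning sign that the backtest result may not generalize.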

#### Code Examples

The directory [risk_metrics_pyfolio](02_risk_metrics_pyfolio) contains the notebook [pyfolio_demo](02_risk_metrics_pyfolio/pyfolio_demo.ipynb) that illustrates how to extract the `pyfolio` input from the backtest conducted in the previous folder. It then proceeds to calculate several performance metrics and tear sheets using `pyfolio`.

## How to avoid the pitfalls of backtesting

### Data Challenges

Backtesting simulates an algorithmic strategy using historical data with the goal of identifying patterns that generalize to new market conditions. In addition to the generic challenges of predicting an uncertain future in changing markets, numerous factors make mistaking positive in-sample performance for the discovery of true patterns very likely.

These factors include aspects of the data, the implementation of the strategy simulation, and flaws with the statistical tests and their interpretation. The risks of false discoveries multiply with the use of more computing power, bigger datasets, and more complex algorithms that facilitate the identification of apparent patterns in the noise.

### Data-snooping and backtest overfitting

The most prominent challenge to backtest validity, including to published results, relates to the discovery of spurious patterns due to multiple testing during the strategy-selection process. Selecting a strategy after testing different candidates on the same data will likely bias the choice because a positive outcome is more likely to be due to the stochastic nature of the performance measure itself. In other words, the strategy is overly tailored, or overfit, to the data at hand and produces deceptively positive results.

[Marcos Lopez de Prado](http://www.quantresearch.info/) has published extensively on the risks of backtesting, and how to detect or avoid them. This includes an [online simulator of backtest-overfitting](http://datagrid.lbl.gov/backtest/).
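The selection bias described above is easy to reproduce. The sketch below is an assumed toy experiment, unrelated to the linked simulator: it generates 100 strategies whose true SR is zero and then picks the best in-sample result. The trial count, sample length, and volatility are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the experiment is repeatable


def sharpe_ratio(returns):
    """Per-period SR of a return series (risk-free rate assumed to be zero)."""
    return statistics.mean(returns) / statistics.stdev(returns)


n_trials, n_obs = 100, 252  # 100 candidate strategies, one year of daily data
trial_srs = [
    sharpe_ratio([random.gauss(0.0, 0.01) for _ in range(n_obs)])
    for _ in range(n_trials)
]

best_sr = max(trial_srs)                 # what a naive selection would report
average_sr = statistics.mean(trial_srs)  # close to the true value of zero
```

Every strategy here is pure noise, yet `best_sr` is clearly positive while `average_sr` stays near zero, which is exactly the inflation that multiple-testing corrections try to remove.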

#### The deflated Sharpe Ratio

Bailey and Lopez de Prado (2014) derive a deflated SR to compute the probability that the SR is statistically significant while controlling for the inflationary effect of multiple testing, non-normal returns, and shorter sample lengths.

The Python script [deflated_sharpe_ratio](03_multiple_testing/deflated_sharpe_ratio.py) in the directory [multiple_testing](03_multiple_testing) implements the deflated SR, with references for the derivation of the related formulas.
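For orientation, here is a compact stdlib-only sketch of the deflated SR as described by Bailey and Lopez de Prado. It is not the repository's script; the function names and the illustrative inputs are assumptions. All Sharpe ratios are per period (not annualized), `kurtosis` is the raw kurtosis (3 for a normal distribution), and `n_trials` must exceed one.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
STD_NORMAL = NormalDist()


def expected_max_sharpe(sr_variance, n_trials):
    """Expected maximum SR across n_trials strategies with zero true SR."""
    return math.sqrt(sr_variance) * (
        (1.0 - EULER_GAMMA) * STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
        + EULER_GAMMA * STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    )


def deflated_sharpe_ratio(sr, sr_variance, n_trials, n_obs, skew=0.0, kurtosis=3.0):
    """Probability that the true SR is positive after deflating for selection."""
    sr_benchmark = expected_max_sharpe(sr_variance, n_trials)
    denominator = math.sqrt(1.0 - skew * sr + (kurtosis - 1.0) / 4.0 * sr ** 2)
    z_score = (sr - sr_benchmark) * math.sqrt(n_obs - 1) / denominator
    return STD_NORMAL.cdf(z_score)


# Illustrative inputs: annualized SR of 2.5 over 1,250 daily observations,
# selected from 100 trials whose annualized SRs have variance 0.5.
dsr = deflated_sharpe_ratio(
    sr=2.5 / math.sqrt(250),
    sr_variance=0.5 / 250,
    n_trials=100,
    n_obs=1250,
    skew=-3.0,
    kurtosis=10.0,
)
```

Raising `n_trials` increases the benchmark SR that the observed value must beat, so the same backtest becomes less convincing the more alternatives were tried.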

#### References

- [The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality](https://www.davidhbailey.com/dhbpapers/deflated-sharpe.pdf), Bailey, David and Lopez de Prado, Marcos, Journal of Portfolio Management, 2014
- [Backtest Overfitting: An Interactive Example](http://datagrid.lbl.gov/backtest/)
- [Backtesting](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2606462), Lopez de Prado, Marcos, 2015
- [Secretary Problem (Optimal Stopping)](https://www.geeksforgeeks.org/secretary-problem-optimal-stopping-problem/)
- [Optimal Stopping and Applications](https://www.math.ucla.edu/~tom/Stopping/Contents.html), Ferguson, Math Department, UCLA
- [Advances in Machine Learning Lectures 4/10 - Backtesting I](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3257420), Marcos Lopez de Prado, 2018
- [Advances in Machine Learning Lectures 5/10 - Backtesting II](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3257497), Marcos Lopez de Prado, 2018

## How to Manage Portfolio Risk & Return

- [Portfolio Selection](https://www.math.ust.hk/~maykwok/courses/ma362/07F/markowitz_JF.pdf), Harry Markowitz, The Journal of Finance, 1952
- [The Capital Asset Pricing Model: Theory and Evidence](http://mba.tuck.dartmouth.edu/bespeneckbo/default/AFA611-Eckbo%20web%20site/AFA611-S6B-FamaFrench-CAPM-JEP04.pdf), Eugene F. Fama and Kenneth R. French, Journal of Economic Perspectives, 2004

### Mean-variance optimization

Modern portfolio theory (MPT) solves for the optimal portfolio weights that minimize volatility for a given expected return, or maximize returns for a given level of volatility. The key inputs are expected asset returns, standard deviations, and the covariance matrix.

#### Code Examples

We can calculate an efficient frontier using `scipy.optimize.minimize` and the historical estimates for asset returns, standard deviations, and the covariance matrix.

The directory [efficient_frontier](04_efficient_frontier) contains the notebook [mean_variance_optimization](04_efficient_frontier/mean_variance_optimization.ipynb) to compute the efficient frontier in Python.
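The notebook relies on a numerical optimizer; for intuition, the two-asset case can be traced by hand. The sketch below deliberately swaps the optimizer for a closed-form minimum-variance weight and a grid of long-only weights. The expected returns, volatilities, and correlation are made-up inputs, and the function names are hypothetical.

```python
import math


def portfolio_stats(w, mu, sigma, rho):
    """Expected return and volatility with weight w in asset 0 and 1 - w in asset 1."""
    expected_return = w * mu[0] + (1.0 - w) * mu[1]
    variance = (
        (w * sigma[0]) ** 2
        + ((1.0 - w) * sigma[1]) ** 2
        + 2.0 * w * (1.0 - w) * rho * sigma[0] * sigma[1]
    )
    return expected_return, math.sqrt(variance)


def min_variance_weight(sigma, rho):
    """Closed-form weight of asset 0 in the two-asset minimum-variance portfolio."""
    cov = rho * sigma[0] * sigma[1]
    return (sigma[1] ** 2 - cov) / (sigma[0] ** 2 + sigma[1] ** 2 - 2.0 * cov)


mu = (0.08, 0.04)      # hypothetical expected annual returns
sigma = (0.20, 0.10)   # hypothetical annual volatilities
rho = 0.2              # hypothetical correlation

# Risk/return combinations for weights 0%, 10%, ..., 100% in asset 0.
frontier = [portfolio_stats(i / 10, mu, sigma, rho) for i in range(11)]
w_min = min_variance_weight(sigma, rho)
ret_min, vol_min = portfolio_stats(w_min, mu, sigma, rho)
```

The efficient part of the curve consists of the portfolios with an expected return at or above `ret_min`; with more than two assets the same trade-off has to be solved numerically, which is where an optimizer comes in.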

### Alternatives to mean-variance optimization

#### The Black-Litterman approach

- [Global Portfolio Optimization](http://www.sef.hku.hk/tpg/econ6017/2011/black-litterman-1992.pdf), Black, Fischer; Litterman, Robert, Financial Analysts Journal, 1992

#### The Kelly Rule

- [A New Interpretation of Information Rate](https://www.princeton.edu/~wbialek/rome/refs/kelly_56.pdf), John Kelly, 1956
- [Beat the Dealer: A Winning Strategy for the Game of Twenty-One](https://www.amazon.com/Beat-Dealer-Winning-Strategy-Twenty-One/dp/0394703103), Edward O. Thorp, 1966
- [Beat the Market: A Scientific Stock Market System](https://www.researchgate.net/publication/275756748_Beat_the_Market_A_Scientific_Stock_Market_System), Edward O. Thorp, 1967
- [Quantitative Trading: How to Build Your Own Algorithmic Trading Business](https://www.amazon.com/Quantitative-Trading-Build-Algorithmic-Business/dp/0470284889/ref=sr_1_2?s=books&ie=UTF8&qid=1545525861&sr=1-2), Ernie Chan, 2008

##### Code Example

The directory [kelly](05_kelly) contains the notebook [kelly_rule](05_kelly/kelly_rule.ipynb) to compute the Kelly rule portfolio.
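As background for the notebook, the following sketch shows the Kelly rule in its two simplest forms: the classic binary bet, and the continuous approximation of mean excess return divided by variance for a single asset. It is an independent illustration rather than the notebook's code, and the function names and inputs are assumptions.

```python
import math


def kelly_fraction(p_win, odds):
    """Binary bet paying `odds` to 1 with win probability p_win: f* = p - (1 - p) / odds."""
    return p_win - (1.0 - p_win) / odds


def expected_log_growth(fraction, p_win, odds):
    """Expected log growth per bet when staking `fraction` of wealth."""
    return (
        p_win * math.log(1.0 + odds * fraction)
        + (1.0 - p_win) * math.log(1.0 - fraction)
    )


def continuous_kelly(mean_excess_return, variance):
    """Single-asset approximation: optimal leverage = mean excess return / variance."""
    return mean_excess_return / variance


# Even-money bet with a 60% chance of winning.
f_star = kelly_fraction(p_win=0.6, odds=1.0)
growth_at_optimum = expected_log_growth(f_star, 0.6, 1.0)

# Hypothetical asset: 6% excess return with 20% volatility.
leverage = continuous_kelly(0.06, 0.20 ** 2)
```

Staking more or less than `f_star` lowers the expected log growth, which is the sense in which the Kelly fraction is optimal for long-run wealth.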

#### Hierarchical Risk Parity

- [Hierarchical Clustering Based Asset Allocation](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2840729), Thomas Raffinot, 2016