# Chapter 08: Linear Time Series Models

This chapter covers:
- How to use time series analysis to compute diagnostic statistics that inform the modeling process
- How to estimate and diagnose autoregressive and moving-average time series models
- How to build Autoregressive Conditional Heteroskedasticity (ARCH) models to predict volatility
- How to build vector autoregressive models
- How to use cointegration for a pairs trading strategy

## Analytical tools for diagnostics and feature extraction

Most of the examples in this chapter use data provided by the Federal Reserve that you can access using the pandas datareader that we introduced in [Chapter 2, Market and Fundamental Data](../02_market_and_fundamental_data).

The code examples for this section are available in the notebook [stationarity_and_arima](01_stationarity_and_arima.ipynb).

### How to decompose time series patterns

Time series data typically contains a mix of patterns that can be decomposed into several components, each representing an underlying pattern category. In particular, time series often consist of the systematic components trend, seasonality and cycles, plus unsystematic noise. These components can be combined in an additive, linear model, in particular when fluctuations do not depend on the level of the series, or in a non-linear, multiplicative model.

- `pandas` Time Series and Date functionality [docs](https://pandas.pydata.org/pandas-docs/stable/timeseries.html)
- [Forecasting - Principles & Practice, Hyndman, R. and Athanasopoulos, G., ch.6 'Time Series Decomposition'](https://otexts.org/fpp2/decomposition.html)

### How to compute rolling window statistics

The `pandas` library provides flexible functionality for defining window types, including rolling, exponentially weighted, and expanding windows.

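The three window types can be compared on a toy series. This is a small sketch with made-up values; the window length and span of 3 are arbitrary choices.

```python
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# Rolling: mean over a fixed-size window of the 3 most recent observations
rolling_mean = prices.rolling(window=3).mean()

# Expanding: mean over all observations up to and including each point
expanding_mean = prices.expanding(min_periods=1).mean()

# Exponentially weighted: recent observations receive more weight
# (span=3 corresponds to a smoothing factor of 2 / (3 + 1) = 0.5)
ewm_mean = prices.ewm(span=3, adjust=False).mean()
```

The rolling mean is undefined (`NaN`) for the first two observations because the window is not yet full.
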
- `pandas` window function [docs](https://pandas.pydata.org/pandas-docs/stable/computation.html#window-functions)

### How to measure autocorrelation

Autocorrelation (also called serial correlation) adapts the concept of correlation to the time series context: just as the correlation coefficient measures the strength of a linear relationship between two variables, the autocorrelation coefficient measures the extent of a linear relationship between time series values separated by a given lag.

We present the following tools to measure autocorrelation:
- autocorrelation function (ACF)
- partial autocorrelation function (PACF)
- correlogram as a plot of ACF or PACF against the number of lags.

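To make the definition concrete, the following sketch computes the sample ACF directly with NumPy for a simulated AR(1) series. The helper `sample_acf` and the simulation parameters are hypothetical and written for this example only.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation of x for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)]
    )

# Simulate y_t = 0.7 * y_{t-1} + e_t (illustrative parameters)
rng = np.random.default_rng(0)
n, phi = 5000, 0.7
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

# For an AR(1) process the theoretical ACF at lag k is phi ** k
acf_values = sample_acf(y, nlags=5)
```

In practice, `statsmodels.tsa.stattools` offers `acf` and `pacf`, and `statsmodels.graphics.tsaplots` offers `plot_acf` and `plot_pacf` for correlograms.
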
### How to diagnose and achieve stationarity

The statistical properties, such as the mean, variance, or autocorrelation, of a stationary time series are independent of the period, that is, they don't change over time. Hence, stationarity implies that a time series does not have a trend or seasonal effects and that descriptive statistics, such as the mean or the standard deviation, when computed for different rolling windows, are constant or do not change much over time.

### How to apply time series transformations

To satisfy the stationarity assumption of linear time series models, we need to transform the original time series, often in several steps. Common transformations include applying the (natural) logarithm, which converts an exponential growth pattern into a linear trend and stabilizes the variance, and differencing.

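A short sketch of the two transformations on an idealized series that grows by a constant 1% per period (a made-up example without noise):

```python
import numpy as np
import pandas as pd

# Illustrative series with exponential growth in levels
t = np.arange(100)
level = pd.Series(100 * np.exp(0.01 * t))

# The log turns exponential growth into a linear trend
log_level = np.log(level)

# First differences of the log remove the linear trend,
# leaving the constant per-period growth rate
log_diff = log_level.diff().dropna()
```

For seasonal patterns, differencing at the seasonal lag (for example `.diff(12)` for monthly data) plays the analogous role.
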
### How to diagnose and address unit roots

Unit roots pose a particular problem for determining the transformation that will render a time series stationary. In practice, time series of interest rates or asset prices are often not stationary, for example, because there does not exist a price level to which the series reverts. The most prominent example of a non-stationary series is the random walk.

The defining characteristic of a unit-root non-stationary series is long memory: since current values are the sum of past disturbances, large innovations persist for much longer than for a mean-reverting, stationary series. Identifying the correct transformation, and in particular the appropriate number of differences and their lags, is not always clear-cut. We present a few heuristics to guide the process.

Statistical unit root tests are a common way to determine objectively whether (additional) differencing is necessary. They are hypothesis tests designed to detect whether a series is stationary or still requires differencing.

## Univariate Time Series Models

Univariate time series models relate the value of the time series at the point in time of interest to a linear combination of lagged values of the series and possibly past disturbance terms.

While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data. ARIMA(p, d, q) models require stationarity and leverage two building blocks:
- Autoregressive (AR) terms consisting of p lagged values of the time series
- Moving average (MA) terms that contain q lagged disturbances

- [Analysis of Financial Time Series, 3rd Edition, Ruey S. Tsay](https://www.wiley.com/en-us/Analysis+of+Financial+Time+Series%2C+3rd+Edition-p-9780470414354)
- [Quantitative Equity Investing: Techniques and Strategies, Frank J. Fabozzi, Sergio M. Focardi, Petter N. Kolm](https://www.wiley.com/en-us/Quantitative+Equity+Investing%3A+Techniques+and+Strategies-p-9780470262474)
- `statsmodels` Time Series Analysis [docs](https://www.statsmodels.org/dev/tsa.html)

### How to build autoregressive models

An AR model of order p aims to capture the linear dependence between time series values at different lags. It closely resembles a multiple linear regression on lagged values of the outcome.

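The resemblance to multiple linear regression can be made explicit by estimating an AR(2) model with ordinary least squares on lagged values. This is a sketch on simulated data with illustrative coefficients, not the estimation routine used in the notebook.

```python
import numpy as np

# Simulate y_t = 0.5 * y_{t-1} - 0.3 * y_{t-2} + e_t (illustrative coefficients)
rng = np.random.default_rng(2)
n = 5000
true_coefs = np.array([0.5, -0.3])
y = np.zeros(n)
for t in range(2, n):
    y[t] = true_coefs[0] * y[t - 1] + true_coefs[1] * y[t - 2] + rng.normal()

# Design matrix: constant, first lag, second lag
X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
target = y[2:]

# Least-squares fit, exactly as in a multiple linear regression
beta, *_ = np.linalg.lstsq(X, target, rcond=None)
intercept, ar_coefs = beta[0], beta[1:]
```

With enough observations, `ar_coefs` should land close to the coefficients used in the simulation.
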
### How to build moving average models

An MA model of order q uses q past disturbances rather than lagged values of the time series in a regression-like model. Since we do not observe the white-noise disturbance values, MA(q) is not a regression model like the ones we have seen so far. Rather than using least squares, MA(q) models are estimated using maximum likelihood estimation (MLE).

### How to build ARIMA models and extensions

Autoregressive integrated moving-average ARIMA(p, d, q) models combine AR(p) and MA(q) processes to leverage the complementarity of these building blocks. The combined form is more compact and needs fewer parameters, which simplifies model development and reduces the risk of overfitting.

- `statsmodels` State-Space Models [docs](https://www.statsmodels.org/dev/statespace.html)

### How to forecast macro fundamentals

We will build a SARIMAX model for monthly data on an industrial production time series for the 1988-2017 period. See notebook [stationarity_and_arima](01_stationarity_and_arima.ipynb) for implementation details.

### How to use time series models to forecast volatility

A particularly important area of application for univariate time series models is the prediction of volatility. The volatility of financial time series is usually not constant over time but changes, with bouts of volatility clustering together. Changes in variance create challenges for time series forecasting using the classical ARIMA models.

- NYU Stern [VLAB](https://vlab.stern.nyu.edu/)

### How to build a volatility-forecasting model

The development of a volatility model for an asset-return series consists of four steps:
1. Build an ARMA time series model for the financial time series based on the serial dependence revealed by the ACF and PACF.
2. Test the residuals of the model for ARCH/GARCH effects, again relying on the ACF and PACF for the series of squared residuals.
3. Specify a volatility model if serial correlation effects are significant, and jointly estimate the mean and volatility equations.
4. Check the fitted model carefully and refine it if necessary.

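Step 2 can be illustrated with simulated ARCH(1) returns: the returns themselves show little serial correlation, while their squares are clearly autocorrelated. The helper `sample_acf` and the parameters `omega` and `alpha` are hypothetical choices for this sketch; estimating the volatility model in step 3 would then be done with a dedicated package such as `arch`, as in the notebook.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation of x for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)]
    )

# ARCH(1): conditional variance depends on the previous squared return
rng = np.random.default_rng(5)
n, omega, alpha = 5000, 0.1, 0.4
returns = np.zeros(n)
for t in range(1, n):
    variance = omega + alpha * returns[t - 1] ** 2
    returns[t] = np.sqrt(variance) * rng.normal()

# Returns: autocorrelation near zero; squared returns: positive autocorrelation
acf_returns = sample_acf(returns, nlags=5)
acf_squared = sample_acf(returns ** 2, nlags=5)
```
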
The notebook [arch_garch_models](02_arch_garch_models.ipynb) demonstrates the usage of the ARCH library to estimate time series models for volatility forecasting with NASDAQ data.

- ARCH Library [examples](http://nbviewer.jupyter.org/github/bashtage/arch/blob/master/examples/univariate_volatility_modeling.ipynb)

## Multivariate Time Series Models

Multivariate time series models are designed to capture the dynamics of multiple time series simultaneously and leverage dependencies across these series for more reliable predictions.

- [New Introduction to Multiple Time Series Analysis, Lütkepohl, Helmut, Springer, 2005](https://www.springer.com/us/book/9783540401728)

### The vector autoregressive (VAR) model

- `statsmodels` Vector Autoregression [docs](https://www.statsmodels.org/dev/vector_ar.html)
- [Time Series Analysis in Python with statsmodels](https://conference.scipy.org/proceedings/scipy2011/pdfs/statsmodels.pdf), Wes McKinney, Josef Perktold, Skipper Seabold, SciPy Conference 2011

### How to use the VAR model for macro fundamentals forecasts

The notebook [vector_autoregressive_model](03_vector_autoregressive_model.ipynb) demonstrates how to use `statsmodels` to estimate a VAR model for macro fundamentals time series.

### Cointegration – time series with a common trend

The concept of an integrated multivariate series is more complicated than in the univariate case. All component series of the process may be individually integrated while the process is not jointly integrated, in the sense that one or more linear combinations of the series produce a new stationary series.

In other words, a combination of two cointegrated series has a stable mean to which this linear combination reverts. A multivariate series with this characteristic is said to be cointegrated. This also applies when the individual series are integrated of a higher order and the linear combination reduces the overall order of integration.

We demonstrate two major approaches to testing for cointegration:
- The Engle–Granger two-step method
- The Johansen procedure

### How to use cointegration for a pairs-trading strategy

Pairs-trading relies on a stationary, mean-reverting relationship between two asset prices. In other words, the ratio or difference between the two prices, also called the spread, may diverge over time but should ultimately return to the same level. Given such a pair, the strategy consists of going long (that is, purchasing) the under-performing asset because it would require a period of outperformance to close the gap. At the same time, one would short the asset that has moved away from the price anchor in the positive direction to fund the purchase.

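The long/short logic can be sketched with a z-score of the spread. Everything here is hypothetical: the simulated prices, the use of the price difference as the spread, and the entry threshold of two standard deviations. Computing the mean and standard deviation over the full sample also introduces look-ahead bias, so a real backtest would use rolling estimates instead.

```python
import numpy as np
import pandas as pd

# Two simulated prices that share a common random-walk component
rng = np.random.default_rng(8)
n = 500
common = np.cumsum(rng.normal(size=n)) + 100
price_a = pd.Series(common + rng.normal(scale=1.0, size=n))
price_b = pd.Series(common + rng.normal(scale=1.0, size=n))

# Spread and its standardized deviation from the mean
spread = price_a - price_b
zscore = (spread - spread.mean()) / spread.std()

# Hypothetical rule: act when the spread is more than 2 std devs from its mean
threshold = 2.0
position_a = pd.Series(0, index=spread.index)
position_a[zscore > threshold] = -1    # A outperformed: short A, long B
position_a[zscore < -threshold] = 1    # A underperformed: long A, short B
position_b = -position_a               # the other leg takes the opposite side
```
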
In practice, given a universe of assets, a pairs-trading strategy will search for cointegrated pairs by running a statistical test on each pair. The key challenge here is to account for multiple-testing bias, as outlined in [Chapter 6, Machine Learning Workflow](../06_machine_learning_process).

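A small worked example shows why this matters. The ticker list is a placeholder, and the Bonferroni correction is used here only as one simple, conservative adjustment; the text does not prescribe a specific method.

```python
from itertools import combinations

tickers = ["A", "B", "C", "D", "E"]      # placeholder universe of N = 5 assets
pairs = list(combinations(tickers, 2))
n_tests = len(pairs)                     # N * (N - 1) / 2 candidate pairs

alpha = 0.05

# If no pair is truly cointegrated, testing each at level alpha still
# produces this many false positives on average
expected_false_positives = alpha * n_tests

# Bonferroni: per-test threshold that keeps the family-wise error rate at alpha
bonferroni_alpha = alpha / n_tests
```

The number of pairs grows quadratically with the size of the universe, so the problem becomes severe quickly: 100 assets already yield 4,950 tests.
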
- [Introduction to Pairs Trading](https://www.quantopian.com/lectures/introduction-to-pairs-trading)