ml-finance-python

python scripts for finance machine learning
git clone https://9o.is/git/ml-finance-python.git
README.md

(15819B)
      1 # Chapter 17: Convolutional Neural Networks
      2 
      3 CNNs are named after the linear algebra operation called convolution that replaces the general matrix multiplication typical of feed-forward networks. Research into CNN architectures has proceeded very rapidly and new architectures that improve performance on some benchmark continue to emerge frequently. CNNs are designed to learn hierarchical feature representations from grid-like data. One of their shortcomings is that they do not learn spatial relationships, i.e., the relative positions of these features. In the last section, we will outline how Capsule Networks work that have emerged to overcome these limitations. 
      4 
      5 More specifically, this chapter covers
      6 
      7 - How CNNs use key building blocks to efficiently model grid-like data
      8 - How to design CNN architectures using Keras and PyTorch
      9 - How to train, tune and regularize CNN for various data types
     10 - How to use transfer learning to streamline CNN, even with fewer data
     11 - How Capsule Networks improve on CNN and may enable a new wave of innovation
     12 
     13 ## How to build a Deep ConvNet
     14 
     15 CNNs are conceptually similar to the feedforward NNs we covered in the previous chapter. They consist of units that contain parameters called weights and biases, and the training process adjusts these parameters to optimize the network’s output for a given input. Each unit applies its parameters to a linear operation on the input data or activations received from other units, possibly followed by a non-linear transformation. 
     16 
     17 CNNs differ because they encode the assumption that the input has a structure most commonly found in image data where pixels form a two-dimensional grid, typically with several channels to represent the components of the color signal, such as the red, green and blue channels of the RGB color model.
     18 
     19 The most important element to encode the assumption of a grid-like topology is the convolution operation that gives CNNs their name, combined with pooling. We will see that the specific assumptions about the functional relationship between input and output data implies that CNNs need far fewer parameters and compute more efficiently.
     20 
     21 ### How Convolutional Layers work
     22 
     23 Fully-connected feedforwardNNs make no assumptions about the topology, or local structure of the input data so that arbitrarily reordering the features has no impact on the training result.
     24 
     25 For many data sources, however, local structure is quite significant. Examples include autocorrelation in time series or the spatial correlation among pixel values due to common patterns like edges or corners. For image data, this local structure has traditionally motivated the development of hand-coded filter methods that extract local patterns for the use as features in machine learning models.
     26 
     27 - [Deep Learning](http://www.deeplearningbook.org/contents/convnets.html), Chapter 9, Convolutional Networks, Ian Goodfellow et al, MIT Press, 2016
     28 - [Convolutional Neural Networks (CNNs / ConvNets)](http://cs231n.github.io/convolutional-networks/#conv), Module 2 in CS231n Convolutional Neural Networks for Visual Recognition, Lecture Notes by Andrew Karpathy, Stanford, 2016
     29 - [Convnet Benchmarks](https://github.com/soumith/convnet-benchmarks), Benchmarking of all publicly accessible implementations of convnets
     30 - [ConvNetJS](https://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html), ConvNetJS CIFAR-10 demo in the browser by Andrew Karpathy
     31 - [An Interactive Node-Link Visualization of Convolutional Neural Networks](http://scs.ryerson.ca/~aharley/vis/), interactive CNN visualization
     32 - [GradientBased Learning Applied to Document Recognition](http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf), Yann LeCun Leon Bottou Yoshua Bengio and Patrick, IEEE, 1998
     33 - [Understanding Convolutions](http://colah.github.io/posts/2014-07-Understanding-Convolutions/), Christopher Olah, 2014
     34 - [Multi-Scale Context Aggregation by Dilated Convolutions](https://arxiv.org/abs/1511.07122), Fisher Yu, Vladlen Koltun, ICLR 2016
     35 
     36 #### Code examples
     37 
     38 - the python script [conv_filter_viz](01_conv_filter_viz.py) ([source](https://github.com/keras-team/keras/blob/master/examples/conv_filter_visualization.py))  creates a visualization of the filters learned by a deep CNN using the VGG16 architecture.
     39 - The notebook [filter_example](01_filter_example.ipynb) illustrates how to use hand-coded filters in a convolutional network and visualize the resulting transformation of the image.
     40 
     41 ### Computer Vision Tasks
     42 
     43 Image classification is a fundamental computer vision task that requires labeling an image based on certain objects it contains. Many practical applications, including investment and trading strategies, require additional information. 
     44 - The object detection task requires not only the identification but also the spatial location of all objects of interest, typically using bounding boxes. Several algorithms have been developed to overcome the inefficiency of brute-force sliding-window approaches, including region proposal methods (R-CNN) and the You Only Look Once (YOLO) real-time object detection algorithm (see references on GitHub).
     45 - The object segmentation task goes a step further and requires a class label and an outline of every object in the input image. This may be useful to count objects in an image and evaluate a level of activity. 
     46 - Semantic segmentation, also called scene parsing, makes dense predictions to assign a class label to each pixel in the image. As a result, the image is divided into semantic regions and each pixel is assigned to its enclosing object or region.
     47 
     48 - [YOLO: Real-Time Object Detection](https://pjreddie.com/darknet/yolo/), You Only Look Once real-time object detection
     49 - [Rich feature hierarchies for accurate object detection and semantic segmentation](https://arxiv.org/pdf/1311.2524.pdf), Girshick et al, Berkely, arxiv 2014
     50 - [Playing around with RCNN](https://cs.stanford.edu/people/karpathy/rcnn/), Andrew Karpathy, Stanford
     51 - [R-CNN, Fast R-CNN, Faster R-CNN, YOLO — Object Detection Algorithms](https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e), Rohith Ghandi, 2018
     52 
     53 ### Reference Architectures & Benchmarks
     54 
     55 - [Fully Convolutional Networks for Semantic Segmentation](https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf), Long et al, Berkeley
     56 - [Mask R-CNN](https://arxiv.org/abs/1703.06870), Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, arxiv, 2017
     57 - [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/pdf/1505.04597.pdf), Olaf Ronneberger, Philipp Fischer, and Thomas Brox, arxiv 2015
     58 - [U-Net Tutorial](http://deeplearning.net/tutorial/unet.html)
     59 - [Very Deep Convolutional Networks for Large-Scale Visual Recognition](http://www.robots.ox.ac.uk/~vgg/research/very_deep/), Karen Simonyan and Andrew Zisserman on VGG16 that won the ImageNet ILSVRC-2014 competition
     60 - [Benchmarks for popular CNN models](https://github.com/jcjohnson/cnn-benchmarks)
     61 - [Analysis of deep neural networks](https://medium.com/@culurciello/analysis-of-deep-neural-networks-dcf398e71aae), Alfredo Canziani, Thomas Molnar, Lukasz Burzawa, Dawood Sheik, Abhishek Chaurasia, Eugenio Culurciello, 2018
     62 - [LeNet-5 Demos](http://yann.lecun.com/exdb/lenet/index.html)
     63 - [Neural Network Architectures](https://towardsdatascience.com/neural-network-architectures-156e5bad51ba)
     64 - [Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf), Kaiming He et al, Microsoft Research, 2015
     65 - [Rethinking the Inception Architecture for Computer Vision](https://arxiv.org/abs/1512.00567), Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna, arxiv 2015
     66 - [Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](https://arxiv.org/abs/1602.07261), Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi, arxiv, 2016
     67 - [Network In Network](https://arxiv.org/pdf/1312.4400v3.pdf), Min Lin et al, arxiv 2014
     68 - [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167), Sergey Ioffe, Christian Szegedy, arxiv 2015
     69 - [An Overview of ResNet and its Variants](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035), Vincent Fung, 2017
     70 
     71 
     72 ## How to design and train a CNN using Python
     73 
     74 ### LeNet5 and MNIST using Keras
     75 
     76 All libraries we introduced in the last chapter provide support for convolutional layers. The notebook [mnist_with_ffnn_and_lenet5](02_mnist_with_ffnn_and_lenet5.ipynb) illustrates the LeNet5 architecture using the most basic MNIST handwritten digit dataset, and then use AlexNet on CIFAR10, a simplified version of the original ImageNet to demonstrate the use of data augmentation.
     77 
     78 ### AlexNet and CIFAR10 with Keras
     79 
     80 Fast-forward to 2012, and we move on to the deeper and more modern AlexNet architecture. We will use the CIFAR10 dataset that uses 60,000 ImageNet samples, compressed to 32x32 pixel resolution (from the original 224x224), but still with three color channels. There are only 10 of the original 1,000 classes. See the notebook [cifar10_image_classification](03_cifar10_image_classification.ipynb) for implementation. 
     81 
     82 ### How to use CNN with time series data
     83 
     84 The regular measurements of time series result in a similar grid-like data structure as for the image data we have focused on so far. As a result, we can use CNN architectures for univariate and multivariate time series. In the latter case, we consider different time series as channels, similar to the different color signals.
     85 
     86 The notebook [cnn_with_time_series](04_cnn_with_time_series.ipynb) illustrates the time series use case with the univariate asset price forecast example we introduced in the last chapter. Recall that we create rolling monthly stock returns and use the 24 lagged returns alongside one-hot-encoded month information to predict whether the subsequent monthly return is positive or negative.
     87 
     88 ## Transfer Learning
     89 
     90 In practice, we often do not have enough data to train a CNN from scratch with random initialization. Transfer learning is a machine learning technique that repurposes a model trained on one set of data for another task. Naturally, it works if the learning from the first task carries over to the task of interest. If successful, it can lead to better performance and faster training that requires less labeled data than training a neural network from scratch on the target task.
     91 
     92 - [Building powerful image classification models using very little data](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html)
     93 - [How transferable are features in deep neural networks?](https://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.pdf), Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson, NIPS, 2014
     94 - [PyTorch Transfer Learning Tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html)
     95 
     96 ### How to build on a pre-trained CNN
     97 
     98 The transfer learning approach to CNN relies on pre-training on a very large dataset like ImageNet. The goal is that the convolutional filters extract a feature representation that generalizes to new images. In a second step, it leverages the result to either initialize and retrain a new CNN or as inputs to in a new network that tackles the task of interest.
     99 
    100 CNN architectures typically use a sequence of convolutional layers to detect hierarchical patterns, adding one or more fully-connected layers to map the convolutional activations to the outcome classes or values. The output of the last convolutional layer that feeds into the fully-connected part is called bottleneck features. We can use the bottleneck features of a pre-trained network as inputs into a new fully-connected network, usually after applying a ReLU activation function. 
    101 
    102 In other words, we freeze the convolutional layers and replace the dense part of the network. An additional benefit is that we can then use inputs of different sizes because it is the dense layers that constrain the input size. 
    103 
    104 Alternatively, we can use the bottleneck features as inputs into a different machine learning algorithm. In the AlexNet architecture, e.g., the bottleneck layer computes a vector with 4096 entries for each 224 x 224 input image. We then use this vector as features for a new model.
    105 
    106 Alternatively, we can go a step further and not only replace and retrain the classifier on top of the CNN using new data but to also fine-tune the weights of the pre-trained CNN. To achieve this, we continue training, either only for later layers while freezing the weights of some earlier layers. The motivation is to preserve presumably more generic patterns learned by lower layers, such as edge or color blob detectors while allowing later layers of the CNN to adapt to the details of a new task. ImageNet, e.g., contains a wide variety of dog breeds which may lead to feature representations specifically useful for differentiating between these classes.
    107 
    108 ### How to extract bottleneck features
    109 
    110 Modern CNNs can take weeks to train on multiple GPUs on ImageNet, but fortunately, many researchers share their final weights. Keras, e.g., contains pre-trained models for several of the reference architectures discussed above, namely VGG16 and 19, ResNet50, InceptionV3 and InceptionResNetV2, MobileNet, DenseNet, NASNet and MobileNetV2
    111 
    112 The notebook [bottleneck_features](05_bottleneck_features.ipynb) illustrates how to download pre-trained VGG16 model, either with the final layers to generate predictions or without the final layers as illustrated in the figure below to extract the outputs produced by the bottleneck features.
    113 
    114 ### How to further train a pre-trained model
    115 
    116 The notebook [transfer_learning](06_transfer_learning.ipynb) demonstrates how to freeze some or all of the layers of a pre-trained model and continue training using a new fully-connected set of layers and data with a different format.
    117 
    118 ## How to detect objects
    119 
    120 Object detection requires the ability to distinguish between several classes of objects and to decide how many and which of these objects are present in an image.
    121 
    122 ### Google Street View Housenumber Dataset
    123 
    124  A prominent example is Ian Goodfellow’s identification of house numbers from Google’s street view dataset. It requires to identify 
    125 - how many of up to five digits make up the house number, 
    126 - The correct digit for each component, and
    127 - The proper order of the constituent digits.
    128 
    129 The notebooks [svhn_preprocessing](07_svhn_preprocessing.ipynb) contains code to produce a simplified, cropped dataset that uses bounding box information to create regularly shaped 32x32 images containing the digits; the original images are of arbitrary shape.
    130 
    131 The notebook [svhn_object_detection](08_svhn_object_detection.ipynb) goes on to illustrate how to build a deep CNN using Keras’ functional API to generate multiple outputs: one to predict how many digits are present, and five for the value of each in the order they appear.
    132 
    133 ## Capsule Nets
    134 
    135 - [Dynamic Routing Between Capsules](https://arxiv.org/abs/1710.09829), Sara Sabour, Nicholas Frosst, Geoffrey E Hinton, arxiv, 2017
    136 
    137 ## Resources
    138 
    139 - [CS231n: Convolutional Neural Networks for Visual Recognition](http://cs231n.stanford.edu/syllabus.html), Stanford’s deep learning course. Helpful for building foundations, with engaging lectures and illustrative problem sets.
    140 - [ImageNet Large Scale Visual Recognition Challenge (ILSVRC)](http://www.image-net.org/challenges/LSVRC/)
    141 
    142 
    143 docker run -it -p 8889:8888 -v /home/stefan/projects/machine-learning-for-trading/17_convolutional_neural_nets:/cnn --name tensorflow tensorflow/tensorflow:latest-gpu-py3 bash
    144 
    145