Machine Learning, Stock Market and Chaos

Machine Learning of Chaotic Systems
Solving Complex and Insoluble Problems via Artificial Intelligence
By Lipa Roitman PhD
November 1st, 2015

Contents
• Chaos VS Randomness
• Chaotic Processes
• Modeling Chaos: Statistical Approach
• Modeling Chaos: Artificial Intelligence and Machine Learning Approach
• Steps in Machine Learning
• Financial Markets as Chaotic Processes

Chaos and Randomness
• Random noise
No known cause, no regularity, no rationality, no repeatability, no pattern
Impossible to predict

Chaos VS Randomness
• Randomness Examples
Previous coin flips do not predict the next one.
Brownian motion - random walk
Gaussian and non-Gaussian noise
Random (white) noise with a frequency-independent power spectrum
Other types of random processes.
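A minimal sketch of the difference between white noise and a random walk. It uses Python with NumPy, which is our choice because the slides contain no code, and the function name is illustrative. Independent ±1 coin flips have essentially zero autocorrelation. Their running sum, a discrete random walk, is strongly correlated with its own past, even though each new step remains unpredictable:

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(42)
flips = rng.choice([-1, 1], size=10_000)   # white noise: independent coin flips
walk = np.cumsum(flips)                    # random walk: running sum of the flips

print(f"coin flips  lag-1 autocorrelation: {lag1_autocorr(flips):+.3f}")
print(f"random walk lag-1 autocorrelation: {lag1_autocorr(walk):+.3f}")
```

The walk's increments are still the unpredictable coin flips. Its high autocorrelation comes from accumulation, not from any pattern in the next step.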

• Stationary process: its statistical properties (mean value,
variance, higher moments, and probability distribution) do not change over time.
Stationary ergodic process: the process has statistical properties that are constant in time,
AND its global statistical properties can be reliably derived from a single,
long enough sample of the process.
Chaos VS Randomness
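Ergodicity means that averages taken along one long realization converge to the ensemble values. A minimal sketch, assuming a stationary AR(1) process x_t = φ x_{t-1} + ε_t with |φ| < 1 as the example (our choice, not from the slides). Its theoretical mean is 0 and its variance is 1/(1 − φ²):

```python
import numpy as np

def simulate_ar1(phi, n, seed=0):
    """One realization of x_t = phi * x_{t-1} + eps_t with standard normal eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = eps[0] / np.sqrt(1 - phi**2)   # start from the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x

phi = 0.5
x = simulate_ar1(phi, 200_000)
print(f"time-average mean:     {x.mean():+.4f}  (ensemble value 0)")
print(f"time-average variance: {x.var():.4f}  (ensemble value {1 / (1 - phi**2):.4f})")
```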

• Real-life chaotic processes are neither stationary nor ergodic!
Their statistics have to be constantly monitored since they drift with time.
A nonparametric analysis is needed when the probability distribution of the system is
not normal.
Chaos VS Randomness
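Because the statistics drift, they can be monitored window by window. A minimal sketch that uses non-overlapping blocks and nonparametric summaries (median and interquartile range) in place of mean and standard deviation. The synthetic series is an assumption made for illustration: fat-tailed Student-t noise with a level shift halfway through.

```python
import numpy as np

def block_robust_stats(x, window):
    """Median and interquartile range of consecutive non-overlapping blocks."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // window
    blocks = x[: n_blocks * window].reshape(n_blocks, window)
    medians = np.median(blocks, axis=1)
    q75, q25 = np.percentile(blocks, [75, 25], axis=1)
    return medians, q75 - q25

rng = np.random.default_rng(7)
noise = rng.standard_t(df=3, size=4_000)                 # fat-tailed, non-normal noise
level = np.where(np.arange(4_000) < 2_000, 0.0, 3.0)     # drifting statistics: a level shift
medians, iqrs = block_robust_stats(level + noise, window=500)
print(np.round(medians, 2))
```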

• Astronomy: Three-Body Problem
• Sunspots
• Geology: Earthquakes
• Oceanology: El Niño (Pacific Ocean temperature), tides
• Meteorology: Weather
Chaos in Natural Processes

• Fluid flow: laminar vs. turbulent
• Candle flame
• Quantum chaos
• Biology: Population growth
• Physiology: Arrhythmia, Epilepsy, Diabetes
• DNA code
• Epidemiology: diseases

• Social: fashion trends
• Wars
• Music and speech
• Stock markets, etc.

Chaotic Processes
Three competing paradigms:
Stability
Instability
Sudden and Dramatic Change

Chaotic Systems Properties
What is the pattern?
• Stability: Persistent trends.
• Memory: What happens next depends on prior history.
• Predictable: One can predict while the pattern continues.

• Instability: a “tired trend”, in which an accumulation of small random imbalances,
or of slow systematic imbalances, precedes a large change.
• “Sand pile avalanche model”
• Predictability is lower
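The “sand pile avalanche model” usually refers to the Bak-Tang-Wiesenfeld sandpile. A minimal sketch, assuming its standard rules: a cell holding 4 or more grains topples and passes one grain to each neighbour, and grains that fall off the edge are lost. Grains are dropped one at a time. Most drops cause nothing, while a few trigger large avalanches. Grid size, number of drops, and seed are arbitrary choices.

```python
import random
import statistics

def sandpile(size=15, n_drops=5_000, seed=0):
    """Bak-Tang-Wiesenfeld sandpile; returns the final grid and the avalanche sizes."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    avalanche_sizes = []
    for _ in range(n_drops):
        i, j = rng.randrange(size), rng.randrange(size)
        grid[i][j] += 1                        # drop one grain on a random cell
        topplings = 0
        stack = [(i, j)]
        while stack:
            a, b = stack.pop()
            if grid[a][b] < 4:
                continue
            grid[a][b] -= 4                    # an unstable cell topples ...
            topplings += 1
            for na, nb in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                if 0 <= na < size and 0 <= nb < size:   # ... grains at the edge fall off
                    grid[na][nb] += 1
                    if grid[na][nb] >= 4:
                        stack.append((na, nb))
            if grid[a][b] >= 4:                # still unstable: topple again later
                stack.append((a, b))
        avalanche_sizes.append(topplings)
    return grid, avalanche_sizes

grid, sizes = sandpile()
print("median avalanche:", statistics.median(sizes), " largest:", max(sizes))
```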

• Change: the paradigm changes suddenly, seemingly without warning,
often with a reversal of the trend.
• Fat tails: the change can be much stronger
than expected under the normal (Gaussian)
distribution.
• Black Swan Events
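A quick numerical check of the fat-tail point. It assumes a Student-t distribution with 3 degrees of freedom as an illustrative stand-in for fat-tailed data. Moves beyond four standard deviations, which the normal distribution makes almost impossible, turn out to be far more common:

```python
import math
import numpy as np

def tail_fraction(x, k=4.0):
    """Share of observations more than k standard deviations from the mean."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(np.abs(z) > k))

rng = np.random.default_rng(0)
gaussian = rng.standard_normal(200_000)
fat_tailed = rng.standard_t(df=3, size=200_000)   # Student-t, 3 degrees of freedom

print(f"normal theory P(|Z| > 4) = {math.erfc(4 / math.sqrt(2)):.1e}")
print(f"gaussian sample           = {tail_fraction(gaussian):.1e}")
print(f"fat-tailed (t, df=3)      = {tail_fraction(fat_tailed):.1e}")
```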

• Cycles of varying lengths.
• Periods of quiet followed by big jumps
• Chaotic patterns are predictable, but only in terms of probabilities.

• Measuring Chaos - Statistically
Modeling Chaos

• Mathematical modeling of chaotic systems is difficult:
Tiny changes in parameters can sometimes lead to extreme changes in the outcome.
There is no certainty, only probability.
Modeling Chaos
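This sensitivity can be seen in the logistic map x → r x(1 − x), a standard textbook example of chaos that we use only for illustration (it does not appear in the slides). A minimal sketch: two runs whose parameter r differs only in the tenth decimal place agree at first and then become completely different:

```python
def logistic_trajectory(r, x0, steps):
    """Iterate the logistic map x -> r * x * (1 - x), returning all states."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_trajectory(3.9, 0.2, 200)
b = logistic_trajectory(3.9 + 1e-10, 0.2, 200)   # parameter changed in the 10th decimal
gaps = [abs(p - q) for p, q in zip(a, b)]
for t in (0, 10, 30, 60, 100):
    print(f"step {t:3d}: |difference| = {gaps[t]:.2e}")
```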

• The ubiquity of gradual trends and the rarity of extreme events
resemble the spectral density of a stochastic process of the form
S(f) = 1/f^α
• In this “1/f noise model” the magnitude (power) of the signal (event) is
inversely proportional to its frequency raised to the power α (α ≈ 1 for 1/f noise).
Modeling Chaos

Although 1/f noise is widely present in natural and social time series, the source of
such noise is not well understood.
1/f noise (α = 1) is intermediate between white noise (α = 0), which has no correlation in time, and
random-walk (Brownian motion) noise (α = 2), whose increments are uncorrelated.
In most real chaotic processes, random (white) frequency-independent noise is superimposed
on the 1/f noise.
Modeling Chaos
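Noise with S(f) = 1/f^α can be generated by spectral synthesis. The method draws random Fourier coefficients, scales them so that their power falls off as f^(−α), and transforms back. A minimal sketch under those assumptions; the function names are illustrative. Fitting a line to the log-log periodogram recovers the slope −α for white (α = 0), 1/f (α = 1), and Brownian (α = 2) noise:

```python
import numpy as np

def power_law_noise(n, alpha, seed=0):
    """Noise with power spectrum S(f) ~ 1/f**alpha (spectral synthesis)."""
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n)
    coeffs = rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs))
    scale = np.zeros_like(freqs)
    scale[1:] = freqs[1:] ** (-alpha / 2)   # amplitude ~ f**(-alpha/2), so power ~ f**(-alpha)
    return np.fft.irfft(coeffs * scale, n)  # the f = 0 (mean) term stays zero

def spectral_slope(x):
    """Slope of log power versus log frequency, excluding f = 0."""
    freqs = np.fft.rfftfreq(len(x))[1:]
    power = np.abs(np.fft.rfft(x))[1:] ** 2
    return np.polyfit(np.log(freqs), np.log(power), 1)[0]

for alpha, name in [(0, "white"), (1, "1/f (pink)"), (2, "Brownian")]:
    x = power_law_noise(2**14, alpha)
    print(f"{name:11s} alpha={alpha}: fitted slope {spectral_slope(x):+.2f}")
```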

In a random autoregressive process the autocorrelation function
decays exponentially with the lag.
In a chaotic process it decays more slowly, leaving a small persistent residue: “long
memory”.
Modeling Chaos
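A minimal sketch of the exponential case. It assumes an AR(1) process with φ = 0.8 (an illustrative value), whose theoretical autocorrelation at lag k is φ^k. A long-memory process would instead decay roughly as a power of the lag, which this example does not simulate.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(3)
phi, n = 0.8, 100_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]       # AR(1): x_t = phi * x_{t-1} + eps_t

acf = sample_acf(x, 10)
for k in (1, 2, 5, 10):
    print(f"lag {k:2d}: sample {acf[k]:.3f}   exponential theory phi**k = {phi**k:.3f}")
```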

If one looks at a chaotic process at different degrees of magnification, one
finds that the views are similar. This self-similarity brings us to the subject of fractals.
Self-similarity = power laws = scale invariance
Fractals (Mandelbrot)
Hurst exponent
Scale Invariance

• Chaos and Fractals Connection
Modeling Chaos

• Rescaled Range
• Given a power-law relation f(x) = a x^{-k}
• Scaling the argument x by a constant factor c causes only a
proportionate scaling of the function itself: f(cx) = a (cx)^{-k} = c^{-k} f(x)
Modeling Chaos

• In other words:
Scaling by a constant c simply multiplies the original power-law relation by
the constant c^{-k}, so the relation has the same shape at every scale. Thus “Self-Similarity”.
Modeling Chaos

• “Power Law Signature”: the logarithms of f(x) and x have a linear
relationship with slope −k, i.e. a straight line on the log-log plot.
• Rescaled range: the slope of the analogous log-log line of the rescaled range R/S against the window length n gives the Hurst exponent, H.
Modeling Chaos
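A minimal sketch of rescaled-range (R/S) analysis. It assumes non-overlapping windows whose lengths are powers of two; these window choices and the function names are ours. White-noise increments give H near 1/2, and an over-differenced, mean-reverting series gives H well below 1/2. R/S is known to be biased upward for short windows, so white noise typically comes out slightly above 0.5.

```python
import numpy as np

def rescaled_range(x, window):
    """Average R/S statistic over non-overlapping windows of the given length."""
    x = np.asarray(x, dtype=float)
    values = []
    for i in range(len(x) // window):
        chunk = x[i * window:(i + 1) * window]
        z = np.cumsum(chunk - chunk.mean())   # cumulative deviation from the window mean
        r = z.max() - z.min()                 # range R
        s = chunk.std()                       # standard deviation S
        if s > 0:
            values.append(r / s)
    return np.mean(values)

def hurst_rs(x, min_window=16):
    """Hurst exponent H: slope of log(R/S) versus log(window length)."""
    windows = []
    w = min_window
    while w <= len(x) // 2:
        windows.append(w)
        w *= 2
    rs = [rescaled_range(x, w) for w in windows]
    return np.polyfit(np.log(windows), np.log(rs), 1)[0]

rng = np.random.default_rng(0)
noise = rng.standard_normal(2**14)
h_random = hurst_rs(noise)               # increments of a random walk
h_reverting = hurst_rs(np.diff(noise))   # over-differenced: mean reverting
print(f"random-walk increments: H = {h_random:.2f}")
print(f"mean-reverting series:  H = {h_reverting:.2f}")
```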

• The Hurst exponent can distinguish fractal from random time series and
can reveal long-memory cycles.
Hurst Exponent H

• H = 1/2: random walk
- Brownian motion
- normal distribution
• H < 1/2: mean reverting
- negative feedback
- high noise
- high fractal dimension
Hurst exponent H

• 1/2 < H < 1: chaotic trending (persistent) process
- positive feedback
- less noise
- smaller fractal dimension
- fractional Brownian motion, or 1/f noise
Hurst exponent H

Maximal Lyapunov Exponent
Maximal Lyapunov exponent (MLE) is a measure of sensitivity to
initial conditions, i.e. unpredictability.
Positive MLE: chaos
The inverse of the Lyapunov exponent, 1/MLE, measures predictability: the time scale over which forecasts stay useful.
Large MLE: shorter half-life of the signal, faster loss of predictive
“power”.
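For a one-dimensional map, the MLE can be estimated as the orbit average of log|f′(x)|. A minimal sketch, using the logistic map x → r x(1 − x) as the example system (our choice). At r = 4 the exact value is ln 2 ≈ 0.693. At r = 3.2 the map settles into a period-2 cycle and the exponent is negative.

```python
import math

def logistic_mle(r, x0=0.1234, n=100_000, transient=1_000):
    """Maximal Lyapunov exponent of x -> r x (1 - x), estimated as the
    orbit average of log|f'(x)| = log|r (1 - 2x)|."""
    x = x0
    for _ in range(transient):            # discard the transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(r * (1 - 2 * x)))
        x = r * x * (1 - x)
    return total / n

for r in (3.2, 4.0):
    mle = logistic_mle(r)
    horizon = f"{1 / mle:.2f} iterations" if mle > 0 else "unbounded (not chaotic)"
    print(f"r = {r}: MLE = {mle:+.4f}, 1/MLE = {horizon}")
```

At r = 4, 1/MLE ≈ 1.44 iterations is the time scale over which an initial error grows by a factor of e, which is why useful forecasts of this map last only a few steps.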

• Maximal Lyapunov exponent (MLE) is a measure of sensitivity to
initial conditions, a property of chaos
• Hurst exponent H is a measure of persistence
Maximal Lyapunov Exponent

Fractal time series are good
approximations of chaotic
processes. They are complex
systems that have similar
properties.
Modeling Chaos with Fractals

Modeling Chaos with Fractals
• Fat-tailed probability distribution
• Memory effect: slowly decaying autocorrelation function
• Power spectrum of 1/f type
• Modeled with the fractal dimension and the Hurst parameter
• Global or local self-similarity

Fractal dimension D and Hurst exponent H each characterize the local
irregularity (D) and global persistence (H).
Thus D and H are the fractal analogues of variance and mean, which
are not constant in the chaotic time series.
Fractal Dimension and Hurst Exponent

Fractal Dimension and Hurst Exponent
• For self-affine processes, the local properties are
reflected in the global ones
• For a self-affine surface in n-dimensional space
• D+H=n+1
D: fractal dimension
H: Hurst exponent
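As a worked instance (our own arithmetic, not from the slides), a time series is a process indexed by a single time dimension, so n = 1 and the relation reduces to D = 2 − H. This matches the earlier slides: persistent series have a smaller fractal dimension and mean-reverting ones a higher one.

```latex
D + H = n + 1,\quad n = 1 \;\Rightarrow\; D = 2 - H \\
H = 0.5 \text{ (random walk)} \;\Rightarrow\; D = 1.5 \\
H = 0.8 \text{ (trending)} \;\Rightarrow\; D = 1.2 \\
H = 0.3 \text{ (mean reverting)} \;\Rightarrow\; D = 1.7
```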

Chaos and Fractals Connection
Fractals have self-similar patterns at different scales.
Fractal dimension
Multifractal system: a continuous spectrum of exponents (the singularity
spectrum).

• Random shocks to the process, such as news events. The shocks can
have both temporary and lasting effects.
• A combination of interdependent autoregressive processes, each with
its own statistical properties.
Two Reasons for 1/f Noise
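The second mechanism can be sketched numerically. The sketch assumes independent AR(1) components with correlation times spread evenly on a log scale, each scaled so that its variance is about the same; this is a classic textbook construction, chosen for illustration and not taken from the slides. Each component alone has a Lorentzian spectrum, but their sum is close to 1/f between the shortest and longest correlation times:

```python
import numpy as np

def aggregated_ar1(n, taus, seed=0):
    """Sum of independent AR(1) processes with correlation times `taus`.

    Component j: x_t = phi_j * x_{t-1} + sqrt(1 - phi_j) * eps_t, phi_j = exp(-1/tau_j).
    Its spectrum is roughly tau / (1 + (omega * tau)**2); log-spaced taus make
    the sum roughly 1/f between 1/max(taus) and 1/min(taus).
    """
    rng = np.random.default_rng(seed)
    total = np.zeros(n)
    for tau in taus:
        phi = np.exp(-1.0 / tau)
        eps = rng.standard_normal(n) * np.sqrt(1.0 - phi)
        x = np.empty(n)
        x[0] = eps[0]
        for t in range(1, n):
            x[t] = phi * x[t - 1] + eps[t]
        total += x
    return total

def band_slope(x, f_lo, f_hi):
    """Log-log slope of the periodogram between frequencies f_lo and f_hi."""
    freqs = np.fft.rfftfreq(len(x))
    power = np.abs(np.fft.rfft(x)) ** 2
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return np.polyfit(np.log(freqs[band]), np.log(power[band]), 1)[0]

taus = 2.0 ** np.arange(0, 11)            # correlation times 1, 2, 4, ..., 1024 steps
x = aggregated_ar1(2**16, taus)
slope = band_slope(x, 0.002, 0.05)
print(f"spectral slope in the 1/f band: {slope:+.2f}")
```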

Modeling Chaos: Artificial Intelligence and Machine Learning Approach
Modeling Chaos - AI Approach

Artificial Intelligence
• Machine Learning Purpose: Generalization
• Find the laws within the data
• Predicting change
• Number crunching can reveal hidden laws that are not obvious
to the human eye

Artificial Intelligence Types
Rule-Based AI
Humans create the rules: Expert Systems
The rule-based approach is time-consuming and not very accurate

Supervised learning from examples
The examples must be representative of the entire data set.

Unsupervised learning
Grouping unlabeled data: clustering

Deep learning
Deep learning models high-level abstractions in data by using multiple
processing layers with complex structures.

Deep learning can automatically select the features.
In simpler machine learning, a human has to tell the algorithm which
combination of features to consider.
Deep learning finds the relationships on its own,
with no human involvement in choosing the features.

“Ultra Deep Learning”
The machine has learned so much that it can not only derive the rules but also
detect when the rules change, i.e. detect a change in paradigm.
It combines supervised, unsupervised, and rule-based machine
learning into a more intelligent system.

Steps in Machine Learning
Provide Framework
Mathematical and Programming Tools
Data preparation
Parameter estimation
Give examples to learn from: the input (and in some
methods the output)

Steps in Machine Learning
• Creating a Model (or Models).
• Fitness Function: What to optimize?
• Example: Make more good predictions than bad ones.
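The fitness goal “make more good predictions than bad ones” can be made concrete as a directional hit rate: the fraction of predictions whose sign matches the realized move. A minimal sketch; the function name and the treatment of zero moves are our assumptions. A hit rate above 0.5 means more good predictions than bad ones.

```python
import numpy as np

def hit_rate(predicted, actual):
    """Fraction of predictions whose direction (sign) matches the actual move.

    A zero move counts as a hit only if the prediction is also exactly zero.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.sign(predicted) == np.sign(actual)))

score = hit_rate([0.8, -0.3, 0.5, 0.1], [1.2, -0.4, -0.2, 0.3])
print(score)  # 0.75: three of the four directions are correct
```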

Data Preparation
Data preparation
Convert the generally non-stationary data into more-or-less stationary data
Remove cycles and trends to reduce the uniqueness of each data point
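A minimal sketch of three common ways to make a price series more nearly stationary. Log returns remove the price level, demeaning removes a constant drift, and seasonal differencing removes a cycle of known period. These particular transforms and the synthetic data are our illustrative choices; the slides do not say which transforms the author uses.

```python
import numpy as np

def log_returns(prices):
    """Relative price changes log(p_t / p_{t-1}); removes the price level."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

def demean(x):
    """Remove a constant drift (the mean) from the series."""
    x = np.asarray(x, dtype=float)
    return x - x.mean()

def seasonal_difference(x, period):
    """x_t - x_{t-period}; removes a cycle that repeats every `period` steps."""
    x = np.asarray(x, dtype=float)
    return x[period:] - x[:-period]

rng = np.random.default_rng(1)
steps = 0.0005 + 0.01 * rng.standard_normal(1_000)   # drift + noise
prices = 100 * np.exp(np.cumsum(steps))              # trending, non-stationary prices
r = demean(log_returns(prices))                      # roughly stationary, zero mean

cycle = np.sin(2 * np.pi * np.arange(200) / 20)      # a pure 20-step cycle
print(np.abs(seasonal_difference(cycle, 20)).max())  # close to zero: the cycle is removed
```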

Parameter Estimation
• Parametric OR Nonparametric?
• Parametric model: a fixed number of parameters
• Nonparametric model: no assumptions about the form of the probability distributions of
the variables.
• In a nonparametric model the number of parameters grows with the
amount of training data.
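A minimal sketch of the contrast, assuming bimodal data for which the normal assumption is wrong (an illustrative choice). The parametric Gaussian fit always has two parameters. The nonparametric Gaussian kernel density estimate keeps every data point, so its size grows with the data. The bandwidth rule (Silverman's rule of thumb) is also our choice.

```python
import numpy as np

def gaussian_fit_pdf(data):
    """Parametric: exactly two parameters (mean, std), whatever the data size."""
    mu, sigma = np.mean(data), np.std(data)
    return lambda x: np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def kde_pdf(data):
    """Nonparametric: Gaussian kernel density estimate that keeps every data point."""
    data = np.asarray(data, dtype=float)
    h = 1.06 * data.std() * len(data) ** (-1 / 5)   # Silverman's rule-of-thumb bandwidth
    def pdf(x):
        z = (np.atleast_1d(x)[:, None] - data[None, :]) / h
        return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))
    return pdf

rng = np.random.default_rng(5)
data = np.concatenate([rng.normal(-3, 1, 1000), rng.normal(3, 1, 1000)])  # two modes
param, nonparam = gaussian_fit_pdf(data), kde_pdf(data)
print(f"density at x=0: parametric {param(0.0):.3f}, nonparametric {nonparam(0.0)[0]:.3f}")
```

The Gaussian fit puts its peak at x = 0, where there is almost no data. The kernel estimate follows the two actual modes.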

Creating a Model
“All models are wrong, but some are useful.” –
George E. P. Box

Multivariate time series
Multivariate time series modeling is required when the outcome of
one process depends on other processes.
Examples are systems of interdependent global and local processes,
asset prices, exchange rates, interest rates, and other variables.

Multivariate time series
To create a model, one can combine the available knowledge about the
interrelationships of the processes with unknown parameters in
one or more linear or nonlinear models.
A “fitness” or “error” function is then created that compares the
model with the data.

Machine Learning
The fitness function is improved through machine learning by varying
the parameters of the model. The goal is to maximize the fitness of
the model to the data presented for learning (i.e. minimize the error).
Different models are screened.
Part of the data is held out of the learning cycle to be used for testing.
A successful model should perform adequately on this
test data.
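A minimal sketch of the hold-out idea. It assumes a toy AR(1) series as the data and a one-parameter AR(1) model fitted by least squares; both are illustrative choices, not from the slides. For time series the split is chronological, so the test data lie strictly after the training data:

```python
import numpy as np

def fit_ar1(x):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + noise."""
    return float(np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1]))

rng = np.random.default_rng(11)
true_phi, n = 0.6, 5_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = true_phi * x[t - 1] + eps[t]

split = int(0.8 * n)                   # chronological split: no shuffling in time
train, test = x[:split], x[split:]
phi_hat = fit_ar1(train)               # learning uses the training part only

pred = phi_hat * test[:-1]             # one-step-ahead forecasts on unseen data
mse_model = np.mean((test[1:] - pred) ** 2)
mse_naive = np.mean(test[1:] ** 2)     # baseline: always predict zero
print(f"phi_hat = {phi_hat:.3f}, test MSE model {mse_model:.3f} vs naive {mse_naive:.3f}")
```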

Dimensionality Reduction
• Dimensionality reduction
• Speeds up algorithm execution
• Improves performance
• The fewer the variables, the better the generalization

• Principal Component Analysis is one of
the methods of dimensionality reduction.
• Orthogonally transforms the original data set
into a new set of “principal components”
Dimensionality Reduction Methods
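A minimal sketch of PCA via the singular value decomposition of the centered data. The synthetic data are an assumption made for illustration: two nearly redundant features plus a small noise feature. The first principal component then carries almost all of the variance.

```python
import numpy as np

def pca(X, n_components):
    """Project centered data onto its first principal components (via SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)        # share of variance per component
    components = Vt[:n_components]         # orthonormal directions (rows)
    return Xc @ components.T, components, explained

rng = np.random.default_rng(2)
t = rng.standard_normal(500)
X = np.column_stack([t, 2 * t + 0.1 * rng.standard_normal(500), 0.1 * rng.standard_normal(500)])
scores, comps, ratio = pca(X, n_components=1)
print(np.round(ratio, 3))   # the first component carries almost all the variance
```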

• Methods:
• Low Variance Filter.
• High Correlation Filter.
• Pruning the network.
• Adding and replacing inputs.
• Other methods.
Dimensionality Reduction Methods
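A minimal sketch of the first two filters, assuming plain variance and Pearson correlation as the criteria. The thresholds and the greedy keep-the-first rule are illustrative choices. An almost constant column is dropped by the low variance filter, and a near-duplicate column is dropped by the high correlation filter:

```python
import numpy as np

def low_variance_filter(X, min_var):
    """Indices of columns whose variance is at least min_var."""
    return [j for j in range(X.shape[1]) if X[:, j].var() >= min_var]

def high_correlation_filter(X, columns, max_abs_corr):
    """Greedily keep columns (in order) that are not too correlated with any kept column."""
    kept = []
    for j in columns:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < max_abs_corr for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(4)
a = rng.standard_normal(1_000)
X = np.column_stack([
    a,                                            # 0: informative
    1.0 + 1e-4 * rng.standard_normal(1_000),      # 1: almost constant
    2 * a + 0.01 * rng.standard_normal(1_000),    # 2: near-duplicate of column 0
    rng.standard_normal(1_000),                   # 3: independent
])
cols = low_variance_filter(X, min_var=1e-3)                  # drops column 1
cols = high_correlation_filter(X, cols, max_abs_corr=0.95)   # drops column 2
print(cols)  # [0, 3]
```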

Clustering
• The many examples in the data can be compressed into clusters
according to their similarity with respect to one or more criteria.
• Each data member that belongs to a cluster is associated with a
number from 0 to 1 that shows its degree of belonging.
• Each data member can also belong to multiple clusters, with a
specific degree of belonging to each.
• Clustering can be a goal in itself, or part of a general model that
includes the behavior of clusters as a whole.
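Degrees of belonging between 0 and 1 are what fuzzy c-means computes. It is one standard algorithm of this kind, chosen here for illustration, since the slides do not name one. A minimal sketch with the common fuzzifier m = 2 and two synthetic, well-separated groups (both assumptions):

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, seed=0):
    """Fuzzy c-means: every point gets a membership in [0, 1] for every cluster.

    m > 1 is the 'fuzzifier'; larger m gives softer memberships.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)                   # memberships of a point sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]    # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)        # u_ik = 1 / sum_j (d_ik/d_jk)^(2/(m-1))
    return centers, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.5, (100, 2)), rng.normal([5, 5], 0.5, (100, 2))])
centers, U = fuzzy_c_means(X, n_clusters=2)
print(np.round(centers, 2))
print(np.round(U[:3], 3))   # points of the first group belong almost entirely to one cluster
```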

Time Constraint
• A <insert favorite programming language> programmer knows the
value of everything, but the cost of nothing. -- Alan J. Perlis

Time Constraint
• Some problems are insoluble or too complex to be completely solved
in reasonable time.
• Compromises are necessary, e.g. speed vs precision vs generality
• Time complexity (big O notation) of an algorithm quantifies the
amount of time taken by an algorithm to run as a function of the
length of the string representing the input.

Time Complexity (Big O Notation)

Choice of Algorithm
• Which Algorithm?
• Depends on the task
• Depends on the time available
• Depends on the precision required

Local and Global Minimum
[Figure: downhill gradient searching vs. uphill searching; source: accp1.org/pharmacometrics/theory.htm]

Local Search Algorithms
• Local search methods:
• steepest descent or best-first criterion
• stochastic search
• simulated annealing
• genetic selection
• others
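Steepest descent only ever moves downhill, so where it ends depends on where it starts. A minimal sketch on a hypothetical one-dimensional objective f(x) = x^4 − 3x^2 + x. This function has a local minimum near x ≈ 1.13 and the global minimum near x ≈ −1.30. The step size and number of steps are illustrative choices.

```python
def f(x):
    """Toy objective: local minimum near x = 1.13, global minimum near x = -1.30."""
    return x**4 - 3 * x**2 + x

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def steepest_descent(x, lr=0.01, steps=2_000):
    """Repeatedly step downhill along the negative gradient."""
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

for start in (2.0, -2.0):
    x_min = steepest_descent(start)
    print(f"start {start:+.1f} -> x = {x_min:+.3f}, f(x) = {f(x_min):+.3f}")
```

Starting at +2 the search is trapped in the local minimum; only the start at −2 reaches the global one.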

A random move altering the state
Assess the fitness of the new state
Compare the fitness to the previous state
Decide whether to accept the new solution or reject it.
Repeat until you have converged on an acceptable answer
Simulated Annealing
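The steps above can be sketched on a hypothetical toy objective f(x) = x^4 − 3x^2 + x, which has a local minimum near 1.13 and the global one near −1.30. The Gaussian move size, geometric cooling schedule, and fixed number of steps (in place of a convergence test) are all assumptions. Worse states are accepted with probability exp(−Δ/T), which lets the search escape the local minimum while the temperature T is still high.

```python
import math
import random

def f(x):
    return x**4 - 3 * x**2 + x        # local min near 1.13, global min near -1.30

def simulated_annealing(x, t0=2.0, cooling=0.995, steps=5_000, step_size=1.0, seed=0):
    rng = random.Random(seed)
    fx = f(x)
    best_x, best_f = x, fx
    temp = t0
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, step_size)   # random move altering the state
        fc = f(candidate)                           # assess the fitness of the new state
        delta = fc - fx                             # compare with the previous state
        # accept improvements always, worse states with probability exp(-delta / temp)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling                             # cool down
    return best_x, best_f

best_x, best_f = simulated_annealing(2.0)          # start inside the local-minimum basin
print(f"best x = {best_x:+.3f}, f = {best_f:+.3f}")
```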

Global Search Algorithms
• Stochastic optimization
• Uphill searching
• Basin hopping

[Figure source: accp1.org/pharmacometrics/theory.htm]
Local and Global Minimum

Basin Hopping
The algorithm is iterative, with each cycle composed of the following
steps:
Random perturbation of the coordinates
Local minimization
Accept or reject the new coordinates based on the minimized
function value
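A minimal sketch of the cycle, using the same hypothetical toy objective as the local-search example and clipped gradient descent as the local minimizer (both assumptions). This sketch accepts a new minimum only if it is lower. SciPy's full implementation, `scipy.optimize.basinhopping`, uses a Metropolis test instead, which can occasionally accept a higher minimum.

```python
import random

def f(x):
    return x**4 - 3 * x**2 + x        # local min near 1.13, global min near -1.30

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def local_minimize(x, lr=0.01, steps=1_000):
    """Gradient descent; each step is clipped to 0.5 to stay stable far from the minima."""
    for _ in range(steps):
        x -= max(-0.5, min(0.5, lr * grad_f(x)))
    return x

def basin_hopping(x, hops=50, jump=1.0, seed=0):
    rng = random.Random(seed)
    x = local_minimize(x)
    fx = f(x)
    for _ in range(hops):
        trial = local_minimize(x + rng.gauss(0.0, jump))  # random perturbation + local minimization
        f_trial = f(trial)
        if f_trial < fx:                                  # accept only a lower minimum
            x, fx = trial, f_trial
    return x, fx

x_best, f_best = basin_hopping(2.0)
print(f"x = {x_best:+.3f}, f(x) = {f_best:+.3f}")
```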

Genetic Algorithms
• Many solutions are in the pool, some good, some not so good.
• Each solution is analogous to a chromosome in genetics

Genetic Algorithms
• Ways to improve the gene pool:
• Crossover (combination):
• Combine parts of two or more solutions in the hope of producing a
better solution.
• Mutation:
• Modify a solution in random places in the hope of producing a
better solution.
• Seeding:
• Import a solution from a similar problem.
• Selection:
• Survival of the fittest
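A minimal sketch of the gene-pool loop on a toy problem. It assumes bit-string chromosomes and the “OneMax” fitness (the number of 1-bits), standard textbook choices rather than the author's setup. Tournament selection, one-point crossover, bit-flip mutation, and keeping the best solution unchanged (elitism) are likewise illustrative choices.

```python
import random

def one_max(bits):
    """Toy fitness: number of 1-bits (the optimum is all ones)."""
    return sum(bits)

def genetic_algorithm(n_bits=30, pop_size=50, generations=100, p_mut=None, seed=0):
    rng = random.Random(seed)
    p_mut = 1.0 / n_bits if p_mut is None else p_mut
    pool = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def select():
        # Selection: tournament of 3, the fittest survives
        return max(rng.sample(pool, 3), key=one_max)

    for _ in range(generations):
        new_pool = [max(pool, key=one_max)]              # elitism: keep the best solution
        while len(new_pool) < pop_size:
            a, b = select(), select()
            cut = rng.randrange(1, n_bits)               # one-point crossover of two parents
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # mutation
            new_pool.append(child)
        pool = new_pool
    return max(pool, key=one_max)

best = genetic_algorithm()
print(one_max(best), "of 30 bits set")
```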

[Figure: genetic algorithm cycle: gene pool → reproduce → mutate → select]

I Know First Predictive Algorithm
• Most financial time series exhibit classical chaotic behavior. Chaos
theory, together with the classification and predictive capabilities of
machine learning, has been applied to forecasting such time
series.
• This artificial intelligence approach is at the root of the I Know First
predictive algorithm.

I Know First Predictive Algorithm
The following slides present the method and the
results of applying the algorithm to learn a
database of historical time series data.

The I Know First Algorithm
The results are constantly improving as the algorithm learns from its
successes and failures.
• Tracks and predicts the flow of money from one market or investment channel to another
• The system is a predictive model based on Artificial Intelligence (AI) and Machine Learning, and incorporates elements of Artificial Neural Networks and Genetic Algorithms
• I Know First predicts 2000 markets every day

Synopsis of the Algorithm
The results are constantly improving as the algorithm learns
from its successes and failures

Two indicators:
• Signal: predicted movement of the asset
• Predictability Indicator: historical correlation between the prediction and the actual market movement
Daily Market Heat Map
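The slides describe the Predictability Indicator as the historical correlation between prediction and actual movement but give no formula. A minimal sketch, assuming plain Pearson correlation over a window of past forecasts. The synthetic data are invented for illustration, and this is not presented as I Know First's actual computation.

```python
import numpy as np

def predictability_indicator(predicted, actual):
    """Pearson correlation between past predictions and the realized moves."""
    return float(np.corrcoef(predicted, actual)[0, 1])

rng = np.random.default_rng(8)
actual = rng.standard_normal(250)                  # e.g. one year of daily moves
informative = actual + rng.standard_normal(250)    # signal = move + forecast error (theory: r = 1/sqrt(2))
uninformative = rng.standard_normal(250)           # signal unrelated to the move (theory: r = 0)
print(f"informative signal:   {predictability_indicator(informative, actual):.2f}")
print(f"uninformative signal: {predictability_indicator(uninformative, actual):.2f}")
```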

XOMA returned 61.45% in 1 month from this forecast

I Know First beats the S&P 500 by 96.4%
I Know First Live Portfolio 2015 Performance
The Performance

I Know First beats the S&P 500 by 20.8%
The Performance

Main Features of the Algorithm
• Identifies the best market opportunities daily
• 6 time frames
• Tracks over 3,000 markets
• Self-learning
• Adaptable
• Always learning new patterns
• Scalable
• A Decision Support System (DSS)
• Predictability Indicator
• Strong historical performance: 60.66% gain in 2013
The algorithm becomes more and more accurate with every prediction as it constantly tests multiple models in different market circumstances.

More Applications Of I Know First Algorithm
• Time series forecasting of multidimensional chaotic systems.
• What-if (scenario-based) forecasting.
