Machine learning techniques can be applied to high-frequency trading by developing predictive models from large datasets that capture market microstructure features at fine granularity. However, this is challenging because of a lack of understanding of how low-level data relates to trading outcomes and a lack of intuition about how order book distributions affect prices. The study compares various machine learning strategies applied to data from the Bloomberg Terminal to design an effective high-frequency trading strategy.
3. What is High Frequency Trading?
4. Mandatory Disclaimer
“All characters and events depicted in this project are entirely modeled. Any similarity to actual events or stock price movements is purely awesome.”
5. Using computer algorithms to rapidly trade securities
● Positions are held for seconds to minutes
● Reaction times to market changes are sub-millisecond
● HFT accounts for more than 60% of all trading volume in some markets
● Because markets are efficient, it is challenging to come up with robust predictive models
● Dataset columns include timestamp, price, order-flag, size, etc.
6. Challenges and Scope
◉ Machine learning faces special challenges here because of the very fine granularity of the data.
◉ There is a lack of understanding of how such low-level data relates to actionable circumstances (such as profitably buying or selling shares, optimally executing a large order, etc.).
◉ There are no prior intuitions about how (say) the distribution of liquidity in the order book relates to future price movements, if at all.
SCOPE: a comparative study of various ML strategies and their performance on the data obtained through the Bloomberg Terminal, followed by the design of a successful strategy to operate in such a scenario.
10. A PICTURE IS WORTH A THOUSAND WORDS
A complex idea can be conveyed with just a single still image, which makes it possible to absorb large amounts of data quickly.
12. Holt-Winters Model
Exponential smoothing applies as many as three low-pass filters with exponential window functions (level, trend and seasonality in the Holt-Winters form). It was used to predict the output for the next tick, and the MSE was found to be 253.9.
[Figure: output curve]
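To make the next-tick forecast and its MSE concrete, here is a minimal sketch of the simplest member of this family: simple exponential smoothing, which uses a single low-pass filter (the level only). The function names, the smoothing factor `alpha = 0.5` and the price list are illustrative assumptions, not the values used in the study. The full Holt-Winters form would add analogous filters for trend and seasonality.

```python
def ses_forecasts(series, alpha):
    """One-step-ahead forecasts from simple exponential smoothing.

    Returns (forecasts, next_forecast): forecasts[i] is the prediction
    for series[i + 1]; next_forecast is for the first unseen tick.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]                  # initialise the level at the first tick
    forecasts = []
    for observed in series[1:]:
        forecasts.append(level)        # forecast is made before seeing the tick
        level = alpha * observed + (1.0 - alpha) * level
    return forecasts, level


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(predicted)


prices = [100.0, 101.0, 100.5, 102.0, 101.5]   # made-up ticks
forecasts, next_tick = ses_forecasts(prices, alpha=0.5)
mse = mean_squared_error(prices[1:], forecasts)
print(forecasts, next_tick, mse)
```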
13. Feed Forward Model
A feed-forward neural network is an artificial neural network in which connections between the units do not form a cycle. Here it was used to predict the output for the next tick, and the MSE was found to be 144.67.
[Figure: output curve]
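A rough sketch of what "no cycles" means in practice: values flow from inputs to one hidden layer to a single output, and training pushes the error back along the same path. The network size, the tanh activation, the learning rate, the lag window of three and the sample data are all assumptions made for illustration; the architecture actually used in the study is not stated.

```python
import math
import random


class TinyFeedForward:
    """One hidden tanh layer and a linear output; no recurrent connections."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train_step(self, x, target, lr):
        """One gradient-descent step on the loss 0.5 * (prediction - target)^2."""
        h = self._hidden(x)
        err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - target
        for j, hj in enumerate(h):
            grad_pre = err * self.w2[j] * (1.0 - hj * hj)  # back through tanh
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_pre
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_pre * xi
        self.b2 -= lr * err
        return 0.5 * err * err


def make_windows(series, lags):
    """Pair each value with the `lags` values that precede it."""
    return [(series[i - lags:i], series[i]) for i in range(lags, len(series))]


changes = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, -0.15, 0.1, 0.05]  # made-up
samples = make_windows(changes, lags=3)
net = TinyFeedForward(n_inputs=3, n_hidden=4)
for _ in range(200):
    for x, y in samples:
        net.train_step(x, y, lr=0.05)
print(net.predict(changes[-3:]))   # forecast for the next, unseen change
```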
14. News Sentiment Analysis
The Stanford Parser was used to get the dependency graph for each sentence.
Sentiment was judged using SentiWordNet.
[Figure: output graph]
15. Black-Scholes Model
● Stock prices follow geometric Brownian motion, i.e. prices are lognormally distributed (log returns are normal) with constant drift and volatility
● For the one-period binomial model, the call price under the risk-neutral probability measure Q in an arbitrage-free market is
● The call price C(t,x) satisfies the equation
● Solving the above equation gives us
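For reference, the three expressions the bullets point to have the following standard textbook forms. The symbols are assumptions, since the source does not define them: strike K, maturity T, risk-free rate r, volatility σ, binomial up and down factors u and d, and the standard normal distribution function Φ.

```latex
% One-period binomial model: discounted risk-neutral expectation of the payoff
C_0 = \frac{1}{1+r}\,\mathbb{E}^{Q}\!\left[(S_1 - K)^{+}\right]
    = \frac{1}{1+r}\left[q\,(uS_0 - K)^{+} + (1-q)\,(dS_0 - K)^{+}\right],
\qquad q = \frac{1 + r - d}{u - d}

% Black-Scholes partial differential equation with terminal condition
\frac{\partial C}{\partial t} + r x \frac{\partial C}{\partial x}
  + \frac{1}{2}\sigma^{2} x^{2} \frac{\partial^{2} C}{\partial x^{2}} - rC = 0,
\qquad C(T,x) = (x - K)^{+}

% Closed-form solution
C(t,x) = x\,\Phi(d_1) - K e^{-r(T-t)}\,\Phi(d_2),
\qquad d_{1,2} = \frac{\ln(x/K) + \left(r \pm \tfrac{1}{2}\sigma^{2}\right)(T-t)}{\sigma\sqrt{T-t}}
```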
16. Limitations of Black-Scholes R Simulation
● Log returns of stocks do not always follow a normal distribution, as shown in the graph below.
● The assumptions of a fixed risk-free rate, no dividends and fixed volatility are not always valid.
● The market is not complete because of transaction costs, i.e. it is not always possible to choose a suitable hedging portfolio (aH,bH) in risky and risk-free assets.
● The out-of-the-money performance of the BS model is not as good as its in-the-money performance.
● The model yields a fixed no-arbitrage price for any option on the stock.
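These limitations are easier to see against a concrete pricer. The sketch below is the standard closed-form Black-Scholes call price, written in Python rather than the R used for the original simulation. The rate and volatility enter as single fixed numbers, which is exactly the assumption questioned above. Function and parameter names are illustrative.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call_price(spot, strike, rate, sigma, tau):
    """European call price; tau is time to maturity in years."""
    if tau <= 0.0:
        return max(spot - strike, 0.0)          # payoff at expiry
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * tau) / (
        sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * tau) * norm_cdf(d2)


# At-the-money example with made-up inputs.
price = bs_call_price(spot=100.0, strike=100.0, rate=0.05, sigma=0.2, tau=1.0)
print(round(price, 4))
```

Because `sigma` and `rate` are constants for the whole life of the option, the function returns one price per contract, whatever the market regime.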
17. Markov Diffusion Model
● The drift and volatility are finite-state Markov chains in continuous time.
● For example, assume drift and volatility are both 2-state processes. Then the stock price follows a Markov switching diffusion model:
● The arbitrage-free European call prices for maturity T and strike price K are then two solutions (one per state):
18. Markov Diffusion Model R Simulation
• If drift and volatility are the same in both states, this model reduces to the Black-Scholes model.
• The market is not complete because of the state transitions.
• [Figure: simulation output]
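A small simulation illustrates the first bullet. This is a Python sketch, not the original R code. It assumes the usual regime-switching form in which the price follows geometric Brownian motion with the drift and volatility of the current state, and it approximates the continuous-time chain by switching state with probability rate × dt in each small step. All parameter values are made up.

```python
import math
import random


def simulate_switching_gbm(s0, mus, sigmas, switch_rates, horizon, n_steps, seed=0):
    """Simulate a 2-state Markov switching diffusion on a time grid.

    mus, sigmas, switch_rates are pairs indexed by state (0 or 1);
    switch_rates[i] is the rate of leaving state i.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    state, price = 0, s0
    prices, states = [price], [state]
    for _ in range(n_steps):
        # First-order approximation: P(leave state in dt) ~ rate * dt.
        if rng.random() < switch_rates[state] * dt:
            state = 1 - state
        mu, sigma = mus[state], sigmas[state]
        shock = rng.gauss(0.0, 1.0)
        price *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * shock)
        prices.append(price)
        states.append(state)
    return prices, states


# Two distinct regimes: calm growth versus a volatile decline.
path, regimes = simulate_switching_gbm(
    100.0, mus=(0.08, -0.05), sigmas=(0.15, 0.40),
    switch_rates=(2.0, 4.0), horizon=1.0, n_steps=250)

# Identical regimes: switching has no effect, so the path is plain GBM.
same, _ = simulate_switching_gbm(100.0, (0.05, 0.05), (0.2, 0.2), (5.0, 5.0), 1.0, 250)
plain, _ = simulate_switching_gbm(100.0, (0.05, 0.05), (0.2, 0.2), (0.0, 0.0), 1.0, 250)
print(same == plain)
```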
20. Jump Diffusion Process
• Jumps should occur instantaneously, ruling out the possibility of a delta hedge
• The probability of a jump occurring in a particular time interval should be approximately proportional to the length of that interval
• Future jumps should have no "memory" of past jumps
• The extra parameters ν and m represent the standard deviation of the lognormal jump process and the scale factor for jump intensity, respectively.
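The properties above can be sketched as a simulation. The code assumes a Merton-style model: a geometric Brownian motion plus jumps that arrive independently in each small step with probability jump_rate × dt (proportional to the interval length, with no memory). Log jump sizes are assumed normal with standard deviation `nu` and mean chosen so that the average jump multiplier equals `m`. The base intensity `jump_rate` and all numeric values are assumptions, and no drift compensation for the jumps is applied.

```python
import math
import random


def simulate_jump_diffusion(s0, mu, sigma, jump_rate, m, nu, horizon, n_steps, seed=0):
    """Return (path, n_jumps) for a jump-diffusion price path on a time grid."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    log_price = math.log(s0)
    path, n_jumps = [s0], 0
    for _ in range(n_steps):
        # Continuous diffusion part.
        log_price += (mu - 0.5 * sigma ** 2) * dt
        log_price += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Instantaneous jump; each step is independent of earlier jumps.
        if rng.random() < jump_rate * dt:
            log_price += rng.gauss(math.log(m) - 0.5 * nu ** 2, nu)
            n_jumps += 1
        path.append(math.exp(log_price))
    return path, n_jumps


path, n_jumps = simulate_jump_diffusion(
    100.0, mu=0.05, sigma=0.2, jump_rate=3.0, m=0.95, nu=0.1,
    horizon=1.0, n_steps=250)
print(n_jumps, path[-1])
```

Because each step draws its jump independently, the simulated returns have independent increments, which is the property the result discussion on the next slide blames for the high RMSE.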
21. Jump Diffusion Result Discussion
The RMSE is high (26.7) because the jump-diffusion model cannot incorporate a possible dependence structure among asset returns (the so-called “volatility clustering effect”): the model assumes independent increments.
[Figure: jump-diffusion output plot]
23. Support Vector Regression
• Motivation: find a function that predicts the dependent variable (Y) from the independent variables (X) such that the deviation between the predicted and actual values is at most ε, the error that is tolerated
• Methodology: kernels used: polynomial, radial basis function (RBF)
• Advantage: traditional methods use only stock price, strike price and time to maturity. SVR can also capture the effect of the risk-free interest rate and volatility, which vary with time
• Conclusion: decent RMSE (19.56) compared to the parametric methods
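The two ingredients named above can be written down directly: the ε-insensitive loss, which ignores errors smaller than ε, and the RBF kernel used to build the regression function. This is a sketch of the building blocks only, not a trained model. The function names, `gamma`, and the support vectors and coefficients in the usage lines are hypothetical.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def rbf_kernel(x, z, gamma):
    """exp(-gamma * ||x - z||^2) for two equal-length feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svr_predict(support_vectors, coefficients, bias, x, gamma):
    """Kernel expansion f(x) = sum_i coef_i * K(sv_i, x) + bias."""
    return bias + sum(c * rbf_kernel(sv, x, gamma)
                      for sv, c in zip(support_vectors, coefficients))


print(epsilon_insensitive_loss(10.0, 10.3, epsilon=0.5))   # inside the tube
print(svr_predict([[1.0, 2.0]], [2.0], 1.0, [1.0, 2.0], gamma=0.5))
```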
24. Symmetrized Nearest Neighbour
• Semi-parametric estimation of liquidity effects on option pricing
• When the nonparametric volatility function is estimated in moneyness intervals where data is relatively scarce, a multivariate kernel based on a global smoothing parameter may lead to poor estimates
• When estimating at one point, the weights for the remaining observations are computed from the distance between the values of the empirical distribution at each point rather than the distance between the points themselves
• The empirical distribution changes the random design to a uniform design with knots uniformly spaced between zero and one
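The weighting idea in the last two bullets can be sketched for a single regressor. Each observation is weighted by how far its empirical-distribution value lies from that of the evaluation point, so sparse regions of the data are not starved of neighbours. The Epanechnikov-type kernel, the bandwidth and the data are assumptions; the study's multivariate semi-parametric estimator is more involved.

```python
def empirical_cdf(sample, x):
    """Fraction of the sample that is less than or equal to x."""
    return sum(1 for value in sample if value <= x) / len(sample)


def snn_estimate(xs, ys, x0, bandwidth):
    """Symmetrized nearest-neighbour estimate of E[y | x = x0]."""
    f0 = empirical_cdf(xs, x0)
    weighted_sum = weight_total = 0.0
    for xi, yi in zip(xs, ys):
        # Distance is measured between CDF values, not between the points.
        u = (f0 - empirical_cdf(xs, xi)) / bandwidth
        weight = max(0.0, 1.0 - u * u)      # Epanechnikov shape, unnormalised
        weighted_sum += weight * yi
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("bandwidth too small: no observation gets weight")
    return weighted_sum / weight_total


moneyness = [1.0, 2.0, 3.0, 4.0, 5.0]        # made-up regressor values
implied_vol = [10.0, 20.0, 30.0, 40.0, 50.0]  # made-up responses
print(snn_estimate(moneyness, implied_vol, x0=3.0, bandwidth=0.3))
```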
25. Symmetrized Nearest Neighbour Regression Results
Inputs:
• Moneyness
• Bid-ask spread
• Days until Expiration (T)
• The estimated volatility was given as input to the BS model
Conclusion: the in-sample performance of the model is better than that of a competing model without liquidity. However, the out-of-sample performance is quite disappointing, resulting in a high MSE.
26. K Nearest Neighbour Implementation and Results
• Non-parametric algorithm that can be used for either classification or regression.
• For each data point, the algorithm finds the k closest observations and assigns the data point to the majority class among them.
• The data set was encoded so that the KNN classifier could be fitted to it: each tick was compared to the previous tick and encoded as 1 if the value had increased, 0 otherwise
• Accuracy was checked and a corresponding confusion matrix was created
• The highest accuracy was found for K=5
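The steps above can be sketched as follows: encode each tick as up (1) or not (0), classify a query by majority vote among its k nearest neighbours, and tabulate a 2×2 confusion matrix. The squared Euclidean distance and the toy feature vectors are assumptions, since the source does not state which features were fed to the classifier.

```python
from collections import Counter


def encode_moves(prices):
    """1 if the tick rose relative to the previous tick, 0 otherwise."""
    return [1 if current > previous else 0
            for previous, current in zip(prices, prices[1:])]


def knn_predict(train_x, train_y, query, k):
    """Majority label among the k training points closest to the query."""
    by_distance = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def confusion_matrix(actual, predicted):
    """matrix[a][p] counts samples with actual label a and predicted label p."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


print(encode_moves([1.0, 2.0, 2.0, 1.0, 3.0]))
features = [[0.0], [1.0], [2.0], [10.0], [11.0]]   # toy one-dimensional features
labels = [0, 0, 0, 1, 1]
print(knn_predict(features, labels, [10.5], k=3))
```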
27. Artificial Neural Network : Conjugate Gradient
MOTIVATION :-
• Results suggest that machine learning, using several Multilayer Perceptron models, can be a basis for effective option investment strategies.
• Most existing models in finance for predicting the price of an option revolve around the Black-Scholes model; however, these models tend to involve highly complex mathematics and often make many assumptions about the underlying characteristics of the market.
ADVANTAGES:-
• Machine Learning techniques do not make any implicit assumptions about the relationships
between input variables. Using an Artificial Neural Network, we are able to let the learner
discover relationships that may not be included in standard models like Black-Scholes.
28. Artificial Neural Network : Conjugate Gradient (Cont…)
METHODOLOGY :-
From the data obtained from the Bloomberg Terminal, we extracted the historical values of the following attributes.
• Strike Price (X)
• Underlying Stock Price (S)
• 10-day Historical Volatility
• 30-day Historical Volatility
• Days until Expiration (T)
• The Market Price of the Option (P)
• The 10 previous days of stock prices
• Risk-free rate
• Expiration Price of Stock (E)
Apply ANN using these parameters
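Of the inputs listed, the two historical volatilities are derived rather than read directly from the price feed. A common way to compute an n-day historical volatility is the sample standard deviation of the last n daily log returns, annualised. The sketch below follows that convention; the 252-day annualisation factor and the function name are assumptions, since the source does not say how its volatilities were computed.

```python
import math


def historical_volatility(prices, window, periods_per_year=252):
    """Annualised standard deviation of the last `window` log returns."""
    if window < 2 or len(prices) < window + 1:
        raise ValueError("need window >= 2 and at least window + 1 prices")
    recent = prices[-(window + 1):]
    log_returns = [math.log(b / a) for a, b in zip(recent, recent[1:])]
    mean = sum(log_returns) / window
    variance = sum((r - mean) ** 2 for r in log_returns) / (window - 1)
    return math.sqrt(variance * periods_per_year)


closes = [100.0, 101.0, 99.5, 100.2, 102.0, 101.1,
          100.4, 101.8, 103.0, 102.2, 102.9]        # made-up daily closes
print(historical_volatility(closes, window=10))
```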
29. Artificial Neural Network : Conjugate Gradient (Cont…)
CONCLUSIONS :-
Using an Artificial Neural Network on the test data for option pricing gives a more accurate result than the other base learner models.
For further improvement, the following parameters can also be added as inputs to the model:
• Measures of Volatility
• Previous Stock Prices
Although ANN training with Conjugate Gradient is a well-known and widely used machine learning method, it does not give a very accurate result here, as Conjugate Gradient does not include the damping used in damped least-squares methods.
For this reason we switch to Levenberg-Marquardt optimization of the neural network, which addresses this problem.
30. Artificial Neural Network : Levenberg-Marquardt
• Mainly used to solve non-linear least squares problems.
• Finds a local minimum, which is not necessarily the global minimum.
• LMA interpolates between the Gauss-Newton Algorithm (GNA) and the method of gradient descent.
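The interpolation in the last bullet comes from a single damping term. Each step solves (JᵀJ + λI)δ = Jᵀr. A small λ gives a Gauss-Newton step; a large λ gives a short gradient-descent-like step. The sketch below applies this to a two-parameter curve fit, y = a·exp(b·x), rather than to a neural network, to keep it short. The model, starting point, data and the factor-of-ten damping schedule are illustrative assumptions.

```python
import math


def model(x, a, b):
    return a * math.exp(b * x)


def sum_squares(xs, ys, a, b):
    try:
        return sum((y - model(x, a, b)) ** 2 for x, y in zip(xs, ys))
    except OverflowError:              # a wild trial step; treat it as worse
        return math.inf


def levenberg_marquardt(xs, ys, a, b, damping=1e-3, max_iter=200):
    sse = sum_squares(xs, ys, a, b)
    for _ in range(max_iter):
        # Accumulate J^T J and J^T r for residuals r = y - f(x; a, b).
        jtj_aa = jtj_ab = jtj_bb = jtr_a = jtr_b = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            ja, jb = e, a * x * e      # df/da and df/db
            r = y - a * e
            jtj_aa += ja * ja
            jtj_ab += ja * jb
            jtj_bb += jb * jb
            jtr_a += ja * r
            jtr_b += jb * r
        # Solve (J^T J + damping * I) delta = J^T r by Cramer's rule.
        m_aa, m_bb = jtj_aa + damping, jtj_bb + damping
        det = m_aa * m_bb - jtj_ab * jtj_ab
        da = (m_bb * jtr_a - jtj_ab * jtr_b) / det
        db = (m_aa * jtr_b - jtj_ab * jtr_a) / det
        if abs(da) + abs(db) < 1e-12:
            break                      # step is negligible: converged
        trial = sum_squares(xs, ys, a + da, b + db)
        if trial < sse:                # accept; move towards Gauss-Newton
            a, b, sse = a + da, b + db, trial
            damping /= 10.0
        else:                          # reject; move towards gradient descent
            damping *= 10.0
    return a, b, sse


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [model(x, 2.0, 0.5) for x in xs]   # noise-free synthetic data
a_hat, b_hat, final_sse = levenberg_marquardt(xs, ys, a=1.5, b=0.3)
print(a_hat, b_hat, final_sse)
```

The accept-or-reject rule means the error never increases, but the point it settles on depends on the starting values, which is the local-minimum caveat above.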
33. DISCRETE TRADING MODEL
◉ Trading is possible at N discrete times
◉ No interest on cash position
◉ A trading strategy is given by (x_i), i = 0..N+1, where
x_k = number of units held at t = k (i.e. we sell n_k = x_k − x_{k+1} at price S_k)
◉ Boundary conditions: x_0 = X and x_{N+1} = 0
◉ Price dynamics:
Exogenous: Arithmetic Random Walk
S_k = S_{k-1} + (ξ_k + …), k = 1..N
with ξ_k ~ N(0,1) i.i.d.
Endogenous: Market Impact
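A minimal sketch of the bookkeeping in this model: holdings run from x_0 = X down to x_{N+1} = 0, the amount sold at time k is n_k = x_k − x_{k+1}, and each sale is executed at S_k. Two things are assumptions because the slide does not give them: the equal-slice schedule, and a price that moves by a plain standard normal step with no scale, drift or market impact term.

```python
import random


def linear_schedule(total_units, n):
    """Holdings x_0..x_{N+1}, falling linearly from X to 0."""
    return [total_units * (n + 1 - k) / (n + 1) for k in range(n + 2)]


def simulate_prices(s0, n, seed=0):
    """Arithmetic random walk S_0..S_N with i.i.d. N(0, 1) steps."""
    rng = random.Random(seed)
    prices = [s0]
    for _ in range(n):
        prices.append(prices[-1] + rng.gauss(0.0, 1.0))
    return prices


def liquidation_revenue(holdings, prices):
    """Sum of n_k * S_k with n_k = x_k - x_{k+1}, for k = 0..N."""
    sales = [holdings[k] - holdings[k + 1] for k in range(len(holdings) - 1)]
    return sum(n_k * s_k for n_k, s_k in zip(sales, prices))


holdings = linear_schedule(1000.0, n=4)
prices = simulate_prices(50.0, n=4)
print(holdings, liquidation_revenue(holdings, prices))
```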
34. HIDDEN MARKOV MODEL
◉ Decide the “hidden” states: up trend, mean reverting, down trend
UP: μ >> 0
MEAN REVERTING: μ ≈ 0
DOWN: μ << 0
Transition probabilities between the three states:
p11 p12 p13
p21 p22 p23
p31 p32 p33
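The state diagram can be sketched as a transition matrix plus one emission rule. In the sketch, the hidden regime follows the matrix and the observed return is drawn around that regime's drift μ. Every number here (transition probabilities, drifts, noise level) and the ordering of the states are made up for illustration; in practice they would be estimated from data.

```python
import random

STATES = ("up", "mean_reverting", "down")

# TRANSITION[i][j] = probability of moving from state i to state j.
TRANSITION = [
    [0.90, 0.08, 0.02],
    [0.05, 0.90, 0.05],
    [0.02, 0.08, 0.90],
]

DRIFT = {"up": 0.001, "mean_reverting": 0.0, "down": -0.001}


def simulate_hmm(n_steps, noise_std=0.002, seed=0):
    """Return (hidden_states, observed_returns) for n_steps ticks."""
    rng = random.Random(seed)
    state = 1                                   # start in the mean-reverting regime
    hidden, observed = [], []
    for _ in range(n_steps):
        state = rng.choices(range(3), weights=TRANSITION[state])[0]
        hidden.append(STATES[state])
        observed.append(rng.gauss(DRIFT[STATES[state]], noise_std))
    return hidden, observed


hidden, observed = simulate_hmm(100)
print(hidden[:5], observed[:5])
```

Only `observed` would be visible to a trader; inferring `hidden` from it is the job of the HMM.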
35. TECHNICAL INDICATORS
BOLLINGER BANDS & STOCHASTIC OSCILLATORS
Stochastic Oscillator
◉ Measures the deviation of a currency pair’s rate (price) from its normal levels
◉ Indicates when a currency pair is overbought or oversold
◉ Works well in markets that are not trending but fluctuating back and forth between an upper level (resistance) and a lower level (support)
Bollinger Bands
◉ Excellent range-bound indicator that measures standard deviation from the moving average
◉ Operates under the logic that a currency pair’s price is most likely to gravitate towards its average; hence when it strays too far, such as two standard deviations away, it is due to retrace back to its moving average
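Both indicators reduce to a few lines of arithmetic. The sketch uses the conventional definitions: Bollinger Bands are a moving average plus and minus a multiple of the standard deviation, and the stochastic %K is the position of the latest close within the recent high-low range, on a 0 to 100 scale. The window lengths (20 and 14), the population standard deviation and the sample data are common defaults assumed here, not values taken from the slides.

```python
import math


def bollinger_bands(closes, window=20, num_std=2.0):
    """Return (lower, middle, upper) bands from the last `window` closes."""
    if len(closes) < window:
        raise ValueError("not enough closes for the window")
    recent = closes[-window:]
    middle = sum(recent) / window
    variance = sum((c - middle) ** 2 for c in recent) / window
    width = num_std * math.sqrt(variance)
    return middle - width, middle, middle + width


def stochastic_k(highs, lows, closes, window=14):
    """%K: where the last close sits in the recent range, from 0 to 100."""
    if min(len(highs), len(lows), len(closes)) < window:
        raise ValueError("not enough candles for the window")
    highest = max(highs[-window:])
    lowest = min(lows[-window:])
    if highest == lowest:
        return 50.0                     # flat range: report the midpoint
    return 100.0 * (closes[-1] - lowest) / (highest - lowest)


lower, middle, upper = bollinger_bands([1.0, 2.0, 3.0, 4.0, 5.0], window=5)
k = stochastic_k([10.0, 12.0, 11.0], [8.0, 9.0, 10.0], [9.0, 11.0, 10.5], window=3)
print(lower, middle, upper, k)
```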
36. “Bollinger Bands plot three bands on a price chart to create two price channels. The security is said to be overbought if the price line is consistently near or breaches the upper price band. It may be oversold if the price line is consistently near or drops below the lower price band.”
39. “An overbought position is confirmed if the stochastic lines cross above 80 and, at the same time, the price line is consistently near the upper Bollinger Band. At that level, prices are expected to drop soon. The opposite is also true.”
42. STRATEGY IS SIMPLE
The first condition you are looking for is a candle breaking the UPPER or LOWER Bollinger Band.
Look for the stochastics to have traveled above the 80 line (for a bullish candle traveling outside the upper Bollinger Band), or below the 20 line (for a bearish candle traveling outside the lower Bollinger Band).
Wait for the next candle to form before you get into the trade.
Once the next candle has formed, the stochastic lines should have crossed and be heading back towards the white line.
If the stochastic lines have not yet crossed, or they are moving further apart, do not take the trade; wait until all of the conditions are met.
If the conditions have been met, place a trade in the opposite direction of the previous candle.
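The rules above can be sketched as a single check over two consecutive candles. Several readings are assumptions: "breaking" a band is taken to mean the close is beyond it, the two stochastic lines are taken to be %K and %D, and "crossed" is read as %K moving from one side of %D to the other. The dictionary keys are hypothetical field names.

```python
def reversal_signal(breakout, confirm):
    """Return 'sell', 'buy' or None.

    `breakout` is the candle that broke a band and `confirm` is the next
    candle. Each is a dict with keys: close, upper, lower, k, d.
    """
    bullish_break = breakout["close"] > breakout["upper"] and breakout["k"] > 80
    bearish_break = breakout["close"] < breakout["lower"] and breakout["k"] < 20
    # %K crossing %D between the two candles, heading back to the mid-range.
    crossed_down = breakout["k"] >= breakout["d"] and confirm["k"] < confirm["d"]
    crossed_up = breakout["k"] <= breakout["d"] and confirm["k"] > confirm["d"]
    if bullish_break and crossed_down:
        return "sell"       # trade against the bullish breakout candle
    if bearish_break and crossed_up:
        return "buy"        # trade against the bearish breakout candle
    return None             # conditions not all met: stay out


breakout = {"close": 105.0, "upper": 104.0, "lower": 96.0, "k": 90.0, "d": 85.0}
confirm = {"close": 103.0, "upper": 104.0, "lower": 96.0, "k": 78.0, "d": 84.0}
print(reversal_signal(breakout, confirm))
```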
44. SHARPE RATIO TABLE TO COMPARE ALGORITHMS
Algorithm        Value
DataSet          -0.00029266783
Black Scholes    -0.1735110594
Jump Process     -0.173409357
SNN & SVR        -0.00012865643
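For context on how such values are typically obtained, the Sharpe ratio is the mean excess return divided by the standard deviation of excess returns. The sketch below uses a per-period ratio with the sample standard deviation and no annualisation. Those conventions and the sample returns are assumptions, since the source does not state how its figures were computed.

```python
import math


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    if variance == 0.0:
        raise ValueError("returns have zero variance")
    return mean / math.sqrt(variance)


print(sharpe_ratio([0.01, 0.03, 0.02]))
```

A negative value, as in every row of the table, means the mean return was below the risk-free benchmark over the test period.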
46. CONCLUSIONS
• Out-of-sample performance is not comparable to in-sample performance, regardless of which option pricing model is employed in the estimation
• The Artificial Neural Network (Feed Forward) model gives the best result among the forecasting tools
• Semi-parametric implied volatility estimation is more effective than BS implied volatility
• Non-parametric methods give better accuracy than parametric methods
• SVR takes less time and gives a decent result among the non-parametric methods
• ANN is able to capture the effect of many more variables, such as dividends and historical volatilities, but depends largely on the volatility of the input data