
Anomaly Detection through Reinforcement Learning


  1. Anomaly Detection through Reinforcement Learning
     Dr. Hari Koduvely, Chief Data Scientist, ZIGHRA.COM
  2. Outline of Talk:
     ● Zighra and SensifyID Platform
     ● Sequential Anomaly Detection Problem
     ● Introduction to Reinforcement Learning
     ● Markov Decision Process and Q-Learning
     ● Function Approximation using Neural Networks
     ● Application to Network Intrusion Detection Problem
     ● Implementation using TensorFlow
  3. ZIGHRA.COM
     ● Zighra (https://zighra.com) provides solutions for Continuous Behavioural Authentication & Threat Detection
     ● Highlights of our SensifyID Platform:
       ○ Core is an AI-based 6-layer Anomaly Detection System combining behavioural biometrics with contextual, social and other signals
       ○ Covers use cases such as User Verification, Account Takeover, Remote Attacks and Bot Attacks
       ○ Can be integrated into any Web, Mobile & IoT application
       ○ 2 patents granted and 10+ at the application stage
  4. Sequential Anomaly Detection Problem
     ● The classical Anomaly Detection Problem is to find patterns in a dataset that do not conform to expected normal behavior
     ● Formulated as a one-class classification task in machine learning
     ● In many domains the data distribution changes continuously (concept shift)
     ● An online learning setting is better suited to deal with concept shifts
     [Figure: current_week_purchase vs. average_weekly_purchase. Image source: https://www.linkedin.com/pulse/part-2-keep-simple-machine-learning-algorithms-big-dr-dinesh/]
  5. Sequential Anomaly Detection Problem
     ● In the Sequential Anomaly Detection problem the goal is to determine whether a subsequence of a sequence of events is anomalous or not
     ● Each event in isolation would appear to be normal; only the sequence of events indicates an anomaly
       ○ Username-Password, Username-Password, Username-Password, ...
       ○ Login to the corporate network at midnight, access a DB that is rarely used, download a lot of data, transfer to USB, ...
     ● Straightforward supervised learning is not feasible here because of the credit assignment problem
  6. Introduction to Reinforcement Learning
     ● In Reinforcement Learning, an autonomous agent interacts with an environment and takes an action a_t in each state s_t
     ● The environment in return supplies a reward r_t for the action the agent performed, as a supervision signal, together with a new state s_t+1
     [Diagram: agent-environment loop; the agent emits action a_t from state s_t, and the environment returns reward r_t and next state s_t+1]
  7. Introduction to Reinforcement Learning
     ● Reinforcement Learning can be formally defined as a Markov Decision Process
     ● A Markov Decision Process (MDP) is defined by the 5-tuple {s_t, a_t, P(s_t+1 | s_t, a_t), γ, r_t}
       ○ s_t - state at time t
       ○ a_t - action taken in state s_t
       ○ P(s_t+1 | s_t, a_t) - state transition probabilities
       ○ γ - discount factor
       ○ r_t - reward function
     ● The objective of an MDP is to come up with an optimum policy that achieves the maximum cumulative reward over a long period of time
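Stated compactly in standard notation (not taken verbatim from the slides), the objective is the expected discounted return, and the optimum policy is the one that maximizes it:

```latex
% Expected discounted return and the optimum policy
J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t}\right],
\qquad
\pi^{*} = \arg\max_{\pi} J(\pi)
```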
  8. Q-Learning and Markov Decision Process
     ● Q-value function Q(s, a): an estimate of the maximum total long-term reward obtainable when starting from state s and performing action a
     ● Bellman Equation: Q(s, a) = r(s) + γ ∑_s' P(s' | s, a) max_a' Q(s', a')
       The Q-value for a state-action pair is the current reward plus the expected Q-value of its successor states
     ● Central theoretical concept used in almost all formulations of reinforcement learning
     ● It can be proved that, starting from random initial conditions, iteration of the Bellman equation makes Q(s, a) converge to the optimum quality function Q*(s, a)
     ● The optimum policy is given by Π*(s) = argmax_a Q*(s, a)
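In display form, with the maximization over the next action taken inside the sum over successor states and the greedy policy read off from the optimum Q-function:

```latex
% Bellman optimality equation and the greedy optimum policy
Q^{*}(s, a) = r(s) + \gamma \sum_{s'} P(s' \mid s, a)\, \max_{a'} Q^{*}(s', a'),
\qquad
\pi^{*}(s) = \arg\max_{a} Q^{*}(s, a)
```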
  9. Q-Learning and Markov Decision Process
     ● It is difficult to know the state transition probabilities P(s_t+1 | s_t, a_t) for a given problem
     ● Bellman's equation can be cast in a sample-based, incremental form in which transition probabilities are not needed
     ● Only the actual observed state from the environment is used
     ● Temporal Difference Learning Algorithm: when an agent makes a transition from state s to state s' by performing action a, its Q-value is updated as follows (a Python sketch follows below):
       Q(s, a) ← Q(s, a) + α [ r(s) + γ max_a' Q(s', a') - Q(s, a) ],   where α << 1 is a learning rate
     ● The Q-values are adjusted towards the local equilibrium at which Bellman's equation holds
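A minimal sketch of this tabular update in Python (the state and action counts, hyperparameters, and the ε-greedy helper are assumptions made for illustration; the Gym-style environment interface appears later in the talk):

```python
import numpy as np

# Hypothetical sizes; in a real problem these come from the environment.
n_states, n_actions = 100, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1

Q = np.zeros((n_states, n_actions))  # tabular Q(s, a)

def td_update(s, a, r, s_next):
    """One temporal-difference step: move Q(s, a) towards the Bellman target."""
    target = r + gamma * np.max(Q[s_next])        # r(s) + γ max_a' Q(s', a')
    Q[s, a] += alpha * (target - Q[s, a])         # Q(s,a) ← Q(s,a) + α [target - Q(s,a)]

def epsilon_greedy(s):
    """Explore with probability ε, otherwise exploit the current Q estimates."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))
```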
  10. Function Approximation using Neural Networks
     ● The Bellman equation update is a deterministic algorithm
     ● For problems where the state and action spaces are small, one can use a table to represent Q(s, a)
     ● In many practical applications, state and action spaces are continuous
     ● One needs an efficient function approximation method for representing Q(s, a)
     ● Two standard approaches for this are:
       ○ Tile Coding: partition the continuous space into an overlapping set of tiles
         ➢ Success depends on the number and width of the tiles
         ➢ It is a linear function approximation
       ○ Neural Networks: nonlinear function approximation, a more powerful representation
  11. Function Approximation using Neural Networks
     ● One can use Neural Networks to approximate Q(s, a) as follows:
       ○ Inputs: state s represented by the D-dimensional vector {s_1, s_2, ..., s_D}
       ○ Outputs: Q-values for each of the N actions {Q_1, Q_2, ..., Q_N}
     [Diagram: feed-forward network with inputs s_1 ... s_D, hidden layers, and outputs Q_1 ... Q_N]
  12. Function Approximation using Neural Networks
     ● The loss function for training the NN is taken as the squared difference between the Q-value predicted by the DNN and the target Q-value given by the Bellman equation:
       L = ½ [ r + γ max_a' Q(s', a') - Q(s, a) ]²
     ● The NN is trained using backpropagation as follows (a code sketch follows below):
       1. Initialize the NN
       2. Start an episode of exploration from a random state s
       3. Do a forward pass of state s through the DNN and get Q-values for all actions
       4. Perform an ε-greedy exploration to choose an action a for the current state s
       5. Get the next state s' and reward r from the environment
       6. Pass s' also through the DNN and compute max_a' Q(s', a')
       7. Set the target Q-value for the output node corresponding to action a to r + γ max_a' Q(s', a')
       8. For all other output nodes, keep the target Q-value the same as the DNN prediction from step 3
       9. Update the weights using backpropagation
       10. Repeat steps 3-9 until a termination condition is reached
       11. Repeat episodes until the network is trained
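A minimal tf.keras sketch of this loop (an illustrative reconstruction, not the code from the talk: the gym-style environment, the state dimension D, the number of actions N and all hyperparameters are assumptions):

```python
import numpy as np
import tensorflow as tf

D, N = 226, 2                           # assumed state dimension and number of actions
GAMMA, EPSILON, EPISODES = 0.9, 0.1, 100

# Feed-forward Q-network: state vector in, one Q-value per action out.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(D,)),
    tf.keras.layers.Dense(N, activation='linear'),
])
model.compile(optimizer='adam', loss='mse')        # squared TD error as the loss

def train(env):
    for _ in range(EPISODES):                                   # iteration over episodes
        s, done = env.reset(), False
        while not done:                                         # iteration over exploration
            q = model.predict(s[None, :], verbose=0)[0]         # step 3: forward pass for s
            if np.random.rand() < EPSILON:                      # step 4: ε-greedy action
                a = np.random.randint(N)
            else:
                a = int(np.argmax(q))
            s_next, r, done, _ = env.step(a)                    # step 5: reward and next state
            q_next = model.predict(s_next[None, :], verbose=0)[0]   # step 6: forward pass for s'
            target = q.copy()                                   # step 8: other outputs unchanged
            target[a] = r + GAMMA * np.max(q_next)              # step 7: Bellman target for a
            model.fit(s[None, :], target[None, :], epochs=1, verbose=0)  # step 9: backprop
            s = s_next
```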
  13. Function Approximation using Neural Networks
     ● High-level TD NN learning iteration flow
     [Diagram: DNN model wrapped in two loops, an outer iteration over episodes and an inner iteration over exploration steps]
  14. Network Intrusion Detection
     ● Can we use Reinforcement Learning for Network Intrusion Detection?
     ● Related research works:
       ○ James Cannady used a CMAC Neural Network and formulated Network Intrusion Detection as an online learning problem [1]
       ○ Xin Xu studied host-based intrusion detection as a multi-stage cyber attack and applied reinforcement learning [2]
       ○ Arturo Servin studied the DDoS attack as a traffic anomaly problem and used reinforcement learning for detection [3]
       ○ Kleanthis M. used distributed multiagent reinforcement learning for network intrusion response [4]
     ● None of these used a DNN for function approximation
  15. Network Intrusion Detection
     ● Standard dataset for scientific research: the NSL-KDD dataset [5]
     ● The dataset contains 4 categories of attacks in a local area network:
       ○ DoS - Denial of Service attacks
       ○ R2L - Remote to Local, where a remote hacker tries to get local user privileges
       ○ U2R - User to Root, where a hacker operates as a normal user and exploits vulnerabilities
       ○ Probing - the hacker scans the machine to determine vulnerabilities
     ● The dataset contains 125,973 connections for training and 22,543 for testing
     ● The training set has 53.5% normal connections and 46.5% abnormal connections
     ● There are 41 features (32 continuous, 3 nominal and 6 binary)
     ● E.g., type of protocol (TCP, UDP), port number, packet size, rate of transmission
  16. Network Intrusion Detection
     [Figure. Image source: https://nycdatascience.com/blog/student-works/network-intrusion-detection-2/]
  17. Network Intrusion Detection
     ● However, the NSL-KDD dataset cannot be used for sequential anomaly detection:
       ○ There is no time stamp; the dataset is not time series data
       ○ There is no way to identify whether different connections are from the same user/hacker or not
       ○ One could still use the dataset for a standard anomaly detection problem using reinforcement learning
  19. Network Intrusion Detection
     ● Reinforcement Learning formulation with the NSL-KDD dataset (a reward-function sketch follows below):
       ○ The states are characterized by the 41 features in the dataset
       ○ For every state the agent takes one of two actions:
         ■ Send an alert
         ■ Do not send an alert
       ○ The rewards generated by the environment:
         ■ +1 if the state is normal and the action is "do not send alert"
         ■ +1 if the state is malicious and the action is "send alert"
         ■ -1 if the state is malicious and the action is "do not send alert"
         ■ -1 if the state is normal and the action is "send alert"
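A minimal sketch of this reward scheme in Python (the action encoding and the label argument are assumptions made for illustration):

```python
SEND_ALERT, NO_ALERT = 1, 0   # assumed action encoding

def get_reward(is_malicious: bool, action: int) -> int:
    """Reward +1 when the alerting decision matches the ground-truth label, -1 otherwise."""
    if is_malicious:
        return 1 if action == SEND_ALERT else -1
    return 1 if action == NO_ALERT else -1
```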
  20. Implementation using TensorFlow
     ● Creation of the Environment
       ○ The goal of the environment is to simulate the reward scheme mentioned for the NSL-KDD dataset and also supply a new state every time
       ○ This can be done using the Gym toolkit from OpenAI: https://github.com/openai/gym/tree/master/gym/envs
     ● Package layout:
       gym-network_intrusion/
           README.md
           setup.py
           gym_network_intrusion/
               __init__.py
               envs/
                   __init__.py
                   network_intrusion_env.py
     ● Registration:
       from gym.envs.registration import register
       register(
           id='NetworkIntrusion-v0',
           entry_point='gym_network_intrusion.envs:NetworkIntrusionEnv',
       )
  21. Implementation using TensorFlow
     ● Creation of the Environment (network_intrusion_env.py):
       import gym
       from gym import error, spaces, utils
       from gym.utils import seeding

       class NetworkIntrusionEnv(gym.Env):

           def __init__(self):
               ...

           def _step(self, action):
               # apply the action, compute the reward and move to the next connection record
               ...
               return new_state, reward, episode_over, details

           def _reset(self):
               # start a new episode from an initial connection record
               ...
               return initial_state

           def _get_reward(self, action):
               ...
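Once the package is installed and registered, the environment can be driven like any other Gym environment; a minimal usage sketch (assuming the skeleton methods above are implemented and an action_space with the two actions is defined in __init__):

```python
import gym
import gym_network_intrusion  # noqa: F401 - importing the package runs the register() call

env = gym.make('NetworkIntrusion-v0')
state = env.reset()
done = False
while not done:
    action = env.action_space.sample()               # placeholder policy: random alert / no alert
    state, reward, done, info = env.step(action)     # reward follows the +1 / -1 scheme above
```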
  22. Implementation using TensorFlow
     ● Two architectures:
       ○ Deep NN architecture:
         ■ Discretize continuous variables and use a one-hot representation
       ○ Deep and Wide NN architecture:
         ■ Useful for combining continuous and discrete variables into one NN model
         ■ Also combines the power of memorization and generalization
         ■ https://www.tensorflow.org/tutorials/wide_and_deep
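For the Deep and Wide option, the linked tutorial is built around TensorFlow's combined linear-plus-DNN estimator; a hedged sketch of that feature wiring follows (the feature names 'protocol_type' and 'src_bytes' are taken from the NSL-KDD feature set for illustration, and this shows the wide-and-deep input handling rather than a drop-in replacement for the Q-network):

```python
import tensorflow as tf

# Wide part: sparse categorical features, good at memorization.
protocol = tf.feature_column.categorical_column_with_vocabulary_list(
    'protocol_type', ['tcp', 'udp', 'icmp'])
wide_columns = [protocol]

# Deep part: continuous features plus embedded categoricals, good at generalization.
src_bytes = tf.feature_column.numeric_column('src_bytes')
deep_columns = [src_bytes, tf.feature_column.embedding_column(protocol, dimension=4)]

estimator = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[32, 16])
```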
  23. Implementation using TensorFlow
     ● Implementation of a simple NN using TensorFlow (a preprocessing sketch follows below):
       ○ Discretize continuous variables and use a one-hot representation
       ○ Used binning (#bins = 5) to convert continuous features to categorical
       ○ The one-hot encoded state has 226 components
       ○ 3-layer feed-forward neural network (226 x 10 x 1)
     ● Code available at https://github.com/harik68/RL4AD
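A minimal sketch of that binning-and-one-hot step with pandas (the column name 'src_bytes' is just one of the continuous NSL-KDD features, used here for illustration; the actual preprocessing lives in the RL4AD repository):

```python
import pandas as pd

def discretize_one_hot(df: pd.DataFrame, column: str, n_bins: int = 5) -> pd.DataFrame:
    """Bin one continuous feature into n_bins equal-width intervals and one-hot encode the bins."""
    binned = pd.cut(df[column], bins=n_bins, labels=False)   # integer bin index 0 .. n_bins-1
    return pd.get_dummies(binned, prefix=column)             # one indicator column per bin

# Usage sketch: concatenate the encoded blocks of all continuous features into the state vector
# state_block = discretize_one_hot(df, 'src_bytes')
```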
  24. Implementation using TensorFlow
     ● Model performance (work in progress!)
     [Figure: TPR/FPR comparison of the baseline and the DNN-RL Model V0.1. Baseline image source: https://nycdatascience.com/blog/student-works/network-intrusion-detection-2/]
  25. Next Steps
     ● Experiment with different discretization schemes, or even tile coding
     ● Experiment with different NN architectures (Deep and Wide)
  26. References
     1. Next Generation Intrusion Detection: Autonomous Reinforcement Learning of Network Attacks, J. Cannady, 23rd National Information Systems Security Conference (2000)
     2. Sequential anomaly detection based on temporal-difference learning: Principles, models and case studies, Xin Xu, Applied Soft Computing 10 (2010) 859-867
     3. Towards Traffic Anomaly Detection via Reinforcement Learning and Data Flow, A. Servin [PDF] york.ac.uk
     4. Distributed response to network intrusions using multiagent reinforcement learning, Engineering Applications of Artificial Intelligence, Volume 41, May 2015, Pages 270-284
     5. NSL-KDD dataset, Canadian Institute for Cybersecurity, University of New Brunswick (http://www.unb.ca/cic/datasets/nsl.html)
     6. Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig, Prentice Hall (2009)
  27. THANK YOU! We are hiring Data Scientists, Machine Learning Engineers and Mobile Developers. Apply at career@zighra.com
