
Anomaly Detection through Reinforcement Learning


  1. Anomaly Detection through Reinforcement Learning
     Dr. Hari Koduvely, Chief Data Scientist, ZIGHRA.COM
  2. Outline of Talk:
     ● Zighra and SensifyID Platform
     ● Sequential Anomaly Detection Problem
     ● Introduction to Reinforcement Learning
     ● Markov Decision Process and Q-Learning
     ● Function Approximation using Neural Networks
     ● Application to Network Intrusion Detection Problem
     ● Implementation using TensorFlow
  3. ZIGHRA.COM
     ● Zighra (https://zighra.com) provides solutions for Continuous Behavioural Authentication & Threat Detection
     ● Highlights of our SensifyID Platform:
       ○ Core is an AI-based 6-layer Anomaly Detection System combining behavioural biometrics with contextual, social and other signals
       ○ Covers use cases such as User Verification, Account Takeover, Remote Attacks and Bot Attacks
       ○ Can be integrated into any Web, Mobile & IoT application
       ○ 2 patents granted and 10+ at the application stage
  4. Sequential Anomaly Detection Problem
     ● The classical Anomaly Detection Problem is to find patterns in a dataset that do not conform to expected normal behavior
     ● Formulated as a one-class classification task in machine learning
     ● In many domains the data distribution changes continuously (concept shift)
     ● An online learning setting is better suited to deal with concept shifts
     [Figure: current_week_purchase vs. average_weekly_purchase. Image source: https://www.linkedin.com/pulse/part-2-keep-simple-machine-learning-algorithms-big-dr-dinesh/]
  5. Sequential Anomaly Detection Problem
     ● In the Sequential Anomaly Detection problem the goal is to determine whether a subsequence of a sequence of events is anomalous or not
     ● Each event in isolation would appear to be normal; only the sequence of events indicates an anomaly
       ○ Username-Password, Username-Password, Username-Password, ...
       ○ Login to the corporate network at midnight, access a DB that is rarely used, download a lot of data, transfer to USB, ...
     ● Straightforward supervised learning is not feasible here because of the credit assignment problem
  6. Introduction to Reinforcement Learning
     ● In Reinforcement Learning, an autonomous agent interacts with an environment and takes an action a_t in each state s_t
     ● The environment in return supplies a reward r_t for the action the agent performed, as a supervision signal, together with a new state s_t+1
     [Diagram: agent-environment loop; the agent emits action a_t from state s_t, and the environment returns reward r_t and next state s_t+1]
  7. Introduction to Reinforcement Learning
     ● Reinforcement Learning can be formally defined as a Markov Decision Process
     ● A Markov Decision Process (MDP) is defined by the 5-tuple {s_t, a_t, P(s_t+1 | s_t, a_t), γ, r_t}
       ○ s_t - state at time t
       ○ a_t - action taken in state s_t
       ○ P(s_t+1 | s_t, a_t) - state transition probabilities
       ○ γ - discount factor
       ○ r_t - reward function
     ● The objective of an MDP is to come up with an optimum policy that achieves the maximum cumulative reward over a long period of time
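Stated compactly in standard notation (not taken verbatim from the slides), the objective is the expected discounted return, and the optimum policy is the one that maximizes it:

```latex
% Expected discounted return and the optimum policy
J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t}\right],
\qquad
\pi^{*} = \arg\max_{\pi} J(\pi)
```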
  8. Q-Learning and Markov Decision Process
     ● Q-value function Q(s, a): an estimate of the maximum total long-term reward obtainable when starting from state s and performing action a
     ● Bellman Equation: Q(s, a) = r(s) + γ ∑_s' P(s' | s, a) max_a' Q(s', a')
       The Q-value for a state-action pair is the current reward plus the expected Q-value of its successor states
     ● Central theoretical concept used in almost all formulations of reinforcement learning
     ● It can be proved that, starting from random initial conditions, iteration of the Bellman equation makes Q(s, a) converge to the optimum quality function Q*(s, a)
     ● The optimum policy is given by Π*(s) = argmax_a Q*(s, a)
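In display form, with the maximization over the next action taken inside the sum over successor states and the greedy policy read off from the optimum Q-function:

```latex
% Bellman optimality equation and the greedy optimum policy
Q^{*}(s, a) = r(s) + \gamma \sum_{s'} P(s' \mid s, a)\, \max_{a'} Q^{*}(s', a'),
\qquad
\pi^{*}(s) = \arg\max_{a} Q^{*}(s, a)
```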
  9. Q-Learning and Markov Decision Process
     ● It is difficult to know the state transition probabilities P(s_t+1 | s_t, a_t) for a given problem
     ● Bellman's equation can be cast in a sample-based, incremental form in which transition probabilities are not needed
     ● Only the actual observed state from the environment is used
     ● Temporal Difference Learning Algorithm: when an agent makes a transition from state s to state s' by performing action a, its Q-value is updated as follows (a Python sketch follows below):
       Q(s, a) ← Q(s, a) + α [ r(s) + γ max_a' Q(s', a') - Q(s, a) ],   where α << 1 is a learning rate
     ● The Q-values are adjusted towards the local equilibrium at which Bellman's equation holds
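A minimal sketch of this tabular update in Python (the state and action counts, hyperparameters, and the ε-greedy helper are assumptions made for illustration; the Gym-style environment interface appears later in the talk):

```python
import numpy as np

# Hypothetical sizes; in a real problem these come from the environment.
n_states, n_actions = 100, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1

Q = np.zeros((n_states, n_actions))  # tabular Q(s, a)

def td_update(s, a, r, s_next):
    """One temporal-difference step: move Q(s, a) towards the Bellman target."""
    target = r + gamma * np.max(Q[s_next])        # r(s) + γ max_a' Q(s', a')
    Q[s, a] += alpha * (target - Q[s, a])         # Q(s,a) ← Q(s,a) + α [target - Q(s,a)]

def epsilon_greedy(s):
    """Explore with probability ε, otherwise exploit the current Q estimates."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))
```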
  10. Function Approximation using Neural Networks
     ● The Bellman equation update is a deterministic algorithm
     ● For problems where the state and action spaces are small, one can use a table to represent Q(s, a)
     ● In many practical applications, state and action spaces are continuous
     ● One needs an efficient function approximation method for representing Q(s, a)
     ● Two standard approaches for this are:
       ○ Tile Coding: partition the continuous space into an overlapping set of tiles
         ➢ Success depends on the number and width of the tiles
         ➢ It is a linear function approximation
       ○ Neural Networks: nonlinear function approximation, a more powerful representation
  11. Function Approximation using Neural Networks
     ● One can use Neural Networks to approximate Q(s, a) as follows:
       ○ Inputs: state s represented by the D-dimensional vector {s_1, s_2, ..., s_D}
       ○ Outputs: Q-values for each of the N actions {Q_1, Q_2, ..., Q_N}
     [Diagram: feed-forward network with inputs s_1 ... s_D, hidden layers, and outputs Q_1 ... Q_N]
  12. Function Approximation using Neural Networks
     ● The loss function for training the NN is taken as the squared difference between the Q-value predicted by the DNN and the target Q-value given by the Bellman equation:
       L = ½ [ r + γ max_a' Q(s', a') - Q(s, a) ]²
     ● The NN is trained using backpropagation as follows (a code sketch follows below):
       1. Initialize the NN
       2. Start an episode of exploration from a random state s
       3. Do a forward pass of state s through the DNN and get Q-values for all actions
       4. Perform an ε-greedy exploration to choose an action a for the current state s
       5. Get the next state s' and reward r from the environment
       6. Pass s' also through the DNN and compute max_a' Q(s', a')
       7. Set the target Q-value for the output node corresponding to action a to r + γ max_a' Q(s', a')
       8. For all other output nodes, keep the target Q-value the same as the DNN prediction from step 3
       9. Update the weights using backpropagation
       10. Repeat steps 3-9 until a termination condition is reached
       11. Repeat episodes until the network is trained
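A minimal tf.keras sketch of this loop (an illustrative reconstruction, not the code from the talk: the gym-style environment, the state dimension D, the number of actions N and all hyperparameters are assumptions):

```python
import numpy as np
import tensorflow as tf

D, N = 226, 2                           # assumed state dimension and number of actions
GAMMA, EPSILON, EPISODES = 0.9, 0.1, 100

# Feed-forward Q-network: state vector in, one Q-value per action out.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(D,)),
    tf.keras.layers.Dense(N, activation='linear'),
])
model.compile(optimizer='adam', loss='mse')        # squared TD error as the loss

def train(env):
    for _ in range(EPISODES):                                   # iteration over episodes
        s, done = env.reset(), False
        while not done:                                         # iteration over exploration
            q = model.predict(s[None, :], verbose=0)[0]         # step 3: forward pass for s
            if np.random.rand() < EPSILON:                      # step 4: ε-greedy action
                a = np.random.randint(N)
            else:
                a = int(np.argmax(q))
            s_next, r, done, _ = env.step(a)                    # step 5: reward and next state
            q_next = model.predict(s_next[None, :], verbose=0)[0]   # step 6: forward pass for s'
            target = q.copy()                                   # step 8: other outputs unchanged
            target[a] = r + GAMMA * np.max(q_next)              # step 7: Bellman target for a
            model.fit(s[None, :], target[None, :], epochs=1, verbose=0)  # step 9: backprop
            s = s_next
```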
  13. Function Approximation using Neural Networks
     ● High-level TD NN learning iteration flow
     [Diagram: DNN model wrapped in two loops, an outer iteration over episodes and an inner iteration over exploration steps]
  14. Network Intrusion Detection
     ● Can we use Reinforcement Learning for Network Intrusion Detection?
     ● Related research works:
       ○ James Cannady used a CMAC Neural Network and formulated Network Intrusion Detection as an online learning problem [1]
       ○ Xin Xu studied host-based intrusion detection as a multi-stage cyber attack and applied reinforcement learning [2]
       ○ Arturo Servin studied the DDoS attack as a traffic anomaly problem and used reinforcement learning for detection [3]
       ○ Kleanthis M. used distributed multiagent reinforcement learning for network intrusion response [4]
     ● None of these used a DNN for function approximation
  15. Network Intrusion Detection
     ● Standard dataset for scientific research: the NSL-KDD dataset [5]
     ● The dataset contains 4 categories of attacks in a local area network:
       ○ DoS - Denial of Service attacks
       ○ R2L - Remote to Local, where a remote hacker tries to get local user privileges
       ○ U2R - User to Root, where a hacker operates as a normal user and exploits vulnerabilities
       ○ Probing - the hacker scans the machine to determine vulnerabilities
     ● The dataset contains 125,973 connections for training and 22,543 for testing
     ● The training set has 53.5% normal connections and 46.5% abnormal connections
     ● There are 41 features (32 continuous, 3 nominal and 6 binary)
     ● E.g., type of protocol (TCP, UDP), port number, packet size, rate of transmission
  16. Network Intrusion Detection
     [Figure. Image source: https://nycdatascience.com/blog/student-works/network-intrusion-detection-2/]
  17. Network Intrusion Detection
     ● However, the NSL-KDD dataset cannot be used for sequential anomaly detection:
       ○ There is no time stamp; the dataset is not time series data
       ○ There is no way to identify whether different connections are from the same user/hacker or not
       ○ One could still use the dataset for a standard anomaly detection problem using reinforcement learning
  19. Network Intrusion Detection
     ● Reinforcement Learning formulation with the NSL-KDD dataset (a reward-function sketch follows below):
       ○ The states are characterized by the 41 features in the dataset
       ○ For every state the agent takes one of two actions:
         ■ Send an alert
         ■ Do not send an alert
       ○ The rewards generated by the environment:
         ■ +1 if the state is normal and the action is "do not send alert"
         ■ +1 if the state is malicious and the action is "send alert"
         ■ -1 if the state is malicious and the action is "do not send alert"
         ■ -1 if the state is normal and the action is "send alert"
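A minimal sketch of this reward scheme in Python (the action encoding and the label argument are assumptions made for illustration):

```python
SEND_ALERT, NO_ALERT = 1, 0   # assumed action encoding

def get_reward(is_malicious: bool, action: int) -> int:
    """Reward +1 when the alerting decision matches the ground-truth label, -1 otherwise."""
    if is_malicious:
        return 1 if action == SEND_ALERT else -1
    return 1 if action == NO_ALERT else -1
```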
  20. Implementation using TensorFlow
     ● Creation of the Environment
       ○ The goal of the environment is to simulate the reward scheme mentioned for the NSL-KDD dataset and also supply a new state every time
       ○ This can be done using the Gym toolkit from OpenAI: https://github.com/openai/gym/tree/master/gym/envs
     ● Package layout:
       gym-network_intrusion/
           README.md
           setup.py
           gym_network_intrusion/
               __init__.py
               envs/
                   __init__.py
                   network_intrusion_env.py
     ● Registration:
       from gym.envs.registration import register
       register(
           id='NetworkIntrusion-v0',
           entry_point='gym_network_intrusion.envs:NetworkIntrusionEnv',
       )
  21. Implementation using TensorFlow
     ● Creation of the Environment (network_intrusion_env.py):
       import gym
       from gym import error, spaces, utils
       from gym.utils import seeding

       class NetworkIntrusionEnv(gym.Env):

           def __init__(self):
               ...

           def _step(self, action):
               # apply the action, compute the reward and move to the next connection record
               ...
               return new_state, reward, episode_over, details

           def _reset(self):
               # start a new episode from an initial connection record
               ...
               return initial_state

           def _get_reward(self, action):
               ...
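Once the package is installed and registered, the environment can be driven like any other Gym environment; a minimal usage sketch (assuming the skeleton methods above are implemented and an action_space with the two actions is defined in __init__):

```python
import gym
import gym_network_intrusion  # noqa: F401 - importing the package runs the register() call

env = gym.make('NetworkIntrusion-v0')
state = env.reset()
done = False
while not done:
    action = env.action_space.sample()               # placeholder policy: random alert / no alert
    state, reward, done, info = env.step(action)     # reward follows the +1 / -1 scheme above
```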
  22. Implementation using TensorFlow
     ● Two architectures:
       ○ Deep NN architecture:
         ■ Discretize continuous variables and use a one-hot representation
       ○ Deep and Wide NN architecture:
         ■ Useful for combining continuous and discrete variables into one NN model
         ■ Also combines the power of memorization and generalization
         ■ https://www.tensorflow.org/tutorials/wide_and_deep
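For the Deep and Wide option, the linked tutorial is built around TensorFlow's combined linear-plus-DNN estimator; a hedged sketch of that feature wiring follows (the feature names 'protocol_type' and 'src_bytes' are taken from the NSL-KDD feature set for illustration, and this shows the wide-and-deep input handling rather than a drop-in replacement for the Q-network):

```python
import tensorflow as tf

# Wide part: sparse categorical features, good at memorization.
protocol = tf.feature_column.categorical_column_with_vocabulary_list(
    'protocol_type', ['tcp', 'udp', 'icmp'])
wide_columns = [protocol]

# Deep part: continuous features plus embedded categoricals, good at generalization.
src_bytes = tf.feature_column.numeric_column('src_bytes')
deep_columns = [src_bytes, tf.feature_column.embedding_column(protocol, dimension=4)]

estimator = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[32, 16])
```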
  23. Implementation using TensorFlow
     ● Implementation of a simple NN using TensorFlow (a preprocessing sketch follows below):
       ○ Discretize continuous variables and use a one-hot representation
       ○ Used binning (#bins = 5) to convert continuous features to categorical
       ○ The one-hot encoded state has 226 components
       ○ 3-layer feed-forward neural network (226 x 10 x 1)
     ● Code available at https://github.com/harik68/RL4AD
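A minimal sketch of that binning-and-one-hot step with pandas (the column name 'src_bytes' is just one of the continuous NSL-KDD features, used here for illustration; the actual preprocessing lives in the RL4AD repository):

```python
import pandas as pd

def discretize_one_hot(df: pd.DataFrame, column: str, n_bins: int = 5) -> pd.DataFrame:
    """Bin one continuous feature into n_bins equal-width intervals and one-hot encode the bins."""
    binned = pd.cut(df[column], bins=n_bins, labels=False)   # integer bin index 0 .. n_bins-1
    return pd.get_dummies(binned, prefix=column)             # one indicator column per bin

# Usage sketch: concatenate the encoded blocks of all continuous features into the state vector
# state_block = discretize_one_hot(df, 'src_bytes')
```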
  24. Implementation using TensorFlow
     ● Model performance (work in progress!)
     [Figure: TPR/FPR comparison of the baseline and the DNN-RL Model V0.1. Baseline image source: https://nycdatascience.com/blog/student-works/network-intrusion-detection-2/]
  25. Next Steps
     ● Experiment with different discretization schemes, or even tile coding
     ● Experiment with different NN architectures (Deep and Wide)
  26. References
     1. Next Generation Intrusion Detection: Autonomous Reinforcement Learning of Network Attacks, J. Cannady, 23rd National Information Systems Security Conference (2000)
     2. Sequential anomaly detection based on temporal-difference learning: Principles, models and case studies, Xin Xu, Applied Soft Computing 10 (2010) 859-867
     3. Towards Traffic Anomaly Detection via Reinforcement Learning and Data Flow, A. Servin [PDF] york.ac.uk
     4. Distributed response to network intrusions using multiagent reinforcement learning, Engineering Applications of Artificial Intelligence, Volume 41, May 2015, Pages 270-284
     5. NSL-KDD dataset, Canadian Institute for Cybersecurity, University of New Brunswick (http://www.unb.ca/cic/datasets/nsl.html)
     6. Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig, Prentice Hall (2009)
  27. THANK YOU! We are hiring Data Scientists, Machine Learning Engineers and Mobile Developers. Apply at career@zighra.com
