Self-Learning Systems for Cyber Security
Kim Hammar & Rolf Stadler
kimham@kth.se, stadler@kth.se
KTH Royal Institute of Technology
CDIS Spring Conference 2021
March 24, 2021
Approach: Game Model & Reinforcement Learning
- Challenges:
  - Evolving & automated attacks
  - Complex infrastructures
- Our Goal:
  - Automate security tasks
  - Adapt to changing attack methods
- Our Approach:
  - Model network attack and defense as games.
  - Use reinforcement learning to learn policies.
  - Incorporate learned policies in self-learning systems.
[Figure: infrastructure with an attacker, three clients, a defender, and gateway R1]
State of the Art
- Game-Learning Programs:
  - TD-Gammon, AlphaGo Zero [1], OpenAI Five, etc.
  - ⟹ Impressive empirical results of RL and self-play
- Attack Simulations:
  - Automated threat modeling [2], automated intrusion detection, etc.
  - ⟹ Need for automation and better security tooling
- Mathematical Modeling:
  - Game theory [3]
  - Markov decision theory
  - ⟹ Many security operations involve strategic decision making

[1] David Silver et al. "Mastering the game of Go without human knowledge". In: Nature 550 (Oct. 2017), pp. 354–. URL: http://dx.doi.org/10.1038/nature24270.
[2] Pontus Johnson, Robert Lagerström, and Mathias Ekstedt. "A Meta Language for Threat Modeling and Attack Simulations". In: Proceedings of the 13th International Conference on Availability, Reliability and Security (ARES 2018). Hamburg, Germany: Association for Computing Machinery, 2018. ISBN: 9781450364485. DOI: 10.1145/3230833.3232799.
[3] Tansu Alpcan and Tamer Basar. Network Security: A Decision and Game-Theoretic Approach. 1st ed. Cambridge University Press, 2010. ISBN: 0521119324.
Our Work
- Use Case: Intrusion Prevention
- Our Method:
  - Emulating computer infrastructures
  - System identification and model creation
  - Reinforcement learning and generalization
- Results: Learning to Capture the Flag
- Conclusions and Future Work
Use Case: Intrusion Prevention
- A defender owns an infrastructure
  - Consists of connected components
  - Components run network services
  - Defends the infrastructure by monitoring and patching
- An attacker seeks to intrude on the infrastructure
  - Has a partial view of the infrastructure
  - Wants to compromise specific components
  - Attacks by reconnaissance, exploitation, and pivoting
[Figure: infrastructure with an attacker, three clients, a defender, and gateway R1]
Our Method for Finding Effective Security Strategies
[Figure: method overview. A real-world infrastructure is selectively replicated in an emulation system; model creation & system identification produce a simulation system; reinforcement learning & generalization yield a policy π, which is mapped back for policy evaluation & model estimation in the emulation and finally implemented in the real infrastructure, enabling automation & self-learning systems.]
Emulation System: Configuration Space Σ
[Figure: emulated infrastructures σ_i for subnets 172.18.4.0/24, 172.18.19.0/24, and 172.18.61.0/24, each with gateway R1]

Emulation: a cluster of machines that runs a virtualized infrastructure which replicates important functionality of target systems.
- The set of virtualized configurations defines a configuration space Σ = ⟨A, O, S, U, T, V⟩.
- A specific emulation is based on a configuration σ_i ∈ Σ.
Emulation: Execution Times of Replicated Operations
[Figure: log-scale histograms of normalized frequency vs. time cost (s) of action execution, for infrastructures of size |N| = 25, 50, 75, 100; execution times range up to ~2000 s]
- Fundamental issue: computational methods for policy learning typically require samples on the order of 100k-10M.
- ⟹ Infeasible to optimize in the emulation system
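The infeasibility claim above can be checked with back-of-the-envelope arithmetic. The mean action cost below is an illustrative assumption (deliberately well under the 2000 s tail of the histograms), not a measured value:

```python
# Rough feasibility check of training directly in the emulation system.
# Assumptions (illustrative): mean action cost of 30 s, and the low end
# of the stated 100k-10M sample requirement.
mean_action_cost_s = 30.0
samples_needed = 100_000

emulation_days = mean_action_cost_s * samples_needed / 86_400  # seconds per day
print(f"{emulation_days:.1f} days")  # roughly a month for a single training run
```

Even under these optimistic assumptions, a single training run would occupy the emulation for weeks, which motivates learning in a simulation system instead.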
From Emulation to Simulation: System Identification
[Figure: the emulated network (nodes m_1 … m_7 behind gateway R1, each with a feature vector [m_{i,1} … m_{i,k}]) is abstracted into a model, which defines a POMDP ⟨S, A, P, R, γ, O, Z⟩ with actions a_1, a_2, …, states s_1, s_2, …, and observations o_1, o_2, …]
- Abstract Model Based on Domain Knowledge: models the set of controls, the objective function, and the features of the emulated network.
  - Defines the static parts of a POMDP model.
- Dynamics Model (P, Z) Identified using System Identification: an algorithm based on random walks and maximum-likelihood estimation, where n(b, a, b′) counts the observed transitions:

  M(b′ | b, a) ≜ n(b, a, b′) / Σ_{j′} n(b, a, j′)
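The count-based maximum-likelihood estimator above can be sketched in a few lines. The three-state chain, the two actions, and the random-walk behavior policy below are illustrative assumptions, and the estimate is computed over observed states rather than beliefs for simplicity:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)

# Ground-truth dynamics of a small hypothetical system: P_true[a][s] is a
# distribution over next states (a stand-in for the emulated infrastructure).
P_true = {
    0: np.array([[0.9, 0.1, 0.0],
                 [0.1, 0.8, 0.1],
                 [0.0, 0.2, 0.8]]),
    1: np.array([[0.5, 0.5, 0.0],
                 [0.0, 0.5, 0.5],
                 [0.5, 0.0, 0.5]]),
}

# Collect transition counts n(s, a, s') with a random-walk behavior policy.
counts = defaultdict(lambda: np.zeros(3))
s = 0
for _ in range(100_000):
    a = int(rng.integers(2))                 # random walk over actions
    s_next = int(rng.choice(3, p=P_true[a][s]))
    counts[(s, a)][s_next] += 1
    s = s_next

def M(s, a):
    """Maximum-likelihood estimate M(s'|s,a) = n(s,a,s') / sum_j n(s,a,j)."""
    n = counts[(s, a)]
    return n / n.sum()
```

With enough visits per state-action pair, the empirical estimate M(s, a) converges to the corresponding row of P_true[a].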
Policy Optimization in the Simulation System using Reinforcement Learning
- Goal:
  - Approximate π* = arg max_π E[Σ_{t=0}^{T} γ^t r_{t+1}]
- Learning Algorithm:
  - Represent π by a parameterized policy π_θ
  - Define the objective J(θ) = E_{o∼ρ_{π_θ}, a∼π_θ}[R]
  - Maximize J(θ) by stochastic gradient ascent with gradient
    ∇_θ J(θ) = E_{o∼ρ_{π_θ}, a∼π_θ}[∇_θ log π_θ(a|o) A_{π_θ}(o, a)]
- Domain-Specific Challenges:
  - Partial observability
  - Large state space |S| = (w+1)^{|N|·m·(m+1)}
  - Large action space |A| = |N|·(m+1)
  - Non-stationary environment due to the presence of an adversary
  - Generalization
- Published as: Kim Hammar and Rolf Stadler. "Finding Effective Security Strategies through Reinforcement Learning and Self-Play". In: International Conference on Network and Service Management (CNSM 2020). Izmir, Turkey, Nov. 2020.
[Figure: agent-environment loop — the agent takes action a_t, the environment returns state s_{t+1} and reward r_{t+1}]
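A minimal sketch of the policy-gradient update above (REINFORCE with the raw return as an unbaselined advantage estimate). The two-observation, two-action toy environment is an illustrative assumption, not the emulated infrastructure:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_actions = 2, 2
theta = np.zeros((n_obs, n_actions))  # tabular softmax policy parameters

def policy(o):
    """pi_theta(.|o): softmax over the action logits for observation o."""
    z = theta[o] - theta[o].max()
    p = np.exp(z)
    return p / p.sum()

def reward(o, a):
    # Toy objective: the optimal policy matches the action to the observation.
    return 1.0 if a == o else 0.0

lr, batch = 0.5, 64
for _ in range(300):
    grad = np.zeros_like(theta)
    for _ in range(batch):
        o = int(rng.integers(n_obs))
        p = policy(o)
        a = int(rng.choice(n_actions, p=p))
        # Score function for a softmax policy: grad_theta log pi(a|o) = e_a - p
        score = -p
        score[a] += 1.0
        # REINFORCE estimate: score * return (return stands in for A(o, a))
        grad[o] += score * reward(o, a)
    theta += lr * grad / batch  # stochastic gradient ascent on J(theta)
```

After training, the policy concentrates on the rewarded action for each observation; the same score-function estimator underlies the actor-critic methods used at scale.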
Learning Capture-the-Flag Strategies
[Figure: learning curves (train and eval) of our proposed method — average episodic regret over 400 training iterations for the generated simulation, the test emulation environment, and the train emulation environment, against the lower bound π*]
[Figure: evaluation infrastructure — an attacker, three clients, a defender, and gateway R1]
Learning Capture-the-Flag Strategies
[Figure: episodic rewards, episodic regret, and episodic steps over 400 training iterations for Configurations 1-3, against π*]
[Figure: the three evaluation configurations, with gateways R1 for subnets 172.18.4.0/24, 172.18.3.0/24, and 172.18.2.0/24, an intrusion detection system raising alerts, an application server, an access switch, a traffic generator, and a flag]
Conclusions & Future Work
- Conclusions:
  - We develop a method to find effective strategies for intrusion prevention: (1) emulation system; (2) system identification; (3) simulation system; (4) reinforcement learning; and (5) domain randomization and generalization.
  - We show that self-learning can be successfully applied to network infrastructures.
    - Self-play reinforcement learning in a Markov security game
  - Key challenges: stable convergence, sample efficiency, complexity of emulations, and large state and action spaces.
- Our Research Plans:
  - Improving the system identification algorithm & generalization
  - Evaluation on real-world infrastructures
Más contenido relacionado

La actualidad más candente

Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...
Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...
Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...
Mohammed Almeshekah
 

La actualidad más candente (20)

Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...
Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...
Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...
 
Learning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal StoppingLearning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal Stopping
 
Attack Simulation And Threat Modeling -Olu Akindeinde
Attack Simulation And Threat Modeling -Olu AkindeindeAttack Simulation And Threat Modeling -Olu Akindeinde
Attack Simulation And Threat Modeling -Olu Akindeinde
 
Truth and Consequences
Truth and ConsequencesTruth and Consequences
Truth and Consequences
 
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
 
Strata 2015 Presentation -- Detecting Lateral Movement
Strata 2015 Presentation -- Detecting Lateral Movement Strata 2015 Presentation -- Detecting Lateral Movement
Strata 2015 Presentation -- Detecting Lateral Movement
 
Mozfest 2018 session slides: Let's fool modern A.I. systems with stickers.
Mozfest 2018 session slides: Let's fool modern A.I. systems with stickers.Mozfest 2018 session slides: Let's fool modern A.I. systems with stickers.
Mozfest 2018 session slides: Let's fool modern A.I. systems with stickers.
 
Security of Machine Learning
Security of Machine LearningSecurity of Machine Learning
Security of Machine Learning
 
Adversary Emulation and Its Importance for Improving Security Posture in Orga...
Adversary Emulation and Its Importance for Improving Security Posture in Orga...Adversary Emulation and Its Importance for Improving Security Posture in Orga...
Adversary Emulation and Its Importance for Improving Security Posture in Orga...
 
SmartphoneHacking_Android_Exploitation
SmartphoneHacking_Android_ExploitationSmartphoneHacking_Android_Exploitation
SmartphoneHacking_Android_Exploitation
 
Adversarial Attacks and Defenses in Malware Classification: A Survey
Adversarial Attacks and Defenses in Malware Classification: A SurveyAdversarial Attacks and Defenses in Malware Classification: A Survey
Adversarial Attacks and Defenses in Malware Classification: A Survey
 
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
 
Transforming Adversary Emulation Into a Data Analysis Question
Transforming Adversary Emulation Into a Data Analysis QuestionTransforming Adversary Emulation Into a Data Analysis Question
Transforming Adversary Emulation Into a Data Analysis Question
 
Challenges in Applying AI to Enterprise Cybersecurity
Challenges in Applying AI to Enterprise CybersecurityChallenges in Applying AI to Enterprise Cybersecurity
Challenges in Applying AI to Enterprise Cybersecurity
 
Proposal defense presentation
Proposal defense presentationProposal defense presentation
Proposal defense presentation
 
Security and Privacy of Machine Learning
Security and Privacy of Machine LearningSecurity and Privacy of Machine Learning
Security and Privacy of Machine Learning
 
Threat hunting for Beginners
Threat hunting for BeginnersThreat hunting for Beginners
Threat hunting for Beginners
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
 
Financial security and machine learning
Financial security and machine learningFinancial security and machine learning
Financial security and machine learning
 
Lecture #3: Defense Strategies and Techniques: Part II
 Lecture #3: Defense Strategies and Techniques: Part II Lecture #3: Defense Strategies and Techniques: Part II
Lecture #3: Defense Strategies and Techniques: Part II
 

Similar a Self-Learning Systems for Cyber Security

Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Kim Hammar
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via Decomposition
Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Kim Hammar
 

Similar a Self-Learning Systems for Cyber Security (20)

CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
 
Digital Twins for Security Automation
Digital Twins for Security AutomationDigital Twins for Security Automation
Digital Twins for Security Automation
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via Decomposition
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber Defense
 
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
 
Learning Automated Intrusion Response
Learning Automated Intrusion ResponseLearning Automated Intrusion Response
Learning Automated Intrusion Response
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.
 
Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal Stopping
 
Nse seminar 4_dec_hammar_stadler
Nse seminar 4_dec_hammar_stadlerNse seminar 4_dec_hammar_stadler
Nse seminar 4_dec_hammar_stadler
 
ITD BSides PDX Slides
ITD BSides PDX SlidesITD BSides PDX Slides
ITD BSides PDX Slides
 
CYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATION
CYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATIONCYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATION
CYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATION
 
CYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATION
CYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATIONCYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATION
CYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATION
 
Tackle Unknown Threats with Symantec Endpoint Protection 14 Machine Learning
Tackle Unknown Threats with Symantec Endpoint Protection 14 Machine LearningTackle Unknown Threats with Symantec Endpoint Protection 14 Machine Learning
Tackle Unknown Threats with Symantec Endpoint Protection 14 Machine Learning
 
Symantec Cyber Security Services: Security Simulation
Symantec Cyber Security Services: Security SimulationSymantec Cyber Security Services: Security Simulation
Symantec Cyber Security Services: Security Simulation
 
Optimizing cybersecurity incident response decisions using deep reinforcemen...
Optimizing cybersecurity incident response decisions using deep  reinforcemen...Optimizing cybersecurity incident response decisions using deep  reinforcemen...
Optimizing cybersecurity incident response decisions using deep reinforcemen...
 
Threat hunting in cyber world
Threat hunting in cyber worldThreat hunting in cyber world
Threat hunting in cyber world
 
Ethical Hacking Conference 2015- Building Secure Products -a perspective
 Ethical Hacking Conference 2015- Building Secure Products -a perspective Ethical Hacking Conference 2015- Building Secure Products -a perspective
Ethical Hacking Conference 2015- Building Secure Products -a perspective
 
Huntpedia
HuntpediaHuntpedia
Huntpedia
 

Más de Kim Hammar

Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
Kim Hammar
 
MuZero - ML + Security Reading Group
MuZero - ML + Security Reading GroupMuZero - ML + Security Reading Group
MuZero - ML + Security Reading Group
Kim Hammar
 

Más de Kim Hammar (16)

Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
 
Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTH
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal Stopping
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-Play
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.
 
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
 
Reinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed
Reinforcement Learning Algorithms for Adaptive Cyber Defense against HeartbleedReinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed
Reinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed
 
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021
 
Självlärande system för cybersäkerhet
Självlärande system för cybersäkerhetSjälvlärande system för cybersäkerhet
Självlärande system för cybersäkerhet
 
Learning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal StoppingLearning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal Stopping
 
MuZero - ML + Security Reading Group
MuZero - ML + Security Reading GroupMuZero - ML + Security Reading Group
MuZero - ML + Security Reading Group
 


Self-Learning Systems for Cyber Security

  • 1. Self-Learning Systems for Cyber Security Kim Hammar & Rolf Stadler kimham@kth.se, stadler@kth.se KTH Royal Institute of Technology CDIS Spring Conference 2021 March 24, 2021 1/16
  • 6. 4/16 Approach: Game Model & Reinforcement Learning
    - Challenges: evolving & automated attacks; complex infrastructures.
    - Our goal: automate security tasks; adapt to changing attack methods.
    - Our approach: model network attack and defense as games; use reinforcement learning to learn policies; incorporate learned policies in self-learning systems.
    (Figure: attacker, clients 1-3, defender, router R1.)
  • 7. 5/16 State of the Art
    - Game-learning programs: TD-Gammon, AlphaGo Zero [1], OpenAI Five, etc. ⇒ impressive empirical results of RL and self-play.
    - Attack simulations: automated threat modeling [2], automated intrusion detection, etc. ⇒ need for automation and better security tooling.
    - Mathematical modeling: game theory [3]; Markov decision theory. ⇒ Many security operations involve strategic decision making.
    [1] David Silver et al. "Mastering the game of Go without human knowledge". In: Nature 550 (Oct. 2017), pp. 354–. url: http://dx.doi.org/10.1038/nature24270.
    [2] Pontus Johnson, Robert Lagerström, and Mathias Ekstedt. "A Meta Language for Threat Modeling and Attack Simulations". In: ARES 2018. Hamburg, Germany: ACM, 2018. doi: 10.1145/3230833.3232799.
    [3] Tansu Alpcan and Tamer Basar. Network Security: A Decision and Game-Theoretic Approach. 1st ed. Cambridge University Press, 2010. isbn: 0521119324.
  • 10. 6/16 Our Work
    - Use case: intrusion prevention.
    - Our method: emulating computer infrastructures; system identification and model creation; reinforcement learning and generalization.
    - Results: learning to Capture The Flag.
    - Conclusions and future work.
  • 11. 7/16 Use Case: Intrusion Prevention
    - A defender owns an infrastructure: it consists of connected components that run network services; the defender defends the infrastructure by monitoring and patching.
    - An attacker seeks to intrude on the infrastructure: the attacker has a partial view of the infrastructure, wants to compromise specific components, and attacks by reconnaissance, exploitation, and pivoting.
    (Figure: attacker, clients 1-3, defender, router R1.)
  • 12. 8/16 Our Method for Finding Effective Security Strategies
    (Figure: pipeline linking a real-world infrastructure to an emulation system via selective replication; model creation & system identification produce a simulation system of states s_{i,j}; reinforcement learning & generalization, with policy evaluation & model estimation, yield a policy π that is mapped back and implemented in the emulation, enabling automation & self-learning systems.)
  • 21. 9/16 Emulation System
    - Emulation: a cluster of machines that runs a virtualized infrastructure which replicates important functionality of target systems.
    - The set of virtualized configurations defines a configuration space Σ = ⟨A, O, S, U, T, V⟩.
    - A specific emulation is based on a configuration σ_i ∈ Σ.
    (Figure: configuration space Σ with points σ_i mapping to emulated infrastructures on subnets 172.18.4.0/24, 172.18.19.0/24, and 172.18.61.0/24, each behind router R1.)
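A configuration σ_i ∈ Σ can be sketched as a simple data structure. The Python sketch below is illustrative only: the field names, subnets, and the service/vulnerability pools are assumptions for the example, not the system's actual schema, and the sampler stands in for how a point in Σ might be drawn (e.g., for domain randomization).

```python
from dataclasses import dataclass
import random

@dataclass(frozen=True)
class EmulationConfig:
    """One point sigma_i in the configuration space Sigma (illustrative fields)."""
    subnet: str               # e.g. "172.18.4.0/24"
    num_nodes: int            # machines in the virtualized infrastructure
    services: tuple           # network services exposed in this scenario
    vulnerabilities: tuple    # seeded weaknesses an attacker could exploit

def sample_configuration(rng: random.Random) -> EmulationConfig:
    """Draw a random configuration from hypothetical pools of options."""
    subnets = ["172.18.4.0/24", "172.18.19.0/24", "172.18.61.0/24"]
    service_pool = ["ssh", "http", "ftp", "snmp"]
    vuln_pool = ["weak-password", "cve-sim-1", "cve-sim-2"]
    return EmulationConfig(
        subnet=rng.choice(subnets),
        num_nodes=rng.randint(5, 25),
        services=tuple(rng.sample(service_pool, k=2)),
        vulnerabilities=tuple(rng.sample(vuln_pool, k=1)),
    )

cfg = sample_configuration(random.Random(0))
assert cfg.subnet in {"172.18.4.0/24", "172.18.19.0/24", "172.18.61.0/24"}
```

Making the configuration immutable (frozen dataclass with tuple fields) keeps each σ_i hashable, so sampled configurations can be deduplicated or used as dictionary keys.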
  • 23. 10/16 Emulation: Execution Times of Replicated Operations
    (Figure: histograms of action execution times (costs), normalized frequency vs. time cost in seconds, for |N| = 25, 50, 75, and 100.)
    - Fundamental issue: computational methods for policy learning typically require samples on the order of 100k-10M.
    - ⇒ Infeasible to optimize in the emulation system.
  • 24. 11/16 From Emulation to Simulation: System Identification
    (Figure: emulated network with nodes m1-m7 and per-node feature vectors, mapped via an abstract model to a POMDP model ⟨S, A, P, R, γ, O, Z⟩ with states s_i, actions a_i, and observations o_i.)
    - Abstract model based on domain knowledge: models the set of controls, the objective function, and the features of the emulated network; defines the static parts of a POMDP model.
    - Dynamics model (P, Z) identified using system identification: an algorithm based on random walks and maximum-likelihood estimation,
      M̂(b′ | b, a) ≜ n(b, a, b′) / Σ_{j′} n(b, a, j′).
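The maximum-likelihood estimate M̂(b′|b,a) = n(b,a,b′) / Σ_{j′} n(b,a,j′) amounts to counting observed transitions. The sketch below is a minimal illustration of that count-based estimator, not the authors' implementation; the state and action labels are hypothetical.

```python
from collections import defaultdict

class DynamicsEstimator:
    """Maximum-likelihood estimate of transition dynamics from observed
    (state, action, next_state) samples: counts normalized per (state, action)."""
    def __init__(self):
        self.counts = defaultdict(int)   # n(s, a, s')
        self.totals = defaultdict(int)   # sum over s' of n(s, a, s')

    def record(self, s, a, s_next):
        """Record one observed transition, e.g. from a random walk in the emulation."""
        self.counts[(s, a, s_next)] += 1
        self.totals[(s, a)] += 1

    def prob(self, s, a, s_next):
        """Estimated probability of s_next given (s, a); 0.0 when no data exists."""
        total = self.totals[(s, a)]
        if total == 0:
            return 0.0
        return self.counts[(s, a, s_next)] / total

# Toy usage with hypothetical labels: four observed outcomes of one action.
est = DynamicsEstimator()
for s_next in ["compromised", "safe", "safe", "safe"]:
    est.record("safe", "scan", s_next)
assert est.prob("safe", "scan", "safe") == 0.75
assert est.prob("safe", "scan", "compromised") == 0.25
```

In practice the transition samples would come from running many random walks against the emulation; the same counting scheme applies to the observation model Z.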
  • 27. 12/16 Policy Optimization in the Simulation System using Reinforcement Learning
    - Goal: approximate π* = arg max_π E[Σ_{t=0}^{T} γ^t r_{t+1}].
    - Learning algorithm: represent π by π_θ; define the objective J(θ) = E_{o∼ρ_{π_θ}, a∼π_θ}[R]; maximize J(θ) by stochastic gradient ascent with gradient ∇_θ J(θ) = E_{o∼ρ_{π_θ}, a∼π_θ}[∇_θ log π_θ(a|o) A_{π_θ}(o, a)].
    - Domain-specific challenges: partial observability; large state space, |S| = (w + 1)^{|N|·m·(m+1)}; large action space, |A| = |N|·(m+1); non-stationary environment due to the presence of an adversary; generalization.
    - See: Kim Hammar and Rolf Stadler. "Finding Effective Security Strategies through Reinforcement Learning and Self-Play". In: International Conference on Network and Service Management (CNSM 2020). Izmir, Turkey, Nov. 2020.
    (Figure: agent-environment loop with action a_t, state s_{t+1}, reward r_{t+1}.)
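The policy gradient ∇_θ J(θ) = E[∇_θ log π_θ(a|o) A_{π_θ}(o, a)] can be made concrete with a minimal REINFORCE-style sketch. The example below is a toy two-armed bandit with a softmax policy and a running-mean baseline standing in for the advantage; it is an assumption-laden illustration of the gradient above, not the algorithm used in the paper.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_bandit(rewards, episodes=2000, lr=0.1, seed=0):
    """REINFORCE on a 2-armed bandit: theta are action logits; each step applies
    grad log pi(a) * advantage, with a running-mean reward baseline."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(theta)
        a = 0 if rng.random() < probs[0] else 1
        r = rewards[a]
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)   # slow-moving baseline estimate
        for i in range(2):
            # grad of log softmax: indicator{a == i} - pi(i)
            grad = ((1.0 if i == a else 0.0) - probs[i]) * advantage
            theta[i] += lr * grad
    return softmax(theta)

probs = reinforce_bandit(rewards=[0.0, 1.0])
assert probs[1] > 0.8   # policy concentrates on the better action
```

The same update rule scales to the sequential, partially observed setting by conditioning π_θ on observations o and replacing the baseline with a learned advantage estimate.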
  • 32. 14/16 Learning Capture-the-Flag Strategies
    (Figure, left: learning curves (train and eval) of the proposed method; average episodic regret vs. iteration for the generated simulation, the test emulation environment, and the train emulation environment, against the lower bound π*. Figure, right: the evaluation infrastructure with attacker, clients 1-3, defender, router R1.)
  • 33. 15/16 Learning Capture-the-Flag Strategies
    (Figure: episodic rewards, episodic regret, and episodic steps vs. iteration for configurations 1-3 against π*. The three evaluation configurations are networks behind gateway R1 on 172.18.4.0/24, 172.18.3.0/24, and 172.18.2.0/24, with application server, intrusion detection system, access switch, flag, and traffic generator.)
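Assuming episodic regret is defined as the optimal episode reward minus the achieved episode reward (a common convention; the slides do not spell out the definition), the average episodic regret tracked in the evaluation can be computed as:

```python
def average_episodic_regret(episode_rewards, optimal_reward):
    """Mean shortfall of achieved episode rewards relative to the
    reward of the optimal policy pi* (assumed regret definition)."""
    regrets = [optimal_reward - r for r in episode_rewards]
    return sum(regrets) / len(regrets)

# Hypothetical rewards from three evaluation episodes.
assert average_episodic_regret([0.5, 1.0, 0.75], optimal_reward=1.0) == 0.25
```

A learning curve like the one in the figure then plots this quantity per training iteration, with regret approaching zero as the learned policy approaches π*.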
  • 34. 16/16 Conclusions & Future Work
    - Conclusions: we develop a method to find effective strategies for intrusion prevention, combining (1) an emulation system, (2) system identification, (3) a simulation system, (4) reinforcement learning, and (5) domain randomization and generalization; we show that self-learning can be successfully applied to network infrastructures; self-play reinforcement learning in a Markov security game.
    - Key challenges: stable convergence, sample efficiency, complexity of emulations, large state and action spaces.
    - Our research plans: improving the system identification algorithm and generalization; evaluation on real-world infrastructures.