How to Troubleshoot Apps for the Modern Connected Worker
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT Environment - Hammar & Stadler
1. 1/14
An Online Framework for Adapting Security
Policies in Dynamic IT Environments
International Conference on Network and Service Management
Thessaloniki, Greece, Oct 31 - Nov 4 2022
Kim Hammar & Rolf Stadler
kimham@kth.se stadler@kth.se
Division of Network and Systems Engineering
KTH Royal Institute of Technology
3. 2/14
Goal: Automation and Learning
I Challenges
I Evolving & automated attacks
I Complex infrastructures
I Our Goal:
I Automate security tasks
I Adapt to changing attack methods
Attacker Clients
. . .
Defender
1 IPS
1
alerts
Gateway
7 8 9 10 11
6
5
4
3
2
12
13 14 15 16
17
18
19
21
23
20
22
24
25 26
27 28 29 30 31
4. 2/14
Approach: Self-Learning Security Systems
I Challenges
I Evolving & automated attacks
I Complex infrastructures
I Our Goal:
I Automate security tasks
I Adapt to changing attack methods
I Our Approach: Self-Learning
Systems:
I real-time telemetry
I stream processing
I theories from control/game/decision
theory
I computational methods (e.g.
dynamic programming &
reinforcement learning)
I automated network management
(SDN, NFV, etc.)
Attacker Clients
. . .
Defender
1 IPS
1
alerts
Gateway
7 8 9 10 11
6
5
4
3
2
12
13 14 15 16
17
18
19
21
23
20
22
24
25 26
27 28 29 30 31
5. 3/14
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Emulation System
Target System
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
6. 3/14
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Emulation System
Target System
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
7. 3/14
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Emulation System
Target System
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
8. 3/14
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Emulation System
Target System
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
9. 3/14
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Emulation System
Target System
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
10. 3/14
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Emulation System
Target System
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
11. 3/14
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Emulation System
Target System
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
12. 3/14
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Emulation System
Target System
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
13. 4/14
Our Previous Work
I Finding Effective Security Strategies through Reinforcement
Learning and Self-Play1
I Learning Intrusion Prevention Policies through Optimal
Stopping2
I A System for Interactive Examination of Learned Security
Policies3
I Intrusion Prevention Through Optimal Stopping4
I Learning Security Strategies through Game Play and Optimal
Stopping5
1
Kim Hammar and Rolf Stadler. “Finding Effective Security Strategies through Reinforcement Learning and
Self-Play”. In: International Conference on Network and Service Management (CNSM 2020). Izmir, Turkey, 2020.
2
Kim Hammar and Rolf Stadler. “Learning Intrusion Prevention Policies through Optimal Stopping”. In:
International Conference on Network and Service Management (CNSM 2021).
http://dl.ifip.org/db/conf/cnsm/cnsm2021/1570732932.pdf. Izmir, Turkey, 2021.
3
Kim Hammar and Rolf Stadler. “A System for Interactive Examination of Learned Security Policies”. In:
NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium. 2022, pp. 1–3. doi:
10.1109/NOMS54207.2022.9789707.
4
Kim Hammar and Rolf Stadler. “Intrusion Prevention Through Optimal Stopping”. In: IEEE Transactions on
Network and Service Management 19.3 (2022), pp. 2333–2348. doi: 10.1109/TNSM.2022.3176781.
5
Kim Hammar and Rolf Stadler. “Learning Security Strategies through Game Play and Optimal Stopping”. In:
Proceedings of the ML4Cyber workshop, ICML 2022, Baltimore, USA, July 17-23, 2022. PMLR, 2022.
14. 5/14
This Paper: Learning in Dynamic IT Environments6
I Challenge: operational IT environments are dynamic
I Components may fail, load patterns can shift, etc.
I Contribution: we present a framework for learning and
updating security policies in dynamic IT environments
Policy Learning
Agent
Environment
System Identification
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
and Attack
Scenarios
Target
System
Model
M
Traces h1, h2, . . .
Policy π
Configuration I
and change events
Policy π
Policy evaluation &
Data collection
Automated
security policy
6
Kim Hammar and Rolf Stadler. “An Online Framework for Adapting Security Policies in Dynamic IT
Environments”. In: International Conference on Network and Service Management (CNSM 2022). Thessaloniki,
Greece, 2022.
15. 6/14
Learning in Dynamic IT Environments
Algorithm 1: High-level execution of the framework
Input: emulator: method to create digital twin
ϕ: system identification algorithm
φ: policy learning algorithm
1 Algorithm (emulator, ϕ, φ)
2 do in parallel
3 DigitalTwin(emulator)
4 SystemIdProcess(ϕ)
5 LearningProcess(φ)
6 end
1 Procedure DigitalTwin(emulator)
2 Loop
3 π ← ReceiveFromLearningProcess()
4 ht ← CollectTrace(π)
5 SendToSystemIdProcess(ht)
6 UpdateDigitalTwin(emulator)
7 EndLoop
1 Procedure SystemIdProcess(ϕ)
2 Loop
3 h1, h2, . . . ← ReceiveFromDigitalTwin()
4 M ← ϕ(h1, h2, . . .) // estimate model
5 SendToLearningProcess(M)
6 EndLoop
1 Procedure LearningProcess(φ)
2 Loop
3 M ← ReceiveFromSystemIdProcess()
4 π ← φ(M) // learn policy π
5 SendToDigitalTwin(π)
6 EndLoop
16. 6/14
Learning in Dynamic IT Environments
Algorithm 2: High-level execution of the framework
Input: emulator: method to create digital twin
ϕ: system identification algorithm
φ: policy learning algorithm
1 Algorithm (emulator, ϕ, φ)
2 do in parallel
3 DigitalTwin(emulator)
4 SystemIdProcess(ϕ)
5 LearningProcess(φ)
6 end
1 Procedure DigitalTwin(emulator)
2 Loop
3 π ← ReceiveFromLearningProcess()
4 ht ← CollectTrace(π)
5 SendToSystemIdProcess(ht)
6 UpdateDigitalTwin(emulator)
7 EndLoop
1 Procedure SystemIdProcess(ϕ)
2 Loop
3 h1, h2, . . . ← ReceiveFromDigitalTwin()
4 M ← ϕ(h1, h2, . . .) // estimate model
5 SendToLearningProcess(M)
6 EndLoop
1 Procedure LearningProcess(φ)
2 Loop
3 M ← ReceiveFromSystemIdProcess()
4 π ← φ(M) // learn policy π
5 SendToDigitalTwin(π)
6 EndLoop
17. 6/14
The Digital Twin
Algorithm 3: High-level execution of the framework
Input: emulator: method to create digital twin
ϕ: system identification algorithm
φ: policy learning algorithm
1 Algorithm (emulator, ϕ, φ)
2 do in parallel
3 DigitalTwin(emulator)
4 SystemIdProcess(ϕ)
5 LearningProcess(φ)
6 end
1 Procedure DigitalTwin(emulator)
2 Loop
3 π ← ReceiveFromLearningProcess()
4 ht ← CollectTrace(π)
5 SendToSystemIdProcess(ht)
6 UpdateDigitalTwin(emulator)
7 EndLoop
1 Procedure SystemIdProcess(ϕ)
2 Loop
3 h1, h2, . . . ← ReceiveFromDigitalTwin()
4 M ← ϕ(h1, h2, . . .) // estimate model
5 SendToLearningProcess(M)
6 EndLoop
1 Procedure LearningProcess(φ)
2 Loop
3 M ← ReceiveFromSystemIdProcess()
4 π ← φ(M) // learn policy π
5 SendToDigitalTwin(π)
6 EndLoop
18. 7/14
Creating a Digital Twin of the Target System
I Emulate hosts with docker containers
I Emulate IPS and vulnerabilities with
software
I Network isolation and traffic shaping
through NetEm in the Linux kernel
I Enforce resource constraints using
cgroups.
I Emulate client arrivals with Poisson
process
I Internal connections are full-duplex
& loss-less with bit capacities of 1000
Mbit/s
I External connections are full-duplex
with bit capacities of 100 Mbit/s &
0.1% packet loss in normal operation
and random bursts of 1% packet loss
Attacker Clients
. . .
Defender
1 IPS
1
alerts
Gateway
7 8 9 10 11
6
5
4
3
2
12
13 14 15 16
17
18
19
21
23
20
22
24
25 26
27 28 29 30 31
19. 7/14
Creating a Digital Twin of the Target System
I Emulate hosts with docker containers
I Emulate IPS and vulnerabilities with
software
I Network isolation and traffic shaping
through NetEm in the Linux kernel
I Enforce resource constraints using
cgroups.
I Emulate client arrivals with Poisson
process
I Internal connections are full-duplex
& loss-less with bit capacities of 1000
Mbit/s
I External connections are full-duplex
with bit capacities of 100 Mbit/s &
0.1% packet loss in normal operation
and random bursts of 1% packet loss
Attacker Clients
. . .
Defender
1 IPS
1
alerts
Gateway
7 8 9 10 11
6
5
4
3
2
12
13 14 15 16
17
18
19
21
23
20
22
24
25 26
27 28 29 30 31
20. 7/14
Creating a Digital Twin of the Target System
I Emulate hosts with docker containers
I Emulate IPS and vulnerabilities with
software
I Network isolation and traffic shaping
through NetEm in the Linux kernel
I Enforce resource constraints using
cgroups.
I Emulate client arrivals with Poisson
process
I Internal connections are full-duplex
& loss-less with bit capacities of 1000
Mbit/s
I External connections are full-duplex
with bit capacities of 100 Mbit/s &
0.1% packet loss in normal operation
and random bursts of 1% packet loss
Attacker Clients
. . .
Defender
1 IPS
1
alerts
Gateway
7 8 9 10 11
6
5
4
3
2
12
13 14 15 16
17
18
19
21
23
20
22
24
25 26
27 28 29 30 31
21. 7/14
Creating a Digital Twin of the Target System
I Emulate hosts with docker containers
I Emulate IPS and vulnerabilities with
software
I Network isolation and traffic shaping
through NetEm in the Linux kernel
I Enforce resource constraints using
cgroups.
I Emulate client arrivals with Poisson
process
I Internal connections are full-duplex
& loss-less with bit capacities of 1000
Mbit/s
I External connections are full-duplex
with bit capacities of 100 Mbit/s &
0.1% packet loss in normal operation
and random bursts of 1% packet loss
Attacker Clients
. . .
Defender
1 IPS
1
alerts
Gateway
7 8 9 10 11
6
5
4
3
2
12
13 14 15 16
17
18
19
21
23
20
22
24
25 26
27 28 29 30 31
22. 7/14
Creating a Digital Twin of the Target System
I Emulate hosts with docker containers
I Emulate IPS and vulnerabilities with
software
I Network isolation and traffic shaping
through NetEm in the Linux kernel
I Enforce resource constraints using
cgroups.
I Emulate client arrivals with Poisson
process
I Internal connections are full-duplex
& loss-less with bit capacities of 1000
Mbit/s
I External connections are full-duplex
with bit capacities of 100 Mbit/s &
0.1% packet loss in normal operation
and random bursts of 1% packet loss
Attacker Clients
. . .
Defender
1 IPS
1
alerts
Gateway
7 8 9 10 11
6
5
4
3
2
12
13 14 15 16
17
18
19
21
23
20
22
24
25 26
27 28 29 30 31
23. 7/14
Creating a Digital Twin of the Target System
I Emulate hosts with docker containers
I Emulate IPS and vulnerabilities with
software
I Network isolation and traffic shaping
through NetEm in the Linux kernel
I Enforce resource constraints using
cgroups.
I Emulate client arrivals with Poisson
process
I Internal connections are full-duplex
& loss-less with bit capacities of 1000
Mbit/s
I External connections are full-duplex
with bit capacities of 100 Mbit/s &
0.1% packet loss in normal operation
and random bursts of 1% packet loss
Attacker Clients
. . .
Defender
1 IPS
1
alerts
Gateway
7 8 9 10 11
6
5
4
3
2
12
13 14 15 16
17
18
19
21
23
20
22
24
25 26
27 28 29 30 31
24. 7/14
The System Identification Process
Algorithm 4: High-level execution of the framework
Input: emulator: method to create digital twin
ϕ: system identification algorithm
φ: policy learning algorithm
1 Algorithm (emulator, ϕ, φ)
2 do in parallel
3 DigitalTwin(emulator)
4 SystemIdProcess(ϕ)
5 LearningProcess(φ)
6 end
1 Procedure DigitalTwin(emulator)
2 Loop
3 π ← ReceiveFromLearningProcess()
4 ht ← CollectTrace(π)
5 SendToSystemIdProcess(ht)
6 UpdateDigitalTwin(emulator)
7 EndLoop
1 Procedure SystemIdProcess(ϕ)
2 Loop
3 h1, h2, . . . ← ReceiveFromDigitalTwin()
4 M ← ϕ(h1, h2, . . .) // estimate model
5 SendToLearningProcess(M)
6 EndLoop
1 Procedure LearningProcess(φ)
2 Loop
3 M ← ReceiveFromSystemIdProcess()
4 π ← φ(M) // learn policy π
5 SendToDigitalTwin(π)
6 EndLoop
25. 8/14
System Model
I We model the evolution of the system with a discrete-time
dynamical system.
I We assume a Markovian system with stochastic dynamics and
partial observability.
Stochastic
System
(Markov)
Noisy
Sensor
Optimal
filter
Controller
action at
observation
ot
state
st
belief
bt
26. 9/14
System Identification
ˆ
f
O
(o
t
|0)
Probability distribution of # IPS alerts weighted by priority ot
0 1000 2000 3000 4000 5000 6000 7000 8000 9000
ˆ
f
O
(o
t
|1)
Fitted model Distribution st = 0 Distribution st = 1
I The distribution fO of defender observations (system metrics)
is unknown.
I We fit a Gaussian mixture distribution ˆ
fO as an estimate of fO
in the target system.
I For each state s, we obtain the conditional distribution ˆ
fO|s
through expectation-maximization.
27. 9/14
The Policy Learning Process
Algorithm 5: High-level execution of the framework
Input: emulator: method to create digital twin
ϕ: system identification algorithm
φ: policy learning algorithm
1 Algorithm (emulator, ϕ, φ)
2 do in parallel
3 DigitalTwin(emulator)
4 SystemIdProcess(ϕ)
5 LearningProcess(φ)
6 end
1 Procedure DigitalTwin(emulator)
2 Loop
3 π ← ReceiveFromLearningProcess()
4 ht ← CollectTrace(π)
5 SendToSystemIdProcess(ht)
6 UpdateDigitalTwin(emulator)
7 EndLoop
1 Procedure SystemIdProcess(ϕ)
2 Loop
3 h1, h2, . . . ← ReceiveFromDigitalTwin()
4 M ← ϕ(h1, h2, . . .) // estimate model
5 SendToLearningProcess(M)
6 EndLoop
1 Procedure LearningProcess(φ)
2 Loop
3 M ← ReceiveFromSystemIdProcess()
4 π ← φ(M) // learn policy π
5 SendToDigitalTwin(π)
6 EndLoop
28. 10/14
Learning Effective Defender Policies
I Optimization problem:
I Each stopping time = one
defensive action
I Maximize reward of
stopping times
τL, τL−1, . . . , τ1:
π∗
l ∈ arg max
πl
Eπl
" τL−1
X
t=1
γt−1
RC
st ,st+1,L
+ γτL−1
RS
sτL
,sτL+1,L + . . . +
τ1−1
X
t=τ2+1
γt−1
RC
st ,st+1,1 + γτ1−1
RS
sτ1
,sτ1+1,1
#
I Optimization methods:
Reinforcement learning,
dynamic programming,
computational game theory,
etc.
0 1
∅
t ≥ 1
lt > 0
t ≥ 2
lt > 0
intrusion starts
Qt = 1
final stop
lt = 0
intrusion
prevented
lt = 0
29. 11/14
Putting it all together: Learning in Dynamic Environments
1. Changes in the target system are monitored.
2. When changes are detected, the emulation is updated.
3. Attack and defense scenarios are run in the emulation to
collect data.
4. The system model and the defender policy are updated
periodically with the new data.
Policy Learning
Agent
Environment
System Identification
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
and Attack
Scenarios
Target
System
Model
M
Traces h1, h2, . . .
Policy π
Configuration I
and change events
Policy π
Policy evaluation &
Data collection
Automated
security policy
30. 12/14
Use Case: Intrusion Prevention
I A Defender owns an infrastructure
I Consists of connected components
I Components run network services
I Defender defends the infrastructure
by monitoring and active defense
I Has partial observability
I An Attacker seeks to intrude on the
infrastructure
I Has a partial view of the
infrastructure
I Wants to compromise specific
components
I Attacks by reconnaissance,
exploitation and pivoting
Attacker Clients
. . .
Defender
1 IPS
1
alerts
Gateway
7 8 9 10 11
6
5
4
3
2
12
13 14 15 16
17
18
19
21
23
20
22
24
25 26
27 28 29 30 31
31. 13/14
Results: Learning in a Dynamic IT Environment
200
400
600
#
clients
5000
10000
E[
Ẑ]
0 10 20 30 40 50
execution time (hours)
0
20
Avg
reward
E
h
Ẑt,O|1
i
E
h
Ẑt,O|0
i
E
h
Ẑ
[10]
t,O|1
i
E
h
Ẑ
[10]
t,O|0
i
upper bound Our framework [10]
Results from running our framework for 50 hours in the digital
twin/emulation.
32. 14/14
Conclusions
I We present a framework for learning
and updating security policies in
dynamic IT environments
I We apply the method to an intrusion
prevention use case.
I We show numerical results in a
realistic emulation environment.
I We design a solution framework guided
by the theory of optimal stopping.
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Emulation
Target
System
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation &
Learning