SlideShare una empresa de Scribd logo
1 de 90
Descargar para leer sin conexión
1/54
Learning Optimal Intrusion Responses for
IT Infrastructures via Decomposition
Visit to Princeton University
Kim Hammar
kimham@kth.se
Division of Network and Systems Engineering
KTH Royal Institute of Technology
May 17, 2023
2/54
Use Case: Intrusion Response
I A defender owns an infrastructure
I Consists of connected components
I Components run network services
I Defender defends the infrastructure
by monitoring and active defense
I Has partial observability
I An attacker seeks to intrude on the
infrastructure
I Has a partial view of the
infrastructure
I Wants to compromise specific
components
I Attacks by reconnaissance,
exploitation and pivoting
Attacker Clients
. . .
Defender
1 IPS
1
alerts
Gateway
7 8 9 10 11
6
5
4
3
2
12
13 14 15 16
17
18
19
21
23
20
22
24
25 26
27 28 29 30 31
3/54
Automated Intrusion Response: Current Landscape
Levels of security automation
No automation.
Manual detection.
Manual prevention.
No alerts.
No automatic responses.
Lack of tools.
1980s 1990s 2000s-Now Research
Operator assistance.
Manual detection.
Manual prevention.
Audit logs.
Security tools.
Partial automation.
System has automated functions
for detection/prevention
but requires manual
updating and configuration.
Intrusion detection systems.
Intrusion prevention systems.
High automation.
System automatically
updates itself.
Automated attack detection.
Automated attack mitigation.
4/54
Can we use decision theory and learning-based methods to
automatically find effective security strategies?1
π
Σ Σ
security
objective
feedback
control
input
target
system
security
indicators
disturbance
1
Kim Hammar and Rolf Stadler. “Finding Effective Security Strategies through Reinforcement Learning and
Self-Play”. In: International Conference on Network and Service Management (CNSM 2020). Izmir, Turkey, 2020,
Kim Hammar and Rolf Stadler. “Learning Intrusion Prevention Policies through Optimal Stopping”. In:
International Conference on Network and Service Management (CNSM 2021). Izmir, Turkey, 2021, Kim Hammar
and Rolf Stadler. “Intrusion Prevention Through Optimal Stopping”. In: IEEE Transactions on Network and
Service Management 19.3 (2022), pp. 2333–2348. doi: 10.1109/TNSM.2022.3176781, Kim Hammar and
Rolf Stadler. Learning Near-Optimal Intrusion Responses Against Dynamic Attackers. 2023. doi:
10.48550/ARXIV.2301.06085. url: https://arxiv.org/abs/2301.06085.
5/54
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
5/54
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
5/54
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
5/54
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
5/54
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
5/54
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
5/54
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
5/54
Our Framework for Automated Network Security
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
6/54
Creating a Digital Twin of the Target Infrastructure
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
6/54
Theoretical Analysis and Learning of Defender Strategies
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
7/54
Creating a Digital Twin of the Target Infrastructure
I An infrastructure is
defined by its
configuration.
I Set of configurations
supported by our
framework can be
seen as a
configuration space
I The configuration
space defines the
class of
infrastructures for
which we can create
digital twins.
Σ Configuration Space
σi
*
* *
172.18.4.0/24
172.18.19.0/24
172.18.61.0/24
Digital Twins
R1 R1 R1
8/54
The Target Infrastructure
I 64 nodes. 24 ovs switches, 3
gateways. 6 honeypots. 8 application
servers. 4 administration servers. 15
compute servers.
I Topology shown to the right
I 11 vulnerabilities (cve-2010-0426,
cve-2015-3306, cve-2015-5602, etc.)
I 4 zones: dmz, r&d zone, admin
zone, quarantine zone
I 9 workflows
I Management: 1 sdn controller, 1
Kafka server, 1 elastic server.
r&d zone
App servers Honeynet
dmz
admin
zone
workflow
Gateway idps
quarantine
zone
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
9/54
Emulating Physical Components
I We emulate physical components with
Docker containers
I Focus on linux-based systems
I The containers include everything
needed to emulate the host: a runtime
system, code, system tools, system
libraries, and configurations.
I Examples of containers: IDPS
container, client container, attacker
container, CVE-2015-1427 container,
Open vSwitch containers etc.
Containers
Physical server
Operating system
Docker engine
CSLE
10/54
Emulating Network Connectivity
Management node 1
Emulated IT infrastructure
Management node 2
Emulated IT infrastructure
Management node n
Emulated IT infrastructure
VXLAN VXLAN . . . VXLAN
IP network
I We emulate network connectivity on the same host using
network namespaces.
I Connectivity across physical hosts is achieved using VXLAN
tunnels with Docker swarm.
11/54
Emulating Network Conditions
I We do traffic shaping using
NetEm in the Linux kernel
I Emulate internal
connections are full-duplex
& loss-less with bit
capacities of 1000 Mbit/s
I Emulate external
connections are full-duplex
with bit capacities of 100
Mbit/s & 0.1% packet loss
in normal operation and
random bursts of 1% packet
loss
User space
. . .
Application processes
Kernel
TCP/UDP
IP/Ethernet/802.11
OS
TCP/IP
stack
Queueing
discipline
Device driver
queue (FIFO)
NIC
Netem config:
latency,
jitter, etc.
Sockets
12/54
Emulating Actors
I We emulate client arrivals
with Poisson processes
I We emulate client
interactions with load
generators
I Attackers are emulated by
automated programs that
select actions from a
pre-defined set
I Defender actions are
emulated through a custom
gRPC API.
Markov Decision Process
s1,1 s1,2 s1,3 . . . s1,4
s2,1 s2,2 s2,3 . . . s2,4
Digital Twin
. . .
Virtual
network
Virtual
devices
Emulated
services
Emulated
actors
IT Infrastructure
Configuration
& change events
System traces
Verified security policy
Optimized security policy
13/54
System Identification
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation &
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning &
Generalization
Strategy evaluation &
Model estimation
Automation &
Self-learning systems
14/54
Outline
I Use Case & Digital Twin
I Use case: intrusion response
I Digital twin for data collection & evaluation
I System Model
I Discrete-time Markovian dynamical system
I Partially observed stochastic game
I System Decomposition
I Additive subgames on the workflow-level
I Optimal substructure on component-level
I Learning Near-Optimal Intrusion Responses
I Scalable learning through decomposition
I Digital twin for system identification & evaluation
I Efficient equilibrium approximation
I Conclusions & Future Work
14/54
Outline
I Use Case & Digital Twin
I Use case: intrusion response
I Digital twin for data collection & evaluation
I System Model
I Discrete-time Markovian dynamical system
I Partially observed stochastic game
I System Decomposition
I Additive subgames on the workflow-level
I Optimal substructure on component-level
I Learning Near-Optimal Intrusion Responses
I Scalable learning through decomposition
I Digital twin for system identification & evaluation
I Efficient equilibrium approximation
I Conclusions & Future Work
14/54
Outline
I Use Case & Digital Twin
I Use case: intrusion response
I Digital twin for data collection & evaluation
I System Model
I Discrete-time Markovian dynamical system
I Partially observed stochastic game
I System Decomposition
I Additive subgames on the workflow-level
I Optimal substructure on component-level
I Learning Near-Optimal Intrusion Responses
I Scalable learning through decomposition
I Digital twin for system identification & evaluation
I Efficient equilibrium approximation
I Conclusions & Future Work
14/54
Outline
I Use Case & Digital Twin
I Use case: intrusion response
I Digital twin for data collection & evaluation
I System Model
I Discrete-time Markovian dynamical system
I Partially observed stochastic game
I System Decomposition
I Additive subgames on the workflow-level
I Optimal substructure on component-level
I Learning Near-Optimal Intrusion Responses
I Scalable learning through decomposition
I Digital twin for system identification & evaluation
I Efficient equilibrium approximation
I Conclusions & Future Work
14/54
Outline
I Use Case & Digital Twin
I Use case: intrusion response
I Digital twin for data collection & evaluation
I System Model
I Discrete-time Markovian dynamical system
I Partially observed stochastic game
I System Decomposition
I Additive subgames on the workflow-level
I Optimal substructure on component-level
I Learning Near-Optimal Intrusion Responses
I Scalable learning through decomposition
I Digital twin for system identification & evaluation
I Efficient equilibrium approximation
I Conclusions & Future Work
15/54
System Model
I G = h{gw} ∪ V, Ei: directed graph
representing the virtual infrastructure
I V: finite set of virtual components.
I E: finite set of component
dependencies.
I Z: finite set of zones.
r&d zone
App servers Honeynet
dmz
admin
zone
workflow
Gateway idps
quarantine
zone
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
15/54
System Model
I G = h{gw} ∪ V, Ei: directed graph
representing the virtual infrastructure
I V: finite set of virtual components.
I E: finite set of component
dependencies.
I Z: finite set of zones.
r&d zone
App servers Honeynet
dmz
admin
zone
workflow
Gateway idps
quarantine
zone
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
15/54
System Model
I G = h{gw} ∪ V, Ei: directed graph
representing the virtual infrastructure
I V: finite set of virtual components.
I E: finite set of component
dependencies.
I Z: finite set of zones.
r&d zone
App servers Honeynet
dmz
admin
zone
workflow
Gateway idps
quarantine
zone
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
16/54
State Model
I Each i ∈ V has a state
vt,i = (v
(Z)
t,i
|{z}
D
, v
(I)
t,i , v
(R)
t,i
| {z }
A
)
I System state st = (vt,i )i∈V ∼ St.
I Markovian time-homogeneous
dynamics:
st+1 ∼ f (· | St, At)
At = (A
(A)
t , A
(D)
t ) are the actions.
s1 s2 s3
s4 s5 s4
.
.
.
.
.
.
.
.
.
16/54
State Model
I Each i ∈ V has a state
vt,i = (v
(Z)
t,i
|{z}
D
, v
(I)
t,i , v
(R)
t,i
| {z }
A
)
I System state st = (vt,i )i∈V ∼ St.
I Markovian time-homogeneous
dynamics:
st+1 ∼ f (· | St, At)
At = (A
(A)
t , A
(D)
t ) are the actions.
s1 s2 s3
s4 s5 s4
.
.
.
.
.
.
.
.
.
16/54
State Model
I Each i ∈ V has a state
vt,i = (v
(Z)
t,i
|{z}
D
, v
(I)
t,i , v
(R)
t,i
| {z }
A
)
I System state st = (vt,i )i∈V ∼ St.
I Markovian time-homogeneous
dynamics:
st+1 ∼ f (· | St, At)
At = (A
(A)
t , A
(D)
t ) are the actions.
s1 s2 s3
s4 s5 s4
.
.
.
.
.
.
.
.
.
17/54
Workflow Model
I Services are connected into workflows W = {w1, . . . , w|W|}.
17/54
Workflow Model
I Services are connected into workflows W = {w1, . . . , w|W|}.
gw fw idps lb
http
servers
auth
server
search
engine
db
cache
Dependency graph of an example workflow representing a web
application; gw, fw, idps, lb, and db are acronyms for gateway,
firewall, intrusion detection and prevention system, load balancer, and
database, respectively.
18/54
Workflow Model
I Services are connected into
workflows
W = {w1, . . . , w|W|}.
I Each w ∈ W is realized as a
directed acyclic subgraph (dag)
Gw = h{gw} ∪ Vw, Ewi of G
I W = {w1, . . . , w|W|} induces a
partitioning
V =
[
wi ∈W
Vwi such that i 6= j =⇒ Vwi ∩ Vwj = ∅
Zone a
Zone b Zone c
gw
1 2 3
4 5 6
7
A workflow dag
18/54
Workflow Model
I Services are connected into
workflows
W = {w1, . . . , w|W|}.
I Each w ∈ W is realized as a
directed acyclic subgraph (dag)
Gw = h{gw} ∪ Vw, Ewi of G
I W = {w1, . . . , w|W|} induces a
partitioning
V =
[
wi ∈W
Vwi such that i 6= j =⇒ Vwi ∩ Vwj = ∅
Zone a
Zone b Zone c
gw
1 2 3
4 5 6
7
A workflow dag
19/54
Client Model
Client population
. . .
Arrival rate λ Departure
Service time µ
. . .
.
.
.
.
.
.
.
.
.
w1 w2 w|W|
Workflows (Markov processes)
I Homogeneous client population
I Clients arrive according to Po(λ), Service times Exp(1
µ)
I Workflow selection: uniform
I Workflow interaction: Markov process
20/54
Observation Model
I idpss inspect network traffic and
generate alert vectors:
ot ,

ot,1, . . . , ot,|V|

∈ N
|V|
0
ot,i is the number of alerts related to
node i ∈ V at time-step t.
I ot = (ot,1, . . . , ot,|V|) is a realization
of the random vector Ot with joint
distribution Z
idps
idps
idps
idps
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
20/54
Observation Model
I idpss inspect network traffic and
generate alert vectors:
ot ,

ot,1, . . . , ot,|V|

∈ N
|V|
0
ot,i is the number of alerts related to
node i ∈ V at time-step t.
I ot = (ot,1, . . . , ot,|V|) is a realization
of the random vector Ot with joint
distribution Z
idps
idps
idps
idps
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
21/54
probability
ZO1
ZO2
ZO3
ZO4
ZO5
ZO6
ZO7
ZO8
probability
ZO9
ZO10
ZO11
ZO12
ZO13
ZO14
ZO15
ZO16
probability
ZO17
ZO18
ZO19
ZO20
ZO21
ZO22
ZO23
ZO24
probability
ZO25
ZO26
ZO27
ZO28
ZO29
ZO30
ZO31
ZO32
probability
ZO33
ZO34
ZO35
ZO36
ZO37
ZO38
ZO39
ZO40
probability
ZO41
ZO42
ZO43
ZO44
ZO45
ZO46
ZO47
ZO48
probability
ZO49
ZO50
ZO51
ZO52
ZO53
ZO54
ZO55
ZO56
250 500 750
O
probability
ZO57
250 500 750
O
ZO58
250 500 750
O
ZO59
250 500 750
O
ZO60
250 500 750
O
ZO61
250 500 750
O
ZO62
250 500 750
O
ZO63
250 500 750
O
ZO64
Distributions of # alerts weighted by priority ZOi
(Oi | S
(D)
i , A
(A)
i ) per node i ∈ V
no intrusion intrusion
22/54
Monitoring and Telemetry
Devices
Event bus
Security Policy
Storage Systems
Control actions
Data pipelines
Events
I Emulated devices run monitoring agents that periodically
push metrics to a Kafka event bus.
I The data in the event bus is consumed by data pipelines that
process the data and write to storage systems.
I The processed data is used by an automated security policy to
decide on control actions to execute in the digital twin.
23/54
Feature Selection
I Our framework collects 100s of metrics every time-step.
I We focus on the IDPS alert metric as it provides the most
information for detecting the type of attacks we consider.
0 2500 5000 7500
Probability
Weighted idps alerts
DKL(ZO|0 k ZO|1) = 0.49
−100 0 100
New login attempts
DKL(ZO|0 k ZO|1) = 0.07
−500 0
# New processes
DKL(ZO|0 k ZO|1) = 0.01
−100 0 100
Probability
New tcp connections
DKL(ZO|0 k ZO|1) = 0.01
0 10 20 30
# Blocks written to disk
DKL(ZO|0 k ZO|1) = 0.12
0 20 40
# Blocks read from disk
DKL(ZO|0 k ZO|1) = 0.0
ZO|s=0 (no intrusion) ZO|s=1 (intrusion)
24/54
Defender Model
I Defender action:
a
(D)
t ∈ {0, 1, 2, 3, 4}|V|
I 0 means do nothing. 1 − 4 correspond
to defensive actions (see fig)
I A defender strategy is a function
πD ∈ ΠD : HD → ∆(AD), where
h
(D)
t = (s
(D)
1 , a
(D)
1 , o1, . . . , a
(D)
t−1, s
(D)
t , ot) ∈ HD
I Objective: (i) maintain workflows; and
(ii) stop a possible intrusion:
J ,
T
X
t=1
γt−1
η
|W|
X
i=1
uW(wi , st)
| {z }
workflows utility
− (1 − η)
|V|
X
j=1
cI(st,j, at,j)
| {z }
intrusion and defense costs
!
dmz
rd
zone
admin
zone
Old path
New path
Honeypot App server
Defender
Revoke
certificates
Blacklist
IP
1) Server migration 2) Flow migration and blocking
3) Shut down server 4) Access control
24/54
Defender Model
I Defender action:
a
(D)
t ∈ {0, 1, 2, 3, 4}|V|
I 0 means do nothing. 1 − 4 correspond
to defensive actions (see fig)
I A defender strategy is a function
πD ∈ ΠD : HD → ∆(AD), where
h
(D)
t = (s
(D)
1 , a
(D)
1 , o1, . . . , a
(D)
t−1, s
(D)
t , ot) ∈ HD
I Objective: (i) maintain workflows; and
(ii) stop a possible intrusion:
J ,
T
X
t=1
γt−1
η
|W|
X
i=1
uW(wi , st)
| {z }
workflows utility
− (1 − η)
|V|
X
j=1
cI(st,j, at,j)
| {z }
intrusion and defense costs
!
dmz
rd
zone
admin
zone
Old path
New path
Honeypot App server
Defender
Revoke
certificates
Blacklist
IP
1) Server migration 2) Flow migration and blocking
3) Shut down server 4) Access control
24/54
Defender Model
I Defender action:
a
(D)
t ∈ {0, 1, 2, 3, 4}|V|
I 0 means do nothing. 1 − 4 correspond
to defensive actions (see fig)
I A defender strategy is a function
πD ∈ ΠD : HD → ∆(AD), where
h
(D)
t = (s
(D)
1 , a
(D)
1 , o1, . . . , a
(D)
t−1, s
(D)
t , ot) ∈ HD
I Objective: (i) maintain workflows; and
(ii) stop a possible intrusion:
J ,
T
X
t=1
γt−1
η
|W|
X
i=1
uW(wi , st)
| {z }
workflows utility
− (1 − η)
|V|
X
j=1
cI(st,j, at,j)
| {z }
intrusion and defense costs
!
dmz
rd
zone
admin
zone
Old path
New path
Honeypot App server
Defender
Revoke
certificates
Blacklist
IP
1) Server migration 2) Flow migration and blocking
3) Shut down server 4) Access control
25/54
Attacker Model
I Attacker action: a
(A)
t ∈ {0, 1, 2, 3}|V|
I 0 means do nothing. 1 − 3 correspond
to attacks (see fig)
I An attacker strategy is a function
πA ∈ ΠA : HA → ∆(AA), where HA
is the space of all possible attacker
histories
h
(A)
t = (s
(A)
1 , a
(A)
1 , o1, . . . , a
(A)
t−1, s
(A)
t , ot) ∈ HA
I Objective: (i) disrupt workflows; and
(ii) compromise nodes:
− J
.
.
.
Attacker
login attempts
configure
Automated
system
Server
2) Brute-force
1) Reconnaissance
3) Code execution
Attacker Server
TCP SYN
TCP SYN ACK
port open
Attacker Service Server
malicious
request inject code
execution
25/54
Attacker Model
I Attacker action: a
(A)
t ∈ {0, 1, 2, 3}|V|
I 0 means do nothing. 1 − 3 correspond
to attacks (see fig)
I An attacker strategy is a function
πA ∈ ΠA : HA → ∆(AA), where HA
is the space of all possible attacker
histories
h
(A)
t = (s
(A)
1 , a
(A)
1 , o1, . . . , a
(A)
t−1, s
(A)
t , ot) ∈ HA
I Objective: (i) disrupt workflows; and
(ii) compromise nodes:
− J
.
.
.
Attacker
login attempts
configure
Automated
system
Server
2) Brute-force
1) Reconnaissance
3) Code execution
Attacker Server
TCP SYN
TCP SYN ACK
port open
Attacker Service Server
malicious
request inject code
execution
25/54
Attacker Model
I Attacker action: a
(A)
t ∈ {0, 1, 2, 3}|V|
I 0 means do nothing. 1 − 3 correspond
to attacks (see fig)
I An attacker strategy is a function
πA ∈ ΠA : HA → ∆(AA), where HA
is the space of all possible attacker
histories
h
(A)
t = (s
(A)
1 , a
(A)
1 , o1, . . . , a
(A)
t−1, s
(A)
t , ot) ∈ HA
I Objective: (i) disrupt workflows; and
(ii) compromise nodes:
− J
.
.
.
Attacker
login attempts
configure
Automated
system
Server
2) Brute-force
1) Reconnaissance
3) Code execution
Attacker Server
TCP SYN
TCP SYN ACK
port open
Attacker Service Server
malicious
request inject code
execution
25/54
The Intrusion Response Problem
maximize
πD∈ΠD
minimize
πA∈ΠA
E(πD,πA) [J] (1a)
subject to s
(D)
t+1 ∼ fD · | A
(D)
t , A
(D)
t

∀t (1b)
s
(A)
t+1 ∼ fA · | S
(A)
t , At

∀t (1c)
ot+1 ∼ Z · | S
(D)
t+1, A
(A)
t ) ∀t (1d)
a
(A)
t ∼ πA · | H
(A)
t

, a
(A)
t ∈ AA(st) ∀t (1e)
a
(D)
t ∼ πD · | H
(D)
t

, a
(D)
t ∈ AD ∀t (1f)
where E(πD,πA) denotes the expectation of the random vectors
(St, Ot, At)t∈{1,...,T} under the strategy profile (πD, πA).
(1) can be formulated as a zero-sum Partially Observed Stochastic
Game with Public Observations (a PO-POSG):
Γ = hN, (Si )i∈N , (Ai )i∈N , (fi )i∈N , u, γ, (b
(i)
1 )i∈N , O, Zi
26/54
Existence of a Solution
Theorem
Given the po-posg Γ (2), the following holds:
(A) Γ has a mixed Nash equilibrium and a value function
V ∗ : BD × BA → R that maps each possible initial pair of
belief states (b
(D)
1 , bA
1 ) to the expected utility of the defender
in the equilibrium.
(B) For each strategy pair (πA, πD) ∈ ΠA × ΠD, the best response
sets BD(πA) and BA(πD) are non-empty and correspond to
optimal strategies in two Partially Observed Markov Decision
Processes (pomdps): M (D) and M (A). Further, a pair of
pure best response strategies (π̃D, π̃A) ∈ BD(πA) × BA(πD)
and a pair of value functions (V ∗
D,πA
, V ∗
A,πD
) exist.
27/54
The Curse of Dimensionality
I While (1) has a solution (i.e the game Γ has a value (Thm
1)), computing it is intractable since the state, action, and
observation spaces of the game grow exponentially with |V|.
1 2 3 4 5
104
105
2
105
|S|
|O|
|Ai |
|V|
Growth of |S|, |O|, and |Ai | in function of the number of nodes |V|
28/54
The Curse of Dimensionality
I While (1) has a solution (i.e the game Γ has a value (Thm
1)), computing it is intractable since the state, action, and
observation spaces of the game grow exponentially with |V|.
1 2 3 4 5
104
105
2
105
|S|
|O|
|Ai |
|V|
Growth of |S|, |O|, and |Ai | in function of the number of nodes |V|
We tackle the scability challenge with decomposition
29/54
Intuitively..
rd zone
App servers Honeynet
dmz
admin
zone
workflow
Gateway idps
quarantine
zone
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 65
The optimal
action here...
Does not directly
depend on the state or
action of a node
down here
30/54
Intuitively..
rd zone
App servers Honeynet
dmz
admin
zone
workflow
Gateway idps
quarantine
zone
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 65
The optimal
action here...
But they are
not completely
independent either.
How can we
exploit this
structure?
Does not directly
depend on the state
or action of a node
down here
31/54
System Decomposition
To avoid explicitly enumerating the very large state, observation,
and action spaces of Γ, we exploit three structural properties.
1. Additive structure across workflows.
I The game decomposes into additive subgames on the
workflow-level, which means that the strategy for each
subgame can be optimized independently
2. Optimal substructure within a workflow.
I The subgame for each workflow decomposes into subgames on
the node-level that satisfy the optimal substructure property
3. Threshold properties of local defender strategies.
I The optimal node-level strategies for the defender exhibit
threshold structures, which means that they can be estimated
efficiently
31/54
System Decomposition
To avoid explicitly enumerating the very large state, observation,
and action spaces of Γ, we exploit three structural properties.
1. Additive structure across workflows.
I The game decomposes into additive subgames on the
workflow-level, which means that the strategy for each
subgame can be optimized independently
2. Optimal substructure within a workflow.
I The subgame for each workflow decomposes into subgames on
the node-level that satisfy the optimal substructure property
3. Threshold properties of local defender strategies.
I The optimal node-level strategies for the defender exhibit
threshold structures, which means that they can be estimated
efficiently
32/54
Additive Structure Across Workflows (Intuition)
“=”
I If there is no path between i and j in G, then i and j are
independent in the following sense:
I Compromising i has no affect on the state of j.
I Compromising i does not make it harder or easier to
compromise j.
I Compromising i does not affect the service provided by j.
I Defending i does not affect the state of j.
I Defending i does not affect the service provided by j.
33/54
Additive Structure Across Workflows
Definition (Transition independence)
A set of nodes Q are transition independent iff the transition
probabilities factorize as
f (St+1 | St, At) =
Y
i∈Q
f (St+1,i | St,i , At,i )
Definition (Utility independence)
A set of nodes Q are utility independent iff there exists functions
u1, . . . , u|Q| such that the utility function u decomposes as
u(St, At) = f (u1(St,1, At,1), . . . , u1(St,|Q|, At,Q))
and
ui ≤ u0
i ⇐⇒ f (u1, . . . , ui , . . . , u|Q|) ≤ f (u1, . . . , u0
i , . . . , u|Q|)
34/54
Additive Structure Across Workflows
Theorem (Additive structure across workflows)
(A) All nodes V in the game Γ are transition independent.
(B) If there is no path between i and j in the topology graph G,
then i and j are utility independent.
Corollary
Γ decomposes into |W| additive subproblems that can be solved
independently and in parallel.
π
(w1)
k
π
(w2)
k
π
(w|W|)
k
ot,w1
ot,w2
ot,w|W|
.
.
.
⊕
a
(k)
w1
a
(k)
w2
a
(k)
w|W|
a
(k)
t
35/54
Additive Structure Across Workflows: Minimal Example
Zone 1 Zone 2
1
2
3
gw
w1
w2
V = {1, 2, 3},
E = {(1, 2)},
W = {w1, w2},
Z = {1, 2}
a) IT infrastructure b) Transition dependencies
St, At St+1, Ot+1
S
(D)
t+1,1
S
(D)
t,1
A
(D)
t,1 S
(A)
t+1,1
S
(A)
t,1
Ot,1
A
(A)
t,1
S
(D)
t+1,2
S
(D)
t,2
A
(D)
t,2 S
(A)
t+1,2
S
(A)
t,2
Ot,2
A
(A)
t,2
S
(D)
t+1,3
S
(D)
t,3
A
(D)
t,3 S
(A)
t+1,3
S
(A)
t,3
Ot,1
A
(A)
t,3
c) Utility dependencies
St, At Ut
S
(D)
t,1
A
(D)
t,1
Ut,1
S
(A)
t,1
S
(D)
t,2
A
(D)
t,2
Ut,2
S
(A)
t,2
S
(D)
t,3
A
(D)
t,3
Ut,3
S
(A)
t,3
35/54
System Decomposition
To avoid explicitly enumerating the very large state, observation,
and action spaces of Γ, we exploit three structural properties.
1. Additive structure across workflows.
I The game decomposes into additive subgames on the
workflow-level, which means that the strategy for each
subgame can be optimized independently
2. Optimal substructure within a workflow.
I The subgame for each workflow decomposes into subgames on
the node-level that satisfy the optimal substructure property
3. Threshold properties of local defender strategies.
I The optimal node-level strategies for the defender exhibit
threshold structures, which means that they can be estimated
efficiently
36/54
Optimal Substructure Within a Workflow
I Nodes in the same workflow are utility
dependent.
I =⇒ Locally-optimal strategies for
each node can not simply be added
together to obtain an optimal strategy
for the workflow.
I However, the locally-optimal strategies
satisfy the optimal substructure
property.
I =⇒ there exists an algorithm for
constructing an optimal workflow
strategy from locally-optimal
strategies for each node.
Zone 1 Zone 2
1
2
3
gw
w1
w2
V = {1, 2, 3},
E = {(1, 2)},
W = {w1, w2},
Z = {1, 2}
IT infrastructure
Utility dependencies
St, At Ut
S
(D)
t,1
A
(D)
t,1
Ut,1
S
(A)
t,1
S
(D)
t,2
A
(D)
t,2
Ut,2
S
(A)
t,2
S
(D)
t,3
A
(D)
t,3
Ut,3
S
(A)
t,3
36/54
Optimal substructure within a workflow
I Nodes in the same workflow are utility
dependent.
I =⇒ Locally-optimal strategies for
each node can not simply be added
together to obtain an optimal strategy
for the workflow.
I However, the locally-optimal strategies
satisfy the optimal substructure
property.
I =⇒ there exists an algorithm for
constructing an optimal workflow
strategy from locally-optimal
strategies for each node.
Zone 1 Zone 2
1
2
3
gw
w1
w2
V = {1, 2, 3},
E = {(1, 2)},
W = {w1, w2},
Z = {1, 2}
IT infrastructure
Utility dependencies
St, At Ut
S
(D)
t,1
A
(D)
t,1
Ut,1
S
(A)
t,1
S
(D)
t,2
A
(D)
t,2
Ut,2
S
(A)
t,2
S
(D)
t,3
A
(D)
t,3
Ut,3
S
(A)
t,3
37/54
Algorithm for Combining Locally-Optimal Node Strategies
into Optimal Workflow Strategies
Algorithm 1: Algorithm for combining local strate-
gies
1 Input: Γ: the game,
2 πk: a vector with local strategies
3 Output: (πD, πA): global game strategies
4 Algorithm composite-strategy(Γ, πk)
5 for player k ∈ N do
6 πk ←λ (s
(k)
t , b
(k)
t )
7 a
(k)
t = ()
8 for workflow w ∈ W do
9 for node
i ∈ topological-sort(Vw) do
10 a
(k,i)
t ← π
(i)
k (s
(k)
t , b
(k)
t )
11 if gw 6→
a
(k)
t
t i then
12 a
(k,i)
t ← ⊥
13 end
14 a
(k)
t = a
(k)
t ⊕ a
(k,i)
t
15 end
16 end
17 return a
(k)
t
18 end
19 return (πD, πA)
π
(1)
k
→1
ot,1 a
(k)
t,1 a
(k),0
t,1
π
(2)
k
→2
⊕
ot,2 a
(k)
t,2 a
(k),0
t,2
π
(3)
k
→3
⊕
ot,3 a
(k)
t,3 a
(k),0
t,3
.
.
.
π
(|Vw|)
k
→|Vw|
⊕
ot,|Vw|
a
(k)
t,|Vw| a
(k),0
t,|Vw|
⊕ a
(k)
w
38/54
Algorithm for Combining Locally-Optimal Node Strategies
into Optimal Workflow Strategies
π
(2)
D
π
(1)
D
π
(3)
D
π
(4)
D
π
(5)
D
π
(6)
D
(π
(i)
D )i∈Vw : local strategies in the same workflow w ∈ W
38/54
Algorithm for Combining Locally-Optimal Node Strategies
into Optimal Workflow Strategies
π
(2)
D
π
(1)
D
π
(3)
D
π
(4)
D
π
(5)
D
π
(6)
D
a
(D,1)
t ∼ π
(1)
D
Step 1; select action for node 1 according to its local strategy
Workflow action:
(a
(D,1)
t )
38/54
Algorithm for Combining Locally-Optimal Node Strategies
into Optimal Workflow Strategies
π
(2)
D
π
(1)
D
π
(3)
D
π
(4)
D
π
(5)
D
π
(6)
D
a
(D,2)
t ∼ π
(2)
D
Step 2; update the topology based on the previous local action;
select action a = 0 for unreachable nodes;
move to the next node in the topological ordering (i.e. 2);
select the action for the next node according to its local strategy.
Workflow action:
(a
(D,1)
t , a
(D,2)
t )
38/54
Algorithm for Combining Locally-Optimal Node Strategies
into Optimal Workflow Strategies
π
(2)
D
π
(1)
D
π
(3)
D
π
(4)
D
π
(5)
D
π
(6)
D
Step 3; update the topology based on the previous local action;
select action a = 0 for unreachable nodes;
move to the next node in the topological ordering (i.e. 3);
select the action for the next node according to its local strategy.
Workflow action:
(a
(D,1)
t , a
(D,2)
t )
38/54
Algorithm for Combining Locally-Optimal Node Strategies
into Optimal Workflow Strategies
π
(2)
D
π
(1)
D
π
(3)
D
π
(4)
D
π
(5)
D
π
(6)
D
a
(D,4)
t = 0
Step 3; update the topology based on the previous local action;
select action a = 0 for unreachable nodes;
move to the next node in the topological ordering (i.e. 3);
select the action for the next node according to its local strategy.
Workflow action:
(a
(D,1)
t , a
(D,2)
t , ·, 0)
38/54
Algorithm for Combining Locally-Optimal Node Strategies
into Optimal Workflow Strategies
π
(2)
D
π
(1)
D
π
(3)
D
π
(4)
D
π
(5)
D
π
(6)
D
a
(D,3)
t ∼ π
(3)
D
Step 3; update the topology based on the previous local action;
select action a = 0 for unreachable nodes;
move to the next node in the topological ordering (i.e. 3);
select the action for the next node according to its local strategy.
Workflow action:
(a
(D,1)
t , a
(D,2)
t , a
(D,3)
t , 0)
38/54
Algorithm for Combining Locally-Optimal Node Strategies
into Optimal Workflow Strategies
π
(2)
D
π
(1)
D
π
(3)
D
π
(4)
D
π
(5)
D
π
(6)
D
a
(D,5)
t ∼ π
(5)
D
Step 4; update the topology based on the previous local action;
select action a = 0 for unreachable nodes;
move to the next node in the topological ordering (i.e. 5);
select the action for the next node according to its local strategy.
Workflow action:
(a
(D,1)
t , a
(D,2)
t , a
(D,3)
t , 0, a
(D,5)
t )
38/54
Algorithm for Combining Locally-Optimal Node Strategies
into Optimal Workflow Strategies
π
(2)
D
π
(1)
D
π
(3)
D
π
(4)
D
π
(5)
D
π
(6)
D
Step 5; update the topology based on the previous local action;
select action a = 0 for unreachable nodes;
Workflow action:
(a
(D,1)
t , a
(D,2)
t , a
(D,3)
t , 0, a
(D,5)
t )
38/54
Algorithm for Combining Locally-Optimal Node Strategies
into Optimal Workflow Strategies
π
(2)
D
π
(1)
D
π
(3)
D
π
(4)
D
π
(5)
D
π
(6)
D
a
(D,6)
t = 0
Step 5; update the topology based on the previous local action;
select action a = 0 for unreachable nodes;
Workflow action:
(a
(D,1)
t , a
(D,2)
t , a
(D,3)
t , 0, a
(D,5)
t , 0)
39/54
Computational Benefits of Decomposition
I ∴ we can obtain an optimal (best response) strategy for the
full game Γ by combining the solutions to V simpler
subproblems that can be solved in parallel and have
significantly smaller state, observation, and action spaces.
0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
101
102
103
104
105
log10 |S| log10 |O| log10 |Ak|
log10 |S(i)| log10 |O(i)| log10 |A
(i)
k |
|V|
4 oom
3 oom
2 oom
Space complexity comparison between the full game and the decomposed
game.
40/54
System Decomposition
To avoid explicitly enumerating the very large state, observation,
and action spaces of Γ, we exploit three structural properties.
1. Additive structure across workflows.
I The game decomposes into additive subgames on the
workflow-level, which means that the strategy for each
subgame can be optimized independently
2. Optimal substructure within a workflow.
I The subgame for each workflow decomposes into subgames on
the node-level that satisfy the optimal substructure property
3. Threshold properties of local defender strategies.
I The optimal node-level strategies for the defender exhibit
threshold structures, which means that they can be estimated
efficiently
41/54
Can we Solve the Local Problems with Dynamic
Programming?
42/54
Scalable learning through decomposition (Simulation)
0.00
0.25
0.50
0.75
1.00
avg
defender
utility
|V| = 2, |W| = 1 |V| = 4, |W| = 2 |V| = 8, |W| = 2 |V| = 16, |W| = 2
0 50 100 150
running time (min)
0.00
0.25
0.50
0.75
1.00
avg
attacker
utility
0 50 100 150
running time (min)
0 50 100 150
running time (min)
0 50 100 150
running time (min)
πdecomposition
D πworkflow
D πfull
D upper bound πdecomposition
A πworkflow
A πfull
A
Learning curves obtained during training of ppo to find best response
strategies against randomized opponents; red, purple, blue and brown
curves relate to decomposed strategies; the orange and green curves
relate to the non-decomposed strategies.
43/54
Scalable learning through decomposition (Simulation)
1 2 3 4 5 6 7 8 9 10
2
4
6
8
10
linear
measured
# parallel processes n
|V| = 10
Speedup
S
n
Speedup of completion time when computing best response strategies for
the decomposed game with |V| = 10 nodes and different number of
parallel processes; the subproblems in the decomposition are split evenly
across the processes; let Tn denote the completion time when using n
processes, the speedup is then calculated as Sn = T1
Tn
; the error bars
indicate standard deviations from 3 measurements.
44/54
Threshold Properties of Local Defender Strategies.
I The local problem of the defender can be decomposed in the
temporal domain as
max
πD
T
X
t=1
J = max
πD
τ1
X
t=1
J1 +
τ2
X
t=1
J2 + . . . (2)
where τ1, τ2, . . . are stopping times.
I =⇒ (1) selection of defensive actions is simplified; and (2)
the optimal stopping times are given by a threshold strategy
that can be estimated efficiently:
Belief space B
(j)
D
Switching curve
Υ
Continuation set
C
Stopping set
S
(1, 0, 0)
j healthy
(0, 1, 0)
j discovered
(0, 0, 1)
j compromised
45/54
Threshold Properties of Local Defender Strategies.
Belief space B
(j)
D
Switching curve
Υ
Continuation set
C
Stopping set
S
(1, 0, 0)
j healthy
(0, 1, 0)
j discovered
(0, 0, 1)
j compromised
I A node can be in three attack states s
(A)
t : Healthy,
Discovered, Compromised.
I The defender has a belief state b
(D)
t
46/54
Threshold Properties of Local Defender Strategies.
We estimate the optimal switching curves using a linear
approximation
πD(b(D)
) =







Stop if
h
0 1 θT
i

b(D)
−1
#
 0
Continue otherwise
(3)
subject to θ ∈ R2
, θ2  0 and θ1 ≥ 1 (4)
(1, 0, 0)
j healthy
(0, 1, 0)
j discovered
(0, 0, 1)
j compromised
(1, 0, 0)
j healthy
(0, 1, 0)
j discovered
(0, 0, 1)
j compromised
(1, 0, 0)
j healthy
(0, 1, 0)
j discovered
(0, 0, 1)
j compromised
(a) (b) (c)
Examples of learned linear switching curves.
47/54
Proof Sketch (Threshold Properties)
I Let L(e1, b̂) denote the line segment
that starts at the belief state
e1 = (1, 0, 0) and ends at b̂, where b̂ is
in the sub-simplex that joins e2 and e3.
I All beliefs on L(e1, b̂) are totally
ordered according to the Monotone
Likelihood Ratio (MLR) order. =⇒ a
threshold belief state αb̂ ∈ L(e1, b̂)
exists where the optimal strategy
switches from C to S.
I Since the entire belief space can be
covered by the union of lines L(e1, b̂),
the threshold belief states αb̂1
, αb̂2
, . . .
yield a switching curve Υ.
Belief space B
(j)
D
(the 2-dimensional unit simplex)
sub-simplex B
(j)
D,e1
joining e2 and e3
b̂5
b̂4
b̂3
b̂2
b̂1
b̂6
b̂7
b̂8
b̂9
L(e1, b̂5)
Switching curve
Υ
Threshold
belief state αb̂9
e1
(1, 0, 0)
e2
(0, 1, 0)
e3
(0, 0, 1)
48/54
Learning Best Responses for the Target Infrastructure
(Simulation)
0 2 4 6 8
running time (min)
0.0
0.5
1.0
avg
defender
utility
0 20 40 60 80
running time (min)
0.0
0.5
1.0
avg
attacker
utility
upper bound
our method
ppo
random
ot,i  0
rd zone
App servers Honeynet
dmz
admin
zone
workflow
Gateway idps
quarantine
zone
alerts
Defender
. . .
Attacker Clients
2
1
3 12
4
5
6
7
8
9
10
11
13
14
15
16
17
18
19
20
21
22 23
24
25
26
27
28
29
30 31
32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
49/54
Decompositional Fictitious Play (DFSP) to Approximate
an Equilibrium
π̃2 ∈ B2(π1)
π2
π1
π̃1 ∈ B1(π2)
π̃0
2 ∈ B2(π0
1)
π0
2
π0
1
π̃0
1 ∈ B1(π0
2)
. . .
π∗
2 ∈ B2(π∗
1)
π∗
1 ∈ B1(π∗
2)
Fictitious play: iterative averaging of best responses.
I Learn best response strategies iteratively through the parallel
solving of subgames in the decomposition
I Average best responses to approximate the equilibrium
50/54
Learning Equilibrium Strategies
0 20 40 60 80
running time (h)
0
2
4
6
8
Approximate exploitability
0 20 40 60 80
running time (h)
0.0
0.2
0.4
0.6
0.8
1.0
Defender utility per episode
0 20 40 60 80
running time (h)
2.5
5.0
7.5
10.0
12.5
Episode length
dfsp simulation dfsp digital twin upper bound ot,i  0 random defense
Learning curves obtained during training of dfsp to find optimal
(equilibrium) strategies in the intrusion response game; red and blue
curves relate to dfsp; black, orange and green curves relate to baselines.
51/54
Evaluation in the Digital Twin
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Digital Twin
Target
Infrastructure
Model Creation 
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation System
Reinforcement Learning 
Generalization
Strategy evaluation 
Model estimation
Automation 
Self-learning systems
51/54
Learning Equilibrium Strategies
0 20 40 60 80
running time (h)
0
2
4
6
8
Approximate exploitability
0 20 40 60 80
running time (h)
0.0
0.2
0.4
0.6
0.8
1.0
Defender utility per episode
0 20 40 60 80
running time (h)
2.5
5.0
7.5
10.0
12.5
Episode length
dfsp simulation dfsp digital twin upper bound ot,i  0 random defense
Learning curves obtained during training of dfsp to find optimal
(equilibrium) strategies in the intrusion response game; red and blue
curves relate to dfsp; black, orange and green curves relate to baselines.
52/54
Learning Equilibrium Strategies (Comparison against
NFSP)
0 10 20 30 40 50 60 70 80
running time (h)
0.0
2.5
5.0
7.5
Approximate exploitability
dfsp nfsp
Learning curves obtained during training of dfsp and nfsp to find
optimal (equilibrium) strategies in the intrusion response game; the red
curve relate to dfsp and the purple curve relate to nfsp; all curves show
simulation results.
53/54
Conclusions
I We develop a framework to
automatically learn security strategies.
I We apply the method to an intrusion
response use case.
I We design a novel decompositional
approach to find near-optimal
intrusion responses for large-scale IT
infrastructures.
I We show that the decomposition
reduces both the computational
complexity of finding effective
strategies, and the sample complexity
of learning a system model by several
orders of magnitude.
s1,1 s1,2 s1,3 . . . s1,n
s2,1 s2,2 s2,3 . . . s2,n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Emulation
Target
System
Model Creation 
System Identification
Strategy Mapping
π
Selective
Replication
Strategy
Implementation π
Simulation 
Learning
54/54
Current and Future Work
Time
st
st+1
st+2
st+3
. . .
rt+1
rt+2
rt+3
rrT
1. Extend use case
I Heterogeneous client population
I Extensive threat model of the attacker
2. Extend solution framework
I Model-predictive control
I Rollout-based techniques
I Extend system identification algorithm
3. Extend theoretical results
I Exploit symmetries and causal structure

Más contenido relacionado

Similar a Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decomposition

AI for Cybersecurity Innovation
AI for Cybersecurity InnovationAI for Cybersecurity Innovation
AI for Cybersecurity InnovationPete Burnap
 
A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...
A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...
A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...Osama M. Khaled
 
Research of Intrusion Preventio System based on Snort
Research of Intrusion Preventio System based on SnortResearch of Intrusion Preventio System based on Snort
Research of Intrusion Preventio System based on SnortFrancis Yang
 
Self-Learning Systems for Cyber Security
Self-Learning Systems for Cyber SecuritySelf-Learning Systems for Cyber Security
Self-Learning Systems for Cyber SecurityKim Hammar
 
What are the best tools used in cybersecurity in 2023.pdf
What are the best tools used in cybersecurity in 2023.pdfWhat are the best tools used in cybersecurity in 2023.pdf
What are the best tools used in cybersecurity in 2023.pdftsaaroacademy
 
Architecture-centric Support for Integrating Security Tool in a Security Orch...
Architecture-centric Support for Integrating Security Tool in a Security Orch...Architecture-centric Support for Integrating Security Tool in a Security Orch...
Architecture-centric Support for Integrating Security Tool in a Security Orch...Chadni Islam
 
BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6Rod Soto
 
Virtual Machines Security Internals: Detection and Exploitation
 Virtual Machines Security Internals: Detection and Exploitation Virtual Machines Security Internals: Detection and Exploitation
Virtual Machines Security Internals: Detection and ExploitationMattia Salvi
 
endpoint-detection-and-response-datasheet.pdf
endpoint-detection-and-response-datasheet.pdfendpoint-detection-and-response-datasheet.pdf
endpoint-detection-and-response-datasheet.pdfOlufemi37
 
Machine Learning-Based Phishing Detection
Machine Learning-Based Phishing DetectionMachine Learning-Based Phishing Detection
Machine Learning-Based Phishing DetectionIRJET Journal
 
SplunkLive! Paris 2018: Use Splunk for Incident Response, Orchestration and A...
SplunkLive! Paris 2018: Use Splunk for Incident Response, Orchestration and A...SplunkLive! Paris 2018: Use Splunk for Incident Response, Orchestration and A...
SplunkLive! Paris 2018: Use Splunk for Incident Response, Orchestration and A...Splunk
 
Security operation center (SOC)
Security operation center (SOC)Security operation center (SOC)
Security operation center (SOC)Ahmed Ayman
 
IRJET-https://www.irjet.net/archives/V5/i3/IRJET-V5I377.pdf
IRJET-https://www.irjet.net/archives/V5/i3/IRJET-V5I377.pdfIRJET-https://www.irjet.net/archives/V5/i3/IRJET-V5I377.pdf
IRJET-https://www.irjet.net/archives/V5/i3/IRJET-V5I377.pdfIRJET Journal
 
Artificial Intelligence Techniques for Cyber Security
Artificial Intelligence Techniques for Cyber SecurityArtificial Intelligence Techniques for Cyber Security
Artificial Intelligence Techniques for Cyber SecurityIRJET Journal
 
Applying Provenance in APT Monitoring and Analysis Practical Challenges for S...
Applying Provenance in APT Monitoring and Analysis Practical Challenges for S...Applying Provenance in APT Monitoring and Analysis Practical Challenges for S...
Applying Provenance in APT Monitoring and Analysis Practical Challenges for S...Graeme Jenkinson
 
Intrusion detection and prevention system for network using Honey pots and Ho...
Intrusion detection and prevention system for network using Honey pots and Ho...Intrusion detection and prevention system for network using Honey pots and Ho...
Intrusion detection and prevention system for network using Honey pots and Ho...Eng. Mohammed Ahmed Siddiqui
 

Similar a Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decomposition (20)

AI for Cybersecurity Innovation
AI for Cybersecurity InnovationAI for Cybersecurity Innovation
AI for Cybersecurity Innovation
 
A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...
A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...
A SIMULATION APPROACH TO PREDICATE THE RELIABILITY OF A PERVASIVE SOFTWARE SY...
 
Research of Intrusion Preventio System based on Snort
Research of Intrusion Preventio System based on SnortResearch of Intrusion Preventio System based on Snort
Research of Intrusion Preventio System based on Snort
 
Self-Learning Systems for Cyber Security
Self-Learning Systems for Cyber SecuritySelf-Learning Systems for Cyber Security
Self-Learning Systems for Cyber Security
 
What are the best tools used in cybersecurity in 2023.pdf
What are the best tools used in cybersecurity in 2023.pdfWhat are the best tools used in cybersecurity in 2023.pdf
What are the best tools used in cybersecurity in 2023.pdf
 
project(copy1)
project(copy1)project(copy1)
project(copy1)
 
Architecture-centric Support for Integrating Security Tool in a Security Orch...
Architecture-centric Support for Integrating Security Tool in a Security Orch...Architecture-centric Support for Integrating Security Tool in a Security Orch...
Architecture-centric Support for Integrating Security Tool in a Security Orch...
 
Smart surveillance using deep learning
Smart surveillance using deep learningSmart surveillance using deep learning
Smart surveillance using deep learning
 
Cyber Security
Cyber SecurityCyber Security
Cyber Security
 
BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6
 
Virtual Machines Security Internals: Detection and Exploitation
 Virtual Machines Security Internals: Detection and Exploitation Virtual Machines Security Internals: Detection and Exploitation
Virtual Machines Security Internals: Detection and Exploitation
 
endpoint-detection-and-response-datasheet.pdf
endpoint-detection-and-response-datasheet.pdfendpoint-detection-and-response-datasheet.pdf
endpoint-detection-and-response-datasheet.pdf
 
Cloud Computing and PSo
Cloud Computing and PSoCloud Computing and PSo
Cloud Computing and PSo
 
Machine Learning-Based Phishing Detection
Machine Learning-Based Phishing DetectionMachine Learning-Based Phishing Detection
Machine Learning-Based Phishing Detection
 
SplunkLive! Paris 2018: Use Splunk for Incident Response, Orchestration and A...
SplunkLive! Paris 2018: Use Splunk for Incident Response, Orchestration and A...SplunkLive! Paris 2018: Use Splunk for Incident Response, Orchestration and A...
SplunkLive! Paris 2018: Use Splunk for Incident Response, Orchestration and A...
 
Security operation center (SOC)
Security operation center (SOC)Security operation center (SOC)
Security operation center (SOC)
 
IRJET-https://www.irjet.net/archives/V5/i3/IRJET-V5I377.pdf
IRJET-https://www.irjet.net/archives/V5/i3/IRJET-V5I377.pdfIRJET-https://www.irjet.net/archives/V5/i3/IRJET-V5I377.pdf
IRJET-https://www.irjet.net/archives/V5/i3/IRJET-V5I377.pdf
 
Artificial Intelligence Techniques for Cyber Security
Artificial Intelligence Techniques for Cyber SecurityArtificial Intelligence Techniques for Cyber Security
Artificial Intelligence Techniques for Cyber Security
 
Applying Provenance in APT Monitoring and Analysis Practical Challenges for S...
Applying Provenance in APT Monitoring and Analysis Practical Challenges for S...Applying Provenance in APT Monitoring and Analysis Practical Challenges for S...
Applying Provenance in APT Monitoring and Analysis Practical Challenges for S...
 
Intrusion detection and prevention system for network using Honey pots and Ho...
Intrusion detection and prevention system for network using Honey pots and Ho...Intrusion detection and prevention system for network using Honey pots and Ho...
Intrusion detection and prevention system for network using Honey pots and Ho...
 

Más de Kim Hammar

Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesKim Hammar
 
Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHKim Hammar
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlKim Hammar
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Kim Hammar
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionKim Hammar
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Kim Hammar
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingKim Hammar
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseKim Hammar
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Kim Hammar
 
Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingKim Hammar
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingKim Hammar
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayKim Hammar
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Kim Hammar
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Kim Hammar
 
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...Kim Hammar
 
Reinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed
Reinforcement Learning Algorithms for Adaptive Cyber Defense against HeartbleedReinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed
Reinforcement Learning Algorithms for Adaptive Cyber Defense against HeartbleedKim Hammar
 
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021Kim Hammar
 
Självlärande system för cybersäkerhet
Självlärande system för cybersäkerhetSjälvlärande system för cybersäkerhet
Självlärande system för cybersäkerhetKim Hammar
 
Learning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal StoppingLearning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal StoppingKim Hammar
 
Learning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal StoppingLearning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal StoppingKim Hammar
 

Más de Kim Hammar (20)

Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
 
Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTH
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via Decomposition
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal Stopping
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber Defense
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.
 
Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal Stopping
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-Play
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.
 
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...
 
Reinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed
Reinforcement Learning Algorithms for Adaptive Cyber Defense against HeartbleedReinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed
Reinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed
 
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021
 
Självlärande system för cybersäkerhet
Självlärande system för cybersäkerhetSjälvlärande system för cybersäkerhet
Självlärande system för cybersäkerhet
 
Learning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal StoppingLearning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal Stopping
 
Learning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal StoppingLearning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal Stopping
 

Último

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 

Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decomposition

  • 1. 1/54 Learning Optimal Intrusion Responses for IT Infrastructures via Decomposition Visit to Princeton University Kim Hammar kimham@kth.se Division of Network and Systems Engineering KTH Royal Institute of Technology May 17, 2023
  • 2. 2/54 Use Case: Intrusion Response I A defender owns an infrastructure I Consists of connected components I Components run network services I Defender defends the infrastructure by monitoring and active defense I Has partial observability I An attacker seeks to intrude on the infrastructure I Has a partial view of the infrastructure I Wants to compromise specific components I Attacks by reconnaissance, exploitation and pivoting Attacker Clients . . . Defender 1 IPS 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31
  • 3. 3/54 Automated Intrusion Response: Current Landscape Levels of security automation No automation. Manual detection. Manual prevention. No alerts. No automatic responses. Lack of tools. 1980s 1990s 2000s-Now Research Operator assistance. Manual detection. Manual prevention. Audit logs. Security tools. Partial automation. System has automated functions for detection/prevention but requires manual updating and configuration. Intrusion detection systems. Intrusion prevention systems. High automation. System automatically updates itself. Automated attack detection. Automated attack mitigation.
  • 4. 4/54 Can we use decision theory and learning-based methods to automatically find effective security strategies?1 π Σ Σ security objective feedback control input target system security indicators disturbance 1 Kim Hammar and Rolf Stadler. “Finding Effective Security Strategies through Reinforcement Learning and Self-Play”. In: International Conference on Network and Service Management (CNSM 2020). Izmir, Turkey, 2020, Kim Hammar and Rolf Stadler. “Learning Intrusion Prevention Policies through Optimal Stopping”. In: International Conference on Network and Service Management (CNSM 2021). Izmir, Turkey, 2021, Kim Hammar and Rolf Stadler. “Intrusion Prevention Through Optimal Stopping”. In: IEEE Transactions on Network and Service Management 19.3 (2022), pp. 2333–2348. doi: 10.1109/TNSM.2022.3176781, Kim Hammar and Rolf Stadler. Learning Near-Optimal Intrusion Responses Against Dynamic Attackers. 2023. doi: 10.48550/ARXIV.2301.06085. url: https://arxiv.org/abs/2301.06085.
  • 5. 5/54 Our Framework for Automated Network Security s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Generalization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 6. 5/54 Our Framework for Automated Network Security s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Generalization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 7. 5/54 Our Framework for Automated Network Security s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Generalization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 8. 5/54 Our Framework for Automated Network Security s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Generalization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 9. 5/54 Our Framework for Automated Network Security s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Generalization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 10. 5/54 Our Framework for Automated Network Security s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Generalization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 11. 5/54 Our Framework for Automated Network Security s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Generalization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 12. 5/54 Our Framework for Automated Network Security s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Generalization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 13. 6/54 Creating a Digital Twin of the Target Infrastructure s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Generalization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 14. 6/54 Theoretical Analysis and Learning of Defender Strategies s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Generalization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 15. 7/54 Creating a Digital Twin of the Target Infrastructure I An infrastructure is defined by its configuration. I Set of configurations supported by our framework can be seen as a configuration space I The configuration space defines the class of infrastructures for which we can create digital twins. Σ Configuration Space σi * * * 172.18.4.0/24 172.18.19.0/24 172.18.61.0/24 Digital Twins R1 R1 R1
  • 16. 8/54 The Target Infrastructure I 64 nodes. 24 ovs switches, 3 gateways. 6 honeypots. 8 application servers. 4 administration servers. 15 compute servers. I Topology shown to the right I 11 vulnerabilities (cve-2010-0426, cve-2015-3306, cve-2015-5602, etc.) I 4 zones: dmz, r&d zone, admin zone, quarantine zone I 9 workflows I Management: 1 sdn controller, 1 Kafka server, 1 elastic server. r&d zone App servers Honeynet dmz admin zone workflow Gateway idps quarantine zone alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
  • 17. 9/54 Emulating Physical Components I We emulate physical components with Docker containers I Focus on linux-based systems I The containers include everything needed to emulate the host: a runtime system, code, system tools, system libraries, and configurations. I Examples of containers: IDPS container, client container, attacker container, CVE-2015-1427 container, Open vSwitch containers etc. Containers Physical server Operating system Docker engine CSLE
  • 18. 10/54 Emulating Network Connectivity Management node 1 Emulated IT infrastructure Management node 2 Emulated IT infrastructure Management node n Emulated IT infrastructure VXLAN VXLAN . . . VXLAN IP network I We emulate network connectivity on the same host using network namespaces. I Connectivity across physical hosts is achieved using VXLAN tunnels with Docker swarm.
  • 19. 11/54 Emulating Network Conditions I We do traffic shaping using NetEm in the Linux kernel I Emulate internal connections are full-duplex & loss-less with bit capacities of 1000 Mbit/s I Emulate external connections are full-duplex with bit capacities of 100 Mbit/s & 0.1% packet loss in normal operation and random bursts of 1% packet loss User space . . . Application processes Kernel TCP/UDP IP/Ethernet/802.11 OS TCP/IP stack Queueing discipline Device driver queue (FIFO) NIC Netem config: latency, jitter, etc. Sockets
  • 20. 12/54 Emulating Actors I We emulate client arrivals with Poisson processes I We emulate client interactions with load generators I Attackers are emulated by automated programs that select actions from a pre-defined set I Defender actions are emulated through a custom gRPC API. Markov Decision Process s1,1 s1,2 s1,3 . . . s1,4 s2,1 s2,2 s2,3 . . . s2,4 Digital Twin . . . Virtual network Virtual devices Emulated services Emulated actors IT Infrastructure Configuration & change events System traces Verified security policy Optimized security policy
  • 21. 13/54 System Identification s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Generalization Strategy evaluation & Model estimation Automation & Self-learning systems
  • 22. 14/54 Outline I Use Case & Digital Twin I Use case: intrusion response I Digital twin for data collection & evaluation I System Model I Discrete-time Markovian dynamical system I Partially observed stochastic game I System Decomposition I Additive subgames on the workflow-level I Optimal substructure on component-level I Learning Near-Optimal Intrusion Responses I Scalable learning through decomposition I Digital twin for system identification & evaluation I Efficient equilibrium approximation I Conclusions & Future Work
  • 23. 14/54 Outline I Use Case & Digital Twin I Use case: intrusion response I Digital twin for data collection & evaluation I System Model I Discrete-time Markovian dynamical system I Partially observed stochastic game I System Decomposition I Additive subgames on the workflow-level I Optimal substructure on component-level I Learning Near-Optimal Intrusion Responses I Scalable learning through decomposition I Digital twin for system identification & evaluation I Efficient equilibrium approximation I Conclusions & Future Work
  • 24. 14/54 Outline I Use Case & Digital Twin I Use case: intrusion response I Digital twin for data collection & evaluation I System Model I Discrete-time Markovian dynamical system I Partially observed stochastic game I System Decomposition I Additive subgames on the workflow-level I Optimal substructure on component-level I Learning Near-Optimal Intrusion Responses I Scalable learning through decomposition I Digital twin for system identification & evaluation I Efficient equilibrium approximation I Conclusions & Future Work
  • 25. 14/54 Outline I Use Case & Digital Twin I Use case: intrusion response I Digital twin for data collection & evaluation I System Model I Discrete-time Markovian dynamical system I Partially observed stochastic game I System Decomposition I Additive subgames on the workflow-level I Optimal substructure on component-level I Learning Near-Optimal Intrusion Responses I Scalable learning through decomposition I Digital twin for system identification & evaluation I Efficient equilibrium approximation I Conclusions & Future Work
  • 26. 14/54 Outline I Use Case & Digital Twin I Use case: intrusion response I Digital twin for data collection & evaluation I System Model I Discrete-time Markovian dynamical system I Partially observed stochastic game I System Decomposition I Additive subgames on the workflow-level I Optimal substructure on component-level I Learning Near-Optimal Intrusion Responses I Scalable learning through decomposition I Digital twin for system identification & evaluation I Efficient equilibrium approximation I Conclusions & Future Work
  • 27. 15/54 System Model I G = h{gw} ∪ V, Ei: directed graph representing the virtual infrastructure I V: finite set of virtual components. I E: finite set of component dependencies. I Z: finite set of zones. r&d zone App servers Honeynet dmz admin zone workflow Gateway idps quarantine zone alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
  • 28. 15/54 System Model I G = h{gw} ∪ V, Ei: directed graph representing the virtual infrastructure I V: finite set of virtual components. I E: finite set of component dependencies. I Z: finite set of zones. r&d zone App servers Honeynet dmz admin zone workflow Gateway idps quarantine zone alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
  • 29. 15/54 System Model I G = h{gw} ∪ V, Ei: directed graph representing the virtual infrastructure I V: finite set of virtual components. I E: finite set of component dependencies. I Z: finite set of zones. r&d zone App servers Honeynet dmz admin zone workflow Gateway idps quarantine zone alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
  • 30. 16/54 State Model I Each i ∈ V has a state vt,i = (v (Z) t,i |{z} D , v (I) t,i , v (R) t,i | {z } A ) I System state st = (vt,i )i∈V ∼ St. I Markovian time-homogeneous dynamics: st+1 ∼ f (· | St, At) At = (A (A) t , A (D) t ) are the actions. s1 s2 s3 s4 s5 s4 . . . . . . . . .
  • 31. 16/54 State Model I Each i ∈ V has a state vt,i = (v (Z) t,i |{z} D , v (I) t,i , v (R) t,i | {z } A ) I System state st = (vt,i )i∈V ∼ St. I Markovian time-homogeneous dynamics: st+1 ∼ f (· | St, At) At = (A (A) t , A (D) t ) are the actions. s1 s2 s3 s4 s5 s4 . . . . . . . . .
  • 32. 16/54 State Model I Each i ∈ V has a state vt,i = (v (Z) t,i |{z} D , v (I) t,i , v (R) t,i | {z } A ) I System state st = (vt,i )i∈V ∼ St. I Markovian time-homogeneous dynamics: st+1 ∼ f (· | St, At) At = (A (A) t , A (D) t ) are the actions. s1 s2 s3 s4 s5 s4 . . . . . . . . .
  • 33. 17/54 Workflow Model I Services are connected into workflows W = {w1, . . . , w|W|}.
  • 34. 17/54 Workflow Model I Services are connected into workflows W = {w1, . . . , w|W|}. gw fw idps lb http servers auth server search engine db cache Dependency graph of an example workflow representing a web application; gw, fw, idps, lb, and db are acronyms for gateway, firewall, intrusion detection and prevention system, load balancer, and database, respectively.
  • 35. 18/54 Workflow Model I Services are connected into workflows W = {w1, . . . , w|W|}. I Each w ∈ W is realized as a directed acyclic subgraph (dag) Gw = h{gw} ∪ Vw, Ewi of G I W = {w1, . . . , w|W|} induces a partitioning V = [ wi ∈W Vwi such that i 6= j =⇒ Vwi ∩ Vwj = ∅ Zone a Zone b Zone c gw 1 2 3 4 5 6 7 A workflow dag
  • 36. 18/54 Workflow Model I Services are connected into workflows W = {w1, . . . , w|W|}. I Each w ∈ W is realized as a directed acyclic subgraph (dag) Gw = h{gw} ∪ Vw, Ewi of G I W = {w1, . . . , w|W|} induces a partitioning V = [ wi ∈W Vwi such that i 6= j =⇒ Vwi ∩ Vwj = ∅ Zone a Zone b Zone c gw 1 2 3 4 5 6 7 A workflow dag
  • 37. 19/54 Client Model Client population . . . Arrival rate λ Departure Service time µ . . . . . . . . . . . . w1 w2 w|W| Workflows (Markov processes) I Homogeneous client population I Clients arrive according to Po(λ), Service times Exp(1 µ) I Workflow selection: uniform I Workflow interaction: Markov process
  • 38. 20/54 Observation Model I idpss inspect network traffic and generate alert vectors: ot , ot,1, . . . , ot,|V| ∈ N |V| 0 ot,i is the number of alerts related to node i ∈ V at time-step t. I ot = (ot,1, . . . , ot,|V|) is a realization of the random vector Ot with joint distribution Z idps idps idps idps alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
  • 39. 20/54 Observation Model I idpss inspect network traffic and generate alert vectors: ot , ot,1, . . . , ot,|V| ∈ N |V| 0 ot,i is the number of alerts related to node i ∈ V at time-step t. I ot = (ot,1, . . . , ot,|V|) is a realization of the random vector Ot with joint distribution Z idps idps idps idps alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
  • 41. 22/54 Monitoring and Telemetry Devices Event bus Security Policy Storage Systems Control actions Data pipelines Events I Emulated devices run monitoring agents that periodically push metrics to a Kafka event bus. I The data in the event bus is consumed by data pipelines that process the data and write to storage systems. I The processed data is used by an automated security policy to decide on control actions to execute in the digital twin.
  • 42. 23/54 Feature Selection I Our framework collects 100s of metrics every time-step. I We focus on the IDPS alert metric as it provides the most information for detecting the type of attacks we consider. 0 2500 5000 7500 Probability Weighted idps alerts DKL(ZO|0 k ZO|1) = 0.49 −100 0 100 New login attempts DKL(ZO|0 k ZO|1) = 0.07 −500 0 # New processes DKL(ZO|0 k ZO|1) = 0.01 −100 0 100 Probability New tcp connections DKL(ZO|0 k ZO|1) = 0.01 0 10 20 30 # Blocks written to disk DKL(ZO|0 k ZO|1) = 0.12 0 20 40 # Blocks read from disk DKL(ZO|0 k ZO|1) = 0.0 ZO|s=0 (no intrusion) ZO|s=1 (intrusion)
  • 43. 24/54 Defender Model I Defender action: a (D) t ∈ {0, 1, 2, 3, 4}|V| I 0 means do nothing. 1 − 4 correspond to defensive actions (see fig) I A defender strategy is a function πD ∈ ΠD : HD → ∆(AD), where h (D) t = (s (D) 1 , a (D) 1 , o1, . . . , a (D) t−1, s (D) t , ot) ∈ HD I Objective: (i) maintain workflows; and (ii) stop a possible intrusion: J , T X t=1 γt−1 η |W| X i=1 uW(wi , st) | {z } workflows utility − (1 − η) |V| X j=1 cI(st,j, at,j) | {z } intrusion and defense costs ! dmz rd zone admin zone Old path New path Honeypot App server Defender Revoke certificates Blacklist IP 1) Server migration 2) Flow migration and blocking 3) Shut down server 4) Access control
  • 44. 24/54 Defender Model I Defender action: a (D) t ∈ {0, 1, 2, 3, 4}|V| I 0 means do nothing. 1 − 4 correspond to defensive actions (see fig) I A defender strategy is a function πD ∈ ΠD : HD → ∆(AD), where h (D) t = (s (D) 1 , a (D) 1 , o1, . . . , a (D) t−1, s (D) t , ot) ∈ HD I Objective: (i) maintain workflows; and (ii) stop a possible intrusion: J , T X t=1 γt−1 η |W| X i=1 uW(wi , st) | {z } workflows utility − (1 − η) |V| X j=1 cI(st,j, at,j) | {z } intrusion and defense costs ! dmz rd zone admin zone Old path New path Honeypot App server Defender Revoke certificates Blacklist IP 1) Server migration 2) Flow migration and blocking 3) Shut down server 4) Access control
  • 45. 24/54 Defender Model I Defender action: a (D) t ∈ {0, 1, 2, 3, 4}|V| I 0 means do nothing. 1 − 4 correspond to defensive actions (see fig) I A defender strategy is a function πD ∈ ΠD : HD → ∆(AD), where h (D) t = (s (D) 1 , a (D) 1 , o1, . . . , a (D) t−1, s (D) t , ot) ∈ HD I Objective: (i) maintain workflows; and (ii) stop a possible intrusion: J , T X t=1 γt−1 η |W| X i=1 uW(wi , st) | {z } workflows utility − (1 − η) |V| X j=1 cI(st,j, at,j) | {z } intrusion and defense costs ! dmz rd zone admin zone Old path New path Honeypot App server Defender Revoke certificates Blacklist IP 1) Server migration 2) Flow migration and blocking 3) Shut down server 4) Access control
  • 46. 25/54 Attacker Model I Attacker action: a (A) t ∈ {0, 1, 2, 3}|V| I 0 means do nothing. 1 − 3 correspond to attacks (see fig) I An attacker strategy is a function πA ∈ ΠA : HA → ∆(AA), where HA is the space of all possible attacker histories h (A) t = (s (A) 1 , a (A) 1 , o1, . . . , a (A) t−1, s (A) t , ot) ∈ HA I Objective: (i) disrupt workflows; and (ii) compromise nodes: − J . . . Attacker login attempts configure Automated system Server 2) Brute-force 1) Reconnaissance 3) Code execution Attacker Server TCP SYN TCP SYN ACK port open Attacker Service Server malicious request inject code execution
  • 47. 25/54 Attacker Model I Attacker action: a (A) t ∈ {0, 1, 2, 3}|V| I 0 means do nothing. 1 − 3 correspond to attacks (see fig) I An attacker strategy is a function πA ∈ ΠA : HA → ∆(AA), where HA is the space of all possible attacker histories h (A) t = (s (A) 1 , a (A) 1 , o1, . . . , a (A) t−1, s (A) t , ot) ∈ HA I Objective: (i) disrupt workflows; and (ii) compromise nodes: − J . . . Attacker login attempts configure Automated system Server 2) Brute-force 1) Reconnaissance 3) Code execution Attacker Server TCP SYN TCP SYN ACK port open Attacker Service Server malicious request inject code execution
  • 48. 25/54 Attacker Model I Attacker action: a (A) t ∈ {0, 1, 2, 3}|V| I 0 means do nothing. 1 − 3 correspond to attacks (see fig) I An attacker strategy is a function πA ∈ ΠA : HA → ∆(AA), where HA is the space of all possible attacker histories h (A) t = (s (A) 1 , a (A) 1 , o1, . . . , a (A) t−1, s (A) t , ot) ∈ HA I Objective: (i) disrupt workflows; and (ii) compromise nodes: − J . . . Attacker login attempts configure Automated system Server 2) Brute-force 1) Reconnaissance 3) Code execution Attacker Server TCP SYN TCP SYN ACK port open Attacker Service Server malicious request inject code execution
  • 49. 25/54 The Intrusion Response Problem maximize πD∈ΠD minimize πA∈ΠA E(πD,πA) [J] (1a) subject to s (D) t+1 ∼ fD · | A (D) t , A (D) t ∀t (1b) s (A) t+1 ∼ fA · | S (A) t , At ∀t (1c) ot+1 ∼ Z · | S (D) t+1, A (A) t ) ∀t (1d) a (A) t ∼ πA · | H (A) t , a (A) t ∈ AA(st) ∀t (1e) a (D) t ∼ πD · | H (D) t , a (D) t ∈ AD ∀t (1f) where E(πD,πA) denotes the expectation of the random vectors (St, Ot, At)t∈{1,...,T} under the strategy profile (πD, πA). (1) can be formulated as a zero-sum Partially Observed Stochastic Game with Public Observations (a PO-POSG): Γ = hN, (Si )i∈N , (Ai )i∈N , (fi )i∈N , u, γ, (b (i) 1 )i∈N , O, Zi
  • 50. 26/54 Existence of a Solution Theorem Given the po-posg Γ (2), the following holds: (A) Γ has a mixed Nash equilibrium and a value function V ∗ : BD × BA → R that maps each possible initial pair of belief states (b (D) 1 , bA 1 ) to the expected utility of the defender in the equilibrium. (B) For each strategy pair (πA, πD) ∈ ΠA × ΠD, the best response sets BD(πA) and BA(πD) are non-empty and correspond to optimal strategies in two Partially Observed Markov Decision Processes (pomdps): M (D) and M (A). Further, a pair of pure best response strategies (π̃D, π̃A) ∈ BD(πA) × BA(πD) and a pair of value functions (V ∗ D,πA , V ∗ A,πD ) exist.
  • 51. 27/54 The Curse of Dimensionality I While (1) has a solution (i.e the game Γ has a value (Thm 1)), computing it is intractable since the state, action, and observation spaces of the game grow exponentially with |V|. 1 2 3 4 5 104 105 2 105 |S| |O| |Ai | |V| Growth of |S|, |O|, and |Ai | in function of the number of nodes |V|
  • 52. 28/54 The Curse of Dimensionality I While (1) has a solution (i.e the game Γ has a value (Thm 1)), computing it is intractable since the state, action, and observation spaces of the game grow exponentially with |V|. 1 2 3 4 5 104 105 2 105 |S| |O| |Ai | |V| Growth of |S|, |O|, and |Ai | in function of the number of nodes |V| We tackle the scability challenge with decomposition
  • 53. 29/54 Intuitively.. rd zone App servers Honeynet dmz admin zone workflow Gateway idps quarantine zone alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 65 The optimal action here... Does not directly depend on the state or action of a node down here
  • 54. 30/54 Intuitively.. rd zone App servers Honeynet dmz admin zone workflow Gateway idps quarantine zone alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 65 The optimal action here... But they are not completely independent either. How can we exploit this structure? Does not directly depend on the state or action of a node down here
  • 55. 31/54 System Decomposition To avoid explicitly enumerating the very large state, observation, and action spaces of Γ, we exploit three structural properties. 1. Additive structure across workflows. I The game decomposes into additive subgames on the workflow-level, which means that the strategy for each subgame can be optimized independently 2. Optimal substructure within a workflow. I The subgame for each workflow decomposes into subgames on the node-level that satisfy the optimal substructure property 3. Threshold properties of local defender strategies. I The optimal node-level strategies for the defender exhibit threshold structures, which means that they can be estimated efficiently
  • 56. 31/54 System Decomposition To avoid explicitly enumerating the very large state, observation, and action spaces of Γ, we exploit three structural properties. 1. Additive structure across workflows. I The game decomposes into additive subgames on the workflow-level, which means that the strategy for each subgame can be optimized independently 2. Optimal substructure within a workflow. I The subgame for each workflow decomposes into subgames on the node-level that satisfy the optimal substructure property 3. Threshold properties of local defender strategies. I The optimal node-level strategies for the defender exhibit threshold structures, which means that they can be estimated efficiently
  • 57. 32/54 Additive Structure Across Workflows (Intuition) “=” I If there is no path between i and j in G, then i and j are independent in the following sense: I Compromising i has no affect on the state of j. I Compromising i does not make it harder or easier to compromise j. I Compromising i does not affect the service provided by j. I Defending i does not affect the state of j. I Defending i does not affect the service provided by j.
  • 58. 33/54 Additive Structure Across Workflows Definition (Transition independence) A set of nodes Q are transition independent iff the transition probabilities factorize as f (St+1 | St, At) = Y i∈Q f (St+1,i | St,i , At,i ) Definition (Utility independence) A set of nodes Q are utility independent iff there exists functions u1, . . . , u|Q| such that the utility function u decomposes as u(St, At) = f (u1(St,1, At,1), . . . , u1(St,|Q|, At,Q)) and ui ≤ u0 i ⇐⇒ f (u1, . . . , ui , . . . , u|Q|) ≤ f (u1, . . . , u0 i , . . . , u|Q|)
  • 59. 34/54 Additive Structure Across Workflows Theorem (Additive structure across workflows) (A) All nodes V in the game Γ are transition independent. (B) If there is no path between i and j in the topology graph G, then i and j are utility independent. Corollary Γ decomposes into |W| additive subproblems that can be solved independently and in parallel. π (w1) k π (w2) k π (w|W|) k ot,w1 ot,w2 ot,w|W| . . . ⊕ a (k) w1 a (k) w2 a (k) w|W| a (k) t
  • 60. 35/54 Additive Structure Across Workflows: Minimal Example Zone 1 Zone 2 1 2 3 gw w1 w2 V = {1, 2, 3}, E = {(1, 2)}, W = {w1, w2}, Z = {1, 2} a) IT infrastructure b) Transition dependencies St, At St+1, Ot+1 S (D) t+1,1 S (D) t,1 A (D) t,1 S (A) t+1,1 S (A) t,1 Ot,1 A (A) t,1 S (D) t+1,2 S (D) t,2 A (D) t,2 S (A) t+1,2 S (A) t,2 Ot,2 A (A) t,2 S (D) t+1,3 S (D) t,3 A (D) t,3 S (A) t+1,3 S (A) t,3 Ot,1 A (A) t,3 c) Utility dependencies St, At Ut S (D) t,1 A (D) t,1 Ut,1 S (A) t,1 S (D) t,2 A (D) t,2 Ut,2 S (A) t,2 S (D) t,3 A (D) t,3 Ut,3 S (A) t,3
  • 61. 35/54 System Decomposition To avoid explicitly enumerating the very large state, observation, and action spaces of Γ, we exploit three structural properties. 1. Additive structure across workflows. I The game decomposes into additive subgames on the workflow-level, which means that the strategy for each subgame can be optimized independently 2. Optimal substructure within a workflow. I The subgame for each workflow decomposes into subgames on the node-level that satisfy the optimal substructure property 3. Threshold properties of local defender strategies. I The optimal node-level strategies for the defender exhibit threshold structures, which means that they can be estimated efficiently
  • 62. 36/54 Optimal Substructure Within a Workflow I Nodes in the same workflow are utility dependent. I =⇒ Locally-optimal strategies for each node can not simply be added together to obtain an optimal strategy for the workflow. I However, the locally-optimal strategies satisfy the optimal substructure property. I =⇒ there exists an algorithm for constructing an optimal workflow strategy from locally-optimal strategies for each node. Zone 1 Zone 2 1 2 3 gw w1 w2 V = {1, 2, 3}, E = {(1, 2)}, W = {w1, w2}, Z = {1, 2} IT infrastructure Utility dependencies St, At Ut S (D) t,1 A (D) t,1 Ut,1 S (A) t,1 S (D) t,2 A (D) t,2 Ut,2 S (A) t,2 S (D) t,3 A (D) t,3 Ut,3 S (A) t,3
  • 63. 36/54 Optimal substructure within a workflow I Nodes in the same workflow are utility dependent. I =⇒ Locally-optimal strategies for each node can not simply be added together to obtain an optimal strategy for the workflow. I However, the locally-optimal strategies satisfy the optimal substructure property. I =⇒ there exists an algorithm for constructing an optimal workflow strategy from locally-optimal strategies for each node. Zone 1 Zone 2 1 2 3 gw w1 w2 V = {1, 2, 3}, E = {(1, 2)}, W = {w1, w2}, Z = {1, 2} IT infrastructure Utility dependencies St, At Ut S (D) t,1 A (D) t,1 Ut,1 S (A) t,1 S (D) t,2 A (D) t,2 Ut,2 S (A) t,2 S (D) t,3 A (D) t,3 Ut,3 S (A) t,3
  • 64. 37/54 Algorithm for Combining Locally-Optimal Node Strategies into Optimal Workflow Strategies Algorithm 1: Algorithm for combining local strate- gies 1 Input: Γ: the game, 2 πk: a vector with local strategies 3 Output: (πD, πA): global game strategies 4 Algorithm composite-strategy(Γ, πk) 5 for player k ∈ N do 6 πk ←λ (s (k) t , b (k) t ) 7 a (k) t = () 8 for workflow w ∈ W do 9 for node i ∈ topological-sort(Vw) do 10 a (k,i) t ← π (i) k (s (k) t , b (k) t ) 11 if gw 6→ a (k) t t i then 12 a (k,i) t ← ⊥ 13 end 14 a (k) t = a (k) t ⊕ a (k,i) t 15 end 16 end 17 return a (k) t 18 end 19 return (πD, πA) π (1) k →1 ot,1 a (k) t,1 a (k),0 t,1 π (2) k →2 ⊕ ot,2 a (k) t,2 a (k),0 t,2 π (3) k →3 ⊕ ot,3 a (k) t,3 a (k),0 t,3 . . . π (|Vw|) k →|Vw| ⊕ ot,|Vw| a (k) t,|Vw| a (k),0 t,|Vw| ⊕ a (k) w
  • 65. 38/54 Algorithm for Combining Locally-Optimal Node Strategies into Optimal Workflow Strategies π (2) D π (1) D π (3) D π (4) D π (5) D π (6) D (π (i) D )i∈Vw : local strategies in the same workflow w ∈ W
  • 66. 38/54 Algorithm for Combining Locally-Optimal Node Strategies into Optimal Workflow Strategies π (2) D π (1) D π (3) D π (4) D π (5) D π (6) D a (D,1) t ∼ π (1) D Step 1; select action for node 1 according to its local strategy Workflow action: (a (D,1) t )
  • 67. 38/54 Algorithm for Combining Locally-Optimal Node Strategies into Optimal Workflow Strategies π (2) D π (1) D π (3) D π (4) D π (5) D π (6) D a (D,2) t ∼ π (2) D Step 2; update the topology based on the previous local action; select action a = 0 for unreachable nodes; move to the next node in the topological ordering (i.e. 2); select the action for the next node according to its local strategy. Workflow action: (a (D,1) t , a (D,2) t )
  • 68. 38/54 Algorithm for Combining Locally-Optimal Node Strategies into Optimal Workflow Strategies π (2) D π (1) D π (3) D π (4) D π (5) D π (6) D Step 3; update the topology based on the previous local action; select action a = 0 for unreachable nodes; move to the next node in the topological ordering (i.e. 3); select the action for the next node according to its local strategy. Workflow action: (a (D,1) t , a (D,2) t )
  • 69. 38/54 Algorithm for Combining Locally-Optimal Node Strategies into Optimal Workflow Strategies π (2) D π (1) D π (3) D π (4) D π (5) D π (6) D a (D,4) t = 0 Step 3; update the topology based on the previous local action; select action a = 0 for unreachable nodes; move to the next node in the topological ordering (i.e. 3); select the action for the next node according to its local strategy. Workflow action: (a (D,1) t , a (D,2) t , ·, 0)
  • 70. 38/54 Algorithm for Combining Locally-Optimal Node Strategies into Optimal Workflow Strategies π (2) D π (1) D π (3) D π (4) D π (5) D π (6) D a (D,3) t ∼ π (3) D Step 3; update the topology based on the previous local action; select action a = 0 for unreachable nodes; move to the next node in the topological ordering (i.e. 3); select the action for the next node according to its local strategy. Workflow action: (a (D,1) t , a (D,2) t , a (D,3) t , 0)
  • 71. 38/54 Algorithm for Combining Locally-Optimal Node Strategies into Optimal Workflow Strategies π (2) D π (1) D π (3) D π (4) D π (5) D π (6) D a (D,5) t ∼ π (5) D Step 4; update the topology based on the previous local action; select action a = 0 for unreachable nodes; move to the next node in the topological ordering (i.e. 5); select the action for the next node according to its local strategy. Workflow action: (a (D,1) t , a (D,2) t , a (D,3) t , 0, a (D,5) t )
  • 72. 38/54 Algorithm for Combining Locally-Optimal Node Strategies into Optimal Workflow Strategies π (2) D π (1) D π (3) D π (4) D π (5) D π (6) D Step 5; update the topology based on the previous local action; select action a = 0 for unreachable nodes; Workflow action: (a (D,1) t , a (D,2) t , a (D,3) t , 0, a (D,5) t )
  • 73. 38/54 Algorithm for Combining Locally-Optimal Node Strategies into Optimal Workflow Strategies π (2) D π (1) D π (3) D π (4) D π (5) D π (6) D a (D,6) t = 0 Step 5; update the topology based on the previous local action; select action a = 0 for unreachable nodes; Workflow action: (a (D,1) t , a (D,2) t , a (D,3) t , 0, a (D,5) t , 0)
  • 74. 39/54 Computational Benefits of Decomposition I ∴ we can obtain an optimal (best response) strategy for the full game Γ by combining the solutions to V simpler subproblems that can be solved in parallel and have significantly smaller state, observation, and action spaces. 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 101 102 103 104 105 log10 |S| log10 |O| log10 |Ak| log10 |S(i)| log10 |O(i)| log10 |A (i) k | |V| 4 oom 3 oom 2 oom Space complexity comparison between the full game and the decomposed game.
  • 75. 40/54 System Decomposition To avoid explicitly enumerating the very large state, observation, and action spaces of Γ, we exploit three structural properties. 1. Additive structure across workflows. I The game decomposes into additive subgames on the workflow-level, which means that the strategy for each subgame can be optimized independently 2. Optimal substructure within a workflow. I The subgame for each workflow decomposes into subgames on the node-level that satisfy the optimal substructure property 3. Threshold properties of local defender strategies. I The optimal node-level strategies for the defender exhibit threshold structures, which means that they can be estimated efficiently
  • 76. 41/54 Can we Solve the Local Problems with Dynamic Programming?
  • 77. 42/54 Scalable learning through decomposition (Simulation) 0.00 0.25 0.50 0.75 1.00 avg defender utility |V| = 2, |W| = 1 |V| = 4, |W| = 2 |V| = 8, |W| = 2 |V| = 16, |W| = 2 0 50 100 150 running time (min) 0.00 0.25 0.50 0.75 1.00 avg attacker utility 0 50 100 150 running time (min) 0 50 100 150 running time (min) 0 50 100 150 running time (min) πdecomposition D πworkflow D πfull D upper bound πdecomposition A πworkflow A πfull A Learning curves obtained during training of ppo to find best response strategies against randomized opponents; red, purple, blue and brown curves relate to decomposed strategies; the orange and green curves relate to the non-decomposed strategies.
  • 78. 43/54 Scalable learning through decomposition (Simulation) 1 2 3 4 5 6 7 8 9 10 2 4 6 8 10 linear measured # parallel processes n |V| = 10 Speedup S n Speedup of completion time when computing best response strategies for the decomposed game with |V| = 10 nodes and different number of parallel processes; the subproblems in the decomposition are split evenly across the processes; let Tn denote the completion time when using n processes, the speedup is then calculated as Sn = T1 Tn ; the error bars indicate standard deviations from 3 measurements.
  • 79. 44/54 Threshold Properties of Local Defender Strategies. I The local problem of the defender can be decomposed in the temporal domain as max πD T X t=1 J = max πD τ1 X t=1 J1 + τ2 X t=1 J2 + . . . (2) where τ1, τ2, . . . are stopping times. I =⇒ (1) selection of defensive actions is simplified; and (2) the optimal stopping times are given by a threshold strategy that can be estimated efficiently: Belief space B (j) D Switching curve Υ Continuation set C Stopping set S (1, 0, 0) j healthy (0, 1, 0) j discovered (0, 0, 1) j compromised
  • 80. 45/54 Threshold Properties of Local Defender Strategies. Belief space B (j) D Switching curve Υ Continuation set C Stopping set S (1, 0, 0) j healthy (0, 1, 0) j discovered (0, 0, 1) j compromised I A node can be in three attack states s (A) t : Healthy, Discovered, Compromised. I The defender has a belief state b (D) t
  • 81. 46/54 Threshold Properties of Local Defender Strategies. We estimate the optimal switching curves using a linear approximation πD(b(D) ) =        Stop if h 0 1 θT i b(D) −1 # 0 Continue otherwise (3) subject to θ ∈ R2 , θ2 0 and θ1 ≥ 1 (4) (1, 0, 0) j healthy (0, 1, 0) j discovered (0, 0, 1) j compromised (1, 0, 0) j healthy (0, 1, 0) j discovered (0, 0, 1) j compromised (1, 0, 0) j healthy (0, 1, 0) j discovered (0, 0, 1) j compromised (a) (b) (c) Examples of learned linear switching curves.
  • 82. 47/54 Proof Sketch (Threshold Properties) I Let L(e1, b̂) denote the line segment that starts at the belief state e1 = (1, 0, 0) and ends at b̂, where b̂ is in the sub-simplex that joins e2 and e3. I All beliefs on L(e1, b̂) are totally ordered according to the Monotone Likelihood Ratio (MLR) order. =⇒ a threshold belief state αb̂ ∈ L(e1, b̂) exists where the optimal strategy switches from C to S. I Since the entire belief space can be covered by the union of lines L(e1, b̂), the threshold belief states αb̂1 , αb̂2 , . . . yield a switching curve Υ. Belief space B (j) D (the 2-dimensional unit simplex) sub-simplex B (j) D,e1 joining e2 and e3 b̂5 b̂4 b̂3 b̂2 b̂1 b̂6 b̂7 b̂8 b̂9 L(e1, b̂5) Switching curve Υ Threshold belief state αb̂9 e1 (1, 0, 0) e2 (0, 1, 0) e3 (0, 0, 1)
  • 83. 48/54 Learning Best Responses for the Target Infrastructure (Simulation) 0 2 4 6 8 running time (min) 0.0 0.5 1.0 avg defender utility 0 20 40 60 80 running time (min) 0.0 0.5 1.0 avg attacker utility upper bound our method ppo random ot,i 0 rd zone App servers Honeynet dmz admin zone workflow Gateway idps quarantine zone alerts Defender . . . Attacker Clients 2 1 3 12 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
  • 84. 49/54 Decompositional Fictitious Play (DFSP) to Approximate an Equilibrium π̃2 ∈ B2(π1) π2 π1 π̃1 ∈ B1(π2) π̃0 2 ∈ B2(π0 1) π0 2 π0 1 π̃0 1 ∈ B1(π0 2) . . . π∗ 2 ∈ B2(π∗ 1) π∗ 1 ∈ B1(π∗ 2) Fictitious play: iterative averaging of best responses. I Learn best response strategies iteratively through the parallel solving of subgames in the decomposition I Average best responses to approximate the equilibrium
  • 85. 50/54 Learning Equilibrium Strategies 0 20 40 60 80 running time (h) 0 2 4 6 8 Approximate exploitability 0 20 40 60 80 running time (h) 0.0 0.2 0.4 0.6 0.8 1.0 Defender utility per episode 0 20 40 60 80 running time (h) 2.5 5.0 7.5 10.0 12.5 Episode length dfsp simulation dfsp digital twin upper bound ot,i 0 random defense Learning curves obtained during training of dfsp to find optimal (equilibrium) strategies in the intrusion response game; red and blue curves relate to dfsp; black, orange and green curves relate to baselines.
  • 86. 51/54 Evaluation in the Digital Twin s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin Target Infrastructure Model Creation System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning Generalization Strategy evaluation Model estimation Automation Self-learning systems
  • 87. 51/54 Learning Equilibrium Strategies 0 20 40 60 80 running time (h) 0 2 4 6 8 Approximate exploitability 0 20 40 60 80 running time (h) 0.0 0.2 0.4 0.6 0.8 1.0 Defender utility per episode 0 20 40 60 80 running time (h) 2.5 5.0 7.5 10.0 12.5 Episode length dfsp simulation dfsp digital twin upper bound ot,i 0 random defense Learning curves obtained during training of dfsp to find optimal (equilibrium) strategies in the intrusion response game; red and blue curves relate to dfsp; black, orange and green curves relate to baselines.
  • 88. 52/54 Learning Equilibrium Strategies (Comparison against NFSP) 0 10 20 30 40 50 60 70 80 running time (h) 0.0 2.5 5.0 7.5 Approximate exploitability dfsp nfsp Learning curves obtained during training of dfsp and nfsp to find optimal (equilibrium) strategies in the intrusion response game; the red curve relate to dfsp and the purple curve relate to nfsp; all curves show simulation results.
  • 89. 53/54 Conclusions I We develop a framework to automatically learn security strategies. I We apply the method to an intrusion response use case. I We design a novel decompositional approach to find near-optimal intrusion responses for large-scale IT infrastructures. I We show that the decomposition reduces both the computational complexity of finding effective strategies, and the sample complexity of learning a system model by several orders of magnitude. s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Emulation Target System Model Creation System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation Learning
  • 90. 54/54 Current and Future Work Time st st+1 st+2 st+3 . . . rt+1 rt+2 rt+3 rrT 1. Extend use case I Heterogeneous client population I Extensive threat model of the attacker 2. Extend solution framework I Model-predictive control I Rollout-based techniques I Extend system identification algorithm 3. Extend theoretical results I Exploit symmetries and causal structure