CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT Environment - Hammar & Stadler

1. 1/14 An Online Framework for Adapting Security Policies in Dynamic IT Environments International Conference on Network and Service Management Thessaloniki, Greece, Oct 31 - Nov 4 2022 Kim Hammar & Rolf Stadler kimham@kth.se stadler@kth.se Division of Network and Systems Engineering KTH Royal Institute of Technology

2. 2/14 Challenges: Evolving and Automated Attacks I Challenges I Evolving & automated attacks I Complex infrastructures Attacker Clients . . . Defender 1 IPS 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31

3. 2/14 Goal: Automation and Learning I Challenges I Evolving & automated attacks I Complex infrastructures I Our Goal: I Automate security tasks I Adapt to changing attack methods Attacker Clients . . . Defender 1 IPS 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31

4. 2/14 Approach: Self-Learning Security Systems I Challenges I Evolving & automated attacks I Complex infrastructures I Our Goal: I Automate security tasks I Adapt to changing attack methods I Our Approach: Self-Learning Systems: I real-time telemetry I stream processing I theories from control/game/decision theory I computational methods (e.g. dynamic programming & reinforcement learning) I automated network management (SDN, NFV, etc.) Attacker Clients . . . Defender 1 IPS 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31

5. 3/14 Our Framework for Automated Network Security s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Emulation System Target System Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation System Reinforcement Learning & Generalization Strategy evaluation & Model estimation Automation & Self-learning systems

13. 4/14 Our Previous Work I Finding Effective Security Strategies through Reinforcement Learning and Self-Play1 I Learning Intrusion Prevention Policies through Optimal Stopping2 I A System for Interactive Examination of Learned Security Policies3 I Intrusion Prevention Through Optimal Stopping4 I Learning Security Strategies through Game Play and Optimal Stopping5 1 Kim Hammar and Rolf Stadler. “Finding Effective Security Strategies through Reinforcement Learning and Self-Play”. In: International Conference on Network and Service Management (CNSM 2020). Izmir, Turkey, 2020. 2 Kim Hammar and Rolf Stadler. “Learning Intrusion Prevention Policies through Optimal Stopping”. In: International Conference on Network and Service Management (CNSM 2021). http://dl.ifip.org/db/conf/cnsm/cnsm2021/1570732932.pdf. Izmir, Turkey, 2021. 3 Kim Hammar and Rolf Stadler. “A System for Interactive Examination of Learned Security Policies”. In: NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium. 2022, pp. 1–3. doi: 10.1109/NOMS54207.2022.9789707. 4 Kim Hammar and Rolf Stadler. “Intrusion Prevention Through Optimal Stopping”. In: IEEE Transactions on Network and Service Management 19.3 (2022), pp. 2333–2348. doi: 10.1109/TNSM.2022.3176781. 5 Kim Hammar and Rolf Stadler. “Learning Security Strategies through Game Play and Optimal Stopping”. In: Proceedings of the ML4Cyber workshop, ICML 2022, Baltimore, USA, July 17-23, 2022. PMLR, 2022.

14. 5/14 This Paper: Learning in Dynamic IT Environments6 I Challenge: operational IT environments are dynamic I Components may fail, load patterns can shift, etc. I Contribution: we present a framework for learning and updating security policies in dynamic IT environments Policy Learning Agent Environment System Identification s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin and Attack Scenarios Target System Model M Traces h1, h2, . . . Policy π Configuration I and change events Policy π Policy evaluation & Data collection Automated security policy 6 Kim Hammar and Rolf Stadler. “An Online Framework for Adapting Security Policies in Dynamic IT Environments”. In: International Conference on Network and Service Management (CNSM 2022). Thessaloniki, Greece, 2022.

15. 6/14 Learning in Dynamic IT Environments Algorithm 1: High-level execution of the framework Input: emulator: method to create digital twin ϕ: system identification algorithm φ: policy learning algorithm 1 Algorithm (emulator, ϕ, φ) 2 do in parallel 3 DigitalTwin(emulator) 4 SystemIdProcess(ϕ) 5 LearningProcess(φ) 6 end 1 Procedure DigitalTwin(emulator) 2 Loop 3 π ← ReceiveFromLearningProcess() 4 ht ← CollectTrace(π) 5 SendToSystemIdProcess(ht) 6 UpdateDigitalTwin(emulator) 7 EndLoop 1 Procedure SystemIdProcess(ϕ) 2 Loop 3 h1, h2, . . . ← ReceiveFromDigitalTwin() 4 M ← ϕ(h1, h2, . . .) // estimate model 5 SendToLearningProcess(M) 6 EndLoop 1 Procedure LearningProcess(φ) 2 Loop 3 M ← ReceiveFromSystemIdProcess() 4 π ← φ(M) // learn policy π 5 SendToDigitalTwin(π) 6 EndLoop

16. 6/14 Learning in Dynamic IT Environments Algorithm 2: High-level execution of the framework Input: emulator: method to create digital twin ϕ: system identification algorithm φ: policy learning algorithm 1 Algorithm (emulator, ϕ, φ) 2 do in parallel 3 DigitalTwin(emulator) 4 SystemIdProcess(ϕ) 5 LearningProcess(φ) 6 end 1 Procedure DigitalTwin(emulator) 2 Loop 3 π ← ReceiveFromLearningProcess() 4 ht ← CollectTrace(π) 5 SendToSystemIdProcess(ht) 6 UpdateDigitalTwin(emulator) 7 EndLoop 1 Procedure SystemIdProcess(ϕ) 2 Loop 3 h1, h2, . . . ← ReceiveFromDigitalTwin() 4 M ← ϕ(h1, h2, . . .) // estimate model 5 SendToLearningProcess(M) 6 EndLoop 1 Procedure LearningProcess(φ) 2 Loop 3 M ← ReceiveFromSystemIdProcess() 4 π ← φ(M) // learn policy π 5 SendToDigitalTwin(π) 6 EndLoop

17. 6/14 The Digital Twin Algorithm 3: High-level execution of the framework Input: emulator: method to create digital twin ϕ: system identification algorithm φ: policy learning algorithm 1 Algorithm (emulator, ϕ, φ) 2 do in parallel 3 DigitalTwin(emulator) 4 SystemIdProcess(ϕ) 5 LearningProcess(φ) 6 end 1 Procedure DigitalTwin(emulator) 2 Loop 3 π ← ReceiveFromLearningProcess() 4 ht ← CollectTrace(π) 5 SendToSystemIdProcess(ht) 6 UpdateDigitalTwin(emulator) 7 EndLoop 1 Procedure SystemIdProcess(ϕ) 2 Loop 3 h1, h2, . . . ← ReceiveFromDigitalTwin() 4 M ← ϕ(h1, h2, . . .) // estimate model 5 SendToLearningProcess(M) 6 EndLoop 1 Procedure LearningProcess(φ) 2 Loop 3 M ← ReceiveFromSystemIdProcess() 4 π ← φ(M) // learn policy π 5 SendToDigitalTwin(π) 6 EndLoop

18. 7/14 Creating a Digital Twin of the Target System I Emulate hosts with docker containers I Emulate IPS and vulnerabilities with software I Network isolation and traffic shaping through NetEm in the Linux kernel I Enforce resource constraints using cgroups. I Emulate client arrivals with Poisson process I Internal connections are full-duplex & loss-less with bit capacities of 1000 Mbit/s I External connections are full-duplex with bit capacities of 100 Mbit/s & 0.1% packet loss in normal operation and random bursts of 1% packet loss Attacker Clients . . . Defender 1 IPS 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31

24. 7/14 The System Identification Process Algorithm 4: High-level execution of the framework Input: emulator: method to create digital twin ϕ: system identification algorithm φ: policy learning algorithm 1 Algorithm (emulator, ϕ, φ) 2 do in parallel 3 DigitalTwin(emulator) 4 SystemIdProcess(ϕ) 5 LearningProcess(φ) 6 end 1 Procedure DigitalTwin(emulator) 2 Loop 3 π ← ReceiveFromLearningProcess() 4 ht ← CollectTrace(π) 5 SendToSystemIdProcess(ht) 6 UpdateDigitalTwin(emulator) 7 EndLoop 1 Procedure SystemIdProcess(ϕ) 2 Loop 3 h1, h2, . . . ← ReceiveFromDigitalTwin() 4 M ← ϕ(h1, h2, . . .) // estimate model 5 SendToLearningProcess(M) 6 EndLoop 1 Procedure LearningProcess(φ) 2 Loop 3 M ← ReceiveFromSystemIdProcess() 4 π ← φ(M) // learn policy π 5 SendToDigitalTwin(π) 6 EndLoop

25. 8/14 System Model I We model the evolution of the system with a discrete-time dynamical system. I We assume a Markovian system with stochastic dynamics and partial observability. Stochastic System (Markov) Noisy Sensor Optimal filter Controller action at observation ot state st belief bt

26. 9/14 System Identification ˆ f O (o t |0) Probability distribution of # IPS alerts weighted by priority ot 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 ˆ f O (o t |1) Fitted model Distribution st = 0 Distribution st = 1 I The distribution fO of defender observations (system metrics) is unknown. I We fit a Gaussian mixture distribution ˆ fO as an estimate of fO in the target system. I For each state s, we obtain the conditional distribution ˆ fO|s through expectation-maximization.

27. 9/14 The Policy Learning Process Algorithm 5: High-level execution of the framework Input: emulator: method to create digital twin ϕ: system identification algorithm φ: policy learning algorithm 1 Algorithm (emulator, ϕ, φ) 2 do in parallel 3 DigitalTwin(emulator) 4 SystemIdProcess(ϕ) 5 LearningProcess(φ) 6 end 1 Procedure DigitalTwin(emulator) 2 Loop 3 π ← ReceiveFromLearningProcess() 4 ht ← CollectTrace(π) 5 SendToSystemIdProcess(ht) 6 UpdateDigitalTwin(emulator) 7 EndLoop 1 Procedure SystemIdProcess(ϕ) 2 Loop 3 h1, h2, . . . ← ReceiveFromDigitalTwin() 4 M ← ϕ(h1, h2, . . .) // estimate model 5 SendToLearningProcess(M) 6 EndLoop 1 Procedure LearningProcess(φ) 2 Loop 3 M ← ReceiveFromSystemIdProcess() 4 π ← φ(M) // learn policy π 5 SendToDigitalTwin(π) 6 EndLoop

28. 10/14 Learning Effective Defender Policies I Optimization problem: I Each stopping time = one defensive action I Maximize reward of stopping times τL, τL−1, . . . , τ1: π∗ l ∈ arg max πl Eπl " τL−1 X t=1 γt−1 RC st ,st+1,L + γτL−1 RS sτL ,sτL+1,L + . . . + τ1−1 X t=τ2+1 γt−1 RC st ,st+1,1 + γτ1−1 RS sτ1 ,sτ1+1,1 # I Optimization methods: Reinforcement learning, dynamic programming, computational game theory, etc. 0 1 ∅ t ≥ 1 lt > 0 t ≥ 2 lt > 0 intrusion starts Qt = 1 final stop lt = 0 intrusion prevented lt = 0

29. 11/14 Putting it all together: Learning in Dynamic Environments 1. Changes in the target system are monitored. 2. When changes are detected, the emulation is updated. 3. Attack and defense scenarios are run in the emulation to collect data. 4. The system model and the defender policy are updated periodically with the new data. Policy Learning Agent Environment System Identification s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Digital Twin and Attack Scenarios Target System Model M Traces h1, h2, . . . Policy π Configuration I and change events Policy π Policy evaluation & Data collection Automated security policy

30. 12/14 Use Case: Intrusion Prevention I A Defender owns an infrastructure I Consists of connected components I Components run network services I Defender defends the infrastructure by monitoring and active defense I Has partial observability I An Attacker seeks to intrude on the infrastructure I Has a partial view of the infrastructure I Wants to compromise specific components I Attacks by reconnaissance, exploitation and pivoting Attacker Clients . . . Defender 1 IPS 1 alerts Gateway 7 8 9 10 11 6 5 4 3 2 12 13 14 15 16 17 18 19 21 23 20 22 24 25 26 27 28 29 30 31

31. 13/14 Results: Learning in a Dynamic IT Environment 200 400 600 # clients 5000 10000 E[ Ẑ] 0 10 20 30 40 50 execution time (hours) 0 20 Avg reward E h Ẑt,O|1 i E h Ẑt,O|0 i E h Ẑ [10] t,O|1 i E h Ẑ [10] t,O|0 i upper bound Our framework [10] Results from running our framework for 50 hours in the digital twin/emulation.

32. 14/14 Conclusions I We present a framework for learning and updating security policies in dynamic IT environments I We apply the method to an intrusion prevention use case. I We show numerical results in a realistic emulation environment. I We design a solution framework guided by the theory of optimal stopping. s1,1 s1,2 s1,3 . . . s1,n s2,1 s2,2 s2,3 . . . s2,n . . . . . . . . . . . . . . . Emulation Target System Model Creation & System Identification Strategy Mapping π Selective Replication Strategy Implementation π Simulation & Learning

CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT Environment - Hammar & Stadler

Recomendados

Recomendados

Más contenido relacionado

Similar a CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT Environment - Hammar & Stadler

Similar a CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT Environment - Hammar & Stadler (20)

Más de Kim Hammar

Más de Kim Hammar (17)

Último

Último (20)

CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT Environment - Hammar & Stadler