
Virtual Knowledge Graphs for Federated Log Analysis

Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio

Eche un vistazo a continuación

1 de 26 Anuncio

Más Contenido Relacionado

Presentaciones para usted (20)

Similares a Virtual Knowledge Graphs for Federated Log Analysis (20)

Anuncio

Más reciente (20)

Virtual Knowledge Graphs for Federated Log Analysis

  1. Virtual Knowledge Graphs for Federated Log Analysis
     Kabul Kurniawan (WU Wien, Uni Wien), Andreas Ekelhart (WU Wien), Elmar Kiesling (WU Wien), Dietmar Winkler (TU Wien), Gerald Quirchmayr (Uni Wien), A Min Tjoa (TU Wien)
     This work was funded by the Austrian Science Fund (FWF) and netidee SCIENCE under grant P30437-N31, as well as by the Austrian Research Promotion Agency FFG under grant 877389 (OBARIS).
     Vienna, August 17–20, 2021
  2. Motivation
     Raw log excerpt:
     Mar 9 12:00:45 Client02 systemd-logind[1201]: New seat seat0.
     Mar 9 12:00:45 Client02 systemd-logind[1201]: Watching system buttons on /dev/input/event0 (Power Button)
     Mar 9 12:00:45 Client02 systemd-logind[1201]: Watching system buttons on /dev/input/event3 (AT Translated Set 2 keyboard)
     Mar 9 12:00:45 Client02 systemd-logind[1201]: Watching system buttons on /dev/input/event1 (AT Translated Set 2 keyboard)
     Mar 9 12:00:45 Client02 sshd[1281]: Server listening on 0.0.0.0 port 22.
     Mar 9 12:00:45 Client02 sshd[1281]: Server listening on :: port 22.
     Mar 9 12:10:50 Client02 sshd[2124]: Accepted password for jhalley from 185.81.215.145 port 52410 ssh2
     Mar 9 12:10:50 Client02 sshd[2124]: pam_unix(sshd:session): session opened for user jhalley by (uid=0)
     Mar 9 12:10:50 Client02 systemd-logind[1201]: New session 1 of user jhalley.
     Mar 9 12:10:50 Client02 systemd: pam_unix(systemd-user:session): session opened for user jhalley by (uid=0)
     Mar 9 12:15:34 Client02 sshd[2555]: Did not receive identification string from 51.68.71.229 port 38508
     Mar 9 12:15:48 Client02 sudo: jhalley : TTY=pts/0 ; PWD=/home/jhalley ; USER=root ; COMMAND=/usr/bin/apt-get update
     Mar 9 12:15:48 Client02 sudo: pam_unix(sudo:session): session opened for user root by jhalley(uid=0)
     Mar 9 12:15:57 Client02 sudo: pam_unix(sudo:session): session closed for user root
     Mar 9 12:15:59 Client02 sudo: jhalley : TTY=pts/0 ; PWD=/home/jhalley ; USER=root ; COMMAND=/usr/bin/apt-get install xfce4
     Mar 9 12:15:59 Client02 sudo: pam_unix(sudo:session): session opened for user root by jhalley(uid=0)
     Mar 9 12:17:01 Client02 CRON[4546]: pam_unix(cron:session): session opened for user root by (uid=0)
     Mar 9 12:17:01 Client02 CRON[4546]: pam_unix(cron:session): session closed for user root
     Mar 9 12:18:06 Client02 groupadd[6959]: group added to /etc/group: name=rtkit, GID=115
     Mar 9 12:18:06 Client02 groupadd[6959]: group added to /etc/gshadow: name=rtkit
     Mar 9 12:18:06 Client02 groupadd[6959]: new group: name=rtkit, GID=115
     Mar 9 12:18:06 Client02 useradd[6963]: new user: name=rtkit, UID=111, GID=115, home=/proc, shell=/usr/sbin/nologin
     Mar 9 12:18:06 Client02 usermod[6969]: change user 'rtkit' password
     Mar 9 12:18:06 Client02 chage[6974]: changed password expiry for rtkit
     Mar 9 12:18:06 Client02 chfn[6977]: changed user 'rtkit' information
     Mar 9 12:18:11 Client02 useradd[7149]: new user: name=usbmux, UID=112, GID=46, home=/var/lib/usbmux, shell=/usr/sbin/nologin
     Mar 9 12:18:11 Client02 usermod[7155]: change user 'usbmux' password
     Mar 9 12:18:11 Client02 chage[7160]: changed password expiry for usbmux
     Mar 9 12:18:11 Client02 chfn[7163]: changed user 'usbmux' information
     Mar 9 12:18:24 Client02 groupadd[7508]: group added to /etc/group: name=pulse, GID=116
     Mar 9 12:18:24 Client02 groupadd[7508]: group added to /etc/gshadow: name=pulse
  3. Challenges
     [Figure: dispersed log sources of different types: kern, auth, sys, audit, access, ftp]
  4. Existing Solutions
     ▪ Centralized log management
       ▪ Ingests log sources from multiple endpoints, then parses and indexes them into a central database for analysis. [Kotenko et al., 2013]
       ▪ Bandwidth-intensive and computationally demanding. [Grimaila et al., 2012], [Guillermo, 2013]
     ▪ Decentralized log analysis
       ▪ Partly shifts the computational workload (log pre-processing and analysis) to the log-producing hosts. [Grimaila et al., 2012]
       ▪ Designed primarily for correlation and alerting rather than for querying dispersed log data. [Krügel et al., 2001]
       ▪ Continuously ingesting all log data may consume substantial resources on the endpoints.
     ▪ Current solutions lack semantic relations between entities [Oliner et al., 2012], which makes it difficult to:
       ▪ integrate partial and isolated views on system states;
       ▪ contextualize, link, and query log data.
  5. Requirements
     R1. Resource efficiency
       ▪ Avoid unnecessary log processing and minimize resource requirements (storage space and network bandwidth).
     R2. Aggregation and integration over multiple endpoints
       ▪ Execute queries concurrently across federated endpoints and deliver the integrated results.
     R3. Contextualization and background linking
       ▪ Ability to contextualize log data, integrate it, and link it to background knowledge.
     R4. Standards-based query language
       ▪ Use of an expressive, standardized query language.
  6. Proposed Approach
     Virtual Knowledge Graph (VKG) for Federated Log Analysis: a method to execute federated, graph-pattern-based queries on dispersed, heterogeneous raw log data by dynamically constructing virtual knowledge graphs.
     We introduce a method that:
     ▪ extracts only potentially relevant log messages, and only on demand;
     ▪ integrates the dispersed log events into a common graph;
     ▪ federates graph-pattern-based queries across endpoints;
     ▪ links the integrated log events to background knowledge.
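The "extract on demand, then federate" idea can be illustrated with a minimal Python sketch. It assumes the endpoints are simulated in memory (`HOST_LOGS`, with a made-up second host `Host2` and a made-up sample line); in a real deployment each extraction call would run on, or be sent to, the log-producing host. The function names are hypothetical and this is not the prototype's code: it only shows that each host returns just the matching lines, and that the requests are fanned out concurrently and unioned.

```python
# Sketch of on-demand, federated extraction across endpoints.
# Assumption: hosts are simulated in memory; names are hypothetical.
import re
from concurrent.futures import ThreadPoolExecutor

HOST_LOGS = {
    "Client02": [
        "Mar 9 12:10:50 Client02 sshd[2124]: Accepted password for jhalley from 185.81.215.145 port 52410 ssh2",
        "Mar 9 12:15:34 Client02 sshd[2555]: Did not receive identification string from 51.68.71.229 port 38508",
    ],
    # Hypothetical second host with a made-up sample line.
    "Host2": [
        "Mar 9 12:20:01 Host2 sshd[3001]: Did not receive identification string from 51.68.71.229 port 40122",
    ],
}


def extract_on_host(host: str, pattern: str) -> list[tuple[str, str]]:
    """Return only the lines on one host that match the pattern (no full ingestion)."""
    regex = re.compile(pattern)
    return [(host, line) for line in HOST_LOGS.get(host, []) if regex.search(line)]


def federated_extract(hosts: list[str], pattern: str) -> list[tuple[str, str]]:
    """Send the same request to all targeted hosts concurrently and union the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        per_host = list(pool.map(lambda h: extract_on_host(h, pattern), hosts))
    return [hit for hits in per_host for hit in hits]


if __name__ == "__main__":
    for host, line in federated_extract(["Client02", "Host2"], r"51\.68\.71\.229"):
        print(host, "->", line)
```

`pool.map` preserves the order of the targeted hosts, so the merged result is deterministic even though the per-host work runs concurrently.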
  7. Virtual Knowledge Graph Concept
     ▪ Data virtualization (V)
       ▪ The actual data sources are not exposed.
       ▪ The integrated data is not materialized.
     ▪ Graph representation (G)
       ▪ Nodes: representations of objects and data values.
       ▪ Edges: relations between nodes.
     ▪ Domain knowledge (K)
       ▪ Concept and property hierarchies.
       ▪ Domains and ranges of properties.
     (Guohui Xiao et al., 2019)
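To make the G and K components concrete, here is a small Python sketch of a graph stored as (subject, predicate, object) edges plus a concept hierarchy and a property domain, with RDFS-style inference over them. All `ex:` class, property, and node names are hypothetical illustrations, not terms from the authors' vocabularies; only `rdf:type` is a real RDF term.

```python
# Sketch: graph edges (G) plus domain knowledge (K). All ex: names are hypothetical.

# K: concept hierarchy (child -> parent) and a property domain (rdfs:domain-like).
SUBCLASS_OF = {
    "ex:SshLoginEvent": "ex:AuthEvent",
    "ex:SudoEvent": "ex:AuthEvent",
    "ex:AuthEvent": "ex:LogEvent",
}
DOMAIN = {"ex:user": "ex:AuthEvent"}  # anything with ex:user is an auth event

# G: nodes are IRIs or literals, edges are (subject, predicate, object) triples.
TRIPLES = {
    ("ex:e1", "rdf:type", "ex:SshLoginEvent"),
    ("ex:e1", "ex:user", '"jhalley"'),
    ("ex:e2", "rdf:type", "ex:SudoEvent"),
    ("ex:e3", "ex:user", '"root"'),  # no explicit type; inferred via DOMAIN
}


def superclasses(cls: str) -> list[str]:
    """Return cls followed by all of its ancestors in the hierarchy."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def types_of(node: str) -> set[str]:
    """Explicit types, domain-inferred types, and all of their superclasses."""
    direct = {o for s, p, o in TRIPLES if s == node and p == "rdf:type"}
    direct |= {DOMAIN[p] for s, p, _ in TRIPLES if s == node and p in DOMAIN}
    return {c for t in direct for c in superclasses(t)}


def instances_of(cls: str) -> list[str]:
    """All subject nodes that are (directly or by inference) instances of cls."""
    nodes = {s for s, _, _ in TRIPLES}
    return sorted(n for n in nodes if cls in types_of(n))


print(instances_of("ex:AuthEvent"))  # ['ex:e1', 'ex:e2', 'ex:e3']
```

A query for the general concept `ex:AuthEvent` thus also returns events typed only with a subclass or implied by a property domain, which is the kind of reasoning that domain knowledge enables.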
  8. Architecture
     [Figure: architecture of the approach]
  9. [Figure]
  10. Query Parsing Example
      SPARQL query:
        PREFIX cl: <https://w3id.org/sepses/vocab/log/core#>
        PREFIX auth: <https://w3id.org/sepses/vocab/log/auth#>
        SELECT ?s ?message WHERE {
          ?s cl:message ?message.
          FILTER regex(?message, "Invalid user")
        }
      SPARQL query in a structured format (i.e., JSON):
        { "queryType": "SELECT",
          "variables": [ { "termType": "Variable", "value": "s" }, { "termType": "Variable", "value": "message" } ],
          "where": [ { "type": "bgp", "triples": [ { "subject": { "termType": "Variable", "value": "s" },
            "predicate": { "termType": "NamedNode", "value": "https://w3id.org/sepses/vocab/log/core#message" }, ….
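Once the query is in a structured form, the parts that matter for log extraction can be read off directly. The Python sketch below mirrors the JSON on the slide; the `object` and `filter` entries are assumptions about how the elided remainder is structured (SPARQL.js-style `operation` objects), and `extraction_hints` is a hypothetical helper, not part of the prototype. It collects the predicate IRIs from basic graph patterns and the regex literals from FILTERs.

```python
# Sketch: collect extraction hints from a parsed SPARQL query.
# Assumption: the "object" and "filter" parts follow SPARQL.js-style JSON.
import json

PARSED = json.loads("""
{
  "queryType": "SELECT",
  "variables": [{"termType": "Variable", "value": "s"},
                {"termType": "Variable", "value": "message"}],
  "where": [
    {"type": "bgp", "triples": [
      {"subject": {"termType": "Variable", "value": "s"},
       "predicate": {"termType": "NamedNode",
                     "value": "https://w3id.org/sepses/vocab/log/core#message"},
       "object": {"termType": "Variable", "value": "message"}}]},
    {"type": "filter", "expression": {
      "type": "operation", "operator": "regex",
      "args": [{"termType": "Variable", "value": "message"},
               {"termType": "Literal", "value": "Invalid user"}]}}
  ]
}
""")


def extraction_hints(query: dict) -> dict:
    """Return predicate IRIs from BGPs and regex literals from FILTERs."""
    predicates, regexes = [], []
    for element in query.get("where", []):
        if element.get("type") == "bgp":
            for triple in element["triples"]:
                pred = triple["predicate"]
                if pred.get("termType") == "NamedNode":
                    predicates.append(pred["value"])
        elif element.get("type") == "filter":
            expr = element["expression"]
            if expr.get("type") == "operation" and expr.get("operator") == "regex":
                pattern = expr["args"][1]  # regex(?var, "pattern")
                if pattern.get("termType") == "Literal":
                    regexes.append(pattern["value"])
    return {"predicates": predicates, "regexes": regexes}


print(extraction_hints(PARSED))
```

The regex literal ("Invalid user") is what allows raw log lines to be pre-filtered before any RDF is produced.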
  11. Query Translation
      Query options:
        - Targeted hosts: {Host1, Host2, …, Host(n)}
        - Targeted background knowledge: {K1, K2, K3, …, K(n)}
        - Timeframe: <startTime> - <endTime>
      Set the targeted host(s), the targeted background knowledge, and the analysis timeframe; then select log lines based on the defined log prefixes (PREFIX declarations), the triple patterns, and/or the filters. The log data is parsed and the query is executed based on these criteria.
      SPARQL query:
        PREFIX 1: <URI1#>
        PREFIX 2: <URI2#>
        PREFIX (n): <URI(n)#>
        SELECT ?s ?p ?o
        WHERE {
          ?s ?p ?o.                  # triple-pattern(1)
          ?s ?p2 ?o2.                # triple-pattern(2)
          ?s ?p3 ?o3.                # triple-pattern(3)
          ?s ?p4 <String/Literal>.   # triple-pattern(4)
          ?s ?p(n) ?o(n).            # triple-pattern(n)
          …
          FILTER regex(?o, "String/Literal")
        }
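The translation step can be pictured as turning the query options (targeted hosts, timeframe) and the FILTER regexes into a pre-selection of raw log lines. This Python sketch assumes classic syslog lines of the form `Mon D HH:MM:SS host program: message` and fixes a hypothetical `YEAR`, since such timestamps carry no year; `select_lines` and `parse_syslog_time` are illustrative names, not the prototype's API. The sample lines come from the Motivation slide.

```python
# Sketch: pre-select raw log lines by host, timeframe, and FILTER regexes.
# Assumptions: classic syslog format and a fixed YEAR (syslog omits the year).
import re
from datetime import datetime

YEAR = 2021


def parse_syslog_time(line: str) -> datetime:
    """Parse the leading 'Mon D HH:MM:SS' timestamp of a syslog line."""
    month, day, clock = line.split(None, 3)[:3]
    return datetime.strptime(f"{YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def select_lines(lines, hosts, start, end, regexes):
    """Keep lines from targeted hosts, within [start, end], matching every regex."""
    compiled = [re.compile(r) for r in regexes]
    selected = []
    for line in lines:
        host = line.split(None, 4)[3]  # 4th whitespace-separated field
        if host not in hosts:
            continue
        if not start <= parse_syslog_time(line) <= end:
            continue
        if all(rx.search(line) for rx in compiled):
            selected.append(line)
    return selected


LOG = [
    "Mar 9 12:10:50 Client02 sshd[2124]: Accepted password for jhalley from 185.81.215.145 port 52410 ssh2",
    "Mar 9 12:15:34 Client02 sshd[2555]: Did not receive identification string from 51.68.71.229 port 38508",
    "Mar 9 12:15:48 Client02 sudo: jhalley : TTY=pts/0 ; PWD=/home/jhalley ; USER=root ; COMMAND=/usr/bin/apt-get update",
]
hits = select_lines(LOG, {"Client02"}, datetime(YEAR, 3, 9, 12, 15), datetime(YEAR, 3, 9, 12, 16), [r"sshd"])
print(hits)
```

Only the 12:15:34 `sshd` line survives: the first line falls outside the timeframe and the `sudo` line fails the regex. Only such surviving lines would then need to be turned into a graph.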
  12. Query Translation
      [Figure]
  13. Log Graph Generation Example
      [Figure]
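As a rough illustration of log graph generation, the Python sketch below turns one extracted syslog line into N-Triples. Only the `cl:message` IRI (`https://w3id.org/sepses/vocab/log/core#message`) comes from the slides; the `http://example.org/` namespace, the other property names, and the subject IRI scheme are hypothetical placeholders, and the prototype itself uses CARML mappings rather than hand-written code like this.

```python
# Sketch: map one syslog line to N-Triples.
# Only CL_MESSAGE comes from the slides; everything under example.org is hypothetical.
import re

CL_MESSAGE = "https://w3id.org/sepses/vocab/log/core#message"
EX = "http://example.org/log#"  # hypothetical namespace for the other fields

SYSLOG = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def literal(value: str) -> str:
    """Quote a plain N-Triples literal, escaping backslash, quote, CR, and LF."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def line_to_ntriples(line: str, entry_id: str) -> list[str]:
    """Return N-Triples for one log line, or [] if the line does not parse."""
    m = SYSLOG.match(line)
    if m is None:
        return []  # skip unparseable lines instead of guessing
    subject = f"<http://example.org/entry/{entry_id}>"  # hypothetical IRI scheme
    triples = [f"{subject} <{CL_MESSAGE}> {literal(m['message'])} ."]
    for field in ("timestamp", "host", "program", "pid"):
        if m[field] is not None:
            triples.append(f"{subject} <{EX}{field}> {literal(m[field])} .")
    return triples


LINE = "Mar 9 12:10:50 Client02 sshd[2124]: Accepted password for jhalley from 185.81.215.145 port 52410 ssh2"
print("\n".join(line_to_ntriples(LINE, "1")))
```

Lines without a PID (e.g. `sudo:` entries) simply produce one triple fewer, because the optional `pid` group is `None`.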
  14. Query
      [Figure]
  15. Query
      [Figure]
  16. Prototype Implementation
      • Vocabularies: https://w3id.org/sepses/vocab/
      • Cybersecurity Knowledge Graph: http://sepses.ifs.tuwien.ac.at
      • Software:
        • Log Parser: Virtual Log Parser (Java-based)
        • Query Processor: Virtual Log Query Processor (web-based)
      • State-of-the-art libraries:
        • SPARQL parser: SPARQL.js
        • Log extractor: Grok patterns
        • RDF mapper: CARML
        • RDF compressor: HDT
        • Query engine: Comunica
      [Screenshot: Query Processor user interface with targeted hosts, targeted background knowledge, time range, query collections, query editor, execution and reset buttons, and query result]
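Grok patterns are essentially named, reusable regular-expression fragments. The Python sketch below shows how `%{NAME}` and `%{NAME:field}` references can be expanded into a plain regex with named groups. The pattern library here is a tiny, simplified stand-in written for illustration, not the official Logstash/Grok definitions, and `expand` is a hypothetical helper.

```python
# Sketch: expand Grok-style references into a Python regex.
# PATTERNS is a simplified stand-in, NOT the official Grok pattern library.
import re

PATTERNS = {
    "MONTH": r"[A-Z][a-z]{2}",
    "MONTHDAY": r"\d{1,2}",
    "TIME": r"\d{2}:\d{2}:\d{2}",
    "SYSLOGTIMESTAMP": r"%{MONTH} +%{MONTHDAY} %{TIME}",
    "HOSTNAME": r"[\w.-]+",
    "PROG": r"[\w./-]+",
    "POSINT": r"[1-9]\d*",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "USERNAME": r"[\w.-]+",
}

TOKEN = re.compile(r"%\{(?P<name>\w+)(?::(?P<field>\w+))?\}")


def expand(grok: str, depth: int = 0) -> str:
    """Replace %{NAME} with a non-capturing group and %{NAME:field} with a named group."""
    if depth > 10:
        raise ValueError("pattern nesting too deep")

    def substitute(m: re.Match) -> str:
        body = expand(PATTERNS[m["name"]], depth + 1)  # patterns may nest
        return f"(?P<{m['field']}>{body})" if m["field"] else f"(?:{body})"

    return TOKEN.sub(substitute, grok)


ACCEPTED = expand(
    r"%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:host} %{PROG:program}\[%{POSINT:pid}\]: "
    r"Accepted password for %{USERNAME:user} from %{IP:src_ip} port %{POSINT:src_port}"
)
m = re.match(ACCEPTED, "Mar 9 12:10:50 Client02 sshd[2124]: Accepted password for jhalley from 185.81.215.145 port 52410 ssh2")
print(m["user"], m["src_ip"], m["src_port"])  # jhalley 185.81.215.145 52410
```

Because the replacement is computed by a function, `re.sub` inserts each expanded fragment literally, so backslashes in the pattern library survive unchanged.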
  17. Use Case 1: Intrusion Detection and Background Linking
      [Figure: analyst, IDS server, cybersecurity knowledge base]
      Scenario: "Find vulnerabilities and potential mitigations from the IDS (Snort) log."
      SPARQL query: [figure]
  18. Use Case 1: Intrusion Detection and Background Linking
      Query results: [figure]
      Graph visualization: [figure]
  19. Use Case 2: Network Monitoring
      [Figure: analyst, file server, web server, database server, internal background knowledge]
      Scenario: "Successful login events from SSH connections across hosts."
      SPARQL query: [figure]
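The network-monitoring scenario combines two ingredients: matching successful SSH logins on several hosts and enriching them with internal background knowledge. In this Python sketch the host names (`web01`, `db01`), the `HOST_ROLE` table, and the `db01` sample line are hypothetical; the `web01` line reuses the `Accepted password` entry from the Motivation slide under a different host name. The actual prototype expresses this as a SPARQL query over the virtual graph rather than as Python code.

```python
# Sketch: successful SSH logins across hosts, joined with internal background knowledge.
# Host names, log lines, and HOST_ROLE are hypothetical.
import re

ACCEPTED = re.compile(
    r"sshd\[\d+\]: Accepted (?P<method>password|publickey) for (?P<user>\S+) "
    r"from (?P<ip>\S+) port (?P<port>\d+)"
)

# Hypothetical internal background knowledge: the role of each monitored host.
HOST_ROLE = {"fs01": "File Server", "web01": "Web Server", "db01": "Database Server"}


def successful_logins(logs_by_host: dict[str, list[str]]) -> list[dict]:
    """Return one enriched record per successful SSH login found on any host."""
    events = []
    for host, lines in logs_by_host.items():
        for line in lines:
            m = ACCEPTED.search(line)
            if m:
                events.append({"host": host, "role": HOST_ROLE.get(host, "unknown"), **m.groupdict()})
    return events


logs = {
    "web01": ["Mar 9 12:10:50 web01 sshd[2124]: Accepted password for jhalley from 185.81.215.145 port 52410 ssh2"],
    "db01": ["Mar 9 12:15:34 db01 sshd[2555]: Did not receive identification string from 51.68.71.229 port 38508"],
}
for event in successful_logins(logs):
    print(event)
```

Only the `web01` entry is a successful login, so a single record is produced, carrying the host role from the background table alongside the fields extracted from the log line.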
  20. Use Case 2: Network Monitoring
      Query results: [figure]
      Graph visualization: [figure]
  21. Evaluation
      Setup:
      • Machines: a Microsoft Azure virtual machine as the Linux host (2.59 GHz vCPU, 16 GB RAM) and a Windows host for log analysis (2.90 GHz CPU, 16 GB RAM).
      • Dataset: AIT log dataset (V1.1), which simulates six days of user access across multiple web servers.
      • We split large log files into smaller files.
      • We report the average time over five runs for each experiment.
      Dataset description: [table shown as a figure]
  22. Evaluation – Single Host
      • Graph compression
      • Dynamic log graph generation
      • Log graph generation
      [Figures]
  23. Evaluation – Multiple Hosts
      Setup:
      • Machines: seven Microsoft Azure virtual machines as hosts (4 Windows and 3 Linux; 2.59 GHz vCPU, 16 GB RAM) and a Windows host for log analysis (2.90 GHz CPU, 16 GB RAM).
      • Dataset: Apache logs from the AIT log dataset (V1.1).
      • Hosts 1 to 4 store the data from the original four servers in the dataset.
      • We report the average time over five runs for each experiment.
      [Figure: Experiment "Timeframe": query execution time in a federated setting for different time frames]
  24. Conclusion
      ▪ A novel approach for federated log analysis based on virtual knowledge graphs.
      ▪ A prototype and vocabularies, demonstrated in security analytics.
      ▪ Evaluation: the log processing time is primarily a function of the number of extracted (relevant) log lines and the number of queried hosts.
      Limitations:
      ▪ The query parameters should restrict the number of extracted log lines.
      ▪ Not a replacement for existing SIEM systems.
      Future work:
      ▪ Improved query analysis (e.g., automatic selection of hosts and background knowledge).
      ▪ Streaming-based virtual knowledge graphs for log monitoring.
  25. References
      Resource paper: [figure]
      ▪ Adam Oliner, Archana Ganapathi, and Wei Xu. 2012. Advances and Challenges in Log Analysis. Communications of the ACM 55, 2 (2012).
      ▪ Igor Kotenko, Olga Polubelova, Andrey Chechulin, and Igor Saenko. 2013. Design and Implementation of a Hybrid Ontological-Relational Data Repository for SIEM Systems. Future Internet 5, 3 (July 2013), 355–375. https://doi.org/10.3390/fi5030355
      ▪ Christopher Krügel, Thomas Toth, and Clemens Kerer. 2002. Decentralized Event Correlation for Intrusion Detection. In Information Security and Cryptology – ICISC 2001, Gerhard Goos, Juris Hartmanis, Jan van Leeuwen, and Kwangjo Kim (Eds.), Vol. 2288. Springer Berlin Heidelberg, Berlin, Heidelberg, 114–131. https://doi.org/10.1007/3-540-45861-1_10
      ▪ Guillermo Suárez de Tangil and Esther Palomar. 2013. Advances in Security Information Management: Perceptions and Outcomes. Nova Science Publishers, Incorporated, Commack, NY, USA.
      ▪ Michael R. Grimaila, Justin Myers, Robert F. Mills, and Gilbert Peterson. 2012. Design and Analysis of a Dynamically Configured Log-based Distributed Security Event Detection Methodology. The Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 9, 3 (July 2012), 219–241. https://doi.org/10.1177/1548512911399303
  26. Thank you!
      Kabul Kurniawan
      Email: kabul.kurniawan@wu.ac.at
      Web: kabulkurniawan.github.io
      Twitter: @kabulkurniawan
