Virtual Knowledge Graphs for
Federated Log Analysis
Kabul Kurniawan (WU Wien, Uni Wien)
Andreas Ekelhart (WU Wien)
Elmar Kiesling (WU Wien)
Dietmar Winkler (TU Wien)
Gerald Quirchmayr (Uni Wien)
A Min Tjoa (TU Wien)
This work was funded by the Austrian Science Fund (FWF) and netidee SCIENCE under grant P30437-N31, as well as the
Austrian Research Promotion Agency FFG under grant 877389 (OBARIS).
Vienna, August 17 – 20, 2021
Mar 9 12:00:45 Client02 systemd-logind[1201]: New seat seat0.
Mar 9 12:00:45 Client02 systemd-logind[1201]: Watching system buttons on /dev/input/event0 (Power Button)
Mar 9 12:00:45 Client02 systemd-logind[1201]: Watching system buttons on /dev/input/event3 (AT Translated Set 2 keyboard)
Mar 9 12:00:45 Client02 systemd-logind[1201]: Watching system buttons on /dev/input/event1 (AT Translated Set 2 keyboard)
Mar 9 12:00:45 Client02 sshd[1281]: Server listening on 0.0.0.0 port 22.
Mar 9 12:00:45 Client02 sshd[1281]: Server listening on :: port 22.
Mar 9 12:10:50 Client02 sshd[2124]: Accepted password for jhalley from 185.81.215.145 port 52410 ssh2
Mar 9 12:10:50 Client02 sshd[2124]: pam_unix(sshd:session): session opened for user jhalley by (uid=0)
Mar 9 12:10:50 Client02 systemd-logind[1201]: New session 1 of user jhalley.
Mar 9 12:10:50 Client02 systemd: pam_unix(systemd-user:session): session opened for user jhalley by (uid=0)
Mar 9 12:15:34 Client02 sshd[2555]: Did not receive identification string from 51.68.71.229 port 38508
Mar 9 12:15:48 Client02 sudo: jhalley : TTY=pts/0 ; PWD=/home/jhalley ; USER=root ; COMMAND=/usr/bin/apt-get update
Mar 9 12:15:48 Client02 sudo: pam_unix(sudo:session): session opened for user root by jhalley(uid=0)
Mar 9 12:15:57 Client02 sudo: pam_unix(sudo:session): session closed for user root
Mar 9 12:15:59 Client02 sudo: jhalley : TTY=pts/0 ; PWD=/home/jhalley ; USER=root ; COMMAND=/usr/bin/apt-get install xfce4
Mar 9 12:15:59 Client02 sudo: pam_unix(sudo:session): session opened for user root by jhalley(uid=0)
Mar 9 12:17:01 Client02 CRON[4546]: pam_unix(cron:session): session opened for user root by (uid=0)
Mar 9 12:17:01 Client02 CRON[4546]: pam_unix(cron:session): session closed for user root
Mar 9 12:18:06 Client02 groupadd[6959]: group added to /etc/group: name=rtkit, GID=115
Mar 9 12:18:06 Client02 groupadd[6959]: group added to /etc/gshadow: name=rtkit
Mar 9 12:18:06 Client02 groupadd[6959]: new group: name=rtkit, GID=115
Mar 9 12:18:06 Client02 useradd[6963]: new user: name=rtkit, UID=111, GID=115, home=/proc, shell=/usr/sbin/nologin
Mar 9 12:18:06 Client02 usermod[6969]: change user 'rtkit' password
Mar 9 12:18:06 Client02 chage[6974]: changed password expiry for rtkit
Mar 9 12:18:06 Client02 chfn[6977]: changed user 'rtkit' information
Mar 9 12:18:11 Client02 useradd[7149]: new user: name=usbmux, UID=112, GID=46, home=/var/lib/usbmux, shell=/usr/sbin/nologin
Mar 9 12:18:11 Client02 usermod[7155]: change user 'usbmux' password
Mar 9 12:18:11 Client02 chage[7160]: changed password expiry for usbmux
Mar 9 12:18:11 Client02 chfn[7163]: changed user 'usbmux' information
Mar 9 12:18:24 Client02 groupadd[7508]: group added to /etc/group: name=pulse, GID=116
Mar 9 12:18:24 Client02 groupadd[7508]: group added to /etc/gshadow: name=pulse
Motivation
[Figure: log sources: kern, auth, sys, audit, access, ftp]
Challenges
Existing Solutions
▪ Centralized Log Management
  ▪ Ingests log sources from multiple endpoints, then parses and indexes them in a central
  database for analysis. [Kotenko et al., 2013]
  ▪ Bandwidth-intensive and computationally demanding.
  [Grimaila et al., 2012], [Guillermo, 2013]
▪ Decentralized Log Analysis
  ▪ Partly shifts the computational workload (log pre-processing and analysis) to the
  log-producing hosts. [Grimaila et al., 2012]
  ▪ Primarily used for correlation and alerting, rather than for querying dispersed log data.
  [Krügel et al., 2002]
  ▪ Continuously ingesting all log data may consume substantial resources on the local endpoints.
▪ Current solutions lack semantic relations between entities [Oliner et al., 2012],
hence it is difficult to:
  ▪ Integrate partial and isolated views on system states.
  ▪ Contextualize, link and query log data.
R1. Resource-efficiency
▪ Avoid unnecessary log processing and minimize resource requirements
(storage space and network bandwidth).
R2. Aggregation and integration over multiple endpoints
▪ Execute queries concurrently across federated endpoints and deliver integrated results.
R3. Contextualization & Background-Linking
▪ Ability to contextualize, integrate and link log data to background knowledge.
R4. Standards-based query language
▪ Use of an expressive, standardized query language.
Requirements
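Requirement R2 can be illustrated with a minimal sketch, assuming in-memory stand-ins for the log-producing endpoints. The names `ENDPOINTS`, `query_endpoint` and `federated_query` are hypothetical and not part of the presented prototype, and the sample lines are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for log-producing endpoints (assumption, not from the slides).
ENDPOINTS = {
    "Host1": [
        "Accepted password for jhalley from 185.81.215.145 port 52410 ssh2",
        "session opened for user jhalley by (uid=0)",
    ],
    "Host2": [
        "Invalid user admin from 51.68.71.229",
        "session closed for user root",
    ],
}


def query_endpoint(host, keyword):
    """Return (host, line) pairs for the lines on one endpoint that contain keyword."""
    return [(host, line) for line in ENDPOINTS[host] if keyword in line]


def federated_query(hosts, keyword):
    """Run the same query on all hosts concurrently and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        # map() keeps the order of `hosts`, so the merged result is deterministic.
        partial_results = pool.map(lambda host: query_endpoint(host, keyword), hosts)
    return [row for rows in partial_results for row in rows]


results = federated_query(["Host1", "Host2"], "user")
```

Each endpoint filters its own lines, so only matching lines travel back to the caller, which is the resource argument behind R1.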
Virtual Knowledge Graph (VKG) for Federated Log Analysis
▪ A method to execute federated, graph pattern-based queries on dispersed,
heterogeneous raw log data by dynamically constructing virtual knowledge
graphs.
▪ We introduce a method that:
  ▪ Extracts only potentially relevant log messages, and only on demand.
  ▪ Integrates the dispersed log events into a common graph.
  ▪ Federates graph pattern-based queries across endpoints.
  ▪ Links the log events to background knowledge.
Proposed Approach
Virtual Knowledge Graph Concept
▪ Data Virtualization (V)
  ▪ No actual data sources are exposed.
  ▪ No materialization of the integrated data.
▪ Graph Representation (G)
  ▪ Nodes: representations of objects and data values.
  ▪ Edges: relations between nodes.
▪ Domain Knowledge (K)
  ▪ Concept and property hierarchies.
  ▪ Domain and range of properties.
[Guohui Xiao et al., 2019]
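The virtualization idea (V) can be sketched as a graph view that builds triples only when a query asks for them. This is a minimal illustration under stated assumptions: the class `VirtualLogGraph`, its `match` method and the `urn:example:` subject scheme are hypothetical, and only the `cl:message` property from the SEPSES log vocabulary is mapped.

```python
CL_MESSAGE = "https://w3id.org/sepses/vocab/log/core#message"


class VirtualLogGraph:
    """Graph view over raw log lines; triples are built only when queried."""

    def __init__(self, raw_lines):
        self.raw_lines = raw_lines  # the data stays in its original form
        self.triples_built = 0      # counts every triple that was ever created

    def match(self, predicate, text):
        """Lazily yield (subject, predicate, object) triples whose object contains text."""
        if predicate != CL_MESSAGE:
            return  # this sketch maps a single property only
        for number, line in enumerate(self.raw_lines):
            if text in line:
                self.triples_built += 1
                # Hypothetical subject naming scheme, used for illustration only.
                yield (f"urn:example:line:{number}", CL_MESSAGE, line)


graph = VirtualLogGraph([
    "Mar 9 12:10:50 Client02 sshd[2124]: Accepted password for jhalley from 185.81.215.145 port 52410 ssh2",
    "Mar 9 12:17:01 Client02 CRON[4546]: pam_unix(cron:session): session closed for user root",
])
hits = list(graph.match(CL_MESSAGE, "Accepted password"))
```

After the query, `graph.triples_built` equals the number of matching lines, so nothing is materialized for lines the query never touched.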
Architecture
Query Parsing Example
PREFIX cl: <https://w3id.org/sepses/vocab/log/core#>
PREFIX auth: <https://w3id.org/sepses/vocab/log/auth#>
SELECT ?s ?message WHERE {
?s cl:message ?message.
FILTER regex(?message,"Invalid user")
}
{
  "queryType": "SELECT",
  "variables": [
    {
      "termType": "Variable",
      "value": "s"
    },
    {
      "termType": "Variable",
      "value": "message"
    }
  ],
  "where": [
    {
      "type": "bgp",
      "triples": [
        {
          "subject": {
            "termType": "Variable",
            "value": "s"
          },
          "predicate": {
            "termType": "NamedNode",
            "value": "https://w3id.org/sepses/vocab/log/core#message"
          },
          ….
SPARQL Query in a structured format (i.e. JSON)
SPARQL Query
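As a rough illustration of how the structured query can be inspected, the sketch below walks a dictionary shaped like the JSON fragment above and collects the projected variables and the predicate IRIs of the basic graph pattern. The helper names are hypothetical, and the dictionary contains only the fields visible on the slide; the elided part (object term and filter) is left out rather than guessed.

```python
# Only the fields shown on the slide are reproduced; the elided remainder is omitted.
parsed_query = {
    "queryType": "SELECT",
    "variables": [
        {"termType": "Variable", "value": "s"},
        {"termType": "Variable", "value": "message"},
    ],
    "where": [
        {
            "type": "bgp",
            "triples": [
                {
                    "subject": {"termType": "Variable", "value": "s"},
                    "predicate": {
                        "termType": "NamedNode",
                        "value": "https://w3id.org/sepses/vocab/log/core#message",
                    },
                }
            ],
        }
    ],
}


def projected_variables(query):
    """Names of the variables listed in the SELECT clause."""
    return [term["value"] for term in query["variables"] if term["termType"] == "Variable"]


def predicate_iris(query):
    """IRIs of all constant predicates found in basic graph patterns."""
    iris = []
    for pattern in query["where"]:
        if pattern.get("type") != "bgp":
            continue  # other pattern types are ignored in this sketch
        for triple in pattern["triples"]:
            predicate = triple["predicate"]
            if predicate["termType"] == "NamedNode":
                iris.append(predicate["value"])
    return iris


variables = projected_variables(parsed_query)
predicates = predicate_iris(parsed_query)
```

The collected predicate IRIs indicate which log properties a query touches, which is the kind of information a translation step can use to decide what to extract.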
Query Option:
- Targeted host: {Host1, Host2, …, Host(n)} (sets the targeted host(s))
- Targeted Background-knowledge: {K1, K2, K3, …, K(n)} (sets the targeted knowledge)
- Timeframe: <startTime> - <endTime> (sets the analysis time frame)
SPARQL Query:
PREFIX 1: <URI1#>
PREFIX 2: <URI2#>
PREFIX (n): <URI(n)#>
SELECT ?s ?p ?o WHERE {
?s ?p ?o .                  # triple-pattern(1)
?s ?p2 ?o2 .                # triple-pattern(2)
?s ?p3 ?o3 .                # triple-pattern(3)
?s ?p4 <String/Literal> .   # triple-pattern(4)
?s ?p(n) ?o(n) .            # triple-pattern(n)
…
FILTER regex(?o, "String/Literal")
}
Annotations:
- PREFIX declarations: defined log prefix.
- Select log lines based on: triple patterns and/or filters.
- Parse log data and execute the query based on these criteria!
Query Translation
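The selection step described above can be sketched as follows, assuming the query options (targeted hosts, time frame) and one regex filter are already known. The function names, the `logs_by_host` structure and the assumed year are hypothetical; syslog timestamps carry no year, so one has to be supplied.

```python
import re
from datetime import datetime

ASSUMED_YEAR = 2021  # assumption: syslog timestamps omit the year


def parse_timestamp(line):
    """Parse the leading 'Mon D HH:MM:SS' syslog timestamp of a log line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def select_lines(logs_by_host, hosts, start, end, pattern):
    """Keep lines from targeted hosts that fall in the time frame and match the filter."""
    regex = re.compile(pattern)
    selected = []
    for host in hosts:
        for line in logs_by_host.get(host, []):
            if start <= parse_timestamp(line) <= end and regex.search(line):
                selected.append((host, line))
    return selected


logs_by_host = {
    "Client02": [
        "Mar 9 12:10:50 Client02 sshd[2124]: Accepted password for jhalley from 185.81.215.145 port 52410 ssh2",
        "Mar 9 12:15:34 Client02 sshd[2555]: Did not receive identification string from 51.68.71.229 port 38508",
        "Mar 9 12:18:06 Client02 usermod[6969]: change user 'rtkit' password",
    ],
}
selected = select_lines(
    logs_by_host,
    hosts=["Client02"],
    start=datetime(2021, 3, 9, 12, 0, 0),
    end=datetime(2021, 3, 9, 12, 16, 0),
    pattern="password",
)
```

Only the first line survives: the second does not match the filter and the third lies outside the time frame, so neither would need to be parsed into the graph.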
Log Graph Generation Example
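A minimal sketch of turning one selected log line into graph triples is given below. It uses a plain regular expression in place of the Grok patterns and CARML mappings named in the prototype, so it is a stand-in technique rather than the authors' pipeline. Only `cl:message` comes from the slides; the `urn:example:log#` properties and the subject IRI are hypothetical.

```python
import re

CL = "https://w3id.org/sepses/vocab/log/core#"
EX = "urn:example:log#"  # hypothetical namespace for properties not shown on the slides

# Syslog-style line: timestamp, host, program with optional [pid], then the message.
SYSLOG = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def line_to_triples(line, subject):
    """Map one log line to (subject, predicate, object) triples; empty list if unparsable."""
    match = SYSLOG.match(line)
    if match is None:
        return []
    fields = match.groupdict()
    triples = [(subject, CL + "message", fields["message"])]
    for name in ("timestamp", "host", "program", "pid"):
        if fields[name] is not None:  # pid is absent for lines such as 'sudo: ...'
            triples.append((subject, EX + name, fields[name]))
    return triples


line = "Mar 9 12:10:50 Client02 sshd[2124]: Accepted password for jhalley from 185.81.215.145 port 52410 ssh2"
triples = line_to_triples(line, "urn:example:logEntry/1")
```

All triples share one subject, so the fields of a log event become edges around a single node that later patterns can join on.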
• Vocabularies: https://w3id.org/sepses/vocab/
• Cybersecurity Knowledge Graph: http://sepses.ifs.tuwien.ac.at
• Software:
• Log Parser: Virtual Log Parser (Java-based)
• Query Processor: Virtual Log Query Processor (Web-based)
• State-of-the-art libraries:
• SPARQL Parser: SPARQL.js
• Log Extractor: Grok Pattern
• RDF-Mapper: CARML
• RDF-Compressor: HDT
• Query Engine: Communica
[Figure: Query Processor (User Interface) with Targeted Hosts, Targeted Background Knowledge, Time Range, Query Collections, Query Editor, Execution & Reset Button, and Query Result]
Prototype Implementation
Use-Case 1: Intrusion Detection and Background Linking
Scenario: “Find vulnerabilities and potential mitigations from the IDS-Snort log”
[Figure: Analyst, IDS Server, Cybersecurity Knowledge Base]
[Figures: SPARQL Query, Query Results, Graph Visualization]
Use-Case 2: Network Monitoring
Scenario: “Successful Login Events from SSH Connections across hosts”
[Figure: Analyst, File Server, Web Server, Database Server, Internal Background Knowledge]
[Figures: SPARQL Query, Query Results, Graph Visualization]
Evaluation Setup:
• Machines: Microsoft Azure Virtual Machine with a Linux host (2.59 GHz vCPU, 16 GB RAM)
and Windows host for log analysis (2.90 GHz CPU, 16 GB RAM).
• Dataset: AIT log dataset (V1.1) that simulates six days of user access across multiple web
servers.
• We split large log files into smaller files.
• We report the average time over five runs for each experiment.
Dataset description:
Evaluation
• Graph Compression
• Dynamic Log Graph Generation
• Log Graph Generation
Evaluation – Single Host
[Table: Experiment Timeframe]
[Figure: Query execution time in a federated setting for different time frames]
Evaluation Setup:
• Machines: Microsoft Azure Virtual Machines with seven hosts (4 Windows and 3 Linux -
2.59 GHz vCPU, 16 GB RAM) and a Windows host for log analysis (2.90 GHz CPU, 16 GB
RAM)
• Dataset: Apache Log from the AIT log dataset (V1.1)
• Hosts 1 to 4 store the data from the four original servers in the dataset.
• We report the average time over five runs for each experiment.
Evaluation – Multiple Hosts
▪ A novel approach for federated log analysis based on virtual knowledge graphs.
▪ A prototype and vocabularies demonstrated in security analytics.
▪ Evaluation: the log processing time is primarily a function of the number of extracted
(relevant) log lines and queried hosts.
Limitations:
▪ Query parameters should restrict the set of extracted log lines, since processing time grows with it.
▪ Not a replacement for existing SIEM systems.
Future Work:
▪ Improved query analysis (i.e., automatic selection of hosts and background knowledge).
▪ Streaming-based virtual knowledge graphs for log monitoring.
Conclusion
References
▪ A. Oliner, A. Ganapathi, and W. Xu. 2012. Advances and Challenges in Log Analysis.
Communications of the ACM 55, 2 (2012).
▪ Igor Kotenko, Olga Polubelova, Andrey Chechulin, and Igor Saenko. 2013. Design and
Implementation of a Hybrid Ontological-Relational Data Repository for SIEM Systems.
Future Internet 5, 3 (July 2013), 355–375. https://doi.org/10.3390/f5030355
▪ Christopher Krügel, Thomas Toth, and Clemens Kerer. 2002. Decentralized Event
Correlation for Intrusion Detection. In Information Security and Cryptology — ICISC
2001, Gerhard Goos, Juris Hartmanis, Jan van Leeuwen, and Kwangjo Kim (Eds.), Vol.
2288. Springer Berlin Heidelberg, Berlin, Heidelberg, 114–131.
https://doi.org/10.1007/3-540-45861-1_10
▪ Guillermo Suárez de Tangil and Esther Palomar. 2013. Advances in Security Information
Management: Perceptions and Outcomes. Nova Science Publishers, Incorporated,
Commack, NY, USA.
▪ Michael R. Grimaila, Justin Myers, Robert F. Mills, and Gilbert Peterson. 2012. Design
and Analysis of a Dynamically Configured Log-based Distributed Security Event
Detection Methodology. The Journal of Defense Modeling and Simulation: Applications,
Methodology, Technology 9, 3 (July 2012), 219–241.
https://doi.org/10.1177/1548512911399303
Thank you!
Kabul Kurniawan,
Email: kabul.kurniawan@wu.ac.at
Web: kabulkurniawan.github.io
twitter: @kabulkurniawan

Más contenido relacionado

La actualidad más candente

AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with DaskAUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
Víctor Zabalza
 
Lens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgetsLens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgets
Víctor Zabalza
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
Ian Foster
 
Final Presentation IRT - Jingxuan Wei V1.2
Final Presentation  IRT - Jingxuan Wei V1.2Final Presentation  IRT - Jingxuan Wei V1.2
Final Presentation IRT - Jingxuan Wei V1.2
JINGXUAN WEI
 

La actualidad más candente (20)

Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009
 
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
 
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with DaskAUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
 
Lens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgetsLens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgets
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applications
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
 
Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill
 
Realtime Data Analysis Patterns
Realtime Data Analysis PatternsRealtime Data Analysis Patterns
Realtime Data Analysis Patterns
 
Spark Streaming Intro @KTech
Spark Streaming Intro @KTechSpark Streaming Intro @KTech
Spark Streaming Intro @KTech
 
An Empirical Evaluation of RDF Graph Partitioning Techniques
An Empirical Evaluation of RDF Graph Partitioning TechniquesAn Empirical Evaluation of RDF Graph Partitioning Techniques
An Empirical Evaluation of RDF Graph Partitioning Techniques
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and Supercomputers
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
Final Presentation IRT - Jingxuan Wei V1.2
Final Presentation  IRT - Jingxuan Wei V1.2Final Presentation  IRT - Jingxuan Wei V1.2
Final Presentation IRT - Jingxuan Wei V1.2
 
Data automation 101
Data automation 101Data automation 101
Data automation 101
 
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
Materials Data Facility: Streamlined and automated data sharing,  discovery, ...Materials Data Facility: Streamlined and automated data sharing,  discovery, ...
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
 
07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representations
 
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
 
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
 
An effective classification approach for big data with parallel generalized H...
An effective classification approach for big data with parallel generalized H...An effective classification approach for big data with parallel generalized H...
An effective classification approach for big data with parallel generalized H...
 

Similar a Virtual Knowledge Graphs for Federated Log Analysis

BWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 PresentationBWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 Presentation
lilyco
 
Distributed Framework for Data Mining As a Service on Private Cloud
Distributed Framework for Data Mining As a Service on Private CloudDistributed Framework for Data Mining As a Service on Private Cloud
Distributed Framework for Data Mining As a Service on Private Cloud
IJERA Editor
 

Similar a Virtual Knowledge Graphs for Federated Log Analysis (20)

Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
Apache CarbonData:New high performance data format for faster data analysis
Apache CarbonData:New high performance data format for faster data analysisApache CarbonData:New high performance data format for faster data analysis
Apache CarbonData:New high performance data format for faster data analysis
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
 
BWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 PresentationBWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 Presentation
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging Environments
 
[PLCUG] Splunk - complete Citrix environment monitoring
[PLCUG] Splunk - complete Citrix environment monitoring[PLCUG] Splunk - complete Citrix environment monitoring
[PLCUG] Splunk - complete Citrix environment monitoring
 
Dynamic Resource Allocation Algorithm using Containers
Dynamic Resource Allocation Algorithm using ContainersDynamic Resource Allocation Algorithm using Containers
Dynamic Resource Allocation Algorithm using Containers
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
 
Dissecting Open Source Cloud Evolution: An OpenStack Case Study
Dissecting Open Source Cloud Evolution: An OpenStack Case StudyDissecting Open Source Cloud Evolution: An OpenStack Case Study
Dissecting Open Source Cloud Evolution: An OpenStack Case Study
 
Distributed Framework for Data Mining As a Service on Private Cloud
Distributed Framework for Data Mining As a Service on Private CloudDistributed Framework for Data Mining As a Service on Private Cloud
Distributed Framework for Data Mining As a Service on Private Cloud
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
 
Apache Cassandra at Macys
Apache Cassandra at MacysApache Cassandra at Macys
Apache Cassandra at Macys
 
Intro to HPC
Intro to HPCIntro to HPC
Intro to HPC
 
Handout3o
Handout3oHandout3o
Handout3o
 
Declarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data modelsDeclarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data models
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
 
Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209
 
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
 
NBIS ChIP-seq course
NBIS ChIP-seq courseNBIS ChIP-seq course
NBIS ChIP-seq course
 

Último

Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
HyderabadDolls
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
HyderabadDolls
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 

Último (20)

Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 

Virtual Knowledge Graphs for Federated Log Analysis

  • 1. Virtual Knowledge Graphs for Federated Log Analysis Kabul Kurniawan (WU Wien, Uni Wien) Andreas Ekelhart (WU Wien) Elmar Kiesling (WU Wien) Dietmar Winkler (TU Wien) Gerald Quirchmayr (Uni Wien) A Min Tjoa (TU Wien) This work was funded by the Austrian Science Fund (FWF) and netidee SCIENCE under grant P30437-N31, as well as the Austrian Research Promotion Agency FFG under grant 877389 (OBARIS). Vienna, August 17 – 20, 2021
  • 2. Motivation
    Mar 9 12:00:45 Client02 systemd-logind[1201]: New seat seat0.
    Mar 9 12:00:45 Client02 systemd-logind[1201]: Watching system buttons on /dev/input/event0 (Power Button)
    Mar 9 12:00:45 Client02 systemd-logind[1201]: Watching system buttons on /dev/input/event3 (AT Translated Set 2 keyboard)
    Mar 9 12:00:45 Client02 systemd-logind[1201]: Watching system buttons on /dev/input/event1 (AT Translated Set 2 keyboard)
    Mar 9 12:00:45 Client02 sshd[1281]: Server listening on 0.0.0.0 port 22.
    Mar 9 12:00:45 Client02 sshd[1281]: Server listening on :: port 22.
    Mar 9 12:10:50 Client02 sshd[2124]: Accepted password for jhalley from 185.81.215.145 port 52410 ssh2
    Mar 9 12:10:50 Client02 sshd[2124]: pam_unix(sshd:session): session opened for user jhalley by (uid=0)
    Mar 9 12:10:50 Client02 systemd-logind[1201]: New session 1 of user jhalley.
    Mar 9 12:10:50 Client02 systemd: pam_unix(systemd-user:session): session opened for user jhalley by (uid=0)
    Mar 9 12:15:34 Client02 sshd[2555]: Did not receive identification string from 51.68.71.229 port 38508
    Mar 9 12:15:48 Client02 sudo: jhalley : TTY=pts/0 ; PWD=/home/jhalley ; USER=root ; COMMAND=/usr/bin/apt-get update
    Mar 9 12:15:48 Client02 sudo: pam_unix(sudo:session): session opened for user root by jhalley(uid=0)
    Mar 9 12:15:57 Client02 sudo: pam_unix(sudo:session): session closed for user root
    Mar 9 12:15:59 Client02 sudo: jhalley : TTY=pts/0 ; PWD=/home/jhalley ; USER=root ; COMMAND=/usr/bin/apt-get install xfce4
    Mar 9 12:15:59 Client02 sudo: pam_unix(sudo:session): session opened for user root by jhalley(uid=0)
    Mar 9 12:17:01 Client02 CRON[4546]: pam_unix(cron:session): session opened for user root by (uid=0)
    Mar 9 12:17:01 Client02 CRON[4546]: pam_unix(cron:session): session closed for user root
    Mar 9 12:18:06 Client02 groupadd[6959]: group added to /etc/group: name=rtkit, GID=115
    Mar 9 12:18:06 Client02 groupadd[6959]: group added to /etc/gshadow: name=rtkit
    Mar 9 12:18:06 Client02 groupadd[6959]: new group: name=rtkit, GID=115
    Mar 9 12:18:06 Client02 useradd[6963]: new user: name=rtkit, UID=111, GID=115, home=/proc, shell=/usr/sbin/nologin
    Mar 9 12:18:06 Client02 usermod[6969]: change user 'rtkit' password
    Mar 9 12:18:06 Client02 chage[6974]: changed password expiry for rtkit
    Mar 9 12:18:06 Client02 chfn[6977]: changed user 'rtkit' information
    Mar 9 12:18:11 Client02 useradd[7149]: new user: name=usbmux, UID=112, GID=46, home=/var/lib/usbmux, shell=/usr/sbin/nologin
    Mar 9 12:18:11 Client02 usermod[7155]: change user 'usbmux' password
    Mar 9 12:18:11 Client02 chage[7160]: changed password expiry for usbmux
    Mar 9 12:18:11 Client02 chfn[7163]: changed user 'usbmux' information
    Mar 9 12:18:24 Client02 groupadd[7508]: group added to /etc/group: name=pulse, GID=116
    Mar 9 12:18:24 Client02 groupadd[7508]: group added to /etc/gshadow: name=pulse
  • 4. Existing Solutions
    ▪ Centralized Log Management
      ▪ Ingest log sources from multiple endpoints, then parse and index them into a central database for analysis. [Kotenko et al., 2013]
      ▪ Bandwidth-intensive and computationally demanding. [Grimaila et al., 2012], [Guillermo, 2013]
    ▪ Decentralized Log Analysis
      ▪ Partly shift the computational workload (log pre-processing and analysis) to the log-producing hosts. [Grimaila et al., 2012]
      ▪ Primarily used for correlation and alerting, rather than for querying dispersed log data. [Krugel et al., 2001]
      ▪ Continuously ingesting all log data may consume substantial resources on the local endpoints.
    ▪ Current solutions lack semantic relations between entities [Oliner et al., 2012], which makes it difficult to:
      ▪ Integrate partial and isolated views on system states.
      ▪ Contextualize, link, and query log data.
  • 5. Requirements
    R1. Resource-efficiency
      ▪ Avoid unnecessary log processing and minimize resource requirements (storage space and network bandwidth).
    R2. Aggregation and integration over multiple endpoints
      ▪ Execute queries concurrently across federated endpoints and deliver the results.
    R3. Contextualization & Background-Linking
      ▪ Ability to contextualize and integrate log data and link it to background knowledge.
    R4. Standards-based query language
      ▪ Use of an expressive, standardized query language.
  • 6. Proposed Approach
    Virtual Knowledge Graph (VKG) for Federated Log Analysis
    ▪ A method to execute federated, graph pattern-based queries on dispersed, heterogeneous raw log data by dynamically constructing virtual knowledge graphs.
    ▪ We introduce a method that:
      ▪ Extracts only potentially relevant log messages, and only on demand.
      ▪ Integrates the dispersed log events into a common graph.
      ▪ Federates graph pattern-based queries across endpoints.
      ▪ Links the results to background knowledge.
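The extract-on-demand and federation steps listed on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: `HOST_LOGS`, `query_host`, and `federated_query` are hypothetical names, the hosts are simulated with an in-memory dictionary instead of remote endpoints, and the `Client03` entry is a made-up sample line.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for raw logs that stay on each host.
# The Client02 lines come from the motivation slide; Client03 is illustrative only.
HOST_LOGS = {
    "Client02": [
        "Mar 9 12:10:50 Client02 sshd[2124]: Accepted password for jhalley from 185.81.215.145 port 52410 ssh2",
        "Mar 9 12:15:34 Client02 sshd[2555]: Did not receive identification string from 51.68.71.229 port 38508",
    ],
    "Client03": [
        "Mar 9 12:20:00 Client03 sshd[3001]: Accepted password for alice from 10.0.0.5 port 40000 ssh2",
    ],
}


def query_host(host, pattern):
    """Extract only the lines matching the pattern on one host (on demand)."""
    regex = re.compile(pattern)
    return [(host, line) for line in HOST_LOGS.get(host, []) if regex.search(line)]


def federated_query(hosts, pattern):
    """Run the same query on all hosts concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        per_host = pool.map(lambda h: query_host(h, pattern), hosts)
        # pool.map keeps the order of `hosts`, so the merged list is deterministic.
        return [hit for hits in per_host for hit in hits]


results = federated_query(["Client02", "Client03"], r"Accepted password")
```

Each result keeps its originating host, so the merged list can be treated as one common view over the dispersed events while the non-matching lines never leave their host.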
  • 7. Virtual Knowledge Graph Concept [Guohui Xiao et al., 2019]
    ▪ Data Virtualization (V)
      ▪ No actual data sources are exposed.
      ▪ The integrated data is not materialized.
    ▪ Graph Representation (G)
      ▪ Nodes: representations of objects and data values.
      ▪ Edges: relations between nodes.
    ▪ Domain Knowledge (K)
      ▪ Concept and property hierarchies.
      ▪ Domain and range of properties.
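As a rough illustration of the graph representation (G), the following Python sketch turns one raw syslog line into subject-predicate-object triples. Only the `cl:message` property and its namespace appear in the slides; the `EX` namespace, the other property names, and the function `log_line_to_triples` are assumptions made for this example.

```python
import re

CL = "https://w3id.org/sepses/vocab/log/core#"  # namespace shown in the slides
EX = "http://example.org/log#"                  # hypothetical namespace for this sketch

# Generic syslog layout: timestamp, host, process, optional [pid], message.
SYSLOG = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<proc>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)$"
)


def log_line_to_triples(line, event_id):
    """Return (subject, predicate, object) triples for one log line."""
    match = SYSLOG.match(line)
    if match is None:
        return []  # unparseable lines produce no graph nodes
    subject = f"{EX}event/{event_id}"
    triples = [
        (subject, CL + "message", match["msg"]),
        (subject, EX + "timestamp", match["ts"]),  # hypothetical property
        (subject, EX + "host", match["host"]),     # hypothetical property
        (subject, EX + "process", match["proc"]),  # hypothetical property
    ]
    if match["pid"]:
        triples.append((subject, EX + "pid", match["pid"]))  # hypothetical property
    return triples


line = "Mar 9 12:10:50 Client02 sshd[2124]: Accepted password for jhalley from 185.81.215.145 port 52410 ssh2"
triples = log_line_to_triples(line, 1)
```

The event becomes a node, and each extracted field becomes an edge to a data value, which is the shape a graph pattern such as `?s cl:message ?message` can match.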
  • 8. Architecture: Virtual Knowledge Graphs for Federated Log Analysis
  • 9. Virtual Knowledge Graphs for Federated Log Analysis
  • 10. Query Parsing Example
    SPARQL Query:
      PREFIX cl: <https://w3id.org/sepses/vocab/log/core#>
      PREFIX auth: <https://w3id.org/sepses/vocab/log/auth#>
      SELECT ?s ?message
      WHERE {
        ?s cl:message ?message.
        FILTER regex(?message,"Invalid user")
      }
    SPARQL Query in a structured format (i.e. JSON):
      {
        "queryType": "SELECT",
        "variables": [
          { "termType": "Variable", "value": "s" },
          { "termType": "Variable", "value": "message" }
        ],
        "where": [
          {
            "type": "bgp",
            "triples": [
              {
                "subject": { "termType": "Variable", "value": "s" },
                "predicate": { "termType": "NamedNode", "value": "https://w3id.org/sepses/vocab/log/core#message" },
                ….
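To show how such a structured query can be inspected, here is a small Python sketch that walks the parsed form and collects the selected variables and the predicates of the basic graph pattern. The dictionary reproduces only the part of the JSON that is visible on the slide; the elided remainder is not reconstructed, and the helper names are hypothetical.

```python
# Parsed query, limited to the fragment shown on the slide.
parsed = {
    "queryType": "SELECT",
    "variables": [
        {"termType": "Variable", "value": "s"},
        {"termType": "Variable", "value": "message"},
    ],
    "where": [
        {
            "type": "bgp",
            "triples": [
                {
                    "subject": {"termType": "Variable", "value": "s"},
                    "predicate": {
                        "termType": "NamedNode",
                        "value": "https://w3id.org/sepses/vocab/log/core#message",
                    },
                    # Remaining keys are elided on the slide and left out here.
                }
            ],
        }
    ],
}


def selected_variables(query):
    """Names of the variables in the SELECT clause."""
    return [v["value"] for v in query.get("variables", []) if v.get("termType") == "Variable"]


def predicates_in_bgp(query):
    """IRIs of all named predicates used in basic graph patterns."""
    found = []
    for pattern in query.get("where", []):
        if pattern.get("type") != "bgp":
            continue  # other pattern types (e.g. filters) are ignored in this sketch
        for triple in pattern.get("triples", []):
            predicate = triple.get("predicate", {})
            if predicate.get("termType") == "NamedNode":
                found.append(predicate["value"])
    return found


variables = selected_variables(parsed)
predicates = predicates_in_bgp(parsed)
```

Knowing which predicates a query touches is one plausible way to decide which log fields need to be extracted at all.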
  • 11. Query Translation
    Query options:
      - Targeted hosts: {Host1, Host2, …, Host(n)}
      - Targeted background knowledge: {K1, K2, K3, …, K(n)}
      - Time frame: <startTime> - <endTime>
    SPARQL query:
      PREFIX 1: <URI1#>
      PREFIX 2: <URI2#>
      PREFIX (n): <URI(n)#>
      SELECT ?s ?p ?o
      WHERE {
        ?s ?p ?o.                  …….. triple-pattern(1)
        ?s ?p2 ?o2.                …….. triple-pattern(2)
        ?s ?p3 ?o3.                …….. triple-pattern(3)
        ?s ?p4 <String/Literal>.   …….. triple-pattern(4)
        ?s ?p(n) ?o(n).            …….. triple-pattern(n)
        FILTER regex(?o, "String/Literal")
      }
    Callouts:
      - Set the targeted host(s).
      - Set the targeted knowledge.
      - Set the analysis time frame.
      - Defined log prefix.
      - Select log lines based on triple patterns and/or filters.
    Parse the log data and execute the query based on these criteria!
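The selection criteria on this slide (time frame plus filter) can be sketched in Python as follows. This is an illustration under stated assumptions, not the prototype's translation logic: `parse_timestamp` and `select_lines` are hypothetical helpers, the filter is applied to the whole raw line, and because syslog timestamps carry no year, the caller has to supply one.

```python
import re
from datetime import datetime

# Sample lines taken from the motivation slide.
LINES = [
    "Mar 9 12:10:50 Client02 sshd[2124]: Accepted password for jhalley from 185.81.215.145 port 52410 ssh2",
    "Mar 9 12:15:34 Client02 sshd[2555]: Did not receive identification string from 51.68.71.229 port 38508",
    "Mar 9 12:17:01 Client02 CRON[4546]: pam_unix(cron:session): session opened for user root by (uid=0)",
]


def parse_timestamp(line, year):
    """Read the leading 'Mon D HH:MM:SS' stamp; the year is an assumption of the caller."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def select_lines(lines, start, end, filter_regex, year):
    """Keep only lines inside [start, end] that also match the filter expression."""
    pattern = re.compile(filter_regex)
    return [
        line
        for line in lines
        if start <= parse_timestamp(line, year) <= end and pattern.search(line)
    ]


# Year 2021 is assumed purely for the example.
selected = select_lines(
    LINES,
    start=datetime(2021, 3, 9, 12, 12, 0),
    end=datetime(2021, 3, 9, 12, 20, 0),
    filter_regex=r"sshd",
    year=2021,
)
```

Only the lines that survive both checks would need to be parsed and mapped to graph form, which is how restrictive query parameters keep the extracted data small.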
  • 14. Query: Virtual Knowledge Graphs for Federated Log Analysis
  • 15. Query: Virtual Knowledge Graphs for Federated Log Analysis
  • 16. Prototype Implementation
    • Vocabularies: https://w3id.org/sepses/vocab/
    • Cybersecurity Knowledge Graph: http://sepses.ifs.tuwien.ac.at
    • Software:
      • Log Parser: Virtual Log Parser (Java-based)
      • Query Processor: Virtual Log Query Processor (Web-based)
    • State-of-the-art libraries:
      • SPARQL Parser: SPARQL.js
      • Log Extractor: Grok Pattern
      • RDF-Mapper: CARML
      • RDF-Compressor: HDT
      • Query Engine: Comunica
    Query Processor (User Interface): Targeted Hosts, Targeted Background Knowledge, Time Range, Query Collections, Query Editor, Execution & Reset Button, Query Result
  • 17. Use-Case 1: Intrusion Detection and Background Linking
    Components: Cybersecurity Knowledge Base, IDS Server, Analyst
    Scenario: “Find vulnerabilities and potential mitigations from the IDS-Snort log”
    SPARQL Query:
  • 18. Use-Case 1: Intrusion Detection and Background Linking
    Query Results:
    Graph Visualization:
  • 19. Use-Case 2: Network Monitoring
    Components: File Server, Web Server, Database Server, Internal Background Knowledge, Analyst
    Scenario: “Successful Login Events from SSH Connections across hosts”
    SPARQL Query:
  • 21. Evaluation
    Evaluation Setup:
    • Machines: Microsoft Azure Virtual Machine with a Linux host (2.59 GHz vCPU, 16 GB RAM) and a Windows host for log analysis (2.90 GHz CPU, 16 GB RAM).
    • Dataset: AIT log dataset (V1.1), which simulates six days of user access across multiple web servers.
    • We split large log files into smaller files.
    • We report the average time over five runs for each experiment.
    Dataset description:
  • 22. Evaluation – Single Host
    • Graph Compression
    • Dynamic Log Graph Generation
    • Log Graph Generation
  • 23. Evaluation – Multiple Hosts
    Evaluation Setup:
    • Machines: Microsoft Azure Virtual Machines with seven hosts (4 Windows and 3 Linux, 2.59 GHz vCPU, 16 GB RAM) and a Windows host for log analysis (2.90 GHz CPU, 16 GB RAM).
    • Dataset: Apache log from the AIT log dataset (V1.1).
    • Host 1 to host 4 store the data from the original 4 servers in the dataset.
    • We report the average time over five runs for each experiment.
    [Figure: Query execution time in a federated setting for different time frames (labels: Experiment, Timeframe)]
  • 24. Conclusion
    ▪ A novel approach for federated log analysis based on virtual knowledge graphs.
    ▪ A prototype and vocabularies demonstrated in security analytics.
    ▪ Evaluation: the log processing time is primarily a function of the number of extracted (relevant) log lines and queried hosts.
    Limitations:
    ▪ The query parameters should restrict the extracted log lines.
    ▪ Not a replacement for existing SIEM systems.
    Future Work:
    ▪ Query analysis improvement (i.e., automatic host and background-knowledge selection).
    ▪ Streaming-based Virtual Knowledge Graph for log monitoring.
  • 25. References (Resource Paper)
    ▪ Adam Oliner, Archana Ganapathi, and Wei Xu. 2012. Advances and Challenges in Log Analysis. Communications of the ACM 55, 2 (2012).
    ▪ Igor Kotenko, Olga Polubelova, Andrey Chechulin, and Igor Saenko. 2013. Design and Implementation of a Hybrid Ontological-Relational Data Repository for SIEM Systems. Future Internet 5, 3 (July 2013), 355–375. https://doi.org/10.3390/fi5030355
    ▪ Christopher Krügel, Thomas Toth, and Clemens Kerer. 2002. Decentralized Event Correlation for Intrusion Detection. In Information Security and Cryptology — ICISC 2001, Gerhard Goos, Juris Hartmanis, Jan van Leeuwen, and Kwangjo Kim (Eds.), Vol. 2288. Springer Berlin Heidelberg, Berlin, Heidelberg, 114–131. https://doi.org/10.1007/3-540-45861-1_10
    ▪ Guillermo Suárez de Tangil and Esther Palomar. 2013. Advances in Security Information Management: Perceptions and Outcomes. Nova Science Publishers, Incorporated, Commack, NY, USA.
    ▪ Michael R Grimaila, Justin Myers, Robert F Mills, and Gilbert Peterson. 2012. Design and Analysis of a Dynamically Configured Log-based Distributed Security Event Detection Methodology. The Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 9, 3 (July 2012), 219–241. https://doi.org/10.1177/1548512911399303
  • 26. Thank you!
    Kabul Kurniawan
    Email: kabul.kurniawan@wu.ac.at
    Web: kabulkurniawan.github.io
    Twitter: @kabulkurniawan