LDBC
Social Network Benchmark (SNB)
Graph Generator
graph benchmarking to the next level
Peter Boncz
Why Benchmarking?
• make competing products comparable
• accelerate progress, make technology viable

© Jim Gray, 2005
What is the LDBC?
Linked Data Benchmark Council = LDBC
• Industry entity similar to TPC (www.tpc.org)
• Focusing on graph and RDF store benchmarking

• Kick-started by an EU project
• Runs from September 2012 – March 2015
• 9 project partners
SNB Roadmap
• Motivation
• Dataset Generator
• Workloads
• Results
• Future
SNB Scenario: Social Network Analysis
• Intuitive: everybody knows what a SN is
– Facebook, Twitter, LinkedIn, …
• SNs can be easily represented as a graph
– Entities are the nodes (Person, Group, Tag, Post, ...)
– Relationships are the edges (Friend, Likes, Follows, …)
• Different scales: from small to very large SNs
– Up to billions of nodes and edges
• Multiple query needs:
– interactive, analytical, transactional
• Multiple types of uses:
– marketing, recommendation, social interactions, fraud detection, ...
Audience
• For developers facing graph processing tasks
– recognizable scenario to compare merits of different products and technologies
• For vendors of graph database technology
– checklist of features and performance characteristics
• For researchers, both industrial and academic
– challenges in multiple choke-point areas such as graph query optimization and (distributed) graph analysis
What was developed?
• Four main elements:
– data schema: defines the structure of the data
– workloads: define the set of operations to perform
– performance metrics: used to measure (quantitatively) the performance of the systems
– execution rules: defined to ensure that the results from different executions of the benchmark are valid and comparable
• Software as Open Source (GitHub)
– data generator, query drivers, validation tools, ...
Data Schema
• Specified in UML for portability
– Classes
– Associations between classes
– Attributes for classes and associations
• Some of the relationships represent dimensions
– Time (Year, Quarter, Month, Day)
– Geography (Continent, Country, Place)
• Data formats
– CSV
– RDF (Turtle + N3)
Data Schema [figure: schema diagram, not reproduced]
Data Generation Process
• Produce synthetic data that mimics the characteristics of real SN data
• Based on SIB–S3G2 Social Graph Generator
– www.w3.org/wiki/Social_Network_Intelligence_BenchMark
• Graph model:
– property (directed labeled) graph
• Building blocks
– Correlated data dictionaries
– Sequential subtree generation
– Graph generation among correlation dimensions
Choke Points
• Hidden challenges in a benchmark influence database system design, e.g. TPC-H
– Functional Dependency Analysis in aggregation
– Bloom Filters for sparse joins
– Subquery predicate propagation
• LDBC explicitly designs benchmarks looking at choke-point “coverage”
– requires access to database kernel architects
Data Correlations between attributes
SELECT personID FROM person
WHERE firstName = 'Joachim' AND addressCountry = 'Germany'

Anti-Correlation

SELECT personID FROM person
WHERE firstName = 'Cesare' AND addressCountry = 'Italy'

• Query optimizers may underestimate or overestimate the result size of conjunctive predicates

[Figure labels: Joachim Loew, Cesare Prandelli]
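The estimation problem on this slide can be made concrete with a small sketch. It builds a toy person table and compares the true result size of a conjunctive predicate with the estimate obtained when the two attributes are assumed independent. The rows, the counts, and the helper names `true_count` and `independence_estimate` are assumptions made for illustration; they are not taken from the SNB generator or from any particular optimizer.

```python
# Hypothetical toy table of (firstName, addressCountry) rows. The counts are
# made up for illustration; they are not drawn from the SNB generator.
PERSONS = (
    [("Joachim", "Germany")] * 40
    + [("Cesare", "Italy")] * 40
    + [("Joachim", "Italy")] * 2
    + [("Cesare", "Germany")] * 2
    + [("Anna", "Germany")] * 8
    + [("Anna", "Italy")] * 8
)


def true_count(rows, name, country):
    """Exact number of rows matching both predicates."""
    return sum(1 for n, c in rows if n == name and c == country)


def independence_estimate(rows, name, country):
    """Estimated row count if the two predicates were statistically independent."""
    total = len(rows)
    sel_name = sum(1 for n, _ in rows if n == name) / total
    sel_country = sum(1 for _, c in rows if c == country) / total
    return sel_name * sel_country * total


if __name__ == "__main__":
    for name, country in [("Joachim", "Germany"), ("Cesare", "Germany")]:
        print(name, country,
              "true:", true_count(PERSONS, name, country),
              "estimate:", round(independence_estimate(PERSONS, name, country), 1))
```

On this toy data the independence estimate is the same (about 21 rows) for both name/country pairs, while the true counts are 40 for the correlated pair and 2 for the anti-correlated one.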
Property Dictionaries
• Property values for each literal property defined by:
– a dictionary D, a fixed set of values (e.g., terms from DBPedia or GeoNames)
– a ranking function R (between 1 and |D|)
– a probability density function F that specifies how the data generator chooses values from dictionary D using the rank R for each term in the dictionary
Compact Correlated Property Value Generation
Using a geometric distribution for function F()
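A minimal sketch of the D / R / F scheme follows. It assumes that only the ranking R depends on the correlated attribute (here, a country), while the dictionary D and the geometric distribution F are shared. The names `FIRST_NAMES`, `TYPICAL`, `ranked`, `geometric_rank`, and `sample_value`, and the listed values, are hypothetical and chosen only for illustration.

```python
import random

# Dictionary D: a fixed set of values (made-up sample).
FIRST_NAMES = ["Anna", "Cesare", "Joachim", "Laura", "Peter"]

# Assumed correlation: which names are typical for which country.
TYPICAL = {"Germany": ["Joachim", "Anna"], "Italy": ["Cesare", "Laura"]}


def ranked(dictionary, country):
    """Ranking function R: names typical for `country` get the lowest ranks."""
    head = [v for v in TYPICAL.get(country, []) if v in dictionary]
    tail = [v for v in dictionary if v not in head]
    return head + tail


def geometric_rank(rng, size, p=0.5):
    """Distribution F: draw a rank in 1..size from a geometric distribution.

    The distribution is truncated at `size`; the leftover probability mass
    falls on the last rank.
    """
    rank = 1
    while rank < size and rng.random() >= p:
        rank += 1
    return rank


def sample_value(rng, dictionary, country, p=0.5):
    """Pick a dictionary value whose likelihood depends on its rank for `country`."""
    order = ranked(dictionary, country)
    return order[geometric_rank(rng, len(order), p) - 1]


if __name__ == "__main__":
    rng = random.Random(1)
    print([sample_value(rng, FIRST_NAMES, "Germany") for _ in range(10)])
    print([sample_value(rng, FIRST_NAMES, "Italy") for _ in range(10)])
```

Because low ranks are drawn far more often than high ones, changing the ranking per country is enough to skew the generated names toward values typical for that country.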
Correlated Edge Generation
[Figure: example social graph with persons P1–P5 linked by <knows> edges and carrying the properties <firstname> (“Anna”, “Laura”), <birthYear> (“1990”), <liveAt> (“Germany”, “Netherlands”), <studyAt> (“University of Leipzig”, “University of Amsterdam”), and <like> (<Britney Spears>)]
How to Generate a Correlated Graph?
• Compute similarity of two nodes based on their (correlated) properties.
• Use a probability density function w.r.t. this similarity for connecting nodes

Danger: this is very expensive to compute on a large graph! (quadratic, random access)

[Figure: the example graph with candidate <knows> edges marked “?”, and a plot of connection probability over the axis highly similar → less similar]
Window Optimization
Trick: disregard nodes with too large similarity distance (only connect nodes in a similarity window)

Probability that two nodes are connected is skewed w.r.t. the similarity between the nodes (due to probability distr.)

[Figure: the example graph with <knows> edges restricted to a window, and a plot of connection probability over the axis highly similar → less similar with the window marked]
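The window trick can be sketched as follows: sort the nodes along one similarity dimension, then let each node consider only the next few nodes in that order, with a connection probability that decays with distance. This is an illustrative sketch, not the generator's actual code; the person tuples, the sort key `study_then_birth`, and the parameters `window`, `base_prob`, and `decay` are all assumptions.

```python
import random

# Hypothetical toy persons: (id, university, birth year). The values are made up.
PERSONS = [
    ("P1", "University of Leipzig", 1990),
    ("P2", "University of Leipzig", 1990),
    ("P3", "University of Amsterdam", 1990),
    ("P4", "University of Leipzig", 1991),
    ("P5", "University of Amsterdam", 1992),
]


def study_then_birth(person):
    """Similarity dimension: nodes close in this sort order count as similar."""
    return (person[1], person[2])


def windowed_edges(nodes, key, window=2, base_prob=0.8, decay=0.5, seed=0):
    """Generate edges only between nodes at most `window` positions apart.

    Sorting costs O(n log n) and the scan O(n * window), instead of the
    O(n^2) pairwise comparison of the naive approach.
    """
    rng = random.Random(seed)
    ordered = sorted(nodes, key=key)
    edges = []
    for i, src in enumerate(ordered):
        for dist in range(1, window + 1):
            if i + dist >= len(ordered):
                break
            # Connection probability shrinks as the nodes get less similar.
            if rng.random() < base_prob * decay ** (dist - 1):
                edges.append((src[0], ordered[i + dist][0]))
    return edges


if __name__ == "__main__":
    print(windowed_edges(PERSONS, study_then_birth))
```

Because the scan is sequential over sorted data, it avoids the random access pattern of comparing arbitrary node pairs.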
Graph generation
• Create new edges and nodes in one pass, starting from an existing node and creating the new nodes to which it will be connected
• Uses degree distributions that follow a power law
• Generates correlated and highly connected graph data by creating edges after generating many nodes
– Similarity metric
– Random dimension
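As a rough illustration of a power-law degree distribution, the sketch below draws a target degree for each node with probability proportional to k^(-alpha). The exponent, the degree cap, and the function name `power_law_degrees` are assumptions for this example; the slides do not state the parameters the generator uses.

```python
import random


def power_law_degrees(n, alpha=2.0, max_degree=100, seed=0):
    """Draw n target degrees with P(degree = k) proportional to k ** -alpha."""
    rng = random.Random(seed)
    degrees = list(range(1, max_degree + 1))
    weights = [k ** -alpha for k in degrees]
    return rng.choices(degrees, weights=weights, k=n)


if __name__ == "__main__":
    sample = power_law_degrees(10)
    print(sample)
```

Most nodes end up with a small degree, while a few receive a very large one, which is the skew typical of social graphs.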
Implementation
• MapReduce
– A Map function runs on different parts of the input data, in parallel on many cluster nodes
– Main Reduce function:
   • Sort the correlation dimensions
   • Generate edges based on sliding windows (similarity metric)
– Successive MapReduce phases for different correlation dimensions
• Scalable, e.g.:
– Generate 1.2TB of data in 30 minutes (16 4-core nodes)
Validation: Metrics
• Largest Connected Component
• Average Clustering Coefficient
• Diameter
• Average Path Length
• Hop-plot User-Knows
• Attribute distributions
• Degree distributions
• Time evolution
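Two of the listed metrics can be computed on a small undirected graph as in the sketch below. It is a plain-Python illustration on a made-up adjacency map, not the SNB validation tool; the function names and the toy graph are assumptions.

```python
from collections import deque

# Hypothetical undirected "knows" graph as an adjacency map (made-up data).
GRAPH = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": {"F"},
    "F": {"E"},
}


def largest_connected_component(adj):
    """Size of the largest connected component, found by breadth-first search."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbors count as 0."""
    total = 0.0
    for nbs in adj.values():
        k = len(nbs)
        if k < 2:
            continue
        # Count edges among the neighbors, each unordered pair once.
        links = sum(1 for a in nbs for b in nbs if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


if __name__ == "__main__":
    print(largest_connected_component(GRAPH))
    print(round(average_clustering(GRAPH), 4))
```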
Statistics (100K users / 1 year)
Workloads
• On-Line: tests a system's throughput with relatively simple queries with concurrent updates
– Show all photos posted by my friends that I was tagged in
• Business Intelligence: consists of complex structured queries for analyzing online behavior
– Who are the influential people on the topic of open source development?
• Graph Analytics: tests the functionality and scalability on most of the data as a single operation
– PageRank, Shortest Path(s), Community Detection
Workloads by system
Interactive Query Set
• Tests system throughput with relatively simple queries and concurrent updates
• Current set: 12 read-only queries
• For each query:
– Name and detailed description in plain English
– List of input parameters
– Expected result: content and format
– Textual functional description
– Relevance:
   • textual description (plain English) of the reasoning for including this query in the workload
   • discussion about the technical challenges (Choke Points) targeted
– Validation parameters and validation results
– SPARQL query
Some SNB Interactive Choke Points
• Graph traversals. Query execution time depends heavily on the ability to quickly traverse the friends graph.
• Plan variability. Each query has many different best plans depending on parameter choices (e.g. hash- vs. index-based joins).
• Top-k and distinct. Many queries return the first results in a specific order: late projection, pushing conditions from the sort into the query.
• Repetitive short queries, differing only in literals: an opportunity for query plan recycling.
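The top-k choke point can be illustrated with late projection: rank on the narrow sort key first, and fetch the wide attributes only for the k winners. The sketch below is an illustration under assumed data; `POST_COUNTS`, `PROFILES`, and `top_k_late_projection` are hypothetical names, and the ordering (count descending, then id ascending) is one plausible reading of "sorted by number of posts, then by person".

```python
import heapq

# Hypothetical data: post count per person id, plus a wider attribute table.
POST_COUNTS = {101: 7, 102: 12, 103: 7, 104: 1, 105: 12}
PROFILES = {
    101: ("Anna", "Schmidt"),
    102: ("Laura", "Rossi"),
    103: ("Peter", "Jansen"),
    104: ("Cesare", "Bianchi"),
    105: ("Joachim", "Meyer"),
}


def top_k_late_projection(counts, profiles, k=20):
    """Return the top-k rows ordered by count (descending), then id (ascending).

    Only (id, count) pairs take part in the ranking; profile attributes are
    looked up afterwards, and only for the k selected ids.
    """
    winners = heapq.nsmallest(k, counts.items(), key=lambda item: (-item[1], item[0]))
    return [(pid, *profiles[pid], count) for pid, count in winners]


if __name__ == "__main__":
    for row in top_k_late_projection(POST_COUNTS, PROFILES, k=3):
        print(row)
```

A bounded heap keeps the ranking cost near O(n log k), and the attribute lookups drop from n to k.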
Choke Point Coverage
Example: Q3

Name: Friends within 2 hops that have been in two countries
Description:
Find Friends and Friends of Friends of the user A that have made a post in the foreign countries X and Y within a specified period. We count only posts that are made in a country different from the country of the friend. The result should be sorted descending by total number of posts, and then by person URI. The top 20 should be shown. The user A (as friend of his friend) should not be in the result.
Parameters:
- Person
- CountryX
- CountryY
- startDate - the beginning of the requested period
- Duration - requested period in days
Result:
- Person.id, Person.firstname, Person.lastName
- Number of posts in each country and the sum of all posts
Relevance:
- Choke Points: CP3.3
- If one country is large but anticorrelated with the country of self, then processing this before a smaller but positively correlated country can be beneficial
Example: Q5 - SPARQL
select ?group count (*)
where {
  {select distinct ?fr
   where {
     {%Person% snvoc:knows ?fr.} union
     {%Person% snvoc:knows ?fr2.
      ?fr2 snvoc:knows ?fr. filter (?fr != %Person%)}
   }
  } .
  ?group snvoc:hasMember ?mem . ?mem snvoc:hasPerson ?fr .
  ?mem snvoc:joinDate ?date . filter (?date >= "%Date0%"^^xsd:date) .
  ?post snvoc:hasCreator ?fr . ?group snvoc:containerOf ?post
}
group by ?group
order by desc(2) ?group
limit 20
Example: Q5 - Cypher
// Query 1: count comments by friends that reply (directly or transitively) to a post in the forum
MATCH (person:Person)-[:KNOWS*1..2]-(friend:Person)
WHERE person.id={person_id}
MATCH (friend)<-[membership:HAS_MEMBER]-(forum:Forum)
WHERE membership.joinDate>{join_date}
MATCH (friend)<-[:HAS_CREATOR]-(comment:Comment)
WHERE (comment)-[:REPLY_OF*0..]->(:Comment)-[:REPLY_OF]->(:Post)<-[:CONTAINER_OF]-(forum)
RETURN forum.title AS forum, count(comment) AS commentCount
ORDER BY commentCount DESC

// Query 2: count posts by friends in the forum
MATCH (person:Person)-[:KNOWS*1..2]-(friend:Person)
WHERE person.id={person_id}
MATCH (friend)<-[membership:HAS_MEMBER]-(forum:Forum)
WHERE membership.joinDate>{join_date}
MATCH (friend)<-[:HAS_CREATOR]-(post:Post)<-[:CONTAINER_OF]-(forum)
RETURN forum.title AS forum, count(post) AS postCount
ORDER BY postCount DESC
Example: Q5 - DEX
// Look up the start person
v.setLongVoid(personId);
long personOID = graph.findObject(personId, v);
// Friends and friends of friends, excluding the start person
Objects friends = graph.neighbors(personOID, knows, EdgesDirection.Outgoing);
Objects allFriends = graph.neighbors(friends, knows, EdgesDirection.Outgoing);
allFriends.union(friends);
allFriends.remove(personOID);
friends.close();
// Memberships of those friends with joinDate >= date, and the forums they belong to
Objects members = graph.explode(allFriends, hasMember, EdgesDirection.Ingoing);
v.setTimestampVoid(date);
Objects candidate = graph.select(joinDate, Condition.GreaterEqual, v, members);
Objects finalSelection = graph.tails(candidate);
candidate.close();
members.close();
// Posts created by the friends
Objects posts = graph.neighbors(allFriends, hasCreator, EdgesDirection.Ingoing);
ObjectsIterator iterator = finalSelection.iterator();
while (iterator.hasNext()) {
    long oid = iterator.next();
    Container c = new Container();
    Objects postsGroup = graph.neighbors(oid, containerOf, EdgesDirection.Outgoing);
    Objects moderators = graph.neighbors(oid, hasModerator, EdgesDirection.Outgoing);
    long moderatorOid = moderators.any();
    moderators.close();
    // Drop the moderator's posts, then keep only posts written by the friends
    Objects postsModerator = graph.neighbors(moderatorOid, hasCreator, EdgesDirection.Ingoing);
    postsGroup.difference(postsModerator);
    postsModerator.close();
    postsGroup.intersection(posts);
    long count = postsGroup.size();
    if (count > 0) {
        graph.getAttribute(oid, forumId, v);
        c.row[0] = db.getForumURI(v.getLong());
        c.compare2 = String.valueOf(v.getLong());
        c.row[1] = String.valueOf(count);
        c.compare = count;
        results.add(c);
    }
    postsGroup.close();
}
SNB Software
• SNB data
– Dataset generator
– Dataset Validator
– SN Graph Analyzer

• SNB Query drivers
– BSBM based
– YCSB based
Some Experiments
• Virtuoso (RDF)
– 100k users over a 3-year period (3.3 billion triples, 60GB)
– Ten SPARQL query mixes
– 4 x Intel Xeon 2.30GHz CPU, 193 GB of RAM
• DEX (Graph Database)
– Validation setup: 10k users over 3 years (19GB)
– Validation query set and parameters (API-based)
– 2 x Intel Xeon 2.40GHz CPU, 128 GB of RAM
Virtuoso Interactive Workload
• Some queries could not be considered truly interactive
– e.g. Q4, Q5 and Q9
– … still all queries are very interesting challenges
• “Irregular” data distribution reflecting the reality of the SN
– … but complicates the selection of query parameters
Exploration in Scale
• 3.3 bn RDF triples per 100K users, 24G in triples, 36G in literals
• 2/3 of data in interactive working set, 1/4 in BI working set
• scale-out becomes increasingly necessary after 1M users
• 10-100M users are data center scales
– as in real social networks
– larger scales will favor space-efficient data models, e.g. column store with a schema, but
– larger scales also have greater need for schema-last features
DEX Interactive Workload
• Query validation (no SPARQL)
• Identified some implementation choke points
• New optimizations implemented and tested
Ongoing Work
• Finishing Interactive workload
– updates (transactional)
– substitution parameters
• New BI and Graph Analytical Workloads
• Data Generator Improvements
– improve dictionaries and distributions for BI
– Scale factors and dataset (SN graph) validation
• Query Drivers
– Parallel update generator
• Auditing Rules for SNB
Pointers
• Code & Queries: github.com/ldbc
– ldbc_socialnet_bm
   • ldbc_socialnet_dbgen
   • ldbc_socialnet_qgen
• Wiki: ldbc.eu:8090/display/TUC
– Background & Discussions + Detailed report: ldbc.eu:8090/download/attachments/4325436/LDBC_S
• LDBC Technical User Community (TUC) meeting:
– Thursday April 3, CWI Amsterdam (see wiki – next week)
Thank you!

FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz

 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczIoan Toma
 

Más de Ioan Toma (17)

LDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter BonczLDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter Boncz
 
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXParallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
 
MODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionMODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service Selection
 
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
 
LDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC SNB Benchmark Auditing
LDBC SNB Benchmark Auditing
 
Social Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadSocial Network Benchmark Interactive Workload
Social Network Benchmark Interactive Workload
 
MarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesMarkLogic Overview and Use Cases
MarkLogic Overview and Use Cases
 
Towards Temporal Graph Management and Analytics
Towards Temporal Graph Management and AnalyticsTowards Temporal Graph Management and Analytics
Towards Temporal Graph Management and Analytics
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
 
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web ServicesSADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
 
20 billion triples in production
20 billion triples in production20 billion triples in production
20 billion triples in production
 
Lighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphLighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on Giraph
 
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
 
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
 
Ldbc spb 2.0 evolution
Ldbc spb 2.0 evolutionLdbc spb 2.0 evolution
Ldbc spb 2.0 evolution
 
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba PeyGRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter Boncz
 

Último

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 

Último (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 

FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz

  • 1. LDBC Social Network Benchmark (SNB) Graph Generator: graph benchmarking to the next level. Peter Boncz
  • 2. Why Benchmarking? • make competing products comparable • accelerate progress, make technology viable (© Jim Gray, 2005)
  • 3. What is the LDBC? Linked Data Benchmark Council = LDBC • Industry entity similar to TPC (www.tpc.org) • Focusing on graph and RDF store benchmarking • Kick-started by an EU project • Runs from September 2012 – March 2015 • 9 project partners
  • 5. SNB Scenario: Social Network Analysis • Intuitive: everybody knows what an SN is – Facebook, Twitter, LinkedIn, … • SNs can be easily represented as a graph – Entities are the nodes (Person, Group, Tag, Post, ...) – Relationships are the edges (Friend, Likes, Follows, …) • Different scales: from small to very large SNs – Up to billions of nodes and edges • Multiple query needs: – interactive, analytical, transactional • Multiple types of uses: – marketing, recommendation, social interactions, fraud detection, ...
  • 6. Audience • For developers facing graph processing tasks – recognizable scenario to compare merits of different products and technologies • For vendors of graph database technology – checklist of features and performance characteristics • For researchers, both industrial and academic – challenges in multiple choke-point areas such as graph query optimization and (distributed) graph analysis
  • 7. What was developed? • Four main elements: – data schema: defines the structure of the data – workloads: defines the set of operations to perform – performance metrics: used to measure (quantitatively) the performance of the systems – execution rules: defined to ensure that the results from different executions of the benchmark are valid and comparable • Software as Open Source (GitHub) – data generator, query drivers, validation tools, ...
  • 9. Data Schema • Specified in UML for portability – Classes – Associations between classes – Attributes for classes and associations • Some of the relationships represent dimensions – Time (Y, QT, Month, Day) – Geography (Continent, Country, Place) • Data Formats – CSV – RDF (Turtle + N3)
  • 11. Data Generation Process • Produce synthetic data that mimics the characteristics of real SN data • Based on SIB–S3G2 Social Graph Generator – www.w3.org/wiki/Social_Network_Intelligence_BenchMark • Graph model: – property (directed labeled) graph • Building blocks – Correlated data dictionaries – Sequential subtree generation – Graph generation among correlation dimensions
  • 12. Choke Points • Hidden challenges in a benchmark influence database system design, e.g. TPC-H – Functional Dependency Analysis in aggregation – Bloom Filters for sparse joins – Subquery predicate propagation • LDBC explicitly designs benchmarks looking at choke-point “coverage” – requires access to database kernel architects
  • 13. Data Correlations between attributes – SELECT personID FROM person WHERE firstName = 'Joachim' AND addressCountry = 'Germany' • Anti-Correlation – SELECT personID FROM person WHERE firstName = 'Cesare' AND addressCountry = 'Italy' • Query optimizers may underestimate or overestimate the result size of conjunctive predicates [Figure labels: Joachim Loew, Cesare Prandelli]
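The estimation problem named on slide 13 can be illustrated with a small sketch. The toy table, its counts, and the helper names `independent_estimate` and `actual_count` are invented for illustration and are not part of the benchmark. The sketch only shows that multiplying per-column selectivities (the independence assumption) underestimates a correlated pair of predicates and overestimates an anti-correlated pair.

```python
from collections import Counter

# Hypothetical toy "person" table: (firstName, addressCountry) pairs.
# The counts are invented purely to illustrate the effect.
rows = (
    [("Joachim", "Germany")] * 40
    + [("Joachim", "Italy")] * 1
    + [("Cesare", "Italy")] * 30
    + [("Cesare", "Germany")] * 1
    + [("Anna", "Germany")] * 59
    + [("Anna", "Italy")] * 69
)


def independent_estimate(rows, first_name, country):
    """Estimate the result size assuming the two predicates are independent."""
    n = len(rows)
    names = Counter(name for name, _ in rows)
    countries = Counter(c for _, c in rows)
    return n * (names[first_name] / n) * (countries[country] / n)


def actual_count(rows, first_name, country):
    """Count the rows that really satisfy both predicates."""
    return sum(1 for name, c in rows if name == first_name and c == country)


for name, country in [("Joachim", "Germany"), ("Cesare", "Germany")]:
    print(name, country,
          "estimated:", independent_estimate(rows, name, country),
          "actual:", actual_count(rows, name, country))
```

In this toy data the name and the country that occur together often are underestimated, while the rare combination is overestimated, which is the kind of misestimate a correlated generator is meant to provoke.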
  • 14. Property Dictionaries • Property values for each literal property defined by: – a dictionary D, a fixed set of values (e.g., terms from DBPedia or GeoNames) – a ranking function R (between 1 and |D|) – a probability density function F that specifies how the data generator chooses values from dictionary D using the rank R for each term in the dictionary
  • 15. Compact Correlated Property Value Generation Using geometric distribution for function F()
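The D / R / F scheme of slides 14 and 15 can be sketched as follows. This is a minimal illustration under stated assumptions, not the generator's code: the dictionary contents, the per-country rankings, the parameter `p`, and the function names are hypothetical. The assumption made here is that correlation comes from letting the ranking R depend on another attribute (the country), while F stays the same geometric distribution over ranks.

```python
import random

# Hypothetical dictionary D of first names (stand-in for DBpedia terms).
DICTIONARY = ["Anna", "Cesare", "Joachim", "Laura", "Marco", "Hans"]

# Hypothetical ranking functions R: one ranking of D per country, so the
# same dictionary yields different popular names in different countries.
RANKING = {
    "Germany": ["Joachim", "Hans", "Anna", "Laura", "Marco", "Cesare"],
    "Italy": ["Cesare", "Marco", "Laura", "Anna", "Hans", "Joachim"],
}


def geometric_rank(rng, size, p):
    """Draw a 0-based rank from a geometric distribution truncated to size."""
    while True:
        rank = 0
        # Each failed trial (probability 1 - p) moves on to the next rank.
        while rng.random() >= p:
            rank += 1
        if rank < size:
            return rank  # ranks beyond the dictionary are resampled


def pick_first_name(rng, country, p=0.5):
    """Choose a value from D using the country-specific ranking R and F."""
    ranked = RANKING[country]
    return ranked[geometric_rank(rng, len(ranked), p)]


rng = random.Random(42)
print([pick_first_name(rng, "Germany") for _ in range(5)])
print([pick_first_name(rng, "Italy") for _ in range(5)])
```

Only the ranking has to be stored per correlation value; the dictionary and the distribution are shared, which is one way to read the word "compact" in the slide title.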
  • 16. Correlated Edge Generation [Figure: example property graph of persons P1–P5 with <firstName>, <birthYear>, <studyAt>, <liveAt>, <like> and <knows> edges; values shown include “Laura”, “Anna”, “1990”, “University of Leipzig”, “University of Amsterdam”, “Germany”, “Netherlands”, <Britney Spears>]
  • 17. How to Generate a Correlated Graph? • Compute similarity of two nodes based on their (correlated) properties • Use a probability density function w.r.t. this similarity for connecting nodes (connection probability falls from highly similar to less similar) • Danger: this is very expensive to compute on a large graph! (quadratic, random access) [Figure: the example graph of slide 16 with candidate edges marked “?”]
  • 18. Window Optimization • Trick: disregard nodes with too large similarity distance (only connect nodes in a similarity window) • Probability that two nodes are connected is skewed w.r.t. the similarity between the nodes (due to probability distr.) [Figure: the example graph of slide 16 with <knows> edges drawn inside a similarity window; axis: connection probability, highly similar → less similar]
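The window trick of slide 18 can be sketched in a few lines. This is an illustration under assumptions rather than the generator's implementation: similarity is approximated by position in a sorted order, and the function name `window_edges`, the parameters `base_prob` and `decay`, and the sample properties are all hypothetical.

```python
import random


def window_edges(nodes, similarity_key, window, base_prob, decay, rng):
    """Generate undirected edges between nodes close in similarity order.

    nodes: list of (node_id, properties) pairs.
    similarity_key: maps properties to a sortable key; similar nodes
        should end up next to each other after sorting.
    window: how many following nodes each node may connect to.
    base_prob, decay: the connection probability is
        base_prob * decay ** (d - 1) for nodes d positions apart.
    """
    ordered = sorted(nodes, key=lambda item: similarity_key(item[1]))
    edges = []
    for i, (source, _) in enumerate(ordered):
        # Only the next `window` nodes are considered, so the cost is
        # O(n * window) after sorting instead of O(n^2).
        for distance in range(1, window + 1):
            j = i + distance
            if j >= len(ordered):
                break
            if rng.random() < base_prob * decay ** (distance - 1):
                edges.append((source, ordered[j][0]))
    return edges


people = [
    (1, {"university": "Leipzig", "birth_year": 1990}),
    (2, {"university": "Amsterdam", "birth_year": 1990}),
    (3, {"university": "Leipzig", "birth_year": 1990}),
    (4, {"university": "Leipzig", "birth_year": 1991}),
    (5, {"university": "Amsterdam", "birth_year": 1985}),
]
key = lambda props: (props["university"], props["birth_year"])
print(window_edges(people, key, window=2, base_prob=0.9, decay=0.5,
                   rng=random.Random(7)))
```

Because the nodes are sorted first and then scanned sequentially, the quadratic, random-access comparison that slide 17 warns about is avoided, at the price of never connecting nodes that fall outside the window for this dimension.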
  • 19. Graph generation • Create new edges and nodes in one pass, starting from an existing node and creating the new nodes to which it will be connected • Uses degree distributions that follow a power law • Generates correlated and highly connected graph data by creating edges after generating many nodes – Similarity metric – Random dimension
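The slide states only that degrees follow a power law. One simple way to draw such degrees is inverse-transform sampling over a bounded discrete power law, sketched below. The exponent, the degree cap, and the function name `power_law_degrees` are assumptions for illustration and are not taken from the generator.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def power_law_degrees(count, max_degree, exponent, rng):
    """Draw `count` degrees in 1..max_degree with P(k) ~ k ** -exponent."""
    weights = [k ** -exponent for k in range(1, max_degree + 1)]
    total = sum(weights)
    cumulative = list(accumulate(w / total for w in weights))
    degrees = []
    for _ in range(count):
        # Inverse-transform sampling: find the first cumulative
        # probability that is at least the uniform draw.
        index = bisect_left(cumulative, rng.random())
        # Clamp in case rounding leaves the last entry slightly below 1.
        degrees.append(min(index, max_degree - 1) + 1)
    return degrees


sample = power_law_degrees(10, max_degree=100, exponent=2.0,
                           rng=random.Random(3))
print(sample)
```

Most sampled nodes get a very small degree and a few get a large one, which is the skew that the degree-distribution checks in the validation step would look for.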
  • 20. Implementation • MapReduce – A Map function runs on different parts of the input data, in parallel and on many clusters – Main Reduce function: • Sort the correlation dimensions • Generate edges based on sliding windows (similarity metric) – Successive MapReduce phases for different correlation dimensions • Scalable, e.g.: – Generate 1.2TB of data in 30 minutes (16 4-core nodes)
  • 21. Validation: Metrics • Largest Connected Component • Average Clustering Coefficient • Diameter • Average Path Length • Hop-plot • User-Knows • Attribute distributions • Degree distributions • Time evolution
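Two of the metrics listed on slide 21 can be computed with the standard library alone. The sketch below is an illustration and not the SNB validator: the function names are hypothetical, the graph is treated as undirected, and nodes with fewer than two neighbours are counted with a clustering coefficient of 0.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Turn a list of undirected (a, b) edges into neighbour sets."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def largest_component_size(adjacency):
    """Size of the largest connected component, found by repeated BFS."""
    seen = set()
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


def average_clustering(adjacency):
    """Mean local clustering coefficient over all nodes."""
    total = 0.0
    for neighbors in adjacency.values():
        k = len(neighbors)
        if k < 2:
            continue  # contributes 0 to the average
        links = sum(1 for a, b in combinations(neighbors, 2)
                    if b in adjacency[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency) if adjacency else 0.0


knows = [(1, 2), (2, 3), (1, 3), (3, 4), (5, 6)]
graph = build_adjacency(knows)
print(largest_component_size(graph), average_clustering(graph))
```

Comparing such numbers between the generated graph and a real social network is the kind of check the slide's metric list suggests; the pairwise neighbour loop here is only practical for small samples.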
  • 24. Workloads • On-Line: tests a system's throughput with relatively simple queries with concurrent updates – Show all photos posted by my friends that I was tagged in • Business Intelligence: consists of complex structured queries for analyzing online behavior – Influential people on the topic of open source development? • Graph Analytics: tests the functionality and scalability on most of the data as a single operation – PageRank, Shortest Path(s), Community Detection
  • 26. Interactive Query Set • Tests system throughput with relatively simple queries and concurrent updates • Current set: 12 read-only queries • For each query: – Name and detailed description in plain English – List of input parameters – Expected result: content and format – Textual functional description – Relevance: • textual description (plain English) of the reasoning for including this query in the workload • discussion about the technical challenges (Choke Points) targeted – Validation parameters and validation results – SPARQL query
  • 27. Some SNB Interactive Choke Points • Graph Traversals. Query execution time heavily depends on the ability to quickly traverse the friends graph. • Plan Variability. Each query has many different best plans depending on parameter choices (e.g. hash- vs. index-based joins). • Top k and distinct. Many queries return the first results in a specific order: late projection, pushing conditions from the sort into the query. • Repetitive short queries, differing only in literals: opportunity for query plan recycling.
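The "Top k" choke point can be made concrete with a small sketch. It does not show how any benchmarked system implements top-k; it only contrasts a bounded selection with a full sort. The function name `top_k_posters` and the sample counts are hypothetical, and the ordering (descending count, then ascending identifier) mirrors the kind of ordering the interactive queries ask for.

```python
import heapq


def top_k_posters(post_counts, k):
    """Return the k (person_id, count) pairs with the highest count.

    Ties are broken by ascending person_id. heapq.nsmallest keeps at
    most k candidates at a time instead of sorting the whole input.
    """
    return heapq.nsmallest(
        k, post_counts.items(), key=lambda item: (-item[1], item[0])
    )


counts = {"p3": 5, "p1": 5, "p2": 9, "p4": 1}
print(top_k_posters(counts, 3))
```

An engine that knows only the first k rows are needed can discard candidates early and postpone fetching the remaining columns until the k winners are known, which is the "late projection" the slide mentions.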
  • 29. Example: Q3 • Name: Friends within 2 hops that have been in two countries • Description: Find Friends and Friends of Friends of the user A that have made a post in the foreign countries X and Y within a specified period. Only posts made in a country that is different from the friend's own country are counted. The result should be sorted descending by total number of posts, and then by person URI. The top 20 should be shown. The user A (as friend of his friend) should not be in the result. • Parameters: – Person – CountryX – CountryY – startDate: the beginning of the requested period – Duration: requested period in days • Result: – Person.id, Person.firstName, Person.lastName – Number of posts in each country and the sum of all posts • Relevance: – Choke Points: CP3.3 – If one country is large but anticorrelated with the country of self, then processing this before a smaller but positively correlated country can be beneficial
  • 30. Example: Q5 - SPARQL
      select ?group count (*)
      where {
        { select distinct ?fr
          where {
            { %Person% snvoc:knows ?fr. }
            union
            { %Person% snvoc:knows ?fr2.
              ?fr2 snvoc:knows ?fr.
              filter (?fr != %Person%) }
          }
        } .
        ?group snvoc:hasMember ?mem .
        ?mem snvoc:hasPerson ?fr .
        ?mem snvoc:joinDate ?date .
        filter (?date >= "%Date0%"^^xsd:date) .
        ?post snvoc:hasCreator ?fr .
        ?group snvoc:containerOf ?post
      }
      group by ?group
      order by desc(2) ?group
      limit 20
  • 31. Example: Q5 - Cypher
      MATCH (person:Person)-[:KNOWS*1..2]-(friend:Person)
      WHERE person.id={person_id}
      MATCH (friend)<-[membership:HAS_MEMBER]-(forum:Forum)
      WHERE membership.joinDate>{join_date}
      MATCH (friend)<-[:HAS_CREATOR]-(comment:Comment)
      WHERE (comment)-[:REPLY_OF*0..]->(:Comment)-[:REPLY_OF]->(:Post)<-[:CONTAINER_OF]-(forum)
      RETURN forum.title AS forum, count(comment) AS commentCount
      ORDER BY commentCount DESC

      MATCH (person:Person)-[:KNOWS*1..2]-(friend:Person)
      WHERE person.id={person_id}
      MATCH (friend)<-[membership:HAS_MEMBER]-(forum:Forum)
      WHERE membership.joinDate>{join_date}
      MATCH (friend)<-[:HAS_CREATOR]-(post:Post)<-[:CONTAINER_OF]-(forum)
      RETURN forum.title AS forum, count(post) AS postCount
      ORDER BY postCount DESC
  • 32. Example: Q5 - DEX
      v.setLongVoid(personId);
      long personOID = graph.findObject(personId, v);
      Objects friends = graph.neighbors(personOID, knows, EdgesDirection.Outgoing);
      Objects allFriends = graph.neighbors(friends, knows, EdgesDirection.Outgoing);
      allFriends.union(friends);
      allFriends.remove(personOID);
      friends.close();
      Objects members = graph.explode(allFriends, hasMember, EdgesDirection.Ingoing);
      v.setTimestampVoid(date);
      Objects candidate = graph.select(joinDate, Condition.GreaterEqual, v, members);
      Objects finalSelection = graph.tails(candidate);
      candidate.close();
      members.close();
      Objects posts = graph.neighbors(allFriends, hasCreator, EdgesDirection.Ingoing);
      ObjectsIterator iterator = finalSelection.iterator();
      while (iterator.hasNext()) {
          long oid = iterator.next();
          Container c = new Container();
          Objects postsGroup = graph.neighbors(oid, containerOf, EdgesDirection.Outgoing);
          Objects moderators = graph.neighbors(oid, hasModerator, EdgesDirection.Outgoing);
          long moderatorOid = moderators.any();
          moderators.close();
          Objects postsModerator = graph.neighbors(moderatorOid, hasCreator, EdgesDirection.Ingoing);
          postsGroup.difference(postsModerator);
          postsModerator.close();
          postsGroup.intersection(posts);
          long count = postsGroup.size();
          if (count > 0) {
              graph.getAttribute(oid, forumId, v);
              c.row[0] = db.getForumURI(v.getLong());
              c.compare2 = String.valueOf(v.getLong());
              c.row[1] = String.valueOf(count);
              c.compare = count;
              results.add(c);
          }
          postsGroup.close();
      }
  • 34. SNB Software • SNB data – Dataset generator – Dataset Validator – SN Graph Analyzer • SNB Query drivers – BSBM based – YCSB based
  • 35. Some Experiments • Virtuoso (RDF) – 100k users over a 3-year period (3.3 billion triples, 60GB) – Ten SPARQL query mixes – 4 x Intel Xeon 2.30GHz CPU, 193 GB of RAM • DEX (Graph Database) – Validation setup: 10k users over 3 years (19GB) – Validation query set and parameters (API-based) – 2 x Intel Xeon 2.40GHz CPU, 128 GB of RAM
  • 36. Virtuoso Interactive Workload • Some queries could not be considered as truly interactive – e.g. Q4, Q5 and Q9 – … still all queries are very interesting challenges • “Irregular” data distribution reflecting the reality of the SN – … but complicates the selection of query parameters
  • 37. Exploration in Scale • 3.3 bn RDF triples per 100K users, 24G in triples, 36G in literals • 2/3 of data in interactive working set, 1/4 in BI working set • scale-out becomes increasingly necessary after 1M users • 10-100M users are data center scales – as in real social networks – larger scales will favor space-efficient data models, e.g. column store with a schema, but – larger scales also have greater need for schema-last features
  • 38. DEX Interactive Workload • Query validation (no SPARQL) • Identified some implementation choke points • New optimizations implemented and tested
  • 40. Ongoing Work • Finishing Interactive workload – updates (transactional) – substitution parameters • New BI and Graph Analytical Workloads • Data Generator Improvements – improve dictionaries and distributions for BI – Scale factors and dataset (SN graph) validation • Query Drivers – Parallel update generator • Auditing Rules for SNB
  • 41. Pointers • Code & Queries: github.com/ldbc – ldbc_socialnet_bm • ldbc_socialnet_dbgen • ldbc_socialnet_qgen • Wiki: ldbc.eu:8090/display/TUC – Background & Discussions + Detailed report: ldbc.eu:8090/download/attachments/4325436/LDBC_S • LDBC Technical User Community (TUC) meeting: – Thursday April 3, CWI Amsterdam (see wiki – next week)

Editor's notes

  1. Sources for generating edges correlated with property values