SlideShare una empresa de Scribd logo
1 de 27
Descargar para leer sin conexión
On Statistical Characteristics
of Real-life Knowledge Graphs
Weining Qian
Institute for Data Science and Engineering
East China Normal University
wnqian@sei.ecnu.edu.cn
NSFC key project on Big Data Benchmarks
• Theory and Methods of Benchmarking Big Data
Management Systems
– 2015.1 – 2019.12
2
East China
Normal University
Renmin University
of China
Institute of Computing Technology
Chinese Academy of Sciences
6/22/2016 LDBC TUC 2016
Research focuses
• BigDataBench Suite (@ict.ac)
– http://prof.ict.ac.cn
• Benchmarking transaction processing in
NewSQL systems (@ecnu)
– SecKillBench
• Benchmarking Big Data systems (@rmu)
• Benchmarks for graph data (@ecnu)
– Social media, knowledge graphs, ...
36/22/2016 LDBC TUC 2016
Knowledge graphs
4
knowledge
basesemantic
web
knowledge
graph
linked data
6/22/2016 LDBC TUC 2016
Some KG‘s
5
• YAGO
– 10M entities in 350K classes
– 120M facts for 100 relations
– 100 languages
– 95% accuracy
• DBPedia
– 4M entities in 250 classes
– 500M facts for 6000 properties
– live updates
• Kosmix
– 6.5M concepts, 6.7M concept
instances,
– 165M relationship instances
• Freebase
– 40M entities in 15000 topics
– 1B facts for 4000 properties
– core of Google Knowledge
Graph
• Google Knowledge Graph
– 600M entities in 15000 topics
– 20B facts
• Probase
– 2.7 million+ concepts
• And many domain/application-
specific knowledge graphs
6/22/2016 LDBC TUC 2016
A natural question
• Knowledge graph can serve as the backbone
of many Web-scale applications, such as
search engine, question answering, text
understanding etc.
• How to effectively and efficiently manage a
large-scale knowledge graph?
– MySQL, Oracle, Neo4j, TITAN, Trinity, or other
triple stores???
66/22/2016 LDBC TUC 2016
Social networks vs. Knowledge graphs
• Though there are some benchmarks for social
networks exist
– Facebook LinkBench, LDBC SNB, BSMA, ...
• Knowledge graph is different with social network
– More semantic labels in both entities and relations
– Topic or domain sensitive
– Contains various kinds of knowledge
– Hard to define a unified schema
76/22/2016 LDBC TUC 2016
Why study their statistical characteristics?
• To better understand knowledge graphs
• To help the selection of seeding data sets in
benchmarks
• To help the development of data generators
86/22/2016 LDBC TUC 2016
Characteristics of large-scale graphs
• Previous research works on analyzing structural
properties of large scale graphs, e.g.
– [Broder et al. Comput. Netw. 2000] studied the web
structure as a graph via a series of metrics, e.g degree,
diameter, component.
– [Kumar et al. KDD, 2006] studied the dynamic social
network’s structure properties, e.g. degree, hop etc.
– [Boccaletti et al. Phys. Rep. 2006] surveyed the studies of
the structure and dynamics of complex network.
96/22/2016 LDBC TUC 2016
Real-life knowledge graphs
• YAGO2
– A huge semantic knowledge graph based on WordNet,
Wikipedia and GeoNames
– 10+ million entities, 120+ million facts
• Separate the YAGO2 into three sub-graphs
– YagoTax: Taxonomy tree of YAGO2
– YagoFact: Facts in YAGO2
– YagoWiki: Hyperlink relations in YAGO2 based on
Wikipedia
106/22/2016 LDBC TUC 2016
Real-life knowledge graphs
• WordNet
– A lexical network for the English language.
– Synonym set as node and semantic relation as edge.
– 98,000 entities, 154,000 relationships
• DBpedia
– A multi-language knowledge base extracted from
Wikipedia info-boxes
– English version of DBPedia
– 4.58 million things and 2,795 different kinds of properties
116/22/2016 LDBC TUC 2016
Real-life knowledge graphs
• Enterprise Knowledge Graph (EKG)
– Describes an enterprise relationships in Chinese
– Extracted from reports from enterprises in Shanghai Stock
Market
– Used for credit and risk analysis in financial companies
– A domain specific knowledge graph
– Seven kinds of relationships between two entities
• Assignment, hold, subcompany, changename, manager,
cooperate, merge
– 51,853 entities and 430,973 relationships.
126/22/2016 LDBC TUC 2016
Plus two social networks
• SNRand
– 0.2 million randomly selected users
– 5 million fellowship relations between users
• SNRank
– 0.2 million most active users.
– 36+ million fellowship relations between users
• The raw data is collected from a famous social
media platform named Sina Weibo in China
136/22/2016 LDBC TUC 2016
Statistical characteristics
146/22/2016 LDBC TUC 2016
Four kinds of distributions
• Distribution of degrees
– In-degree and out-degree
– Power-law distribution
• Distribution of hops
– Reflects the connectivity cost inside a graph
• Distribution of connected components
– Strongly and weakly connected components
– Reflects the connectivity of a graph
• Distribution of clustering coefficients
– Measures the nodes’ tendency to cluster together
156/22/2016 LDBC TUC 2016
In-degrees and out-degrees
16
All the in/out-degree distributions exhibit the power-law (or piece-wise power-
law), except for some initial segments that deviate the power-law.
6/22/2016 LDBC TUC 2016
Size of connected components
17
Both the strongly and weakly connected component distributions of knowledge
graphs exhibit the power-law distribution in general. While the social networks
are nearly in a whole strongly connected component.
6/22/2016 LDBC TUC 2016
Distance between two vertices
• Social networks are
“small worlds”
• Taxonomy’s
diameter is large
(tree-alike)
• Basically, the larger
the network is, the
smaller the diameter
is
186/22/2016 LDBC TUC 2016
196/22/2016 LDBC TUC 2016
Cluster coefficient
• Taxonomy and social
networks are
different
• All other KG’s are of
power-law
distributions
206/22/2016 LDBC TUC 2016
Cluster coefficient
• Taxonomy and social
networks are
different
• All other KG’s are of
power-law
distributions
216/22/2016 LDBC TUC 2016
Node degrees of different parts in EKG
22
Different relationships show different out-degree distributions
6/22/2016 LDBC TUC 2016
Size of connected components of
different parts in EKG
23
Few connected components are much larger than others
6/22/2016 LDBC TUC 2016
Different parts in EKG
24
Highly clustered Small-world network
6/22/2016 LDBC TUC 2016
Conclusions and discussions
25
• KG’s are different to SN’s
– Taxonomy/ontology + fact
• Following different statistical distributions
– KG’s are labeled
• Different subgraphs are of different sizes and characteristics
• Both triple stores and relational databases have reasons to be
used
– The key is to avoid joins over power-law distributed data
Wenliang Cheng, Chengyu Wang, Bing Xiao, Weining Qian, Aoying Zhou:
On
Statistical Characteristics of Real-Life Knowledge Graphs. BPOE 2015: 37-49
6/22/2016 LDBC TUC 2016
谢谢!
Thanks!
http://dase.ecnu.edu.cn
wnqian@sei.ecnu.edu.cn
Statistical characteristics
6/22/2016 LDBC TUC 2016 27

Más contenido relacionado

Destacado

Informe de composición de drogas ilíicitas en Euskadi 2015
Informe de composición de drogas ilíicitas en Euskadi 2015Informe de composición de drogas ilíicitas en Euskadi 2015
Informe de composición de drogas ilíicitas en Euskadi 2015Ai Laket!! elkartea
 
Plc scada by vishal kumar from niec delhi
Plc scada by vishal kumar from niec delhiPlc scada by vishal kumar from niec delhi
Plc scada by vishal kumar from niec delhi7532993375
 
Challenges faced by women entrepreneurs in the present technological era
Challenges faced by women entrepreneurs in the present technological eraChallenges faced by women entrepreneurs in the present technological era
Challenges faced by women entrepreneurs in the present technological era7411010287
 
Termòmetre Lingüístic: Un eina de recerca per la pràctica docent
Termòmetre Lingüístic: Un eina de recerca  per la pràctica docent Termòmetre Lingüístic: Un eina de recerca  per la pràctica docent
Termòmetre Lingüístic: Un eina de recerca per la pràctica docent Neus Lorenzo
 

Destacado (7)

Close to Home Cruises
Close to Home Cruises Close to Home Cruises
Close to Home Cruises
 
Informe de composición de drogas ilíicitas en Euskadi 2015
Informe de composición de drogas ilíicitas en Euskadi 2015Informe de composición de drogas ilíicitas en Euskadi 2015
Informe de composición de drogas ilíicitas en Euskadi 2015
 
Plc scada by vishal kumar from niec delhi
Plc scada by vishal kumar from niec delhiPlc scada by vishal kumar from niec delhi
Plc scada by vishal kumar from niec delhi
 
Stili comunicativi
Stili comunicativiStili comunicativi
Stili comunicativi
 
Challenges faced by women entrepreneurs in the present technological era
Challenges faced by women entrepreneurs in the present technological eraChallenges faced by women entrepreneurs in the present technological era
Challenges faced by women entrepreneurs in the present technological era
 
Termòmetre Lingüístic: Un eina de recerca per la pràctica docent
Termòmetre Lingüístic: Un eina de recerca  per la pràctica docent Termòmetre Lingüístic: Un eina de recerca  per la pràctica docent
Termòmetre Lingüístic: Un eina de recerca per la pràctica docent
 
Sidetur
SideturSidetur
Sidetur
 

Similar a Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Graphs.

Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsGeoffrey Fox
 
IRJET- Predicting Social Network Communities Structure Changes and Detection ...
IRJET- Predicting Social Network Communities Structure Changes and Detection ...IRJET- Predicting Social Network Communities Structure Changes and Detection ...
IRJET- Predicting Social Network Communities Structure Changes and Detection ...IRJET Journal
 
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Enrico Motta
 
Profiling Linked Open Data
Profiling Linked Open DataProfiling Linked Open Data
Profiling Linked Open DataBlerina Spahiu
 
Rubrics for DMPs
Rubrics for DMPsRubrics for DMPs
Rubrics for DMPsJisc RDM
 
Towards Semantic APIs for Research Data Services (Invited Talk)
Towards Semantic APIs for Research Data Services (Invited Talk)Towards Semantic APIs for Research Data Services (Invited Talk)
Towards Semantic APIs for Research Data Services (Invited Talk)Anna Fensel
 
An Open Spatial Systems Framework for Place-Based Decision-Making
An Open Spatial Systems Framework for Place-Based Decision-MakingAn Open Spatial Systems Framework for Place-Based Decision-Making
An Open Spatial Systems Framework for Place-Based Decision-MakingRaed Mansour
 
Service Level Comparison for Online Shopping using Data Mining
Service Level Comparison for Online Shopping using Data MiningService Level Comparison for Online Shopping using Data Mining
Service Level Comparison for Online Shopping using Data MiningIIRindia
 
Provenance and social network analysis for recommender systems: a literature...
Provenance and social network analysis for recommender  systems: a literature...Provenance and social network analysis for recommender  systems: a literature...
Provenance and social network analysis for recommender systems: a literature...IJECEIAES
 
NBK update briefing October 2017
NBK update briefing October 2017NBK update briefing October 2017
NBK update briefing October 2017Bethan Ruddock
 
Towards Integration of Web Data into a coherent Educational Data Graph
Towards Integration of Web Data into a coherent Educational Data GraphTowards Integration of Web Data into a coherent Educational Data Graph
Towards Integration of Web Data into a coherent Educational Data GraphBesnik Fetahu
 
Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...CILIP MDG
 
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...Adel Sabour
 
Survey on Location Based Recommendation System Using POI
Survey on Location Based Recommendation System Using POISurvey on Location Based Recommendation System Using POI
Survey on Location Based Recommendation System Using POIIRJET Journal
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs Jason Riedy
 
Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...
Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...
Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...Citadelh2020
 
Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...
Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...
Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...Gayane Sedrakyan
 
Overview of XSEDE Systems Engineering
Overview of XSEDE Systems EngineeringOverview of XSEDE Systems Engineering
Overview of XSEDE Systems EngineeringJohn Towns
 

Similar a Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Graphs. (20)

Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other things
 
IRJET- Predicting Social Network Communities Structure Changes and Detection ...
IRJET- Predicting Social Network Communities Structure Changes and Detection ...IRJET- Predicting Social Network Communities Structure Changes and Detection ...
IRJET- Predicting Social Network Communities Structure Changes and Detection ...
 
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
 
Profiling Linked Open Data
Profiling Linked Open DataProfiling Linked Open Data
Profiling Linked Open Data
 
Rubrics for DMPs
Rubrics for DMPsRubrics for DMPs
Rubrics for DMPs
 
Towards Semantic APIs for Research Data Services (Invited Talk)
Towards Semantic APIs for Research Data Services (Invited Talk)Towards Semantic APIs for Research Data Services (Invited Talk)
Towards Semantic APIs for Research Data Services (Invited Talk)
 
ONS local presents clustering
ONS local presents clusteringONS local presents clustering
ONS local presents clustering
 
An Open Spatial Systems Framework for Place-Based Decision-Making
An Open Spatial Systems Framework for Place-Based Decision-MakingAn Open Spatial Systems Framework for Place-Based Decision-Making
An Open Spatial Systems Framework for Place-Based Decision-Making
 
Service Level Comparison for Online Shopping using Data Mining
Service Level Comparison for Online Shopping using Data MiningService Level Comparison for Online Shopping using Data Mining
Service Level Comparison for Online Shopping using Data Mining
 
Provenance and social network analysis for recommender systems: a literature...
Provenance and social network analysis for recommender  systems: a literature...Provenance and social network analysis for recommender  systems: a literature...
Provenance and social network analysis for recommender systems: a literature...
 
NBK update briefing October 2017
NBK update briefing October 2017NBK update briefing October 2017
NBK update briefing October 2017
 
Towards Integration of Web Data into a coherent Educational Data Graph
Towards Integration of Web Data into a coherent Educational Data GraphTowards Integration of Web Data into a coherent Educational Data Graph
Towards Integration of Web Data into a coherent Educational Data Graph
 
Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...Managing 'Big Data' in the social sciences: the contribution of an analytico-...
Managing 'Big Data' in the social sciences: the contribution of an analytico-...
 
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
 
UNIT_1-BD.pptx
UNIT_1-BD.pptxUNIT_1-BD.pptx
UNIT_1-BD.pptx
 
Survey on Location Based Recommendation System Using POI
Survey on Location Based Recommendation System Using POISurvey on Location Based Recommendation System Using POI
Survey on Location Based Recommendation System Using POI
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...
Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...
Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...
 
Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...
Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...
Data Harvesting, Curation and Fusion Model to Support Public Service Recommen...
 
Overview of XSEDE Systems Engineering
Overview of XSEDE Systems EngineeringOverview of XSEDE Systems Engineering
Overview of XSEDE Systems Engineering
 

Más de LDBC council

8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...LDBC council
 
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...LDBC council
 
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics EngineLDBC council
 
8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...
8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...
8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...LDBC council
 
8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
8th TUC Meeting – Marcus Paradies (SAP) Social Network BenchmarkLDBC council
 
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...LDBC council
 
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force statusLDBC council
 
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...LDBC council
 
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...LDBC council
 
8th TUC Meeting -
8th TUC Meeting - 8th TUC Meeting -
8th TUC Meeting - LDBC council
 
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...LDBC council
 
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...LDBC council
 
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...LDBC council
 
LDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC council
 
LDBC 6th TUC Meeting conclusions
LDBC 6th TUC Meeting conclusionsLDBC 6th TUC Meeting conclusions
LDBC 6th TUC Meeting conclusionsLDBC council
 
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXParallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXLDBC council
 
MODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionMODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionLDBC council
 
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...LDBC council
 
LDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC council
 
Social Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadSocial Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadLDBC council
 

Más de LDBC council (20)

8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
 
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
 
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
 
8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...
8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...
8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...
 
8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
 
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
 
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
 
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
 
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
 
8th TUC Meeting -
8th TUC Meeting - 8th TUC Meeting -
8th TUC Meeting -
 
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
 
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
 
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
 
LDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status update
 
LDBC 6th TUC Meeting conclusions
LDBC 6th TUC Meeting conclusionsLDBC 6th TUC Meeting conclusions
LDBC 6th TUC Meeting conclusions
 
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXParallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
 
MODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionMODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service Selection
 
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
 
LDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC SNB Benchmark Auditing
LDBC SNB Benchmark Auditing
 
Social Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadSocial Network Benchmark Interactive Workload
Social Network Benchmark Interactive Workload
 

Último

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Último (20)

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Graphs.

  • 1. On Statistical Characteristics of Real-life Knowledge Graphs Weining Qian Institute for Data Science and Engineering East China Normal University wnqian@sei.ecnu.edu.cn
  • 2. NSFC key project on Big Data Benchmarks • Theory and Methods of Benchmarking Big Data Management Systems – 2015.1 – 2019.12 2 East China Normal University Renmin University of China Institute of Computing Technology Chinese Academy of Sciences 6/22/2016 LDBC TUC 2016
  • 3. Research focuses • BigDataBench Suite (@ict.ac) – http://prof.ict.ac.cn • Benchmarking transaction processing in NewSQL systems (@ecnu) – SecKillBench • Benchmarking Big Data systems (@rmu) • Benchmarks for graph data (@ecnu) – Social media, knowledge graphs, ... 36/22/2016 LDBC TUC 2016
  • 5. Some KG‘s 5 • YAGO – 10M entities in 350K classes – 120M facts for 100 relations – 100 languages – 95% accuracy • DBPedia – 4M entities in 250 classes – 500M facts for 6000 properties – live updates • Kosmix – 6.5M concepts, 6.7M concept instances, – 165M relationship instances • Freebase – 40M entities in 15000 topics – 1B facts for 4000 properties – core of Google Knowledge Graph • Google Knowledge Graph – 600M entities in 15000 topics – 20B facts • Probase – 2.7 million+ concepts • And many domain/application- specific knowledge graphs 6/22/2016 LDBC TUC 2016
  • 6. A natural question • Knowledge graph can serve as the backbone of many Web-scale applications, such as search engine, question answering, text understanding etc. • How to effectively and efficiently manage a large-scale knowledge graph? – MySQL, Oracle, Neo4j, TITAN, Trinity, or other triple stores??? 66/22/2016 LDBC TUC 2016
  • 7. Social networks vs. Knowledge graphs • Though there are some benchmarks for social networks exist – Facebook LinkBench, LDBC SNB, BSMA, ... • Knowledge graph is different with social network – More semantic labels in both entities and relations – Topic or domain sensitive – Contains various kinds of knowledge – Hard to define a unified schema 76/22/2016 LDBC TUC 2016
  • 8. Why study their statistical characteristics? • To better understand knowledge graphs • To help the selection of seeding data sets in benchmarks • To help the development of data generators 86/22/2016 LDBC TUC 2016
  • 9. Characteristics of large-scale graphs • Previous research works on analyzing structural properties of large scale graphs, e.g. – [Broder et al. Comput. Netw. 2000] studied the web structure as a graph via a series of metrics, e.g degree, diameter, component. – [Kumar et al. KDD, 2006] studied the dynamic social network’s structure properties, e.g. degree, hop etc. – [Boccaletti et al. Phys. Rep. 2006] surveyed the studies of the structure and dynamics of complex network. 96/22/2016 LDBC TUC 2016
  • 10. Real-life knowledge graphs • YAGO2 – A huge semantic knowledge graph based on WordNet, Wikipedia and GeoNames – 10+ million entities, 120+ million facts • Separate the YAGO2 into three sub-graphs – YagoTax: Taxonomy tree of YAGO2 – YagoFact: Facts in YAGO2 – YagoWiki: Hyperlink relations in YAGO2 based on Wikipedia 106/22/2016 LDBC TUC 2016
  • 11. Real-life knowledge graphs • WordNet – A lexical network for the English language. – Synonym set as node and semantic relation as edge. – 98,000 entities, 154,000 relationships • DBpedia – A multi-language knowledge base extracted from Wikipedia info-boxes – English version of DBPedia – 4.58 million things and 2,795 different kinds of properties 116/22/2016 LDBC TUC 2016
  • 12. Real-life knowledge graphs • Enterprise Knowledge Graph (EKG) – Describes an enterprise relationships in Chinese – Extracted from reports from enterprises in Shanghai Stock Market – Used for credit and risk analysis in financial companies – A domain specific knowledge graph – Seven kinds of relationships between two entities • Assignment, hold, subcompany, changename, manager, cooperate, merge – 51,853 entities and 430,973 relationships. 126/22/2016 LDBC TUC 2016
  • 13. Plus two social networks • SNRand – 0.2 million randomly selected users – 5 million fellowship relations between users • SNRank – 0.2 million most active users. – 36+ million fellowship relations between users • The raw data is collected from a famous social media platform named Sina Weibo in China 136/22/2016 LDBC TUC 2016
  • 15. Four kinds of distributions • Distribution of degrees – In-degree and out-degree – Power-law distribution • Distribution of hops – Reflects the connectivity cost inside a graph • Distribution of connected components – Strongly and weakly connected components – Reflects the connectivity of a graph • Distribution of clustering coefficients – Measures the nodes’ tendency to cluster together 156/22/2016 LDBC TUC 2016
  • 16. In-degrees and out-degrees 16 All the in/out-degree distributions exhibit the power-law (or piece-wise power- law), except for some initial segments that deviate the power-law. 6/22/2016 LDBC TUC 2016
  • 17. Size of connected components 17 Both the strongly and weakly connected component distributions of knowledge graphs exhibit the power-law distribution in general. While the social networks are nearly in a whole strongly connected component. 6/22/2016 LDBC TUC 2016
  • 18. Distance between two vertices • Social networks are “small worlds” • Taxonomy’s diameter is large (tree-alike) • Basically, the larger the network is, the smaller the diameter is 186/22/2016 LDBC TUC 2016
  • 20. Cluster coefficient • Taxonomy and social networks are different • All other KG’s are of power-law distributions 206/22/2016 LDBC TUC 2016
  • 21. Cluster coefficient • Taxonomy and social networks are different • All other KG’s are of power-law distributions 216/22/2016 LDBC TUC 2016
  • 22. Node degrees of different parts in EKG 22 Different relationships show different out-degree distributions 6/22/2016 LDBC TUC 2016
  • 23. Size of connected components of different parts in EKG 23 Few connected components are much larger than others 6/22/2016 LDBC TUC 2016
  • 24. Different parts in EKG 24 Highly clustered Small-world network 6/22/2016 LDBC TUC 2016
  • 25. Conclusions and discussions 25 • KG’s are different to SN’s – Taxonomy/ontology + fact • Following different statistical distributions – KG’s are labeled • Different subgraphs are of different sizes and characteristics • Both triple stores and relational databases have reasons to be used – The key is to avoid joins over power-law distributed data Wenliang Cheng, Chengyu Wang, Bing Xiao, Weining Qian, Aoying Zhou:
On Statistical Characteristics of Real-Life Knowledge Graphs. BPOE 2015: 37-49 6/22/2016 LDBC TUC 2016