hosted by
HBaseConAsia2018
JanusGraph: Distributed graph database with HBase
XueMin Zhang @ TalkingData
Content
01 About Us
02 Something about Graph
03 Introduction to JanusGraph
04 JanusGraph with HBase
About us
About me
• Seven years of hands-on experience in technical research and development (R&D), focusing on distributed storage, distributed computing, real-time computing, etc.
• Worked successively at Sina Weibo and TalkingData, and served as big data team leader at the Sina R&D center.
• Technical speaker at China Hadoop, Strata Hadoop/Data Conference, and DTCC.
About TalkingData
• Founded in 2011, TalkingData is China’s leading third-party big data platform. With SmartDP as the core of its data intelligence application ecosystem, TalkingData empowers enterprises and helps them achieve data-driven digital transformation.
• From the beginning, TalkingData’s vision of using “big data for smarter business decisions and a better world” has allowed it to gradually become China’s leading data intelligence solution provider. TalkingData creates value for clients and serves as their “performance partner,” helping modern enterprises achieve data-driven transformation and accelerating the digitization of clients from various industries. By using data-generated insights to change how people see the world and themselves, TalkingData hopes to ultimately improve people’s lives.
Something about Graph
What is a Graph Database
• As the name suggests, it is a database.
• It uses graph structures for semantic queries, with nodes, edges, and properties to represent and store data.
• It allows data in the store to be linked together directly.
• Compared with traditional relational databases:
  ◦ Handles hybrid relations.
  ◦ Makes it easy to find connections between entities.
Graph Structures - Vertices
• Vertices are the nodes or points in a graph structure.
• Every vertex may contain a unique ID.
• Vertices can be associated with a set of properties (key-value pairs).
Graph Structures - Edges
• Edges are the connections between the vertices in a graph.
• Edges can be undirected, directed, or bidirectional.
• Like vertices, edges can have properties and an ID.
Graph Structures - Graph
• G = (V, E)
• The graph is the collection of vertices, edges, and their associated properties.
• Vertices and edges can be classified with labels.
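The structures above can be sketched as a minimal in-memory property graph in Python. This is an illustration only: the class and method names (`Vertex`, `Edge`, `Graph`, `add_vertex`, `add_edge`, `vertices_with_label`) are hypothetical and are not JanusGraph or TinkerPop APIs. Edges are stored with an explicit direction (`out_v` to `in_v`); an undirected edge can be modelled by simply ignoring that direction when reading it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    id: int                                                   # unique vertex ID
    label: str                                                # label used for classification
    properties: Dict[str, Any] = field(default_factory=dict)  # key-value pairs


@dataclass
class Edge:
    id: int
    label: str
    out_v: int                                                # ID of the vertex the edge leaves
    in_v: int                                                 # ID of the vertex the edge enters
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Graph:
    """G = (V, E): vertices and edges, each carrying a label and properties."""
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: Dict[int, Edge] = field(default_factory=dict)

    def add_vertex(self, vid: int, label: str, **props: Any) -> Vertex:
        v = Vertex(vid, label, dict(props))
        self.vertices[vid] = v
        return v

    def add_edge(self, eid: int, label: str, out_v: int, in_v: int, **props: Any) -> Edge:
        # An edge may only connect vertices that already exist in the graph.
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError("both endpoints must be existing vertices")
        e = Edge(eid, label, out_v, in_v, dict(props))
        self.edges[eid] = e
        return e

    def vertices_with_label(self, label: str) -> List[Vertex]:
        return [v for v in self.vertices.values() if v.label == label]


g = Graph()
g.add_vertex(1, "person", name="alice")
g.add_vertex(2, "person", name="bob")
g.add_edge(10, "knows", 1, 2, since=2015)
```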
Graph Storage Model - Adjacency Matrix
G.vertices = [1, 2, 3, 4, 5, 6]
G.edges =
     1 2 3 4 5 6
  1  0 1 1 1 0 0
  2  1 0 0 0 0 0
  3  1 0 0 1 0 1
  4  1 0 1 0 1 0
  5  0 0 0 1 0 0
  6  0 0 1 0 0 0
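A minimal Python sketch of how such a matrix is built from an edge list. The edge list is read off the example graph (the same edges appear in the adjacency lists on the next slide), and the graph is assumed to be undirected, so every edge sets two symmetric cells.

```python
from typing import List, Tuple


def adjacency_matrix(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Build an n x n 0/1 matrix for an undirected graph with vertices 1..n."""
    m = [[0] * n for _ in range(n)]
    for a, b in edges:
        # Undirected edge: mark both (a, b) and (b, a); vertex IDs are 1-based.
        m[a - 1][b - 1] = 1
        m[b - 1][a - 1] = 1
    return m


EDGES = [(1, 2), (1, 3), (1, 4), (3, 4), (3, 6), (4, 5)]
M = adjacency_matrix(6, EDGES)
for row in M:
    print(" ".join(map(str, row)))
```

The matrix always needs n × n cells no matter how many edges exist (36 cells for 6 edges here), which is why sparse graphs are usually stored as adjacency lists instead.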
Graph Storage Model - Adjacency Lists
1: 2 → 3 → 4 → Λ
2: 1 → Λ
3: 1 → 4 → 6 → Λ
4: 1 → 3 → 5 → Λ
5: 4 → Λ
6: 3 → Λ
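The same example graph as adjacency lists, again as a Python sketch that assumes undirected edges; Λ marks the end of each list as on the slide. Storage grows with the number of edges rather than with n², and this per-vertex layout (one vertex, then all of its neighbours) is the shape JanusGraph uses when it stores a graph as an adjacency list.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def adjacency_list(edges: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    adj: Dict[int, List[int]] = defaultdict(list)
    for a, b in edges:
        # Undirected edge: each endpoint lists the other as a neighbour.
        adj[a].append(b)
        adj[b].append(a)
    # Sort vertices and neighbours so each list reads like the slide.
    return {v: sorted(ns) for v, ns in sorted(adj.items())}


EDGES = [(1, 2), (1, 3), (1, 4), (3, 4), (3, 6), (4, 5)]
ADJ = adjacency_list(EDGES)
for v, ns in ADJ.items():
    print(f"{v}:", " -> ".join([str(n) for n in ns] + ["Λ"]))
```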
Introduction to JanusGraph
• A scalable graph database distributed across multi-machine clusters, with pluggable storage and indexing backends.
• Fully compliant with the Apache TinkerPop graph computing framework.
• Optimized for storing and querying graphs with billions of vertices and edges.
• Supports thousands of concurrent users.
• Can execute local queries (OLTP) or cross-cluster distributed queries (OLAP).
• Sponsored by the Linux Foundation.
• Licensed under the Apache License 2.0.
Architecture
Apache TinkerPop & Gremlin
• A graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP).
• Gremlin graph traversal language.
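A Gremlin query is a chain of steps, for example `g.V().has('name', 'alice').out('knows').values('name')`, where each step transforms the stream produced by the previous one. The Python sketch below imitates that pipeline over a tiny in-memory graph to show the idea. `Traversal`, `V` and `to_list` are hypothetical names, not the TinkerPop API; against a real JanusGraph server you would use a Gremlin language variant such as gremlin-python.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical toy data: vertex properties and directed, labelled edges.
VERTICES: Dict[int, Dict[str, Any]] = {
    1: {"name": "alice"},
    2: {"name": "bob"},
    3: {"name": "carol"},
}
EDGES: List[Tuple[int, str, int]] = [(1, "knows", 2), (1, "knows", 3), (2, "knows", 3)]


class Traversal:
    """Hypothetical Gremlin-like pipeline; each step lazily wraps the previous one."""

    def __init__(self, items: Iterable[Any]):
        self._items = items

    def has(self, key: str, value: Any) -> "Traversal":
        # Keep only vertices whose property `key` equals `value`.
        return Traversal(v for v in self._items if VERTICES[v].get(key) == value)

    def out(self, label: str) -> "Traversal":
        # Follow outgoing edges with the given label to their target vertices.
        return Traversal(
            dst for v in self._items for (src, lbl, dst) in EDGES if src == v and lbl == label
        )

    def values(self, key: str) -> "Traversal":
        # Emit a property value for each vertex in the stream.
        return Traversal(VERTICES[v][key] for v in self._items)

    def to_list(self) -> List[Any]:
        return list(self._items)


def V() -> Traversal:
    # Start the traversal from every vertex ID.
    return Traversal(iter(VERTICES))


friends = V().has("name", "alice").out("knows").values("name").to_list()
```

Like a real Gremlin traversal, each `Traversal` here is consumed once it has been iterated.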
Schema and Data Modeling
• The schema consists of edge labels, property keys, vertex labels, and indexes.
• Schema elements can be defined explicitly or created implicitly.
• The schema can evolve over time without database downtime.
Schema - Edge Label Multiplicity
• MULTI: Allows multiple edges of the same label between any pair of vertices.
• SIMPLE: Allows at most one edge of the label between any pair of vertices (unique per label).
• MANY2ONE: Allows at most one outgoing edge of the label on any vertex.
• ONE2MANY: Allows at most one incoming edge of the label on any vertex.
• ONE2ONE: Allows at most one incoming and one outgoing edge of the label on any vertex.
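In JanusGraph a multiplicity is declared when an edge label is defined through the management API, for example `mgmt.makeEdgeLabel('mother').multiplicity(MANY2ONE).make()` (each person has at most one mother). The Python sketch below only checks the five rules against the edges of a single label; the function and enum are hypothetical, and treating (out, in) as an ordered pair for SIMPLE is an assumption of this sketch.

```python
from collections import Counter
from enum import Enum
from typing import Hashable, List, Tuple


class Multiplicity(Enum):
    MULTI = "MULTI"
    SIMPLE = "SIMPLE"
    MANY2ONE = "MANY2ONE"
    ONE2MANY = "ONE2MANY"
    ONE2ONE = "ONE2ONE"


def _more_than_one(counts: "Counter[Hashable]") -> bool:
    return any(n > 1 for n in counts.values())


def violates(mult: Multiplicity, edges: List[Tuple[int, int]]) -> bool:
    """Return True if (out_vertex, in_vertex) pairs of ONE edge label break `mult`."""
    if mult is Multiplicity.MULTI:
        return False                                   # parallel edges are allowed
    per_pair = Counter(edges)                          # edges between the same ordered pair
    per_out = Counter(out_v for out_v, _ in edges)     # outgoing edges per vertex
    per_in = Counter(in_v for _, in_v in edges)        # incoming edges per vertex
    if mult is Multiplicity.SIMPLE:
        return _more_than_one(per_pair)
    if mult is Multiplicity.MANY2ONE:
        return _more_than_one(per_out)
    if mult is Multiplicity.ONE2MANY:
        return _more_than_one(per_in)
    return _more_than_one(per_out) or _more_than_one(per_in)  # ONE2ONE
```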
Schema - Property Key Data Types
Schema - Property Key Cardinality
• SINGLE: At most one value per element.
• LIST: An arbitrary number of values per element; duplicates are allowed.
• SET: Multiple values per element, but no duplicates.
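In JanusGraph the cardinality is fixed when a property key is defined, e.g. `mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()`. The Python sketch below shows how each cardinality treats a newly added value. Storing every property as a Python list, and the `add_property` helper itself, are hypothetical simplifications, not JanusGraph's storage format.

```python
from enum import Enum
from typing import Any, Dict, List


class Cardinality(Enum):
    SINGLE = "SINGLE"
    LIST = "LIST"
    SET = "SET"


def add_property(element: Dict[str, List[Any]], key: str, value: Any, card: Cardinality) -> None:
    values = element.setdefault(key, [])
    if card is Cardinality.SINGLE:
        # SINGLE: the new value replaces any existing one (at most one value).
        values[:] = [value]
    elif card is Cardinality.SET:
        # SET: a value that is already present is ignored.
        if value not in values:
            values.append(value)
    else:
        # LIST: every value is kept, duplicates included.
        values.append(value)
```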
Storage Model
What is Graph Partitioning?
• When a JanusGraph cluster consists of multiple storage backend instances, the graph must be partitioned across those machines.
• JanusGraph stores the graph as an adjacency list, so the assignment of vertices to machines determines the partitioning.
• There are different ways to partition a graph:
  ◦ Random graph partitioning
  ◦ Explicit graph partitioning
Random Graph Partitioning
• Pros
  ◦ Very efficient
  ◦ Requires no configuration
  ◦ Results in balanced partitions
• Cons
  ◦ Query processing becomes less efficient as the cluster grows
  ◦ Requires more cross-instance communication to retrieve the desired query answer
Explicit Graph Partitioning
• Pros
  ◦ Ensures that strongly connected subgraphs are stored on the same instance
  ◦ Significantly reduces communication overhead
  ◦ Easy to set up
• Cons
  ◦ Only supported by storage backends with ordered keys
  ◦ Can cause hotspots
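To see why explicit partitioning reduces communication, the Python sketch below counts the edges whose endpoints land on different partitions under two placement rules, for a made-up graph of two tightly connected groups joined by one edge. The "random" rule is a deterministic stand-in (sum of character codes modulo the partition count), chosen only so the result is reproducible; JanusGraph's real partition assignment works differently. Both rules give balanced 3/3 partitions, but the stand-in cuts 5 of the 7 edges, while keeping each group together cuts only 1.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical graph: two tightly connected groups (a*, b*) joined by one edge.
EDGES: List[Tuple[str, str]] = [
    ("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
    ("a3", "b1"),
]
NUM_PARTITIONS = 2


def pseudo_random_partition(vertex: str) -> int:
    # Stand-in for random placement: ignores which vertices are connected.
    return sum(map(ord, vertex)) % NUM_PARTITIONS


def explicit_partition(vertex: str) -> int:
    # Explicit placement: keep each strongly connected group on one instance.
    return 0 if vertex.startswith("a") else 1


def cross_partition_edges(assign: Callable[[str], int]) -> int:
    # Every such edge requires cross-instance communication to traverse.
    return sum(1 for u, v in EDGES if assign(u) != assign(v))


def partition_sizes(assign: Callable[[str], int]) -> Dict[int, int]:
    sizes: Dict[int, int] = {}
    for vertex in sorted({x for edge in EDGES for x in edge}):
        p = assign(vertex)
        sizes[p] = sizes.get(p, 0) + 1
    return sizes
```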
Edge Cut & Vertex Cut
• Edge cut
  ◦ Vertices are hosted on separate machines, so an edge whose endpoints live on different machines is "cut".
  ◦ Optimization aims to reduce cross-machine communication and thereby improve query execution.
• Vertex cut (by label)
  ◦ A vertex label can be defined as partitioned, which means that all vertices of that label are partitioned across the cluster.
  ◦ In other words, each partition stores a subset of such a vertex's adjacency list.
  ◦ This addresses the hotspot issue caused by vertices with a large number of incident edges.
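A vertex cut splits the adjacency list of a single high-degree vertex so that no one machine has to hold all of its edges. The Python sketch below spreads one vertex's neighbours over partitions; the function name and the round-robin placement rule are hypothetical choices for illustration. With 10 incident edges and 4 partitions, no partition holds more than 3 of them, whereas an edge cut would keep all 10 with the vertex on a single machine.

```python
from collections import defaultdict
from typing import Dict, List


def split_adjacency(neighbours: List[str], num_partitions: int) -> Dict[int, List[str]]:
    """Vertex cut: spread ONE high-degree vertex's adjacency list over partitions.

    Each partition ends up holding only a subset of the vertex's incident edges.
    Round-robin placement is a hypothetical rule chosen for illustration.
    """
    parts: Dict[int, List[str]] = defaultdict(list)
    for i, neighbour in enumerate(neighbours):
        parts[i % num_partitions].append(neighbour)
    return dict(parts)


hub_neighbours = [f"n{i}" for i in range(10)]
parts = split_adjacency(hub_neighbours, 4)
```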
What is a Graph Index?
• Graph indexes: efficient retrieval of vertices or edges by their properties
  ◦ Composite index (supported through the primary storage backend)
  ◦ Mixed index (supported through an external indexing backend)
• Vertex-centric indexes: effectively address query performance for large-degree vertices
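A composite index maps an exact combination of property values to the elements that carry them, which is why it can live in the primary storage backend: lookups are equality-only and must constrain every indexed key. Range or full-text conditions need a mixed index in an external backend. The class below is a hypothetical in-memory model of that behaviour, not JanusGraph's implementation.

```python
from collections import defaultdict
from typing import Any, Dict, Set, Tuple


class CompositeIndex:
    """Hypothetical in-memory model of a composite index: exact-match lookups only."""

    def __init__(self, keys: Tuple[str, ...]):
        self.keys = keys
        self._entries: Dict[Tuple[Any, ...], Set[int]] = defaultdict(set)

    def add(self, element_id: int, props: Dict[str, Any]) -> None:
        # Only elements that have every indexed key are indexed.
        if all(k in props for k in self.keys):
            self._entries[tuple(props[k] for k in self.keys)].add(element_id)

    def lookup(self, **query: Any) -> Set[int]:
        # Equality conditions on ALL indexed keys are required.
        if set(query) != set(self.keys):
            raise ValueError("composite index needs an equality condition on every key")
        return set(self._entries.get(tuple(query[k] for k in self.keys), set()))


idx = CompositeIndex(("name",))
idx.add(1, {"name": "alice"})
idx.add(2, {"name": "bob"})
idx.add(3, {"age": 5})  # no "name" key, so it is not indexed
```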
JanusGraph with HBase
HBase – Perfect Storage Backend for JanusGraph
• Tight integration with the Apache Hadoop ecosystem.
• Native support for strong consistency.
• Linear scalability with the addition of more machines.
• Scalability and partitioning.
• Read and write speed.
• Big enough for your biggest graph.
• Support for exporting metrics via JMX.
• Great open community.
• Simple configuration:
    storage.backend=hbase
    storage.hostname=zk-host1,zk-host2,zk-host3
    storage.hbase.table=janusgraph
    storage.port=2181
    storage.hbase.ext.zookeeper.znode.parent=/hbase
• A variety of ways to read and write:
  ◦ Batch mutations
  ◦ Get or multi-get
  ◦ Key range scan
  ◦ ColumnRangeFilter
  ◦ ColumnPaginationFilter
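In HBase the cells of a row are sorted by column qualifier, and JanusGraph keeps a vertex's relations as columns in that vertex's row, so reading a slice of relations becomes a column-range read on one row. The Python sketch below mimics, on an in-memory sorted list, the semantics of HBase's `ColumnRangeFilter(minColumn, minColumnInclusive, maxColumn, maxColumnInclusive)` and `ColumnPaginationFilter(limit, offset)`. The qualifiers are made-up strings, not JanusGraph's binary column encoding.

```python
from bisect import bisect_left, bisect_right
from typing import List

# One HBase row modelled as its sorted column qualifiers (hypothetical data).
ROW: List[bytes] = sorted([b"e:01", b"e:02", b"e:03", b"p:name", b"p:age", b"l:label"])


def column_range(cols: List[bytes], lo: bytes, lo_incl: bool, hi: bytes, hi_incl: bool) -> List[bytes]:
    """Mimic ColumnRangeFilter: keep qualifiers between lo and hi."""
    start = bisect_left(cols, lo) if lo_incl else bisect_right(cols, lo)
    end = bisect_right(cols, hi) if hi_incl else bisect_left(cols, hi)
    return cols[start:end]


def paginate(cols: List[bytes], limit: int, offset: int) -> List[bytes]:
    """Mimic ColumnPaginationFilter: skip `offset` columns, then return up to `limit`."""
    return cols[offset:offset + limit]


# All "e:" columns: inclusive start b"e:", exclusive end b"e;" (';' sorts right after ':').
edge_cols = column_range(ROW, b"e:", True, b"e;", False)
page = paginate(edge_cols, 2, 1)
```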
HBase Storage Model - Column Families
Column family (CF) attributes, e.g. compression and TTL, can be set per store.
• Edge store -> e
• Index store -> g
• ID store -> i
• Transaction log store -> l
• System property store -> s
HBase Storage Model - Edge store -> e
• Stores vertex label, edge, and property data
• RowKey -> vertex ID, composed of:
  ◦ Count
  ◦ ID padding
  ◦ Partition ID
• Vertex labels are saved as edges
• Vertex properties and edges are saved as relations
  ◦ Relation ID (property key ID / edge label ID + direction)
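The row key is derived from a vertex ID that combines a count, ID padding and a partition ID. The Python sketch below uses a hypothetical bit layout: the 5 partition bits and 3 padding bits are assumptions, and JanusGraph's actual encoding (in its `IDManager` class) differs in detail. The sketch assumes the row key moves the partition bits to the front, so all vertices of one partition occupy a contiguous key range on an ordered-key backend such as HBase.

```python
PARTITION_BITS = 5   # assumption: at most 32 partitions
PADDING_BITS = 3     # assumption: width of the ID padding
KEY_BITS = 64
COUNT_BITS = KEY_BITS - PARTITION_BITS - PADDING_BITS


def make_vertex_id(count: int, partition: int, padding: int) -> int:
    """Pack [count | partition | padding] into one integer (hypothetical layout)."""
    if not (0 <= count < (1 << COUNT_BITS)
            and 0 <= partition < (1 << PARTITION_BITS)
            and 0 <= padding < (1 << PADDING_BITS)):
        raise ValueError("component out of range")
    return (count << (PARTITION_BITS + PADDING_BITS)) | (partition << PADDING_BITS) | padding


def row_key(vertex_id: int) -> bytes:
    """Reorder to [partition | count | padding] so each partition is a contiguous key range."""
    padding = vertex_id & ((1 << PADDING_BITS) - 1)
    partition = (vertex_id >> PADDING_BITS) & ((1 << PARTITION_BITS) - 1)
    count = vertex_id >> (PARTITION_BITS + PADDING_BITS)
    key = (partition << (KEY_BITS - PARTITION_BITS)) | (count << PADDING_BITS) | padding
    # Fixed-width big-endian bytes sort in the same order as the numeric key.
    return key.to_bytes(KEY_BITS // 8, "big")
```

Note how vertex ID order and row key order can disagree: a vertex with a large count in partition 1 has a larger ID than a vertex with a small count in partition 2, yet its row key sorts first.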
HBase Storage Model - Index store -> g
• Stores graph index (composite index) data
• RowKey -> property values
• Cell value ->
  ◦ relationId
  ◦ outVertexId
  ◦ typeId
  ◦ inVertexId
Optimization Suggestions
• hbase.regionserver.thread.compaction.large / hbase.regionserver.thread.compaction.small
• hbase.hstore.flusher.count
• hbase.hregion.memstore.flush.size
• hbase.hregion.memstore.block.multiplier
• hbase.hregion.percolumnfamilyflush.size.lower.bound
• hbase.regionserver.global.memstore.size
• hfile.block.cache.size
• hbase.regionserver.global.memstore.size.lower.limit (replaces the deprecated hbase.regionserver.global.memstore.lowerLimit)
• Random vs. explicit partitioning
Thanks
IP addressing and IPv6, presented by Paul Wilson at IETF 119APNIC
 
ETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptxETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptxNIMMANAGANTI RAMAKRISHNA
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书rnrncn29
 
Unidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxUnidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxmibuzondetrabajo
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
TRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxTRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxAndrieCagasanAkio
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书rnrncn29
 
Company Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxCompany Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxMario
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 

Último (11)

办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119
 
ETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptxETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptx
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
 
Unidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxUnidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptx
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
TRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxTRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptx
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
 
Company Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxCompany Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptx
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 

HBaseConAsia2018: Track2-5: JanusGraph-Distributed graph database with HBase

  • 1. hosted by HBaseConAsia2018 JanusGraph — Distributed graph database with HBase XueMin Zhang @ TalkingData
  • 2. Content: 01 About Us • 02 Something about Graph • 03 Introduction to JanusGraph • 04 JanusGraph with HBase
  • 3. Content: 01 About Us
  • 4. About Us. About me: • Seven years of practical experience in technical research and development (R&D), focusing on distributed storage, distributed computing, real-time computing, etc. • Worked successively at Sina Weibo and TalkingData, and served as big data Team Leader of the Sina R&D center. • Technical speaker at China Hadoop, Strata Hadoop/Data Conference, and DTCC. About TalkingData: • Founded in 2011, TalkingData is China’s leading third-party big data platform. With SmartDP at the core of its data intelligence application ecosystem, TalkingData empowers enterprises and helps them achieve a data-driven digital transformation. • From the beginning, TalkingData’s vision of using “big data for smarter business decisions and a better world” has allowed it to gradually become China’s leading data intelligence solution provider. TalkingData creates value for clients and serves as their “performance partner,” helping modern enterprises achieve data-driven transformation and accelerating the digitization of clients from various industries. Using data-generated insights to change how people see the world and themselves, TalkingData hopes to ultimately improve people’s lives.
  • 5. Content: 02 Something about Graph
  • 6. Something about Graph: What is a Graph Database • As the name suggests, it is a database. • It uses graph structures (nodes, edges, and properties) to represent and store data and to answer semantic queries. • It allows data in the store to be linked together directly. • Compared with traditional relational databases: • It handles hybrid relations. • It is handy for finding connections between entities.
  • 8. Something about Graph: Graph Structures - Vertices • Vertices are the nodes or points in a graph structure. • Each vertex may have a unique ID. • Vertices can be associated with a set of properties (key-value pairs).
  • 11. Something about Graph: Graph Structures - Edges • Edges are the connections between the vertices in a graph. • Edges can be undirected, directed, or bidirectional. • Like vertices, edges can have properties and an ID.
  • 12. Something about Graph: Graph Structures - Graph • G = (V, E) • A graph is the collection of vertices, edges, and their associated properties. • Vertices and edges can be classified by labels.
  • 13. Something about Graph: Graph Storage Model - Adjacency Matrix
      G.vertices = [1 2 3 4 5 6]
      G.edges (row i, column j is 1 when vertices i and j are adjacent):
            1 2 3 4 5 6
          1 0 1 1 1 0 0
          2 1 0 0 0 0 0
          3 1 0 0 1 0 1
          4 1 0 1 0 1 0
          5 0 0 0 1 0 0
          6 0 0 1 0 0 0
  • 14. Something about Graph: Graph Storage Model - Adjacency Lists (Λ marks the end of a list)
      1 → 2 → 3 → 4 → Λ
      2 → 1 → Λ
      3 → 1 → 4 → 6 → Λ
      4 → 1 → 3 → 5 → Λ
      5 → 4 → Λ
      6 → 3 → Λ
  • 15. Content: 03 Introduction to JanusGraph
  • 16. Introduction to JanusGraph • A scalable graph database distributed across multi-machine clusters, with pluggable storage and indexing. • Fully compliant with the Apache TinkerPop graph computing framework. • Optimized for storing and querying billions of vertices and edges. • Supports thousands of concurrent users. • Can execute local queries (OLTP) or cross-cluster distributed queries (OLAP). • Sponsored by the Linux Foundation. • Apache License 2.0.
  • 17. Introduction to JanusGraph: Architecture
  • 18. Introduction to JanusGraph: Apache TinkerPop & Gremlin • A graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP). • Gremlin graph traversal language.
  • 19. Introduction to JanusGraph: Schema and Data Modeling • The schema consists of edge labels, property keys, vertex labels, and indexes. • It can be defined explicitly or created implicitly. • It can evolve over time without database downtime.
  • 20. Introduction to JanusGraph: Schema - Edge Label Multiplicity • MULTI: multiple edges of the same label between any pair of vertices. • SIMPLE: at most one edge with that label between any pair of vertices. • MANY2ONE: at most one outgoing edge with that label per vertex. • ONE2MANY: at most one incoming edge with that label per vertex. • ONE2ONE: at most one incoming and one outgoing edge with that label per vertex.
  • 21. Introduction to JanusGraph: Schema - Property Key Data Types
  • 22. Introduction to JanusGraph: Schema - Property Key Cardinality • SINGLE: at most one value per element. • LIST: an arbitrary number of values per element; duplicates are allowed. • SET: multiple values per element, but no duplicates.
  • 23. Introduction to JanusGraph: Storage Model
  • 24. Introduction to JanusGraph: What is Graph Partitioning? • When a JanusGraph cluster consists of multiple storage backend instances, the graph must be partitioned across those machines. • JanusGraph stores the graph as an adjacency list, so the assignment of vertices to machines determines the partitioning. • There are different ways to partition a graph: • Random graph partitioning • Explicit graph partitioning
  • 25. Introduction to JanusGraph: Random Graph Partitioning • Pros: • Very efficient • Requires no configuration • Results in balanced partitions • Cons: • Less efficient query processing as the cluster grows • Requires more cross-instance communication to retrieve the desired query results
  • 26. Introduction to JanusGraph: Explicit Graph Partitioning • Pros: • Ensures strongly connected subgraphs are stored on the same instance • Reduces the communication overhead significantly • Easy to set up • Cons: • Only supported on storage backends with ordered keys • Hotspot issues
  • 27. Introduction to JanusGraph: Edge Cut & Vertex Cut • Edge cut: • Vertices are hosted on separate machines. • Optimization aims to reduce cross-machine communication and thereby improve query execution. • Vertex cut (by label): • A vertex label can be defined as partitioned, which means that all vertices of that label are partitioned across the cluster. • In other words, each partition stores a subset of such a vertex's adjacency list. • This addresses the hotspot issue caused by vertices with a large number of incident edges.
  • 28. Introduction to JanusGraph: What is a Graph Index? • Graph indexes: efficient retrieval of vertices or edges by their properties. • Composite index (supported through the primary storage backend) • Mixed index (supported through an external indexing backend) • Vertex-centric indexes: effectively address query performance for vertices with a large degree.
  • 29. Content: 04 JanusGraph with HBase
  • 30. JanusGraph with HBase: HBase – Perfect Storage Backend for JanusGraph • Tight integration with the Apache Hadoop ecosystem. • Native support for strong consistency. • Linear scalability with the addition of more machines. • Scalability and partitioning • Read and write speed • Big enough for your biggest graph • Support for exporting metrics via JMX. • Great open community
  • 31. JanusGraph with HBase: HBase – Perfect Storage Backend for JanusGraph • Simple configuration:
      storage.backend=hbase
      storage.hostname=zk-host1,zk-host2,zk-host3
      storage.hbase.table=janusgraph
      storage.port=2181
      storage.hbase.ext.zookeeper.znode.parent=/hbase
  • 32. JanusGraph with HBase: HBase – Perfect Storage Backend for JanusGraph • A variety of ways to read and write: • Batch mutations • Get or multi-get • Key range scan • ColumnRangeFilter • ColumnPaginationFilter
  • 33. JanusGraph with HBase: HBase Storage Model - Column Families • Column family attributes can be set, e.g. compression and TTL. • Edge store -> e • Index store -> g • ID store -> i • Transaction log store -> l • System property store -> s
  • 34. JanusGraph with HBase: HBase Storage Model - Edge store -> e • Stores vertex label, edge, and property data. • RowKey -> vertex ID, composed of: • Count • ID padding • Partition ID • The vertex label is saved as an edge. • Vertex properties and edges are saved as relations: • Relation ID (property key ID / edge label ID + direction)
  • 35. JanusGraph with HBase: HBase Storage Model - Edge store -> e
  • 36. JanusGraph with HBase: HBase Storage Model - Index store -> g • Stores graph index (composite index) data. • RowKey -> property values • Cell value -> • relationId • outVertexId • typeId • inVertexId
  • 37. JanusGraph with HBase: Optimization Suggestions • hbase.regionserver.thread.compaction.large/small • hbase.hstore.flusher.count • hbase.hregion.memstore.flush.size • hbase.hregion.memstore.block.multiplier • hbase.hregion.percolumnfamilyflush.size.lower.bound • hbase.regionserver.global.memstore.size • hfile.block.cache.size • hbase.regionserver.global.memstore.size.lower.limit (hbase.regionserver.global.memstore.lowerLimit) • Random vs. Explicit Partitioning
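
Slide 12's definition G = (V, E), together with the IDs, labels, and properties described on slides 8 and 11, can be made concrete with a small in-memory model. This is a sketch in plain Java under our own assumptions: the names `PropertyGraph`, `addVertex`, `addEdge`, and `out` are hypothetical and are not JanusGraph's API, which exposes the same model through Apache TinkerPop's `Vertex` and `Edge` interfaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a labeled property graph G = (V, E); not JanusGraph's API.
class PropertyGraph {
    // A vertex: an ID, a label that classifies it, and key-value properties.
    static final class Vertex {
        final long id;
        final String label;
        final Map<String, Object> properties = new HashMap<>();
        Vertex(long id, String label) { this.id = id; this.label = label; }
    }

    // A directed edge: like a vertex, it carries an ID, a label, and properties.
    static final class Edge {
        final long id;
        final String label;
        final Vertex out; // source vertex
        final Vertex in;  // target vertex
        final Map<String, Object> properties = new HashMap<>();
        Edge(long id, String label, Vertex out, Vertex in) {
            this.id = id; this.label = label; this.out = out; this.in = in;
        }
    }

    final Map<Long, Vertex> vertices = new HashMap<>();
    final List<Edge> edges = new ArrayList<>();
    private long nextId = 1; // one ID counter shared by vertices and edges, for simplicity

    Vertex addVertex(String label) {
        Vertex v = new Vertex(nextId++, label);
        vertices.put(v.id, v);
        return v;
    }

    Edge addEdge(Vertex out, String label, Vertex in) {
        Edge e = new Edge(nextId++, label, out, in);
        edges.add(e);
        return e;
    }

    // Targets of v's outgoing edges with the given label (linear scan for clarity).
    List<Vertex> out(Vertex v, String label) {
        List<Vertex> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.out == v && e.label.equals(label)) result.add(e.in);
        }
        return result;
    }

    public static void main(String[] args) {
        PropertyGraph g = new PropertyGraph();
        Vertex alice = g.addVertex("person");
        alice.properties.put("name", "alice");
        Vertex bob = g.addVertex("person");
        bob.properties.put("name", "bob");
        Edge knows = g.addEdge(alice, "knows", bob);
        knows.properties.put("since", 2015);
        System.out.println(g.out(alice, "knows").get(0).properties.get("name")); // prints bob
    }
}
```

Keeping every edge in one global list keeps the sketch short; a real store keeps each vertex's edges together, which is the adjacency-list model of slide 14.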
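
Both storage models on slides 13 and 14 can be derived from one edge set. The sketch below assumes the undirected six-vertex graph read off the slide-14 adjacency lists (edges 1-2, 1-3, 1-4, 3-4, 3-6, 4-5) and builds both representations; the class name `GraphStorageModels` is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Builds an adjacency matrix and adjacency lists for the same undirected graph.
class GraphStorageModels {
    // Undirected edges read off the slide-14 adjacency lists (vertices 1..6).
    static final int[][] EDGES = {{1, 2}, {1, 3}, {1, 4}, {3, 4}, {3, 6}, {4, 5}};

    // n x n matrix: cell [i-1][j-1] is 1 when vertices i and j are adjacent.
    // It always costs n * n cells, however few edges exist.
    static int[][] toMatrix(int n, int[][] edges) {
        int[][] m = new int[n][n];
        for (int[] e : edges) {
            m[e[0] - 1][e[1] - 1] = 1;
            m[e[1] - 1][e[0] - 1] = 1; // undirected: mirror the entry
        }
        return m;
    }

    // One list per vertex holding only its neighbours: space grows with n + edges.
    static List<List<Integer>> toLists(int n, int[][] edges) {
        List<List<Integer>> lists = new ArrayList<>();
        for (int i = 0; i < n; i++) lists.add(new ArrayList<>());
        for (int[] e : edges) {
            lists.get(e[0] - 1).add(e[1]);
            lists.get(e[1] - 1).add(e[0]);
        }
        return lists;
    }

    public static void main(String[] args) {
        int[][] m = toMatrix(6, EDGES);
        List<List<Integer>> lists = toLists(6, EDGES);
        for (int v = 1; v <= 6; v++) {
            System.out.println(v + " -> " + lists.get(v - 1) + "   row " + Arrays.toString(m[v - 1]));
        }
    }
}
```

For vertex 3 this prints the neighbour list `[1, 4, 6]`, matching slide 14. The matrix answers "are i and j adjacent?" in constant time, while the lists stay small for sparse graphs, which is the representation slide 24 says JanusGraph uses.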
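
Gremlin (slide 18) builds a query as a chain of steps, each transforming a stream of vertices. To show the idea without the TinkerPop jars, the sketch imitates such a chain with `java.util.stream` on a three-vertex fragment in the spirit of the "Graph of the Gods" example from the JanusGraph documentation. It is not the TinkerPop API: `MiniGremlin` and its methods are hypothetical, named after the real Gremlin steps `V()`, `has()`, `out()`, and `values()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy imitation of Gremlin's step chaining over an in-memory graph; NOT the TinkerPop API.
class MiniGremlin {
    final Map<Integer, Map<String, String>> props = new HashMap<>();
    // Key "outVertexId|label" -> IDs of the vertices those edges point to.
    final Map<String, List<Integer>> outEdges = new HashMap<>();

    void addVertex(int id, String name) {
        props.computeIfAbsent(id, k -> new HashMap<>()).put("name", name);
    }

    void addEdge(int from, String label, int to) {
        outEdges.computeIfAbsent(from + "|" + label, k -> new ArrayList<>()).add(to);
    }

    // Start step, like g.V(): a lazy stream over every vertex ID.
    Traversal V() { return new Traversal(props.keySet().stream()); }

    final class Traversal {
        private final Stream<Integer> current;
        Traversal(Stream<Integer> current) { this.current = current; }

        // Filter step, like has(key, value).
        Traversal has(String key, String value) {
            return new Traversal(current.filter(v -> value.equals(props.get(v).get(key))));
        }

        // Navigation step, like out(label): follow outgoing edges with that label.
        Traversal out(String label) {
            return new Traversal(current.flatMap(v ->
                outEdges.getOrDefault(v + "|" + label, Collections.<Integer>emptyList()).stream()));
        }

        // Terminal projection, like values(key) followed by toList().
        List<String> values(String key) {
            return current.map(v -> props.get(v).get(key)).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        MiniGremlin g = new MiniGremlin();
        g.addVertex(1, "saturn");
        g.addVertex(2, "jupiter");
        g.addVertex(3, "hercules");
        g.addEdge(3, "father", 2);
        g.addEdge(2, "father", 1);
        // Same shape as g.V().has("name", "hercules").out("father").out("father").values("name")
        System.out.println(g.V().has("name", "hercules").out("father").out("father").values("name")); // prints [saturn]
    }
}
```

Each step returns a new traversal wrapping a lazier stream, so nothing is evaluated until the terminal `values` call; TinkerPop traversals are likewise lazy until a terminal step.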
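
The multiplicity rules on slide 20 are constraints checked whenever an edge with a given label is added. In JanusGraph they are declared through the management API, for example `mgmt.makeEdgeLabel("mother").multiplicity(Multiplicity.MANY2ONE).make()`, and enforced by the database. The enum below is a hypothetical stand-alone model of the same rules; treating SIMPLE's "pair of vertices" as an ordered (out, in) pair is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of edge-label multiplicity; not JanusGraph's enforcement code.
enum EdgeMultiplicity {
    MULTI, SIMPLE, MANY2ONE, ONE2MANY, ONE2ONE;

    // May a new edge out -[label]-> in be added, given the existing edges with
    // the same label as {outVertexId, inVertexId} pairs?
    boolean allows(List<long[]> existing, long out, long in) {
        for (long[] e : existing) {
            boolean samePair = e[0] == out && e[1] == in; // assumption: ordered pair
            boolean outTaken = e[0] == out;               // `out` already has such an outgoing edge
            boolean inTaken = e[1] == in;                 // `in` already has such an incoming edge
            switch (this) {
                case MULTI:    break;                      // no restriction at all
                case SIMPLE:   if (samePair) return false; break;
                case MANY2ONE: if (outTaken) return false; break;
                case ONE2MANY: if (inTaken) return false; break;
                case ONE2ONE:  if (outTaken || inTaken) return false; break;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "mother" as MANY2ONE: each person has at most one mother.
        List<long[]> mother = new ArrayList<>();
        mother.add(new long[] {10, 1}); // person 10 -mother-> person 1
        System.out.println(MANY2ONE.allows(mother, 10, 2)); // prints false: a second mother
        System.out.println(MANY2ONE.allows(mother, 11, 1)); // prints true: siblings share a mother
    }
}
```

The linear scan keeps the rules readable; a database would check only the edges of the two endpoints involved.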
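
Cardinality (slide 22) decides what happens when a property key receives another value on the same element. The real declaration looks like `mgmt.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.SET).make()`; the enum here is only a hypothetical model, and treating a second SINGLE write as an overwrite is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;

// Hypothetical model of property-key cardinality for the values of one key on one element.
enum PropertyCardinality {
    SINGLE, LIST, SET;

    // Container for the values: a set drops duplicates, a list keeps them.
    Collection<Object> newContainer() {
        if (this == SET) return new LinkedHashSet<>();
        return new ArrayList<>();
    }

    // Adds a value according to the cardinality.
    void add(Collection<Object> current, Object value) {
        if (this == SINGLE) current.clear(); // assumption: a new SINGLE value replaces the old one
        current.add(value);                  // for SET, adding a duplicate is a no-op
    }

    public static void main(String[] args) {
        for (PropertyCardinality c : values()) {
            Collection<Object> phones = c.newContainer();
            c.add(phones, "555-0100");
            c.add(phones, "555-0100");
            c.add(phones, "555-0199");
            System.out.println(c + " -> " + phones);
        }
    }
}
```

`main` prints `SINGLE -> [555-0199]`, `LIST -> [555-0100, 555-0100, 555-0199]`, and `SET -> [555-0100, 555-0199]`.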
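
The trade-off between slides 25 and 26 shows up when counting edges whose endpoints land in different partitions, since each such edge costs cross-instance communication during a traversal. This is a toy model, not JanusGraph's placement code: the partition count of 32, the multiplicative hash, and all names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Contrasts random and explicit vertex placement by counting edges that cross partitions.
class PartitioningDemo {
    static final int PARTITIONS = 32; // assumed partition count for the sketch

    // Random partitioning: the partition is a hash of the vertex ID, so placement
    // needs no configuration and spreads vertices evenly, but ignores connectivity.
    static int randomPartition(long vertexId) {
        long h = vertexId * 0x9E3779B97F4A7C15L; // multiplicative (Fibonacci) hashing
        return (int) ((h >>> 32) % PARTITIONS);
    }

    // Every edge whose endpoints sit in different partitions needs cross-instance
    // communication when a traversal follows it.
    static int crossPartitionEdges(long[][] edges, Map<Long, Integer> placement) {
        int crossing = 0;
        for (long[] e : edges) {
            if (!placement.get(e[0]).equals(placement.get(e[1]))) crossing++;
        }
        return crossing;
    }

    public static void main(String[] args) {
        // Two tightly connected groups: {1, 2, 3} and {4, 5, 6}.
        long[][] edges = {{1, 2}, {2, 3}, {3, 1}, {4, 5}, {5, 6}, {6, 4}};

        Map<Long, Integer> random = new HashMap<>();
        for (long v = 1; v <= 6; v++) random.put(v, randomPartition(v));

        // Explicit partitioning: the application co-locates each group.
        Map<Long, Integer> explicit = new HashMap<>();
        for (long v = 1; v <= 3; v++) explicit.put(v, 0);
        for (long v = 4; v <= 6; v++) explicit.put(v, 1);

        System.out.println("random:   " + crossPartitionEdges(edges, random) + " crossing edges");
        System.out.println("explicit: " + crossPartitionEdges(edges, explicit) + " crossing edges");
    }
}
```

Explicit placement gives zero crossing edges for these two groups, while the random count depends on the hash. The flip side noted on slide 26 is that packing a large or busy group onto one instance creates a hotspot.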
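
A vertex cut (slide 27) spreads one vertex's adjacency list over several partitions, so no single instance stores every edge of a high-degree vertex. In JanusGraph a vertex label is marked for this with `mgmt.makeVertexLabel("user").partition().make()`. How JanusGraph decides which partition stores each slice is not covered on the slide, so the modulo rule below is purely an assumption for illustration, as is the class name.

```java
import java.util.ArrayList;
import java.util.List;

// Vertex cut sketch: one high-degree vertex's adjacency list is split across partitions,
// so no single partition holds (or serves) all of its edges.
class VertexCutDemo {
    static List<List<Long>> splitAdjacency(List<Long> neighbours, int partitions) {
        List<List<Long>> slices = new ArrayList<>();
        for (int p = 0; p < partitions; p++) slices.add(new ArrayList<>());
        for (long n : neighbours) {
            // Assumption: an edge goes to the slice chosen by its other endpoint's ID.
            slices.get((int) Math.floorMod(n, (long) partitions)).add(n);
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Long> followers = new ArrayList<>();
        for (long id = 0; id < 100_000; id++) followers.add(id); // a "celebrity" vertex
        List<List<Long>> slices = splitAdjacency(followers, 8);
        for (int p = 0; p < slices.size(); p++) {
            System.out.println("partition " + p + ": " + slices.get(p).size() + " edges");
        }
    }
}
```

With 100,000 neighbours and 8 partitions each slice holds 12,500 edges, instead of one instance holding all of them.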
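
The two index families on slide 28 answer different questions: a graph index finds elements by property value, while a vertex-centric index narrows down the edges of one vertex. The sketch models a composite-style index as an exact-match map and a vertex-centric index as a sorted map of one vertex's edges. In JanusGraph the real declarations go through the management API (`mgmt.buildIndex(...)` ending in `buildCompositeIndex()` or `buildMixedIndex(...)`, and `mgmt.buildEdgeIndex(...)`); the class here is hypothetical, and mixed indexes, which need an external backend, are not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Contrasts a composite-style graph index with a vertex-centric index (toy model).
class IndexDemo {
    // Composite-style: exact property value -> IDs of vertices holding it.
    // Supports equality lookups only, without scanning all vertices.
    final Map<Object, Set<Long>> byName = new HashMap<>();

    void indexVertex(long id, Object name) {
        byName.computeIfAbsent(name, k -> new HashSet<>()).add(id);
    }

    Set<Long> lookup(Object name) {
        return byName.getOrDefault(name, Collections.<Long>emptySet());
    }

    // Vertex-centric: the incident edges of ONE vertex, sorted by an edge property
    // (a timestamp here), so a range query on a high-degree vertex touches only
    // the matching edges instead of all of them.
    final NavigableMap<Long, List<Long>> edgesByTime = new TreeMap<>();

    void addEdge(long time, long otherVertex) {
        edgesByTime.computeIfAbsent(time, k -> new ArrayList<>()).add(otherVertex);
    }

    // Neighbours reached through edges with from <= time < to.
    List<Long> neighboursBetween(long from, long to) {
        List<Long> result = new ArrayList<>();
        for (List<Long> vs : edgesByTime.subMap(from, true, to, false).values()) result.addAll(vs);
        return result;
    }

    public static void main(String[] args) {
        IndexDemo idx = new IndexDemo();
        idx.indexVertex(1, "hercules");
        idx.indexVertex(2, "jupiter");
        System.out.println(idx.lookup("hercules")); // prints [1]
        idx.addEdge(10, 100);
        idx.addEdge(20, 200);
        idx.addEdge(30, 300);
        System.out.println(idx.neighboursBetween(15, 31)); // prints [200, 300]
    }
}
```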
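
The configuration on slide 31 is an ordinary Java properties file. In JanusGraph's HBase backend, `storage.hostname` and `storage.port` name the ZooKeeper quorum that the HBase client connects to, and keys prefixed with `storage.hbase.ext.` are passed through to the HBase client configuration. The sketch only parses the slide's settings with the JDK; actually opening the graph (mentioned in a comment as `JanusGraphFactory.open`) requires the JanusGraph HBase jars and is not exercised here. The class name is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Parses the slide's HBase backend settings with java.util.Properties.
class HBaseBackendConfig {
    // Same key=value lines as the slide; normally these live in a .properties file.
    static final String CONFIG =
        "storage.backend=hbase\n"
        + "storage.hostname=zk-host1,zk-host2,zk-host3\n"
        + "storage.hbase.table=janusgraph\n"
        + "storage.port=2181\n"
        + "storage.hbase.ext.zookeeper.znode.parent=/hbase\n";

    static Properties load() {
        Properties p = new Properties();
        try {
            p.load(new StringReader(CONFIG));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return p;
    }

    public static void main(String[] args) {
        Properties p = load();
        // With the JanusGraph HBase module on the classpath, the file form of this
        // configuration would be opened via JanusGraphFactory.open("<path>.properties").
        for (String host : p.getProperty("storage.hostname").split(",")) {
            System.out.println("ZooKeeper host: " + host + ":" + p.getProperty("storage.port"));
        }
    }
}
```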
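
Because the edge store keeps a vertex's relations as the columns of one row (slide 34), the column filters listed on slide 32 make it possible to read only a slice of an adjacency list. The real HBase classes are `ColumnRangeFilter(byte[] minColumn, boolean minColumnInclusive, byte[] maxColumn, boolean maxColumnInclusive)` and `ColumnPaginationFilter(int limit, int offset)` in `org.apache.hadoop.hbase.filter`. This sketch does not call them; it emulates their semantics on a `TreeMap` standing in for one sorted row. The readable qualifiers such as `e:knows:2` are hypothetical, since JanusGraph writes binary-encoded columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Emulates two HBase column filters on one row; a TreeMap keeps qualifiers sorted
// (for ASCII qualifiers this matches HBase's byte-wise ordering).
class RowSliceDemo {
    // Like ColumnRangeFilter(min, true, max, false): qualifiers in [min, max).
    static List<String> columnRange(NavigableMap<String, String> row, String min, String max) {
        return new ArrayList<>(row.subMap(min, true, max, false).keySet());
    }

    // Like ColumnPaginationFilter(limit, offset): skip `offset` columns, return up to `limit`.
    static List<String> columnPage(NavigableMap<String, String> row, int limit, int offset) {
        List<String> page = new ArrayList<>();
        int i = 0;
        for (String q : row.keySet()) {
            if (i++ < offset) continue;
            if (page.size() == limit) break;
            page.add(q);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (String q : new String[] {"e:knows:2", "e:knows:7", "e:likes:3", "p:age", "p:name"}) {
            row.put(q, "...");
        }
        // ';' sorts right after ':', so this range covers every "e:knows:" column.
        System.out.println(columnRange(row, "e:knows:", "e:knows;")); // prints [e:knows:2, e:knows:7]
        System.out.println(columnPage(row, 2, 1));                    // prints [e:knows:7, e:likes:3]
    }
}
```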
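
Slide 34 says the edge-store row key is the vertex ID, built from a count, ID padding, and a partition ID. The sketch packs those three fields into a 64-bit long and derives a row key that puts the partition first, so that all vertices of one partition are contiguous in HBase's sorted key space (consistent with slide 26's note that explicit partitioning needs ordered keys). The bit widths, the field order, and the rotation are assumptions for illustration; JanusGraph's actual ID layout may differ.

```java
// Hypothetical bit layout for a vertex ID and its row key; all widths are assumptions.
class VertexIdLayout {
    static final int PARTITION_BITS = 5; // assumed: up to 32 partitions
    static final int PADDING_BITS = 3;   // assumed: low bits tag the kind of vertex
    static final int COUNT_BITS = 63 - PARTITION_BITS - PADDING_BITS; // keep the sign bit clear

    // Vertex ID: [count][partition][padding], count in the high bits.
    static long vertexId(long count, int partition, int padding) {
        return (count << (PARTITION_BITS + PADDING_BITS))
             | ((long) partition << PADDING_BITS)
             | padding;
    }

    // Row key: move the partition to the front, giving [partition][count][padding],
    // so every vertex of one partition falls into one contiguous key range.
    static long rowKey(long vertexId) {
        long padding = vertexId & ((1L << PADDING_BITS) - 1);
        long partition = (vertexId >>> PADDING_BITS) & ((1L << PARTITION_BITS) - 1);
        long count = vertexId >>> (PARTITION_BITS + PADDING_BITS);
        return (partition << (COUNT_BITS + PADDING_BITS)) | (count << PADDING_BITS) | padding;
    }

    static int partitionOfRowKey(long rowKey) {
        return (int) (rowKey >>> (COUNT_BITS + PADDING_BITS));
    }

    public static void main(String[] args) {
        long a = vertexId(1000, 3, 0);
        long b = vertexId(1, 4, 0);
        // Raw IDs order by count first, so the partition-3 vertex sorts after the partition-4 one...
        System.out.println(a > b);                 // prints true
        // ...but row keys group by partition, so partition 3 precedes partition 4.
        System.out.println(rowKey(a) < rowKey(b)); // prints true
    }
}
```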