The Perfect Fit: Scalable Graph for Big Data

Grab some coffee and enjoy the pre-show banter before the top of the hour!

The Briefing Room
The Perfect Fit: Scalable Graph for Big Data

Twitter Tag: #briefr
Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh

Mission
•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today's innovative technologies
•  Give vendors a chance to explain their product to savvy analysts
•  Allow audience members to pose serious questions... and get answers!

Topics
June: INNOVATORS
July: SQL INNOVATION
August: REAL-TIME DATA

When You’re Hot…
Ø  Biggest Web engines use
graph
Ø  Very powerful for finding
relationships
Ø  More versatile than other
DB formats
Ø  Great for unwinding
complex scenarios

Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor

SYSTAP
•  SYSTAP builds highly scalable open source solutions for big graphs
•  Its flagship product is Blazegraph, a platform that supports semantic web and graph database APIs. It features fault-tolerant storage and query capabilities, plus online backup and failover.
•  Blazegraph achieves its scale and high throughput by leveraging GPU acceleration via its Mapgraph technology

Guest: Brad Bebee
Brad Bebee is the CEO and Managing Partner at SYSTAP, LLC. Brad leads the efforts to use SYSTAP technologies for high performance graph databases and analytics to deliver solutions for multiple business and mission areas. Over the course of his career, he has served as a CTO and CFO, managed operating divisions, and performed advanced technology development for commercial and government customers. He is an active contributor to SYSTAP’s open source software projects. His technology experience ranges from early work in modeling methodologies and knowledge representation dating back to precursors of DARPA’s DAML program to more recent work with large scale data analytics using the Hadoop ecosystem, Accumulo, and related technologies. He has extensive experience in architecture and software modeling methodologies, where he has led and collaborated on multiple publications, receiving recognition for his research.

http://blazegraph.com/
The Perfect Fit: Scalable Graph for Big Data
June 30, 2015
Bloor Group Briefing Room

Big Data Startup Award Winner: 2015 Big Data Innovations Summit

Helping customers achieve their business objectives with graph data is our vision, mission, and the essence of our software solutions.

Today, we serve Fortune 500 companies, startups, governments, and research organizations with technology to power their graphs.

Graph Databases Grew at Over 500% in the Last Two Years
[Chart: Popularity changes per category – March 2015. Labeled series: Graph Databases]

The Amount of Graph Data is Exploding
Billion+ Edges
SYSTAP™, LLC
© 2006-2015 All Rights Reserved

Graph Applications are Everywhere
•  Community Detection / Clustering
•  Recommendation Systems
•  Fault Prediction in Industrial and Internet of Things (IoT)
•  Drug Discovery / Repurposing
•  Precision Medicine / Genomics
•  Fraud Detection
•  Time Series, Compliance
•  Cyber
•  Defense / Security

Graphs are different. You need the right paradigm and hardware to scale
Graph Cache Thrash
The CPU just waits for graph data from main memory...
[Chart: Access Latency Per Clock Cycle by Type of Cache or Memory. Source: https://datatake.files.wordpress.com/2015/09/latency.png]

Solutions to the Graph Scaling Problem Using Graph Databases and GPUs
●  Embedded
●  High Availability
●  Scale-out
●  GPU Acceleration
●  100s of Times Faster than CPU main memory-based systems
●  Up to 40X Cheaper
●  10,000X Faster than disk-based technologies

Shin, Dmitriy et al. "Uncovering influence links in molecular knowledge networks to streamline personalized medicine." Journal of Biomedical Informatics, Volume 52, 394-405
Finding the Next Cure for Cancer is a Billion+ Edge Graph Challenge

Graph is BIG and changing 
(Trillion+ Edges)

Graphs Enable People to Find Knowledge
[Images: "A Bunch of Pages" versus "An Answer"]

Graphs Enable Enterprises to Manage Metadata
•  Data outlives specific system implementations.
•  Data outlives applications.
•  Achieve metadata independence using declarative standards to manage metadata and express transformations.
[Diagram labels: Data Sources; Data Providers; Knowledge Graph: Instance Data + Ontology (RDF + OWL); ACLs; Query Catalog; Constraints; Rules; Events; Mappings; Widgets; Views]

Knowledge Base of Biology (KaBOB)
[Diagram labels: Open Biomedical Ontologies; biomedical data & information; application data; biomedical knowledge]
•  17 databases: Entrez Gene, DIP, UniProt, GOA, GAD, HGNC, InterPro, …
•  12 ontologies: Gene Ontology, Sequence Ontology, Cell Type Ontology, ChEBI, NCBI Taxonomy, Protein Ontology, …

Powering Their Graphs with Blazegraph™
•  Information Management / Retrieval
•  Genomics / Precision Medicine
•  Defense, Intel, Cyber

The right scaling approach depends on the business need
[Chart: Speed (Fast to Fastest) versus Data Scale in Edges (Millions, Billions, Trillions)]
•  Embedded / Single Server (JVM, Journal): 50B
•  High Availability: 50B
•  Scale Out: 1T+
•  Single GPU: 500+M
•  Multi-GPU Clusters: 100+B

Blazegraph™ stands out!
•  Wikimedia Evaluation: https://docs.google.com/a/systap.com/spreadsheets/d/1MXikljoSUVP77w7JKf9EXN40OB-ZkMqT8Y5b2NYVKbU/edit#gid=0

Blazegraph™: Embedded and Single Server
•  High performance, Scalable
–  50B edges/node
–  RDF/SPARQL level query language
–  Efficient Graph Traversal
–  High 9s solution
•  Property graphs
–  Blueprints, Gremlin, Rexster
•  REST API (NSS)
•  Extension points
–  Stored queries for custom application logic on the server.
–  Custom services & indices
–  Custom functions
–  Vertex-centric programs
•  Embedded Server
•  Standalone Server
[Diagram labels: JVM, Journal; WAR, Journal]
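As an illustration of the "RDF/SPARQL" and "REST API (NSS)" bullets, here is a minimal sketch of how a client might package a SPARQL query as an HTTP request following the W3C SPARQL 1.1 Protocol. The endpoint URL is an assumption (a placeholder for a locally running server), not something stated on the slide, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder endpoint for a local server; adjust to your deployment.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"

# A generic query that returns up to ten arbitrary triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
""".strip()


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query using a form-encoded POST."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


request = build_sparql_request(ENDPOINT, QUERY)

# To run it against a live server (not executed here):
#   from urllib.request import urlopen
#   import json
#   with urlopen(request) as response:
#       bindings = json.load(response)["results"]["bindings"]
```

Building the request separately from sending it keeps the sketch runnable without a server and makes the wire format easy to inspect.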

Blazegraph™: High Availability
•  Shared nothing architecture
–  Same data on each node
–  Coordinate only at commit
–  Transparent load balancing
•  Scaling
–  50 billion triples or quads
–  Query throughput scales linearly
•  Self healing
–  Automatic failover
–  Automatic resync after disconnect
–  Online single node disaster recovery
•  Online Backup
–  Online snapshots (full backups)
–  HA Logs (incremental backups)
•  Point in time recovery (offline)
[Diagram labels: Quorum (k=3, size=3); three HAService instances; leader; follower]
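The quorum on this slide (k=3) and the "Coordinate only at commit" bullet can be pictured with a small sketch. It assumes a simple-majority rule, which the slide does not spell out, and the class and service names are hypothetical rather than Blazegraph's API.

```python
from dataclasses import dataclass, field


@dataclass
class Quorum:
    """Toy model of a replicated cluster that coordinates only at commit time."""

    k: int                                    # replication factor (k=3 on the slide)
    joined: set = field(default_factory=set)  # services currently in the quorum

    @property
    def required(self) -> int:
        # Assumption: a simple majority of the k replicas must agree.
        return self.k // 2 + 1

    def is_met(self) -> bool:
        return len(self.joined) >= self.required

    def commit(self, acks: set) -> bool:
        """A commit succeeds only if a majority of joined services acknowledge it."""
        return self.is_met() and len(acks & self.joined) >= self.required


quorum = Quorum(k=3, joined={"A", "B", "C"})
all_up = quorum.commit({"A", "B", "C"})   # all three services acknowledge

quorum.joined.discard("C")                # one service disconnects
one_down = quorum.commit({"A", "B"})      # two of three still form a majority

quorum.joined.discard("B")                # a second service disconnects
two_down = quorum.commit({"A"})           # a single service cannot commit alone
```

Under this assumption a three-node cluster keeps accepting commits after losing one service and stops after losing two, which is the behavior that makes automatic failover safe.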

Blazegraph™: Scale-out
•  Shard-based horizontal scale-out to support 1 Trillion+ Edge Graphs
•  Fast parallel load
•  Efficient Query Through Coordination Between Data Services
•  Coming soon! Support for HDFS for failover.
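The slide does not say how shards are assigned to data services. One common scheme, assumed here purely for illustration, is key-range partitioning: sorted split keys divide the index key space into contiguous ranges, and each range lives on one service. The class and service names below are hypothetical.

```python
from bisect import bisect_right


class KeyRangeRouter:
    """Route an index key to the data service that owns its key range."""

    def __init__(self, split_keys, services):
        # N sorted split keys define N + 1 contiguous ranges, one per service.
        if len(services) != len(split_keys) + 1:
            raise ValueError("need exactly one more service than split keys")
        if list(split_keys) != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        self.split_keys = list(split_keys)
        self.services = list(services)

    def locate(self, key):
        # bisect_right places a key equal to a split key in the range to its right,
        # so each range is [split_key, next_split_key).
        return self.services[bisect_right(self.split_keys, key)]


router = KeyRangeRouter(split_keys=["g", "p"], services=["ds0", "ds1", "ds2"])

owner_low = router.locate("apple")    # sorts before "g"
owner_mid = router.locate("graph")    # falls in ["g", "p")
owner_high = router.locate("zebra")   # sorts at or after "p"
```

Range partitioning keeps neighboring keys on the same service, so a range scan touches few services, and a shard that grows too large can be split by adding one more split key.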

How do I use GPUs to scale graphs?
●  Parallel Processing on GPU Clusters for Trillion+ Edge Graphs
●  High-Level API
●  Partitioning and Overlapping Communications
●  HPC and DARPA Pedigree
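To make the parallel-processing bullet concrete, here is a CPU-only sketch of level-synchronous breadth-first search, the kind of frontier expansion that GPU graph frameworks typically parallelize. It is not the Mapgraph API; the function name and the sample graph are hypothetical.

```python
def bfs_levels(adjacency, source):
    """Level-synchronous BFS: expand the whole frontier, then move to the next level.

    Returns the depth of each reachable vertex and the number of edges traversed.
    """
    depth = {source: 0}
    frontier = [source]
    traversed_edges = 0
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        # On a GPU, the work for each frontier vertex (or each of its edges)
        # would typically be handed to a separate thread.
        for vertex in frontier:
            for neighbor in adjacency.get(vertex, ()):
                traversed_edges += 1
                if neighbor not in depth:
                    depth[neighbor] = level
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return depth, traversed_edges


# Hypothetical five-vertex graph stored as adjacency lists.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
depth, edge_count = bfs_levels(graph, source=0)
```

Every vertex in a frontier can be processed independently, which is why this formulation suits hardware with thousands of threads. The edge count divided by elapsed time gives a traversed-edges-per-second figure.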

Blazegraph GPU: Ridiculously Fast for Graphs
Blazegraph™ plug-in for GPU Acceleration with familiar graph APIs
[Diagram label: Graph DB]

Blazegraph Multi-GPU: Extreme Scale, 40X more Affordable!
•  Mapgraph HPC with NVIDIA GPUs: $16K / GTEP (K40 - Today), $4K / GTEP (Pascal 2016)
•  Cray XMT-2: ~$180K / GTEP
•  Large Hadoop Cluster: ~$18M / GTEP
•  Future Blazegraph SaaS: On-demand
1 GTEP = 1 Billion Traversed Edges Per Second
[Chart callouts: 40X! 10X!]

Perceptions & Questions
Analyst:
Robin Bloor

Of Graphs and Networks
Robin Bloor, PhD

Johnny-Come-Lately
Aside from the three-letter agencies, until recently, nobody cared much about graphs…
WHY?

Reasons for Graph Apathy…
1  Unfamiliarity (it’s obscure because it’s obscure)
2  RDBMS do not store graphs well and SQL is inadequate for querying graphs
3  No common BI applications, it’s mainly analytics
4  Semantic technology has taken a lifetime to evolve

Reasons to Care
u  Graphs express very
different (and important)
data relationships
u  Graphs are largely
unexplored
u  Graphs are ideal for MDM
u  Graphs express semantic
relationships

Semantics: The Type 0 Language
Colorless green ideas sleep furiously
[Diagram labels: Colorless green; ideas; sleep; furiously]

The Net Net
The ultimate goal is INFERENCING: Knowledge discovery (rather than pattern discovery) through graph processing
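A tiny example may help show what inferencing over a graph means in practice. The sketch below forward-chains two RDFS-style rules (subclass transitivity and type propagation) over a handful of triples. The triples are illustrative sample data, not taken from the presentation, and the predicate names are simplified stand-ins for rdfs:subClassOf and rdf:type.

```python
def infer(triples):
    """Apply two RDFS-style rules until no new triples appear.

    Rule 1: (a subClassOf b) and (b subClassOf c) implies (a subClassOf c)
    Rule 2: (x type b) and (b subClassOf c) implies (x type c)
    """
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        derived = set()
        for subject, predicate, obj in facts:
            if predicate not in ("subClassOf", "type"):
                continue
            for parent_s, parent_p, parent_o in facts:
                # Join on the shared class: obj must be the subject of a subClassOf edge.
                if parent_p == "subClassOf" and parent_s == obj:
                    derived.add((subject, predicate, parent_o))
        if not derived <= facts:
            facts |= derived
            changed = True
    return facts


# Illustrative sample data only.
asserted = {
    ("aspirin", "type", "NSAID"),
    ("NSAID", "subClassOf", "AntiInflammatory"),
    ("AntiInflammatory", "subClassOf", "Drug"),
}
closure = infer(asserted)
discovered = closure - asserted
```

None of the three discovered triples was stated explicitly; they follow from the graph's structure. That is the distinction the slide draws between discovering knowledge and discovering patterns.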

u  What are the “low hanging fruit” graphical
applications – in your company’s experience?
u  Does your company find itself competing
with Hadoop Giraph? What are the
compelling differences?
u  Is Blazegraph a triple-store at the physical
level (i.e., a pure RDF implementation) or
does it implement a variety of physical
structures?

u  At what level of data volume/workload is
hardware acceleration a necessity?
u  What is the largest amount of data currently
under management with any of your customers?
u  Which companies/technologies do you compete
with directly?

Upcoming Topics
www.insideanalysis.com
June: INNOVATORS
July: SQL INNOVATION
August: REAL-TIME DATA

THANK YOU for your ATTENTION!
Some images provided courtesy of Wikimedia Commons
