The document discusses SYSTAP and its graph database product, Blazegraph. It provides an overview of SYSTAP and Blazegraph, highlighting that Blazegraph can scale to graph datasets with billions or trillions of edges through several deployment options: embedded, high availability, scale-out, and GPU acceleration. The document also discusses how organizations use Blazegraph for applications such as knowledge graphs, genomics, and defense/intelligence.
3. Twitter Tag: #briefr The Briefing Room
Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
4. Mission
Reveal the essential characteristics of enterprise software, good and bad
Provide a forum for detailed analysis of today's innovative technologies
Give vendors a chance to explain their product to savvy analysts
Allow audience members to pose serious questions... and get answers!
5. Topics
June: INNOVATORS
July: SQL INNOVATION
August: REAL-TIME DATA
6. When You're Hot…
• Biggest Web engines use graphs
• Very powerful for finding relationships
• More versatile than other DB formats
• Great for unwinding complex scenarios
7. Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor
8. SYSTAP
SYSTAP builds highly scalable open source solutions for big graphs.
Its flagship product is Blazegraph, a platform that supports semantic web and graph database APIs. It features fault-tolerant storage and query capabilities, plus online backup and failover.
Blazegraph achieves its scale and high throughput by leveraging GPU acceleration via its MapGraph technology.
9. Guest: Brad Bebee
Brad Bebee is the CEO and Managing Partner at SYSTAP, LLC. Brad leads the efforts to use SYSTAP technologies for high-performance graph databases and analytics to deliver solutions for multiple business and mission areas. Over the course of his career, he has served as CTO and CFO, managed operating divisions, and performed advanced technology development for commercial and government customers. He is an active contributor to SYSTAP's open source software projects. His technology experience ranges from early work in modeling methodologies and knowledge representation, dating back to precursors of DARPA's DAML program, to more recent work with large-scale data analytics using the Hadoop ecosystem, Accumulo, and related technologies. He has extensive experience in architecture and software modeling methodologies, where he has led and collaborated on multiple publications that received recognition for his research.
11. http://blazegraph.com/
Big Data Startup Award Winner: 2015 Big Data Innovations Summit
Helping customers achieve their business objectives with graph data is our vision, mission, and the essence of our software solutions. Today, we serve Fortune 500 companies, startups, governments, and research organizations with technology to power their graphs.
16. Solutions to the Graph Scaling Problem Using Graph Databases and GPUs
● Embedded
● High Availability
● Scale-out
● GPU Acceleration
● 100s of Times Faster than CPU main-memory-based systems
● Up to 40X Cheaper
● 10,000X Faster than disk-based technologies
17. Finding the Next Cure for Cancer is a Billion+ Edge Graph Challenge
Uncovering influence links in molecular knowledge networks to streamline personalized medicine. Shin, Dmitriy et al. Journal of Biomedical Informatics, Volume 52, 394-405.
25. Blazegraph™: Embedded and Single Server
• High performance, scalable
  – 50B edges/node
  – RDF/SPARQL query language
  – Efficient graph traversal
  – High 9s solution
• Property graphs
  – Blueprints, Gremlin, Rexster
• REST API (NanoSparqlServer, NSS)
• Extension points
  – Stored queries for custom application logic on the server
  – Custom services & indices
  – Custom functions
  – Vertex-centric programs
• Embedded server
• Standalone server
[Diagram: embedded server (JVM + Journal) and standalone server (WAR + Journal)]
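Because the REST API is assumed here to follow the standard SPARQL 1.1 Protocol, any HTTP client can query it. The sketch below builds a query-via-GET request with only the Python standard library. The endpoint URL `http://localhost:9999/blazegraph/sparql` is an assumption (a typical local standalone setup), and `build_sparql_request` and `run_query` are hypothetical helper names, not part of Blazegraph.

```python
import json
import urllib.parse
import urllib.request

# Assumption: endpoint of a local standalone server; adjust for your deployment.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"

QUERY = """SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10"""


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    # The protocol carries the query text in the 'query' URL parameter.
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Request the standard SPARQL JSON results format.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}, method="GET"
    )


def run_query(req: urllib.request.Request) -> dict:
    """Send the request to a running server and decode the JSON results."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = build_sparql_request(QUERY)
    print(req.full_url)
    # With a server running: print(run_query(req)["results"]["bindings"])
```

Keeping request construction separate from sending makes the request easy to inspect or log before any network call is made.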
26. Blazegraph™: High Availability
• Shared-nothing architecture
  – Same data on each node
  – Coordinate only at commit
  – Transparent load balancing
• Scaling
  – 50 billion triples or quads
  – Query throughput scales linearly
• Self-healing
  – Automatic failover
  – Automatic resync after disconnect
  – Online single-node disaster recovery
• Online backup
  – Online snapshots (full backups)
  – HA logs (incremental backups)
• Point-in-time recovery (offline)
[Diagram: HAService quorum (k=3, size=3) with one leader and two follower HAService instances]
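The quorum in the HA diagram has k=3 services. A minimal sketch, assuming simple-majority quorum semantics (the quorum "meets" while at least (k+1)/2 services are joined), shows why three nodes tolerate one failure. The `HAService` class here is a toy stand-in, and the leader-election rule (first joined service wins) is hypothetical, not Blazegraph's actual protocol.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HAService:
    name: str
    joined: bool  # True if the service is running and in sync with the quorum


def majority(k: int) -> int:
    """Smallest simple majority of a replication factor k (k assumed odd)."""
    if k < 1 or k % 2 == 0:
        raise ValueError("k is assumed to be a positive odd number")
    return (k + 1) // 2


def quorum_meets(services: List[HAService], k: int) -> bool:
    """The quorum 'meets' while a simple majority of the k services are joined."""
    return sum(s.joined for s in services) >= majority(k)


def choose_leader(services: List[HAService], k: int) -> Optional[str]:
    """Hypothetical election rule: the first joined service in list order leads."""
    if not quorum_meets(services, k):
        return None  # no quorum, so no leader (and, in this sketch, no writes)
    return next(s.name for s in services if s.joined)
```

With k=3, losing the leader still leaves two joined services, so the quorum meets and a follower takes over (automatic failover); losing a second service breaks the quorum.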
27. Blazegraph™: Scale-out
• Shard-based horizontal scale-out to support 1 Trillion+ Edge Graphs
• Fast parallel load
• Efficient Query Through Coordination Between Data Services
• Coming soon! Support for HDFS for failover.
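Shard-based scale-out spreads each index across data services. The sketch below assumes key-range partitioning, one common way to shard sorted indexes such as triple indexes: each shard owns a contiguous range of keys, and routing a key is a binary search over the split points. The `KeyRangeShardMap` class, the split keys, and the service names are hypothetical illustrations, not Blazegraph's partition metadata.

```python
import bisect
from typing import List


class KeyRangeShardMap:
    """Routes sorted keys to shards by key range (hypothetical illustration).

    split_keys[i] is the first key owned by shard i + 1; shard 0 owns
    every key that sorts before split_keys[0].
    """

    def __init__(self, split_keys: List[str], services: List[str]):
        if len(services) != len(split_keys) + 1:
            raise ValueError("need exactly one more service than split keys")
        if split_keys != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        self.split_keys = split_keys
        self.services = services

    def shard_for(self, key: str) -> int:
        # bisect_right counts the split keys <= key, which is the shard index.
        return bisect.bisect_right(self.split_keys, key)

    def service_for(self, key: str) -> str:
        return self.services[self.shard_for(key)]

    def shards_for_range(self, lo: str, hi: str) -> List[int]:
        """Shards that a range scan over [lo, hi] must visit (lo <= hi)."""
        return list(range(self.shard_for(lo), self.shard_for(hi) + 1))
```

Because neighboring keys stay on the same shard, a range scan (for example, all triples sharing a subject prefix) touches only the shards whose ranges overlap it, and the data services can then work on those shards in parallel.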
28. How do I use GPUs to scale graphs?
● Parallel Processing on GPU Clusters for Trillion+ Edge Graphs
● High-Level API
● Partitioning and Overlapping Communications
● HPC and DARPA Pedigree
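GPU graph engines commonly process a graph one frontier at a time: every vertex in the current frontier expands its edges in parallel, and the newly reached vertices form the next frontier. The sketch below runs that level-synchronous breadth-first search sequentially in Python over a graph in compressed sparse row (CSR) form, the layout GPUs typically use. It only illustrates the data-parallel structure; `frontier_bfs` is a hypothetical name and this is not MapGraph's API.

```python
from typing import List


def frontier_bfs(row_offsets: List[int], col_indices: List[int], source: int) -> List[int]:
    """Level-synchronous BFS over a graph in CSR form.

    Vertex v's out-neighbors are col_indices[row_offsets[v]:row_offsets[v + 1]].
    Returns the BFS depth of every vertex (-1 if unreachable).
    """
    n = len(row_offsets) - 1
    depth = [-1] * n
    depth[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        next_frontier = []
        # On a GPU, each frontier vertex (or each edge) would get its own
        # thread; here the expansion runs sequentially.
        for v in frontier:
            for e in range(row_offsets[v], row_offsets[v + 1]):
                u = col_indices[e]
                if depth[u] == -1:  # first visit; a GPU would claim u atomically
                    depth[u] = level + 1
                    next_frontier.append(u)
        frontier = next_frontier
        level += 1
    return depth
```

For the edges 0→1, 0→2, 1→3, 2→3 plus an isolated vertex 4, the CSR arrays are `row_offsets = [0, 2, 3, 4, 4, 4]` and `col_indices = [1, 2, 3, 3]`, and a search from vertex 0 yields depths `[0, 1, 1, 2, -1]`.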
34. Reasons for Graph Apathy…
1 Unfamiliarity (it's obscure because it's obscure)
2 RDBMSs do not store graphs well, and SQL is inadequate for querying graphs
3 No common BI applications; it's mainly analytics
4 Semantic technology has taken a lifetime to evolve
35. Reasons to Care
• Graphs express very different (and important) data relationships
• Graphs are largely unexplored
• Graphs are ideal for MDM (master data management)
• Graphs express semantic relationships
36. Semantics: The Type 0 Language
Colorless green ideas sleep furiously
[Diagram: the sentence broken into its parts "Colorless green", "ideas", "sleep", "furiously"]
37. The Net Net
The ultimate goal is INFERENCING: knowledge discovery (rather than pattern discovery) through graph processing
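Inferencing means deriving facts that were never stored explicitly. A minimal sketch, assuming just two RDFS entailment rules (rdfs11: subclass transitivity; rdfs9: type propagation along subclass links), computes the closure of a small triple set by forward chaining until no new triples appear. The `ex:` terms are made-up examples, and production reasoners support many more rules and far more efficient evaluation.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Forward-chain two RDFS rules until a fixed point is reached."""
    closure = set(triples)
    while True:
        new: Set[Triple] = set()
        subclass = [(s, o) for s, p, o in closure if p == SUBCLASS]
        for a, b in subclass:
            # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS, c))
            # rdfs9: x type A, A subClassOf B  =>  x type B
            for x, p, cls in closure:
                if p == TYPE and cls == a:
                    new.add((x, TYPE, b))
        if new <= closure:  # nothing new: fixed point reached
            return closure
        closure |= new
```

Given `ex:BRCA1 rdf:type ex:Gene`, `ex:Gene rdfs:subClassOf ex:Sequence`, and `ex:Sequence rdfs:subClassOf ex:BiologicalEntity`, the closure adds `ex:BRCA1 rdf:type ex:BiologicalEntity`, a fact no stored triple states directly.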
38.
• What are the “low-hanging fruit” graph applications in your company's experience?
• Does your company find itself competing with Apache Giraph (on Hadoop)? What are the compelling differences?
• Is Blazegraph a triple store at the physical level (i.e., a pure RDF implementation), or does it implement a variety of physical structures?
39.
• At what level of data volume or workload does hardware acceleration become a necessity?
• What is the largest amount of data currently under management at any of your customers?
• Which companies or technologies do you compete with directly?