2.
[Figure: timeline from 1960 to 2030 showing data growing in SCALE and COMPLEXITY. Database generations: Punch cards & Tapes, Navigational Databases, Relational/SQL Databases, NoSQL & NewSQL Databases, then "?". Applications: Record Keeping, Business Intelligence (BI), Web Applications, Artificial Intelligence (AI).]
AI SYSTEMS PROCESS KNOWLEDGE THAT IS TOO COMPLEX FOR CURRENT DATABASES
Follow us @GraknLabs
3.
[Figure: the same 1960 to 2030 timeline of database generations and applications, along the SCALE and COMPLEXITY axes]
WHAT RELATIONAL DID FOR BI IS WHAT GRAKN WILL DO FOR AI
4.
What is the problem with complex data?
Too complex to model: current modelling techniques are based only on binary relationships. Result: could not model complex domains.
Too complex to query: current query languages only let you query explicitly stored data. Result: could not simplify verbose queries.
Too expensive analytics: distributed algorithms (BSP) are expensive to write and not reusable. Result: could not reuse analytics algorithms.
DB QLs are too low-level: no strong abstraction over low-level constructs and complex relationships. Result: difficult to work with complex data.
5.
GRAKN.AI is a hyper-relational database for knowledge-oriented systems,
i.e. GRAKN.AI is a knowledge base:
Knowledge Storage System: a novel knowledge representation system based on hypergraph theory
Knowledge Inference: an OLTP reasoning engine
Knowledge Analytics: OLAP distributed analytics
6.
What is a hyper-relational database?
Hyper-expressive schema: a flexible entity-relationship, concept-level schema to build knowledge models. Result: model complex domains.
Real-time inference: automated deductive reasoning over data points at runtime (OLTP). Result: derive implicit facts and simplify queries.
Analytics as a language: automated distributed algorithms (BSP) exposed as a language (OLAP). Result: automated large-scale analytics.
High-level query language: strong abstraction over low-level constructs and complex relationships. Result: easier to work with complex data.
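To make "hyper-relational" concrete, here is a minimal Python sketch under assumed names: `Thing`, `Relationship` and the `wedding`, `venue`, `Town Hall` and `Carol` data are all hypothetical, not Grakn's API or data. A relationship maps any number of roles to role players, unlike a binary edge with exactly two ends, and because a relationship is itself a `Thing` it can play a role in another relationship.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Thing:
    """Any concept instance: an entity or a relationship."""
    type_name: str
    label: str = ""


@dataclass(eq=False)
class Relationship(Thing):
    """An n-ary relationship: each role maps to a list of role players.

    It can have any number of roles (a binary edge has exactly two ends),
    and since it is a Thing it can itself play a role in another relationship.
    """
    roles: dict = field(default_factory=dict)  # role name -> [Thing, ...]

    def add(self, role: str, player: Thing) -> "Relationship":
        self.roles.setdefault(role, []).append(player)
        return self

    def players(self, role: str) -> list:
        return self.roles.get(role, [])


# Hypothetical example data.
alice = Thing("person", "Alice")
bob = Thing("person", "Bob")
marriage = Relationship("marriage").add("wife", alice).add("husband", bob)

# A ternary relationship whose first role player is itself a relationship.
wedding = (
    Relationship("wedding")
    .add("celebrated", marriage)
    .add("location", Thing("venue", "Town Hall"))
    .add("officiant", Thing("person", "Carol"))
)
```

Modelling `wedding` with binary edges only would require an extra node to reify it; here the three roles sit on one relationship.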
8.
THE CENTRAL DOGMA
Replication: DNA to DNA
Transcription: DNA to RNA
Translation: RNA to proteins
Francis Crick, 1958; Nobel Prize winner 1962
9.
A SMALL SAMPLE…
https://www.ncbi.nlm.nih.gov
http://www.uniprot.org
http://www.geneontology.org
http://reactome.org
http://www.mirbase.org
http://mircancer.ecu.edu
http://bioinfo.life.hust.edu.cn/miRNASNP2/index.php
http://mirtarbase.mbc.nctu.edu.tw
http://www.genenames.org
http://www.microrna.org/microrna/home.do
11.
Schema Example: Basic Model
[Figure: the Employment relationship relates the roles Employee and Employer. Person plays Employee; Company plays Employer. Person and Company each has Name.]
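The basic model can be sketched as plain data plus a role check. This is a Python illustration, not Graql: the `SCHEMA` dictionary and the functions `can_play`, `can_have` and `valid_relationship` are hypothetical names, and type names are lower-cased.

```python
# Hypothetical encoding of the basic model: relates, plays and has.
SCHEMA = {
    "relates": {"employment": {"employee", "employer"}},
    "plays": {"person": {"employee"}, "company": {"employer"}},
    "has": {"person": {"name"}, "company": {"name"}},
}


def can_play(type_name: str, role: str) -> bool:
    """True if instances of type_name may fill the given role."""
    return role in SCHEMA["plays"].get(type_name, set())


def can_have(type_name: str, attribute: str) -> bool:
    """True if instances of type_name may own the given attribute."""
    return attribute in SCHEMA["has"].get(type_name, set())


def valid_relationship(rel_type: str, role_players: dict) -> bool:
    """Check each role belongs to rel_type and each player's type may play it.

    role_players maps a role name to the type of the instance filling it.
    """
    allowed_roles = SCHEMA["relates"].get(rel_type, set())
    return all(
        role in allowed_roles and can_play(player_type, role)
        for role, player_type in role_players.items()
    )
```

For example, a person as employee and a company as employer is accepted, while swapping them is rejected because the schema never says a company plays employee.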
12.
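Schema Example: Type-Hierarchy
[Figure: the basic model extended with subtypes: Customer sub Person, Startup sub Company. Employment relates Employee and Employer; Person (and its subtype Customer) plays Employee; Company (and its subtype Startup) plays Employer; Person and Company each has Name.]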
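A sketch of how `sub` lets a subtype inherit what its supertype `plays`, assuming single inheritance and the hypothetical dictionaries `SUB` and `PLAYS` (Python, not Graql):

```python
# Hypothetical encoding of the type hierarchy on the slide.
SUB = {"customer": "person", "startup": "company"}  # subtype -> supertype
PLAYS = {"person": {"employee"}, "company": {"employer"}}


def supertypes(type_name):
    """Yield type_name, then each ancestor along its sub chain."""
    while type_name is not None:
        yield type_name
        type_name = SUB.get(type_name)


def can_play(type_name, role):
    """A type may play a role if it, or any of its supertypes, plays that role."""
    return any(role in PLAYS.get(t, set()) for t in supertypes(type_name))
```

Only `person` declares that it plays employee, yet `can_play("customer", "employee")` is true; adding a new subtype needs no extra `plays` declarations.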
18.
THE CENTRAL DOGMA: INFERRED
Replication: DNA to DNA
Transcription: DNA to RNA
Translation: RNA to proteins
Francis Crick, 1958; Nobel Prize winner 1962
21.
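Schema Example: Type-Hierarchy
[Figure: the type-hierarchy model (Customer sub Person, Startup sub Company; Employment relates Employee and Employer; Person plays Employee, Company plays Employer; both has Name) extended with a Marriage relationship that relates the roles Husband and Wife, both played by Person.]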
22.
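Valid Data Insertion
[Figure: Alice, Bob, IBM and Grakn, with type labels customer, person and startup. A marriage (mar) relationship has Alice as wife and Bob as husband; two employment (emp) relationships have IBM and Grakn as employer.]
✓ Write commit success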
23.
Invalid Data Insertions – [Intelligent] Schema Constraints Are Back!
[Figure: a marriage (mar) relationship with Charlie (a person) as husband and Apple (a company) as wife]
❌ Write commit fails
❌ Invalid relationship
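The commit-time checks on the valid and invalid insertion slides can be sketched as follows. Everything here is a hypothetical Python model, not Grakn's API: the `Transaction` class, the `SchemaViolation` exception, the assumption that Person plays both Husband and Wife, and the type assignments and employments for Alice, Bob, IBM and Grakn, which are chosen for illustration. A commit validates every pending relationship and is rejected if any role player's type (or one of its supertypes) may not play its role.

```python
class SchemaViolation(Exception):
    """Raised when a commit contains a relationship the schema forbids."""


# Hypothetical schema: type hierarchy plus Employment and Marriage.
SUB = {"customer": "person", "startup": "company"}
RELATES = {"employment": {"employee", "employer"}, "marriage": {"husband", "wife"}}
PLAYS = {"person": {"employee", "husband", "wife"}, "company": {"employer"}}


def can_play(type_name, role):
    """Walk up the sub chain until some type declares that it plays the role."""
    while type_name is not None:
        if role in PLAYS.get(type_name, set()):
            return True
        type_name = SUB.get(type_name)
    return False


class Transaction:
    def __init__(self):
        self.types = {}    # instance name -> type name
        self.pending = []  # (relationship type, {role: instance name})

    def insert(self, name, type_name):
        self.types[name] = type_name

    def relate(self, rel_type, **role_players):
        self.pending.append((rel_type, role_players))

    def commit(self):
        """Validate every pending relationship, raising on the first violation.

        This sketch does not persist anything; it only performs the checks.
        """
        for rel_type, role_players in self.pending:
            for role, name in role_players.items():
                if role not in RELATES.get(rel_type, set()):
                    raise SchemaViolation(f"{rel_type} does not relate {role}")
                if not can_play(self.types[name], role):
                    raise SchemaViolation(
                        f"{name} ({self.types[name]}) cannot play {role} in {rel_type}"
                    )
        return "Write commit success"


def try_commit(tx):
    try:
        return tx.commit()
    except SchemaViolation as err:
        return f"Write commit fails: {err}"


# Valid insertion (type assignments and employments are illustrative).
ok = Transaction()
for name, type_name in [("Alice", "customer"), ("Bob", "person"),
                        ("IBM", "company"), ("Grakn", "startup")]:
    ok.insert(name, type_name)
ok.relate("marriage", wife="Alice", husband="Bob")
ok.relate("employment", employee="Alice", employer="IBM")
ok.relate("employment", employee="Bob", employer="Grakn")

# Invalid insertion: a company cannot play wife.
bad = Transaction()
bad.insert("Charlie", "person")
bad.insert("Apple", "company")
bad.relate("marriage", husband="Charlie", wife="Apple")

valid_result = try_commit(ok)
invalid_result = try_commit(bad)
```

Alice is accepted as a wife only through inheritance (customer sub person), which is why the type hierarchy and the constraints belong in the same schema.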
29.
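Complex Query Example
[Figure: Alice (Full-time Emp), Bob (Part-time Emp) and Charlie (Temporary Emp) are each in a drive relationship with a vehicle: AB123 (Bus), BC234 (Van), CD345 (Truck). The vehicles are in travel relationships with places in a location (loc) hierarchy: Kings Cross (Ward) is located in London (City), which is located in the UK (Country).]
Who are all the drivers that will be arriving in the UK?
The query would be very long and complex in SQL, NoSQL or even graph databases.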
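To see why the question gets verbose without inference, here is a Python sketch over flat, explicitly stored facts. The driver-to-vehicle pairing follows the order on the slide, the travel destinations are assumed for illustration, and the function name is hypothetical. Every employee subtype and every level of the location hierarchy has to be spelled out by hand, and a place nested one level deeper than the query expects would be silently missed.

```python
# Hypothetical flat data modelled on the slide. The driver-to-vehicle pairing
# follows the slide's order; the travel destinations are assumptions.
PERSON_TYPE = {"Alice": "full-time-employee", "Bob": "part-time-employee",
               "Charlie": "temporary-employee"}
DRIVE = [("Alice", "AB123"), ("Bob", "BC234"), ("Charlie", "CD345")]
TRAVEL = [("AB123", "Kings Cross"), ("BC234", "London"), ("CD345", "UK")]
LOC = [("Kings Cross", "London"), ("London", "UK")]  # (place, located in)


def drivers_arriving_in(country):
    """Explicit query: every employee subtype and every hop is hard-coded."""
    employee_types = {"full-time-employee", "part-time-employee",
                      "temporary-employee"}
    result = set()
    for person, vehicle in DRIVE:
        if PERSON_TYPE[person] not in employee_types:
            continue
        for travelling, place in TRAVEL:
            if travelling != vehicle:
                continue
            if place == country:  # 0 hops: destination is the country itself
                result.add(person)
            for p1, parent1 in LOC:
                if p1 != place:
                    continue
                if parent1 == country:  # 1 hop: e.g. a city in the country
                    result.add(person)
                for p2, parent2 in LOC:  # 2 hops: e.g. a ward in a city
                    if p2 == parent1 and parent2 == country:
                        result.add(person)
    return result
```

Supporting a fourth level (say, a street inside a ward) would mean adding yet another nested loop.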
30.
Complex Query Example: Type and Relationship Inference
[Figure: the same employees, vehicles and locations as in the previous complex query example]
Who are all the drivers that will be arriving in the UK?
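With inference, the question becomes a short query over derived facts. A minimal forward-chaining sketch in Python, assuming two hypothetical rules (located-in is transitive; each employee subtype is an employee) and illustrative data: the driver-to-vehicle pairing follows the slide's order and the travel destinations are assumed. Function names are hypothetical, and this is not how Grakn's reasoner is implemented.

```python
# Illustrative facts: driver-to-vehicle pairing follows the slide's order,
# travel destinations are assumptions.
PERSON_TYPE = {"Alice": "full-time-employee", "Bob": "part-time-employee",
               "Charlie": "temporary-employee"}
SUB = {"full-time-employee": "employee", "part-time-employee": "employee",
       "temporary-employee": "employee"}
DRIVE = {("Alice", "AB123"), ("Bob", "BC234"), ("Charlie", "CD345")}
TRAVEL = {("AB123", "Kings Cross"), ("BC234", "London"), ("CD345", "UK")}
LOC = {("Kings Cross", "London"), ("London", "UK")}  # (place, located in)


def is_a(type_name, ancestor):
    """Type inference: follow the sub chain looking for ancestor."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = SUB.get(type_name)
    return False


def transitive_closure(pairs):
    """Rule: if (a, b) and (b, c) hold, derive (a, c); repeat to a fixpoint."""
    facts = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in facts for (c, d) in facts if b == c}
        if derived <= facts:
            return facts
        facts |= derived


def drivers_arriving_in(country):
    """Short query over inferred facts: no hop depth or subtype list needed."""
    located = transitive_closure(LOC)
    arriving = {v for v, place in TRAVEL
                if place == country or (place, country) in located}
    return {person for person, vehicle in DRIVE
            if vehicle in arriving and is_a(PERSON_TYPE[person], "employee")}
```

The query itself no longer mentions wards, cities or employee subtypes; the rules handle any depth of location nesting and any new subtype of employee.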
31.
THE ANALYTICS OLAP LANGUAGE
Large-scale analytics is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it too.
At the end of the day, very few people know how to code it.
32.
Example of a Distributed Analytics Algorithm
For each vertex V:
  Superstep 1:
    V sends its own id via both outgoing and incoming edges;
    V sets its own id as its cluster label.
  Superstep n (n > 1):
    For every received message m, compare it to the current cluster label L; if m > L, set the label to m.
    If the cluster label has not changed in this superstep, vote to halt;
    else, send the new cluster label via all edges.
    A halted vertex becomes active again if it receives a message.
Global operation:
  While not every vertex has voted to halt (or messages are still in transit) and n < N, run superstep n + 1.
Connected Components: a clustering algorithm (pseudocode)
An efficient implementation of this algorithm is about 200 lines of Java code.
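The connected-components pseudocode can be simulated in a single process to check its logic. This Python sketch is not a distributed implementation: one loop iteration stands for one superstep, messages are buffered between supersteps, and a vertex that receives no messages stays halted. `connected_components` and `max_supersteps` (the N of the pseudocode) are hypothetical names. Vertex ids must be comparable, and vertices with no edges are not included.

```python
from collections import defaultdict


def connected_components(edges, max_supersteps=100):
    """Simulate BSP max-label propagation; returns {vertex: cluster label}."""
    neighbours = defaultdict(set)
    for u, v in edges:
        # Messages travel along both outgoing and incoming edges.
        neighbours[u].add(v)
        neighbours[v].add(u)

    # Superstep 1: each vertex takes its own id as label and sends it out.
    label = {v: v for v in neighbours}
    inbox = defaultdict(list)
    for v in neighbours:
        for n in neighbours[v]:
            inbox[n].append(label[v])

    superstep = 1
    # Continue while messages are in transit and the superstep limit allows.
    while inbox and superstep < max_supersteps:
        superstep += 1
        outbox = defaultdict(list)  # messages delivered in the next superstep
        for v, messages in inbox.items():
            best = max(messages)
            if best > label[v]:
                label[v] = best
                for n in neighbours[v]:
                    outbox[n].append(best)
            # Otherwise v votes to halt; only a new message can wake it.
        inbox = outbox
    return label
```

For the edges `[(1, 2), (2, 3), (4, 5)]` every vertex ends up labelled with the largest id in its component, so the labels are `{1: 3, 2: 3, 3: 3, 4: 5, 5: 5}`. A real deployment would partition vertices across workers; the message-passing logic stays the same.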
34.
Graql Distributed Analytics Queries
And we’ll continue to add more algorithms to the language, such as PageRank, k-core, triangle count, density, cliques, centrality and so on.
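"Analytics as a language" means each algorithm is implemented once and then invoked by a short query. A single-process Python sketch of that idea: algorithms register under a keyword, and a toy `compute <algorithm>;` string dispatches to them. The query format, `compute`, `algorithm` and the `ALGORITHMS` registry are hypothetical simplifications, not Graql syntax, and nothing here is distributed.

```python
from collections import Counter, defaultdict

ALGORITHMS = {}  # hypothetical registry: query keyword -> function(edges)


def algorithm(name):
    """Register a reusable analytics function under a query keyword."""
    def register(fn):
        ALGORITHMS[name] = fn
        return fn
    return register


@algorithm("count")
def count(edges):
    """Number of distinct vertices touched by the edges."""
    return len({v for edge in edges for v in edge})


@algorithm("degree")
def degree(edges):
    """Number of edges incident to each vertex."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


@algorithm("cluster")
def cluster(edges):
    """Connected components via union-find, as sorted lists of vertices."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = defaultdict(set)
    for v in list(parent):
        groups[find(v)].add(v)
    return sorted(sorted(g) for g in groups.values())


def compute(query, edges):
    """Parse 'compute <algorithm>;' and dispatch to the registered function."""
    words = query.strip().rstrip(";").split()
    if len(words) != 2 or words[0] != "compute" or words[1] not in ALGORITHMS:
        raise ValueError(f"unsupported query: {query!r}")
    return ALGORITHMS[words[1]](edges)
```

Adding an algorithm such as PageRank or triangle count would mean registering one more function; every caller then reaches it through the same one-line query instead of writing the algorithm again.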