A complete rundown of graph databases by Aneesh Mon from Mixed Nuts, a meetup organized by Pramati Technologies in Chennai. Mixed Nuts hosts meetups and workshops on a diverse range of tech topics.
https://www.pramati.com/
https://blog.imaginea.com/
Graph db - Pramati Technologies [Meetup]
1. So what are graph databases pistas at?
Aneesh Mon N
06-Nov-2019
Chennai
2. About Mixed-Nuts-at-Pramati
Mixed Nuts is a meetup organized by Pramati Technologies in
Chennai. Meetups and Workshops on a diverse range of tech
topics are hosted here.
Who are we?
Website: https://www.pramati.com/
Blog: https://blog.imaginea.com/
3. About Me
Aneesh has been working with huge data sets and is proficient in
integrating, centralizing and maintaining data using database, ETL
and Solr skills. Recently he has started exploring graph
databases to address problems that are hard to solve with an RDBMS.
LinkedIn: https://www.linkedin.com/in/aneeshmonn
4. Agenda
1. RDBMS Challenges
2. GraphDB and Why GraphDB?
a. Popular GraphDBs
b. Popular Query languages for GraphDB
3. Property Graph Model
4. Neo4j
a. Neo4j Cypher & Sandbox
5. IPL Data Analysis: GraphDB vs RDBMS
6. Demo
6. RDBMS Challenges
1. RDBMS performance degrades when the data is highly connected
a. highly connected data requires a lot of joins
b. beyond about 7 self/recursive joins, the RDBMS starts to get really slow
2. RDBMS aren’t designed to capture rich relationships and
lack flexibility
a. don’t adapt well to change
b. fixed schema
c. designed around problems that are known up front
3. Complex queries
a. query syntax becomes complex and large as the joins increase
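The join growth described above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module and a hypothetical `friends(src, dst)` table (neither comes from the talk) and generates the SQL needed to walk k hops away from a starting person. Each extra hop adds one more self-join, so the query text grows with the depth of the connection being explored.

```python
import sqlite3


def k_hop_sql(k: int) -> str:
    """Build SQL that returns everyone exactly k hops away from :start."""
    # One additional self-join of the (hypothetical) friends table per extra hop.
    joins = "".join(
        f"\n  JOIN friends f{i} ON f{i}.src = f{i - 1}.dst"
        for i in range(2, k + 1)
    )
    return f"SELECT DISTINCT f{k}.dst\nFROM friends f1{joins}\nWHERE f1.src = :start"


# Toy data: a simple chain a -> b -> c -> d -> e.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO friends VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")],
)


def reachable(start: str, k: int) -> list[str]:
    """Run the generated k-hop query and return the matching names, sorted."""
    rows = conn.execute(k_hop_sql(k), {"start": start}).fetchall()
    return sorted(row[0] for row in rows)


print(k_hop_sql(3))       # the generated query contains two self-joins
print(reachable("a", 3))  # people exactly three hops from "a"
```

In a graph query language the same question is usually a single variable-length path pattern, which is the contrast the later slides draw.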
7. GraphDB and Why GraphDB?
● GraphDB
○ designed to treat the relationships between data as equally
important to the data itself
○ intended to hold data without constricting it to a pre-defined
model
○ data is stored the way we first sketch it out
● Why GraphDB?
○ It's a connected world! There are no isolated pieces of information
○ stores connections alongside the data
○ graph databases excel at managing highly-connected data
9. Popular Query languages for GraphDB
● Cypher: a declarative graph query language for Neo4j that
enables ad hoc and programmatic (SQL-like) access to the
graph
● GraphQL: an open-source data query and manipulation
language for APIs
● Gremlin: a graph traversal language that is part of the
Apache TinkerPop open-source project
● AQL (ArangoDB Query Language): a SQL-like query language
used in ArangoDB for both documents and graphs
10. Property Graph Model
● Nodes
○ entities in the graph
○ can be tagged with labels, representing their different roles in your domain
○ can hold any number of attributes (key-value pairs) called properties
● Relationships
○ provide directed, named, semantically-relevant connections between two node entities
○ have a direction, a type, a start node, and an end node
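The model above can be sketched as two small Python data classes. This is only an illustration of the concepts, not how any particular graph database stores its data. The `Node` and `Relationship` class names and the match name are hypothetical, while the `Team` label, the `abb` property and the `PLAYED` relationship type are borrowed from the IPL queries later in the deck.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """An entity: zero or more labels plus key-value properties."""
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed connection from a start node to an end node."""
    type: str
    start: Node
    end: Node
    properties: dict[str, Any] = field(default_factory=dict)


# A team node, a match node, and a relationship pointing from team to match.
csk = Node({"Team"}, {"abb": "CSK"})
match = Node({"Match"}, {"name": "hypothetical match"})
played = Relationship("PLAYED", start=csk, end=match)

print(played.type, played.start.properties["abb"], "->", played.end.properties["name"])
```

Note that a relationship can carry properties of its own, just like a node.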
11. Neo4j
● Open-source, NoSQL, native graph database
● Provides full database characteristics, including ACID
transaction compliance
● Implements the property graph model down to the storage
level
● Uses Cypher Query Language
● Constant-time traversals from a node to its neighbors (index-free adjacency)
● Flexible property graph schema
● Drivers for popular programming languages
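One way to picture the traversal point above is an in-memory adjacency map: each node keeps direct references to its neighbors, so a hop is a lookup on the node itself rather than a join across a whole table. The sketch below is not Neo4j's storage engine. The toy graph and the `within_hops` helper are hypothetical and only illustrate the idea.

```python
from collections import deque

# Hypothetical toy graph: every node stores its own outgoing neighbors.
adjacency = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}


def within_hops(start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first walk returning the hop count of each node within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency[node]:  # direct lookup, no table scan or join
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


print(within_hops("a", 2))
```

The work done per hop depends on how many neighbors the current node has, not on how large the whole graph is.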
12. Neo4j Cypher & Sandbox
● A playground for getting familiar with Neo4j
● https://neo4j.com/sandbox-v2/
16. Q1. List all the matches played by CSK.
Cypher:
MATCH p = ()<-[:PLAYED]-(team:Team)
WHERE team.abb = "CSK"
RETURN p
LIMIT 25
SQL:
SELECT
  s.season,
  (SELECT team_name FROM graphdb.teams WHERE id = m.home_team_id)
    || ' vs ' ||
  (SELECT team_name FROM graphdb.teams WHERE id = m.away_team_id)
FROM graphdb.seasons s
INNER JOIN graphdb.matches m ON m.season_id = s.id
INNER JOIN graphdb.teams ON teams.id = ANY(ARRAY[m.home_team_id, m.away_team_id])
WHERE teams.short_name = 'CSK';
17. Q2. Winning percentage for the toss winner, season by season
Cypher:
MATCH (s:Season)-[:PLAYED_IN]-(m:Match)
WITH s, count(m) AS match_count
MATCH (s)-[:PLAYED_IN]-(m1:Match)-[:TOSS_WON_BY]-(t:Team)
WHERE (m1)-[:WON_BY]-(t)
WITH s, match_count, count(m1) AS win_count
RETURN s.name,
       round((win_count * 1.0 / match_count) * 100) AS game_win_pct,
       win_count,
       match_count
ORDER BY game_win_pct DESC
SQL:
SELECT
  season AS season,
  round((toss_winner_winner_cnt / total_matches), 2) * 100 AS win_percentage,
  toss_winner_winner_cnt,
  total_matches
FROM (
  SELECT
    season,
    count(DISTINCT m.id) FILTER (WHERE toss_winner_team_id = match_winner_team_id)::numeric AS toss_winner_winner_cnt,
    count(DISTINCT m.id)::numeric AS total_matches
  FROM graphdb.seasons s
  INNER JOIN graphdb.matches m ON m.season_id = s.id
  INNER JOIN graphdb.match_win_info mw ON mw.match_id = m.id
  GROUP BY 1
) t
ORDER BY 2 DESC;
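The arithmetic both queries perform can be checked by hand on a few rows. The sketch below uses made-up data (the season names, team names and results are hypothetical, not IPL results) and a hypothetical `toss_win_pct` helper that counts, per season, the matches where the toss winner also won the match and divides by the total number of matches.

```python
from collections import defaultdict

# Hypothetical rows: (season, toss_winner, match_winner).
matches = [
    ("S1", "A", "A"),
    ("S1", "B", "A"),
    ("S1", "B", "B"),
    ("S2", "C", "A"),
    ("S2", "A", "A"),
]


def toss_win_pct(rows):
    """Return {season: (rounded win percentage, win_count, match_count)}."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for season, toss_winner, match_winner in rows:
        totals[season] += 1
        if toss_winner == match_winner:
            wins[season] += 1
    return {
        season: (round(wins[season] * 100 / totals[season]), wins[season], totals[season])
        for season in totals
    }


print(toss_win_pct(matches))
```

For season S1, two of the three matches were won by the toss winner, which rounds to 67 percent. For S2 it is one of two, or 50 percent.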
18. Q3. IPL highest partnership for a season
Cypher:
MATCH (s:Season)<-[r:PLAYED_IN]-(m:Match)<-[b:BELONGS_TO_MATCH]-(i:Innings)
      -[i2:IN_INNINGS]-(o:Over)-[b2:BELONGS_TO_OVER]-(b3:Ball)
      -[s2:STRIKER|NON_STRIKER]-(p:Player)
WHERE s.name = "2013"
WITH s.name AS name, m.match_id AS match_id, m.name AS match_name,
     i.innings AS innings, b3.number AS ball_no,
     toInteger(b3.runs) + toInteger(b3.extra) AS runs,
     collect(p.name) AS batsmans
UNWIND batsmans AS player
WITH name, match_id, match_name, innings, ball_no, runs, player
ORDER BY player
WITH name AS season, match_id, match_name, innings, ball_no, runs,
     collect(DISTINCT player) AS batsmans
WITH season, match_id, match_name, innings, batsmans, sum(runs) AS partnership_runs
RETURN season, match_name, batsmans, max(partnership_runs) AS max_partnership_runs
ORDER BY max_partnership_runs DESC
LIMIT 5
SQL:
SELECT DISTINCT
  season,
  partnership,
  partnership_runs,
  played_by
FROM (
  SELECT
    season,
    match_id,
    (SELECT short_name FROM graphdb.teams t
       INNER JOIN graphdb.matches m ON m.home_team_id = t.id AND m.id = match_id
       LIMIT 1)
      || ' Vs ' ||
    (SELECT short_name FROM graphdb.teams t
       INNER JOIN graphdb.matches m ON m.away_team_id = t.id AND m.id = match_id
       LIMIT 1) AS played_by,
    match_innings,
    partnership,
    sum(runs) AS partnership_runs
  FROM (
    SELECT season AS season, mb.match_id, mb.match_innings,
      array_to_string((SELECT array_agg(player_name ORDER BY player_name) FROM (SELECT
19. Q4. Batting average of a player in a season
Cypher:
MATCH (p:Player)<-[:STRIKER]-(b:Ball)-[:BELONGS_TO_OVER]->(:Over)
      -[:IN_INNINGS]->(:Innings)-[:BELONGS_TO_MATCH]->(m:Match)
      -[:PLAYED_IN]->(s:Season)
WHERE s.name = '2017' AND p.name = 'V Kohli'
WITH s.name AS season, p.name AS player,
     sum(toInteger(b.runs)) AS total_runs,
     collect(b.ball_id) AS ball_ids
MATCH (b1:Ball)<-[:GOT_OUT]-(p1:Player)
WHERE b1.ball_id IN ball_ids AND p1.name = player
WITH season, player, total_runs, count(b1) AS out_balls
RETURN season, player, total_runs, out_balls,
       total_runs * 1.0 / out_balls AS bat_avg
ORDER BY bat_avg DESC
LIMIT 10
SQL:
SELECT
  s.season,
  p.player_name,
  sum(mb.runs) AS total_runs,
  round(
    CASE
      -- no dismissals: the average is undefined
      WHEN count(DISTINCT lb.match_id) FILTER (WHERE p.id = ANY(lb.dismiss_player_ids)) = 0 THEN NULL
      -- total runs / (matches batted - matches not out)
      ELSE sum(mb.runs)::numeric
           / (count(DISTINCT mb.match_id)::numeric
              - count(DISTINCT lb.match_id) FILTER (WHERE NOT p.id = ANY(lb.dismiss_player_ids))::numeric)
    END, 2) AS bat_avg
FROM graphdb.seasons s
JOIN graphdb.matches m ON s.id = m.season_id
JOIN graphdb.match_ball_wise_info mb ON m.id = mb.match_id
JOIN graphdb.teams t ON (t.id != mb.team_id AND (t.id = m.home_team_id OR
  t.id = m.away_team_id))