Graph-based Relational Data Visualization

17th International Conference
Information Visualization
Daniel Mário de Lima <danielm@icmc.usp.br>
José Fernando Rodrigues Jr. <junio@icmc.usp.br>
Agma Juci Machado Traina <agma@icmc.usp.br>
pdf at: www.icmc.usp.br/pessoas/junio
Instituto de Ciências Matemáticas e de Computação
Universidade de São Paulo
15, 16, 17 and 18 July 2013
SOAS, University of London ● London ● UK
pdf at http://www.icmc.usp.br/~junio/PublishedPapers/Lima-et_al_IV-2013.pdf

Outline
1. Introduction
2. Method
3. Experiments
4. Conclusions

1. Introduction

Introduction
• Large datasets are common
  • unstructured: text
  • semi-structured: XML, RDF, sensor data
  • structured: relational (DBMS), network (graph-like)
• Analysis Process (iterate)
  • Data Representation / Transformation
  • Storage / Retrieval
  • Statistics
  • Visualization
  • Analysis

Introduction
• How to spot interesting facts in the relationships
of large relational databases?
• How are the entities on the database related to
each other?
• How are the entities distributed over the
relations of the database?
• How do the several attributes of the database
influence the relationships of the entities?
• How do we quickly and intuitively browse the
relational database, considering its complex
structure?

Our approach
• Use graph representation
• Graph-partitioning techniques
• Graph-processing
• Interactive Visualization
Database  Graph  Partitioning  Visualization  Analysis

2. Method

DB → m×m → Graph → Partitioning → GraphTree → Visualization → Analysis
Relationships as Graphs
Author ↔ Publish ↔ Work

Relationships as Graphs
Author         Publish     Work
Alice    A     A  1        1  Optic Fiber
Bob      B     B  2        2  Networks
Charles  C     C  3        3  Cryptography
…              A  2        …
               …
[Figure: graph with Work nodes 1, 2, 3 and Author nodes A, B, C]
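The mapping from a many-to-many relationship to a graph can be sketched in a few lines of Python. This is only an illustration using the toy tuples from the slide; the function name `relationship_to_graph` and the `(relation, key)` node labels are assumptions made for the example, not part of R-Mine.

```python
from collections import defaultdict

# Toy tuples copied from the slide; the short identifiers act as keys.
authors = {"A": "Alice", "B": "Bob", "C": "Charles"}
works = {1: "Optic Fiber", 2: "Networks", 3: "Cryptography"}
publish = [("A", 1), ("B", 2), ("C", 3), ("A", 2)]


def relationship_to_graph(left, right, pairs, left_label, right_label):
    """Build an undirected bipartite adjacency map from a join relation.

    Each tuple of the join relation becomes one edge between a node of the
    left relation and a node of the right relation (hypothetical helper).
    """
    adjacency = defaultdict(set)
    for left_id, right_id in pairs:
        if left_id not in left or right_id not in right:
            continue  # skip tuples whose keys do not resolve
        u = (left_label, left_id)
        v = (right_label, right_id)
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


graph = relationship_to_graph(authors, works, publish, "Author", "Work")
edge_count = sum(len(neighbors) for neighbors in graph.values()) // 2
for node in sorted(graph, key=str):
    print(node, "->", sorted(graph[node], key=str))
```

Tagging each node with its relation name keeps keys from different relations from colliding, which matters once several relationships share one graph.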

Graph Partitioning

Hierarchical Partitioning
[Figure sequence: cut 0 splits the graph into subgraph 1 and subgraph 2; cut 1 then splits subgraph 1 into subgraph 1-1 and subgraph 1-2, and cut 2 splits subgraph 2 into subgraph 2-1 and subgraph 2-2]

SuperGraph
[Figure sequence: subgraphs 1-1 and 1-2 become SuperNode 1-1 and SuperNode 1-2, and the edges crossing cut 1 become SuperEdge 1; subgraphs 2-1 and 2-2 become SuperNode 2-1 and SuperNode 2-2, and the edges crossing cut 2 become SuperEdge 2; one level up, subgraphs 1 and 2 become SuperNode 1 and SuperNode 2, and the edges crossing cut 0 become SuperEdge 0]

• Further details in the paper
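As a rough illustration of the hierarchy described on these slides, the sketch below recursively splits a node set and records, at each level, the edges crossing the cut as that level's SuperEdge. The slides do not say which partitioning algorithm is used, so `bisect` here is a deliberately naive placeholder (split the sorted node list in half); `build_tree`, `child_name`, and the dictionary layout are likewise assumptions for the example.

```python
def bisect(nodes):
    """Placeholder partitioner (assumption): halve the sorted node list."""
    ordered = sorted(nodes)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def child_name(parent, index):
    """Name children '1', '2', then '1-1', '1-2', ... as on the slides."""
    return f"{parent}-{index}" if parent else str(index)


def build_tree(nodes, edges, name="", min_size=2):
    """Return a SuperNode dict; internal nodes carry a SuperEdge (cut edges)."""
    node_set = set(nodes)
    inner = [(u, v) for u, v in edges if u in node_set and v in node_set]
    if len(node_set) <= max(min_size, 1):
        # Leaf SuperNode: keeps its original nodes and edges.
        return {"name": name, "nodes": sorted(node_set), "edges": inner}
    left, right = bisect(node_set)
    left_set = set(left)
    # An edge is cut when exactly one endpoint falls in the left part.
    cut = [(u, v) for u, v in inner if (u in left_set) != (v in left_set)]
    return {
        "name": name,
        "super_edge": cut,
        "children": [
            build_tree(left, inner, child_name(name, 1), min_size),
            build_tree(right, inner, child_name(name, 2), min_size),
        ],
    }


# Hypothetical 8-node graph: two 4-cycles joined by two bridging edges.
edges = [(1, 2), (2, 3), (3, 4), (1, 4),
         (5, 6), (6, 7), (7, 8), (5, 8),
         (4, 5), (2, 7)]
tree = build_tree(range(1, 9), edges)
print("SuperEdge 0:", tree["super_edge"])
for child in tree["children"]:
    print("SuperNode", child["name"], "SuperEdge:", child["super_edge"])
```

Swapping `bisect` for a real graph-partitioning routine would change which edges are cut but not the shape of the resulting tree.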

Attribute-based Partitioning
Paper ↔ Author
Left relation: Paper = {idPaper, country, year, title}
Right relation: Author = {idAuthor, age, dept, authorName}

[Figure sequence: the Paper relation (P) and the Author relation (A) are partitioned level by level. Paper is split first by location (US, BR) and then by year ('00-'06, '06-'11); Author is split first by age (<40, >40) and then by dept (IME, EESC, ICMC, FFLCH, *). Further labels in the figure: '95+, '02+, '06+, *. The final builds are labeled "Connectivity SuperEdges".]
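One way to picture attribute-based partitioning is to assign every tuple to a partition derived from its attribute values and then count how many relationship tuples connect each pair of partitions. The sketch below does this for made-up Paper and Author tuples. All data values, the function names, and the handling of boundary values (2006 falls in the first period, age 40 falls in ">40") are assumptions for illustration; the slides do not specify them.

```python
from collections import Counter

# Hypothetical tuples; attribute names follow the slide's relations.
papers = [
    {"idPaper": 1, "country": "US", "year": 2003},
    {"idPaper": 2, "country": "BR", "year": 2009},
    {"idPaper": 3, "country": "BR", "year": 2004},
    {"idPaper": 4, "country": "BR", "year": 2010},
]
authors = [
    {"idAuthor": 10, "age": 35, "dept": "ICMC"},
    {"idAuthor": 11, "age": 52, "dept": "IME"},
]
paper_author = [(1, 10), (2, 10), (2, 11), (3, 11), (4, 11)]


def paper_partition(paper):
    """Two-level partition key: country first, then year range (assumed cutoff)."""
    period = "'00-'06" if paper["year"] <= 2006 else "'06-'11"
    return (paper["country"], period)


def author_partition(author):
    """Two-level partition key: age band first, then department."""
    band = "<40" if author["age"] < 40 else ">40"
    return (band, author["dept"])


paper_part = {p["idPaper"]: paper_partition(p) for p in papers}
author_part = {a["idAuthor"]: author_partition(a) for a in authors}

# Connectivity between partitions: number of relationship tuples per pair.
connectivity = Counter(
    (paper_part[p], author_part[a]) for p, a in paper_author
)
for (p_key, a_key), count in sorted(connectivity.items()):
    print(p_key, "<->", a_key, ":", count)
```

Because the partition keys are tuples ordered by attribute, truncating a key to its first element gives the connectivity at the coarser level of the hierarchy.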

R-Mine Prototype
• Based on the GMine System
• Test platform with minimalistic design
• SuperNode tree:
  • node-link, radial layout, partial focus
• SuperEdge graphs:
  • node-link, bipartite layout, edge filtering
• Leaf SuperNode graphs: typical node-link

R-Mine Prototype

3. Experiments

Tycho USP database
• Data from several USP systems
• Personnel, Supervisions, Publications, Events…

Tycho USP database
• Using 5 entities and 5 relationships
  • 350k events
  • 380k examinations
  • 691k publications
  • 50k people
  • 26k supervisions
• 1.5 million nodes total
• 1.8 million edges (relationships)

Q1: active authors
• Which group of People (by age) has the largest number of recent publications?
SQL:
SELECT a.age, count(*) num
FROM PersonPublication x
  JOIN Publication p ON p.id = x.publication
    AND p.year >= 2008
  JOIN Person a ON a.id = x.author
GROUP BY a.age ORDER BY num DESC
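To see what Q1 returns, the query can be run against a few made-up rows. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the PostgreSQL setup mentioned in the talk; the table and column names come from the query above, while the column types and all row values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE Publication (id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE PersonPublication (publication INTEGER, author INTEGER);
""")
# Hypothetical data: one 30-year-old and two 60-year-old people.
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [(1, 30), (2, 60), (3, 60)])
conn.executemany("INSERT INTO Publication VALUES (?, ?)",
                 [(1, 2009), (2, 2010), (3, 2005), (4, 2012)])
conn.executemany("INSERT INTO PersonPublication VALUES (?, ?)",
                 [(1, 1), (2, 2), (2, 3), (3, 2), (4, 3), (4, 1)])

# Q1 as written on the slide: publications since 2008, grouped by author age.
q1 = """
    SELECT a.age, count(*) num
    FROM PersonPublication x
      JOIN Publication p ON p.id = x.publication
        AND p.year >= 2008
      JOIN Person a ON a.id = x.author
    GROUP BY a.age ORDER BY num DESC
"""
rows = conn.execute(q1).fetchall()
for age, num in rows:
    print(age, num)
conn.close()
```

With this data the publication from 2005 is filtered out by the join condition, so the 60-year-old group is counted three times and the 30-year-old group twice.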

Q1: active authors

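Q1.b: active authors
• Who are they?
• SQL:
SELECT a.name, p.title
FROM PersonPublication x
  JOIN Publication p ON p.id = x.publication
    AND p.year >= 2008
  JOIN Person a ON a.id = x.author
WHERE a.age IN
  (SELECT age FROM
    (SELECT a.age age, count(*) num
     FROM PersonPublication x
       JOIN Publication p ON p.id = x.publication
         AND p.year >= 2008
       JOIN Person a ON a.id = x.author
     GROUP BY a.age ORDER BY num DESC) T)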

Q1.b: active authors

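Q2: favorite countries
• Which country receives the largest number of recent publications from this group of people?
SQL:
SELECT …
FROM PersonPublication x
  JOIN Publication p ON p.id = x.publication
    AND p.year >= 2008
  JOIN Person a ON a.id = x.author
    AND a.age BETWEEN 56 AND 63
WHERE p.country IN
  (SELECT country FROM
    (SELECT p.country country, count(*) num
     FROM PersonPublication x
       JOIN Publication p ON p.id = x.publication
         AND p.year >= 2008
       JOIN Person a ON a.id = x.author
         AND a.age BETWEEN 56 AND 63
     GROUP BY p.country ORDER BY num DESC) T)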

Q2: favorite countries

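Q3: active authors per country
• Now, in one specific country, which group of People is the most active recently?
SQL:
SELECT …
FROM PersonPublication x
  JOIN Publication p ON p.id = x.publication
    AND p.year >= 2008
    AND p.country = 'Estados Unidos'
  JOIN Person a ON a.id = x.author
WHERE a.age IN
  (SELECT age FROM
    (SELECT a.age age, count(*) num
     FROM PersonPublication x
       JOIN Publication p ON p.id = x.publication
         AND p.year >= 2008
         AND p.country = 'Estados Unidos'
       JOIN Person a ON a.id = x.author
     GROUP BY a.age ORDER BY num DESC) T)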

Q3: active authors per country

Performance: individual queries
150 analytical questions: PostgreSQL × R-Mine

Performance: accumulated time
150 analytical questions: PostgreSQL × R-Mine

Performance: loading time
SuperNode            Load (s)   Connectivity to all siblings (s)   SQL (s)
(initial loading)    6.032      -                                  -
Person               0.057      5.847                              7.349
Event                0.271      5.276                              26.716
Publication          0.160      4.484                              27.677
Total                6.520      15.607                             61.742
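The totals in the table can be checked with a few lines of arithmetic. The sketch below re-adds each column and then computes one possible summary ratio. Treating "load plus connectivity" as the R-Mine cost comparable to the SQL column is an assumption made here for illustration; the slide itself only reports the three columns.

```python
# Values copied from the table: (load, connectivity to siblings, SQL), in seconds.
rows = {
    "Person": (0.057, 5.847, 7.349),
    "Event": (0.271, 5.276, 26.716),
    "Publication": (0.160, 4.484, 27.677),
}
initial_load = 6.032  # the "(initial loading)" row has a load time only

load_total = initial_load + sum(r[0] for r in rows.values())
conn_total = sum(r[1] for r in rows.values())
sql_total = sum(r[2] for r in rows.values())

# Assumed reading: R-Mine cost = loading + connectivity computation.
rmine_total = load_total + conn_total
ratio = sql_total / rmine_total

print("load total:", round(load_total, 3))
print("connectivity total:", round(conn_total, 3))
print("SQL total:", round(sql_total, 3))
print("SQL / (load + connectivity):", round(ratio, 2))
```

Under that reading, the one-time initial loading dominates the load column, so the ratio would shift further in R-Mine's favor as more SuperNodes are explored in a session.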

4. Conclusions

Our approach
• Can use the relational information
  • To guide the partitioning
  • To give an initial context to the analyst
• Faster than running SQL queries
• Makes neighborhood exploration easy
• Interactive visualization environment

Considerations
• Initial parameters
  • Which entities, relationships and attributes?
  • In which order?
  • How to define partitions? Ranges?
  • How many partitions?
• Different interaction tasks
• Ongoing usability evaluation

Thanks
