Relational databases are rigidly structured data sources characterized by complex relationships among a set of relations (tables). Making sense of such relationships is a challenging problem because users must consider multiple relations, understand their ensemble of integrity constraints, interpret dozens of attributes, and write complex SQL queries for each desired data exploration. In this scenario, we introduce a twofold methodology: we use a hierarchical graph representation to efficiently model the database relationships and, on top of it, we design a visualization technique for rapid relational exploration. Our results demonstrate that the exploration of databases is profoundly simplified, as the user is able to visually browse the data with little or no knowledge about its structure, dismissing the need for complex SQL queries. We believe our findings will bring a novel paradigm to relational data comprehension.
1. 17th International Conference Information Visualization
Graph-based Relational Data Visualization
Daniel Mário de Lima <danielm@icmc.usp.br>
José Fernando Rodrigues Jr. <junio@icmc.usp.br>
Agma Juci Machado Traina <agma@icmc.usp.br>
pdf at: www.icmc.usp.br/pessoas/junio
Instituto de Ciências Matemáticas e de Computação
Universidade de São Paulo
15, 16, 17 and 18 July 2013
SOAS, University of London ● London ● UK
pdf at http://www.icmc.usp.br/~junio/PublishedPapers/Lima-et_al_IV-2013.pdf
4. Introduction
• Large datasets are common
• unstructured: text
• semi-structured: XML, RDF, sensor data
• structured: relational (DBMS), network (graph-like)
• Analysis Process
• Data Representation / Transformation
• Storage / Retrieval
• Statistics
• Visualization
• Analysis
Iterate
5. Introduction
• How to spot interesting facts in the relationships of large relational databases?
• How are the entities in the database related to each other?
• How are the entities distributed over the relations of the database?
• How do the several attributes of the database influence the relationships of the entities?
• How do we quickly and intuitively browse the relational database, considering its complex structure?
8. Relationships as Graphs
Pipeline: DB m×m Graph Partitioning GraphTree Visualization Analysis
Author Publish Work
9. Relationships as Graphs
Author: (Alice, A), (Bob, B), (Charles, C), …
Publish: (A, 1), (B, 2), (C, 3), (A, 2), …
Work: (1, Optic Fiber), (2, Networks), (3, Cryptography), …
Graph nodes: A, B, C (authors); 1, 2, 3 (works)
12. Graph Partitioning
15. Hierarchical Partitioning
16. Hierarchical Partitioning
subgraph 1 subgraph 2
cut 0
19. Hierarchical Partitioning
cut 0
20. Hierarchical Partitioning
cut 1 cut 0 cut 2
25. SuperGraph
subgraph 1 subgraph 2
cut 0
26. SuperGraph
SuperNode 1 SuperNode 2
cut 0
27. SuperGraph
SuperNode 1 SuperNode 2
SuperEdge 0
28. SuperGraph
29. SuperGraph
• Further details in the paper
30. Attribute-based Partitioning
Paper Author
Left relation: Paper = {idPaper, country, year, title}
Right relation: Author = {idAuthor, age, dept, authorName}
32. Attribute-based Partitioning
Partitioning by entity: Paper (P), Author (A)
33. Attribute-based Partitioning
Paper attribute selected: local
34. Attribute-based Partitioning
Paper by local: US, BR
35. Attribute-based Partitioning
Paper attribute selected: year
36. Attribute-based Partitioning
US papers by year: '00-'06, '06-'11
37. Attribute-based Partitioning
BR papers by year: '95+, '02+, '06+, *
38. Attribute-based Partitioning
Author attribute selected: age
39. Attribute-based Partitioning
Author by age: <40, >40
40. Attribute-based Partitioning
Author attribute selected: dept
41. Attribute-based Partitioning
Author by dept: IME, EESC, ICMC, FFLCH, *
42. Attribute-based Partitioning
Connectivity SuperEdges
43. Attribute-based Partitioning
Connectivity SuperEdges
44. R-Mine Prototype
• Based on the GMine System
• Test platform with minimalistic design
• SuperNode tree:
• node-link, radial layout, partial focus
• SuperEdge graphs:
• node-link, bipartite layout, edge filtering
• Leaf SuperNode graphs: typical node-link
45. R-Mine Prototype
46. 3. Experiments
47. Tycho USP database
• Data from several USP systems
• Personnel, Supervisions, Publications, Events…
48. Tycho USP database
• Using 5 entities and 5 relationships
• 350k events
• 380k examinations
• 691k publications
• 50k people
• 26k supervisions
• 1.5 million nodes total
• 1.8 million edges (relationships)
49. Q1: active authors
• Which group of People (by age) has the largest number of recent publications?
SQL:
SELECT a.age, count(*) num
FROM PersonPublication x
JOIN Publication p ON p.id = x.publication
AND p.year >= 2008
JOIN Person a ON a.id = x.author
GROUP BY a.age ORDER BY num DESC
51. Q1.b: active authors
• Who are they?
• SQL:
SELECT a.name, p.title
FROM PersonPublication x
  JOIN Publication p ON p.id = x.publication
    AND p.year >= 2008
  JOIN Person a ON a.id = x.author
WHERE a.age IN
  (SELECT age FROM
    (SELECT a.age age, count(*) num
     FROM PersonPublication x
       JOIN Publication p ON p.id = x.publication
         AND p.year >= 2008
       JOIN Person a ON a.id = x.author
     GROUP BY a.age ORDER BY num DESC) T)
53. Q2: favorite countries
• Which country receives the largest number of recent publications from this group of people?
• SQL:
SELECT a.name, p.title
FROM PersonPublication x
  JOIN Publication p ON p.id = x.publication
    AND p.year >= 2008
  JOIN Person a ON a.id = x.author
    AND a.age BETWEEN 56 AND 63
WHERE p.country IN
  (SELECT country FROM
    (SELECT p.country country, count(*) num
     FROM PersonPublication x
       JOIN Publication p ON p.id = x.publication
         AND p.year >= 2008
       JOIN Person a ON a.id = x.author
         AND a.age BETWEEN 56 AND 63
     GROUP BY p.country ORDER BY num DESC) T)
56. Q3: active authors per country
• Now, in one specific country, which group of People is the most active recently?
• SQL:
SELECT a.name, p.title
FROM PersonPublication x
  JOIN Publication p ON p.id = x.publication
    AND p.year >= 2008
    AND p.country = 'Estados Unidos'
  JOIN Person a ON a.id = x.author
WHERE a.age IN
  (SELECT age FROM
    (SELECT a.age age, count(*) num
     FROM PersonPublication x
       JOIN Publication p ON p.id = x.publication
         AND p.year >= 2008
         AND p.country = 'Estados Unidos'
       JOIN Person a ON a.id = x.author
     GROUP BY a.age ORDER BY num DESC) T)
63. Our approach
• Can use the Relational information
• To guide the partitioning
• To give an initial context to the analyst
• Faster than running SQL queries
• Make neighborhood exploration easy
• Interactive Visualization environment
64. Considerations
• Initial parameters
• Which entities, relationships and attributes?
• In which order?
• How to define partitions? Ranges?
• How many partitions?
• Different interaction tasks
• Ongoing usability evaluation
This presentation is divided into four parts: an introduction to the problem, our proposed method, experimental demonstrations, and conclusions.
With the current state of Information and Communication Technology, a growing number of large datasets is available: unstructured (text), semi-structured (XML, RDF, sensor data), and structured (relational and network data).
To obtain knowledge and information from these data, the analytical process goes through several steps (data representation and transformation, storage and retrieval, statistics, visualization, and analysis), iterating until the analyst deems the result appropriate.
Given the widespread use of relational databases, and their increasing size and complexity, the challenge is to discover useful facts about them (relationships, distribution, influence) in a quick and uncomplicated way.
So, our method addresses this challenge by using graph representations along with graph techniques and interactive visualization, following the pipeline below.
We start with a many-to-many relationship between two entities.
Each entity is a relation with several rows, each one made up of several attributes, and a relationship table (a third relation) holds the primary keys of the related objects.
Naturally, the rows of the entities can be represented as the nodes of a graph,
and the relationship rows hold the edges linking the nodes:
A, Alice to 1, Optic Fiber;
B, Bob to 2, Networks;
A, Alice to 2, Networks (the same work as Bob's);
and so on…
Larger schemata with more relations can be transformed into just one big graph.
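The mapping from tables to a graph can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the rows come from the slide example, while the names `authors`, `works`, `publish`, and `build_graph` are hypothetical.

```python
from collections import defaultdict

# Rows from the slide example: two entity tables and one relationship table.
authors = {"A": "Alice", "B": "Bob", "C": "Charles"}
works = {1: "Optic Fiber", 2: "Networks", 3: "Cryptography"}
publish = [("A", 1), ("B", 2), ("C", 3), ("A", 2)]  # (author key, work key)


def build_graph(left_rows, right_rows, relationship_rows):
    """Turn a many-to-many relationship into an undirected bipartite graph.

    Nodes are tagged with their relation name so that keys from
    different tables can never collide.
    """
    adjacency = defaultdict(set)
    for key in left_rows:
        adjacency[("Author", key)]  # touch the key so isolated rows become nodes
    for key in right_rows:
        adjacency[("Work", key)]
    for left_key, right_key in relationship_rows:
        adjacency[("Author", left_key)].add(("Work", right_key))
        adjacency[("Work", right_key)].add(("Author", left_key))
    return dict(adjacency)


graph = build_graph(authors, works, publish)
```

Tagging each node with its relation name is a design choice of this sketch; it keeps author key `A` and a hypothetical work key `A` apart when several relations are merged into one big graph.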
In the end, the database is transformed into a graph, and graph partitioning techniques can be used.
Partitioning is an operation that divides the graph into disjoint sets of nodes, following the minimum-cut technique, that is, choosing the division with the smallest number of edges between the subsets.
These are the edges of the minimum cut.
And after separating them from the graph…
…the result is the separate subgraphs (1 and 2), and the cut-set of edges (cut 0). The cut-set contains the edges between the nodes of these subgraphs.
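To make the terms concrete, the sketch below separates an edge list into two subgraphs and a cut-set once the node sets are known. It assumes the partition has already been chosen (finding a minimum cut is a separate problem that this sketch does not solve); the function name `split_by_partition` and the toy edge list are hypothetical.

```python
def split_by_partition(edges, part_one):
    """Split an edge list into (subgraph 1, subgraph 2, cut-set).

    `part_one` is the node set of subgraph 1; every other node is
    assumed to belong to subgraph 2.
    """
    subgraph_one, subgraph_two, cut_set = [], [], []
    for u, v in edges:
        u_in, v_in = u in part_one, v in part_one
        if u_in and v_in:
            subgraph_one.append((u, v))
        elif not u_in and not v_in:
            subgraph_two.append((u, v))
        else:
            cut_set.append((u, v))  # edge crosses the partition
    return subgraph_one, subgraph_two, cut_set


# Two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
sub1, sub2, cut0 = split_by_partition(edges, {"a", "b", "c"})
```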
And the whole process can be repeated recursively…
… by selecting edges,
… splitting the subgraphs,
… and resulting in more subgraphs and cut-sets, in a hierarchy. This hierarchy is stored in the GraphTree structure, formally defined by a SuperGraph.
In the SuperGraph, each subgraph resulting from the partitioning corresponds to a SuperNode,
and each cut-set forms a SuperEdge between them,
… so this process is applied to all subgraphs,
recursively, because the initial unpartitioned parent subgraph can be obtained by reuniting the smaller subgraphs and cut-sets.
Here, each group linked by SuperEdges will form a new SuperNode.
And the cut-set 0 will form the SuperEdge between them.
The result is the SuperGraph structure, which holds all the information from the original unpartitioned graph.
So, after we organize the partitioned graph in a SuperGraph, we can store it in a disk-based data structure called the GraphTree. The definition and further details are in the paper.
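A minimal in-memory sketch of such a hierarchy is given here, only to illustrate how SuperNodes and SuperEdges keep all the information of the unpartitioned graph. It is not the disk-based GraphTree defined in the paper; the class layout and the method `all_edges` are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SuperNode:
    """A node of the hierarchy: either a leaf subgraph or a group of children."""
    name: str
    edges: list = field(default_factory=list)        # edges of a leaf subgraph
    children: list = field(default_factory=list)     # child SuperNodes
    super_edges: dict = field(default_factory=dict)  # (child, child) -> cut-set

    def all_edges(self):
        """Rebuild the unpartitioned edge list by reuniting children and cut-sets."""
        result = list(self.edges)
        for child in self.children:
            result.extend(child.all_edges())
        for cut_set in self.super_edges.values():
            result.extend(cut_set)
        return result


leaf_one = SuperNode("SuperNode 1", edges=[("a", "b"), ("b", "c"), ("a", "c")])
leaf_two = SuperNode("SuperNode 2", edges=[("d", "e"), ("e", "f"), ("d", "f")])
root = SuperNode(
    "root",
    children=[leaf_one, leaf_two],
    super_edges={("SuperNode 1", "SuperNode 2"): [("c", "d")]},  # SuperEdge 0
)
```

Calling `root.all_edges()` reunites the two leaf subgraphs and the cut-set, which is the property that lets the partitioning be applied recursively without losing edges.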
But how do we define a partitioning?
We start from a schema of interest, here the base case: a many-to-many relationship.
We start by representing the database as a graph,
and make an initial partitioning by the entities involved. On the right we can see the hierarchical structure of the corresponding SuperGraph.
For the next partitioning, we select an attribute from one entity, the place of publication (local),
and split the subgraph according to the categories of this attribute (like a GROUP BY query). In this case, the Papers are split into papers published in Brazil and papers published in the United States.
And we go on, selecting another attribute, the year of publication.
In this case, the US papers are divided into two categories, one from 2000 to 2006 and the other from 2006 to 2011,
and the Brazilian node into three categories.
The asterisk category includes the “other nodes”: nodes that do not belong to any of the categories of interest but are still relevant.
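The attribute-based splits can be sketched as two small grouping functions, one for categorical attributes and one for ranges with an asterisk bucket. This is an illustration under assumptions: the Paper rows are made up, the function names are hypothetical, and the ranges are treated as half-open intervals so that a paper from 2006 falls in exactly one category.

```python
from collections import defaultdict


def partition_by_category(rows, attribute):
    """Group rows by the value of one attribute, like a GROUP BY."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    return dict(groups)


def partition_by_ranges(rows, attribute, ranges):
    """Group rows into labelled half-open ranges [low, high); the rest go to '*'."""
    groups = defaultdict(list)
    for row in rows:
        value = row[attribute]
        for label, (low, high) in ranges.items():
            if low <= value < high:
                groups[label].append(row)
                break
        else:
            groups["*"].append(row)  # no range matched: the "other nodes" bucket
    return dict(groups)


# Made-up Paper rows for illustration only.
papers = [
    {"idPaper": 1, "country": "US", "year": 2003},
    {"idPaper": 2, "country": "US", "year": 2009},
    {"idPaper": 3, "country": "BR", "year": 2007},
    {"idPaper": 4, "country": "US", "year": 1998},
]
by_country = partition_by_category(papers, "country")
us_by_year = partition_by_ranges(
    by_country["US"], "year", {"'00-'06": (2000, 2006), "'06-'11": (2006, 2012)}
)
```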
Now we select an attribute from the other entity, the age of the authors,
which splits them into two groups: older than 40 and younger than 40.
And now we partition by author department,
which splits the previous author partitions, showing the largest categories and giving a hint about how their sizes compare. In this example, we see that in the United States more papers were published from 2006 to 2011 than from 2000 to 2006.
We can also compute connectivity SuperEdges. They retrieve the edges between any two SuperNodes in the SuperGraph, and therefore allow natural joins between the subgroups on demand.
In the example, we can analyze the papers published in Brazil since 2006 by authors younger than 40,
or how many papers were published in the United States by authors of the FFLCH department who are older than 40.
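In its simplest form, a connectivity SuperEdge is the set of edges with one endpoint in each of two SuperNodes. The sketch below shows that idea on made-up data; `connectivity_super_edge` is a hypothetical name, and the linear scan over all edges is a simplification rather than the way R-Mine computes it.

```python
def connectivity_super_edge(edges, nodes_a, nodes_b):
    """Return the edges with one endpoint in nodes_a and the other in nodes_b."""
    return [
        (u, v)
        for u, v in edges
        if (u in nodes_a and v in nodes_b) or (u in nodes_b and v in nodes_a)
    ]


# Made-up (paper, author) edges and leaf groups for illustration only.
edges = [("p1", "alice"), ("p2", "alice"), ("p2", "bob"), ("p3", "carol")]
papers_br_since_2006 = {"p1", "p2"}
authors_under_40 = {"bob", "carol"}

joined = connectivity_super_edge(edges, papers_br_since_2006, authors_under_40)
```

The result plays the role of a join between the two subgroups: only the pairs where a paper of the first group was written by an author of the second group remain.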
To evaluate this method, we implemented the R-Mine prototype based on the GMine System, a previous tool for general graph visualization with a minimalistic design.
The SuperNodes are viewed in a node-link radial layout, with per-entity focusable subgraphs.
The SuperEdge and leaf SuperNode visualizations are typical graph layouts, such as influence, force-directed and bipartite node-link layouts.
R-Mine implements the attribute-based partitioning idea, with hierarchical SuperNodes inside bubbles and SuperEdges linking them.
The SuperEdges summarize the weight of the edges in between.
Additionally, each time the user selects another category of the partitioning, the SuperNode is focused and occupies the area of the entity SuperNode, allowing better use of the screen.
Here: follow images (a) through (d), pinpointing the tree and then the visualization, which correspond to each other.
The Tycho USP database holds academic data from several USP systems. We use a subset of the database with the entities we are interested in.
These are some numbers for the relations used in the experiment.
Now we answer some questions with an SQL query and compare it with the interaction in R-Mine.
(question)
This is answered by joining the Person-Publication relationship with the Publications published since 2008, and counting the rows grouped by author age.
In the R-Mine system, we first choose the entities and attributes of the database to prepare the SuperGraph.
With the SuperGraph ready, we open the Publication and Person SuperNodes, select the SuperNode Publications from 2008 to 2012;
and visualize the weight of the SuperEdges to the people SuperNodes (which were partitioned by age).
And the group of authors between 56 and 63 years old had the biggest number of publications.
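The visual comparison of SuperEdge weights corresponds to counting, for each people group, the edges that reach it from the selected publication SuperNode. A small sketch with made-up data follows; the names `super_edge_weights`, `recent_publications`, and `age_group` are hypothetical, and the age labels merely mirror the groups mentioned in the talk.

```python
from collections import Counter


def super_edge_weights(edges, source_nodes, group_of):
    """Count edges from `source_nodes` into each target group.

    `group_of` maps a target node to the label of its SuperNode.
    """
    weights = Counter()
    for source, target in edges:
        if source in source_nodes:
            weights[group_of[target]] += 1
    return weights


# Made-up data: (publication, person) edges and age-group labels.
edges = [("pub1", "ann"), ("pub2", "ann"), ("pub2", "raj"), ("pub3", "li")]
recent_publications = {"pub1", "pub2"}  # e.g. the 2008-2012 SuperNode
age_group = {"ann": "56-63", "raj": "42-49", "li": "56-63"}

weights = super_edge_weights(edges, recent_publications, age_group)
most_active_group, count = weights.most_common(1)[0]
```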
But how do we fetch those authors? By selecting each one contained in the previous GROUP BY.
In R-Mine, we open the SuperEdge, and the graph of the active authors and their respective recent publications is loaded and visualized.
Now, where did these authors publish their works? What are their favorite countries?
The SQL query is similar, but we select a specific group of people and group by publication country.
In R-Mine, we open the SuperNode of Publications from 2008 to 2012, revealing the next partitioning by country, and select the person group with ages from 56 to 63.
We can see that Brazil, United States, Argentina and England are the preferred countries for those authors.
And again we see the nodes and edges inside the SuperEdge.
Now reversing the question: within one specific country, who are the most active authors?
The SQL query is the same as the first, but with an additional predicate defining the country.
In R-Mine we just select the United States SuperNodes, and by inspecting the SuperEdges to the people groups, we can observe that the most active authors in the United States are a different group, now from 42 to 49 years old.
In our experiments, 150 queries like the previous questions were computed in the R-Mine system and in PostgreSQL with the corresponding indexes.
The R-Mine System showed better performance in all the cases.
By summing the time of these queries, we observe that a session with an increasing number of questions would be 10-fold more efficient within a SuperGraph-based tool, such as R-Mine.
This table shows the results of another experiment, where we compute connectivity SuperEdges to all the siblings of a given SuperNode, in both R-Mine and PostgreSQL.
So, we saw that:
- our approach can use the Relational information to guide the partitioning, thus giving an initial context to the analyst
- it was faster than running SQL queries
- and it makes neighborhood exploration easy, because of the visual environment
In the current prototype, the SuperGraph is built before the visualization, so we need to decide beforehand which relations and attributes are included.
Attribute ordering will define the sequence of exploration.
Different attribute distributions can result in very unbalanced partitions, but for numerical attributes we can easily group by equally sized ranges.
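Grouping a numerical attribute into equally sized ranges can be sketched as equal-width binning. This is one possible reading of "equally sized" (equal width rather than equal count), and the helper names and sample ages are made up for the example.

```python
def equal_width_ranges(values, bins):
    """Split [min, max] of `values` into `bins` equally wide ranges."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    return [(low + i * width, low + (i + 1) * width) for i in range(bins)]


def bin_index(value, ranges):
    """Index of the range holding `value`, assumed to lie within [min, max].

    Ranges are half-open; the last range also includes its upper bound.
    """
    for i, (start, end) in enumerate(ranges):
        if start <= value < end:
            return i
    return len(ranges) - 1


ages = [22, 30, 38, 46, 54, 62]  # made-up author ages
ranges = equal_width_ranges(ages, 4)
```

Equal-width ranges are easy to compute and to label, but they do not balance the number of nodes per partition; a skewed attribute would still give unbalanced SuperNodes.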
Too many partitions are difficult to see, but too few can be useless. As a general recommendation we follow Miller’s Law, that is, 7 ± 2 elements per task.
We would like to include other analytical tasks and visualizations alongside the existing ones.
Finally, a usability evaluation is in progress to improve the visual layout and to compare it with other visualization approaches.