Final presentation of my dissertation thesis, focused on orientation in, analysis of, and finding information in large or unknown relational databases, and on data visualisation
Data Processing over Very Large Relational Databases
1. Data Processing over Very Large Databases
Ing. Ľuboš Takáč
Supervisor: doc. Ing. Michal Zábovský, PhD.
Faculty of Management Science and Informatics
University of Žilina
2. Large Databases
• VLDB (very large databases)
• Relational databases with hundreds of tables and millions of rows
3. The Problem
• How to understand a relational database model well enough to find information in the database.
• Orientation in large RDBs
– made difficult by the complexity of the RDB model
• Modification and development of RDBs.
4. Existing approaches
• Database metrics
• Database visualization
• Database-to-ontology mapping and examination of the resulting ontology
5. Database Metrics
• A database metric is a function that assigns a numeric value to an object from the database (e.g. a table).
• Examples of table metrics
– DRT(T) – depth of relational tree
– TS(T) – table size
– RD(T) – referential degree
– …
• Rankings – combining several metrics with different weights.
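The metrics and a weighted ranking could be computed along the following lines. This is a minimal sketch over a hypothetical toy schema; the precise definitions of TS, RD and DRT used here (row count, number of FKs a table takes part in, longest chain of FKs) and the normalised weighted sum used for the ranking are assumptions, not the thesis's exact formulas.

```python
# Hypothetical toy schema: row counts per table and FK edges (child, parent).
ROW_COUNTS = {"orders": 1_200_000, "customers": 50_000, "products": 8_000,
              "order_items": 4_800_000, "audit_log": 9_000_000}
FOREIGN_KEYS = [("orders", "customers"), ("order_items", "orders"),
                ("order_items", "products")]


def table_size(table):
    """TS(T), assumed here to be the number of rows in T."""
    return ROW_COUNTS[table]


def referential_degree(table):
    """RD(T), assumed here to be the number of FKs T takes part in (in or out)."""
    return sum(table in fk for fk in FOREIGN_KEYS)


def relational_tree_depth(table, visited=frozenset()):
    """DRT(T), assumed here to be the longest chain of FKs starting at T.

    `visited` stops the recursion on cyclic or self-referencing FKs.
    """
    visited = visited | {table}
    parents = [p for c, p in FOREIGN_KEYS if c == table and p not in visited]
    if not parents:
        return 0
    return 1 + max(relational_tree_depth(p, visited) for p in parents)


METRICS = {"TS": table_size, "RD": referential_degree,
           "DRT": relational_tree_depth}


def rank_tables(weights):
    """Rank tables by a weighted sum of metrics, each scaled to [0, 1] by its maximum."""
    tables = list(ROW_COUNTS)
    maxima = {name: max(METRICS[name](t) for t in tables) or 1 for name in weights}
    score = {t: sum(w * METRICS[name](t) / maxima[name]
                    for name, w in weights.items())
             for t in tables}
    return sorted(tables, key=score.get, reverse=True)


print(rank_tables({"TS": 0.2, "RD": 0.5, "DRT": 0.3}))
# ['order_items', 'orders', 'customers', 'products', 'audit_log']
```

Changing the weights changes which tables surface first: with `{"TS": 1.0}` alone, the large but unreferenced `audit_log` table moves to the top.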
11. Visualization of RDB schema graph
• A vertex- and edge-weighted graph based on RDB metrics.
• Using Gephi for visualization
– automatically generated layout
– interactive visualization (selection and examination of nodes and edges)
– graph algorithms
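One possible way to get such a weighted schema graph into Gephi is its spreadsheet (CSV) import, which recognises Id/Label columns in a node table and Source/Target/Type/Weight columns in an edge table. The sketch below is an assumption about the workflow, not the thesis's tooling: node weights are whichever table metric you choose (here a hypothetical RD value), and edge weights are, hypothetically, the number of referencing rows.

```python
import csv
import io


def gephi_csv(node_weights, edges):
    """Build node and edge tables (as CSV text) for Gephi's spreadsheet import.

    node_weights: {table: metric value}; edges: [(child, parent, weight)].
    The node "Weight" column is imported as an ordinary attribute that can be
    mapped to node size in Gephi.
    """
    nodes_buf, edges_buf = io.StringIO(), io.StringIO()

    node_writer = csv.writer(nodes_buf, lineterminator="\n")
    node_writer.writerow(["Id", "Label", "Weight"])
    for table, weight in node_weights.items():
        node_writer.writerow([table, table, weight])

    edge_writer = csv.writer(edges_buf, lineterminator="\n")
    edge_writer.writerow(["Source", "Target", "Type", "Weight"])
    for child, parent, weight in edges:
        # An FK points from the referencing (child) table to the referenced one.
        edge_writer.writerow([child, parent, "Directed", weight])

    return nodes_buf.getvalue(), edges_buf.getvalue()


nodes_csv, edges_csv = gephi_csv({"orders": 2, "customers": 1},
                                 [("orders", "customers", 1_200_000)])
print(edges_csv)
```

Writing the two strings to files and importing them through Gephi's Data Laboratory would then give a graph on which the automatic layouts and graph algorithms listed above can be run.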
14. Analysis of the RDB Graph
• Three approaches
– graph of the RDB model (vertex – table, edge – foreign key relation)
– alternative (vertex – table, edge – foreign key relation for each tuple)
– graph of tuples (vertex – tuple, edge – foreign key relation between tuples)
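The three graphs differ only in what counts as a vertex and what counts as an edge. The sketch below builds all three from a hypothetical toy database and computes the degree distribution plotted on the following slides; treating every graph as undirected and counting parallel edges in the second approach are assumptions.

```python
from collections import Counter

# Hypothetical toy database: FK constraints and concrete row-level references.
FK_CONSTRAINTS = [("orders", "customers"), ("order_items", "orders"),
                  ("order_items", "products")]
# Each reference: (child table, child row id, parent table, parent row id).
ROW_REFERENCES = [
    ("orders", 1, "customers", 1), ("orders", 2, "customers", 1),
    ("order_items", 1, "orders", 1), ("order_items", 1, "products", 7),
    ("order_items", 2, "orders", 1), ("order_items", 2, "products", 8),
    ("order_items", 3, "orders", 2), ("order_items", 3, "products", 7),
]


def degrees(edges):
    """Undirected degree of every vertex in an edge list (parallel edges counted)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_distribution(deg):
    """Map each degree k to the fraction of vertices with degree k."""
    hist = Counter(deg.values())
    n = len(deg)
    return {k: hist[k] / n for k in sorted(hist)}


# 1) schema graph: tables as vertices, one edge per FK constraint
schema_deg = degrees(FK_CONSTRAINTS)
# 2) tables as vertices, one edge per referencing tuple (a multigraph)
table_multi_deg = degrees((c, p) for c, _, p, _ in ROW_REFERENCES)
# 3) tuple graph: (table, row id) pairs as vertices
tuple_deg = degrees(((c, ci), (p, pi)) for c, ci, p, pi in ROW_REFERENCES)

print(degree_distribution(schema_deg))  # {1: 0.5, 2: 0.5}
print(table_multi_deg["order_items"])   # 6
print(degree_distribution(tuple_deg))   # {1: 0.125, 2: 0.75, 3: 0.125}
```

The figure for the third approach plots counts rather than probabilities, which corresponds to using `Counter(deg.values())` directly instead of dividing by the number of vertices.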
15. Analysis of the RDB Graph – first approach
[Figure: Distribution function of vertex degree. x-axis: vertex degree (1 to 11, 13, 17, 18, 29); y-axis: probability (0.00 to 0.16).]
16. Analysis of the RDB Graph – second approach
[Figure: Distribution function of vertex degree. x-axis: vertex degree; y-axis: probability (0 to 1).]
17. Analysis of the RDB Graph – third approach
[Figure: Distribution function of vertex degree. x-axis: vertex degree; y-axis: count.]
20. Analysis of the RDB Graph – Conclusion
• The RDB model is scale-free.
• To understand an RDB, you must first understand its centers (there are only a few of them).
• The analysis of the RDB graph validated NR(T), the number of references, as a very useful metric.
• We created two new metrics based on the three approaches above.
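Both observations can be checked roughly in code. The sketch assumes NR(T) is the number of FK constraints that reference table T, and uses a least-squares slope on a log-log degree histogram as a crude scale-free indicator: a power law P(k) ∝ k^(-γ) appears as a straight line with slope of about -γ. This is a quick heuristic, not a rigorous statistical test and not necessarily the analysis performed in the thesis; the FK list is hypothetical.

```python
import math
from collections import Counter


def number_of_references(fks):
    """NR(T), assumed here to be the number of FK constraints pointing at T."""
    return Counter(parent for _child, parent in fks)


def centers(fks, top=3):
    """The few most-referenced tables: candidates to study first."""
    return [table for table, _ in number_of_references(fks).most_common(top)]


def loglog_slope(degree_values):
    """Least-squares slope of log P(k) against log k (k > 0 only)."""
    hist = Counter(k for k in degree_values if k > 0)
    if len(hist) < 2:
        raise ValueError("need at least two distinct positive degrees")
    n = sum(hist.values())
    xs = [math.log(k) for k in hist]
    ys = [math.log(count / n) for count in hist.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical FK list (child, parent) and a degree sample with P(k) ~ k^-2.
FKS = [("orders", "customers"), ("invoices", "customers"),
       ("order_items", "orders"), ("order_items", "products"),
       ("reviews", "products"), ("returns", "products")]
print(centers(FKS, top=2))  # ['products', 'customers']
print(round(loglog_slope([1] * 64 + [2] * 16 + [4] * 4 + [8]), 6))  # -2.0
```

A real schema graph gives a much noisier histogram; a roughly linear tail on the log-log plot is consistent with, though not proof of, a scale-free structure.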
21. A Method for Analyzing Large RDB
• Find the connected components of the schema graph (tables = vertices, FKs = edges)
• Examine each component, starting with the largest
– If the component is a single isolated table, it is very probably an archive; try to confirm this or find its other purpose.
– Otherwise, visualize it as an ER diagram, with Schemaball, or as a graph using table metrics.
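The first steps of the method (components, largest first, isolated tables flagged) could be sketched like this, using a breadth-first search over the undirected schema graph. The table and FK names are hypothetical, and the labels only mirror the advice on the slide.

```python
from collections import defaultdict, deque


def schema_components(tables, fks):
    """Connected components of the undirected schema graph, largest first."""
    adjacent = defaultdict(set)
    for child, parent in fks:
        adjacent[child].add(parent)
        adjacent[parent].add(child)

    seen, components = set(), []
    for start in tables:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:  # breadth-first search from `start`
            table = queue.popleft()
            component.append(table)
            for neighbour in adjacent[table]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)


def triage(tables, fks):
    """Pair each component with the next step suggested on the slide."""
    return [("check: possible archive" if len(c) == 1 else "visualize", c)
            for c in schema_components(tables, fks)]


SCHEMA_TABLES = ["orders", "customers", "order_items", "products",
                 "audit_log", "users", "roles"]
SCHEMA_FKS = [("orders", "customers"), ("order_items", "orders"),
              ("order_items", "products"), ("users", "roles")]
for step, component in triage(SCHEMA_TABLES, SCHEMA_FKS):
    print(step, component)
```

Here the isolated `audit_log` table ends up last and is flagged for checking, while the two multi-table components are passed on to visualization.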
30. RDB to Ontology Mapping
– better understanding of the data and searching for information without knowledge of the RDB model; data mining from RDBs
– can be used by web search engines to search in RDBs
– lets people who do not understand RDB technology (laymen) get information from RDBs
– a method for merging multiple databases (ontology merging)
– interactive searching for information (Protégé)
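A mapping of this kind can be as simple as turning each row into an individual, each column into a property and each FK into a link between individuals. The sketch below is loosely modelled on the W3C Direct Mapping of relational data to RDF, but it is simplified (single-column keys, plain string literals, no datatypes, IRI shapes not exactly the specification's) and is not the mapping used in the thesis; the base IRI and the table are hypothetical.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
BASE = "http://example.org/db/"  # hypothetical base IRI


def row_iri(table, key):
    """IRI of the individual that represents one row."""
    return f"<{BASE}{table}/{key}>"


def literal(value):
    """Plain N-Triples literal with backslashes, quotes and newlines escaped."""
    text = (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{text}"'


def map_table(table, rows, key_column, fks):
    """Return N-Triples lines for one table.

    rows: list of dicts (column -> value); fks: {FK column: referenced table},
    assuming each FK references the other table's single-column key.
    """
    triples = []
    for row in rows:
        subject = row_iri(table, row[key_column])
        triples.append(f"{subject} {RDF_TYPE} <{BASE}{table}> .")
        for column, value in row.items():
            if value is None:  # NULLs produce no triple
                continue
            if column in fks:
                target = row_iri(fks[column], value)
                triples.append(f"{subject} <{BASE}{table}#ref-{column}> {target} .")
            else:
                triples.append(f"{subject} <{BASE}{table}#{column}> {literal(value)} .")
    return triples


for line in map_table("orders",
                      [{"id": 1, "customer_id": 5, "note": 'say "hi"'}],
                      "id", {"customer_id": "customers"}):
    print(line)
```

Loading such triples into an RDF store or an ontology tool would then allow the interactive and query-based searching listed on the slide.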
34. How to Find Information in Ontologies
• using a query language (SPARQL)
• interactive (e.g. Protégé)
– using OntoGraf combined with text search
– exploring entities and individuals
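The two interactive steps, text search followed by exploring an entity's neighbourhood, can be imitated over an in-memory list of triples. In the sketch, `match` behaves like a single SPARQL triple pattern with `None` playing the role of a variable; the triples, the `ex:` names and the `Literal` marker class are hypothetical, and this is not how OntoGraf or Protégé are implemented.

```python
class Literal(str):
    """Marks a literal value; plain str objects are resource names."""


TRIPLES = [
    ("ex:customers/1", "rdf:type", "ex:customers"),
    ("ex:customers/1", "ex:customers#name", Literal("Novak Ltd")),
    ("ex:orders/1", "rdf:type", "ex:orders"),
    ("ex:orders/1", "ex:orders#ref-customer_id", "ex:customers/1"),
    ("ex:orders/2", "rdf:type", "ex:orders"),
    ("ex:orders/2", "ex:orders#ref-customer_id", "ex:customers/1"),
]


def match(triples, s=None, p=None, o=None):
    """Triples fitting one pattern; None matches anything (like ?var in SPARQL)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def text_search(triples, keyword):
    """Subjects having a literal that contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return sorted({s for s, _, o in triples
                   if isinstance(o, Literal) and needle in o.lower()})


def explore(triples, entity):
    """Direct neighbourhood of an entity: outgoing and incoming triples."""
    return match(triples, s=entity) + match(triples, o=entity)


hits = text_search(TRIPLES, "novak")
print(hits)                                              # ['ex:customers/1']
print(len(explore(TRIPLES, hits[0])))                    # 4
print(len(match(TRIPLES, p="rdf:type", o="ex:orders")))  # 2
```

In SPARQL the last call would correspond roughly to `SELECT ?x WHERE { ?x rdf:type ex:orders }`, given suitable prefix declarations.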
36. Disadvantages & Problems of Mapping RDBs to Ontologies
• Keeping the data up to date is difficult (static vs. dynamic ontology creation).
• Aggregate queries are very slow.
• Existing tools cannot cope with large RDBs (or large ontologies).
37. Conclusion & Scientific Contribution
• Design and creation of a method for orientation in, understanding of, and finding information in large or unknown relational databases (RDBAnalyzer supports these principles).
• Detection of RDB graph characteristics (scale-free network) and use of this knowledge to create two new metrics and validate one existing metric.
• Design and creation of a method for finding information in ontologies generated from RDBs.