3. • Rich data
• Dialogue-based interaction
• Based on intensional characterization
of the information
• Meaningful feedback (relevance)
• User experience
Database Exploration as a viewpoint of
Exploratory Computing:
→ only, with more emphasis on efficiency
4. • Starting point: a large,
“semantically-rich” db
• Goals
• explore, to learn
interesting things
• without a clear, a-priori
perception of what we
are looking for
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17. • A classical db is inherently
transactional
• “Data Enthusiasts” are not
willing to pay for building a
warehouse
• Interactive Data Cleaning
• Let’s do it on the database!
18. The UI Layer
The Engine Layer
The DB Layer
“interesting”
attributes
Activity
id
type
start
length
userId
19. AcmeUser Activity Location Sleep
The Engine Layer
The DB Layer
AcmeUser ⨝
Location
Activity ⨝
AcmeUser
Sleep ⨝
AcmeUser
type sex
quality
view X is a parent
of view Y means
Y contains X as a
subexpression
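The parent relation between views can be sketched in a few lines of Python. This is only an illustration: modelling a view as the set of base tables it joins, and the names `view`, `is_parent`, `parents` and `VIEWS`, are assumptions of the sketch, not part of the system described on the slide.

```python
# A view is modelled (as an assumption) by the set of base tables it joins.
def view(*tables):
    return frozenset(tables)

def name(v):
    # Human-readable label, e.g. "AcmeUser JOIN Location".
    return " JOIN ".join(sorted(v))

def is_parent(x, y):
    # X is a parent of Y when Y contains X as a proper subexpression,
    # here approximated as "X's tables are a proper subset of Y's tables".
    return x < y

# Base tables and the three joins shown on the slide.
VIEWS = [
    view("AcmeUser"), view("Activity"), view("Location"), view("Sleep"),
    view("AcmeUser", "Location"),
    view("Activity", "AcmeUser"),
    view("Sleep", "AcmeUser"),
]

def parents(y, views):
    # All views in the lattice that are parents of y.
    return [x for x in views if is_parent(x, y)]

for v in VIEWS:
    print(name(v), "<-", [name(p) for p in parents(v, VIEWS)])
```

Under this simplification each join view has its two base tables as parents, and the base tables have none.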
20. • Query Engine
• Frequency distributions
of attribute values
• Sampling
• Statistical hypothesis
tests:
• Real-valued attributes:
• Kolmogorov-Smirnov
• Categorical attributes
• Chi-Square
• or Entropy Test for low
frequencies
Query Engine
Computing Distributions
Running Hypothesis Tests
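The two test statistics named on this slide can be sketched with the Python standard library alone. This is a minimal illustration, assuming two samples of one attribute are being compared; the function names are hypothetical, p-values are left out (a statistics library would normally supply them), and the entropy test for low frequencies is not covered.

```python
from bisect import bisect_right
from collections import Counter

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic for real-valued samples:
    the largest gap between the two empirical distribution functions."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the gap can only change at observed values
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def chi_square_statistic(a, b):
    """Chi-square statistic for two categorical samples, computed on the
    2 x k contingency table of category counts (both samples non-empty)."""
    ca, cb = Counter(a), Counter(b)
    n_a, n_b = len(a), len(b)
    n = n_a + n_b
    stat = 0.0
    for cat in set(ca) | set(cb):
        col = ca[cat] + cb[cat]
        for observed, row in ((ca[cat], n_a), (cb[cat], n_b)):
            expected = row * col / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Identical samples give 0; fully separated samples give the maximum gap.
print(ks_statistic([1, 2, 3], [1, 2, 3]))
print(ks_statistic([1, 2, 3], [4, 5, 6]))
print(chi_square_statistic(["x"] * 10, ["y"] * 10))
```

A larger statistic suggests that the attribute is distributed differently in the two samples, which is the signal the engine would use to flag it.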
22. An interactive dialogue:
• Users may change their
minds
• Feedback: emphasis on
dataset properties, not on
extensions
• Summarization
What is interesting is
discovered:
• Discontinuities
• Niche knowledge detection
is serendipitous: surprise vs.
previous subsets or vs. user’s
expectations
• At each iteration the user
should understand
• the “current” subset of
items (its properties)
• the main differences vs.
one or more of the
previous subsets
• where to focus her
attention (what is
interesting?)
• Statistical approach to
finding discrepancies
• A way to highlight relevant
properties
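One way to picture a single step of the dialogue is to compare attribute distributions in the previous and current subsets, then rank attributes by how much they changed. The sketch below uses total variation distance as a simple stand-in for the hypothesis tests. The rows, attribute values and function names are made up for illustration.

```python
from collections import Counter

def frequencies(rows, attr):
    # Relative frequency of each value of `attr` in a list of dict rows.
    counts = Counter(r[attr] for r in rows)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def total_variation(p, q):
    # Half the L1 distance between two frequency distributions (0 = same).
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in set(p) | set(q))

def rank_attributes(previous, current, attrs):
    # Attributes whose distribution changed most come first.
    scores = {
        a: total_variation(frequencies(previous, a), frequencies(current, a))
        for a in attrs
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data: the user narrows all activities down to running.
previous = [
    {"type": "running", "sex": "F"},
    {"type": "running", "sex": "M"},
    {"type": "walking", "sex": "F"},
    {"type": "walking", "sex": "M"},
]
current = [r for r in previous if r["type"] == "running"]

ranking = rank_attributes(previous, current, ["type", "sex"])
print(ranking)
```

In this toy case `type` ranks first because the selection changed its distribution, while `sex` is unchanged and would not be highlighted.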