Letizia Tanca - Exploring Databases:  The Indiana Project


Letizia Tanca, Politecnico di Milano, made this presentation for the Cognitive Systems Institute Speaker Series on July 21, 2016.



  1. Letizia Tanca, Politecnico di Milano; joint work with Università della Basilicata (credits in the last slide). Cognitive Systems Institute Speaker Series
  2. User Interaction, Visualize, Annotation, Collaboration, Efficiency, Explanations, Sampling, Personalization, Intensional view, Query Suggestion
  3. • Rich data
     • Dialogue-based interaction
     • Based on intensional characterization of the information
     • Meaningful feedback (relevance)
     • User experience
     Database Exploration as a viewpoint of Exploratory Computing: → only, more emphasis on efficiency
  4. • Starting point: a large, “semantically rich” database
     • Goals
        • explore, to learn interesting things
        • without a clear, a priori perception of what we are looking for
  5. • A classical database is inherently transactional
     • “Data Enthusiasts” are not willing to invest in building a warehouse
     • Interactive Data Cleaning
     • Let’s do it on the database!
  6. The UI Layer; The Engine Layer; The DB Layer; “interesting” attributes; Activity (id, type, start, length, userId)
  7. AcmeUser, Activity, Location, Sleep; The Engine Layer; The DB Layer; AcmeUser ⨝ Location; Activity ⨝ AcmeUser; Sleep ⨝ AcmeUser; type, sex, quality. “View X is a parent of view Y” means Y contains X as a subexpression.
  8. • Query Engine
     • Frequency distributions of attribute values
     • Sampling
     • Statistical hypothesis tests:
        • Real-valued attributes: Kolmogorov-Smirnov
        • Categorical attributes: Chi-Square, or Entropy Test for low frequencies
     Query Engine; Computing Distributions; Running Hypothesis Tests
  9. 1) Extraction 3) Iteration 4) Ranking of the analyses based on the Hellinger Distance between the distributions
  10. An interactive dialogue:
      • Users may change their minds
      • Feedback: emphasis on dataset properties, not on extensions
      • Summarization
      What is interesting is discovered:
      • Discontinuities
      • Niche knowledge detection is serendipitous: surprise vs. previous subsets or vs. user’s expectations
      • At each iteration the user should understand
         • the “current” subset of items (its properties)
         • the main differences vs. one or more of the previous subsets
         • where to focus her attention (what is interesting?)
      • Statistical approach to finding discrepancies
      • A way to highlight relevant properties
  11. • Politecnico di Milano: Paolo Paolini, Nicoletta Di Blas, Elisa Quintarelli, Manuel Roveri, Mirjana Mazuran
      • Università della Basilicata: Giansalvatore Mecca, Donatello Santoro, Marcello Buoncristiano, Antonio Giuzio
      • M. Buoncristiano, G. Mecca, E. Quintarelli, M. Roveri, D. Santoro, L. Tanca: Database Challenges for Exploratory Computing. SIGMOD Record, 2015
      • N. Di Blas, M. Mazuran, P. Paolini, E. Quintarelli, L. Tanca: Exploratory Computing: a Draft Manifesto. DSAA 2014
      • S. Idreos, O. Papaemmanouil, S. Chaudhuri: Overview of Data Exploration Techniques. SIGMOD 2015
      • My post on the SIGMOD Blog
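
Slide 9 ranks the analyses by the Hellinger distance between distributions. The slides do not show how this is computed, so the following is only a minimal sketch under stated assumptions: distributions are relative frequencies of categorical attribute values, the distance is the standard discrete form H(P, Q) = sqrt(Σ (√p − √q)²) / √2, and the function names and the toy Activity-like data are hypothetical, not taken from the project.

```python
from collections import Counter
from math import sqrt


def distribution(values):
    """Relative frequency of each distinct value in a list."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (dicts value -> probability).

    Ranges from 0 (identical) to 1 (disjoint supports).
    """
    keys = set(p) | set(q)
    squared = sum((sqrt(p.get(k, 0.0)) - sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return sqrt(squared) / sqrt(2)


def rank_attributes(subset, reference):
    """Rank attributes by how much their distribution in `subset` differs from `reference`.

    Both arguments map an attribute name to the list of its values.
    Returns (attribute, distance) pairs, largest distance first.
    """
    scores = {
        attr: hellinger(distribution(subset[attr]), distribution(reference[attr]))
        for attr in subset
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: the current subset vs. a previous (reference) subset.
subset = {"type": ["run", "run", "walk"], "sex": ["f", "m", "f"]}
reference = {"type": ["walk", "walk", "run"], "sex": ["f", "m", "f"]}

ranked = rank_attributes(subset, reference)
for attr, dist in ranked:
    print(f"{attr}: {dist:.3f}")
```

In this toy example the `sex` distribution is unchanged between the two subsets, so `type` is ranked first as the attribute whose distribution differs most.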

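
Slide 8 names Kolmogorov-Smirnov for real-valued attributes and Chi-Square for categorical ones. As an illustration only, the sketch below computes the two test statistics with the standard library; it does not compute p-values, the function names and sample data are hypothetical, and it is not meant to reflect how the project's engine implements these tests.

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def chi_square_statistic(observed, reference):
    """Pearson chi-square statistic of `observed` categories against a reference sample.

    Expected counts are the reference proportions scaled to the observed sample size.
    Categories that never occur in the reference are skipped in this sketch.
    """
    obs_counts = Counter(observed)
    ref_counts = Counter(reference)
    n_obs = sum(obs_counts.values())
    n_ref = sum(ref_counts.values())
    stat = 0.0
    for category, ref_count in ref_counts.items():
        expected = ref_count / n_ref * n_obs
        stat += (obs_counts.get(category, 0) - expected) ** 2 / expected
    return stat


# Hypothetical samples: activity lengths (real-valued) and activity types (categorical).
lengths_subset = [1.0, 2.0, 3.0]
lengths_reference = [4.0, 5.0, 6.0]
types_subset = ["walk"] * 30 + ["run"] * 10
types_reference = ["walk"] * 20 + ["run"] * 20

ks = ks_statistic(lengths_subset, lengths_reference)
chi2 = chi_square_statistic(types_subset, types_reference)
```

A large statistic suggests that the attribute's distribution in the current subset differs from the reference; turning it into a decision requires a p-value or critical value, which this sketch leaves out.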