Alex Endert, William Ribarsky,
Cagatay Turkay, William Wong,
Ian Nabney, Ignacio Díaz Blanco,
Fabrice Rossi
The State of the Art in
Integrating Machine Learning into
Visual Analytics
Dagstuhl -- Bridging Information Visualization with
Machine Learning (1-6 March, 2015)
This STAR (state-of-the-art report) …
Surveys advances made at the intersection of Visual Analytics (VA) and
Machine Learning (ML)
Describes the extent to which machine learning methods are utilized in
visual analytics
Discusses open challenges and opportunities for both communities
Was initiated by discussions on the “Role of the User” at the workshop
Why ML + VA ?
• Reasoning about data becomes more complicated and difficult as data scale and
complexity increase
• Powerful tools are needed to draw valid conclusions from data while keeping
results trustworthy and interpretable
VA and ML have complementary strengths and weaknesses
Report Structure
• Categories of models and frameworks that describe the cognitive stages
people progress through during data analysis
• Overview of existing ML and VA integrations
• Overview of application domains
• Open challenges and opportunities for ML and VA domains and
communities
Review methodology
• Existing literature on the integration of ML and VIS from three
different perspectives:
• Models and frameworks,
• Techniques
• Application areas
• Resources from both the visualisation and machine learning
domain
Review methodology
On VIS side (major resources and starting points):
Journals (major resources):
IEEE Transactions on Visualization and Computer Graphics
Computer Graphics Forum
IEEE Computer Graphics and Applications
Information Visualization
Conferences (major resources):
IEEE Visual Analytics Science and Technology
IEEE Symposium on Information Visualization
EuroVis
IEEE Pacific Visualization Symposium (PacificVis)
EuroVis workshop on Visual Analytics (EuroVA)
Review methodology
On ML side (major resources and starting points):
Journals :
Journal of Machine Learning Research
Neurocomputing
IEEE Transactions on Knowledge and Data Engineering
Conferences :
International Conference on Machine Learning (ICML)
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
European Symposium on Artificial Neural Networks, Computational Intelligence and
Machine Learning (ESANN)
Review methodology
• 133 papers on VIS side resources
• 46 on ML side resources
• Total 186 papers cited (including others)
• Later – 69 papers under the Existing Techniques taxonomy
Rest of today
PART-1: MODELS & FRAMEWORKS
PART-2: EXISTING TECHNIQUES OF ML + VA (& APPLICATIONS)
PART-3: OPPORTUNITIES & CHALLENGES
PART-1
MODELS & FRAMEWORKS
Categories of models and frameworks
• Models and frameworks that describe the cognitive stages
people progress through during data analysis
• Guidance and context for designing solutions
• Reviewed under three categories:
• Sensemaking and Knowledge Discovery (Human Reasoning)
• Interactivity in Visual Analytics
• Machine Learning
Sensemaking and Knowledge Discovery (Human Reasoning)
• Models in this category help us in “Understanding the cognitive
processes of people as they reason about data”
• Provide a foundational basis and design guidelines for visual analytics
[PIROLLI & CARD, 2005]
[SACHA et al., 2014]
[KLEIN et al., 2006]
Sensemaking loop by Pirolli and Card
[PIROLLI & CARD, 2005]
Sensemaking in two primary phases: foraging and synthesis
Synthesis is the more “cognitively intensive” phase
Criticised for being linear despite the loops
The Data-Frame Model of Sense-making by Klein et al.
Exchange of information between the
human and the data in terms of frames
Data connects with the frame
Elaboration to strengthen the frame
Reframing to augment the frame or to
create a new one
Implication: design visual analytics in a way that
encourages elaboration and reframing
[KLEIN et al., 2006]
Human-Computer knowledge generation model by Sacha et al.
[SACHA et al., 2014]
Much more concrete and explicit roles for human and computer
Three loops: exploration, verification, and knowledge generation
Two main roles for ML:
- Transform unstructured or semi-structured data into a form more meaningful for
human exploration and insight discovery.
- Unsupervised/semi-supervised ML to guide the analysis, e.g., best visualizations,
sequences of steps in the exploration, verification, or knowledge generation processes
Models in “Interactivity in Visual Analytics”
[ENDERT et al., 2012]
[HEER, 2006]
how data characteristics are extracted and
assigned visual attributes or encodings,
ultimately creating a visualization
Semantic interaction as an approach
in which analytical reasoning of the
user is inferred and in turn used to
steer the underlying models implicitly
Machine Learning Models and Frameworks
[ CRISP-DM by Shearer, 2000 ]
comparable to knowledge discovery and
visual analytics frameworks
No clear room for user apart from
continuous feedback and evaluation
(mostly deployment related)
Interactive Machine Learning &
Active Learning
… ML algorithms are able to determine
interesting inputs for which they
do not know the desired outputs (in the
training set), in such a way that, given those outputs, the
predictive performance of the model would greatly improve
[FAILS et al., 2003]
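
The query-selection idea behind active learning can be sketched with uncertainty sampling: the model asks for labels on the inputs it is least sure about. This is a minimal illustration, not the method of any cited paper; `toy_model`, the pool values, and the function names are hypothetical.

```python
import math

def entropy(p):
    """Binary entropy (in bits) of a predicted positive-class probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def most_uncertain(unlabelled, predict_proba, k=1):
    """Return the k unlabelled inputs whose predictions have the highest entropy."""
    ranked = sorted(unlabelled, key=lambda x: entropy(predict_proba(x)), reverse=True)
    return ranked[:k]

# Hypothetical model: a logistic curve with its decision boundary at x = 2.0.
def toy_model(x):
    return 1.0 / (1.0 + math.exp(-(x - 2.0)))

pool = [-3.0, 0.5, 1.9, 2.4, 6.0]           # inputs without known outputs
queries = most_uncertain(pool, toy_model, k=2)
print(queries)                               # the points closest to the boundary
```

In an interactive setting, the selected inputs would be shown to the user for labelling and the model retrained on the answers.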
PART-2
ML + VA TECHNIQUES (& APPLICATIONS )
First perspective: types of ML algorithms
• Considered analytical tasks that often require the joint capabilities of
computation and user expertise
• Types of ML algorithms that have been considered within visual
analytics literature (other groupings possible):
• Dimension reduction
• Clustering
• Classification
• Regression/correlation analysis (Prediction)
Second perspective: Interaction intent
• Characterises why interaction takes place
• Resonates with the “user intent” categories suggested by Yi et al., 2007 but posed
at a higher level:
Modify parameters and computation domain:
- Modifying the parameters of an algorithm or changing the algorithm
- Defining the measures used in the computations
- Modify the computational domain to which the algorithm is applied
Define analytical expectations:
- Communicate expected results (to the computational method)
- Communicate examples of relevant, domain-knowledge informed relations
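Number of papers per ML algorithm type and interaction intent:
| | Modify parameters and computation domain | Define analytical expectations |
| Dimension Reduction | 11 | 10 |
| Clustering | 18 | 8 |
| Classification | 9 | 4 |
| Regression / correlation | 5 | 4 |
A total of 69 papers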
Dimension Reduction - Modify parameters and computation domain
• Steer the computation to where it matters
• Assist reduction with user-defined quality
• Create “user-defined” local projections
[Jeong et al., 2009]
[Turkay et al. , 2011]
[Williams and Munzner , 2006]
[Johansson and Johansson, 2009]
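
One way to read “user-defined local projections” is to compute the projection axis only from the items a user has selected. The sketch below does this with a first principal component found by power iteration; it is an assumption-laden illustration rather than the approach of the cited systems, and the data and function names are hypothetical.

```python
def mean_center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iters=200):
    """Dominant eigenvector of cov via power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return v  # degenerate selection with no variance
        v = [x / norm for x in w]
    return v

def local_projection_axis(data, selected):
    """Principal axis computed only from the user-selected rows."""
    subset = [data[i] for i in selected]
    return first_component(covariance(mean_center(subset)))

# Hypothetical data: rows 0-3 vary along x, rows 4-7 vary along y.
data = [[0, 0], [1, 0], [2, 0], [3, 0], [5, 1], [5, 2], [5, 3], [5, 4]]
axis_a = local_projection_axis(data, [0, 1, 2, 3])
axis_b = local_projection_axis(data, [4, 5, 6, 7])
print(axis_a, axis_b)
```

The two selections yield different axes, which is the point of a local projection: the reduction follows the region the user cares about.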
Clustering- Modify parameters and computation domain
• Multiple clustering algorithms with multiple parameters
• Compare over quality metrics / (dis)similarities
[Seo & Shneiderman, 2002]
[Lex et al., 2010][Schreck et al., 2009]
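
Comparing clustering results over a quality metric can be sketched as follows: run the same algorithm under several parameter settings and score each result. The example uses plain 1-D k-means and the mean silhouette width as the metric; both are choices made for illustration, not taken from the cited papers, and the data is hypothetical.

```python
def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's algorithm on 1-D data with evenly spaced initial centres."""
    lo, hi = min(values), max(values)
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels

def mean_silhouette(values, labels):
    """Average silhouette width; higher means tighter, better separated clusters."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for v, l in zip(values, labels):
        own = [abs(v - u) for u, m in zip(values, labels) if m == l]
        if len(own) == 1:
            continue  # singleton clusters contribute a silhouette of 0
        a = sum(own) / (len(own) - 1)  # mean distance to own cluster (self excluded)
        b = min(
            sum(abs(v - u) for u, m in zip(values, labels) if m == c)
            / sum(1 for m in labels if m == c)
            for c in clusters if c != l
        )
        total += (b - a) / max(a, b)
    return total / len(values)

values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.1, 8.8]
scores = {k: mean_silhouette(values, kmeans_1d(values, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

A visual analytics tool would show these scores next to the cluster views so the analyst can weigh the metric against domain knowledge.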
Classification - Modify parameters and computation domain
• Embedded classification methods
• Reducing search space / Interactively labelling data
• Interactively generating the classification structures
• Evaluate ensembles
[Choo et al., 2010]
[Elzen & Wijk, 2010]
[KRAUSE et al., 2014]
Regression / Correlation - Modify parameters and computation domain
• Interactive visual validation of models
• Visual representations to derive explanations
• Selecting feature subsets – “local” models
[Muhlbacher and Piringer, 2013] [Klemm et al., 2016]
[Malik et al., 2012]
Dimension Reduction - Define analytical expectations
• Dimension reduction is a suitable candidate given its unsupervised
nature
• Observation-level interactions (Endert et al. 2011)
• Re-compute based on user expectations
[Endert et al., 2011][Kwon et al., 2016]
[Hu et al., 2013]
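
The “re-compute based on user expectations” step can be illustrated with a deliberately simple heuristic: when a user drags two items together, dimensions in which they already agree receive more weight in the distance function. This is a toy stand-in for the optimisation used in real observation-level interaction systems; the weighting rule, names, and data are all assumptions.

```python
def weights_from_similar_pair(a, b, eps=1e-6):
    """Heuristic: dimensions where a and b agree get larger weights (weights sum to 1)."""
    raw = [1.0 / (abs(x - y) + eps) for x, y in zip(a, b)]
    total = sum(raw)
    return [r / total for r in raw]

def weighted_distance(a, b, w):
    """Weighted Euclidean distance between two items."""
    return sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)) ** 0.5

# Hypothetical items that the user has dragged next to each other in the view.
p = [1.0, 5.0, 0.20]
q = [1.1, 0.5, 0.25]

uniform = [1.0 / 3] * 3
learned = weights_from_similar_pair(p, q)

before = weighted_distance(p, q, uniform)
after = weighted_distance(p, q, learned)
print(learned, before, after)
```

Feeding the learned weights back into the projection would then move other items as well, which is how the user's expectation propagates to the whole layout.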
Clustering- Define analytical expectations
• Interactively introduce grouping constraints
• Results are “user-optimized”, e.g., these items are (not) similar
• Learn-by-example
• Iterative refinement through interaction
[HOSSAIN et al., 2012]
[Choo et al., 2013]
Classification- Define analytical expectations
• Classification tasks are suitable for methods where users communicate
known/expected/wrong classification results back to the algorithm
• Iteratively learns user preferences
• Relevance feedback & Model learning
[Behrisch et al., 2014]
[Schreck et al., 2009]
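
Relevance feedback can be sketched with the classic Rocchio update from information retrieval, which moves a query vector towards items the user marked relevant and away from those marked non-relevant. Rocchio is named here as an illustrative technique; the slides do not state which method the cited systems use, and the vectors below are hypothetical.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return alpha*query + beta*centroid(relevant) - gamma*centroid(non_relevant)."""
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    pos, neg = centroid(relevant), centroid(non_relevant)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]

query = [1.0, 0.0]
marked_relevant = [[0.0, 1.0], [0.0, 3.0]]   # user feedback: "more like these"
marked_non_relevant = [[2.0, 0.0]]           # user feedback: "not like this"

updated = rocchio(query, marked_relevant, marked_non_relevant)
print(updated)
```

Repeating the update after each round of feedback gives the iterative refinement described in the bullets above.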
Regression/Correlation - Define analytical expectations
• Not a populated field
• Some examples in Ensemble Simulation analysis
• Selecting “targets” interactively
[Matkovic et al., 2008]
Application Areas
Each area poses unique and important challenges, so different
combinations of interactive visualizations and ML techniques are used:
• Text Analytics and Topic Modelling
• Multimedia Visual Analytics
• Streaming Data: Finance, Cyber Security, Social Media
• Biological Data
PART-3
OPPORTUNITIES & CHALLENGES
Balancing Human and Machine Effort, Responsibility, and Tasks
• For mixed-initiative systems, it is a common notion that there exists a
balance of effort between the user and the machine
• Decompose a larger task into subtasks and assign each to the person or,
where it can be performed more quickly, to the system
• Well-defined and quantitative subtasks (i.e., solvable by computation)
• Subjective and less formally defined subtasks
Challenges:
• The extent to which tasks should be divided is not clear
• Need to measure the amount of effort expended by both the user and
the system
Creating and Training Models from User Interaction Data
• User interaction logs contain rich information about the process and
interests of the user
• Opportunity exists for ML techniques to leverage the real-time user
interaction data generated from the analysts using the system to steer
the computation
• Two broad models can be created
• Data models:
• weighted data items and attributes (with the weights inferred from user interaction)
• User Models:
• computational approximations of (the state of) the user (e.g., cognitive load, personality
traits)
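
A data model of the kind listed above can be sketched as attribute weights derived from an interaction log. The log format (an ordered list of attribute names the user touched) and the recency decay are assumptions made for this example, not something specified in the report.

```python
from collections import Counter

def attribute_weights(log, decay=0.9):
    """Turn an ordered interaction log into normalised weights; recent events count more."""
    scores = Counter()
    for age, attribute in enumerate(reversed(log)):
        scores[attribute] += decay ** age   # age 0 is the most recent interaction
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}

# Hypothetical log: which attribute each user interaction involved, oldest first.
log = ["price", "price", "rating", "price", "location"]
weights = attribute_weights(log)
print(weights)
```

The resulting weights could then steer a downstream computation, for example a weighted distance function in a projection or clustering step.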
Complex Computation Systems & Automation Surprise
• many inter-related and inter-dependent “black boxes” of automated
components
• difficult to know what input leads to what output
• interactions between automated “black boxes” can create automation
surprises
• Leads to:
• error
• loss of trust in the technology
Visualizing Intermediate Results and Computational Process
• a.k.a. Progressive Analytics (Stolper et al., 2014)
• Many kinds of ML algorithms undergo a continuous
convergence process towards the final solution
• Rendering visualizations of the intermediate results during
convergence
• Steerable ML algorithms:
• Prior knowledge on the relevance of features
• Insight on the similarities between items
• Prior knowledge on class information
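
The idea of rendering intermediate results during convergence can be sketched with a generator that yields the state of an iterative algorithm after every step. The example uses 1-D k-means purely for illustration; the function name, data, and starting centres are hypothetical.

```python
def progressive_kmeans_1d(values, centres, max_iters=100, tol=1e-9):
    """Yield (iteration, centres) after every Lloyd update so a view can redraw."""
    centres = list(centres)
    for it in range(1, max_iters + 1):
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda c: abs(v - centres[c]))
            groups[nearest].append(v)
        # Empty clusters keep their previous centre.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
        shift = max(abs(u - c) for u, c in zip(updated, centres))
        centres = updated
        yield it, list(centres)
        if shift < tol:
            break   # converged: the last update did not move any centre

values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
history = []
for it, snapshot in progressive_kmeans_1d(values, centres=[0.0, 1.5]):
    history.append(snapshot)   # a real system would redraw the visualization here
print(history)
```

Because the caller controls the loop, it can also stop early or restart with different centres, which is one simple way to make the computation steerable.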
To conclude
• Already a good level of integration within the domains
• Only a small subset of ML techniques incorporated
• VIS is slow in catching up with advanced ML techniques
• Increasing awareness/interest in the ML domain
• Several problems and opportunities
• Formalizing and establishing steerable ML
• Better determine how tasks should be divided between humans and machines
• Bridging the two communities further
Alex Endert, William Ribarsky,
Cagatay Turkay, William Wong,
Ian Nabney, Ignacio Díaz Blanco,
Fabrice Rossi
The State of the Art in
Integrating Machine Learning into
Visual Analytics

Más contenido relacionado

Similar a The state of the art in integrating machine learning into visual analytics

Goal Dynamics_From System Dynamics to Implementation
Goal Dynamics_From System Dynamics to ImplementationGoal Dynamics_From System Dynamics to Implementation
Goal Dynamics_From System Dynamics to Implementation
Amjad Adib
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
IJEACS
 

Similar a The state of the art in integrating machine learning into visual analytics (20)

2016-03-02 research seminar
2016-03-02 research seminar2016-03-02 research seminar
2016-03-02 research seminar
 
ppt_ooad.pdf
ppt_ooad.pdfppt_ooad.pdf
ppt_ooad.pdf
 
WDES 2015 paper: A Systematic Mapping on the Relations between Systems-of-Sys...
WDES 2015 paper: A Systematic Mapping on the Relations between Systems-of-Sys...WDES 2015 paper: A Systematic Mapping on the Relations between Systems-of-Sys...
WDES 2015 paper: A Systematic Mapping on the Relations between Systems-of-Sys...
 
Interactive Machine Learning
Interactive  Machine LearningInteractive  Machine Learning
Interactive Machine Learning
 
Goal Dynamics_From System Dynamics to Implementation
Goal Dynamics_From System Dynamics to ImplementationGoal Dynamics_From System Dynamics to Implementation
Goal Dynamics_From System Dynamics to Implementation
 
Intelligent Career Guidance System.pptx
Intelligent Career Guidance System.pptxIntelligent Career Guidance System.pptx
Intelligent Career Guidance System.pptx
 
Tourism Based Hybrid Recommendation System
Tourism Based Hybrid Recommendation SystemTourism Based Hybrid Recommendation System
Tourism Based Hybrid Recommendation System
 
Software Development Methodologies-HSM, SSADM
Software Development Methodologies-HSM, SSADMSoftware Development Methodologies-HSM, SSADM
Software Development Methodologies-HSM, SSADM
 
Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
 
Review and analysis of machine learning and soft computing approaches for use...
Review and analysis of machine learning and soft computing approaches for use...Review and analysis of machine learning and soft computing approaches for use...
Review and analysis of machine learning and soft computing approaches for use...
 
Ibm colloquium 070915_nyberg
Ibm colloquium 070915_nybergIbm colloquium 070915_nyberg
Ibm colloquium 070915_nyberg
 
Cognitive automation
Cognitive automationCognitive automation
Cognitive automation
 
A Survey on Recommendation System based on Knowledge Graph and Machine Learning
A Survey on Recommendation System based on Knowledge Graph and Machine LearningA Survey on Recommendation System based on Knowledge Graph and Machine Learning
A Survey on Recommendation System based on Knowledge Graph and Machine Learning
 
Recommenders, Topics, and Text
Recommenders, Topics, and TextRecommenders, Topics, and Text
Recommenders, Topics, and Text
 
Domain Modeling for Personalized Learning
Domain Modeling for Personalized LearningDomain Modeling for Personalized Learning
Domain Modeling for Personalized Learning
 
Machine learning with an effective tools of data visualization for big data
Machine learning with an effective tools of data visualization for big dataMachine learning with an effective tools of data visualization for big data
Machine learning with an effective tools of data visualization for big data
 
Studying Software Engineering Patterns for Designing Machine Learning Systems
Studying Software Engineering Patterns for Designing Machine Learning SystemsStudying Software Engineering Patterns for Designing Machine Learning Systems
Studying Software Engineering Patterns for Designing Machine Learning Systems
 
Open Data Infrastructures Evaluation Framework using Value Modelling
Open Data Infrastructures Evaluation Framework using Value Modelling Open Data Infrastructures Evaluation Framework using Value Modelling
Open Data Infrastructures Evaluation Framework using Value Modelling
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
 
1. oose
1. oose1. oose
1. oose
 

Más de Cagatay Turkay

Más de Cagatay Turkay (8)

Visual Analytics for User Behaviour Analysis in Cyber Systems
Visual Analytics for User Behaviour Analysis in Cyber SystemsVisual Analytics for User Behaviour Analysis in Cyber Systems
Visual Analytics for User Behaviour Analysis in Cyber Systems
 
The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...
The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...
The Inquisitive Data Scientist: Facilitating Well-Informed Data Science throu...
 
Visualisation for Data Science: Advances and Opportunities in Visualisation R...
Visualisation for Data Science: Advances and Opportunities in Visualisation R...Visualisation for Data Science: Advances and Opportunities in Visualisation R...
Visualisation for Data Science: Advances and Opportunities in Visualisation R...
 
Designing Progressive and Interactive Analytics Processes for High-Dimensiona...
Designing Progressive and Interactive Analytics Processes for High-Dimensiona...Designing Progressive and Interactive Analytics Processes for High-Dimensiona...
Designing Progressive and Interactive Analytics Processes for High-Dimensiona...
 
Enhancing a Social Science Model-building Workflow with Interactive Visualisa...
Enhancing a Social Science Model-building Workflow with Interactive Visualisa...Enhancing a Social Science Model-building Workflow with Interactive Visualisa...
Enhancing a Social Science Model-building Workflow with Interactive Visualisa...
 
Visualization, A Primer - Basics, Techniques and Guidelines
Visualization, A Primer - Basics, Techniques and GuidelinesVisualization, A Primer - Basics, Techniques and Guidelines
Visualization, A Primer - Basics, Techniques and Guidelines
 
Data Science: Origins, Methods, Challenges and the future?
Data Science: Origins, Methods, Challenges and the future?Data Science: Origins, Methods, Challenges and the future?
Data Science: Origins, Methods, Challenges and the future?
 
Designing Interactive Visualisations to Solve Analytical Problems in Biology
Designing Interactive Visualisations to Solve Analytical Problems in BiologyDesigning Interactive Visualisations to Solve Analytical Problems in Biology
Designing Interactive Visualisations to Solve Analytical Problems in Biology
 

Último

Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
Silpa
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.
Silpa
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Silpa
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
Silpa
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
MohamedFarag457087
 

Último (20)

FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptx
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 

The state of the art in integrating machine learning into visual analytics

  • 1. Alex Endert, William Ribarsky, Cagatay Turkay, William Wong, Ian Nabney, Ignacio Díaz Blanco, Fabrice Rossi The State of the Art in Integrating Machine Learning into Visual Analytics
  • 2.
  • 3. Dagstuhl -- Bridging Information Visualization with Machine Learning (1-6 March, 2015)
  • 4. This STAR … Advances made at the intersection of Visual Analytics (VA) and Machine Learning (ML) Describes the extent to which machine learning methods are utilized in visual analytics Discuss open challenges and opportunities for both communities Initiated by discussions on the “Role of the User” at the workshop
  • 5. Why ML + VA ? • Reasoning about data complicated and difficult as data scales and complexities increase • Powerful tools to draw valid conclusions from data, while maintaining trustworthy and interpretable results VA and ML have complementing strengths and weaknesses
  • 6. Report Structure • Categories of models and frameworks to describe the cognitive stages people progress through data analysis • Overview of existing ML and VA integrations • Overview of application domains • Open challenges and opportunities for ML and VA domains and communities
  • 7. Review methodology • Existing literature on the integration of ML and VIS from three different perspectives: • Models and frameworks, • Techniques • Application areas • Resources from both the visualisation and machine learning domain
  • 8. Review methodology On VIS side (major resources and starting points): Journals (major resources): IEEE Transactions on Visualization and Computer Graphics Computer Graphics Forum IEEE Computer Graphics and Applications Information Visualization Conferences (major resources): IEEE Visual Analytics Science and Technology IEEE Symposium on Information Visualization EuroVis IEEE Pacific Visualization Symposium (PacificVis) EuroVis workshop on Visual Analytics (EuroVA)
  • 9. Review methodology On ML side (major resources and starting points): Journals : Journal of Machine Learning Research Neurocomputing IEEE Transactions on Knowledge and Data Engineering Conferences : International Conference on Machine Learning (ICML) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)
  • 10. Review methodology • 133 papers on VIS side resources • 46 on ML side resources • Total 186 papers cited (including others) • Later – 69 papers under the Existing Techniques taxonomy
  • 11. Rest of today PART-1: MODELS & FRAMEWORKS PART-2: EXISTING TECHNIQUES OF ML + VA (& APPLICATIONS) PART-3: OPPORTUNITIES & CHALLENGES
  • 13. Categories of models and frameworks • Models and frameworks to describe the cognitive stages people progress through data analysis • Guidance and context for designing solutions • Reviewed under three categories: • Sensemaking and Knowledge Discovery (Human Reasoning) • Interactivity in Visual Analytics • Machine Learning
  • 14. Sensemaking and Knowledge Discovery (Human Reasoning) • Models in this category helps us in “Understanding the cognitive processes of people as they reason about data” • Provide foundational basis and design guidelines for visual analytics [PIROLLI & CARD, 2005] [SACHA et al., 2014] [KLEIN et al., 2006]
  • 15. Sensemaking loop by Pirolli and Card [PIROLLI & CARD, 2005] Sensemaking in two primary phases foraging and synthesis synthesis more “cognitively intensive” Criticised for being linear despite the loops
  • 16. The Data-Frame Model of Sense-making by Klein et al. Exchange of information between the human and the data in terms of frames Data connects with the frame Elaboration to strengthen the frame Reframe to augmented or to create a new one designing visual analytics in a way that encourages elaboration and reframing [KLEIN et al., 2006]
  • 17. Human-Computer knowledge generation model by Sacha et al. [SACHA et al., 2014] Much more concrete and explicit roles for human and computer Three loops: exploration, verification, and knowledge generation Two main roles for ML: - Transform unstructured or semi-structured data into a form more meaningful for human exploration and insight discovery. - Unsupervised/semi-supervised ML to guide the analysis, e.g., best visualizations, sequences of steps in the exploration, verification, or knowledge generation processes
  • 18. Models in “Interactivity in Visual Analytics” [ENDERT et al., 2012] [HEER, 2006] how data characteristics are extracted and assigned visual attributes or encodings, ultimately creating a visualization Semantic interaction as an approach in which analytical reasoning of the user is inferred and in turn used to steer the underlying models implicitly
  • 19. Machine Learning Models and Frameworks [ CRISP-DM by Shearer, 2000 ] comparable to knowledge discovery and visual analytics frameworks No clear room for user apart from continuous feedback and evaluation (mostly deployment related)
  • 20. Interactive Machine Learning & Active Learning … ML algorithms are able to determine interesting inputs for which they do not know the desired outputs (in the training set), in such a way that given those outputs the predictive performances of the model would greatly improve [FAILS et al., 2003] Machine Learning Models and Frameworks
  • 21. PART-2 ML + VA TECHNIQUES (& APPLICATIONS )
  • 22.
  • 23. First perspective: types of ML algorithms • Considered analytical tasks that often require the joint capabilities of computation and user expertise • Types of ML algorithms that have been considered within visual analytics literature (other groupings possible): • Dimension reduction • Clustering • Classification • Regression/correlation analysis (Prediction)
  • 24. Second perspective: Interaction intent • Characterise why interaction takes place? • Resonates with the “user intent” categories suggested by Yi et al., 2007 but posed at a higher-level: Modify parameters and computation domain: - Modifying the parameters of an algorithm or changing the algorithm - Defining the measures used in the computations - Modify the computational domain to which the algorithm is applied Define analytical expectations: - Communicate expected results (to the computational method) - Communicate examples of relevant, domain-knowledge informed relations
  • 25. Modify parameters and computation domain Define analytical expectations Dimension Reduction 11 10 Clustering 18 8 Classification 9 4 Regression / correlation 5 4 A total of 69 papers
  • 26. Modify parameters and computation domain Define analytical expectations Dimension Reduction Clustering Classification Regression / correlation
  • 27. Dimension Reduction - Modify parameters and computation domain • Steer the computation to where it matters • Assist reduction with user-defined quality • Create “user-defined” local projections [Jeong et al., 2009] [Turkay et al. , 2011] [Williams and Munzner , 2006] [Johansson and Johansson, 2009]
  • 28. Clustering- Modify parameters and computation domain • Multiple clustering algorithms with multiple parameters • Compare over quality metrics / (dis)similarities [Seo & Shneiderman, 2002] [Lex et al., 2010][Schreck et al., 2009]
  • 29. Classification - Modify parameters and computation domain • Embedded classification methods • Reducing search space / Interactively labelling data • Interactively generating the classification structures • Evaluate ensembles [Choo et al., 2010] [Elzen & Wijk, 2010] [KRAUSE et al., 2014]
  • 30. Regression / Correlation - Modify parameters and computation domain • Interactive visual validation of models • Visual representations to derive explanations • Selecting feature subsets – “local” models [Mühlbacher and Piringer, 2013] [Klemm et al., 2016] [Malik et al., 2012]
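To illustrate the “local” model idea, the sketch below fits an ordinary least-squares line once on all points and once on a subset standing in for a user's selection. The helper `fit_line`, the toy data, and the brushed range `x <= 4` are assumptions for illustration only, not the method of the cited papers.

```python
def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data: y rises with x up to x = 4, then falls again.
points = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4),
          (5, 3), (6, 2), (7, 1), (8, 0)]

# Global model: the rising and falling halves cancel out.
global_slope, global_intercept = fit_line(points)

# Hypothetical brush: the user selects only the rising part.
selection = [(x, y) for x, y in points if x <= 4]
local_slope, local_intercept = fit_line(selection)
```

The global fit reports almost no linear trend, while the local fit on the selection recovers the rising relation. This kind of difference is what an interactive subset selection can expose.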
  • 32. Dimension Reduction - Define analytical expectations • Dimension reduction methods are suitable candidates given their unsupervised nature • Observation-level interactions (Endert et al., 2011) • Re-compute based on user expectations [Endert et al., 2011] [Kwon et al., 2016] [Hu et al., 2013]
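A deliberately simplified sketch of re-computing from user expectations: suppose the user drags two items together to signal that they should be similar. The code derives attribute weights that favour the attributes on which the two items already agree. This is not the algorithm of Endert et al.; the function names, the inverse-difference weighting, and the toy items are all assumptions.

```python
def weights_from_similar_pair(a, b, eps=1e-6):
    """Return normalised attribute weights that pull items a and b together.

    Assumption: an attribute's weight is inversely proportional to how much
    the two items differ on it (eps avoids division by zero).
    """
    raw = [1.0 / (abs(x - y) + eps) for x, y in zip(a, b)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two items."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5


item_a = [1.0, 5.0, 2.0]
item_b = [1.1, 0.5, 2.0]

uniform = [1 / 3] * 3
learned = weights_from_similar_pair(item_a, item_b)

before = weighted_distance(item_a, item_b, uniform)
after = weighted_distance(item_a, item_b, learned)
```

A projection recomputed with `learned` would place the two items closer than one computed with `uniform`. A real system would also normalise attribute scales and combine feedback from many interactions.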
  • 33. Clustering - Define analytical expectations • Interactively introduce grouping constraints • Results are “user-optimized”, e.g., these items are (not) similar • Learn-by-example • Iterative refinement through interaction [Hossain et al., 2012] [Choo et al., 2013]
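Grouping constraints of the form “these items are (not) similar” are often expressed as must-link and cannot-link pairs. The sketch below only checks a given clustering against such pairs. The function name, the item identifiers, and the constraint lists are hypothetical, and the cited systems may encode constraints differently.

```python
def violated_constraints(labels, must_link, cannot_link):
    """List the user constraints that a clustering fails to satisfy."""
    violations = []
    for a, b in must_link:
        if labels[a] != labels[b]:
            violations.append(("must-link", a, b))
    for a, b in cannot_link:
        if labels[a] == labels[b]:
            violations.append(("cannot-link", a, b))
    return violations


# Hypothetical clustering result: item id -> cluster id.
labels = {"doc1": 0, "doc2": 0, "doc3": 1, "doc4": 1}

# Hypothetical user feedback gathered through interaction.
must_link = [("doc1", "doc3")]                          # "these are similar"
cannot_link = [("doc3", "doc4"), ("doc1", "doc4")]      # "these are not"

problems = violated_constraints(labels, must_link, cannot_link)
```

In an iterative refinement loop, a non-empty `problems` list would trigger another clustering pass that takes the constraints into account.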
  • 34. Classification - Define analytical expectations • Classification tasks are suitable for methods where users communicate known/expected/wrong classification results back to the algorithm • Iteratively learns user preferences • Relevance feedback & Model learning [Behrisch et al., 2014] [Schreck et al., 2009]
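One classic form of relevance feedback is the Rocchio update, used here only as a stand-in illustration; the cited papers do not necessarily use it. The query vector moves towards items the user marked relevant and away from items marked non-relevant. The vectors and the default coefficients are assumptions.

```python
def centroid(vectors, dim):
    """Component-wise mean of the vectors (zero vector if there are none)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_update(query, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio: alpha * query + beta * mean(relevant) - gamma * mean(non_relevant)."""
    dim = len(query)
    pos = centroid(relevant, dim)
    neg = centroid(non_relevant, dim)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos, neg)]


query = [1.0, 0.0]
relevant = [[0.0, 1.0], [0.0, 3.0]]   # items the user marked as relevant
non_relevant = [[2.0, 0.0]]           # items the user marked as wrong

updated = rocchio_update(query, relevant, non_relevant)
```

Repeating the update after each round of user marks gives the iterative learning of preferences that the slide describes.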
  • 35. Regression / Correlation - Define analytical expectations • A sparsely populated area • Some examples in ensemble simulation analysis • Selecting “targets” interactively [Matkovic et al., 2008]
  • 36. Application Areas Each area poses unique and important challenges, so different combinations of interactive visualizations and ML techniques are used: • Text Analytics and Topic Modelling • Multimedia Visual Analytics • Streaming Data: Finance, Cyber Security, Social Media • Biological Data
  • 38. Balancing Human and Machine Effort, Responsibility, and Tasks • In mixed-initiative systems, a common notion is that effort is balanced between the user and the machine • Larger tasks are decomposed into subtasks that are either assigned to the person or performed more quickly by the system: • Well-defined and quantitative subtasks (i.e., solvable by computation) • Subjective and less formally defined subtasks Challenges: • It is not clear to what extent tasks should be divided • The effort expended by both the user and the system needs to be measured
  • 39. Creating and Training Models from User Interaction Data • User interaction logs contain rich information about the process and interests of the user • ML techniques can leverage the real-time interaction data generated by analysts using the system to steer the computation • Two broad kinds of model can be created: • Data models: weighted data items and attributes (inferred from user interaction) • User models: computational approximations of (the state of) the user (e.g., cognitive load, personality traits)
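A minimal sketch of a data model built from an interaction log: each attribute is weighted by the share of logged interactions that touched it. The log format (dictionaries with `action` and `attribute` keys), the function name, and the counting scheme are assumptions; real systems typically use richer inference.

```python
from collections import Counter


def attribute_weights(interaction_log):
    """Weight each attribute by its share of the logged interactions."""
    counts = Counter(event["attribute"] for event in interaction_log)
    total = sum(counts.values())
    return {attribute: n / total for attribute, n in counts.items()}


# Hypothetical log captured while an analyst explores a dataset.
log = [
    {"action": "filter", "attribute": "price"},
    {"action": "sort", "attribute": "price"},
    {"action": "brush", "attribute": "rating"},
    {"action": "filter", "attribute": "price"},
]

weights = attribute_weights(log)
```

Weights like these could be fed back into a distance function or a projection, so that the computation follows the attributes the analyst appears to care about.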
  • 40. Complex Computation Systems & Automation Surprise • Many inter-related and inter-dependent “black boxes” of automated components • Difficult to know what input leads to what output • Interactions between automated “black boxes” can create automation surprises • Leads to: • Error • Loss of trust in the technology
  • 41. Visualizing Intermediate Results and Computational Process • a.k.a. Progressive Analytics (Stolper et al., 2014) • Many kinds of ML algorithms undergo a continuous convergence process towards the final solution • Rendering visualizations of the intermediate results during convergence • Steerable ML algorithms: • Prior knowledge on the relevance of features • Insight on the similarities between items • Prior knowledge on class information
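The idea of exposing intermediate results during convergence can be sketched with a Python generator that yields the state after every iteration. The example uses gradient descent on a one-parameter squared-error fit as a stand-in for an ML algorithm. The function name, learning rate, and stopping threshold are assumptions, and the `history.append` call marks where a real system would redraw a view.

```python
def progressive_fit(values, learning_rate=0.1, steps=50):
    """Gradient descent towards the mean of `values`.

    Yields (step, estimate, loss) after every iteration so that a consumer
    can display partial results instead of waiting for the final one.
    """
    estimate = 0.0
    n = len(values)
    for step in range(1, steps + 1):
        gradient = sum(2 * (estimate - v) for v in values) / n
        estimate -= learning_rate * gradient
        loss = sum((estimate - v) ** 2 for v in values) / n
        yield step, estimate, loss


history = []
for step, estimate, loss in progressive_fit([2.0, 4.0, 6.0]):
    history.append((step, estimate, loss))  # a real system would redraw here
    # Stop early once the loss has stopped changing noticeably; in an
    # interactive setting the analyst could trigger this break instead.
    if step > 1 and abs(history[-2][2] - loss) < 1e-6:
        break
```

Because the consumer controls the loop, it can stop early, change parameters, or restart with new prior knowledge, which is the hook that steerable algorithms rely on.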
  • 42. To conclude • Already a good level of integration between the two domains • Only a small subset of ML techniques incorporated • VIS is slow in catching up with advanced ML techniques • Increasing awareness/interest in the ML domain • Several problems and opportunities • Formalizing and establishing steerable ML • Better determining how tasks should be divided between humans and machines • Bridging the two communities further