Slides for my talk on our paper at EuroVis 2017 in the STAR track:
Endert, A., Ribarsky, W., Turkay, C., Wong, B.L., Nabney, I., Díaz Blanco, I. and Rossi, F., 2017, March. The state of the art in integrating machine learning into visual analytics. Computer Graphics Forum.
http://openaccess.city.ac.uk/16739/
The state of the art in integrating machine learning into visual analytics
1. Alex Endert, William Ribarsky,
Cagatay Turkay, William Wong,
Ian Nabney, Ignacio Díaz Blanco,
Fabrice Rossi
The State of the Art in
Integrating Machine Learning into
Visual Analytics
3. Dagstuhl -- Bridging Information Visualization with
Machine Learning (1-6 March, 2015)
4. This STAR …
Surveys advances made at the intersection of Visual Analytics (VA) and
Machine Learning (ML)
Describes the extent to which machine learning methods are utilized in
visual analytics
Discusses open challenges and opportunities for both communities
Initiated by discussions on the “Role of the User” at the workshop
5. Why ML + VA?
• Reasoning about data becomes complicated and difficult as data scale and
complexity increase
• Powerful tools are needed to draw valid conclusions from data, while
maintaining trustworthy and interpretable results
VA and ML have complementary strengths and weaknesses
6. Report Structure
• Categories of models and frameworks that describe the cognitive stages
people progress through during data analysis
• Overview of existing ML and VA integrations
• Overview of application domains
• Open challenges and opportunities for ML and VA domains and
communities
7. Review methodology
• Existing literature on the integration of ML and VIS from three
different perspectives:
• Models and frameworks,
• Techniques
• Application areas
• Resources from both the visualisation and machine learning
domain
8. Review methodology
On the VIS side (major resources and starting points):
Journals:
IEEE Transactions on Visualization and Computer Graphics
Computer Graphics Forum
IEEE Computer Graphics and Applications
Information Visualization
Conferences:
IEEE Visual Analytics Science and Technology
IEEE Symposium on Information Visualization
EuroVis
IEEE Pacific Visualization Symposium (PacificVis)
EuroVis workshop on Visual Analytics (EuroVA)
9. Review methodology
On the ML side (major resources and starting points):
Journals:
Journal of Machine Learning Research
Neurocomputing
IEEE Transactions on Knowledge and Data Engineering
Conferences:
International Conference on Machine Learning (ICML)
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
European Symposium on Artificial Neural Networks, Computational Intelligence and
Machine Learning (ESANN)
10. Review methodology
• 133 papers from VIS-side resources
• 46 from ML-side resources
• 186 papers cited in total (including others)
• Later, 69 papers are categorized under the Existing Techniques taxonomy
11. Rest of today
PART-1: MODELS & FRAMEWORKS
PART-2: EXISTING TECHNIQUES OF ML + VA (& APPLICATIONS)
PART-3: OPPORTUNITIES & CHALLENGES
13. Categories of models and frameworks
• Models and frameworks that describe the cognitive stages
people progress through during data analysis
• Guidance and context for designing solutions
• Reviewed under three categories:
• Sensemaking and Knowledge Discovery (Human Reasoning)
• Interactivity in Visual Analytics
• Machine Learning
14. Sensemaking and Knowledge Discovery (Human Reasoning)
• Models in this category help us in “understanding the cognitive
processes of people as they reason about data”
• Provide foundational basis and design guidelines for visual analytics
[PIROLLI & CARD, 2005]
[SACHA et al., 2014]
[KLEIN et al., 2006]
15. Sensemaking loop by Pirolli and Card
[PIROLLI & CARD, 2005]
Sensemaking in two primary phases:
foraging and synthesis
synthesis is the more “cognitively intensive” of the two
Criticised for being linear despite the loops
16. The Data-Frame Model of Sense-making by Klein et al.
Exchange of information between the
human and the data in terms of frames
Data connects with the frame
Elaboration to strengthen the frame
Reframe to augment the frame or to create a new one
Design visual analytics in a way that
encourages elaboration and reframing
[KLEIN et al., 2006]
17. Human-Computer knowledge generation model by Sacha et al.
[SACHA et al., 2014]
Much more concrete and explicit roles for human and computer
Three loops: exploration, verification, and knowledge generation
Two main roles for ML:
- Transform unstructured or semi-structured data into a form more meaningful for
human exploration and insight discovery.
- Unsupervised/semi-supervised ML to guide the analysis, e.g., best visualizations,
sequences of steps in the exploration, verification, or knowledge generation processes
18. Models in “Interactivity in Visual Analytics”
[ENDERT et al., 2012]
[HEER, 2006]
How data characteristics are extracted and
assigned visual attributes or encodings,
ultimately creating a visualization
Semantic interaction as an approach
in which the analytical reasoning of the
user is inferred and in turn used to
steer the underlying models implicitly
19. Machine Learning Models and Frameworks
[ CRISP-DM by Shearer, 2000 ]
comparable to knowledge discovery and
visual analytics frameworks
No clear room for the user apart from
continuous feedback and evaluation
(mostly deployment related)
20. Interactive Machine Learning &
Active Learning
… ML algorithms are able to determine
interesting inputs for which the desired
outputs are not known (i.e., not in the
training set), such that, given those outputs, the
predictive performance of the model would greatly improve
[FAILS et al., 2003]
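The active-learning loop described above can be sketched in a few lines. This is an illustrative toy only (pool-based uncertainty sampling with a hand-rolled logistic regression on synthetic blobs), not code from any of the surveyed systems:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two Gaussian blobs, one per class.
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

def fit_logreg(Xl, yl, steps=200, lr=0.5):
    """Plain logistic regression trained by gradient descent."""
    Xb = np.hstack([Xl, np.ones((len(Xl), 1))])   # add bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - yl) / len(yl)
    return w

def predict_proba(w, Xq):
    Xb = np.hstack([Xq, np.ones((len(Xq), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

labeled = [0, 100]                 # tiny seed set: one example per class
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(10):                # query budget
    w = fit_logreg(X[labeled], y[labeled])
    p = predict_proba(w, X[pool])
    # Uncertainty sampling: query the input closest to the decision boundary,
    # i.e., the one whose desired output the model is least sure about.
    query = pool[int(np.argmin(np.abs(p - 0.5)))]
    labeled.append(query)          # in a real system a human supplies y[query]
    pool.remove(query)

accuracy = (predict_proba(w, X).round() == y).mean()
```

The point of the sketch is the division of labour: the algorithm, not the analyst, chooses which items are worth labelling.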
23. First perspective: types of ML algorithms
• Considers analytical tasks that often require the joint capabilities of
computation and user expertise
• Types of ML algorithms that have been considered within visual
analytics literature (other groupings possible):
• Dimension reduction
• Clustering
• Classification
• Regression/correlation analysis (Prediction)
24. Second perspective: Interaction intent
• Characterises why interaction takes place
• Resonates with the “user intent” categories suggested by Yi et al., 2007, but posed
at a higher level:
Modify parameters and computation domain:
- Modifying the parameters of an algorithm or changing the algorithm
- Defining the measures used in the computations
- Modifying the computational domain to which the algorithm is applied
Define analytical expectations:
- Communicate expected results (to the computational method)
- Communicate examples of relevant, domain-knowledge informed relations
25. Number of surveyed papers per ML method type and interaction intent:

                            Modify parameters and   Define analytical
                            computation domain      expectations
  Dimension Reduction                11                    10
  Clustering                         18                     8
  Classification                      9                     4
  Regression / correlation            5                     4

A total of 69 papers
27. Dimension Reduction - Modify parameters and computation domain
• Steer the computation to where it matters
• Assist reduction with user-defined quality
• Create “user-defined” local projections
[Jeong et al., 2009]
[Turkay et al., 2011]
[Williams and Munzner, 2006]
[Johansson and Johansson, 2009]
28. Clustering - Modify parameters and computation domain
• Multiple clustering algorithms with multiple parameters
• Compare over quality metrics / (dis)similarities
[Seo & Shneiderman, 2002]
[Lex et al., 2010] [Schreck et al., 2009]
29. Classification - Modify parameters and computation domain
• Embedded classification methods
• Reducing search space / Interactively labelling data
• Interactively generating the classification structures
• Evaluate ensembles
[Choo et al., 2010]
[Elzen & Wijk, 2010]
[KRAUSE et al., 2014]
30. Regression / Correlation - Modify parameters and computation domain
• Interactive visual validation of models
• Visual representations to derive explanations
• Selecting feature subsets – “local” models
[Muhlbacher and Piringer, 2013] [Klemm et al., 2016]
[Malik et al., 2012]
32. Dimension Reduction - Define analytical expectations
• Dimension reduction methods are suitable candidates given their
unsupervised nature
• Observation-level interactions (Endert et al. 2011)
• Re-compute based on user expectations
[Endert et al., 2011][Kwon et al., 2016]
[Hu et al., 2013]
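One concrete way to read “re-compute based on user expectations”: infer per-dimension weights from items the user has grouped together, then re-run the projection with those weights. A minimal numpy sketch, assuming weighted PCA as the backend; this is not the specific method of the cited papers:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))            # toy high-dimensional data
X[:20, 0] += 3.0                        # dimension 0 hides a latent group

def project(X, w):
    """Weighted PCA: scale each dimension by its weight, keep top-2 components."""
    Xc = (X - X.mean(0)) * w
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

P0 = project(X, np.ones(X.shape[1]))    # baseline, uniform weights

# Observation-level interaction: the user drags items 0..19 close together
# ("these items belong together"). Infer which dimensions make that group
# coherent: high overall variance but low within-group variance.
group = np.arange(20)
w = X.var(0) / (X[group].var(0) + 1e-9)
w /= w.sum()

P1 = project(X, w)                      # re-computed, user-steered projection
```

The inferred weights pick out dimension 0, so the new projection emphasizes exactly the structure the user's interaction hinted at.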
33. Clustering - Define analytical expectations
• Interactively introduce grouping constraints
• Results are “user-optimized”, e.g., these items are (not) similar
• Learn-by-example
• Iterative refinement through interaction
[HOSSAIN et al., 2012]
[Choo et al., 2013]
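The learn-by-example and constraint ideas can be illustrated with a seeded k-means sketch: a handful of user-labelled examples initialize the clusters and stay pinned to their groups on every iteration. This is illustrative only, not the algorithm of the cited papers:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy data: three blobs of 50 points each.
X = np.vstack([rng.normal(c, 0.5, (50, 2)) for c in ([0, 0], [4, 0], [2, 3])])

# User feedback: "these items belong together" -- a few examples per group.
seeds = {0: [0, 1, 2], 1: [50, 51, 52], 2: [100, 101, 102]}

# Seeded k-means: centroids start from the user's examples, and the seed
# items are re-pinned to their group after every assignment step.
centroids = np.array([X[idx].mean(0) for idx in seeds.values()])
for _ in range(20):
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(1)
    for k, idx in seeds.items():
        labels[idx] = k              # enforce the user's grouping constraints
    centroids = np.array([X[labels == k].mean(0) for k in range(3)])
```

Iterative refinement then amounts to the user adding or moving seed examples and re-running the loop.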
34. Classification - Define analytical expectations
• Classification tasks are suitable for methods where users communicate
known/expected/wrong classification results back to the algorithm
• Iteratively learn user preferences
• Relevance feedback & Model learning
[Behrisch et al., 2014]
[Schreck et al., 2009]
35. Regression/Correlation - Define analytical expectations
• Not a densely populated field
• Some examples in ensemble simulation analysis
• Selecting “targets” interactively
[Matkovic et al., 2008]
36. Application Areas
Areas representing unique and important challenges, thus different
combinations of interactive visualizations and ML techniques are used:
• Text Analytics and Topic Modelling
• Multimedia Visual Analytics
• Streaming Data: Finance, Cyber Security, Social Media
• Biological Data
38. Balancing Human and Machine Effort, Responsibility, and Tasks
• For mixed-initiative systems, it is a common notion that there exists a
balance of effort between the user and the machine
• Decompose larger tasks into subtasks and assign each either to the
person or, where it can be performed more quickly, to the system
• Well-defined and quantitative subtasks (i.e., solvable by computation)
• Subjective and less formally defined subtasks
Challenges:
• It is not clear to what extent tasks should be divided
• Need to measure the amount of effort expended by both the user and
the system
39. Creating and Training Models from User Interaction Data
• User interaction logs contain rich information about the process and
interests of the user
• Opportunity exists for ML techniques to leverage the real-time user
interaction data generated from the analysts using the system to steer
the computation
• Two broad types of models can be created
• Data models:
• weighted data items and attributes (with weights inferred from user interaction)
• User models:
• computational approximations of (the state of) the user (e.g., cognitive load, personality
traits)
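A toy sketch of the data-model case: attribute weights inferred from an interaction log, then fed into a weighted similarity. The attribute names and log format here are hypothetical:

```python
import numpy as np

# Hypothetical interaction log: each event names the attribute the analyst
# touched (hovered, filtered on, sorted by, ...).
log = ["price", "price", "mileage", "price", "year", "mileage", "price"]

attributes = ["price", "mileage", "year", "color"]
counts = np.array([log.count(a) for a in attributes], dtype=float)

# Data model: attribute weights inferred from interaction frequency,
# smoothed so untouched attributes keep a small nonzero weight.
weights = (counts + 1.0) / (counts + 1.0).sum()

def similarity(a, b, w=weights):
    """Weighted (negative) Euclidean distance between two items."""
    return -np.sqrt((w * (np.asarray(a, float) - np.asarray(b, float)) ** 2).sum())
```

Items that agree on the attributes the analyst keeps touching now rank as more similar, steering the computation toward the user's apparent interests.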
40. Complex Computation Systems & Automation Surprise
• many inter-related and inter-dependent “black boxes” of automated
components
• difficult to know what input leads to what output
• interactions between automated “black boxes” can create automation
surprises
• Leads to:
• error
• loss of trust in the technology
41. Visualizing Intermediate Results and Computational Process
• a.k.a. Progressive Analytics (Stolper et al., 2014)
• Many kinds of ML algorithms undergo a continuous
convergence process towards the final solution
• Rendering visualizations of the intermediate results during
convergence
• Steerable ML algorithms:
• Prior knowledge on the relevance of features
• Insight on the similarities between items
• Prior knowledge on class information
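A minimal sketch of the idea: a k-means loop that hands its intermediate state to a callback after every iteration, so a front end could render the evolving result (and a user could steer or cancel) before convergence. The structure and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.7, (200, 2)) for c in ([-3, 0], [3, 0])])

def progressive_kmeans(X, k, iters, on_update):
    """k-means that emits its intermediate state after every iteration."""
    centroids = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for i in range(iters):
        labels = ((X[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
        centroids = np.array([X[labels == c].mean(0) if (labels == c).any()
                              else centroids[c] for c in range(k)])
        on_update(i, centroids, labels)   # intermediate result for the view
    return centroids, labels

snapshots = []
progressive_kmeans(X, 2, 5, lambda i, c, l: snapshots.append(c.copy()))
```

Each snapshot is a renderable partial result; a steerable variant would also let `on_update` modify the centroids or the feature weights before the next iteration.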
42. To conclude
• Already a good level of integration within the domains
• Only a small subset of ML techniques incorporated
• VIS is slow in catching up with advanced ML techniques
• Increasing awareness/interest in the ML domain
• Several problems and opportunities
• Formalizing and establishing steerable ML
• Better determine how tasks should be divided between humans and machines
• Bridging the two communities further
43. Alex Endert, William Ribarsky,
Cagatay Turkay, William Wong,
Ian Nabney, Ignacio Díaz Blanco,
Fabrice Rossi
The State of the Art in
Integrating Machine Learning into
Visual Analytics