The inherent complexity of ontologies poses a number of cognitive and perceptual challenges for ontology authors. We investigate how users deal with the complexity of the authoring process by analysing how one of the most widespread ontology development tools (i.e. Protégé) is used. To do so, we build Protégé4US (Protégé for User Studies) by extending Protégé in order to generate log files that contain ontology authoring events. These log files contain data not only about the interaction with the environment, but also about OWL entities and axioms. We illustrate the usefulness of Protégé4US with a case study with 15 participants. The data generated from the study allow us to learn more about how Protégé is used (e.g. most frequently used tabs) and how well users perform (e.g. task completion times), and to identify emergent authoring strategies, including moving down the class hierarchy or saving the current workspace before running the reasoner. We argue that Protégé4US is a valuable instrument to identify ontology authoring patterns.
Protégé4US: Harvesting Ontology Authoring Data with Protégé
1. Protégé4US: Harvesting Ontology Authoring Data with Protégé
Markel Vigo, Caroline Jay, Robert Stevens
firstname.lastname@manchester.ac.uk
@markelvigo, @CarolineEJay, @stevensrd65
Workshop on Human-Semantic Web Interaction, HSWI 2014 May 26. Crete (Greece)
2. Little is known about the human factors of ontology authoring
Protégé4US: Harvesting Ontology Authoring Data with Protégé HSWI @ ESWC 2014
• What we know is based on anecdotal evidence
• We asked about problems and strategies
– Making sense
– Search and retrieval
– Efficient population
– On-the-fly reasoning
– Overloaded explanations
– Lack of evaluation methods
Design insights for the next wave ontology authoring tools. CHI 2014. http://dx.doi.org/10.1145/2556288.2557284
Introduction > Protégé4US > Results > Conclusion
3. Lack of sound HCI research in the Semantic Web
• HCI does not pervade all computing disciplines
• Instruments to run user studies are scarce
• Consequences for the OWL realm
– No understanding about the authoring process
– Authoring tools are not human-centered
• What if we want to go further?
– Automatic detection of authoring patterns
– Intelligent support for authoring
4. Protégé4US: a step towards having observational instruments
• Protégé4US: Protégé for User Studies
• Logging capabilities of:
– Interaction events: click, hover, expand hierarchy...
– Authoring events: add siblings, add restrictions...
– Environment commands: reason, search, undo...
76585,2,Classes,Element edited,Juliette subclass of: Potato and hasCroppingTime some 'Main cropping'
77786,3,Classes,Save ontology,http://owl.cs.manchester.ac.uk/ontology/start-here.owl
80204,3,Classes,Reasoner invoked,HermiT 1.3.8
80647,1,Classes,Mouse entered, Class hierarchy (inferred)
82910,1,Classes,Element hovered,Early_cropping_potato
83049,1,Classes,Element selected,Early_cropping_potato
83661,1,Classes,Hierarchy expanded,Early_cropping_potato
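The sample above suggests that each log line has five comma-separated fields: a timestamp, a numeric event type, the active tab, the event name and the object of the event. A minimal parsing sketch follows, assuming that layout. The `LogEvent` type and `parse_line` function are hypothetical names used for illustration and are not part of Protégé4US. The timestamp unit is not stated in the slides, so it is kept as a plain integer.

```python
from typing import NamedTuple


class LogEvent(NamedTuple):
    """One log line (hypothetical structure, inferred from the sample)."""
    timestamp: int  # first field; unit not stated in the slides
    kind: int       # numeric event type (second field)
    tab: str        # active Protégé tab
    event: str      # event name, e.g. "Reasoner invoked"
    target: str     # object of the event; may itself contain commas


def parse_line(line: str) -> LogEvent:
    # Split on the first four commas only, so that commas inside the
    # last field (for instance in a class expression) are preserved.
    timestamp, kind, tab, event, target = line.split(",", 4)
    return LogEvent(int(timestamp), int(kind), tab.strip(),
                    event.strip(), target.strip())


# Lines copied from the log sample on this slide.
sample = """\
77786,3,Classes,Save ontology,http://owl.cs.manchester.ac.uk/ontology/start-here.owl
80204,3,Classes,Reasoner invoked,HermiT 1.3.8
80647,1,Classes,Mouse entered, Class hierarchy (inferred)
83661,1,Classes,Hierarchy expanded,Early_cropping_potato"""

events = [parse_line(line) for line in sample.splitlines()]
for e in events:
    print(e.timestamp, e.event, "->", e.target)
```

In the sample, type 1 appears on interaction events, type 2 on an authoring event and type 3 on environment commands, but that mapping is an inference from seven lines rather than a documented scheme.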
5. User study to show the strengths of Protégé4US
• Experimental design:
– Participants: 15 expert authors
– Stimuli: a potato ontology and Protégé4US
– 3 authoring tasks of increasing complexity
• Collected data (apart from Protégé4US logs):
– Completion times
– Self-reported expertise
– Perceived task difficulty
6. Protégé4US in action
7. Data analysis to check the intention of the experimental design
• The tasks were of increasing complexity
– Completion times significantly different: T1<T2<<T3
– Task difficulty significantly different: T1<T2<T3
– Expertise with Protégé negatively correlated with completion times for T1
– No correlation between completion times and perceived difficulty
• Completion time is more an indicator of the amount of editing than of cognitive difficulty
8. Some events may be indicators of problematic situations
• Positive and significant correlations between
task completion time and:
– Expansions of the class hierarchy
– Running the reasoner
– Renaming entities
• Log analysis allows us to profile users based on their tab use
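To make the correlation claim concrete, here is a minimal sketch of how a per-participant event count could be related to task completion time. The slides do not say which correlation coefficient or significance test was used, so Pearson's r is an assumption here, and the numbers are made-up toy values, not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy values for five imaginary participants (NOT study data):
# completion time in minutes and number of "Reasoner invoked" events.
toy_completion_minutes = [10, 14, 18, 25, 31]
toy_reasoner_runs = [2, 3, 3, 6, 8]

r = pearson(toy_completion_minutes, toy_reasoner_runs)
print(f"r = {r:.2f}")  # a positive r means more runs go with longer times
```

A coefficient alone says nothing about significance; a real analysis would also need a test suited to the sample size and the distribution of the data.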
9. Reconstructing the interaction allows us to identify patterns through visualisation
• Web diagrams show the most frequent transitions between states
[Figure: web diagrams of the P15 and P9 logs. Both diagrams share the same states: Back, Check property, Class addition, Convert into defined class:start, Convert into defined class:finished, Entity renamed, Explanation invoked, Entity deleted, Entity dragged, Entity edited:start, Entity edited:finish, Entity selected, Link clicked, Property addition, Reasoner finished, Reasoner invoked, Save, Set active ontology, Hierarchy collapsed, Hierarchy expanded, Undo]
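A web diagram of this kind can be derived from a transition matrix over the logged events. The sketch below counts transitions between consecutive events and normalises them per source state. The function names are hypothetical and the event sequence is an invented example that reuses state names from the slide; it is not taken from the P15 or P9 logs.

```python
from collections import Counter, defaultdict


def transition_counts(states):
    """Count how often each state is directly followed by another."""
    return Counter(zip(states, states[1:]))


def transition_probabilities(counts):
    """Normalise counts so the outgoing transitions of a state sum to 1."""
    totals = defaultdict(int)
    for (src, _dst), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}


# Invented sequence for illustration only.
sequence = [
    "Entity selected", "Hierarchy expanded", "Hierarchy expanded",
    "Entity selected", "Hierarchy expanded", "Save",
    "Reasoner invoked", "Reasoner finished",
]

counts = transition_counts(sequence)
probs = transition_probabilities(counts)

# Most frequent transitions first: these would be the thickest edges.
for (src, dst), n in counts.most_common():
    print(f"{src} -> {dst}: {n} ({probs[(src, dst)]:.2f})")
```

Reflexive transitions, such as a hierarchy expansion followed by another expansion, show up as pairs whose source and destination are the same state.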
10. Reconstructing the interaction allows us to identify patterns through visualisation
• Time diagrams show the authoring rhythm
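The authoring rhythm can be approximated from the log by looking at the time elapsed between consecutive events. The sketch below uses the timestamps and event names from the log sample shown earlier in the deck; the `gaps` helper is a hypothetical name, and the timestamp unit is not stated in the slides.

```python
# (timestamp, event) pairs copied from the log sample in the deck.
timed_events = [
    (76585, "Element edited"),
    (77786, "Save ontology"),
    (80204, "Reasoner invoked"),
    (80647, "Mouse entered"),
    (82910, "Element hovered"),
    (83049, "Element selected"),
    (83661, "Hierarchy expanded"),
]


def gaps(events):
    """Return (event, elapsed) pairs: time since the previous event."""
    return [
        (name, later - earlier)
        for (earlier, _), (later, name) in zip(events, events[1:])
    ]


elapsed = gaps(timed_events)
for name, delta in elapsed:
    print(f"{delta:>6}  before {name}")

# The longest pause in this excerpt precedes the reasoner invocation.
longest = max(elapsed, key=lambda pair: pair[1])
print("longest gap:", longest)
```

Long gaps are only a proxy for inactivity, since a pause with no logged event may still involve reading or thinking.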
11. Reconstructing the interaction allows us to identify patterns through visualisation
• The analysis of diagrams across users allows us to sketch decision trees
– Reasoner invoked after
1. A class is converted into a defined class
2. Ontology is saved
3. An entity is selected
– Hierarchy is expanded after
1. Reasoner finishes
2. Hierarchy is expanded
3. Entity addition is invoked
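Ranked lists like the ones above could be obtained by counting, across participants, which event most often comes immediately before a given target event. The slides do not describe how the authors derived their decision trees, so the sketch below is only one plausible approach. `top_predecessors` is a hypothetical helper and the sessions are invented examples, not study logs.

```python
from collections import Counter


def top_predecessors(sessions, target, k=3):
    """Rank the events that most often directly precede `target`."""
    counts = Counter()
    for events in sessions:
        for previous, current in zip(events, events[1:]):
            if current == target:
                counts[previous] += 1
    return [name for name, _ in counts.most_common(k)]


# Invented sessions for illustration only (one list per participant).
toy_sessions = [
    ["Convert into defined class:finished", "Reasoner invoked",
     "Reasoner finished", "Hierarchy expanded"],
    ["Save", "Reasoner invoked", "Reasoner finished",
     "Hierarchy expanded", "Hierarchy expanded"],
    ["Convert into defined class:finished", "Reasoner invoked",
     "Reasoner finished", "Entity selected"],
]

print(top_predecessors(toy_sessions, "Reasoner invoked"))
print(top_predecessors(toy_sessions, "Hierarchy expanded"))
```

Only the immediately preceding event is considered here; longer contexts would need n-grams or a sequence-mining method.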
12. Future work and Conclusions
• Future work
– Statistical analysis of patterns and identification of strategies
– Incorporation of eye-tracking data
– Ontology authoring in the wild
• Conclusions
– Protégé4US may give urgency to a more human-centred Semantic Web
– Better tools for ontology and linked data authoring
13. Protégé4US: Harvesting Ontology Authoring Data with Protégé
Markel Vigo, Caroline Jay, Robert Stevens
firstname.lastname@manchester.ac.uk
@markelvigo, @CarolineEJay, @stevensrd65
Workshop on Human-Semantic Web Interaction, HSWI 2014 May 26. Crete (Greece)
WhatIf: Answering “What if...” questions for Ontology Authoring.
EPSRC reference EP/J014176/1
Editor's notes
Final slide, same as 1st slide
1st bullet point: little systematic and replicable research.
Items in 2nd bullet point about the outcomes of the interview study with 15 authors:
Making sense: exploration, getting an overview and understanding the consequences of actions.
Search and retrieval: search on other ontologies and be able to retrieve, map and align.
Efficient population: adding a large number of axioms or classes; also making minor edits on large ontologies.
On-the-fly reasoning to test the latest modifications
Overloaded explanations make debugging hard
Evaluation: apart from being consistent, there is no way to know if the ontology meets its requirements. Competence questions, unit tests etc.
There is a note indicating where our CHI paper can be found.
Goal of the slide: reasons why HCI has not been embraced by the Semantic Web community
3rd bullet point is about how this lack of HCI reflects on the ontology realm. Perhaps it could be extended by talking about the consequences for linked data.
4th bullet point conveys that this situation does not allow us to see beyond and be more ambitious.
Goal of the slide: we built Protégé4US to ameliorate the current situation.
1st bullet point: explain that Protégé4US is an instrumented version of Protégé 4.3
2nd bullet point explains the types of events we collect: interaction and authoring events and Protégé environment commands
The final item is not a bullet point but a text area with a sample of a log: 7 lines of a log containing 1) the timestamp, 2) a number indicating the type of event, 3) the active tab, 4) the event itself (e.g. Reasoner invoked, Element selected, Hierarchy expanded), and finally 5) the object of the event (e.g. which reasoner has been invoked, which element has been selected, and where in the class hierarchy the expansion has occurred). The example shows the editing of the Juliette class, which is a subclass of the Potato class and hasCroppingTime some Main cropping. Then the ontology is saved, the reasoner is invoked and the mouse enters the class hierarchy, where a class is hovered over and then selected, and then the hierarchy is expanded on that element.
Goal of the slide: the objective of the study (for this paper) is a proof of concept and feasibility study of Protégé4US.
1st bullet point: brief summary of the study (participants, stimuli and tasks). Perhaps you want to expand on the potato ontology and tasks.
2nd bullet point: what we collected apart from the Protégé4US log data: completion times, self-reported expertise and task difficulty.
Just an embedded video of 30 seconds: first, the user adds a property (hasPreferredServingMethod some Salad) to the class of potatoes PinkFirApple. Second, the user runs the reasoner and goes to the inferred hierarchy tab, where the user clicks on a couple of entities. Gaze plots (in this case scan paths) indicate that after clicking on these entities of the class hierarchy the user looks at the description section of Protégé, which gives some extra information about the clicked entity.
Goal of the slide: show how Protégé4US allows us to confirm the intentions of the experimental design. In our case, the intended increasing difficulty of the tasks.
1st bullet point summarizes the analysis of the relationship between metrics: completion times, perceived difficulty and expertise metrics
2nd bullet point, based on the above, concludes that task complexity was due not to cognitive demand but to the number of edits. We can say that this sort of analysis helps to contextualise the outcomes of any study.
Goal of the slide: show how Protégé4US allows us to confirm the intentions of the experimental design. In our case, the intended increasing difficulty of the tasks.
1st bullet point allows us to talk about how these events suggest that users were drilling down the hierarchy (as an indicator of disorientation) and how the reasoner is used time and again as a debugging strategy.
2nd bullet point: these profiles are 1) Users who stick to 1 tab (Classes or Entities) and 2) Users who use 2 tabs, one of which is the Classes tab
Slide showing the web diagrams of two participants. In our web diagrams the nodes are arranged in a circle (clockwise) and are the states of our transition matrices. For us, the states are the events we consider meaningful, such as Entity selected, Hierarchy expanded, Save or Undo. Therefore in these web diagrams the nodes are the events and the edges are the transitions between events. The thicker an edge is, the more frequently that particular transition has been logged. Reflexive transitions are allowed and are denoted by a thick circle in a state.
We can see there are some interesting patterns that confirm what we learned from our interview study: a) Strategies of search and overview: hierarchies are expanded after expanding hierarchies, entities are selected after selecting entities; b) Convert into a defined class, save and invoke the reasoner.
Slide showing the time diagrams of one participant. On the x-axis is time (a maximum of 35 minutes in this one); on the y-axis is the name of the event. Blocks are ordered sequentially; there is no overlap on the x-axis. The blocks indicate the time elapsed between events and consequently periods of inactivity. That’s why the red dots on top of each block indicate mouse movement.
Slide showing some of the patterns we observed; see the notes of the web diagrams slide.
All of the bullet points are self-explanatory. I ended by mentioning linked data.