An incremental learning method to support the
annotation of workflows with data-to-data relations
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta
Feedback: @enridaga
20th International Conference on Knowledge Engineering
and Knowledge Management
Bologna, Italy
19-23 November 2016
http://link.springer.com/chapter/10.1007/978-3-319-49004-5_9
“LipidMaps Query” from http://www.myexperiment.org/workflows/1052
Workflow models are focused on actions, to support multiple and parametric executions.
There are scenarios in which we need to focus on the data…
… and understand how the data is affected by the actions of the workflow.
Data flow (DF): to express the implications of the actions on the data.
Datanode: a taxonomy of the relations between data objects, used, for example, to support reasoning on policy propagation.
http://purl.org/datanode/ns/
Daga, E., d’Aquin, M., Gangemi, A., Motta, E.: Propagation of policies in rich data flows.
In: Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015
http://doi.acm.org/10.1145/2815833.2815839

Our objective is to derive such data flows from the
representation of existing workflows.
APPROACH: to learn how to label data-to-data relations
using the description of the actions in the workflow.
ASSUMPTION: there is a correlation between the features
of a workflow action and the labels.
PROBLEM: cold start. This requires a pre-existing training
set, which we do not have!
Incremental learning method
HYPOTHESIS: the quality of the recommendations
improves over time.
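The incremental loop can be illustrated with a minimal Python sketch. It is not the FCA-based method presented later: it assumes a deliberately simple, hypothetical co-occurrence recommender (`recommend`) and a simulated user whose choices are given in advance. Each I/O port pair first receives recommendations learned only from the pairs annotated so far, then its chosen annotation joins the training data. The first item necessarily gets no recommendation, which is the cold start.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

# One training example: the features of an I/O port pair and its annotation.
Example = Tuple[FrozenSet[str], str]


def recommend(history: List[Example], features: FrozenSet[str], k: int = 3) -> List[str]:
    """Hypothetical stand-in recommender: score annotations by feature overlap.

    Returns an empty list on cold start, when nothing has been annotated yet.
    """
    scores: Counter = Counter()
    for past_features, annotation in history:
        overlap = len(past_features & features)
        if overlap:
            scores[annotation] += overlap
    return [annotation for annotation, _ in scores.most_common(k)]


def annotate_incrementally(stream: List[Example]) -> List[bool]:
    """Simulate the loop; each annotation in `stream` is what the user picks.

    Returns, per item, whether the user's choice was among the recommendations.
    """
    history: List[Example] = []
    hits: List[bool] = []
    for features, user_choice in stream:
        suggestions = recommend(history, features)  # learned from past items only
        hits.append(user_choice in suggestions)
        history.append((features, user_choice))  # the training set grows by one
    return hits


# Invented toy data: feature words and annotation labels a0/a1.
stream = [
    (frozenset({"split", "string"}), "a0"),
    (frozenset({"split", "list"}), "a0"),
    (frozenset({"merge", "list"}), "a1"),
    (frozenset({"split", "regex"}), "a0"),
]
print(annotate_incrementally(stream))  # → [False, True, False, True]
```

The design choice to test here is the ordering: recommendations for an item are computed before its annotation is stored, so quality can only come from previously annotated items.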
WORKFLOW to DATA FLOW
Arcs = I/O port pairs (1->3; 2->3)
1234 workflows from www.myexperiment.org = 30612 I/O port pairs
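A minimal sketch of this translation, assuming a simplified, hypothetical processor representation (a name plus lists of input and output port ids) rather than the actual SCUFL2 API: every input port of a processor is paired with every output port, giving one data-flow arc per I/O port pair.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass
class Processor:
    """Hypothetical, simplified view of a workflow processor."""
    name: str
    inputs: List[str]
    outputs: List[str]


def io_port_pairs(processors: List[Processor]) -> List[Tuple[str, str, str]]:
    """Return (processor, input port, output port) triples, one per arc."""
    return [
        (p.name, i, o)
        for p in processors
        for i, o in product(p.inputs, p.outputs)
    ]


# A processor with input ports 1 and 2 and output port 3.
reformat = Processor("reformat list", inputs=["1", "2"], outputs=["3"])
print(io_port_pairs([reformat]))
# → [('reformat list', '1', '3'), ('reformat list', '2', '3')]
```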
FEATURES
Direct: about the ports and processors involved: ids, data types, annotations, scripts …
Derived: from annotations: bag of words, NER/DBpedia entities plus types and categories.
An incremental learning method to support the annotation of workflows 7
Table 2. Example of derived features (bag of words and DBpedia entities) generated for the IO port pair 1 → 3.

Type                                 Value
From/FromPortName-word               string
To/ToPortName-word                   split
From/FromLinkedPortDescription-word  single
From/FromLinkedPortDescription-word  possibilities
From/FromLinkedPortDescription-word  orb
From/FromLinkedPortDescription-word  mass
FromToPorts/DbPediaType              wgs84:SpatialThing
FromToPorts/DbPediaType              resource:Text file
FromToPorts/DbPediaType              resource:Mass
FromToPorts/DbPediaType              Category:State functions
FromToPorts/DbPediaType              Category:Physical quantities
FromToPorts/DbPediaType              Category:Mathematical notation

Fig. 4. Distribution of features extracted from the workflow descriptions: < 10: 80%; 10–100: 18%; > 100: 2%.
Fig. 5. Distribution of features (including derived features): < 10: 68%; 10–100: 28%; > 100: 4%.
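The bucketing behind Fig. 4 and Fig. 5 can be sketched as follows, under the assumption (not stated on the slide) that each feature is counted by the number of I/O port pairs in which it occurs. The function name and the toy data are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def feature_distribution(pairs: Iterable[Set[str]]) -> Dict[str, float]:
    """Share of distinct features by the number of I/O port pairs they occur in.

    Bucket bounds follow Fig. 4 and Fig. 5; assigning exactly 10 and 100
    to the middle bucket is an assumption.
    """
    occurrences = Counter(f for features in pairs for f in set(features))
    buckets = Counter(
        "< 10" if n < 10 else "10-100" if n <= 100 else "> 100"
        for n in occurrences.values()
    )
    total = sum(buckets.values())
    labels = ("< 10", "10-100", "> 100")
    return {b: (buckets[b] / total if total else 0.0) for b in labels}


# Toy data: feature "c" is shared by 200 port pairs, "a" and "d" by few.
pairs = [{"a"}] * 5 + [{"b"}] * 20 + [{"c"}] * 200 + [{"d"}]
print(feature_distribution(pairs))
# → {'< 10': 0.5, '10-100': 0.25, '> 100': 0.25}
```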
Distribution:
(30612 I/O port pairs)
FEATURES
This processor has three ports: two input ports (1 and 2) and one output port (3). We can translate this model into a graph connecting the data objects of the inputs to that of the output.

Table 1. Sample of the features extracted for the IO port pair 1 → 3 in the example of Figure 3.

Type                            Value
From/FromPortName               string
To/ToPortName                   split
Activity/ActivityConfField      script
Activity/ActivityType           http://ns.taverna.org.uk/2010/activity/beanshell
Activity/ActivityName           reformat list
Activity/ConfField/derivedFrom  http://ns.taverna.org.uk/2010/activity/localworker/org.embl.ebi.escience.scuflworkers.java.SplitByRegex
Activity/ConfField/script       List split = new ArrayList();if (!string.equals("")) { String regexString = ","; if (regex != void) ...
Processor/ProcessorType         Processor
Processor/ProcessorName         reformat list

However, the objective of these feature sets is to support the clustering of annotated IO port pairs by finding similarities with the IO port pairs to be annotated. At this stage of the study, we performed a preliminary evaluation of the distribution of the extracted features. We discovered that very few of them were shared between a significant number of port pairs (see Figure 4). In order to increase the number of shared features, we generated a set of derived features by extracting bags of words from lexical feature values and by performing Named Entity Recognition on the features that constituted textual annotations (… and comments), when present. Moreover, from the extracted entities we added the related DBpedia categories and types as additional features. As an example, Table 2 shows a sample of the bag of words and entities extracted from the features listed in Table 1.
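A minimal sketch of the bag-of-words part of this derivation. It assumes a simple tokenizer that splits camelCase, underscores and other non-alphanumeric characters, drops single-character tokens, and uses an illustrative stop-word list; none of these choices comes from the paper. The Named Entity Recognition step (DBpedia Spotlight in the original tooling) is an external service and is not reproduced. The `-word` suffix mirrors the feature types in Table 2.

```python
import re
from typing import Iterable, List, Set, Tuple

# Illustrative stop-word list (an assumption, not the one used in the paper).
STOPWORDS = {"a", "an", "and", "of", "the", "to", "in", "for", "is"}


def words(value: str) -> List[str]:
    """Split a lexical value on camelCase boundaries and non-alphanumerics."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", value)
    tokens = re.split(r"[^A-Za-z0-9]+", spaced)
    return [t.lower() for t in tokens if len(t) > 1 and t.lower() not in STOPWORDS]


def derive_bag_of_words(features: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Map each direct (type, value) feature to derived (type + "-word", token) features."""
    return {(ftype + "-word", w) for ftype, value in features for w in words(value)}


# Direct features taken from Table 1.
direct = [
    ("From/FromPortName", "string"),
    ("To/ToPortName", "split"),
    ("Activity/ActivityName", "reformat list"),
]
for feature in sorted(derive_bag_of_words(direct)):
    print(feature)
```

Splitting multi-word values such as "reformat list" into separate tokens is what lets two port pairs share a feature even when their full names differ.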
Direct vs. derived features, distribution (30612 I/O port pairs)
Formal Concept Analysis (FCA)
• FCA is a conceptual clustering method that can be used for association rule mining
• Lattice of partially ordered closed item sets (concepts)
• Item: an I/O port pair, described by its features and annotations
• FCA concept:
• Extent (I/O port pairs)
• Intent (features, annotations)
• Incremental lattice construction (Godin algorithm).
• The lattice is updated each time an item is added.
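The incremental construction can be illustrated with a naive Python sketch. It is not the Godin algorithm, which also maintains the order between concepts and is far more efficient; it only relies on the property that algorithm builds on: when an item is added, the new closed intents are the item's own intent and its intersections with the existing intents. Items are I/O port pairs whose attributes mix features (f…) and annotations (a…); the class name and data are hypothetical, and the concept with an empty extent is omitted.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Intent = FrozenSet[str]
Concept = Tuple[FrozenSet[str], Intent]  # (extent: item ids, intent: attributes)


class IncrementalContext:
    """Naive incremental computation of FCA concepts (non-empty extents only)."""

    def __init__(self) -> None:
        self.items: Dict[str, Intent] = {}
        self.intents: Set[Intent] = set()

    def add(self, item: str, attributes: Iterable[str]) -> None:
        new = frozenset(attributes)
        self.items[item] = new
        # Closed intents after the addition: the old ones, the new item's
        # intent, and the intersection of the new intent with each old one.
        self.intents |= {old & new for old in self.intents} | {new}

    def extent(self, intent: Intent) -> FrozenSet[str]:
        """Items having all the attributes of `intent`."""
        return frozenset(i for i, attrs in self.items.items() if intent <= attrs)

    def concepts(self) -> List[Concept]:
        """All concepts, from the most general (largest extent) downwards."""
        return sorted(
            ((self.extent(i), i) for i in self.intents),
            key=lambda c: (-len(c[0]), sorted(c[1])),
        )


ctx = IncrementalContext()
ctx.add("io1", {"f1", "f2", "a0"})
print(len(ctx.concepts()))  # → 1 (a single concept, as in Step 0)
ctx.add("io2", {"f1", "f3", "a0"})
print(len(ctx.concepts()))  # → 3
top_extent, top_intent = ctx.concepts()[0]
print(sorted(top_extent), sorted(top_intent))  # → ['io1', 'io2'] ['a0', 'f1']
```

The shared concept ({io1, io2}, {f1, a0}) is what later supports a rule such as (f1) → (a0).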

Step 0
At the beginning, the user adds a single item without receiving any
recommendation. The lattice contains a single concept.
Step 1
As new annotations are added, the lattice makes it possible to derive
association rules.
(f1, f2, ..., fn) → (a1, a2, ..., an)
Step 2
As new annotations are added, the lattice grows…
making it possible to generate recommendations.
(f1, f2, ..., fn) → (a1, a2, ..., an)
Step 3
As new annotations are added, the lattice grows…
making it possible to generate more recommendations.
(f1, f2, ..., fn) → (a1, a2, ..., an)
Step 4
As new annotations are added, the lattice grows…
making it possible to generate many recommendations.
(f1, f2, ..., fn) → (a1, a2, ..., an)
ASSOCIATION RULE MINING
Generating all association rules on each iteration is expensive.
We query the lattice to retrieve only the rules applicable to a given I/O port pair:
• only rules with features in the antecedent and annotations in the consequent:
• This: (f1, f2, ..., fn) → (a1, a2, ..., an)
• Not these: (f1, f2, a6) → (f3, f4), (f1, f2, a6) → (f3, a4)
• avoid redundancy (select the best rule for each head)
• rank the rules according to support, confidence and relevance.
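For reference, the standard FCA definitions of support and confidence for a rule B → H are given below, where B = {f1, ..., fn} is a set of features, H = {a1, ..., am} a set of annotations, G the set of annotated I/O port pairs, and X′ the set of port pairs having all the attributes in X. The relevance score used for ranking is specific to the paper and is not reproduced here.

```latex
\mathrm{supp}(B \rightarrow H) = \frac{\lvert (B \cup H)' \rvert}{\lvert G \rvert},
\qquad
\mathrm{conf}(B \rightarrow H) = \frac{\lvert (B \cup H)' \rvert}{\lvert B' \rvert}
```

Both quantities can be read directly from the lattice: the extent of the concept whose intent is the closure of B ∪ H, and the extent of the concept whose intent is the closure of B.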
io6: f7,f8,f9,f10,f11,a?
(f7,f8) →(a0) (f8,f9) →(a2)
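The retrieval step can be sketched in Python under simplifying assumptions: candidate rules are computed directly from the annotated port pairs instead of by traversing the FCA lattice, each rule has a single annotation as its head, only rules whose body is contained in the target's features are kept, one rule per head survives, and rules are ranked by confidence and then support (the relevance score is omitted). `applicable_rules` is a hypothetical name, and the toy data are invented to mirror the io6 example.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    body: FrozenSet[str]  # features in the antecedent
    head: str             # a single annotation in the consequent
    support: float        # share of annotated items with body and head
    confidence: float     # share of items with the body that also have the head


def applicable_rules(
    items: Dict[str, Tuple[Set[str], Set[str]]], target: Set[str]
) -> List[Rule]:
    """Rules (features -> annotation) applicable to an unannotated port pair.

    `items` maps each annotated I/O port pair to (features, annotations);
    `target` holds the features of the port pair to be annotated.
    """
    n = len(items)
    best: Dict[str, Rule] = {}
    for features, _ in items.values():
        body = frozenset(features & target)  # only features the target has
        if not body:
            continue
        covering = [annots for feats, annots in items.values() if body <= feats]
        for head in sorted({a for annots in covering for a in annots}):
            hits = sum(1 for annots in covering if head in annots)
            rule = Rule(body, head, hits / n, hits / len(covering))
            current = best.get(head)
            # Avoid redundancy: keep one rule per head (higher confidence,
            # then higher support, then a more specific body).
            if current is None or (rule.confidence, rule.support, len(body)) > (
                current.confidence, current.support, len(current.body)
            ):
                best[head] = rule
    return sorted(best.values(), key=lambda r: (r.confidence, r.support), reverse=True)


items = {
    "io1": ({"f1", "f7", "f8"}, {"a0"}),
    "io2": ({"f2", "f7", "f8"}, {"a0"}),
    "io3": ({"f3", "f8", "f9"}, {"a2"}),
    "io4": ({"f4", "f8", "f9"}, {"a2"}),
}
target = {"f7", "f8", "f9", "f10", "f11"}  # io6, still unannotated
rules = applicable_rules(items, target)
for r in rules:
    print(sorted(r.body), "->", r.head, r.confidence)
# → ['f7', 'f8'] -> a0 1.0
# → ['f8', 'f9'] -> a2 1.0
```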
EVALUATION
• Expectation: the quality of the recommendations improves over time.
• EXPERIMENT:
• Dinowolf (Datanode in workflows): http://github.com/enridaga/dinowolf
Uses SCUFL2, Apache Taverna, Apache Lucene, DBpedia Spotlight
• 6 users annotated 20 workflows from www.myexperiment.org, for a total of 260 I/O port pairs.
RESULTS
This confirms our hypothesis that the quality of recommendations increases, stabilizing in the upper region once a critical mass of annotated items has been produced, reflecting the same behavior observed in Fig. 7.
Fig. 6. Evolution of the time spent by each user on a given annotation page of the tool before a decision was made (horizontal axis: 20 to 260; vertical axis: 5 s to 10 min).
Fig. 7. Progress of the ratio of annotations selected from recommendations (horizontal axis: 20 to 260; vertical axis: 0.0 to 1.0).
Time required to make a choice:
Selections from recommendations:
Effort reduced.
Cold start problem tackled.
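A metric like the one in Fig. 7 could be computed as below. This assumes the curve is a cumulative ratio over the sequence of annotation decisions; the slide does not say whether a cumulative or a windowed ratio is plotted, and `cumulative_ratio` is a hypothetical name.

```python
from itertools import accumulate
from typing import List


def cumulative_ratio(from_recommendation: List[bool]) -> List[float]:
    """Running share of annotations that were picked from recommendations."""
    counts = accumulate(int(x) for x in from_recommendation)
    return [c / (i + 1) for i, c in enumerate(counts)]


print(cumulative_ratio([False, False, True, True]))
# → [0.0, 0.0, 0.3333333333333333, 0.5]
```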
RESULTS
Fig. 8. Average rank of selected recommendations. The vertical axis represents the score, with the first position at the top (horizontal axis: 20 to 260; vertical axis: 0.0 to 1.0).
Fig. 9. Progress of the average relevance score of picked recommendations (horizontal axis: 20 to 260; vertical axis: 0.0 to 1.0).
Rank of selected recommendations:
Relevance score of selected recommendations:
Quality of recommendations increases.
CONCLUSIONS
• Supporting users in annotating workflows with data-to-data
relations through recommendations is problematic because of the lack
of an initial training set (cold start problem). We tackled this issue
by means of an incremental learning process that leverages FCA
and an information retrieval approach to association rule mining (ARM).
• Future work:
• Integrate this approach in Data Hub metadata management to
support policy propagation.
• Study the quality and consistency of annotations.
• Agreement/disagreement between users.
• The solution is domain-independent and can be applied to other
scenarios.
Thank you
Enrico Daga
Feedback: @enridaga
http://link.springer.com/chapter/10.1007/978-3-319-49004-5_9
REFERENCES
• Daga, E., d’Aquin, M., Adamou, A., Motta, E.: Addressing exploitability of smart city data.
In: 2016 IEEE Second International Smart Cities Conference (ISC2). IEEE (2016) 

• Daga, E., d’Aquin, M., Gangemi, A., Motta, E.: Describing semantic web applications
through relations between data nodes. Technical report kmi-14-05, Knowledge Media
Institute, The Open University, Walton Hall, Milton Keynes (2014). http://kmi.open.ac.uk/
publications/techreport/kmi-14-05

• Daga, E., d’Aquin, M., Gangemi, A., Motta, E.: Propagation of policies in rich data flows.
In: Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015,
New York, NY, USA, pp. 5:1–5:8 (2015). http://doi.acm.org/10.1145/2815833.2815839

• Godin, R., Missaoui, R., Alaoui, H.: Incremental concept formation algorithms based on
Galois (concept) lattices. Comput. Intell. 11(2), 246–267 (1995)

• Poelmans, J., Elzinga, P., Viaene, S., Dedene, G.: Formal concept analysis in knowledge
discovery: a survey. In: Croitoru, M., Ferré, S., Lukose, D. (eds.) ICCS 2010. LNCS (LNAI),
vol. 6208, pp. 139–153. Springer, Heidelberg (2010). doi:10.1007/978-3-642-14197-3_15

 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 

An incremental learning method to support the annotation of workflows with data-to-data relations

  • 1. An incremental learning method to support the annotation of workflows with data-to-data relations. Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta. Feedback: @enridaga. 20th International Conference on Knowledge Engineering and Knowledge Management, Bologna, Italy, 19-23 November 2016. http://link.springer.com/chapter/10.1007/978-3-319-49004-5_9
  • 3. Workflow models focus on actions, to support multiple, parametric executions.
  • 4. There are scenarios in which we need to focus on the data…
  • 5. … and understand how the data is affected by the actions of the workflow.
  • 6. Data flow (DF): to express the implications of the actions on the data.
  • 7. Datanode, a taxonomy of the relations between data objects, used for example to support reasoning on policy propagation. http://purl.org/datanode/ns/
     Daga, E., d’Aquin, M., Gangemi, A., Motta, E.: Propagation of policies in rich data flows. In: Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015. http://doi.acm.org/10.1145/2815833.2815839

  • 8. Our objective is to derive such data flows from the representation of existing workflows.
  • 9. APPROACH: learn how to label data-to-data relations using the descriptions of the actions in the workflow.
     ASSUMPTION: there is a correlation between the features of a workflow action and the labels.
     PROBLEM: cold start. This requires a pre-existing training set, which we do not have!
  • 11. HYPOTHESIS: the quality of the recommendations improves over time.
  • 13. WORKFLOW to DATA FLOW: arcs = I/O port pairs (1->3; 2->3). 1234 workflows from www.myexperiment.org yield 30612 I/O port pairs.
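The mapping from a workflow to the arcs of a data flow can be sketched in a few lines. This is a minimal illustration under an assumed, simplified processor model (a name plus input and output port ids), not the real SCUFL2 object model; `Processor` and `io_port_pairs` are hypothetical names. Every input port of a processor is paired with every output port, giving one candidate data-to-data arc per pair.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Processor:
    """Hypothetical, simplified processor: a name plus input/output port ids."""
    name: str
    inputs: tuple
    outputs: tuple


def io_port_pairs(processors):
    """Return every (processor, input port, output port) triple.

    Each pair becomes one arc of the data flow, to be labelled later
    with a data-to-data relation.
    """
    pairs = []
    for proc in processors:
        # Cartesian product: each input may affect each output.
        for src, dst in product(proc.inputs, proc.outputs):
            pairs.append((proc.name, src, dst))
    return pairs


# The example processor: input ports 1 and 2, output port 3.
reformat = Processor("reformat list", inputs=(1, 2), outputs=(3,))
print(io_port_pairs([reformat]))
```

With two inputs and one output this yields the two arcs 1->3 and 2->3; summing such pairs over all processors of all workflows gives the corpus of I/O port pairs.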
  • 14. FEATURES
     Direct: about the ports and processors involved: ids, data types, annotations, scripts…
     Derived: from annotations: bag of words, NER/DBpedia entities plus types and categories.
     Table 2. Example of derived features (bag of words and DBpedia entities) generated for the I/O port pair 1 → 3.
        Type | Value
        From/FromPortName-word | string
        To/ToPortName-word | split
        From/FromLinkedPortDescription-word | single
        From/FromLinkedPortDescription-word | possibilities
        From/FromLinkedPortDescription-word | orb
        From/FromLinkedPortDescription-word | mass
        FromToPorts/DbPediaType | wgs84:SpatialThing
        FromToPorts/DbPediaType | resource:Text file
        FromToPorts/DbPediaType | resource:Mass
        FromToPorts/DbPediaType | Category:State functions
        FromToPorts/DbPediaType | Category:Physical quantities
        FromToPorts/DbPediaType | Category:Mathematical notation
     Distribution (30612 I/O port pairs):
     Fig. 4. Distribution of features extracted from the workflow descriptions (chart values: 80%, 18%, 2%; legend: < 10, 10~100, > 100).
     Fig. 5. Distribution of features, including derived features (chart values: 68%, 28%, 4%; legend: < 10, 10~100, > 100).
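A minimal sketch of how bag-of-words features like those in Table 2 could be derived from direct features. It assumes features are (type, value) pairs, splits values on non-alphanumeric characters and camelCase boundaries, lowercases the words, and appends `-word` to the feature type to mimic the naming in Table 2. The stop-word list and the port description in the example are made up, and the NER/DBpedia step (entity types and categories) is not reproduced.

```python
import re

# Hypothetical stop-word list; the filtering used in the study is not specified.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "is", "for"}


def words(text):
    """Split a lexical value into lowercase words (camelCase and punctuation aware)."""
    # Insert a space at lower->upper boundaries, e.g. "PortName" -> "Port Name".
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [w.lower() for w in re.findall(r"[A-Za-z0-9]+", spaced)]


def derive_word_features(direct_features):
    """Map (type, value) direct features to unique (type + '-word', word) features."""
    derived, seen = [], set()
    for ftype, value in direct_features:
        for w in words(value):
            feature = (f"{ftype}-word", w)
            if w not in STOP_WORDS and feature not in seen:
                seen.add(feature)
                derived.append(feature)
    return derived


direct = [
    ("From/FromPortName", "string"),
    ("To/ToPortName", "split"),
    ("From/FromLinkedPortDescription", "Mass of a single orb"),  # made-up description
]
for ftype, word in derive_word_features(direct):
    print(ftype, word)
```

Splitting values into words lets two port pairs share a feature even when their raw values differ, which is the point of deriving features in the first place.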
  • 15. FEATURES
     This processor has three ports: two input ports (1 and 2) and one output port (3). We can translate this model into a graph connecting the data objects of the inputs to the one of the output.
     Table 1. Sample of the features extracted for the I/O port pair 1 → 3 in the example of Figure 3.
        Type | Value
        From/FromPortName | string
        To/ToPortName | split
        Activity/ActivityConfField | script
        Activity/ActivityType | http://ns.taverna.org.uk/2010/activity/beanshell
        Activity/ActivityName | reformat list
        Activity/ConfField/derivedFrom | http://ns.taverna.org.uk/2010/activity/localworker/org.embl.ebi.escience.scuflworkers.java.SplitByRegex
        Activity/ConfField/script | List split = new ArrayList();if (!string.equals("")) { String regexString = ","; if (regex != void) ...
        Processor/ProcessorType | Processor
        Processor/ProcessorName | reformat list
     However, the objective of these feature sets is to support the clustering of annotated I/O port pairs by finding similarities with the I/O port pairs to be annotated. At this stage of the study we performed a preliminary evaluation of the distribution of the extracted features. We discovered that very few of them are shared between a significant number of port pairs (see Figure 4). In order to increase the number of shared features, we generated a set of derived features by extracting bags of words from lexical feature values and by performing Named Entity Recognition on the features that constituted textual annotations (…s and comments), when present. Moreover, from the extracted entities we added the related DBpedia categories and types as additional features. As an example, Table 2 shows a sample of the bag of words and entities extracted from the features listed in Table 1.
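The preliminary analysis of how widely features are shared can be sketched as counting, for each feature, the number of I/O port pairs in which it occurs, and then bucketing those counts. The bins follow the legend of Figs. 4 and 5 (< 10, 10~100, > 100); reading them as "number of port pairs sharing a feature" is an assumption of this sketch, and the toy data is made up.

```python
from collections import Counter


def sharing_distribution(pair_features):
    """Fraction of features per bin of 'number of port pairs sharing the feature'.

    pair_features maps an I/O port pair id to its collection of features.
    The bin interpretation (< 10, 10~100, > 100 pairs) is assumed.
    """
    # Count in how many distinct port pairs each feature occurs.
    counts = Counter(f for feats in pair_features.values() for f in set(feats))
    bins = Counter()
    for n in counts.values():
        if n < 10:
            bins["< 10"] += 1
        elif n <= 100:
            bins["10~100"] += 1
        else:
            bins["> 100"] += 1
    total = sum(bins.values()) or 1  # avoid division by zero on empty input
    return {b: bins[b] / total for b in ("< 10", "10~100", "> 100")}


# Made-up data: one feature shared by all 12 pairs, plus one unique feature per pair.
toy = {f"pair{i}": {"Processor/ProcessorType=Processor", f"unique{i}"} for i in range(12)}
print(sharing_distribution(toy))
```

A distribution dominated by the "< 10" bin, as in the toy data, means most features are too rare to connect a new port pair with previously annotated ones.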
  • 17. Formal Concept Analysis (FCA)
     • FCA is a clustering method for association rule mining.
     • Lattice of ordered closed itemsets (concepts).
     • Item: I/O port pair <-> features + annotations.
     • FCA concept:
        • Extent (I/O port pairs)
        • Intent (features, annotations)
     • Incremental lattice construction (Godin algorithm).
     • The lattice is reconstructed on each item addition.
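The intuition behind incremental lattice construction can be illustrated with a simplified sketch. The intents of a formal context are exactly the intersections of object intents, so adding an object with attribute set B only requires adding B and its intersections with the existing intents. This is not Godin's full algorithm: it keeps only the set of concepts (extent, intent), does not maintain the lattice order (Hasse diagram) that Godin's algorithm also updates, and omits the bottom concept with an empty extent. Object ids and attribute names (f… for features, a… for annotations) are hypothetical.

```python
class IncrementalContext:
    """Simplified incremental formal context (not Godin's full algorithm)."""

    def __init__(self):
        self.objects = {}     # object id -> frozenset of attributes
        self.intents = set()  # intents of concepts with a non-empty extent

    def add(self, obj, attributes):
        b = frozenset(attributes)
        self.objects[obj] = b
        # Existing intents stay closed; new ones are B and B ∩ each old intent.
        self.intents |= {b} | {b & i for i in self.intents}

    def extent(self, intent):
        """Objects having all attributes of the intent (derivation operator)."""
        return frozenset(o for o, attrs in self.objects.items() if intent <= attrs)

    def concepts(self):
        return [(self.extent(i), i) for i in self.intents]


ctx = IncrementalContext()
ctx.add("p1", {"f1", "f2", "a1"})  # hypothetical annotated I/O port pair
ctx.add("p2", {"f1", "f3", "a1"})
for ext, intent in sorted(ctx.concepts(), key=lambda c: len(c[1])):
    print(sorted(ext), sorted(intent))
```

After the first `add` the context holds a single concept, consistent with Step 0 below; the second item adds its own concept plus the shared concept ({p1, p2}, {a1, f1}).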

  • 18. Step 0: At the beginning, the user adds a single item without any support. The lattice contains a single concept.
  • 19. Step 1: As new annotations are added, the lattice allows association rules to be derived: (f1, f2, ..., fn) → (a1, a2, ..., an)
  • 20. Step 2: As new annotations are added, the lattice grows, allowing recommendations to be generated: (f1, f2, ..., fn) → (a1, a2, ..., an)
  • 21. Step 3: As new annotations are added, the lattice grows, allowing more recommendations to be generated: (f1, f2, ..., fn) → (a1, a2, ..., an)
  • 22. Step 4: As new annotations are added, the lattice grows, allowing many recommendations to be generated: (f1, f2, ..., fn) → (a1, a2, ..., an)
  • 23. ASSOCIATION RULE MINING
     Generating all association rules on each iteration is expensive. We query the lattice to retrieve only the rules applicable to a given I/O port pair:
     • only rules that have annotations in the rule consequence:
        • this: (f1, f2, ..., fn) → (a1, a2, ..., an)
        • not these: (f1, f2, a6) → (f3, f4), (f1, f2, a6) → (f3, a4)
     • avoid redundancies (select the best rule for a given head);
     • rank the rules according to support, confidence and relevance.
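A sketch of retrieving only the rules applicable to one I/O port pair, assuming concepts are available as (extent, intent) pairs and that the set of annotation attributes is known. For each concept whose intent contains annotations, the features become the premise and the annotations the consequence; the rule applies only if the premise is contained in the new item's features. Support is taken as the size of the concept's extent and confidence as support divided by the number of items matching the premise; only the best rule per head is kept. The relevance score mentioned on the slide is not defined here, so this sketch ranks by confidence, then support. All names and data are hypothetical.

```python
def concepts_of(objects):
    """All (extent, intent) pairs with non-empty extent, via closure under intersection."""
    intents = set()
    for attrs in objects.values():
        intents |= {attrs} | {attrs & i for i in intents}
    return [(frozenset(o for o, a in objects.items() if i <= a), i) for i in intents]


def applicable_rules(concepts, objects, item_features, annotations):
    """Best (premise, head, support, confidence) rule per head for one item."""
    best = {}
    for extent, intent in concepts:
        head = intent & annotations  # consequence: annotations only
        body = intent - annotations  # premise: features only
        # Skip concepts with no annotation, no premise, or a premise the item lacks.
        if not extent or not head or not body or not body <= item_features:
            continue
        support = len(extent)
        # Items matching the premise alone give the confidence denominator.
        body_count = sum(1 for attrs in objects.values() if body <= attrs)
        confidence = support / body_count
        current = best.get(head)
        # Redundancy removal: keep the strongest rule for each head.
        if current is None or (confidence, support) > (current[3], current[2]):
            best[head] = (body, head, support, confidence)
    # Rank by confidence, then support, highest first.
    return sorted(best.values(), key=lambda r: (r[3], r[2]), reverse=True)


objects = {
    "p1": frozenset({"f1", "f2", "a1"}),
    "p2": frozenset({"f1", "f3", "a1"}),
    "p3": frozenset({"f2", "f3", "a2"}),
}
annotations = frozenset({"a1", "a2"})
item = frozenset({"f1", "f2", "f3"})  # features of a new, unannotated port pair
for body, head, support, confidence in applicable_rules(
        concepts_of(objects), objects, item, annotations):
    print(sorted(body), "->", sorted(head), support, confidence)
```

Rules such as (f1, f2) → (a1) are discarded in favour of (f1) → (a1), which has the same head, equal confidence and higher support.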
  • 25. EVALUATION
     • Expectation: the quality of the recommendations improves over time.
     • EXPERIMENT:
        • Dinowolf (Datanode in workflows), http://github.com/enridaga/dinowolf, built on SCUFL2, Apache Taverna, Apache Lucene and DBpedia Spotlight.
        • 6 users annotated 20 workflows from www.myexperiment.org, for a total of 260 I/O port pairs.
  • 26. RESULTS
     Time required to make a choice:
     Fig. 6. Evolution of the time spent by each user on a given annotation page of the tool before a decision was made.
     Selections from recommendations:
     Fig. 7. Progress of the ratio of annotations selected from recommendations.
     Effort reduced. Cold start problem tackled.
  • 27. RESULTS
     Rank of selected recommendations:
     Fig. 8. Average rank of selected recommendations. The vertical axis represents the score, with the first position at the top.
     This confirms our hypothesis that the quality of recommendations increases, stabilizing within the upper region after a critical mass of annotated items is produced, reflecting the same behavior observed in Fig. 7.
     Relevance score of selected recommendations:
     Fig. 9. Progress of the average relevance score of picked recommendations.
     Quality of recommendations increases.
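The two quantities plotted in Figs. 7 and 8 can be sketched as windowed averages over an annotation log. The log format is hypothetical: each event records whether the chosen annotation came from the recommendations and, if so, its 1-based rank in the list. Whether the paper uses windows or cumulative averages, and how the rank axis is normalised, is not stated here, so this is only an illustration.

```python
def windowed_metrics(events, window):
    """Per-window ratio of selections from recommendations and their average rank.

    events: chronological list of (from_recommendation, rank) tuples, where
    rank is the 1-based position of the picked recommendation (None otherwise).
    Returns (events_seen, ratio, average_rank_or_None) per complete window.
    """
    results = []
    for end in range(window, len(events) + 1, window):
        chunk = events[end - window:end]
        ranks = [rank for picked, rank in chunk if picked]
        ratio = len(ranks) / len(chunk)
        avg_rank = sum(ranks) / len(ranks) if ranks else None
        results.append((end, ratio, avg_rank))
    return results


# Made-up log: early choices are entered manually, later ones come from the top of the list.
log = [(False, None), (False, None), (True, 3), (True, 2),
       (True, 1), (True, 1), (False, None), (True, 1)]
for row in windowed_metrics(log, window=4):
    print(row)
```

A rising ratio and an average rank approaching 1 correspond to the trends the slides describe as reduced effort and increasing quality of recommendations.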
  • 28. CONCLUSIONS
     • Supporting users in annotating workflows with data-to-data relations through recommendations is problematic because there is no initial training set (the cold start problem). We tackled this issue with an incremental learning process that leverages FCA and an information retrieval approach to association rule mining (ARM).
     • Future work:
        • Integrate this approach into Data Hub metadata management to support policy propagation.
        • Study the quality and consistency of annotations.
        • Agreement/disagreement between users.
        • The solution is domain independent and can be applied to other scenarios.
  • 29. Thank you. Enrico Daga. Feedback: @enridaga. http://link.springer.com/chapter/10.1007/978-3-319-49004-5_9
  • 30. REFERENCES
     • Daga, E., d’Aquin, M., Adamou, A., Motta, E.: Addressing exploitability of smart city data. In: 2016 IEEE Second International Smart Cities Conference (ISC2). IEEE (2016)
     • Daga, E., d’Aquin, M., Gangemi, A., Motta, E.: Describing semantic web applications through relations between data nodes. Technical report kmi-14-05, Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes (2014). http://kmi.open.ac.uk/publications/techreport/kmi-14-05
     • Daga, E., d’Aquin, M., Gangemi, A., Motta, E.: Propagation of policies in rich data flows. In: Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015, New York, NY, USA, pp. 5:1–5:8 (2015). http://doi.acm.org/10.1145/2815833.2815839
     • Godin, R., Missaoui, R., Alaoui, H.: Incremental concept formation algorithms based on Galois (concept) lattices. Comput. Intell. 11(2), 246–267 (1995)
     • Poelmans, J., Elzinga, P., Viaene, S., Dedene, G.: Formal concept analysis in knowledge discovery: a survey. In: Croitoru, M., Ferré, S., Lukose, D. (eds.) ICCS 2010. LNCS (LNAI), vol. 6208, pp. 139–153. Springer, Heidelberg (2010). doi:10.1007/978-3-642-14197-3_15