Workflow formalisations are often focused on the representation of a process with the primary objective of supporting execution. However, there are scenarios where what needs to be represented is the effect of the process on the data artefacts involved, for example when reasoning over the corresponding data policies. This can be achieved by annotating the workflow with the semantic relations that occur between these data artefacts. However, manually producing such annotations is difficult and time-consuming. In this paper we introduce a method based on recommendations to support users in this task. Our approach is centred on an incremental association rule mining technique that compensates for the cold start problem caused by the lack of a training set of annotated workflows. We discuss the implementation of a tool relying on this approach and how its application to an existing repository of workflows effectively enables the generation of such annotations.
--
Presented at
20th International Conference on Knowledge Engineering and Knowledge Management
Bologna, Italy
19-23 November 2016
http://link.springer.com/chapter/10.1007/978-3-319-49004-5_9
An incremental learning method to support the annotation of workflows with data-to-data relations
1. 1
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta
Feedback: @enridaga
5. … and understand how the data is affected by the actions of the workflow.
6. Data flow (DF): to express the implications of the actions on the data.
7. Datanode, a taxonomy of the relations between data objects, used, for example, to support reasoning on policy propagation
http://purl.org/datanode/ns/
Daga, E., d’Aquin, M., Gangemi, A., Motta, E.: Propagation of policies in rich data flows. In: Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015. http://doi.acm.org/10.1145/2815833.2815839
8. 8
Our objective is to derive such data flows from the
representation of existing workflows.
9. 9
APPROACH: to learn how to label data-to-data relations
using the description of the actions in the workflow.
ASSUMPTION: there is a correlation between the features
of a workflow action and the labels.
PROBLEM: cold start. This requires a pre-existing training
set, which we do not have!
13. 13
WORKFLOW to DATA FLOW
Arcs = I/O port pairs (1 → 3; 2 → 3)
1234 workflows from www.myexperiment.org = 30612 I/O port pairs
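The translation from a processor to data-flow arcs can be sketched as follows: assuming every input port is paired with every output port, the example processor with inputs 1 and 2 and output 3 yields the arcs 1 → 3 and 2 → 3. This is a minimal sketch in which a processor is reduced to lists of port identifiers; the `Processor` class and the `io_port_pairs` function are hypothetical and not part of SCUFL2 or Dinowolf.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass
class Processor:
    """Hypothetical, simplified view of a workflow processor and its ports."""
    name: str
    inputs: List[str]
    outputs: List[str]


def io_port_pairs(processor: Processor) -> List[Tuple[str, str]]:
    """Return one (input port, output port) arc per input/output combination."""
    return [(i, o) for i, o in product(processor.inputs, processor.outputs)]


# Example processor with two input ports (1 and 2) and one output port (3).
reformat_list = Processor("reformat list", inputs=["1", "2"], outputs=["3"])
print(io_port_pairs(reformat_list))  # [('1', '3'), ('2', '3')]
```

Applied to every processor of every workflow, a function like this produces the set of I/O port pairs that are then described by features and annotated.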
14. 14
FEATURES
Direct: about the ports and processors involved: ids, data types, annotations, scripts …
Derived: from annotations: bag of words, NER/DBPedia entities plus types and categories.
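The bag-of-words part of the derived features can be sketched as below; it turns a lexical feature value into `<type>-word` features in the style of Table 2. This is an illustrative sketch only: the tokenisation rules and the stop-word list are assumptions, and the Named Entity Recognition step (DBPedia entities, types and categories) is not shown.

```python
import re
from typing import List, Tuple

# Hypothetical stop-word list (an assumption, not the tool's configuration).
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "or", "the", "to"}


def bag_of_words(feature_type: str, value: str) -> List[Tuple[str, str]]:
    """Derive '<feature_type>-word' features from a lexical feature value."""
    # Break camelCase boundaries (e.g. "splitByRegex" -> "split By Regex").
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", value)
    words: List[str] = []
    # Tokenise on any non-alphanumeric character and normalise to lower case.
    for token in re.split(r"[^A-Za-z0-9]+", spaced):
        word = token.lower()
        if len(word) > 1 and word not in STOP_WORDS and word not in words:
            words.append(word)  # keep first-occurrence order, skip duplicates
    return [(f"{feature_type}-word", w) for w in words]
```

For instance, `bag_of_words("To/ToPortName", "split")` returns `[("To/ToPortName-word", "split")]`, matching the corresponding row of Table 2.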
Table 2. Example of derived features (bag of words and DBPedia entities) generated for the IO port pair 1 → 3.
Type | Value
From/FromPortName-word | string
To/ToPortName-word | split
From/FromLinkedPortDescription-word | single
From/FromLinkedPortDescription-word | possibilities
From/FromLinkedPortDescription-word | orb
From/FromLinkedPortDescription-word | mass
FromToPorts/DbPediaType | wgs84:SpatialThing
FromToPorts/DbPediaType | resource:Text file
FromToPorts/DbPediaType | resource:Mass
FromToPorts/DbPediaType | Category:State functions
FromToPorts/DbPediaType | Category:Physical quantities
FromToPorts/DbPediaType | Category:Mathematical notation
Fig. 4. Distribution of features extracted from the workflow descriptions: < 10, 80%; 10–100, 18%; > 100, 2%.
Fig. 5. Distribution of features (including derived features): < 10, 68%; 10–100, 28%; > 100, 4%.
Distribution:
(30612 I/O port pairs)
15. 15
FEATURES
This processor has three ports: two input ports (1 and 2) and one output port (3). We can translate this model into a graph connecting the data objects of the inputs to that of the output.
Table 1. Sample of the features extracted for the IO port pair 1 → 3 in the example of Figure 3.
Type | Value
From/FromPortName | string
To/ToPortName | split
Activity/ActivityConfField | script
Activity/ActivityType | http://ns.taverna.org.uk/2010/activity/beanshell
Activity/ActivityName | reformat list
Activity/ConfField/derivedFrom | http://ns.taverna.org.uk/2010/activity/localworker/org.embl.ebi.escience.scuflworkers.java.SplitByRegex
Activity/ConfField/script | List split = new ArrayList();if (!string.equals("")) { String regexString = ","; if (regex != void) ...
Processor/ProcessorType | Processor
Processor/ProcessorName | reformat list
However, the objective of these feature sets is to support the clustering of annotated IO port pairs by finding similarities with the IO port pairs to be annotated. At this stage of the study we performed a preliminary evaluation of the distribution of the extracted features. We discovered that very few of them were shared between a significant number of port pairs (see Figure 4). In order to increase the number of shared features, we generated a set of derived features by extracting bags of words from lexical feature values and by performing Named Entity Recognition on the features that constituted textual annotations (… and comments), when present. Moreover, from the extracted entities we added the related DBPedia categories and types as additional features. As an example, Table 2 shows a sample of the bag of words and entities extracted from the features listed in the previous Table 1.
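The preliminary check on how widely features are shared can be reproduced with a simple count over the port pairs, bucketed into the bins of Figs. 4 and 5. This is a sketch under stated assumptions: the bins are read as the number of port pairs sharing a feature, each port pair is mapped to a set of `(type, value)` features, and the middle bin includes both 10 and 100. `sharing_distribution` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Feature = Tuple[str, str]  # (feature type, value), e.g. ("To/ToPortName-word", "split")


def sharing_distribution(pairs: Dict[str, Set[Feature]]) -> Dict[str, float]:
    """Percentage of distinct features by how many I/O port pairs share them."""
    # Count, for every distinct feature, the number of port pairs that have it.
    counts = Counter(f for features in pairs.values() for f in features)
    bins = {"< 10": 0, "10-100": 0, "> 100": 0}
    for n in counts.values():
        if n < 10:
            bins["< 10"] += 1
        elif n <= 100:  # assumption: the 10-100 bin includes both bounds
            bins["10-100"] += 1
        else:
            bins["> 100"] += 1
    total = len(counts) or 1  # avoid division by zero on empty input
    return {label: 100.0 * count / total for label, count in bins.items()}
```

Running it before and after adding the derived features gives the kind of comparison shown between Fig. 4 and Fig. 5.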
17. 17
Formal Concept Analysis (FCA)
• FCA is a conceptual clustering method that also supports association rule mining
• A lattice of partially ordered closed itemsets: the concepts
• Item: I/O port pair <-> features + annotations
• FCA Concept:
• Extent (I/O port pairs)
• Intent (features, annotations)
• Incremental lattice construction (Godin algorithm).
• The lattice is rebuilt incrementally on each item addition.
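The incremental construction above can be illustrated with a deliberately naive lattice. This is not Godin's algorithm: it relies only on the fact that concept intents are closed under intersection, so adding an item with attribute set A adds A and its intersection with every existing intent. The `IncrementalLattice` class is hypothetical; its attributes stand for both the features and the annotations of an I/O port pair.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Concept = Tuple[Set[str], FrozenSet[str]]  # (extent, intent)


class IncrementalLattice:
    """Naive incremental concept lattice (illustrative; not Godin's algorithm).

    Objects are I/O port pairs; attributes are their features and annotations.
    """

    def __init__(self) -> None:
        self.objects: Dict[str, FrozenSet[str]] = {}
        self.intents: Set[FrozenSet[str]] = set()

    def add(self, obj: str, attributes: Iterable[str]) -> None:
        new_intent = frozenset(attributes)
        self.objects[obj] = new_intent
        # Concept intents are closed under intersection, so the new object adds
        # its own intent plus its intersection with every existing intent.
        self.intents |= {new_intent & intent for intent in self.intents}
        self.intents.add(new_intent)

    def extent(self, intent: FrozenSet[str]) -> Set[str]:
        """Objects that have every attribute in `intent`."""
        return {o for o, attrs in self.objects.items() if intent <= attrs}

    def concepts(self) -> List[Concept]:
        # Only concepts with a non-empty extent are returned; the bottom
        # concept (empty extent, all attributes) is left out.
        return [(self.extent(i), i) for i in self.intents]
```

With a single item the lattice holds one concept, as in Step 0; a second item sharing some attributes adds the concept formed by their common attributes. A real implementation would also maintain the order relation between concepts, which this sketch omits.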
18. 18
Step 0
At the beginning, the user adds a single item without any
recommendation support. The lattice contains a single concept.
19. 19
Step 1
As new annotations are added, the lattice allows association
rules to be derived.
(f1, f2, ..., fn) → (a1, a2, ..., an)
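For reference, the usual definitions of support and confidence for a rule of this form, written with the FCA derivation operator, are given below. These are the general textbook definitions, stated as an assumption about how the measures are computed here: G is the set of I/O port pairs, X′ is the set of pairs having every feature or annotation in X, and support is sometimes reported as the absolute count |(F ∪ A)′| instead of a ratio.

```latex
F = \{f_1, \dots, f_n\}, \qquad A = \{a_1, \dots, a_m\}

\mathrm{supp}(F \rightarrow A) = \frac{|(F \cup A)'|}{|G|},
\qquad
\mathrm{conf}(F \rightarrow A) = \frac{|(F \cup A)'|}{|F'|}
```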
20. 20
Step 2
As new annotations are added, the lattice grows…
allowing recommendations to be generated.
(f1, f2, ..., fn) → (a1, a2, ..., an)
21. 21
Step 3
As new annotations are added, the lattice grows…
allowing more recommendations to be generated.
(f1, f2, ..., fn) → (a1, a2, ..., an)
22. 22
Step 4
As new annotations are added, the lattice grows…
allowing many recommendations to be generated.
(f1, f2, ..., fn) → (a1, a2, ..., an)
23. 23
ASSOCIATION RULE MINING
Generating all association rules on each iteration is expensive.
We query the lattice to retrieve only rules applicable to
a given I/O port pair.
• only rules that have annotations in the rule consequence:
• This: (f1, f2, ..., fn) → (a1, a2, ..., an)
• Not these: (f1, f2, a6) → (f3, f4), (f1, f2, a6) → (f3, a4)
• avoid redundancies (select the best rule for each head)
• rank the rules according to: support, confidence and
relevance.
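The query described above can be sketched over a set of concept intents: each intent is split into a body of features and a head of annotations, only rules whose body is satisfied by the port pair to annotate are kept, one rule per head survives, and the result is ranked. This is a hypothetical sketch: the `a:` prefix marking annotations is an assumption, ranking uses support and then confidence only (the relevance measure is not defined in the slides and is not modelled), and all names are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# (body features, head annotations, support, confidence)
Rule = Tuple[FrozenSet[str], FrozenSet[str], int, float]


def is_annotation(attr: str) -> bool:
    # Assumption: annotation labels carry an "a:" prefix; anything else is a feature.
    return attr.startswith("a:")


def extent(objects: Dict[str, FrozenSet[str]], attrs: FrozenSet[str]) -> Set[str]:
    return {o for o, owned in objects.items() if attrs <= owned}


def applicable_rules(
    objects: Dict[str, FrozenSet[str]],
    intents: Iterable[FrozenSet[str]],
    query: FrozenSet[str],
) -> List[Rule]:
    """Feature -> annotation rules read from concept intents, applicable to `query`."""
    best: Dict[FrozenSet[str], Rule] = {}
    for intent in intents:
        body = frozenset(a for a in intent if not is_annotation(a))
        head = frozenset(a for a in intent if is_annotation(a))
        # Only features in the body, only annotations in the head, and the body
        # must be satisfied by the features of the I/O port pair to annotate.
        if not body or not head or not body <= query:
            continue
        support = len(extent(objects, intent))
        if support == 0:
            continue
        confidence = support / len(extent(objects, body))
        # Avoid redundancy: keep only the strongest rule for each head.
        if head not in best or (support, confidence) > best[head][2:]:
            best[head] = (body, head, support, confidence)
    # Rank by support, then confidence (relevance is not modelled here).
    return sorted(best.values(), key=lambda r: (r[2], r[3]), reverse=True)
```

The heads of the returned rules are the candidate annotations to recommend, in ranked order. Restricting the search to rules whose body matches the current port pair is what keeps each iteration cheap compared with generating every rule.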
25. 25
EVALUATION
• Expectation: the quality of the recommendations
improves over time.
• EXPERIMENT:
• Dinowolf (Datanode in workflows)
http://github.com/enridaga/dinowolf
Uses SCUFL2, Apache Taverna, Apache Lucene, DBPedia Spotlight
• 6 users annotated 20 workflows from www.myexperiment.org, for a total of 260 I/O port pairs.
26. 26
RESULTS
Fig. 8 (average rank of selected recommendations) confirms our hypothesis that the quality of recommendations increases, stabilizing in the upper region after a critical mass of annotated items is produced, reflecting the same behavior observed in Fig. 7.
Fig. 6. Evolution of the time spent by each user on a given annotation page of the tool before a decision was made (time axis from 5 s to 10 min).
Fig. 7. Progress of the ratio of annotations selected from recommendations (vertical axis from 0.0 to 1.0).
Time required to make a choice:
Selections from recommendations:
Effort reduced.
Cold start problem tackled.
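The ratio plotted in Fig. 7 can be tracked with a running computation like the following. This is a hypothetical sketch: the slides do not say whether the ratio is cumulative or computed over a window, so a cumulative ratio is assumed, and the input encoding (the rank of the chosen recommendation, or None for an annotation entered manually) is invented for illustration.

```python
from typing import List, Optional


def cumulative_selection_ratio(decisions: List[Optional[int]]) -> List[float]:
    """Ratio of annotations picked from recommendations after each decision.

    Each entry is the 1-based rank of the chosen recommendation, or None when
    the user entered the annotation without using a recommendation.
    """
    ratios: List[float] = []
    picked = 0
    for k, rank in enumerate(decisions, start=1):
        if rank is not None:
            picked += 1
        ratios.append(picked / k)
    return ratios


print(cumulative_selection_ratio([None, None, 1, 2]))
```

Early decisions, made before the lattice holds enough concepts, are mostly manual, so the ratio starts low and rises as recommendations become available.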
27. 27
RESULTS
Fig. 8. Average rank of selected recommendations. The vertical axis represents the score, with the first position at the top.
Fig. 9. Progress of the average relevance score of picked recommendations.
Rank of selected recommendations:
Relevance score of selected recommendations:
Quality of recommendations increases.
28. 28
CONCLUSIONS
• Supporting users in annotating workflows with data-to-data relations through recommendations is problematic because of the lack of an initial training set (cold start problem). We tackled this issue by means of an incremental learning process that leverages FCA and an information retrieval approach to association rule mining (ARM).
• Future work:
• Integrate this approach in Data Hub metadata management to
support policy propagation.
• Study the quality and consistency of annotations.
• Agreement/disagreement between users.
• The solution is domain-independent and can be applied to other scenarios.
30. 30
REFERENCES
• Daga, E., d’Aquin, M., Adamou, A., Motta, E.: Addressing exploitability of smart city data.
In: 2016 IEEE Second International Smart Cities Conference (ISC2). IEEE (2016)
• Daga, E., d’Aquin, M., Gangemi, A., Motta, E.: Describing semantic web applications through relations between data nodes. Technical report kmi-14-05, Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes (2014). http://kmi.open.ac.uk/publications/techreport/kmi-14-05
• Daga, E., d’Aquin, M., Gangemi, A., Motta, E.: Propagation of policies in rich data flows. In: Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015, New York, NY, USA, pp. 5:1–5:8 (2015). http://doi.acm.org/10.1145/2815833.2815839
• Godin, R., Missaoui, R., Alaoui, H.: Incremental concept formation algorithms based on Galois (concept) lattices. Comput. Intell. 11(2), 246–267 (1995)
• Poelmans, J., Elzinga, P., Viaene, S., Dedene, G.: Formal concept analysis in knowledge discovery: a survey. In: Croitoru, M., Ferré, S., Lukose, D. (eds.) ICCS 2010. LNCS (LNAI), vol. 6208, pp. 139–153. Springer, Heidelberg (2010). doi:10.1007/978-3-642-14197-3_15