SlideShare una empresa de Scribd logo
1 de 35
Descargar para leer sin conexión
MEX Interfaces: Automating Machine
Learning Metadata Generation
D. Esteves, Pablo N. Mendes, D. Moussallem, J.C. Duarte, Amrapali Zaveri,
Jens Lehmann, Ciro Baron Neto, Igor Costa and Maria Claudia Cavalcanti
University of Leipzig
September 13, 2016
1
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
What’s metadata and why is it so important?
“Metadata is data that provides
information about other data”
- Data Management
- Meta Analysis (ML)
- Social Engines
- ...
2
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
=== Run information ===
Scheme:weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: iris
Instances: 150
Attributes: 5
sepallength
sepalwidth
petallength
petalwidth
class
Test mode:evaluate on training data
...
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
1 0 1 1 1 1 Iris-
setosa
0.98 0.02 0.961 0.98 0.97 0.99 Iris-
versicolor
0.96 0.01 0.98 0.96 0.97 0.99 Iris-
virginica
Weighted Avg. 0.98 0.01 0.98 0.98 0.98 0.993
...
How costly is the metadata generation process?
3
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
How costly is the metadata generation process?
4
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Reproducible Research - in theory
1.“Data/Metadata publicly available”
2.“The computer code and all the computational
procedures should be available”
3.“Ideally the computer code will encompass all
of the steps of computational analysis”
Dr. Peng / Dr. Jeff Leek
5
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Reproducible Research - in practice
- Experiments are hard to reproduce, when not
impossible.
6
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
ML/DM Environments
7
LIBSVM
OpenML
IDEsFrameworks
Workflow Systems
Collaborative Env.
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Machine Learning Frameworks
8
Platform Advantage Drawbacks
MLF Front-end No (High)
Interoperability
No/low updates
delay
No much code
flexibility
(Low) Workflow
Management
LIBSVM
Frameworks
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Workflow Systems / Collaborative Environments
9
Platform Advantage Drawbacks
WFS High
Provenance
No (High)
Interoperability
Interoperabili
ty (*)
Updates are tool
dependent
Workflow
Management
No much code
flexibility
OpenML
Workflow Systems
Collaborative
Env.
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Integrated Development Environments / Libraries
10
Platform Advantage Drawbacks
IDE/MLL High code-
flexibility
Low Provenance
No learning
curve
Low
Interoperability
Data Management
costs more
IDEs
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Integrated Development Environments / Libraries
11
Platform Advantage Drawbacks
IDE/MLL High code-
flexibility
Low Provenance
No learning
curve
Low
Interoperability
Data Management
costs more
IDEs
motivation
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Research Question
1.How to export machine learning variables -
incomes/outcomes (IDEs)?
- Architecture
- Schema
2.What is the existing approach that
minimizes the coding effort (IDEs)?
12
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Existing solutions
13
Solution Advantage Drawbacks
stdout No Extra Coding
Effort Required
Lack of Provenance
Lack of Interoperability
Lack of Data Query Feature
DBMS Data Query
Feature
Extra Coding Effort
(Integration)
Lack of Provenance
Lack of Interoperability
Self-schema
Definition
Straightforward
Solution
Extra Coding Effort
Extra Analysis Effort
(modeling)
Lack of Provenance
Lack of Interoperability
IDEs
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
MEX Interfaces: from annotations to semantic metadata
14
Method Advantage Drawbacks
MEX
Interfaces
- Provenance
- Interoperabili
ty
- Data Query
Feature
- Automatic
Metadata
Generation
- Extra
Processing
Time
- Security
Issues (due
to
reflection)
IDEs
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
MEX Interfaces: from annotations to semantic metadata
1. Allow metadata
generation regardless of
the IDE, machine-
learning library and
context of the
experiment
15
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
- Generates metadata based on one of the state-of-the-
art vocabularies for machine learning
:exp_cf_1_2025644708_exe_2_algo a mexalgo:Algorithm ;
rdfs:label "Support Vector Machines" ;
mexalgo:hasAlgorithmClass mexalgo:SupportVectorMachines ;
mexalgo:hasHyperParameter :exp_cf_1_2025644708_exe_2_hyperpar_4, :exp_
cf_1_2025644708_exe_2_hyperpar_3, :exp_cf_1_2025644708_exe_2_hyperpar_2,
:exp_cf_1_2025644708_exe_2_hyperpar_1 ;
dct:identifier "svm".
...
MEX Vocabulary
16
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Vocabularies and Ontologies for ML/DM
17
MEX
Onto-DM
DMOP
Exposé
W3C ML-Schema
Community
Group
https://www.w3.org/community/ml-
schema/ KDD
MLS
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Vocabularies and Ontologies for ML/DM: MEX
18
Agnieszka Lawrynowicz et al. “The
Algorithm-Implementation-Execution
Ontology Design Pattern”, WOP2016.
Algorithm ⊑ InformationEntity
Implementation ⊑ InformationEntity
Implementation ⊑ ∃implements.Algorithm
Implementation ⊑
∃hasParameter.Parameter Execution ⊑
Process
Execution ⊑ ∃hasInput.ParameterSetting
Execution ⊑ ∃realizes.Algorithm
...
is a lightweight and flexible schema
for machine learning, based on PROV-
O
PROV-O: The PROV
Ontology
http://mex.aksw.org
https://www.w3.org/TR/prov-o/
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
- Reduces the overall coding effort
- No wrappers/adaptors/connectors do DBMS
- No schema development required
- Transparent process
MEX Interfaces: Features
19
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
- Transparent process (annotations and reflection)
Features: MEX Annotations
20
@ExperimentInfo(identifier = "e1", createdBy = "Esteves", email = "esteves@informatik.uni-
leipzig.de", title = "Weka Lib Example", tags = {"WEKA","J48", "DecisionTable", "MEX",
"Iris"})
@Hardware(cpu = MEXEnum.EnumProcessors.INTEL_COREI7, memory = MEXEnum.EnumRAM.SIZE_8GB,
hdType = "SSD")
@SamplingMethod(klass = MEXEnum.EnumSamplingMethods.CROSS_VALIDATION, trainSize = 0.5,
testSize = 0.5, folds = 10)
@InterfaceVersion(version = MEXEnum.EnumAnnotationInterfaceStyles.M1)
public class WekaExample001 { … }
java -cp /home/mexframework org.aksw.mex.framework.MetaGeneration -uc IrisWekaExample.java -out
mymex01.ttl
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
- Transparent process (annotations and reflection)
Features: MEX Annotations
21
@Algorithm(algorithmClass = MEXEnum.EnumAlgorithmsClasses.J48, algorithmID = "1",
algorithmName = "J48", algorithmURI = "http://weka.sourceforge.net/doc.dev/weka/
classifiers/trees/J48.html")
public J48 wekaJ48;
@Algorithm(algorithmClass = MEXEnum.EnumAlgorithmsClasses.PART, algorithmID = "2",
algorithmName = "PART", algorithmURI = "http://weka.sourceforge.net/doc.dev/weka/
classifiers/rules/PART.html")
public PART wekaPART;
@DatasetName public String ds = "iris.arff"; Instances data; ...
java -cp /home/mexframework org.aksw.mex.framework.MetaGeneration -uc IrisWekaExample.java -out
mymex01.ttl
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
java -cp /home/mexframework org.aksw.mex.framework.MetaGeneration -uc IrisWekaExample.java -out
mymex01.ttl
- Transparent process (annotations and reflection)
@Measure(idMeasure = MEXEnum.EnumMeasures.ERROR) public List<Double> errors;
@Measure(idMeasure = MEXEnum.EnumMeasures.ACCURACY) public List<Double> accuracies;
@Start
public void myMainMethod(){ throws Exception {...}
Features: MEX Annotations
22
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
MEX Annotations Log
23
Starting the process: MetaGeneration -uc interfaces.WekaExample001 -out mymex01.ttl
[main] INFO org.aksw.mex.interfaces.MetaGeneration - ********************** MEX Interfaces **********************
[main] INFO org.aksw.mex.interfaces.MetaGeneration - http://mex.aksw.org
[main] INFO org.aksw.mex.interfaces.MetaGeneration - Starting the meta annotation for class named: WekaExample001
[main] INFO org.aksw.mex.interfaces.MetaGeneration - @ExperimentInfo - OK
[main] INFO org.aksw.mex.interfaces.MetaGeneration - @Hardware - OK
[main] INFO org.aksw.mex.interfaces.MetaGeneration - @SamplingMethod - OK
[main] INFO org.aksw.mex.interfaces.MetaGeneration - invoking the main method: start
[main] INFO interfaces.WekaExample001 - Accuracy of J48: 94.00% - Error: 6.00%
[main] INFO interfaces.WekaExample001 - Accuracy of PART: 90.67% - Error: 9.33%
[main] INFO interfaces.WekaExample001 - Accuracy of DecisionTable: 92.67% - Error: 7.33%
[main] INFO interfaces.WekaExample001 - Accuracy of DecisionStump: 36.67% - Error: 63.33%
[main] INFO org.aksw.mex.interfaces.MetaGeneration - invoking the features method: getFeatures
[main] INFO org.aksw.mex.interfaces.MetaGeneration - @DataSet - OK
[main] INFO org.aksw.mex.interfaces.MetaGeneration - @Algorithm - OK
[main] INFO org.aksw.mex.interfaces.MetaGeneration - :: starting to add executions...
[main] INFO org.aksw.mex.interfaces.MetaGeneration - :: nr. executions = 4
[main] INFO org.aksw.mex.interfaces.MetaGeneration - :: idExecution = C1_MEX_EXEC_D1
[main] INFO org.aksw.mex.interfaces.MetaGeneration - error of Execution C1_MEX_EXEC_D1 : 6.0
[main] INFO org.aksw.mex.interfaces.MetaGeneration - accuracy of Execution C1_MEX_EXEC_D1 : 94.0
[main] INFO org.aksw.mex.interfaces.MetaGeneration - :: idExecution = C1_MEX_EXEC_D2
[main] INFO org.aksw.mex.interfaces.MetaGeneration - error of Execution C1_MEX_EXEC_D2 : 9.333333333333329
[main] INFO org.aksw.mex.interfaces.MetaGeneration - accuracy of Execution C1_MEX_EXEC_D2 : 90.66666666666667
[main] INFO org.aksw.mex.interfaces.MetaGeneration - :: idExecution = C1_MEX_EXEC_D3
[main] INFO org.aksw.mex.interfaces.MetaGeneration - error of Execution C1_MEX_EXEC_D3 : 7.333333333333329
[main] INFO org.aksw.mex.interfaces.MetaGeneration - accuracy of Execution C1_MEX_EXEC_D3 : 92.66666666666667
[main] INFO org.aksw.mex.interfaces.MetaGeneration - :: idExecution = C1_MEX_EXEC_D4
[main] INFO org.aksw.mex.interfaces.MetaGeneration - error of Execution C1_MEX_EXEC_D4 : 63.333333333333336
[main] INFO org.aksw.mex.interfaces.MetaGeneration - accuracy of Execution C1_MEX_EXEC_D4 : 36.666666666666664
[main] WARN org.aksw.mex.log4mex.MEXSerializer - No model defined
[main] WARN org.aksw.mex.log4mex.MEXSerializer - No tool defined
[main] WARN org.aksw.mex.log4mex.MEXSerializer - No tool parameter defined
[main] INFO org.aksw.mex.interfaces.MetaGeneration - The MEX file has been successfully created: share it ;-)
[main] INFO org.aksw.mex.interfaces.MetaGeneration - process execution time (s): 1
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Metadata file example
:hardware a
mexcore:HardwareConfiguration,
prov:Entity ;
mexcore:cpu "Intel Core
i7" ;
mexcore:hd "SSD" ;
mexcore:memory "8 GB" .
:execution_3 a
mexcore:OverallExecution,
mexcore:group "true" ;
prov:id "3" ;
prov:used this:phaseTEST ;
prov:wasInformedBy
this:configuration1 .
...
:sampling a prov:Entity ,
mexcore:CrossValidation ;
mexcore:folds "10" ;
mexcore:sequential "true";
mexcore:testSize "0.5" ;
mexcore:trainSize "0.5" .
:feature4 a mexcore:Feature,
prov:Entity ; rdfs:label
"petalwidth" ;
dct:identifier "4" .
:measure3_1 a
mexperf:StatisticalMeasure;
mexperf:error "7.333333333333329"
;
prov:wasInformedBy
this:execution_3 .
...
Features: Output file sample
24
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Features: Logging Library
- Transparent process (logging)
25
MyMEX mex = new MyMEX();
try{
mex.setAuthorName("D Esteves");
mex.setAuthorEmail("esteves@informatik.uni-leipzig.de");
mex.setOrganization("Leipzig University");
mex.setExperimentId("E001");
mex.setExperimentTitle("my first experiment");
… }
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
- Transparent process (logging)
try{ MEXSerializer.getInstance().saveToDisk("./metafiles/log4mex/
ex003", "http://mex.aksw.org/examples/", mex,
MEXConstant.EnumRDFFormats.TTL);
}catch (Exception e){
System.out.print(e.toString());}
Features: Logging Library
26
...
mex.Configuration().addExecution(EnumExecutionsType.OVERALL,
EnumPhases.TRAIN);
mex.Configuration().Execution(ex1).setAlgorithm(alg1);

mex.Configuration().Execution(ex1).addPerformance(EnumMeasures.ACCUR
ACY, x);
...
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
MEX Interfaces output x Weka default output
Features: Output file sample
27
=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances 147
98%
Incorrectly Classified Instances 3
2%
Kappa statistic 0.97
Mean absolute error
0.0233
Root mean squared error
0.108
Relative absolute error
5.2482%
Root relative squared error
22.9089%
Total Number of Instances 150
this:m11 a prov:Entity, mexperf:PerformanceMeasure;
dct:identifier "WekaPerformances";
mexperf:accuracy "0.9768"^^xsd:float;
mexperf:truePositive "147"^^xsd:integer;
mexperf:falsePositive "3"^^xsd:integer;
mexperf:kappaStatistics "0.97"^^xsd:float;
mexperf:meanAbsoluteError "0.0233"^^xsd:float;
mexperf:rootMeanSquaredError "0.108"^^xsd:float;
mexperf:relativeAbsoluteError
"0.052482"^^xsd:float;
mexperf:rootRelativeSquaredError
"0.0229089"^^xsd:float;
prov:wasGeneratedBy this:ep1;.
...
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Features: Data Visualization (http://lodview.it/)
28
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Features: Data Management “for free”
29
- ML Model that takes 3 days to be executed
- Iterates 300 times
- Produces/Has 5000 outcomes/incomes
- ….
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 30
- What are the top 4 configurations?
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Features: Data Management “for free”
31
PREFIX mexcore: <http://mex.aksw.org/mex-core#>

PREFIX mexperf: <http://mex.aksw.org/mex-perf#>

PREFIX mexalgo: <http://mex.aksw.org/mex-algo#>

PREFIX prov: <http://www.w3.org/ns/prov#>

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?ExecutionID ?Algorithm ?Performance ?fMeasure WHERE {

?execution prov:used ?alg; prov:id ?ExecutionID.

?Performance prov:wasGeneratedBy ?execution.

?Performance mexperf:f1Measure ?fMeasure.

?alg a mexalgo:Algorithm.

?alg rdfs:label ?Algorithm.

} 

ORDER BY DESC (?fMeasure)

LIMIT 4
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Features: Data Management “for free”
32
?ExecutionID ?Algorithm ?Performance ?fMeasure
"C0_MEX_EXEC_D44" "BaggingJ48" mea_clas_C0_MEX_EXEC_D44_cf_1_-568657719 0.9968
"C0_MEX_EXEC_D24" "Logistic
Model Trees"
mea_clas_C0_MEX_EXEC_D24_cf_1_-568657719 0.9952
"C0_MEX_EXEC_D16" "Random
Forest"
mea_clas_C0_MEX_EXEC_D16_cf_1_-568657719 0.9920
"C0_MEX_EXEC_D64" "Multilayer
Perceptron"
mea_clas_C0_MEX_EXEC_D64_cf_1_-568657719 0.99
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Features: Data Management “for free”
33
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Conclusions and Future Work
Machine Learning Metadata Generation
- Generating high quality metadata is not a straightforward process
- Dealing with different outputs is not time-efficient
RQ1/RQ2 -> New methodology
- To automatize the process metadata generation for IDEs
- Data Management
- Based on one of the state-of-the-art vocabularies
Future Work
- Integrate others ML ontologies (ML-Schema)
- Analyse the coverage of the methodology with more machine learning
scenarios
- To create a more robust framework (e.g.: automatic pipelines based on
configuration files)
34
Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Conclusions and Future Work
Thank you!
Questions?
mex.aksw.org
35

Más contenido relacionado

Destacado

Kostas Kastrantas | Business Opportunities with Linked Open Data
Kostas Kastrantas  | Business Opportunities with Linked Open DataKostas Kastrantas  | Business Opportunities with Linked Open Data
Kostas Kastrantas | Business Opportunities with Linked Open Data
semanticsconference
 

Destacado (20)

Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...
Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...
Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...
 
Robert Isele | eccenca CorporateMemory - Semantically integrated Enterprise D...
Robert Isele | eccenca CorporateMemory - Semantically integrated Enterprise D...Robert Isele | eccenca CorporateMemory - Semantically integrated Enterprise D...
Robert Isele | eccenca CorporateMemory - Semantically integrated Enterprise D...
 
Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...
Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...
Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...
 
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...Kerstin Diwisch | Towards a holistic visualization management for knowledge g...
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...
 
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
 
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...
 
David Kuilman | Creating a Semantic Enterprise Content model to support conti...
David Kuilman | Creating a Semantic Enterprise Content model to support conti...David Kuilman | Creating a Semantic Enterprise Content model to support conti...
David Kuilman | Creating a Semantic Enterprise Content model to support conti...
 
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
 
Kostas Kastrantas | Business Opportunities with Linked Open Data
Kostas Kastrantas  | Business Opportunities with Linked Open DataKostas Kastrantas  | Business Opportunities with Linked Open Data
Kostas Kastrantas | Business Opportunities with Linked Open Data
 
Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...
Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...
Ben Gardner | Delivering a Linked Data warehouse and integrating across the w...
 
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
 
OOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria PovedaOOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria Poveda
 
Umutcan Şimşek, Anna Fensel, Anastasios Zafeiropoulos, Eleni Fotopoulou, Pari...
Umutcan Şimşek, Anna Fensel, Anastasios Zafeiropoulos, Eleni Fotopoulou, Pari...Umutcan Şimşek, Anna Fensel, Anastasios Zafeiropoulos, Eleni Fotopoulou, Pari...
Umutcan Şimşek, Anna Fensel, Anastasios Zafeiropoulos, Eleni Fotopoulou, Pari...
 
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
 
Thomas Kaleske | KN(owl)edge – the Linked Data Platform at Kuehne + Nagel
Thomas Kaleske | KN(owl)edge – the Linked Data Platform at Kuehne + NagelThomas Kaleske | KN(owl)edge – the Linked Data Platform at Kuehne + Nagel
Thomas Kaleske | KN(owl)edge – the Linked Data Platform at Kuehne + Nagel
 
eHealth projects in Sierre – Khresmoi
eHealth projects in Sierre – KhresmoieHealth projects in Sierre – Khresmoi
eHealth projects in Sierre – Khresmoi
 
Gianluca Correndo, Simon Crowle, Juri Papay and Michael Boniface | Enhancing ...
Gianluca Correndo, Simon Crowle, Juri Papay and Michael Boniface | Enhancing ...Gianluca Correndo, Simon Crowle, Juri Papay and Michael Boniface | Enhancing ...
Gianluca Correndo, Simon Crowle, Juri Papay and Michael Boniface | Enhancing ...
 
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...
 
Taxonomy: a powerful magnifier with a harsh lens
Taxonomy: a powerful magnifier with a harsh lensTaxonomy: a powerful magnifier with a harsh lens
Taxonomy: a powerful magnifier with a harsh lens
 
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
 

Similar a Diego Esteves, Pablo Mendes, Diego Moussallem, Julio Cesar Duarte, Amrapali Zaveri, Jens Lehmann, Ciro Baron Neto, Igor Costa and Maria Claudia Cavalcanti | MEX Framework: Automating Machine Learning Metadata Generation

Computer sci & applicat set syllabus
Computer sci & applicat set syllabusComputer sci & applicat set syllabus
Computer sci & applicat set syllabus
behappymdgotarkar
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
Databricks
 
Tycs sem 5 asp.net notes unit 1 2 3 4 (2017)
Tycs sem 5 asp.net notes unit 1 2 3 4 (2017)Tycs sem 5 asp.net notes unit 1 2 3 4 (2017)
Tycs sem 5 asp.net notes unit 1 2 3 4 (2017)
WE-IT TUTORIALS
 

Similar a Diego Esteves, Pablo Mendes, Diego Moussallem, Julio Cesar Duarte, Amrapali Zaveri, Jens Lehmann, Ciro Baron Neto, Igor Costa and Maria Claudia Cavalcanti | MEX Framework: Automating Machine Learning Metadata Generation (20)

Exploring Emerging Technologies in the Extreme Scale HPC Co-Design Space with...
Exploring Emerging Technologies in the Extreme Scale HPC Co-Design Space with...Exploring Emerging Technologies in the Extreme Scale HPC Co-Design Space with...
Exploring Emerging Technologies in the Extreme Scale HPC Co-Design Space with...
 
Parallel Computing 2007: Overview
Parallel Computing 2007: OverviewParallel Computing 2007: Overview
Parallel Computing 2007: Overview
 
MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experi...
MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experi...MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experi...
MEX Vocabulary - A Lightweight Interchange Format for Machine Learning Experi...
 
2014 IEEE JAVA DATA MINING PROJECT Xs path navigation on xml schemas made easy
2014 IEEE JAVA DATA MINING PROJECT Xs path navigation on xml schemas made easy2014 IEEE JAVA DATA MINING PROJECT Xs path navigation on xml schemas made easy
2014 IEEE JAVA DATA MINING PROJECT Xs path navigation on xml schemas made easy
 
IEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easy
IEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easyIEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easy
IEEE 2014 JAVA DATA MINING PROJECTS Xs path navigation on xml schemas made easy
 
Redes de sensores sem fio autonômicas: abordagens, aplicações e desafios
 Redes de sensores sem fio autonômicas: abordagens, aplicações e desafios Redes de sensores sem fio autonômicas: abordagens, aplicações e desafios
Redes de sensores sem fio autonômicas: abordagens, aplicações e desafios
 
++Matlab 14 sesiones
++Matlab 14 sesiones++Matlab 14 sesiones
++Matlab 14 sesiones
 
HYBRID APPROACH TO DESIGN OF STORAGE ATTACHED NETWORK SIMULATION SYSTEMS
HYBRID APPROACH TO DESIGN OF STORAGE ATTACHED NETWORK SIMULATION SYSTEMSHYBRID APPROACH TO DESIGN OF STORAGE ATTACHED NETWORK SIMULATION SYSTEMS
HYBRID APPROACH TO DESIGN OF STORAGE ATTACHED NETWORK SIMULATION SYSTEMS
 
Computer sci & applicat set syllabus
Computer sci & applicat set syllabusComputer sci & applicat set syllabus
Computer sci & applicat set syllabus
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 
ALT
ALTALT
ALT
 
Regression with Microsoft Azure & Ms Excel
Regression with Microsoft Azure & Ms ExcelRegression with Microsoft Azure & Ms Excel
Regression with Microsoft Azure & Ms Excel
 
Vitus Masters Defense
Vitus Masters DefenseVitus Masters Defense
Vitus Masters Defense
 
Tycs sem 5 asp.net notes unit 1 2 3 4 (2017)
Tycs sem 5 asp.net notes unit 1 2 3 4 (2017)Tycs sem 5 asp.net notes unit 1 2 3 4 (2017)
Tycs sem 5 asp.net notes unit 1 2 3 4 (2017)
 
mapReduce for machine learning
mapReduce for machine learning mapReduce for machine learning
mapReduce for machine learning
 
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsPossible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
 
Dsp file
Dsp fileDsp file
Dsp file
 
Building ML Pipelines with DCOS
Building ML Pipelines with DCOSBuilding ML Pipelines with DCOS
Building ML Pipelines with DCOS
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance Workflow
 
Matlab.ppt
Matlab.pptMatlab.ppt
Matlab.ppt
 

Más de semanticsconference

Más de semanticsconference (20)

Linear books to open world adventure
Linear books to open world adventureLinear books to open world adventure
Linear books to open world adventure
 
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
Session 1.2   high-precision, context-free entity linking exploiting unambigu...Session 1.2   high-precision, context-free entity linking exploiting unambigu...
Session 1.2 high-precision, context-free entity linking exploiting unambigu...
 
Session 4.3 semantic annotation for enhancing collaborative ideation
Session 4.3   semantic annotation for enhancing collaborative ideationSession 4.3   semantic annotation for enhancing collaborative ideation
Session 4.3 semantic annotation for enhancing collaborative ideation
 
Session 1.1 dalicc - data licenses clearance center
Session 1.1   dalicc - data licenses clearance centerSession 1.1   dalicc - data licenses clearance center
Session 1.1 dalicc - data licenses clearance center
 
Session 1.3 context information management across smart city knowledge domains
Session 1.3   context information management across smart city knowledge domainsSession 1.3   context information management across smart city knowledge domains
Session 1.3 context information management across smart city knowledge domains
 
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
Session 0.0   aussenac semanticsnl-pwebsem2017-v4Session 0.0   aussenac semanticsnl-pwebsem2017-v4
Session 0.0 aussenac semanticsnl-pwebsem2017-v4
 
Session 0.0 keynote sandeep sacheti - final hi res
Session 0.0   keynote sandeep sacheti - final hi resSession 0.0   keynote sandeep sacheti - final hi res
Session 0.0 keynote sandeep sacheti - final hi res
 
Session 1.1 linked data applied: a field report from the netherlands
Session 1.1   linked data applied: a field report from the netherlandsSession 1.1   linked data applied: a field report from the netherlands
Session 1.1 linked data applied: a field report from the netherlands
 
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
Session 1.2   enrich your knowledge graphs: linked data integration with pool...Session 1.2   enrich your knowledge graphs: linked data integration with pool...
Session 1.2 enrich your knowledge graphs: linked data integration with pool...
 
Session 1.4 connecting information from legislation and datasets using a ca...
Session 1.4   connecting information from legislation and datasets using a ca...Session 1.4   connecting information from legislation and datasets using a ca...
Session 1.4 connecting information from legislation and datasets using a ca...
 
Session 1.4 a distributed network of heritage information
Session 1.4   a distributed network of heritage informationSession 1.4   a distributed network of heritage information
Session 1.4 a distributed network of heritage information
 
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
Session 0.0   media panel - matthias priem - gtuo - semantics 2017Session 0.0   media panel - matthias priem - gtuo - semantics 2017
Session 0.0 media panel - matthias priem - gtuo - semantics 2017
 
Session 1.3 semantic asset management in the dutch rail engineering and con...
Session 1.3   semantic asset management in the dutch rail engineering and con...Session 1.3   semantic asset management in the dutch rail engineering and con...
Session 1.3 semantic asset management in the dutch rail engineering and con...
 
Session 1.3 energy, smart homes &amp; smart grids: towards interoperability...
Session 1.3   energy, smart homes &amp; smart grids: towards interoperability...Session 1.3   energy, smart homes &amp; smart grids: towards interoperability...
Session 1.3 energy, smart homes &amp; smart grids: towards interoperability...
 
Session 1.2 improving access to digital content by semantic enrichment
Session 1.2   improving access to digital content by semantic enrichmentSession 1.2   improving access to digital content by semantic enrichment
Session 1.2 improving access to digital content by semantic enrichment
 
Session 2.3 semantics for safeguarding &amp; security – a police story
Session 2.3   semantics for safeguarding &amp; security – a police storySession 2.3   semantics for safeguarding &amp; security – a police story
Session 2.3 semantics for safeguarding &amp; security – a police story
 
Session 2.5 semantic similarity based clustering of license excerpts for im...
Session 2.5   semantic similarity based clustering of license excerpts for im...Session 2.5   semantic similarity based clustering of license excerpts for im...
Session 2.5 semantic similarity based clustering of license excerpts for im...
 
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
Session 4.2   unleash the triple: leveraging a corporate discovery interface....Session 4.2   unleash the triple: leveraging a corporate discovery interface....
Session 4.2 unleash the triple: leveraging a corporate discovery interface....
 
Session 1.6 slovak public metadata governance and management based on linke...
Session 1.6   slovak public metadata governance and management based on linke...Session 1.6   slovak public metadata governance and management based on linke...
Session 1.6 slovak public metadata governance and management based on linke...
 
Session 5.6 towards a semantic outlier detection framework in wireless sens...
Session 5.6   towards a semantic outlier detection framework in wireless sens...Session 5.6   towards a semantic outlier detection framework in wireless sens...
Session 5.6 towards a semantic outlier detection framework in wireless sens...
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 

Diego Esteves, Pablo Mendes, Diego Moussallem, Julio Cesar Duarte, Amrapali Zaveri, Jens Lehmann, Ciro Baron Neto, Igor Costa and Maria Claudia Cavalcanti | MEX Framework: Automating Machine Learning Metadata Generation

  • 1. MEX Interfaces: Automating Machine Learning Metadata Generation D. Esteves, Pablo N. Mendes, D. Moussallem, J.C. Duarte, Amrapali Zaveri, Jens Lehmann, Ciro Baron Neto, Igor Costa and Maria Claudia Cavalcanti University of Leipzig September 13, 2016 1
  • 2. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 What’s metadata and why is it so important? “Metadata is data that provides information about other data” - Data Management - Meta Analysis (ML) - Social Engines - ... 2
  • 3. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 === Run information === Scheme:weka.classifiers.trees.J48 -C 0.25 -M 2 Relation: iris Instances: 150 Attributes: 5 sepallength sepalwidth petallength petalwidth class Test mode:evaluate on training data ... TP Rate FP Rate Precision Recall F-Measure ROC Area Class 1 0 1 1 1 1 Iris- setosa 0.98 0.02 0.961 0.98 0.97 0.99 Iris- versicolor 0.96 0.01 0.98 0.96 0.97 0.99 Iris- virginica Weighted Avg. 0.98 0.01 0.98 0.98 0.98 0.993 ... How costly is the metadata generation process? 3
  • 4. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 How costly is the metadata generation process? 4
  • 5. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Reproducible Research - in theory 1.“Data/Metadata publicly available” 2.“The computer code and all the computational procedures should be available” 3.“Ideally the computer code will encompass all of the steps of computational analysis” Dr. Peng / Dr. Jeff Leek 5
  • 6. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Reproducible Research - in practice - Experiments are hard to reproduce, when not impossible. 6
  • 7. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 ML/DM Environments 7 LIBSVM OpenML IDEsFrameworks Workflow Systems Collaborative Env.
  • 8. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Machine Learning Frameworks 8 Platform Advantage Drawbacks MLF Front-end No (High) Interoperability No/low updates delay No much code flexibility (Low) Workflow Management LIBSVM Frameworks
  • 9. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Workflow Systems / Collaborative Environments 9 Platform Advantage Drawbacks WFS High Provenance No (High) Interoperability Interoperabili ty (*) Updates are tool dependent Workflow Management No much code flexibility OpenML Workflow Systems Collaborative Env.
  • 10. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Integrated Development Environments / Libraries 10 Platform Advantage Drawbacks IDE/MLL High code- flexibility Low Provenance No learning curve Low Interoperability Data Management costs more IDEs
  • 11. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Integrated Development Environments / Libraries 11 Platform Advantage Drawbacks IDE/MLL High code- flexibility Low Provenance No learning curve Low Interoperability Data Management costs more IDEs motivation
  • 12. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Research Question 1.How to export machine learning variables - incomes/outcomes (IDEs)? - Architecture - Schema 2.What is the existing approach that minimizes the coding effort (IDEs)? 12
  • 13. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Existing solutions 13 Solution Advantage Drawbacks stdout No Extra Coding Effort Required Lack of Provenance Lack of Interoperability Lack of Data Query Feature DBMS Data Query Feature Extra Coding Effort (Integration) Lack of Provenance Lack of Interoperability Self-schema Definition Straightforward Solution Extra Coding Effort Extra Analysis Effort (modeling) Lack of Provenance Lack of Interoperability IDEs
  • 14. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 MEX Interfaces: from annotations to semantic metadata 14 Method Advantage Drawbacks MEX Interfaces - Provenance - Interoperabili ty - Data Query Feature - Automatic Metadata Generation - Extra Processing Time - Security Issues (due to reflection) IDEs
  • 15. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 MEX Interfaces: from annotations to semantic metadata 1. Allow metadata generation regardless of the IDE, machine- learning library and context of the experiment 15
  • 16. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 - Generates metadata based on one of the state-of-the- art vocabularies for machine learning :exp_cf_1_2025644708_exe_2_algo a mexalgo:Algorithm ; rdfs:label "Support Vector Machines" ; mexalgo:hasAlgorithmClass mexalgo:SupportVectorMachines ; mexalgo:hasHyperParameter :exp_cf_1_2025644708_exe_2_hyperpar_4, :exp_ cf_1_2025644708_exe_2_hyperpar_3, :exp_cf_1_2025644708_exe_2_hyperpar_2, :exp_cf_1_2025644708_exe_2_hyperpar_1 ; dct:identifier "svm". ... MEX Vocabulary 16
  • 17. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Vocabularies and Ontologies for ML/DM 17 MEX Onto-DM DMOP Exposé W3C ML-Schema Community Group https://www.w3.org/community/ml- schema/ KDD MLS
  • 18. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Vocabularies and Ontologies for ML/DM: MEX 18 Agnieszka Lawrynowicz et al. “The Algorithm-Implementation-Execution Ontology Design Pattern”, WOP2016. Algorithm ⊑ InformationEntity Implementation ⊑ InformationEntity Implementation ⊑ ∃implements.Algorithm Implementation ⊑ ∃hasParameter.Parameter Execution ⊑ Process Execution ⊑ ∃hasInput.ParameterSetting Execution ⊑ ∃realizes.Algorithm ... is a lightweight and flexible schema for machine learning, based on PROV- O PROV-O: The PROV Ontology http://mex.aksw.org https://www.w3.org/TR/prov-o/
  • 19. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 - Reduces the overall coding effort - No wrappers/adaptors/connectors do DBMS - No schema development required - Transparent process MEX Interfaces: Features 19
  • 20. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 - Transparent process (annotations and reflection) Features: MEX Annotations 20 @ExperimentInfo(identifier = "e1", createdBy = "Esteves", email = "esteves@informatik.uni- leipzig.de", title = "Weka Lib Example", tags = {"WEKA","J48", "DecisionTable", "MEX", "Iris"}) @Hardware(cpu = MEXEnum.EnumProcessors.INTEL_COREI7, memory = MEXEnum.EnumRAM.SIZE_8GB, hdType = "SSD") @SamplingMethod(klass = MEXEnum.EnumSamplingMethods.CROSS_VALIDATION, trainSize = 0.5, testSize = 0.5, folds = 10) @InterfaceVersion(version = MEXEnum.EnumAnnotationInterfaceStyles.M1) public class WekaExample001 { … } java -cp /home/mexframework org.aksw.mex.framework.MetaGeneration -uc IrisWekaExample.java -out mymex01.ttl
  • 21. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 - Transparent process (annotations and reflection) Features: MEX Annotations 21 @Algorithm(algorithmClass = MEXEnum.EnumAlgorithmsClasses.J48, algorithmID = "1", algorithmName = "J48", algorithmURI = "http://weka.sourceforge.net/doc.dev/weka/ classifiers/trees/J48.html") public J48 wekaJ48; @Algorithm(algorithmClass = MEXEnum.EnumAlgorithmsClasses.PART, algorithmID = "2", algorithmName = "PART", algorithmURI = "http://weka.sourceforge.net/doc.dev/weka/ classifiers/rules/PART.html") public PART wekaPART; @DatasetName public String ds = "iris.arff"; Instances data; ... java -cp /home/mexframework org.aksw.mex.framework.MetaGeneration -uc IrisWekaExample.java -out mymex01.ttl
  • 22. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 java -cp /home/mexframework org.aksw.mex.framework.MetaGeneration -uc IrisWekaExample.java -out mymex01.ttl - Transparent process (annotations and reflection) @Measure(idMeasure = MEXEnum.EnumMeasures.ERROR) public List<Double> errors; @Measure(idMeasure = MEXEnum.EnumMeasures.ACCURACY) public List<Double> accuracies; @Start public void myMainMethod(){ throws Exception {...} Features: MEX Annotations 22
  • 23. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 MEX Annotations Log 23 Starting the process: MetaGeneration -uc interfaces.WekaExample001 -out mymex01.ttl [main] INFO org.aksw.mex.interfaces.MetaGeneration - ********************** MEX Interfaces ********************** [main] INFO org.aksw.mex.interfaces.MetaGeneration - http://mex.aksw.org [main] INFO org.aksw.mex.interfaces.MetaGeneration - Starting the meta annotation for class named: WekaExample001 [main] INFO org.aksw.mex.interfaces.MetaGeneration - @ExperimentInfo - OK [main] INFO org.aksw.mex.interfaces.MetaGeneration - @Hardware - OK [main] INFO org.aksw.mex.interfaces.MetaGeneration - @SamplingMethod - OK [main] INFO org.aksw.mex.interfaces.MetaGeneration - invoking the main method: start [main] INFO interfaces.WekaExample001 - Accuracy of J48: 94.00% - Error: 6.00% [main] INFO interfaces.WekaExample001 - Accuracy of PART: 90.67% - Error: 9.33% [main] INFO interfaces.WekaExample001 - Accuracy of DecisionTable: 92.67% - Error: 7.33% [main] INFO interfaces.WekaExample001 - Accuracy of DecisionStump: 36.67% - Error: 63.33% [main] INFO org.aksw.mex.interfaces.MetaGeneration - invoking the features method: getFeatures [main] INFO org.aksw.mex.interfaces.MetaGeneration - @DataSet - OK [main] INFO org.aksw.mex.interfaces.MetaGeneration - @Algorithm - OK [main] INFO org.aksw.mex.interfaces.MetaGeneration - :: starting to add executions... [main] INFO org.aksw.mex.interfaces.MetaGeneration - :: nr. executions = 4 [main] INFO org.aksw.mex.interfaces.MetaGeneration - :: idExecution = C1_MEX_EXEC_D1 [main] INFO org.aksw.mex.interfaces.MetaGeneration - error of Execution C1_MEX_EXEC_D1 : 6.0 [main] INFO org.aksw.mex.interfaces.MetaGeneration - accuracy of Execution C1_MEX_EXEC_D1 : 94.0 [main] INFO org.aksw.mex.interfaces.MetaGeneration - :: idExecution = C1_MEX_EXEC_D2 [main] INFO org.aksw.mex.interfaces.MetaGeneration - error of Execution C1_MEX_EXEC_D2 : 9.333333333333329 [main] INFO org.aksw.mex.interfaces.MetaGeneration - accuracy of Execution C1_MEX_EXEC_D2 : 90.66666666666667 [main] INFO org.aksw.mex.interfaces.MetaGeneration - :: idExecution = C1_MEX_EXEC_D3 [main] INFO org.aksw.mex.interfaces.MetaGeneration - error of Execution C1_MEX_EXEC_D3 : 7.333333333333329 [main] INFO org.aksw.mex.interfaces.MetaGeneration - accuracy of Execution C1_MEX_EXEC_D3 : 92.66666666666667 [main] INFO org.aksw.mex.interfaces.MetaGeneration - :: idExecution = C1_MEX_EXEC_D4 [main] INFO org.aksw.mex.interfaces.MetaGeneration - error of Execution C1_MEX_EXEC_D4 : 63.333333333333336 [main] INFO org.aksw.mex.interfaces.MetaGeneration - accuracy of Execution C1_MEX_EXEC_D4 : 36.666666666666664 [main] WARN org.aksw.mex.log4mex.MEXSerializer - No model defined [main] WARN org.aksw.mex.log4mex.MEXSerializer - No tool defined [main] WARN org.aksw.mex.log4mex.MEXSerializer - No tool parameter defined [main] INFO org.aksw.mex.interfaces.MetaGeneration - The MEX file has been successfully created: share it ;-) [main] INFO org.aksw.mex.interfaces.MetaGeneration - process execution time (s): 1
  • 24. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Metadata file example :hardware a mexcore:HardwareConfiguration, prov:Entity ; mexcore:cpu "Intel Core i7" ; mexcore:hd "SSD" ; mexcore:memory "8 GB" . :execution_3 a mexcore:OverallExecution, mexcore:group "true" ; prov:id "3" ; prov:used this:phaseTEST ; prov:wasInformedBy this:configuration1 . ... :sampling a prov:Entity , mexcore:CrossValidation ; mexcore:folds "10" ; mexcore:sequential "true"; mexcore:testSize "0.5" ; mexcore:trainSize "0.5" . :feature4 a mexcore:Feature, prov:Entity ; rdfs:label "petalwidth" ; dct:identifier "4" . :measure3_1 a mexperf:StatisticalMeasure; mexperf:error "7.333333333333329" ; prov:wasInformedBy this:execution_3 . ... Features: Output file sample 24
  • 25. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Features: Logging Library - Transparent process (logging) 25 MyMEX mex = new MyMEX(); try{ mex.setAuthorName("D Esteves"); mex.setAuthorEmail("esteves@informatik.uni-leipzig.de"); mex.setOrganization("Leipzig University"); mex.setExperimentId("E001"); mex.setExperimentTitle("my first experiment"); … }
  • 26. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 - Transparent process (logging) try{ MEXSerializer.getInstance().saveToDisk("./metafiles/log4mex/ ex003", "http://mex.aksw.org/examples/", mex, MEXConstant.EnumRDFFormats.TTL); }catch (Exception e){ System.out.print(e.toString());} Features: Logging Library 26 ... mex.Configuration().addExecution(EnumExecutionsType.OVERALL, EnumPhases.TRAIN); mex.Configuration().Execution(ex1).setAlgorithm(alg1);
 mex.Configuration().Execution(ex1).addPerformance(EnumMeasures.ACCUR ACY, x); ...
  • 27. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 MEX Interfaces output x Weka default output Features: Output file sample 27 === Evaluation on training set === === Summary === Correctly Classified Instances 147 98% Incorrectly Classified Instances 3 2% Kappa statistic 0.97 Mean absolute error 0.0233 Root mean squared error 0.108 Relative absolute error 5.2482% Root relative squared error 22.9089% Total Number of Instances 150 this:m11 a prov:Entity, mexperf:PerformanceMeasure; dct:identifier "WekaPerformances"; mexperf:accuracy "0.9768"^^xsd:float; mexperf:truePositive "147"^^xsd:integer; mexperf:falsePositive "3"^^xsd:integer; mexperf:kappaStatistics "0.97"^^xsd:float; mexperf:meanAbsoluteError "0.0233"^^xsd:float; mexperf:rootMeanSquaredError "0.108"^^xsd:float; mexperf:relativeAbsoluteError "0.052482"^^xsd:float; mexperf:rootRelativeSquaredError "0.0229089"^^xsd:float; prov:wasGeneratedBy this:ep1;. ...
  • 28. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Features: Data Visualization (http://lodview.it/) 28
  • 29. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Features: Data Management “for free” 29 - ML Model that takes 3 days to be executed - Iterates 300 times - Produces/Has 5000 outcomes/incomes - ….
  • 30. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 30 - What are the top 4 configurations?
  • 31. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Features: Data Management “for free” 31 PREFIX mexcore: <http://mex.aksw.org/mex-core#>
 PREFIX mexperf: <http://mex.aksw.org/mex-perf#>
 PREFIX mexalgo: <http://mex.aksw.org/mex-algo#>
 PREFIX prov: <http://www.w3.org/ns/prov#>
 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 SELECT DISTINCT ?ExecutionID ?Algorithm ?Performance ?fMeasure WHERE {
 ?execution prov:used ?alg; prov:id ?ExecutionID.
 ?Performance prov:wasGeneratedBy ?execution.
 ?Performance mexperf:f1Measure ?fMeasure.
 ?alg a mexalgo:Algorithm.
 ?alg rdfs:label ?Algorithm.
 } 
 ORDER BY DESC (?fMeasure)
 LIMIT 4
  • 32. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Features: Data Management “for free” 32 ?ExecutionID ?Algorithm ?Performance ?fMeasure "C0_MEX_EXEC_D44" "BaggingJ48" mea_clas_C0_MEX_EXEC_D44_cf_1_-568657719 0.9968 "C0_MEX_EXEC_D24" "Logistic Model Trees" mea_clas_C0_MEX_EXEC_D24_cf_1_-568657719 0.9952 "C0_MEX_EXEC_D16" "Random Forest" mea_clas_C0_MEX_EXEC_D16_cf_1_-568657719 0.9920 "C0_MEX_EXEC_D64" "Multilayer Perceptron" mea_clas_C0_MEX_EXEC_D64_cf_1_-568657719 0.99
  • 33. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Features: Data Management “for free” 33
  • 34. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Conclusions and Future Work Machine Learning Metadata Generation - Generating high quality metadata is not a straightforward process - Dealing with different outputs is not time-efficient RQ1/RQ2 -> New methodology - To automatize the process metadata generation for IDEs - Data Management - Based on one of the state-of-the-art vocabularies Future Work - Integrate others ML ontologies (ML-Schema) - Analyse the coverage of the methodology with more machine learning scenarios - To create a more robust framework (e.g.: automatic pipelines based on configuration files) 34
  • 35. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 Conclusions and Future Work Thank you! Questions? mex.aksw.org 35