Diego Esteves, Pablo Mendes, Diego Moussallem, Julio Cesar Duarte, Amrapali Zaveri, Jens Lehmann, Ciro Baron Neto, Igor Costa and Maria Claudia Cavalcanti | MEX Framework: Automating Machine Learning Metadata Generation
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Similar a Diego Esteves, Pablo Mendes, Diego Moussallem, Julio Cesar Duarte, Amrapali Zaveri, Jens Lehmann, Ciro Baron Neto, Igor Costa and Maria Claudia Cavalcanti | MEX Framework: Automating Machine Learning Metadata Generation
Similar a Diego Esteves, Pablo Mendes, Diego Moussallem, Julio Cesar Duarte, Amrapali Zaveri, Jens Lehmann, Ciro Baron Neto, Igor Costa and Maria Claudia Cavalcanti | MEX Framework: Automating Machine Learning Metadata Generation (20)
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Diego Esteves, Pablo Mendes, Diego Moussallem, Julio Cesar Duarte, Amrapali Zaveri, Jens Lehmann, Ciro Baron Neto, Igor Costa and Maria Claudia Cavalcanti | MEX Framework: Automating Machine Learning Metadata Generation
1. MEX Interfaces: Automating Machine
Learning Metadata Generation
D. Esteves, Pablo N. Mendes, D. Moussallem, J.C. Duarte, Amrapali Zaveri,
Jens Lehmann, Ciro Baron Neto, Igor Costa and Maria Claudia Cavalcanti
University of Leipzig
September 13, 2016
1
2. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
What’s metadata and why is it so important?
“Metadata is data that provides
information about other data”
- Data Management
- Meta Analysis (ML)
- Social Engines
- ...
2
3. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
=== Run information ===
Scheme:weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: iris
Instances: 150
Attributes: 5
sepallength
sepalwidth
petallength
petalwidth
class
Test mode:evaluate on training data
...
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
1 0 1 1 1 1 Iris-
setosa
0.98 0.02 0.961 0.98 0.97 0.99 Iris-
versicolor
0.96 0.01 0.98 0.96 0.97 0.99 Iris-
virginica
Weighted Avg. 0.98 0.01 0.98 0.98 0.98 0.993
...
How costly is the metadata generation process?
3
4. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
How costly is the metadata generation process?
4
5. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Reproducible Research - in theory
1.“Data/Metadata publicly available”
2.“The computer code and all the computational
procedures should be available”
3.“Ideally the computer code will encompass all
of the steps of computational analysis”
Dr. Peng / Dr. Jeff Leek
5
6. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Reproducible Research - in practice
- Experiments are hard to reproduce, when not
impossible.
6
7. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
ML/DM Environments
7
LIBSVM
OpenML
IDEsFrameworks
Workflow Systems
Collaborative Env.
8. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Machine Learning Frameworks
8
Platform Advantage Drawbacks
MLF Front-end No (High)
Interoperability
No/low updates
delay
No much code
flexibility
(Low) Workflow
Management
LIBSVM
Frameworks
9. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Workflow Systems / Collaborative Environments
9
Platform Advantage Drawbacks
WFS High
Provenance
No (High)
Interoperability
Interoperabili
ty (*)
Updates are tool
dependent
Workflow
Management
No much code
flexibility
OpenML
Workflow Systems
Collaborative
Env.
10. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Integrated Development Environments / Libraries
10
Platform Advantage Drawbacks
IDE/MLL High code-
flexibility
Low Provenance
No learning
curve
Low
Interoperability
Data Management
costs more
IDEs
11. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Integrated Development Environments / Libraries
11
Platform Advantage Drawbacks
IDE/MLL High code-
flexibility
Low Provenance
No learning
curve
Low
Interoperability
Data Management
costs more
IDEs
motivation
12. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Research Question
1.How to export machine learning variables -
incomes/outcomes (IDEs)?
- Architecture
- Schema
2.What is the existing approach that
minimizes the coding effort (IDEs)?
12
13. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Existing solutions
13
Solution Advantage Drawbacks
stdout No Extra Coding
Effort Required
Lack of Provenance
Lack of Interoperability
Lack of Data Query Feature
DBMS Data Query
Feature
Extra Coding Effort
(Integration)
Lack of Provenance
Lack of Interoperability
Self-schema
Definition
Straightforward
Solution
Extra Coding Effort
Extra Analysis Effort
(modeling)
Lack of Provenance
Lack of Interoperability
IDEs
14. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
MEX Interfaces: from annotations to semantic metadata
14
Method Advantage Drawbacks
MEX
Interfaces
- Provenance
- Interoperabili
ty
- Data Query
Feature
- Automatic
Metadata
Generation
- Extra
Processing
Time
- Security
Issues (due
to
reflection)
IDEs
15. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
MEX Interfaces: from annotations to semantic metadata
1. Allow metadata
generation regardless of
the IDE, machine-
learning library and
context of the
experiment
15
16. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
- Generates metadata based on one of the state-of-the-
art vocabularies for machine learning
:exp_cf_1_2025644708_exe_2_algo a mexalgo:Algorithm ;
rdfs:label "Support Vector Machines" ;
mexalgo:hasAlgorithmClass mexalgo:SupportVectorMachines ;
mexalgo:hasHyperParameter :exp_cf_1_2025644708_exe_2_hyperpar_4, :exp_
cf_1_2025644708_exe_2_hyperpar_3, :exp_cf_1_2025644708_exe_2_hyperpar_2,
:exp_cf_1_2025644708_exe_2_hyperpar_1 ;
dct:identifier "svm".
...
MEX Vocabulary
16
17. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Vocabularies and Ontologies for ML/DM
17
MEX
Onto-DM
DMOP
Exposé
W3C ML-Schema
Community
Group
https://www.w3.org/community/ml-
schema/ KDD
MLS
18. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Vocabularies and Ontologies for ML/DM: MEX
18
Agnieszka Lawrynowicz et al. “The
Algorithm-Implementation-Execution
Ontology Design Pattern”, WOP2016.
Algorithm ⊑ InformationEntity
Implementation ⊑ InformationEntity
Implementation ⊑ ∃implements.Algorithm
Implementation ⊑
∃hasParameter.Parameter Execution ⊑
Process
Execution ⊑ ∃hasInput.ParameterSetting
Execution ⊑ ∃realizes.Algorithm
...
is a lightweight and flexible schema
for machine learning, based on PROV-
O
PROV-O: The PROV
Ontology
http://mex.aksw.org
https://www.w3.org/TR/prov-o/
19. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
- Reduces the overall coding effort
- No wrappers/adaptors/connectors do DBMS
- No schema development required
- Transparent process
MEX Interfaces: Features
19
20. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
- Transparent process (annotations and reflection)
Features: MEX Annotations
20
@ExperimentInfo(identifier = "e1", createdBy = "Esteves", email = "esteves@informatik.uni-
leipzig.de", title = "Weka Lib Example", tags = {"WEKA","J48", "DecisionTable", "MEX",
"Iris"})
@Hardware(cpu = MEXEnum.EnumProcessors.INTEL_COREI7, memory = MEXEnum.EnumRAM.SIZE_8GB,
hdType = "SSD")
@SamplingMethod(klass = MEXEnum.EnumSamplingMethods.CROSS_VALIDATION, trainSize = 0.5,
testSize = 0.5, folds = 10)
@InterfaceVersion(version = MEXEnum.EnumAnnotationInterfaceStyles.M1)
public class WekaExample001 { … }
java -cp /home/mexframework org.aksw.mex.framework.MetaGeneration -uc IrisWekaExample.java -out
mymex01.ttl
21. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
- Transparent process (annotations and reflection)
Features: MEX Annotations
21
@Algorithm(algorithmClass = MEXEnum.EnumAlgorithmsClasses.J48, algorithmID = "1",
algorithmName = "J48", algorithmURI = "http://weka.sourceforge.net/doc.dev/weka/
classifiers/trees/J48.html")
public J48 wekaJ48;
@Algorithm(algorithmClass = MEXEnum.EnumAlgorithmsClasses.PART, algorithmID = "2",
algorithmName = "PART", algorithmURI = "http://weka.sourceforge.net/doc.dev/weka/
classifiers/rules/PART.html")
public PART wekaPART;
@DatasetName public String ds = "iris.arff"; Instances data; ...
java -cp /home/mexframework org.aksw.mex.framework.MetaGeneration -uc IrisWekaExample.java -out
mymex01.ttl
22. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
java -cp /home/mexframework org.aksw.mex.framework.MetaGeneration -uc IrisWekaExample.java -out
mymex01.ttl
- Transparent process (annotations and reflection)
@Measure(idMeasure = MEXEnum.EnumMeasures.ERROR) public List<Double> errors;
@Measure(idMeasure = MEXEnum.EnumMeasures.ACCURACY) public List<Double> accuracies;
@Start
public void myMainMethod(){ throws Exception {...}
Features: MEX Annotations
22
23. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
MEX Annotations Log
23
Starting the process: MetaGeneration -uc interfaces.WekaExample001 -out mymex01.ttl
[main] INFO org.aksw.mex.interfaces.MetaGeneration - ********************** MEX Interfaces **********************
[main] INFO org.aksw.mex.interfaces.MetaGeneration - http://mex.aksw.org
[main] INFO org.aksw.mex.interfaces.MetaGeneration - Starting the meta annotation for class named: WekaExample001
[main] INFO org.aksw.mex.interfaces.MetaGeneration - @ExperimentInfo - OK
[main] INFO org.aksw.mex.interfaces.MetaGeneration - @Hardware - OK
[main] INFO org.aksw.mex.interfaces.MetaGeneration - @SamplingMethod - OK
[main] INFO org.aksw.mex.interfaces.MetaGeneration - invoking the main method: start
[main] INFO interfaces.WekaExample001 - Accuracy of J48: 94.00% - Error: 6.00%
[main] INFO interfaces.WekaExample001 - Accuracy of PART: 90.67% - Error: 9.33%
[main] INFO interfaces.WekaExample001 - Accuracy of DecisionTable: 92.67% - Error: 7.33%
[main] INFO interfaces.WekaExample001 - Accuracy of DecisionStump: 36.67% - Error: 63.33%
[main] INFO org.aksw.mex.interfaces.MetaGeneration - invoking the features method: getFeatures
[main] INFO org.aksw.mex.interfaces.MetaGeneration - @DataSet - OK
[main] INFO org.aksw.mex.interfaces.MetaGeneration - @Algorithm - OK
[main] INFO org.aksw.mex.interfaces.MetaGeneration - :: starting to add executions...
[main] INFO org.aksw.mex.interfaces.MetaGeneration - :: nr. executions = 4
[main] INFO org.aksw.mex.interfaces.MetaGeneration - :: idExecution = C1_MEX_EXEC_D1
[main] INFO org.aksw.mex.interfaces.MetaGeneration - error of Execution C1_MEX_EXEC_D1 : 6.0
[main] INFO org.aksw.mex.interfaces.MetaGeneration - accuracy of Execution C1_MEX_EXEC_D1 : 94.0
[main] INFO org.aksw.mex.interfaces.MetaGeneration - :: idExecution = C1_MEX_EXEC_D2
[main] INFO org.aksw.mex.interfaces.MetaGeneration - error of Execution C1_MEX_EXEC_D2 : 9.333333333333329
[main] INFO org.aksw.mex.interfaces.MetaGeneration - accuracy of Execution C1_MEX_EXEC_D2 : 90.66666666666667
[main] INFO org.aksw.mex.interfaces.MetaGeneration - :: idExecution = C1_MEX_EXEC_D3
[main] INFO org.aksw.mex.interfaces.MetaGeneration - error of Execution C1_MEX_EXEC_D3 : 7.333333333333329
[main] INFO org.aksw.mex.interfaces.MetaGeneration - accuracy of Execution C1_MEX_EXEC_D3 : 92.66666666666667
[main] INFO org.aksw.mex.interfaces.MetaGeneration - :: idExecution = C1_MEX_EXEC_D4
[main] INFO org.aksw.mex.interfaces.MetaGeneration - error of Execution C1_MEX_EXEC_D4 : 63.333333333333336
[main] INFO org.aksw.mex.interfaces.MetaGeneration - accuracy of Execution C1_MEX_EXEC_D4 : 36.666666666666664
[main] WARN org.aksw.mex.log4mex.MEXSerializer - No model defined
[main] WARN org.aksw.mex.log4mex.MEXSerializer - No tool defined
[main] WARN org.aksw.mex.log4mex.MEXSerializer - No tool parameter defined
[main] INFO org.aksw.mex.interfaces.MetaGeneration - The MEX file has been successfully created: share it ;-)
[main] INFO org.aksw.mex.interfaces.MetaGeneration - process execution time (s): 1
24. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Metadata file example
:hardware a
mexcore:HardwareConfiguration,
prov:Entity ;
mexcore:cpu "Intel Core
i7" ;
mexcore:hd "SSD" ;
mexcore:memory "8 GB" .
:execution_3 a
mexcore:OverallExecution,
mexcore:group "true" ;
prov:id "3" ;
prov:used this:phaseTEST ;
prov:wasInformedBy
this:configuration1 .
...
:sampling a prov:Entity ,
mexcore:CrossValidation ;
mexcore:folds "10" ;
mexcore:sequential "true";
mexcore:testSize "0.5" ;
mexcore:trainSize "0.5" .
:feature4 a mexcore:Feature,
prov:Entity ; rdfs:label
"petalwidth" ;
dct:identifier "4" .
:measure3_1 a
mexperf:StatisticalMeasure;
mexperf:error "7.333333333333329"
;
prov:wasInformedBy
this:execution_3 .
...
Features: Output file sample
24
25. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Features: Logging Library
- Transparent process (logging)
25
MyMEX mex = new MyMEX();
try{
mex.setAuthorName("D Esteves");
mex.setAuthorEmail("esteves@informatik.uni-leipzig.de");
mex.setOrganization("Leipzig University");
mex.setExperimentId("E001");
mex.setExperimentTitle("my first experiment");
… }
26. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
- Transparent process (logging)
try{ MEXSerializer.getInstance().saveToDisk("./metafiles/log4mex/
ex003", "http://mex.aksw.org/examples/", mex,
MEXConstant.EnumRDFFormats.TTL);
}catch (Exception e){
System.out.print(e.toString());}
Features: Logging Library
26
...
mex.Configuration().addExecution(EnumExecutionsType.OVERALL,
EnumPhases.TRAIN);
mex.Configuration().Execution(ex1).setAlgorithm(alg1);
mex.Configuration().Execution(ex1).addPerformance(EnumMeasures.ACCUR
ACY, x);
...
27. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
MEX Interfaces output x Weka default output
Features: Output file sample
27
=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances 147
98%
Incorrectly Classified Instances 3
2%
Kappa statistic 0.97
Mean absolute error
0.0233
Root mean squared error
0.108
Relative absolute error
5.2482%
Root relative squared error
22.9089%
Total Number of Instances 150
this:m11 a prov:Entity, mexperf:PerformanceMeasure;
dct:identifier "WekaPerformances";
mexperf:accuracy "0.9768"^^xsd:float;
mexperf:truePositive "147"^^xsd:integer;
mexperf:falsePositive "3"^^xsd:integer;
mexperf:kappaStatistics "0.97"^^xsd:float;
mexperf:meanAbsoluteError "0.0233"^^xsd:float;
mexperf:rootMeanSquaredError "0.108"^^xsd:float;
mexperf:relativeAbsoluteError
"0.052482"^^xsd:float;
mexperf:rootRelativeSquaredError
"0.0229089"^^xsd:float;
prov:wasGeneratedBy this:ep1;.
...
28. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Features: Data Visualization (http://lodview.it/)
28
29. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Features: Data Management “for free”
29
- ML Model that takes 3 days to be executed
- Iterates 300 times
- Produces/Has 5000 outcomes/incomes
- ….
30. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016 30
- What are the top 4 configurations?
31. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Features: Data Management “for free”
31
PREFIX mexcore: <http://mex.aksw.org/mex-core#>
PREFIX mexperf: <http://mex.aksw.org/mex-perf#>
PREFIX mexalgo: <http://mex.aksw.org/mex-algo#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?ExecutionID ?Algorithm ?Performance ?fMeasure WHERE {
?execution prov:used ?alg; prov:id ?ExecutionID.
?Performance prov:wasGeneratedBy ?execution.
?Performance mexperf:f1Measure ?fMeasure.
?alg a mexalgo:Algorithm.
?alg rdfs:label ?Algorithm.
}
ORDER BY DESC (?fMeasure)
LIMIT 4
32. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Features: Data Management “for free”
32
?ExecutionID ?Algorithm ?Performance ?fMeasure
"C0_MEX_EXEC_D44" "BaggingJ48" mea_clas_C0_MEX_EXEC_D44_cf_1_-568657719 0.9968
"C0_MEX_EXEC_D24" "Logistic
Model Trees"
mea_clas_C0_MEX_EXEC_D24_cf_1_-568657719 0.9952
"C0_MEX_EXEC_D16" "Random
Forest"
mea_clas_C0_MEX_EXEC_D16_cf_1_-568657719 0.9920
"C0_MEX_EXEC_D64" "Multilayer
Perceptron"
mea_clas_C0_MEX_EXEC_D64_cf_1_-568657719 0.99
33. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Features: Data Management “for free”
33
34. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Conclusions and Future Work
Machine Learning Metadata Generation
- Generating high quality metadata is not a straightforward process
- Dealing with different outputs is not time-efficient
RQ1/RQ2 -> New methodology
- To automatize the process metadata generation for IDEs
- Data Management
- Based on one of the state-of-the-art vocabularies
Future Work
- Integrate others ML ontologies (ML-Schema)
- Analyse the coverage of the methodology with more machine learning
scenarios
- To create a more robust framework (e.g.: automatic pipelines based on
configuration files)
34
35. Esteves et al. (University of Leipzig) MEX Interfaces: Automating ML Metadata Generation September 13, 2016
Conclusions and Future Work
Thank you!
Questions?
mex.aksw.org
35