NETWORKED MACHINE LEARNING
JOAQUIN VANSCHOREN (TU/E), 2014
#OpenML
Research different.
1610
GALILEO GALILEI DISCOVERS SATURN’S RINGS
‘SMAISMRMILMEPOETALEUMIBUNENUGTTAUIRAS’
Research different.
Royal Society: Take nobody’s word for it
Scientific Journal: Reputation-based culture
300 YEARS LATER
JOURNALS SHOW LIMITS
• Complex code not included
• Large data sets not included
• Experiment details scant
• Results hard to reproduce
• Papers not updatable
• Slow, incomplete tracking of paper impact
• Publication bias
• No online public discussion
• Open access?
JOURNALS: LONG-TERM MEMORY
INTERNET: SHORT-TERM WORKING MEMORY
NETWORKED SCIENCE
ONLINE DATABASES
OPEN SOURCE CODE
WEB SERVICES, APIs
COLLABORATIVE TOOLS
OPEN, SCALABLE COLLABORATION
REAL-TIME DISCUSSION
COMBINE, REUSE SCIENTIFIC RESULTS
CITIZEN SCIENCE
Research different.
Polymaths: Solve math problems through massive collaboration (not competition)
Broadcast question, combine many minds to solve it
Solved hard problems in weeks
Many (joint) publications
Research different.
SDSS: Robotic telescope, data publicly online (SkyServer)
Over 1 million distinct users vs. 10,000 astronomers
Broadcast data, allow many minds to ask the right questions
Thousands of papers
Research different.
Galaxy Zoo: citizen scientists classify a million galaxies
Offer right tools so that anybody can be a scientist
Many novel discoveries by scientists and citizens
Research different.
Sharing data sparks discovery
Designed serendipity:
- What’s hard for one scientist is easy for another
- Surprising ideas, observations can spark new discoveries
Share, organise data for easy, large-scale collaboration
Data exploding in all sciences: collaborative data analysis needed
Building reputation
Authorship: easy to contribute + contributions stored, visible online
Collaboration: build trust, work with new people
Citation: more people see, build upon, and cite your work. Tell people how to cite data and code.
Altmetrics: track reuse/interest online (arXiv)
NETWORKED MACHINE LEARNING
Machine learning
Complex code, large-scale data, experiments (impossible to print)
Experiments not shared online: impossible to build on prior work, which inhibits deeper analysis (e.g. meta-learning)
Low reproducibility, generalisability (studies contradict)
What if we could all connect with each other, and with other scientists, to explore and apply machine learning?
Few collaborative tools to speed up research
OpenML
Place to share data, code, experiments in full detail
All results organised, linked together for further (meta)analysis, reuse, discussion, study, education
Links to (open-source) code, open data anywhere online.
Anyone can post data to analyse, anyone can share code and results (models, predictions, evaluations)
Integrated in ML platforms (R, WEKA, RapidMiner, …) to automatically load data, upload results
Scientists can work in teams, but results only publicly visible if data, code shared
OpenML: benefits for scientists
More time: automates routinizable work:
- find data and/or code
- set up and run large-scale experiments
- compare results to the state of the art
- log experiment details for future reference
More control:
- state how others should cite your work
- track reuse
- share results more easily
More knowledge:
- more time for actual research
- build directly on prior work
- easier, large-scale collaboration + interaction
Plugins: WEKA
Plugins: MOA
Plugins: RapidMiner
1. OPERATOR TO DOWNLOAD TASK (TASK TYPE SPECIFIC)
2. SUBWORKFLOW THAT SOLVES THE TASK, GENERATES RESULTS
3. OPERATOR FOR UPLOADING RESULTS
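The three plugin steps above (download a task, solve it in a subworkflow, upload the results) can be sketched in plain Python. This is only an illustration under stated assumptions: the `Server`, `Task`, and `majority_class_flow` names are hypothetical, the "server" is an in-memory object, and nothing here is the OpenML or RapidMiner API.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory stand-in for a task server; NOT the OpenML API.
@dataclass
class Task:
    task_id: int
    task_type: str   # e.g. "classification"
    rows: list       # feature tuples
    labels: list     # target values, aligned with rows
    test_idx: list   # row indices held out for evaluation


@dataclass
class Server:
    tasks: dict = field(default_factory=dict)
    results: list = field(default_factory=list)

    def download_task(self, task_id):
        # Step 1: fetch the task (data plus its evaluation split).
        return self.tasks[task_id]

    def upload_results(self, task_id, flow_name, predictions):
        # Step 3: store predictions so others can compare and reuse them.
        self.results.append(
            {"task": task_id, "flow": flow_name, "predictions": predictions}
        )


def majority_class_flow(task):
    # Step 2: a deliberately trivial "subworkflow" that predicts the most
    # frequent training label for every held-out row.
    train_labels = [y for i, y in enumerate(task.labels) if i not in task.test_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return {i: majority for i in task.test_idx}


server = Server()
server.tasks[1] = Task(
    task_id=1,
    task_type="classification",
    rows=[(0,), (1,), (2,), (3,)],
    labels=["a", "a", "b", "a"],
    test_idx=[2, 3],
)

task = server.download_task(1)                                      # step 1
predictions = majority_class_flow(task)                             # step 2
server.upload_results(task.task_id, "majority_class", predictions)  # step 3
```

Keeping the split inside the task means every flow is evaluated on the same held-out rows, which is what makes uploaded results comparable across contributors.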
OpenML: under development
OpenML studies
- collection of datasets, flows, runs, results in a study
- online counterpart of paper (with url)
- construct by simply tagging resources
- easily include (build on) data of others
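One way to picture a study built by tagging resources is a lookup over tagged items. The sketch below is a guess at the idea, not the OpenML implementation: the `tag` and `study` helpers, the resource kinds, the ids, and the tag names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical registry mapping (kind, id) -> set of tags; illustration only.
tags = defaultdict(set)


def tag(kind, resource_id, label):
    """Attach a tag to a dataset, flow, or run."""
    tags[(kind, resource_id)].add(label)


def study(label):
    """Collect every resource carrying the tag, grouped by kind."""
    members = defaultdict(list)
    for (kind, resource_id), labels in sorted(tags.items()):
        if label in labels:
            members[kind].append(resource_id)
    return dict(members)


# Tagging resources (including someone else's dataset) builds the study.
tag("dataset", 61, "study_demo")
tag("flow", 7, "study_demo")
tag("run", 1001, "study_demo")
tag("run", 1002, "other")

demo = study("study_demo")
```

Because membership is just a tag, adding another person's dataset or run to a study does not require copying it.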
Reputation building
- Profile page: statistics of activity and impact on OpenML
- Collaborative leaderboards: best contributors to solving a task
Teams
- Add scientists in teams (circles)
- Share resources, results within team only
- Make public at any time (e.g. after publication)
Meta-learning support
- Data/Flow qualities: easy adding, better overviews
- Algorithm selection techniques running on website (vs humans?)
JOIN THE CLUB

OpenML Tutorial: Networked Science in Machine Learning
