SlideShare una empresa de Scribd logo
1 de 31
Trends in Use of Scientific Workflows:




                                                            DataONE
Insights from a Public Repository and
Recommendations for Best Practices
Richard Littauer, Karthik Ram, Bertram Ludäscher, William
Michener, Rebecca Koskela

                                                             1
Scientific Workflows
Tools that help scientists:

  • Automate repetitive or
    difficult work




                                       DataONE
  • Provide reproducibility to their
    experiments

  • Track provenance

  • Share their data with other         2
    scientists
Workflow Workbenches




                       DataONE
                        3
Workflow Workbenches
These facilitate:
  • Creation

  • Mapping




                       DataONE
  • Scheduling

  • Execution

  • Visualization

  • Re-Use              4
Example Workflow




                                                 DataONE
                                                  5


http://www.myexperiment.org/workflows/140.html
Our Study


                                                • How are workflows being used?




                                                                                  DataONE
                                                                                   6


http://www.flickr.com/photos/eleaf/2536358399
Our Study


                                                • How are workflows being used?




                                                                                  DataONE
                                                • How are they being shared?




                                                                                   7


http://www.flickr.com/photos/eleaf/2536358399
Our Study


                                                • How are workflows being used?




                                                                                    DataONE
                                                • How are they being shared?
                                                • What sort of best practices can
                                                  researchers follow to maximize
                                                  the longevity and use of their
                                                  work?

                                                                                     8


http://www.flickr.com/photos/eleaf/2536358399
Our Study

• www.myexperiment.org
  • Est. 2007




                                                            DataONE
  • 5000+ users
  • 2000+ workflows (mostly Taverna 1, 2, and RapidMiner)




                                                             9
Our Study

• www.myexperiment.org
  • Est. 2007




                                                                      DataONE
  • 5000+ users
  • 2000+ workflows (mostly Taverna 1, 2, and RapidMiner)

  • Minable RDF storage for workflows, groups, packs, users, files.
  • Minable data gathered through the SCUFLE XML language for the
    Taverna workflows
  • Taverna 1 - 479 workflows; Taverna 2 - 684 workflows.
                                                                      10
Our Study

• We harvested information using a combination of SPARQL and
  Python (https://github.com/RichardLitt/Understanding-Workflows)




                                                                    DataONE
                                                                    11
Our Study

• We harvested information using a combination of SPARQL and
  Python (https://github.com/RichardLitt/Understanding-Workflows)




                                                                    DataONE
• Gathered user, workflow, files, packs, groups view and
  download statistics, metadata, descriptions, tags, and so on
  (http://thedatahub.org/dataset/myexperiment-screenscrape)




                                                                    12
Findings
           • A large percentage of
             workflows consist of few
             components.

           • The amount of components




                                         DataONE
             ranges from 1 to 250. The
             average workflow supports
             24.3 tasks.

           • Complex workflows are
             downloaded more.
                                         13
Findings
           • Most workflow contributors
             submit a single workflow.

           • Only 13 users have uploaded
             more than 30 workflows.




                                            DataONE
           • Just over 5% of the users on
             myExperiment have uploaded
             workflows.



                                            14
Findings
           • Most workflows have only
             one version uploaded.

           • When several versions do




                                           DataONE
             exist, the workflow is more
             frequently downloaded than
             “single-edition” workflows.




                                           15
Findings
           • Workflow use declined
             significantly a month after
             initial upload.




                                           DataONE
                                           16
Findings

• A large percentage of workflow components – approx. 38% -
  are shims.




                                                              DataONE
  • Components that are used to make output from one step
    conform to the format expected by a subsequent step.




                                                              17
Findings

• A large percentage of workflow components – approx. 38% -
  are shims.




                                                              DataONE
  • Components that are used to make output from one step
    conform to the format expected by a subsequent step.

  • This is a problem for developers.




                                                              18
Findings

• A large percentage of workflow components – approx. 38% -
  are shims.




                                                              DataONE
  • Components that are used to make output from one step
    conform to the format expected by a subsequent step.

  • This is a problem for developers.

  • 8% more than previous studies (Lin et al.)

                                                              19
Findings

• 60% of workflows have embedded workflows within them.




                                                          DataONE
                                                          20
Findings

• 60% of workflows have embedded workflows within them.

• Documentation on site (tags, description) does not improve




                                                               DataONE
  use…




                                                               21
Findings

• 60% of workflows have embedded workflows within them.

• Documentation on site (tags, description) does not improve




                                                               DataONE
  use…

• … but community engagement does.




                                                               22
Recommendations


Remember workflows are evolving entities.




                                            DataONE
They are updated in response to user
feedback, engagement, and improvements in
methodology.

                                            23
Recommendations


Use relevant social annotation tools.




                                                DataONE
But they need to be constrained; for
instance, through the use of a controlled tag
vocabulary.

                                                24
Recommendations


Talk about them.




                                     DataONE
Cite the workflow in publications.
Share with colleagues
Advertise the workflow.

                                     25
Recommendations


Provide sufficient descriptions of your workflows.




                                                     DataONE
                                                     26
Recommendations


Keep in mind that one size does not fit all.




                                               DataONE
                                               27
Recommendations


Workflow re-use could benefit significantly from




                                                     DataONE
the assignment of stable identifiers, like Digital
Object Identifiers (DOI).




                                                     28
Recommendations


Education is the key to more use.




                                                   DataONE
i.e. in professional society meetings, online
courses, and undergraduate and graduate courses.



                                                   29
Impact on Science
Following these recommendations can help:
• Make science more efficient.
• Facilitate reproducible science.




                                                   DataONE
• Help with collaborative research.
• Speed up the peer review process.
• Your impact. (For instance, NSF has said these
  are valuable contributions.)

                                                   30
Links
• Mendeley Research Group:
  http://www.mendeley.com/groups/1189721/scientific-workflows-
  and-workflow-systems/
• Github https://github.com/RichardLitt/Understanding-Workflows
• Data http://thedatahub.org/dataset/myexperiment-screenscrape




                                                                  DataONE
• Notebook https://notebooks.dataone.org/workflows




                                                                  31


http://www.flickr.com/photos/wwworks/4759535950/

Más contenido relacionado

Similar a Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices

Scientific Workflows Systems :In Drug discovery informatics
Scientific Workflows Systems :In Drug discovery informaticsScientific Workflows Systems :In Drug discovery informatics
Scientific Workflows Systems :In Drug discovery informatics
Khaled Tumbi
 
Pain points for preservation services / workflows in repositories
Pain points for preservation services /  workflows in repositories Pain points for preservation services /  workflows in repositories
Pain points for preservation services / workflows in repositories
prwheatley
 
Jumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutionsJumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutions
Arpan Bhandari
 
SPLive Orlando - 10 Things I Like in SharePoint 2013 Search
SPLive Orlando - 10 Things I Like in SharePoint 2013 SearchSPLive Orlando - 10 Things I Like in SharePoint 2013 Search
SPLive Orlando - 10 Things I Like in SharePoint 2013 Search
Agnes Molnar
 
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
Paris Open Source Summit
 

Similar a Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices (20)

Scientific Workflows Systems :In Drug discovery informatics
Scientific Workflows Systems :In Drug discovery informaticsScientific Workflows Systems :In Drug discovery informatics
Scientific Workflows Systems :In Drug discovery informatics
 
M&R Poster P1 Event Manager Website Rajopadhye Mhatre Nimmagadda
M&R Poster P1 Event Manager Website Rajopadhye Mhatre NimmagaddaM&R Poster P1 Event Manager Website Rajopadhye Mhatre Nimmagadda
M&R Poster P1 Event Manager Website Rajopadhye Mhatre Nimmagadda
 
Pain points for preservation services / workflows in repositories
Pain points for preservation services /  workflows in repositories Pain points for preservation services /  workflows in repositories
Pain points for preservation services / workflows in repositories
 
Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software Engineering
 
Telemetry Onboarding
Telemetry OnboardingTelemetry Onboarding
Telemetry Onboarding
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
Jumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutionsJumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutions
 
Jumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutionsJumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutions
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for Robotics
 
SPLive Orlando - 10 Things I Like in SharePoint 2013 Search
SPLive Orlando - 10 Things I Like in SharePoint 2013 SearchSPLive Orlando - 10 Things I Like in SharePoint 2013 Search
SPLive Orlando - 10 Things I Like in SharePoint 2013 Search
 
[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...
[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...
[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...
 
Planets, OPF & SCAPE - presentation of tools on digital preservation
Planets, OPF & SCAPE - presentation of tools on digital preservationPlanets, OPF & SCAPE - presentation of tools on digital preservation
Planets, OPF & SCAPE - presentation of tools on digital preservation
 
E-ticketing system for zoo kemaman with multimedia content
E-ticketing system for zoo kemaman with multimedia contentE-ticketing system for zoo kemaman with multimedia content
E-ticketing system for zoo kemaman with multimedia content
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
 
IETF Update: Making the Internet Work Better
IETF Update: Making the Internet Work BetterIETF Update: Making the Internet Work Better
IETF Update: Making the Internet Work Better
 
COPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob DaveyCOPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob Davey
 

Más de Richard Littauer

On Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem IsoglossOn Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem Isogloss
Richard Littauer
 
Evolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche KuchaEvolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche Kucha
Richard Littauer
 

Más de Richard Littauer (12)

Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
 
Named Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 PresentationNamed Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 Presentation
 
Marcu 2000 presentation
Marcu 2000 presentationMarcu 2000 presentation
Marcu 2000 presentation
 
Barzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationBarzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentation
 
Saarland and UdS
Saarland and UdSSaarland and UdS
Saarland and UdS
 
Building Corpora from Social Media
Building Corpora from Social MediaBuilding Corpora from Social Media
Building Corpora from Social Media
 
Visualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat MapsVisualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat Maps
 
On Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem IsoglossOn Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem Isogloss
 
The Evolution of Morphological Agreement
The Evolution of Morphological AgreementThe Evolution of Morphological Agreement
The Evolution of Morphological Agreement
 
Evolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche KuchaEvolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche Kucha
 
The Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer SimulationThe Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer Simulation
 
A Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for LanguageA Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for Language
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Último (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices

  • 1. Trends in Use of Scientific Workflows: DataONE Insights from a Public Repository and Recommendations for Best Practices Richard Littauer, Karthik Ram, Bertram Ludäscher, William Michener, Rebecca Koskela 1
  • 2. Scientific Workflows Tools that help scientists: • Automate repetitive or difficult work DataONE • Provide reproducibility to their experiments • Track provenance • Share their data with other 2 scientists
  • 4. Workflow Workbenches These facilitate: • Creation • Mapping DataONE • Scheduling • Execution • Visualization • Re-Use 4
  • 5. Example Workflow DataONE 5 http://www.myexperiment.org/workflows/140.html
  • 6. Our Study • How are workflows being used? DataONE 6 http://www.flickr.com/photos/eleaf/2536358399
  • 7. Our Study • How are workflows being used? DataONE • How are they being shared? 7 http://www.flickr.com/photos/eleaf/2536358399
  • 8. Our Study • How are workflows being used? DataONE • How are they being shared? • What sort of best practices can researchers follow to maximize the longevity and use of their work? 8 http://www.flickr.com/photos/eleaf/2536358399
  • 9. Our Study • www.myexperiment.org • Est. 2007 DataONE • 5000+ users • 2000+ workflows (mostly Taverna 1, 2, and RapidMiner) 9
  • 10. Our Study • www.myexperiment.org • Est. 2007 DataONE • 5000+ users • 2000+ workflows (mostly Taverna 1, 2, and RapidMiner) • Minable RDF storage for workflows, groups, packs, users, files. • Minable data gathered through the SCUFLE XML language for the Taverna workflows • Taverna 1 - 479 workflows; Taverna 2 - 684 workflows. 10
  • 11. Our Study • We harvested information using a combination of SPARQL and Python (https://github.com/RichardLitt/Understanding-Workflows) DataONE 11
  • 12. Our Study • We harvested information using a combination of SPARQL and Python (https://github.com/RichardLitt/Understanding-Workflows) DataONE • Gathered user, workflow, files, packs, groups view and download statistics, metadata, descriptions, tags, and so on (http://thedatahub.org/dataset/myexperiment-screenscrape) 12
  • 13. Findings • A large percentage of workflows consist of few components. • The amount of components DataONE ranges from 1 to 250. The average workflow supports 24.3 tasks. • Complex workflows are downloaded more. 13
  • 14. Findings • Most workflow contributors submit a single workflow. • Only 13 users have uploaded more than 30 workflows. DataONE • Just over 5% of the users on myExperiment have uploaded workflows. 14
  • 15. Findings • Most workflows have only one version uploaded. • When several versions do DataONE exist, the workflow is more frequently downloaded than “single-edition” workflows. 15
  • 16. Findings • Workflow use declined significantly a month after initial upload. DataONE 16
  • 17. Findings • A large percentage of workflow components – approx. 38% - are shims. DataONE • Components that are used to make output from one step conform to the format expected by a subsequent step. 17
  • 18. Findings • A large percentage of workflow components – approx. 38% - are shims. DataONE • Components that are used to make output from one step conform to the format expected by a subsequent step. • This is a problem for developers. 18
  • 19. Findings • A large percentage of workflow components – approx. 38% - are shims. DataONE • Components that are used to make output from one step conform to the format expected by a subsequent step. • This is a problem for developers. • 8% more than previous studies (Lin et al.) 19
  • 20. Findings • 60% of workflows have embedded workflows within them. DataONE 20
  • 21. Findings • 60% of workflows have embedded workflows within them. • Documentation on site (tags, description) does not improve DataONE use… 21
  • 22. Findings • 60% of workflows have embedded workflows within them. • Documentation on site (tags, description) does not improve DataONE use… • … but community engagement does. 22
  • 23. Recommendations Remember workflows are evolving entities. DataONE They are updated in response to user feedback, engagement, and improvements in methodology. 23
  • 24. Recommendations Use relevant social annotation tools. DataONE But they need to be constrained; for instance, through the use of a controlled tag vocabulary. 24
  • 25. Recommendations Talk about them. DataONE Cite the workflow in publications. Share with colleagues Advertise the workflow. 25
  • 26. Recommendations Provide sufficient descriptions of your workflows. DataONE 26
  • 27. Recommendations Keep in mind that one size does not fit all. DataONE 27
  • 28. Recommendations Workflow re-use could benefit significantly from DataONE the assignment of stable identifiers, like Digital Object Identifiers (DOI). 28
  • 29. Recommendations Education is the key to more use. DataONE i.e. in professional society meetings, online courses, and undergraduate and graduate courses. 29
  • 30. Impact on Science Following these recommendations can help: • Make science more efficient. • Facilitate reproducible science. DataONE • Help with collaborative research. • Speed up the peer review process. • Your impact. (For instance, NSF has said these are valuable contributions.) 30
  • 31. Links • Mendeley Research Group: http://www.mendeley.com/groups/1189721/scientific-workflows- and-workflow-systems/ • Github https://github.com/RichardLitt/Understanding-Workflows • Data http://thedatahub.org/dataset/myexperiment-screenscrape DataONE • Notebook https://notebooks.dataone.org/workflows 31 http://www.flickr.com/photos/wwworks/4759535950/