An open platform approach to cyberinfrastructure
C. Titus Brown
ctb@msu.edu
Asst Professor, Michigan State University
(Microbiology, Computer Science, and BEACON)
khmer software
An efficient, sensitive, and specific pipeline component for extremely scalable shotgun sequencing analysis
github.com/ged-lab/khmer
Estimate 50% drop-off at each junction.
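As a rough illustration of how quickly such a drop-off compounds, here is a minimal sketch. It assumes the loss is the same 50% at every junction and that junctions are independent; the function name and the choice of four junctions are hypothetical and do not come from the slides.

```python
def fraction_remaining(junctions: int, drop_off: float = 0.5) -> float:
    """Fraction left after `junctions` steps, losing `drop_off` at each one.

    Assumes a constant, independent loss at every junction (an illustrative
    simplification, not a claim made in the talk).
    """
    if junctions < 0:
        raise ValueError("junctions must be non-negative")
    if not 0.0 <= drop_off <= 1.0:
        raise ValueError("drop_off must be between 0 and 1")
    return (1.0 - drop_off) ** junctions


if __name__ == "__main__":
    # Four junctions is an arbitrary example value.
    for n in range(5):
        print(f"{n} junctions: {fraction_remaining(n):.1%} remaining")
```

Under these assumptions, only about 6% remain after four junctions.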
Academic software development is really, really hard!
Considerations of “remixing” are in addition to:
• Interesting science
• Sufficient compute
• User interface
• Liability and other legal issues
• Integration
Towards an “ecology” of components

• We don’t need “one true pipeline.”
• We need flexible, reusable, and competing pipeline components.
• This is not a concern: [comic: http://xkcd.com/927/]
• It’s how science works!
• Want flexible, sustainable CI? Build open platforms, openly,
  with open source approaches.

   – The OSS community has lots of experience in doing this, & working
     within incentive structures.
   – Note, traditional academic incentives don’t align well.

• Agile methodologies (iterative, use-case driven, organic)
  ensure that software doesn’t go too far astray; must directly
  involve (& be driven by) domain research groups.

• Too much of the software that is produced is not even reusable in
  theory, much less in practice. This needs to change!

Blog post will be at: http://ivory.idyll.org/blog/2013-gbmf-mmi.html
Other things I’m doing
• Scalable/sensitive/specific algorithms for shotgunomics.
• Benchmarking shotgun metagenome assembly.
• CI education (NIH/ngs; NSF/data + compute;
  Sloan/Software Carpentry; BEACON/intro computing for
  grad students)
• Hobbies/windmills:
   – Open science and open data.
   – Replication and reproducible research.
   – Changing publication and peer review culture in biology.
Exploratory interfaces for data & executable notebooks

Editor's notes

  1. Mention KBase. Want to make sequence easy again. Develop in close contact with specific biology projects.
  2. Want to be able to do this without talking to anyone!!
  3. I do not buy into the idea that we can project data analysis and software needs very far into the future.
  4. We are also attacking many other things, including education and training, reproducibility, etc. Also, please stop “developing software for researchers”. We need a more bottom-up approach. Maybe mention caBIG.