Se ha denunciado esta presentación.
Utilizamos tu perfil de LinkedIn y tus datos de actividad para personalizar los anuncios y mostrarte publicidad más relevante. Puedes cambiar tus preferencias de publicidad en cualquier momento.
Software Sustainability: 
Better Software, Better Research 
Professor Carole Goble CBE FREng FBCS 
The University of Manch...
Carole Goble , 
Better Software, Better Research 
Issue No.05 - Sept.-Oct. (2014 vol.18) 
pp: 4-8 
IEEE Computer Society 
...
Acknowledgements 
Neil Chue Hong 
Simon Hettrick 
Steve Crouch 
Aleks Pawlik 
Shoaib Sufi 
and many other SSI and 
other c...
We’ve written lots of software 
Some sustained… 
Some not…
28/10/2014, 20.38 GMT
The research community 
relies on software 
Do you use research 
software? 
What would happen to your 
research without so...
The Research community 
produces software 
scientific software is important 
for their own research 91% 
developing scient...
The cost of UK research that 
relies on software 
£840m 
Investment in 2013-2014 financial 
year, an amount that has risen...
Software is the infrastructure 
“Software is the infrastructure, and hardware the 
consumable. 
Maintenance is required wh...
Scientific software can live a long time 
Computational Chemistry - CASTEP 
1990s. From the first implementation of a DFT ...
“An article about computational 
science in a scientific publication 
is not the scholarship itself, it is 
merely adverti...
Broken software, broken science 
• Geoffrey Chang, Scripps 
Institute 
• Homemade data-analysis 
program inherited from 
a...
recomputation.org 
sciencecodemanifesto.org
Campaigns 
Advice 
Guides 
Courses 
Workshops 
Fellowship 
Software 
Research 
Policy 
Training 
Community 
Consultancy 
C...
What Software? 
What are we sustaining? 
Software context 
Software 
Sustainability 
Institute 
www.software.ac.uk
Not all software 
is worth sustaining 
Is it key to recomputing results? 
Could it be reused? 
Is it more than a one run s...
Novel reuse of public sector data 
What do we sustain: 
- Map? 
- Software that creates map so we can create it and other ...
Do we sustain: 
-Workflow? 
- Software that runs workflow? 
- Software referenced by workflow?
Components & 
Dependencies 
• 35 kinds of annotations 
• 5 Main Workflows 
• 14 Nested Workflows 
• 11 Configuration files...
Sustaining the software ecosystem 
Tools 
Platforms 
Libraries 
Infrastructure 
Frameworks 
Visible 
Invisible 
Domain spe...
Computational Instrument Entropy 
form or function? preserve or sustain? 
prepare to repair 
Document Instrument 
Reproduc...
[Adapted Freire, 2013] 
preservation 
portability 
packaging 
provenance 
gather dependencies 
capture steps 
track & keep...
preservation 
portability 
packaging 
provenance 
gather dependencies 
capture steps 
track & keep results 
tolerance 
Por...
Virtual 
Machines 
Open 
Source 
Recompute Redevelop 
Ian Gent http://www.recomputation.org/blog/2014/09/14/why-isnt-the-a...
Research Objects 
A Framework to Bundle and Relatemulti-hosted 
(digital) resources of a scientific experiment or 
investi...
Research Objects – Method + Materials 
Bechhofer 2013, Why linked data is not enough for 
scientists, DOI: 10.1016/j.futur...
Software Production 
Unsustainable software 
Research Software written by non-software engineers. 
Software 
Sustainabilit...
Research software is used by 
90% of researchers and 
accounts for at least 30% of 
RCUK investment… 
http://www.phdcomics...
through each SNP of interest 
= 0; $x < scalar @pos; $x++) 
then each downstream SNP of interest 
my $y = $x+1; $y < scala...
Maintenance = Evolution 
all code becomes legacy, 
prepare to repair 
code that is used evolves 
• Testing, infrastructure...
[Norman Morrison] 
Software entropy 
Time, resource and capacity to tackle
http://xkcd.com/974 
Long-term sustainable codes for wider adoption vs 
Fast-return codes for me. 
Working to working, jam...
Rule 1: Don't Reinvent the Wheel 
Rule 2: Code Well 
Rule 3: Be Your Own User 
Rule 4: Be Transparent 
Rule 5: Be Simple 
...
Software making practices 
Zeeya Merali , Nature 467, 775-777 (2010) | doi:10.1038/467775a 
Computational science: ...Erro...
The UK research community 
making software 
56% 
Of UK researchers develop their 
own research software or scripts 
73% 
O...
Training Unappreciated 
• Basic training for 
kitchen chef: 3-4 years 
• Head chef: 10 years 
• Basic training for s/w 
en...
software-carpentry.org 
Software engineering life skills. 
Teach the “95% researchers” basic lab skills for scientific com...
A one-man, small-scale 
software project -> multi-developer 
programme, 
transforming research into 
molecular binding. 
C...
Findable, Accessible, 
Interoperable, Reusable 
Software 
Software 
Sustainability 
Institute 
www.software.ac.uk
http://softwarediscoveryindex.org/
Required as condition of publication 
Code Sharing Policy 
170 journals, 2011-2012 
StoddenV, Guo P, Ma Z (2013) Toward Re...
The accessibility of software in articles 
87% software could be found 
20% mentions provided version data 
5% mentions pr...
Barriers to Sharing 
Victoria Stodden, AMP 2011 http://www.stodden.net/AMP2011/, 
Special Issue Reproducible Research Comp...
http://matt.might.net/articles/crapl/ http://sciencecodemanifesto.org/
I can publish every new monolithic 
thing I make and I can’t publish if I 
reuse someone else’s thing. 
Novelty > sustaina...
Recognition 
Software 
Software developers 
Software 
Sustainability 
Institute 
www.software.ac.uk
The invisibility of software 
esp software that is widely used, 
infrastructural, components or cross-discipline 
[1960s B...
Credit is like Love not $$$$$ 
• Where do I get credit for my software? 
• Where is my name on your 5* journal paper? 
• D...
The visibility of software in articles 
37% formal citation 
78% some form of credit Articles 
87% software could be found...
Original Software Publication
http://www.rse.ac.uk/ 
“Skilled software developers and technicians are 
crucial to modern scientific endeavour so 
develo...
regulation of science 
institutions 
universities 
libraries 
publishers 
public services 
core facilities 
republic of sc...
Sustainability models 
Cultivating Communities 
Contributory citizenship 
Better Funding, Better Software…. 
Software 
Sus...
Technical 
Preservation 
Emulation 
Migration 
Cultivation 
Hiberation 
Depreciation 
Sustainability 
Approaches 
http://w...
Cultivating Sustainable Communities 
Ladder Model of OSS Adoption 
(adapted from Carbone P., Value Derived from 
Open Sour...
Developer Community: to enable 
Critical Mass 
User Community: to drive 
Resources & Time
Cultivate Contributors – R project 
• Basics: Website, mailing list, code repository, issue 
resolution, meet ups 
• Remov...
People depend on 
my software?! 
But I’m a researcher… 
Software is free, 
like free puppies. 
s/w engineers are cute too....
Shared Responsibility 
Incidental 
Adoption 
intentions 
Familial 
Fundamental 
Autocratic 
Control 
intentions 
Adoptive ...
Shared Responsibility 
Incidental 
Adoption 
intentions 
Familial 
Fundamental 
Autocratic 
Control 
intentions 
Adoptive ...
Shared Responsibility 
Incidental 
Adoption 
intentions 
Familial 
Fundamental 
Autocratic 
Control 
intentions 
Adoptive ...
Scale 
Community Interest 
Software Maturity 
Core funding 
Time 
Mind the gap
Sustainability Matrix/Mosaic/Mayhem 
Grant 
Portfolio 
Foundational 
Grants 
Fees 
Community 
Foundation 
Related 
Grants ...
Sustainability through Subsidy 
The Core becomes Skunkworks
What can we do? 
Software 
Sustainability 
Institute 
www.software.ac.uk
Drivers 
Activities 
Outcomes 
Researcher 
Skills 
Reliability of 
Results 
Motivating 
Software 
Protecting 
Investment 
...
http://www.force11.org
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
Próxima SlideShare
Cargando en…5
×

Software Sustainability: Better Software Better Science

Keynote on software sustainability given at the 2nd Annual Netherlands eScience Symposium, November 2014.

Based on the article
Carole Goble ,
Better Software, Better Research
Issue No.05 - Sept.-Oct. (2014 vol.18)
pp: 4-8
IEEE Computer Society

http://www.computer.org/csdl/mags/ic/2014/05/mic2014050004.pdf
http://doi.ieeecomputersociety.org/10.1109/MIC.2014.88

http://www.software.ac.uk/resources/publications/better-software-better-research

Audiolibros relacionados

Gratis con una prueba de 30 días de Scribd

Ver todo

Software Sustainability: Better Software Better Science

  1. 1. Software Sustainability: Better Software, Better Research Professor Carole Goble CBE FREng FBCS The University of Manchester, UK The Software Sustainability Institute UK carole.goble@manchester.ac.uk @caroleannegoble 2nd Annual Netherlands eScience Symposium, 6th Nov 2014, Almere, The Netherlands Supported by Project funding from
  2. 2. Carole Goble , Better Software, Better Research Issue No.05 - Sept.-Oct. (2014 vol.18) pp: 4-8 IEEE Computer Society http://www.computer.org/csdl/mags/ic/2014/ 05/mic2014050004.pdf http://doi.ieeecomputersociety.org/10.1109/ MIC.2014.88 http://www.software.ac.uk/resources/publications/better-software-better-research
  3. 3. Acknowledgements Neil Chue Hong Simon Hettrick Steve Crouch Aleks Pawlik Shoaib Sufi and many other SSI and other colleagues over the years
  4. 4. We’ve written lots of software Some sustained… Some not…
  5. 5. 28/10/2014, 20.38 GMT
  6. 6. The research community relies on software Do you use research software? What would happen to your research without software Survey of researchers from 15 UK Russell Group universities conducted by SSI between August - October 2014. 406 respondents covering representative range of funders, discipline and seniority.
  7. 7. The Research community produces software scientific software is important for their own research 91% developing scientific software is important for their own research 84% claimed to spend more time developing scientific software than they did 10 years ago 53% spend at least one fifth of their time developing software 38% 2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8.
  8. 8. The cost of UK research that relies on software £840m Investment in 2013-2014 financial year, an amount that has risen by 3% on average over last four years 30% Of total research investment has been spent on research which relies on software over the last four financial years How much is wasted? Analysis of data from 49,650 grant titles and abstracts published on Gateway to Research covering 2010-2014.
  9. 9. Software is the infrastructure “Software is the infrastructure, and hardware the consumable. Maintenance is required when there are hardware, operating system and software library changes.” Scientific Infrastructure, UK HOUSE OF LORDS, Select Committee on Science and Technology, 2nd Report of Session 2013–14
  10. 10. Scientific software can live a long time Computational Chemistry - CASTEP 1990s. From the first implementation of a DFT algorithm to a completely new code to community supported software • Individual • Group • Consortium • W/ industry • Community • Active Software advances < hardware speedup http://www.castep.org/
  11. 11. “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995 datasets data collections algorithms configurations tools and apps codes workflows scripts code libraries services, system software infrastructure, compilers hardware Morin et al Shining Light into Black Boxes Science 13 April 2012: 336(6078) 159-160 Ince et al The case for open computer programs, Nature 482, 2012
  12. 12. Broken software, broken science • Geoffrey Chang, Scripps Institute • Homemade data-analysis program inherited from another lab • Flipped two columns of data, inverting the electron-density map used to derive protein structure • Retract 3 Science papers and 2 papers in other journals • One paper cited by 364 The structures of MsbA (purple) and Sav1866 (green) overlap little (left) until MsbA is inverted (right). Miller A Scientist's Nightmare: Software Problem Leads to Five Retractions Science 22 December 2006: vol. 314 no. 5807 1856-1857
  13. 13. recomputation.org sciencecodemanifesto.org
  14. 14. Campaigns Advice Guides Courses Workshops Fellowship Software Research Policy Training Community Consultancy Coordination Knowledge Transfer
  15. 15. What Software? What are we sustaining? Software context Software Sustainability Institute www.software.ac.uk
  16. 16. Not all software is worth sustaining Is it key to recomputing results? Could it be reused? Is it more than a one run shot? Is it obsolete? Does anybody care about it?
  17. 17. Novel reuse of public sector data What do we sustain: - Map? - Software that creates map so we can create it and other maps? http://www.mysociety.org
  18. 18. Do we sustain: -Workflow? - Software that runs workflow? - Software referenced by workflow?
  19. 19. Components & Dependencies • 35 kinds of annotations • 5 Main Workflows • 14 Nested Workflows • 11 Configuration files • 25 Scripts • 10 Software dependencies • 1 Workflow management system • 1 Web Service • Dataset: 90 galaxies observed in 3 bands Galaxy Luminosity Profiling José Enrique Ruiz (IAA-CSIC)
  20. 20. Sustaining the software ecosystem Tools Platforms Libraries Infrastructure Frameworks Visible Invisible Domain specific Domain generic CRAN
  21. 21. Computational Instrument Entropy form or function? preserve or sustain? prepare to repair Document Instrument Reproducibility by Inspection Read It, Repair It Reproducibility by Invocation Run It, Repair It
  22. 22. [Adapted Freire, 2013] preservation portability packaging provenance gather dependencies capture steps track & keep results tolerance versioning open accessible available machine actionable description intelligible machine-readable
  23. 23. preservation portability packaging provenance gather dependencies capture steps track & keep results tolerance Portable Package ReproZip Virtual Machines Open Source/Store versioning host ProvStore [Adapted Freire, 2013] Sci as a Service Integrative fws service Workflows, makefiles
  24. 24. Virtual Machines Open Source Recompute Redevelop Ian Gent http://www.recomputation.org/blog/2014/09/14/why-isnt-the-answer-43/
  25. 25. Research Objects A Framework to Bundle and Relatemulti-hosted (digital) resources of a scientific experiment or investigation using standard mechanisms & uniform access protocols. Carriers of Research Context Outputs are first class citizens to be managed, credited and tracked: data, software http://www.researchobject.org/ http://www.slideshare.net/carolegoble/goble-keynote-vivoscits2014
  26. 26. Research Objects – Method + Materials Bechhofer 2013, Why linked data is not enough for scientists, DOI: 10.1016/j.future.2011.08.004
  27. 27. Software Production Unsustainable software Research Software written by non-software engineers. Software Sustainability Institute www.software.ac.uk
  28. 28. Research software is used by 90% of researchers and accounts for at least 30% of RCUK investment… http://www.phdcomics.com/comics/archive/phd031214s.gif Reusability Trust Building on prior work
  29. 29. through each SNP of interest = 0; $x < scalar @pos; $x++) then each downstream SNP of interest my $y = $x+1; $y < scalar @pos; $y++) Behind every great piece of science… if SNPs within our chosen distance (500kb) and both present in the haplotypes file if((!($trait[$x] eq $trait[$y])) && (abs($pos[$x] - $pos[$y]) <= 500000) && (exists($legArrayPos{$my $snp1ArrayPos = "”; my $snp2ArrayPos = "”; my $snp1All = "”; my $snp2All = "”; #create output file for this SNP pair my $filename = "ConditionedResults2/$chr[$x].$pos[$x]-$pos[$y].EHH.GBR.2.txt”; print "$filenamen”; unless (-e $filename) { open(OUT, ">$filename"); #####################CHANGE THESE IF NOT FOCUSING ON SECOND SNP######################### my $start = $pos[$y]-500000; if ($start < 1) { $start = 1; } my $end = $pos[$y]+500000; if ($end > $chrLengths{$chr[$x]}) { $end = $chrLengths{$chr[$x]}; }
  30. 30. Maintenance = Evolution all code becomes legacy, prepare to repair code that is used evolves • Testing, infrastructure, documentation • Corrective: – fixing faults • Preventative: – increasing maintainability • Adaptive: – adapting to changes in environment • Perfective: – meeting new/different user requirements
  31. 31. [Norman Morrison] Software entropy Time, resource and capacity to tackle
  32. 32. http://xkcd.com/974 Long-term sustainable codes for wider adoption vs Fast-return codes for me. Working to working, jam today more jam tomorrow, just enough just in time, tailoring Time, resource & capacity to tackle sustaining
  33. 33. Rule 1: Don't Reinvent the Wheel Rule 2: Code Well Rule 3: Be Your Own User Rule 4: Be Transparent Rule 5: Be Simple Rule 6: Don't Be a Perfectionist Rule 7: Nurture and Grow Your Community Rule 8: Promote Your Project Rule 9: Find Sponsors Rule 10: Science Counts doi:10.1371/journal.pcbi.1002802
  34. 34. Software making practices Zeeya Merali , Nature 467, 775-777 (2010) | doi:10.1038/467775a Computational science: ...Error…why scientific programming does not compute. “As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software” 2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc. ICSE Workshop Software Eng. for Computational Science and Eng., 2009, pp. 1–8.
  35. 35. The UK research community making software 56% Of UK researchers develop their own research software or scripts 73% Of UK researchers have had no formal software engineering training 140,000 UK researchers rely on their own coding skills Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014. 406 respondents covering representative range of funders, discipline and seniority.
  36. 36. Training Unappreciated • Basic training for kitchen chef: 3-4 years • Head chef: 10 years • Basic training for s/w engineer: 3-4 years • Architect: 10 years Photo by ZagatBuzz Training in S/W Dev in UG Physics: 140 hours Training in S/W Dev in UG Geography: 0 hours Software Sustainability Institute
  37. 37. software-carpentry.org Software engineering life skills. Teach the “95% researchers” basic lab skills for scientific computing: the tools and techniques that will help them get more done in less time, and with less pain. Volunteer instructors / Bootcamps / Train the trainers / Free lesson materials
  38. 38. A one-man, small-scale software project -> multi-developer programme, transforming research into molecular binding. Christopher Woods, biochemist from the University of Bristol
  39. 39. Findable, Accessible, Interoperable, Reusable Software Software Sustainability Institute www.software.ac.uk
  40. 40. http://softwarediscoveryindex.org/
  41. 41. Required as condition of publication Code Sharing Policy 170 journals, 2011-2012 StoddenV, Guo P, Ma Z (2013) Toward Reproducible Computational Research: An Empirical Analysis of Data and Code Policy Adoption by Journals. PLoSONE 8(6): e67111. doi:10.1371/journal.pone.0067111 Implied No mention Required but may not affect decisions Explicitly encouraged may be reviewed and/or hosted
  42. 42. The accessibility of software in articles 87% software could be found 20% mentions provided version data 5% mentions provided actual version 20% inaccessible in any form 37% source code access 20% permission to modify 90 biology journal articles Howison and Bullard 2014 The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc fo Info Science and Technology In review
  43. 43. Barriers to Sharing Victoria Stodden, AMP 2011 http://www.stodden.net/AMP2011/, Special Issue Reproducible Research Computing in Science and Engineering July/August 2012, 14(4) Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production" CSCW 2013.
  44. 44. http://matt.might.net/articles/crapl/ http://sciencecodemanifesto.org/
  45. 45. I can publish every new monolithic thing I make and I can’t publish if I reuse someone else’s thing. Novelty > sustainability Reward for Reusing as well as Reusability
  46. 46. Recognition Software Software developers Software Sustainability Institute www.software.ac.uk
  47. 47. The invisibility of software esp software that is widely used, infrastructural, components or cross-discipline [1960s Boeing 747-100 Software Configuration]
  48. 48. Credit is like Love not $$$$$ • Where do I get credit for my software? • Where is my name on your 5* journal paper? • Did you cite my software? • Niche credit and credit in private  • Novel disposable code vs reliable software • Subsidized/spare time software development
  49. 49. The visibility of software in articles 37% formal citation 78% some form of credit Articles 87% software could be found Journals 24% had citation policy Requests 18% offered preferred citation 90 biology journal articles 32% who cited ignored it Howison and Bullard 2014 The visibility of software in the scientific literature: how do scientists mention software and how effective are those mentions? J Assoc fo Info Science and Technology In review
  50. 50. Original Software Publication
  51. 51. http://www.rse.ac.uk/ “Skilled software developers and technicians are crucial to modern scientific endeavour so developing appropriate and appealing career paths and incentives and allocating appropriate operational budgets, taking these into account at the investment decision stage, and then committing to deliver them, are essential.” Scientific Infrastructure, UK HOUSE OF LORDS, Select Committee on Science and Technology, 2nd Report of Session 2013–14
  52. 52. regulation of science institutions universities libraries publishers public services core facilities republic of science* *Merton’s four norms of scientific behaviour (1942) NSF Scientific Software Innovation Institutes (S2I2)
  53. 53. Sustainability models Cultivating Communities Contributory citizenship Better Funding, Better Software…. Software Sustainability Institute www.software.ac.uk
  54. 54. Technical Preservation Emulation Migration Cultivation Hiberation Depreciation Sustainability Approaches http://www.software.ac.uk/resources/approaches-software-sustainability
  55. 55. Cultivating Sustainable Communities Ladder Model of OSS Adoption (adapted from Carbone P., Value Derived from Open Source is a Function of Maturity Levels) [FLOSS@Sycracuse]
  56. 56. Developer Community: to enable Critical Mass User Community: to drive Resources & Time
  57. 57. Cultivate Contributors – R project • Basics: Website, mailing list, code repository, issue resolution, meet ups • Remove barriers to participation, increase efficiency • 1993: First public release; 2 devs • 1995: Code open sourced; 3 devs • 1996: r-testers list set up • 1997: lists split: r-announce, r-help, r-devel; public CVS; 11 devs • 2000: CRAN split and mirror • 2001: BioConductor • 2003: Namespaces • 2005: I8n, L8n • 2007: R-Forge • Today: BioConductor (33 core devs), R-Forge (532 projects, 1562 devs), CRAN (1400+ packages) http://cran.r-project.org/doc/html/interface98-paper/paper_2.html
  58. 58. People depend on my software?! But I’m a researcher… Software is free, like free puppies. s/w engineers are cute too. I didn’t intend to be a long term service provider.. [Scott McNealy, 2005] http://www.zdnet.com/open-source-is-free-like-a-puppy-is-free-says-sun-boss-3039202713/
  59. 59. Shared Responsibility Incidental Adoption intentions Familial Fundamental Autocratic Control intentions Adoptive community Selfish Contribute Contribution intentions Me Collaborate / Sponsor Family & Friends Acquaintances Strangers, Rivals Incidental Cooperative
  60. 60. Shared Responsibility Incidental Adoption intentions Familial Fundamental Autocratic Control intentions Adoptive community Selfish Contribute Contribution intentions Me Collaborate / Sponsor Family & Friends Acquaintances Strangers, Rivals Incidental Cooperative
  61. 61. Shared Responsibility Incidental Adoption intentions Familial Fundamental Autocratic Control intentions Adoptive community Selfish Contribute Contribution intentions Me Collaborate / Sponsor Family & Friends Acquaintances Strangers, Rivals Incidental Cooperative
  62. 62. Scale Community Interest Software Maturity Core funding Time Mind the gap
  63. 63. Sustainability Matrix/Mosaic/Mayhem Grant Portfolio Foundational Grants Fees Community Foundation Related Grants Commercial PPP Contracts Hobby Open development In Kind Students Licensing Donations Co-operative development Forked extensions
  64. 64. Sustainability through Subsidy The Core becomes Skunkworks
  65. 65. What can we do? Software Sustainability Institute www.software.ac.uk
  66. 66. Drivers Activities Outcomes Researcher Skills Reliability of Results Motivating Software Protecting Investment Improved researcher capability Greater reuse of software World leading science Recognition for developers Increased RSE capacity Research Software Engineers Research Software Training Community Policy
  67. 67. http://www.force11.org

×