Doing Science Properly In The Digital Age - Rutgers Seminar
1. Doing Science Properly in the Digital Age
www.software.ac.uk
2 October 2012, Rutgers University
Neil Chue Hong (@npch)
N.ChueHong@software.ac.uk
Software Sustainability Institute
2. Four Paradigms of Research
3. Software is pervasive in research
4. Just the nature of the problem?
Statistics courtesy of Jo Hannay et al., “How Do Scientists Develop and Use Scientific Software?”
Hacking new stuff is fun; maintenance is not fun
Nature 467, 775-777 (2010), published online 13 October 2010, doi:10.1038/467775a
5. The Software Sustainability Institute
A national facility for cultivating world-class research through software
• Better software enables better research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to move on to the next stage
• Developing the policy and tools to support the community developing and using research software
Supported by EPSRC Grant EP/H043160/1
6. UK Research Computing Ecosystem
[Diagram: People, Computing, Software, Communities, Data Centres, …, Network/Collaboration, Instruments]
7. SSI Organisation
• Community Engagement (Shoaib Sufi)
Fellowship Programme
• Consultancy (Steve Crouch)
Open Call for Projects
Software Evaluation
• Policy (Simon Hettrick)
Guides and Case Studies
• Training (Mike Jackson)
Software Carpentry
Software Surgeries
• Collaboration between the universities of Edinburgh, Manchester, Oxford and Southampton
8. Case Study: Ligand Binding
• Centre for Computational Chemistry, Bristol
New methods for rapid MC sampling of biomolecular systems modelled using QM/MM
Developed two codes: ProtoMS (F77) and Sire (C++)
Water-Swap Reaction Coordinate method to calculate absolute protein-ligand binding free energies
• SSI’s work is helping to scale development
ProtoMS and Sire are both single-developer codes
The ASPIRE/ACQUIRE framework has multiple developers
• Split architecture between ASPIRE (adaptive multiresolution hybrid MD simulation) and ACQUIRE (WorkPacket scheduling system with optimisation for time to result vs “green-ness”)
• http://www.siremol.org/adaptive_dynamics
9. Case Study: Brain Imaging
• Brain Research Imaging Centre, Edinburgh
Develop the PrivacyGuard software, a DICOM image de-identification toolkit
Created software to support a new multispectral colouring modulation and variance identification technique (“MCMxxxVI”) to identify white matter lesions that are indicative of declining cognitive ability
BRIC are not principally software developers, but do provide software to other researchers
• SSI’s work means the software has been reviewed and refactored
Looked at exploitation
• Usability review, naming/trademark review
Made it easier for BRIC staff to maintain and develop
• Move to standard repositories, testing and documentation processes
• Examination of licensing for MCMxxxVI
• Extraction and refactoring to create standalone tools
• http://www.software.ac.uk/who-do-we-work/brain-research-imaging-centre-edinburgh
• http://www.bric.ed.ac.uk/
10. Case Study: Climate Policy Modelling
• CIAS team at Tyndall Centre for Climate Change Research, University of East Anglia
Develop linked climate and economic models for detailed analysis
Their software was not ready to be used by other groups
• One researcher/developer at UEA, several users
• SSI’s work means the software is robust enough that it can be installed and used by others
Enabled use of the software by the WWFN’s Climascope project and James Cook University
• Documented software to allow extensions by contributors
• Made it easier to maintain and back up
• Added job scheduling to improve modelling throughput
• New modelling framework enables new models, i.e. new science
• http://www.tyndall.ac.uk/research/cias
11. Case Study: Textual Studies
• TextVRE team at CeRCH, King’s College London
Developed an environment that integrates the various tools used in the e-Humanities textual studies lifecycle
Builds on the German TextGrid project, and many other existing tools
• SSI’s work means the software can be run “out of the box”, an important requirement for the researchers
Developed a VM image containing the TextVRE installation
• Improved installation instructions
• Developed tests to check each installed component
• Improved modularisation to allow others to contribute and maintain
Feeding back work to TextGrid
• http://textvre.cerch.kcl.ac.uk
12. The modern researcher…
• … worries about:
Data management and analysis
Reproducible research
Scalable simulations
Integration of models and workflows
Collaboration
Picture of Otto Stern, Emilio Segrè Visual Archives
13. Observation 1: Software is pervasive across research
Corollary: software is bleeding edge and long-tail
Demanding users are coming from the arts and humanities, economics and social science, as well as the sciences
14. Observation 2: A culture of re-use rather than re-invention is not widespread
Corollary: we have wasted effort and increased siloing
15. Observation 3: Many people are “embarrassed” about software
Corollary: something is broken in the way we regard, recognise and reward software
16. SSI Drivers and Themes
• Two key drivers which cause people to seek the SSI’s advice:
They want to be more productive in their research
They don’t want to be embarrassed by appearing worse than their peers
• Broadly, our work falls into a few key themes:
The role and reward of software in research
Recognition of software career paths
Developing the scientific computing / software development skill base
17. The Foundations of Digital Research
[Diagram: Research (Re-usable, Re-producible) resting on Software Careers, Software Recognition / Reward, and Software Skills and Capability]
18. Gap 1: Software Skills Training
[Diagram: training provision arranged from Basic to Advanced, and from Programming Focussed (Tools) to Research Focussed (Methods). Entries: Software Carpentry, Summer Schools, HPC Short Courses, MSc in HPC / scientific computing, Advanced HPC Training, Programming 101, Programming 201. A gap is marked with the question “Who fills this gap?”]
19. Software philosophy as part of the process
• Foundations of scientific computing in undergraduate courses
Like presentation skills
• Methods of scientific computing in postgraduate courses
Like statistics and ethics
• Show the benefits of the knowledge and methods of digital research
Not just programming 101
20. Best Practices for Scientific Computing
1. Write programs for people, not computers
2. Automate repetitive tasks
3. Use the computer to record history
4. Make incremental changes
5. Use version control
6. Don’t repeat yourself (or others)
7. Plan for mistakes
8. Optimise software only after it works correctly
9. Document the design and purpose of the code, rather than its mechanics
10. Conduct code reviews
Paper (including the evidence) being submitted to arXiv and PNAS
http://arxiv.org/abs/1210.0530
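Several of these practices fit in a few lines of code. The following is a minimal Python sketch, not taken from the paper: the function `mean_of_measurements` and its tests are hypothetical examples showing practice 7 (plan for mistakes, via defensive input checks) and practice 2 (automate repetitive tasks, via tests that re-run with a single command).

```python
import math
import unittest


def mean_of_measurements(values):
    """Return the arithmetic mean of a non-empty sequence of finite numbers.

    Hypothetical example of practice 7 ("plan for mistakes"): inputs are
    checked so that bad data fails loudly instead of silently giving a
    wrong answer.
    """
    if len(values) == 0:
        raise ValueError("need at least one measurement")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("measurements must be finite numbers")
    # fsum reduces floating-point rounding error when adding many values
    return math.fsum(values) / len(values)


class MeanOfMeasurementsTest(unittest.TestCase):
    # Practice 2 ("automate repetitive tasks"): these checks are re-run
    # with one command after every change instead of by hand.
    def test_simple_mean(self):
        self.assertAlmostEqual(mean_of_measurements([1.0, 2.0, 3.0]), 2.0)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            mean_of_measurements([])

    def test_nan_is_rejected(self):
        with self.assertRaises(ValueError):
            mean_of_measurements([1.0, float("nan")])
```

Saved in a file such as `test_stats.py` (a hypothetical name), the checks run with `python -m unittest test_stats`; keeping that file under version control (practice 5) means every later change can be re-checked the same way.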
21. Gap 2: Lack of recognition and reward
• Is there an anachronism in the way we conduct and recognise research?
The REF references software as an output, but it is still not easy to get recognition: peer review fails
• Software careers
Researchers who use software
Researcher-Developers
Research Software Engineers
Research Software Support
Research Systems Providers
22. No recognition without reward, no reward without reproducibility?
• How do we reward people for important software contributions?
• Traditionally: publish a research paper that happens to mention software
Can we provide more direct, acceptable software citations?
• A Research Software Impact Manifesto
http://www.software.ac.uk/blog/2011-05-02-publish-or-be-damned-alternative-impact-manifesto-research-software
NB Authorship is hard
• It works for data!
C.f. Heather Piwowar’s work
http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0000308
23. Software Metapapers
• Create a complete scholarly record including “standard” publication, method, dataset and models, and software
e.g. modelling and simulation, statistical analysis
Enable replay, reproduction and reuse
• A pragmatic approach is to create a metadata record for the software, and link it to a copy of the software in some storage infrastructure
This is a software metapaper
Peer-review the metadata, not the software
• Journal of Open Research Software: http://openresearchsoftware.metajnl.com/
See: http://openresearchsoftware.metajnl.com/faq/ and the work by B. Matthews et al.: “The Significant Properties of Software: A Study”
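A software metapaper pairs reviewable metadata with a pointer to an archived copy of the code. The Python sketch below assumes a handful of illustrative fields (`title`, `version`, `authors`, `license`, `archive_url`, `identifier`); they are hypothetical and are not the Journal of Open Research Software's actual submission schema. The `validate` step mirrors the idea of peer-reviewing the metadata rather than the software itself.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SoftwareMetadata:
    """Hypothetical metadata record for one released version of research software."""
    title: str
    version: str
    authors: List[str]
    license: str       # e.g. the name of an OSI-approved license
    archive_url: str   # where the deposited copy of the code is stored
    identifier: str    # persistent identifier (e.g. a DOI) for that copy
    description: str = ""
    related_papers: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Review the metadata, not the code: reject records missing essentials."""
        required = ("title", "version", "license", "archive_url", "identifier")
        missing = [name for name in required if not getattr(self, name).strip()]
        if not self.authors:
            missing.append("authors")
        if missing:
            raise ValueError(f"missing metadata fields: {', '.join(missing)}")

    def to_json(self) -> str:
        """Serialise a validated record, e.g. for deposit alongside the archive."""
        self.validate()
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

For example, `SoftwareMetadata(title="ExampleTool", version="1.0.0", authors=["A. Researcher"], license="BSD-3-Clause", archive_url="https://example.org/exampletool-1.0.0.tar.gz", identifier="doi:10.0000/placeholder").to_json()` produces a JSON record; every value there is a placeholder.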
24. Gap 3: Lack of support infrastructure
• For example: there is no digital repository which satisfies the criteria:
Open to anyone in the UK to archive software
Software associated with an OSI license
Provides a unique, permanent identifier
Publishes a preservation/curation/sustainability plan
• This is just deposit, not even preservation or sustainability
25. 5 Stars of Software?
• Do we need 5 stars for software?
Existence – there is accurate metadata that defines the software
Availability – you can access and run the software
Openness – the software has an open, permissive license
Assured – the software provides ways of assuring its correctness
Linked – the related data, dependencies and papers are indicated
c.f. 5 Stars of Linked Data (Berners-Lee), 5 Stars of Online Journals (Shotton)
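If the five stars were applied as a checklist, scoring could look like the Python sketch below. The class and method names are hypothetical, and the cumulative rule (a star only counts when every earlier star is also met) is an assumption borrowed from the 5 Stars of Linked Data, not something the slide specifies.

```python
from dataclasses import dataclass


@dataclass
class SoftwareStars:
    """Hypothetical checklist for the proposed 5 Stars of Software."""
    existence: bool     # accurate metadata defines the software
    availability: bool  # the software can be accessed and run
    openness: bool      # the software has an open license
    assured: bool       # there are ways of assuring its correctness
    linked: bool        # related data, dependencies and papers are indicated

    def stars(self) -> int:
        """Return 0-5 stars, counting criteria in order until one is not met."""
        count = 0
        # Assumption: stars are cumulative, as in the 5 Stars of Linked Data,
        # so counting stops at the first criterion that is not satisfied.
        for met in (self.existence, self.availability, self.openness,
                    self.assured, self.linked):
            if not met:
                break
            count += 1
        return count
```

For example, `SoftwareStars(True, True, False, True, True).stars()` returns 2: the software is described and runnable, but under this assumption the missing open license stops the later criteria from adding stars.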
26. Gap 4: Software Maturity and Management
[Diagram: software moves through the stages Innovation, Consolidation and Customisation over time, annotated with:]
Not all software should make it to the next stage
Software proliferation
Management changes through time, requiring planning
27. A More Manageable Ecosystem
• Discourage duplicative software development in research grants by rewarding reuse and long-term development
Need to change perceptions so that software is seen as valuable
But understand when it should not proceed to next stage
• Different stages should be managed and funded separately
Maintenance vs. research vs. development
• A skilled researcher base is the key in the digital age
Create a larger proportion of enabled researchers and provide the ramps to go from desktop to high-end infrastructure
Allow and encourage specialism and collaboration
28. Take-home points
1) Researchers are developing more software than ever, and trying to do it better
2) We are not adequately providing the training, recognition and reward, and career paths to enable a step-change improvement in research software
3) This is hindering digital research
4) The only people who can change this situation are people like you!
29. A national facility for cultivating world-class research through software
Some current collaborations
Become our next collaborators!
Website: www.software.ac.uk
Email: info@software.ac.uk
Twitter: twitter.com/SoftwareSaved
Speaker notes
For thousands of years, research was empirical, using observation and experiment to describe natural phenomena. In the last few hundred years, theory developed using models and generalisations. In the last decades, computational simulation has made it possible to model complex phenomena. In the last few years, data exploration (digital research) has unified experimental data, theory, and computational simulation to analyse the vast amounts of collected and generated information.
Images courtesy of projects from the ENGAGE programme http://www.engage.ac.uk/
Statistics from Greg Wilson.
Are academics software developers? Can research consortia manage production? Are timing constraints different? What is the role of the PI in software development management? Are the skills for software and research the same?
- More and more researchers use computer software and hardware in their day-to-day research, not just those researchers who could be classed as computational scientists, yet they find it increasingly difficult to exploit due to a lack of coordination ([Gob10], also observed in [Han09])
- There is a wide variance in the levels of experience in scientific computing and software development, and hence in their use of computing, which is present across all domains and levels of seniority ([Har09], also emerging as an ongoing result of our work with the DIRAC consortium)
- Software is often treated as if it was disposable, rather than the subject of a £9m per year investment by EPSRC [SaaI12]
Software reviews and refactoring, collaborations to develop your project, guidance and best practice on software development, project management, community building, publicity and more…
Drawing on a pool of specialists to drive the continued improvement and impact of research software developed by and for researchers
Providing services for research software users and developers
Developing research community interactions and capacity
Promoting research software best practice and capability
Transferring software knowledge is not easy: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2882045/
Compare fused pairs of different MR sequences modulated in red-green colour space, which enhances tissue discrimination
Collaboration helps sustainability
Update slide for surveymapper?
http://www.flickr.com/photos/esva/2364906768
CPD?
Ultimately the Software Sustainability Institute would like to see basic scientific computing taught in the same way that statistics are a fundamental part of any researcher’s toolbox. Likewise, an understanding of software programming should be seen as equivalent to the understanding of presenting and disseminating your work which is expected of graduates.
A basic syllabus and list of recognised teaching providers ensures there is a way of providing excellent foundation training in scientific computing via the CDTs. Specialist interdisciplinary scientific computing CDTs which concentrate on instilling the best computational, data analysis and software development techniques in their doctoral students will provide the UK with the next generation of world-class scientists.
Being submitted to PNAS
C.f. the work of James Howison
C.f. 5 Stars of Linked Data (Berners-Lee): available with an open license, machine-readable, non-proprietary format, open standards, linked to provide context.
5 Stars of Online Journals (Shotton): peer review, open access, enriched content, available datasets, machine-readable metadata.
What about community?
Become our next collaborator – email info@software.ac.uk