This document summarizes a study on the persistence and availability of bioinformatics web services. The study analyzed 927 web services published in the Nucleic Acids Research (NAR) Web Server Issues from 2003 to 2009 and found that 17% of the original web addresses were no longer reachable. More recent services were more often still reachable and met higher quality standards, but 24% of surveyed authors said their services would not be maintained long-term. The document gives recommendations for web service authors to improve long-term availability, such as using persistent URLs, releasing source code, and planning for the future maintenance of the service.
Schultheiss, BOSC 2010: persistence of web services
1. Persistence and availability of bioinformatics web services
Sebastian J. Schultheiss <sebi@tue.mpg.de>
BOSC 2010, Boston, July 9–10
TL-Stiftung
Saturday, July 10, 2010 1
2. The Study
S. J. Schultheiss et al. (2010) PLoS Comp Biol (under review)
‣ Curated data set: Nucleic Acids Research Web Server Issues, 2003-2009
‣ 927 web services
‣ 322 institutions
‣ 39 countries
‣ 827 corresponding authors (274 replies)
3. The Problem
Veretnik et al. (2008) PLoS Comp Biol 4:e1000136
‣ Original web address unreachable for 17% of services
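A minimal sketch of how one might test whether a listed service address still responds. This is not the study's actual method: it assumes that an HTTP HEAD request answered with a status below 400 counts as "reachable", and the function names `is_reachable` and `unreachable_share` are hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if `url` answers with an HTTP status below 400.

    Assumption: a HEAD request is a fair test of reachability. Some servers
    reject HEAD, so a real survey might retry with GET before giving up.
    urlopen follows redirects by default, so a service that left a
    forwarding address counts as reachable.
    """
    try:
        # Build the Request inside the try: malformed URLs raise ValueError here.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, ValueError, OSError):
        # DNS failures, refused connections, timeouts, HTTP error codes
        # (HTTPError is a URLError) and malformed URLs all count as unreachable.
        return False


def unreachable_share(urls: list[str], timeout: float = 10.0) -> float:
    """Fraction of `urls` that do not respond; 0.0 for an empty list."""
    if not urls:
        return 0.0
    down = sum(1 for url in urls if not is_reachable(url, timeout))
    return down / len(urls)
```

Run over a list of published service URLs, `unreachable_share` yields a figure comparable to the 17% above only if "reachable" is defined the same way, which the slide does not specify.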
4. The Problem
‣ New research builds on existing services
‣ Lost services undermine reproducibility and comparability of results
‣ Improving on published methods becomes difficult
5. Developments over Time
‣ More recent services are more often still reachable and have higher quality standards
‣ NAR publishing policies became stricter
8. Average Citations
S. J. Schultheiss et al. (2010) PLoS Comp Biol (under review)
[Figure: average citations of reachable vs. unreachable services, with % unreachable, by NAR Web Server Issue year, 2003-2007]
9. Survey among NAR Authors
S. J. Schultheiss et al. (2010) PLoS Comp Biol (under review)
‣ 64% of services are used by researchers without a computational background
‣ 58% of services were developed by students only, which makes them difficult to maintain after graduation
‣ 24% of services will not be maintained
10. Web Service Problems
Problem reported | # | % of 274 replies
Unreachable, web site down | 132 | 48%
No example datasets | 110 | 40%
No help text/documentation | 109 | 40%
Implausible arrangement of interface elements, not intuitive to use | 99 | 36%
Too stringent limitations (e.g. on file size, number of sequences, ...) | 87 | 32%
Processing/waiting time unreasonably long | 77 | 28%
No response to personal e-mail or on mailing list, no support | 66 | 24%
Bad design choices (colors, size of edit fields, ...) | 53 | 19%
Missing contact information | 24 | 9%
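Every percentage in this table matches its count divided by the 274 survey replies from slide 2, rounded to the nearest whole percent. The sketch below recomputes them under that assumption; the denominator is inferred from the arithmetic and is not stated on this slide. The same helper gives the survey's response rate over the 827 corresponding authors.

```python
# Counts copied from the "Web Service Problems" table (labels shortened).
# Assumption: each percentage is count / 274 survey replies.
PROBLEM_COUNTS = {
    "Unreachable, web site down": 132,
    "No example datasets": 110,
    "No help text/documentation": 109,
    "Not intuitive to use": 99,
    "Too stringent limitations": 87,
    "Processing/waiting time unreasonably long": 77,
    "No support": 66,
    "Bad design choices": 53,
    "Missing contact information": 24,
}
REPLIES = 274
AUTHORS_CONTACTED = 827


def percent(count: int, total: int = REPLIES) -> int:
    """Share of `total`, rounded to the nearest whole percent."""
    return round(100 * count / total)


if __name__ == "__main__":
    for problem, count in PROBLEM_COUNTS.items():
        print(f"{problem:45s} {count:4d} {percent(count):3d}%")
    # 274 replies out of 827 authors contacted.
    print(f"Response rate: {percent(REPLIES, AUTHORS_CONTACTED)}%")
```

All nine recomputed values agree with the slide, and the response rate comes out at about 33%.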
11. Ten Simple Rules
S. J. Schultheiss (2010) PLoS Comp Biol (under review)
1. Consider a stand-alone version
2. Know your audience
3. Use an existing framework
4. Make it portable
5. Provide documentation and assistance
12. Ten Simple Rules
S. J. Schultheiss (2010) PLoS Comp Biol (under review)
6. Assist users and involve the community
7. Be explicit about changes
8. Leave a forwarding address
9. Find someone else to do it
10. Plan the end of the service life cycle
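Rule 8, "Leave a forwarding address", can be as simple as keeping the old host alive and answering every request with a permanent HTTP redirect to the service's new home. Below is a minimal sketch using Python's standard `http.server`. The target `https://new-host.example.org/tool`, the helper `forward_target`, and port 8000 are hypothetical placeholders, not details from the talk.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical new home of the service; replace with the real address.
NEW_BASE = "https://new-host.example.org/tool"


def forward_target(path: str, base: str = NEW_BASE) -> str:
    """Map a request path (including any query string) to the new host."""
    return base.rstrip("/") + "/" + path.lstrip("/")


class ForwardingHandler(BaseHTTPRequestHandler):
    """Answer every GET and HEAD with 301 Moved Permanently."""

    def _redirect(self) -> None:
        self.send_response(301)
        self.send_header("Location", forward_target(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    # Old bookmarks and scripts usually issue GET; link checkers often use HEAD.
    do_GET = _redirect
    do_HEAD = _redirect


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for local testing.
    HTTPServer(("", 8000), ForwardingHandler).serve_forever()
```

Keeping the original path and query string means deep links such as `/predict?seq=ACGT` still land on the matching page. Persistent URL schemes such as PURL and DOI rely on the same redirect principle, with the resolver run by a third party instead of the author.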
13. Summary
‣ Services that remain available are cited 2.2× more often
‣ Authors: use persistent URLs (PURL, DOI, own domain, ...) and release source code
‣ Prepare to hand over responsibilities
‣ LT score predicts the reliability of services for editors and reviewers
14. Poster T5
Acknowledgements
‣ Marc-Christian Münch and Gergana Andreeva
‣ Gunnar Rätsch and AG Rätsch/MLB at the FML
‣ Oliver Kohlbacher, WSI, University of Tübingen
‣ TL Foundation and its Board at University of Tübingen
‣ ISCB, DOE and NSF for a Travel Fellowship