Scott Edmunds, HKU Open Access Week: Experiences from the front-line of Open Access & Open Data publishing.

Scott Edmunds, HKU Open Access Week seminar: Experiences from the front-line of Open Access & Open Data publishing. 19th October 2015

  1. Experiences from the front-line of Open Access & Open Data publishing. ORCID: 0000-0001-6444-1436, @SCEdmunds, scott@gigasciencejournal.com
  2. www.gigasciencejournal.com Journal, data-platform and database for large-scale data. Editor-in-Chief: Laurie Goodman; Executive Editor: Scott Edmunds; Commissioning Editor: Nicole Nogoy; Lead Curator: Chris Hunter; Data Platform: Peter Li in conjunction with
  3. What do publishers do?
  4. What do publishers do? The scholarly chicken (tl;dr version). Apologies: http://scholarlykitchen.sspnet.org/2014/10/21/updated-80-things-publishers-do-2014-edition/
  5. Are publishers really adding value? 1. http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.1001747
  6. Need to move beyond 350-year-old incentive systems. Buckheit & Donoho: Scholarly articles are merely advertisement of scholarship. The actual scholarly artifacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible.
  7. Consequences: increasing number of retractions. >15X increase in last decade. 1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 2. Retracted Science and the Retraction Index http://iai.asm.org/content/79/10/3855.abstract?
  8. Consequences: increasing number of retractions. >15X increase in last decade. At the current rate of increase, by 2045 as many papers would be retracted as published. 1. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 2. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
  9. STAP paper demonstrates problems: Nature Editorial, 2nd July 2014: “We have concluded that we and the referees could not have detected the problems that fatally undermined the papers. The referees’ rigorous reports quite rightly took on trust what was presented in the papers.” http://www.nature.com/news/stap-retracted-1.15488
  10. STAP paper demonstrates problems. Need: …to publish protocols BEFORE analysis; …better access to supporting data; …more transparent & accountable review; …to publish replication studies.
  11. JIFBAIT Network more GWAS GWAS JIFBAIT NEWS Arsenic Life forms, will they take over the planet? Which Overhyped, Unreproducible Experiment Are You? Want rapid citations for 2 years only? Carry out this quiz. You got: STAP Cells. Of course dipping cells in coffee will make them pluripotent. Even if the research gets discredited, it’ll still get 100s of citations in two years.
  12. Reward the commons instead? Open-Data, Open-Source, Open-Review, Open-Access
  13. HK: good with some parts of open… http://hub.hku.hk/
  14. Closed v Open Access [the HKU edition]. Ye Old Journal. Closed Access, Subject Specific. Open Access, public engaging.
  15. Closed v Open Access [the HKU edition]. Closed Access, Subject Specific. Open Access, public engaging.
  16. What is impact? Papers very highly: • Accessed (some >84,000) • Cited (some >500) • Altmetric scored (some >100) • Influential, educational, reproducible & reused • Covered in Int. media (Wired, LA Times, NYT, NBC…) But no impact factor
  17. What is the cost of the Journal Impact Factor?
  18. What is the cost of the Journal Impact Factor? JIF 2 = $10,000 USD JIF 5 = $20,000 USD Buy Sell C/N/S = $30,000 USD JIF 10 = $1,500 USD 1. http://dx.doi.org/10.1087/20110203 2. http://blog.thegrandlocus.com/2014/10/a-flurry-of-copycats-on-pubmed 3. http://www.scientificamerican.com/article/for-sale-your-name-here-in-a-prestigious-science-journal/
  19. This could never happen in Hong Kong, right? “While we are rightly proud of Hong Kong’s highly regarded and ranked universities system, we are not immune to the same pressures. While funders in Europe have moved away from using citation based metrics such as JIF in their research assessments, the Hong Kong University Grants Committee states in their Research Assessment Exercise guidelines that they may informally use it.” 1. http://www.scmp.com/comment/insight-opinion/article/1758662/china-must-restructure-its-academic-incentiv
  20. This is happening in Hong Kong! JIF 2 = $8,000 USD JIF 5 = $15,000 USD Buy 1. http://www.scmp.com/comment/insight-opinion/article/1758662/china-must-restructure-its-academic-incentiv
  21. Specific things we should be rewarding:
  22. New incentives/credit. Credit where credit is overdue: “One option would be to provide researchers who release data to public repositories with a means of accreditation.” “An ability to search the literature for all online papers that used a particular data set would enable appropriate attribution for those who share.” Nature Biotechnology 27, 579 (2009). • Review • Data • Software • Models • Pipelines • Re-use… = Credit
  23. Not just carrots… “The data discovery index (DDI) enabled through bioCADDIE is to do for data what PubMed (and PubMed Central) did for the literature.”
  24. GigaSolution: deconstructing the paper. www.gigadb.org www.gigasciencejournal.com Utilizes big-data infrastructure and expertise from: Combines and integrates (with DOIs): Open-access journal, Data Publishing Platform, Data Analysis Platform, Open Review Platform
  25. Open peer review. 1. Transparency
  26. Open peer review. 1. Transparency. The only drawback? End reviewer 3 Downfall parody videos, now!
  27. Reward open & transparent review. Data from similar scope open/closed review journals in BMC Series shows ~5-10% harder to get referees for open review (data from Tim Sands at BMC). • Good data showing no difference in acceptance/rejection rates, but better quality reviews. • Does take marginally longer to find reviewers (and for them to return reports). BMC Series Medical Journals
  28. 1. Transparency/open peer review. Publons + AcademicKarma = credit for reviewers’ efforts. NOW WITH DOIs. http://publons.com/ http://academickarma.org/
  29. 1. Transparency. arXiv + blogged reviews = real-time open-review
  30. 1. Transparency. Reward pre-prints
  31. 1. Transparency. arXiv + blogged reviews = real-time open-review http://tmblr.co/ZzXdssfOMJfy
  32. 2. Reward Open Data
  33. Data Publishing: nothing new… Data & Metadata Collection/Experiments Analysis/Hypothesis/Analysis Conclusions + Area of Interest/Question 1839 1859 20 Yrs.
  34. Data Publishing: Can be Life or Death. Climate change, global hunger, pollution, cancer, disease outbreaks… http://www.nature.com/news/data-sharing-make-outbreak-research-open-access-1.16966
  35. Our first DOI: To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as: Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001 To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
  36. Downstream consequences: 1. Citations (~300) 2. Therapeutics (primers, antimicrobials) 3. Platform Comparisons 4. Example for faster & more open science. “Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli strain that infected roughly 4,000 people in Germany between May and July. But he knew it might take days for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and publish their work without wasting time on legal wrangling.”
  37. 1.3 The power of intelligently open data. The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastrointestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin–producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.
  38. Beneficiaries/users of our work: IRRI GALAXY
  39. Feed The World With (Big) Data. IRRI GALAXY. Rice 3K project: 3,000 rice genomes, 13.4TB public data
  40. Need for better handling of imaging data. OMERO: providing access to imaging data. Already used by JCB. View, filter, measure raw images with direct links from journal article. See all image data, not just cherry picked examples. Download and reprocess.
  41. Need for better handling of imaging data. The alternative... ...look but don't touch
  42. Executable
  43. Methods Answer Metadata software Analysis (Pipelines) Workflows/Environments Idea Study Rewarding the DOI, etc. Publication Publication Publication Data
  44. Software. https://github.com/gigascience Transparent. Open & able to build upon. Taking citeable snapshots. @jeejkang
  45. Reward Sharing of Workflows. Workflows: gigagalaxy.net
  46. Visualisations & DOIs for workflows http://www.gigasciencejournal.com/series/Galaxy
  47. Reward Open/Dynamic Workbooks. Open Documents. Facilitate reproducibility, reuse & sharing & publish outputs of: Knitr, Sweave, Jupyter/iPython Notebook, etc.
  48. E.g. http://www.gigasciencejournal.com/content/3/1/3
  49. E.g. http://www.gigasciencejournal.com/content/3/1/3
  50. E.g. http://www.gigasciencejournal.com/content/3/1/3 Reviewer (Christophe Pouzat): “It took me a couple of hours to get the data, the few custom developed routines, the “vignette” and to REPRODUCE EXACTLY the analysis presented in the manuscript. With few more hours, I was able to modify the authors’ code to change their Fig. 4. In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewer’s job much more fun!”
  51. Virtual Machines • Downloadable as virtual harddisk/available as Amazon Machine Image • Now publishing container (docker) submissions http://www.gigasciencejournal.com/content/3/1/23 http://www.gigasciencejournal.com/content/4/1/19
  52. Taking a microscope to the publication process
  53. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127612
  54. Lessons Learned • It is possible to push button(s) & recreate a result from a paper • Most published research findings are false. Or at least have errors • Reproducibility is COSTLY. How much are you willing to spend? • Much easier to do this before rather than after publication
  55. The cost of staying with the status quo? • Ioannidis estimates that 85% of research resources are wasted. • ~US$28B/year unnecessarily spent on preclinical research in US. • Each retraction estimated to cost $400,000. http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001747 http://elifesciences.org/content/3/e02956 http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165
  56. The cost to Hong Kong (and your career) of staying with the status quo? HK UGC grant budget = $17.5 Billion HKD/yr (4% of Gov spending) • Taking lowest reported reproducibility rates (11%) = >$15 billion wasted (ref. 1) • Estimates lack of citation impact not being OA = 50% ($8.75B?) (ref. 2) • Hong Kong ranked 54th in Global Open Data Index • How much are YOU losing through missing out on potential collaborations, wider engagement & unrepeatable work? 1. http://www.nature.com/nature/journal/v483/n7391/full/483531a.html 2. http://www.ecs.soton.ac.uk/~harnad/Temp/research-australia.doc
  57. Death to the Publication. Long live the Research Object! Manifesto for a reproducible publisher: • The era of the 1665-style publication is over • Open is the new black • Credit FAIR data, not JIF-bait narrative • Reward replication not advertising • We need a recognizable mark/badge/scores for replication ?
  58. Thanks to: Ruibang Luo (BGI/HKU), Shaoguang Liang (BGI-SZ), Tin-Lap Lee (CUHK), Qiong Luo (HKUST), Senghong Wang (HKUST), Yan Zhou (HKUST), Peter Li, Chris Hunter, Jesse Si Zhe, Rob Davidson, Nicole Nogoy, Laurie Goodman, Amye Kenall (BMC), Marco Roos (LUMC), Mark Thompson (LUMC), Jun Zhao (Lancaster), Susanna Sansone (Oxford), Philippe Rocca-Serra (Oxford), Alejandra Gonzalez-Beltran (Oxford). @gigascience facebook.com/GigaScience blogs.biomedcentral.com/gigablog/ www.gigadb.org gigagalaxy.net www.gigasciencejournal.com CBIIT Funding from: Our collaborators: team: (Case study)
  59. Come to our next Open Science meetup. Where: MakerBay, Yau Tong, Kowloon. When: Monday, October 26th, 7:30pm. https://opendatahk.com/
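
The Hong Kong cost figures on slide 56 can be checked with a few lines of arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, and it assumes (as the slide appears to) that the 11% reproducibility rate and the 50% open-access citation loss are each applied to the whole University Grants Committee grant budget of $17.5 billion HKD per year.

```python
# Figures quoted on slide 56 (HKD per year). All names are illustrative.
GRANT_BUDGET_HKD = 17.5e9        # University Grants Committee grant budget
LOWEST_REPRODUCIBILITY = 0.11    # lowest reported reproducibility rate
OA_CITATION_LOSS = 0.50          # estimated citation impact lost by not being OA


def wasted_on_irreproducible(budget: float, reproducibility: float) -> float:
    """Spending attributed to work that cannot be reproduced.

    Assumption: the reproducibility rate applies uniformly to the whole budget.
    """
    return budget * (1.0 - reproducibility)


def lost_to_closed_access(budget: float, loss_fraction: float) -> float:
    """Share of the budget whose citation impact is lost by not being open access."""
    return budget * loss_fraction


irreproducible = wasted_on_irreproducible(GRANT_BUDGET_HKD, LOWEST_REPRODUCIBILITY)
closed = lost_to_closed_access(GRANT_BUDGET_HKD, OA_CITATION_LOSS)

print(f"Irreproducible work: {irreproducible / 1e9:.3f} billion HKD")
print(f"Closed-access loss:  {closed / 1e9:.2f} billion HKD")
```

Under these assumptions the first figure comes out above the slide's ">$15 billion" and the second matches its "$8.75B?". Both are rough upper bounds rather than measured losses.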

Editor's notes

  • Ferric Fang of the University of Washington and his colleagues quantified just how much fraud costs the government.
    It turns out that every paper retracted because of research misconduct costs about $400,000 in funds from the US National Institutes of Health (NIH), totaling $58 million for papers retracted between 1992 and 2012.
    Scientific fraud incurs additional costs.
  • Thank you for listening.