Revolutionizing the Journal 
through Big Data 
Computational Research 
Amye Kenall 
Journal Development Manager, Open Data 
DataCite Annual Conference 
Inist-CNRS 
Vandoeuvre-lès-Nancy, France 
26 August 2014
2 
Who are we? 
• Founded in 2000 (bought by Springer in 2008) 
• Publish over 260 open access journals 
• ~25,000 peer-reviewed research articles published annually 
• Genomics and computational biology make up a significant fraction, 
e.g. Genome Biology, BMC Genomics, BMC Bioinformatics 
• Other key fields include 
• Public Health / Global Health / Infectious Disease 
• Cancer 
• All research articles are CC-BY licensed for reuse 
• Since mid-2013, all data are covered by a CC0 rights waiver
3 
Data reuse @BioMedCentral 
• Authors of all journals are strongly encouraged to provide the underlying datasets; a select number of journals require them (e.g. Genome Biology, Genome Medicine, GigaScience) 
• CC0 + CC-BY 4.0 by default 
In the works… 
• Interactive tabular data 
• DOIs for all additional files 
• Searchability of additional files 
• Data citations clearly tagged in XML to aid harvesting, e.g. by the Data Citation Index 
• "Availability of Data" section and data citation 
• Encourage use of ISA-TAB (especially in GigaScience and BMC Research Notes)
5 
Journal, data-platform and database for large-scale 
data 
In conjunction with
7 
Linking and Citation
8 
Publishing Reproducible Science: 
SOAPdenovo2, a case study
14 
Lessons Learned? 
• With enough work, results can be replicated at the push of a button. 
• But a lot of work costs a lot of money! No one would pay an APC that reflects that cost. 
• The exercise teaches you a huge amount about the study and yields a lot of information not present in the paper. 
• This work needs to happen before publication.
15 
Reproducibility of computational research 
• Computational research should, in principle, be easier to replicate/reproduce than bench studies 
• However, practical issues get in the way 
• Even if source code is shared, reproducing the entire technical setup, porting software, gathering appropriate input data, and rerunning the analysis take significant effort 
• This means readers and even reviewers don’t bother 
• We would like to reduce this ‘activation energy’
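One small way to lower that activation energy is to ship a record of the technical setup alongside the code. The following is a minimal sketch, not something taken from the talk: the function names `capture_environment` and `environment_drift` are hypothetical, and it assumes the analysis runs in Python and that package versions are the part of the setup worth recording.

```python
import platform
import sys
from importlib import metadata


def capture_environment() -> dict:
    """Record the interpreter, platform, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")  # may be missing for broken installs
        if name:
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def environment_drift(recorded: dict, current: dict) -> dict:
    """Map each package whose version differs to a (recorded, current) pair."""
    drift = {}
    names = set(recorded["packages"]) | set(current["packages"])
    for name in sorted(names):
        before = recorded["packages"].get(name)
        after = current["packages"].get(name)
        if before != after:
            drift[name] = (before, after)
    return drift
```

An author could save the output of `capture_environment()` as JSON next to the article's code; a reviewer would then call `environment_drift(saved, capture_environment())` to see which packages differ before attempting a rerun.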
16 
Strong interest from potential partners
17 
Key technologies
18 
[Diagram: Technologies + Partners + Journal; label: Article]
24 
Flexible management/deployment of packaged data/analysis suites 
using VM infrastructure
25 
Complementary roles of publishers, academia, and 
cloud providers 
• Publishers have a role in enforcing community standards 
• Public/academic databases can provide credible long-term archiving for key data, with a focus on curation and metadata standards 
• Academic grid computing infrastructure can give researchers access to large-scale computing resources 
• Commercial cloud providers universalize/democratize access to large-scale computing. Even if you are not at an institution with its own facilities, you can carry out high-end computations. No bureaucracy/politics: simply pay per CPU-hour.
26 
Specific challenges with respect to data 
• To what extent can/should datasets be included in the VM/suite rather than pulled in externally? 
• How can we avoid the cost of moving data around as it gets bigger and bigger? 
• To what extent are cross-domain standards for referring to and pulling in underlying datasets feasible? Dataset DOIs typically resolve to metadata rather than to the data itself 
• Multiple versions of datasets. To what extent is it practical, when dealing with evolving datasets/databases, to make them available as reproducible snapshots? 
• Culture of data sharing. How do we get authors to share their data?
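The snapshot question above can be made concrete with a checksum manifest. This is a minimal sketch under stated assumptions rather than a description of any BioMed Central or GigaScience system: it assumes a dataset version is a directory of files, and the names `file_sha256`, `build_manifest`, and `verify_snapshot` are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """List every file under root with its SHA-256 and derive one snapshot ID."""
    files = {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    # The snapshot ID is a hash over the sorted path-to-digest mapping,
    # so any added, removed, or changed file yields a different ID.
    canonical = json.dumps(files, sort_keys=True).encode("utf-8")
    return {"snapshot_id": hashlib.sha256(canonical).hexdigest(), "files": files}


def verify_snapshot(root: Path, manifest: dict) -> bool:
    """Check that the directory still matches a previously recorded manifest."""
    return build_manifest(root)["snapshot_id"] == manifest["snapshot_id"]
```

The manifest is small enough to travel with an article or a VM image even when the data itself is pulled in externally, so a rerun can first confirm it has the same dataset version. In this sketch the manifest should be stored outside the dataset directory, because every file under `root` is hashed.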
27 
Conclusions 
• With big data and computational tools, research is becoming more “reproducible/reusable” 
• The infrastructure is out there; we need to do a better job of using it 
• What authors need to communicate their research is also changing, and as publishers we must respond 
• Clearly, publishers have a role, with other organisations, in setting some community standards 
• It took a few hundred years, but publishing is now getting exciting
28 
Questions? 
“One reason that the worldwide web worked was because people reused each 
other’s content in ways never imagined or achieved by those who created it. 
The same will be true of open data.” 
– Tim Berners-Lee and Nigel Shadbolt, The Times, New Year’s Eve 2011 
Amye Kenall 
Journal Development Manager (Open Data), BioMed Central 
@AmyeKenall (also @OpenDataBMC) 
amye.kenall@biomedcentral.com

Editor's notes

  1. More detail of infrastructure.
  2. Linking.
  3. Just as OA reduces activation energy to look at a paper
  4. Technologies: IPython and the IPython Notebook, Python, Galaxy, Taverna, R/Shiny, R, rOpenSci, MATLAB, SCaViS, VMs, matplotlib, Plotly (deployment technologies)