BD2K and why bioinformatics matters
relevance to Australia
EMBL - Australia AHM 2016
Vivien Bonazzi
Senior Advisor for Data Science Technologies
ADDs (Assoc. Director for Data Science) Office
Office of the Director (OD)
National Institutes of Health (NIH)
The NIH Data Commons
Digital Ecosystems for using and sharing FAIR Data
http://datascience.nih.gov/bd2k
A word about BD2K
What’s driving the need for a Data Commons?
Convergence of factors
Mountains of Data
Increasing need and support for Data sharing
Availability of digital technologies and
infrastructures that support Data at scale
https://gds.nih.gov/
Went into effect January 25, 2015
NCI guidance:
http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data
Requires public sharing of genomic data sets
Recommendation #4: A national cancer data ecosystem for sharing and analysis.
Create a National Cancer Data Ecosystem to collect, share, and interconnect a broad
array of large datasets so that researchers, clinicians, and patients will be able to both
contribute and analyze data, facilitating discovery that will ultimately improve patient
care and outcomes.
Challenges with Biomedical Data
The Journal Article is the end goal
Data is a means to an end (low value)
Data is not FAIR
Findable, Accessible, Interoperable, Reusable
Limited e-infrastructures to support FAIR data
What’s Changing?
Digital ecosystems
Development of the NIH Data Commons
• How do we find data, software, standards?
• How can we make (large) data, annotations, software, metadata accessible?
• How do we reuse data, tools and standards?
• How do we make more data machine readable?
• How do we leverage existing digital technologies, systems and infrastructures?
• How do we collaborate?
• How do we enable a digital ecosystem?
Changing the conversation around
Data sharing and access
NIH Data Commons
Data Commons
enabling data-driven science
Enable investigators to leverage all possible data and tools in the effort to accelerate biomedical discoveries, therapies and cures by driving the development of data infrastructure and data science capabilities through collaborative research and robust engineering
Matthew Trunnell, FHC
Data Commons’s
Developing a Data Commons
• Treats products of research – data, methods, papers etc. – as digital objects
• These digital objects exist in a shared virtual space
  • Find, Deposit, Manage, Share, and Reuse data, software, metadata and workflows
• Digital object compliance through FAIR principles:
  • Findable
  • Accessible (and usable)
  • Interoperable
  • Reusable
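The idea of digital object compliance can be made concrete with a small sketch. The Python below is illustrative only: the `DigitalObject` record, its field names, and the `fair_gaps` helper are hypothetical and are not part of any NIH Commons specification. It assumes one metadata field per FAIR facet, which is far simpler than any real compliance metric.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DigitalObject:
    """Hypothetical metadata record for one research product."""
    identifier: str = ""   # persistent ID such as a DOI (Findable)
    access_url: str = ""   # where the object can be retrieved (Accessible)
    media_type: str = ""   # declared format, e.g. "text/csv" (Interoperable)
    license: str = ""      # reuse terms (Reusable)
    keywords: List[str] = field(default_factory=list)  # search terms (Findable)


def fair_gaps(obj: DigitalObject) -> List[str]:
    """Return the FAIR facets for which this record lacks metadata."""
    checks = {
        "Findable": bool(obj.identifier and obj.keywords),
        "Accessible": bool(obj.access_url),
        "Interoperable": bool(obj.media_type),
        "Reusable": bool(obj.license),
    }
    return [facet for facet, ok in checks.items() if not ok]
```

A record with an identifier, keywords and an access URL but no declared format or license would report gaps for Interoperable and Reusable.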
The Data Commons is a framework that supports FAIR data access and sharing and fosters the development of a digital ecosystem
https://datascience.nih.gov/commons
The Data Commons Framework
Compute Platform: Cloud
Services: APIs, Containers, Indexing
Software: Services & Tools (scientific analysis tools/workflows)
Data: “Reference” Data Sets, User defined data
Digital Object Compliance
App store/User Interface
Cloud service models: IaaS, PaaS, SaaS
https://datascience.nih.gov/commons
Current Data Commons Pilots
Explore feasibility of the Commons Framework
Facilitate collaboration and interoperability
Making large and/or high-impact NIH-funded data sets and tools accessible in the cloud
Developing data and software indexing methods
Leveraging BD2K efforts: bioCADDIE and others
Collaborating with external groups
Provide access to cloud (IaaS) and PaaS/SaaS via credits
Connecting credits to the grants system
Reference Data Sets Pilot
Large, High-Impact Datasets in the Cloud
Commons Framework Pilots
Software and Services
Commons Framework
• FAIRness Metrics
• Data-object registry
• Interoperability of APIs
• Workflow sharing and Docker registry
• Commons Framework Publications
Resource Search & Indexing
Discoverability of data and software
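As a rough illustration of what indexing for discoverability involves, the sketch below builds a keyword index over free-text descriptions of data sets and tools. It is a toy example under stated assumptions: `build_index` and `search` are hypothetical names, and this is not how bioCADDIE or any Commons pilot implements search.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercase word to the set of record IDs whose description contains it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in text.lower().split():
            # Strip simple punctuation so "(reference" matches "reference".
            index[word.strip(".,;:()")].add(record_id)
    return index


def search(index, query):
    """Return IDs of records that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)
```

Because data sets and software share one index, a single query can surface both kinds of digital object.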
Cloud Credits Model
Dollar-denominated NIH credits to use cloud resources (IaaS) and services (PaaS/SaaS)
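One way to picture the credits model is as a simple ledger: an investigator holds a dollar balance and spends it across providers and service models. The sketch below is hypothetical throughout (the `CreditAccount` class, provider names and amounts are invented for illustration) and says nothing about how the NIH pilot actually accounts for credits.

```python
from decimal import Decimal


class CreditAccount:
    """Hypothetical dollar-denominated credit balance for one investigator."""

    def __init__(self, granted: str) -> None:
        self.balance = Decimal(granted)
        self.charges = []  # list of (provider, service model, amount)

    def charge(self, provider: str, model: str, amount: str) -> None:
        """Deduct a charge, refusing any spend beyond the remaining balance."""
        cost = Decimal(amount)
        if cost > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= cost
        self.charges.append((provider, model, cost))

    def spent_by_model(self) -> dict:
        """Total spend per service model (IaaS, PaaS, SaaS)."""
        totals = {}
        for _, model, cost in self.charges:
            totals[model] = totals.get(model, Decimal("0")) + cost
        return totals
```

Using `Decimal` rather than floats keeps dollar amounts exact, which matters if credits are later reconciled against a grants system.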
The Data Commons Framework
Compute Platform: Cloud
Services: APIs, Containers, Indexing
Software: Services & Tools (scientific analysis tools/workflows)
Data: “Reference” Data Sets, User defined data
Digital Object Compliance
App store/User Interface
Cloud service models: IaaS, PaaS, SaaS
https://datascience.nih.gov/commons
Authorization/authentication layer
Digital Ecosystem
Considerations and Concluding Thoughts
Considerations
• Metrics – Understanding and accounting of data usage patterns
• Cost
  • Cloud storage
  • Pay-for-use cloud compute (NIH credits pilot)
  • Indirect costs for cloud
• Hybrid clouds – Institutional (private) and commercial (public) clouds
• Managing open vs controlled-access data
  • Auth: single sign-on - dreams/nightmares?
• Archive vs working copies of data, and versioning
• Interoperability with other Commons (clouds)
• Standards – Metadata, UIDs, APIs
• Discoverability – Finding digital objects across clouds
• Interfaces – For users with different needs and capabilities
• Consent – Reconsenting data, dynamic consents?
• Policies
  • Data sharing policies that are useful and effective
  • Keep pace with use of technology (e.g. dbGaP data in the cloud)
• Incentives
  • Access to, and shareability of, FAIR data as part of NIH grant review criteria
• Governance – Community involvement in governance models
• Sustainability – Long-term support
Relevance to Australia?
Relevance to Australia
• The value of Australian Data *
• Unique flora and fauna
• e.g. marsupials
• Indigenous Australians
• Understanding of genomic structure – health & disease
• Medicinal products
• Making this data (securely) available
• With high-quality annotation and metadata
• Attributions to original authors
• On the cloud
• Via open standard APIs
• Aggregation of data via an Australia-wide Commons?
Authorization/authentication layer
Oz Digital Ecosystem
Summary
• We need an unprecedented level of convergence and collaboration to drive biomedical science to the next level.
• Supporting this model of data-intensive collaborative science requires a shift in academic research culture and new investments in data infrastructure and capabilities.
Matthew Trunnell, FHC
Acknowledgments
• ADDS Office: Jennie Larkin, Phil Bourne, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso, Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS)
• NCBI: George Komatsoulis
• NHGRI: Valentina di Francesco
• NIGMS: Susan Gregurick
• CIT: Andrea Norris, Debbie Sinmao
• NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr
• NCI Cloud Pilots/GDC: Warren Kibbe, Tony Kerlavage, Tanja Davidsen
• Commons Reference Data Set Working Group: Weiniu Gan (HL), Ajay Pillai (HG), Elaine Ayres (BITRIS), Sean Davis (NCI), Vinay Pai (NIBIB), Maria Giovanni (AI), Leslie Derr (CF), Claire Schulkey (AI)
• RIWG Core Team: Ron Margolis (DK), Ian Fore (NCI), Alison Yao (AI), Claire Schulkey (AI), Eric Choi (AI)
• OSP: Dina Paltoo, Kris Langlais, Erin Luetkemeier, Agnes Rooke
• Research and Industry: Matthew Trunnell (FHC), Bob Grossman (Chicago), Toby Bloom (NYGC)
Stay in Touch
QR Business Card
LinkedIn
@Vivien.Bonazzi
Slideshare
Blog
(Coming soon!)

EMBL Australian Bioinformatics Resource AHM - Data Commons


Editor's Notes

  1. Current snapshot of Commons status
  2. Current snapshot of Commons status
  3. The mission of the Office of Science and Technology Policy is threefold: 1) to provide the President and his senior staff with accurate, relevant, and timely scientific and technical advice on all matters of consequence; 2) to ensure that the policies of the Executive Branch are informed by sound science; and 3) to ensure that the scientific and technical work of the Executive Branch is properly coordinated so as to provide the greatest benefit to society.
  4. A detailed description of the Commons Framework can be found at: https://datascience.nih.gov/commons
  5. A detailed description of the Commons Framework can be found at: https://datascience.nih.gov/commons