Section 3:
Commons:
Lessons Learned, current state
The Big Data to Knowledge (BD2K)
Guide to the Fundamentals of Data Science
Vivien Bonazzi
Senior Advisor for Data Science & the Data Commons
National Institutes of Health, Bethesda
February 3, 2017
Vivien Bonazzi
• Leads the Data Commons efforts within the NIH.
• Serves on the NIH Big Data to Knowledge (BD2K) executive committee.
• Dr. Bonazzi received a B.Sc. in Medical Laboratory Science from the University of Canberra, Australia, an M.Sc. (prelim) in Pharmacology from the University of Melbourne, Australia, and a Ph.D. in Molecular Pharmacology and Computational Biology, also from the University of Melbourne.
• Served as a Program Director for the computational biology and bioinformatics program at the National Human Genome Research Institute (NHGRI).
• Was part of the Human Microbiome Project (HMP), a trans-NIH Common Fund initiative. She was responsible for the bioinformatics and computational aspects of the project, as well as managing several of the computational tools awards.
• She has held positions as R&D Director for Bioinformatics at Invitrogen and Director of Gene Discovery at Celera Genomics, where she was part of the team that sequenced and annotated the human, mouse and Drosophila genomes.
Let's Talk About Biomedical Big Data
What Makes Big Data Big?
VOLUME
VELOCITY
VARIETY
VERACITY
It’s a signal of the coming Digital Economy
DATA has VALUE
DATA is CENTRAL to the Digital Economy
But it's more than this...
An economy characterized by using data to gain a business advantage (yes, institutions are a business)
Organizations that are not born digital will be at a disadvantage in the new economy
Organizations will be defined by their digital assets
Scientific digital assets
Data
Software
Workflows
Documentation
Journal Articles
The most successful organizations of the future will be those that can leverage their digital assets and transform them into a digital enterprise
Make data:
The currency of an organization
Usable in digital ecosystems – Data Commons
The problem with biomedical data
Digital assets include data
Challenges with Biomedical Data
The journal article is the end goal
Data is a means to an end (low value)
Data is not FAIR
Findable, Accessible, Interoperable, Reusable
Limited e-infrastructures to support FAIR data
The Problem With Biomedical DATA
(Video: https://www.youtube.com/watch?v=N2zK3sAtr-4)
What's Changing?
FAIR principles drive data to become the currency
Policies that promote data sharing via FAIR help change the culture
We also need a digital ecosystem that allows transactions to occur on FAIR data at scale
The Data Commons is a platform that fosters the development of a digital ecosystem
Treats products of research (data, software, methods, papers, etc.) as digital assets (objects)
Digital objects need to conform to FAIR principles
Digital objects exist in a shared virtual space
- Find, Deposit, Manage, Share and Reuse digital assets
Enables interactions between Producers and Consumers of digital assets
Gives currency to digital assets and the people who develop and support them
The Data Commons is a platform (but what is a platform?) that fosters the development of a digital ecosystem
“A platform is a plug and play model that allows multiple participants (producers and consumers) to connect to it, interact with each other and create value”
Sangeet Paul Choudary – Platform Scale
A lot of what we see today uses a platform approach
Sangeet Paul Choudary – Platform Scale
The goal of a Data Commons platform is to enable interactions between producers and consumers
Sangeet Paul Choudary – Platform Scale
To understand the Data Commons platform (and how it works for biomedical data), we need to use a platform stack to help visualize the concept
Sangeet Paul Choudary – Platform Scale
Platforms have 3 layers
NIH Data Commons - Platform Stack (https://datascience.nih.gov/commons)
[Figure: three-layer platform stack: Technology, Data, Network/marketplace]
NIH Data Commons: Digital Asset Compliance
Making things FAIR
Initial Phase
Unique digital object identifiers resolvable to the original authoritative source
Machine readable
A minimal set of searchable metadata
Clear access rules (especially important for human subjects data)
An entry (with metadata) in one or more indices
Future Phases
Standard, community-based unique digital object identifiers
Conform to community-approved standard metadata and ontologies for enhanced searching
Digital objects accessible via open standard APIs
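The Initial Phase criteria read naturally as a checklist. The Python sketch below is not part of the NIH Data Commons; it only illustrates how a repository might report which criteria a digital object still fails. The `DigitalObject` class, the `initial_phase_gaps` function, the field names, the set of "machine-readable" media types and the three "minimal" metadata fields are hypothetical assumptions, since the slides do not define a metadata schema or identifier format.

```python
from dataclasses import dataclass, field

# Assumed values: the slides name the criteria but do not define these sets.
MINIMAL_METADATA = ("title", "creator", "date")
MACHINE_READABLE = {"text/csv", "text/tab-separated-values", "application/json"}
ACCESS_LEVELS = {"open", "controlled"}


@dataclass
class DigitalObject:
    """Hypothetical record for a digital asset (data, software, workflow, paper)."""
    identifier: str
    resolves_to: str                 # URL of the original authoritative source
    media_type: str                  # format of the object itself
    metadata: dict = field(default_factory=dict)
    access: str = ""                 # expected to be one of ACCESS_LEVELS
    indices: list = field(default_factory=list)  # indices that list this object


def initial_phase_gaps(obj: DigitalObject) -> list[str]:
    """Return the Initial Phase criteria that `obj` does not yet meet."""
    gaps = []
    # "Resolvable" is approximated by having an identifier and an https:// source URL;
    # nothing is actually fetched.
    if not obj.identifier or not obj.resolves_to.startswith("https://"):
        gaps.append("unique identifier resolvable to authoritative source")
    if obj.media_type not in MACHINE_READABLE:
        gaps.append("machine readable")
    missing = [key for key in MINIMAL_METADATA if not obj.metadata.get(key)]
    if missing:
        gaps.append("minimal searchable metadata (missing: " + ", ".join(missing) + ")")
    if obj.access not in ACCESS_LEVELS:
        gaps.append("clear access rules")
    if not obj.indices:
        gaps.append("entry in one or more indices")
    return gaps


obj = DigitalObject(
    identifier="example:0001",
    resolves_to="https://example.org/data/0001",
    media_type="text/csv",
    metadata={"title": "Example cohort", "creator": "Lab A"},
    access="controlled",
    indices=["commons-index"],
)
print(initial_phase_gaps(obj))  # ['minimal searchable metadata (missing: date)']
```

Returning a list of unmet criteria, rather than a single pass/fail flag, fits the phased approach on the slide: an object can be registered early and improved toward the Future Phases (community identifiers, standard ontologies, open APIs).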
Data Commons Platform drives the digital ecosystem
The NIH Data Commons Pilot
Co-location of large and/or highly utilized NIH-funded data with storage and computing infrastructure, plus commonly used tools for analyzing and sharing digital objects, to create an interoperable resource for the research community.
Investigators will be able to collaborate, share digital objects within this environment and connect with others.
Other Data Commons
An NIH-Wide Data Commons Pilot - Example
[Figure, built up over several slides: an indexing layer and an authorization/authentication layer]
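The pilot example pairs an indexing layer (finding digital objects across clouds) with an authorization/authentication layer. The Python sketch below illustrates that separation under stated assumptions: a single in-memory index, a two-value access vocabulary ("open"/"controlled"), and a per-user set of approved identifiers standing in for a real authorization service. `IndexEntry`, `CommonsIndex` and the storage URLs are hypothetical and do not describe the actual pilot implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One hypothetical index record pointing at an object stored in some cloud."""
    identifier: str
    location: str          # storage URL in a particular cloud (illustrative only)
    access: str            # "open" or "controlled" (assumed vocabulary)
    keywords: frozenset    # searchable metadata terms


class CommonsIndex:
    """Hypothetical cross-cloud index with an authorization check on resolve."""

    def __init__(self) -> None:
        self._entries: dict[str, IndexEntry] = {}

    def register(self, entry: IndexEntry) -> None:
        # Indexing layer: record where the object lives and how it may be accessed.
        self._entries[entry.identifier] = entry

    def search(self, term: str) -> list[str]:
        # Discovery: metadata is searchable by anyone, regardless of access level.
        return sorted(i for i, e in self._entries.items() if term in e.keywords)

    def resolve(self, identifier: str, user_grants: set[str]) -> str:
        # Authorization layer: controlled objects require an approved grant.
        entry = self._entries[identifier]
        if entry.access == "controlled" and identifier not in user_grants:
            raise PermissionError(f"{identifier} requires approved access")
        return entry.location


index = CommonsIndex()
index.register(IndexEntry("obj-1", "s3://bucket-a/obj-1", "open", frozenset({"rna-seq"})))
index.register(IndexEntry("obj-2", "gs://bucket-b/obj-2", "controlled",
                          frozenset({"rna-seq", "human"})))
print(index.search("rna-seq"))        # ['obj-1', 'obj-2']
print(index.resolve("obj-1", set()))  # s3://bucket-a/obj-1
```

In this sketch anyone can discover that a controlled object exists, but only an authorized user learns where it is stored. That is one possible design choice; a real deployment would rely on federated identity (for example, single sign-on) rather than a hard-coded grant set.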
Digital Ecosystems
Considerations
• Metrics – Understanding and accounting of data usage patterns
• Cost
• Cloud Storage
• Pay for use cloud compute (NIH credits pilot)
• Indirect costs for cloud
• Hybrid Clouds – Institution (private) and commercial (public) clouds
• Managing Open vs Controlled access data
• Auth: single sign on - dreams/nightmares?
• Archive vs working copies of data, and versioning
• Interoperability with other Commons (clouds)
• Standards – Metadata, UIDs, APIs
• Discoverability – Finding digital objects across clouds
• Interfaces – For users with different needs and capabilities
• Consent – Re-consenting data
• Policies
• Data sharing policies that are useful and effective
• Keep pace with use of technology (e.g. dbGaP data in the Cloud)
• Incentives
• Access to, and shareability of, FAIR data as part of NIH grant review criteria
• Governance – Community involvement in governance models
• Sustainability – Long term support
Acknowledgments
• ADDS Office: Jennie Larkin, Phil Bourne, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso, Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS), Ron Margolis
• NCBI: George Komatsoulis
• NHGRI: Valentina di Francesco, Ajay Pillai
• NIGMS: Susan Gregurick
• CIT: Andrea Norris, Debbie Sinmao
• NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr
• NCI: Ian Fore, Sean Davis, Warren Kibbe, Tony Kerlavage, Tanja Davidsen
• NIAID: Maria Giovanni, Alison Yao, Eric Choi, Claire Schulkey
• NHLBI: Weiniu Gan, Alastair Thomson
• NIH Clinical Center: Elaine Ayres (BITRIS)
• NIBIB: Vinay Pai (DK)
• OSP: Dina Paltoo, Kris Langlais, Erin Luetkemeier, Agnes Rooke
• Research and Industry: Mathew Trunnell (FHC), Bob Grossman (Chicago), Toby Bloom (NYGC)
Stay in Touch
LinkedIn
@Vivien.Bonazzi
Slideshare
Blog (coming soon!)
Vivien Bonazzi
bonazziv@mail.nih.gov

Speaker Notes

  1. Currencies don't exist in a vacuum; they are used to buy and sell goods.
  2. A nascent platform.
  3. Platforms that use data as a central currency enable transactions between producers and consumers.
  4. Producers create digital objects (data, tools, workflows) that are used by consumers. The platform enables these transactions and accommodates both bioinformatics and non-bioinformatics users.
  5. The framework helps visualize the concept of the platform.