SlideShare una empresa de Scribd logo
1 de 35
The Commons
Vivien Bonazzi ADDS Office (OD)
George Komatsoulis (NCBI)
BD2K AHM - Nov 12, 2015
Why do we need a Commons?
NIH DataNIH Data
System.out.println (“the Commons”)
What are the PRINCIPLES of The Commons?
 Supports a digital biomedical ecosystem
 Treats products of research – data, software, methods,
papers etc. as digital research objects
 Digital research objects exist in a shared virtual space
Find, Deposit, Manage, Share and Reuse data,
software, metadata and workflows
 Digital objects need to conform to FAIR principles:
 Findable
 Accessible (and usable)
 Interoperable
 Reusable
What is The Commons Framework:?
 Exploits new scalable computing technologies - Cloud
 Provides physical or logical access to data
 Simplifies access, sharing and interoperability of digital research
objects such as data, software, metadata and workflows
 Makes digital research objects indexable and findable: FAIR
 Provides understanding and accounting of usage patterns
 Is potentially more cost effective given digital growth
 Gives currency to digital objects and the people who develop and
support them
The Commons Framework
Compute Platform: Cloud or HPC
Services: APIs, Containers, Indexing,
Software: Services & Tools
scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
DigitalObjectCompliance
App store/User Interface
The Commons Framework
Compute Platform: Cloud or HPC
Services: APIs, Containers, Indexing,
Software: Services & Tools
scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
DigitalObjectCompliance
App store/User Interface
IaaS
PaaS
SaaS
Commons: Digital Object Compliance
 Attributes of digital research objects in the Commons
Initial Phase
 Unique digital object identifiers of resolvable to original authoritative
source
 Machine readable
 A minimal set of searchable metadata
 Physically available in a cloud based Commons provider
 Clear access rules (especially important for human subjects data)
 An entry (with metadata) in one or more indices
Future Phases
 Standard, community based unique digital object identifiers
 Conform to community approved standard metadata for enhanced
searching
 Digital objects accessible via open standard APIs
 Are physically and logical available to the commons
Commons PILOTS
Commons Pilots - current
 The Cloud Credits Model
 Infrastructure building blocks: IaaS: accessing cloud services for NIH grantees
 Eventual portal for academic and commercial PaaS and SaaS
 Commons Supplements – Data, analysis tools, APIs, containers
 BD2K Centers
 MODs (Model Organism Databases)
 Interoperability (some)
 HMP (Human Microbiome Project)
 NIH Affiliated Commons projects
 NIAID/CF/ADDS - Microbiome/HMP Cloud Pilot
 NCI Cloud Pilots & Genomic Data Commons
Cloud credits
model (CCM)
Compute Platform: Cloud or HPC
Services: APIs, Containers, Indexing,
Software: Services & Tools
scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
DigitalObjectCompliance
App store/User Interface
Mapping pilots to the Commons framework: Cloud
Credits Model: George Komatsoulis
IaaS
PaaS
SaaS
Drivers of the Cloud Credits Model
 Scalability
 Exploiting new computing models
 Cost Effectiveness
 Simplified sharing of digital objects
 FAIR: Findable, Accessible, Interoperable and
Reusable
 Cloud computing supports many of these
objectives
The Cloud Credits Model
The Commons
Cloud Provider
A
Cloud Provider
B
Investigator
NIH
Provides credits Enables Search
Discovery Index
Uses credits in
the Commons
IndexesOption:
Direct Funding
HPC Provider
 Supports simplified data sharing by driving science into publicly
accessible computing environments that still provide for
investigator level access control
 Scalable for the needs of the scientific community for the next 5
years
 Democratize access to data and computational tools
 Cost effective
 Competitive marketplace for biomedical computing services
 Reduces redundancy
 Uses resources efficiently
Advantages of this Model
 Novelty:
Never been tried, so we don’t have data about likelihood of success
 Cost Models:
Assumes stable or declining prices among providers
True for the last several years, but we can’t guarantee that it will continue,
particularly if there is significant consolidation in industry
 Service Providers:
Assumes that providers are willing to make the investment to become
conformant
Market research suggests 3-5 providers within 2-3 months of launch
 Persistence:
 The model is ‘Pay As You Go’ which means if you stop paying it stops
going
 Giving investigators an unprecedented level of control over what lives (or
dies) in the Commons
Potential Disadvantages of this Model
 Minimum set of requirements for
 Business relationships (reseller, investigators)
 Interfaces (upload, download, manage, compute)
 Capacity (storage, compute)
 Networking and Connectivity
 Information Assurance
 Authentication and authorization
 A conformant cloud is not necessarily a provider of
Infrastructure as a Service (IaaS) although all providers
must provide IaaS
What does it mean for a provider to be
conformant?
 NIH intends to run a 3 year pilot to test the efficacy of this business
model in enhancing data sharing and reducing costs.
 Pilot will not directly interact with the existing grant system,
rather, it is being modeled on the mechanisms being used to gain
access to NSF and DOE national resources (HPC, light sources,
etc.)
 The only required qualification for applying for credits will be that the
investigator has an existing NIH grant
 A major element will be the collection of metrics to assess
effectiveness of this model
Pilot of the Commons Cloud Credits
Business Model
 NIH recently completed a contract with the CAMH Federally
Funded Research and Development Center (FFRDC) to act as the
coordinating center for this effort.
 We need you to:
 Identify what capabilities will be useful to investigators
 Provide guidance on the conformance requirements
 Help identify good metrics
 Define the criteria that are used to decide if credit requests are
selected
Status and requests
Co-moderated by Victor Jongeneel (UIUC), Valentina di Francesco
(NHGRI) & George Komatsoulis (NCBI)
FRIDAY 10:45 AM – 12:30 PM
Cloud Credits Model
Breakout Session Tomorrow!
BD2K Centers,
MODS, HMP &
Interoperability
Supplements
Compute Platform: Cloud or HPC
Services: APIs, Containers, Indexing,
Software: Services & Tools
scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
DigitalObjectCompliance
App store/User Interface
Mapping pilots to the Commons framework: BD2K
centers, HMP, MODs & Interoperability
PaaS
SaaS
 Testing the Commons framework
 Facilitating connectivity, interoperability and access to digital
research objects
 Interoperable (APIs, containers)
 Digital object compliant: FAIR
 Indexable
 Publishable
 Privacy/security (PHI)
 Available on cloud platforms
 Providing digital research objects to populate the Commons
Commons Supplements:
BD2K Centers, MODs, HMP & Interoperability
Moderated by Valentina di Francesco (NHGRI)
& Vivien Bonazzi (ADDS)
FRIDAY 2:00 – 3:30 PM
Commons Supplements
Program Session Tomorrow!
NCI & NIAID
Cloud
Pilots/GDC
Compute Platform: Cloud or HPC
Services: APIs, Containers, Indexing,
Software: Services & Tools
scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
DigitalObjectCompliance
App store/User Interface
Mapping pilots to the Commons framework: NIAID
and NCI Cloud Pilots/GDC
:
PaaS
SaaS
IaaS
 Making Human Microbiome Project (HMP) data broadly accessible,
computable, and usable.
 Moving ~20TB of HMP data to AWS
 Providing access to a suite of tools and APIs to facilitate data access and
use
 Data and tools will follow FAIR principles (digital object compliance)
* In collaboration with Owen White (UM) – HMP DCC
NAID/CF/ADDS Microbiome* Cloud Pilot
HMP data and tools in the AWS cloud
 Making cancer genomics data broadly accessible, computable, and
usable by researchers worldwide.
 Genomic Data Commons (GDC) will store, analyze and distribute
~2.5 PB of cancer genomics data and associated clinical data
generated by the TCGA and TARGET (Therapeutically Applicable
Research to Generate Effective Treatments) initiatives
 The NCI cloud pilots will make TCGA data available on the AWS
and Google clouds, along with a suite of tools and APIs to facilitate
their access and use
NCI Cloud pilots and
Genomic Data Commons
THURSDAY 2:00 – 4:00 PM
Commons Pilots
Poster Session Today!
Commons Pilots - proposed
 Indexing and Searching digital research objects in
the Commons
 Leveraging indexing methods within BD2K
e.g. BioCADDIE
 Accessing large, high value commonly used data
sets in the cloud
 E.g 1000 genomes, HMP, modENCODE
 Co-location of data and computes
 Digital research objects are FAIR
BD2K Indexing
e.g. BioCADDIE
Compute Platform: Cloud or HPC
Services: APIs, Containers, Indexing,
Software: Services & Tools
scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
DigitalObjectCompliance
App store/User Interface
Mapping proposed pilots to the Commons
framework : Indexing & Large Data Sets
NIH + Community
defined data sets
Commons Pilots - proposed
 Indexing and Searching digital research objects in
the Commons
 Leveraging indexing methods within BD2K
e.g. BioCADDIE
 Accessing large, high value commonly used data
sets in the cloud
 E.g 1000 genomes, HMP, modENCODE
 Co-location of data and computes
 Digital research objects are FAIR
Thankyou
 ADDS Office
Phil Bourne, Michelle Dunn, Jennie Larkin, Mark Guyer, Sonynka Ngosso
 NCBI: George Komatsoulis
 NHGRI: Valentina di Francesco, Kevin Lee
 CIT: Debbie Sinmao, Andrea Norris, Stacy Charland
 Trans NIH BD2K Executive Committee & Working groups
 NCI: Warren Kibbe, Tony Kerlavage, Lou Staudt, Tanja Davidsen
 NIAID: Nick Weber, Darrell Hurt, Maria Giovanni, JJ McGowan
 Many biomedical researchers, cloud providers, IT professionals
QUESTIONS?
Come to the poster session!

Más contenido relacionado

La actualidad más candente

Komatsoulis internet2 executive track
Komatsoulis internet2 executive trackKomatsoulis internet2 executive track
Komatsoulis internet2 executive trackGeorge Komatsoulis
 
Komatsoulis internet2 global forum 2015
Komatsoulis internet2 global forum 2015Komatsoulis internet2 global forum 2015
Komatsoulis internet2 global forum 2015George Komatsoulis
 
Mobile Data Analytics
Mobile Data AnalyticsMobile Data Analytics
Mobile Data AnalyticsRICHARD AMUOK
 
A Framework for Geospatial Web Services for Public Health by Dr. Leslie Lenert
A Framework for Geospatial Web Services for Public Health by Dr. Leslie LenertA Framework for Geospatial Web Services for Public Health by Dr. Leslie Lenert
A Framework for Geospatial Web Services for Public Health by Dr. Leslie LenertWansoo Im
 
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...Edward Curry
 
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Edward Curry
 
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data MeetupCrowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data MeetupEdward Curry
 
Linked Building (Energy) Data
Linked Building (Energy) DataLinked Building (Energy) Data
Linked Building (Energy) DataEdward Curry
 
Key Technology Trends for Big Data in Europe
Key Technology Trends for Big Data in EuropeKey Technology Trends for Big Data in Europe
Key Technology Trends for Big Data in EuropeEdward Curry
 
Big data Mining Using Very-Large-Scale Data Processing Platforms
Big data Mining Using Very-Large-Scale Data Processing PlatformsBig data Mining Using Very-Large-Scale Data Processing Platforms
Big data Mining Using Very-Large-Scale Data Processing PlatformsIJERA Editor
 
Big Data Systems: Past, Present & (Possibly) Future with @techmilind
Big Data Systems: Past, Present &  (Possibly) Future with @techmilindBig Data Systems: Past, Present &  (Possibly) Future with @techmilind
Big Data Systems: Past, Present & (Possibly) Future with @techmilindEMC
 
Paving the way to open and interoperable research data service workflows Prog...
Paving the way to open and interoperable research data service workflows Prog...Paving the way to open and interoperable research data service workflows Prog...
Paving the way to open and interoperable research data service workflows Prog...ResearchSpace
 
Opportunities and Challenges for International Cooperation Around Big Data
Opportunities and Challenges for International Cooperation Around Big DataOpportunities and Challenges for International Cooperation Around Big Data
Opportunities and Challenges for International Cooperation Around Big DataPhilip Bourne
 
A HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTURE
A HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTUREA HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTURE
A HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTUREijccsa
 
Research data management & planning: an introduction
Research data management & planning: an introductionResearch data management & planning: an introduction
Research data management & planning: an introductionMaggie Neilson
 
EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...
EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...
EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...European Data Forum
 
A Survey on Big Data Mining Challenges
A Survey on Big Data Mining ChallengesA Survey on Big Data Mining Challenges
A Survey on Big Data Mining ChallengesEditor IJMTER
 

La actualidad más candente (19)

Komatsoulis internet2 executive track
Komatsoulis internet2 executive trackKomatsoulis internet2 executive track
Komatsoulis internet2 executive track
 
Komatsoulis internet2 global forum 2015
Komatsoulis internet2 global forum 2015Komatsoulis internet2 global forum 2015
Komatsoulis internet2 global forum 2015
 
Mobile Data Analytics
Mobile Data AnalyticsMobile Data Analytics
Mobile Data Analytics
 
A Framework for Geospatial Web Services for Public Health by Dr. Leslie Lenert
A Framework for Geospatial Web Services for Public Health by Dr. Leslie LenertA Framework for Geospatial Web Services for Public Health by Dr. Leslie Lenert
A Framework for Geospatial Web Services for Public Health by Dr. Leslie Lenert
 
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
 
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
 
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data MeetupCrowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
 
Smart Geo. Guido Satta (Maggio 2015)
Smart Geo. Guido Satta (Maggio 2015)Smart Geo. Guido Satta (Maggio 2015)
Smart Geo. Guido Satta (Maggio 2015)
 
thesis defense1
thesis defense1thesis defense1
thesis defense1
 
Linked Building (Energy) Data
Linked Building (Energy) DataLinked Building (Energy) Data
Linked Building (Energy) Data
 
Key Technology Trends for Big Data in Europe
Key Technology Trends for Big Data in EuropeKey Technology Trends for Big Data in Europe
Key Technology Trends for Big Data in Europe
 
Big data Mining Using Very-Large-Scale Data Processing Platforms
Big data Mining Using Very-Large-Scale Data Processing PlatformsBig data Mining Using Very-Large-Scale Data Processing Platforms
Big data Mining Using Very-Large-Scale Data Processing Platforms
 
Big Data Systems: Past, Present & (Possibly) Future with @techmilind
Big Data Systems: Past, Present &  (Possibly) Future with @techmilindBig Data Systems: Past, Present &  (Possibly) Future with @techmilind
Big Data Systems: Past, Present & (Possibly) Future with @techmilind
 
Paving the way to open and interoperable research data service workflows Prog...
Paving the way to open and interoperable research data service workflows Prog...Paving the way to open and interoperable research data service workflows Prog...
Paving the way to open and interoperable research data service workflows Prog...
 
Opportunities and Challenges for International Cooperation Around Big Data
Opportunities and Challenges for International Cooperation Around Big DataOpportunities and Challenges for International Cooperation Around Big Data
Opportunities and Challenges for International Cooperation Around Big Data
 
A HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTURE
A HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTUREA HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTURE
A HEALTH RESEARCH COLLABORATION CLOUD ARCHITECTURE
 
Research data management & planning: an introduction
Research data management & planning: an introductionResearch data management & planning: an introduction
Research data management & planning: an introduction
 
EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...
EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...
EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...
 
A Survey on Big Data Mining Challenges
A Survey on Big Data Mining ChallengesA Survey on Big Data Mining Challenges
A Survey on Big Data Mining Challenges
 

Similar a The NIH Data Commons - BD2K All Hands Meeting 2015

The Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big DataThe Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big DataPhilip Bourne
 
BD2K and the Commons : ELIXR All Hands
BD2K and the Commons : ELIXR All Hands BD2K and the Commons : ELIXR All Hands
BD2K and the Commons : ELIXR All Hands Vivien Bonazzi
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformSanjay Padhi, Ph.D
 
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)EUDAT
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Denodo
 
The potential of the cloud
The potential of the cloudThe potential of the cloud
The potential of the cloudJisc
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfkalai75
 
Cloud Services for Repositories
Cloud Services for RepositoriesCloud Services for Repositories
Cloud Services for RepositoriesEduserv
 
GSA on Cloud Computing and More
GSA on Cloud Computing and MoreGSA on Cloud Computing and More
GSA on Cloud Computing and Moreguest163bca0
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Shirshanka Das
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Yael Garten
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution vty
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Tomasz Bednarz
 
Introduction to Grid Computing
Introduction to Grid ComputingIntroduction to Grid Computing
Introduction to Grid Computingabhijeetnawal
 
Scientific Cloud Computing: Present & Future
Scientific Cloud Computing: Present & FutureScientific Cloud Computing: Present & Future
Scientific Cloud Computing: Present & Futurestratuslab
 

Similar a The NIH Data Commons - BD2K All Hands Meeting 2015 (20)

The Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big DataThe Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big Data
 
BD2K and the Commons : ELIXR All Hands
BD2K and the Commons : ELIXR All Hands BD2K and the Commons : ELIXR All Hands
BD2K and the Commons : ELIXR All Hands
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
 
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA)
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)
 
20230525_mmc_seminar.pdf
20230525_mmc_seminar.pdf20230525_mmc_seminar.pdf
20230525_mmc_seminar.pdf
 
The potential of the cloud
The potential of the cloudThe potential of the cloud
The potential of the cloud
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdf
 
Bibliotheken en cloud computing
Bibliotheken en cloud computingBibliotheken en cloud computing
Bibliotheken en cloud computing
 
Cloud Services for Repositories
Cloud Services for RepositoriesCloud Services for Repositories
Cloud Services for Repositories
 
Thesis Defense MBI
Thesis Defense MBIThesis Defense MBI
Thesis Defense MBI
 
Data 2.0|
Data 2.0|Data 2.0|
Data 2.0|
 
GSA on Cloud Computing and More
GSA on Cloud Computing and MoreGSA on Cloud Computing and More
GSA on Cloud Computing and More
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
 
Introduction to Grid Computing
Introduction to Grid ComputingIntroduction to Grid Computing
Introduction to Grid Computing
 
Scientific Cloud Computing: Present & Future
Scientific Cloud Computing: Present & FutureScientific Cloud Computing: Present & Future
Scientific Cloud Computing: Present & Future
 
Computer project
Computer projectComputer project
Computer project
 

Último

COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxRizalinePalanog2
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...Lokesh Kothari
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 

Último (20)

COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 

The NIH Data Commons - BD2K All Hands Meeting 2015

  • 1. The Commons Vivien Bonazzi ADDS Office (OD) George Komatsoulis (NCBI) BD2K AHM - Nov 12, 2015
  • 2. Why do we need a Commons?
  • 4.
  • 5.
  • 6.
  • 8. What are the PRINCIPLES of The Commons?  Supports a digital biomedical ecosystem  Treats products of research – data, software, methods, papers etc. as digital research objects  Digital research objects exist in a shared virtual space Find, Deposit, Manage, Share and Reuse data, software, metadata and workflows  Digital objects need to conform to FAIR principles:  Findable  Accessible (and usable)  Interoperable  Reusable
  • 9. What is The Commons Framework:?  Exploits new scalable computing technologies - Cloud  Provides physical or logical access to data  Simplifies access, sharing and interoperability of digital research objects such as data, software, metadata and workflows  Makes digital research objects indexable and findable: FAIR  Provides understanding and accounting of usage patterns  Is potentially more cost effective given digital growth  Gives currency to digital objects and the people who develop and support them
  • 10. The Commons Framework Compute Platform: Cloud or HPC Services: APIs, Containers, Indexing, Software: Services & Tools scientific analysis tools/workflows Data “Reference” Data Sets User defined data DigitalObjectCompliance App store/User Interface
  • 11. The Commons Framework Compute Platform: Cloud or HPC Services: APIs, Containers, Indexing, Software: Services & Tools scientific analysis tools/workflows Data “Reference” Data Sets User defined data DigitalObjectCompliance App store/User Interface IaaS PaaS SaaS
  • 12. Commons: Digital Object Compliance  Attributes of digital research objects in the Commons Initial Phase  Unique digital object identifiers of resolvable to original authoritative source  Machine readable  A minimal set of searchable metadata  Physically available in a cloud based Commons provider  Clear access rules (especially important for human subjects data)  An entry (with metadata) in one or more indices Future Phases  Standard, community based unique digital object identifiers  Conform to community approved standard metadata for enhanced searching  Digital objects accessible via open standard APIs  Are physically and logical available to the commons
  • 14. Commons Pilots - current  The Cloud Credits Model  Infrastructure building blocks: IaaS: accessing cloud services for NIH grantees  Eventual portal for academic and commercial PaaS and SaaS  Commons Supplements – Data, analysis tools, APIs, containers  BD2K Centers  MODs (Model Organism Databases)  Interoperability (some)  HMP (Human Microbiome Project)  NIH Affiliated Commons projects  NIAID/CF/ADDS - Microbiome/HMP Cloud Pilot  NCI Cloud Pilots & Genomic Data Commons
  • 15. Cloud credits model (CCM) Compute Platform: Cloud or HPC Services: APIs, Containers, Indexing, Software: Services & Tools scientific analysis tools/workflows Data “Reference” Data Sets User defined data DigitalObjectCompliance App store/User Interface Mapping pilots to the Commons framework: Cloud Credits Model: George Komatsoulis IaaS PaaS SaaS
  • 16. Drivers of the Cloud Credits Model  Scalability  Exploiting new computing models  Cost Effectiveness  Simplified sharing of digital objects  FAIR: Findable, Accessible, Interoperable and Reusable  Cloud computing supports many of these objectives
  • 17. The Cloud Credits Model The Commons Cloud Provider A Cloud Provider B Investigator NIH Provides credits Enables Search Discovery Index Uses credits in the Commons IndexesOption: Direct Funding HPC Provider
  • 18.  Supports simplified data sharing by driving science into publicly accessible computing environments that still provide for investigator level access control  Scalable for the needs of the scientific community for the next 5 years  Democratize access to data and computational tools  Cost effective  Competitive marketplace for biomedical computing services  Reduces redundancy  Uses resources efficiently Advantages of this Model
  • 19.  Novelty: Never been tried, so we don’t have data about likelihood of success  Cost Models: Assumes stable or declining prices among providers True for the last several years, but we can’t guarantee that it will continue, particularly if there is significant consolidation in industry  Service Providers: Assumes that providers are willing to make the investment to become conformant Market research suggests 3-5 providers within 2-3 months of launch  Persistence:  The model is ‘Pay As You Go’ which means if you stop paying it stops going  Giving investigators an unprecedented level of control over what lives (or dies) in the Commons Potential Disadvantages of this Model
  • 20.  Minimum set of requirements for  Business relationships (reseller, investigators)  Interfaces (upload, download, manage, compute)  Capacity (storage, compute)  Networking and Connectivity  Information Assurance  Authentication and authorization  A conformant cloud is not necessarily a provider of Infrastructure as a Service (IaaS) although all providers must provide IaaS What does it mean for a provider to be conformant?
  • 21.  NIH intends to run a 3 year pilot to test the efficacy of this business model in enhancing data sharing and reducing costs.  Pilot will not directly interact with the existing grant system, rather, it is being modeled on the mechanisms being used to gain access to NSF and DOE national resources (HPC, light sources, etc.)  The only required qualification for applying for credits will be that the investigator has an existing NIH grant  A major element will be the collection of metrics to assess effectiveness of this model Pilot of the Commons Cloud Credits Business Model
  • 22.  NIH recently completed a contract with the CAMH Federally Funded Research and Development Center (FFRDC) to act as the coordinating center for this effort.  We need you to:  Identify what capabilities will be useful to investigators  Provide guidance on the conformance requirements  Help identify good metrics  Define the criteria that are used to decide if credit requests are selected Status and requests
  • 23. Co-moderated by Victor Jongeneel (UIUC), Valentina di Francesco (NHGRI) & George Komatsoulis (NCBI) FRIDAY 10:45 AM – 12:30 PM Cloud Credits Model Breakout Session Tomorrow!
  • 24. BD2K Centers, MODS, HMP & Interoperability Supplements Compute Platform: Cloud or HPC Services: APIs, Containers, Indexing, Software: Services & Tools scientific analysis tools/workflows Data “Reference” Data Sets User defined data DigitalObjectCompliance App store/User Interface Mapping pilots to the Commons framework: BD2K centers, HMP, MODs & Interoperability PaaS SaaS
  • 25.  Testing the Commons framework  Facilitating connectivity, interoperability and access to digital research objects  Interoperable (APIs, containers)  Digital object compliant: FAIR  Indexable  Publishable  Privacy/security (PHI)  Available on cloud platforms  Providing digital research objects to populate the Commons Commons Supplements: BD2K Centers, MODs, HMP & Interoperability
  • 26. Moderated by Valentina di Francesco (NHGRI) & Vivien Bonazzi (ADDS) FRIDAY 2:00 – 3:30 PM Commons Supplements Program Session Tomorrow!
  • 27. NCI & NIAID Cloud Pilots/GDC Compute Platform: Cloud or HPC Services: APIs, Containers, Indexing, Software: Services & Tools scientific analysis tools/workflows Data “Reference” Data Sets User defined data DigitalObjectCompliance App store/User Interface Mapping pilots to the Commons framework: NIAID and NCI Cloud Pilots/GDC : PaaS SaaS IaaS
  • 28.  Making Human Microbiome Project (HMP) data broadly accessible, computable, and usable.  Moving ~20TB of HMP data to AWS  Providing access to a suite of tools and APIs to facilitate data access and use  Data and tools will follow FAIR principles (digital object compliance) * In collaboration with Owen White (UM) – HMP DCC NAID/CF/ADDS Microbiome* Cloud Pilot HMP data and tools in the AWS cloud
  • 29.  Making cancer genomics data broadly accessible, computable, and usable by researchers worldwide.  Genomic Data Commons (GDC) will store, analyze and distribute ~2.5 PB of cancer genomics data and associated clinical data generated by the TCGA and TARGET (Therapeutically Applicable Research to Generate Effective Treatments) initiatives  The NCI cloud pilots will make TCGA data available on the AWS and Google clouds, along with a suite of tools and APIs to facilitate their access and use NCI Cloud pilots and Genomic Data Commons
  • 30. THURSDAY 2:00 – 4:00 PM Commons Pilots Poster Session Today!
  • 31. Commons Pilots - proposed  Indexing and Searching digital research objects in the Commons  Leveraging indexing methods within BD2K e.g. BioCADDIE  Accessing large, high value commonly used data sets in the cloud  E.g 1000 genomes, HMP, modENCODE  Co-location of data and computes  Digital research objects are FAIR
  • 32. BD2K Indexing e.g. BioCADDIE Compute Platform: Cloud or HPC Services: APIs, Containers, Indexing, Software: Services & Tools scientific analysis tools/workflows Data “Reference” Data Sets User defined data DigitalObjectCompliance App store/User Interface Mapping proposed pilots to the Commons framework : Indexing & Large Data Sets NIH + Community defined data sets
  • 33. Commons Pilots - proposed  Indexing and Searching digital research objects in the Commons  Leveraging indexing methods within BD2K e.g. BioCADDIE  Accessing large, high value commonly used data sets in the cloud  E.g 1000 genomes, HMP, modENCODE  Co-location of data and computes  Digital research objects are FAIR
  • 34. Thankyou  ADDS Office Phil Bourne, Michelle Dunn, Jennie Larkin, Mark Guyer, Sonynka Ngosso  NCBI: George Komatsoulis  NHGRI: Valentina di Francesco, Kevin Lee  CIT: Debbie Sinmao, Andrea Norris, Stacy Charland  Trans NIH BD2K Executive Committee & Working groups  NCI: Warren Kibbe, Tony Kerlavage, Lou Staudt, Tanja Davidsen  NIAID: Nick Weber, Darrell Hurt, Maria Giovanni, JJ McGowan  Many biomedical researchers, cloud providers, IT professionals
  • 35. QUESTIONS? Come to the poster session!

Notas del editor

  1. There is not enough funding for every researcher to house all the data they need Analyzing the data is more expensive than producing it It can take weeks to download large datasets
  2. Mimimum Requirements: Business relationship is to allow distribution and billing of credits and to ensure that liability issues are resolved. Investigator that puts digital object in the commons is the one that retains the liability associated with its use. Interfaces – would need to be open, but not necessarily open-source. Requires support for basic operations. In addition, environment has to be open to all; so a private environment behind a university firewall won’t work. Identifiers and metadata: Tied together and together enable researchers to search for and find resources. Networking and Connectivity: Make sure that stuff is accessible, require connection to commodity internet and internet2, but key element from investigator point of view is a free egress tier for academics Environment is secure A&A: Must support inCommon because most NIH investigators have it. Minimizes hassle of granting access to collaborators across multiple platforms. Approval of clouds: Self certify vs. NIH certify vs. 3rd party certify. In early test cases, may simply say ‘FedRamped’ Cloud vs IaaS: Some IaaS (AWS comes to mind) may be uninterested in providing the ‘conformant’ layer but support other companies that provide these services using AWS backend. Already exemplars of this: Seven Bridges Genomics and the Cancer Genomics Cloud Pilots are all software layers over an IaaS provider.