1. The Data Commons
Digital Ecosystems for Sharing and
Analyzing Biomedical Big Data
Vivien Bonazzi, Ph.D.
Senior Advisor for Data Science
Office of Data Science (ADDS)
National Institutes of Health
4. What Makes Big Data Big?
VOLUME
VELOCITY
VARIETY
VERACITY
5. It’s a signal of the coming Digital Economy
DATA has VALUE
DATA is CENTRAL to the Digital Economy
But it's more than this…
6. An economy characterized by
using data to gain a business
advantage
(yes, institutions are a business)
Organizations that are not born
digital will be at a disadvantage
in the new economy
7. Organizations will be defined by their digital assets
Scientific digital assets
Data
Software
Workflows
8. The most successful organizations of the future will
be those that can leverage their digital assets and
transform them into a digital enterprise
11. Challenges: Biomedical Data
The Journal Article is the end goal
Data is a means to an end (low value)
Data is not FAIR
Findable, Accessible, Interoperable,
Reusable
Limited e-infrastructures to
14. FAIR principles drive data to become the currency
Policies that promote data sharing via FAIR help
change the culture
15. We also need a digital ecosystem that allows
transactions to occur on FAIR data
at scale
16. The Data Commons
is a platform
that fosters the development of a digital ecosystem
17. The Data Commons: a platform that fosters development of a digital
ecosystem
Treats products of research – data, software, methods, papers, etc. – as
digital assets (objects)
Digital objects need to conform to FAIR principles
Digital objects exist in a shared virtual space
- Find, deposit, manage, share and reuse digital assets
Enables interactions between Producers and Consumers of digital assets
Gives currency to digital assets and the people who develop and support
them
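A minimal sketch of the shared space described on this slide: producers deposit digital objects, consumers find and reuse them, and each reuse is counted as credit for the producer. The class, method names, and record fields are all hypothetical and in-memory only; nothing here describes how the NIH Data Commons is implemented.

```python
class ToyCommons:
    """Toy in-memory shared space for digital objects (illustrative only)."""

    def __init__(self):
        self.objects = {}    # identifier -> {"producer", "kind", "keywords"}
        self.reuse_log = []  # (identifier, consumer) pairs, one per reuse

    def deposit(self, identifier, producer, kind, keywords):
        """A producer places a digital object in the shared space."""
        self.objects[identifier] = {
            "producer": producer,
            "kind": kind,
            "keywords": set(keywords),
        }

    def find(self, keyword):
        """Return identifiers of objects tagged with the keyword, sorted."""
        return sorted(
            ident for ident, rec in self.objects.items()
            if keyword in rec["keywords"]
        )

    def reuse(self, identifier, consumer):
        """A consumer reuses an object; the reuse is logged for credit."""
        record = self.objects[identifier]  # KeyError if never deposited
        self.reuse_log.append((identifier, consumer))
        return record

    def credit(self, producer):
        """Count how often this producer's objects have been reused."""
        return sum(
            1 for ident, _ in self.reuse_log
            if self.objects[ident]["producer"] == producer
        )


commons = ToyCommons()
commons.deposit("obj-1", "lab-a", "data", ["genome", "variation"])
commons.deposit("obj-2", "lab-a", "workflow", ["variation"])
commons.deposit("obj-3", "lab-b", "software", ["imaging"])

hits = commons.find("variation")    # both lab-a objects match
commons.reuse("obj-1", "consumer-x")
commons.reuse("obj-2", "consumer-y")
print(hits, commons.credit("lab-a"), commons.credit("lab-b"))
```

Counting reuse per producer is one simple way to picture "giving currency" to digital assets; a real platform would need persistent identifiers, access control, and richer metrics.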
18. The Data Commons
is a platform?
that fosters the development of a digital ecosystem
19. “A platform is a plug and play model
that allows multiple participants (producers and
consumers) to connect to it, interact with each
other and create value”
Sangeet Paul Choudary – Platform Scale
20. A lot of what we see today uses a platform approach
Sangeet Paul Choudary – Platform Scale
21. The goal of a Data Commons Platform is to
enable interactions between producers and
consumers
Sangeet Paul Choudary – Platform Scale
22. To understand the
Data Commons Platform
(and how it works for biomedical data)
we need to use a Platform stack
to help visualize the concept
26. NIH Data Commons: Digital Asset Compliance
Making things FAIR
Initial Phase
Unique digital object identifiers, resolvable to the original authoritative source
Machine readable
A minimal set of searchable metadata
Clear access rules (especially important for human subjects data)
An entry (with metadata) in one or more indices
Future Phases
Standard, community-based unique digital object identifiers
Conform to community-approved standard metadata and ontologies for
enhanced searching
Digital objects accessible via open standard APIs
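The initial-phase criteria on this slide can be sketched as a simple checklist function. This is an illustration under stated assumptions: the record layout, the field names, and the three "minimal metadata" keys are hypothetical, since the slides do not enumerate them, and this is not an NIH schema.

```python
from dataclasses import dataclass, field


# Hypothetical record layout; field names are illustrative, not an NIH schema.
@dataclass
class DigitalObject:
    identifier: str = ""            # unique ID resolvable to the authoritative source
    machine_readable: bool = False
    metadata: dict = field(default_factory=dict)
    access_rules: str = ""          # e.g. "open" or "controlled"
    indices: list = field(default_factory=list)


# Assumed minimal metadata keys; the slides do not say which fields are required.
REQUIRED_METADATA = ("title", "creator", "type")


def initial_phase_gaps(obj):
    """Return the initial-phase criteria that the object does not yet meet."""
    gaps = []
    if not obj.identifier:
        gaps.append("unique identifier")
    if not obj.machine_readable:
        gaps.append("machine readable")
    if any(key not in obj.metadata for key in REQUIRED_METADATA):
        gaps.append("minimal searchable metadata")
    if not obj.access_rules:
        gaps.append("clear access rules")
    if not obj.indices:
        gaps.append("entry in one or more indices")
    return gaps


example = DigitalObject(
    identifier="example-id-0001",   # placeholder, not a real identifier
    machine_readable=True,
    metadata={"title": "Example", "creator": "Lab A", "type": "dataset"},
    access_rules="controlled",
)
print(initial_phase_gaps(example))  # ['entry in one or more indices']
```

Returning the list of unmet criteria, rather than a single pass/fail flag, shows a depositor what still needs attention before an object counts as compliant.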
31. The NIH Data Commons Pilot
Co-location of large and/or highly
utilized NIH funded data with
storage and computing infrastructure +
Commonly used tools for analyzing and
sharing digital objects
to create an interoperable resource for
the research community.
Investigators will be able to collaborate
and share digital objects within this
environment and connect with others
38. Considerations
Metrics - understanding and accounting of data usage patterns
Cost - Cloud Storage, pay for use cloud compute (NIH credits)
Hybrid Clouds – Mix of research and commercial clouds
Connecting - Interoperability with other Commons, clouds
Consent - Reconsenting data, Dynamic consents
Standards – Metadata, UIDs, APIs
40. A Garvan Data Commons Platform?
Garvan DATA
NCI + Cloud
Analysis tools (incl. 3rd party)
App Store
Community – Research, Clinical, Public
API connectivity
with other
Commons
41. An Australian Data Commons?
Australian DATA - Flora and Fauna
NCI + Commercial Cloud
Analysis tools (incl. 3rd party)
App Store
Community – Research, Clinical, Public
API connectivity
with other
Commons
42. “To achieve great things, two things are
needed: a plan and not quite enough
time”
Leonard Bernstein
43. Thank you
• ADDS Office
- Phil Bourne, Michelle Dunn, Jennie Larkin, Mark Guyer, Sonynka Ngosso
• NCBI: Jim Ostell, David Lipman, George Komatsoulis
• NHGRI: Valentina di Francesco, Kevin Lee, Eric Green
• NIGMS: John Lorsch, Susan Gregurik
• CIT: Andrea Norris, Debbie Sinmao, Stacy Charland
• NCI: Warren Kibbe, Tony Kerlavage, Lou Staudt, Tanja Davidsen, Ian Fore
• NIAID: JJ McGowan, Nick Weber, Darrell Hurt, Maria Giovanni
• The NIH Common Fund: Betsy Wilder, Jim Anderson, Leslie Derr
• Trans NIH BD2K Executive Committee & Working groups
• Many biomedical researchers, cloud providers, IT professionals
• John Mattick and the Garvan Institute
44. Stay in Touch
QR Business Card
LinkedIn
@Vivien.Bonazzi
Slideshare
Blog
(Coming soon!)
Vivien Bonazzi
bonazziv@mail.nih.gov
Editor's notes
Currencies don’t exist in a vacuum
Buy and sell Goods
A nascent platform
Platforms that utilize data as a central currency – enable transactions between producers and consumers
Producers of digital objects - data, tools, workflows - used by consumers
The Platform enables these transactions –
Accommodates bioinformatics and non-bioinformatics users
Framework helps visualize the concept of the platform
* All Garvan data + tools in an authorized, access-controlled environment – allow access to approved users
* Hybrid Clouds: NCI (National Computational Infrastructure) + Commercial (AWS)
Allow approved users, Garvan or others including commercial vendors (e.g. DNAnexus), to develop tools (SaaS) on top of the Garvan data
API connections to other Commons – NY Genome Center
* Beacon projects - variation
Develop an Australian Data Commons
Make ALL Australian data (flora and fauna, incl. human clinical data) available in a data commons cloud (mix of NCI and commercial cloud)
Encourage tool development from bioinformatics research or commercial groups
Make the commons interoperable with other Cloud Commons
Use NCBI and EBI as an archive – learn their annotation methods for metadata and their data distribution methods and cloud access.
Embed Postdocs within NCBI and EBI to learn these methods and bring them back to Australia. Develop a team approach
Use this as a way to train the next generation of scientists – bioinformatics (Bfx) and non-Bfx