Data Commons: Bonazzi, BD2K Fundamentals of Science, Feb 2017
1. Section 3:
Commons:
Lessons Learned, current state
The Big Data to Knowledge (BD2K)
Guide to the Fundamentals of Data Science
Vivien Bonazzi
Senior Advisor for Data Science & the Data Commons
National Institutes of Health, Bethesda
February 3, 2017
2. Vivien Bonazzi
• Leads the Data Commons efforts within the NIH.
• Serves on the NIH Big Data to Knowledge (BD2K) executive
committee
• Dr. Bonazzi received a B.Sc. in Medical Laboratory Science from
the University of Canberra, Australia, a M.Sc. (prelim) in
Pharmacology from the University of Melbourne, Australia and
a Ph.D. in Molecular Pharmacology and Computational Biology
also from the University of Melbourne.
• Served as a Program Director for the computational biology and
bioinformatics program for National Human Genome Research
Institute (NHGRI)
• Was part of the Human Microbiome Project (HMP) a trans-NIH Common Fund Initiative.
She was responsible for the bioinformatics & computational aspect of the project as well as
managing several of the computational tools awards.
• She has held positions as the R&D Director for Bioinformatics at Invitrogen and Director of
Gene Discovery at Celera Genomics where she was part of the team that sequenced and
annotated the human, mouse and drosophila genomes.
5. What Makes Big Data Big?
VOLUME
VELOCITY
VARIETY
VERACITY
6. It’s a signal of the coming Digital Economy
DATA has VALUE
DATA is CENTRAL to the Digital Economy
But it's more than this…
7. An economy characterized by
using data to gain a business
advantage
(yes, institutions are a business)
Organizations that are not born
digital will be at a disadvantage in
the new economy
8. Organizations will be defined by their digital assets
Scientific digital assets
Data
Software
Workflows
Documentation
Journal Articles
9. The most successful organizations of the future will be
those that can leverage their digital assets and transform
them into a digital enterprise
12. Challenges with Biomedical Data
The Journal Article is the end goal
Data is a means to an end (low value)
Data is not FAIR
Findable, Accessible, Interoperable, Reusable
Limited e-infrastructures to support FAIR data
15. FAIR principles drive data to become the currency
Policies that promote data sharing via FAIR help change
the culture
16. We also need a digital ecosystem that allows
transactions to occur on FAIR data
at scale
17. The Data Commons
is a platform
that fosters the development of a digital ecosystem
18. The Data Commons platform that fosters development of a digital
ecosystem
Treats products of research – data, software, methods, papers, etc. – as
digital assets (objects)
Digital objects need to conform to FAIR principles
Digital objects exist in a shared virtual space
- Find, Deposit, Manage, Share and Reuse digital assets
Enables interactions between Producers and Consumers of digital assets
Gives currency to digital assets and the people who develop and support
them
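The interactions listed on this slide can be illustrated with a toy in-memory registry. This is a minimal sketch, not the NIH design: the `Commons` class, its method names, and the idea of measuring "currency" as a count of recorded reuses are all assumptions made for illustration.

```python
class Commons:
    """Hypothetical shared space where producers deposit and consumers reuse objects."""

    def __init__(self):
        self._objects = {}    # object_id -> record describing the digital object
        self._reuse_log = []  # (object_id, consumer) pairs, one per reuse

    def deposit(self, object_id, producer, kind, keywords):
        # A producer adds a digital object (data, software, workflow, ...).
        self._objects[object_id] = {
            "producer": producer,
            "kind": kind,
            "keywords": {k.lower() for k in keywords},
        }

    def find(self, keyword):
        # A consumer searches the shared space by keyword.
        kw = keyword.lower()
        return sorted(oid for oid, rec in self._objects.items()
                      if kw in rec["keywords"])

    def reuse(self, object_id, consumer):
        # Record the transaction so the producer can be credited later.
        record = self._objects[object_id]
        self._reuse_log.append((object_id, consumer))
        return record

    def credit(self, producer):
        # Assumed proxy for "currency": how often a producer's objects were reused.
        return sum(1 for oid, _ in self._reuse_log
                   if self._objects[oid]["producer"] == producer)


commons = Commons()
commons.deposit("obj-1", "lab_a", "data", ["Microbiome"])
commons.deposit("obj-2", "lab_a", "software", ["aligner"])
commons.reuse(commons.find("microbiome")[0], "lab_b")
print(commons.credit("lab_a"))
```

The point of the sketch is only that every reuse is a recorded transaction between a producer and a consumer, which is what lets credit flow back to the people who develop and support the asset.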
19. The Data Commons
is a platform?
that fosters the development of a digital ecosystem
20. “A platform is a plug and play model that
allows multiple participants (producers and consumers)
to connect to it, interact with each other and create
value”
Sangeet Paul Choudary – Platform Scale
21. A lot of what we see today uses a platform approach
Sangeet Paul Choudary – Platform Scale
22. The goal of a Data Commons Platform is to enable
interactions between producers and consumers
Sangeet Paul Choudary – Platform Scale
23. To understand the
Data Commons Platform
(and how it works for biomedical data) we
need to use a Platform stack
to help visualize the concept
27. Initial Phase
Unique digital object identifiers, resolvable to the original authoritative source
Machine readable
A minimal set of searchable metadata
Clear access rules (especially important for human subjects data)
An entry (with metadata) in one or more indices
Future Phases
Standard, community based unique digital object identifiers
Conform to community approved standard metadata and ontologies for
enhanced searching
Digital objects accessible via open standard APIs
NIH Data Commons: Digital Asset Compliance
Making things FAIR
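The initial-phase criteria on this slide could be checked mechanically. The following is a minimal sketch under stated assumptions: the `DigitalObject` fields, the `REQUIRED_METADATA` keys, and the two access-rule values are hypothetical and are not taken from any NIH specification.

```python
import json
from dataclasses import dataclass, field, asdict

# Assumed "minimal set of searchable metadata"; the slide does not name the fields.
REQUIRED_METADATA = ("title", "creator", "type")


@dataclass
class DigitalObject:
    identifier: str                               # unique digital object identifier
    source_url: str                               # where the identifier resolves to
    metadata: dict = field(default_factory=dict)
    access_rule: str = ""                         # assumed values: "open" or "controlled"
    indices: list = field(default_factory=list)   # indices that list this object


def initial_phase_gaps(obj):
    """Return the initial-phase criteria that the object does not yet meet."""
    gaps = []
    if not obj.identifier:
        gaps.append("missing unique identifier")
    if not obj.source_url.startswith(("http://", "https://")):
        gaps.append("identifier does not resolve to a source URL")
    missing = [key for key in REQUIRED_METADATA if not obj.metadata.get(key)]
    if missing:
        gaps.append("missing metadata: " + ", ".join(missing))
    if obj.access_rule not in ("open", "controlled"):
        gaps.append("no clear access rule")
    if not obj.indices:
        gaps.append("not listed in any index")
    return gaps


def to_machine_readable(obj):
    # Machine readable here simply means the record serializes to JSON.
    return json.dumps(asdict(obj), sort_keys=True)


record = DigitalObject(
    identifier="example:0001",
    source_url="https://example.org/objects/0001",
    metadata={"title": "Example dataset", "creator": "Example Lab", "type": "dataset"},
    access_rule="controlled",
    indices=["example-index"],
)
print(initial_phase_gaps(record))
print(to_machine_readable(record))
```

The check only tests the shape of the URL and does not actually resolve it; a real compliance check would need a network lookup and the community-approved metadata standards that the slide defers to future phases.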
32. The NIH Data Commons Pilot
Co-location of large and/or highly utilized
NIH funded data with
storage and computing infrastructure +
Commonly used tools for analyzing and
sharing digital objects
to create an interoperable resource for the
research community.
Investigators will be able to collaborate and
share digital objects within this
environment and connect with others
39. Considerations
• Metrics – Understanding and accounting of data usage patterns
• Cost
• Cloud Storage
• Pay for use cloud compute (NIH credits pilot)
• Indirect costs for cloud
• Hybrid Clouds – Institution (private) and commercial (public) clouds
• Managing Open vs Controlled access data
• Auth: single sign on - dreams/nightmares?
• Archive vs working copies of data, and versioning
• Interoperability with other Commons (clouds)
40. • Standards – Metadata, UIDs, APIs
• Discoverability – Finding digital objects across clouds
• Interfaces – For users with different needs and capabilities
• Consent – Re-consenting data
• Policies
• Data sharing policies that are useful and effective
• Keep pace with use of technology (e.g. dbGaP data in the Cloud)
• Incentives
• Access to, and shareability of, FAIR Data as part of NIH grant review criteria
• Governance – Community involvement in governance models
• Sustainability – Long term support
Considerations
41. Acknowledgments
• ADDS Office: Jennie Larkin, Phil Bourne, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso,
Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS), Ron Margolis
• NCBI: George Komatsoulis
• NHGRI: Valentina di Francesco, Ajay Pillai
• NIGMS: Susan Gregurick
• CIT: Andrea Norris, Debbie Sinmao
• NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr
• NCI: Ian Fore, Sean Davis, Warren Kibbe, Tony Kerlavage, Tanja Davidsen
• NIAID: Maria Giovanni, Alison Yao, Eric Choi, Claire Schulkey
• NHLBI: Weiniu Gan, Alastair Thomson
• NIH Clinical Centre: Elaine Ayres (BITRIS)
• NIBIB: Vinay Pai (DK)
• OSP: Dina Paltoo, Kris Langlais, Erin Luetkemeier, Agnes Rooke
• Research and Industry: Mathew Trunnell (FHC), Bob Grossman (Chicago), Toby Bloom (NYGC)
42. Stay in Touch
QR Business Card
LinkedIn
@Vivien.Bonazzi
Slideshare
Blog
(Coming soon!)
Vivien Bonazzi
bonazziv@mail.nih.gov
Editor's notes
Currencies don’t exist in a vacuum
Buy and sell Goods
A nascent platform
Platforms that utilize data as a central currency – enable transactions between producers and consumers
Producers of digital objects - data, tools, workflows - used by consumers
The Platform enables these transactions –
Accommodates bioinformatics and non bioinformatics users
Framework helps visualize the concept of the platform