Some preliminary thoughts about my role as Associate Director for Data Science at the NIH, shared to prompt a discussion with attendees at the Pacific Symposium on Biocomputing on January 4, 2014, on the Big Island of Hawaii.
1. An Informal Discussion About Big Data
Better Stated as
A Vision for Biomedical Research
Digitally enabling the length and quality of life
Philip E. Bourne
pbourne@ucsd.edu
http://pebourne.wordpress.com/2013/12/21/taking-on-the-role-of-associate-director-for-data-science-at-the-nih-my-originalvision-statement/
2. The Context for This Discussion
• On March 3, 2014 I will begin as the first
Associate Director of the NIH devoted to data
science
• I am giving up tenure and the sun because I
believe this is the right time for change
• The change that I will try to instill at NIH and
beyond is that of a Digital Enterprise
http://www.nih.gov/news/health/dec2013/od-09.htm
3. What Do I Mean By the Digital Enterprise?
An organization that succeeds by
maximizing the use of its digital assets
to achieve its goals
4. Why the Digital Enterprise Now?
• Biomedical research is increasingly digital –
the talk of “Big Data” is one manifestation
• Fulfillment of the NIH mission (among others)
will increasingly be tied to actions taken on
digital data across boundaries
• History already has lessons to teach us to
make the job easier
5. Actions on Data Imply:
• Ensuring data quality and hence trust
• Making data sustainable
• Making data open and accessible
• Making data findable
• Providing suitable metadata and annotation
• Making data queryable
• Making data analyzable
• Presenting data so as to maximize its value
• Rewarding good data practices
6. Boundaries on Data Imply:
• Working across biological scales
• Working across biomedical disciplines
• Working across basic and clinical research and
practice
• Working across institutional boundaries
• Working across public and private sectors
• Working across national and international
borders
• Working across funding agencies
7. Where to Start?
An external advisory group provided a
valuable blueprint for what should be
done
http://acd.od.nih.gov/Data%20and%20Informatics%20Working%20Group%20Report.pdf
8. Blueprint Recommendations
• Promote central and federated catalogs
– Establish minimal metadata framework
– Tools to facilitate data sharing
– Elaborate on existing data sharing policies
• Support methods and applications
– Fund all phases of software development
– Leverage lessons from National Centers
• Training
– More funding
– Enhance review of training applications
– Quantitative component to all awards
• On-campus IT strategic plan
– Catalog of existing tools
– Informatics laboratory
– Ditto big data
• Sustainable funding commitment
9. What is Under Way?
• Now:
– Data centers (under review)
– Data science training grants (call Q1 14)
– Pilot data catalog consortium (call out)
– Genomic Research Data Alliance (being finalized)
– Piloting “NIH-drive”
• In Year One:
– Extended public-private programs specifically for data science activities
– Interagency activities
– International exchange programs
– Programs for better data descriptions
– Reward institutions/communities
– Policies to get clinical trial data into the public domain
10. Longer Term Strategy: Support for The Research Lifecycle
[Figure: diagram of the research lifecycle. Labels recovered from the diagram:]
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
• Authoring Tools
• Data Capture
• Lab Notebooks
• Software Repositories
• Analysis Tools
• Scholarly Communication
• Visualization
• Commercial & Public Tools
• Discipline-Based Metadata Standards
• Community Portals
• Git-like Resources By Discipline
• Training
• Institutional Repositories
• Commercial Repositories
• Data Journals
• New Reward Systems
15. The Role of Associate Director for Data Science
1. provide broad trans-NIH programmatic leadership in the area of
data science;
2. lead long-term NIH strategic planning in areas of data science;
3. provide oversight of the BD2K Initiative;
4. establish and nurture a trans-NIH intellectual and programmatic
‘hub’ for coordinating and enhancing data science activities;
5. coordinate with data science activities beyond NIH (e.g., other
government agencies, other funding agencies, and the private
sector);
6. play a major role in data sharing policy development and oversight
at NIH; and
7. interact with the Chief Information Officer, NIH to generate
synergy between BD2K and the Infrastructure Plus program.
16. Strategy
• Use the Blueprint as a starting point
• Work with ICs to determine science drivers
• Define developments needed for these drivers
• Look for commonalities across ICs – make those
a priority
• Manage and enable emergent developments
– data catalog – used to define the minimal data
description and a home for domain definitions
– Centers of excellence – test beds and exemplars for
best practices
17. Ways to Sell the NIH Data Science Vision
• Developed in response to well-recognized scientific needs
• Support for the complete research lifecycle – this is more
than just data
• Simple and well understood by all stakeholders (i.e.,
branded)
• A shared vision
• As ubiquitous as TCP/IP is to the Internet – a backbone for
the digital enterprise
• To data what PLOS is to knowledge – a movement that
people believe in and get behind
• An app store for the research enterprise
18. General Features of NIH Data Science
• Lightweight metadata standards
• Data & software registries
• Expanded policies on data sharing and open
source software
• Training programs & reward systems
• Institutional incentives
• Private sector incentives
• Data centers serving community needs