Big Data & Analytics Across the
Interdisciplinary Divide
Philip E. Bourne PhD, FACMI
Stephenson Chair of Data Science
Director, Data Science Institute
Professor of Biomedical Engineering
peb6a@virginia.edu
https://www.slideshare.net/pebourne
12/17/18 BigDIA 1
@pebourne
Perspective
• I was not trained as a data scientist or computer scientist - I
started as a physical chemist
• At this point I can’t give you a deep technical perspective
• My examples are taken from biomedicine but are broadly
applicable
• I am deeply engaged in preparing one academic institution for a very
different, data-driven, interdisciplinary future
My motivation
The biggest gains for our society are going to come
through interdisciplinary research where data and
analytics catalyze the collaboration
Consider a wake-up call of sorts
https://www.sciencemag.org/news/2018/12/google-s-deepmind-aces-protein-folding
https://moalquraishi.wordpress.com/2018/12/09/alphafold-casp13-what-just-happened/
Data as driver
https://www.ebi.ac.uk/uniprot/TrEMBLstats
Contents of the Protein Data Bank
This is a somewhat predictable outcome …
The real excitement comes from the unexpected …
Witness the tale of the trauma surgeon …
But there is more …
Air pollution-ecosystem feedback: unmanned
aerial vehicles and ecosystem models to
quantify ozone-forest interactions
• Spatial heterogeneity
• Novel sampling
• Sensor data
Departments: Environmental Sciences, Electrical Engineering
A working definition of what we are doing …
It is the unexpected re-use of information which is
the value added by the web
Tim Berners-Lee
https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/#116a5a2d55cf
A working definition of what we are doing …
It is the unexpected re-use of information which is
the value added by the web and subsequent
analysis of that information for societal benefit
Tim Berners-Lee / Phil Bourne
Of course this was all predicted by smart people …
https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist)
https://www.microsoft.com/en-us/research/wp-content/uploads/2009/10/Fourth_Paradigm.pdf
https://twitter.com/aip_publishing/status/856825353645559808
I would suggest that this audience has a
responsibility to promote the fourth paradigm,
which is not yet well recognized across
disciplines …
Here is one example of how to do so
How Will Science Change?
Example - Photography
Stages: Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization
Axes: Time; Volume, Velocity, Variety
• Digital camera invented by Kodak but shelved
• Megapixels & quality improve slowly; Kodak slow to react
• Film market collapses; Kodak goes bankrupt
• Phones replace cameras
• Instagram, Flickr become the value proposition
• Digital media becomes a bona fide form of communication
From a presentation to the Advisory Board to the NIH Director
Systems Pharmacology: open, complex, diverse digital data
Model transportability: human, mouse, zebrafish
Multi-scale integration (levels), with horizontal integration (data at each level):
• DNA: CNV, SNP, methylation
• Gene/Protein: 3D structure, gene expression, proteomics, metabolomics
• Network: metabolic, signaling transduction, gene regulation
• Cell: hepatic, myoepithelial, erythrocyte
• Tissue: epithelial, muscle, nervous
• Organ: liver, kidney, pancreas, heart
• Body: physiologically based pharmacokinetics
• Population: GWAS, population dynamics, microbiota
Xie et al. Annu Rev Pharmacol Toxicol. 2017 57:245-262
How should we think about organizing ourselves in
an interdisciplinary way to maximize the
opportunities offered by the fourth paradigm?
The Pillars of Data Science
Application Domains
Let's briefly focus on those five pillars
in the context of one area of
biomedical informatics – structural
bioinformatics
What kinds of interchange should be
taking place between this field and
data science?
Mura et al. 2018 Curr Opin Struct Biol. 52:95-102
Data Acquisition
• Persistence of raw data not clear
• Some level of consistency across instrument manufacturers
• Lessons in community/society drive
Data Integration and Engineering
• URIs – no, steeped in tradition
• Ontologies – somewhat
• Linked data – somewhat
Years of experience to convey
Data Analytics
– SVMs
– Random forest
– Neural nets
– Deep learning
– ??
Opportunity to learn from many domains
Visualization & Dissemination
• Avoid the curse of the
ribbon
• Think sonics
• Look to video games
Ethics, Law & Policy –
Community Driven Data Sharing
How to implement this at any level?
Guiding Principles
• Be constantly strategic and nimble - think supply chain
• Be sustainable - do not overreach
• Be interdisciplinary
• Be an organization without walls
• Be diverse, accessible and open
• Be team driven, not individually driven
• Strive for quality not quantity in education & research
• Be innovative and translational through new forms of engagement with
the private sector, government, NGOs, local, state, national and
international partners
Be Interdisciplinary – Be Without Walls
• Satellites – discipline driven - located in another School
focusing on the mission of that School where data and
analytics play a role, e.g.,
– SOM – data governance and clinical translation
– Education – working on educational analytics
• Centers – focus area driven, e.g.,
– Ethics and justice
– Neurodegenerative disorders – Alzheimer's, autism, TBI
– Sports analytics
Be Diverse, Accessible and Open – Why?
• Data science exists largely because of open data
• Open knowledge encourages disciplinary and interdisciplinary
collaboration
• Yet much of the scholarship we produce is not accessible at all and
certainly not accessible to socioeconomically disadvantaged groups
• Gouging by commercial knowledge providers is making the
knowledge produced by others less accessible to us
• Research is suffering from a reproducibility crisis addressable
through greater access to all aspects of the research lifecycle
Be Diverse, Accessible and Open – Why?
Consider Biomedicine
• Big Data
– Total data from NIH-funded research back in 2016 estimated at 650
PB*
– 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10
PB in 2016
• Dark Data
– Only 12% of data described in published papers is in recognized
archives – 88% is dark data^
• Cost
– 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data
archives
* In 2012 Library of Congress was 3 PB
^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
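As a quick sanity check on the figures above, the short Python sketch below recomputes the quoted percentages. The variable names are illustrative, and the inputs are simply the slide's own estimates, not independent measurements.

```python
# Estimates quoted on the slide, in petabytes (PB).
total_nih_pb = 650          # NIH-funded research data (2016 estimate)
ncbi_nlm_pb = 20            # portion held at NCBI/NLM
library_of_congress_pb = 3  # footnote: Library of Congress in 2012
archived_fraction = 0.12    # share of published data in recognized archives

# Share of the total that sits in NCBI/NLM (the slide rounds this to 3%).
ncbi_share = ncbi_nlm_pb / total_nih_pb

# Whatever is not in a recognized archive is counted as dark data.
dark_fraction = 1 - archived_fraction

# How many 2012 Library of Congress collections the total corresponds to.
loc_multiples = total_nih_pb / library_of_congress_pb

print(f"NCBI/NLM share of total: {ncbi_share:.1%}")
print(f"Dark data fraction: {dark_fraction:.0%}")
print(f"Total vs. 2012 Library of Congress: about {loc_multiples:.0f}x")
```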
A call for making these data open
• Mandates
– NIH, NSF, Data Management Plans
• Business models can be
protected yet everyone benefits
• It saves lives ….
Why a more open process?
Use case:
Diffuse Intrinsic Pontine Gliomas (DIPG)
• Occur in 1:100,000 individuals
• Peak incidence at 6-8 years of age
• Median survival 9-12 months
• Surgery is not an option
• Chemotherapy is ineffective and radiotherapy offers only transient benefit
From Adam Resnick
Timeline of genomic studies in DIPG
• Landmark studies identify
histone mutations as
recurrent driver mutations in
DIPG ~2012
• Almost 3 years later, in
largely the same datasets,
but partially expanded, the
same two groups and 2
others identify ACVR1
mutations as a secondary,
co-occurring mutation
From Adam Resnick
What do we need to do differently
to reveal ACVR1?
• ACVR1 is a targetable kinase
• Inhibition of ACVR1 inhibited tumor
progression in vitro
• ~300 DIPG patients a year
• ~60 are predicted to have ACVR1 mutations
• If only large-scale data sets had been
integrated with TCGA and/or rare
disease data in 2012, ACVR1 mutations
would have been identified
• 60 patients/year X 3 years = 180
children’s lives (who likely succumbed
to the disease during that time) could
have been impacted if only data were
FAIR
From Adam Resnick
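The arithmetic behind the 180 figure can be written out explicitly. This is only a restatement of the slide's own estimates in Python; the variable names are illustrative, and the 20% share is inferred here from 60 out of 300 rather than stated in the source.

```python
# Estimates taken from the slide.
dipg_patients_per_year = 300   # approximate new DIPG patients per year
acvr1_patients_per_year = 60   # of those, predicted to carry ACVR1 mutations
years_between_studies = 3      # delay between the histone and ACVR1 findings

# Inferred, not stated on the slide: the share of patients with ACVR1 mutations.
implied_acvr1_share = acvr1_patients_per_year / dipg_patients_per_year

# Children who might have been affected by an earlier discovery.
children_potentially_impacted = acvr1_patients_per_year * years_between_studies

print(f"Implied ACVR1 share: {implied_acvr1_share:.0%}")
print(f"Children potentially impacted: {children_potentially_impacted}")
```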
Research Data Infrastructure …
Both funders and some institutions
see the need to move from pipes to
platforms to accelerate research…
https://blog.lexicata.com/wp-content/uploads/2015/03/platform-model-750x410.png
If platforms are the answer we could
ask the question…
Will {biomedical} research become
more like Airbnb?
Vivien Bonazzi
Should biomedical research be Like Airbnb?
doi: 10.1371/journal.pbio.2001818
I am not crazy, hear me out
• Airbnb is a platform that supports a trusted relationship between consumer
(renter) and supplier (host)
• The platform focuses on maximizing the exchange of services between supplier and
consumer and maximizing the amount of trust associated with a given stakeholder
• It seems to be working:
– 60 million users searching 2 million listings in 192 countries
– Average of 500,000 stays per night.
– Valuation of US $25bn
Platforms will ultimately digitally
integrate the scholarly workflow for
human and machine analysis
Supplier            Consumer
Paper Author        Paper Reader
Data Provider       Data Consumer
Employer            Employee
Reagent Provider    Reagent Consumer
Software Provider   Software Consumer
Grant Writer        Grant Reviewer
Educator            Student
Platforms: MS Project, Google Drive, Coursera, Researchgate, Academia.edu, Open Science Framework, Synapse, F1000, Rio
Pilot Open Data Lab (ODL) underway
The NIH, through the Big Data to Knowledge
(BD2K) initiative, is experimenting with a platform,
keeping in mind the need to overcome these
impediments
Enter The Commons
https://en.wikipedia.org/wiki/Ealing_Common#/media/File:Ealing_Common_-_geograph.org.uk_-_17075.jpg
Commons –
Initial focus is on integrating two
layers of the scholarly workflow
Commons topology
• Compute Platform: Cloud or HPC
• Services: APIs, Containers, Indexing
• Software: Services & Tools (scientific analysis tools/workflows)
• Data: “Reference” Data Sets, user defined data
• Digital Object Compliance
• App store/User Interface
Service models shown: PaaS, SaaS, IaaS
https://datascience.nih.gov/commons
Commons Compliance
• Treat products of research – data,
methods, papers etc. as digital objects
• These digital objects exist in a shared
virtual space
• Digital object compliance through FAIR
principles:
– Findable
– Accessible (and usable)
– Interoperable
– Reusable
https://commonfund.nih.gov/bd2k/commons
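To make the idea of digital object compliance a little more concrete, here is a minimal Python sketch that treats a research product as a record and checks one crude proxy per FAIR principle. Everything in it is an assumption for illustration: the `DigitalObject` fields, the `fair_report` function, and the choice of proxies are hypothetical and are not the Commons' actual compliance criteria.

```python
from dataclasses import dataclass


@dataclass
class DigitalObject:
    """Hypothetical record for a research product (data, method, paper)."""
    identifier: str = ""    # persistent identifier, e.g. a DOI
    description: str = ""   # human-readable metadata used for indexing
    access_url: str = ""    # where the object can be retrieved
    media_type: str = ""    # an open, documented format
    license: str = ""       # terms under which it may be reused
    provenance: str = ""    # how the object was produced


def fair_report(obj: DigitalObject) -> dict[str, bool]:
    """Return one illustrative yes/no proxy per FAIR principle."""
    return {
        "findable": bool(obj.identifier and obj.description),
        "accessible": bool(obj.access_url),
        "interoperable": bool(obj.media_type),
        "reusable": bool(obj.license and obj.provenance),
    }


# A made-up example object; the identifier and URL are placeholders.
example = DigitalObject(
    identifier="doi:10.0000/example",
    description="Example structure factors for a hypothetical protein",
    access_url="https://example.org/data/1",
    media_type="text/csv",
    license="CC-BY-4.0",
)
report = fair_report(example)
print(report)
```

In this sketch the example object has no provenance recorded, so the "reusable" check comes back false while the other three pass. Real compliance would need far richer criteria than presence of a field.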
Why a comparison to Airbnb is not fair
• Airbnb was born digital
• The exchange of services on Airbnb is
simple compared to what is required of a
platform to support biomedical research
Nevertheless there is much to be
learnt
Impediments to platforms
• Current work practices by all stakeholders
• Entrenched business models
• Size of the undertaking aka resources
needed
• Trust
• Incentives to use the platform
http://www.forbes.com/sites/johnhall/2013/04/29/10-barriers-to-employee-innovation/#8bdbaa811133
Even if they are successful, platforms are likely to be
domain specific and address only the
infrastructure …
What else is needed?
We need to promote openness
• Encourage persistent identifiers e.g., ORCID
• Encourage preprints
• Encourage Open Access (OA)
• Recognize openness in hiring and P&T
• Teach open scholarship
• Promote institutional openness – repositories, Wikimedian in
residence
• Support institutional open data governance
• Support global community efforts….
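One small, practical aspect of persistent identifiers is that they can be checked by machine. As a sketch, the Python below validates the check character of an ORCID iD using the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers. The function names are illustrative, and the sample iD is the fictitious test record that ORCID uses in its own documentation.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and check character of an ORCID iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # ORCID's documented sample iD
print(is_valid_orcid("0000-0002-1825-0098"))  # same iD with a wrong check digit
```

A passing check only shows that the identifier is well formed; it does not confirm that the iD is registered or belongs to a particular person.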
Wikidata – fast growing
• Get on board with developments in schema.org, knowledge
graphs, etc… as part of the rule rather than the exception
• Provide metadata and opinion for data we produce or use
Let me summarize:
How do we address the interdisciplinary divide?
• Promote the fourth paradigm
• Work within your institutions to promote data science as an
interdisciplinary field
• Establish an open and integrated environment for data and
analytics
• Be patient and do not oversell …
Haas & Schmidt 2018
http://iswc2018.semanticweb.org/workshops-tutorials/#ekg
Acknowledgements
The BD2K Team at NIH
The 150 folks who have passed through my laboratory
https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0
Thank You
peb6a@virginia.edu

Big Data and Analytics Across the Interdisciplinary Divide

  • 1. Big Data & Analytics Across the Interdisciplinary Divide Philip E. Bourne PhD, FACMI Stephenson Chair of Data Science Director, Data Science Institute Professor of Biomedical Engineering peb6a@virginia.edu https://www.slideshare.net/pebourne 12/17/18 BigDIA 1 @pebourne
  • 2. Perspective • I was not trained as a data scientist or computer scientist - I started as a physical chemist • At this point I can’t give you a deep technical perspective • My examples are taken from biomedicine, but broadly applicable • Deeply engaged in preparing one academic institution for a very different data driven interdisciplinary future 12/17/18 BigDIA 2
  • 3. My motivation The biggest gains for our society are going to come through interdisciplinary research where data and analytics catalyze the collaboration 12/17/18 BigDIA 3
  • 4. Consider a wake up call of sorts 12/17/18 BigDIA 4
  • 5. A wake up call of sorts 12/17/18 BigDIA 5 https://www.sciencemag.org/news/2018/12/google-s-deepmind-aces-protein-folding https://moalquraishi.wordpress.com/2018/12/09/alphafold-casp13-what-just-happened/
  • 6. Data as driver 12/17/18 BigDIA 6 https://www.ebi.ac.uk/uniprot/TrEMBLstats Contents of the Protein Data Bank
  • 7. This is a somewhat predictable outcome.. The real excitement comes from the unexpected … Witness the tale of the trauma surgeon … 12/17/18 BigDIA 7 But there is more…
  • 8. Air pollution-ecosystem feedback: unmanned aerial vehicles and ecosystem models to quantify ozone-forest interactions 12/17/18 BigDIA 8 • Spatial heterogeneity • Novel sampling • Senor data Departments: Environmental Sciences Electrical Engineering
  • 9. A working definition of what we are doing … It is the unexpected re-use of information which is the value added by the web Tim Berners-Lee 12/17/18 BigDIA 9 https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/#116a5a2d55cf
  • 10. A working definition of what we are doing … It is the unexpected re-use of information which is the value added by the web and subsequent analysis of that information for societal benefit Tim Berners-Lee / Phil Bourne 12/17/18 BigDIA 10 https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/#116a5a2d55cf
  • 11. Of course this was all predicted by smart people .. 12/17/18 BigDIA 11
  • 13. I would suggest that this audience has a responsibility to promote the fourth paradigm which is not a well recognized phenomenon across disciplines … Here is one example of how to do so 12/17/18 BigDIA 13
  • 14. How Will Science Change? 1412/17/18 BigDIA
  • 15. Digitization Deception Disruption Demonetization Dematerialization Democratization Time Volume,Velocity,Variety Digital camera invented by Kodak but shelved Megapixels & quality improve slowly; Kodak slow to react Film market collapses; Kodak goes bankrupt Phones replace cameras Instagram, Flickr become the value proposition Digital media becomes bona fide form of communication From a presentation to the Advisory Board to the NIH Director Example - Photography 1512/17/18 BigDIA
  • 16. Model Transportability Horizontal Integration Multi-scale Integration human mouse zebrafish DNA Gene/Protein Network Cell Tissue Organ Body Population CNV SNP methylation 3D structure Gene expression Proteomics Metabolomics MetabolicSignaling transduction Gene regulation Hepatic Myoepithelial Erythrocyte Epithelial Muscle Nervous Liver Kidney Pancreas Heart Physiologically based pharmacokinetics GWASPopulation dynamics Microbiota Open, complex, diverse digital data Systems Pharmacology Xie et al. Annu Rev Pharmacol Toxicol. 2017 57:245-262 12/17/18 16 BigDIA
  • 17. How should we think about organizing ourselves in an interdisciplinary way to maximize the opportunities offered by the fourth paradigm? 12/17/18 BigDIA 17
  • 18. The Pillars of Data Science 18 Application Domains 12/17/18 BigDIA
  • 19. Let's briefly focus on those five pillars in the context of one area of biomedical informatics – structural bioinformatics. What kinds of interchange should be taking place between this field and data science? (Mura et al. 2018 Curr Opin Struct Biol. 52:95-102)
  • 20. Data Acquisition • Persistence of raw data not clear • Some level of consistency across instrument manufacturers • Lessons in community/society drive (Mura et al. 2018 Curr Opin Struct Biol. 52:95-102)
  • 21. Data Integration and Engineering • URIs – no, steeped in tradition • Ontologies – somewhat • Linked data – somewhat. Years of experience to convey.
  • 22. Data Analytics – SVMs – Random forest – Neural nets – Deep learning – ?? Opportunity to learn from many domains.
  • 23. Visualization & Dissemination • Avoid the curse of the ribbon • Think sonics • Look to video games
  • 24. Ethics, Law & Policy – Community Driven Data Sharing
  • 25. How to implement this at any level?
  • 26. Guiding Principles • Be constantly strategic and nimble - think supply chain • Be sustainable - do not overreach • Be interdisciplinary • Be an organization without walls • Be diverse, accessible and open • Be team driven, not individually driven • Strive for quality not quantity in education & research • Be innovative and translational through new forms of engagement with the private sector, government, NGOs, local, state, national and international partners
  • 28. Be Interdisciplinary – Be Without Walls • Satellites – discipline driven – located in another School, focusing on the mission of that School where data and analytics play a role, e.g.: – SOM: data governance and clinical translation – Education: working on educational analytics • Centers – focus area driven, e.g.: – Ethics and justice – Neurodegenerative disorders – Alzheimer's, autism, TBI – Sports analytics
  • 30. Be Diverse, Accessible and Open – Why? • Data science exists largely because of open data • Open knowledge encourages disciplinary and interdisciplinary collaboration • Yet much of the scholarship we produce is not accessible at all and certainly not accessible to socioeconomically disadvantaged groups • Gouging by commercial knowledge providers is making the knowledge produced by others less accessible to us • Research is suffering from a reproducibility crisis addressable through greater access to all aspects of the research lifecycle
  • 31. Be Diverse, Accessible and Open – Why? Consider Biomedicine • Big Data – Total data from NIH-funded research back in 2016 estimated at 650 PB* – 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10 PB in 2016 • Dark Data – Only 12% of data described in published papers is in recognized archives – 88% is dark data^ • Cost – 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives * In 2012 Library of Congress was 3 PB ^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
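The proportions quoted on slide 31 can be sanity-checked in a few lines. This is only a sketch using the slide's rounded estimates; the variable names are illustrative assumptions, not part of the talk.

```python
# Rounded estimates quoted on slide 31 (assumed here as given).
total_nih_pb = 650        # total data from NIH-funded research, in PB
ncbi_nlm_pb = 20          # portion held at NCBI/NLM, in PB
archived_fraction = 0.12  # share of published data in recognized archives

ncbi_share = ncbi_nlm_pb / total_nih_pb  # fraction of the total held at NCBI/NLM
dark_fraction = 1 - archived_fraction    # share of data that is "dark"

print(f"NCBI/NLM share of total: {ncbi_share:.1%}")
print(f"Dark data share: {dark_fraction:.0%}")
```

The NCBI/NLM share comes out at roughly 3%, matching the figure on the slide.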
  • 32. A call for making these data open • Mandates – NIH, NSF, Data Management Plans • Business models can be protected yet everyone benefits • It saves lives …
  • 33. Why a more open process? Use case: Diffuse Intrinsic Pontine Gliomas (DIPG) • Occur in 1:100,000 individuals • Peak incidence 6-8 years of age • Median survival 9-12 months • Surgery is not an option • Chemotherapy ineffective and radiotherapy only transitory. From Adam Resnick
  • 34. Timeline of genomic studies in DIPG • Landmark studies identify histone mutations as recurrent driver mutations in DIPG ~2012 • Almost 3 years later, in largely the same but partially expanded datasets, the same two groups and 2 others identify ACVR1 mutations as a secondary, co-occurring mutation. From Adam Resnick
  • 35. What do we need to do differently to reveal ACVR1? • ACVR1 is a targetable kinase • Inhibition of ACVR1 inhibited tumor progression in vitro • ~300 DIPG patients a year • ~60 are predicted to have ACVR1 mutations • If only large-scale data sets had been integrated with TCGA and/or rare disease data in 2012, ACVR1 mutations would have been identified then • 60 patients/year × 3 years = 180 children's lives (who likely succumbed to the disease during that time) could have been impacted if only data were FAIR. From Adam Resnick
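The estimate on slide 35 is simple arithmetic, reproduced below as a sketch. The inputs are the slide's approximate figures; the variable names are illustrative assumptions.

```python
# Approximate figures from slide 35 (attributed there to Adam Resnick).
dipg_cases_per_year = 300   # new DIPG patients per year
acvr1_cases_per_year = 60   # of those, predicted to carry ACVR1 mutations
years_of_delay = 3          # gap between the ~2012 studies and the ACVR1 finding

# Implied share of DIPG patients with an ACVR1 mutation.
acvr1_fraction = acvr1_cases_per_year / dipg_cases_per_year

# Patients who might have been affected during the delay.
children_affected = acvr1_cases_per_year * years_of_delay

print(f"ACVR1 share of DIPG cases: {acvr1_fraction:.0%}")
print(f"Children potentially affected: {children_affected}")
```

The implied ACVR1 share is about one in five DIPG patients, and the total matches the 180 quoted on the slide.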
  • 36. Research Data Infrastructure … Both funders and some institutions see the need to move from pipes to platforms to accelerate research… https://blog.lexicata.com/wp-content/uploads/2015/03/platform-model-750x410.png
  • 37. If platforms are the answer we could ask the question… Will {biomedical} research become more like Airbnb? Vivien Bonazzi. Should biomedical research be like Airbnb? doi: 10.1371/journal.pbio.2001818
  • 38. I am not crazy, hear me out • Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host) • The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder • It seems to be working: – 60 million users searching 2 million listings in 192 countries – Average of 500,000 stays per night – Valuation of US$25bn. Should biomedical research be like Airbnb? doi: 10.1371/journal.pbio.2001818
  • 39. Platforms will ultimately digitally integrate the scholarly workflow for human and machine analysis. Should biomedical research be like Airbnb? doi: 10.1371/journal.pbio.2001818
  • 40. Supplier / Consumer pairs: Paper Author / Paper Reader; Data Provider / Data Consumer; Employer / Employee; Reagent Provider / Reagent Consumer; Software Provider / Software Consumer; Grant Writer / Grant Reviewer; Educator / Student. Platforms: MS Project, Google Drive, Coursera, Researchgate, Academia.edu, Open Science Framework, Synapse, F1000, Rio. Pilot Open Data Lab (ODL) underway
  • 41. The NIH, through the Big Data to Knowledge (BD2K) initiative, is experimenting with a platform, keeping in mind the need to overcome these impediments. Enter The Commons. https://en.wikipedia.org/wiki/Ealing_Common#/media/File:Ealing_Common_-_geograph.org.uk_-_17075.jpg
  • 42. Commons – Initial focus is on integrating two layers of the scholarly workflow. Supplier / Consumer pairs: Paper Author / Paper Reader; Data Provider / Data Consumer; Employer / Employee; Reagent Provider / Reagent Consumer; Software Provider / Software Consumer; Grant Writer / Grant Reviewer; Educator / Student. Platforms: MS Project, Google Drive, Coursera, Researchgate, Academia.edu, Open Science Framework, Synapse, F1000, Rio
  • 43. Commons topology. Layers: Compute Platform (Cloud or HPC); Services (APIs, Containers, Indexing); Software (Services & Tools: scientific analysis tools/workflows); Data (“Reference” Data Sets, User defined data); App store/User Interface. Also labeled: Digital Object Compliance; PaaS, SaaS, IaaS. https://datascience.nih.gov/commons
  • 44. Commons Compliance • Treat products of research – data, methods, papers etc. as digital objects • These digital objects exist in a shared virtual space • Digital object compliance through FAIR principles: – Findable – Accessible (and usable) – Interoperable – Reusable https://commonfund.nih.gov/bd2k/commons
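To make "digital object compliance" on slide 44 concrete, here is a minimal sketch of how a FAIR check might look. It is not the Commons' actual compliance model: the `DigitalObject` fields, the `fair_gaps` helper and the placeholder identifier are all hypothetical, and real FAIR assessment involves far richer metadata than four optional fields.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DigitalObject:
    """Hypothetical record for a research product (data, method, paper)."""
    identifier: Optional[str] = None   # persistent ID, e.g. a DOI (Findable)
    access_url: Optional[str] = None   # where it can be retrieved (Accessible)
    data_format: Optional[str] = None  # open or standard format (Interoperable)
    license: Optional[str] = None      # reuse terms (Reusable)


def fair_gaps(obj: DigitalObject) -> List[str]:
    """Return the FAIR principles for which this toy check finds no metadata."""
    checks = {
        "Findable": obj.identifier,
        "Accessible": obj.access_url,
        "Interoperable": obj.data_format,
        "Reusable": obj.license,
    }
    return [name for name, value in checks.items() if not value]


# Placeholder identifier for illustration only; not a real DOI.
dataset = DigitalObject(identifier="doi:10.0000/example", data_format="CSV")
print(fair_gaps(dataset))  # ['Accessible', 'Reusable']
```

Treating papers, data and methods as the same kind of object is the point of the slide: one check then applies across the shared virtual space.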
  • 45. Why a comparison to Airbnb is not fair • Airbnb was born digital • The exchange of services on Airbnb is simple compared to what is required of a platform to support biomedical research. Nevertheless there is much to be learnt
  • 46. Impediments to platforms • Current work practices by all stakeholders • Entrenched business models • Size of the undertaking aka resources needed • Trust • Incentives to use the platform http://www.forbes.com/sites/johnhall/2013/04/29/10-barriers-to-employee-innovation/#8bdbaa811133
  • 47. Even if they are successful, platforms are likely to be domain specific and only address the infrastructure. What else is needed?
  • 48. We need to promote openness • Encourage persistent identifiers e.g., ORCID • Encourage preprints • Encourage Open Access (OA) • Recognize openness in hiring and P&T • Teach open scholarship • Promote institutional openness – repositories, Wikimedian in residence • Support institutional open data governance • Support global community efforts…
  • 49. Wikidata – fast growing • Get on board with developments in schema.org, knowledge graphs, etc. as part of the rule rather than the exception • Provide metadata and opinion for data we produce or use
  • 50. Let me summarize: How do we address the interdisciplinary divide? • Promote the fourth paradigm • Work within your institutions to promote data science as an interdisciplinary field • Establish an open and integrated environment for data and analytics • Be patient and do not oversell …
  • 51. Haas & Schmidt 2018 http://iswc2018.semanticweb.org/workshops-tutorials/#ekg
  • 52. Acknowledgements: The BD2K Team at NIH. The 150 folks who have passed through my laboratory https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0

Editor's notes

  1. Model integration in systems pharmacology. Diverse models need to be integrated across multiple methodologies, multiple heterogeneous data sets, organismal hierarchy, and species (transportability).
  2. Distribution of kinases and the number of covalent small-molecule kinase inhibitors (CSKIs) for every targeted kinase across the human kinome.
  3. $1.25bn per year to capture all data. After a significant effort at reduction, intramurally data is spread across > 60 data centers; imagine the extramural situation.
  4. A detailed description of the Commons Framework can be found at: https://datascience.nih.gov/commons