One View of Data Science
Philip E. Bourne
peb6a@virginia.edu
https://www.slideshare.net/pebourne
April 5, 2022
Punchline – in 40+ Years in Academia I Have
Never Seen Anything Like It
• It is a response to the digital transformation of
society
• It is touching every discipline (aka vertical)
• We can't keep the students out of our classes
• Cause – large amounts of digital data
• Effect – interdisciplinarity, openness, translation,
search for responsibility and more
In summary, it is disruptive and higher ed. better pay attention
My Perspective/Biases
• Practical Science: Long-standing computational biomedical researcher
• Open Access: Co-Founder and Founding Editor in Chief, PLOS Computational Biology
• Open Knowledge: First President of FORCE11
• Data are Value: Involved in FAIR
• Translation: First Associate Vice Chancellor for Innovation and Industrial Alliances
• Funders as Lever: First Associate Director for Data Science, NIH – preprints, data sharing, BD2K, etc.
• Change Higher Ed: Founding Dean, School of Data Science
There is a Precedent Which Points to What is
Coming
http://www.ornl.gov/hgmis
• High throughput DNA digital data changed how
we think about biomedicine
• Spawned a new field – bioinformatics /
computational biology/ systems biology /
biomedical data science
• Spawned a multi-billion dollar industry
Is Bioinformatics Dead? PLOS Biology 2021
1991-1995
1993-1998
1998-2003
2003-2010
2011-Present
More on the Data Driven Genomic
Revolution
[Adapted from Eric Green, Director NHGRI]
Life Sciences – The Digital Effect
1980s | 1990s | 2000s | 2010s | 2015 | 2022
Discipline: Unknown | Expt. Driven | Emergent | Over-sold | A Service | A Partner | A Driver
The Raw Material: Non-existent | Sequence | Genomes | Omics | Patient | Multi-scale
The People: No name | Technicians | Industry | Bioinformaticians | Systems Biologists | Data Scientists
From a Presentation to the Advisory Committee to the NIH Director
Given this history what do we need to do
differently to accelerate the process and not make
the same mistakes?
Let’s break down one success story to see what
happened and why
https://medium.com/proteinqure/welcome-into-the-fold-bbd3f3b19fdd
Google’s DeepMind’s AlphaFold2 makes gigantic leap in solving
protein structures
AlphaFold2
Numerical optimization – differentiable programming
Overall gradient descent trained to win CASP
Jumper et al., 2021. Nature, 596(7873),
pp. 583-589
Transformer models using attention
Geometry invariant to
translation/rotation
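What "geometry invariant to translation/rotation" means can be illustrated with a minimal sketch. This is not AlphaFold2's implementation, only a toy demonstration of the property: the pairwise distances between points do not change when the whole structure is rotated and shifted. The coordinates, the rotation angle, and all function names below are made up for illustration.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def translate(point, shift):
    """Shift a 3D point by the vector `shift`."""
    return tuple(p + d for p, d in zip(point, shift))


def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between points."""
    return [[math.dist(a, b) for b in points] for a in points]


# Hypothetical toy coordinates (not real protein data).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.0)]

# Apply a rigid motion: rotate every point, then translate it.
moved = [translate(rotate_z(p, 0.7), (3.0, -2.0, 5.0)) for p in coords]

before = pairwise_distances(coords)
after = pairwise_distances(moved)

# Largest change in any pairwise distance; should be ~0 up to rounding.
max_diff = max(
    abs(x - y) for row_b, row_a in zip(before, after) for x, y in zip(row_b, row_a)
)
print(max_diff < 1e-9)
```

The raw coordinates change under the rigid motion while the distance matrix does not, which is why a representation built on relative geometry does not depend on how the structure happens to be positioned.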
Logistics Behind the Win
● Nothing fundamentally new from an AI perspective
● Data Integration
● Collaboration not competition
● Engineering challenge beyond most labs
● Compute power beyond most labs
● Team size beyond most labs
● Worked with protein structure specialists
Downstream Implications
• Cooperation rather than competition
• Public-private partnership
• Translational possibilities are endless
• Made possible by curated open data
• Appreciate engineering
Given these precedents how should we think
about data science in an academic context?
Big data and data science are like the Internet…
If I asked you to define them you would all say
something different, yet you use them every day…
http://vadlo.com/cartoons.php?id=357
The right culture starts with all being on the
same page as to how we define data science
One Representation of Data Science –
The 4+1 Model
• Value – assuring societal
benefit
• Design – communication
of the value of data
• Systems – the means to
communicate and
convey benefit
• Analytics – models and
methods
• Practice – where
everything happens
[From Raf Alvarado]
The Data Science Interplay
• Value + Design = Openness,
responsibility
• Value + Analytics = Human
centered AI, algorithmic bias
• Value + Systems =
sustainability, access,
environmental impact
• Design + Analytics = literate
programming, visualization
• Design + Systems =
dashboards, engineering
design
• Analytics + Systems = ML
engineering
[From Raf Alvarado]
Thinking of data as a science unto itself is novel and controversial
Let's dig into a couple of these quadrants ….
Databases
organize data
around a project.
Data warehouses
organize the data
for an organization
Data commons
organize the data
for a scientific
discipline or field
Data
Warehouse
Data Ecosystems
See Forthcoming Science Policy Forum
Challenges
Fixed level of funding
Opportunities
data commons
Data commons co-locate data
with cloud computing
infrastructure and commonly
used software services, tools &
apps for managing, analyzing and
sharing data to create an
interoperable resource for the
research community.*
*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson and Walt Wells, A Case for Data Commons: Towards Data Science as a Service, IEEE
Computing in Science and Engineering, 2016. Source of image: The CDIS, GDC, & OCC data commons infrastructure at a University of Chicago data center.
Bonazzi VR, Bourne PE (2017) Should biomedical research be like Airbnb? PLoS Biol 15(4): e2001818.
Systems
[Adapted from Bob Grossman]
Research ethics
committees (RECs) review
the ethical acceptability
of research involving
human participants.
Historically, the principal
emphases of RECs have
been to protect
participants from physical
harms and to provide
assurance as to
participants’ interests and
welfare.*
[The Framework] is
guided by Article 27
of the 1948 Universal
Declaration of Human
Rights. Article 27
guarantees the rights
of every individual in
the world "to share in
scientific
advancement and its
benefits" (including to
freely engage in
responsible scientific
inquiry)…*
Protect human
subject data
The right of human
subjects to benefit
from research.
*GA4GH Framework for Responsible Sharing of Genomic and Health-Related Data, see goo.gl/CTavQR
Data sharing with protections provides the evidence
so patients can benefit from advances in research.
Balance protecting human subject data
with open research that benefits
patients
[Adapted from Bob Grossman]
Value
A Data Integration Poster Child
Researcher and Assistant Professor of
Medicine Dr. Thomas Hartka, also a
current online Masters in Data Science
student, is combining two disparate
data sets—electronic health records
and DMV crash data—to save lives
after motor vehicle crashes.
“I enrolled in the MSDS program to
expand my research on automotive
safety. I have already used
techniques from classes in my work.
I hope to expand my research to
real-time analytics to improve
emergency room care.”
— Dr. Thomas Hartka, UVA School
of Medicine
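The idea of combining two disparate data sets can be sketched in a few lines. This is a minimal illustration, not Dr. Hartka's method: the records, the field names, and the shared `crash_id` key are all hypothetical. Real health and crash records rarely share a clean identifier and are subject to privacy protections, so actual linkage is considerably harder.

```python
# Hypothetical toy records; every field name and value is made up.
ehr_records = [
    {"crash_id": "C1", "injury_severity": 3},
    {"crash_id": "C2", "injury_severity": 1},
]
crash_reports = [
    {"crash_id": "C1", "speed_mph": 55, "seat_belt": False},
    {"crash_id": "C3", "speed_mph": 30, "seat_belt": True},
]


def inner_join(left, right, key):
    """Merge rows from `left` and `right` that share the same `key` value."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Only crashes present in both sources survive the join.
linked = inner_join(ehr_records, crash_reports, "crash_id")
print(linked)
```

Only the record present in both sources appears in the result, which is the point of integration: each linked row carries both the clinical outcome and the crash circumstances.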
We Are Not Alone … But We Are Unusual
Furthering Discovery to Build a Better World
RESEARCH
Cybersecurity
Detecting broad-spectrum cyber
threats almost immediately after
they are launched through a $7.6
million Defense Advanced
Research Projects Agency
(DARPA) grant.
Environment
Using NASA data collected aboard the
International Space Station to examine
climate change in the Shenandoah National
Forest and beyond, and find solutions
Health & Medicine
Securing high-performance computing
equipment and personnel to allow
collaboration across the university on brain
science research like Autism, Alzheimer’s,
mental health disorders, traumatic brain
injuries and more.
Business
Discovering what makes a job
interview successful for the
candidate and the recruiter, and
how to mitigate bias in the
recruiting process
Democracy
Investigating how terrorist groups recruit
women through propaganda and examining
risk and threat assessment for extremist
violence perpetrated by women.
Education
Helping economically disadvantaged,
underrepresented populations pursue tailored
educational workforce pathways that have a
higher probability of leading them to success.
SDS Current Research Portfolio
[Chart: Research Areas; values shown: 12, 7, 4, 3, 2, 3, 3]
Healthcare/Life Sciences
Technology/Software
Defense/Cybersecurity
Finance/Fintech
Energy/Environment
Education & Digital Humanities
SDS strives to be a connector – a place where interdisciplinary
research driven by common data, methods and expertise
comes together
With So Much Opportunity – What To Do?
Leverage what our institution is already good at…
For us that is leadership, policy, law
Why Responsible Data Science?
• A defining feature
• A partnership between STEM, social
sciences and the humanities
• Where UVA has strength
Challenges
• Deciding what not to do
• Competition for the best team members (faculty and staff)
• Establishing a diverse team
• Lack of a comprehensive enterprise-wide data infrastructure
• It's easier to conform
Growing the School
M.S. IN DATA SCIENCE
Residential & Online
2020
2020-2023
UNDERGRADUATE
MINOR
2022
PH.D. PROGRAM
2023
UNDERGRADUATE
MAJOR
Building occupied
Team Size (FTEs)
5
40
60
80
120
Research
$5M
$10M
$20M
$30M
https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist)
https://www.microsoft.com/en-us/research/wp-
content/uploads/2009/10/Fourth_Paradigm.pdf
https://twitter.com/aip_publishing/status/856825353645559808
Of course this was all predicted
by smart people ..
[Figure: Model Transportability (human, mouse, zebrafish), Horizontal Integration, and Multi-scale Integration]
DNA: CNV, SNP, methylation
Gene/Protein: 3D structure, gene expression, proteomics, metabolomics
Network: metabolic, signaling transduction, gene regulation
Cell: hepatic, myoepithelial, erythrocyte
Tissue: epithelial, muscle, nervous
Organ: liver, kidney, pancreas, heart
Body: physiologically based pharmacokinetics
Population: GWAS, population dynamics, microbiota
From Harnessing Big Data for Systems Pharmacology 2017
https://doi.org/10.1146/annurev-pharmtox-010716-104659
Current roadblocks are more cultural than technical
The Fifth Paradigm: Integration Across Scales?
Gohlke et al. 2022
https://onlinelibrary.wiley.com/doi/10.1002/ctm2.726
Real World Evidence for Preventive Effects of Statins on
Cancer Incidence: A Transatlantic Analysis
EHR
Animal Models
Pathways
Questions I Leave You With ….
• Have I overstated the case for data science?
• Are we currently doing the best by our students?
• Are the models we propose the right ones?
• What should we be doing differently?

One View of Data Science

  • 1. One View of Data Science Philip E. Bourne peb6a@virginia.edu https://www.slideshare.net/pebourne April 5, 2022
  • 2. Punchline – in 40+ Years in Academia I Have Never Seen Anything Like It • It is a response to the digital transformation of society • It is touching every discipline (aka vertical) • We can keep the students out of our classes • Cause – large amounts of digital data • Effect – interdisciplinarity, openness, translation, search for responsibility and more In summary, it is disruptive and higher ed. better pay attention
  • 3. My Perspective/Biases • Practical Science Long standing computational biomedical researcher • Open Access Co-Founder and Founding Editor in Chief PLOS Computational Biology • Open Knowledge First President of FORCE11 • Data are Value Involved in FAIR • Translation First Associate Vice Chancellor for Innovation and Industrial Alliances • Funders as Lever First Associate Director for Data Science NIH – preprints, data sharing, BD2K, etc. • Change Higher Ed Founding Dean School of Data Science
  • 4. There is a Precedent Which Points to What is Coming http://www.ornl.gov/hgmis • High throughput DNA digital data changed how we think about biomedicine • Spawned a new field – bioinformatics / computational biology/ systems biology / biomedical data science • Spawned a multi-billion dollar industry Is Bioinformatics Dead? PLOS Biology 2021
  • 5. 1991-1995 1993-1998 1998-2003 2003-2010 2011-Present More on the Data Driven Genomic Revolution [Adapted from Eric Green, Director NHGRI]
  • 6. Life Sciences – The Digital Effect 1980s 1990s 2000s 2010s 2015 2022 Discipline: Unknown Expt. Driven Emergent Over-sold A Service A Partner A Driver The Raw Material: Non-existent Sequence Genomes Omics Patient Multi-scale The People: No name Technicians Industry Bioinformaticians Systems Biologists Data Scientists From a Presentation to the Advisory Committee to the NIH Director
  • 7. Given this history what do we need to do differently to accelerate the process and not make the same mistakes?
  • 8. Let’s breakdown one success story to see what happened and why https://medium.com/proteinqure/welcome-into-the-fold-bbd3f3b19fdd
  • 9.
  • 10. Google’s DeepMind’s AlphaFold2 makes gigantic leap in solving protein structures
  • 11. AlphaFold2 Numerical optimization – differential programming Overall gradient descent trained to win CASP Jumper et al.., 2021. Nature, 596 (7873), pp.583-589 Transformer models using attention Geometry invariant to translation/rotation
  • 12. Logistics Behind the Win ● Nothing fundamentally new from an AI perspective ● Data Integration ● Collaboration not competition ● Engineering challenge beyond most labs ● Compute power beyond most labs ● Team size beyond most labs ● Worked with protein structure specialists
  • 13. Downstream Implications • Cooperation rather than competition • Public-private partnership • Translational possibilities are endless • Made possible by curated open data • Appreciate engineering
  • 14. Given these precedents how should we think about data science in an academic context?
  • 15. Big data and data science are like the Internet… If I asked you to define them you would all say something different, yet you use them every day… http://vadlo.com/cartoons.php?id=357
  • 16. The right culture starts with all being on the same page as to how we define data science
  • 17. One Representation of Data Science – The 4+1 Model • Value – assuring societal benefit • Design - Communication of the value of data • Systems – the means to communicate and convey benefit • Analytics – models and methods • Practice – where everything happens [From Raf Alvarado]
  • 18. The Data Science Interplay • Value + Design = Openness, responsibility • Value + Analytics = Human centered AI, algorithmic bias • Value + Systems = sustainability, access, environmental impact • Design + Analytics = literate programming, visualization • Design + Systems = dashboards, engineering design • Analytics + Systems = ML engineering [From Raf Alvarado] Thinking of data as a science unto itself is novel and controversial
  • 19. Lets dig into a couple of these quadrants ….
  • 20. Databases organize data around a project. Data warehouses organize the data for an organization Data commons organize the data for a scientific discipline or field Data Warehouse Data Ecosystems See Forthcoming Science Policy Forum
  • 21. Challenges Fixed level of funding Opportunities data commons Data commons co-locate data with cloud computing infrastructure and commonly used software services, tools & apps for managing, analyzing and sharing data to create an interoperable resource for the research community.* *Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson and Walt Wells, A Case for Data Commons Towards Data Science as a Service, IEEE Computing in Science and Engineer, 2016. Source of image: The CDIS, GDC, & OCC data commons infrastructure at a University of Chicago data center. Bonazzi VR, Bourne PE (2017) Should biomedical research be like Airbnb? PLoS Biol 15(4): e2001818. Systems [Adapted from Bob Grossman]
  • 22. Research ethics committees (RECs) review the ethical acceptability of research involving human participants. Historically, the principal emphases of RECs have been to protect participants from physical harms and to provide assurance as to participants’ interests and welfare.* [The Framework] is guided by, Article 27 of the 1948 Universal Declaration of Human Rights. Article 27 guarantees the rights of every individual in the world "to share in scientific advancement and its benefits" (including to freely engage in responsible scientific inquiry)…* Protect human subject data The right of human subjects to benefit from research. *GA4GH Framework for Responsible Sharing of Genomic and Health-Related Data, see goo.gl/CTavQR Data sharing with protections provides the evidence so patients can benefit from advances in research. Balance protecting human subject data with open research that benefits patients [Adapted from Bob Grossman] Value
  • 23. A Data Integration Poster Child Researcher and Assistant Professor of Medicine Dr. Thomas Hartka, also a current online Masters in Data Science student, is combining two disparate data sets—electronic health records and DMV crash data—to save lives after motor vehicle crashes. “I enrolled in the MSDS program to expand my research on automotive safety. I have already used techniques from classes in my work. I hope to expand my research to real-time analytics to improve emergency room care.” — Dr. Thomas Hartka, UVA School of Medicine
  • 24.
  • 25. We Are Not Alone … But We Are Unusual
  • 26. Furthering Discovery to Build a Better World RESEARCH Cybersecurity Detecting broad-spectrum cyber threats almost immediately after they are launched through a $7.6 million Defense Advanced Research Projects Agency (DARPA) grant. Environment Using NASA data collected aboard the International Space Station to examine climate change in the Shenandoah National Forest and beyond, and find solutions. Health & Medicine Securing high-performance computing equipment and personnel to allow collaboration across the university on brain science research like Autism, Alzheimer’s, mental health disorders, traumatic brain injuries and more. Business Discovering what makes a job interview successful for the candidate and the recruiter, and how to mitigate bias in the recruiting process. Democracy Investigating how terrorist groups recruit women through propaganda and examining risk and threat assessment for extremist violence perpetrated by women. Education Helping economically disadvantaged, underrepresented populations pursue tailored educational workforce pathways that have a higher probability of leading them to success.
  • 27. SDS Current Research Portfolio 12 7 4 3 2 3 3 Research Areas Healthcare/Life Sciences Technology/Software Defense/Cybersecurity Finance/Fintech Energy/Environment Education & Digital Humanities SDS strives to be a connector – a place where interdisciplinary research driven by common data, methods and expertise comes together
  • 28. With So Much Opportunity – What To Do? Leverage what our institution is already good at… For us that is leadership, policy, law
  • 29. Why Responsible Data Science? • A defining feature • A partnership between STEM, social sciences and the humanities • Where UVA has strength
  • 30. Challenges • Deciding what not to do • Competition for the best team members (faculty and staff) • Establishing a diverse team • Lack of a comprehensive enterprise-wide data infrastructure • It's easier to conform
  • 31. Growing the School M.S. IN DATA SCIENCE Residential & Online 2020 2020-2023 UNDERGRADUATE MINOR 2022 PH.D. PROGRAM 2023 UNDERGRADUATE MAJOR Building occupied Team Size (FTEs) 5 40 60 80 120 Research $5M $10M $20M $30M
  • 33. Model Transportability Horizontal Integration Multi-scale Integration human mouse zebrafish DNA Gene/Protein Network Cell Tissue Organ Body Population CNV SNP methylation 3D structure Gene expression Proteomics Metabolomics Metabolic Signaling transduction Gene regulation Hepatic Myoepithelial Erythrocyte Epithelial Muscle Nervous Liver Kidney Pancreas Heart Physiologically based pharmacokinetics GWAS Population dynamics Microbiota From Harnessing Big Data for Systems Pharmacology 2017 https://doi.org/10.1146/annurev-pharmtox-010716-104659 Current roadblocks are more cultural than technical The Fifth Paradigm: Integration Across Scales?
  • 34. Gohlke et al. 2022 https://onlinelibrary.wiley.com/doi/10.1002/ctm2.726 Real World Evidence for Preventive Effects of Statins on Cancer Incidence: A Transatlantic Analysis EHR Animal Models Pathways
  • 35. Questions I Leave You With …. • Have I overstated the case for data science? • Are we currently doing the best by our students? • Are the models we propose the right ones? • What should we be doing differently?

Editor's notes

  1. History Culture NHGRI role >Defined eras
  2. I will introduce the concept of data science with a story that illustrates citizen engagement, the merging of unexpected data, and societal benefit