Atul Butte's presentation at #AMIA2021 for the Knowledge Discovery and Data Mining Workshop meeting
1. Precisely Practicing Medicine from
700 Trillion Points of
University of California Health Data
Atul Butte, MD, PhD
Chief Data Scientist, University of California Health (UC Health)
Priscilla Chan and Mark Zuckerberg Distinguished Professor
Director, Bakar Computational Health Sciences Institute, UCSF
atul.butte@ucsf.edu ▪ @atulbutte
2. Conflicts of Interest
• Scientific founder
– Personalis
– NuMedii
– Carmenta (Progenity)
– Genstruct
• Honoraria for talks
– Lilly
– Pfizer
– Siemens
– Bristol Myers Squibb
– AstraZeneca
– Roche
– Genentech
– Warburg Pincus
– CRG
– AbbVie
– Westat
• Past or present
consultancy
– Personalis
– NuMedii
– Lilly
– Johnson and Johnson
– Roche
– Genstruct
– Tercica
– Ecoeos
– Helix
– Ansh Labs
– uBiome
– Prevendia
– Samsung
– Assay Depot
– Regeneron
– Verinata (Illumina)
– Pathway Diagnostics
– Geisinger Health
– Covance
– Wilson Sonsini Goodrich &
Rosati
– Orrick
– 10X Genomics
– GNS Healthcare
– Gerson Lehman Group
– Coatue Management
• Other corporate
relationships
– Northrop Grumman
– Genentech
– Johnson and Johnson
– Optum
• Shares or Ownership
– NuMedii (major)
– Personalis (major)
– Apple
– Facebook
– Alphabet (Google)
– Microsoft
– Amazon
– Snap
– 10x Genomics
– Illumina
– Nuna Health
– Assay Depot (Scientist.com)
– Vet24seven
– Regeneron
– Sanofi
– Royalty Pharma
– AstraZeneca
– Moderna
– Biogen
– Paraxel
– Sutro
• Speakers’ bureau
– None
• Companies started by
students
– Carmenta
– Serendipity
– Stimulomics
– NunaHealth
– Praedicat
– MyTime
– Flipora
– Tumbl.in
– Polyglot
– Iota Health
– Ongevity Health
5. Faculty Recruitments 2015-2021
Dara
Torgerson
Will
Brown
Alejandro Sweet-
Cordero
Udi
Manber
Julia Adler-
Milstein
Ted
Goldstein
Ashish Raj Marina
Sirota
Julian Hong
Peds DOM,
DPM
Peds DOM DOM, DHM Hem Onc Radiology Peds Rad Onc
Vasilis
Ntranos
Jingjing Li Ben Rosner Franklin
Huang
Jean Feng John
Capra
Yulin Hswen Ryan
Hernandez
Tom
Peterson
Vivek
Rudrapatna
Diabetes
Ctr
Neurol DOM, DHM DOM,
Onc
DEB DEB DEB BTS DOM,
Ortho
DOM, Gastro
7. AWS & Wynton Shared Compute & Storage
For many types of computing needs
Driving Integrative Research/Precision Medicine
Spanning Basic, Translational, Clinical, Population, ….
Information Commons
SPOKE Knowledge Network
Reflecting how human biology really works - pathways
Inquiry Tools and Talent
Clinical
Notes
Radiology
Images
Info
Extract
Structured
Clinical
Data
35+ biomed
reference
databases
Genomic
profiles
Clinical
Commons
Imaging
Commons
Omics
Commons
Information Commons
The Research Environment For Precision Medicine
Nearly 200 researchers onboarded
8. Clinical Notes De-identification
State of the art algorithm developed at UCSF
25,000 manually annotated notes for training and testing
75 million notes de-identified in development environment
3rd party vendor certifying data de-identification process
ü
ü
Collaboration with Privacy and IRB on making the data
available upon completion of certification
ü
Beau Norgeot
github.com/beaunorgeot/philter_ucsf
ü
ü
10. Post-approval safety
Updating side effect rates
Discovering novel side effects
Supporting regulatory approval
Single-arm experimental trials
”Digital approvals”
Biosimilar development
Informing clinical trials design
Better patient selection
Trimming the trials: more efficient data collection
Continually establishing efficacy
Assessing the efficacy-effectiveness gap
Searching for efficacy in specific populations
Effect modifiers and precision medicine
Long-term, post trial outcomes
Comparative effectiveness
Integrating costs with comparative effectiveness
Understanding effects of pharmacy practices on healthcare utilization
Studying novel on-label pharmaceuticals versus older off-label drugs
Studying the practice of medicine
Quality of practice, medical errors
Standardizing care and care delivery
The effect of payors on medical care
Are new-generation diagnostics improving outcomes?
Data driven decision support
Clinical-Decision Support: the Provider Perspective
Clinical-Decision Support: the Patient Perspective
Clinical-Decision Support: the Community Perspective
Twenty-one uses for Real World Data
Vivek Rudrapatna
bit.ly/jci_RWD
12. Over 100 now approved just counting radiology!
http://bit.ly/acr_ai
13. Why is AI back in biomedicine?
• Great hardware from video gamers
• Software libraries with novel machine learning methods
• More big data sets to train with
– Molecular data, genomics on more people, more cases
– Electronic health records
• Hard unsolved questions need answers
– Patient predictions
– Efficient drug discovery
– Population modeling
14. The United States is spending $billions on electronic
health records, and too few are using any of this data
15. University of California
• 10 campuses and 3 national labs
• ~200,000 employees, ~250,000 students/yr
UC Health
• 20 health professional schools (6 med schools)
• Train half the medical students and residents in
California
• ~$2 billion NIH funding
• $13+ billion clinical operating revenue
• 5000 faculty physicians, 12000 nurses
• UCSF and UCLA are in US News top 10
• 5 NCI Comprehensive Cancer Centers, 5 NIH CTSA
• IRB reliance, centralized contracting
16. UC Health Data Analytics Platform
Combining healthcare data from across the
six University of California medical schools and systems
Data Warehouse
17. • Combined EHR data from UCSF, UCLA, UC Irvine, UC Davis,
UC San Diego, and UC Riverside
• Central database built using OMOP (not Epic) as a data backend
– First Epic installation was January 2012
– Structured data from 2012 to the present day
– 7.0 million patients with “modern” data
– 220M encounters, 560M procedures, 798M med orders,
722M diagnosis codes, 2.1B lab tests and vital signs
– “From Tylenol to CAR-T cells…”
– California OSHPD data, pathology and radiology text elements, death index
– Claims data from our self-funded plans now included
– Continually harmonizing elements
• Quality and performance dashboards
The University of California has an
incredible view of the medical system
19. All University of California academic medical centers provide
health data to patients through FHIR (and Apple Health)
bit.ly/ucappleh
20. Many operational teams within UC Health now using and
benefitting from the UC Health Data Warehouse, saving $millions
Central tools to
improve quality
of care
Decreasing specific
unnecessary
inpatient drug use
Managing costs
in our self-
funded health
plans
Centralized
population health
management
21. Decile 1 (Least Disadvantaged)
Decile 10 (Most Disadvantaged)
Suppressed Value
California - 2015 ADI State Rankings
• Estimate socioeconomic status based on
income, education, employment, housing
quality at the neighborhood level
• Use with race, ethnicity, gender, age to
identify health disparities
• Central geocoding capability to link 9-digit
zip code for addresses
• We have now geocoded our UC-wide
primary care population
• ADI is significantly associated with adverse
health outcomes in our patients, in
addition to race, ethnicity, and age
Social Determinants of Health: Area Deprivation Index (ADI)
22. San Francisco Bay Area Sacramento Area
San Diego Area
Riverside Area
Los Angeles Area
Orange County Area
33. AI/ML lessons I’ve learned over 25 years
• Get the question right; solve problems that health care professionals need solved, don't guess
– And verify good questions and good unmet needs with more than one doc
– Build a great diagnostic vs. understanding the biology
• Perfectly lassoed variables may miss the big picture biology
– Medical professionals (and biologists) really love explanations over black boxes
• Watch out for input limiting models
– Patients might not type in the right codes for their symptoms
– They barely enter their own race/ethnicity
– And docs?
• Learn what IRB, HIPAA, BAA are. Learn what ICD-10 and CPT codes are. CLIA and CAP.
– And learn patience.
– Not all of us are cloud allergic.
• Not everything needs deep learning
• Having all data on everyone is super rare: genomics, images, and longitudinal EHR data
• Data integration and harmonization can happen if there is a business reason for it
• $1 trillion dollars is spent on Medicare and Medicaid. Would your parents would use it?
• Health care inefficiency is not about friction, it’s about resistance to change
34. Collaborators
• Elizabeth Thomson, Patrick Dunn, Morgan
Crafts / Northrop Grumman
• Quan Chen, Anupama Gururaj, Dawei Lin /
NIAID
• Andrei Goga / UCSF Oncology
• Mallar Bhattacharya / UCSF Pulmonary
• Minnie Sarwal / UCSF Nephrology
• Geoffrey Gurtner / UCSF Surgery
• Roberta Diaz Brinton / Arizona
• Carol Bult / Jackson Labs
• Alejandro Sweet-Cordero, Julien Sage /
Pediatric Oncology
• Takashi Kadowaki, Momoko Horikoshi, Kazuo
Hara, Hiroshi Ohtsu / U Tokyo
• Kyoko Toda, Satoru Yamada, Junichiro Irie /
Kitasato Univ and Hospital
• Shiro Maeda / RIKEN
• David Stevenson, Gary Shaw / Stanford
Neonatology
• Mark Davis, C. Garrison Fathman / Stanford
Immunology
• Russ Altman, Steve Quake / Stanford
Bioengineering
• Euan Ashley, Joseph Wu / Stanford Cardiology
• Mike Snyder, Carlos Bustamante, Anne Brunet
/ Stanford Genetics
• Jay Pasricha / Stanford Gastroenterology
• Rob Tibshirani, Brad Efron / Stanford Statistics
• Hannah Valantine, Kiran Khush/ Stanford
Cardiology
• Mark Musen, Nigam Shah / National Center for
Biomedical Ontology
• Sam So, Ken Weinberg, David Miklos /
Stanford Oncology
35. The Team
Atul Butte
Carrie Byington
Cora Han
Pagan Morris
Monte Ratzlaff
Mike Kilpatrick
Lisa Dahm
Ayan Patel
Aiden Barin
Chaya Mohn
Ray Pablo
Tim Hayes
David Gonzalez
Rob Follett
Teju Yardi
Nadya Balabanova
Ellen Pollack (UCLA)
Chris Longhurst (UCSD)
Scott Joslyn (UCI)
Joe Bengfort (UCSF)
Ashish Atreja (UCD)
Tom Andriola (UCI)
The CIO Team
Kent Anderson
Steve Covington
Hemanth Tatiparthi
Ranjit Singh
Jodi Nygaard
Jeffrey Sterett
Ralph Perrin
Thomas Ami
David Merrill
Dan Phillips
Melody Hill
Kathy Pickell
Leanie Mayor
Neaktisia Lee
Albert Duntugan
Andrew Weaver
Yael Berkovich
Bill Lazarus
Edgar Tijerino
Jay Shah
Vajra Kasturi
Bill Cinnater
Josh Pelino
Shital Pandya
Priya Sreedharan
Josh Glandorf
Jennifer Holland
Peter Ryan
Andy Lucas
Kirk Kurashige
Calvin Fong
Rick Larsen
Nelson Lee
Brian Chan
Oksana Gologorskaya
Sandy Stall
Thomas Huray
Jason Woll
36. Support
Admin and Tech Staff
• Andrew Jan
• Mounira Kenaani
• Pam Allarde
• Boris Oskotsky
• Tyler Fogal
• University of California, San Francisco
• Priscilla Chan and Mark Zuckerberg
• Barbara and Gerson Bakar Foundation
• NIH: NIAID, NLM, NIGMS, NCI, NHLBI, OD; NIDDK, NHGRI, NIA, NCATS, NICHD
• Intervalien Foundation, Leon Lowenstein Foundation
• March of Dimes, Juvenile Diabetes Research Foundation
• California Governor’s Office of Planning and Research
• Howard Hughes Medical Institute, California Institute for Regenerative Medicine
• Hewlett Packard, L’Oreal, Progenity, Genentech
• Scleroderma Research Foundation
• Clayville Research Fund, PhRMA Foundation, Stanford Cancer Center, Bio-X, SPARK
• Tarangini Deshpande
• Kimayani Butte
• Carrie Byington, Talmadge King, Mark Laret
• Jack Stobo, Sam Hawgood, Keith Yamamoto
• Isaac Kohane