2019 Triangle Machine Learning Day - Biomedical Image Understanding and EHRs at LifeOmic: Harnessing the Power of the Cloud - Matthew Phillips, September 20, 2019
In this talk I'll discuss work in biomedical image and volume segmentation and classification, as well as outcome prediction modeling from insurance claims data, that I've pursued at LifeOmic here in the Triangle. In the former case, datasets include radiological image volumes, retinal fundus images, and cell images created with fluorescence microscopy. The latter includes MIMIC-III data represented as FHIR objects. I'll also discuss the relative challenges and advantages of doing ML locally vs. on a cloud-based platform.
6. LifeOmic Team Overview
PEOPLE
• 35 cloud software developers (architecture, UX/UI, analytics, ML/AI)
• 15 mobile software developers
• 8 scientific experts (genetics and data science)
• 5 security experts
• 7 marketing and administration
CORE COMPETENCIES
• Enterprise cloud software development
• Large-scale architectures
• Global AWS deployment
• Machine learning and AI
• Security
• Genomic data processing, interpretation, and analytics
• Mobile application development
• iOS and Android
LOCATIONS
• Indianapolis (HQ)
• Research Triangle Park
• Salt Lake City
7. Data Ingestion: electronic medical records, medical images, REDCap, omics data, patient-acquired data
The Precision Health Cloud Ecosystem
• Clinicians
• Patients
• Wearables and connected devices
• Researchers
8. Cloud/Mobile Precision Health Solution
FHIR | REST | GA4GH
• iOS and Android
• Evidence-based lifestyle factors proven to improve health
• Healthy plants
• Exercise
• Mindfulness
• Sleep
• Metabolic flexibility (intermittent fasting or time-restricted eating)
• Gamification
• Social interaction
• Based on the enormously successful LIFE app
9. INDIANA UNIVERSITY
Precision Health Initiative - Disease Focused
• Adult Cancer
• Pediatric - Sarcomas
• Multiple Myeloma
• Diabetes
• Alzheimer’s Disease
Pharmacogenomics
10. IU Precision Health Architecture
[Architecture diagram; labels recovered from the slide]
• Users: IU clinicians and researchers
• Data Sources: Industry sequences; REDCap; CMG sequences; IU Health clinical; Eskenazi clinical; INPC clinical; imaging (e.g., pathology); external data sources
• IU Data Staging: data quality and standardization; FHIR; UITS DC2 (FASTQ/BAM to VCF); FASTQ, BAM, VCF; UITS SDA archive
• LifeOmic PHC Platform: standardized VCFs; Cohort Builder; Subject Viewer; Insights KB; data quality; Data Commons; archiving; FHIR intake; BAM, VCF; analytics; data storage; ML/AI; auto indexing; API
• Applications: LifeOmic PHC Apps; Surveys; R Studio; Tableau; 3rd-party analytics tools; LIFE Mobile
• Legend: IU System; LifeOmic System; Non-IU System
11. LifeOmic Task Service – Bring Code to the Data
Data in PHC (e.g. sequencing, images, EHR, mobile)
Execute Docker-based tools against the data
Analyze the results in the PHC
• A Task is a sequence of Docker images that run against data stored in the PHC, with the outputs going back into the PHC.
• All of the data stays in the PHC to reduce transfer times and cost.
• Tasks run on compute that is provisioned within the PHC based on a task’s CPU or GPU and memory requirements.
• Docker images can be pulled from Docker Hub or uploaded to the PHC for use in a task.
• Gnosis provides genomic data sets, such as reference genomes, that can be used as inputs to tasks.
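As a rough illustration of the "sequence of Docker images" idea, the sketch below models a task as an ordered list of steps, each with its own resource needs. Everything here is hypothetical: the class names, field names, image names, and file path are invented for illustration and are not the PHC Task Service schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    """One container run in a task (all fields are hypothetical)."""
    image: str            # Docker image reference, e.g. pulled from Docker Hub
    command: List[str]    # command executed inside the container
    cpus: int = 1
    memory_gb: int = 2
    gpus: int = 0


@dataclass
class Task:
    """An ordered sequence of steps run against data already in the platform."""
    name: str
    inputs: List[str]
    steps: List[Step] = field(default_factory=list)

    def resources(self) -> Dict[str, int]:
        # Steps run one after another, so the node must satisfy the
        # largest requirement of any single step, not the sum.
        return {
            "cpus": max((s.cpus for s in self.steps), default=0),
            "memory_gb": max((s.memory_gb for s in self.steps), default=0),
            "gpus": max((s.gpus for s in self.steps), default=0),
        }


# Hypothetical two-step task: denoise a scanned page, then run OCR on it.
task = Task(
    name="denoise-then-ocr",
    inputs=["scans/example.pdf"],
    steps=[
        Step(image="example/denoiser:latest", command=["denoise"],
             cpus=4, memory_gb=16, gpus=1),
        Step(image="example/ocr:latest", command=["ocr"],
             cpus=2, memory_gb=4),
    ],
)
needed = task.resources()
```

Taking the maximum per resource reflects the assumption that steps run sequentially on the same provisioned node; a scheduler that runs steps in parallel would need a different rule.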
13. OCR as a Service - Broad applicability
• Communication via fax accounts for ~75 percent of all medical communication [1].
• OCR can be applied in real time and retrospectively.
• Relevant to all of healthcare, including consumers. The end user is a non-developer.
• Huge repositories of data currently exist.
[1] https://www.vox.com/health-care/2017/10/30/16228054/american-medical-system-fax-machines-why
14. Proposed Solution
1. Direct integration with EHRs to load PDF into PHC
2. Task Service: PDF de-noising, then to Textract
3. Apply Ontology Service (for lookup of key medical terms)
4. Display original PDF + OCR Text side by side in Subject Viewer
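Steps 2 and 3 can be sketched as a small pipeline: denoise the page, run OCR, then look up recognized words in an ontology. This is a minimal sketch under several assumptions: `denoise` is a pass-through placeholder for the model, the ontology is a two-entry toy dictionary rather than the Ontology Service, and the function names are invented. The Textract call uses the synchronous `detect_document_text` operation from boto3 and needs AWS credentials, so the example run substitutes a fake OCR function.

```python
import re
from typing import Callable, Dict, List, Tuple


def denoise(page: bytes) -> bytes:
    """Placeholder: a real step would run the denoising model here."""
    return page


def textract_ocr(page: bytes) -> str:
    """OCR one page image with Amazon Textract (requires AWS credentials)."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": page})
    lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
    return "\n".join(lines)


# Toy stand-in for an ontology lookup; the categories are illustrative only.
TOY_ONTOLOGY: Dict[str, str] = {
    "cisplatin": "medication",
    "hemoglobin": "lab value",
}


def find_terms(text: str, ontology: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (word, category) for each word that appears in the ontology."""
    found = []
    for word in re.findall(r"[A-Za-z]+", text):
        category = ontology.get(word.lower())
        if category is not None:
            found.append((word, category))
    return found


def process_page(page: bytes, ocr: Callable[[bytes], str],
                 ontology: Dict[str, str]) -> dict:
    """Denoise, OCR, then annotate key terms."""
    text = ocr(denoise(page))
    return {"text": text, "terms": find_terms(text, ontology)}


def fake_ocr(page: bytes) -> str:
    """Stands in for textract_ocr so the example runs offline."""
    return "Cisplatin 75 mg/m2; hemoglobin 11.2"


result = process_page(b"placeholder image bytes", fake_ocr, TOY_ONTOLOGY)
```

Passing the OCR function as a parameter keeps the pipeline testable offline; in a real run `textract_ocr` would be passed in place of `fake_ocr`.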
15. [Workflow diagram, two versions]
Workflow totaling 6.5 – 8 hours:
• Referring medical oncologist faxes clinical notes and lab values to IU.
• Medical associate scans fax into EMR (PDF image).
• Medical abstractor pulls out what was given, when, dosage, duration, prior therapy, and lab values. Manual entry into REDCap. 4 – 5 hours.
• Physician manually checks each value. 2.5 – 3 hours per patient.
• Loaded into PHC.
Workflow totaling 1 – 2 hours:
• Referring medical oncologist faxes clinical notes and lab values to IU.
• Medical associate scans fax into EMR (PDF image).
• Ingest scan from EMR to PHC. 30 min – 1 hour.
• Physician manually checks each value. 30 min – 1 hour.
• Loaded into PHC.
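The two totals on this slide follow from adding the per-step time ranges. The short check below reproduces that arithmetic; the labels `manual` and `assisted` are names chosen here for the two workflows, not terms from the slide.

```python
from typing import List, Tuple

# (low, high) hours for each timed step, taken from the slide.
manual: List[Tuple[float, float]] = [(4.0, 5.0), (2.5, 3.0)]    # abstraction, physician check
assisted: List[Tuple[float, float]] = [(0.5, 1.0), (0.5, 1.0)]  # ingest, physician check


def total(steps: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Sum the low ends and the high ends of each step's range."""
    return (sum(low for low, _ in steps), sum(high for _, high in steps))


manual_total = total(manual)
assisted_total = total(assisted)
```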
22. So this has already been solved, right?
• There is far less published research on this than you might expect.
• https://www.kaggle.com/c/denoising-dirty-documents (2015)
• D, Vishwanath, Rohit Rahul, Gunjan Sehgal, Swati, Arindam Chowdhury, Monika Sharma, Lovekesh Vig, Gautam Shroff, and Ashwin Srinivasan. “Deep Reader: Information Extraction from Document Images via Relation Extraction and Natural Language.” ArXiv:1812.04377 [Cs], December 11, 2018. http://arxiv.org/abs/1812.04377.
• Older papers, papers on image denoising generally …
• We also couldn’t find an off-the-shelf, document-specific denoiser. There is no entry for this on ‘Papers with Code’, for example.
• AWS Textract fails on all of the examples shown.
23. Our solution:
Use Attention U-Net (Oktay et al. 2018) and treat denoising like a segmentation task
Break the document into high-resolution tiles
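The tiling step can be sketched as computing a grid of overlapping crop boxes that cover the whole page, so each tile can be passed through the network at full resolution and the outputs stitched back together. The tile size and overlap below are assumptions for illustration; the talk does not state the values used.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so that tiles cover [0, length)."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_boxes(height: int, width: int, tile: int = 512,
               overlap: int = 64) -> List[Box]:
    """Crop boxes in row-major order covering a height x width page."""
    boxes = []
    for top in tile_starts(height, tile, overlap):
        for left in tile_starts(width, tile, overlap):
            boxes.append((top, left,
                          min(top + tile, height),
                          min(left + tile, width)))
    return boxes


# Example: a 1000 x 700 pixel page with the assumed defaults.
boxes = tile_boxes(1000, 700)
```

Pinning the last tile to the far edge, rather than padding the page, keeps every tile the same size at the cost of a larger overlap on the final row and column.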
25. Results
Residue and dark background eliminated.
Now many items extracted (often imperfectly).
Top: before/after; bottom: Textract output. (No output at all prior to denoising.)
PHC ABAC can be used to control exactly who can do what with any subset of patient data.
Any authorized user can explore the data they have access to with PHC’s advanced visualizations as well as machine learning models.
IT only needs to configure access for users and no longer needs to be the gatekeeper for all data manipulation.
The PHC REST API opens the door to rapid innovation since everything is available via a simple web interface but still secure and access-controlled.
The hospital can easily add custom tiles to LX to provide additional capabilities to patients.
Over time, PHC+LX can eliminate the need for expensive systems such as Oracle data warehouses, risk stratification systems, etc.
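To make the attribute-based access control (ABAC) idea in these notes concrete, the sketch below grants an action when the user's attributes and the resource's attributes both match some rule. The rule format, attribute names, and values are hypothetical and are not the PHC policy language.

```python
from typing import Dict, List

# Hypothetical policy: each rule lists allowed actions plus the attribute
# values a user and a resource must have for the rule to apply.
POLICY: List[dict] = [
    {
        "actions": {"read"},
        "user": {"role": "researcher", "project": "myeloma"},
        "resource": {"project": "myeloma", "deidentified": True},
    },
    {
        "actions": {"read", "write"},
        "user": {"role": "clinician"},
        "resource": {"type": "clinical-note"},
    },
]


def matches(required: Dict[str, object], actual: Dict[str, object]) -> bool:
    """True when every required attribute is present with the same value."""
    return all(actual.get(key) == value for key, value in required.items())


def is_allowed(user: Dict[str, object], action: str,
               resource: Dict[str, object], policy: List[dict]) -> bool:
    """Deny by default; allow if any rule covers the action and both sides match."""
    return any(
        action in rule["actions"]
        and matches(rule["user"], user)
        and matches(rule["resource"], resource)
        for rule in policy
    )


researcher = {"role": "researcher", "project": "myeloma"}
dataset = {"project": "myeloma", "deidentified": True}
can_read = is_allowed(researcher, "read", dataset, POLICY)
can_write = is_allowed(researcher, "write", dataset, POLICY)
```

Because decisions depend only on attributes, access for a new user is configured by setting their attributes rather than by editing per-dataset permissions.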
--
Aggregate advanced data such as genomics to make it fully actionable.
Built-in visualization / ML tools.
PHC ABAC centralizes and streamlines authentication and authorization. IT no longer has to be the gatekeeper.
FHIR and REST APIs to accelerate innovation.
LIFE Extend delivers an actionable patient portal 2.0.
Supports precision health while reducing the cost of delivering care.