Cost of Medical Care
in Older Adults with Chronic Conditions
Randal S. Goomer, PhD
Data Science
DS-SF-29
General Assembly
San Francisco, 1/25/2017
DataSet
Medicare 2010 Patient records
Expunged of all personal or identifiable information (33M patient profiles)
Provided by
CMS
As: 2010 Chronic Conditions Public Use File (PUF)
(2010 CMS CC PUF)
Patient Age
Categories: 1 – 6
1. 62-64
2. 65-69
3. 70-74
4. 75-79
5. 80-85
6. 85+
Therapy Behavior:
• Number of Out-Patient Visits
• Number of In-Patient Admits
• Medicare Part A, B, C, D, E payments
Chronic Conditions (CC)
• Alzheimer's
• Cancer
• CHF
• Diabetes
• Chronic Kidney Disease
• Stroke
• Osteoporosis
• Depression
• COPD
• Ischemic Heart Condition
• Arthritis
Patient Gender
Cost
(Payouts by Medicare)
Dataset Includes a 4-page Dictionary of Terms
(Sample: page 1 of 4)
Can we predict costs based on patient profile or behavior?
BI Questions
• Does the type of chronic conditions (CC) impact Costs?
• Does age or gender impact cost?
• Does patient behavior, such as accessing OP facilities vs. IP admits, influence costs?
• Which behavior costs more or less?
• Which CC cost more or less?
H0: <<We cannot predict cost from patient profile and behavior>>
H1: <<Patient profile and behavior can predict cost>>
ML models
• Data Munging (pd, np)
• Random Forest
• RF optimized by bootstrapping (with replacement) and OOB error testing against ROC/AUC
• OLS (p-val, r-squared, coeff., predict accuracy)
• Logit regression (coeff_, curve_fitting, Predict Prob.)
• K-means clustering (with K_fold-grid_CV optimization)
• Visualizations (Seaborn, Matplotlib)
Heatmap: order of importance w.r.t. ‘payout’ (cost):
- ip_admit = hospitalization
- op_visits = offices/OP clinics
- CC_CHF = Chronic Heart Failure
- CC_CANCER = cancer patients
- CC_ISCHMCHT = Ischemic Heart Disease
- CC_CHRNKIDN = Chronic Kidney Disease
After detailed data ‘munging’, a heatmap was produced.
The heatmap finds hidden correlations.
[Figure: correlation heatmap; labeled entries: Chronic Kidney Dis., Osteoporosis]
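A minimal sketch of how such an ordering could be derived: compute the Pearson correlation of each feature with payout and sort by absolute value. The data below are synthetic stand-ins (the coefficients, sample size, and distributions are assumptions for illustration, not values from the CMS file); only the column names ip_admit, op_visits, and CC_CHF come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for PUF columns (hypothetical values, not CMS data).
ip_admit = rng.poisson(0.5, n)
op_visits = rng.poisson(4.0, n)
cc_chf = rng.integers(0, 2, n)
noise = rng.normal(0.0, 1000.0, n)
payout = 9000 * ip_admit + 300 * op_visits + 1500 * cc_chf + noise

features = {"ip_admit": ip_admit, "op_visits": op_visits, "CC_CHF": cc_chf}

# Pearson correlation of each feature with payout.
corr_with_payout = {
    name: float(np.corrcoef(col, payout)[0, 1]) for name, col in features.items()
}

# Rank features by absolute correlation, strongest first.
ranking = sorted(corr_with_payout, key=lambda k: abs(corr_with_payout[k]), reverse=True)
for name in ranking:
    print(f"{name}: {corr_with_payout[name]:+.3f}")
```

With a pandas DataFrame, the full matrix from `df.corr()` can be passed to `seaborn.heatmap` to draw the kind of plot described on this slide.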
Cost v. In-Patient Admit
EDA: Costs rise quickly with the number of in-patient admits
[Figure: payout vs. number of in-patient admits]
Cost v. Out-Patient Visit
EDA: Costs plateau quickly with the number of out-patient visits
[Figure: payout vs. number of out-patient visits]
Patients with Chronic Heart Failure (CHF) by Age v. Cost
Patient age and chronic condition contribute to cost
[Figure: payout by age (62 yrs to 85+ yrs), two panels: CHF = True and CHF = False]
Age v. IP Admit and OP Visits
[Figure: two panels, in-patient admits and out-patient visits, each by age (62 yrs to 85+ yrs)]
Payout v. Out-Patient Visits (Lin-Reg)
[Figure: linear regression of payout on out-patient visits]
Payout v. IP Admit (Lin-Reg)
[Figure: linear regression of payout on IP admits]
ip_admit significantly affects cost based on the p-value, but the relationship is not linear (r² = 0.535)
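The two statistics answer different questions: the p-value tests whether the fitted slope differs from zero, while r² is the share of payout variance the straight line explains. A minimal sketch on synthetic data shows both holding at once (a very small p-value with a moderate r²). The slope, noise level, and sample size are assumptions chosen for illustration, and `scipy.stats.linregress` stands in for whichever OLS routine the project actually used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic data (hypothetical): payout grows with admits, with large scatter.
ip_admit = rng.integers(0, 6, 500)
payout = 8000.0 * ip_admit + rng.normal(0.0, 12000.0, 500)

# Simple one-variable least-squares fit of payout on ip_admit.
fit = stats.linregress(ip_admit, payout)
r_squared = fit.rvalue ** 2

print(f"slope   = {fit.slope:.1f}")
print(f"p-value = {fit.pvalue:.3g}")
print(f"r^2     = {r_squared:.3f}")
```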
Age is a predictor of cost, but the low r² value suggests the relationship is not linear
For patients with cancer, age is no longer a predictor of cost
IP Admit for Osteoporosis: not needed anymore
(because of ready availability of long-acting drugs)
IP Admit for Chronic Heart Failure (CHF)
[Figure: IP admit, with regions labeled "No Need" and "Still Needed"]
IP Admit for Chronic Heart Failure (CHF): IP admit still very much needed
Logistic regression: log predicted probability
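A minimal sketch of reading predicted probabilities and log probabilities from a logistic regression. The setup is an assumption: the outcome is taken to be "patient had at least one IP admit" and the predictors are a CHF flag and an age category, with synthetic data generated so that CHF raises the odds of admission. The slide does not state which variables the original model used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 3000

# Synthetic predictors (hypothetical): CHF flag and age category 1-6.
cc_chf = rng.integers(0, 2, n)
age_cat = rng.integers(1, 7, n)

# Assumed generating process: CHF raises the log-odds of an IP admit.
logit = -2.0 + 1.5 * cc_chf + 0.1 * age_cat
admitted = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X = np.column_stack([cc_chf, age_cat])
model = LogisticRegression().fit(X, admitted)

# Two profiles in the same age category, differing only in the CHF flag.
profiles = np.array([[0, 3], [1, 3]])
proba = model.predict_proba(profiles)[:, 1]          # P(admitted)
log_proba = model.predict_log_proba(profiles)[:, 1]  # log P(admitted)

print("P(admit | no CHF) =", round(float(proba[0]), 3))
print("P(admit | CHF)    =", round(float(proba[1]), 3))
```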
CART = Classification and Regression Tree
Random Forest 1
Random Forest 1
Feature Importance
Compared to our heatmap from above
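A minimal sketch of extracting feature importances from a random forest, so they can be set beside a correlation ranking. Assumptions: cost is binarized at its median into a high-cost/low-cost label (the slides report AUC, which suggests a binary target, but the actual threshold is not stated), and the data are synthetic with hypothetical coefficients.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n = 1500
names = ["ip_admit", "op_visits", "CC_CHF", "age_cat"]

# Synthetic stand-ins for the PUF columns (hypothetical values).
X = np.column_stack([
    rng.poisson(0.5, n),     # ip_admit
    rng.poisson(4.0, n),     # op_visits
    rng.integers(0, 2, n),   # CC_CHF
    rng.integers(1, 7, n),   # age_cat
])
cost = 9000 * X[:, 0] + 300 * X[:, 1] + 1500 * X[:, 2] + rng.normal(0.0, 1000.0, n)

# Assumed binarization: 1 = above-median cost.
high_cost = (cost > np.median(cost)).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, high_cost)

# Impurity-based importances; they sum to 1 across features.
importance = dict(zip(names, forest.feature_importances_))
top = max(importance, key=importance.get)
for name, value in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
```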
Random Forest: tuned n_estimators from 30 to 2,000 trees using bootstrap sampling, with AUC as the output metric; optimized at 1,000 trees
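A minimal sketch of this kind of sweep: fit forests of increasing size with bootstrap sampling, score each on its out-of-bag (OOB) predictions, and compare AUC. The data are synthetic, the binary label is an assumption, and the grid is cut down to three sizes so the example runs quickly; it does not reproduce the 1,000-tree result.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n = 1000

# Synthetic data (hypothetical): two informative columns, two noise columns.
X = rng.normal(size=(n, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.5, n) > 0).astype(int)

oob_auc = {}
for n_trees in (30, 100, 300):
    forest = RandomForestClassifier(
        n_estimators=n_trees,
        bootstrap=True,   # sample rows with replacement for each tree
        oob_score=True,   # keep predictions from trees that did not see the row
        random_state=42,
    )
    forest.fit(X, y)
    # Column 1 holds the OOB probability of the positive class.
    oob_auc[n_trees] = roc_auc_score(y, forest.oob_decision_function_[:, 1])

best_n = max(oob_auc, key=oob_auc.get)
for n_trees, auc in oob_auc.items():
    print(f"{n_trees} trees: OOB AUC = {auc:.4f}")
```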
Random Forest: optimized the max_features option using bootstrap, with AUC as output; ‘auto’, None, sqrt, log2, 0.9, and 0.2 were tried.
The plot shows stats when using all features (auto and None), the square root of the number of features (100 features == 10 used), 90%, or 20%.
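A minimal sketch of a max_features sweep scored by out-of-bag AUC. In scikit-learn, None means all features, "sqrt" and "log2" apply that function to the feature count, and a float is read as a fraction of the features. The ‘auto’ option is omitted here because recent scikit-learn releases no longer accept it. The data and the binary label are synthetic assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
n = 1000

# Synthetic data (hypothetical): two informative columns, four noise columns.
X = rng.normal(size=(n, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.5, n) > 0).astype(int)

auc_by_option = {}
for option in (None, "sqrt", "log2", 0.9, 0.2):
    forest = RandomForestClassifier(
        n_estimators=100,
        max_features=option,  # features considered at each split
        bootstrap=True,
        oob_score=True,
        random_state=42,
    )
    forest.fit(X, y)
    auc_by_option[option] = roc_auc_score(y, forest.oob_decision_function_[:, 1])

for option, auc in auc_by_option.items():
    print(f"max_features={option!r}: OOB AUC = {auc:.4f}")
```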
Random Forest: optimized min-sample option, using AUC as output.
K-means Clustering with Silhouette Score
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300, n_clusters=10, n_init=10, n_jobs=1, precompute_distances='auto', random_state=42, tol=0.0001, verbose=0)
Here, white plus signs represent the centroids.
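A minimal sketch of choosing the number of clusters by silhouette score. The data are three synthetic, well-separated blobs (an assumption for illustration, not the Medicare features). The KMeans settings mirror the printed configuration where possible; algorithm='auto', n_jobs, and precompute_distances are omitted because recent scikit-learn releases no longer accept them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(6)

# Three synthetic 2-D blobs (hypothetical data).
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(0.0, 0.7, size=(100, 2)) for c in centers])

scores = {}
for k in range(2, 7):
    km = KMeans(
        n_clusters=k, init="k-means++", n_init=10,
        max_iter=300, tol=1e-4, random_state=42,
    )
    labels = km.fit_predict(X)
    # Mean silhouette over all points: higher means tighter, better-separated clusters.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette = {s:.3f}")
```

After fitting, the centroids are available as `km.cluster_centers_`, which is what a plot would mark with plus signs.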
K-means Clusters with grid_cv
Future:
Ensemble technique using a pipeline that includes RF, logReg, and GBM, for manuscript preparation and publication.