2. Telling a Story with Data*
(Communicating effectively with analytics)
• Summary
• Recommendations
• Implications of Results
• Outline of Research Process
*Deloitte Review by Thomas H. Davenport
3. Problem Description & Introduction
• Background: Hospitals are penalized for patients who are re-admitted within 30 days of being released.
• Business Objectives: To reduce or eliminate the number of patients re-admitted within 30 days of being released.
• Success Criteria: Identification of factors that increase the likelihood of a patient returning within 30 days.
• Business Value: The average cost in 2011 for a hospital stay was $10,000.*
*http://www.beckershospitalreview.com/finance/11-statistics-on-average-hospital-costs-per-stay.html
4. Full Problem Description
Challenge #1: Predicting Hospital Readmissions
In this challenge, your team will try to predict which patients will be re-admitted to the hospital after
being discharged from a hospital stay. This is a real problem for hospitals, which don't get paid for a
re-admission if it happens within 30 days after the patient was discharged. You will be given a training data
set consisting of several files, and will need to assemble them and train your algorithm. You are free to use
any algorithm you wish that your team codes, or to combine several. On Sunday at 9:00 AM, we will release
the validation data set. You will run your algorithm and submit your predictions of who was re-admitted
in the validation data set.
Data Description:
Training Set: This file (Challenge_1_Training.Set.csv) contains HIPAA-compliant, de-identified records of
hospital admissions. Each record contains a random, one-way scrambled unique identifier, limited
demographics (age, gender), type of admission, discharge disposition (e.g. home, to a skilled nursing
facility, home with assistance, transfer to another facility), whether the person was re-admitted, and
when the re-admission occurred relative to the 30-day mark. The READ_ME.doc file contains
each field definition, as well as additional definitions for Admission Type (admission_type_id.csv),
Discharge Disposition (discharge_disposition_id.csv), and Admission Source (admission_source_id.csv).
Look-up tables for ICD-9 Diagnosis: This zipped file (version 32) from the Centers for Medicare & Medicaid
Services web site contains two tables with the ICD-9 Diagnosis (CMS32_DESC_LONG_DX.txt) and Procedure
(CMS32_DESC_LONG_SG.txt) codes. The ICD-9 Diagnosis table provides a description of the numerical
diagnosis codes contained in the Challenge_1_Training.Set.csv file. You can use this file if you want to
understand the codes and/or deepen your analysis of re-admission causes.
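The "assemble them" step described above could be sketched as a simple look-up join. This is a minimal illustration only: the column names (encounter_id, admission_type_id, description, readmitted) and the sample rows are assumptions made for the example, since the real field definitions live in READ_ME.doc, and the inline strings stand in for the actual CSV files.

```python
import csv
import io

# Hypothetical miniature stand-ins for the challenge files; the real column
# names and values are defined in READ_ME.doc and may differ.
TRAINING_CSV = """encounter_id,age,gender,admission_type_id,readmitted
1001,[60-70),Female,1,<30
1002,[40-50),Male,3,NO
1003,[70-80),Male,9,>30
"""

ADMISSION_TYPE_CSV = """admission_type_id,description
1,Emergency
3,Elective
"""


def load_lookup(text, key_col, value_col):
    """Read a two-column look-up table into a dict keyed by the ID column."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key_col]: row[value_col] for row in reader}


def attach_description(rows, lookup, key_col, new_col, default="Unknown"):
    """Return copies of the rows with the look-up description added as new_col."""
    return [{**row, new_col: lookup.get(row[key_col], default)} for row in rows]


admission_types = load_lookup(ADMISSION_TYPE_CSV, "admission_type_id", "description")
visits = list(csv.DictReader(io.StringIO(TRAINING_CSV)))
joined = attach_description(visits, admission_types, "admission_type_id", "admission_type")

for row in joined:
    print(row["encounter_id"], row["admission_type"], row["readmitted"])
```

IDs that are missing from the look-up table fall back to "Unknown" rather than raising an error, which keeps unmatched records in the training set. The same pattern would apply to the discharge disposition and admission source tables.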
7. Next Steps
• Analyze re-admissions just past the 30-day threshold
– e.g. 31 to 45 or 60 days
• Weight the data
• Cross-reference the 3 diagnoses
• Analyze the order of the 3 diagnoses
• Add more diagnoses
• Use more granular diagnosis categories
8. Dataset
• Description: The dataset contains over 56,000 HIPAA-compliant, de-identified records of hospital admissions.
• Source: Hack K-State 2016: Data Science For Social Good - https://zslie.github.io/
• Details: There are 50 columns: the Visit ID and the Patient ID, along with 48 factors.
• Factors: The factors have varying numbers of attributes, ranging from 1 to 715, so there are ~5.27x10^41 possible combinations.
• Factors: Descriptions below.
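A figure like ~5.27x10^41 is what you get by multiplying the number of distinct values in each factor. The slide does not say how the number was computed, so the sketch below assumes that interpretation. The three-column sample is made up for illustration and is not taken from the dataset.

```python
import math


def combination_count(rows):
    """Count distinct values per column and multiply the counts together."""
    columns = rows[0].keys()
    distinct = {col: len({row[col] for row in rows}) for col in columns}
    return distinct, math.prod(distinct.values())


# Made-up sample rows; the real data has 48 factor columns.
sample = [
    {"gender": "Female", "age": "[60-70)", "admission_type_id": "1"},
    {"gender": "Male", "age": "[60-70)", "admission_type_id": "3"},
    {"gender": "Male", "age": "[70-80)", "admission_type_id": "9"},
]

distinct, total = combination_count(sample)
print(distinct)
print(total)  # 2 * 2 * 3 = 12
```

Run over all 48 factor columns, the same product grows very quickly, which is why an exhaustive search over factor combinations is impractical.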
9. ETL
• Performed some data manipulation directly in Excel, including:
• Changed 'medical_specialy' to 'MED_SPEC_NUM'
• Changed the 3 'diag_x' columns to 'DIAG_CAT_X' and converted 858 unique diagnoses into 33 diagnosis categories
• Notes are in the Challenge_1_Training_Data_Conversion.xlsx file on the "Storage" page
• Key Business Data Question Summary
• Of 56,000 hospital visits in this database:
• 6,285 were re-admitted in < 30 days; these are the cases we need to solve for
• 19,477 were also re-admitted, but after the 30-day threshold
• 30,238 were not re-admitted; there could also be some insight gleaned from why they DIDN'T have to be re-admitted
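The three counts above can be turned into class shares with a few lines of arithmetic. In this sketch the label strings ("<30", ">30", "NO") and the to_binary helper are assumptions made for illustration; only the counts come from the slide.

```python
# Counts as reported on this slide; the label strings are assumed.
counts = {"<30": 6285, ">30": 19477, "NO": 30238}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}


def to_binary(label):
    """Hypothetical binary target: 1 for a re-admission within 30 days, else 0."""
    return 1 if label == "<30" else 0


positives = sum(n for label, n in counts.items() if to_binary(label) == 1)

print(total)
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"positive rate: {positives / total:.1%}")
```

The counts sum to 56,000, and the under-30-day cases make up roughly 11% of visits, so a model trained on the raw data sees far more negative than positive examples.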
10. Exploratory Analysis
• The preliminary candidate correlations changed when measured against readm2 instead of readmitted:
• number_emergency = 0.103321 ==> No longer shows a significant correlation; now at 0.053
• number_inpatient = 0.233149 ==> Now the only one showing any significant correlation, at 0.162
• number_diagnoses = 0.103885 ==> No longer shows a significant correlation; now at 0.045
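Coefficients like those above are commonly Pearson correlations between a numeric column and the target. The sketch below assumes that measure, and assumes readm2 is a 0/1 flag for re-admission within 30 days; the slide defines neither. The toy values are made up and do not reproduce the reported numbers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values, not taken from the challenge data.
number_inpatient = [0, 0, 1, 2, 0, 3, 1, 0]
readm2 = [0, 0, 0, 1, 0, 1, 1, 0]  # assumed binary target

r = pearson(number_inpatient, readm2)
print(round(r, 3))
```

Recoding a three-level outcome into a binary flag changes the variance of the target, so it is not surprising that the coefficients shift between readmitted and readm2.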