Chapter 4: Multivariate Models
Table of Contents
4.3 REALTROMINS-Real Time Risk of Mortality Example
4.3.1 Streaming ECG Data
4.3.2 Physiologic based measures of organ function
4.3.3 Chart data
4.3.4 Developing a Common Time Hierarchy
4.3.5 Multivariate Models
4.3.6 Online Interactive Reporting : COGNOS Example
4.1 Introduction
The aim of data mining is to construct a mathematical algorithm that captures
viable representations of an existing phenomenon hidden within a database. These
representations must be robust enough, in terms of their parameter estimates, to
reproduce any predicted classification on an independent hold-out sample. Different
classes of algorithms can be found within SAS-EM 5.1, ranging from standard regression to
rule induction, neural networks, and two-stage models. In addition, a model comparison
module exists that provides a cross-comparison among multiple models based strictly upon
ROC characteristics.
4.3 REALTROMINS-Real Time Risk of Mortality
This example attempts to differentiate arrested from surviving children admitted
to the University of North Carolina at Chapel Hill PICU (Pediatric Intensive Care Unit)
using streaming ECG data, clinical laboratory results and demographic chart data. A
sample of 10 pediatric patients was selected as the analytic set to provide this proof of
concept.
4.3.1 Streaming ECG Data was captured at 222 data points per second, each tagged
with a UTC millisecond timestamp provided by a SpaceLabs Monitor XXXX. When plotted, this
data forms a trace of varying length with varying peaks. To create time-domain variables,
all peaks must be found. The initial step in determining Rpeaks was to ensure that the
marks provided by SpaceLabs were reasonably consistent and representative of their true
location. It was discovered that these marks do not occur at consistent locations within
the ECG cycle; unsteady data segments contained both extra marks and missing marks.
A multiple-filtering algorithm was developed in MATLAB that checked for these conditions
and corrected them using several interpolation methods. Once this was completed, the
next step was to calculate the HR (heart rate) from the Rpeak spacing in UTC time. The
interval between Rpeaks is the period between beats; to convert it to a heart rate,
divide 60 seconds/minute by this value. For example:
60 sec/min / (0.558 sec/beat) = 107.5 beats per minute (bpm)
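The conversion above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the MATLAB code used in the study; the function name and the sample timestamps are hypothetical.

```python
def heart_rates_bpm(rpeak_times_sec):
    """Return one heart-rate value (bpm) for each beat-to-beat interval."""
    rates = []
    for earlier, later in zip(rpeak_times_sec, rpeak_times_sec[1:]):
        period = later - earlier            # seconds per beat
        if period <= 0:
            raise ValueError("Rpeak timestamps must be strictly increasing")
        rates.append(60.0 / period)         # 60 sec/min divided by sec/beat
    return rates


# Hypothetical Rpeak timestamps in seconds (not study data)
rpeaks = [0.000, 0.558, 1.116]
print([round(hr, 1) for hr in heart_rates_bpm(rpeaks)])
```

Each returned value belongs to the later Rpeak of its pair, which matches the description of HR as a value located at each Rpeak.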
This calculated HR is a value located at each Rpeak. Frequency-domain variables require
an acceptable resolution while maintaining a reasonable time period length. Therefore a
128-point FFT (Fast Fourier Transform) was chosen as the standard. This provided
64 estimated points over the positive frequency range. The length of the N-second
interval was determined by the heart rate and the 128-point FFT requirement. From the
heart rate, we determine how long it will take to acquire 128 beats. The formula for this
calculation is:
N-seconds (sec) = (128 / average HR in bpm) * 60
This is set as the length of the N-second period for the current data segment. The
N-second stretch of data is then retrieved, with the appropriate marks and
Rpeaks. The resulting segment of data may contain a few more or a few fewer than 128
beats. To sample 128 times at even spacing within this N-second segment,
the HR data is interpolated onto 128 evenly spaced points.
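A minimal Python sketch of these two steps follows. The text does not say which interpolation method was used, so linear interpolation is an assumption here, as are the function names and the choice to span the grid from the first to the last Rpeak of the segment.

```python
def segment_length_sec(average_hr_bpm, n_beats=128):
    """N-seconds = (128 / average HR) * 60."""
    return (n_beats / average_hr_bpm) * 60.0


def resample_evenly(times, values, n_points=128):
    """Linearly interpolate (times, values) onto n_points evenly spaced times.

    Assumes times are strictly increasing; the grid runs from the first
    to the last timestamp (an assumption, see the lead-in).
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching (time, value) pairs")
    start, stop = times[0], times[-1]
    step = (stop - start) / (n_points - 1)
    resampled = []
    j = 0
    for i in range(n_points):
        t = min(start + i * step, stop)
        # Advance to the pair of original points that brackets t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        weight = (t - t0) / (t1 - t0)
        resampled.append(values[j] + weight * (values[j + 1] - values[j]))
    return resampled


print(round(segment_length_sec(107.5), 1))   # seconds needed for about 128 beats
```

At the 107.5 bpm of the earlier example, the segment is a little over 71 seconds long.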
An FFT is performed on the interpolated HR data. To also provide a spectrum
that has the 0 Hz (DC) point removed, the HR data is normalized to its mean
(formula below):
HR normalized = (HR interpolated - HR interpolated average) / (HR interpolated average)
A second FFT is then performed on the normalized, interpolated HR data (HRnorm).
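The effect of the normalization can be checked with a small Python sketch. To stay dependency-free it uses a direct discrete Fourier transform, which yields the same values as an FFT but more slowly; the test signal is synthetic, and treating bins 1 to 64 as the 64 positive-frequency points is an assumption.

```python
import cmath
import math


def normalize_to_mean(hr_interp):
    """(HR - mean) / mean, which removes the 0 Hz (DC) component."""
    mean = sum(hr_interp) / len(hr_interp)
    return [(hr - mean) / mean for hr in hr_interp]


def dft_magnitudes(samples):
    """Magnitudes of bins 0..n/2 from a direct O(n^2) DFT."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        magnitudes.append(abs(total))
    return magnitudes


# Synthetic segment (not study data): 100 bpm mean, 5 bpm oscillation,
# 4 cycles across the 128 interpolated points.
hr_interp = [100 + 5 * math.sin(2 * math.pi * 4 * i / 128) for i in range(128)]
spectrum = dft_magnitudes(normalize_to_mean(hr_interp))
positive_bins = spectrum[1:]     # 64 points; bin 0 is the removed DC term
print(len(positive_bins), spectrum[0] < 1e-9)
```

Bin k corresponds to a frequency of k / N-seconds Hz, so the frequency resolution depends on the heart rate of the segment.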
The beginning and ending values of each consistent time segment, and the beginning and
ending values of each time segment over which the analysis actually occurred (i.e., the
first and last points in UTC time used for the HR interpolation), are written out to the HR
data file. Other columns in this file include Rpeaks, Rind, HR, HR interpolated,
HRnorm, frequency (Hz), FFT HR interpolated, and FFT HRnorm.
The data structure developed to store all of this information was multi-hierarchical
and variable length. While the number of heart peak records varied around 128, the
frequency-domain variables were fixed at 64 records each. The column headings of the
HR data file were:
Variable Name                                          Hierarchy
A.) Actual Begin Time for the 128-beat sample epoch    1
B.) Actual End Time for the 128-beat sample epoch      1
C.) Actual Rpeak Timestamp                             ~128
D.) Rpeak value                                        ~128
E.) HR - Raw                                           ~128
F.) HR - Interpolated                                  ~128
G.) HR - Normalized                                    ~128
H.) Frequency                                          64
I.) Heart Rate Spectra Raw                             64
J.) Heart Rate Spectra Interpolated                    64
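One way to picture this variable-length record is as a single object per epoch, with two scalar times, several per-beat lists and three fixed-length spectral lists. The Python sketch below is purely illustrative; the class and field names are hypothetical and do not come from the HR data file itself.

```python
from dataclasses import dataclass, field
from typing import List

SPECTRAL_POINTS = 64   # fixed number of frequency-domain records per epoch


@dataclass
class HREpoch:
    begin_time: float                                             # A: 1 per epoch
    end_time: float                                               # B: 1 per epoch
    rpeak_times: List[float] = field(default_factory=list)        # C: about 128
    rpeak_values: List[float] = field(default_factory=list)       # D: about 128
    hr_raw: List[float] = field(default_factory=list)             # E: about 128
    hr_interpolated: List[float] = field(default_factory=list)    # F: about 128
    hr_normalized: List[float] = field(default_factory=list)      # G: about 128
    frequency: List[float] = field(default_factory=list)          # H: 64
    spectra_raw: List[float] = field(default_factory=list)        # I: 64
    spectra_interpolated: List[float] = field(default_factory=list)  # J: 64

    def is_consistent(self) -> bool:
        """Check the fixed-length part of the record and the time order."""
        spectral = (self.frequency, self.spectra_raw, self.spectra_interpolated)
        return (self.begin_time <= self.end_time
                and all(len(s) == SPECTRAL_POINTS for s in spectral))


epoch = HREpoch(0.0, 72.0, frequency=[0.0] * 64,
                spectra_raw=[0.0] * 64, spectra_interpolated=[0.0] * 64)
print(epoch.is_consistent())
```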
Time-domain variables were calculated by stripping out the actual Rpeak timestamps,
then binning the distribution of nn_interval values and calculating percentages as follows:
if nn_interval > 0 and nn_interval <= .05 then NN50=1;
if nn_interval > .05 and nn_interval <= .10 then NN100=1;
if nn_interval > .10 and nn_interval <= .20 then NN200=1;
if nn_interval > .20 and nn_interval <= .30 then NN300=1;
if nn_interval > .30 and nn_interval <= .40 then NN400=1;
if nn_interval > .40 and nn_interval <= .50 then NN500=1;
if nn_interval > .50 and nn_interval <= .60 then NN600=1;
if nn_interval > .60 and nn_interval <= .70 then NN700=1;
if nn_interval > .70 and nn_interval <= .80 then NN800=1;
if nn_interval > .80 and nn_interval <= .90 then NN900=1;
if nn_interval > .90 and nn_interval <= 1.0 then NN1000=1;
if nn_interval > 1.0 and nn_interval <= 3.0 then NN1000P=1;
pNN50=nn50/rpeak_counts;
pNN100=nn100/rpeak_counts;
pNN200=nn200/rpeak_counts;
pNN300=nn300/rpeak_counts;
pNN400=nn400/rpeak_counts;
pNN500=nn500/rpeak_counts;
pNN600=nn600/rpeak_counts;
pNN700=nn700/rpeak_counts;
pNN800=nn800/rpeak_counts;
pNN900=nn900/rpeak_counts;
pNN1000=nn1000/rpeak_counts;
pNN1000p=nn1000p/rpeak_counts;
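The same binning can be expressed compactly in Python. This sketch assumes that nn_interval is the time in seconds between successive Rpeaks and that rpeak_counts is the number of Rpeaks in the epoch; neither is spelled out in the SAS fragment, and the function name is hypothetical.

```python
from bisect import bisect_left

# Upper edge (seconds) of each bin; a bin covers (previous edge, this edge].
UPPER_EDGES = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50,
               0.60, 0.70, 0.80, 0.90, 1.0, 3.0]
LABELS = ["pNN50", "pNN100", "pNN200", "pNN300", "pNN400", "pNN500",
          "pNN600", "pNN700", "pNN800", "pNN900", "pNN1000", "pNN1000p"]


def nn_bin_proportions(rpeak_times_sec):
    """Share of Rpeaks whose preceding interval falls in each bin."""
    rpeak_count = len(rpeak_times_sec)
    counts = dict.fromkeys(LABELS, 0)
    for earlier, later in zip(rpeak_times_sec, rpeak_times_sec[1:]):
        nn_interval = later - earlier
        if 0 < nn_interval <= UPPER_EDGES[-1]:
            # bisect_left finds the first upper edge >= nn_interval.
            counts[LABELS[bisect_left(UPPER_EDGES, nn_interval)]] += 1
    return {label: count / rpeak_count for label, count in counts.items()}


# Hypothetical Rpeak timestamps in seconds (not study data)
shares = nn_bin_proportions([0.0, 0.55, 1.10, 1.75, 3.0])
print(shares["pNN600"], shares["pNN700"], shares["pNN1000p"])
```

As in the SAS statements, intervals longer than 3.0 seconds fall into no bin.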
Spectral data was extracted by stripping out the fixed 64 records from each group of records.
Frequency bands were grouped as follows, using the interpolated heart rate spectra values
to calculate the area within each band of frequencies.
/* else-if with shared boundaries, so no frequency falls between two bands */
if frequency >= 0 and frequency <= .003 then freq_type = '1-ULF';
else if frequency > .003 and frequency <= .040 then freq_type = '2-VLF';
else if frequency > .040 and frequency <= .150 then freq_type = '3-LF';
else if frequency > .150 then freq_type = '4-HF';
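A short Python sketch of the band assignment and area calculation is given below. The text does not state how the area was computed, so a simple rectangle rule (spectral value times bin width) is assumed; the function names and sample values are hypothetical, and the bands are treated as contiguous, each starting where the previous one ends.

```python
def band_of(frequency_hz):
    """Map a frequency in Hz to its band label."""
    if frequency_hz < 0:
        raise ValueError("frequency must be non-negative")
    if frequency_hz <= 0.003:
        return "1-ULF"
    if frequency_hz <= 0.04:
        return "2-VLF"
    if frequency_hz <= 0.15:
        return "3-LF"
    return "4-HF"


def band_areas(frequencies_hz, spectrum, bin_width_hz):
    """Rectangle-rule area under the spectrum within each band."""
    areas = {"1-ULF": 0.0, "2-VLF": 0.0, "3-LF": 0.0, "4-HF": 0.0}
    for frequency, value in zip(frequencies_hz, spectrum):
        areas[band_of(frequency)] += value * bin_width_hz
    return areas


# Hypothetical spectrum points (not study data)
areas = band_areas([0.0, 0.02, 0.1, 0.3, 0.4], [1, 2, 3, 4, 5], 0.5)
print(areas)
```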
4.3.2 Physiologic based measures of organ function important in predicting
mortality were defined by a battery of 75 lab tests collected during a stay within the
PICU. These tests were ordered according to clinical need, which varied by patient and
over time within a patient. This raised the question of how to include all of this
information while keeping a reasonable patient-to-variable ratio and handling the high
expected inter-correlation among the tests. The answer adopted here was to create a series
of derogatory bio-markers. Only those tests that could identify segments with the highest
index against base mortality were considered. This was accomplished by examining
each variable's distribution and identifying intervals that index high against base
mortality, using a single factor scan procedure found within QTMS V3.2.
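The columns of the scan listings that follow can be reproduced from the counts alone. The Python sketch below is inferred from the printed numbers, not from QTMS documentation: INDEX matches 100 times the interval's response rate divided by the overall rate, CHISQ matches (observed - expected)^2 / expected on the responder count, and PROB matches the chi-square tail probability with one degree of freedom. The function name is hypothetical.

```python
import math


def scan_row(solicited, responders, base_rate):
    """Rate, chi-square contribution, p-value and index for one interval."""
    rate = responders / solicited
    expected = solicited * base_rate              # responders expected at the base rate
    chisq = (responders - expected) ** 2 / expected
    prob = math.erfc(math.sqrt(chisq / 2.0))      # chi-square tail, 1 degree of freedom
    index = round(100.0 * rate / base_rate)
    return {"rate": 100.0 * rate, "chisq": chisq, "prob": prob, "index": index}


# First interval of the Base Balance scan: 42 solicited, 40 responders,
# against an overall rate of 277 responders out of 424.
row = scan_row(42, 40, 277 / 424)
print(round(row["rate"], 3), round(row["chisq"], 3),
      round(row["prob"], 4), row["index"])
```

An index above 100 marks an interval whose mortality rate exceeds the base rate, which is how the derogatory intervals are identified.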
Base Balance
SINGLE FACTOR SCAN RESULT                                            QTMS
TYPE IS "CONTINUOUS"                                                 V3.2
                      NO.OF      NO.OF  RESPONSE  RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE     CHISQ   PROB.  INDEX
-------------------------------------------------------------------------
 1  -15.1 - -6.2         42         40    95.238     5.751  0.0165    146  |******************
 2  -6.1 - -4.1          44         40    90.909     4.407  0.0358    139  |****************
 3  -4 - -2.7            42         33    78.571     1.127  0.2884    120  |*************
 4  -2.6 - -1.3          42         25    59.524     0.217  0.6415     91  |******* |
 5  -1.2 - 0.6           46         18    39.130     4.833  0.0279     60  |** |
 6  0.7 - 1.9            45         28    62.222     0.067  0.7965     95  |********|
 7  2 - 3.4              44         28    63.636     0.019  0.8894     97  |*********
 8  3.5 - 5.4            43         29    67.442     0.029  0.8640    103  |**********
 9  5.5 - 8.2            43         15    34.884     6.101  0.0135     53  |* |
10  8.5 - 14.2           33         21    63.636     0.014  0.9042     97  |*********
-------------------------------------------------------------------------
                        424        277    65.330    22.565  0.0072    100
O2Sat - ART (meas)
SINGLE FACTOR SCAN RESULT                                            QTMS
TYPE IS "CONTINUOUS"                                                 V3.2
                      NO.OF      NO.OF  RESPONSE  RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE     CHISQ   PROB.  INDEX
-------------------------------------------------------------------------
 1  37.4 - 68.2          42         37    88.095     2.747  0.0974    131  |***************
 2  68.8 - 76            42         36    85.714     2.158  0.1418    128  |**************
 3  76.1 - 84.2          42         41    97.619     5.812  0.0159    145  |******************
 4  84.3 - 93.7          42         33    78.571     0.818  0.3659    117  |************
 5  94 - 97.5            47         16    34.043     7.668  0.0056     51  |* |
 6  97.6 - 98.2          46         23    50.000     2.013  0.1560     74  |***** |
 7  98.3 - 98.6          42         24    57.143     0.625  0.4291     85  |******* |
 8  98.7 - 99.1          46         18    39.130     5.375  0.0204     58  |** |
 9  99.2 - 99.6          49         38    77.551     0.791  0.3738    116  |************
10  99.7 - 100           25         18    72.000     0.088  0.7668    107  |***********
-------------------------------------------------------------------------
                        423        284    67.139    28.095  0.0009    100
Sodium
SINGLE FACTOR SCAN RESULT                                            QTMS
TYPE IS "CONTINUOUS"                                                 V3.2
                      NO.OF      NO.OF  RESPONSE  RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE     CHISQ   PROB.  INDEX
-------------------------------------------------------------------------
 1  .                     4          1    25.000     0.993  0.3190     38  |* |
 2  121 - 134            83         34    40.964     7.499  0.0062     63  |**** |
 3  135 - 136            84         49    58.333     0.615  0.4331     89  |******** |
 4  137 - 138           115         60    52.174     3.010  0.0827     80  |******* |
 5  139 - 140           106         39    36.792    13.150  0.0003     56  |*** |
 6  141 - 143            91         63    69.231     0.222  0.6376    106  |***********
 7  144 - 149            79         72    91.139     8.121  0.0044    140  |***************
 8  150 - 157            83         83   100.000    15.369  0.0001    153  |******************
 9  158 - 165            57         57   100.000    10.555  0.0012    153  |******************
-------------------------------------------------------------------------
                        702        458    65.242    59.534  0.0000    100
Hemoglobin
SINGLE FACTOR SCAN RESULT                                            QTMS
TYPE IS "CONTINUOUS"                                                 V3.2
                      NO.OF      NO.OF  RESPONSE  RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE     CHISQ   PROB.  INDEX
-------------------------------------------------------------------------
 1  6.2 - 8.5            60         48    80.000     1.673  0.1958    120  |************
 2  8.6 - 9.2            62         47    75.806     0.828  0.3629    114  |**********
 3  9.3 - 9.8            61         33    54.098     1.389  0.2386     81  |**** |
 4  9.9 - 10.4           67         30    44.776     4.715  0.0299     67  |* |
 5  10.5 - 11.3          66         28    42.424     5.711  0.0169     64  |* |
 6  11.4 - 12.2          65         36    55.385     1.186  0.2761     83  |**** |
 7  12.3 - 12.9          71         49    69.014     0.074  0.7863    104  |********
 8  13 - 13.8            62         47    75.806     0.828  0.3629    114  |**********
 9  13.9 - 15.7          62         54    87.097     4.003  0.0454    131  |**************
10  15.8 - 19.2          31         31   100.000     5.274  0.0216    151  |******************
-------------------------------------------------------------------------
                        607        403    66.392    25.680  0.0023    100
Glucose
SINGLE FACTOR SCAN RESULT                                            QTMS
TYPE IS "CONTINUOUS"                                                 V3.2
                      NO.OF      NO.OF  RESPONSE  RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE     CHISQ   PROB.  INDEX
-------------------------------------------------------------------------
 1  .                     5          1    20.000     1.497  0.2211     31  |* |
 2  30 - 82              76         57    75.000     1.534  0.2155    118  |*****************
 3  83 - 91              83         64    77.108     2.357  0.1248    121  |******************
 4  92 - 101             87         60    68.966     0.384  0.5354    108  |***************
 5  102 - 110            77         48    62.338     0.021  0.8841     98  |*************
 6  111 - 119            73         34    46.575     3.348  0.0673     73  |******** |
 7  120 - 133            74         43    58.108     0.359  0.5492     91  |************|
 8  134 - 159            75         48    64.000     0.001  0.9709    101  |**************
 9  161 - 240            74         42    56.757     0.555  0.4565     89  |*********** |
10  241 - 453            42         27    64.286     0.003  0.9597    101  |**************
-------------------------------------------------------------------------
                        666        424    63.664    10.059  0.3457    100
4.3.3 Chart data was defined as information taken at the point of admission. Again,
these measures were collected based upon clinical need, which varied by patient and
over time within a patient. We employed the same derogatory bio-marker methodology as
described for the battery of lab tests.
SINGLE FACTOR SCAN EPI                                               QTMS
TYPE IS "CONTINUOUS"                                                 V3.2
                      NO.OF      NO.OF  RESPONSE  RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE     CHISQ   PROB.  INDEX
-------------------------------------------------------------------------
 1  .                    97         48    49.485     0.513  0.4738     90  |**** |
 2  0                  1189        416    34.987    85.678  0.0000     64  |* |
 3  1                   551        544    98.730   193.147  0.0000    180  |******************
-------------------------------------------------------------------------
                       1837       1008    54.872   279.338  0.0000    100
SINGLE FACTOR SCAN FIO2                                              QTMS
TYPE IS "CONTINUOUS"                                                 V3.2
                      NO.OF      NO.OF  RESPONSE  RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE     CHISQ   PROB.  INDEX
-------------------------------------------------------------------------
 1  .                   103         53    51.456     0.219  0.6398     94  |*******|
 2  0.21                522        256    49.042     3.233  0.0722     89  |*******|
 3  0.22 - 0.28         223         66    29.596    25.963  0.0000     54  |* |
 4  0.3 - 0.32          194         86    44.330     3.929  0.0475     81  |***** |
 5  0.34 - 0.35         290        205    70.690    13.223  0.0003    129  |*************
 6  0.4                 203        104    51.232     0.490  0.4838     93  |*******|
 7  0.45 - 0.58         193        162    83.938    29.715  0.0000    153  |******************
 8  0.6 - 1             109         76    69.725     4.382  0.0363    127  |*************
-------------------------------------------------------------------------
                       1837       1008    54.872    81.155  0.0000    100
SINGLE FACTOR SCAN GCS                                               QTMS
TYPE IS "CONTINUOUS"                                                 V3.2
                      NO.OF      NO.OF  RESPONSE  RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE     CHISQ   PROB.  INDEX
-------------------------------------------------------------------------
 1  .                   146         80    54.795     0.000  0.9899    100  |******
 2  3                   470        442    94.043   131.421  0.0000    171  |******************
 3  4 - 7               237         84    35.443    16.304  0.0001     65  |* |
 4  8 - 9               223        126    56.502     0.108  0.7424    103  |*******
 5  10 - 11             428        150    35.047    30.657  0.0000     64  |* |
 6  12 - 15             333        126    37.838    17.609  0.0000     69  |* |
-------------------------------------------------------------------------
                       1837       1008    54.872   196.100  0.0000    100
SINGLE FACTOR SCAN OTHER                                             QTMS
TYPE IS "CONTINUOUS"                                                 V3.2
                      NO.OF      NO.OF  RESPONSE  RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE     CHISQ   PROB.  INDEX
-------------------------------------------------------------------------
 1  .                    97         48    49.485     0.513  0.4738     90  |* |
 2  0                  1348        662    49.110     8.157  0.0043     89  |* |
 3  1                   392        298    76.020    31.951  0.0000    139  |******************
-------------------------------------------------------------------------
                       1837       1008    54.872    40.621  0.0000    100
SINGLE FACTOR SCAN PUPILS                                            QTMS
TYPE IS "CONTINUOUS"                                                 V3.2
                      NO.OF      NO.OF  RESPONSE  RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE     CHISQ   PROB.  INDEX
-------------------------------------------------------------------------
 1  .                   116         53    45.690     1.782  0.1818     83  |* |
 2  0                  1406        640    45.519    22.414  0.0000     83  |* |
 3  1 - 2               315        315   100.000   116.910  0.0000    182  |******************
-------------------------------------------------------------------------
                       1837       1008    54.872   141.106  0.0000    100
4.3.4 Developing a Common Time Hierarchy. This study was an accumulation
of patient data collected during the normal course of running a PICU. The units of
time differed for each grouping of variables. The time-domain variables were in
sub-second differences between Rpeaks, while the spectral-domain variables were in
groups of 128 beats, or approximately 2 minutes of clock time. Finally, the Lab and
Chart data were intermittent but logged at the actual clock time taken. To join these
4 groups of data, it was decided to put all information into a common time frame of
2 minutes. This was accomplished using SAS Proc Expand, which can combine time series
of different frequencies using various interpolation methods, either converting raw
data into a higher-frequency series or aggregating down to a lower-frequency series.
Data from all sources were re-calibrated into 2-minute records.
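The idea of a common 2-minute frame can be illustrated with a small Python sketch. It is not a reimplementation of SAS Proc Expand: it assumes that observations within an epoch are averaged and that sparse series (such as lab results) carry their last value forward, both of which are simplifying assumptions, and the function name and sample values are hypothetical.

```python
def to_epochs(observations, epoch_sec=120, n_epochs=None):
    """Convert (time_sec, value) pairs into one value per fixed-length epoch.

    Values inside an epoch are averaged; empty epochs repeat the last
    known value (None until the first observation arrives).
    """
    buckets = {}
    for time_sec, value in observations:
        buckets.setdefault(int(time_sec // epoch_sec), []).append(value)
    if n_epochs is None:
        n_epochs = max(buckets) + 1 if buckets else 0
    series, last = [], None
    for epoch in range(n_epochs):
        if epoch in buckets:
            last = sum(buckets[epoch]) / len(buckets[epoch])
        series.append(last)
    return series


# Hypothetical lab values logged at irregular clock times (seconds since admission)
labs = [(10, 1.0), (50, 3.0), (400, 5.0)]
print(to_epochs(labs, n_epochs=5))
```

With 120-second epochs, four hours of data correspond to 120 records per patient.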
We conclude with data from four sources calibrated into 2-minute epochs for all patients.
We separated the first four hours of data from each patient file and attached the final
outcome to each record, which resulted in an analytic data set of 1080 records (600 live
packets and 480 dead packets). This represents a summarization of over 125,000 heart
beats from the 10 patients' first four hours, appended with derogatory bio-markers from
lab and chart sources.
4.3.5 Multivariate Models. This file was analyzed using SAS Enterprise Miner
5.1 as diagrammed below. This included creating a 20% hold-out sample as well as
implementing 8 different modelling approaches. The summary of each for both training
and validation samples is provided below. Based upon the ability of the hold-out sample
to replicate the ROC profile from the training sample, as well as the overall sensitivity,
it was concluded that the regression model fared best.
Model       Sample    dead  live   fp   fn   tp   tn   %miss  %FalseAlarm
Reg         TRAIN      384   480   23    4  380  457   1.04%        4.79%
Reg         VALIDATE    96   120    7    3   93  113   3.13%        5.83%
DmineReg    TRAIN      384   480   24    9  375  456   2.34%        5.00%
DmineReg    VALIDATE    96   120   10    7   89  110   7.29%        8.33%
Tree        TRAIN      384   480   16   14  370  464   3.65%        3.33%
Tree        VALIDATE    96   120    4    8   88  116   8.33%        3.33%
Rule        TRAIN      384   480    9   19  365  471   4.95%        1.88%
Rule        VALIDATE    96   120    3   10   86  117  10.42%        2.50%
Neural      TRAIN      384   480   23    3  381  457   0.78%        4.79%
Neural      VALIDATE    96   120    7    4   92  113   4.17%        5.83%
AutoNeural  TRAIN      384   480  179    1  383  301   0.26%       37.29%
AutoNeural  VALIDATE    96   120   47    0   96   73   0.00%       39.17%
DMNeural    TRAIN      384   480   55    0  384  425   0.00%       11.46%
DMNeural    VALIDATE    96   120   16    0   96  104   0.00%       13.33%
MBR         TRAIN      384   480   26    6  378  454   1.56%        5.42%
MBR         VALIDATE    96   120    9    5   91  111   5.21%        7.50%
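The two percentage columns follow directly from the counts: %miss is fn divided by the number of dead records, and %FalseAlarm is fp divided by the number of live records. The Python sketch below reproduces them for the regression model; the function name is hypothetical, and sensitivity is added only as the complement of %miss.

```python
def miss_and_false_alarm(dead, live, fp, fn):
    """Return (%miss, %FalseAlarm, sensitivity %) from confusion counts."""
    pct_miss = 100.0 * fn / dead            # dead records scored as live
    pct_false_alarm = 100.0 * fp / live     # live records scored as dead
    sensitivity = 100.0 - pct_miss          # tp / dead, as a percentage
    return pct_miss, pct_false_alarm, sensitivity


# Reg TRAIN and Reg VALIDATE rows from the table
for label, counts in [("TRAIN", (384, 480, 23, 4)), ("VALIDATE", (96, 120, 7, 3))]:
    miss, false_alarm, sensitivity = miss_and_false_alarm(*counts)
    print(label, f"{miss:.2f}%", f"{false_alarm:.2f}%", f"{sensitivity:.2f}%")
```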
4.3.6 Online Interactive Reporting: COGNOS Example. This test model can be
accessed online and used in real time. This could allow non-sampled patients to be scored
every 2 minutes, with a resulting update to the odds of mortality.

Más contenido relacionado

Similar a Results for Keith

Real-Time Detection of Fatal Ventricular Dysrhythmias for Automated External ...
Real-Time Detection of Fatal Ventricular Dysrhythmias for Automated External ...Real-Time Detection of Fatal Ventricular Dysrhythmias for Automated External ...
Real-Time Detection of Fatal Ventricular Dysrhythmias for Automated External ...
Ehsan Izadi
 
Iaetsd a review on ecg arrhythmia detection
Iaetsd a review on ecg arrhythmia detectionIaetsd a review on ecg arrhythmia detection
Iaetsd a review on ecg arrhythmia detection
Iaetsd Iaetsd
 
St variability assessment based on complexity factor using independent compon...
St variability assessment based on complexity factor using independent compon...St variability assessment based on complexity factor using independent compon...
St variability assessment based on complexity factor using independent compon...
eSAT Journals
 
Transfer entropy estimation supplements time domain beat-to-beat baroreflex s...
Transfer entropy estimation supplements time domain beat-to-beat baroreflex s...Transfer entropy estimation supplements time domain beat-to-beat baroreflex s...
Transfer entropy estimation supplements time domain beat-to-beat baroreflex s...
eSAT Journals
 

Similar a Results for Keith (20)

ECG_based_Biometric_Recognition_using_Wa.pdf
ECG_based_Biometric_Recognition_using_Wa.pdfECG_based_Biometric_Recognition_using_Wa.pdf
ECG_based_Biometric_Recognition_using_Wa.pdf
 
Real-Time Detection of Fatal Ventricular Dysrhythmias for Automated External ...
Real-Time Detection of Fatal Ventricular Dysrhythmias for Automated External ...Real-Time Detection of Fatal Ventricular Dysrhythmias for Automated External ...
Real-Time Detection of Fatal Ventricular Dysrhythmias for Automated External ...
 
Overview
OverviewOverview
Overview
 
Analysis of RS-Segment to Evaluate the Effect of Ventricular Depolarization d...
Analysis of RS-Segment to Evaluate the Effect of Ventricular Depolarization d...Analysis of RS-Segment to Evaluate the Effect of Ventricular Depolarization d...
Analysis of RS-Segment to Evaluate the Effect of Ventricular Depolarization d...
 
multiscale_tutorial.pdf
multiscale_tutorial.pdfmultiscale_tutorial.pdf
multiscale_tutorial.pdf
 
Iaetsd a review on ecg arrhythmia detection
Iaetsd a review on ecg arrhythmia detectionIaetsd a review on ecg arrhythmia detection
Iaetsd a review on ecg arrhythmia detection
 
Biomedical Signals Classification With Transformer Based Model.pptx
Biomedical Signals Classification With Transformer Based Model.pptxBiomedical Signals Classification With Transformer Based Model.pptx
Biomedical Signals Classification With Transformer Based Model.pptx
 
An Improved Energy Efficiency Algorithm in Wireless Sensor Network Using Quer...
An Improved Energy Efficiency Algorithm in Wireless Sensor Network Using Quer...An Improved Energy Efficiency Algorithm in Wireless Sensor Network Using Quer...
An Improved Energy Efficiency Algorithm in Wireless Sensor Network Using Quer...
 
An Improved Energy Efficiency Algorithm in Wireless Sensor Network Using Quer...
An Improved Energy Efficiency Algorithm in Wireless Sensor Network Using Quer...An Improved Energy Efficiency Algorithm in Wireless Sensor Network Using Quer...
An Improved Energy Efficiency Algorithm in Wireless Sensor Network Using Quer...
 
IRJET- Prediction and Classification of Cardiac Arrhythmia
IRJET- Prediction and Classification of Cardiac ArrhythmiaIRJET- Prediction and Classification of Cardiac Arrhythmia
IRJET- Prediction and Classification of Cardiac Arrhythmia
 
Automatic ECG signal denoising and arrhythmia classification using deep learning
Automatic ECG signal denoising and arrhythmia classification using deep learningAutomatic ECG signal denoising and arrhythmia classification using deep learning
Automatic ECG signal denoising and arrhythmia classification using deep learning
 
St variability assessment based on complexity factor using independent compon...
St variability assessment based on complexity factor using independent compon...St variability assessment based on complexity factor using independent compon...
St variability assessment based on complexity factor using independent compon...
 
Ojchd.000546
Ojchd.000546Ojchd.000546
Ojchd.000546
 
Robust System for Patient Specific Classification of ECG Signal using PCA and...
Robust System for Patient Specific Classification of ECG Signal using PCA and...Robust System for Patient Specific Classification of ECG Signal using PCA and...
Robust System for Patient Specific Classification of ECG Signal using PCA and...
 
IRJET- R–Peak Detection of ECG Signal using Thresholding Method
IRJET- R–Peak Detection of ECG Signal using Thresholding MethodIRJET- R–Peak Detection of ECG Signal using Thresholding Method
IRJET- R–Peak Detection of ECG Signal using Thresholding Method
 
Reducing False Positives
Reducing False PositivesReducing False Positives
Reducing False Positives
 
Transfer entropy estimation supplements time domain beat-to-beat baroreflex s...
Transfer entropy estimation supplements time domain beat-to-beat baroreflex s...Transfer entropy estimation supplements time domain beat-to-beat baroreflex s...
Transfer entropy estimation supplements time domain beat-to-beat baroreflex s...
 
IRJET- A Survey on Classification and identification of Arrhythmia using Mach...
IRJET- A Survey on Classification and identification of Arrhythmia using Mach...IRJET- A Survey on Classification and identification of Arrhythmia using Mach...
IRJET- A Survey on Classification and identification of Arrhythmia using Mach...
 
Classification and Detection of ECG-signals using Artificial Neural Networks
Classification and Detection of ECG-signals using Artificial Neural NetworksClassification and Detection of ECG-signals using Artificial Neural Networks
Classification and Detection of ECG-signals using Artificial Neural Networks
 
Eli plots visualizing innumerable number of correlations
Eli plots   visualizing innumerable number of correlationsEli plots   visualizing innumerable number of correlations
Eli plots visualizing innumerable number of correlations
 

Más de Daniel Kocis Ph.D. - Chair (12)

Channel Strategy1
Channel Strategy1Channel Strategy1
Channel Strategy1
 
Case Conditions
Case ConditionsCase Conditions
Case Conditions
 
How_to_maximize_the_number_of_spots
How_to_maximize_the_number_of_spotsHow_to_maximize_the_number_of_spots
How_to_maximize_the_number_of_spots
 
AD1
AD1AD1
AD1
 
Consumer Models
Consumer ModelsConsumer Models
Consumer Models
 
BuildHistoryfinal
BuildHistoryfinalBuildHistoryfinal
BuildHistoryfinal
 
Weekly Forecasts
Weekly ForecastsWeekly Forecasts
Weekly Forecasts
 
QTMS-EM-Combinatorical Model
QTMS-EM-Combinatorical ModelQTMS-EM-Combinatorical Model
QTMS-EM-Combinatorical Model
 
Brands Analysis
Brands AnalysisBrands Analysis
Brands Analysis
 
CCAR - Kocis
CCAR - KocisCCAR - Kocis
CCAR - Kocis
 
Pharma
PharmaPharma
Pharma
 
Media
MediaMedia
Media
 

Results for Keith

  • 1. Chapter 4: Multivariate Models Table of Contents 4.3 REALTROMINS-Real Time Risk of Mortality Example 4.3.1 Streaming ECG Data 4.3.2 Physiologic based measures of organ function 4.3.3 Chart data 4.3.4 Developing a Common Time Hierarch 4.3.5 Multivariate Models 4.3.6 Online Interactive Reporting : COGNOS Example 4.1 Introduction The attempt of data mining is to construct a mathematical algorithm that captures viable representations of an existing phenomena hidden within a database. These viable representation must be robust enough based upon their “parameter-estimation” so as to repeat any predicted classification onto to an independent hold out sample. Different classes of algorithms can be found with SAS-EM5.1 ranging from standard regression to rule induction, to neural networks, to two-stage. In addition a model comparison module exist that provides a cross comparison among multiple models based strictly based upon ROC characteristics. 4.3 REALTROMINS-Real Time Risk of Mortality This example attempts to differentiate arrested from surviving children admitted to the University of North Carolina: Chapel Hill PICU (Pediatric Intensive Care Unit) using streaming ECG data, clinical laboratory results and demographic chart data. A sample of 10 pediatric patients were selected as the analytic set to provide this proof of concept. 4.3.1 Streaming ECG Data was captured at 222 data points per second tagged with a utc milli-second timestamp provided by a SpaceLabs Monitor XXXX. This data summarizes into a varied length-varied peaked plot. To create time-domain variables, all peaks must be found. The initial step in determining Rpeaks was to insure that marks provided by SpaceLabs are reasonably consistent and representative of their true location. It was discovered that these marks do not occur at consistent locations within the ECG cycle. This issue included both extra marks and missing marks for unsteady data segments. 
A multiple filtering algorithm was developed using MATLAB which checked for these conditions and corrected them using several interpolated methods. Once completed the next step was to calculate the HR (heart rate) based on the Rpeak spacing using utc time. The length between Rpeaks represents the period between beats and to convert this to a heart rate, simply divide 60 seconds/minute by this value. For example: 60 sec/min / (0.558 sec/beat) = 107.5 beats per minute (bpm)
  • 2. This calulcated HR is a value located at each Rpeak. Frequency Domain variables require an acceptable resolution while maintaining a reasonable time period length. Therefore a 128 point FFT (Fast Fourier Transformation) was chosen as the standard. This provided 64 estimated points over the positive frequency range. The length of the N-second interval was determined by the heart rate and the 128 point FFT requirement. From the heart rate, we determine how long it will take to acquire 128 beats. The formula for this calculation is: N-seconds (sec) = (128/ average HR)*60 This is set as the length of the N-second period for the current data segment. The determined N-second amount of data is then retrieved, with the appropriate marks and Rpeaks. This resulted segment of data may contain a few more or a few less than 128 beats. To sample 128 times consistently (evenly spaced) within this N-second segment, the HR data is interpolated with a sample rate of 128. An FFT is now performed on the interpolated HR data. In order to provide a spectrum that has the 0 Hz (dc) point removed, the HR data is normalized according to the mean (formula below) HR normalized = (HR interpolated – HR interpolated average) / (HR interpolated average); An FFT was then performed on that normalized and interpolated HR data (HRnorm). The beginning and ending values of each consistent time segment and the beginning and ending values of each time segment over which the analysis actually occurred (ie. where the first and last point in utc time were for the HR interpolation) are written out to the HR data file. Other columns in this file include Rpeaks, Rind, HR, HR interpolated, HRnorm, frequency (Hz), FFT HR interpolated, and FFT HRnorm. The data structure developed to store all of this information was multi-hierachical and variable length. While the number of heart peak record varied about 128, the frequency domain variable were fixed at 64 record each. Column heading were HR data file. 
Variable Name Hierarchy A.) Actual Begin Time for the 128 beat sample epoch 1 B.) Actual End Time for the 128 sampled epoch 1 C.) Actual Rpeak Timestamp ~128 D.) Rpeak value ~128 E.) HR – raw ~128 F.) HR – Interpolated ~128 G.) HR- Normative ~128 H.) Frequency 64 I.) Heart rate Spectra Raw 64 J.) Hear Rate Spectra Interpolated 64
  • 3. Time domain variables were calculated by stripping out the actual Rpeak time stamp and then binning the distribution and calculating percentages as follows: if nn_interval > 0 and nn_interval <= .05 then NN50=1; if nn_interval > .05 and nn_interval <= .10 then NN100=1; if nn_interval > .10 and nn_interval <= .20 then NN200=1; if nn_interval > .20 and nn_interval <= .30 then NN300=1; if nn_interval > .30 and nn_interval <= .40 then NN400=1; if nn_interval > .40 and nn_interval <= .50 then NN500=1; if nn_interval > .50 and nn_interval <= .60 then NN600=1; if nn_interval > .60 and nn_interval <= .70 then NN700=1; if nn_interval > .70 and nn_interval <= .80 then NN800=1; if nn_interval > .80 and nn_interval <= .90 then NN900=1; if nn_interval > .90 and nn_interval <= 1.0 then NN1000=1; if nn_interval > 1.0 and nn_interval <= 3.0 then NN1000P=1; pNN50=nn50/rpeak_counts; pNN100=nn100/rpeak_counts; pNN200=nn200/rpeak_counts; pNN300=nn300/rpeak_counts; pNN400=nn400/rpeak_counts; pNN500=nn500/rpeak_counts; pNN600=nn600/rpeak_counts; pNN700=nn700/rpeak_counts; pNN800=nn800/rpeak_counts; pNN900=nn900/rpeak_counts; pNN1000=nn1000/rpeak_counts; pNN1000p=nn1000p/rpeak_counts; Spectra data was extracted by stripping on the fixed 64 records for each group of records. Frequency bands were grouped as follows using the interpolated heart rate spectra values to calculate area within that specific band of frequencies. if frequency >= 0 and frequency <= .003 then freq_type = '1-ULF'; if frequency >= .0031 and frequency <= .040000 then freq_type = '2-VLF'; if frequency >= .040001 and frequency <= .150 then freq_type = '3-LF'; if frequency >= .150001 then freq_type = '4-HF'; 4.3.2 Physiologic based measures of organ function important in predicting mortality were defined by a battery of 75 lab tests collected during a stay within the PICU. These tests were collected based upon clinical need which varied by patient and varied over time within patient. 
This created a concern about how to include all of this information within a reasonable patient-to-variable ratio while handling the high expected inter-correlation among the tests. The answer provided here was to create a series of derogatory bio-markers. Only those tests that could identify segments with the highest index against base mortality were considered. This was accomplished by looking within each variable's distribution and identifying intervals that index high against base mortality, using a single factor scan procedure found within QTMS V3.2.
Base Balance: SINGLE FACTOR SCAN RESULT (QTMS V3.2, TYPE IS "CONTINUOUS")

    #  INTERVAL        SOLICITED  RESPONDERS  RESPONSE RATE    CHISQ   PROB.  INDEX
    1  -15.1 - -6.2           42          40         95.238    5.751  0.0165    146
    2  -6.1 - -4.1            44          40         90.909    4.407  0.0358    139
    3  -4 - -2.7              42          33         78.571    1.127  0.2884    120
    4  -2.6 - -1.3            42          25         59.524    0.217  0.6415     91
    5  -1.2 - 0.6             46          18         39.130    4.833  0.0279     60
    6  0.7 - 1.9              45          28         62.222    0.067  0.7965     95
    7  2 - 3.4                44          28         63.636    0.019  0.8894     97
    8  3.5 - 5.4              43          29         67.442    0.029  0.8640    103
    9  5.5 - 8.2              43          15         34.884    6.101  0.0135     53
   10  8.5 - 14.2             33          21         63.636    0.014  0.9042     97
Total                        424         277         65.330   22.565  0.0072    100

O2Sat - ART (meas): SINGLE FACTOR SCAN RESULT (QTMS V3.2, TYPE IS "CONTINUOUS")

    #  INTERVAL        SOLICITED  RESPONDERS  RESPONSE RATE    CHISQ   PROB.  INDEX
    1  37.4 - 68.2            42          37         88.095    2.747  0.0974    131
    2  68.8 - 76              42          36         85.714    2.158  0.1418    128
    3  76.1 - 84.2            42          41         97.619    5.812  0.0159    145
    4  84.3 - 93.7            42          33         78.571    0.818  0.3659    117
    5  94 - 97.5              47          16         34.043    7.668  0.0056     51
    6  97.6 - 98.2            46          23         50.000    2.013  0.1560     74
    7  98.3 - 98.6            42          24         57.143    0.625  0.4291     85
    8  98.7 - 99.1            46          18         39.130    5.375  0.0204     58
    9  99.2 - 99.6            49          38         77.551    0.791  0.3738    116
   10  99.7 - 100             25          18         72.000    0.088  0.7668    107
Total                        423         284         67.139   28.095  0.0009    100

Sodium: SINGLE FACTOR SCAN RESULT (QTMS V3.2, TYPE IS "CONTINUOUS")

    #  INTERVAL        SOLICITED  RESPONDERS  RESPONSE RATE    CHISQ   PROB.  INDEX
    1  .                       4           1         25.000    0.993  0.3190     38
    2  121 - 134              83          34         40.964    7.499  0.0062     63
    3  135 - 136              84          49         58.333    0.615  0.4331     89
    4  137 - 138             115          60         52.174    3.010  0.0827     80
    5  139 - 140             106          39         36.792   13.150  0.0003     56
    6  141 - 143              91          63         69.231    0.222  0.6376    106
    7  144 - 149              79          72         91.139    8.121  0.0044    140
    8  150 - 157              83          83        100.000   15.369  0.0001    153
    9  158 - 165              57          57        100.000   10.555  0.0012    153
Total                        702         458         65.242   59.534  0.0000    100

Hemoglobin: SINGLE FACTOR SCAN RESULT (QTMS V3.2, TYPE IS "CONTINUOUS")

    #  INTERVAL        SOLICITED  RESPONDERS  RESPONSE RATE    CHISQ   PROB.  INDEX
    1  6.2 - 8.5              60          48         80.000    1.673  0.1958    120
    2  8.6 - 9.2              62          47         75.806    0.828  0.3629    114
    3  9.3 - 9.8              61          33         54.098    1.389  0.2386     81
    4  9.9 - 10.4             67          30         44.776    4.715  0.0299     67
    5  10.5 - 11.3            66          28         42.424    5.711  0.0169     64
    6  11.4 - 12.2            65          36         55.385    1.186  0.2761     83
    7  12.3 - 12.9            71          49         69.014    0.074  0.7863    104
    8  13 - 13.8              62          47         75.806    0.828  0.3629    114
    9  13.9 - 15.7            62          54         87.097    4.003  0.0454    131
   10  15.8 - 19.2            31          31        100.000    5.274  0.0216    151
Total                        607         403         66.392   25.680  0.0023    100

Glucose: SINGLE FACTOR SCAN RESULT (QTMS V3.2, TYPE IS "CONTINUOUS")

    #  INTERVAL        SOLICITED  RESPONDERS  RESPONSE RATE    CHISQ   PROB.  INDEX
    1  .                       5           1         20.000    1.497  0.2211     31
    2  30 - 82                76          57         75.000    1.534  0.2155    118
    3  83 - 91                83          64         77.108    2.357  0.1248    121
    4  92 - 101               87          60         68.966    0.384  0.5354    108
    5  102 - 110              77          48         62.338    0.021  0.8841     98
    6  111 - 119              73          34         46.575    3.348  0.0673     73
    7  120 - 133              74          43         58.108    0.359  0.5492     91
    8  134 - 159              75          48         64.000    0.001  0.9709    101
    9  161 - 240              74          42         56.757    0.555  0.4565     89
   10  241 - 453              42          27         64.286    0.003  0.9597    101
Total                        666         424         63.664   10.059  0.3457    100
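The INDEX column in the scans above appears to be each interval's response rate divided by the overall (base) response rate, scaled so that the base equals 100. That relationship is inferred from the printed tables rather than from QTMS documentation, and the Python below is only an illustrative sketch, not the QTMS procedure itself. The function names and the cutoff of 120 used to flag a derogatory interval are hypothetical.

```python
def scan_index(responders, solicited, total_responders, total_solicited):
    """Interval response rate relative to the base rate, with base = 100."""
    interval_rate = responders / solicited
    base_rate = total_responders / total_solicited
    return round(100 * interval_rate / base_rate)


def is_derogatory(index, cutoff=120):
    """Flag an interval whose index is at or above an assumed cutoff."""
    return index >= cutoff


# Three Base Balance intervals copied from the scan:
# (interval, solicited, responders)
base_balance = [
    ("-15.1 - -6.2", 42, 40),
    ("-1.2 - 0.6", 46, 18),
    ("5.5 - 8.2", 43, 15),
]
total_solicited, total_responders = 424, 277

indexes = {
    interval: scan_index(responders, solicited, total_responders, total_solicited)
    for interval, solicited, responders in base_balance
}
flags = {interval: is_derogatory(index) for interval, index in indexes.items()}
```

With these inputs the sketch reproduces the printed indexes of 146, 60 and 53, and only the first interval is flagged under the assumed cutoff.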
4.3.3 Chart data were defined as information taken at the point of admission. Again, these tests were collected based upon clinical need, which varied by patient and varied over time within patient. We employed the same derogatory bio-marker methodology as described for the battery of lab tests.

EPI: SINGLE FACTOR SCAN (QTMS V3.2, TYPE IS "CONTINUOUS")

    #  INTERVAL        SOLICITED  RESPONDERS  RESPONSE RATE    CHISQ   PROB.  INDEX
    1  .                      97          48         49.485    0.513  0.4738     90
    2  0                    1189         416         34.987   85.678  0.0000     64
    3  1                     551         544         98.730  193.147  0.0000    180
Total                       1837        1008         54.872  279.338  0.0000    100

FIO2: SINGLE FACTOR SCAN (QTMS V3.2, TYPE IS "CONTINUOUS")

    #  INTERVAL        SOLICITED  RESPONDERS  RESPONSE RATE    CHISQ   PROB.  INDEX
    1  .                     103          53         51.456    0.219  0.6398     94
    2  0.21                  522         256         49.042    3.233  0.0722     89
    3  0.22 - 0.28           223          66         29.596   25.963  0.0000     54
    4  0.3 - 0.32            194          86         44.330    3.929  0.0475     81
    5  0.34 - 0.35           290         205         70.690   13.223  0.0003    129
    6  0.4                   203         104         51.232    0.490  0.4838     93
    7  0.45 - 0.58           193         162         83.938   29.715  0.0000    153
    8  0.6 - 1               109          76         69.725    4.382  0.0363    127
Total                       1837        1008         54.872   81.155  0.0000    100

GCS: SINGLE FACTOR SCAN (QTMS V3.2, TYPE IS "CONTINUOUS")

    #  INTERVAL        SOLICITED  RESPONDERS  RESPONSE RATE    CHISQ   PROB.  INDEX
    1  .                     146          80         54.795    0.000  0.9899    100
    2  3                     470         442         94.043  131.421  0.0000    171
    3  4 - 7                 237          84         35.443   16.304  0.0001     65
    4  8 - 9                 223         126         56.502    0.108  0.7424    103
    5  10 - 11               428         150         35.047   30.657  0.0000     64
    6  12 - 15               333         126         37.838   17.609  0.0000     69
Total                       1837        1008         54.872  196.100  0.0000    100

OTHER: SINGLE FACTOR SCAN (QTMS V3.2, TYPE IS "CONTINUOUS")

    #  INTERVAL        SOLICITED  RESPONDERS  RESPONSE RATE    CHISQ   PROB.  INDEX
    1  .                      97          48         49.485    0.513  0.4738     90
    2  0                    1348         662         49.110    8.157  0.0043     89
    3  1                     392         298         76.020   31.951  0.0000    139
Total                       1837        1008         54.872   40.621  0.0000    100

PUPILS: SINGLE FACTOR SCAN (QTMS V3.2, TYPE IS "CONTINUOUS")

    #  INTERVAL        SOLICITED  RESPONDERS  RESPONSE RATE    CHISQ   PROB.  INDEX
    1  .                     116          53         45.690    1.782  0.1818     83
    2  0                    1406         640         45.519   22.414  0.0000     83
    3  1 - 2                 315         315        100.000  116.910  0.0000    182
Total                       1837        1008         54.872  141.106  0.0000    100
4.3.4 Developing a Common Time Hierarchy. This study used patient data accumulated during the normal course of running a PICU. The units of time differed for each grouping of variables. The time domain variables were in sub-second differences between Rpeaks, while the spectra domain variables were in groups of 128 beats, or approximately 2 minutes of clock time. Finally, the Lab and Chart data were intermittent but logged at the actual clock time taken. To join these 4 groups of data, it was decided to put all information into a common time frame of 2 minutes. This was accomplished using SAS Proc Expand, which can combine time series with different frequencies using various interpolation methods that either convert raw data into a higher frequency series or aggregate down to a lower frequency series. Data from all four sources were thereby re-calibrated into 2 minute epochs for all patients. We separated the first four hours of data from each patient file and attached the final outcome to each record, which resulted in an analytic data set of 1080 records (600 live packets and 480 dead packets). This represents a summarization of over 125,000 heart beats from the 10 patients' first four hours, appended with derogatory bio-markers from lab and chart sources.

4.3.5 Multivariate Models. This file was analyzed using SAS Enterprise Miner 5.1 as diagrammed below. This included creating a 20% hold out sample as well as implementing 8 different modelling approaches. The summary of each for both training and validation samples is provided below. Based upon the ability of the hold out sample to replicate the ROC profile from the training sample, as well as the overall sensitivity, it was concluded that the regression model fared best.
Model       Sample    dead  live   fp   fn   tp   tn   %miss  %FalseAlarm
Reg         TRAIN      384   480   23    4  380  457   1.04%        4.79%
Reg         VALIDATE    96   120    7    3   93  113   3.13%        5.83%
DmineReg    TRAIN      384   480   24    9  375  456   2.34%        5.00%
DmineReg    VALIDATE    96   120   10    7   89  110   7.29%        8.33%
Tree        TRAIN      384   480   16   14  370  464   3.65%        3.33%
Tree        VALIDATE    96   120    4    8   88  116   8.33%        3.33%
Rule        TRAIN      384   480    9   19  365  471   4.95%        1.88%
Rule        VALIDATE    96   120    3   10   86  117  10.42%        2.50%
Neural      TRAIN      384   480   23    3  381  457   0.78%        4.79%
Neural      VALIDATE    96   120    7    4   92  113   4.17%        5.83%
AutoNeural  TRAIN      384   480  179    1  383  301   0.26%       37.29%
AutoNeural  VALIDATE    96   120   47    0   96   73   0.00%       39.17%
DMNeural    TRAIN      384   480   55    0  384  425   0.00%       11.46%
DMNeural    VALIDATE    96   120   16    0   96  104   0.00%       13.33%
MBR         TRAIN      384   480   26    6  378  454   1.56%        5.42%
MBR         VALIDATE    96   120    9    5   91  111   5.21%        7.50%
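The %miss and %FalseAlarm columns are consistent with fn/dead and fp/live respectively (for example, 4/384 for Reg TRAIN). The short Python sketch below recomputes them from the counts. The function name and the added sensitivity value are illustrative assumptions, not output from Enterprise Miner.

```python
def error_rates(dead, live, fp, fn):
    """Return (miss rate, false alarm rate, sensitivity) as fractions."""
    miss = fn / dead          # dead packets scored as live
    false_alarm = fp / live   # live packets scored as dead
    sensitivity = 1 - miss    # equals tp / dead when tp + fn = dead
    return miss, false_alarm, sensitivity


# Counts copied from the Reg rows of the table above
reg_train = error_rates(dead=384, live=480, fp=23, fn=4)
reg_validate = error_rates(dead=96, live=120, fp=7, fn=3)
```

Comparing the two tuples gives a quick view of how much the miss and false alarm rates drift between the training and hold out samples, which is the replication criterion described in 4.3.5.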
4.3.6 Online Interactive Reporting: COGNOS Example. This test model can be accessed on-line and used in real time. This could allow non-sampled patients to be scored every 2 minutes, with a resulting update to the odds of mortality.