Pattern Recognition in Medical
Images
Dr. S. Sridhar
Anna University
Introduction
•“One picture is worth more than ten thousand words” (Anonymous)
Contents
•This lecture will cover:
– Overview of Medical Imaging
– Pattern Recognition Tasks
– Case Studies in Pattern Recognition
What is Medical Image Processing?
• Medical image processing focuses on two major tasks:
– Improvement of pictorial information for human
interpretation
– Processing of image data for storage, transmission,
and representation for autonomous machine
perception
•There is some argument about where image
processing ends and where fields such as image
analysis and computer vision begin
Examples: Medicine
•Take a slice from an MRI scan of a canine heart and
find the boundaries between types of tissue
– Image with grey levels representing tissue density
– Use a suitable filter to highlight edges
Original MRI Image of a Dog Heart Edge Detection Image
Images taken from Gonzalez & Woods, Digital Image Processing (2002)
Key Stages in Digital Image Processing
Starting from the problem domain:
• Image Acquisition
• Image Enhancement
• Image Restoration
• Colour Image Processing
• Image Compression
• Morphological Processing
• Segmentation
• Representation & Description
• Object Recognition
Dr V.F. Ruiz SE1CA5 Medical Image Analysis 7
Medical Image Systems
• The last few decades of the 20th
century have seen the development of:
– Computed Tomography (CT)
– Magnetic Resonance Imaging (MRI)
– Digital Subtraction Angiography
– Doppler Ultrasound Imaging
– Other techniques based on nuclear emission, e.g.:
• PET: Positron Emission Tomography
• SPECT: Single Photon Emission Computed Tomography
• These provide a valuable addition to radiologists’ imaging tools, enabling ever
more reliable detection and diagnosis of disease.
• More recently, conventional X-ray imaging has been challenged by the emerging
flat-panel X-ray detectors.
• General image processing, whether it is applied to:
– Robotics
– Computer vision
– Medicine
– etc.
will treat:
– imaging geometry
– linear transforms
– shift invariance
– frequency domain
– digital vs continuous domains
– segmentation
– histogram analysis
– etc
that apply to any image modality and any application
• General image analysis regardless of its
application area encompasses:
– incorporation of prior knowledge
– classification of features
– matching of model to sub-images
– description of shape
– many other problems and approaches of AI...
• While these classic approaches to general
images and to general applications are
important, the special nature of medical
images and medical applications requires
special treatments.
Special nature of medical images
• Derived from
– method of acquisition
– the subject whose images are being acquired
• Ability to provide information about the volume
beneath the surface
– though surface imaging is used in some applications
• Images obtained for medical purposes almost exclusively
probe the otherwise invisible anatomy below the skin.
• Information may be from:
– 2D projection acquired by conventional radiography
– 2D slices of B-mode ultrasound
– full 3D mapping from CT, MRI, SPECT, PET and 3D
ultrasound.
Difficulties/specificities
• Radiology: perspective projection maps physical points into image
space
– but detection and classification of objects is confounded by over- and
underlying tissue (not the case in general image processing).
• Tomography: 3D images bring both complications and simplifications
– 3D topography is more complex than the 2D case.
– problems associated with perspective and occlusion are gone.
• Additional limitations to image quality:
– distortion and blurring associated with relatively long acquisition times
(due to anatomical motion).
– reconstruction errors associated with noise, beam hardening etc.
• All these and others account for the differences between medical and
non-medical approaches to processing and analysis.
• Advantage of dealing with medical images:
– knowledge of what is and what is not normal human
anatomy.
– selective enhancement of specific organs or objects via
injection of contrast-enhancing material.
• All these differences affect the way in which images are
processed and analysed.
• Validation of medical image processing and analysis
techniques is also a major part of medical application
– validating results is always important
– the scarcity of accurate and reliable independent standards
creates another challenge for the medical imaging field.
Processing and Analysis
• Medical image processing
– Deals with the development of problem specific
approaches to enhancement of raw medical data for the
purposes of selective visualisation as well as further
analysis.
• Medical image analysis
– Concentrates on the development of techniques to
supplement the mostly qualitative and frequently
subjective assessment of medical images by human
experts.
– Provides a variety of new information that is quantitative,
objective and reproducible
MIPR Lecture 1
Copyright Oleh Tretiak, 2004
14
Examples of Medical Images
Questions
• What does the image show?
• What good is it?
• How is it made?
X-ray Image
X-ray Image of Hand
What is it?
• Two X-ray views of the same hand are formed
on a single film by exposing the hand onto one
half of the film while the other half is blocked
by an opaque screen.
What good is it?
• A fracture of the middle finger is seen on both
views, though it is clearer on the view on the
left. This image can be used for diagnosis - to
distinguish between a sprain and a fracture,
and to choose a course of treatment.
X-ray Imaging: How it works.
X-ray shadow cast by an object. The strength of the shadow depends on
composition and thickness.
Summary: X-ray Imaging
• Oldest non-invasive imaging of internal structures
• Rapid, short exposure time, inexpensive
• Unable to distinguish between soft tissues in head,
abdomen
• Real time X-ray imaging is possible and used during
interventional procedures.
• Ionizing radiation: risk of cancer.
CT (Computed Tomography)
CT image of a plane through liver and stomach; projection image
from CT scans
What Is It?
• Computed Tomography image of a section
through the upper abdomen of a patient prior
to abdominal surgery.
• Section shows ribs, vertebra, aorta, liver
(image left), stomach (image right) partially
filled with liquid (bottom).
What Good Is It?
• The set of CT images, from the heart down to
the coccyx, was used in planning surgery for
the alleviation of intestinal blockage.
• The surgery was successful (I’m still here).
Computed Tomography:
How It Works
Only one plane is illuminated. Source-subject motion provides added
information.
Fan-Beam Computer Tomography
Summary of X-Ray CT
• Images of sectional planes (tomography) are harder
to interpret
• CT can visualize small density differences, e.g. grey
matter, white matter, and CSF. CT can detect and
diagnose diseases that cannot be seen with X-ray.
• More expensive than X-ray, lower resolution.
• Ionizing radiation.
Functional Magnetic Resonance Imaging
From http://www.fmri.org/
Picture naming task
Plane 3
Plane 6
What Is It?
• Two of sixteen planes through brain of subject
participating in an image-naming experiment.
• Images are superposition of anatomical scans (gray)
and functional scans (colored).
• Plane 3 shows functional activity in the visual cortex
(bottom)
• Plane 6 shows activity in the speech area (image
right).
What Good Is It?
• This set of images is part of research on brain
function (good for publication).
• Functional imaging is used prior to brain surgery, to
identify structures such as the motor areas that
should be avoided, and epileptic focal areas that
should be resected.
MRI Signal Source
When a nuclear magnet is tilted away from the
external magnetic field it rotates (precesses) at
the Larmor frequency. For hydrogen, the
Larmor frequency is 42.6 MHz per Tesla.
Detected Signal in MRI
Spinning magnetization
induces a voltage in external
coils, proportional to the size
of magnetic moment and to
the frequency.
MRI Image Formation
• Magnetic field gradients cause signals from
different parts of the body to have different
frequencies.
• Signals collected with multiple gradients are
processed by computer to produce an image,
typically of a section through the body.
Features of MRI
• No ionizing radiation: expected to have no
long-term or short-term harmful effects
• Many contrast mechanisms: contrast between
tissues is determined by pulse sequences
• Can produce sectional as well as projection images.
• Slower and more expensive than X-ray
Magnetic Resonance Summary
• No ionizing radiation (safe)
• Tomography at arbitrary angle
• Many imaging modes (water, T1, T2, flow,
neural activity)
• Slow
• Expensive
Ultrasound Imaging
Twin pregnancy during week 10
What Is It?
• Ultrasound image of a woman’s abdomen
• Image shows a section through the uterus.
Two embryos in their amniotic sacs can be
seen.
What Good Is It?
• This image provides a safe means for early
identification of a twin pregnancy.
• Obstetric ultrasonography can be used to
monitor high-risk pregnancies to allow
optimal treatment.
• Pre-natal scans are part of baby picture
albums.
Ultrasound Scanner
• A picture is built up
from scanned lines.
• Echosonography is
intrinsically
tomographic.
• An image is acquired
in milliseconds, so that
real time imaging is
the norm.
Ultrasound Imaging Overview
• Imaging is in real time - used for interventional
procedures.
• Moving structures and flow (Doppler) can be seen.
Used for heart imaging.
• Ultrasound has no known harmful effects (at levels
used in clinical imaging)
• Ultrasound equipment is inexpensive
• Many anatomical regions (for example, Head) cannot
be visualized with ultrasound.
Single Photon Computed Tomography
Images on left show three sections
through the heart.
A radioactive tracer, Tc-99m MIBI (2-methoxyisobutyl
isonitrile), is injected and goes to healthy heart
tissue.
What Is It?
• Three sectional (tomographic) images of a
living heart. Colored areas are measures of
metabolic activity of left ventricle muscle.
Areas damaged by an infarct appear dark. This
seems to be a normal heart.
What Good Is It?
• Used for staging (choosing treatment before
or after a heart attack), and monitoring the
effectiveness of treatment.
Radionuclide Imaging
• Basic Idea
• Collimator
• Tomography
Basic idea: A substance (drug) labeled with a radioactive isotope is
ingested. The drug goes to selective sites.
Collimator
Only rays that are normal to the camera surface are
detected.
SPECT
Single Photon Emission Computed Tomography. Shown here
is a three-headed tomography system. The cameras rotate
around the patient. A three-dimensional volume is imaged.
Features of Radionuclide Imaging
• The image is produced from an agent that is
designed to monitor a physiological or
pathological process
– Blood flow
– Perfusion
– Metabolic activity
– Tumor
– Brain receptor concentration
Fluorescence Microscopy
Image of living tissue culture
cells.
Three agents are used to form
this image. They bond to the
nucleus (blue), cytoskeleton
(green) and membrane (red).
What Is It?
• Optical microscope image of tissue culture.
• Image is formed with fluorescent light.
• Three agents are used. They bond to
– DNA in nucleus, blue
– Cytoskeleton, green
– Lipid membranes, red
What Good Is It?
• This image seems to be a demonstration of
fluorescent agents.
• Tissue culture is used in pharmaceutical and
physiological research, to monitor the effect of drugs
at the cellular level.
• Fluorescent labeling and imaging allows in-vivo
evaluation of the location and mechanism of a drug’s
activity.
Optical Imaging
• Optical imaging (visible and near infrared) is undergoing very
rapid development.
• Like radionuclide imaging, agents can be designed to bind to
almost any substrate.
• Intrinsic contrast, such as the differential absorption of
oxy- vs. deoxy-hemoglobin, is also exploited.
• There has been a growth in new optical imaging methods.
Thoughts on Imaging
• Three entities in imaging
– Object
– Image
– Observer
Image vs. Object
• Images (and vision) are two-dimensional
– Surface images
– Projection images
– Sectional images (tomograms)
• Image eliminates data
– 3D object - 2D image
– Moving object - still image
Creative Imaging
• Imaging procedures create information
– Functional MRI for the first time allows non-invasive
study of the brain
– Doppler ultrasound for the study of flow
– Agents for the study of gene expression, in-vivo
biochemistry
CT scan MRI
Same patient
MRI PET
MRI angiogram
X-ray angiograms
ultrasound
Kidney
Breast
fMRI
UCLA Brain Mapping Division
Los Angeles, CA 90095
Virtual sinus endoscopy of chronic
sinusitis.
The red structure indicates the
inflammatory portion.
The fly-through starts from the right nasal
cavity, goes through the right maxillary
sinus, and ends at the right frontal sinus.
This demonstrates planning
of a stereotactic procedure
using computerized
simulation.
This shows three
alternative approaches for
a surgical removal of the
tumor.
This demonstrates
registration of vessels
derived from a phase
contrast angiogram and
anatomy derived from
double-echo MR scans.
NeuroSurgery
This animation is derived from MRI data of a patient with a glioma
Here is an example using
Visage on a data source totally
different from what its original
design anticipated. In this case
the data comes from an MR
scanner.
Flow Analysis
Mammogram 1 Mammogram 2
• Contrast Stretching
To enhance low-contrast images, apply a piecewise-linear
mapping of input grey level u to output grey level v:

    v = α·u,               0 ≤ u < a
    v = β·(u − a) + v_a,   a ≤ u < b
    v = γ·(u − b) + v_b,   b ≤ u < L

where v_a = α·a and v_b = v_a + β·(b − a).
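The contrast-stretching transformation above can be sketched in a few lines of NumPy. The breakpoints a, b and slopes alpha, beta, gamma below are illustrative values, not taken from the slides:

```python
import numpy as np

def stretch(u, a, b, alpha, beta, gamma, L=256):
    """Piecewise-linear contrast stretch of grey levels u in [0, L)."""
    va = alpha * a            # output level at u = a
    vb = va + beta * (b - a)  # output level at u = b
    return np.where(u < a, alpha * u,
           np.where(u < b, beta * (u - a) + va,
                    gamma * (u - b) + vb))

img = np.array([10.0, 100.0, 200.0])
# beta > 1 expands contrast between a and b; the ends are compressed
print(stretch(img, a=50, b=150, alpha=0.5, beta=2.0, gamma=0.5))
# [  5. 125. 250.]
```

Choosing beta > 1 stretches the mid-range [a, b) where most tissue contrast typically lies, at the cost of compressing the darkest and brightest levels.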
300x180x8: x-tomography of orbital eye slice
256x228xfloat: MRI spine
– Thresholding: special case of clipping with a = b = t,
• and the output becomes binary
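With a = b = t the mapping reduces to a single comparison. A one-line NumPy sketch, where the threshold value is an arbitrary example:

```python
import numpy as np

def threshold(u, t, high=255):
    """Binary thresholding: 0 below t, `high` at or above t."""
    return np.where(u >= t, high, 0)

img = np.array([90, 130, 127, 200])
print(threshold(img, t=128))   # [  0 255   0 255]
```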
t = 118, 128, 138: 64×64×8 nuclear medicine image, axial slice of heart
• Logarithmic contrast enhancement
– to brighten dark images, apply a logarithmic
colour table,
– mapping the pixel values of the original image through
a logarithmic function.
Original
Logarithmic colour table
• Exponential contrast enhancement
Original Image Exponential Map
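Both mappings can be sketched in NumPy. The scaling constant is chosen here so that the full grey-level range [0, L−1] maps onto itself; that choice is an assumption, since the slides do not give the exact formulas:

```python
import numpy as np

def log_enhance(u, L=256):
    """Brighten dark images: expand low grey levels, compress high ones."""
    c = (L - 1) / np.log(L)   # scale so that L-1 maps back to L-1
    return c * np.log1p(u)

def exp_enhance(u, L=256):
    """Inverse effect: darken, expanding contrast among bright pixels."""
    c = (L - 1) / np.log(L)
    return np.expm1(u / c)

u = np.array([0.0, 50.0, 255.0])
print(log_enhance(u))   # the dark value 50 is boosted far above mid-grey
```

The two maps are exact inverses of each other, so applying one after the other recovers the original pixel values.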
Original | Laplacian-filtered (high-pass) | Sharpened:
original added to the Laplacian
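A sketch of Laplacian sharpening with SciPy. The 4-neighbour kernel is a common choice; the slides do not specify which kernel was used:

```python
import numpy as np
from scipy.ndimage import convolve

# 4-neighbour Laplacian kernel (an illustrative, common choice)
lap = np.array([[0,  1, 0],
                [1, -4, 1],
                [0,  1, 0]], dtype=float)

def sharpen(img):
    """Sharpen by subtracting the Laplacian (high-pass) response."""
    return img - convolve(img, lap, mode='nearest')

img = np.zeros((5, 5))
img[:, 2:] = 100.0        # vertical step edge
s = sharpen(img)
print(s[2, 1], s[2, 2])   # -100.0 200.0: the edge is over/undershot
```

The overshoot and undershoot on either side of the edge is what makes the step appear visually sharper.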
Original | Original with a grey ramp |
Rainbow colour table | SA pseudo-colour table
1. Original image
2. Increase the image contrast
3. Subtract the background image from the original image
4. Thresholded image
5. Labelled objects in the image
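The background-subtraction, thresholding, and labelling steps above can be sketched with SciPy. The grey-opening background estimate, the structuring-element size, and the threshold are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage

def segment(img, thresh=50, bg_size=15):
    """Background-subtract, threshold, and label bright objects."""
    background = ndimage.grey_opening(img, size=(bg_size, bg_size))
    corrected = img - background    # remove slowly varying background
    binary = corrected > thresh     # thresholded image
    labels, n = ndimage.label(binary)  # labelled objects
    return labels, n

# synthetic image: two bright blobs on a dark background
img = np.zeros((40, 40))
img[5:10, 5:10] = 200.0
img[25:32, 20:28] = 180.0
labels, n = segment(img)
print(n)   # 2
```

Grey opening removes structures smaller than the structuring element, so what remains approximates the background; objects smaller than `bg_size` survive the subtraction.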
Original image (courtesy of Alan Partin,
Johns Hopkins University)
Processing steps: binary gradient mask → dilated gradient mask →
binary image with filled holes → cleared-border image →
segmented image → outlined original image
© Copyright 2006, Natasha Balac 79
Data Mining Tasks
• Exploratory Data Analysis
• Predictive Modeling: Classification and Regression
• Descriptive Modeling
– Cluster analysis/segmentation
• Discovering Patterns and Rules
– Association/Dependency rules
– Sequential patterns
– Temporal sequences
• Deviation detection
Data Mining Tasks
• Concept/Class description: characterization and
discrimination
– Generalize, summarize, and contrast data
characteristics, e.g., dry vs. wet regions
• Association (correlation and causality)
– Multi-dimensional or single-dimensional association, e.g.:
age(X, “20-29”) ^ income(X, “60-90K”) → buys(X, “TV”)
Data Mining Tasks
• Classification and Prediction
– Finding models (functions) that describe and
distinguish classes or concepts for future prediction
– Example: classify countries based on climate, or
classify cars based on gas mileage
– Presentation: IF-THEN rules, decision trees,
classification rules, neural networks
• Prediction: predict some unknown or missing
numerical values
• Cluster analysis
– Class label is unknown: group data to form new
classes
• Example: cluster houses to find distribution patterns
– Clustering based on the principle: maximizing the
intra-class similarity and minimizing the interclass
similarity
Data Mining Tasks
Data Mining Tasks
• Outlier analysis
– Outlier: a data object that does not comply with the
general behavior of the data
– Mostly treated as noise or exceptions, but quite
useful in fraud detection and rare-events analysis
• Trend and evolution analysis
– Trend and deviation: regression analysis
– Sequential pattern mining, periodicity analysis
KDD Process
Database → Selection → Transformation (Data Preparation) →
Data Mining (Training Data) → Evaluation, Verification →
Model, Patterns
Data Mining
in Medicine
Medicine revolves around
Pattern Recognition, Classification, and Prediction
Diagnosis:
Recognize and classify patterns in multivariate
patient attributes
Therapy:
Select from available treatment methods, based on
effectiveness, suitability to patient, etc.
Prognosis:
Predict future outcomes based on previous
experience and present conditions
Medical Applications
• Screening
• Diagnosis
• Therapy
• Prognosis
• Monitoring
• Biomedical/Biological Analysis
• Epidemiological Studies
• Hospital Management
• Medical Instruction and Training
Medical Screening
• Effective low-cost screening using disease models that
require easily-obtained attributes:
(historical, questionnaires, simple measurements)
• Reduces demand for costly specialized tests (Good for
patients, medical staff, facilities, …)
• Examples:
- Prostate cancer using blood tests
- Hepatitis, Diabetes, Sleep apnea, etc.
Diagnosis and Classification
• Assist in decision making with a large number of inputs and
in stressful situations
• Can perform automated analysis of:
- Pathological signals (ECG, EEG, EMG)
- Medical images (mammograms, ultrasound, X-ray,
CT, and MRI)
• Examples:
- Heart attacks, Chest pains, Rheumatic disorders
- Myocardial ischemia using the ST-T ECG complex
- Coronary artery disease using SPECT images
Diagnosis and Classification: ECG
Interpretation
• Inputs: R-R interval, S-T elevation, P-R interval,
QRS duration, AVF lead, QRS amplitude
• Outputs: SV tachycardia, ventricular tachycardia,
LV hypertrophy, RV hypertrophy, myocardial infarction
Therapy
• Based on modeled historical performance, select
best intervention course: e.g. best
treatment plans in radiotherapy
• Using patient model, predict optimum medication
dosage: e.g. for diabetics
• Data fusion from various sensing modalities in
ICUs to assist overburdened medical staff
Prognosis
• Accurate prognosis and risk assessment are essential for
improved disease management and outcome
Examples:
– Survival analysis for AIDS patients
– Predict pre-term birth risk
– Determine cardiac surgical risk
– Predict ambulation following spinal cord injury
– Breast cancer prognosis
Biochemical/Biological Analysis
• Automate analytical tasks for:
- Analyzing blood and urine
- Tracking glucose levels
- Determining ion levels in body fluids
- Detecting pathological conditions
Epidemiological Studies
Study of health, disease, morbidity, injuries and mortality in
human communities
• Discover patterns relating outcomes to exposures
• Study independence or correlation between diseases
• Analyze public health survey data
• Example Applications:
- Assess asthma strategies in inner-city children
- Predict outbreaks in simulated populations
Hospital Management
• Optimize allocation of resources and assist in future
planning for improved services
Examples:
- Forecasting patient volume,
ambulance run volume, etc.
- Predicting length-of-stay for
incoming patients
Medical Instruction and Training
• Disease models for the instruction and assessment
of undergraduate medical and nursing students
• Intelligent tutoring systems for assisting in
teaching the decision making process
Benefits:
• Efficient screening tools reduce demand on costly
health care resources
• Data fusion from multiple sensors
• Help physicians cope with the information
overload
• Optimize allocation of hospital resources
• Better insight into medical survey data
• Computer-based training and evaluation
The KFUPM Experience
Medical Informatics Applications
• Modeling obesity (KFU)
• Modeling the educational score in school health surveys
(KFU)
• Classifying urinary stones by Cluster Analysis of ionic
composition data (KSU)
• Forecasting patient volume using Univariate Time-Series
Analysis (KFU)
• Improving classification of multiple dermatology disorders
by Problem Decomposition (Cairo University)
Modeling Obesity Using
Abductive Networks
• Waist-to-Hip Ratio (WHR) obesity risk factor modeled in
terms of 13 health parameters
• 1100 cases (800 for training, 300 for evaluation)
• Patients attending 9 primary health care clinics in 1995
in Al-Khobar
• Modeled WHR as a categorical variable and as a
continuous variable
• Analytical relationships derived from the continuous
model adequately ‘explain’ the survey data
Modeling Obesity:
Categorical WHR Model
• WHR > 0.84: Abnormal (1)
• Automatically selects the 8 most
relevant inputs
• Confusion matrix:

              Predicted 1 (250)  Predicted 0 (50)
True 1 (249)        248                  1
True 0 (51)           2                 49

Classification Accuracy: 99%
Modeling Obesity:
Continuous WHR
- Simplified Model
• Uses only 2 variables:
Height and Diastolic Blood
Pressure
• Still reasonably accurate:
– 88% of cases had error within
± 10%
• Simple analytical input-output relationship
• Adequately explains the
survey data
Modeling the Educational Score in
School Health Surveys
• 2720 Albanian primary school children
• Educational score modeled as an ordinal categorical
variable (1-5) in terms of 8 attributes:
region, age, gender, vision acuity, nourishment level,
parasite test, family size, parents education
• Model built using only 100 cases predicts output for
remaining 2620 cases with 100% accuracy
• A simplified model selects 3 inputs only:
- Vision acuity
- Number of children in family
- Father’s education
Classifying Urinary Stones by Cluster
Analysis of Ionic Composition Data
• Classified 214 non-infection kidney stones
into 3 groups
• 9 chemical analysis variables: Concentrations of
ions: CA, C, N, H, MG, and radicals: Urate,
Oxalate, and Phosphate
• Clustering with only the 3 radicals had 94%
agreement with an empirical classification
scheme developed previously at KSU, with the
same 3 variables
Forecasting Monthly Patient Volume at a
Primary Health Care Clinic, Al-Khobar Using
Univariate Time-Series Analysis
• Used data for 9 years to forecast volume for two years ahead
Error over forecasted 2 years: Mean = 0.55%, Max = 1.17%
Improving classification of multiple dermatology disorders by
Problem Decomposition (Cairo University)
- Improved classification accuracy from 91% to 99%
- About 50% reduction in the number of required input features
• Standard UCI dataset: 6 classes of dermatology
disorders, 34 input features
• Classes split into two categories; classification
done sequentially at two levels (Level 1, then Level 2)
Summary
• Data mining is set to play an important role in tackling
the data overload in medical informatics
• Benefits include improved health care quality, reduced
operating costs, and better insight into medical data
• Abductive networks offer advantages over neural
networks, including faster model development and
better explanation capabilities
Classification
Features
• Loosely stated, a feature is a value describing
something about your data points (e.g. for
pixels: intensity, local gradient, distance from
landmark, etc)
• Multiple (n) features are put together to form
a feature vector, which defines a data point’s
location in n-dimensional feature space
Feature Space
• Feature Space -
– The theoretical n-dimensional space occupied by n
input raster objects (features).
– Each feature represents one dimension, and its
values represent positions along one of the
orthogonal coordinate axes in feature space.
– The set of feature values belonging to a data point
define a vector in feature space.
Statistical Notation
• Class probability distribution:
p(x,y) = p(x | y) p(y)
x: feature vector – {x1,x2,x3…,xn}
y: class
p(x | y): probability of x given y
p(x,y): probability of both x and y
Example: Binary Classification
• Two class-conditional distributions:
p(x | y = 0) p(x | y = 1)
• Priors:
p(y = 0) + p(y = 1) = 1
Modeling Class Densities
• In the text, they choose to concentrate on methods
that use Gaussians to model class densities
Generative Approach to Classification
1. Represent and learn the distribution:
p(x,y)
2. Use it to define probabilistic discriminant
functions
e.g.
g0(x) = p(y = 0 | x)
g1(x) = p(y = 1 | x)
Generative Approach to Classification
Typical model:
p(x,y) = p(x | y) p(y)
p(x | y) = Class-conditional distributions (densities)
p(y) = Priors of classes (probability of class y)
We Want:
p(y | x) = Posteriors of classes
Class Modeling
• We model the class distributions as multivariate
Gaussians
x ~ N(μ0, Σ0) for y = 0
x ~ N(μ1, Σ1) for y = 1
• Priors are based on training data, or a distribution can
be chosen that is expected to fit the data well (e.g.
Bernoulli distribution for a coin flip)
Making a class decision
• We need to define discriminant functions ( gn(x) )
• We have two basic choices:
– Likelihood of data: choose the class (Gaussian) that best
explains the input data x.
– Posterior of class: choose the class with the better posterior
probability.
Calculating Posteriors
• Use Bayes’ Rule:

    P(A | B) = P(B | A) P(A) / P(B)

• In this case:

    p(y | x) = p(x | y) p(y) / p(x)
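A small sketch of this generative classifier: Gaussian class-conditional densities, priors, and Bayes' rule for the posterior. The means, covariances, and priors below are illustrative, not from the slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

# illustrative class-conditional Gaussians and priors
mu0, cov0 = np.array([0.0, 0.0]), np.eye(2)
mu1, cov1 = np.array([3.0, 3.0]), np.eye(2)
p_y0, p_y1 = 0.5, 0.5

def posterior_y1(x):
    """p(y=1 | x) via Bayes' rule; p(x) is the sum over both classes."""
    joint0 = multivariate_normal.pdf(x, mu0, cov0) * p_y0
    joint1 = multivariate_normal.pdf(x, mu1, cov1) * p_y1
    return joint1 / (joint0 + joint1)

print(posterior_y1([3.0, 3.0]))   # near 1: clearly class 1
print(posterior_y1([1.5, 1.5]))   # 0.5: on the decision boundary
```

Because the two covariances are equal here, the set of points with posterior 0.5 is a straight line, matching the linear-decision-boundary case; unequal covariances would give a quadratic boundary.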
Linear Decision Boundary
• When covariances are the same
Quadratic Decision Boundary
• When covariances are different
Clustering
• Basic Clustering Problem:
– Distribute data into k different groups such that data
points similar to each other are in the same group
– Similarity between points is defined in terms of some
distance metric
• Clustering is useful for:
– Similarity/Dissimilarity analysis
• Analyze which data points in the sample are close to each other
– Dimensionality Reduction
• High dimensional data replaced with a group (cluster) label
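The basic clustering problem above can be sketched with Lloyd's k-means algorithm in pure NumPy. The data and the naive "first k points" initialization are illustrative:

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment
    and centroid update until the centroids stop moving."""
    centers = X[:k].copy()                  # naive initialization
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)           # assign to nearest centroid
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# two well-separated groups of points
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
labels, centers = kmeans(X, 2)
print(labels)   # first three points share one label, last three the other
```

Real implementations use a smarter initialization (e.g. k-means++) and guard against empty clusters; this sketch only shows the core assignment/update loop.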
Clustering
• Cluster: a collection of data objects
– Similar to one another within the same cluster
– Dissimilar to the objects in other clusters
• Cluster analysis
– Grouping a set of data objects into clusters
• Clustering is unsupervised classification: no
predefined classes
• Typical applications
– to get insight into data
– as a preprocessing step
– we will use it for image segmentation
What is Clustering?
Find K clusters (or a classification that consists of K clusters) so that the objects of
one cluster are similar to each other whereas objects of different clusters are
dissimilar. (Bacher 1996)
The Goals of Clustering
• Determine the intrinsic grouping in a set of unlabeled data.
• What constitutes a good clustering?
• All clustering algorithms will produce clusters,
regardless of whether the data contains them
• There is no gold standard; what counts as good depends on the goal:
– data reduction
– “natural clusters”
– “useful” clusters
– outlier detection
Stages in clustering
Taxonomy of Clustering Approaches
Hierarchical Clustering
Agglomerative clustering treats each data point as a singleton cluster, and then
successively merges clusters until all points have been merged into a single
remaining cluster. Divisive clustering works the other way around.
Single link
Agglomerative Clustering
In single-link hierarchical clustering, we merge in each step the two clusters
whose two closest members have the smallest distance.
Complete link
Agglomerative Clustering
In complete-link hierarchical clustering, we merge in each step the two
clusters whose merger has the smallest diameter.
Example – Single Link AC
        BA    FI    MI    NA    RM    TO
  BA     0   662   877   255   412   996
  FI   662     0   295   468   268   400
  MI   877   295     0   754   564   138
  NA   255   468   754     0   219   869
  RM   412   268   564   219     0   669
  TO   996   400   138   869   669     0
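This single-link example can be reproduced with SciPy's hierarchical clustering on the distance matrix above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

cities = ["BA", "FI", "MI", "NA", "RM", "TO"]
D = np.array([[  0, 662, 877, 255, 412, 996],
              [662,   0, 295, 468, 268, 400],
              [877, 295,   0, 754, 564, 138],
              [255, 468, 754,   0, 219, 869],
              [412, 268, 564, 219,   0, 669],
              [996, 400, 138, 869, 669,   0]], dtype=float)

# single-link agglomerative clustering on the condensed distance matrix
Z = linkage(squareform(D), method="single")
print(cities[int(Z[0][0])], cities[int(Z[0][1])], Z[0][2])  # MI TO 138.0
```

Each row of the linkage matrix Z records one merge: the two clusters joined, the distance at which they were joined, and the size of the merged cluster; the first merge is always the closest pair.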
What is Cluster Analysis?
• Finding groups of objects such that the objects in a group
will be similar (or related) to one another and different
from (or unrelated to) the objects in other groups
Inter-cluster
distances are
maximized
Intra-cluster
distances are
minimized
Notion of a Cluster can be Ambiguous
How many clusters? Depending on interpretation, the same
data could be grouped into two, four, or six clusters.
Types of Clusters: Contiguity-Based
• Contiguous Cluster (Nearest neighbor or
Transitive)
– A cluster is a set of points such that a point in a cluster is closer (or
more similar) to one or more other points in the cluster than to any
point not in the cluster.
8 contiguous clusters
Types of Clusters: Density-Based
• Density-based
– A cluster is a dense region of points, separated from
other regions of high density by low-density regions.
– Used when the clusters are irregular or intertwined, and when
noise and outliers are present.
6 density-based clusters
Euclidean Density – Cell-based
• The simplest approach is to divide the region into a
number of rectangular cells of equal volume
and define density as the number of points a cell
contains
Euclidean Density – Center-based
• Euclidean density is the number of points
within a specified radius of the point
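Both density notions can be sketched in a few lines (pure Python, an illustration only; the function names and sample points are hypothetical):

```python
# Cell-based density: partition the plane into equal square cells and
# count the points falling in each cell.
# Center-based density: count the points within a given radius of a point.

def cell_density(points, cell_size):
    """Map each grid cell (ix, iy) to the number of points it contains."""
    counts = {}
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] = counts.get(cell, 0) + 1
    return counts

def center_density(points, center, radius):
    """Number of points within `radius` of `center` (Euclidean)."""
    cx, cy = center
    return sum((x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
               for x, y in points)

pts = [(0.2, 0.3), (0.4, 0.1), (0.9, 0.8), (1.6, 1.7)]
print(cell_density(pts, 1.0))                 # {(0, 0): 3, (1, 1): 1}
print(center_density(pts, (0.3, 0.2), 0.5))   # 2
```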
Data Structures in Clustering
• Data matrix
– (two modes)
• Dissimilarity matrix
– (one mode)
Data matrix (n objects × p variables):

  [ x_11 ... x_1f ... x_1p ]
  [  ...      ...      ... ]
  [ x_i1 ... x_if ... x_ip ]
  [  ...      ...      ... ]
  [ x_n1 ... x_nf ... x_np ]

Dissimilarity matrix (n × n, lower triangular):

  [ 0                          ]
  [ d(2,1)  0                  ]
  [ d(3,1)  d(3,2)  0          ]
  [ :       :       :          ]
  [ d(n,1)  d(n,2)  ...      0 ]
Interval-valued variables
• Standardize data
– Calculate the mean absolute deviation:

  s_f = (1/n) (|x_1f − m_f| + |x_2f − m_f| + ... + |x_nf − m_f|)

where

  m_f = (1/n) (x_1f + x_2f + ... + x_nf)

– Calculate the standardized measurement (z-score)

  z_if = (x_if − m_f) / s_f

• Using mean absolute deviation could be more robust than
using standard deviation
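The standardization step can be sketched directly (a minimal illustration, assuming — per the robustness remark above — that s_f is the mean absolute deviation):

```python
# z-score standardization of one feature: z = (x - m_f) / s_f,
# where s_f is the mean absolute deviation of the feature.

def zscores_mad(xs):
    m = sum(xs) / len(xs)                      # feature mean m_f
    s = sum(abs(x - m) for x in xs) / len(xs)  # mean absolute deviation s_f
    return [(x - m) / s for x in xs]

print(zscores_mad([2.0, 4.0, 6.0]))  # roughly [-1.5, 0.0, 1.5]
```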
• Euclidean distance:

  d(i,j) = sqrt( |x_i1 − x_j1|² + |x_i2 − x_j2|² + ... + |x_ip − x_jp|² )

– Properties
• d(i,j) ≥ 0
• d(i,j) = 0 iff i=j
• d(i,j) = d(j,i)
• d(i,j) ≤ d(i,k) + d(k,j)
• Also one can use weighted distance, parametric Pearson
product moment correlation, or other dissimilarity measures.
Similarity and Dissimilarity Between Objects
The set of 5 observations, measuring 3 variables,
can be described by its mean vector and covariance matrix.
The three variables, from left to right are
length, width, and height of a certain object, for example.
Each row vector Xrow
is another observation
of the three variables (or components) for row=1, …, 5.
Covariance Matrix
The mean vector consists of the means of each variable. The covariance matrix consists of
the variances of the variables along the main diagonal and the covariances between each
pair of variables in the other matrix positions.
0.025 is the variance of the length variable,
0.0075 is the covariance between the length and the width variables,
0.00175 is the covariance between the length and the height variables,
0.007 is the variance of the width variable.
where n = 5 for this example:

  S = (1/(n−1)) Σ_{row=1..n} (X_row − x̄)′ (X_row − x̄)

  s_jk = (1/(n−1)) Σ_{row=1..n} (X_row,j − x̄_j)(X_row,k − x̄_k)
Mahalanobis Distance
  mahalanobis(p, q) = (p − q) Σ⁻¹ (p − q)ᵀ

For red points, the Euclidean distance is 14.7, Mahalanobis distance is 6.
Σ is the covariance matrix of the input data X:

  Σ_{j,k} = (1/(n−1)) Σ_{i=1..n} (X_ij − X̄_j)(X_ik − X̄_k)
Mahalanobis Distance
Covariance Matrix:

  Σ = [ 0.3  0.2 ]
      [ 0.2  0.3 ]

A: (0.5, 0.5)
B: (0, 1)
C: (1.5, 1.5)

Mahal(A,B) = 5
Mahal(A,C) = 4
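The slide's numbers can be checked with a small sketch (pure Python; the 2×2 matrix inverse is written out by hand):

```python
# Squared Mahalanobis distance for 2-D points and a 2x2 covariance matrix.
# With Sigma = [[0.3, 0.2], [0.2, 0.3]]: Mahal(A,B) = 5, Mahal(A,C) = 4.

def mahalanobis_sq(p, q, cov):
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # 2x2 inverse
    dx, dy = p[0] - q[0], p[1] - q[1]
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))

cov = [[0.3, 0.2], [0.2, 0.3]]
A, B, C = (0.5, 0.5), (0, 1), (1.5, 1.5)
print(mahalanobis_sq(A, B, cov))  # 5.0 (up to float rounding)
print(mahalanobis_sq(A, C, cov))  # 4.0 (up to float rounding)
```

Note that C is farther from A than B in Euclidean terms, yet closer in Mahalanobis terms, because A–C lies along the direction of greatest variance.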
Cosine Similarity
• If x1 and x2 are two document vectors, then

  cos(x1, x2) = (x1 • x2) / (||x1|| ||x2||),

where • indicates vector dot product and ||d|| is the length of vector d.
• Example:

  x1 = 3 2 0 5 0 0 0 2 0 0
  x2 = 1 0 0 0 0 0 0 1 0 2
  x1 • x2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5
  ||x1|| = (3*3 + 2*2 + 0*0 + 5*5 + 0*0 + 0*0 + 0*0 + 2*2 + 0*0 + 0*0)^0.5 = (42)^0.5 = 6.481
  ||x2|| = (1*1 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 1*1 + 0*0 + 2*2)^0.5 = (6)^0.5 = 2.449
  cos(x1, x2) = 0.3150
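The example above can be reproduced directly:

```python
# Cosine similarity between two sparse document vectors.
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

x1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
x2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
print(round(cosine(x1, x2), 4))  # 0.315
```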
Correlation
• Correlation measures the linear relationship
between objects
• To compute correlation, we standardize data
objects, p and q, and then take their dot
product
  p′_k = (p_k − mean(p)) / std(p)
  q′_k = (q_k − mean(q)) / std(q)
  correlation(p, q) = p′ • q′
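This can be checked with a short sketch (an illustration; here the dot product is scaled by 1/(n−1), so that with the sample standard deviation the result matches the usual Pearson correlation and falls in [−1, 1]):

```python
# Correlation as the dot product of standardized objects.
import math

def correlation(p, q):
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    sp = math.sqrt(sum((x - mp) ** 2 for x in p) / (n - 1))
    sq = math.sqrt(sum((x - mq) ** 2 for x in q) / (n - 1))
    ps = [(x - mp) / sp for x in p]  # standardized p
    qs = [(x - mq) / sq for x in q]  # standardized q
    return sum(a * b for a, b in zip(ps, qs)) / (n - 1)

print(correlation([1, 2, 3], [2, 4, 6]))  # 1.0 (perfect linear relationship)
print(correlation([1, 2, 3], [6, 4, 2]))  # -1.0
```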
Visually Evaluating Correlation
Scatter plots showing the similarity from –1 to 1.
K-means Clustering
• Partitional clustering approach
• Each cluster is associated with a centroid (center point)
• Each point is assigned to the cluster with the closest centroid
• Number of clusters, K, must be specified
• The basic algorithm is very simple
k-means Clustering
• An algorithm for partitioning (or clustering) N
data points into K disjoint subsets S_j
containing N_j data points so as to minimize the
sum-of-squares criterion

  J = Σ_{j=1..K} Σ_{n ∈ S_j} ||x_n − μ_j||²

where x_n is a vector representing the nth data point and μ_j is the
geometric centroid of the data points in S_j
K-means Clustering – Details
• Initial centroids are often chosen randomly.
– Clusters produced vary from one run to another.
• The centroid is (typically) the mean of the points in the cluster.
• ‘Closeness’ is measured by Euclidean distance, cosine similarity, correlation, etc.
• K-means will converge for common distance functions.
• Most of the convergence happens in the first few iterations.
– Often the stopping condition is changed to ‘Until relatively few points change clusters’
• Complexity is O( n * K * I * d )
– n = number of points, K = number of clusters,
I = number of iterations, d = number of attributes
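The basic algorithm can be sketched in a few lines of pure Python (a minimal 1-D illustration, not the lecture's code; `kmeans` and its defaults are hypothetical names):

```python
# Basic K-means: assign each point to the nearest centroid, recompute
# centroids as cluster means, repeat until the centroids stop moving.
import random

def kmeans(points, k, iters=100, seed=0):
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda j: (x - centroids[j]) ** 2)
            clusters[j].append(x)
        new = [sum(c) / len(c) if c else centroids[j]
               for j, c in enumerate(clusters)]
        if new == centroids:  # converged: centroids stopped moving
            break
        centroids = new
    return sorted(centroids)

print(kmeans([1.0, 2.0, 4.0, 5.0], 2))  # [1.5, 4.5]
```

For this tiny data set every choice of two distinct initial centroids converges to the same answer; on real data, as the slides note, different initializations can yield different clusterings.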
Two different K-means Clusterings
[Figure: the same data set partitioned in two different ways by K-means]
Sub-optimal Clustering
Optimal Clustering
Original Points
• Importance of choosing initial centroids
Solutions to Initial Centroids Problem
• Multiple runs
– Helps, but probability is not on your side
• Sample and use hierarchical clustering to determine initial
centroids
• Select more than k initial centroids and then select among
these initial centroids
– Select most widely separated
• Postprocessing
• Bisecting K-means
– Not as susceptible to initialization issues
Basic K-means algorithm can yield empty clusters
Handling Empty Clusters
Pre-processing and Post-processing
• Pre-processing
– Normalize the data
– Eliminate outliers
• Post-processing
– Eliminate small clusters that may represent outliers
– Split ‘loose’ clusters, i.e., clusters with relatively high SSE
– Merge clusters that are ‘close’ and that have relatively low
SSE
Bisecting K-means
• Bisecting K-means algorithm
– Variant of K-means that can produce a partitional or a hierarchical
clustering
Bisecting K-means Example
Limitations of K-means
• K-means has problems when clusters are of differing
– Sizes
– Densities
– Non-globular shapes
• K-means has problems when the data contains
outliers.
Limitations of K-means: Differing Sizes
Original Points K-means (3 Clusters)
Limitations of K-means: Differing Density
Original Points K-means (3 Clusters)
Limitations of K-means: Non-globular Shapes
Original Points K-means (2 Clusters)
Overcoming K-means Limitations
Original Points K-means Clusters
One solution is to use many clusters: find parts of clusters, which then need to be put back together.
Overcoming K-means Limitations
Original Points K-means Clusters
Variations of the K-Means Method
• A few variants of the k-means which differ in
– Selection of the initial k means
– Dissimilarity calculations
– Strategies to calculate cluster means
• Handling categorical data: k-modes (Huang’98)
– Replacing means of clusters with modes
– Using new dissimilarity measures to deal with categorical objects
– Using a frequency-based method to update modes of clusters
• Handling a mixture of categorical and numerical data: k-
prototype method
The K-Medoids Clustering Method
• Find representative objects, called medoids, in clusters
• PAM (Partitioning Around Medoids, 1987)
– starts from an initial set of medoids and iteratively replaces one of the
medoids by one of the non-medoids if it improves the total distance of
the resulting clustering
– PAM works effectively for small data sets, but does not scale well for
large data sets
• CLARA (Kaufmann & Rousseeuw, 1990)
– draws multiple samples of the data set, applies PAM on each sample,
and gives the best clustering as the output
• CLARANS (Ng & Han, 1994): Randomized sampling
• Focusing + spatial data structure (Ester et al., 1995)
Hierarchical Clustering
• Produces a set of nested clusters organized as a
hierarchical tree
• Can be visualized as a dendrogram
– A tree like diagram that records the sequences of merges
or splits
[Figure: six points and the corresponding dendrogram, with merge heights between 0 and 0.2]
Strengths of Hierarchical Clustering
• Do not have to assume any particular number of
clusters
– Any desired number of clusters can be obtained by
‘cutting’ the dendrogram at the proper level
• They may correspond to meaningful taxonomies
– Example in biological sciences (e.g., animal kingdom,
phylogeny reconstruction, …)
Hierarchical Clustering
• Two main types of hierarchical clustering
– Agglomerative:
• Start with the points as individual clusters
• At each step, merge the closest pair of clusters until only one cluster (or k
clusters) left
Matlab: Statistics Toolbox: clusterdata,
which performs all these steps: pdist, linkage, cluster
– Divisive:
• Start with one, all-inclusive cluster
• At each step, split a cluster until each cluster contains a point (or there are
k clusters)
• Traditional hierarchical algorithms use a similarity or distance
matrix
– Merge or split one cluster at a time
– Image segmentation mostly uses simultaneous merge/split
Agglomerative Clustering Algorithm
• More popular hierarchical clustering technique
• Basic algorithm is straightforward
1. Compute the proximity matrix
2. Let each data point be a cluster
3. Repeat
4. Merge the two closest clusters
5. Update the proximity matrix
6. Until only a single cluster remains
• Key operation is the computation of the proximity of two
clusters
– Different approaches to defining the distance between clusters
distinguish the different algorithms
Starting Situation
• Start with clusters of individual points and a proximity matrix
[Diagram: points p1–p5 as singleton clusters, with their proximity matrix]
Intermediate Situation
• After some merging steps, we have some clusters
[Diagram: clusters C1–C5 after some merges, with the cluster proximity matrix]
Intermediate Situation
• We want to merge the two closest clusters (C2 and C5) and update
the proximity matrix.
After Merging
• The question is “How do we update the proximity matrix?”
[Diagram: after C2 and C5 are merged into C2 ∪ C5, the corresponding row and column of the proximity matrix must be recomputed]
How to Define Inter-Cluster Similarity
[Diagram: points p1–p5 and their proximity matrix — how should the similarity between two clusters be defined?]
• MIN
• MAX
• Group Average
• Distance Between Centroids
• Other methods driven by an
objective function
– Ward’s Method uses squared error
Proximity Matrix
Hierarchical Clustering: Comparison
[Figure: MIN, MAX, Group Average, and Ward's Method applied to the same six points — each inter-cluster proximity definition produces a different hierarchy]
Hierarchical Clustering: Time and Space requirements
• O(N²) space since it uses the proximity matrix.
– N is the number of points.
• O(N³) time in many cases
– There are N steps and at each step the size-N² proximity
matrix must be updated and searched
– Complexity can be reduced to O(N² log(N)) time for some
approaches
Hierarchical Clustering: Problems and
Limitations
• Once a decision is made to combine two clusters, it
cannot be undone
Therefore, we use merge/split to segment images!
• No objective function is directly minimized
• Different schemes have problems with one or more
of the following:
– Sensitivity to noise and outliers
– Difficulty handling different sized clusters and convex
shapes
– Breaking large clusters
MST: Divisive Hierarchical Clustering
• Build MST (Minimum Spanning Tree)
– Start with a tree that consists of any point
– In successive steps, look for the closest pair of points (p, q) such that
one point (p) is in the current tree but the other (q) is not
– Add q to the tree and put an edge between p and q
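The MST construction described above (essentially Prim's algorithm) and the divisive step can be sketched as follows; `mst_edges` and the toy points are hypothetical names, not the lecture's code:

```python
# Build the MST edge by edge: always attach the closest outside point
# to the current tree. Deleting the longest MST edge then splits the
# data into two clusters (the divisive step).

def mst_edges(points, dist):
    in_tree = {0}
    edges = []
    while len(in_tree) < len(points):
        # closest pair (p, q) with p in the tree and q outside it
        d, p, q = min((dist(points[i], points[j]), i, j)
                      for i in in_tree for j in range(len(points))
                      if j not in in_tree)
        edges.append((d, p, q))
        in_tree.add(q)
    return edges

def euclid(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
edges = mst_edges(pts, euclid)
longest = max(edges)   # cutting this edge yields the two natural clusters
print(longest[0])      # 5.0
```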
MST: Divisive Hierarchical Clustering
• Use MST for constructing hierarchy of clusters
More on Hierarchical Clustering Methods
• Major weakness of agglomerative clustering methods
– do not scale well: time complexity of at least O(n²), where n is the
number of total objects
– can never undo what was done previously
• Integration of hierarchical with distance-based clustering
– BIRCH (1996): uses CF-tree and incrementally adjusts the quality of sub-
clusters
– CURE (1998): selects well-scattered points from the cluster and then
shrinks them towards the center of the cluster by a specified fraction
– CHAMELEON (1999): hierarchical clustering using dynamic modeling
Density-Based Clustering Methods
• Clustering based on density (local cluster criterion), such as
density-connected points
• Major features:
– Discover clusters of arbitrary shape
– Handle noise
– One scan
– Need density parameters as termination condition
• Several interesting studies:
– DBSCAN: Ester, et al. (KDD’96)
– OPTICS: Ankerst, et al (SIGMOD’99).
– DENCLUE: Hinneburg & D. Keim (KDD’98)
– CLIQUE: Agrawal, et al. (SIGMOD’98)
Graph-Based Clustering
• Graph-Based clustering uses the proximity graph
– Start with the proximity matrix
– Consider each point as a node in a graph
– Each edge between two nodes has a weight which is the
proximity between the two points
– Initially the proximity graph is fully connected
– MIN (single-link) and MAX (complete-link) can be viewed
as starting with this graph
• In the simplest case, clusters are connected
components in the graph.
Graph-Based Clustering: Sparsification
• Clustering may work better
– Sparsification techniques keep the connections to the most
similar (nearest) neighbors of a point while breaking the
connections to less similar points.
– The nearest neighbors of a point tend to belong to the same
class as the point itself.
– This reduces the impact of noise and outliers and sharpens the
distinction between clusters.
• Sparsification facilitates the use of graph
partitioning algorithms (or algorithms based on
graph partitioning algorithms).
– Chameleon and Hypergraph-based Clustering
Sparsification in the Clustering Process
Cluster Validity
• For supervised classification we have a variety of measures to
evaluate how good our model is
– Accuracy, precision, recall
• For cluster analysis, the analogous question is how to
evaluate the “goodness” of the resulting clusters?
• Then why do we want to evaluate them?
– To avoid finding patterns in noise
– To compare clustering algorithms
– To compare two sets of clusters
– To compare two clusters
Clusters found in Random Data
[Figure: points distributed uniformly at random in the unit square, together with the "clusters" found in them by K-means, DBSCAN, and complete link]
• Numerical measures that are applied to judge various aspects of
cluster validity, are classified into the following three types.
– External Index: Used to measure the extent to which cluster labels
match externally supplied class labels.
• Entropy
– Internal Index: Used to measure the goodness of a clustering structure
without respect to external information.
• Sum of Squared Error (SSE)
– Relative Index: Used to compare two different clusterings or clusters.
• Often an external or internal index is used for this function, e.g., SSE or entropy
• Sometimes these are referred to as criteria instead of indices
– However, sometimes criterion is the general strategy and index is the numerical
measure that implements the criterion.
Measures of Cluster Validity
• Cluster Cohesion: Measures how closely related are objects in a
cluster
– Example: SSE
• Cluster Separation: Measure how distinct or well-separated a
cluster is from other clusters
• Example: Squared Error
– Cohesion is measured by the within cluster sum of squares (SSE)
– Separation is measured by the between cluster sum of squares
• Where |Ci| is the size of cluster i
Internal Measures: Cohesion and Separation
  WSS = Σ_i Σ_{x ∈ C_i} (x − m_i)²

  BSS = Σ_i |C_i| (m − m_i)²
Internal Measures: Cohesion and
Separation
• Example: data points 1, 2, 4, 5; overall mean m = 3; cluster centroids m1 = 1.5, m2 = 4.5

K=1 cluster:
  WSS = (1−3)² + (2−3)² + (4−3)² + (5−3)² = 10
  BSS = 4 × (3−3)² = 0
  Total = 10 + 0 = 10

K=2 clusters ({1, 2} and {4, 5}):
  WSS = (1−1.5)² + (2−1.5)² + (4−4.5)² + (5−4.5)² = 1
  BSS = 2 × (3−1.5)² + 2 × (4.5−3)² = 9
  Total = 1 + 9 = 10
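The cohesion and separation numbers for these points can be verified with a short sketch (pure Python, an illustration only):

```python
# Within-cluster (WSS) and between-cluster (BSS) sums of squares.
# Their sum is constant for a fixed data set, whatever the clustering.

def wss(clusters):
    total = 0.0
    for c in clusters:
        m = sum(c) / len(c)                       # cluster centroid m_i
        total += sum((x - m) ** 2 for x in c)
    return total

def bss(clusters):
    allpts = [x for c in clusters for x in c]
    m = sum(allpts) / len(allpts)                 # overall mean m
    return sum(len(c) * (sum(c) / len(c) - m) ** 2 for c in clusters)

print(wss([[1, 2, 4, 5]]), bss([[1, 2, 4, 5]]))      # 10.0 0.0
print(wss([[1, 2], [4, 5]]), bss([[1, 2], [4, 5]]))  # 1.0 9.0
```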
• A proximity graph based approach can also be used for
cohesion and separation.
– Cluster cohesion is the sum of the weight of all links within a cluster.
– Cluster separation is the sum of the weights between nodes in the
cluster and nodes outside the cluster.
Internal Measures: Cohesion and Separation
cohesion separation
Clustering
Distance Metrics
• Euclidean Distance, in some space (for our purposes,
probably a feature space)
• Must fulfill three properties:
– d(x,y) ≥ 0, with d(x,y) = 0 iff x = y
– d(x,y) = d(y,x)
– d(x,z) ≤ d(x,y) + d(y,z)
Distance Metrics
• Common simple metrics:
– Euclidean: d(x,y) = sqrt( Σ_{i=1..k} (x_i − y_i)² )
– Manhattan: d(x,y) = Σ_{i=1..k} |x_i − y_i|
• Both work for an arbitrary k-dimensional space
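Both metrics can be written directly for arbitrary k-dimensional points (a minimal sketch):

```python
# Euclidean and Manhattan distances for k-dimensional points.

def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```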
Clustering Algorithms
• k-Nearest Neighbor
• k-Means
• Parzen Windows
k-Nearest Neighbor
• In essence, a classifier
• Requires input parameter k
– In this algorithm, k indicates the number of
neighboring points to take into account when
classifying a data point
• Requires training data
k-Nearest Neighbor Algorithm
• For each data point xn, choose its class by
finding the most prominent class among the k
nearest data points in the training set
• Use any distance measure (usually a Euclidean
distance measure)
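The algorithm above can be sketched as follows (a minimal illustration; `knn_classify` and the toy training set are hypothetical):

```python
# k-NN: classify a point by the majority label among its k nearest
# training points, using squared Euclidean distance.
from collections import Counter

def knn_classify(x, training, k=3):
    """training: list of (point, label) pairs."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    nearest = sorted(training, key=lambda t: dist(t[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), '-'), ((0, 1), '-'), ((1, 0), '-'),
         ((5, 5), '+'), ((5, 6), '+'), ((6, 5), '+')]
print(knn_classify((1, 1), train, k=3))  # '-'
print(knn_classify((5, 4), train, k=3))  # '+'
```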
k-Nearest Neighbor Algorithm
[Figure: positive (+) and negative (−) training points around a query q1 — its 1-nearest neighbor is e1, so q1 takes the concept represented by e1; with its 5 nearest neighbors, q1 is classified as negative]
k-Nearest Neighbor
• Advantages:
– Simple
– General (can work for any distance measure you want)
• Disadvantages:
– Requires well classified training data
– Can be sensitive to k value chosen
– All attributes are used in classification, even ones that may
be irrelevant
– Inductive bias: we assume that a data point should be
classified the same as points near it
k-Means
• Suitable only when data points have
continuous values
• Groups are defined in terms of cluster centers
(means)
• Requires input parameter k
– In this algorithm, k indicates the number of
clusters to be created
• Guaranteed to converge to at least a local
optima
k-Means Algorithm
• Algorithm:
1. Randomly initialize k mean values
2. Repeat until the means no longer change:
  a. Partition the data using a similarity measure
  according to the current means
  b. Move each mean to the center of the data in its
  current partition
k-Means
k-Means
• Advantages:
– Simple
– General (can work for any distance measure you want)
– Requires no training phase
• Disadvantages:
– Result is very sensitive to initial mean placement
– Can perform poorly on overlapping regions
– Doesn’t work on features with non-continuous values (can’t compute
cluster means)
– Inductive bias: we assume that a data point should be classified the
same as points near it
Parzen Windows
• Similar to k-Nearest Neighbor, but instead of
using the k closest training data points, it
uses all points within a kernel (window),
weighting their contribution to the
classification based on the kernel
• As with our classification algorithms, we will
consider a Gaussian kernel as the window
Parzen Windows
• Assume a region defined by a d-dimensional
Gaussian of scale σ
• We can define a window density function:
• Note that we consider all points in the training set,
but if a point is outside of the kernel, its weight will
be 0, negating its influence
  p(x, σ) = (1/|S|) Σ_{j=1..|S|} G(x − x_j, σ)

where G(·, σ) is the d-dimensional Gaussian of scale σ and S is the training set.
Parzen Windows
Parzen Windows
• Advantages:
– More robust than k-nearest neighbor
– Excellent accuracy and consistency
• Disadvantages:
– How to choose the size of the window?
– Alone, kernel density estimation techniques
provide little insight into data or problems
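A 1-D sketch of the window density idea, assuming a Gaussian kernel as the window (an illustration, not the lecture's code):

```python
# Parzen-window (kernel) density estimate: average a Gaussian kernel of
# scale sigma centered at each training point. Every point contributes,
# but distant points have near-zero weight.
import math

def parzen_density(x, samples, sigma=1.0):
    norm = 1.0 / (math.sqrt(2 * math.pi) * sigma)
    return sum(norm * math.exp(-(x - xj) ** 2 / (2 * sigma ** 2))
               for xj in samples) / len(samples)

samples = [0.0, 0.5, 1.0, 5.0]
# Density is higher inside the cluster at 0-1 than in the gap near 3.
print(parzen_density(0.5, samples) > parzen_density(3.0, samples))  # True
```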
Case study: polyp detection
• Step 1: CT scan of patient
• Step 2: Segmentation of colon
Paik, et al.
Case study: polyp detection
• Step 3: detection of polyp candidates
– Hough transform (looking for spheres)
Paik, et al.
Case study: polyp detection
• Step 4: feature extraction
• Step 5: classification
– Take your pick of algorithms (SVM, ANN, etc.)
Gokturk, et al.
Case study: polyp detection
• Step 6: Flythrough colon giving information to
physician for final diagnosis (not yet realized)
Paik, et al.
Case study: polyp detection
Paik, et al.
Future…
Two categories of interest
• Applications of standard computer vision
techniques into the medical domain
– Segmentation
– Computer-Aided Detection
• New techniques from medical image analysis
added to the vision toolbox
– Multi-modal registration
Registration
• “The process of establishing a common,
geometric reference frame between two data
sets.”
• Previously used in vision to align satellite
images, generate image mosaics, etc.
Image 1 Image 2 Registered
+ =
Registration in medicine
• Explosion of data, both 2D and 3D from many
different imaging modalities have made
registration a very important and challenging
problem in medicine
© L. Joskowicz (HUJI)
Ref_MRI Ref_NMR
Multi-modal registration
[Diagram: Data Set #1 and Data Set #2 each pass through Feature Selection; a Similarity Measure compares them, an Optimizer searches over transformations, and the resulting Transform T aligns the two data sets]
Multi-modal registration
Registration
[Diagram: preoperative data (X-rays, US, NMR, CT, MRI, CAD) and intraoperative data (fluoro, tracking, US, open MR, special sensors, video) are registered into combined data]
© L. Joskowicz (HUJI)
Feature selection
• Points-based
– 3D points calculated using an
optical tracker
• Surfaces
– Extracted from images using
segmentation algorithms
• Intensities
– Uses the raw voxel data itself
Optimization
• Gradients
– Gradient descent
– Conjugate-gradient
– Levenburg-Marquardt
• No gradients
– Finite-difference gradient + above
– Best-neighbor search
– Nelder-Mead
– Simulated annealing
Transformations
• Rigid (6 DOF)
– 3 rotation
– 3 translation
• Affine (12 DOF)
– 6 from before
– 3 scale
– 3 skew
• Non-rigid (? DOF)
– As many control points as
your favorite supercomputer
can handle
© T. Rohlfing (Stanford)
Similarity measures
• Intra-modality
– normalized cross-correlation
– gradient correlation
– pattern intensity
– sum of squared differences
• Inter-modality
– mutual information (the industry standard)
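Mutual information, named above as the inter-modality standard, can be sketched for two aligned, discretized images; the code below is an illustration under simplifying assumptions (quantized intensity lists rather than real image volumes):

```python
# Mutual information from the joint intensity histogram:
# I(A;B) = sum over (a,b) of p(a,b) * log( p(a,b) / (p(a) p(b)) ).
# It is high when one image's intensities predict the other's, even
# if the gray levels themselves differ (the multi-modal case).
import math
from collections import Counter

def mutual_information(img_a, img_b):
    n = len(img_a)
    joint = Counter(zip(img_a, img_b))
    pa, pb = Counter(img_a), Counter(img_b)
    mi = 0.0
    for (a, b), c in joint.items():
        pab = c / n
        mi += pab * math.log(pab / ((pa[a] / n) * (pb[b] / n)))
    return mi

a = [0, 0, 1, 1, 2, 2]
b = [5, 5, 7, 7, 9, 9]  # a relabeling of a: predicts it perfectly
print(round(mutual_information(a, b), 4))  # 1.0986 (= log 3, the maximum here)
```

In registration, an optimizer adjusts the transform to maximize this quantity between the transformed moving image and the fixed image.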
Example: CT-DSA
Native CT image Post-contrast CT image
© T. Rohlfing (Stanford)
Example: CT-DSA
After affine registration B-spline with 10mm c.p.g.
© T. Rohlfing (Stanford)
Example: Liver motion
Respiration gating
during abdominal
MR imaging
Time
© T. Rohlfing (Stanford)
Irradiate tumor (T) with a series of directed beams
avoiding critical structures (C)
Example: CyberKnife
[Figure: tumor T, critical structures C, and the radiation delivery (RD) device coordinate frame X, Y, Z]
The crux of the problem is to match up the coordinate
frames of the CT and the radiation delivery device
Example: CyberKnife
[Figure: the CT coordinate frame and the RD device coordinate frame]
Using only 2D projection images!
Example: CyberKnife
[Figure: transformation T1 relating the CT frame to the RD frame]
Example: CyberKnife
[Figure: a Digitally Reconstructed Radiograph (DRR) is generated from the CT volume through a virtual source and compared with the actual projection image to recover the transformation T*]
Conclusions
• Medicine is a fertile and active area for
computer vision research
• Application of existing vision tools to new,
challenging domains
• Development of new vision tools to assist in
the practice of medicine
Nov. 25 - 28
Objectives
To evaluate the tissue characteristics of the kidney for implementing an unbiased diagnosis procedure and to classify important kidney disorders
To establish a set of unconstrained features that are independent of kidney area variations
Sample US kidney Images
Fig.1 a. Normal image of male with age 38 years, b. Medical renal diseases image of male with age 45 years and c. Cortical polycystic disease image of female with age 51 years.
(a) (b) (c)
Material and Methods
• Image Data Collection
• Two types of scanning systems, namely an ATL HDI 5000 curvilinear probe with transducer frequency of 5 – 6 MHz and a WiproGE LOGIC 400 curvilinear probe with transducer frequency of 3 – 5 MHz.
• The longitudinal cross section of the kidney is taken by fixing the transducer frequency at 4 MHz.
• In each class 50 images are obtained. In total 150 images are pre-processed before feature extraction.
• The necessary care has been taken to preserve the shape, size and gray-level distribution, as altering them obliterates the sonographic content of information.
Material and Methods
• Image Pre-processing
– Segmentation by higher order spline interpolation after up-sampling of distributed coordinates
– Rotation to zero degree axis
– Retaining the pixels of interest
– Estimation of Content Descriptive Features
– Kidney Characterization
Material and Methods
• Image Pre-processing
Input US kidney image → i-HSIC segmentation → Image rotation to zero degree reference axis → Unbounded pixel elimination
Material and Methods
• Feature Extraction
• First order gray level statistical features
• Second order gray level statistical features
• Algebraic moment invariants features
• Multi-scale differential features
• Power spectral features
• Dominant Gabor wavelet features
Material and Methods
• Feature Extraction
• First order gray level statistical features
– mean (M1), dispersion (M2), variance (M3), average energy (M4), skewness (M5), kurtosis (M6), median (M7) and mode (M8)
• Second order gray level statistical features
– energy (E), entropy (H), correlation (C), inertia (In) and homogeneity (L)
• Algebraic moment invariants features
– eight RST invariant features ф1, ф2, ф3, ф4, ф5, ф6, ф7 and ф5/ф1
Material and Methods
• Feature Extraction
• Multi-scale differential features
– two principal curvature features, namely isophote (N) and flowline (T), are computed. From these values of N and T, a set of MSDFs are then determined, namely the mean (Nmean; Tmean), maximum (Nmax; Tmax) and minimum (Nmin; Tmin)
• Power spectral features
– six power spectral features are estimated at specific cut-off frequencies in the spectrum and by considering global mean total power
• Dominant Gabor wavelet features
– Out of 30 Gabor wavelets, a unique Dominant Gabor Wavelet is determined by estimating the similarity metrics between the original and reconstructed Gabor image. The Gabor features ‘μmn’, ‘σmn’ and ‘AADmn’ are then evaluated using the Dominant Gabor Wavelet
Nov. 25 - 28
265
Decision Support System For Kidney Classification
(flattened block diagram of the hybrid fuzzy-neural system)
• Input feature vector Ij → Fuzzification → fj
• All 36 fuzzy rules → FIS
• If X ≥ n/2: Yes → Initiate optimized MBPN; No → NR
• Remaining blocks in the diagram: MRD, CC
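The gating step in the diagram (hand the case to the optimized MBPN only when at least n/2 of the fuzzy rules fire, otherwise report NR) can be sketched as below; the triangular membership function and the 0.5 firing threshold are illustrative assumptions, since the slide does not define them:

```python
def tri(x, a, b, c):
    """Triangular fuzzy membership on [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_gate(features, rules, n, threshold=0.5):
    """Count rules that fire; route to the MBPN stage only if X >= n/2."""
    fired = sum(1 for rule in rules if rule(features) > threshold)  # this is X
    return "MBPN" if fired >= n / 2 else "NR"
```

X, n, MBPN (the optimized neural classifier) and NR are the labels used in the diagram; their precise definitions are not given on the slide.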

Pattern recognition in medical images

  • 1. Pattern Recognition in Medical Images Dr.S.Sridhar Anna University
  • 2. Introduction •“One picture is worth more than ten thousand words” •Anonymous
  • 3. Contents •This lecture will cover: – Overview of Medical Imaging – Pattern Recognition Tasks – Case Studies in Pattern Recognition
  • 4. What is Medical Image Processing? • MI focuses on two major tasks – Improvement of pictorial information for human interpretation – Processing of image data for storage, transmission and representation for autonomous machine perception •Some argument about where image processing ends and fields such as image analysis and computer vision start
  • 5. Examples: Medicine •Take slice from MRI scan of canine heart, and find boundaries between types of tissue – Image with gray levels representing tissue density – Use a suitable filter to highlight edges Original MRI Image of a Dog Heart Edge Detection Image ImagestakenfromGonzalez&Woods,DigitalImageProcessing(2002)
  • 6. Key Stages in Digital Image Processing Image Acquisition Image Restoration Morphological Processing Segmentation Representation & Description Image Enhancement Object Recognition Problem Domain Colour Image Processing Image Compression
  • 7. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 7 Medical Image Systems • The last few decades of the 20th century has seen the development of: – Computed Tomography (CT) – Magnetic Resonance Imaging (MRI) – Digital Subtraction Angiography – Doppler Ultrasound Imaging – Other techniques based on nuclear emission e.g: • PET: Positron Emission Tomography • SPECT: Single Photon Emission Computed Tomography • Provide a valuable addition to radiologists imaging tools towards ever more reliable detection and diagnosis of diseases. • More recently conventional x-ray imaging is challenged by the emerging flat panel x-ray detectors.
  • 8. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 8 • General image processing whether it is applied to: – Robotics – Computer vision – Medicine – etc. will treat: – imaging geometry – linear transforms – shift invariance – frequency domain – digital vs continuous domains – segmentation – histogram analysis – etc that apply to any image modality and any application
  • 9. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 9 • General image analysis regardless of its application area encompasses: – incorporation of prior knowledge – classification of features – matching of model to sub-images – description of shape – many other problems and approaches of AI... • While these classic approaches to general images and to general applications are important, the special nature of medical images and medical applications requires special treatments.
  • 10. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 10 Special nature of medical images • Derived from – method of acquisition – the subject whose images are being acquired • Ability to provide information about the volume beneath the surface – though surface imaging is used in some applications • Image obtained for medical purposes almost exclusively probe the otherwise invisible anatomy below the skin. • Information may be from: – 2D projection acquired by conventional radiography – 2D slices of B-mode ultrasound – full 3D mapping from CT, MRI, SPECT, PET and 3D ultrasound.
  • 11. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 11 difficulties/specificities • Radiology: perspective projection maps physical points into image space – but, detection and classification of objects is confounded to over- and underlying tissue (not the case in general image processing). • Tomography: 3D images bring both complication and simplifications – 3D topography is more complex than 2D one. – problem associated with perspective and occlusion are gone. • Additional limitation to image quality: – distortion and burring associated with relatively long acquisition time (due to anatomical motion). – reconstruction errors associated with noise, beam hardening etc. • All these and others account for the differences between medical and non medical approaches to processing and analysis.
  • 12. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 12 • Advantage of dealing with medical images: – knowledge of what is and what is not normal human anatomy. – selective enhancement of specific organs or objects via injection of contrast-enhancing material. • All these differences affect the way in which images are processed and analysed. • Validation of medical image processing and analysis techniques is also a major part of medical application – validating results is always important – the scarcity of accurate and reliable independent standards create another challenge for medical imaging field.
  • 13. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 13 Processing and Analysis • Medical image processing – Deals with the development of problem specific approaches to enhancement of raw medical data for the purposes of selective visualisation as well as further analysis. • Medical image analysis – Concentrates on the development of techniques to supplement the mostly qualitative and frequently subjective assessment of medical images by human experts. – Provides a variety of new information that is quantitative, objective and reproducible
  • 14. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 14 Examples of Medical Images
  • 15. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 15 Questions • What does the image show? • What good is it? • How is it made?
  • 16. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 16 X-ray Image
  • 17. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 17 X-ray Image of Hand
  • 18. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 18 What is it? • Two X-ray views of the same hand are formed on an single film by exposing the hand onto half of the film while the other half is blocked by an opaque screen.
  • 19. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 19 What good is it? • A fracture of the middle finger is seen on both views, though it is clearer on the view on the left. This image can be used for diagnosis - to distinguish between a sprain and a fracture, and to choose a course of treatment.
  • 20. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 20 X-ray Imaging: How it works. X-ray shadow cast by an object Strength of shadow depends on composition and thickness.
  • 21. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 21 Summary: X-ray Imaging • Oldest non-invasive imaging of internal structures • Rapid, short exposure time, inexpensive • Unable to distinguish between soft tissues in head, abdomen • Real time X-ray imaging is possible and used during interventional procedures. • Ionizing radiation: risk of cancer.
  • 22. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 22 CT (Computed Tomography) CT Image of plane through liver and stomach Projection image from CT scans
  • 23. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 23 What Is It? • Computer Tomography image of section through upper abdomen of patient prior to abdominal surgery. • Section shows ribs, vertebra, aorta, liver (image left), stomach (image right) partially filled with liquid (bottom).
  • 24. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 24 What Good Is It? • The set of CT images, from the heart down to the coccyx, was used in planning surgery for the alleviation of intestinal blockage. • The surgery was successful (I’m still here).
  • 25. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 25 Computer Tomography: How It Works Only one plane is illuminated. Source-subject motion provides added information.
  • 26. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 26 Fan-Beam Computer Tomography
  • 27. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 27 Summary of X-Ray CT • Images of sectional planes (tomography) are harder to interpret • CT can visualize small density differences, e.g. grey matter, white matter, and CSF. CT can detect and diagnose disease that cannot be seen with X-ray. • More expensive than X-ray, lower resolution. • Ionizing radiation.
  • 28. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 28 Functional Magnetic Resonance Imaging From http://www.fmri.org/ Picture naming task Plane 3 Plane 6
  • 29. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 29 What Is It? • Two of sixteen planes through brain of subject participating in an image-naming experiment. • Images are superposition of anatomical scans (gray) and functional scans (colored). • Plane 3 shows functional activity in the visual cortex (bottom) • Plane 5 shows activity in the speech area ( image right).
  • 30. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 30 What Good Is It? • This set of images is part of research on brain function (good for publication). • Functional imaging is used prior to brain surgery, to identify structures such as the motor areas that should be avoided, and focal areas for epilepsy, that should be resectioned.
  • 31. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 31 MRI Signal Source When a nuclear magnet is tilted away from the external magnetic field it rotates (precesses) at the Larmour frequency. For hydrogen, the Larmour frequency is 42.6 MHz per Tesla. H0 ω0
  • 32. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 32 Detected Signal in MRI Spinning magnetization induces a voltage in external coils, proportional to the size of magnetic moment and to the frequency.
  • 33. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 33 MRI Image Formation • Magnetic field gradients cause signals from different parts of the body to have different frequencies. • Signals collected with multiple gradients are processed by computer to produce an image, typically of a section through the body.
  • 34. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 34 Features of MRI • No ionizing radiation – expected to not have any long-term or short-term harmful effects • Many contrast mechanisms: contrast between tissues is determined by pulse sequences • Can produce sectional as well as projection images. • Slower and more expensive than X-ray
  • 35. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 35 Magnetic Resonance Summary • No ionizing radiation (safe) • Tomography at arbitrary angle • Many imaging modes (water, T1, T2, flow, neural activity) • Slow • Expensive
  • 36. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 36 Ultrasound Imaging Twin pregnancy during week 10
  • 37. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 37 What Is It? • Ultrasound image of a woman’s abdomen • Image shows a section through the uterus. Two embryos in their amniotic sacs can be seen.
  • 38. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 38 What Good Is It? • This image allows a safe means for early identification of a twin pregnancy. • Obstetric ultrasonography can be used to monitor high-risk pregnancies to allow optimal treatment. • Pre-natal scans are part of baby picture albums.
  • 39. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 39 Ultrasound Scanner • A picture is built up from scanned lines. • Echosonography is intrinsically tomographic. • An image is acquired in milliseconds, so that real time imaging is the norm. Transducer travel Object Image
  • 40. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 40 Ultrasound Imaging Overview • Imaging is in real time - used for interventional procedures. • Moving structures and flow (Doppler) can be seen. Used for heart imaging. • Ultrasound has no known harmful effects (at levels used in clinical imaging) • Ultrasound equipment is inexpensive • Many anatomical regions (for example, Head) cannot be visualized with ultrasound.
  • 41. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 41 Single Photon Computed Tomography Images on left show three sections through the heart. A radioactive tracer, Tc99m MIBI (2- methoxy isobutyl isonitride) is injected and goes to healthy heart tissue.
  • 42. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 42 What Is It? • Three sectional (tomographic) images of a living heart. Colored areas are measures of metabolic activity of left ventricle muscle. Areas damaged by an infarct appear dark. This seems to be a normal heart.
  • 43. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 43 What Good Is It? • Used for staging (choosing treatment before or after a heart attack), and monitoring the effectiveness of treatment.
  • 44. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 44 Radionuclide Imaging • Basic Idea • Collimator • Tomography Basic idea: A substance (drug) labeled with a radioactive isotope is ingested. The drug goes to selective sites.
  • 45. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 45 Collimator Only rays that are normal to the camera surface are detected.
  • 46. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 46 SPECT Single Photon Emission Computed Tomography. Shown here is a three-headed tomography system. The cameras rotate around the patient. A three-dimensional volume is imaged. Gamma camera Gammacamera Gamma camera
  • 47. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 47 Features of Radionuclide Imaging • The image is produced from an agent that is designed to monitor a physiological or pathological process – Blood flow – Profusion – Metabolic activity – Tumor – Brain receptor concentration
  • 48. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 48 Fluorescence Microscopy Image of living tissue culture cells. Three agents are used to form this image. They bond to the nucleus (blue), cytoskeleton (green) and membrane (red).
  • 49. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 49 What Is It? • Optical microscope image of tissue culture. • Image is formed with fluorescent light. • Tree agents are used. They bond to – DNA in nucleus, blue – Cytoskeleton, green – Lipid membranes, red
  • 50. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 50 What Good Is It? • This image seems to be a demonstration of fluorescent agents. • Tissue culture is used in pharmaceutical and physiological research, to monitor the effect of drugs at the cellular level. • Fluorescent labeling and imaging allows in-vivo evaluation of the location and mechanism of a drug’s activity.
  • 51. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 51 Optical Imaging • Optical imaging (visible and near infrared) is undergoing very rapid development. • Like radionuclide imaging, agents can be designed to bind to almost any substrate. • Intrinsic contrast, such as oxy- vs. deoxy-hemoglobin differential absorption are also exploited. • There has been a growth in new optical imaging methods.
  • 52. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 52 Thoughts on Imaging • Three entities in imaging – Object – Image – Observer
  • 53. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 53 Image vs. Object • Images (and vision) are two-dimensional – Surface images – Projection images – Sectional images (tomograms) • Image eliminates data – 3D object - 2D image – Moving object - still image
  • 54. MIPR Lecture 1 Copyright Oleh Tretiak, 2004 54 Creative Imaging • Imaging procedures create information – Functional MRI for the first time allows non- invasive study of the brain – Doppler ultrasound for the study of flow – Agents for the study of gene expression, in-vivo biochemistry
  • 55. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 55 CT scan MRI Same patient
  • 56. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 56 MRI PET
  • 57. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 57 MRI angiogram X-ray angiograms
  • 58. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 58 ultrasound Kidney Breast
  • 59. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 59 fMRI UCLA Brain Mapping Division Los Angeles, CA 90095
  • 60. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 60 Virtual sinus endoscopy of chronic sinusitis. The red structure means inflammatory portion. The trip starts from right nasal cavity and goes through right maxillary sinus and ends at right frontal sinus.
  • 61. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 61 This demonstrates planning of a stereotactic procedure using computerized simulation. This shows three alternative approaches for a surgical removal of the tumor. This demonstrates registration of vessels derived from a phase contrast angiogram and anatomy derived from double-echo MR scans. NeuroSurgery This animation is derived from MRI data of a patient with a glioma
  • 62. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 62 Here is an example using Visage on a data source totally different than its original design had anticipated. In this case the data comes from an MR scanner Flow Analysis
  • 63. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 63
  • 67. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 67 • Contrast Stretching To enhance low-contrast images ( ) ( )     <≤+− <≤+− <≤ = Lubvbu buavau auu v b a , , 0 γ β α a L u v va vb b
  • 68. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 68 300x180x8: x-tomography of orbital eye slice 256x228xfloat: MRI spine
  • 69. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 69 – Thresholding: special case of clipping, • and the output becomes binary u v u v Thresholding transformations tba == ˆ
  • 70. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 70 118 128 138 64x64x8: nuclear medicine image, axial slice of heart
  • 71. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 71
  • 72. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 72 • Logarithmic contrast enhancement – to brighten dark images, apply a logarithmic colour-table. – map the pixel values of original: Original Logarithmic colour table
  • 73. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 73 • Exponential contrast enhancement Original Image Exponential Map
  • 74. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 74 Original Laplacian filtered: high-pass Sharpened: original added to laplacian
  • 75. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 75 Original Original with a grey-ramp Rainbow colour table SApseudo colour table
  • 76. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 76 Original image Increase the image contrast Subtract the backround image from the original image Thresholded image
  • 77. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 77 Labelled object in the image
  • 78. Dr V.F. Ruiz SE1CA5 Medical Image Analysis 78 original image Image courtesy of Alan Partin Johns Hopkins University binary gradient mask dilated gradient mask binary image with filled holes cleared border image segmented image outlined original image
  • 79. © Copyright 2006, Natasha Balac 79 Data Mining Tasks • Exploratory Data Analysis • Predictive Modeling: Classification and Regression • Descriptive Modeling – Cluster analysis/segmentation • Discovering Patterns and Rules – Association/Dependency rules – Sequential patterns – Temporal sequences • Deviation detection
  • 80. Data Mining Tasks • Concept/Class description: Characterization and discrimination – Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions • Association (correlation and causality) – Multi-dimensional or single-dimensional association, e.g. age(X, “20-29”) ^ income(X, “60-90K”) → buys(X, “TV”)
  • 81. Data Mining Tasks • Classification and Prediction – Finding models (functions) that describe and distinguish classes or concepts for future prediction – Example: classify countries based on climate, or classify cars based on gas mileage • Presentation: IF-THEN rules, decision tree, classification rule, neural network • Prediction: predict some unknown or missing numerical values
  • 82. Data Mining Tasks • Cluster analysis – Class label is unknown: group data to form new classes • Example: cluster houses to find distribution patterns – Clustering based on the principle: maximizing the intra-class similarity and minimizing the inter-class similarity
  • 83. Data Mining Tasks • Outlier analysis – Outlier: a data object that does not comply with the general behavior of the data – Mostly considered noise or exceptions, but quite useful in fraud detection and rare-events analysis • Trend and evolution analysis – Trend and deviation: regression analysis – Sequential pattern mining, periodicity analysis
  • 84. KDD Process (diagram): Database → Selection → Transformation → Data Preparation → Data Mining (on Training Data) → Evaluation, Verification → Model, Patterns
  • 86. Medicine revolves on Pattern Recognition, Classification, and Prediction Diagnosis: Recognize and classify patterns in multivariate patient attributes Therapy: Select from available treatment methods; based on effectiveness, suitability to patient, etc. Prognosis: Predict future outcomes based on previous experience and present conditions
  • 87. Medical Applications • Screening • Diagnosis • Therapy • Prognosis • Monitoring • Biomedical/Biological Analysis • Epidemiological Studies • Hospital Management • Medical Instruction and Training
  • 88. Medical Screening • Effective low-cost screening using disease models that require easily-obtained attributes: (historical, questionnaires, simple measurements) • Reduces demand for costly specialized tests (Good for patients, medical staff, facilities, …) • Examples: - Prostate cancer using blood tests - Hepatitis, Diabetes, Sleep apnea, etc.
  • 89. Diagnosis and Classification • Assist in decision making with a large number of inputs and in stressful situations • Can perform automated analysis of: - Pathological signals (ECG, EEG, EMG) - Medical images (mammograms, ultrasound, X-ray, CT, and MRI) • Examples: - Heart attacks, Chest pains, Rheumatic disorders - Myocardial ischemia using the ST-T ECG complex - Coronary artery disease using SPECT images
  • 90. Diagnosis and Classification: ECG Interpretation – inputs: R-R interval, S-T elevation, P-R interval, QRS duration, AVF lead, QRS amplitude; outputs: SV tachycardia, Ventricular tachycardia, LV hypertrophy, RV hypertrophy, Myocardial infarction
  • 91. Therapy • Based on modeled historical performance, select best intervention course: e.g. best treatment plans in radiotherapy • Using patient model, predict optimum medication dosage: e.g. for diabetics • Data fusion from various sensing modalities in ICUs to assist overburdened medical staff
  • 92. Prognosis • Accurate prognosis and risk assessment are essential for improved disease management and outcome Examples: – Survival analysis for AIDS patients – Predict pre-term birth risk – Determine cardiac surgical risk – Predict ambulation following spinal cord injury – Breast cancer prognosis
  • 93. Biochemical/Biological Analysis • Automate analytical tasks for: - Analyzing blood and urine - Tracking glucose levels - Determining ion levels in body fluids - Detecting pathological conditions
  • 94. Epidemiological Studies Study of health, disease, morbidity, injuries and mortality in human communities • Discover patterns relating outcomes to exposures • Study independence or correlation between diseases • Analyze public health survey data • Example Applications: - Assess asthma strategies in inner-city children - Predict outbreaks in simulated populations
  • 95. Hospital Management • Optimize allocation of resources and assist in future planning for improved services Examples: - Forecasting patient volume, ambulance run volume, etc. - Predicting length-of-stay for incoming patients
  • 96. Medical Instruction and Training • Disease models for the instruction and assessment of undergraduate medical and nursing students • Intelligent tutoring systems for assisting in teaching the decision making process
  • 97. Benefits: • Efficient screening tools reduce demand on costly health care resources • Data fusion from multiple sensors • Help physicians cope with the information overload • Optimize allocation of hospital resources • Better insight into medical survey data • Computer-based training and evaluation
  • 99. Medical Informatics Applications • Modeling obesity (KFU) • Modeling the educational score in school health surveys (KFU) • Classifying urinary stones by Cluster Analysis of ionic composition data (KSU) • Forecasting patient volume using Univariate Time-Series Analysis (KFU) • Improving classification of multiple dermatology disorders by Problem Decomposition (Cairo University)
  • 100. Modeling Obesity Using Abductive Networks • Waist-to-Hip Ratio (WHR) obesity risk factor modeled in terms of 13 health parameters • 1100 cases (800 for training, 300 for evaluation) • Patients attending 9 primary health care clinics in 1995 in Al-Khobar • Modeled WHR as a categorical variable and as a continuous variable • Analytical relationships derived from the continuous model adequately ‘explain’ the survey data
  • 101. Modeling Obesity: Categorical WHR Model • WHR > 0.84: Abnormal (1) • Automatically selects most relevant 8 inputs • Confusion matrix: of 249 true class-1 cases, 248 predicted 1 and 1 predicted 0; of 51 true class-0 cases, 2 predicted 1 and 49 predicted 0 (predicted totals: 250 class 1, 50 class 0) • Classification Accuracy: 99%
  • 102. Modeling Obesity: Continuous WHR - Simplified Model • Uses only 2 variables: Height and Diastolic Blood Pressure • Still reasonably accurate: – 88% of cases had error within ± 10% • Simple analytical input- output relationship • Adequately explains the survey data
  • 103. Modeling the Educational Score in School Health Surveys • 2720 Albanian primary school children • Educational score modeled as an ordinal categorical variable (1-5) in terms of 8 attributes: region, age, gender, vision acuity, nourishment level, parasite test, family size, parents education • Model built using only 100 cases predicts output for remaining 2620 cases with 100% accuracy • A simplified model selects 3 inputs only: - Vision acuity - Number of children in family - Father’s education
  • 104. Classifying Urinary Stones by Cluster Analysis of Ionic Composition Data • Classified 214 non-infection kidney stones into 3 groups • 9 chemical analysis variables: Concentrations of ions: CA, C, N, H, MG, and radicals: Urate, Oxalate, and Phosphate • Clustering with only the 3 radicals had 94% agreement with an empirical classification scheme developed previously at KSU, with the same 3 variables
  • 105. Forecasting Monthly Patient Volume at a Primary Health Care Clinic, Al-Khobar Using Univariate Time-Series Analysis • Used data for 9 years (1986–1994) to forecast volume for two years ahead (1995–1996) • Error over the forecasted 2 years: Mean = 0.55%, Max = 1.17%
  • 106. Improving classification of multiple dermatology disorders by Problem Decomposition (Cairo University) - Improved classification accuracy from 91% to 99% - About 50% reduction in the number of required input features - Standard UCI dataset: 6 classes of dermatology disorders, 34 input features - Classes split into two categories; classification done sequentially at two levels (Level 1, Level 2)
  • 107. Summary • Data mining is set to play an important role in tackling the data overload in medical informatics • Benefits include improved health care quality, reduced operating costs, and better insight into medical data • Abductive networks offer advantages over neural networks, including faster model development and better explanation capabilities
  • 111. Features • Loosely stated, a feature is a value describing something about your data points (e.g. for pixels: intensity, local gradient, distance from landmark, etc) • Multiple (n) features are put together to form a feature vector, which defines a data point’s location in n-dimensional feature space
  • 112. Feature Space • Feature Space - – The theoretical n-dimensional space occupied by n input raster objects (features). – Each feature represents one dimension, and its values represent positions along one of the orthogonal coordinate axes in feature space. – The set of feature values belonging to a data point define a vector in feature space.
  • 113. Statistical Notation • Class probability distribution: p(x,y) = p(x | y) p(y) – x: feature vector {x1, x2, x3, …, xn} – y: class – p(x | y): probability of x given y – p(x,y): probability of both x and y
  • 115. Example: Binary Classification • Two class-conditional distributions: p(x | y = 0) p(x | y = 1) • Priors: p(y = 0) + p(y = 1) = 1
  • 116. Modeling Class Densities • In the text, they choose to concentrate on methods that use Gaussians to model class densities
  • 118. Generative Approach to Classification 1. Represent and learn the distribution: p(x,y) 2. Use it to define probabilistic discriminant functions, e.g. g0(x) = p(y = 0 | x), g1(x) = p(y = 1 | x)
  • 119. Generative Approach to Classification Typical model: p(x,y) = p(x | y) p(y) p(x | y) = Class-conditional distributions (densities) p(y) = Priors of classes (probability of class y) We Want: p(y | x) = Posteriors of classes
  • 120. Class Modeling • We model the class distributions as multivariate Gaussians x ~ N(μ0, Σ0) for y = 0 x ~ N(μ1, Σ1) for y = 1 • Priors are based on training data, or a distribution can be chosen that is expected to fit the data well (e.g. Bernoulli distribution for a coin flip)
  • 121. Making a class decision • We need to define discriminant functions g_n(x) • We have two basic choices: – Likelihood of data – choose the class (Gaussian) that best explains the input data x: pick the y maximizing p(x | y) – Posterior of class – choose the class with the better posterior probability: pick the y maximizing p(y | x)
  • 122. Calculating Posteriors • Use Bayes’ Rule: P(A | B) = P(B | A) P(A) / P(B) • In this case: p(y | x) = p(x | y) p(y) / p(x)
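A minimal sketch of this generative recipe with 1-D Gaussian class-conditionals; the class means, variances, and priors below are hypothetical, chosen only for illustration:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(x, params, priors):
    """Bayes' rule: p(y | x) = p(x | y) p(y) / sum over y' of p(x | y') p(y')."""
    joint = {y: gaussian_pdf(x, mu, s) * priors[y] for y, (mu, s) in params.items()}
    z = sum(joint.values())            # p(x), the normalizing evidence
    return {y: v / z for y, v in joint.items()}

# Hypothetical class-conditional Gaussians and equal priors
params = {0: (0.0, 1.0), 1: (3.0, 1.0)}
priors = {0: 0.5, 1: 0.5}

post = posterior(1.5, params, priors)  # x exactly halfway between the class means
```

With equal priors and variances, a point midway between the means gets posterior 0.5 for each class; this is exactly where the linear decision boundary of the next slides sits.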
  • 123. Linear Decision Boundary • When covariances are the same
  • 126. Quadratic Decision Boundary • When covariances are different
  • 129. Clustering • Basic Clustering Problem: – Distribute data into k different groups such that data points similar to each other are in the same group – Similarity between points is defined in terms of some distance metric • Clustering is useful for: – Similarity/Dissimilarity analysis • Analyze what data point in the sample are close to each other – Dimensionality Reduction • High dimensional data replaced with a group (cluster) label
  • 130. Clustering • Cluster: a collection of data objects – Similar to one another within the same cluster – Dissimilar to the objects in other clusters • Cluster analysis – Grouping a set of data objects into clusters • Clustering is unsupervised classification: no predefined classes • Typical applications – to get insight into data – as a preprocessing step – we will use it for image segmentation
  • 131. What is Clustering? Find K clusters (or a classification that consists of K clusters) so that the objects of one cluster are similar to each other whereas objects of different clusters are dissimilar. (Bacher 1996)
  • 132. The Goals of Clustering • Determine the intrinsic grouping in a set of unlabeled data. • What constitutes a good clustering? • All clustering algorithms will produce clusters, regardless of whether the data contains them • There is no gold standard; it depends on the goal: – data reduction – “natural clusters” – “useful” clusters – outlier detection
  • 134. Taxonomy of Clustering Approaches
  • 135. Hierarchical Clustering Agglomerative clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. Divisive clustering works the other way around.
  • 136. Single link Agglomerative Clustering In single-link hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance.
  • 137. Complete link Agglomerative Clustering In complete-link hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter.
  • 138. Example – Single Link AC. Distance matrix (pairwise, symmetric, zero diagonal): BA–FI 662, BA–MI 877, BA–NA 255, BA–RM 412, BA–TO 996, FI–MI 295, FI–NA 468, FI–RM 268, FI–TO 400, MI–NA 754, MI–RM 564, MI–TO 138, NA–RM 219, NA–TO 869, RM–TO 669
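The single-link merge rule can be sketched directly on this distance matrix; a naive stdlib-only implementation (not scalable, but it reproduces the merge order):

```python
# Distance matrix from the slide
D = {
    ("BA", "FI"): 662, ("BA", "MI"): 877, ("BA", "NA"): 255,
    ("BA", "RM"): 412, ("BA", "TO"): 996, ("FI", "MI"): 295,
    ("FI", "NA"): 468, ("FI", "RM"): 268, ("FI", "TO"): 400,
    ("MI", "NA"): 754, ("MI", "RM"): 564, ("MI", "TO"): 138,
    ("NA", "RM"): 219, ("NA", "TO"): 869, ("RM", "TO"): 669,
}

def dist(a, b):
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def single_link(points):
    """Repeatedly merge the two clusters whose closest members are nearest."""
    clusters = [frozenset([p]) for p in points]
    merges = []
    while len(clusters) > 1:
        ci, cj = min(
            ((c1, c2) for i, c1 in enumerate(clusters) for c2 in clusters[i + 1:]),
            key=lambda pair: min(dist(a, b) for a in pair[0] for b in pair[1]),
        )
        clusters = [c for c in clusters if c not in (ci, cj)] + [ci | cj]
        merges.append((sorted(ci), sorted(cj)))
    return merges

merges = single_link(["BA", "FI", "MI", "NA", "RM", "TO"])
```

The first merge is MI–TO (distance 138), then NA–RM (219), then BA joins {NA, RM} (255); swapping the `min` of the key for a `max` over member pairs would give the complete-link variant of the next slide.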
  • 139. What is Cluster Analysis? • Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups Inter-cluster distances are maximized Intra-cluster distances are minimized
  • 140. Notion of a Cluster can be Ambiguous: How many clusters? Two clusters? Four clusters? Six clusters?
  • 141. Types of Clusters: Contiguity-Based • Contiguous Cluster (Nearest neighbor or Transitive) – A cluster is a set of points such that a point in a cluster is closer (or more similar) to one or more other points in the cluster than to any point not in the cluster. 8 contiguous clusters
  • 142. Types of Clusters: Density-Based • Density-based – A cluster is a dense region of points, which is separated by low- density regions, from other regions of high density. – Used when the clusters are irregular or intertwined, and when noise and outliers are present. 6 density-based clusters
  • 143. Euclidean Density – Cell-based • Simplest approach is to divide region into a number of rectangular cells of equal volume and define density as # of points the cell contains
  • 144. Euclidean Density – Center-based • Euclidean density is the number of points within a specified radius of the point
  • 145. Data Structures in Clustering • Data matrix – (two modes): an n × p table whose rows are the objects, x_i = (x_i1, …, x_if, …, x_ip) • Dissimilarity matrix – (one mode): an n × n lower-triangular table of pairwise distances d(2,1); d(3,1), d(3,2); …; d(n,1), d(n,2), …, with zeros on the diagonal
  • 146. Interval-valued variables • Standardize data – Calculate the mean: m_f = (1/n)(x_1f + x_2f + … + x_nf) – Calculate the mean absolute deviation: s_f = (1/n)(|x_1f − m_f| + |x_2f − m_f| + … + |x_nf − m_f|) – Calculate the standardized measurement (z-score): z_if = (x_if − m_f) / s_f • Using the mean absolute deviation can be more robust than using the standard deviation
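A short sketch of this standardization for one variable, using the mean absolute deviation s_f as on the slide (the sample values are hypothetical):

```python
def standardize(values):
    """z-scores using the mean absolute deviation (more robust than std. dev.)."""
    n = len(values)
    m = sum(values) / n                           # m_f
    s = sum(abs(x - m) for x in values) / n       # mean absolute deviation s_f
    return [(x - m) / s for x in values]

z = standardize([2.0, 4.0, 6.0, 8.0])             # m = 5, s = 2
```

Because outliers enter s_f linearly rather than squared, a single extreme value shrinks the z-scores of the remaining points less than it would with the standard deviation.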
  • 147. Similarity and Dissimilarity Between Objects • Euclidean distance: d(i, j) = ( |x_i1 − x_j1|² + |x_i2 − x_j2|² + … + |x_ip − x_jp|² )^0.5 – Properties • d(i,j) ≥ 0 • d(i,j) = 0 iff i = j • d(i,j) = d(j,i) • d(i,j) ≤ d(i,k) + d(k,j) • One can also use a weighted distance, the parametric Pearson product-moment correlation, or other dissimilarity measures.
  • 148. The set of 5 observations, measuring 3 variables, can be described by its mean vector and covariance matrix. The three variables, from left to right are length, width, and height of a certain object, for example. Each row vector Xrow is another observation of the three variables (or components) for row=1, …, 5. Covariance Matrix
  • 149. The mean vector consists of the means of each variable. The covariance matrix consists of the variances of the variables along the main diagonal and the covariances between each pair of variables in the other matrix positions: S = (1/(n−1)) Σ_row (X_row − x̄)′(X_row − x̄), with elements s_jk = (1/(n−1)) Σ_row (X_row,j − x̄_j)(X_row,k − x̄_k), where n = 5 for this example. Here 0.025 is the variance of the length variable, 0.0075 is the covariance between the length and the width variables, 0.00175 is the covariance between the length and the height variables, and 0.007 is the variance of the width variable.
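These two quantities can be computed directly from the formula above; the 3 × 2 data matrix below is a hypothetical stand-in for the slide's 5 × 3 example:

```python
def mean_vector(X):
    """Mean of each column (variable) of the data matrix X."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def covariance_matrix(X):
    """S with s_jk = (1/(n-1)) * sum over rows of (X_rj - xbar_j)(X_rk - xbar_k)."""
    n, p = len(X), len(X[0])
    xbar = mean_vector(X)
    return [[sum((X[r][j] - xbar[j]) * (X[r][k] - xbar[k]) for r in range(n)) / (n - 1)
             for k in range(p)] for j in range(p)]

# Hypothetical data: 3 observations of 2 perfectly correlated variables
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
S = covariance_matrix(X)
```

The diagonal entries of S are the variances and the off-diagonals the covariances, exactly the layout described on the slide.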
  • 150. Mahalanobis Distance: mahalanobis(p, q) = (p − q) Σ^(-1) (p − q)^T, where Σ is the covariance matrix of the input data X, with Σ_jk = (1/(n−1)) Σ_i (X_ij − X̄_j)(X_ik − X̄_k). For the red points in the figure, the Euclidean distance is 14.7 while the Mahalanobis distance is 6.
  • 151. Mahalanobis Distance example. Covariance matrix: Σ = [0.3 0.2; 0.2 0.3]. Points: A = (0.5, 0.5), B = (0, 1), C = (1.5, 1.5). Mahal(A,B) = 5, Mahal(A,C) = 4.
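A 2-D sketch of this computation, inverting the slide's covariance matrix by hand, reproduces both distances:

```python
def mahalanobis(p, q, sigma):
    """(p - q) Sigma^{-1} (p - q)^T for 2-D points, with a hand-rolled 2x2 inverse."""
    d = (p[0] - q[0], p[1] - q[1])
    a, b = sigma[0]
    c, e = sigma[1]
    det = a * e - b * c
    inv = ((e / det, -b / det), (-c / det, a / det))
    u = (inv[0][0] * d[0] + inv[0][1] * d[1],
         inv[1][0] * d[0] + inv[1][1] * d[1])
    return u[0] * d[0] + u[1] * d[1]

sigma = ((0.3, 0.2), (0.2, 0.3))      # covariance matrix from the slide
A, B, C = (0.5, 0.5), (0.0, 1.0), (1.5, 1.5)
```

Although C is farther from A in Euclidean terms, it lies along the direction of high covariance, so its Mahalanobis distance (4) is smaller than B's (5).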
  • 152. Cosine Similarity • If x1 and x2 are two document vectors, then cos( x1, x2 ) = (x1 • x2) / (||x1|| ||x2||), where • indicates the vector dot product and ||d|| is the length of vector d. • Example: x1 = 3 2 0 5 0 0 0 2 0 0; x2 = 1 0 0 0 0 0 0 1 0 2. x1 • x2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5. ||x1|| = (3*3 + 2*2 + 0*0 + 5*5 + 0*0 + 0*0 + 0*0 + 2*2 + 0*0 + 0*0)^0.5 = (42)^0.5 = 6.481. ||x2|| = (1*1 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 1*1 + 0*0 + 2*2)^0.5 = (6)^0.5 = 2.449. cos( x1, x2 ) = 5 / (6.481 × 2.449) = 0.3150.
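The worked example above can be checked in a few lines:

```python
import math

def cosine_similarity(x1, x2):
    """cos(x1, x2) = dot(x1, x2) / (||x1|| * ||x2||)."""
    dot = sum(a * b for a, b in zip(x1, x2))
    n1 = math.sqrt(sum(a * a for a in x1))
    n2 = math.sqrt(sum(b * b for b in x2))
    return dot / (n1 * n2)

x1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
x2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
sim = cosine_similarity(x1, x2)   # ≈ 0.3150, as on the slide
```

Note that cosine similarity depends only on direction, not magnitude: scaling either document vector leaves the result unchanged.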
  • 153. Correlation • Correlation measures the linear relationship between objects • To compute correlation, we standardize the data objects p and q, and then take their dot product: p′_k = (p_k − mean(p)) / std(p); q′_k = (q_k − mean(q)) / std(q); correlation(p, q) = p′ • q′
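A sketch of the standardize-then-dot-product recipe (with the dot product scaled by n so the result lands in [−1, 1]); the two example objects are hypothetical:

```python
import math

def correlation(p, q):
    """Pearson correlation: standardize both objects, then take the scaled dot product."""
    n = len(p)

    def zscores(v):
        m = sum(v) / n
        s = math.sqrt(sum((x - m) ** 2 for x in v) / n)   # population std. dev.
        return [(x - m) / s for x in v]

    zp, zq = zscores(p), zscores(q)
    return sum(a * b for a, b in zip(zp, zq)) / n

r = correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # perfectly linear, so r ≈ 1.0
```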
  • 154. Visually Evaluating Correlation: scatter plots showing similarity from −1 to 1.
  • 155. K-means Clustering • Partitional clustering approach • Each cluster is associated with a centroid (center point) • Each point is assigned to the cluster with the closest centroid • Number of clusters, K, must be specified • The basic algorithm is very simple
  • 156. k-means Clustering • An algorithm for partitioning (or clustering) N data points into K disjoint subsets S_j containing N_j data points so as to minimize the sum-of-squares criterion J = Σ_{j=1..K} Σ_{n ∈ S_j} ||x_n − μ_j||², where x_n is a vector representing the nth data point and μ_j is the geometric centroid of the data points in S_j
  • 157. K-means Clustering – Details • Initial centroids are often chosen randomly. – Clusters produced vary from one run to another. • The centroid is (typically) the mean of the points in the cluster. • ‘Closeness’ is measured by Euclidean distance, cosine similarity, correlation, etc. • K-means will converge for common distance functions. • Most of the convergence happens in the first few iterations. – Often the stopping condition is changed to ‘Until relatively few points change clusters’ • Complexity is O( n * K * I * d ) – n = number of points, K = number of clusters, I = number of iterations, d = number of attributes
  • 158. Two different K-means Clusterings of the same points: Original Points; a Sub-optimal Clustering; the Optimal Clustering • Importance of choosing initial centroids
  • 159. Solutions to Initial Centroids Problem • Multiple runs – Helps, but probability is not on your side • Sample and use hierarchical clustering to determine initial centroids • Select more than k initial centroids and then select among these initial centroids – Select most widely separated • Postprocessing • Bisecting K-means – Not as susceptible to initialization issues • Handling Empty Clusters: the basic K-means algorithm can yield empty clusters
  • 160. Pre-processing and Post-processing • Pre-processing – Normalize the data – Eliminate outliers • Post-processing – Eliminate small clusters that may represent outliers – Split ‘loose’ clusters, i.e., clusters with relatively high SSE – Merge clusters that are ‘close’ and that have relatively low SSE
  • 161. Bisecting K-means • Bisecting K-means algorithm – Variant of K-means that can produce a partitional or a hierarchical clustering
  • 163. Limitations of K-means • K-means has problems when clusters are of differing – Sizes – Densities – Non-globular shapes • K-means has problems when the data contains outliers.
  • 164. Limitations of K-means: Differing Sizes Original Points K-means (3 Clusters)
  • 165. Limitations of K-means: Differing Density Original Points K-means (3 Clusters)
  • 166. Limitations of K-means: Non-globular Shapes Original Points K-means (2 Clusters)
  • 167. Overcoming K-means Limitations Original Points K-means Clusters • One solution is to use many clusters: find parts of clusters, then put them together.
  • 168. Overcoming K-means Limitations Original Points K-means Clusters
  • 169. Variations of the K-Means Method • A few variants of the k-means which differ in – Selection of the initial k means – Dissimilarity calculations – Strategies to calculate cluster means • Handling categorical data: k-modes (Huang ’98) – Replacing means of clusters with modes – Using new dissimilarity measures to deal with categorical objects – Using a frequency-based method to update modes of clusters • Handling a mixture of categorical and numerical data: k-prototype method
  • 170. The K-Medoids Clustering Method • Find representative objects, called medoids, in clusters • PAM (Partitioning Around Medoids, 1987) – starts from an initial set of medoids and iteratively replaces one of the medoids by one of the non-medoids if it improves the total distance of the resulting clustering – PAM works effectively for small data sets, but does not scale well for large data sets • CLARA (Kaufmann & Rousseeuw, 1990) – draws multiple samples of the data set, applies PAM on each sample, and gives the best clustering as the output • CLARANS (Ng & Han, 1994): Randomized sampling • Focusing + spatial data structure (Ester et al., 1995)
  • 171. Hierarchical Clustering • Produces a set of nested clusters organized as a hierarchical tree • Can be visualized as a dendrogram – A tree-like diagram that records the sequences of merges or splits
  • 172. Strengths of Hierarchical Clustering • Do not have to assume any particular number of clusters – Any desired number of clusters can be obtained by ‘cutting’ the dendrogram at the proper level • They may correspond to meaningful taxonomies – Example in biological sciences (e.g., animal kingdom, phylogeny reconstruction, …)
  • 173. Hierarchical Clustering • Two main types of hierarchical clustering – Agglomerative: • Start with the points as individual clusters • At each step, merge the closest pair of clusters until only one cluster (or k clusters) left Matlab: Statistics Toolbox: clusterdata, which performs all these steps: pdist, linkage, cluster – Divisive: • Start with one, all-inclusive cluster • At each step, split a cluster until each cluster contains a point (or there are k clusters) • Traditional hierarchical algorithms use a similarity or distance matrix – Merge or split one cluster at a time – Image segmentation mostly uses simultaneous merge/split
  • 174. Agglomerative Clustering Algorithm • More popular hierarchical clustering technique • Basic algorithm is straightforward 1. Compute the proximity matrix 2. Let each data point be a cluster 3. Repeat 4. Merge the two closest clusters 5. Update the proximity matrix 6. Until only a single cluster remains • Key operation is the computation of the proximity of two clusters – Different approaches to defining the distance between clusters distinguish the different algorithms
  • 175. Starting Situation • Start with clusters of individual points (p1 … p5) and a proximity matrix
  • 176. Intermediate Situation • After some merging steps, we have some clusters (C1 … C5) and a reduced proximity matrix
  • 177. Intermediate Situation • We want to merge the two closest clusters (C2 and C5) and update the proximity matrix.
  • 178. After Merging • C2 and C5 become C2 ∪ C5; the question is “How do we update the proximity matrix?”
  • 179. How to Define Inter-Cluster Similarity • MIN • MAX • Group Average • Distance Between Centroids • Other methods driven by an objective function – Ward’s Method uses squared error
  • 180.–183. How to Define Inter-Cluster Similarity (the same four options, MIN, MAX, Group Average, and Distance Between Centroids, illustrated in turn on the proximity matrix)
  • 184. Hierarchical Clustering: Comparison of MIN, MAX, Group Average, and Ward’s Method on the same six points
  • 185. Hierarchical Clustering: Time and Space requirements • O(N²) space since it uses the proximity matrix – N is the number of points • O(N³) time in many cases – There are N steps, and at each step the N²-sized proximity matrix must be updated and searched – Complexity can be reduced to O(N² log N) time for some approaches
  • 186. Hierarchical Clustering: Problems and Limitations • Once a decision is made to combine two clusters, it cannot be undone Therefore, we use merge/split to segment images! • No objective function is directly minimized • Different schemes have problems with one or more of the following: – Sensitivity to noise and outliers – Difficulty handling different sized clusters and convex shapes – Breaking large clusters
  • 187. MST: Divisive Hierarchical Clustering • Build an MST (Minimum Spanning Tree) – Start with a tree that consists of any single point – In successive steps, look for the closest pair of points (p, q) such that one point (p) is in the current tree but the other (q) is not – Add q to the tree and put an edge between p and q
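The tree-growing steps above (essentially Prim's construction) can be sketched as follows; the 1-D point set is hypothetical:

```python
def build_mst(points, dist):
    """Grow the tree from one point, repeatedly attaching the closest outside point."""
    in_tree = [points[0]]
    out = list(points[1:])
    edges = []
    while out:
        p, q = min(
            ((p, q) for p in in_tree for q in out),
            key=lambda pq: dist(pq[0], pq[1]),
        )
        edges.append((p, q, dist(p, q)))
        in_tree.append(q)
        out.remove(q)
    return edges

# Hypothetical 1-D example: points on a line
pts = [0.0, 1.0, 3.0, 7.0]
mst = build_mst(pts, lambda a, b: abs(a - b))
```

For the divisive use on the next slide, cutting the largest MST edge (here the weight-4 edge between 3.0 and 7.0) splits the points into two clusters, and repeating the cut yields the full hierarchy.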
  • 188. MST: Divisive Hierarchical Clustering • Use MST for constructing hierarchy of clusters
  • 189. More on Hierarchical Clustering Methods • Major weaknesses of agglomerative clustering methods – do not scale well: time complexity of at least O(n²), where n is the number of total objects – can never undo what was done previously • Integration of hierarchical with distance-based clustering – BIRCH (1996): uses a CF-tree and incrementally adjusts the quality of sub-clusters – CURE (1998): selects well-scattered points from the cluster and then shrinks them towards the center of the cluster by a specified fraction – CHAMELEON (1999): hierarchical clustering using dynamic modeling
  • 190. Density-Based Clustering Methods • Clustering based on density (local cluster criterion), such as density-connected points • Major features: – Discover clusters of arbitrary shape – Handle noise – One scan – Need density parameters as termination condition • Several interesting studies: – DBSCAN: Ester, et al. (KDD’96) – OPTICS: Ankerst, et al (SIGMOD’99). – DENCLUE: Hinneburg & D. Keim (KDD’98) – CLIQUE: Agrawal, et al. (SIGMOD’98)
  • 191. Graph-Based Clustering • Graph-Based clustering uses the proximity graph – Start with the proximity matrix – Consider each point as a node in a graph – Each edge between two nodes has a weight which is the proximity between the two points – Initially the proximity graph is fully connected – MIN (single-link) and MAX (complete-link) can be viewed as starting with this graph • In the simplest case, clusters are connected components in the graph.
  • 192. Graph-Based Clustering: Sparsification • Clustering may work better on a sparsified graph – Sparsification techniques keep the connections to the most similar (nearest) neighbors of a point while breaking the connections to less similar points. – The nearest neighbors of a point tend to belong to the same class as the point itself. – This reduces the impact of noise and outliers and sharpens the distinction between clusters. • Sparsification facilitates the use of graph partitioning algorithms (or algorithms based on them) – Chameleon and Hypergraph-based Clustering
  • 193. Sparsification in the Clustering Process
  • 194. Cluster Validity • For supervised classification we have a variety of measures to evaluate how good our model is – Accuracy, precision, recall • For cluster analysis, the analogous question is how to evaluate the “goodness” of the resulting clusters? • Then why do we want to evaluate them? – To avoid finding patterns in noise – To compare clustering algorithms – To compare two sets of clusters – To compare two clusters
  • 195. Clusters found in Random Data: the same uniformly random points partitioned by K-means, DBSCAN, and Complete Link; each partitioning suggests structure even though the data are random
  • 196. • Numerical measures that are applied to judge various aspects of cluster validity, are classified into the following three types. – External Index: Used to measure the extent to which cluster labels match externally supplied class labels. • Entropy – Internal Index: Used to measure the goodness of a clustering structure without respect to external information. • Sum of Squared Error (SSE) – Relative Index: Used to compare two different clusterings or clusters. • Often an external or internal index is used for this function, e.g., SSE or entropy • Sometimes these are referred to as criteria instead of indices – However, sometimes criterion is the general strategy and index is the numerical measure that implements the criterion. Measures of Cluster Validity
  • 197. Internal Measures: Cohesion and Separation • Cluster Cohesion: measures how closely related the objects in a cluster are – Example: SSE • Cluster Separation: measures how distinct or well-separated a cluster is from other clusters • Example: Squared Error – Cohesion is measured by the within-cluster sum of squares: WSS = Σ_i Σ_{x ∈ C_i} (x − m_i)² – Separation is measured by the between-cluster sum of squares: BSS = Σ_i |C_i| (m − m_i)², where |C_i| is the size of cluster i, m_i its centroid, and m the overall mean
  • 198. Internal Measures: Cohesion and Separation • Example with points 1, 2, 4, 5 (overall mean m = 3): – K=1 cluster: WSS = (1−3)² + (2−3)² + (4−3)² + (5−3)² = 10; BSS = 4×(3−3)² = 0; Total = 10 – K=2 clusters {1,2} and {4,5} (centroids m1 = 1.5, m2 = 4.5): WSS = (1−1.5)² + (2−1.5)² + (4−4.5)² + (5−4.5)² = 1; BSS = 2×(3−1.5)² + 2×(4.5−3)² = 9; Total = 10
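The WSS and BSS of this example can be computed directly, which also demonstrates that their total is constant for any partition of the same points:

```python
def wss_bss(points, clusters):
    """Cohesion (within-cluster SSE) and separation (between-cluster SS)."""
    m = sum(points) / len(points)           # overall mean
    wss = bss = 0.0
    for c in clusters:
        mi = sum(c) / len(c)                # cluster centroid
        wss += sum((x - mi) ** 2 for x in c)
        bss += len(c) * (m - mi) ** 2
    return wss, bss

pts = [1, 2, 4, 5]
w1, b1 = wss_bss(pts, [[1, 2, 4, 5]])       # K = 1
w2, b2 = wss_bss(pts, [[1, 2], [4, 5]])     # K = 2
```

The invariant WSS + BSS = total sum of squares is why minimizing cohesion (WSS) and maximizing separation (BSS) are the same objective.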
  • 199. • A proximity graph based approach can also be used for cohesion and separation. – Cluster cohesion is the sum of the weight of all links within a cluster. – Cluster separation is the sum of the weights between nodes in the cluster and nodes outside the cluster. Internal Measures: Cohesion and Separation cohesion separation
  • 202. Distance Metrics • Euclidean Distance, in some space (for our purposes, probably a feature space) • Must fulfill three properties: – Non-negativity: d(x, y) ≥ 0, with d(x, y) = 0 iff x = y – Symmetry: d(x, y) = d(y, x) – Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z)
  • 203. Distance Metrics • Common simple metrics: – Euclidean: d(p, q) = ( Σ_i (p_i − q_i)² )^0.5 – Manhattan: d(p, q) = Σ_i |p_i − q_i| • Both work for an arbitrary k-dimensional space
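Both metrics are one-liners over a k-dimensional feature vector:

```python
import math

def euclidean(p, q):
    """Straight-line distance: sqrt of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

d_e = euclidean((0, 0), (3, 4))   # the classic 3-4-5 triangle
d_m = manhattan((0, 0), (3, 4))
```

Manhattan distance is always at least as large as Euclidean for the same pair of points, and is cheaper to compute since it needs no square root.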
  • 204. Clustering Algorithms • k-Nearest Neighbor • k-Means • Parzen Windows
  • 205. k-Nearest Neighbor • In essence, a classifier • Requires input parameter k – In this algorithm, k indicates the number of neighboring points to take into account when classifying a data point • Requires training data
  • 206. k-Nearest Neighbor Algorithm • For each data point xn, choose its class by finding the most prominent class among the k nearest data points in the training set • Use any distance measure (usually a Euclidean distance measure)
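A minimal Python sketch of the voting rule just described (helper names are my own; Euclidean distance assumed):

```python
import math
from collections import Counter

def knn_classify(x, training, k=3):
    """Classify x by majority vote among its k nearest training points.
    `training` is a list of (point, label) pairs."""
    dist = lambda p, q: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    # sort the training set by distance to x and keep the k closest
    nearest = sorted(training, key=lambda pl: dist(x, pl[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Any distance measure can be swapped in for `dist`, as the slide notes.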
  • 207. k-Nearest Neighbor Algorithm [figure: positive (+) and negative (−) training points surrounding a query point q1] • 1-nearest neighbor: q1 takes the concept represented by its nearest example e1 • 5-nearest neighbors: q1 is classified as negative
  • 208. k-Nearest Neighbor • Advantages: – Simple – General (can work for any distance measure you want) • Disadvantages: – Requires well classified training data – Can be sensitive to k value chosen – All attributes are used in classification, even ones that may be irrelevant – Inductive bias: we assume that a data point should be classified the same as points near it
  • 209. k-Means • Suitable only when data points have continuous values • Groups are defined in terms of cluster centers (means) • Requires input parameter k – In this algorithm, k indicates the number of clusters to be created • Guaranteed to converge to at least a local optimum
  • 210. k-Means Algorithm • Algorithm: 1. Randomly initialize k mean values 2. Repeat until the means no longer change: a. Partition the data according to the current means, using a similarity measure b. Move each mean to the center of the data in its current partition
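The loop above can be sketched in a few lines of Python; this 1-D version (function name is illustrative) shows the assign/recompute alternation:

```python
def kmeans_1d(points, means, max_iter=100):
    """Lloyd's iteration in 1-D: assign each point to its nearest mean,
    then move each mean to the center of its partition, until stable."""
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for x in points:
            # assign x to the closest current mean
            i = min(range(len(means)), key=lambda j: abs(x - means[j]))
            clusters[i].append(x)
        # recompute means (keep old mean if a cluster went empty)
        new = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new == means:
            break
        means = new
    return means
```

On the cohesion/separation example points {1, 2, 4, 5}, starting from means 1 and 5, the algorithm converges to means 1.5 and 4.5 in one pass.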
  • 212. k-Means • Advantages: – Simple – General (can work for any distance measure you want) – Requires no training phase • Disadvantages: – Result is very sensitive to initial mean placement – Can perform poorly on overlapping regions – Doesn’t work on features with non-continuous values (can’t compute cluster means) – Inductive bias: we assume that a data point should be classified the same as points near it
  • 213. Parzen Windows • Similar to k-Nearest Neighbor, but instead of using the k closest training data points, it uses all points within a kernel (window), weighting their contribution to the classification based on the kernel • As with our classification algorithms, we will consider a Gaussian kernel as the window
  • 214. Parzen Windows • Assume a region defined by a d-dimensional Gaussian of scale σ • We can define a window density function: p(x, σ) = (1/|S|) Σⱼ₌₁..|S| G(x − S(j), σ²) • Note that we consider all points in the training set S, but if a point is outside of the kernel its weight will be effectively 0, negating its influence
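A 1-D sketch of this window density function, assuming a unit-normalized Gaussian kernel G (the function name and defaults are illustrative):

```python
import math

def parzen_density(x, samples, sigma=1.0):
    """Kernel density estimate at x: average of Gaussian windows of
    scale sigma centered on each training sample."""
    norm = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return sum(norm * math.exp(-((x - s) ** 2) / (2 * sigma ** 2))
               for s in samples) / len(samples)
```

Points far from x contribute almost nothing, which is the "weight will be effectively 0" remark on the slide.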
  • 216. Parzen Windows • Advantages: – More robust than k-nearest neighbor – Excellent accuracy and consistency • Disadvantages: – How to choose the size of the window? – Alone, kernel density estimation techniques provide little insight into data or problems
  • 217. Case study: polyp detection • Step 1: CT scan of patient • Step 2: Segmentation of colon Paik, et al.
  • 218. Case study: polyp detection • Step 3: detection of polyp candidates – Hough transform (looking for spheres) Paik, et al.
  • 219. Case study: polyp detection • Step 4: feature extraction • Step 5: classification – Take your pick of algorithms (SVM, ANN, etc.) Gokturk, et al.
  • 220. Case study: polyp detection • Step 6: Flythrough colon giving information to physician for final diagnosis (not yet realized) Paik, et al.
  • 221. Case study: polyp detection Paik, et al.
  • 223. Two categories of interest • Applications of standard computer vision techniques into the medical domain – Segmentation – Computer-Aided Detection • New techniques from medical image analysis added to the vision toolbox – Multi-modal registration
  • 229. Registration • “The process of establishing a common, geometric reference frame between two data sets.” • Previously used in vision to align satellite images, generate image mosaics, etc. Image 1 Image 2 Registered + =
  • 230. Registration in medicine • The explosion of data, both 2D and 3D, from many different imaging modalities has made registration a very important and challenging problem in medicine © L. Joskowicz (HUJI)
  • 233. Multi-modal registration • Preoperative data: X-rays, US, NMR, CT, MRI, Fluoro, CAD • Intraoperative data: tracking, US, open MR, special sensors, video • Registration combines the two into combined data © L. Joskowicz (HUJI)
  • 236. Feature selection • Points-based – 3D points calculated using an optical tracker • Surfaces – Extracted from images using segmentation algorithms • Intensities – Uses the raw voxel data itself
  • 239. Optimization • Gradients – Gradient descent – Conjugate-gradient – Levenberg-Marquardt • No gradients – Finite-difference gradient + above – Best-neighbor search – Nelder-Mead – Simulated annealing
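As one example from the gradient family, plain gradient descent is only a few lines (the learning rate and step count below are illustrative choices, not tuned values):

```python
def gradient_descent(grad, x0, lr=0.1, steps=200):
    """Minimize a function by repeatedly stepping along the negative
    gradient. `grad` maps a point (list of floats) to its gradient."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x
```

In registration, the point being optimized is the transform's parameter vector and the objective is the (negated) similarity measure.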
  • 242. Transformations • Rigid (6 DOF) – 3 rotation – 3 translation • Affine (12 DOF) – 6 from before – 3 scale – 3 skew • Non-rigid (? DOF) – As many control points as your favorite supercomputer can handle © T. Rohlfing (Stanford)
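A sketch of how the 6 rigid DOF assemble into a 4×4 homogeneous matrix (applying rotations about x, then y, then z is an assumption here; conventions vary between systems):

```python
import math

def rigid_transform(rx, ry, rz, tx, ty, tz):
    """4x4 homogeneous matrix for a 6-DOF rigid transform: rotations
    (radians) about x, y, z applied in that order, then translation."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    Rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    Ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    Rz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]
    R = matmul(Rz, matmul(Ry, Rx))
    # append the translation column and the homogeneous bottom row
    return [R[0] + [tx], R[1] + [ty], R[2] + [tz], [0, 0, 0, 1]]
```

An affine transform would extend this with per-axis scale and skew factors (12 parameters), and non-rigid transforms replace the single matrix with a grid of control points.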
  • 245. Similarity measures • Intra-modality – normalized cross-correlation – gradient correlation – pattern intensity – sum of squared differences • Inter-modality – mutual information (the industry standard)
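Two of the intra-modality measures can be sketched directly on flattened intensity arrays (mutual information additionally requires estimating a joint histogram, omitted here):

```python
import math

def ssd(a, b):
    """Sum of squared differences between two intensity arrays
    (intra-modality; lower is better)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ncc(a, b):
    """Normalized cross-correlation: +1 indicates a perfect
    linear relationship between the two intensity arrays."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) *
                    sum((y - mb) ** 2 for y in b))
    return num / den
```

NCC's invariance to linear intensity changes is why it tolerates gain/offset differences that break SSD, while inter-modality pairs (e.g. CT vs. MRI) need mutual information.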
  • 246. Example: CT-DSA Native CT image Post-contrast CT image © T. Rohlfing (Stanford)
  • 247. Example: CT-DSA After affine registration B-spline with 10mm c.p.g. © T. Rohlfing (Stanford)
  • 248. Example: CT-DSA After affine registration B-spline with 10mm c.p.g. © T. Rohlfing (Stanford)
  • 249. Example: Liver motion Respiration gating during abdominal MR imaging Time © T. Rohlfing (Stanford)
  • 250. Example: liver motion © T. Rohlfing (Stanford)
  • 251. Irradiate tumor (T) with a series of directed beams avoiding critical structures (C) Example: CyberKnife T C
  • 252. Example: CyberKnife • The crux of the problem is to match up the coordinate frames of the CT and the radiation delivery device (RD) [figure: RD and CT coordinate frames with axes X, Y, Z]
  • 253. Example: CyberKnife • The RD and CT coordinate frames must be matched using only 2D projection images! [figure: RD and CT coordinate frames with axes X, Y, Z]
  • 256. Conclusions • Medicine is a fertile and active area for computer vision research • Application of existing vision tools to new, challenging domains • Development of new vision tools to assist in the practice of medicine
  • 257. Nov. 25 - 28 Objectives • To evaluate the tissue characteristics of the kidney for implementing an unbiased diagnosis procedure and to classify important kidney disorders • To establish a set of unconstrained features that are independent of kidney area variations
  • 258. Sample US kidney Images • Fig. 1: (a) normal image of a male, age 38; (b) medical renal disease image of a male, age 45; (c) cortical polycystic disease image of a female, age 51.
  • 259. Material and Methods • Image Data Collection • Two types of scanning systems were used: an ATL HDI 5000 curvilinear probe with a transducer frequency of 5-6 MHz and a WiproGE LOGIC 400 curvilinear probe with a transducer frequency of 3-5 MHz. • The longitudinal cross-section of the kidney is imaged with the transducer frequency fixed at 4 MHz. • In each class 50 images are obtained; in total 150 images are pre-processed before feature extraction. • Care has been taken to preserve the shape, size and gray-level distribution, since altering them obliterates the sonographic content of the information.
  • 260. Material and Methods • Image Pre-processing pipeline: segmentation by higher-order spline interpolation after up-sampling of the distributed coordinates → rotation to the zero-degree axis → retaining the pixels of interest → estimation of content-descriptive features → kidney characterization
  • 261. Material and Methods • Image Pre-processing steps: input US kidney image → ii-HSIC segmentation → image rotation to the zero-degree reference axis → unbounded pixel elimination
  • 262. Material and Methods • Feature Extraction – First-order gray-level statistical features – Second-order gray-level statistical features – Algebraic moment invariant features – Multi-scale differential features – Power spectral features – Dominant Gabor wavelet features
  • 263. Material and Methods • Feature Extraction • First-order gray-level statistical features – mean (M1), dispersion (M2), variance (M3), average energy (M4), skewness (M5), kurtosis (M6), median (M7) and mode (M8) • Second-order gray-level statistical features – energy (E), entropy (H), correlation (C), inertia (In) and homogeneity (L) • Algebraic moment invariant features – eight RST-invariant features φ1, φ2, φ3, φ4, φ5, φ6, φ7 and φ5/φ1
  • 264. Material and Methods • Feature Extraction • Multi-scale differential features – two principal curvature features, namely isophote (N) and flowline (T), are computed. From the values of N and T a set of MSDFs is then determined, namely the mean (Nmean; Tmean), maximum (Nmax; Tmax) and minimum (Nmin; Tmin) • Power spectral features – six power spectral features are estimated at specific cut-off frequencies in the spectrum and by considering the global mean total power • Dominant Gabor wavelet features – out of 30 Gabor wavelets, a unique dominant Gabor wavelet is determined by estimating the similarity metrics between the original and reconstructed Gabor images. The Gabor features μmn, σmn and AADmn are then evaluated using the dominant Gabor wavelet
  • 265. Decision Support System for Kidney Classification • Hybrid fuzzy-neural system: the input feature vector Ij is fuzzified (fj, all 36 features) and passed to the fuzzy rules / FIS; if X ≥ n/2 the case is labelled NR, otherwise the optimized MBPN is initiated to decide between MRD and CC

Editor's notes

  1. Medical imaging has experienced, during the last few decades, the development and commercialisation of a plethora of new imaging technologies: computed tomography, MR imaging, digital subtraction angiography, Doppler ultrasound imaging and various imaging techniques based on nuclear emission (PET, SPECT, ...). They have all been valuable additions to the radiologists' arsenal of imaging tools towards ever more reliable detection and diagnosis of disease. More recently, conventional x-ray imaging technology itself is being challenged by the emerging possibilities offered by flat panel x-ray detectors. This course gives some ideas and methods of image processing and analysis that apply to the field of medical imaging.
  2. The special nature of medical images derives as much from their method of acquisition as it does from the subjects whose images are being acquired. While surface imaging is used in some applications (e.g. examination of properties of the skin), medical imaging has been distinguished primarily by its ability to provide information about the volumes beneath the surface (since the discovery of x-rays some 100 years ago). Images are obtained for medical purposes almost exclusively to probe the otherwise invisible anatomy below the skin. This information may be in the form of: 2-dimensional projections acquired by traditional radiography, 2D slices of B-mode ultrasound, or full 3D mappings such as those provided by CT, MRI, SPECT, PET and 3D ultrasound.
  3. In the case of radiology, perspective projection maps physical points into image space in the same way as photography, but the detection and classification of objects is confounded by the presence of overlying or underlying tissue, a problem rarely considered in general image analysis. In the case of tomography, 3D images bring both complications and simplifications to processing and analysis relative to 2D ones: the topology of 3D images is more complex than that of 2D ones, while the problems associated with perspective projection and occlusion are gone. In addition to these geometrical differences, medical images typically suffer more from the problems of discretisation, where larger pixels (voxels in 3D) and lower resolution combine to reduce fidelity. Additional limitations to image quality arise from the distortions and blurring associated with relatively long acquisition times in the face of inevitable anatomical motion (primarily cardiac and pulmonary) and from reconstruction errors associated with noise, beam hardening, etc. These and other differences between medical and non-medical techniques of image acquisition account for many of the differences between medical and non-medical approaches to processing and analysis.
  4. The fact that medical image processing deals mostly with the living body brings other major differences in comparison to computer or robot vision. The objects of interest are soft and deformable, with 3D shapes whose surfaces are rarely rectangular, cylindrical or spherical and whose features rarely include the planes or straight lines that are so frequent in technical vision applications. There are, however, major advantages in dealing with medical images that contribute in a substantial way to the analysis design. The available knowledge of what is and what is not normal human anatomy is one of them. Recent advances in selective enhancement of specific organs or other objects of interest via the injection of contrast-enhancing material represent another. All these differences affect the way in which images are effectively processed and analysed. Validation of developed medical image processing and analysis techniques is a major part of any medical application. While validating the results of any methodology is always important, the scarcity of accurate and reliable independent standards creates yet another challenge for the medical imaging field.
  5. Medical image processing deals with the development of problem specific approaches to enhancement of raw medical data for the purposes of selective visualisation as well as further analysis. Medical image analysis then concentrates on the development of techniques to supplement the mostly qualitative and frequently subjective assessment of medical images by human experts with a variety of new information that is quantitative, objective and reproducible
  6. This film radiogram shows two views of the same hand. Observe that the third phalanx of the middle finger is broken. The fracture is more apparent in the view on the left, because of the angulation. It can also be seen in the view on the right because of the overlap between the two parts of the phalanx causes a decrease in film density.
  7. A human hand is shown in two projections on one X-ray film.
  8. Functional magnetic resonance imaging displays neural activity in the brain. Regions in which the neurons are active show a signal. This is the only non-invasive method of monitoring the activity of the brain.
  9. Image of pregnant uterus showing two embryos.
  10. Image on left can be interpreted by almost any individual. Image on right requires a knowledge of the subject (pregnant woman) and location (uterus) for interpretation.
  11. Virtual sinus endoscopy of chronic sinusitis. The red structure means inflammatory portion. The trip starts from right nasal cavity and goes through right maxillary sinus and ends at right frontal sinus.
  12. This animation is derived from MRI data of a patient with a glioma 1. This demonstrates planning of a stereotactic procedure using computerized simulation 2. This shows three alternative approaches for a surgical removal of the tumour. 3. This demonstrates registration of vessels derived from a phase contrast angiogram and anatomy derived from double-echo MR scans.
  16. Contrast Stretching. Low-contrast images occur often due to poor or non-uniform lighting conditions or due to non-linearity or small dynamic range of the sensor. The figure shows a typical contrast stretching transformation, which can be expressed as: ...... The parameters a and b can be obtained by examining the histogram of the image. For example, the grey-scale intervals where pixels occur most frequently would be stretched most to improve the overall visibility of a scene. The slope of the transformation is chosen greater than unity in the region of stretch. For a dark-region stretch, alpha > 1 and a of order L/3. For a mid-region stretch, beta > 1 and b of order (2/3)L. For a bright-region stretch, gamma > 1.
  17. A practical application of this tool is given below. The original image is a CT image with pixel values ranging from 1024 to 1862. The histogram plotted ranges from 1000 to 1862. Applying the Window and Level technique with parameters Window=100 and Level=1100 yields the first result. Depicted below are the original image and the result after applying the Window and Level technique with parameters Window=38 and Level=74.
  18. Image segmentation is the process of dividing an image into regions. This process is problem-oriented. Examples of segmentation are illustrated using two types of images of the heart: cineangiocardiographic and nuclear medicine images. In the first case, the challenge is to separate the blood (light) area from the rest. In the second case, the problem is to separate the live tissue (light) area from the rest. The simplest and most widely used segmentation method is thresholding. It consists of setting background values for pixels below a threshold value T and a different set of values for the foreground. If the input image is f(x,y) and the thresholded image is g(x,y), the equation of the thresholding operator is given by: Thresholding is a special case of clipping where a=b=t and the output becomes binary. For example, a seemingly binary image, such as a printed page, does not give binary output when scanned because of sensor noise and background illumination variations. Thresholding is used to make such an image binary.
  19. Here is an example of angiogram thresholding. In the experiment below, variation of the threshold value causes a large variation in the area of the foreground pixels. This is a difficult problem to solve. Shown below are the original image and the results after applying different threshold values (118, 128, 138) to it. The areas of each thresholded image: level 118, 19,670 pixels; level 128, 16,969 pixels; level 138, 14,462 pixels.
  20. Here is the SPECT heart image. The original image has first been expanded or "zoomed" by a factor of 4, and the thresholded images at levels 118, 128 and 138 are shown below alongside the original. The areas of each thresholded image: level 118, 731 pixels; level 128, 659 pixels; level 138, 588 pixels.
  21. A common contrast enhancement procedure to brighten dark images is the application of a logarithmic colour table. Shown below are the original image for our experiment and the logarithmic colour table.
  22. When images are too bright, a contrast enhancement table like the exponential function can be used to darken the image. Shown below are the original image for our experiment and the exponential colour table. We want to map the pixel values of the original image using the exponential colour table. Performing this operation yields the darkened image.
  23. Below is an original grey-scale or monochrome image, and next to it is the same image with a grey-level ramp inserted in it. This is done so that the colour table can be visualised directly in the image display. The technique of generating the image with a built-in grey ramp is very useful for understanding how the colours are mapped in the display. a) Original image; b) With a grey ramp. Shown below are two images using two different pseudo-colour tables. The image on the left uses the "rainbow" colour table and the one on the right uses the "SApseudo" colour table. a) Rainbow colour table; b) SApseudo colour table
  24. What is classification??
  25. Classification is simply the problem of separating different classes of data in some feature space This is a linear decision boundary…can be other types (and often are)
  26. Quadratic decision boundary These depict decision boundaries in two dimensions…feature space is n-dimensional
  27. - Explain feature space (features) as it pertains to image analysis
  28. - p(x|y) = probability of x given y
  29. - Note that these are identical gaussians (i.e. equal covariance)
  30. - Discriminant function is able to determine class given data point
  31. You get p(x,y), denoting the probability of the data and the class Want p(y|x), posterior
  32. - N(mu, sigma) represents a normal (gaussian) distribution
  33. - Ok, that’s it for Linear classifiers for now…on to more interesting stuff: Clustering
  34. - In many respects clustering is a similar problem to classification
  35. Clustering can be considered the most important unsupervised learning problem; it deals with finding a structure in a collection of unlabeled data. A loose definition of clustering could be “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects which are “similar” between them and are “dissimilar” to the objects belonging to other clusters.
  36. the goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. But how to decide what constitutes a good clustering? It can be shown that there is no absolute “best” criterion which would be independent of the final aim of the clustering. Consequently, it is the user which must supply this criterion, in such a way that the result of the clustering will suit their needs. For instance, we could be interested in finding representatives for homogeneous groups ( data reduction ), in finding “natural clusters” and describe their unknown properties ( “natural” data types ), in finding useful and suitable groupings ( “useful” data classes ) or in finding unusual data objects ( outlier detection ).
  37. Average linkage is a compromise between the sensitivity of complete-link clustering to outliers and the tendency of single-link clustering to form long chains that do not correspond to the intuitive notion of clusters as compact, spherical objects.
  38. - Note: k=1 turns out to be a voronoi diagram
  39. Explain all of this in better terms… Data is assigned to whichever mean it is closest to Then we move means to represent center of its current set of data points
  40. Left – small window…high accuracy for given test sample, but VERY specific…probably no good for new data Right – much more general…for large set, probably more accurate
  41. Radiotherapy has emerged in recent years as an effective way to treat tumors previously thought to be inoperable due to their proximity to critical structures. So instead of taking the risks associated with conventional surgery, what physicians do is irradiate the tumor with a series of directed beams of radiation. The idea here is that the radiation level along the path of a single beam is not enough to damage living tissue; however, the focus of all the beams gets a lethal dose of radiation. (This treats abnormal arterial growths as well.)
  42. So what happens is that a team of physicians devises a treatment plan using a preoperative CT. The treatment plan consists of a series of beam directions and dosages such that the tumor gets a lethal dose of radiation and the critical structures get as little as possible. This is all done in the reference frame of the CT. Now, on the day of the operation, the patient is lying underneath a linear accelerator in a different reference frame. The crux of the 2D-3D registration problem is to match up these two reference frames so that when we shoot the linear accelerator, we're shooting it in the right place according to the treatment plan. And we need to do this matching using only 2-dimensional X-ray images. So how does this work?
  43. Then we create a virtual model of the Operating Room which has been calibrated so that the absolute positions of the sources and detectors match exactly. So they share the same coordinate system.
  44. We can then compare the reference image and the DRR to see how well they match. We then adjust the pose T until we get a DRR optimally similar to the reference which suggests that the CT is aligned correctly in the reference frame of the treatment room. ** The primary benefit of this method is that it is entirely non-invasive. However this benefit comes at the price of much poorer robustness with respect to initial misregistration. In other words, for intensity-based methods alone, if we don’t have a good initial first guess at the correct registration, often we will not converge to the correct answer.