MACHINE LEARNING
UNIT-1
Big Data
 Widespread use of personal computers and wireless communication leads to “big data”
 We are both producers and consumers of data
 Data is not random; it has structure, e.g., customer behavior
 We need “big theory” to extract that structure from data for
(a) Understanding the process
(b) Making predictions for the future
Why “Learn”?
 Machine learning is programming computers to optimize a performance criterion using example data or past experience.
 There is no need to “learn” to calculate payroll
 Learning is used when:
Human expertise does not exist (navigating on Mars)
Humans are unable to explain their expertise (speech recognition)
Solution changes in time (routing on a computer network)
Solution needs to be adapted to particular cases (user biometrics)
What We Talk About When We Talk About “Learning”
 Learning general models from data of particular examples
 Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
 Example in retail: Customer transactions to consumer behavior:
People who bought “Blink” also bought “Outliers” (www.amazon.com)
 Build a model that is a good and useful approximation to the data.
Data Mining
 Retail: Market basket analysis, customer relationship management (CRM)
 Finance: Credit scoring, fraud detection
 Manufacturing: Control, robotics, troubleshooting
 Medicine: Medical diagnosis
 Telecommunications: Spam filters, intrusion detection
 Bioinformatics: Motifs, alignment
 Web mining: Search engines
 ...
What is Machine Learning?
 Optimize a performance criterion using example data or past experience.
 Role of statistics: Inference from a sample
 Role of computer science: Efficient algorithms to
Solve the optimization problem
Represent and evaluate the model for inference
Applications
 Association
 Supervised Learning
Classification
Regression
 Unsupervised Learning
 Reinforcement Learning
Learning Associations
 Basket analysis:
P(Y | X): the probability that somebody who buys X also buys Y, where X and Y are products/services.
Example: P(chips | beer) = 0.7
Classification
 Example: Credit scoring
 Differentiating between low-risk and high-risk customers from their income and savings
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
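As a minimal sketch, this discriminant is just a thresholded rule; the threshold values and the applicant figures below are made-up for illustration.

def credit_risk(income, savings, theta1=30000.0, theta2=10000.0):
    # IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk
    return "low-risk" if income > theta1 and savings > theta2 else "high-risk"

print(credit_risk(45000.0, 15000.0))  # low-risk
print(credit_risk(45000.0, 2000.0))   # high-risk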
Classification: Applications
 Aka pattern recognition
 Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hairstyle
 Character recognition: Different handwriting styles
 Speech recognition: Temporal dependency
 Medical diagnosis: From symptoms to illnesses
 Biometrics: Recognition/authentication using physical and/or behavioral characteristics: face, iris, signature, etc.
 Outlier/novelty detection
Face Recognition
Training examples of a person
Test images
ORL dataset, AT&T Laboratories, Cambridge UK
Regression
 Example: Price of a used car
 x: car attributes, y: price
y = g(x | θ)
g(·): model, θ: parameters
Linear model: y = w x + w0
Regression Applications
 Navigating a car: Angle of the steering
 Kinematics of a robot arm: given the tip position (x, y), find the joint angles
α1 = g1(x, y)
α2 = g2(x, y)
 Response surface design
Supervised Learning: Uses
 Prediction of future cases: Use the rule to predict the output for future inputs
 Knowledge extraction: The rule is easy to understand
 Compression: The rule is simpler than the data it explains
 Outlier detection: Exceptions that are not covered by the rule, e.g., fraud
Unsupervised Learning
 Learning “what normally happens”
 No output
 Clustering: Grouping similar instances
 Example applications
Customer segmentation in CRM
Image compression: Color quantization
Bioinformatics: Learning motifs
Reinforcement Learning
 Learning a policy: A sequence of outputs
 No supervised output, but delayed reward
 Credit assignment problem
 Game playing
 Robot in a maze
 Multiple agents, partial observability, ...
SUPERVISED LEARNING
Learning a Class from Examples
 Class C of a “family car”
Prediction: Is car x a family car?
Knowledge extraction: What do people expect from a family car?
 Output: Positive (+) and negative (–) examples
 Input representation: x1: price, x2: engine power
Training set X
X = {x^t, r^t}, t = 1, ..., N
r = 1 if x is positive, 0 if x is negative
x = [x1, x2]^T
Class C
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)
Hypothesis class H
h(x) = 1 if h says x is positive, 0 if h says x is negative
Error of h on X:
E(h | X) = (1/N) ∑t=1..N 1(h(x^t) ≠ r^t)
S, G, and the Version Space
Most specific hypothesis, S
Most general hypothesis, G
Any h ∈ H between S and G is consistent with the training set; together these make up the version space (Mitchell, 1997)
Margin
 Choose the h with the largest margin
VC Dimension
 N points can be labeled in 2^N ways as +/–
 H shatters N points if, for every one of these labelings, there exists an h ∈ H consistent with it: VC(H) = N
An axis-aligned rectangle shatters 4 points only!
Probably Approximately Correct (PAC) Learning
 How many training examples N should we have, such that with probability at least 1 − δ, h has error at most ε? (Blumer et al., 1989)
 Each strip is at most ε/4
 Pr that we miss a strip: ≤ 1 − ε/4
 Pr that N instances miss a strip: ≤ (1 − ε/4)^N
 Pr that N instances miss 4 strips: ≤ 4(1 − ε/4)^N
 Require 4(1 − ε/4)^N ≤ δ, and use (1 − x) ≤ exp(−x)
 4 exp(−εN/4) ≤ δ, so N ≥ (4/ε) log(4/δ)
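The last line gives a sample-size bound we can evaluate directly; the sketch below just plugs in arbitrary ε and δ values.

import math

def pac_sample_bound(eps, delta):
    # Rectangle-learning bound from the derivation above: N >= (4/eps) * log(4/delta)
    return math.ceil((4.0 / eps) * math.log(4.0 / delta))

print(pac_sample_bound(eps=0.1, delta=0.05))  # about 176 examples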
Noise and Model Complexity
Use the simpler one because it is
 Simpler to use (lower computational complexity)
 Easier to train (lower space complexity)
 Easier to explain (more interpretable)
 Generalizes better (lower variance - Occam’s razor)
Multiple Classes, Ci, i = 1, ..., K
X = {x^t, r^t}, t = 1, ..., N
r_i^t = 1 if x^t ∈ Ci, 0 if x^t ∈ Cj, j ≠ i
Train K hypotheses hi(x), i = 1, ..., K:
hi(x^t) = 1 if x^t ∈ Ci, 0 if x^t ∈ Cj, j ≠ i
Regression
X = {x^t, r^t}, t = 1, ..., N, with r^t ∈ ℝ and r^t = f(x^t) + noise
Linear model: g(x) = w1 x + w0
Quadratic model: g(x) = w2 x² + w1 x + w0
Empirical error:
E(g | X) = (1/N) ∑t=1..N [r^t − g(x^t)]²
E(w1, w0 | X) = (1/N) ∑t=1..N [r^t − (w1 x^t + w0)]²
Model Selection & Generalization
 Learning is an ill-posed problem; data alone is not sufficient to find a unique solution
 The need for inductive bias: assumptions about H
 Generalization: How well a model performs on new data
 Overfitting: H more complex than C or f
 Underfitting: H less complex than C or f
Triple Trade-Off
 There is a trade-off between three factors (Dietterich, 2003):
1. Complexity of H, c(H)
2. Training set size, N
3. Generalization error, E, on new data
 As N increases, E decreases
 As c(H) increases, E first decreases and then increases
Cross-Validation
 To estimate generalization error, we need data unseen during training. We split the data as:
Training set (50%)
Validation set (25%)
Test (publication) set (25%)
 Resampling when there is little data
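A minimal sketch of the 50/25/25 split described above, assuming the inputs and outputs are NumPy arrays X and r (the names are illustrative):

import numpy as np

def split_data(X, r, seed=0):
    # Shuffle, then take 50% for training, 25% for validation, 25% for testing
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr, n_va = len(X) // 2, len(X) // 4
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (X[tr], r[tr]), (X[va], r[va]), (X[te], r[te])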
Dimensions of a Supervised Learner
1. Model: g(x | θ)
2. Loss function: E(θ | X) = ∑t L(r^t, g(x^t | θ))
3. Optimization procedure: θ* = argmin_θ E(θ | X)
BAYESIAN DECISION THEORY
Probability and Inference
 Result of tossing a coin ∈ {Heads, Tails}
 Random variable X ∈ {1, 0}
Bernoulli: P{X = x} = p_o^x (1 − p_o)^(1−x), with P{X = 1} = p_o
 Sample: X = {x^t}, t = 1, ..., N
Estimation: p_o = #{Heads}/#{Tosses} = ∑t x^t / N
 Prediction of next toss: Heads if p_o > ½, Tails otherwise
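A tiny sketch of this estimate and prediction; the coin-toss sample below is invented for illustration.

sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]          # 1 = Heads, 0 = Tails
p_o = sum(sample) / len(sample)                   # p_o = #{Heads} / #{Tosses}
prediction = "Heads" if p_o > 0.5 else "Tails"
print(p_o, prediction)                            # 0.7 Heads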
Classification
 Credit scoring: Inputs are income and savings; output is low-risk vs. high-risk
 Input: x = [x1, x2]^T, Output: C ∈ {0, 1}
 Prediction:
choose C = 1 if P(C = 1 | x1, x2) > 0.5, and C = 0 otherwise
or, equivalently,
choose C = 1 if P(C = 1 | x1, x2) > P(C = 0 | x1, x2), and C = 0 otherwise
Bayes’ Rule
P(C | x) = P(C) p(x | C) / p(x)
posterior = prior × likelihood / evidence
P(C = 0) + P(C = 1) = 1
p(x) = p(x | C = 1) P(C = 1) + p(x | C = 0) P(C = 0)
P(C = 0 | x) + P(C = 1 | x) = 1
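A minimal sketch of this two-class rule; the prior and likelihood numbers passed in are arbitrary examples.

def posterior_c1(prior_c1, lik_c1, lik_c0):
    # P(C=1|x) = p(x|C=1)P(C=1) / [p(x|C=1)P(C=1) + p(x|C=0)P(C=0)]
    evidence = lik_c1 * prior_c1 + lik_c0 * (1.0 - prior_c1)
    return lik_c1 * prior_c1 / evidence

print(posterior_c1(prior_c1=0.3, lik_c1=0.8, lik_c0=0.2))  # about 0.63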
Bayes’ Rule: K > 2 Classes
P(Ci | x) = p(x | Ci) P(Ci) / p(x) = p(x | Ci) P(Ci) / ∑k=1..K p(x | Ck) P(Ck)
P(Ci) ≥ 0 and ∑i=1..K P(Ci) = 1
choose Ci if P(Ci | x) = max_k P(Ck | x)
Losses and Risks
 Actions: αi
 Loss of αi when the state is Ck: λik
 Expected risk (Duda and Hart, 1973):
R(αi | x) = ∑k=1..K λik P(Ck | x)
choose αi if R(αi | x) = min_k R(αk | x)
Losses and Risks: 0/1 Loss
λik = 0 if i = k, 1 if i ≠ k
R(αi | x) = ∑k=1..K λik P(Ck | x) = ∑k≠i P(Ck | x) = 1 − P(Ci | x)
For minimum risk, choose the most probable class
Losses and Risks: Reject
λik = 0 if i = k, λ if i = K + 1 (reject), 1 otherwise, with 0 < λ < 1
R(αK+1 | x) = ∑k=1..K λ P(Ck | x) = λ
R(αi | x) = ∑k≠i P(Ck | x) = 1 − P(Ci | x)
choose Ci if P(Ci | x) > P(Ck | x) for all k ≠ i and P(Ci | x) > 1 − λ; reject otherwise
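A small sketch of the reject rule under 0/1 loss with reject cost λ; the posterior vectors below are made-up.

def decide_with_reject(posteriors, lam):
    # posteriors: list of P(C_k|x); lam: reject loss with 0 < lam < 1
    # Choose class i if P(C_i|x) is largest and exceeds 1 - lam; otherwise reject
    i = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return i if posteriors[i] > 1.0 - lam else "reject"

print(decide_with_reject([0.5, 0.3, 0.2], lam=0.3))  # reject (0.5 <= 0.7)
print(decide_with_reject([0.8, 0.1, 0.1], lam=0.3))  # class 0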
Different Losses and Reject
Decision boundaries with equal losses, with unequal losses, and with a reject region
Discriminant Functions
gi(x), i = 1, ..., K
choose Ci if gi(x) = max_k gk(x)
gi(x) can be −R(αi | x), P(Ci | x), or p(x | Ci) P(Ci)
K decision regions R1, ..., RK: Ri = {x | gi(x) = max_k gk(x)}
K = 2 Classes
 Dichotomizer (K = 2) vs. polychotomizer (K > 2)
 g(x) = g1(x) − g2(x): choose C1 if g(x) > 0, C2 otherwise
 Log odds: log [P(C1 | x) / P(C2 | x)]
Utility Theory
 Probability of state k given evidence x: P(Sk | x)
 Utility of αi when the state is k: Uik
 Expected utility: EU(αi | x) = ∑k Uik P(Sk | x)
 Choose αi if EU(αi | x) = max_j EU(αj | x)
Association Rules
 Association rule: X → Y
 People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y.
 A rule implies association, not necessarily causation.
Association Measures
 Support (X → Y):
P(X, Y) = #{customers who bought X and Y} / #{customers}
 Confidence (X → Y):
P(Y | X) = P(X, Y) / P(X) = #{customers who bought X and Y} / #{customers who bought X}
 Lift (X → Y):
P(X, Y) / [P(X) P(Y)] = P(Y | X) / P(Y)
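A minimal sketch computing these three measures from a list of transactions; the baskets are invented for illustration.

def rule_measures(transactions, X, Y):
    # transactions: list of item sets; measures for the rule X -> Y
    n = len(transactions)
    n_x = sum(1 for t in transactions if X in t)
    n_y = sum(1 for t in transactions if Y in t)
    n_xy = sum(1 for t in transactions if X in t and Y in t)
    support = n_xy / n               # P(X, Y)
    confidence = n_xy / n_x          # P(Y | X)
    lift = confidence / (n_y / n)    # P(Y | X) / P(Y)
    return support, confidence, lift

baskets = [{"beer", "chips"}, {"beer", "chips", "diapers"}, {"beer"}, {"chips"}]
print(rule_measures(baskets, "beer", "chips"))  # (0.5, 0.667, 0.889)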
Example
Apriori algorithm (Agrawal et al., 1996)
 For (X, Y, Z), a 3-item set, to be frequent (have enough support), (X, Y), (X, Z), and (Y, Z) should each be frequent.
 If (X, Y) is not frequent, none of its supersets can be frequent.
 Once we find the frequent k-item sets, we convert them to rules: X, Y → Z, ..., and X → Y, Z, ...
PARAMETRIC METHODS
Parametric Estimation
 X = {x^t}t where x^t ~ p(x)
 Parametric estimation: Assume a form for p(x | θ) and estimate θ, its sufficient statistics, using X
e.g., N(μ, σ²) where θ = {μ, σ²}
Maximum Likelihood Estimation
 Likelihood of θ given the sample X:
l(θ | X) = p(X | θ) = ∏t p(x^t | θ)
 Log likelihood:
L(θ | X) = log l(θ | X) = ∑t log p(x^t | θ)
 Maximum likelihood estimator (MLE):
θ* = argmax_θ L(θ | X)
Examples: Bernoulli/Multinomial
 Bernoulli: Two states, failure/success, x ∈ {0, 1}
P(x) = p_o^x (1 − p_o)^(1−x)
L(p_o | X) = log ∏t p_o^(x^t) (1 − p_o)^(1−x^t)
MLE: p_o = ∑t x^t / N
 Multinomial: K > 2 states, xi ∈ {0, 1}
P(x1, x2, ..., xK) = ∏i pi^(xi)
L(p1, p2, ..., pK | X) = log ∏t ∏i pi^(xi^t)
MLE: pi = ∑t xi^t / N
Gaussian (Normal) Distribution
p(x) = N(μ, σ²):
p(x) = [1/(√(2π) σ)] exp[−(x − μ)² / (2σ²)]
MLE for μ and σ²:
m = ∑t x^t / N
s² = ∑t (x^t − m)² / N
Bias and Variance
Unknown parameter θ
Estimator di = d(Xi) on sample Xi
Bias: b_θ(d) = E[d] − θ
Variance: E[(d − E[d])²]
Mean square error:
r(d, θ) = E[(d − θ)²] = (E[d] − θ)² + E[(d − E[d])²] = Bias² + Variance
Bayes’ Estimator
 Treat θ as a random variable with prior p(θ)
 Bayes’ rule: p(θ | X) = p(X | θ) p(θ) / p(X)
 Full: p(x | X) = ∫ p(x | θ) p(θ | X) dθ
 Maximum a posteriori (MAP): θ_MAP = argmax_θ p(θ | X)
 Maximum likelihood (ML): θ_ML = argmax_θ p(X | θ)
 Bayes’ estimator: θ_Bayes’ = E[θ | X] = ∫ θ p(θ | X) dθ
Bayes’ Estimator: Example
 x^t ~ N(θ, σ_o²) and prior θ ~ N(μ, σ²)
 θ_ML = m
 θ_MAP = θ_Bayes’ = E[θ | X]
= [N/σ_o² / (N/σ_o² + 1/σ²)] m + [1/σ² / (N/σ_o² + 1/σ²)] μ
Parametric Classification
Discriminant:
gi(x) = p(x | Ci) P(Ci), or equivalently gi(x) = log p(x | Ci) + log P(Ci)
With class-conditional densities p(x | Ci) = [1/(√(2π) σi)] exp[−(x − μi)² / (2σi²)]:
gi(x) = −(1/2) log 2π − log σi − (x − μi)² / (2σi²) + log P(Ci)
 Given the sample X = {x^t, r^t}, t = 1, ..., N, with r_i^t = 1 if x^t ∈ Ci and 0 otherwise,
 the ML estimates are:
P̂(Ci) = ∑t r_i^t / N
mi = ∑t x^t r_i^t / ∑t r_i^t
si² = ∑t (x^t − mi)² r_i^t / ∑t r_i^t
 Discriminant:
gi(x) = −(1/2) log 2π − log si − (x − mi)² / (2si²) + log P̂(Ci)
Equal variances: a single boundary halfway between the means
Different variances: two boundaries
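A compact sketch of this one-dimensional Gaussian classifier: fit per-class priors, means, and variances by ML, then pick the class with the largest discriminant. The toy data are invented.

import numpy as np

def fit_gaussian_classes(x, r):
    # x: 1-D inputs, r: integer class labels
    classes = np.unique(r)
    priors = {c: float(np.mean(r == c)) for c in classes}
    means  = {c: float(x[r == c].mean()) for c in classes}
    vars_  = {c: float(x[r == c].var()) for c in classes}   # ML variance (divide by N_i)
    return priors, means, vars_

def classify(x0, priors, means, vars_):
    # g_i(x) = -0.5*log(2*pi) - 0.5*log(s_i^2) - (x - m_i)^2 / (2*s_i^2) + log P(C_i)
    g = {c: -0.5 * np.log(2 * np.pi) - 0.5 * np.log(vars_[c])
            - (x0 - means[c]) ** 2 / (2 * vars_[c]) + np.log(priors[c])
         for c in priors}
    return max(g, key=g.get)

x = np.array([1.0, 1.2, 0.8, 3.0, 3.2, 2.9])
r = np.array([0, 0, 0, 1, 1, 1])
print(classify(1.1, *fit_gaussian_classes(x, r)))  # 0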
Regression
r = f(x) + ε, where ε ~ N(0, σ²)
Estimator: g(x | θ)
p(r | x) ~ N(g(x | θ), σ²)
Log likelihood:
L(θ | X) = log ∏t=1..N p(x^t, r^t) = log ∏t=1..N p(r^t | x^t) + log ∏t=1..N p(x^t)
Regression: From Log Likelihood to Error
L(θ | X) = log ∏t=1..N [1/(√(2π) σ)] exp[−(r^t − g(x^t | θ))² / (2σ²)]
= −N log(√(2π) σ) − (1/(2σ²)) ∑t=1..N [r^t − g(x^t | θ)]²
Maximizing the log likelihood is therefore equivalent to minimizing
E(θ | X) = (1/2) ∑t=1..N [r^t − g(x^t | θ)]²
Linear Regression
g(x^t | w1, w0) = w1 x^t + w0
Setting the derivatives of E to zero gives the normal equations:
∑t r^t = N w0 + w1 ∑t x^t
∑t r^t x^t = w0 ∑t x^t + w1 ∑t (x^t)²
In matrix form, A w = y with
A = [ N        ∑t x^t
      ∑t x^t   ∑t (x^t)² ],  w = [w0, w1]^T,  y = [∑t r^t, ∑t r^t x^t]^T
w = A⁻¹ y
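A short sketch that builds A and y exactly as above and solves A w = y; the data points are made-up.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
r = np.array([1.2, 1.9, 3.1, 4.2])

A = np.array([[len(x),  x.sum()],
              [x.sum(), (x ** 2).sum()]])
y = np.array([r.sum(), (r * x).sum()])
w0, w1 = np.linalg.solve(A, y)       # w = A^{-1} y
print(w0, w1)                        # intercept and slope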
Polynomial Regression
g(x^t | wk, ..., w2, w1, w0) = wk (x^t)^k + ... + w2 (x^t)² + w1 x^t + w0
With the design matrix D, whose row t is [1, x^t, (x^t)², ..., (x^t)^k], and r = [r^1, r^2, ..., r^N]^T:
w = (D^T D)⁻¹ D^T r
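A brief sketch using the design matrix D and the normal equations; np.vander builds the powers of x, and the data are illustrative.

import numpy as np

def poly_fit(x, r, k):
    # Row t of D is [1, x^t, (x^t)^2, ..., (x^t)^k]; w = (D^T D)^{-1} D^T r
    D = np.vander(x, k + 1, increasing=True)
    return np.linalg.solve(D.T @ D, D.T @ r)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
r = np.array([1.0, 2.2, 4.1, 6.1, 8.3])
print(poly_fit(x, r, k=2))   # [w0, w1, w2]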
Other Error Measures
 Square error:
E(θ | X) = (1/2) ∑t=1..N [r^t − g(x^t | θ)]²
 Relative square error:
E(θ | X) = ∑t=1..N [r^t − g(x^t | θ)]² / ∑t=1..N [r^t − r̄]²
 Absolute error: E(θ | X) = ∑t |r^t − g(x^t | θ)|
 ε-sensitive error:
E(θ | X) = ∑t 1(|r^t − g(x^t | θ)| > ε) (|r^t − g(x^t | θ)| − ε)
Bias and Variance
The expected squared error at x decomposes as
E[(r − g(x))² | x] = E[(r − E[r | x])² | x] + (E[r | x] − g(x))²
= noise + squared error
Taking the expectation over samples X:
E_X[(E[r | x] − g(x))² | x] = (E[r | x] − E_X[g(x)])² + E_X[(g(x) − E_X[g(x)])²]
= bias² + variance
Estimating Bias and Variance
 M samples Xi = {x_i^t, r_i^t}, i = 1, ..., M, are used to fit gi(x), i = 1, ..., M
 Average fit: ḡ(x) = (1/M) ∑i gi(x)
 Bias²(g) = (1/N) ∑t [ḡ(x^t) − f(x^t)]²
 Variance(g) = (1/(N M)) ∑t ∑i [gi(x^t) − ḡ(x^t)]²
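A simulation sketch of these estimates: draw M samples, fit a model gi on each, and evaluate on a fixed grid of x values. The target f, the noise level, and the constant-fit example are all chosen just for illustration.

import numpy as np

def bias_variance(f, fit, x_grid, M=100, N=20, noise=0.3, seed=0):
    # Draw M samples of size N from r = f(x) + noise and fit g_i on each
    rng = np.random.default_rng(seed)
    preds = np.empty((M, len(x_grid)))
    for i in range(M):
        x = rng.uniform(0.0, 1.0, N)
        r = f(x) + rng.normal(0.0, noise, N)
        preds[i] = fit(x, r)(x_grid)
    g_bar = preds.mean(axis=0)                      # average fit
    bias2 = np.mean((g_bar - f(x_grid)) ** 2)       # (1/N) sum_t (g_bar - f)^2
    variance = np.mean((preds - g_bar) ** 2)        # (1/(N M)) sum_t sum_i (g_i - g_bar)^2
    return bias2, variance

# Constant model g_i(x) = mean(r): low variance, high bias for f(x) = sin(x)
fit_const = lambda x, r: (lambda xs: np.full_like(xs, r.mean()))
print(bias_variance(np.sin, fit_const, np.linspace(0.0, 1.0, 25)))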
Bias/Variance Dilemma
 Example: gi(x) = 2 has no variance and high bias;
gi(x) = ∑t r_i^t / N has lower bias but nonzero variance
 As we increase complexity, bias decreases (a better fit to data) and variance increases (the fit varies more with the data)
 Bias/variance dilemma (Geman et al., 1992)
[Figure: individual fits gi scatter around the target f; the spread of the gi illustrates variance, and the gap between their average ḡ and f illustrates bias.]
Polynomial Regression
[Figures: training and validation error versus polynomial order; the best fit is at the minimum validation error, or at the “elbow” of the error curve.]
Model Selection
 Cross-validation: Measure generalization accuracy by testing on data unused during training
 Regularization: Penalize complex models:
E’ = error on data + λ × model complexity
Akaike’s information criterion (AIC), Bayesian information criterion (BIC)
 Minimum description length (MDL): Kolmogorov complexity, shortest description of the data
Bayesian Model Selection
 Prior on models, p(model):
p(model | data) = p(data | model) p(model) / p(data)
 Regularization, when the prior favors simpler models
 Bayes: MAP of the posterior, p(model | data)
 Average over a number of models with high posterior (voting, ensembles)
Regression Example
Coefficients increase in magnitude as the order increases:
1: [-0.0769, 0.0016]
2: [0.1682, -0.6657, 0.0080]
3: [0.4238, -2.5778, 3.4675, -0.0002]
4: [-0.1093, 1.4356, -5.5007, 6.0454, -0.0019]
Regularization (L2):
E(w | X) = (1/2) ∑t=1..N [r^t − g(x^t | w)]² + (λ/2) ∑i wi²
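A closed-form sketch of this L2-regularized fit, w = (DᵀD + λI)⁻¹ Dᵀr, reusing the polynomial design matrix; the data and λ values are arbitrary.

import numpy as np

def ridge_poly_fit(x, r, k, lam):
    # Penalizing (lam/2) * sum_i w_i^2 gives w = (D^T D + lam * I)^{-1} D^T r
    D = np.vander(x, k + 1, increasing=True)
    return np.linalg.solve(D.T @ D + lam * np.eye(k + 1), D.T @ r)

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
r = np.array([0.1, 0.9, 1.1, 2.2, 1.9])
print(ridge_poly_fit(x, r, k=4, lam=0.0))   # unregularized: large coefficients
print(ridge_poly_fit(x, r, k=4, lam=1.0))   # regularized: shrunk coefficients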