Enviar búsqueda
Cargar
VC dimensio
•
0 recomendaciones
•
1,011 vistas
G
guestfee8698
Seguir
Tecnología
Educación
Denunciar
Compartir
Denunciar
Compartir
1 de 20
Descargar ahora
Descargar para leer sin conexión
Recomendados
Support Vector Machines
Support Vector Machines
guestfee8698
Confluent hypergeometricfunctions
Confluent hypergeometricfunctions
Chumphol Mahattanakul
Brief presentation of the thesis
Brief presentation of the thesis
Skyboyii
3 discrete probability
3 discrete probability
AASHISHSHRIVASTAV1
Modern features-part-3-software
Modern features-part-3-software
zukun
"PAC Learning - a discussion on the original paper by Valiant" presentation @...
"PAC Learning - a discussion on the original paper by Valiant" presentation @...
Adrian Florea
PAC Learning
PAC Learning
guestfee8698
Predicting Real-valued Outputs: An introduction to regression
Predicting Real-valued Outputs: An introduction to regression
guestfee8698
Recomendados
Support Vector Machines
Support Vector Machines
guestfee8698
Confluent hypergeometricfunctions
Confluent hypergeometricfunctions
Chumphol Mahattanakul
Brief presentation of the thesis
Brief presentation of the thesis
Skyboyii
3 discrete probability
3 discrete probability
AASHISHSHRIVASTAV1
Modern features-part-3-software
Modern features-part-3-software
zukun
"PAC Learning - a discussion on the original paper by Valiant" presentation @...
"PAC Learning - a discussion on the original paper by Valiant" presentation @...
Adrian Florea
PAC Learning
PAC Learning
guestfee8698
Predicting Real-valued Outputs: An introduction to regression
Predicting Real-valued Outputs: An introduction to regression
guestfee8698
Neural Networks
Neural Networks
guestfee8698
05 history of cv a machine learning (theory) perspective on computer vision
05 history of cv a machine learning (theory) perspective on computer vision
zukun
Information Gain
Information Gain
guest32311f
Draft6
Draft6
Sudarshan Raghunathan
Regression Theory
Regression Theory
SSA KPI
Morphing Image
Morphing Image
Copen Hagen
Hidden Markov Models
Hidden Markov Models
guestfee8698
K-means and Hierarchical Clustering
K-means and Hierarchical Clustering
guestfee8698
Gaussian Mixture Models
Gaussian Mixture Models
guestfee8698
Short Overview of Bayes Nets
Short Overview of Bayes Nets
guestfee8698
A Short Intro to Naive Bayesian Classifiers
A Short Intro to Naive Bayesian Classifiers
guestfee8698
Learning Bayesian Networks
Learning Bayesian Networks
guestfee8698
Inference in Bayesian Networks
Inference in Bayesian Networks
guestfee8698
Bayesian Networks
Bayesian Networks
guestfee8698
Eight Regression Algorithms
Eight Regression Algorithms
guestfee8698
Instance-based learning (aka Case-based or Memory-based or non-parametric)
Instance-based learning (aka Case-based or Memory-based or non-parametric)
guestfee8698
Cross-Validation
Cross-Validation
guestfee8698
Gaussian Bayes Classifiers
Gaussian Bayes Classifiers
guestfee8698
Maximum Likelihood Estimation
Maximum Likelihood Estimation
guestfee8698
Gaussians
Gaussians
guestfee8698
Probability for Data Miners
Probability for Data Miners
guestfee8698
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
The Digital Insurer
Más contenido relacionado
Similar a VC dimensio
Neural Networks
Neural Networks
guestfee8698
05 history of cv a machine learning (theory) perspective on computer vision
05 history of cv a machine learning (theory) perspective on computer vision
zukun
Information Gain
Information Gain
guest32311f
Draft6
Draft6
Sudarshan Raghunathan
Regression Theory
Regression Theory
SSA KPI
Morphing Image
Morphing Image
Copen Hagen
Similar a VC dimensio
(6)
Neural Networks
Neural Networks
05 history of cv a machine learning (theory) perspective on computer vision
05 history of cv a machine learning (theory) perspective on computer vision
Information Gain
Information Gain
Draft6
Draft6
Regression Theory
Regression Theory
Morphing Image
Morphing Image
Más de guestfee8698
Hidden Markov Models
Hidden Markov Models
guestfee8698
K-means and Hierarchical Clustering
K-means and Hierarchical Clustering
guestfee8698
Gaussian Mixture Models
Gaussian Mixture Models
guestfee8698
Short Overview of Bayes Nets
Short Overview of Bayes Nets
guestfee8698
A Short Intro to Naive Bayesian Classifiers
A Short Intro to Naive Bayesian Classifiers
guestfee8698
Learning Bayesian Networks
Learning Bayesian Networks
guestfee8698
Inference in Bayesian Networks
Inference in Bayesian Networks
guestfee8698
Bayesian Networks
Bayesian Networks
guestfee8698
Eight Regression Algorithms
Eight Regression Algorithms
guestfee8698
Instance-based learning (aka Case-based or Memory-based or non-parametric)
Instance-based learning (aka Case-based or Memory-based or non-parametric)
guestfee8698
Cross-Validation
Cross-Validation
guestfee8698
Gaussian Bayes Classifiers
Gaussian Bayes Classifiers
guestfee8698
Maximum Likelihood Estimation
Maximum Likelihood Estimation
guestfee8698
Gaussians
Gaussians
guestfee8698
Probability for Data Miners
Probability for Data Miners
guestfee8698
Más de guestfee8698
(15)
Hidden Markov Models
Hidden Markov Models
K-means and Hierarchical Clustering
K-means and Hierarchical Clustering
Gaussian Mixture Models
Gaussian Mixture Models
Short Overview of Bayes Nets
Short Overview of Bayes Nets
A Short Intro to Naive Bayesian Classifiers
A Short Intro to Naive Bayesian Classifiers
Learning Bayesian Networks
Learning Bayesian Networks
Inference in Bayesian Networks
Inference in Bayesian Networks
Bayesian Networks
Bayesian Networks
Eight Regression Algorithms
Eight Regression Algorithms
Instance-based learning (aka Case-based or Memory-based or non-parametric)
Instance-based learning (aka Case-based or Memory-based or non-parametric)
Cross-Validation
Cross-Validation
Gaussian Bayes Classifiers
Gaussian Bayes Classifiers
Maximum Likelihood Estimation
Maximum Likelihood Estimation
Gaussians
Gaussians
Probability for Data Miners
Probability for Data Miners
Último
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
The Digital Insurer
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
Pooja Nehwal
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
Puma Security, LLC
Slack Application Development 101 Slides
Slack Application Development 101 Slides
praypatel2
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Drew Madelung
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Safe Software
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Miguel Araújo
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
Delhi Call girls
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
Sinan KOZAK
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
Michael W. Hawkins
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
Sujit Pal
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
Gabriella Davis
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
Results
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
hans926745
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
Delhi Call girls
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
BookNet Canada
How to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
naman860154
Último
(20)
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
Slack Application Development 101 Slides
Slack Application Development 101 Slides
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
How to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
VC dimensio
1.
VC-dimension for
characterizing classifiers Andrew W. Moore Note to other teachers and users of these slides. Andrew would be delighted Associate Professor if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them School of Computer Science to fit your own needs. PowerPoint originals are available. If you make use Carnegie Mellon University of a significant portion of these slides in your own lecture, please include this www.cs.cmu.edu/~awm message, or the following link to the source repository of Andrew’s tutorials: awm@cs.cmu.edu http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully 412-268-7599 received. Copyright © 2001, Andrew W. Moore Nov 20th, 2001 A learning machine • A learning machine f takes an input x and transforms it, somehow using weights α, into a predicted output yest = +/- 1 α α is some vector of adjustable parameters x f yest Copyright © 2001, Andrew W. Moore VC-dimension: Slide 2 1
2.
α
Examples x f yest f(x,b) = sign(x.x – b) denotes +1 denotes -1 Copyright © 2001, Andrew W. Moore VC-dimension: Slide 3 α Examples x f yest f(x,w) = sign(x.w) denotes +1 denotes -1 Copyright © 2001, Andrew W. Moore VC-dimension: Slide 4 2
3.
α
Examples x f yest f(x,w,b) = sign(x.w+b) denotes +1 denotes -1 Copyright © 2001, Andrew W. Moore VC-dimension: Slide 5 How do we characterize “power”? • Different machines have different amounts of “power”. • Tradeoff between: • More power: Can model more complex classifiers but might overfit. • Less power: Not going to overfit, but restricted in what it can model. • How do we characterize the amount of power? Copyright © 2001, Andrew W. Moore VC-dimension: Slide 6 3
4.
Some definitions •
Given some machine f • And under the assumption that all training points (xk ,yk ) were drawn i.i.d from some distribution. • And under the assumption that future test points will be drawn from the same distribution • Define 1 Probabilit y of R(α ) = TESTERR (α ) = E y − f ( x,α ) = 2 Misclassif ication Official terminology Terminology we’ll use Copyright © 2001, Andrew W. Moore VC-dimension: Slide 7 Some definitions • Given some machine f • And under the assumption that all training points (xk ,yk ) were drawn i.i.d from some distribution. • And under the assumption that future test points will be drawn from the same distribution • Define 1 Probabilit y of R(α ) = TESTERR (α ) = E y − f ( x,α ) = 2 Misclassif ication Official terminology Terminology we’ll use Fraction Training 1 R1 ∑ 2 yk − f ( xk ,α ) = Set misclassif ied R emp (α ) = TRAINERR(α ) = R k =1 R = #training set data points Copyright © 2001, Andrew W. Moore VC-dimension: Slide 8 4
5.
Vapnik-Chervonenkis dimension
1 1R1 ∑ y k − f ( x k ,α ) TESTERR (α ) = E y − f ( x, α ) TRAINERR(α ) = 2 R k =1 2 • Given some machine f, let h be its VC dimension. • h is a measure of f’s power (h does not depend on the choice of training set) • Vapnik showed that with probability 1-η h (log( 2 R / h) + 1) − log(η / 4) TESTERR (α ) ≤ TRAINERR(α ) + R This gives us a way to estimate the error on future data based only on the training error and the VC-dimension of f Copyright © 2001, Andrew W. Moore VC-dimension: Slide 9 What VC-dimension is used for 1 1R1 ∑ y k − f ( x k ,α ) TESTERR (α ) = E y − f ( x, α ) TRAINERR(α ) = 2 R k =1 2 • Given some machine f, let h be its VC dimension. • h is a measure of f’s power. f, ine • Vapnik showed that with probability 1-η ach n m efine give we d h(log( 2R / h) + 1) − log(η / 4) But TESTERR (α ) ≤ TRAINERR(α ) + te h? do how compu R nd a This gives us a way to estimate the error on future data based only on the training error and the VC-dimension of f Copyright © 2001, Andrew W. Moore VC-dimension: Slide 10 5
6.
Shattering •
Machine f can shatter a set of points x1, x2 .. xr if and only if… For every possible training set of the form (x1,y1) , (x2,y2 ) ,… (xr ,yr) …There exists some value of α that gets zero training error. There are 2r such training sets to consider, each with a different combination of +1’s and –1’s for the y’s Copyright © 2001, Andrew W. Moore VC-dimension: Slide 11 Shattering • Machine f can shatter a set of points x1, x2 .. Xr if and only if… For every possible training set of the form (x1,y1) , (x2,y2 ) ,… (xr ,yr) …There exists some value of α that gets zero training error. • Question: Can the following f shatter the following points? f(x,w) = sign(x.w) Copyright © 2001, Andrew W. Moore VC-dimension: Slide 12 6
7.
Shattering •
Machine f can shatter a set of points x1, x2 .. Xr if and only if… For every possible training set of the form (x1,y1) , (x2,y2 ) ,… (xr ,yr) …There exists some value of α that gets zero training error. • Question: Can the following f shatter the following points? f(x,w) = sign(x.w) • Answer: No problem. There are four training sets to consider w=(0,1) w=(-2,3) w=(2,-3) w=(0,-1) Copyright © 2001, Andrew W. Moore VC-dimension: Slide 13 Shattering • Machine f can shatter a set of points x1, x2 .. Xr if and only if… For every possible training set of the form (x1,y1) , (x2,y2 ) ,… (xr ,yr) …There exists some value of α that gets zero training error. • Question: Can the following f shatter the following points? f(x,b) = sign(x.x-b) Copyright © 2001, Andrew W. Moore VC-dimension: Slide 14 7
8.
Shattering •
Machine f can shatter a set of points x1, x2 .. Xr if and only if… For every possible training set of the form (x1,y1) , (x2,y2 ) ,… (xr ,yr) …There exists some value of α that gets zero training error. • Question: Can the following f shatter the following points? f(x,b) = sign(x.x-b) • Answer: No way my friend. Copyright © 2001, Andrew W. Moore VC-dimension: Slide 15 Definition of VC dimension Given machine f, the VC-dimension h is The maximum number of points that can be arranged so that f shatter them. Example: What’s VC dimension of f(x,b) = sign(x.x-b) Copyright © 2001, Andrew W. Moore VC-dimension: Slide 16 8
9.
VC dim of
trivial circle Given machine f, the VC-dimension h is The maximum number of points that can be arranged so that f shatter them. Example: What’s VC dimension of f(x,b) = sign(x.x-b) Answer = 1: we can’t even shatter two points! (but it’s clear we can shatter 1) Copyright © 2001, Andrew W. Moore VC-dimension: Slide 17 Reformulated circle Given machine f, the VC-dimension h is The maximum number of points that can be arranged so that f shatter them. Example: For 2-d inputs, what’s VC dimension of f(x,q,b) = sign(qx.x-b) Copyright © 2001, Andrew W. Moore VC-dimension: Slide 18 9
10.
Reformulated circle Given machine
f, the VC-dimension h is The maximum number of points that can be arranged so that f shatter them. Example: What’s VC dimension of f(x,q,b) = sign(qx.x-b) • Answer = 2 q,b are -ve Copyright © 2001, Andrew W. Moore VC-dimension: Slide 19 Reformulated circle Given machine f, the VC-dimension h is The maximum number of points that can be arranged so that f shatter them. Example: What’s VC dimension of f(x,q,b) = sign(qx.x-b) • Answer = 2 (clearly can’t do 3) q,b are -ve Copyright © 2001, Andrew W. Moore VC-dimension: Slide 20 10
11.
VC dim of
separating line Given machine f, the VC-dimension h is The maximum number of points that can be arranged so that f shatter them. Example: For 2-d inputs, what’s VC-dim of f(x,w,b) = sign(w.x+b)? Well, can f shatter these three points? Copyright © 2001, Andrew W. Moore VC-dimension: Slide 21 VC dim of line machine Given machine f, the VC-dimension h is The maximum number of points that can be arranged so that f shatter them. Example: For 2-d inputs, what’s VC-dim of f(x,w,b) = sign(w.x+b)? Well, can f shatter these three points? Yes, of course. All -ve or all +ve is trivial One +ve can be picked off by a line One -ve can be picked off too. Copyright © 2001, Andrew W. Moore VC-dimension: Slide 22 11
12.
VC dim of
line machine Given machine f, the VC-dimension h is The maximum number of points that can be arranged so that f shatter them. Example: For 2-d inputs, what’s VC-dim of f(x,w,b) = sign(w.x+b)? Well, can we find four points that f can shatter? Copyright © 2001, Andrew W. Moore VC-dimension: Slide 23 VC dim of line machine Given machine f, the VC-dimension h is The maximum number of points that can be arranged so that f shatter them. Example: For 2-d inputs, what’s VC-dim of f(x,w,b) = sign(w.x+b)? Well, can we find four points that f can shatter? Can always draw six lines between pairs of four points. Copyright © 2001, Andrew W. Moore VC-dimension: Slide 24 12
13.
VC dim of
line machine Given machine f, the VC-dimension h is The maximum number of points that can be arranged so that f shatter them. Example: For 2-d inputs, what’s VC-dim of f(x,w,b) = sign(w.x+b)? Well, can we find four points that f can shatter? Can always draw six lines between pairs of four points. Two of those lines will cross. Copyright © 2001, Andrew W. Moore VC-dimension: Slide 25 VC dim of line machine Given machine f, the VC-dimension h is The maximum number of points that can be arranged so that f shatter them. Example: For 2-d inputs, what’s VC-dim of f(x,w,b) = sign(w.x+b)? Well, can we find four points that f can shatter? Can always draw six lines between pairs of four points. Two of those lines will cross. If we put points linked by the crossing lines in the same class they can’t be linearly separated So a line can shatter 3 points but not 4 So VC-dim of Line Machine is 3 Copyright © 2001, Andrew W. Moore VC-dimension: Slide 26 13
14.
VC dim of
linear classifiers in m-dimensions If input space is m-dimensional and if f is sign(w.x-b), what is the VC-dimension? Proof that h >= m: Show that m points can be shattered Can you guess how? Copyright © 2001, Andrew W. Moore VC-dimension: Slide 27 VC dim of linear classifiers in m-dimensions If input space is m-dimensional and if f is sign(w.x-b), what is the VC-dimension? Proof that h >= m: Show that m points can be shattered Define m input points thus: x1 = (1,0,0,…,0) x2 = (0,1,0,…,0) : xm = (0,0,0,…,1) So xk [j] = 1 if k=j and 0 otherwise Let y1, y2 ,… ym , be any one of the 2m combinations of class labels. Guess how we can define w1 , w2,… wm and b to ensure sign(w. xk + b) = yk for all k ? Note: m sign (w .x k + b) = sign b + ∑ wj . x k [ j ] j=1 Copyright © 2001, Andrew W. Moore VC-dimension: Slide 28 14
15.
VC dim of
linear classifiers in m-dimensions If input space is m-dimensional and if f is sign(w.x-b), what is the VC-dimension? Proof that h >= m: Show that m points can be shattered Define m input points thus: x1 = (1,0,0,…,0) x2 = (0,1,0,…,0) : xm = (0,0,0,…,1) So xk [j] = 1 if k=j and 0 otherwise Let y1, y2 ,… ym , be any one of the 2m combinations of class labels. Guess how we can define w1 , w2,… wm and b to ensure sign(w. xk + b) = yk for all k ? Note: m sign (w .x k + b) = sign b + ∑ wj . x k [ j ] Answer: b=0 and wk = yk for all k. j=1 Copyright © 2001, Andrew W. Moore VC-dimension: Slide 29 VC dim of linear classifiers in m-dimensions If input space is m-dimensional and if f is sign(w.x-b), what is the VC-dimension? • Now we know that h >= m • In fact, h=m+1 • Proof that h >= m+1 is easy • Proof that h < m+2 is moderate Copyright © 2001, Andrew W. Moore VC-dimension: Slide 30 15
16.
What does VC-dim
measure? • Is it the number of parameters? Related but not really the same. • I can create a machine with one numeric parameter that really encodes 7 parameters (How?) • And I can create a machine with 7 parameters which has a VC-dim of 1 (How?) • Andrew’s private opinion: it often is the number of parameters that counts. Copyright © 2001, Andrew W. Moore VC-dimension: Slide 31 Structural Risk Minimization Let φ(f) = the set of functions representable by f. • f ( f1 ) ⊆ f ( f 2 ) ⊆ L f ( f n ) • Suppose h ( f1 ) ≤ h( f 2 ) ≤ L h( f n ) (Hey, can you formally prove this?) • Then • We’re trying to decide which machine to use. • We train each machine and make a table… h (log( 2 R / h) + 1) − log(η / 4) TESTERR (α ) ≤ TRAINERR(α ) + R TRAINER i fi VC-Conf Probable upper bound Choice R on TESTERR 1 f1 2 f2 Ö 3 f3 4 f4 5 f5 6 f6 Copyright © 2001, Andrew W. Moore VC-dimension: Slide 32 16
17.
Using VC-dimensionality
That’s what VC-dimensionality is about People have worked hard to find VC-dimension for.. • Decision Trees • Perceptrons • Neural Nets • Decision Lists • Support Vector Machines • And many many more All with the goals of 1. Understanding which learning machines are more or less powerful under which circumstances 2. Using Structural Risk Minimization for to choose the best learning machine Copyright © 2001, Andrew W. Moore VC-dimension: Slide 33 Alternatives to VC-dim-based model selection • What could we do instead of the scheme below? h (log( 2 R / h) + 1) − log(η / 4) TESTERR (α ) ≤ TRAINERR(α ) + R TRAINER i fi VC-Conf Probable upper bound Choice R on TESTERR 1 f1 2 f2 Ö 3 f3 4 f4 5 f5 6 f6 Copyright © 2001, Andrew W. Moore VC-dimension: Slide 34 17
18.
Alternatives to VC-dim-based
model selection • What could we do instead of the scheme below? 1. Cross-validation TRAINER i fi 10-FOLD-CV-ERR Choice R 1 f1 2 f2 Ö 3 f3 4 f4 5 f5 6 f6 Copyright © 2001, Andrew W. Moore VC-dimension: Slide 35 Alternatives to VC-dim-based model selection • What could we do instead of the scheme below? As the amount of data 1. Cross-validation goes to infinity, AIC 2. AIC (Akaike Information Criterion) promises* to select the model that’ll have the best likelihood for future AICSCORE = LL ( Data | MLE params) − (# parameters) data *Subject to about a million caveats LOGLIKE(TRAINERR) i fi #parameters AIC Choice 1 f1 2 f2 3 f3 Ö 4 f4 5 f5 6 f6 Copyright © 2001, Andrew W. Moore VC-dimension: Slide 36 18
19.
Alternatives to VC-dim-based
model selection • What could we do instead of the scheme below? As the amount of data 1. Cross-validation goes to infinity, BIC 2. AIC (Akaike Information Criterion) promises* to select the model that the data was 3. BIC (Bayesian Information Criterion) generated from. More conservative than AIC. # params *Another million caveats BICSCORE = LL ( Data | MLE params) − log R 2 LOGLIKE(TRAINERR) i fi #parameters BIC Choice 1 f1 2 f2 Ö 3 f3 4 f4 5 f5 6 f6 Copyright © 2001, Andrew W. Moore VC-dimension: Slide 37 Which model selection method is best? 1. (CV) Cross-validation 2. AIC (Akaike Information Criterion) 3. BIC (Bayesian Information Criterion) 4. (SRMVC) Structural Risk Minimize with VC- dimension • AIC, BIC and SRMVC have the advantage that you only need the training error. • CV error might have more variance • SRMVC is wildly conservative • Asymptotically AIC and Leave-one-out CV should be the same • Asymptotically BIC and a carefully chosen k-fold should be the same • BIC is what you want if you want the best structure instead of the best predictor (e.g. for clustering or Bayes Net structure finding) • Many alternatives to the above including proper Bayesian approaches. • It’s an emotional issue. Copyright © 2001, Andrew W. Moore VC-dimension: Slide 38 19
20.
Extra Comments • Beware:
that second “VC-confidence” term is usually very very conservative (at least hundreds of times larger than the empirical overfitting effect). • An excellent tutorial on VC-dimension and Support Vector Machines (which we’ll be studying soon): C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):955-974, 1998. http://citeseer.nj.nec.com/burges98tutorial.html Copyright © 2001, Andrew W. Moore VC-dimension: Slide 39 What you should know • The definition of a learning machine: f(x,α ) • The definition of Shattering • Be able to work through simple examples of shattering • The definition of VC-dimension • Be able to work through simple examples of VC- dimension • Structural Risk Minimization for model selection • Awareness of other model selection methods Copyright © 2001, Andrew W. Moore VC-dimension: Slide 40 20
Descargar ahora