Text Classification and Naïve Bayes
An example of text classification
Definition of a machine learning problem
A refresher on probability
The Naïve Bayes classifier
1
Google News
2
Different ways for classification
Human labor (people assign categories to every incoming article)
Hand-crafted rules for automatic classification
- If article contains: stock, Dow, share, Nasdaq, etc. → Business
- If article contains: set, breakpoint, player, Federer, etc. → Tennis
Machine learning algorithms
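The hand-crafted rules above can be sketched in a few lines of Python. This is only an illustration: the `RULES` table and the `classify_by_rules` helper are hypothetical names, the keyword sets are the ones listed on the slide, and picking the category with the most keyword hits is an assumption made here to break ties.

```python
# Hypothetical keyword rules mirroring the two examples on the slide.
RULES = {
    "Business": {"stock", "dow", "share", "nasdaq"},
    "Tennis": {"set", "breakpoint", "player", "federer"},
}

def classify_by_rules(text):
    """Return the category whose keywords overlap the article most, or None."""
    words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
    best_category, best_hits = None, 0
    for category, keywords in RULES.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_category, best_hits = category, hits
    return best_category

print(classify_by_rules("Nasdaq and Dow close higher as tech stock rallies"))
print(classify_by_rules("Federer saves a breakpoint in the second set"))
```

Rules like these are easy to read but have to be written and maintained by hand for every category, which is the motivation for learning the classifier from data instead.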
3
What is Machine Learning?
4
Definition: A computer program is said to learn from
experience E when its performance P at a task T
improves with experience E.
Tom Mitchell, Machine Learning, 1997
Examples:
- Learning to recognize spoken words
- Learning to drive a vehicle
- Learning to play backgammon
Components of a ML System (1)
Experience (a set of examples that pair an input with the desired output for a task)
- Text categorization: document + category
- Speech recognition: spoken text + written text
Experience is referred to as Training Data. When labeled training data is available, we talk of Supervised Learning.
Performance metrics
- Error or accuracy on the Test Data
- Test Data must not be present in the Training Data
- When there is little training data, methods like ‘leave-one-out’ or ‘ten-fold cross-validation’ are used to measure error.
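As a rough sketch of how cross-validation partitions a small data set, the helper below yields train/test index splits. The function name `k_fold_splits` is hypothetical, and the sketch assumes the examples have already been shuffled; it does no shuffling itself.

```python
def k_fold_splits(n_examples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_examples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Ten examples, five folds: every example is held out exactly once.
for train, test in k_fold_splits(10, 5):
    print("train:", train, "test:", test)

# Leave-one-out is the special case k == n_examples.
leave_one_out = list(k_fold_splits(4, 4))
```

The error is measured on each held-out fold and averaged, so every example is used for testing once and for training k - 1 times.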
5
Components of a ML System (2)
Type of knowledge to be learned (known as the target function, which maps input to output)
Representation of the target function
- Decision trees
- Neural networks
- Linear functions
The learning algorithm
- C4.5 (learns decision trees)
- Gradient descent (learns a neural network)
- Linear programming (learns linear functions)
6
Task
Defining Text Classification
7
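$d \in \mathbb{X}$: the document in the multi-dimensional space $\mathbb{X}$
$\mathbb{C} = \{c_1, c_2, \ldots, c_J\}$: a set of classes (categories, or labels)
$\mathbb{D}$: the training set of labeled documents $\langle d, c \rangle$, where $\langle d, c \rangle \in \mathbb{X} \times \mathbb{C}$
Target function: $\gamma : \mathbb{X} \to \mathbb{C}$
Learning algorithm: $\Gamma(\mathbb{D}) = \gamma$
Example: $\langle d, c \rangle = \langle$“Beijing joins the World Trade Organization”, China$\rangle$, so $\gamma(d) = c$, i.e. $\gamma(d) = \text{China}$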
Naïve Bayes Learning
8
Learning Algorithm: Naïve Bayes
Target Function: $\gamma(d) = c$, where
$$c_{MAP} = \arg\max_{c \in \mathbb{C}} \hat{P}(c|d) = \arg\max_{c \in \mathbb{C}} \hat{P}(c) \prod_{1 \le k \le n_d} \hat{P}(t_k|c)$$
The generative process:
$$c_{MAP} = \arg\max_{c \in \mathbb{C}} P(c|d) = \arg\max_{c \in \mathbb{C}} P(c)\,P(d|c)$$
$P(c)$: a priori probability of choosing a category
$P(d|c)$: the conditional probability of generating d, given the fixed c
$P(c|d)$: a posteriori probability that c generated d
A Refresher on Probability
9
Visualizing probability
A is a Boolean random variable that denotes an uncertain event
- Example: A = “I’ll get an A+ in the final exam”
P(A) is “the fraction of possible worlds where A is true”
10
[Figure: the event space of all possible worlds, with total area 1. A blue circle marks the worlds in which A is true; the remaining area holds the worlds in which A is false. P(A) = area of the blue circle. Slide: Andrew W. Moore]
Axioms and Theorems of Probability
Axioms:
- $0 \le P(A) \le 1$
- $P(\text{True}) = 1$
- $P(\text{False}) = 0$
- $P(A \lor B) = P(A) + P(B) - P(A \land B)$
Theorems:
- $P(\lnot A) = 1 - P(A)$
- $P(A) = P(A \land B) + P(A \land \lnot B)$
11
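The axioms and theorems on this slide can be checked numerically with the "fraction of possible worlds" reading of probability. The sample space (two fair dice) and the events `A` and `B` below are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: the 36 equally likely outcomes of two fair dice.
WORLDS = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = fraction of worlds in which the event is true."""
    return Fraction(sum(1 for w in WORLDS if event(w)), len(WORLDS))

A = lambda w: w[0] + w[1] >= 10   # the sum is at least 10
B = lambda w: w[0] == 6           # the first die shows 6

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(lambda w: A(w) and B(w))
p_a_or_b = prob(lambda w: A(w) or B(w))
p_not_a = prob(lambda w: not A(w))
p_a_and_not_b = prob(lambda w: A(w) and not B(w))

print(p_a_or_b == p_a + p_b - p_a_and_b)   # addition rule
print(p_not_a == 1 - p_a)                  # complement
print(p_a == p_a_and_b + p_a_and_not_b)    # splitting A over B and not B
```

Exact fractions are used instead of floats so that the equalities hold without rounding error.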
Conditional Probability
P(A|B) = the probability of A being true, given that we
know that B is true
12
H = “I have a headache”
F = “Coming down with flu”
P(H) = 1/10
P(F) = 1/40
P(H|F) = 1/2
Headaches are rare and flu is even rarer, but if you have the flu, there is a 50-50 chance you’ll have a headache.
[Figure: overlapping regions F and H in the event space. Slide: Andrew W. Moore]
Deriving the Bayes Rule
13
Conditional Probability: $P(A|B) = \frac{P(A \land B)}{P(B)}$
Chain rule: $P(A \land B) = P(A|B)\,P(B)$
By symmetry: $P(A \land B) = P(B \land A) = P(B|A)\,P(A)$
Bayes Rule: $P(B|A) = \frac{P(A|B)\,P(B)}{P(A)}$
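As a small worked check, the chain rule and Bayes rule can be applied to the headache/flu numbers from the conditional probability slide (P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2). The helper name `bayes` is hypothetical.

```python
from fractions import Fraction

def bayes(p_a_given_b, p_b, p_a):
    """Bayes rule: P(B|A) = P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

# Numbers from the headache/flu slide, with A = H (headache) and B = F (flu).
p_h = Fraction(1, 10)
p_f = Fraction(1, 40)
p_h_given_f = Fraction(1, 2)

p_h_and_f = p_h_given_f * p_f                 # chain rule: P(H and F) = P(H|F) P(F)
p_f_given_h = bayes(p_h_given_f, p_f, p_h)    # Bayes rule: P(F|H)

print(p_h_and_f)     # 1/80
print(p_f_given_h)   # 1/8
```

So although half of all flu cases come with a headache, only one headache in eight is due to flu, because flu itself is rare.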
Back to the Naïve Bayes Classifier
14
Deriving the Naïve Bayes
15
$P(B|A) = \frac{P(A|B)\,P(B)}{P(A)}$ (Bayes Rule)
Given two classes $c_1, c_2$ and the document $d'$:
$P(c_1|d') = \frac{P(c_1)\,P(d'|c_1)}{P(d')}$
$P(c_2|d') = \frac{P(c_2)\,P(d'|c_2)}{P(d')}$
We are looking for a $c_i$ that maximizes the a posteriori probability $P(c_i|d')$.
$P(d')$ (the denominator) is the same in both cases.
Thus: $c_{MAP} = \arg\max_{c \in \mathbb{C}} P(c)\,P(d|c)$
Estimating parameters for the target function
16
We are looking for the estimates $\hat{P}(c)$ and $\hat{P}(d|c)$.
P(c) is the fraction of possible worlds where c is true:
$\hat{P}(c) = \frac{N_c}{N}$
N: number of all documents
$N_c$: number of documents in class c
d is a vector in the space $\mathbb{X}$ where each dimension is a term:
$P(d|c) = P(t_1, t_2, \ldots, t_{n_d} | c)$
By using the chain rule $P(A \land B) = P(A|B)\,P(B)$ we have:
$P(t_1, t_2, \ldots, t_{n_d} | c) = P(t_1 | t_2, \ldots, t_{n_d}, c)\,P(t_2, \ldots, t_{n_d} | c) = \ldots$
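The prior estimate is just a relative frequency over the training labels. A minimal sketch, assuming the labels are available as a plain Python list; the function name `estimate_priors` and the label list are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def estimate_priors(labels):
    """Return {class: N_c / N} from a list of training labels."""
    n = len(labels)
    return {c: Fraction(n_c, n) for c, n_c in Counter(labels).items()}

# Hypothetical label list: three documents in one class, one in the other.
priors = estimate_priors(["China", "China", "China", "not China"])
print(priors["China"], priors["not China"])   # 3/4 1/4
```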
Naïve assumptions of independence
1. All attribute values are independent of each other given
the class. (conditional independence assumption)
2. The conditional probabilities for a term are the same
independent of position in the document.
We assume the document is a “bag-of-words”.
17
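$P(d|c) = P(t_1, t_2, \ldots, t_{n_d} | c) = \prod_{1 \le k \le n_d} P(t_k|c)$
Finally, we get the target function of Slide 8:
$$c_{MAP} = \arg\max_{c \in \mathbb{C}} \hat{P}(c|d) = \arg\max_{c \in \mathbb{C}} \hat{P}(c) \prod_{1 \le k \le n_d} \hat{P}(t_k|c)$$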
Again about estimation
18
For each term, t, we need to estimate P(t|c)
$\hat{P}(t|c) = \frac{T_{ct}}{\sum_{t' \in V} T_{ct'}}$
Because an estimate will be 0 if a term does not appear with a class in the training data, we need smoothing.
Laplace Smoothing:
$\hat{P}(t|c) = \frac{T_{ct} + 1}{\sum_{t' \in V} (T_{ct'} + 1)} = \frac{T_{ct} + 1}{\left(\sum_{t' \in V} T_{ct'}\right) + |V|}$
|V| is the number of terms in the vocabulary
$T_{ct}$ is the count of term t in all documents of class c
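The Laplace-smoothed estimate can be sketched as follows. The function name `smoothed_term_probs` and the toy tokens and vocabulary are hypothetical; the formula is the add-one estimate from this slide.

```python
from collections import Counter
from fractions import Fraction

def smoothed_term_probs(class_tokens, vocabulary):
    """Return {t: (T_ct + 1) / (sum of T_ct' + |V|)} for one class c."""
    counts = Counter(class_tokens)   # T_ct for each term t seen in class c
    denom = sum(counts[t] for t in vocabulary) + len(vocabulary)
    return {t: Fraction(counts[t] + 1, denom) for t in vocabulary}

# Hypothetical toy data: all tokens of one class, and a 3-term vocabulary.
vocab = {"goal", "match", "stock"}
probs = smoothed_term_probs(["goal", "goal", "match"], vocab)

print(probs["stock"])        # 1/6: the unseen term no longer gets probability 0
print(sum(probs.values()))   # 1
```

Adding one to every count keeps the estimates a proper distribution over the vocabulary while guaranteeing that no term zeroes out the whole product.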
An Example of Classification with Naïve Bayes
19
Example 13.1 (Part 1)
20
              docID  words in document                     c = China?
Training set  1      Chinese Beijing Chinese               Yes
              2      Chinese Chinese Shanghai              Yes
              3      Chinese Macao                         Yes
              4      Tokyo Japan Chinese                   No
Test set      5      Chinese Chinese Chinese Tokyo Japan   ?
Two classes: “China” ($c$), “not China” ($\bar{c}$)
N = 4, $\hat{P}(c) = 3/4$, $\hat{P}(\bar{c}) = 1/4$
V = {Beijing, Chinese, Japan, Macao, Shanghai, Tokyo}
Example 13.1 (Part 2)
21
Estimation
$\hat{P}(\text{Chinese}|c) = (5+1)/(8+6) = 3/7$
$\hat{P}(\text{Tokyo}|c) = \hat{P}(\text{Japan}|c) = (0+1)/(8+6) = 1/14$
$\hat{P}(\text{Chinese}|\bar{c}) = (1+1)/(3+6) = 2/9$
$\hat{P}(\text{Tokyo}|\bar{c}) = \hat{P}(\text{Japan}|\bar{c}) = (1+1)/(3+6) = 2/9$
Classification
$P(c|d) \propto P(c) \prod_{1 \le k \le n_d} P(t_k|c)$
$P(c|d_5) \propto 3/4 \cdot (3/7)^3 \cdot 1/14 \cdot 1/14 \approx 0.0003$
$P(\bar{c}|d_5) \propto 1/4 \cdot (2/9)^3 \cdot 2/9 \cdot 2/9 \approx 0.0001$
The classifier therefore assigns the test document to c = China.
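The numbers in Example 13.1 can be reproduced with a short script. This is a minimal sketch rather than a full implementation: the names `train` and `score` are hypothetical, exact fractions are used so the results can be compared with the slide, and the sketch assumes every test term occurs in the training vocabulary.

```python
from collections import Counter
from fractions import Fraction

TRAIN = [
    ("Chinese Beijing Chinese", "China"),
    ("Chinese Chinese Shanghai", "China"),
    ("Chinese Macao", "China"),
    ("Tokyo Japan Chinese", "not China"),
]
TEST_DOC = "Chinese Chinese Chinese Tokyo Japan"

def train(examples):
    """Estimate class priors and Laplace-smoothed term probabilities."""
    vocab = {t for text, _ in examples for t in text.split()}
    priors, cond = {}, {}
    for c in {label for _, label in examples}:
        docs = [text for text, label in examples if label == c]
        priors[c] = Fraction(len(docs), len(examples))          # N_c / N
        counts = Counter(t for text in docs for t in text.split())
        denom = sum(counts.values()) + len(vocab)               # sum of T_ct' + |V|
        cond[c] = {t: Fraction(counts[t] + 1, denom) for t in vocab}
    return priors, cond

def score(doc, c, priors, cond):
    """Unnormalized posterior: P(c) times the product of P(t_k|c)."""
    result = priors[c]
    for t in doc.split():
        result *= cond[c][t]   # assumes t is in the training vocabulary
    return result

priors, cond = train(TRAIN)
scores = {c: score(TEST_DOC, c, priors, cond) for c in priors}
for c, s in scores.items():
    print(c, s, round(float(s), 4))
print("predicted:", max(scores, key=scores.get))
```

The script gives 81/268912 (about 0.0003) for China and 8/59049 (about 0.0001) for not China, so the test document is assigned to China: the three occurrences of "Chinese" outweigh the two negative indicators "Tokyo" and "Japan".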
Summary: Miscellaneous
22
Naïve Bayes is linear in the time it takes to scan the data.
When we have many terms, the product of probabilities will cause a floating-point underflow; therefore we sum logarithms instead:
$$c_{MAP} = \arg\max_{c \in \mathbb{C}} \Big[ \log \hat{P}(c) + \sum_{1 \le k \le n_d} \log \hat{P}(t_k|c) \Big]$$
For a large training set, the vocabulary is large. It is better to select only a subset of terms, which is done with “feature selection” (Section 13.5).
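The underflow problem and the log-space fix can be seen with a few lines of Python. The numbers below (a 2000-term document, a per-term probability of 1e-4, a prior of 0.5) are hypothetical and chosen only to force the underflow; `log_score` is a hypothetical helper name.

```python
import math

def log_score(doc_terms, log_prior, log_cond):
    """log P(c) + sum over k of log P(t_k|c), for one class c."""
    return log_prior + sum(log_cond[t] for t in doc_terms)

# Hypothetical numbers: 2000 terms, each with P(t|c) = 1e-4, and P(c) = 0.5.
log_cond = {"term": math.log(1e-4)}
doc = ["term"] * 2000

naive_product = 0.5 * (1e-4) ** 2000   # far below the smallest positive float
stable = log_score(doc, math.log(0.5), log_cond)

print(naive_product)   # 0.0
print(stable)          # large negative, but finite
```

Because the logarithm is monotonic, the class with the highest log score is the same class that would have the highest product, so the argmax is unchanged.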

Más contenido relacionado

La actualidad más candente

2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…
2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…
2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…Dongseo University
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes ClassifierArunabha Saha
 
Machine Learning Chapter 11 2
Machine Learning Chapter 11 2Machine Learning Chapter 11 2
Machine Learning Chapter 11 2butest
 
Pattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifierPattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifier108kaushik
 
Machine learning Lecture 2
Machine learning Lecture 2Machine learning Lecture 2
Machine learning Lecture 2Srinivasan R
 
Introductory maths analysis chapter 08 official
Introductory maths analysis   chapter 08 officialIntroductory maths analysis   chapter 08 official
Introductory maths analysis chapter 08 officialEvert Sandye Taasiringan
 
PAC Bayesian for Deep Learning
PAC Bayesian for Deep LearningPAC Bayesian for Deep Learning
PAC Bayesian for Deep LearningMark Chang
 
PAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep LearningPAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep LearningMark Chang
 
Linear Discriminant Analysis (LDA) Under f-Divergence Measures
Linear Discriminant Analysis (LDA) Under f-Divergence MeasuresLinear Discriminant Analysis (LDA) Under f-Divergence Measures
Linear Discriminant Analysis (LDA) Under f-Divergence MeasuresAnmol Dwivedi
 
Tutorial on Belief Propagation in Bayesian Networks
Tutorial on Belief Propagation in Bayesian NetworksTutorial on Belief Propagation in Bayesian Networks
Tutorial on Belief Propagation in Bayesian NetworksAnmol Dwivedi
 

La actualidad más candente (15)

2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…
2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…
2013-1 Machine Learning Lecture 06 - Artur Ferreira - A Survey on Boosting…
 
adaboost
adaboostadaboost
adaboost
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes Classifier
 
Bayes 6
Bayes 6Bayes 6
Bayes 6
 
Machine Learning Chapter 11 2
Machine Learning Chapter 11 2Machine Learning Chapter 11 2
Machine Learning Chapter 11 2
 
Pattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifierPattern recognition binoy 05-naive bayes classifier
Pattern recognition binoy 05-naive bayes classifier
 
Machine learning Lecture 2
Machine learning Lecture 2Machine learning Lecture 2
Machine learning Lecture 2
 
ppt
pptppt
ppt
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
 
Introductory maths analysis chapter 08 official
Introductory maths analysis   chapter 08 officialIntroductory maths analysis   chapter 08 official
Introductory maths analysis chapter 08 official
 
PAC Bayesian for Deep Learning
PAC Bayesian for Deep LearningPAC Bayesian for Deep Learning
PAC Bayesian for Deep Learning
 
PAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep LearningPAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep Learning
 
06 Machine Learning - Naive Bayes
06 Machine Learning - Naive Bayes06 Machine Learning - Naive Bayes
06 Machine Learning - Naive Bayes
 
Linear Discriminant Analysis (LDA) Under f-Divergence Measures
Linear Discriminant Analysis (LDA) Under f-Divergence MeasuresLinear Discriminant Analysis (LDA) Under f-Divergence Measures
Linear Discriminant Analysis (LDA) Under f-Divergence Measures
 
Tutorial on Belief Propagation in Bayesian Networks
Tutorial on Belief Propagation in Bayesian NetworksTutorial on Belief Propagation in Bayesian Networks
Tutorial on Belief Propagation in Bayesian Networks
 

Destacado

Overview prolog
Overview prologOverview prolog
Overview prologFraboni Ec
 
Introduction to security_and_crypto
Introduction to security_and_cryptoIntroduction to security_and_crypto
Introduction to security_and_cryptoFraboni Ec
 
Memory caching
Memory cachingMemory caching
Memory cachingFraboni Ec
 
Crypto theory practice
Crypto theory practiceCrypto theory practice
Crypto theory practiceFraboni Ec
 
Introduction toprolog
Introduction toprologIntroduction toprolog
Introduction toprologFraboni Ec
 
Data miningmaximumlikelihood
Data miningmaximumlikelihoodData miningmaximumlikelihood
Data miningmaximumlikelihoodFraboni Ec
 
Access data connection
Access data connectionAccess data connection
Access data connectionFraboni Ec
 
Key exchange in crypto
Key exchange in cryptoKey exchange in crypto
Key exchange in cryptoFraboni Ec
 
Hash mac algorithms
Hash mac algorithmsHash mac algorithms
Hash mac algorithmsFraboni Ec
 
List in webpage
List in webpageList in webpage
List in webpageFraboni Ec
 
Database concepts
Database conceptsDatabase concepts
Database conceptsFraboni Ec
 
Test driven development
Test driven developmentTest driven development
Test driven developmentFraboni Ec
 

Destacado (20)

Overview prolog
Overview prologOverview prolog
Overview prolog
 
Big data
Big dataBig data
Big data
 
Game theory
Game theoryGame theory
Game theory
 
Introduction to security_and_crypto
Introduction to security_and_cryptoIntroduction to security_and_crypto
Introduction to security_and_crypto
 
Stack queue
Stack queueStack queue
Stack queue
 
Memory caching
Memory cachingMemory caching
Memory caching
 
Gm theory
Gm theoryGm theory
Gm theory
 
Exception
ExceptionException
Exception
 
Crypto theory practice
Crypto theory practiceCrypto theory practice
Crypto theory practice
 
Introduction toprolog
Introduction toprologIntroduction toprolog
Introduction toprolog
 
Naïve bayes
Naïve bayesNaïve bayes
Naïve bayes
 
Data miningmaximumlikelihood
Data miningmaximumlikelihoodData miningmaximumlikelihood
Data miningmaximumlikelihood
 
Access data connection
Access data connectionAccess data connection
Access data connection
 
Key exchange in crypto
Key exchange in cryptoKey exchange in crypto
Key exchange in crypto
 
Hash mac algorithms
Hash mac algorithmsHash mac algorithms
Hash mac algorithms
 
List in webpage
List in webpageList in webpage
List in webpage
 
Decision tree
Decision treeDecision tree
Decision tree
 
Database concepts
Database conceptsDatabase concepts
Database concepts
 
Maven
MavenMaven
Maven
 
Test driven development
Test driven developmentTest driven development
Test driven development
 

Similar a Text classification

Joint optimization framework for learning with noisy labels
Joint optimization framework for learning with noisy labelsJoint optimization framework for learning with noisy labels
Joint optimization framework for learning with noisy labelsCheng-You Lu
 
Hands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonHands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonChun-Ming Chang
 
An introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using PythonAn introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using Pythonfreshdatabos
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程台灣資料科學年會
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2butest
 
Scala as a Declarative Language
Scala as a Declarative LanguageScala as a Declarative Language
Scala as a Declarative Languagevsssuresh
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2butest
 
Python Lab manual program for BE First semester (all department
Python Lab manual program for BE First semester (all departmentPython Lab manual program for BE First semester (all department
Python Lab manual program for BE First semester (all departmentNazeer Wahab
 
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)Pierre Schaus
 
Data.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionData.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionMargaret Wang
 
ch8Bayes.ppt
ch8Bayes.pptch8Bayes.ppt
ch8Bayes.pptImXaib
 
Introduction
IntroductionIntroduction
Introductionbutest
 
Design and Analysis of Algorithm Brute Force 1.ppt
Design and Analysis of Algorithm Brute Force 1.pptDesign and Analysis of Algorithm Brute Force 1.ppt
Design and Analysis of Algorithm Brute Force 1.pptmoiza354
 
Unit-2 Bayes Decision Theory.pptx
Unit-2 Bayes Decision Theory.pptxUnit-2 Bayes Decision Theory.pptx
Unit-2 Bayes Decision Theory.pptxavinashBajpayee1
 

Similar a Text classification (20)

Naive Bayes Presentation
Naive Bayes PresentationNaive Bayes Presentation
Naive Bayes Presentation
 
Joint optimization framework for learning with noisy labels
Joint optimization framework for learning with noisy labelsJoint optimization framework for learning with noisy labels
Joint optimization framework for learning with noisy labels
 
Hands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonHands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in Python
 
ML unit-1.pptx
ML unit-1.pptxML unit-1.pptx
ML unit-1.pptx
 
An introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using PythonAn introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using Python
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程
 
Automatic bayesian cubature
Automatic bayesian cubatureAutomatic bayesian cubature
Automatic bayesian cubature
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2
 
Scala as a Declarative Language
Scala as a Declarative LanguageScala as a Declarative Language
Scala as a Declarative Language
 
Alpaydin - Chapter 2
Alpaydin - Chapter 2Alpaydin - Chapter 2
Alpaydin - Chapter 2
 
ch8Bayes.pptx
ch8Bayes.pptxch8Bayes.pptx
ch8Bayes.pptx
 
Python Lab manual program for BE First semester (all department
Python Lab manual program for BE First semester (all departmentPython Lab manual program for BE First semester (all department
Python Lab manual program for BE First semester (all department
 
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)The Concurrent Constraint Programming Research Programmes -- Redux (part2)
The Concurrent Constraint Programming Research Programmes -- Redux (part2)
 
Data.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionData.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and prediction
 
ch8Bayes.ppt
ch8Bayes.pptch8Bayes.ppt
ch8Bayes.ppt
 
ch8Bayes.ppt
ch8Bayes.pptch8Bayes.ppt
ch8Bayes.ppt
 
Introduction
IntroductionIntroduction
Introduction
 
Design and Analysis of Algorithm Brute Force 1.ppt
Design and Analysis of Algorithm Brute Force 1.pptDesign and Analysis of Algorithm Brute Force 1.ppt
Design and Analysis of Algorithm Brute Force 1.ppt
 
Midterm
MidtermMidterm
Midterm
 
Unit-2 Bayes Decision Theory.pptx
Unit-2 Bayes Decision Theory.pptxUnit-2 Bayes Decision Theory.pptx
Unit-2 Bayes Decision Theory.pptx
 

Más de Fraboni Ec

Hardware multithreading
Hardware multithreadingHardware multithreading
Hardware multithreadingFraboni Ec
 
What is simultaneous multithreading
What is simultaneous multithreadingWhat is simultaneous multithreading
What is simultaneous multithreadingFraboni Ec
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherenceFraboni Ec
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data miningFraboni Ec
 
Big picture of data mining
Big picture of data miningBig picture of data mining
Big picture of data miningFraboni Ec
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryFraboni Ec
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching worksFraboni Ec
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cacheFraboni Ec
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithmsFraboni Ec
 
Cobol, lisp, and python
Cobol, lisp, and pythonCobol, lisp, and python
Cobol, lisp, and pythonFraboni Ec
 
Abstract data types
Abstract data typesAbstract data types
Abstract data typesFraboni Ec
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsFraboni Ec
 
Abstraction file
Abstraction fileAbstraction file
Abstraction fileFraboni Ec
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysisFraboni Ec
 
Abstract class
Abstract classAbstract class
Abstract classFraboni Ec
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with javaFraboni Ec
 

Más de Fraboni Ec (20)

Hardware multithreading
Hardware multithreadingHardware multithreading
Hardware multithreading
 
Lisp
LispLisp
Lisp
 
What is simultaneous multithreading
What is simultaneous multithreadingWhat is simultaneous multithreading
What is simultaneous multithreading
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherence
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Big picture of data mining
Big picture of data miningBig picture of data mining
Big picture of data mining
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Cache recap
Cache recapCache recap
Cache recap
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching works
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cache
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
 
Cobol, lisp, and python
Cobol, lisp, and pythonCobol, lisp, and python
Cobol, lisp, and python
 
Abstract data types
Abstract data typesAbstract data types
Abstract data types
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessors
 
Abstraction file
Abstraction fileAbstraction file
Abstraction file
 
Object model
Object modelObject model
Object model
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysis
 
Abstract class
Abstract classAbstract class
Abstract class
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with java
 
Inheritance
InheritanceInheritance
Inheritance
 

Último

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Último (20)

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Text classification

  • 1. Text Classification and Naïve Bayes. Outline: an example of text classification; definition of a machine learning problem; a refresher on probability; the Naïve Bayes classifier.
  • 3. Different ways for classification: (1) human labor (people assign categories to every incoming article); (2) hand-crafted rules for automatic classification, e.g. if article contains stock, Dow, share, Nasdaq, etc. → Business; if article contains set, breakpoint, player, Federer, etc. → Tennis; (3) machine learning algorithms.
  • 4. What is Machine Learning? Definition: A computer program is said to learn from experience E when its performance P at a task T improves with experience E. (Tom Mitchell, Machine Learning, 1997.) Examples: learning to recognize spoken words; learning to drive a vehicle; learning to play backgammon.
  • 5. Components of a ML System (1). Experience: a set of examples that combines together input and output for a task (text categorization: document + category; speech recognition: spoken text + written text). Experience is referred to as Training Data. When training data is available, we talk of Supervised Learning. Performance metrics: error or accuracy in the Test Data; Test Data are not present in the Training Data; when there are few training data, methods like ‘leave-one-out’ or ‘ten-fold cross validation’ are used to measure error.
  • 6. Components of a ML System (2). Type of knowledge to be learned (known as the target function, that will map between input and output). Representation of the target function: decision trees; neural networks; linear functions. The learning algorithm: C4.5 (learns decision trees); gradient descent (learns a neural network); linear programming (learns linear functions).
  • 7. Defining Text Classification.
      $d \in X$: the document in the multi-dimensional space $X$
      $C = \{c_1, c_2, \ldots, c_J\}$: a set of classes (categories, or labels)
      $D$: the training set of labeled documents $\langle d, c \rangle$, with $\langle d, c \rangle \in X \times C$
      Target function: $\gamma : X \to C$, with $\gamma(d) = c$
      Learning algorithm: $\Gamma(D) = \gamma$
      Example: $\langle d, c \rangle = \langle$“Beijing joins the World Trade Organization”, China$\rangle$, so $\gamma(d) =$ China.
  • 8. Naïve Bayes Learning.
      Learning algorithm: Naïve Bayes. Target function: $\gamma(d) = c$, with
      $c_{MAP} = \arg\max_{c \in C} P(c \mid d) = \arg\max_{c \in C} P(c)\, P(d \mid c)$
      $c_{MAP} = \arg\max_{c \in C} \hat{P}(c \mid d) = \arg\max_{c \in C} \hat{P}(c) \prod_{1 \le k \le n_d} \hat{P}(t_k \mid c)$
      The generative process:
      $P(c)$: a priori probability of choosing a category
      $P(d \mid c)$: the conditional probability of generating d, given the fixed c
      $P(c \mid d)$: a posteriori probability that c generated d
  • 9. A Refresher on Probability
  • 10. Visualizing probability. A is a random variable that denotes an uncertain event. Example: A = “I’ll get an A+ in the final exam”. P(A) is “the fraction of possible worlds where A is true”. [Figure: the event space of all possible worlds, with area 1, split into worlds in which A is true (blue circle) and worlds in which A is false; P(A) = area of the blue circle. Slide: Andrew W. Moore]
  • 11. Axioms and Theorems of Probability. Axioms: 0 <= P(A) <= 1; P(True) = 1; P(False) = 0; P(A or B) = P(A) + P(B) - P(A and B). Theorems: P(not A) = P(~A) = 1 - P(A); P(A) = P(A ^ B) + P(A ^ ~B).
  • 12. Conditional Probability. P(A|B) = the probability of A being true, given that we know that B is true. H = “I have a headache”, F = “Coming down with flu”. P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. Headaches are rare and flu even rarer, but if you got that flu, there is a 50-50 chance you’ll have a headache. [Figure: overlapping regions F and H. Slide: Andrew W. Moore]
  • 13. Deriving the Bayes Rule.
      Conditional probability: $P(A \mid B) = \dfrac{P(A \wedge B)}{P(B)}$
      Chain rule: $P(A \wedge B) = P(A \mid B)\, P(B)$
      Symmetry: $P(A \wedge B) = P(B \wedge A) = P(B \mid A)\, P(A)$
      Bayes Rule: $P(B \mid A) = \dfrac{P(A \mid B)\, P(B)}{P(A)}$
  • 14. Back to the Naïve Bayes Classifier
  • 15. Deriving the Naïve Bayes.
      Bayes Rule: $P(B \mid A) = \dfrac{P(A \mid B)\, P(B)}{P(A)}$
      Given two classes $c_1, c_2$ and the document $d'$:
      $P(c_1 \mid d') = \dfrac{P(c_1)\, P(d' \mid c_1)}{P(d')}$
      $P(c_2 \mid d') = \dfrac{P(c_2)\, P(d' \mid c_2)}{P(d')}$
      We are looking for a $c_i$ that maximizes the a-posteriori $P(c_i \mid d')$. $P(d')$ (the denominator) is the same in both cases.
      Thus: $c_{MAP} = \arg\max_{c \in C} P(c)\, P(d \mid c)$
  • 16. Estimating parameters for the target function.
      We are looking for the estimates $\hat{P}(c)$ and $\hat{P}(d \mid c)$.
      P(c) is the fraction of possible worlds where c is true:
      $\hat{P}(c) = \dfrac{N_c}{N}$
      N: number of all documents; $N_c$: number of documents in class c.
      d is a vector in the space X where each dimension is a term:
      $P(d \mid c) = P(t_1, t_2, \ldots, t_{n_d} \mid c)$
      By using the chain rule $P(A \wedge B) = P(A \mid B)\, P(B)$ we have:
      $P(t_1, t_2, \ldots, t_{n_d} \mid c) = P(t_1 \mid t_2, \ldots, t_{n_d}, c)\, P(t_2, \ldots, t_{n_d} \mid c) = \ldots$
  • 17. Naïve assumptions of independence.
      1. All attribute values are independent of each other given the class (conditional independence assumption).
      2. The conditional probabilities for a term are the same, independent of position in the document. We assume the document is a “bag-of-words”.
      $P(d \mid c) = P(t_1, t_2, \ldots, t_{n_d} \mid c) = \prod_{1 \le k \le n_d} P(t_k \mid c)$
      Finally, we get the target function of Slide 8:
      $c_{MAP} = \arg\max_{c \in C} \hat{P}(c \mid d) = \arg\max_{c \in C} \hat{P}(c) \prod_{1 \le k \le n_d} \hat{P}(t_k \mid c)$
  • 18. Again about estimation.
      For each term t, we need to estimate P(t|c):
      $\hat{P}(t \mid c) = \dfrac{T_{ct}}{\sum_{t' \in V} T_{ct'}}$
      Because an estimate will be 0 if a term does not appear with a class in the training data, we need smoothing.
      Laplace smoothing: $\hat{P}(t \mid c) = \dfrac{T_{ct} + 1}{\sum_{t' \in V} (T_{ct'} + 1)} = \dfrac{T_{ct} + 1}{\left(\sum_{t' \in V} T_{ct'}\right) + |V|}$
      |V| is the number of terms in the vocabulary; $T_{ct}$ is the count of term t in all documents of class c.
  • 19. An Example of classification with Naïve Bayes
  • 20. Example 13.1 (Part 1).
      Training set:
        docID 1: Chinese Beijing Chinese (c = China? Yes)
        docID 2: Chinese Chinese Shanghai (c = China? Yes)
        docID 3: Chinese Macao (c = China? Yes)
        docID 4: Tokyo Japan Chinese (c = China? No)
      Test set:
        docID 5: Chinese Chinese Chinese Tokyo Japan (c = China? ?)
      Two classes: “China” ($c$), “not China” ($\bar{c}$). N = 4.
      $\hat{P}(c) = 3/4$, $\hat{P}(\bar{c}) = 1/4$
      V = {Beijing, Chinese, Japan, Macao, Shanghai, Tokyo}
  • 21. Example 13.1 (Part 2). Training and test sets as on slide 20.
      Estimation:
      $\hat{P}(\text{Chinese} \mid c) = (5 + 1)/(8 + 6) = 3/7$
      $\hat{P}(\text{Tokyo} \mid c) = \hat{P}(\text{Japan} \mid c) = (0 + 1)/(8 + 6) = 1/14$
      $\hat{P}(\text{Chinese} \mid \bar{c}) = (1 + 1)/(3 + 6) = 2/9$
      $\hat{P}(\text{Tokyo} \mid \bar{c}) = \hat{P}(\text{Japan} \mid \bar{c}) = (1 + 1)/(3 + 6) = 2/9$
      Classification, using $P(c \mid d) \propto P(c) \prod_{1 \le k \le n_d} P(t_k \mid c)$:
      $P(c \mid d_5) \propto 3/4 \cdot (3/7)^3 \cdot 1/14 \cdot 1/14 \approx 0.0003$
      $P(\bar{c} \mid d_5) \propto 1/4 \cdot (2/9)^3 \cdot 2/9 \cdot 2/9 \approx 0.0001$
  • 22. Summary: Miscellaneous.
      Naïve Bayes is linear in the time it takes to scan the data.
      When we have many terms, the product of probabilities will cause a floating point underflow, therefore we maximize the sum of logarithms instead:
      $c_{MAP} = \arg\max_{c \in C} \left[ \log \hat{P}(c) + \sum_{1 \le k \le n_d} \log \hat{P}(t_k \mid c) \right]$
      For a large training set, the vocabulary is large. It is better to select only a subset of terms; “feature selection” (Section 13.5) is used for that.
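
The estimation and classification steps of slides 16 to 22 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the deck's own code: the function names (`train_multinomial_nb`, `log_scores`) and the data layout are hypothetical, and tokens are obtained by simple whitespace splitting. The vocabulary is built from the training documents, which gives six terms and matches the 8 + 6 and 3 + 6 denominators of Example 13.1. Scores are summed in log space, as slide 22 recommends.

```python
import math
from collections import Counter


def train_multinomial_nb(docs):
    """Estimate priors and Laplace-smoothed term probabilities.

    docs is a list of (tokens, label) pairs (an assumed layout).
    """
    vocab = sorted({t for tokens, _ in docs for t in tokens})
    docs_per_class = Counter(label for _, label in docs)
    term_counts = {c: Counter() for c in docs_per_class}
    for tokens, label in docs:
        term_counts[label].update(tokens)

    # P(c) = N_c / N
    priors = {c: n / len(docs) for c, n in docs_per_class.items()}

    # P(t|c) = (T_ct + 1) / (sum_t' T_ct' + |V|)
    cond = {}
    for c, counts in term_counts.items():
        total = sum(counts.values())
        cond[c] = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
    return priors, cond, vocab


def log_scores(tokens, priors, cond, vocab):
    """Return log P(c) + sum_k log P(t_k|c) for every class."""
    known = set(vocab)
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for t in tokens:
            if t in known:  # terms unseen in training are skipped
                score += math.log(cond[c][t])
        scores[c] = score
    return scores


# Example 13.1: four training documents and one test document.
train_docs = [
    ("Chinese Beijing Chinese".split(), "China"),
    ("Chinese Chinese Shanghai".split(), "China"),
    ("Chinese Macao".split(), "China"),
    ("Tokyo Japan Chinese".split(), "not China"),
]
test_doc = "Chinese Chinese Chinese Tokyo Japan".split()

priors, cond, vocab = train_multinomial_nb(train_docs)
scores = log_scores(test_doc, priors, cond, vocab)
predicted = max(scores, key=scores.get)

print(predicted)
for c, s in scores.items():
    print(c, round(math.exp(s), 4))
```

Exponentiating the log scores recovers the unnormalized products from slide 21, so the sketch can be checked against the hand calculation. Skipping test terms that never occur in training is one possible design choice; the slides do not say how such terms should be handled.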

Editor's notes

  1. Q: How does this definition differ from the way we describe other types of computer programs? A: For other programs we do not speak about experience, just about the task and the performance criteria. Q: If the task T is speech recognition, what would E and P be? A: E would be examples of spoken text, i.e., the computer has the written text and, while someone speaks, it matches the written words to the spoken words. P (performance) would be the number of words that the computer recognizes correctly.
  2. We give the target function at the beginning, but we say that we will explain later how this formula is derived (after the refresher on probability). Give the example of selecting topics for the class project, that is, selecting c. Then, given c, the choice of d is conditional: P(d|c).
  3. It is clear that calculating all the parameters that derive from the application of the chain rule is infeasible. Therefore, we need the naïve assumptions of independence on the next slide.
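
The infeasibility mentioned in note 3 can be made concrete by counting parameters. The sketch below is only a rough illustration under an assumed simplification: every document has the same fixed length, and the unrestricted model assigns one probability to every possible term sequence per class. The function names are hypothetical.

```python
def full_sequence_params(vocab_size, doc_length, n_classes):
    """Free parameters when every term sequence gets its own probability.

    Each class needs a distribution over vocab_size ** doc_length
    sequences, which has one fewer free parameter than outcomes.
    """
    return n_classes * (vocab_size ** doc_length - 1)


def naive_bayes_params(vocab_size, n_classes):
    """Free parameters for the term distributions under the naive assumptions.

    Each class needs a single distribution over the vocabulary,
    shared across all positions in the document.
    """
    return n_classes * (vocab_size - 1)


# Sizes taken from Example 13.1: 6 terms, a 5-term test document, 2 classes.
print(full_sequence_params(6, 5, 2))
print(naive_bayes_params(6, 2))
```

Even for this toy vocabulary the unrestricted model needs thousands of parameters while the naive model needs ten, and the gap grows exponentially with document length.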