MUSA AL-HAWAMDAH
  128129001011
   Bayesian Decision Theory came long before Version Spaces, Decision Tree Learning and Neural Networks. It was studied in the field of Statistical Theory and, more specifically, in the field of Pattern Recognition.
   Bayesian Decision Theory forms the basis of important learning schemes such as the Naïve Bayes Classifier, Learning Bayesian Belief Networks and the EM Algorithm.
   Bayesian Decision Theory is also useful because it provides a framework within which many non-Bayesian classifiers can be studied.
   Bayesian reasoning is applied to decision making and to inferential statistics dealing with probability inference. It uses knowledge of prior events to predict future events.

   Example: predicting the color of marbles in a basket.
   The Bayes Theorem:

       P(h/D) = P(D/h) * P(h) / P(D)

   P(h) : Prior probability of hypothesis h
   P(D) : Prior probability of training data D
   P(h/D) : Probability of h given D (the posterior)
   P(D/h) : Probability of D given h (the likelihood)
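
As an illustration (not part of the original slides), the theorem maps directly onto a one-line Python function; the argument names mirror the terms defined above, and the sample numbers are simply the figures reused in the customer example later on.

    def posterior(prior_h, likelihood_d_given_h, evidence_d):
        """Bayes' theorem: P(h/D) = P(D/h) * P(h) / P(D)."""
        return likelihood_d_given_h * prior_h / evidence_d

    # Sample call with P(h) = 0.5, P(D/h) = 0.6, P(D) = 0.4; prints 0.75.
    print(posterior(0.5, 0.6, 0.4))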
   D : a 35-year-old customer with an income of $50,000 per annum.

   h : Hypothesis that our customer will buy our computer.

   P(h/D) : Probability that customer D will buy our computer, given that we know his age and income.

   P(h) : Probability that any customer will buy our computer, regardless of age and income (Prior Probability).

   P(D/h) : Probability that the customer is 35 yrs old and earns $50,000, given that he has bought our computer (Likelihood).

   P(D) : Probability that a person from our set of customers is 35 yrs old and earns $50,000.
Example:
   h1 : Customer buys a computer = Yes
   h2 : Customer buys a computer = No
       where h1 and h2 are hypotheses from our Hypothesis Space 'H'.

   Final Outcome = arg max { P(D/h1) P(h1), P(D/h2) P(h2) }

   P(D) can be ignored as it is the same for both terms.
   Theory:

    Generally we want the most probable hypothesis given the training data:
    hMAP = arg max P(h/D)   (where h belongs to H, the hypothesis space)

   If we assume all hypotheses are equally probable a priori, i.e. P(hi) = P(hj) for every i and j, the prior can be dropped and the rule simplifies further to the maximum likelihood hypothesis:

    hML = arg max P(D/hi)   (where hi belongs to H)
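
A minimal sketch of the distinction, assuming a toy hypothesis space with made-up priors and likelihoods: hMAP maximizes P(D/h) * P(h), while hML drops the prior.

    # Each entry: (hypothesis name, prior P(h), likelihood P(D/h)); values are illustrative.
    hypotheses = [("h1", 0.5, 0.6), ("h2", 0.5, 0.2)]

    h_map = max(hypotheses, key=lambda h: h[2] * h[1])   # arg max P(D/h) * P(h)
    h_ml  = max(hypotheses, key=lambda h: h[2])          # arg max P(D/h)

    print(h_map[0], h_ml[0])   # with equal priors both select "h1"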
   P (buys computer = yes) = 5/10 = 0.5
   P (buys computer = no) = 5/10 = 0.5
   P (customer is 35 yrs & earns $50,000) =
    4/10 = 0.4
   P (customer is 35 yrs & earns $50,000 / buys
    computer = yes) = 3/5 =0.6
   P (customer is 35 yrs & earns $50,000 / buys
    computer = no) = 1/5 = 0.2
   Customer buys a computer: P(h1/D) = P(h1) * P(D/h1) / P(D) = 0.5 * 0.6 / 0.4 = 0.75

   Customer does not buy a computer: P(h2/D) = P(h2) * P(D/h2) / P(D) = 0.5 * 0.2 / 0.4 = 0.25

   Final Outcome = arg max { P(h1/D), P(h2/D) } = max(0.75, 0.25)

=> Customer buys a computer
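
The same arithmetic as a short Python sketch, using exactly the figures listed above; dividing by P(D) rescales both posteriors but does not change which hypothesis wins.

    p_d = 0.4                       # P(customer is 35 yrs & earns $50,000)
    p_h1, p_d_given_h1 = 0.5, 0.6   # buys computer = yes
    p_h2, p_d_given_h2 = 0.5, 0.2   # buys computer = no

    p_h1_given_d = p_h1 * p_d_given_h1 / p_d   # 0.75
    p_h2_given_d = p_h2 * p_d_given_h2 / p_d   # 0.25

    outcome = "buys" if p_h1_given_d > p_h2_given_d else "does not buy"
    print(p_h1_given_d, p_h2_given_d, outcome)   # 0.75 0.25 buys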
Naïve Bayesian Classification
   It is based on the Bayes theorem and is particularly suited to problems where the dimensionality of the inputs is high. Parameter estimation for naive Bayes models uses the method of maximum likelihood. In spite of its over-simplified assumptions, it often performs well in many complex real-world situations.

   Advantage: it requires only a small amount of training data to estimate the parameters.
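
As a hedged sketch of what maximum likelihood parameter estimation means for categorical attributes: the class priors and the class-conditional probabilities are simply relative frequencies counted from the training tuples. The four tuples below are invented for the sketch; they are not the 14-row buys_computer table used in the worked example that follows.

    from collections import Counter, defaultdict

    # Hypothetical training tuples: (attribute dict, class label).
    data = [
        ({"age": "youth",  "student": "yes"}, "yes"),
        ({"age": "youth",  "student": "no"},  "no"),
        ({"age": "senior", "student": "yes"}, "yes"),
        ({"age": "senior", "student": "no"},  "yes"),
    ]

    class_counts = Counter(label for _, label in data)
    cond_counts = defaultdict(Counter)
    for x, label in data:
        for attr, value in x.items():
            cond_counts[(attr, label)][value] += 1

    # Maximum-likelihood estimates are plain relative frequencies.
    p_class = {c: n / len(data) for c, n in class_counts.items()}
    p_cond = {key: {v: n / class_counts[key[1]] for v, n in counts.items()}
              for key, counts in cond_counts.items()}

    print(p_class)                      # {'yes': 0.75, 'no': 0.25}
    print(p_cond[("student", "yes")])   # per-value frequencies among the 'yes' tuples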
   X = (age = youth, income = medium, student = yes, credit_rating = fair)

   Will a person described by tuple X buy a computer?
   Derivation:
   D : set of training tuples
           ** Each tuple is an 'n'-dimensional attribute vector
           ** X : (x1, x2, x3, ... xn)
   Let there be 'm' classes: C1, C2, C3, ... Cm
   The Naïve Bayes classifier predicts that X belongs to class Ci iff
           ** P(Ci/X) > P(Cj/X) for 1 <= j <= m, j ≠ i
   Maximum A Posteriori Hypothesis:
           ** P(Ci/X) = P(X/Ci) P(Ci) / P(X)
           ** Maximize P(X/Ci) P(Ci), since P(X) is constant across classes
   With many attributes, it is computationally expensive to evaluate P(X/Ci) directly.
   Naïve assumption of "class conditional independence"
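
Under this assumption the class-conditional probability factorizes into a product of per-attribute terms, which is what makes the evaluation tractable (this is the standard naive Bayes factorization, written in the notation used above; it does not appear explicitly on the slides):

       P(X/Ci) = P(x1/Ci) * P(x2/Ci) * ... * P(xn/Ci)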
   P(C1) = P(buys_computer = yes) = 9/14 =0.643
   P(C2) = P(buys_computer = no) = 5/14= 0.357
   P(age=youth /buys_computer = yes) = 2/9 =0.222
   P(age=youth /buys_computer = no) = 3/5 =0.600
   P(income=medium /buys_computer = yes) = 4/9 =0.444
   P(income=medium /buys_computer = no) = 2/5 =0.400
   P(student=yes /buys_computer = yes) = 6/9 =0.667
   P(student=yes/buys_computer = no) = 1/5 =0.200
   P(credit rating=fair /buys_computer = yes) = 6/9 =0.667
   P(credit rating=fair /buys_computer = no) = 2/5 =0.400
   P(X/Buys a computer = yes) = P(age=youth
    /buys_computer = yes) * P(income=medium
    /buys_computer = yes) * P(student=yes
    /buys_computer = yes) * P(credit rating=fair
    /buys_computer = yes) = 0.222 * 0.444 *
    0.667 * 0.667 = 0.044

    P(X/Buys a computer = No) = 0.600 * 0.400 *
    0.200 * 0.400 = 0.019
   Find class Ci that Maximizes P(X/Ci) * P(Ci)

    =>P(X/Buys a computer = yes) *
    P(buys_computer = yes) = 0.028
    =>P(X/Buys a computer = No) *
    P(buys_computer = no) = 0.007



   Prediction : Buys a computer for Tuple X
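
A short Python sketch that reproduces the arithmetic above; the priors and class-conditional probabilities are copied from the slides, and the dictionary keys simply mirror tuple X (this reproduces the worked example rather than offering a general implementation).

    # Priors and class-conditional probabilities taken from the slides.
    priors = {"yes": 0.643, "no": 0.357}
    cond = {
        "yes": {"age=youth": 0.222, "income=medium": 0.444,
                "student=yes": 0.667, "credit_rating=fair": 0.667},
        "no":  {"age=youth": 0.600, "income=medium": 0.400,
                "student=yes": 0.200, "credit_rating=fair": 0.400},
    }
    x = ["age=youth", "income=medium", "student=yes", "credit_rating=fair"]

    scores = {}
    for c in priors:
        likelihood = 1.0
        for attr in x:
            likelihood *= cond[c][attr]     # naive factorization of P(X/Ci)
        scores[c] = likelihood * priors[c]  # P(X/Ci) * P(Ci)

    print(scores)                           # roughly {'yes': 0.028, 'no': 0.007}
    print(max(scores, key=scores.get))      # -> 'yes': tuple X is predicted to buy a computer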
Thank you
