633-600 Machine Learning

• Instructor: Yoonsuck Choe
    – Contact info: HRBB 322B, 845-5466, choe@tamu.edu
• Course web page: http://courses.cs.tamu.edu/choe/09spring/633

Textbook

• Tom M. Mitchell (1997) “Machine Learning”, McGraw-Hill.
• Book webpage: http://www.cs.cmu.edu/~tom/mlbook.html
• Text and figures, etc. will be quoted from the textbook without
  repeated acknowledgment. Instructor’s perspective will be
  indicated by “YC” where appropriate.




Course Info

• Grading: 3 assignments (including written and programming
  components), 18% each = 54%; 2 exams (in class), 15% each =
  30%; Mini project 16%; No curving: ≥ 90 = A, ≥ 80 = B, etc.
• Academic integrity: individual work, unless otherwise indicated;
  proper references should be given in case online/offline resources
  are used.
• Students with disabilities: see syllabus.
• Lecture notes: check course web page for updates. Download,
  print, and bring to the class.
• Computer accounts: talk to CS helpdesk (HRBB 210).
• Programming: Matlab, or better yet, Octave
  (http://www.octave.org). C/C++, Java, etc. (they should run on CS
  Unix or Windows)

Relation to Other Courses

Some overlaps:

• Neural Networks: perceptrons, backpropagation, etc.
• Pattern analysis: Bayesian learning, instance-based learning
• Artificial intelligence: decision trees (in some courses)
• Statistics: hypothesis testing
• (Relatively) unique to this course: concept learning,
  computational learning theory, genetic algorithms, reinforcement
  learning, decision trees (in-depth treatment)

ML Overview (I)

  • How can machines (computers) learn?
    How can machines improve automatically with experience?
  • Benefits:
      – Improved performance
      – Automated optimization
      – New uses of computers
      – Reduced programming (YC)
      – Insights into human learning and learning disabilities

ML Overview (II)

  • Current status: still an unsolved problem.
      – Theoretical insights emerging.
      – Practical applications.
      – Huge data volume demands ML, and provides opportunities for
        ML (data mining).
  • State of the art:
      – speech recognition
      – medical predictions
      – fraud detection
      – driving autonomous vehicles (highway and desert)
      – board games (backgammon, chess)
      – theoretical bounds on error, number of inputs needed, etc.




ML Overview (III)

Multidisciplinary roots:

  • AI
  • probability and statistics
  • computational complexity theory
  • control theory
  • information theory
  • philosophy
  • psychology
  • neurobiology

Well-Posed Learning Problem

A program is said to learn from

  • experience E with respect to
  • task T and
  • performance measure P,
  • if its performance at T, as measured by P, improves with E.

Examples: Playing checkers, Handwriting recognition, Robot driving,
etc.

Goal of ML: “define precisely a class of problems that encompasses
interesting forms of learning [but not all: YC], to explore algorithms that
solve such problems, and to understand the fundamental structure of
learning problems and processes” (Mitchell, 1997)

Designing a Learning System (I)

Training experience:

  • direct vs. indirect (problem of credit assignment)
  • degree of control over training examples (teacher-dependent or
    learner-generated)
  • closeness of training example distribution to true distribution over
    which P is measured: in many cases, ML algorithms assume that
    both distributions are similar, which may not be the case in
    practice.

Designing a Learning System (II)

Remaining design choices:

  • Exact type of knowledge to be learned.
  • A representation for this target knowledge.
  • A learning mechanism.
  • Functional/operational principle giving rise to the learning
    mechanism (YC)




Design: Target Function (I)

Type of knowledge to be learned: for example, we want to learn the best
move in a board game.

  • Can represent as a function ($B$: board states, $M$: moves):

        $\mathrm{ChooseMove} : B \to M$,

    but it is hard to learn directly.

Design: Target Function (II)

  • Another function ($B$: board states, $\mathbb{R}$: real numbers):

        $V : B \to \mathbb{R}$,

    which gives the evaluation of each board state.
      – $V(b = \mathrm{win}) = 100$
      – $V(b = \mathrm{lose}) = -100$
      – $V(b = \mathrm{draw}) = 0$
      – $V(b = \mathrm{otherwise}) = V(b')$, where $b'$ is the best final
        board state that can be reached from $b$.
      – However, this is not efficiently computable, i.e., it is a
        nonoperational definition.
      – Goal of ML is to find an operational description of $V$;
        in practice, however, an approximation is all we can get.
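To see why the recursive definition of the target function is nonoperational, a minimal sketch may help. It assumes a toy game that is not from the slides (one pile of stones, each player removes one or two, and whoever takes the last stone wins) and computes the value of a state by searching all the way to the end of the game. The names `true_value`, `WIN`, and `LOSS` are illustrative only.

```python
from functools import lru_cache

WIN, LOSS = 100, -100  # values of final states, as on the slide (no draws in this toy game)


@lru_cache(maxsize=None)
def true_value(stones: int, our_turn: bool) -> int:
    """Value of a toy-game state, found by exhaustive search to the end of the game."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return LOSS if our_turn else WIN
    outcomes = [
        true_value(stones - take, not our_turn)
        for take in (1, 2)
        if take <= stones
    ]
    # Optimal play: we pick the best outcome, the opponent picks the worst for us.
    return max(outcomes) if our_turn else min(outcomes)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, true_value(n, True))
```

Memoization keeps this toy tractable because it has only a handful of states. The same exhaustive search over checkers boards would be far too expensive, which is the motivation for learning an approximation instead.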

Design: Representation for Target Function

Given an ideal target function $V$, we want to learn an approximate
function $\hat{V}$:

  • Trade-off between rich and parsimonious representation.
  • Example: $\hat{V}$ as a linear combination of number of pieces, number
    of particular relational situations in the board (e.g., threatened),
    etc. (represented as $x_i$) in board configuration $b$:

        $\hat{V}(b) = w_0 + \sum_{i=1}^{n} w_i x_i$,

    where $w_i$ are the weight values to be learned.
  • Advantage of the above representation: reduction of scope (or
    dimensionality) from the original problem.

Design: Function Approximation Algorithm

Given board states and the true $V$, we want to learn the weights $w_i$ that
specify $\hat{V}$.

  • Start with a large set of input-target pairs
    $\langle b, V_{\mathrm{train}}(b) \rangle$.
  • Problem: cannot come up with a full set of
    $\langle b, V_{\mathrm{train}}(b) \rangle$ pairs.
  • Solution: If $V_{\mathrm{train}}(b)$ is unknown, set it to the estimated
    $\hat{V}$ of its successor board state:

        $V_{\mathrm{train}}(b) = \hat{V}(\mathrm{Successor}(b))$.
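The linear evaluation function and the rule for estimating training values can be sketched in a few lines. This is only an illustration under stated assumptions: boards are represented directly by their feature lists, the bias weight is stored at index 0, and the function names `v_hat` and `training_value` are hypothetical rather than taken from the textbook.

```python
from typing import Optional, Sequence


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation: w0 + sum_i w_i * x_i, with weights[0] holding w0."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def training_value(
    final_value: Optional[float],
    successor_features: Optional[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Training value for a board.

    If the board is final, its outcome (100, -100, or 0) is known and used
    directly. Otherwise the current estimate for the successor board is used.
    """
    if final_value is not None:
        return final_value
    if successor_features is None:
        raise ValueError("a non-final board needs a successor")
    return v_hat(weights, successor_features)


if __name__ == "__main__":
    w = [1.0, 2.0, -1.0]                      # w0, w1, w2 (arbitrary example values)
    print(v_hat(w, [3.0, 4.0]))               # evaluation of a board with x1=3, x2=4
    print(training_value(None, [3.0, 4.0], w))
    print(training_value(100.0, None, w))
```

Because the training value of an intermediate board depends on the current weights, the targets themselves shift as learning proceeds.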




Design: Adjusting Weights (I)

Last component in defining a learning algorithm: adjustment of
weights.

  • Want to learn weights $w_i$ that best fit the set of training samples
    $\{\langle b, V_{\mathrm{train}}(b) \rangle\}$.
  • How to define best fit? Once we have $\hat{V}$ we can calculate
    $\hat{V}(b)$ for all $b$ in the training set, and calculate the error:

        $E \equiv \sum_{\langle b, V_{\mathrm{train}}(b) \rangle \in \text{training set}} \left( V_{\mathrm{train}}(b) - \hat{V}(b) \right)^2$

  • How to reduce $E$?

Design: Adjusting Weights (II)

Least Mean Squares (LMS) learning rule: For each training example
$\langle b, V_{\mathrm{train}}(b) \rangle$,

  • Use the current weights to calculate $\hat{V}(b)$.
  • For each weight $w_i$, update it as

        $w_i \leftarrow w_i + \eta \left( V_{\mathrm{train}}(b) - \hat{V}(b) \right) x_i$,

    where $\eta$ is a small learning rate constant.
  • The error $V_{\mathrm{train}}(b) - \hat{V}(b)$ and the input $x_i$ both contribute to
    the weight update.
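The squared-error measure and the LMS rule can be tried out on a small synthetic problem. The sketch below is a hedged illustration, not the course's reference code: the function names are hypothetical, the bias weight is updated by treating its input as a constant 1 (an assumption, since the slide only states the rule for $w_i$ with input $x_i$), and the training data are made up.

```python
from typing import List, Sequence, Tuple

# One training example: (features x_1..x_n of a board, training value for that board)
Example = Tuple[Sequence[float], float]


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation: w0 + sum_i w_i * x_i, with weights[0] holding w0."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def squared_error(weights: Sequence[float], examples: Sequence[Example]) -> float:
    """Sum of squared differences between training values and current estimates."""
    return sum((target - v_hat(weights, x)) ** 2 for x, target in examples)


def lms_update(weights: Sequence[float], features: Sequence[float],
               target: float, eta: float = 0.1) -> List[float]:
    """One LMS step: w_i <- w_i + eta * (target - estimate) * x_i."""
    error = target - v_hat(weights, features)
    inputs = [1.0] + list(features)  # assumption: x_0 = 1 so w0 follows the same rule
    return [w + eta * error * x for w, x in zip(weights, inputs)]


def train(weights: Sequence[float], examples: Sequence[Example],
          eta: float = 0.1, epochs: int = 200) -> List[float]:
    """Sweep over the training set repeatedly, applying the LMS rule per example."""
    current = list(weights)
    for _ in range(epochs):
        for features, target in examples:
            current = lms_update(current, features, target, eta)
    return current


if __name__ == "__main__":
    # Made-up data generated from the line 2 + 3*x, so a perfect fit exists.
    data = [([0.0], 2.0), ([1.0], 5.0), ([2.0], 8.0)]
    start = [0.0, 0.0]
    learned = train(start, data)
    print(squared_error(start, data), squared_error(learned, data), learned)
```

Each step moves the weights in proportion to both the error and the input, so weights attached to features that are zero for a given board are left unchanged by that example.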

Final Design

Putting together the system (checkers player):

  • Performance system: input = problem, output = solution trace =
    game history (using what is learned so far)
  • Critic: input = solution trace, output = training examples
    (estimated $V_{\mathrm{train}}(b)$)
  • Generalizer: input = training examples, output = estimated
    hypothesis $\hat{V}$ (i.e., learned weights $w_i$)
  • Experiment generator: input = hypothesis $\hat{V}$, output = new
    problem (new initial condition, to explore particular regions)

Alternatives (I)

  • Training experience: against experts, against self, table of correct
    moves, ...
  • Target function: board → move, board → value, ...
  • Representation of target function: polynomial, linear function of
    a small number of features, artificial neural network
  • Learning algorithm: gradient descent, linear programming, ...
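The data flow among the four modules of the final design (performance system, critic, generalizer, experiment generator) can be written as a small loop. This is a skeleton only: the modules are passed in as plain functions, the name `learning_loop` is hypothetical, and handing the current hypothesis to the critic is an assumption made so that it can estimate training values. The stand-in modules in the usage section are trivial placeholders, not a checkers player.

```python
from typing import Any, Callable, List, Tuple

Hypothesis = Any
Problem = Any
Trace = List[Any]
Examples = List[Tuple[Any, float]]


def learning_loop(
    experiment_generator: Callable[[Hypothesis], Problem],
    performance_system: Callable[[Problem, Hypothesis], Trace],
    critic: Callable[[Trace, Hypothesis], Examples],
    generalizer: Callable[[Examples, Hypothesis], Hypothesis],
    hypothesis: Hypothesis,
    rounds: int,
) -> Hypothesis:
    """Run the four modules in sequence for a fixed number of rounds."""
    for _ in range(rounds):
        problem = experiment_generator(hypothesis)       # new initial condition
        trace = performance_system(problem, hypothesis)  # game history
        examples = critic(trace, hypothesis)             # estimated training values
        hypothesis = generalizer(examples, hypothesis)   # updated hypothesis
    return hypothesis


if __name__ == "__main__":
    # Placeholder modules: the "hypothesis" is a single number that each round
    # nudges halfway toward a fixed target of 10.0.
    result = learning_loop(
        experiment_generator=lambda h: h,
        performance_system=lambda problem, h: [problem],
        critic=lambda trace, h: [(state, 10.0) for state in trace],
        generalizer=lambda examples, h: h + 0.5 * (examples[-1][1] - h),
        hypothesis=0.0,
        rounds=3,
    )
    print(result)
```

Keeping the modules separate makes it easy to swap in the alternatives listed for training experience, representation, or learning algorithm without changing the loop.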




Alternatives (II)

  • Memorize (instance-based learning)
  • Spawn a population and make them compete with each other
    (genetic algorithms)
  • Analyze and reason about things

Perspectives on ML: Hypothesis Space Search

  • Useful to think of ML as searching a very large space of
    possible hypotheses to best fit the data and the learner’s prior
    knowledge.
  • For example, the hypothesis space for $\hat{V}$ would be all possible
    $\hat{V}$s with different weight assignments.
  • Useful concepts regarding hypothesis space search:
      – Size of hypothesis space
      – Number of training examples available/needed.
      – Confidence in generalizing to new unseen data.

Issues in ML

• What algorithms exist for generalizable learners given a specific
  training set? Requirements for convergence? Which algorithms
  are best for a particular domain?
• How much training data is needed? Bounds on confidence, based
  on data size? How long to train?
• Use of prior knowledge?
• How to choose the best training experience? Impact of the choice?
• How to reduce an ML problem to function approximation?
• How can the learner alter the representation itself?

Classification of learning algorithms (YC)

• Supervised learning: input-target pairs given.
• Unsupervised learning: only input distribution is given.
• Reinforcement learning: sparse reward signal is given for action
  based on sensory input; environment-altering actions.




                 Broader questions (YC)

• Can machines themselves formulate their own learning tasks?
    – Can they come up with their own representations?

    – Can they come up with their own learning strategy?

    – Can they come up with their own motivation?

    – Can they come up with their own questions/problems?

• What if the machines are faced with multiple, possibly conflicting
  tasks? Can there be a meta learning algorithm?

• What if performance is hard to measure (i.e., hard to quantify, or
  even worse, subjective)?

• Lesson: think outside the box; question the questions themselves.


More Related Content

Similar to 633-600 Machine Learning

Artificial Intelligence by B. Ravikumar
Artificial Intelligence by B. RavikumarArtificial Intelligence by B. Ravikumar
Artificial Intelligence by B. RavikumarGarry D. Lasaga
 
AI In Actuarial Science
AI In Actuarial ScienceAI In Actuarial Science
AI In Actuarial ScienceAudrey Britton
 
Machine Learning for objective QoE assessment: Science, Myths and a look to t...
Machine Learning for objective QoE assessment: Science, Myths and a look to t...Machine Learning for objective QoE assessment: Science, Myths and a look to t...
Machine Learning for objective QoE assessment: Science, Myths and a look to t...Förderverein Technische Fakultät
 
Slide Presentasi Kelompok Keilmuan E
Slide Presentasi Kelompok Keilmuan ESlide Presentasi Kelompok Keilmuan E
Slide Presentasi Kelompok Keilmuan EHendri Karisma
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Miningbutest
 
林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning台灣資料科學年會
 
Artificial intelligence: Simulation of Intelligence
Artificial intelligence: Simulation of IntelligenceArtificial intelligence: Simulation of Intelligence
Artificial intelligence: Simulation of IntelligenceAbhishek Upadhyay
 
Fcv cross hebert
Fcv cross hebertFcv cross hebert
Fcv cross hebertzukun
 
ML DL AI DS BD - An Introduction
ML DL AI DS BD - An IntroductionML DL AI DS BD - An Introduction
ML DL AI DS BD - An IntroductionDony Riyanto
 
MLlecture1.ppt
MLlecture1.pptMLlecture1.ppt
MLlecture1.pptbutest
 
MLlecture1.ppt
MLlecture1.pptMLlecture1.ppt
MLlecture1.pptbutest
 
2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf
2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf
2023-08-22 CoLLAs Tutorial - Beyond CIL.pdfVincenzo Lomonaco
 
A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningHaptik
 
intro to ML by the way m toh phasee movie Punjabi
intro to ML by the way m toh phasee movie Punjabiintro to ML by the way m toh phasee movie Punjabi
intro to ML by the way m toh phasee movie Punjabibotvillain45
 
DynaLearn@JTEL2010_2010_6_9
DynaLearn@JTEL2010_2010_6_9DynaLearn@JTEL2010_2010_6_9
DynaLearn@JTEL2010_2010_6_9Wouter Beek
 

Similar to 633-600 Machine Learning (20)

Artificial Intelligence by B. Ravikumar
Artificial Intelligence by B. RavikumarArtificial Intelligence by B. Ravikumar
Artificial Intelligence by B. Ravikumar
 
lecture1.pptx
lecture1.pptxlecture1.pptx
lecture1.pptx
 
AI Lesson 33
AI Lesson 33AI Lesson 33
AI Lesson 33
 
AI In Actuarial Science
AI In Actuarial ScienceAI In Actuarial Science
AI In Actuarial Science
 
Machine Learning for objective QoE assessment: Science, Myths and a look to t...
Machine Learning for objective QoE assessment: Science, Myths and a look to t...Machine Learning for objective QoE assessment: Science, Myths and a look to t...
Machine Learning for objective QoE assessment: Science, Myths and a look to t...
 
Lecture1 - Machine Learning
Lecture1 - Machine LearningLecture1 - Machine Learning
Lecture1 - Machine Learning
 
Slide Presentasi Kelompok Keilmuan E
Slide Presentasi Kelompok Keilmuan ESlide Presentasi Kelompok Keilmuan E
Slide Presentasi Kelompok Keilmuan E
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Mining
 
林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning
 
Artificial intelligence: Simulation of Intelligence
Artificial intelligence: Simulation of IntelligenceArtificial intelligence: Simulation of Intelligence
Artificial intelligence: Simulation of Intelligence
 
Fcv cross hebert
Fcv cross hebertFcv cross hebert
Fcv cross hebert
 
Lecture2 - Machine Learning
Lecture2 - Machine LearningLecture2 - Machine Learning
Lecture2 - Machine Learning
 
ML DL AI DS BD - An Introduction
ML DL AI DS BD - An IntroductionML DL AI DS BD - An Introduction
ML DL AI DS BD - An Introduction
 
MLlecture1.ppt
MLlecture1.pptMLlecture1.ppt
MLlecture1.ppt
 
MLlecture1.ppt
MLlecture1.pptMLlecture1.ppt
MLlecture1.ppt
 
2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf
2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf
2023-08-22 CoLLAs Tutorial - Beyond CIL.pdf
 
A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine Learning
 
intro to ML by the way m toh phasee movie Punjabi
intro to ML by the way m toh phasee movie Punjabiintro to ML by the way m toh phasee movie Punjabi
intro to ML by the way m toh phasee movie Punjabi
 
Machine_Learning.pptx
Machine_Learning.pptxMachine_Learning.pptx
Machine_Learning.pptx
 
DynaLearn@JTEL2010_2010_6_9
DynaLearn@JTEL2010_2010_6_9DynaLearn@JTEL2010_2010_6_9
DynaLearn@JTEL2010_2010_6_9
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

633-600 Machine Learning

  • 1. 633-600 Machine Learning Textbook • Instructor: Yoonsuck Choe • Tom M. Mitchell (1997) “Machine Learning”, McGraw-Hill. – Contact info: HRBB 322B, 845-5466, choe@tamu.edu • Book webpage: http://www.cs.cmu.edu/˜tom/mlbook.html • Course web page: http://courses.cs.tamu.edu/choe/09spring/633 • Text and figures, etc. will be quoted from the textbook without repeated acknowledgment. Instructor’s perspective will be indicated by “YC” where appropriate. 1 2 Course Info Relation to Other Courses • Grading: 3 assignments (including written and programming Some overlaps: components), 18% each = 54%; 2 exams (in class), 15% each = 30%; Mini project 16%; No curving: ≥ 90 = A, ≥ 80 = B, etc. • Neural Networks: perceptrons, backpropagation, etc. • Academic integrity: individual work, unless otherwise indicated; • Pattern analysis: Bayesian learning, instance-based learning proper references should be given in case online/offline resources • Artificial intelligence: decision trees (in some courses) are used. • Students with disabilities: see syllabus. • Statistics: hypothesis testing • Lecture notes: check course web page for updates. Download, • (Relatively) unique to this course: concept learning, print, and bring to the class. computational learning theory, genetic algorithms, reinforcement learning, decision trees (in depth treatment) • Computer accounts: talk to CS helpdesk (HRBB 210). • Programming: Matlab, or better yet, Octave (http://www.octave.org). C/C++, Java, etc. (they should run on CS Unix or windows) 3 4
  • 2. ML Overview (I) ML Overview (II) • Current status: Yet unsolved problem. • How can machines (computers) learn? – Theoretical insights emerging. How can machines improve automatically with experience? – Practical applications. • Benefits: – Huge data volume demands ML, and provides opportunity to – Improved performance ML (datamining). – Automated optimization • State of the art: – New uses of computers – speech recognition – Reduced programming (YC) – medical predictions – Insights into human learning and learning disabilities – fraud detection – drive autonomous vehicles (highway and desert) – board games (backgammon, chess) – theoretical bounds on error, number of inputs needed, etc. 5 6 ML Overview (III) Well-Posed Learning Problem A program is said to learn from Multidisciplinary roots: • experience E with respect to • AI • task T and • probability and statistics • performance measure P , • computational complexity theory • P in T increase with E . • control theory Examples: Playing checkers, Handwriting recognition, Robot driving, • information theory etc. • philosophy Goal of ML: “define precisely a class of problems that encompasses interesting forms of learning [but not all: YC], to explore algorithms that • psychology solve such problems, and to understand the fundamental structure of • neurobiology learning problems and processes” (Mitchell, 1997) 7 8
  • 3. Designing a Learning System (I) Designing a Learning System (II) Training experience: Remaining design choices: • direct vs. indirect (problem of credit assignment) • Exact type of knowledge to be learned. • degree of control over training examples (teacher-dependent or • A representation for this target knowledge. learner-generated) • A learning mechanism. • closeness of training example distribution to true distribution over • functional/operational principle giving rise to the learning which P is measured: in many cases, ML algorithms assume that mechanism (YC) both distributions are similar, which may not be the case in practice. 9 10 Design: Target Function (I) Design: Target Function (II) • Another function (B : board states, R: real numbers): Type of knowledge to be learned: for example, we want to learn best move in a board game. V : B → R, • Can represent as a function (B : board states, M : moves): which gives the evaluation of each board state. – V (b = win) = 100 ChooseM ove : B → M, – V (b = lose) = −100 but it is hard to learn directly. – V (b = draw) = 0 – V (b = otherwise) = V (b ), where b is the best final board state that can be reached from b. – However, this is not efficiently computable, i.e., it is a nonoperational definition. – Goal of ML is to find an operational description of V , however, in practice, an approximation is all we can get. 11 12
  • 4. Design: Representation for Target Function

    Given an ideal target function $V$, we want to learn an approximate function $\hat{V}$:
    • Trade-off between rich and parsimonious representation.
    • Example: $\hat{V}$ as a linear combination of number of pieces, number of particular relational situations in the board (e.g., threatened), etc. (represented as $x_i$) in board configuration $b$:
      $$\hat{V}(b) = w_0 + \sum_{i=1}^{n} w_i x_i,$$
      where $w_i$ are the weight values to be learned.
    • Advantage of the above representation: reduction of scope (or dimensionality) from the original problem.

    Design: Function Approximation Algorithm

    Given board state and true $V$, we want to learn the weights $w_i$ that specify $\hat{V}$.
    • Start with a set of a large number of input-target pairs $\langle b, V_{\text{train}}(b) \rangle$.
    • Problem: cannot come up with a full set of $\langle b, V_{\text{train}}(b) \rangle$ pairs.
    • Solution: If $V_{\text{train}}(b)$ is unknown, set it to the estimated $\hat{V}$ of its successor board state:
      $$V_{\text{train}}(b) = \hat{V}(\mathit{Successor}(b)).$$

    Design: Adjusting Weights (I)

    Last component in defining a learning algorithm: adjustment of weights.
    • Want to learn weights $w_i$ that best fit the set of training samples $\{\langle b, V_{\text{train}}(b) \rangle\}$.
    • How to define best fit? Once we have $\hat{V}$ we can calculate $\hat{V}(b)$ for all $b$ in the training set, and calculate the error:
      $$E \equiv \sum_{\langle b, V_{\text{train}}(b) \rangle \in \text{training set}} \left( V_{\text{train}}(b) - \hat{V}(b) \right)^2$$
    • How to reduce $E$?

    Design: Adjusting Weights (II)

    Least Mean Squares (LMS) learning rule: For each training example $\langle b, V_{\text{train}}(b) \rangle$,
    • Use the current weights to calculate $\hat{V}(b)$.
    • For each weight $w_i$, update it as
      $$w_i \leftarrow w_i + \eta \left( V_{\text{train}}(b) - \hat{V}(b) \right) x_i,$$
      where $\eta$ is a small learning rate constant.
    • The error $V_{\text{train}}(b) - \hat{V}(b)$ and the input $x_i$ both contribute to the weight update.
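    The linear representation, the squared error E, and the LMS update above can be sketched in a few lines. This is an illustration under stated assumptions rather than the course's implementation: the feature vectors and target values are synthetic (generated from a hypothetical `true_weights`), the bias weight w_0 is treated as a weight on a constant input x_0 = 1, and the learning rate and number of passes are arbitrary choices.

    ```python
    import random


    def v_hat(weights, features):
        """Linear evaluation: w_0 + sum_i w_i * x_i (features exclude the bias input)."""
        return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


    def lms_update(weights, features, v_train, eta=0.05):
        """One LMS step: w_i <- w_i + eta * (V_train(b) - V_hat(b)) * x_i, with x_0 = 1."""
        error = v_train - v_hat(weights, features)
        inputs = [1.0] + list(features)
        return [w + eta * error * x for w, x in zip(weights, inputs)]


    def squared_error(weights, examples):
        """E: sum over the training set of (V_train(b) - V_hat(b))^2."""
        return sum((v - v_hat(weights, f)) ** 2 for f, v in examples)


    # Assumption: synthetic training pairs standing in for <b, V_train(b)>.
    rng = random.Random(0)
    true_weights = [2.0, 1.5, -3.0]  # hypothetical target, not from the slides
    examples = []
    for _ in range(50):
        features = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        examples.append((features, v_hat(true_weights, features)))

    weights = [0.0, 0.0, 0.0]
    initial_error = squared_error(weights, examples)
    for _ in range(200):  # repeated passes over the training set
        for features, v_train in examples:
            weights = lms_update(weights, features, v_train)
    final_error = squared_error(weights, examples)
    ```

    Because the synthetic targets are exactly linear in the features, the learned weights approach `true_weights` and E shrinks toward zero. With real game data the targets are only estimates, so E would typically level off at some nonzero value.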
  • 5. Final Design

    Putting together the system (checker player):
    • Performance system: input = problem, output = solution trace = game history (using what is learned so far)
    • Critic: input = solution trace, output = training examples (estimated $V_{\text{train}}(b)$)
    • Generalizer: input = training examples, output = estimated hypothesis $\hat{V}$ (i.e., learned weights $w_i$)
    • Experiment generator: input = hypothesis $\hat{V}$, output = new problem (new initial condition, to explore particular regions)

    Alternatives (I)

    • Training experience: against experts, against self, table of correct moves, ...
    • Target function: board → move, board → value, ...
    • Representation of target function: polynomial, linear function of small number of features, artificial neural network
    • Learning algorithm: gradient descent, linear programming, ...

    Alternatives (II)

    • Memorize (instance-based learning)
    • Spawn a population and make them compete with each other (genetic algorithms)
    • Analyze and reason about things

    Perspectives on ML: Hypothesis Space Search

    • Useful to think of ML as searching a very large space of possible hypotheses to best fit the data and the learner’s prior knowledge.
    • For example, the hypothesis space for $\hat{V}$ would be all possible $\hat{V}$s with different weight assignments.
    • Useful concepts regarding hypothesis space search:
        – Size of hypothesis space
        – Number of training examples available/needed.
        – Confidence in generalizing to new unseen data.
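    As one way to picture the Critic from the Final Design slide, the sketch below turns a solution trace into training examples. It assumes the trace is already a list of feature vectors for the board states at which the learner is to move, that the final state's value is known from the game outcome, and that every earlier state takes the current linear estimate of its successor as its training value. The function names and the example numbers are hypothetical.

    ```python
    def v_hat(weights, features):
        """Current linear hypothesis: w_0 + sum_i w_i * x_i."""
        return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


    def critic(trace, final_value, weights):
        """Map a game history to (features, V_train) pairs.

        trace:       feature vectors of successive board states (assumed format)
        final_value: known value of the last state, e.g. 100, -100, or 0
        weights:     weights of the current hypothesis V_hat
        """
        examples = []
        for i, features in enumerate(trace):
            if i == len(trace) - 1:
                target = final_value  # outcome of the game is known
            else:
                target = v_hat(weights, trace[i + 1])  # estimate of the successor state
            examples.append((features, target))
        return examples


    # Hypothetical three-state trace with two features per state, ending in a win.
    training_examples = critic([[3, 0], [2, 1], [1, 2]], 100, [0.0, 1.0, -1.0])
    ```

    The resulting pairs are what the Generalizer would consume to adjust the weights, after which the Performance system plays again with the updated hypothesis.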
  • 6. Issues in ML

    • What algorithms exist for generalizable learners given specific training set? Requirements for convergence? Which algorithms are best for a particular domain?
    • How much training data needed? Bounds on confidence, based on data size? How long to train?
    • Use of prior knowledge?
    • How to choose best training experience? Impact of the choice?
    • How to reduce ML problem to function approximation?
    • How can learner alter the representation itself?

    Classification of learning algorithms (YC)

    • Supervised learning: input-target pairs given.
    • Unsupervised learning: only input distribution is given.
    • Reinforcement learning: sparse reward signal is given for action based on sensory input; environment-altering actions.

    Broader questions (YC)

    • Can machines themselves formulate their own learning tasks?
        – Can they come up with their own representations?
        – Can they come up with their own learning strategy?
        – Can they come up with their own motivation?
        – Can they come up with their own questions/problems?
    • What if the machines are faced with multiple, possibly conflicting tasks? Can there be a meta learning algorithm?
    • What if performance is hard to measure (i.e., hard to quantify, or even worse, subjective)?
    • Lesson: think outside the box; question the questions themselves.