09s1: COMP9417 Machine Learning and Data Mining

Introduction to Machine Learning

March 12, 2008

Acknowledgement: Material derived from slides for the book
Machine Learning, Tom Mitchell, McGraw-Hill, 1997
http://www-2.cs.cmu.edu/~tom/mlbook.html

Aims

This lecture will provide the basis for you to be able to describe
the motivation, scope and some application areas of machine learning.
Following it you should be able to:

• describe the general learning problem
• state some of the steps in setting up a learning problem
• list some applications of machine learning
• list some issues in machine learning

COMP9417: March 11, 2009                       Introduction to Machine Learning: Slide 1




Overview

[Recommended reading: Mitchell, Chapter 1]
[Recommended exercises: 1.1, 1.2, optionally 1.5]

• Why Machine Learning?
• What is a well-defined learning problem?
• An example: learning to play checkers (draughts)
• What questions should we ask about Machine Learning?

Why Machine Learning

• Considerable progress in algorithms and theory
• Growing flood of online data
• Increasing computational power
• Many successful commercial/scientific applications




Three niches for machine learning:

• Data mining: using historical data to improve decisions
    – medical records → medical knowledge
• Software applications we can’t program by hand
    – autonomous robots
    – speech recognition
• Self-customizing programs
    – Web sites that learn user interests

Some definitions

machine learning: the science of algorithmic methods of learning from
experience with the goal of improving performance on selected tasks

data mining: the use of machine learning or statistical algorithms to
search large amounts of data for hidden patterns or relationships that are
interesting and potentially useful








Typical data mining Task

Patient103 time=1            Patient103 time=2            ...   Patient103 time=n
Age: 23                      Age: 23                            Age: 23
FirstPregnancy: no           FirstPregnancy: no                 FirstPregnancy: no
Anemia: no                   Anemia: no                         Anemia: no
Diabetes: no                 Diabetes: YES                      Diabetes: no
PreviousPrematureBirth: no   PreviousPrematureBirth: no         PreviousPrematureBirth: no
Ultrasound: ?                Ultrasound: abnormal               Ultrasound: ?
Elective C-Section: ?        Elective C-Section: no             Elective C-Section: no
Emergency C-Section: ?       Emergency C-Section: ?             Emergency C-Section: Yes
...                          ...                                ...

Given:

• 9714 patient records, each describing a pregnancy and birth
• Each patient record contains 215 features

Learn to predict:

• Classes of future patients at high risk for Emergency Cesarean Section

Datamining Result

[Patient103 records repeated from the previous slide]

One of 18 learned rules:

If   No previous vaginal delivery, and
     Abnormal 2nd Trimester Ultrasound, and
     Malpresentation at admission
Then Probability of Emergency C-Section is 0.6

Over training data: 26/41 = .63,
Over test data: 12/20 = .60
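
The two ratios quoted for the learned rule can be checked directly. This is a minimal sketch, assuming that 26/41 means 26 of the 41 training records matching the rule's conditions ended in an emergency C-section, and likewise 12 of 20 on the test data (the slide does not spell this out). The function name is illustrative only.

```python
def rule_precision(matching_positive, matching_total):
    """Fraction of records covered by a rule that belong to the predicted class."""
    return matching_positive / matching_total


# Counts taken from the slide; their interpretation is an assumption.
train_precision = rule_precision(26, 41)
test_precision = rule_precision(12, 20)

print(f"training: {train_precision:.2f}")
print(f"test:     {test_precision:.2f}")
```

The small drop from training to test data is what one would hope to see from a rule that generalizes rather than one that merely memorizes the training records.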

Credit Risk Analysis

Customer103: (time=t0)          Customer103: (time=t1)      ...   Customer103: (time=tn)
Years of credit: 9              Years of credit: 9                Years of credit: 9
Loan balance: $2,400            Loan balance: $3,250              Loan balance: $4,500
Income: $52k                    Income: ?                         Income: ?
Own House: Yes                  Own House: Yes                    Own House: Yes
Other delinquent accts: 2       Other delinquent accts: 2         Other delinquent accts: 3
Max billing cycles late: 3      Max billing cycles late: 4        Max billing cycles late: 6
Profitable customer?: ?         Profitable customer?: ?           Profitable customer?: No
...                             ...                               ...

Rules learned from synthesized data:

If   Other-Delinquent-Accounts > 2, and
     Number-Delinquent-Billing-Cycles > 1
Then Profitable-Customer? = No   [Deny Credit Card application]

If   Other-Delinquent-Accounts = 0, and
     (Income > $30k) OR (Years-of-Credit > 3)
Then Profitable-Customer? = Yes [Accept Credit Card application]

Other Prediction Problems

Customer purchase behavior:

Customer103: (time=t0)          Customer103: (time=t1)      ...   Customer103: (time=tn)
Sex: M                          Sex: M                            Sex: M
Age: 53                         Age: 53                           Age: 53
Income: $50k                    Income: $50k                      Income: $50k
Own House: Yes                  Own House: Yes                    Own House: Yes
MS Products: Word               MS Products: Word                 MS Products: Word
Computer: 386 PC                Computer: Pentium                 Computer: Pentium
Purchase Excel?: ?              Purchase Excel?: ?                Purchase Excel?: Yes
...                             ...                               ...

Customer retention:

Customer103: (time=t0)          Customer103: (time=t1)      ...   Customer103: (time=tn)
Sex: M                          Sex: M                            Sex: M
Age: 53                         Age: 53                           Age: 53
Income: $50k                    Income: $50k                      Income: $50k
Own House: Yes                  Own House: Yes                    Own House: Yes
Checking: $5k                   Checking: $20k                    Checking: $0
Savings: $15k                   Savings: $0                       Savings: $0
Current-customer?: yes          Current-customer?: yes            Current-customer?: No
...                             ...                               ...
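
Learned rules of the kind shown under Credit Risk Analysis are directly executable. The sketch below encodes the two credit-card rules as a function. The function and parameter names are illustrative, and using the record field "Max billing cycles late" as the rule attribute Number-Delinquent-Billing-Cycles is an assumption, since the slide does not state how the two correspond.

```python
def profitable_customer(other_delinquent_accounts, delinquent_billing_cycles,
                        income, years_of_credit):
    """Apply the two rules from the slide; return "No", "Yes", or None.

    None means neither rule fires, i.e. these two rules alone give no prediction.
    """
    # Rule 1: deny the credit card application.
    if other_delinquent_accounts > 2 and delinquent_billing_cycles > 1:
        return "No"
    # Rule 2: accept the credit card application.
    if other_delinquent_accounts == 0 and (income > 30_000 or years_of_credit > 3):
        return "Yes"
    return None


# Customer103 at time=tn: 3 delinquent accounts, 6 billing cycles late.
# Income is unknown ("?") on the slide; 0 is a placeholder that rule 1 never reads.
print(profitable_customer(3, 6, income=0, years_of_credit=9))

# Customer103 at time=t0: 2 delinquent accounts, so neither rule applies.
print(profitable_customer(2, 3, income=52_000, years_of_credit=9))
```

For Customer103 at time=tn the first rule fires and predicts "No", which agrees with the value recorded on the slide.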






Process optimization:

Product72: (time=t0)            Product72: (time=t1)        ...   Product72: (time=tn)
Stage: mix                      Stage: cook                       Stage: cool
Mixing-speed: 60rpm             Temperature: 325                  Fan-speed: medium
Viscosity: 1.3                  Viscosity: 3.2                    Viscosity: 1.3
Fat content: 15%                Fat content: 12%                  Fat content: 12%
Density: 2.8                    Density: 1.1                      Density: 1.2
Spectral peak: 2800             Spectral peak: 3200               Spectral peak: 3100
Product underweight?: ??        Product underweight?: ??          Product underweight?: Yes
...                             ...                               ...

Tasmanian Apple Thinning

Apple orchards are important in primary production in Tasmania, and
there has been a long history in the process of apple thinning. Apples are
naturally biennial bearing. Trees flower heavily one year, producing a large
crop of small fruit (called the “On” year), followed by light flowering the
next year with a small crop of large, poor quality fruit.

Thinning is most economically done by applying sprays of chemicals that
act similarly to plant hormones and cause the abortion of flowers and
fruitlets at an early stage of development. Early thinning favours the
development of the desirable high density of cells in the fruit.

Orchardists must decide on the concentration of thinning agent at blossom
time. If the concentration is too low, thinning is not effective and the cost
of hand thinning is prohibitive. If the concentration is too high, there is a
risk of losing all the fruit. The decision is difficult because of the large
number of variables to be taken into account.

• trees - cultivar, rootstock and age.
• physiology - previous crop, vigour, number of blossom buds.
• pruning - severity of detailed pruning, limb thinning, and penetration
  of light into the canopy.
• market - size of fruit required for the market.
• spraying - type of spray machinery and volume of water to be used in
  the machinery.

60 tasks (some with 50 decision tree leaves, i.e. rule paths), plus 30
other variables and 40 procedures, supported by a customized help file of
5,000 words.

BG Gas Drilling - “Stuck Pipe”

Drilling is a hugely expensive process: rig costs for a North Sea
operation are typically around $50,000 per day. Clearly,
anything that helps to reduce the time when a drilling rig is not productive
has the potential to achieve huge savings.





Daily report data came from two databases. One was old and included
incomplete or absent data, particularly IADC (International Association
of Drilling Contractors) codes. The other database was compiled more
recently and included a large amount of additional data about well site
geology, drilling costs, etc. Sixty recorded occurrences of Stuck Pipe in
170 BG wells.

It was possible to mine the data and to determine trends. Much of the time
invested by the project team has concentrated on getting data in good
order. Results indicate that the length of time the hole has been open, the
properties of the drilling mud, and the frequency with which the mud is
conditioned all play a significant role in the incidence of Stuck Pipe.

Nissan - Car selection

Starting from the basic choices of 3 alternative engines, 3 types of
suspension, 2 types of transmission, 9 colours and 3 styles of seat fabric,
customers can go far further and create a car to suit their own personality.
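
As a quick check on the size of this configuration space, the basic choices listed above can be multiplied out. This is only a sketch: the dictionary keys are illustrative labels, and it counts the basic choices alone. The much larger figure of 670,000 combinations quoted for the full system presumably includes further options that are not enumerated in these slides.

```python
import math

# Number of alternatives for each basic choice, as listed on the slide.
basic_choices = {
    "engine": 3,
    "suspension": 3,
    "transmission": 2,
    "colour": 9,
    "seat_fabric": 3,
}

# Independent choices multiply: 3 * 3 * 2 * 9 * 3.
basic_combinations = math.prod(basic_choices.values())
print(basic_combinations)  # 486
```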

With 670,000 possible combinations, “it is a totally new concept” says
Takao Ohmura, Sales Manager of Tokyo Nissan Computer Systems.

A guidebook explains the options in table form and we were able to input
these tables into XpertRule. Normally it is difficult to utilise such a large
matrix, but XpertRule was able to automatically generate a decision tree
structure to arrive at the correct model, from attributes and values in the
tables.

It met our three major requirements: (1) the model selection and check
must be completed in three minutes, (2) the ability to run on Nissan
dealers' hardware, and (3) ease of maintaining the system after the launch
of the Cefiro model.

Channel 4 TV scheduling

During the day, Channel 4’s strength is the housewife market whilst in
the evenings Channel 4’s strength lies in its varied targeting ability. In
comparison with ITV, Channel 4 audiences contain a greater proportion
of younger, lighter, up-market, male viewers (audience research has also
identified Channel 4’s ability to target cluster groups defined by names
such as “Progressive Priscillas” and “Free-thinking Franks”).

Advertisers may specify to have commercials placed first in the break,
last in the break or “Top & Tail” in a break, making break sequencing
a challenge if optimal use of airtime is to be achieved. Definition of a
knowledge-based system to solve the problem requires observation of a
number of prioritised “rules”: Top of the list is the need for no overlaps
or gaps, with Top and Tail or First and Last network spots also receiving
high priority. Lower down the list are First and Last Super-macro spots
and non-reporting Super-macros sequenced to play at the same time.





Optimization problems: as the number of possible combinations grows,
it becomes impractical to try all combinations to arrive at a solution in a
reasonable time.

Rules of thumb can be used to narrow down options but, in most cases, good
rules are not available or are difficult to capture. Numerical optimization
techniques are currently available in most advanced spreadsheets, but
these tend to be incapable of optimizing problems involving sequencing
or scheduling and they are “exploitation” rather than “exploration”
techniques.

The solution involved the use of genetic algorithm techniques, which allow
the exploration of large search spaces for optimal or near optimal solutions.

Problems Too Difficult to Program by Hand

ALVINN [Pomerleau] drives 70 mph on highways!




[Figure: ALVINN network. 30x32 Sensor Input Retina → 4 Hidden Units → 30 Output Units, ranging from Sharp Left through Straight Ahead to Sharp Right]

Stanley - DARPA Grand Challenge Champion 2005

• won 2 million dollars (US), first team to complete 132 mile course
• modified VW Touareg R5 with drive-by-wire, took 6 hours 54 minutes
  averaging over 19 mph
• seven Pentium M computers, GPS and various sensors
• localization, mapping and collision avoidance
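
The quoted average speed follows from the course length and the finishing time. A small sketch of the arithmetic, using only the figures given on the slide (the variable names are illustrative):

```python
course_miles = 132
hours, minutes = 6, 54

# Convert the finishing time to hours, then divide distance by time.
elapsed_hours = hours + minutes / 60
average_mph = course_miles / elapsed_hours

print(f"{average_mph:.1f} mph")
```

The result is just over 19 mph, consistent with the slide.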





Software that adapts to User

• Brin & Page - PhD students in data mining at Stanford
• PageRank algorithm (1998)
• Google business model - technology targets advertisements to users

Measuring Neural Activity

• Botros, van Dijk & Killian (2007) - Cochlear implant adjustment
• Expert system uses neural response telemetry (ECAP)
• Decision tree learning - Quinlan’s C5 and Cubist



Scientific Discovery

The Robot Scientist project (2004)

[Figure 1: The Robot Scientist hypothesis-generation and experimentation loop.
Source: Nature, Vol 427, 15 January 2004, p. 248, www.nature.com/nature]

Scientific Discovery

Robot scientist in the lab

Excerpts from the paper as shown on the slides (truncated at both ends):

... concentration of yeast in the wells of the microtitre trays using the
adjacent plate reader and returns the results to the LIMS (although
microtitre trays are still moved in and out of incubators manually).

... biological entities.
The original bioinformatic information for the AAA model was
taken mainly from the KEGG [13] catalogue of metabolism. The model
was then tested with all possible auxotrophic experiments involving
a single replacement metabolite, and was altered manually to fit the
empirical results. To ensure that the model was not ‘over-fitted’, we
carried out all possible auxotrophic experiments with pairs of
metabolites. The model correctly predicted at least 98.5% of the
experiments (Supplementary Information). To the best of our
knowledge, no bioinformatic model has been as thoroughly tested
with knockout mutants.
Machine learning is the branch of artificial intelligence that seeks
to develop computer systems that improve their performance
automatically with experience [14,15]. It has much in common with
statistics, but differs in having a greater emphasis on algorithms,
data representation and making acquired knowledge explicit. The ...




Where Is this Headed?

Mature algorithms

• decision trees, regression, neural nets, Bayesian methods ...
• can be applied to standard database relations or flat files
• established software and services industry

Where Is this Headed?

Opportunity for tomorrow: enormous impact

• Learn across full mixed-media data
• Learn across multiple internal databases, plus the web and newsfeeds
• Learn by active experimentation
• Learn more complex functions
• Learn by analogy
• Cumulative, lifelong learning and adaptation
• Programming languages and systems with learning embedded?




Relevant Disciplines

• Artificial intelligence
• Computational complexity theory
• Statistics
• Information theory
• Bayesian methods
• Control theory
• Philosophy
• Psychology and neurobiology
• Physics
• ...

A definition of the learning problem

Learning = improving with experience at some task

• Improve over task T,
• with respect to performance measure P,
• based on experience E.

E.g., Learn to play checkers (draughts)

• T: Play checkers
• P: % of games won in world tournament
• E: opportunity to play against self

COMP9417: March 11, 2009                       Introduction to Machine Learning: Slide 28   COMP9417: March 11, 2009                        Introduction to Machine Learning: Slide 29




Learning to Play Checkers

• T: Play checkers
• P: Percent of games won in world tournament

• What experience?
• What exactly should be learned?
• How shall it be represented?
• What specific algorithm to learn it?

Type of Training Experience

• Direct or indirect?
• Teacher or not?

A problem: is training experience representative of performance goal?
Choose the Target Function

• ChooseMove : Board → Move ??
• V : Board → ℝ ??
• ...

Possible Definition for Target Function V

• if b is a final board state that is won, then V(b) = 100
• if b is a final board state that is lost, then V(b) = −100
• if b is a final board state that is drawn, then V(b) = 0
• if b is not a final state in the game, then V(b) = V(b′), where b′
  is the best final board state that can be achieved starting from b and
  playing optimally until the end of the game.

This gives correct values, but is not operational
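The recursive definition of V can be illustrated on a game small enough to search exhaustively. The sketch below is not checkers: it assumes a hypothetical toy game in which two players alternately take 1 or 2 stones from a pile and whoever takes the last stone wins (so there are no draws). The names V, WIN and LOSS and the state encoding (stones, our_turn) are illustrative assumptions only.

```python
from functools import lru_cache

WIN, LOSS = 100, -100  # values for final states, as in the slide


@lru_cache(maxsize=None)
def V(stones: int, our_turn: bool) -> int:
    """Ideal target value of a state in a toy take-1-or-2 game.

    Hypothetical stand-in for checkers: the player who takes the
    last stone wins, so a final state is either won or lost.
    """
    if stones == 0:
        # Final state: whoever moved last took the final stone and won.
        return LOSS if our_turn else WIN
    outcomes = [V(stones - take, not our_turn)
                for take in (1, 2) if take <= stones]
    # Optimal play: we choose the best outcome for us,
    # the opponent chooses the worst outcome for us.
    return max(outcomes) if our_turn else min(outcomes)


print(V(4, True), V(3, True))
```

Every call searches to the end of the game. That is feasible for a pile of stones but not for checkers, which is one way to read "not operational": the definition is correct, yet it cannot be evaluated efficiently enough to choose moves.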




Choose Representation for Target Function

• collection of rules?
• neural network?
• polynomial function of board features?
• ...

A Representation for Learned Function

w0 + w1 · bp(b) + w2 · rp(b) + w3 · bk(b) + w4 · rk(b) + w5 · bt(b) + w6 · rt(b)

• bp(b): number of black pieces on board b
• rp(b): number of red pieces on b
• bk(b): number of black kings on b
• rk(b): number of red kings on b
• bt(b): number of red pieces threatened by black (i.e., which can be
  taken on black’s next turn)
• rt(b): number of black pieces threatened by red
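As a minimal sketch, the linear representation above can be evaluated as follows. The function name v_hat, the feature ordering (bp, rp, bk, rk, bt, rt) and the numeric weights are assumptions made for illustration. Extracting the six features from an actual board is not shown.

```python
from typing import Sequence


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Evaluate w0 + w1*bp + w2*rp + w3*bk + w4*rk + w5*bt + w6*rt.

    weights  = (w0, w1, ..., w6)
    features = (bp, rp, bk, rk, bt, rt) for one board b
    """
    if len(weights) != len(features) + 1:
        raise ValueError("expected one more weight than features (w0 is the bias)")
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


# Made-up example: black has 3 pieces and 1 king, red has nothing left.
features = (3, 0, 1, 0, 0, 0)
# Made-up weights, not learned values.
weights = (0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5)
score = v_hat(weights, features)
print(score)
```

With this representation, learning the evaluation function reduces to choosing the seven numbers w0 to w6.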
Obtaining Training Examples

• V(b): the true target function
• V̂(b): the learned function
• Vtrain(b): the training value

One rule for estimating training values:

• Vtrain(b) ← V̂(Successor(b))

Choose Weight Tuning Rule

LMS Weight update rule:

Do repeatedly:

• Select a training example b at random

  1. Compute error(b):

       error(b) = Vtrain(b) − V̂(b)

  2. For each board feature fi, update weight wi:

       wi ← wi + c · fi · error(b)

c is some small constant, say 0.1, to moderate the rate of learning
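The LMS rule above can be sketched in a few lines. This is an illustration under stated assumptions, not a definitive implementation: the names v_hat, lms_update and train are hypothetical, the bias weight w0 is updated with a constant feature f0 = 1 (the slide does not say how w0 is handled), and the training values Vtrain(b) are assumed to be supplied already rather than computed from V̂(Successor(b)).

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (board features, Vtrain value)


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation: w0 + sum of wi * fi."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


def lms_update(weights: Sequence[float], features: Sequence[float],
               v_train: float, c: float = 0.1) -> List[float]:
    """One LMS step: wi <- wi + c * fi * error(b)."""
    error = v_train - v_hat(weights, features)
    extended = (1.0, *features)  # assumption: f0 = 1 so that w0 is updated too
    return [w + c * f * error for w, f in zip(weights, extended)]


def train(examples: Sequence[Example], n_features: int,
          c: float = 0.1, steps: int = 1000, seed: int = 0) -> List[float]:
    """Repeatedly select a training example at random and apply the update."""
    rng = random.Random(seed)
    weights = [0.0] * (n_features + 1)
    for _ in range(steps):
        features, v_train = rng.choice(examples)
        weights = lms_update(weights, features, v_train, c)
    return weights


# Made-up single training example: a winning position for black.
example = ((3, 0, 1, 0, 0, 0), 100.0)
learned = train([example], n_features=6, steps=50)
print(round(v_hat(learned, example[0]), 3))
```

Each update moves the weights in proportion to the error and to the feature value, so weights attached to features that are zero on the current board are left unchanged. If c is too large relative to the feature magnitudes, the updates can overshoot, which is why the constant is kept small.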




Design Choices

[Figure: flowchart of design choices, each step listing its alternatives.
 Determine Type of Training Experience: games against experts, games against self, table of correct moves, ...
 Determine Target Function: Board → move, Board → value, ...
 Determine Representation of Learned Function: polynomial, linear function of six features, artificial neural network, ...
 Determine Learning Algorithm: gradient descent, linear programming, ...
 Completed Design]

Some Issues in Machine Learning

• What algorithms can approximate functions well (and when)?
• How does number of training examples influence accuracy?
• How does complexity of hypothesis representation impact it?
• How does noisy data influence accuracy?
• What are the theoretical limits of learnability?
• How can prior knowledge of learner help?
• What clues can we get from biological learning systems?
• How can systems alter their own representations?

Introduction to Machine Learning

  • 1. Aims 09s1: COMP9417 Machine Learning and Data Mining This lecture will provide the basis for you to be able to describe the motivation, scope and some application areas of machine learning. Introduction to Machine Learning Following it you should be able to: • describe the general learning problem March 12, 2008 • state some of the steps in setting up a learning problem • list some applications of machine learning • list some issues in machine learning Acknowledgement: Material derived from slides for the book Machine Learning, Tom Mitchell, McGraw-Hill, 1997 http://www-2.cs.cmu.edu/~tom/mlbook.html COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 1 Overview Why Machine Learning [Recommended reading: Mitchell, Chapter 1] [Recommended exercises: 1.1,1.2, optionally 1.5] • Considerable progress in algorithms and theory • Why Machine Learning? • Growing flood of online data • What is a well-defined learning problem? • Increasing computational power • An example: learning to play checkers (draughts) • Many successful commercial/scientific applications • What questions should we ask about Machine Learning? COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 2 COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 3
  • 2. Three niches for machine learning: Some definitions • Data mining: using historical data to improve decisions machine learning the science of algorithmic methods of learning from experience with the goal of improving performance on selected tasks – medical records → medical knowledge • Software applications we can’t program by hand data mining the use of machine learning or statistical algorithms to search large amounts of data for hidden patterns or relationships that are – autonomous robots interesting and potentially useful – speech recognition • Self customizing programs – Web sites that learn user interests COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 4 COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 5 Typical data mining Task Datamining Result Patient103 time=1 Patient103 time=2 ... Patient103 time=n Patient103 time=1 Patient103 time=2 ... Patient103 time=n Age: 23 Age: 23 Age: 23 FirstPregnancy: no FirstPregnancy: no FirstPregnancy: no Age: 23 Age: 23 Age: 23 Anemia: no Anemia: no Anemia: no FirstPregnancy: no FirstPregnancy: no FirstPregnancy: no Diabetes: no Diabetes: YES Diabetes: no Anemia: no Anemia: no Anemia: no PreviousPrematureBirth: no PreviousPrematureBirth: no PreviousPrematureBirth: no Diabetes: no Diabetes: YES Diabetes: no Ultrasound: ? Ultrasound: abnormal Ultrasound: ? PreviousPrematureBirth: no PreviousPrematureBirth: no PreviousPrematureBirth: no Elective C−Section: ? Elective C−Section: no Elective C−Section: no Ultrasound: ? Ultrasound: abnormal Ultrasound: ? Emergency C−Section: ? Emergency C−Section: ? Emergency C−Section: Yes Elective C−Section: ? Elective C−Section: no Elective C−Section: no ... ... ... Emergency C−Section: ? Emergency C−Section: ? Emergency C−Section: Yes ... ... ... 
Given: One of 18 learned rules: • 9714 patient records, each describing a pregnancy and birth If No previous vaginal delivery, and • Each patient record contains 215 features Abnormal 2nd Trimester Ultrasound, and Malpresentation at admission Learn to predict: Then Probability of Emergency C-Section is 0.6 Over training data: 26/41 = .63, • Classes of future patients at high risk for Emergency Cesarean Section Over test data: 12/20 = .60 COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 6 COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 7
  • 3. Credit Risk Analysis Other Prediction Problems ... Customer103: (time=t0) Customer103: (time=t1) Customer103: (time=tn) Customer purchase behavior: Years of credit: 9 Years of credit: 9 Years of credit: 9 Loan balance: $2,400 Loan balance: $3,250 Loan balance: $4,500 ... Customer103: (time=t0) Customer103: (time=t1) Customer103: (time=tn) Income: $52k Income: ? Income: ? Own House: Yes Own House: Yes Own House: Yes Sex: M Sex: M Sex: M Age: 53 Age: 53 Age: 53 Other delinquent accts: 2 Other delinquent accts: 2 Other delinquent accts: 3 Income: $50k Income: $50k Income: $50k Max billing cycles late: 3 Max billing cycles late: 4 Max billing cycles late: 6 Own House: Yes Own House: Yes Own House: Yes Profitable customer?: ? Profitable customer?: ? Profitable customer?: No MS Products: Word MS Products: Word MS Products: Word ... ... ... Computer: 386 PC Computer: Pentium Computer: Pentium Purchase Excel?: ? Purchase Excel?: ? Purchase Excel?: Yes ... ... ... Rules learned from synthesized data: If Other-Delinquent-Accounts > 2, and Customer retention: Number-Delinquent-Billing-Cycles > 1 Customer103: (time=t0) Customer103: (time=t1) ... Customer103: (time=tn) Then Profitable-Customer? = No [Deny Credit Card application] Sex: M Sex: M Sex: M Age: 53 Age: 53 Age: 53 Income: $50k Income: $50k Income: $50k If Other-Delinquent-Accounts = 0, and Own House: Yes Own House: Yes Own House: Yes Checking: $5k Checking: $20k Checking: $0 (Income > $30k) OR (Years-of-Credit > 3) Savings: $15k Savings: $0 Savings: $0 Then Profitable-Customer? = Yes [Accept Credit Card application] Current−customer?: yes ... Current−customer?: yes ... Current−customer?: No COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 8 COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 9 Process optimization: Tasmanian Apple Thinning Product72: (time=t0) Product72: (time=t1) ... 
Product72: (time=tn) Stage: mix Stage: cook Stage: cool Apple orchards are important in primary production in Tasmania, and Mixing−speed: 60rpm Temperature: 325 Fan−speed: medium Viscosity: 1.3 Viscosity: 3.2 Viscosity: 1.3 there has been a long history in the process of apple thinning. Apples are Fat content: 15% Density: 2.8 Fat content: 12% Density: 1.1 Fat content: 12% Density: 1.2 naturally biennial bearing,. Trees flower heavily one year producing a large Spectral peak: 2800 Spectral peak: 3200 Spectral peak: 3100 crop of small fruit (called the ”On” year) followed by light flowering the Product underweight?: ?? Product underweight?: ?? Product underweight?: Yes ... ... ... next year with a small crop of large poor quality fruit. Thinning is most economically done by applying sprays of chemicals that act similarly to plant hormones and cause the abortion of flowers and fruitlets at an early stage of development. Early thinning favours the development of the desirable high density of cells in the fruit. Orchardists – decision about concentration of thinning agent at blossom time. If concentration too low, then thinning is not effective and cost of hand thinning is prohibitive,. If the concentration too high, then risk of losing all the fruit. Decision is difficult because of large number of variables to be taken into account. COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 10 COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 11
  • 4. • trees - cultivar, rootstock and age. BG Gas Drilling - “Stuck Pipe” • physiology - previous crop, vigour, number of blossom buds. • pruning - severity of detailed pruning, limb thinning, and penetration of light into the canopy. • market - size of fruit required for the market. • spraying - type of spray machinery and volume of water to be used in the machinery. 60 tasks, (some with 50 decision tree leaves (i.e. rule paths), plus 30 other variables and 40 procedures supported by a customized help file of 5,000 words. Drilling is a hugely expensive process, with daily costs for a North Sea operation typically incurring rig costs of around $50,000 per day. Clearly, anything that helps to reduce the time when a drilling rig is not productive has the potential to achieve huge savings. COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 12 COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 13 Daily report data from two databases: One of which was old and included Nissan - Car selection incomplete or absent data - particularly IADC (International Association of Drilling Contractors) codes. The other database was compiled more recently and included a large amount of additional data about well site geology, drilling costs, etc. Sixty recorded occurrences of Stuck Pipe in 170 BG wells. Possible to mine the data and to determine trends. Much of the time invested by the project team has concentrated on getting data in good order. Results indicate that length of time the hole has been open; the properties of the drilling mud; and the frequency with which the mud is conditioned all play a significant role in the incidence of Stuck Pipe. Starting from the basic choices of 3 alternative engines, 3 types of suspension, 2 types of transmission, 9 colours and 3 styles of seat fabric, customers can go far further and create a car to suit their own personality. 
COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 14 COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 15
  • 5. With 670,000 possible combinations, “it is a totally new concept” says Channel 4 TV scheduling Takao Ohmura, Sales Manager of Tokyo Nissan Computer Systems. A guidebook explains the options in table form and we were able to input During the day, Channel 4’s strength is the housewife market whilst in these tables into XpertRule. Normally it is difficult to utilise such a large the evenings Channel 4’s strength lies in its varied targeting ability. In matrix, but XpertRule was able to automatically generate a decision tree comparison with ITV, Channel 4 audiences contain a greater proportion structure to arrive at the correct model, from attributes and values in the of younger, lighter, up-market, male viewers (audience research has also tables. identified Channel 4’s ability to target cluster groups defined by names such as “Progressive Priscillas” and “Free-thinking Franks”). It met our three major requirements: (1) the model selection and check must be completed in three minutes: (2) the ability to run on Nissan Advertisers may specify to have commercials placed first in the break, dealers hardware, and (3) ease of maintaining the system after the launch last in the break or “Top & Tail” in a break making break sequencing of the Cefiro model. a challenge if optimal use of airtime is to be achieved. Definition of a knowledge-based system to solve the problem requires observation of a number of prioritised “rules”: Top of the list is the need for no overlaps or gaps, with Top and Tail or First and Last network spots also receiving high priority. Lower down the list are First and Last Super-macro spots and non-reporting Super-macros sequenced to play at the same time. 
COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 16 COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 17 Optimization problems – as the number of possible combinations grows Problems Too Difficult to Program by Hand it becomes impractical to try all combinations to arrive at a solution in a reasonable time. Rule of thumb can be used to narrow down options but, in most cases, good ALVINN [Pomerleau] drives 70 mph on highways ! rules are not available or are difficult to capture. Numerical optimization techniques are currently available in most advanced spreadsheets, but these tend to be incapable of optimizing problems involving sequencing or scheduling and they are “exploitation” rather than “exploration” techniques. The solution involved the use of genetic algorithm techniques which allows the exploration of large search spaces for optimal or near optimal solutions. COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 18 COMP9417: March 11, 2009 Introduction to Machine Learning: Slide 19
  • 6. [Figure: ALVINN neural network. A 30x32 sensor input retina feeds 4 hidden units, which feed 30 output units ranging from Sharp Left through Straight Ahead to Sharp Right.]

Stanley - DARPA Grand Challenge Champion 2005
• won 2 million dollars (US), first team to complete 132 mile course
• modified VW Touareg R5 with drive-by-wire, took 6 hours 54 minutes averaging over 19 mph
• seven Pentium M computers, GPS and various sensors
• localization, mapping and collision avoidance

Software that adapts to User
• Brin & Page - PhD students in data mining at Stanford
• PageRank algorithm (1998)
• Google business model - technology targets advertisements to users

Measuring Neural Activity
• Botros, van Dijk & Killian (2007) - Cochlear implant adjustment
• Expert system uses neural response telemetry (ECAP)
• Decision tree learning - Quinlan’s C5 and Cubist
  • 7. Scientific Discovery
The Robot Scientist project (2004)

[Excerpt from the journal page shown on the slide:]

concentration of yeast in the wells of the microtitre trays using the adjacent plate reader and returns the results to the LIMS (although microtitre trays are still moved in and out of incubators manually).

Figure 1 The Robot Scientist hypothesis-generation and experimentation loop.

biological entities. The original bioinformatic information for the AAA model was taken mainly from the KEGG¹³ catalogue of metabolism. The model was then tested with all possible auxotrophic experiments involving a single replacement metabolite, and was altered manually to fit the empirical results. To ensure that the model was not ‘over-fitted’, we carried out all possible auxotrophic experiments with pairs of metabolites. The model correctly predicted at least 98.5% of the experiments (Supplementary Information). To the best of our knowledge, no bioinformatic model has been as thoroughly tested with knockout mutants.

Machine learning is the branch of artificial intelligence that seeks to develop computer systems that improve their performance automatically with experience¹⁴,¹⁵. It has much in common with statistics, but differs in having a greater emphasis on algorithms, data representation and making acquired knowledge explicit.

248 NATURE | VOL 427 | 15 JANUARY 2004 | www.nature.com/nature

Scientific Discovery
Robot scientist in the lab

Where Is this Headed?
Mature algorithms
• decision trees, regression, neural nets, Bayesian methods ...
• can be applied to standard database relations or flat files
• established software and services industry

Where Is this Headed?
Opportunity for tomorrow: enormous impact
• Learn across full mixed-media data
• Learn across multiple internal databases, plus the web and newsfeeds
• Learn by active experimentation
• Learn more complex functions
• Learn by analogy
• Cumulative, lifelong learning and adaptation
• Programming languages and systems with learning embedded?
  • 8. Relevant Disciplines
• Artificial intelligence
• Computational complexity theory
• Statistics
• Information theory
• Bayesian methods
• Control theory
• Philosophy
• Psychology and neurobiology
• Physics
• ...

A definition of the learning problem
Learning = improving with experience at some task
• Improve over task T,
• with respect to performance measure P,
• based on experience E.

E.g., Learn to play checkers (draughts)
• T: Play checkers
• P: % of games won in world tournament
• E: opportunity to play against self

Learning to Play Checkers
• T: Play checkers
• P: Percent of games won in world tournament
• What experience?
• What exactly should be learned?
• How shall it be represented?
• What specific algorithm to learn it?

Type of Training Experience
• Direct or indirect?
• Teacher or not?

A problem: is training experience representative of performance goal?
  • 9. Choose the Target Function
• ChooseMove : Board → Move ??
• V : Board → ℝ ??
• ...

Possible Definition for Target Function V
• if b is a final board state that is won, then V(b) = 100
• if b is a final board state that is lost, then V(b) = −100
• if b is a final board state that is drawn, then V(b) = 0
• if b is not a final state in the game, then V(b) = V(b′), where b′ is the best final board state that can be achieved starting from b and playing optimally until the end of the game.

This gives correct values, but is not operational: computing it for a non-final board would require searching the game to its end.

Choose Representation for Target Function
• collection of rules?
• neural network?
• polynomial function of board features?
• ...

A Representation for Learned Function
V̂(b) = w0 + w1 · bp(b) + w2 · rp(b) + w3 · bk(b) + w4 · rk(b) + w5 · bt(b) + w6 · rt(b)
• bp(b): number of black pieces on board b
• rp(b): number of red pieces on b
• bk(b): number of black kings on b
• rk(b): number of red kings on b
• bt(b): number of red pieces threatened by black (i.e., which can be taken on black’s next turn)
• rt(b): number of black pieces threatened by red
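The linear representation of the learned function above can be sketched in a few lines of Python. This is only an illustration: the `Features` container, the `v_hat` name and the weight values are assumptions made for the example, not part of the slides, and extracting the six features from a real board is left out.

```python
from typing import NamedTuple, Sequence


class Features(NamedTuple):
    """The six board features named on the slide (hypothetical container)."""
    bp: int  # number of black pieces
    rp: int  # number of red pieces
    bk: int  # number of black kings
    rk: int  # number of red kings
    bt: int  # number of red pieces threatened by black
    rt: int  # number of black pieces threatened by red


def v_hat(weights: Sequence[float], f: Features) -> float:
    """Evaluate w0 + w1*bp + w2*rp + w3*bk + w4*rk + w5*bt + w6*rt."""
    if len(weights) != 7:
        raise ValueError("expected seven weights: w0..w6")
    return weights[0] + sum(w * x for w, x in zip(weights[1:], f))


# Illustrative weights only (assumed, not learned), from black's point of view.
example_weights = [0.0, 1.0, -1.0, 3.0, -3.0, 0.5, -0.5]
opening_board = Features(bp=12, rp=12, bk=0, rk=0, bt=0, rt=0)
print(v_hat(example_weights, opening_board))
```

With these assumed weights a symmetric position scores zero, and any board that favours black in material scores above zero. Learning amounts to choosing the seven weights.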
  • 10. Obtaining Training Examples
• V(b): the true target function
• V̂(b): the learned function
• Vtrain(b): the training value

One rule for estimating training values:
• Vtrain(b) ← V̂(Successor(b))

Choose Weight Tuning Rule
LMS Weight update rule:
Do repeatedly:
• Select a training example b at random
1. Compute error(b):
   error(b) = Vtrain(b) − V̂(b)
2. For each board feature fi, update weight wi:
   wi ← wi + c · fi · error(b)

c is some small constant, say 0.1, to moderate the rate of learning

Design Choices
[Figure: flowchart of the design decisions, each with its alternatives, ending in Completed Design]
1. Determine Type of Training Experience: games against experts, games against self, table of correct moves, ...
2. Determine Target Function: Board → move, Board → value, ...
3. Determine Representation of Learned Function: polynomial, linear function of six features, artificial neural network, ...
4. Determine Learning Algorithm: gradient descent, linear programming, ...

Some Issues in Machine Learning
• What algorithms can approximate functions well (and when)?
• How does number of training examples influence accuracy?
• How does complexity of hypothesis representation impact it?
• How does noisy data influence accuracy?
• What are the theoretical limits of learnability?
• How can prior knowledge of learner help?
• What clues can we get from biological learning systems?
• How can systems alter their own representations?
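The LMS weight update rule and the training-value estimate from the slides above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the toy training data and the treatment of w0 as the weight of a constant feature equal to 1 are choices made for the example, not something the slides specify.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features f1..fn of board b, Vtrain(b))


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Learned function: w0 + sum over i of wi * fi."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


def training_value(weights: Sequence[float],
                   successor_features: Sequence[float]) -> float:
    """Vtrain(b) <- V_hat(Successor(b)); the caller supplies the successor's features."""
    return v_hat(weights, successor_features)


def lms_update(weights: List[float], features: Sequence[float],
               v_train: float, c: float = 0.1) -> float:
    """Apply one LMS step in place; return error(b) measured before the update."""
    error = v_train - v_hat(weights, features)
    weights[0] += c * error  # assumption: w0 pairs with a constant feature f0 = 1
    for i, f in enumerate(features, start=1):
        weights[i] += c * f * error  # wi <- wi + c * fi * error(b)
    return error


def train(examples: Sequence[Example], n_features: int,
          steps: int = 2000, c: float = 0.1, seed: int = 0) -> List[float]:
    """Do repeatedly: select a training example at random and apply the update."""
    rng = random.Random(seed)
    weights = [0.0] * (n_features + 1)
    for _ in range(steps):
        features, v_train = rng.choice(examples)
        lms_update(weights, features, v_train, c)
    return weights


# Toy data (assumed): values generated by 1 + 2*f1 - f2, not real checkers boards.
toy_examples = [((0.0, 0.0), 1.0), ((1.0, 0.0), 3.0),
                ((0.0, 1.0), 0.0), ((1.0, 1.0), 2.0)]
learned = train(toy_examples, n_features=2)
print([round(w, 3) for w in learned])
```

On this noise-free toy data the weights should approach 1, 2 and −1. With unscaled features such as raw piece counts, c = 0.1 may be too large for the updates to settle, so a smaller constant or rescaled features would likely be needed.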