10/11/2011 - ONE Talks


Machine Learning
Learning with Data
André Lourenço
Instituto Superior de Engenharia de Lisboa,
Instituto de Telecomunicações,
Instituto Superior Técnico, Lisbon, Portugal




© 2005, IT - Instituto de Telecomunicações. All rights reserved.
Outline

    • Introduction
       • Examples
    • What does it mean to learn?
       • Supervised and Unsupervised Learning
       • Types of Learning
       • Classification Problem
    • Text Mining Example
    • Conclusions (and further reading)




Introduction


What is Machine Learning?


    • A branch of artificial intelligence (AI)
    • Arthur Samuel (1959):
      "Field of study that gives computers the ability to
      learn without being explicitly programmed."



      From: Andrew Ng, Stanford Machine Learning Classes
             http://www.youtube.com/watch?v=UzxYlbK2c7E

     4 09-11-2011
What is Machine Learning?

    • Tom Mitchell (1998), well-posed learning problem:
      "A computer program is said to learn from
      experience E with respect to some class of
      tasks T and performance measure P, if its
      performance at tasks in T, as measured by P,
      improves with experience E."
    • Mark Dredze
      Teaching a computer about the world




What is Machine Learning?

    • Goal:
      Design and development of algorithms that allow
      computers to evolve behaviors based on
      empirical data, such as from sensor data or
      databases


    • How to apply machine learning?
       •   Observe the world
       •   Develop models that match the observations
       •   Teach the computer to learn these models
       •   The computer applies the learned model to the world



Example 1:
Prediction of House Price




       From: Andrew Ng, Stanford Machine Learning Classes
              http://www.youtube.com/watch?v=UzxYlbK2c7E
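House-price prediction is usually introduced as a regression problem: fit a function from house size to price, then use it on unseen houses. The sketch below assumes a straight-line model fitted by ordinary least squares; the sizes and prices are hypothetical, and `fit_line`/`predict` are illustrative names, not the method used in the cited lecture.

```python
# Minimal sketch: fit price = intercept + slope * size by ordinary least squares.
# The training data below is hypothetical, chosen only for illustration.

def fit_line(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares slope: covariance(x, y) / variance(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply the learned line to a new input x."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical training set: size in square feet, price in $1000s.
sizes = [1000, 1500, 2000, 2500]
prices = [200, 275, 350, 425]

model = fit_line(sizes, prices)
print(predict(model, 1750))  # approximately 312.5
```

The learned model is just two numbers; "learning" here means choosing them so that the line matches the observed examples as closely as possible.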

Example 2:
Learning to automatically classify text documents




           From: http://www.xcellerateit.com/



Example 3:
Face Detection and Tracking




         http://www.micc.unifi.it/projects/optimal-face-detection-and-tracking/

Example 4:
Social Network Mining

    [Figure: users U1-U5 connected by friendship and group/network
    links, each with a user profile; "Hidden information?"]

From: Exploit of Online Social Networks with Community-Based
Graph Semi-Supervised Learning, Mingzhen Mo and Irwin King,
ICONIP 2010, Sydney, Australia
Example 5:
Biometric Systems




                     1. Physical
                     2. Behavioral


WHAT DOES IT MEAN TO
LEARN?

What does it mean to learn?

    • Learn patterns in data


            z → [Decision System] → ẋ

                     z : observed signal
                     ẋ : estimated output
Unsupervised Learning

    • Look for patterns in data
    • No training data (no examples of the desired output)
    • Pro:
       • No need to label examples with their outputs
    • Con:
       • Cannot be directed toward a specific type of output
    • Applications:
       • Data mining
       • Finds interesting patterns in data
     From: Mark Dredze
     Machine Learning - Finding Patterns in the World

Supervised Learning

    • Learn patterns that reproduce the given outputs
    • Pro:
       • Can learn complex patterns
       • Good performance
    • Con:
       • Requires many examples labeled with their outputs
    • Applications:
       • Classification
       • Sorts data into predefined groups

     From: Mark Dredze
     Machine Learning - Finding Patterns in the World

Types of Learning: Output
  • Classification
     •   Binary, multi‐class, multi‐label, hierarchical, etc.
     •   Classify email as spam
     •   Loss: accuracy
  • Ranking
     •   Order examples by preference
     •   Rank results of web search
     •   Loss: Swapped pairs
  • Regression
     •   Real‐valued output
     •   Predict tomorrow’s stock price
     •   Loss: Squared loss
  • Structured prediction
     •   Sequences, trees, segmentation
     •   Find faces in an image
     •   Loss: Precision/Recall of faces

                       From: Mark Dredze
                       Machine Learning - Finding Patterns in the World
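Each output type above comes with its own way of scoring predictions. The sketch below computes the four measures named on the slide on tiny hypothetical inputs; the function names are illustrative, and "swapped pairs" is implemented under the assumption that a pair counts as swapped when the ranking places a truly less relevant item above a more relevant one.

```python
# Hypothetical sketches of the evaluation measures named on the slide.

def accuracy(y_true, y_pred):
    """Classification: fraction of examples whose predicted label is correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def swapped_pairs(true_scores, ranking):
    """Ranking: count item pairs that `ranking` orders against `true_scores`.

    `ranking` lists item indices from most to least preferred;
    `true_scores[i]` is the true relevance of item i (higher is better).
    """
    swaps = 0
    for a in range(len(ranking)):
        for b in range(a + 1, len(ranking)):
            # ranking[a] is placed above ranking[b]; a swap if it is truly worse.
            if true_scores[ranking[a]] < true_scores[ranking[b]]:
                swaps += 1
    return swaps


def squared_loss(y_true, y_pred):
    """Regression: sum of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def precision_recall(true_items, predicted_items):
    """Structured prediction: precision and recall of a predicted set (e.g. faces)."""
    true_items, predicted_items = set(true_items), set(predicted_items)
    hits = len(true_items & predicted_items)
    precision = hits / len(predicted_items) if predicted_items else 0.0
    recall = hits / len(true_items) if true_items else 0.0
    return precision, recall


print(accuracy(["spam", "ham", "spam"], ["spam", "spam", "spam"]))
print(swapped_pairs([3, 1, 2], [1, 0, 2]))  # 2
print(squared_loss([1.0, 2.0], [1.5, 2.0]))  # 0.25
print(precision_recall({"f1", "f2"}, {"f1", "f3", "f4"}))
```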
Classification Problem

    • Classical Architecture


    z → [Feature Extraction] → y → [Classification] → ẋ

          z : observed signal
          y : feature vector (pattern), y ∈ S
          ẋ : estimated output (class), ẋ ∈ {1, 2, …, c}
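The classical architecture splits the decision into two stages: feature extraction maps the raw signal z to a feature vector y, and the classifier maps y to a class ẋ. The sketch below is a hypothetical instance: z is assumed to be a list of samples, the two features (mean and peak-to-peak amplitude) and the class prototypes are invented for illustration, and the classifier is a nearest-prototype rule chosen only to keep the example short.

```python
import math

# Hypothetical class prototypes in feature space S = R^2, indexed by class 1..c.
PROTOTYPES = {1: (0.0, 1.0), 2: (5.0, 1.0)}


def extract_features(z):
    """Feature extraction: map the raw signal z to a feature vector y."""
    mean = sum(z) / len(z)
    peak_to_peak = max(z) - min(z)
    return (mean, peak_to_peak)


def classify(y):
    """Classification: return the class whose prototype is closest to y."""
    return min(PROTOTYPES, key=lambda c: math.dist(y, PROTOTYPES[c]))


def decide(z):
    """Full chain z -> y -> x_hat."""
    return classify(extract_features(z))


print(decide([0.4, -0.5, 0.1, 0.0]))  # 1
print(decide([5.2, 4.8, 5.5, 4.9]))   # 2
```

Keeping the two stages separate means the same classifier can be reused with different feature extractors, and vice versa.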
Classification Problem

• Example with 1 feature
• Problem: classify people as non-obese or obese by
  observing their weight (only 1 feature)

   • Is it possible to classify without making any
     mistakes?
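One way to probe the question above is to try every possible weight threshold and count the mistakes. The sketch uses hypothetical labelled weights (not data from the talk) in which the two classes overlap; under that assumption no threshold reaches zero errors, which is the usual motivation for adding a second feature.

```python
# Hypothetical labelled samples: (weight in kg, class); 1 = non-obese, 2 = obese.
samples = [(60, 1), (72, 1), (85, 1), (95, 1), (80, 2), (98, 2), (110, 2), (120, 2)]


def errors(threshold):
    """Count mistakes of the rule: predict obese (2) iff weight >= threshold."""
    return sum((2 if w >= threshold else 1) != label for w, label in samples)


# Try every observed weight as a candidate threshold and keep the best one.
best = min((w for w, _ in samples), key=errors)
print(best, errors(best))  # 98 1

# Even a dense sweep cannot reach zero errors: the classes overlap in weight.
print(min(errors(t) for t in range(0, 200)))  # 1
```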
Classification Problem


• Example with 2 features

     z → [Feature Extraction] → y = {weight, height} → [Classification] → ẋ = non-obese or obese

           z : observed signal
           y : feature vector (pattern), y ∈ S
           ẋ : estimated output (class), ẋ ∈ {1: non-obese, 2: obese}
Classification Problem

• Example with 2 features
  • Problem: classify people as non-obese or obese by
    observing their weight and height

  • Now the decision appears simpler!
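The slide does not state which boundary it uses. A common choice, assumed here, is the body-mass index BMI = weight / height², with the WHO cutoff of 30 kg/m² for obesity; in the (weight, height) plane this boundary is the curve weight = 30 · height². The example shows two hypothetical people with the same weight that only the second feature can tell apart.

```python
def bmi(weight_kg, height_m):
    """Body-mass index: weight divided by squared height (kg/m^2)."""
    return weight_kg / height_m ** 2


def classify(weight_kg, height_m, cutoff=30.0):
    """Return 2 (obese) if BMI >= cutoff, else 1 (non-obese).

    The cutoff of 30 is an assumption (WHO threshold), not taken from the slide.
    """
    return 2 if bmi(weight_kg, height_m) >= cutoff else 1


# Same weight, different heights: weight alone cannot separate these two people.
print(classify(95, 1.95))  # 1 (BMI about 25.0)
print(classify(95, 1.65))  # 2 (BMI about 34.9)
```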
Classification Problem

• Example with 2 features
  • Problem: classify people as non-obese or obese by
    observing their weight and height

  • Decision regions: R1: non-obese; R2: obese
Classification Problem

• Decision Regions
  • Goal of the classifier: define a partition of the feature space into
      c disjoint regions, called decision regions: R1, R2, …, Rc
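Any classifier that assigns exactly one class to every feature vector induces such a partition: Ri is the set of points labelled i. The sketch assumes a minimum-distance (nearest-mean) classifier with hypothetical class means for c = 3; ties are broken toward the smaller index so the regions stay disjoint.

```python
import math

# Hypothetical class means in a 2-D feature space, for c = 3 classes.
MEANS = {1: (0.0, 0.0), 2: (4.0, 0.0), 3: (2.0, 3.0)}


def region(y):
    """Index i of the decision region R_i that contains feature vector y.

    Ties are broken toward the smaller class index, so every point of the
    feature space belongs to exactly one region.
    """
    return min(sorted(MEANS), key=lambda i: math.dist(y, MEANS[i]))


# Coarse text map of the partition over the grid [-1, 5] x [-1, 4].
for row in range(4, -2, -1):
    print("".join(str(region((col, row))) for col in range(-1, 6)))
```

With Euclidean distance the boundaries between regions are straight lines (perpendicular bisectors of the segments joining the means).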
TEXT MINING EXAMPLE


Text Mining Process




              Adapted from: Introduction to Text Mining,
                     Yair Even-Zohar, University of Illinois
Text Mining Process
•    Text preprocessing
      •   Syntactic/semantic text analysis
•    Feature generation
      •   Bag of words
•    Feature selection
      •   Simple counting
      •   Statistics
•    Text/data mining
      •   Classification (supervised learning)
      •   Clustering (unsupervised learning)
•    Analyzing results
Syntactic / Semantic text analysis

    • Part-of-speech (POS) tagging
               • Find the corresponding POS for each word
                      e.g., John (noun) gave (verb) the (det) ball (noun)

    • Word sense disambiguation
               • Context-based or proximity-based

    • Parsing
               • Generates a parse tree (graph) for each sentence
               • Each sentence is a standalone graph
Feature Generation: Bag of words

    • A text document is represented by the words it
      contains (and their occurrences)
        •   e.g., “Lord of the rings” → {“the”, “Lord”, “rings”, “of”}
        •   Highly efficient
        •   Makes learning far simpler and easier
        •   Word order is not that important for certain applications
    • Stemming: identifies a word by its root
       • e.g., flying, flew → fly
       • Reduces dimensionality
    • Stop words: the most common words are unlikely
      to help text mining
        •   e.g., “the”, “a”, “an”, “you” …
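The three ideas above (bag of words, stop words, stemming) combine into a short preprocessing function. This is a hypothetical sketch: the stop-word list is a tiny illustrative one, and the stemmer is a crude suffix stripper plus a small lookup table for irregular forms such as "flew", standing in for a real stemmer (e.g. Porter's algorithm).

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "you", "of", "is", "are", "to", "in"}

# Irregular forms a suffix stripper cannot handle (hypothetical lookup table).
IRREGULAR = {"flew": "fly", "flown": "fly"}


def stem(word):
    """Very crude stemmer: irregular lookup, then strip one common suffix."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in ("ing", "ed", "s"):
        # Keep at least 3 characters so short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text):
    """Lowercase, tokenize, drop stop words, stem, and count occurrences."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(stem(t) for t in tokens if t not in STOP_WORDS)


print(bag_of_words("Lord of the Rings"))
print(bag_of_words("Flying birds flew"))
```

Note that stop-word removal drops "of" and "the" from the slide's “Lord of the rings” example, leaving only the content words, and word order is discarded entirely.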
Example

   Original text:
   Hi,
   Here is your weekly update (that unfortunately hasn't gone
   out in about a month). Not much action here right now.
   1) Due to the unwavering insistence of a member of the
   group, the ncsa.d2k.modules.core.datatype package is
   now completely independent of the d2k application.
   2) Transformations are now handled differently in Tables.
   Previously, transformations were done using a
   TransformationModule. That module could then be added
   to a list that an ExampleTable kept. Now, there is an
   interface called Transformation and a sub-interface called
   ReversibleTransformation.

   After stop-word removal:
   hi, weekly update (that unfortunately gone out month).
   much action here right now. 1) due unwavering insistence
   member group, ncsa.d2k.modules.core.datatype package now
   completely independent d2k application. 2) transformations
   now handled differently tables. previously, transformations
   done using transformationmodule. module added list
   exampletable kept. now, interface called transformation
   sub-interface called reversibletransformation.

   After stemming:
   hi week update unfortunate go out month much action here
   right now 1 due unwaver insistence member group ncsa
   d2k modules core datatype package now complete
   independence d2k application 2 transformation now handle
   different table previous transformation do use
   transformationmodule module add list exampletable keep
   now interface call transformation sub-interface call
   reversibletransformation
Feature Generation: Weighting

                                                                   • Term Frequency

                                                Bag of Words

                                 Lorem                         1


                                                                     term ti, document dj
                                 dolor                         1



                                 Praesent                      1
                                                                   • Inverse Document Frequency
  Lorem ipsum dolor sit
   amet, consectetuer
 adipiscing elit. Praesent       iaculis                       1
  et quam sit amet diam
      porttitor iaculis.
 Vestibulum ante ipsum           Vestibulum                    1

  primis in faucibus orci
luctus et ultrices posuere
                                 ipsum                         2
       cubilia Curae;

                                 consectetuer                  2
                                                                   • TF-IDF




                     29 09-11-2011
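The formulas on this slide did not survive extraction. The sketch assumes one common textbook variant: tf(ti, dj) = n(i,j) / Σk n(k,j), the count of ti in dj divided by the length of dj; idf(ti) = log(|D| / |{dj : ti ∈ dj}|) with the natural logarithm; and tf-idf = tf · idf. Other variants (log-scaled tf, smoothed idf) exist, and the token lists below are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (each doc is a token list)."""
    counts = [Counter(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        total = sum(c.values())  # document length in tokens
        weights.append({
            term: (n / total) * math.log(n_docs / df[term])
            for term, n in c.items()
        })
    return weights


docs = [
    ["lorem", "ipsum", "dolor", "ipsum"],
    ["lorem", "vestibulum", "ipsum"],
    ["lorem", "praesent"],
]
w = tf_idf(docs)
print(w[0])
```

A term that occurs in every document (here "lorem") gets idf = log(1) = 0, so it carries no weight, while rarer terms such as "dolor" are boosted.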
Feature Generation: Vector Space Model




                             Documents as vectors




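In the vector space model every document becomes a vector with one coordinate per vocabulary term. A minimal sketch, assuming raw term counts as coordinates (TF-IDF weights could be used instead) and a sorted vocabulary; the two token lists are hypothetical.

```python
from collections import Counter


def document_term_matrix(docs):
    """Represent each document as a vector of term counts over a shared vocabulary."""
    vocabulary = sorted({term for doc in docs for term in doc})
    matrix = []
    for doc in docs:
        counts = Counter(doc)  # missing terms count as 0
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


vocab, X = document_term_matrix([["data", "mining", "data"], ["text", "mining"]])
print(vocab)  # ['data', 'mining', 'text']
print(X)      # [[2, 1, 0], [0, 1, 1]]
```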
Feature Selection

    • Reduce dimensionality
      • Learners have difficulty addressing tasks with high
            dimensionality
    • Irrelevant features
       • Not all features help!
              • e.g., the existence of a noun in a news
                 article is unlikely to help classify it as
                 “politics” or “sport”
        •   Stop-word removal
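"Simple counting" can be as basic as document-frequency thresholding: keep a term only if it is not a stop word and appears in enough documents. The threshold `min_df=2` and the toy sports/politics documents are assumptions made for illustration.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "an", "you"}  # illustrative list from the slides


def select_features(docs, min_df=2):
    """Keep terms that are not stop words and occur in at least min_df documents."""
    df = Counter(term for doc in docs for term in set(doc))
    return sorted(t for t, n in df.items() if n >= min_df and t not in STOP_WORDS)


docs = [
    ["the", "match", "goal", "referee"],
    ["the", "goal", "team", "coach"],
    ["the", "election", "vote", "minister"],
    ["the", "vote", "parliament", "minister"],
]
print(select_features(docs))  # ['goal', 'minister', 'vote']
```

"the" occurs everywhere but is dropped as a stop word, and terms seen only once are dropped as too rare to generalize, shrinking ten distinct terms to three.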
Example

        [Figure: the vocabulary of the example e-mail after stemming (hi, week,
        update, unfortunate, go, out, month, much, action, here, right, now, 1,
        due, unwaver, insistence, member, group, ncsa, d2k, modules, do, core,
        datatype, package, complete, independence, application, 2,
        transformation, handle, different, table, previous, use,
        transformationmodule, add, list, exampletable, keep, interface, call,
        sub-interface, reversibletransformation), shown next to progressively
        reduced feature lists]
Document Similarity

                      • Dot product (cosine
                        similarity)
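Cosine similarity is the dot product of two document vectors divided by the product of their lengths, cos θ = (di · dj) / (‖di‖ ‖dj‖), so it compares the direction of the vectors and ignores document length. A minimal sketch on hypothetical term-count vectors over the vocabulary ['data', 'mining', 'text']:

```python
import math


def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (||u|| ||v||); returns 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


d1 = [2, 1, 0]
d2 = [0, 1, 1]
d3 = [4, 2, 0]  # d1 with every count doubled
print(cosine_similarity(d1, d2))  # about 0.316
print(cosine_similarity(d1, d3))  # 1.0, up to floating-point rounding
```

Doubling every count (as in `d3`) leaves the similarity at 1, which is why cosine is preferred over raw dot products when documents differ in length.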
Text Mining: Classification definition

    • Given: a collection of labeled records
       (training set)
         •   Each record contains a set of features (attributes), and
             the true class (label)
    • Find: a model for the class as a function
       of the values of the features
    • Goal: previously unseen records should be
       assigned a class as accurately as possible
         •   A test set is used to determine the accuracy of the
             model. Usually, the given data set is divided into training
             and test sets, with training set used to build the model
             and test set used to validate it

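The definition leaves the model open. The sketch assumes a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, one common choice for text; the labelled records and their split into training and test sets are hypothetical. The model is built on the training set only and its accuracy is measured on the held-out test set.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(records):
    """Fit multinomial Naive Bayes; records is a list of (tokens, label) pairs."""
    class_counts = Counter(label for _, label in records)
    word_counts = defaultdict(Counter)
    for tokens, label in records:
        word_counts[label].update(tokens)
    vocabulary = {t for tokens, _ in records for t in tokens}
    return class_counts, word_counts, vocabulary


def predict(model, tokens):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    n_records = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_records)  # log prior
        for t in tokens:
            if t in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][t] + 1) / (total + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test_set):
    """Fraction of test records whose predicted label matches the true label."""
    return sum(predict(model, toks) == lab for toks, lab in test_set) / len(test_set)


# Hypothetical labelled records, already divided into training and test sets.
train = [
    (["goal", "match", "team"], "sport"),
    (["coach", "team", "win"], "sport"),
    (["vote", "election", "minister"], "politics"),
    (["parliament", "minister", "law"], "politics"),
]
test = [
    (["team", "goal"], "sport"),
    (["minister", "vote"], "politics"),
]
model = train_naive_bayes(train)
print(accuracy(model, test))  # 1.0
```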
Text Mining: Clustering definition

    • Given: a set of documents and a similarity
       measure among documents
    • Find: clusters such that:
       • Documents in one cluster are more similar to one another
       • Documents in separate clusters are less similar to one another
    • Goal:
       • Find a meaningful grouping of the documents into clusters
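k-means is one widely used clustering algorithm; the slide does not prescribe one. This sketch assumes Euclidean distance as the (dis)similarity measure, seeds the centroids with the first k points for reproducibility, and uses hypothetical 2-term count vectors (e.g. counts of "goal" and "vote") as documents. Cosine similarity could be substituted for the distance.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means with Euclidean distance; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids


# Hypothetical documents as (count of "goal", count of "vote") vectors.
docs = [(3, 0), (0, 4), (4, 1), (1, 3), (5, 0), (0, 5)]
clusters, centroids = kmeans(docs, k=2)
print(clusters)  # [[(3, 0), (4, 1), (5, 0)], [(0, 4), (1, 3), (0, 5)]]
```

No labels are used anywhere: the two groups emerge only from the similarity structure of the vectors.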
Supervised vs. Unsupervised Learning

    • Supervised learning (classification)
      • Supervision: The training data (observations,
            measurements, etc.) are accompanied by labels
            indicating the class of the observations
        •   New data is classified based on the training set

    • Unsupervised learning (clustering)
      • The class labels of the training data are unknown
      • Given a set of measurements, observations, etc., the
            aim is to establish the existence of classes or clusters in
            the data
CONCLUDING REMARKS


Readings

   • Survey books in machine learning
     • The Elements of Statistical Learning
              • Hastie, Tibshirani, Friedman
     • Pattern Recognition and Machine Learning
              • Bishop
     • Machine Learning
              • Mitchell
   • Questions?
ACKNOWLEDGEMENTS

 • ISEL – DEETC
    • Final year and MSc supervised students (Tony Tam, ...)
    • Students of Digital Signal Processing
    • Artur Ferreira
 • Instituto de Telecomunicações (IT)
    • David Coutinho, Hugo Silva, Ana Fred, Mário Figueiredo
 • Fundação para a Ciência e Tecnologia (FCT)
www.it.pt




Thank you for your attention!
André Ribeiro Lourenço
Mail to:   alourenco@deetc.isel.ipl.pt
           arlourenco@gmail.com

 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 

Último (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 

Machine Learning: Learning with data

  • 1. 10/11/2011 - ONE Talks
      Machine Learning: Learning with Data
      André Lourenço
      Instituto Superior de Engenharia de Lisboa; Instituto de Telecomunicações; Instituto Superior Técnico; Lisbon, Portugal
      © 2005, IT - Instituto de Telecomunicações. All rights reserved.
  • 2. Outline
      • Introduction
          • Examples
      • What does it mean to learn?
          • Supervised and Unsupervised Learning
          • Types of Learning
          • Classification Problem
      • Text Mining Example
      • Conclusions (and further reading)
  • 4. What is Machine Learning?
      • A branch of artificial intelligence (AI)
      • Arthur Samuel (1959): Field of study that gives computers the ability to learn without being explicitly programmed
      From: Andrew Ng, Stanford Machine Learning Classes, http://www.youtube.com/watch?v=UzxYlbK2c7E
      4 09-11-2011
  • 5. What is Machine Learning?
      • Tom Mitchell (1998), Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
      • Mark Dredze: Teaching a computer about the world
  • 6. What is Machine Learning?
      • Goal: design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as sensor data or databases
      • How to apply machine learning?
          • Observe the world
          • Develop models that match observations
          • Teach the computer to learn these models
          • The computer applies the learned model to the world
  • 7. Example 1: Prediction of House Prices
      From: Andrew Ng, Stanford Machine Learning Classes, http://www.youtube.com/watch?v=UzxYlbK2c7E
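The house-price example is a regression problem: predict a real-valued price from features of a house. Below is a minimal sketch. It assumes a single feature (size) and an ordinary least-squares line fit, which is one common choice and not necessarily the model used in the cited lecture. The function names, sizes and prices are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("all x values are identical")
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predicted price for a house of size x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: house size (m^2) and price (thousands of euros).
sizes = [50, 80, 100, 120, 150]
prices = [110, 170, 210, 250, 310]
model = fit_line(sizes, prices)
print(predict(model, 90))  # prints 190.0
```

These toy prices lie exactly on a line, so the fit recovers slope 2 and intercept 10. Real data would leave residuals, and their mean square is the squared loss used for regression.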
  • 8. Example 2: Learning to automatically classify text documents
      From: http://www.xcellerateit.com/
  • 9. Example 3: Face Detection and Tracking
      http://www.micc.unifi.it/projects/optimal-face-detection-and-tracking/
  • 10. Example 4: Social Network Mining
      Figure: users' profiles, friendships, groups and networks (users U1 to U5), with hidden information (?) to be inferred
      From: Exploit of Online Social Networks with Community-Based Graph Semi-Supervised Learning, Mingzhen Mo and Irwin King, ICONIP 2010, Sydney, Australia
  • 11. Example 5: Biometric Systems
      1. Physical
      2. Behavioral
  • 12. WHAT DOES IT MEAN TO LEARN?
  • 13. What does it mean to learn?
      • Learn patterns in data
      Diagram: z → [Decision System] → ẋ
      z: observed signal
      ẋ: estimated output
  • 14. Unsupervised Learning
      • Look for patterns in data
      • No training data (no examples of output)
      • Pro:
          • No need to label examples with their output
      • Con:
          • Cannot demonstrate specific types of output
      • Applications:
          • Data mining: finds interesting patterns in data
      From: Mark Dredze, Machine Learning - Finding Patterns in the World
  • 15. Supervised Learning
      • Learn patterns to simulate a given output
      • Pro:
          • Can learn complex patterns
          • Good performance
      • Con:
          • Requires many examples labeled with the desired output
      • Applications:
          • Classification: sorts data into predefined groups
      From: Mark Dredze, Machine Learning - Finding Patterns in the World
  • 16. Types of Learning: Output
      • Classification
          • Binary, multi-class, multi-label, hierarchical, etc.
          • Example: classify email as spam
          • Loss: accuracy
      • Ranking
          • Order examples by preference
          • Example: rank the results of a web search
          • Loss: swapped pairs
      • Regression
          • Real-valued output
          • Example: predict tomorrow's stock price
          • Loss: squared loss
      • Structured prediction
          • Sequences, trees, segmentation
          • Example: find faces in an image
          • Loss: precision/recall of faces
      From: Mark Dredze, Machine Learning - Finding Patterns in the World
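Each output type comes with its own way of scoring predictions. The sketch below implements the four measures in plain Python. The function names and toy inputs are hypothetical. `precision_recall` compares detections by exact identity, which simplifies real face detection: there, predicted and true boxes are usually matched by overlap. Accuracy is a score to maximize; the corresponding loss is the error rate, 1 - accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def swapped_pairs(true_scores, predicted_order):
    """Count pairs that the predicted ranking puts in the wrong order.

    true_scores maps item -> true relevance; predicted_order lists items best-first.
    """
    swaps = 0
    for i in range(len(predicted_order)):
        for j in range(i + 1, len(predicted_order)):
            # Item i is ranked above item j but is truly less relevant.
            if true_scores[predicted_order[i]] < true_scores[predicted_order[j]]:
                swaps += 1
    return swaps


def squared_loss(y_true, y_pred):
    """Mean squared error between real-valued targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(true_items, found_items):
    """Precision and recall of a set of detections (e.g. face identifiers)."""
    true_items, found_items = set(true_items), set(found_items)
    hits = len(true_items & found_items)
    precision = hits / len(found_items) if found_items else 0.0
    recall = hits / len(true_items) if true_items else 0.0
    return precision, recall


print(accuracy(["spam", "ham", "spam"], ["spam", "spam", "spam"]))
print(swapped_pairs({"a": 3, "b": 2, "c": 1}, ["b", "a", "c"]))  # prints 1
print(squared_loss([1.0, 2.0], [1.5, 2.0]))  # prints 0.125
print(precision_recall({"f1", "f2"}, {"f1", "f3", "f4"}))
```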
  • 17. Classification Problem
      • Classical architecture
      Diagram: z → [Feature Extraction] → y → [Classification] → ẋ
      z: observed signal
      y: feature vector (pattern), y ∈ S
      ẋ: estimated output (class), ẋ ∈ {1, 2, …, c}
  • 18. Classification Problem
      • Example with 1 feature
      • Problem: classify people as non-obese or obese by observing their weight (only 1 feature)
      • Is it possible to classify them without making any mistakes?
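One way to answer this slide's question is to try every threshold on the single feature and count the mistakes of the best one. The sketch below assumes the rule "obese if weight > t" and eight hypothetical people whose weights overlap between the two classes. With such overlap, even the best threshold leaves errors. The function name and data are illustrative.

```python
def best_threshold(values, labels, positive="obese"):
    """Exhaustively search a threshold t for the rule 'positive if value > t'.

    Returns (threshold, number_of_errors) for the first threshold with the
    fewest training errors.
    """
    # Candidates: just below the smallest value, then every observed value.
    candidates = [min(values) - 1] + sorted(set(values))
    best = None
    for t in candidates:
        errors = sum(
            (v > t) != (label == positive) for v, label in zip(values, labels)
        )
        if best is None or errors < best[1]:
            best = (t, errors)
    return best


# Hypothetical weights (kg): the two classes overlap between 80 and 95 kg.
weights = [60, 70, 85, 95, 80, 90, 100, 110]
labels = ["non-obese"] * 4 + ["obese"] * 4
print(best_threshold(weights, labels))  # prints (70, 2)
```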
  • 19. Classification Problem
      • Example with 2 features
      Diagram: z → [Feature Extraction] → y = {weight, height} → [Classification] → ẋ = non-obese or obese
      z: observed signal
      y: feature vector (pattern), y ∈ S
      ẋ: estimated output (class), ẋ ∈ {1: non-obese, 2: obese}
  • 20. Classification Problem
      • Example with 2 features
      • Problem: classify people as non-obese or obese by observing their weight and height
      • Now the decision appears simpler!
  • 21. Classification Problem
      • Example with 2 features
      • Problem: classify people as non-obese or obese by observing their weight and height
      • Decision regions: R1: non-obese; R2: obese
  • 22. Classification Problem
      • Decision regions
      • Goal of the classifier: define a partition of the feature space into c disjoint regions, called decision regions: R1, R2, …, Rc
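A minimum-distance (nearest-centroid) classifier is one simple way to obtain such a partition. Each class is summarized by the mean of its training vectors. Every point in feature space is assigned to the class with the nearest mean, so the regions R1, …, Rc are disjoint and together cover the space. The sketch is illustrative, not the classifier used in the talk. The weight/height data are hypothetical, with height in centimetres so that both features have comparable ranges; in practice, features are usually normalized before distances are computed. In this toy set the weights overlap between the classes, yet adding height lets the rule classify all eight training points correctly.

```python
import math


def train_centroids(points, labels):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for p, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids, point):
    """Assign the class whose centroid is closest (Euclidean distance).

    The set of points mapped to a class is that class's decision region.
    """
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Hypothetical (weight in kg, height in cm) measurements.
points = [(85, 190), (95, 195), (60, 165), (70, 175),
          (80, 155), (90, 165), (100, 170), (110, 175)]
labels = ["non-obese"] * 4 + ["obese"] * 4
centroids = train_centroids(points, labels)
print(classify(centroids, (95, 160)))  # prints obese
```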
  • 23. TEXT MINING EXAMPLE
  • 24. Text Mining Process
      Adapted from: Introduction to Text Mining, Yair Even-Zohar, University of Illinois
  • 25. Text Mining Process
      • Text preprocessing
          • Syntactic/semantic text analysis
      • Feature generation
          • Bag of words
      • Feature selection
          • Simple counting
          • Statistics
      • Text/data mining
          • Classification: supervised learning
          • Clustering: unsupervised learning
      • Analyzing results
  • 26. Syntactic / Semantic Text Analysis
      • Part-of-speech (POS) tagging
          • Find the corresponding POS for each word, e.g., John (noun) gave (verb) the (det) ball (noun)
      • Word sense disambiguation
          • Context-based or proximity-based
      • Parsing
          • Generates a parse tree (graph) for each sentence
          • Each sentence is a stand-alone graph
  • 27. Feature Generation: Bag of Words
      • A text document is represented by the words it contains (and their occurrences)
          • e.g., “Lord of the rings” → {“the”, “Lord”, “rings”, “of”}
          • Highly efficient
          • Makes learning far simpler and easier
          • Word order is not that important for certain applications
      • Stemming: identifies a word by its root
          • e.g., flying, flew → fly
          • Reduces dimensionality
      • Stop words: the most common words are unlikely to help text mining
          • e.g., “the”, “a”, “an”, “you” …
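Below is a minimal bag-of-words sketch in plain Python. It assumes a tiny, hypothetical stop-word list and a deliberately crude suffix-stripping stemmer; real systems use established algorithms such as Porter's stemmer. Suffix rules handle flying → fly but cannot map an irregular form such as flew to fly, which needs a dictionary-based lemmatizer. The names `STOP_WORDS`, `crude_stem` and `bag_of_words` are illustrative.

```python
import re
from collections import Counter

# Tiny, illustrative stop-word list (real lists contain hundreds of words).
STOP_WORDS = {"the", "a", "an", "of", "you", "is", "in", "and", "to"}


def crude_stem(word):
    """Naive suffix stripping that keeps at least three characters of stem."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text, remove_stop_words=True, stem=True):
    """Map a document to word -> count, discarding word order."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return Counter(tokens)


print(bag_of_words("Lord of the rings", remove_stop_words=False, stem=False))
print(bag_of_words("Lord of the rings"))  # stop words dropped, "rings" -> "ring"
```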
  • 28. Example
      The same e-mail at three preprocessing stages:
      Original text: "Hi, Here is your weekly update (that unfortunately hasn't gone out in about a month). Not much action here right now. 1) Due to the unwavering insistence of a member of the group, the ncsa.d2k.modules.core.datatype package is now completely independent of the d2k application. 2) Transformations are now handled differently in Tables. Previously, transformations were done using a TransformationModule. That module could then be added to a list that an ExampleTable kept. Now, there is an interface called Transformation and a sub-interface called ReversibleTransformation."
      Lowercased, stop words removed: "hi, weekly update (that unfortunately gone out month). much action here right now. 1) due unwavering insistence member group, ncsa.d2k.modules.core.datatype package now completely independent d2k application. 2) transformations now handled differently tables. previously, transformations done using transformationmodule. module added list exampletable kept. now, interface called transformation sub-interface called reversibletransformation."
      Stemmed: "hi week update unfortunate go out month much action here right now 1 due unwaver insistence member group ncsa d2k modules core datatype package now complete independence d2k application 2 transformation now handle different table previous transformation do use transformationmodule module add list exampletable keep now interface call transformation sub-interface call reversibletransformation"
  • 29. Feature Generation: Weighting
      • Term Frequency (term ti, document dj)
      • Inverse Document Frequency
      • TF-IDF
      Figure: bag-of-words counts for a sample “Lorem ipsum” paragraph
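The slide's formulas did not survive extraction. One common form of the weighting is:
      tf(ti, dj) = number of occurrences of term ti in document dj
      idf(ti) = log(N / df(ti)), where N is the number of documents and df(ti) is the number of documents containing ti
      tf-idf(ti, dj) = tf(ti, dj) × idf(ti)
Many variants exist, such as normalized or logarithmic tf and smoothed idf. Treat this sketch as one illustrative choice; the function name and toy documents are hypothetical. A term that occurs in every document gets weight 0, however frequent it is.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Weight each term by tf(t, d) * log(N / df(t)).

    documents: list of token lists. Returns one {term: weight} dict per document.
    """
    n_docs = len(documents)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)  # raw term frequency in this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


docs = [
    ["lorem", "ipsum", "dolor", "ipsum"],
    ["ipsum", "amet", "amet"],
]
w = tf_idf(docs)
print(w[0]["ipsum"])  # "ipsum" appears in both documents, so its weight is 0.0
```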
  • 30. Feature Generation: Vector Space Model
      • Documents as vectors
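In the vector space model, each distinct term becomes one dimension and each document becomes a vector of term weights. The sketch below uses raw counts and a sorted shared vocabulary; both are illustrative choices, and tf-idf weights are often used instead. The function name and documents are hypothetical.

```python
from collections import Counter


def to_vectors(documents):
    """Represent tokenized documents as count vectors over a shared vocabulary.

    Returns (vocabulary, vectors), where vectors[i][j] is the number of times
    vocabulary[j] occurs in documents[i].
    """
    vocabulary = sorted({term for doc in documents for term in doc})
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


vocab, vecs = to_vectors([["lord", "ring"], ["ring", "ring", "fellowship"]])
print(vocab)  # prints ['fellowship', 'lord', 'ring']
print(vecs)   # prints [[0, 1, 1], [1, 0, 2]]
```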
  • 31. Feature Selection
      • Reduce dimensionality
          • Learners have difficulty addressing tasks with high dimensionality
      • Irrelevant features
          • Not all features help!
          • e.g., the existence of a noun in a news article is unlikely to help classify it as “politics” or “sport”
      • Stop-word removal
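A simple counting-based selection combines stop-word removal with document-frequency thresholds. Terms that appear in very few documents, or in almost all of them, rarely help to discriminate between classes. The sketch below assumes hypothetical thresholds `min_df` and `max_df_ratio` and a toy corpus. In this corpus "the" is discarded by the upper threshold even when no stop-word list is given.

```python
from collections import Counter


def select_features(documents, stop_words, min_df=2, max_df_ratio=0.9):
    """Keep terms that are not stop words and whose document frequency is
    neither too low (< min_df documents) nor too high (> max_df_ratio of all
    documents). Returns the selected terms in sorted order.
    """
    n_docs = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    return sorted(
        term
        for term, count in df.items()
        if term not in stop_words
        and count >= min_df
        and count / n_docs <= max_df_ratio
    )


docs = [
    ["the", "match", "goal", "team"],
    ["the", "vote", "election", "team"],
    ["the", "goal", "team", "referee"],
    ["the", "vote", "minister"],
]
print(select_features(docs, stop_words={"the"}))  # prints ['goal', 'team', 'vote']
```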
  • 32. Example
      Figure: lists of the terms extracted from the example e-mail (hi, week, update, core, datatype, package, transformation, interface, …)
  • 33. Document Similarity
      • Dot product: cosine similarity
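Cosine similarity normalizes the dot product by the vector lengths, so documents of different lengths can still be compared: cos(u, v) = (u · v) / (‖u‖ ‖v‖). For non-negative term-count vectors the result lies between 0 (no shared terms) and 1 (same direction). Below is a minimal sketch; the function name is illustrative.

```python
import math


def cosine_similarity(u, v):
    """(u . v) / (||u|| * ||v||); returns 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Count vectors over the vocabulary ['fellowship', 'lord', 'ring'].
print(cosine_similarity([0, 1, 1], [1, 0, 2]))  # about 0.632, i.e. 2 / sqrt(10)
```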
  • 34. Text Mining: Classification Definition
      • Given: a collection of labeled records (training set)
          • Each record contains a set of features (attributes) and the true class (label)
      • Find: a model for the class as a function of the values of the features
      • Goal: previously unseen records should be assigned a class as accurately as possible
          • A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it
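The slide does not name a classification algorithm. As one illustrative choice, the sketch below trains a multinomial naive Bayes classifier with add-one (Laplace) smoothing on a hypothetical training set. It then measures accuracy on a separate hypothetical test set, following the train/test protocol of this slide. Words never seen during training are ignored at prediction time. All names and records are illustrative.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(records):
    """Train a multinomial naive Bayes text classifier.

    records: list of (tokens, label) pairs (the training set).
    """
    class_counts = Counter(label for _, label in records)
    term_counts = defaultdict(Counter)  # label -> term -> count
    for tokens, label in records:
        term_counts[label].update(tokens)
    vocabulary = {t for tokens, _ in records for t in tokens}
    return class_counts, term_counts, vocabulary


def predict(model, tokens):
    """Return the label maximizing log P(label) + sum of log P(token | label)."""
    class_counts, term_counts, vocabulary = model
    n_records = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(term_counts[label].values())
        score = math.log(count / n_records)
        for t in tokens:
            if t in vocabulary:  # skip words never seen in training
                # Add-one smoothing avoids log(0) for unseen (term, label) pairs.
                score += math.log((term_counts[label][t] + 1) / (total + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test_records):
    """Fraction of test records whose predicted label is correct."""
    correct = sum(predict(model, tokens) == label for tokens, label in test_records)
    return correct / len(test_records)


# Hypothetical labeled records, already split into training and test sets.
train = [
    (["election", "vote", "minister"], "politics"),
    (["parliament", "vote", "law"], "politics"),
    (["goal", "match", "team"], "sport"),
    (["team", "referee", "goal"], "sport"),
]
test = [
    (["minister", "law"], "politics"),
    (["match", "goal"], "sport"),
]
model = train_naive_bayes(train)
print(accuracy(model, test))  # prints 1.0
```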
  • 35. Text Mining: Clustering Definition
      • Given: a set of documents and a similarity measure among documents
      • Find: clusters such that:
          • Documents in one cluster are more similar to one another
          • Documents in separate clusters are less similar to one another
      • Goal: finding a correct set of documents
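The slide does not name a clustering algorithm. The k-means algorithm is one widely used option: it alternates between assigning each vector to its nearest centroid and moving each centroid to the mean of its members. The sketch below runs on hypothetical 2-D points. It uses Euclidean distance and takes the first k points as initial centroids; both are simplifications, since cosine-based distances and random or k-means++ initialization are common for documents.

```python
import math


def k_means(points, k, iterations=100):
    """Basic k-means returning (cluster index per point, centroids)."""
    centroids = [list(p) for p in points[:k]]  # simplistic initialization
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_assignment == assignment:  # converged: assignments unchanged
            break
        assignment = new_assignment
    return assignment, centroids


points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
clusters, centers = k_means(points, k=2)
print(clusters)  # prints [0, 1, 0, 0, 1, 1]
```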
  • 36. Supervised vs. Unsupervised Learning
      • Supervised learning (classification)
          • Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
          • New data is classified based on the training set
      • Unsupervised learning (clustering)
          • The class labels of the training data are unknown
          • Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
  • 37. CONCLUDING REMARKS
  • 38. Readings
      • Survey books in machine learning:
          • The Elements of Statistical Learning, Hastie, Tibshirani, Friedman
          • Pattern Recognition and Machine Learning, Bishop
          • Machine Learning, Mitchell
      • Questions?
  • 39. ACKNOWLEDGEMENTS
      • ISEL - DEETC
          • Final year and MSc supervised students (Tony Tam, ...)
          • Students of Digital Signal Processing
          • Artur Ferreira
      • Instituto de Telecomunicações (IT): David Coutinho, Hugo Silva, Ana Fred, Mário Figueiredo
      • Fundação para a Ciência e Tecnologia (FCT)
  • 40. www.it.pt
      Thank you for your attention!
      André Ribeiro Lourenço
      Mail to: alourenco@deetc.isel.ipl.pt, arlourenco@gmail.com