SlideShare una empresa de Scribd logo
1 de 27
Descargar para leer sin conexión
Data-driven modeling
                                           APAM E4990


                                           Jake Hofman

                                          Columbia University


                                         January 30, 2012




Jake Hofman   (Columbia University)        Data-driven modeling   January 30, 2012   1 / 23
Outline



      1 Digit recognition



      2 Image classification



      3 Acquiring image data




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   2 / 23
Digit recognition

          Classification is an supervised learning task by which we aim to
             predict the correct label for an example given its features




                                               ↓
                                  0 5 4 1 4 9
              e.g. determine which digit {0, 1, . . . , 9} is in depicted in each
                                         image


Jake Hofman    (Columbia University)    Data-driven modeling            January 30, 2012   3 / 23
Digit recognition

          Determine which digit {0, 1, . . . , 9} is in depicted in each image




Jake Hofman   (Columbia University)   Data-driven modeling          January 30, 2012   4 / 23
Images as arrays


               Grayscale images ↔ 2-d arrays of M × N pixel intensities




Jake Hofman   (Columbia University)   Data-driven modeling       January 30, 2012   5 / 23
Images as arrays

               Grayscale images ↔ 2-d arrays of M × N pixel intensities




          Represent each image as a “vector of pixels”, flattening the 2-d
                          array of pixels to a 1-d vector

Jake Hofman   (Columbia University)   Data-driven modeling       January 30, 2012   5 / 23
k-nearest neighbors classification
         k-nearest neighbors: memorize training examples, predict labels
                   using labels of the k closest training points




                          Intuition: nearby points have similar labels

Jake Hofman   (Columbia University)      Data-driven modeling            January 30, 2012   6 / 23
k-nearest neighbors classification




              Small k gives a complex boundary, large k results in coarse
                                     averaging


Jake Hofman    (Columbia University)   Data-driven modeling      January 30, 2012   7 / 23
k-nearest neighbors classification




                Evaluate performance on a held-out test set to assess
                                generalization error


Jake Hofman   (Columbia University)   Data-driven modeling      January 30, 2012   7 / 23
Digit recognition


       Simple digit classifer with k=1 nearest neighbors
          ./ classify_digits . py




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   8 / 23
Outline



      1 Digit recognition



      2 Image classification



      3 Acquiring image data




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   9 / 23
Image classification


                      Determine if an image is a landscape or headshot




                                            ↓                 ↓
                                       ’landscape’        ’headshot’

              Represent each image with a binned RGB intensity histogram




Jake Hofman    (Columbia University)         Data-driven modeling      January 30, 2012   10 / 23
Images as arrays
          Color images ↔ 3-d arrays of M × N × 3 RGB pixel intensities




       import matplotlib . image as mpimg
       I = mpimg . imread ( ' chairs . jpg ')

Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   11 / 23
Images as arrays
          Color images ↔ 3-d arrays of M × N × 3 RGB pixel intensities




       import matplotlib . image as mpimg
       I = mpimg . imread ( ' chairs . jpg ')


Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   11 / 23
Intensity histograms

        Disregard all spatial information, simply count pixels by intensities
               (e.g. lots of pixels with bright green and dark blue)




Jake Hofman   (Columbia University)   Data-driven modeling       January 30, 2012   12 / 23
Intensity histograms

                                How many bins for pixel intensities?




         Too many bins gives a noisy, overly complex representation of
                                  the data,
           while using too few bins results in an overly simple one


Jake Hofman   (Columbia University)        Data-driven modeling        January 30, 2012   13 / 23
Image classification

       Classify
           ./ classify_flickr . py 16 9
               flickr_headshot flickr_landscape




       Change in performance on test set with number of neighbors
       k      =   1,    accuracy      =   0.7125
       k      =   3,    accuracy      =   0.7425
       k      =   5,    accuracy      =   0.7725
       k      =   7,    accuracy      =   0.7650
       k      =   9,    accuracy      =   0.7500


Jake Hofman   (Columbia University)       Data-driven modeling   January 30, 2012   14 / 23
Outline



      1 Digit recognition



      2 Image classification



      3 Acquiring image data




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   15 / 23
Simple screen scraping




       One-liner to download ESL digit data
          wget - Nr -- level =1 --no - parent http :// www -
            stat . stanford . edu /~ tibs / ElemStatLearn /
            datasets / zip . digits




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   16 / 23
Simple screen scraping


       One-liner to scrape images from a webpage
          wget -O - http :// bit . ly / zxy0jN |
            tr ' ' ' ' " = ' 'n ' |
            egrep '^ http .*( png | jpg | gif ) ' |
            xargs wget




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   17 / 23
Simple screen scraping


       One-liner to scrape images from a webpage
          wget -O - http :// bit . ly / zxy0jN |
            tr ' ' ' ' " = ' 'n ' |
            egrep '^ http .*( png | jpg | gif ) ' |
            xargs wget



              • get page source
              • translate quotes and = to newlines
              • match urls with image extensions
              • download qualifying images



Jake Hofman    (Columbia University)   Data-driven modeling   January 30, 2012   17 / 23
“cat flickr xargs wget”?




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   18 / 23
Flickr API




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   19 / 23
YQL: SELECT * FROM Internet1




                                   http://developer.yahoo.com/yql




              1
                  http://oreillynet.com/pub/e/1369
Jake Hofman       (Columbia University)      Data-driven modeling   January 30, 2012   20 / 23
YQL: Console




                       http://developer.yahoo.com/yql/console


Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   21 / 23
YQL + Python
       Python function for public YQL queries
        def yql_public ( query , env = False ):
            # build dictionary of GET parameters
            params = { 'q ': query , ' format ': ' json '}
            if env :
                params [ ' env '] = env

                 # escape query
                 query_str = urlencode ( params )

                 # fetch results
                 url = '% s ?% s ' % ( YQL_PUBLIC , query_str )
                 result = urlopen ( url )

                 # parse json and return
                 return json . load ( result )[ ' query ' ][ ' results ']
Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   22 / 23
YQL + Python + Flickr


       Fetch info for “interestingness” photos
          ./ simpleyql . py ' select * from flickr . photos
            . interestingness (20) where api_key = " ... " '



       Download thumbnails for photos tagged with “vivid”
          ./ download_flickr . py vivid 500 < api_key >




Jake Hofman   (Columbia University)   Data-driven modeling   January 30, 2012   23 / 23

Más contenido relacionado

Destacado

Chapter 1 market & marketing
Chapter 1 market & marketingChapter 1 market & marketing
Chapter 1 market & marketingHo Cao Viet
 
Maherprofessional c vbio216
Maherprofessional c vbio216Maherprofessional c vbio216
Maherprofessional c vbio216Pat Maher
 
Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...
Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...
Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...Rijksdienst voor Ondernemend Nederland
 
1172. su.-.pps_angelines
1172.  su.-.pps_angelines1172.  su.-.pps_angelines
1172. su.-.pps_angelinesfilipj2000
 
K state candidacy presentation
K state candidacy presentationK state candidacy presentation
K state candidacy presentationTeague Allen
 
Cost Reductions Process - Results
Cost Reductions Process - ResultsCost Reductions Process - Results
Cost Reductions Process - Resultsoferyuval
 
Successful Employee Engagement through Sharepoint OOTB tools
Successful Employee Engagement through Sharepoint OOTB toolsSuccessful Employee Engagement through Sharepoint OOTB tools
Successful Employee Engagement through Sharepoint OOTB toolsPrageeth Sandakalum
 
Towers of today
Towers of todayTowers of today
Towers of todayfilipj2000
 
Letter to arvind kejriwal dec 31 13 water issue
Letter to arvind kejriwal dec 31 13   water issueLetter to arvind kejriwal dec 31 13   water issue
Letter to arvind kejriwal dec 31 13 water issueMadhukar Varshney
 
PropSafe - property renting & mgmt solution
PropSafe - property renting & mgmt solutionPropSafe - property renting & mgmt solution
PropSafe - property renting & mgmt solutionRatnesh1979
 
Praias παραλίες
Praias   παραλίεςPraias   παραλίες
Praias παραλίεςfilipj2000
 
Internet Message
Internet MessageInternet Message
Internet Messagebearobo
 

Destacado (20)

Chapter 1 market & marketing
Chapter 1 market & marketingChapter 1 market & marketing
Chapter 1 market & marketing
 
Orosz fotosok
Orosz fotosokOrosz fotosok
Orosz fotosok
 
Maherprofessional c vbio216
Maherprofessional c vbio216Maherprofessional c vbio216
Maherprofessional c vbio216
 
产品早期市场推广探路实践 by XDash
产品早期市场推广探路实践 by XDash产品早期市场推广探路实践 by XDash
产品早期市场推广探路实践 by XDash
 
Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...
Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...
Verkiezing schaduwburgemeester dorpscafé malden ruud van gisteren d.d. 3 3-20...
 
Creativity
CreativityCreativity
Creativity
 
Reuters2
Reuters2Reuters2
Reuters2
 
1172. su.-.pps_angelines
1172.  su.-.pps_angelines1172.  su.-.pps_angelines
1172. su.-.pps_angelines
 
K state candidacy presentation
K state candidacy presentationK state candidacy presentation
K state candidacy presentation
 
Cost Reductions Process - Results
Cost Reductions Process - ResultsCost Reductions Process - Results
Cost Reductions Process - Results
 
Successful Employee Engagement through Sharepoint OOTB tools
Successful Employee Engagement through Sharepoint OOTB toolsSuccessful Employee Engagement through Sharepoint OOTB tools
Successful Employee Engagement through Sharepoint OOTB tools
 
Towers of today
Towers of todayTowers of today
Towers of today
 
Letter to arvind kejriwal dec 31 13 water issue
Letter to arvind kejriwal dec 31 13   water issueLetter to arvind kejriwal dec 31 13   water issue
Letter to arvind kejriwal dec 31 13 water issue
 
PropSafe - property renting & mgmt solution
PropSafe - property renting & mgmt solutionPropSafe - property renting & mgmt solution
PropSafe - property renting & mgmt solution
 
Tearsofawoman
TearsofawomanTearsofawoman
Tearsofawoman
 
Praias παραλίες
Praias   παραλίεςPraias   παραλίες
Praias παραλίες
 
Grecia linda
Grecia lindaGrecia linda
Grecia linda
 
sibgrapi2015
sibgrapi2015sibgrapi2015
sibgrapi2015
 
Imp local
Imp localImp local
Imp local
 
Internet Message
Internet MessageInternet Message
Internet Message
 

Más de jakehofman

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2jakehofman
 
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1jakehofman
 
Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networksjakehofman
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classificationjakehofman
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationjakehofman
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1jakehofman
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scalejakehofman
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in Rjakehofman
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Countingjakehofman
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overviewjakehofman
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayesjakehofman
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scalejakehofman
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studiesjakehofman
 
NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Sciencejakehofman
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classificationjakehofman
 
Computational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: RegressionComputational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: Regressionjakehofman
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experimentsjakehofman
 
Computational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data WranglingComputational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data Wranglingjakehofman
 
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part IIComputational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part IIjakehofman
 
Computational Social Science, Lecture 07: Counting Fast, Part I
Computational Social Science, Lecture 07: Counting Fast, Part IComputational Social Science, Lecture 07: Counting Fast, Part I
Computational Social Science, Lecture 07: Counting Fast, Part Ijakehofman
 

Más de jakehofman (20)

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
 
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
 
Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networks
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classification
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalization
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scale
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in R
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overview
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayes
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scale
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studies
 
NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Science
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classification
 
Computational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: RegressionComputational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: Regression
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experiments
 
Computational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data WranglingComputational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data Wrangling
 
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part IIComputational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
 
Computational Social Science, Lecture 07: Counting Fast, Part I
Computational Social Science, Lecture 07: Counting Fast, Part IComputational Social Science, Lecture 07: Counting Fast, Part I
Computational Social Science, Lecture 07: Counting Fast, Part I
 

Último

Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 

Último (20)

Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 

Data-driven modeling: Lecture 02

  • 1. Data-driven modeling APAM E4990 Jake Hofman Columbia University January 30, 2012 Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 1 / 23
  • 2. Outline 1 Digit recognition 2 Image classification 3 Acquiring image data Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 2 / 23
  • 3. Digit recognition Classification is an supervised learning task by which we aim to predict the correct label for an example given its features ↓ 0 5 4 1 4 9 e.g. determine which digit {0, 1, . . . , 9} is in depicted in each image Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 3 / 23
  • 4. Digit recognition Determine which digit {0, 1, . . . , 9} is in depicted in each image Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 4 / 23
  • 5. Images as arrays Grayscale images ↔ 2-d arrays of M × N pixel intensities Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 5 / 23
  • 6. Images as arrays Grayscale images ↔ 2-d arrays of M × N pixel intensities Represent each image as a “vector of pixels”, flattening the 2-d array of pixels to a 1-d vector Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 5 / 23
  • 7. k-nearest neighbors classification k-nearest neighbors: memorize training examples, predict labels using labels of the k closest training points Intuition: nearby points have similar labels Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 6 / 23
  • 8. k-nearest neighbors classification Small k gives a complex boundary, large k results in coarse averaging Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 7 / 23
  • 9. k-nearest neighbors classification Evaluate performance on a held-out test set to assess generalization error Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 7 / 23
  • 10. Digit recognition Simple digit classifer with k=1 nearest neighbors ./ classify_digits . py Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 8 / 23
  • 11. Outline 1 Digit recognition 2 Image classification 3 Acquiring image data Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 9 / 23
  • 12. Image classification Determine if an image is a landscape or headshot ↓ ↓ ’landscape’ ’headshot’ Represent each image with a binned RGB intensity histogram Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 10 / 23
  • 13. Images as arrays Color images ↔ 3-d arrays of M × N × 3 RGB pixel intensities import matplotlib . image as mpimg I = mpimg . imread ( ' chairs . jpg ') Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 11 / 23
  • 14. Images as arrays Color images ↔ 3-d arrays of M × N × 3 RGB pixel intensities import matplotlib . image as mpimg I = mpimg . imread ( ' chairs . jpg ') Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 11 / 23
  • 15. Intensity histograms Disregard all spatial information, simply count pixels by intensities (e.g. lots of pixels with bright green and dark blue) Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 12 / 23
  • 16. Intensity histograms How many bins for pixel intensities? Too many bins gives a noisy, overly complex representation of the data, while using too few bins results in an overly simple one Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 13 / 23
  • 17. Image classification Classify ./ classify_flickr . py 16 9 flickr_headshot flickr_landscape Change in performance on test set with number of neighbors k = 1, accuracy = 0.7125 k = 3, accuracy = 0.7425 k = 5, accuracy = 0.7725 k = 7, accuracy = 0.7650 k = 9, accuracy = 0.7500 Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 14 / 23
  • 18. Outline 1 Digit recognition 2 Image classification 3 Acquiring image data Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 15 / 23
  • 19. Simple screen scraping One-liner to download ESL digit data wget - Nr -- level =1 --no - parent http :// www - stat . stanford . edu /~ tibs / ElemStatLearn / datasets / zip . digits Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 16 / 23
  • 20. Simple screen scraping One-liner to scrape images from a webpage wget -O - http :// bit . ly / zxy0jN | tr ' ' ' ' " = ' 'n ' | egrep '^ http .*( png | jpg | gif ) ' | xargs wget Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 17 / 23
  • 21. Simple screen scraping One-liner to scrape images from a webpage wget -O - http :// bit . ly / zxy0jN | tr ' ' ' ' " = ' 'n ' | egrep '^ http .*( png | jpg | gif ) ' | xargs wget • get page source • translate quotes and = to newlines • match urls with image extensions • download qualifying images Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 17 / 23
  • 22. “cat flickr xargs wget”? Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 18 / 23
  • 23. Flickr API Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 19 / 23
  • 24. YQL: SELECT * FROM Internet1 http://developer.yahoo.com/yql 1 http://oreillynet.com/pub/e/1369 Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 20 / 23
  • 25. YQL: Console http://developer.yahoo.com/yql/console Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 21 / 23
  • 26. YQL + Python Python function for public YQL queries def yql_public ( query , env = False ): # build dictionary of GET parameters params = { 'q ': query , ' format ': ' json '} if env : params [ ' env '] = env # escape query query_str = urlencode ( params ) # fetch results url = '% s ?% s ' % ( YQL_PUBLIC , query_str ) result = urlopen ( url ) # parse json and return return json . load ( result )[ ' query ' ][ ' results '] Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 22 / 23
  • 27. YQL + Python + Flickr Fetch info for “interestingness” photos ./ simpleyql . py ' select * from flickr . photos . interestingness (20) where api_key = " ... " ' Download thumbnails for photos tagged with “vivid” ./ download_flickr . py vivid 500 < api_key > Jake Hofman (Columbia University) Data-driven modeling January 30, 2012 23 / 23