Classification
APAM E4990
Computational Social Science
Jake Hofman
Columbia University
April 26, 2013
Jake Hofman (Columbia University) Classification April 26, 2013 1 / 11
Prediction a la Bayes¹
• You’re testing for a rare condition:
  • 1% of the student population is in this class
• You have a highly sensitive and specific test:
  • 99% of students in the class visit compsocialscience.org
  • 99% of students who aren’t in the class don’t visit this site
• Given that a student visits the course site, what is the probability that the student is in our class?
¹ Follows Wiggins, SciAm 2006
Prediction a la Bayes
Students: 10,000 ppl
  1% In class: 100 ppl
    99% Visit: 99 ppl
    1% Don’t visit: 1 person
  99% Not in class: 9,900 ppl
    1% Visit: 99 ppl
    99% Don’t visit: 9,801 ppl
So given that a student visits the site (198 ppl), there is a 50%
chance the student is in our class (99 ppl)!
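The counts in the tree can be checked with a few lines of integer arithmetic. The following is a minimal sketch in Python (the slides themselves contain no code), and the variable names are illustrative only.

```python
# Reproduce the counts in the tree, using the numbers from the slides.
population = 10_000
in_class = population // 100              # 1% are in the class -> 100
not_in_class = population - in_class      # 9,900

visit_in_class = in_class * 99 // 100     # 99% of class members visit -> 99
visit_not_in_class = not_in_class // 100  # 1% of everyone else visits -> 99

visitors = visit_in_class + visit_not_in_class   # everyone who visits the site
p_class_given_visit = visit_in_class / visitors  # fraction of visitors in the class
print(visitors, p_class_given_visit)  # 198 0.5
```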
The small error rate on the large population outside of our class
produces many false positives.
Inverting conditional probabilities
Bayes’ Theorem
Equate the far right- and left-hand sides of the product rule
$$p(y|x)\,p(x) = p(x, y) = p(x|y)\,p(y)$$
and divide to get the probability of y given x from the probability of x given y:
$$p(y|x) = \frac{p(x|y)\,p(y)}{p(x)}$$
where $p(x) = \sum_{y \in \Omega_Y} p(x|y)\,p(y)$ is the normalization constant.
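As a rough illustration of the theorem for a discrete y, the sketch below computes p(y|x) for every outcome y from a likelihood table and a prior. The function name `posterior`, the dictionary representation, and the numbers are assumptions made for this example; they do not come from the slides.

```python
from fractions import Fraction

def posterior(likelihood, prior):
    """Return p(y|x) for every y, given p(x|y) and p(y) as dicts keyed by y."""
    # Normalization constant: p(x) = sum over y of p(x|y) p(y)
    p_x = sum(likelihood[y] * prior[y] for y in prior)
    return {y: likelihood[y] * prior[y] / p_x for y in prior}

# Two-outcome example with made-up numbers
result = posterior(
    likelihood={"a": Fraction(3, 4), "b": Fraction(1, 4)},
    prior={"a": Fraction(1, 2), "b": Fraction(1, 2)},
)
print(result["a"], result["b"])  # 3/4 1/4
```

Exact fractions are used here only so that the normalization can be checked by eye; floats would work the same way.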
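Predictions a la Bayes
Given that a patient tests positive, what is the probability that the patient is sick? In our example, the test is a visit to the site and the condition is being in the class:
$$p(\text{class}|\text{visit}) = \frac{\overbrace{p(\text{visit}|\text{class})}^{99/100}\;\overbrace{p(\text{class})}^{1/100}}{\underbrace{p(\text{visit})}_{99/100^2 + 99/100^2 = 198/100^2}} = \frac{99}{198} = \frac{1}{2}$$
where $p(\text{visit}) = p(\text{visit}|\text{class})\,p(\text{class}) + p(\text{visit}|\overline{\text{class}})\,p(\overline{\text{class}})$ and $\overline{\text{class}}$ denotes not being in the class.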
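The same calculation can be written as a small function of the prior, which makes it easy to see how strongly the answer depends on the base rate. This is only a sketch: the function name and default arguments are illustrative, and the second call (a 50% prior) is a hypothetical scenario that does not appear in the slides.

```python
from fractions import Fraction

def p_class_given_visit(p_class,
                        p_visit_given_class=Fraction(99, 100),
                        p_visit_given_not_class=Fraction(1, 100)):
    """Bayes' rule for the two-outcome (class / not class) case."""
    p_not_class = 1 - p_class
    # p(visit) = p(visit|class) p(class) + p(visit|not class) p(not class)
    p_visit = (p_visit_given_class * p_class
               + p_visit_given_not_class * p_not_class)
    return p_visit_given_class * p_class / p_visit

print(p_class_given_visit(Fraction(1, 100)))  # 1/2, the slide's rare-condition case
print(p_class_given_visit(Fraction(1, 2)))    # 99/100, hypothetical 50% prior
```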
(Super) Naive Bayes
We can use Bayes’ rule to build a one-site student classifier:
$$p(\text{class}|\text{site}) = \frac{p(\text{site}|\text{class})\,p(\text{class})}{p(\text{site})}$$
where we estimate these probabilities with ratios of counts:
$$\hat{p}(\text{site}|\text{class}) = \frac{\#\,\text{students in class who visit site}}{\#\,\text{students in class}}$$
$$\hat{p}(\text{site}|\overline{\text{class}}) = \frac{\#\,\text{students not in class who visit site}}{\#\,\text{students not in class}}$$
$$\hat{p}(\text{class}) = \frac{\#\,\text{students in class}}{\#\,\text{students}}$$
$$\hat{p}(\overline{\text{class}}) = \frac{\#\,\text{students not in class}}{\#\,\text{students}}$$
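A minimal sketch of the one-site classifier, estimating each probability as a ratio of counts. The seven student records below are made up for illustration and the variable names are not from the slides.

```python
from fractions import Fraction

# Hypothetical records: (in_class, visited_site) for each student
students = [
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False), (False, False),
]

n = len(students)
n_class = sum(1 for c, _ in students if c)
n_not_class = n - n_class
n_class_visit = sum(1 for c, v in students if c and v)
n_not_class_visit = sum(1 for c, v in students if not c and v)

# Ratio-of-counts estimates
p_site_given_class = Fraction(n_class_visit, n_class)              # 2/3
p_site_given_not_class = Fraction(n_not_class_visit, n_not_class)  # 1/4
p_class = Fraction(n_class, n)                                     # 3/7
p_not_class = Fraction(n_not_class, n)                             # 4/7

# Bayes' rule, with p(site) obtained by summing over both classes
p_site = p_site_given_class * p_class + p_site_given_not_class * p_not_class
p_class_given_site = p_site_given_class * p_class / p_site
print(p_class_given_site)  # 2/3
```

As a sanity check, three of the made-up students visit the site and two of them are in the class, which agrees with the 2/3 obtained through Bayes' rule.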
Naive Bayes
Represent each student by a binary vector x where $x_j = 1$ if the student has visited the j-th site ($x_j = 0$ otherwise).
Modeling each site as an independent Bernoulli random variable, the probability of visiting a set of sites x given class membership $c \in \{0, 1\}$ is:
$$p(x|c) = \prod_j \theta_{jc}^{x_j} (1 - \theta_{jc})^{1 - x_j}$$
where $\theta_{jc}$ denotes the probability that the j-th site is visited by a student with class membership c.
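The product over sites is straightforward to evaluate directly. The sketch below assumes three sites with made-up visit probabilities for a single class; the function name `likelihood` is illustrative.

```python
import math

def likelihood(x, theta):
    """p(x|c) = prod_j theta_j^x_j * (1 - theta_j)^(1 - x_j) for one class c."""
    # Each factor is theta_j if site j was visited and 1 - theta_j otherwise.
    return math.prod(t if xj == 1 else 1 - t for xj, t in zip(x, theta))

theta_c = [0.9, 0.2, 0.5]  # hypothetical visit probabilities for three sites
x = [1, 0, 1]              # visited sites 0 and 2, but not site 1
p = likelihood(x, theta_c) # 0.9 * 0.8 * 0.5, about 0.36
print(p)
```

For more than a handful of sites the product underflows quickly, which is one practical reason to work with logarithms.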
Naive Bayes
Using this likelihood in Bayes’ rule and taking a logarithm, we have:
$$\log p(c|x) = \log\frac{p(x|c)\,p(c)}{p(x)} = \sum_j x_j \log\frac{\theta_{jc}}{1 - \theta_{jc}} + \sum_j \log(1 - \theta_{jc}) + \log\frac{\theta_c}{p(x)}$$
where $\theta_c = p(c)$ denotes the prior probability of class membership c.
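The intermediate step, which the slide skips, is to expand the logarithm of the Bernoulli likelihood and collect the terms that multiply $x_j$. A short worked version, using the same symbols as above:

```latex
\begin{align*}
\log p(x|c)
  &= \sum_j \left[ x_j \log\theta_{jc} + (1 - x_j)\log(1 - \theta_{jc}) \right] \\
  &= \sum_j x_j \log\frac{\theta_{jc}}{1 - \theta_{jc}} + \sum_j \log(1 - \theta_{jc})
\end{align*}
```

Adding $\log p(c) - \log p(x) = \log\frac{\theta_c}{p(x)}$ to this gives the expression for $\log p(c|x)$.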
Naive Bayes
We can eliminate p(x) by calculating the log-odds:
$$\log\frac{p(1|x)}{p(0|x)} = \sum_j x_j \underbrace{\log\frac{\theta_{j1}(1 - \theta_{j0})}{\theta_{j0}(1 - \theta_{j1})}}_{w_j} + \underbrace{\sum_j \log\frac{1 - \theta_{j1}}{1 - \theta_{j0}} + \log\frac{\theta_1}{\theta_0}}_{w_0}$$
which gives a linear classifier of the form $w \cdot x + w_0$.
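One way to convince yourself of the linear form is to compute the log-odds twice, once as $w \cdot x + w_0$ and once directly from the two joint probabilities, and compare. The parameter values below are made up for illustration and the function names are not from the slides.

```python
import math

# Hypothetical parameters for two sites
theta1 = [0.9, 0.3]       # theta_{j1}: visit probabilities for students in the class
theta0 = [0.1, 0.4]       # theta_{j0}: visit probabilities for everyone else
prior1, prior0 = 0.2, 0.8 # theta_1 and theta_0

# Weights and bias as defined in the log-odds expression
w = [math.log(t1 * (1 - t0) / (t0 * (1 - t1))) for t1, t0 in zip(theta1, theta0)]
w0 = (sum(math.log((1 - t1) / (1 - t0)) for t1, t0 in zip(theta1, theta0))
      + math.log(prior1 / prior0))

def log_odds_linear(x):
    """Log-odds via the linear form w . x + w0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w0

def log_odds_direct(x):
    """Log-odds via log[p(x|1) p(1)] - log[p(x|0) p(0)]; p(x) cancels."""
    def joint(theta, prior):
        return prior * math.prod(t if xj else 1 - t for t, xj in zip(theta, x))
    return math.log(joint(theta1, prior1) / joint(theta0, prior0))

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, log_odds_linear(x), log_odds_direct(x))
```

The two columns agree up to floating-point rounding for every x.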
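Naive Bayes
We train by counting students and sites to estimate $\theta_{jc}$ and $\theta_c$:
$$\hat\theta_{jc} = \frac{n_{jc}}{n_c} \qquad \hat\theta_c = \frac{n_c}{n}$$
where $n_{jc}$ is the number of students with class membership c who visited site j, $n_c$ is the number of students with class membership c, and n is the total number of students. We use these to calculate the weights $\hat{w}_j$ and bias $\hat{w}_0$:
$$\hat{w}_j = \log\frac{\hat\theta_{j1}(1 - \hat\theta_{j0})}{\hat\theta_{j0}(1 - \hat\theta_{j1})} \qquad \hat{w}_0 = \sum_j \log\frac{1 - \hat\theta_{j1}}{1 - \hat\theta_{j0}} + \log\frac{\hat\theta_1}{\hat\theta_0}.$$
We predict by simply adding the weights of the sites that a student has visited to the bias term.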
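Training and prediction can be sketched end to end in plain Python. The seven training examples are made up, and classifying as "in class" when the score is positive is an assumption of this sketch (a positive log-odds means class 1 is the more probable one); the slides do not state a threshold.

```python
import math

# Hypothetical training set: (binary site-visit vector x, class label c)
data = [
    ([1, 0], 1), ([1, 1], 1), ([0, 0], 1),
    ([0, 1], 0), ([0, 1], 0), ([1, 0], 0), ([0, 0], 0),
]
num_sites = 2

# Counting: n, n_c, and n_jc
n = len(data)
n_c = {c: sum(1 for _, label in data if label == c) for c in (0, 1)}
n_jc = {c: [sum(x[j] for x, label in data if label == c) for j in range(num_sites)]
        for c in (0, 1)}

# Parameter estimates as ratios of counts
theta_jc = {c: [n_jc[c][j] / n_c[c] for j in range(num_sites)] for c in (0, 1)}
theta_c = {c: n_c[c] / n for c in (0, 1)}

# Weights and bias
w = [math.log(theta_jc[1][j] * (1 - theta_jc[0][j])
              / (theta_jc[0][j] * (1 - theta_jc[1][j])))
     for j in range(num_sites)]
w0 = (sum(math.log((1 - theta_jc[1][j]) / (1 - theta_jc[0][j]))
          for j in range(num_sites))
      + math.log(theta_c[1] / theta_c[0]))

def predict(x):
    """Add the weights of the visited sites to the bias; positive means class 1."""
    score = w0 + sum(w[j] for j in range(num_sites) if x[j] == 1)
    return 1 if score > 0 else 0

print(predict([1, 0]), predict([0, 1]))  # 1 0
```

The toy data were chosen so that no estimate is exactly 0 or 1; with real counts that is not guaranteed, and the logarithms above would then fail.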
Naive Bayes
• In practice, this works better than one might expect given its simplicity²
• Training is computationally cheap and scalable, and the model is easy to update given new observations³
• Performance varies with document representations and corresponding likelihood models⁴
• It’s often important to smooth parameter estimates (e.g., by adding pseudocounts) to avoid overfitting
² http://www.jstor.org/pss/1403452
³ http://www.springerlink.com/content/wu3g458834583125/
⁴ http://ceas.cc/2006/15.pdf
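To illustrate the smoothing point: if no student in some class ever visited a site, the raw estimate of its visit probability is 0 and the corresponding weight involves log 0. A common remedy, assumed here rather than taken from the slides, is to add a pseudocount alpha to both outcomes (visit and no visit) before dividing. The function name and the default alpha = 1 are illustrative choices.

```python
import math

def smoothed_theta(n_jc, n_c, alpha=1.0):
    """Estimate theta_jc with alpha pseudocounts for each outcome (visit / no visit)."""
    return (n_jc + alpha) / (n_c + 2 * alpha)

# A site that none of the 10 students in a class has visited:
raw = 0 / 10                    # raw estimate is exactly 0, so log(raw) is undefined
smooth = smoothed_theta(0, 10)  # (0 + 1) / (10 + 2) = 1/12
print(raw, smooth, math.log(smooth))
```

With alpha > 0 every estimate stays strictly between 0 and 1, so the weights and bias remain finite.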

Computational Social Science, Lecture 13: Classification
