Introduction to Counting
APAM E4990
Modeling Social Data
Jake Hofman
Columbia University
January 27, 2017
Jake Hofman (Columbia University) Intro to Counting January 27, 2017 1 / 27
Why counting?
http://bit.ly/august2016poll
$p(\underbrace{y}_{\text{support}} \mid \underbrace{x}_{\text{age}})$
Why counting?
http://bit.ly/ageracepoll2016
$p(\underbrace{y}_{\text{support}} \mid \underbrace{x_1, x_2}_{\text{age, race}})$
Why counting?
? $p(\underbrace{y}_{\text{support}} \mid \underbrace{x_1, x_2, x_3, \ldots}_{\text{age, sex, race, party}})$
Why counting?
Problem:
Traditionally difficult to obtain reliable estimates due to small
sample sizes or sparsity
(e.g., ∼ 100 age × 2 sex × 5 race × 3 party = 3,000 groups,
but typical surveys collect ∼ 1,000s of responses)
Why counting?
Potential solution:
Sacrifice granularity for precision, by binning observations into
larger, but fewer, groups
(e.g., bin age into a few groups: 18-29, 30-49, 50-64, 65+)
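The binning idea can be sketched in a few lines of Python. This is only an illustration: the bin edges come from the slide, while the function name `age_bin` and the sample ages are made up here.

```python
from bisect import bisect_right
from collections import Counter

# Lower edges of the age bins named on the slide: 18-29, 30-49, 50-64, 65+.
BIN_EDGES = [18, 30, 50, 65]
BIN_LABELS = ["18-29", "30-49", "50-64", "65+"]

def age_bin(age):
    """Map an age in years to its bin label (ages under 18 are rejected)."""
    if age < BIN_EDGES[0]:
        raise ValueError("age below the youngest bin")
    # bisect_right counts how many lower edges are <= age.
    return BIN_LABELS[bisect_right(BIN_EDGES, age) - 1]

# Hypothetical ages, made up for illustration only.
ages = [18, 29, 30, 47, 64, 65, 90]
bin_counts = Counter(age_bin(a) for a in ages)
```

With four age bins instead of roughly 100 single-year ages, the same responses are spread over far fewer groups, at the cost of no longer distinguishing ages within a bin.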
Why counting?
Potential solution:
Develop more sophisticated methods that generalize well from
small samples
(e.g., fit a model: $\text{support} \sim \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \ldots$)
Why counting?
(Partial) solution:
Obtain larger samples through other means, so we can just count
and divide to make estimates via relative frequencies
(e.g., with ∼ 1M responses, we have 100s per group and can
estimate support within a few percentage points)
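A rough check of these numbers, as a sketch under strong assumptions that are not on the slide: responses are spread evenly over the 3,000 groups from the earlier example, each group behaves like a simple random sample, and the normal approximation to the binomial applies. The function name `margin_of_error` is made up for illustration.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n responses."""
    return z * math.sqrt(p * (1.0 - p) / n)

n_groups = 100 * 2 * 5 * 3            # age x sex x race x party cells
per_group = 1_000_000 / n_groups      # about 333 responses per group if spread evenly

# p = 0.5 is the worst case, since p * (1 - p) is largest there.
moe = margin_of_error(0.5, per_group)
```

Under these assumptions the worst-case margin is a little over 5 percentage points per group. Real responses are not spread evenly, so rare groups would do worse and common groups better.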
Why counting?
International Journal of Forecasting 31 (2015) 980–991
journal homepage: www.elsevier.com/locate/ijforecast
Forecasting elections with non-representative polls
Wei Wang (a, *), David Rothschild (b), Sharad Goel (b), Andrew Gelman (a, c)
(a) Department of Statistics, Columbia University, New York, NY, USA
(b) Microsoft Research, New York, NY, USA
(c) Department of Political Science, Columbia University, New York, NY, USA
Keywords: Non-representative polling; Multilevel regression and poststratification; Election forecasting
Abstract
Election forecasts have traditionally been based on representative polls, in which randomly
sampled individuals are asked who they intend to vote for. While representative polling has
historically proven to be quite effective, it comes at considerable costs of time and money.
Moreover, as response rates have declined over the past several decades, the statistical
benefits of representative sampling have diminished. In this paper, we show that, with
proper statistical adjustment, non-representative polls can be used to generate accurate
election forecasts, and that this can often be achieved faster and at a lesser expense than
traditional survey methods. We demonstrate this approach by creating forecasts from a
novel and highly non-representative survey dataset: a series of daily voter intention polls
for the 2012 presidential election conducted on the Xbox gaming platform. After adjusting
the Xbox responses via multilevel regression and poststratification, we obtain estimates
which are in line with the forecasts from leading poll analysts, which were based on
aggregating hundreds of traditional polls conducted during the election cycle. We conclude
by arguing that non-representative polling shows promise not only for election forecasting,
but also for measuring public opinion on a broad range of social, economic and cultural
issues.
© 2014 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
1. Introduction
At the heart of modern opinion polling is representative
sampling, built around the idea that every individual in a
The wide-scale adoption of representative polling can
be traced largely back to a pivotal polling mishap in
the 1936 US presidential election campaign. During
that campaign, the popular magazine Literary Digest
pollsters, including George Gallup, Archibald Crossley, and
Elmo Roper, used considerably smaller but representative
samples, and predicted the election outcome with a
reasonable level of accuracy (Gosnell, 1937). Accordingly,
non-representative or "convenience sampling" rapidly fell
out of favor with polling experts.
So, why do we revisit this seemingly long-settled case? Two recent trends spur our
investigation. First, random digit dialing (RDD), the standard method in modern
representative polling, has suffered increasingly high non-response rates, due both to
the general public’s growing reluctance to answer phone surveys, and to expanding
technical means of screening unsolicited calls (Keeter, Kennedy, Dimock, Best, &
Craighill, 2006). By one measure, RDD response rates have decreased from 36% in 1997
to 9% in 2012 (Kohut, Keeter, Doherty, Dimock, & Christian, 2012), and other studies
confirm this trend (Holbrook, Krosnick, & Pfent, 2007; Steeh, Kirgis, Cannon, & DeWitt,
2001; Tourangeau & Plewes, 2013). Assuming that the initial pool of targets is
representative, such low response rates mean that those who ultimately answer the phone
and elect to respond might not be. Even if the selection issues are not yet a serious
problem for accuracy, as some have argued (Holbrook et al., 2007), the downward trend
in response rates suggests an increasing need for post-sampling adjustments; indeed, the
adjustment methods we present here should work just as well for surveys obtained by
probability sampling as for convenience samples. The second trend driving our research
is the fact that, with recent technological innovations, it is increasingly convenient
and cost-effective to collect large numbers of highly non-representative samples via
online surveys. The data that took the Literary Digest editors several months to collect
in 1936 can now take only a few days, and, for some surveys, can cost just pennies per
response. However, the challenge is to extract a meaningful signal from these
unconventional samples.
In this paper, we show that, with proper statistical adjustments, non-representative
polls are able to yield accurate presidential election forecasts, on par with those
based on traditional representative polls. We proceed as follows. Section 2 describes
the election survey that we conducted on the Xbox gaming platform during the 45 days
leading up to the 2012 US presidential race. Our Xbox
sample is highly biased in two key demographic dimen-
how to transform voter intent into projections of vote
share and electoral votes. We conclude in Section 5 by
discussing the potential for non-representative polling in
other domains.
2. Xbox data
Our analysis is based on an opt-in poll which was available continuously on the Xbox
gaming platform during the 45 days preceding the 2012 US presidential election. Each
day, three to five questions were posted, one of which gauged voter intention via the
standard query, "If the election were held today, who would you vote for?". Full details
of the questionnaire are given in the Appendix. The respondents were allowed to answer
at most once per day. The first time they participated in an Xbox poll, respondents were
also asked to provide basic demographic information about themselves, including their
sex, race, age, education, state, party ID, political ideology, and who they voted for
in the 2008 presidential election. In total, 750,148 interviews were conducted, with
345,858 unique respondents – over 30,000 of whom completed five or more polls – making
this one of the largest election panel studies ever.
Despite the large sample size, the pool of Xbox respondents is far from being
representative of the voting population. Fig. 1 compares the demographic composition of
the Xbox participants to that of the general electorate, as estimated via the 2012
national exit poll. The most striking differences are for age and sex. As one might
expect, young men dominate the Xbox population: 18- to 29-year-olds comprise 65% of the
Xbox dataset, compared to 19% in the exit poll; and men make up 93% of the Xbox sample
but only 47% of the electorate. Political scientists have long observed that both age
and sex are strongly correlated with voting preferences (Kaufmann & Petrocik, 1999), and
indeed these discrepancies are apparent in the unadjusted time series of Xbox voter
intent shown in Fig. 2. In contrast to estimates based on traditional, representative
polls (indicated by the dotted blue line in Fig. 2), the uncorrected Xbox sample
suggests a landslide victory for Mitt Romney, reminiscent of the infamous Literary
Digest error.
3. Estimating voter intent with multilevel regression
and poststratification
3.1. Multilevel regression and poststratification
http://bit.ly/nonreppoll
Why counting?
The good:
Shift away from sophisticated statistical methods on small samples
to simpler methods on large samples
Why counting?
The bad:
Even simple methods (e.g., counting) are computationally
challenging at large scales
(1M is easy, 1B a bit less so, 1T gets interesting)
Why counting?
Claim:
Solving the counting problem at scale enables you to investigate
many interesting questions in the social sciences
Learning to count
This week:
Counting at small/medium scales on a single machine
Following weeks:
Counting at large scales in parallel
Counting, the easy way
Split / Apply / Combine [1]
• Load dataset into memory
• Split: Arrange observations into groups of interest
• Apply: Compute distributions and statistics within each group
• Combine: Collect results across groups
[1] http://bit.ly/splitapplycombine
The generic group-by operation
Split / Apply / Combine
for each observation as (group, value):
    place value in bucket for corresponding group
for each group:
    apply a function over values in bucket
    output group and result
Useful for computing arbitrary within-group statistics when the full dataset
fits in memory
(e.g., conditional distribution, median, etc.)
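The generic group-by pseudocode translates almost line for line into Python. The following is a minimal sketch, not the course's own code: the function name `group_by` and the sample ratings are hypothetical.

```python
from collections import defaultdict
from statistics import median

def group_by(observations, func):
    """Split (group, value) pairs into buckets, apply func to each bucket, combine results."""
    buckets = defaultdict(list)
    for group, value in observations:          # split: one bucket of values per group
        buckets[group].append(value)
    # apply func within each group and combine into one result per group
    return {group: func(values) for group, values in buckets.items()}

# Hypothetical (movie id, rating) pairs, made up for illustration.
ratings = [("m1", 4), ("m2", 2), ("m1", 5), ("m2", 3), ("m1", 3), ("m2", 5)]
median_by_movie = group_by(ratings, median)
counts_by_movie = group_by(ratings, len)
```

Because every value is kept in its bucket, any function of the values can be applied, but memory use grows with the total number of observations.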
Example: Anatomy of the long tail
Dataset     Users   Items   Rating levels   Observations
Movielens   100K    10K     10              10M
Netflix     500K    20K     5               100M
Example: Movielens
How many ratings are there at each star level?
[Figure: number of ratings (y-axis, 0 to 3,000,000) by rating (x-axis, 1 to 5)]
group by rating value
for each group:
    count # ratings
Example: Movielens
What is the distribution of average ratings by movie?
[Figure: density of mean rating by movie (x-axis, 1 to 5)]
group by movie id
for each group:
    compute average rating
Example: Movielens
What fraction of ratings are given to the most popular movies?
[Figure: CDF of ratings (y-axis, 0% to 100%) by movie rank (x-axis, 0 to 9,000)]
group by movie id
for each group:
    count # ratings
sort by group size
cumulatively sum group sizes
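The count, sort, and cumulative-sum steps can be sketched as follows. The function name `popularity_cdf` and the toy movie ids are assumptions made for illustration, not part of the Movielens data.

```python
from collections import Counter
from itertools import accumulate

def popularity_cdf(movie_ids):
    """Cumulative fraction of ratings covered by the top-k movies, for k = 1..number of movies."""
    counts = Counter(movie_ids)                      # group by movie id, count ratings
    sizes = sorted(counts.values(), reverse=True)    # sort by group size, most popular first
    total = sum(sizes)
    # running sum of group sizes, normalized by the total number of ratings
    return [running / total for running in accumulate(sizes)]

# Hypothetical stream of rated movie ids, one entry per rating.
movie_ids = ["a", "a", "a", "a", "b", "b", "b", "c", "c", "d"]
cdf = popularity_cdf(movie_ids)
```

Entry k-1 of the result answers the question on the slide for the k most popular movies. In this toy input the top movie alone accounts for 40% of ratings.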
Example: Movielens
What is the median rank of each user’s rated movies?
[Figure: number of users (y-axis, 0 to 8,000) by user eccentricity (x-axis, ticks at 100 and 10,000)]
join movie ranks to ratings
group by user id
for each group:
    compute median movie rank
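A small in-memory sketch of the join and group-by steps above. It assumes movie rank means popularity rank by number of ratings, with rank 1 the most-rated movie. The tie-breaking rule, the function name `median_rank_by_user`, and the sample data are choices made here for illustration.

```python
from collections import Counter, defaultdict
from statistics import median

def median_rank_by_user(ratings):
    """ratings: iterable of (user id, movie id) pairs; returns {user id: median movie rank}."""
    ratings = list(ratings)
    counts = Counter(movie for _, movie in ratings)
    # Rank 1 is the most-rated movie; ties are broken by movie id (an arbitrary choice).
    ordered = sorted(counts, key=lambda m: (-counts[m], m))
    rank = {movie: i for i, movie in enumerate(ordered, start=1)}
    ranks_by_user = defaultdict(list)
    for user, movie in ratings:                # join ranks to ratings, group by user id
        ranks_by_user[user].append(rank[movie])
    return {user: median(ranks) for user, ranks in ranks_by_user.items()}

# Hypothetical (user id, movie id) pairs.
ratings = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u1", "b"), ("u2", "b"), ("u3", "c")]
eccentricity = median_rank_by_user(ratings)
```

A user whose median rank is large mostly rates movies far down the popularity ordering, which is what the eccentricity axis in the figure measures.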
Example: Anatomy of the long tail
Dataset     Users   Items   Rating levels   Observations
Movielens   100K    10K     10              10M
Netflix     500K    20K     5               100M
What do we do when the full dataset exceeds available memory?
Sampling?
Unreliable estimates for rare groups
Random access from disk?
1000x more storage, but 1000x slower [2]
[2] Numbers every programmer should know
Streaming
Read data one observation at a time, storing only needed state
The combinable group-by operation
Streaming
for each observation as (group, value):
    if new group:
        initialize result
    update result for corresponding group as function of existing result and current value
for each group:
    output group and result
Useful for computing a subset of within-group statistics with a
limited memory footprint
(e.g., min, mean, max, variance, etc.)
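One way to sketch the combinable group-by for the statistics listed above is to keep a small fixed-size record per group and update it as each observation arrives. The variance update below uses Welford's online algorithm, which is a choice made for this sketch rather than something the slides specify. The function name `streaming_stats` and the sample data are hypothetical.

```python
def streaming_stats(observations):
    """One pass over (group, value) pairs, keeping constant-size state per group."""
    state = {}
    for group, value in observations:
        if group not in state:                   # new group: initialize result
            state[group] = {"n": 0, "min": value, "max": value, "mean": 0.0, "m2": 0.0}
        s = state[group]
        s["n"] += 1
        s["min"] = min(s["min"], value)
        s["max"] = max(s["max"], value)
        delta = value - s["mean"]                # Welford's update for mean and variance
        s["mean"] += delta / s["n"]
        s["m2"] += delta * (value - s["mean"])
    for s in state.values():
        s["variance"] = s["m2"] / s["n"]         # population variance within the group
    return state

# Hypothetical (movie id, rating) pairs, read one at a time.
stats = streaming_stats([("m1", 4), ("m2", 2), ("m1", 5), ("m2", 4), ("m1", 3)])
```

Memory grows with the number of groups rather than the number of observations, which is why only statistics with such an update rule are available in this mode.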
Example: Movielens
[Figure: number of ratings (y-axis, 0 to 3,000,000) by rating (x-axis, 1 to 5)]
for each rating:
    counts[rating]++
Example: Movielens
for each rating:
    totals[movie id] += rating
    counts[movie id]++
for each movie id:
    output totals[movie id] / counts[movie id]
[Figure: density of mean rating by movie (x-axis, 1 to 5)]
Yet another group-by operation
Per-group histograms
for each observation as (group, value):
    histogram[group][value]++
for each group:
    compute result as a function of histogram
    output group and result
We can recover arbitrary statistics if we can afford to store counts
of all distinct values within each group
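As an illustration of recovering a statistic from per-group histograms, the sketch below computes an exact median from value counts alone. The function names and sample data are hypothetical, and the median convention (average of the two middle values for even counts) is an assumption.

```python
from collections import Counter, defaultdict

def histogram_by_group(observations):
    """One pass over (group, value) pairs, storing only counts of distinct values per group."""
    histograms = defaultdict(Counter)
    for group, value in observations:
        histograms[group][value] += 1
    return histograms

def median_from_histogram(histogram):
    """Exact median from value counts, averaging the two middle values when n is even."""
    n = sum(histogram.values())
    lower_idx, upper_idx = (n - 1) // 2, n // 2      # 0-based middle positions
    seen = 0
    lower = upper = None
    for value in sorted(histogram):
        seen += histogram[value]                     # number of observations <= value
        if lower is None and seen > lower_idx:
            lower = value
        if seen > upper_idx:
            upper = value
            break
    return (lower + upper) / 2

# Hypothetical (movie id, rating) pairs.
obs = [("m1", 4), ("m1", 5), ("m1", 3), ("m1", 4), ("m2", 2), ("m2", 5)]
medians = {g: median_from_histogram(h) for g, h in histogram_by_group(obs).items()}
```

The stored state is one counter per distinct (group, value) pair, so memory depends on the number of groups and distinct values, not on how many observations are read.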
The group-by operation
For arbitrary input data:
Memory   Scenario              Distributions   Statistics
N        Small dataset         Yes             General
V*G      Small distributions   Yes             General
G        Small # groups        No              Combinable
V        Small # outcomes      No              No
1        Large # both          No              No
N = total number of observations
G = number of distinct groups
V = largest number of distinct values within group
Examples (w/ 8GB RAM)
Median rating by movie for Netflix
N ∼ 100M ratings
G ∼ 20K movies
V ∼ 10 half-star values
V*G ∼ 200K, store per-group histograms for arbitrary statistics
(scales to arbitrary N, if you’re patient)
Examples (w/ 8GB RAM)
Median rating by video for YouTube
N ∼ 10B ratings
G ∼ 1B videos
V ∼ 10 half-star values
V*G ∼ 10B, fails because per-group histograms are too large to
store in memory
G ∼ 1B, but there is no exact streaming calculation of the median
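A back-of-the-envelope comparison of the two histogram sizes, as a sketch only. It assumes one 8-byte counter per (group, value) cell and ignores keys and data-structure overhead, so real memory use would be higher. It also treats 8GB as 8 × 10^9 bytes. These constants are assumptions, not figures from the slides.

```python
BYTES_PER_COUNTER = 8          # assumption: one 64-bit integer per histogram cell
RAM_BYTES = 8 * 10**9          # the 8GB budget from the slide title, in decimal units

def histogram_bytes(n_groups, n_values, bytes_per_counter=BYTES_PER_COUNTER):
    """Lower bound on memory for dense per-group histograms (ignores keys and overhead)."""
    return n_groups * n_values * bytes_per_counter

netflix = histogram_bytes(20_000, 10)              # G ~ 20K movies, V ~ 10 values
youtube = histogram_bytes(1_000_000_000, 10)       # G ~ 1B videos, V ~ 10 values

netflix_fits = netflix <= RAM_BYTES
youtube_fits = youtube <= RAM_BYTES
```

Under these assumptions the Netflix histograms need about 1.6 MB while the YouTube histograms need about 80 GB, ten times the available memory.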
Examples (w/ 8GB RAM)
Mean rating by video for YouTube
N ∼ 10B ratings
G ∼ 1B videos
V ∼ 10 half-star values
G ∼ 1B, use streaming to compute combinable statistics
The group-by operation
For pre-grouped input data:
Memory   Scenario              Distributions   Statistics
N        Small dataset         Yes             General
V*G      Small distributions   Yes             General
G        Small # groups        No              Combinable
V        Small # outcomes      Yes             General
1        Large # both          No              Combinable
N = total number of observations
G = number of distinct groups
V = largest number of distinct values within group
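The V row of the pre-grouped table can be illustrated with a sketch that holds a single group's histogram at a time. It assumes the input arrives with all observations for a group adjacent to each other (for example, sorted by group). The function name `grouped_histograms` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

def grouped_histograms(clustered_observations):
    """Yield (group, histogram) one group at a time from input clustered by group."""
    for group, pairs in groupby(clustered_observations, key=itemgetter(0)):
        # Only the current group's value counts are held in memory.
        yield group, Counter(value for _, value in pairs)

# Hypothetical pre-grouped (movie id, rating) stream.
pre_grouped = iter([("m1", 3), ("m1", 4), ("m1", 4), ("m2", 2), ("m2", 5)])
results = [(g, dict(h)) for g, h in grouped_histograms(pre_grouped)]
```

Each histogram can be turned into a distribution or any statistic and then discarded before the next group is read, so memory stays proportional to V regardless of the number of groups. If the input is not actually clustered, `groupby` will emit the same group more than once.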
Jake Hofman (Columbia University) Intro to Counting January 27, 2017 27 / 27

Más contenido relacionado

La actualidad más candente

Privacy Concerns and Social Robots
Privacy Concerns and Social Robots Privacy Concerns and Social Robots
Privacy Concerns and Social Robots Christoph Lutz
 
I3 presentation john mowbray
I3 presentation john mowbrayI3 presentation john mowbray
I3 presentation john mowbrayJohn Mowbray
 
Fake News Detector
Fake News DetectorFake News Detector
Fake News DetectorIrisYoon5
 
Trust Us, Again? Twitter Campaigning Strategies in the 2019 Australian Federa...
Trust Us, Again? Twitter Campaigning Strategies in the 2019 Australian Federa...Trust Us, Again? Twitter Campaigning Strategies in the 2019 Australian Federa...
Trust Us, Again? Twitter Campaigning Strategies in the 2019 Australian Federa...Axel Bruns
 
Are Americans worried about the NSA?
Are Americans worried about the NSA? Are Americans worried about the NSA?
Are Americans worried about the NSA? AEI
 
Test Bank for Social Welfare A History Of The American Response To Need 8th E...
Test Bank for Social Welfare A History Of The American Response To Need 8th E...Test Bank for Social Welfare A History Of The American Response To Need 8th E...
Test Bank for Social Welfare A History Of The American Response To Need 8th E...OprahMan
 
Being an Information Consumer of Information - Dr. Underwood's Argument Class...
Being an Information Consumer of Information - Dr. Underwood's Argument Class...Being an Information Consumer of Information - Dr. Underwood's Argument Class...
Being an Information Consumer of Information - Dr. Underwood's Argument Class...Amanda Folk
 

La actualidad más candente (9)

Document(2)
Document(2)Document(2)
Document(2)
 
Privacy Concerns and Social Robots
Privacy Concerns and Social Robots Privacy Concerns and Social Robots
Privacy Concerns and Social Robots
 
I3 presentation john mowbray
I3 presentation john mowbrayI3 presentation john mowbray
I3 presentation john mowbray
 
Fake News Detector
Fake News DetectorFake News Detector
Fake News Detector
 
Trust Us, Again? Twitter Campaigning Strategies in the 2019 Australian Federa...
Trust Us, Again? Twitter Campaigning Strategies in the 2019 Australian Federa...Trust Us, Again? Twitter Campaigning Strategies in the 2019 Australian Federa...
Trust Us, Again? Twitter Campaigning Strategies in the 2019 Australian Federa...
 
Are Americans worried about the NSA?
Are Americans worried about the NSA? Are Americans worried about the NSA?
Are Americans worried about the NSA?
 
Test Bank for Social Welfare A History Of The American Response To Need 8th E...
Test Bank for Social Welfare A History Of The American Response To Need 8th E...Test Bank for Social Welfare A History Of The American Response To Need 8th E...
Test Bank for Social Welfare A History Of The American Response To Need 8th E...
 
Gfrp alevel sociology 2020
Gfrp alevel sociology 2020Gfrp alevel sociology 2020
Gfrp alevel sociology 2020
 
Being an Information Consumer of Information - Dr. Underwood's Argument Class...
Being an Information Consumer of Information - Dr. Underwood's Argument Class...Being an Information Consumer of Information - Dr. Underwood's Argument Class...
Being an Information Consumer of Information - Dr. Underwood's Argument Class...
 

Destacado

Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in Rjakehofman
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scalejakehofman
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1jakehofman
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scalejakehofman
 
Data-driven Modeling: Lecture 03
Data-driven Modeling: Lecture 03Data-driven Modeling: Lecture 03
Data-driven Modeling: Lecture 03jakehofman
 
Computational Social Science, Lecture 03: Counting at Scale, Part I
Computational Social Science, Lecture 03: Counting at Scale, Part IComputational Social Science, Lecture 03: Counting at Scale, Part I
Computational Social Science, Lecture 03: Counting at Scale, Part Ijakehofman
 
Computational Social Science, Lecture 02: An Introduction to Counting
Computational Social Science, Lecture 02: An Introduction to CountingComputational Social Science, Lecture 02: An Introduction to Counting
Computational Social Science, Lecture 02: An Introduction to Countingjakehofman
 
Computational Social Science, Lecture 04: Counting at Scale, Part II
Computational Social Science, Lecture 04: Counting at Scale, Part IIComputational Social Science, Lecture 04: Counting at Scale, Part II
Computational Social Science, Lecture 04: Counting at Scale, Part IIjakehofman
 
Bitcoin's Killer App 2017 - Sean Walsh
Bitcoin's Killer App 2017 -  Sean WalshBitcoin's Killer App 2017 -  Sean Walsh
Bitcoin's Killer App 2017 - Sean WalshSean Walsh
 
Recomendaciones integrales de política pública para las juventudes en la ar...
Recomendaciones integrales de política pública para las juventudes en la ar...Recomendaciones integrales de política pública para las juventudes en la ar...
Recomendaciones integrales de política pública para las juventudes en la ar...Jorge Roldán
 
Cuadro de los pagos tributarios
Cuadro de los pagos tributariosCuadro de los pagos tributarios
Cuadro de los pagos tributariosselvagomez2872
 
”’I den svenska och tyska litteraturens mittpunkt’: Svenska Pommerns roll som...
”’I den svenska och tyska litteraturens mittpunkt’: Svenska Pommerns roll som...”’I den svenska och tyska litteraturens mittpunkt’: Svenska Pommerns roll som...
”’I den svenska och tyska litteraturens mittpunkt’: Svenska Pommerns roll som...Andreas Önnerfors
 
Charitable trust ppt
Charitable trust pptCharitable trust ppt
Charitable trust pptSweety Sharma
 
IBM Hadoop-DS Benchmark Report - 30TB
IBM Hadoop-DS Benchmark Report - 30TBIBM Hadoop-DS Benchmark Report - 30TB
IBM Hadoop-DS Benchmark Report - 30TBGord Sissons
 
Motivación laboral
Motivación laboralMotivación laboral
Motivación laboralalexander_hv
 
ระบบสารสนเทศ
ระบบสารสนเทศระบบสารสนเทศ
ระบบสารสนเทศPetch Boonyakorn
 
Amazon alexa - building custom skills
Amazon alexa - building custom skillsAmazon alexa - building custom skills
Amazon alexa - building custom skillsAniruddha Chakrabarti
 
2016 Results & Outlook
2016 Results & Outlook 2016 Results & Outlook
2016 Results & Outlook Total
 

Destacado (20)

Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in R
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scale
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scale
 
Data-driven Modeling: Lecture 03
Data-driven Modeling: Lecture 03Data-driven Modeling: Lecture 03
Data-driven Modeling: Lecture 03
 
Computational Social Science, Lecture 03: Counting at Scale, Part I
Computational Social Science, Lecture 03: Counting at Scale, Part IComputational Social Science, Lecture 03: Counting at Scale, Part I
Computational Social Science, Lecture 03: Counting at Scale, Part I
 
Computational Social Science, Lecture 02: An Introduction to Counting
Computational Social Science, Lecture 02: An Introduction to CountingComputational Social Science, Lecture 02: An Introduction to Counting
Computational Social Science, Lecture 02: An Introduction to Counting
 
Computational Social Science, Lecture 04: Counting at Scale, Part II
Computational Social Science, Lecture 04: Counting at Scale, Part IIComputational Social Science, Lecture 04: Counting at Scale, Part II
Computational Social Science, Lecture 04: Counting at Scale, Part II
 
Bitcoin's Killer App 2017 - Sean Walsh
Bitcoin's Killer App 2017 -  Sean WalshBitcoin's Killer App 2017 -  Sean Walsh
Bitcoin's Killer App 2017 - Sean Walsh
 
Recomendaciones integrales de política pública para las juventudes en la ar...
Recomendaciones integrales de política pública para las juventudes en la ar...Recomendaciones integrales de política pública para las juventudes en la ar...
Recomendaciones integrales de política pública para las juventudes en la ar...
 
Social Media Policies
Social Media PoliciesSocial Media Policies
Social Media Policies
 
Cuadro de los pagos tributarios
Cuadro de los pagos tributariosCuadro de los pagos tributarios
Cuadro de los pagos tributarios
 
Blog
BlogBlog
Blog
 
”’I den svenska och tyska litteraturens mittpunkt’: Svenska Pommerns roll som...
”’I den svenska och tyska litteraturens mittpunkt’: Svenska Pommerns roll som...”’I den svenska och tyska litteraturens mittpunkt’: Svenska Pommerns roll som...
”’I den svenska och tyska litteraturens mittpunkt’: Svenska Pommerns roll som...
 
Charitable trust ppt
Charitable trust pptCharitable trust ppt
Charitable trust ppt
 
IBM Hadoop-DS Benchmark Report - 30TB
IBM Hadoop-DS Benchmark Report - 30TBIBM Hadoop-DS Benchmark Report - 30TB
IBM Hadoop-DS Benchmark Report - 30TB
 
Motivación laboral
Motivación laboralMotivación laboral
Motivación laboral
 
ระบบสารสนเทศ
ระบบสารสนเทศระบบสารสนเทศ
ระบบสารสนเทศ
 
Amazon alexa - building custom skills
Amazon alexa - building custom skillsAmazon alexa - building custom skills
Amazon alexa - building custom skills
 
2016 Results & Outlook
2016 Results & Outlook 2016 Results & Outlook
2016 Results & Outlook
 

Similar a Modeling Social Data, Lecture 2: Introduction to Counting

Nonprobability report-may-2016-final
Nonprobability report-may-2016-finalNonprobability report-may-2016-final
Nonprobability report-may-2016-finalSUMEET VERMA
 
Guide to survey_poll
Guide to survey_pollGuide to survey_poll
Guide to survey_pollKeiko Ono
 
Who should be nominated to run in the 2012 U.S. Presidential Election?
Who should be nominated to run in the 2012 U.S. Presidential Election?Who should be nominated to run in the 2012 U.S. Presidential Election?
Who should be nominated to run in the 2012 U.S. Presidential Election?agraefe
 
Day 10 - Dynamics of Voting
Day 10 - Dynamics of VotingDay 10 - Dynamics of Voting
Day 10 - Dynamics of VotingLee Hannah
 
(A)My research questions is is to figure out what people in t.docx
(A)My research questions is is to figure out what people in t.docx(A)My research questions is is to figure out what people in t.docx
(A)My research questions is is to figure out what people in t.docxmayank272369
 
Methodology 2.pptx
Methodology 2.pptxMethodology 2.pptx
Methodology 2.pptxMarcCollazo1
 
Correlation causality
Correlation causalityCorrelation causality
Correlation causalityveesingh
 
475 2015 media effects methods up
475 2015 media effects methods up475 2015 media effects methods up
475 2015 media effects methods upmpeffl
 
mitchell_186_final paper copy
mitchell_186_final paper copymitchell_186_final paper copy
mitchell_186_final paper copyAlec Mitchell
 
Text Messaging Field Experiment (RootsCampDC 12/06)
Text Messaging Field Experiment (RootsCampDC 12/06)Text Messaging Field Experiment (RootsCampDC 12/06)
Text Messaging Field Experiment (RootsCampDC 12/06)rootscamp
 
mitchell_186_final paper copy
mitchell_186_final paper copymitchell_186_final paper copy
mitchell_186_final paper copyAlec Mitchell
 
Discussion # 1 Due Weds 081921Wk 1 Discussion 1 - Statistics [
Discussion # 1 Due Weds 081921Wk 1 Discussion 1 - Statistics [Discussion # 1 Due Weds 081921Wk 1 Discussion 1 - Statistics [
Discussion # 1 Due Weds 081921Wk 1 Discussion 1 - Statistics [AlyciaGold776
 
POL SOC 360 Sampling Generalizability
POL SOC 360 Sampling Generalizability POL SOC 360 Sampling Generalizability
POL SOC 360 Sampling Generalizability atrantham
 
Survey research
Survey researchSurvey research
Survey researchshakirhina
 
Study: Discrimination in Topeka, 2002
Study: Discrimination in Topeka, 2002Study: Discrimination in Topeka, 2002
Study: Discrimination in Topeka, 2002Travis Barnhart
 

Modeling Social Data, Lecture 2: Introduction to Counting

  • 1. Introduction to Counting APAM E4990 Modeling Social Data Jake Hofman Columbia University January 27, 2017 Jake Hofman (Columbia University) Intro to Counting January 27, 2017 1 / 27
  • 2. Why counting? http://bit.ly/august2016poll $p(\underbrace{y}_{\text{support}} \mid \underbrace{x}_{\text{age}})$
  • 3. Why counting? http://bit.ly/ageracepoll2016 $p(\underbrace{y}_{\text{support}} \mid \underbrace{x_1, x_2}_{\text{age, race}})$
  • 4. Why counting? $p(\underbrace{y}_{\text{support}} \mid \underbrace{x_1, x_2, x_3, \ldots}_{\text{age, sex, race, party}}) = \,?$
  • 5. Why counting? Problem: Traditionally difficult to obtain reliable estimates due to small sample sizes or sparsity (e.g., ∼100 age × 2 sex × 5 race × 3 party = 3,000 groups, but typical surveys collect ∼1,000s of responses)
  • 6. Why counting? Potential solution: Sacrifice granularity for precision, by binning observations into larger, but fewer, groups (e.g., bin age into a few groups: 18-29, 30-49, 50-64, 65+)
  • 7. Why counting? Potential solution: Develop more sophisticated methods that generalize well from small samples (e.g., fit a model: $\text{support} \sim \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \ldots$)
  • 8. Why counting? (Partial) solution: Obtain larger samples through other means, so we can just count and divide to make estimates via relative frequencies (e.g., with ∼1M responses, we have 100s per group and can estimate support within a few percentage points)
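The arithmetic behind slides 5 and 8 can be sketched roughly as follows. This is a minimal illustration, assuming responses are spread evenly across groups and using a simple binomial standard error with a 95% normal approximation; neither assumption comes from the slides, and the function name is made up for the example.

```python
import math

def groups_and_margin(n_responses, n_groups, p=0.5):
    """Return (responses per group, approx. 95% margin of error) for a relative frequency.

    Assumes responses are spread evenly over groups (an idealization).
    """
    per_group = n_responses / n_groups
    # Binomial standard error of a proportion; p = 0.5 is the worst case.
    std_err = math.sqrt(p * (1 - p) / per_group)
    return per_group, 1.96 * std_err

n_groups = 100 * 2 * 5 * 3  # age x sex x race x party, as on slide 5

for n in (3_000, 1_000_000):
    per_group, margin = groups_and_margin(n, n_groups)
    print(f"{n:>9,} responses: {per_group:6.1f} per group, "
          f"+/- {100 * margin:.1f} percentage points")
```

Under these assumptions, one million responses give about 333 per group and a margin of roughly five percentage points, which is in line with the "few percentage points" on slide 8.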
  • 9. Why counting? Wei Wang (a,*), David Rothschild (b), Sharad Goel (b), Andrew Gelman (a,c). Forecasting elections with non-representative polls. International Journal of Forecasting 31 (2015) 980–991. © 2014 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
(a) Department of Statistics, Columbia University, New York, NY, USA; (b) Microsoft Research, New York, NY, USA; (c) Department of Political Science, Columbia University, New York, NY, USA
Keywords: Non-representative polling; Multilevel regression and poststratification; Election forecasting
Abstract: Election forecasts have traditionally been based on representative polls, in which randomly sampled individuals are asked who they intend to vote for. While representative polling has historically proven to be quite effective, it comes at considerable costs of time and money. Moreover, as response rates have declined over the past several decades, the statistical benefits of representative sampling have diminished. In this paper, we show that, with proper statistical adjustment, non-representative polls can be used to generate accurate election forecasts, and that this can often be achieved faster and at a lesser expense than traditional survey methods. We demonstrate this approach by creating forecasts from a novel and highly non-representative survey dataset: a series of daily voter intention polls for the 2012 presidential election conducted on the Xbox gaming platform. After adjusting the Xbox responses via multilevel regression and poststratification, we obtain estimates which are in line with the forecasts from leading poll analysts, which were based on aggregating hundreds of traditional polls conducted during the election cycle. We conclude by arguing that non-representative polling shows promise not only for election forecasting, but also for measuring public opinion on a broad range of social, economic and cultural issues.
1. Introduction. At the heart of modern opinion polling is representative sampling, built around the idea that every individual in a [...] The wide-scale adoption of representative polling can be traced largely back to a pivotal polling mishap in the 1936 US presidential election campaign. During that campaign, the popular magazine Literary Digest [...] pollsters, including George Gallup, Archibald Crossley, and Elmo Roper, used considerably smaller but representative samples, and predicted the election outcome with a reasonable level of accuracy (Gosnell, 1937). Accordingly, non-representative or "convenience sampling" rapidly fell out of favor with polling experts.
So, why do we revisit this seemingly long-settled case? Two recent trends spur our investigation. First, random digit dialing (RDD), the standard method in modern representative polling, has suffered increasingly high non-response rates, due both to the general public's growing reluctance to answer phone surveys, and to expanding technical means of screening unsolicited calls (Keeter, Kennedy, Dimock, Best, & Craighill, 2006). By one measure, RDD response rates have decreased from 36% in 1997 to 9% in 2012 (Kohut, Keeter, Doherty, Dimock, & Christian, 2012), and other studies confirm this trend (Holbrook, Krosnick, & Pfent, 2007; Steeh, Kirgis, Cannon, & DeWitt, 2001; Tourangeau & Plewes, 2013). Assuming that the initial pool of targets is representative, such low response rates mean that those who ultimately answer the phone and elect to respond might not be. Even if the selection issues are not yet a serious problem for accuracy, as some have argued (Holbrook et al., 2007), the downward trend in response rates suggests an increasing need for post-sampling adjustments; indeed, the adjustment methods we present here should work just as well for surveys obtained by probability sampling as for convenience samples. The second trend driving our research is the fact that, with recent technological innovations, it is increasingly convenient and cost-effective to collect large numbers of highly non-representative samples via online surveys. The data that took the Literary Digest editors several months to collect in 1936 can now take only a few days, and, for some surveys, can cost just pennies per response. However, the challenge is to extract a meaningful signal from these unconventional samples.
In this paper, we show that, with proper statistical adjustments, non-representative polls are able to yield accurate presidential election forecasts, on par with those based on traditional representative polls. We proceed as follows. Section 2 describes the election survey that we conducted on the Xbox gaming platform during the 45 days leading up to the 2012 US presidential race. Our Xbox sample is highly biased in two key demographic dimen[...] how to transform voter intent into projections of vote share and electoral votes. We conclude in Section 5 by discussing the potential for non-representative polling in other domains.
2. Xbox data. Our analysis is based on an opt-in poll which was available continuously on the Xbox gaming platform during the 45 days preceding the 2012 US presidential election. Each day, three to five questions were posted, one of which gauged voter intention via the standard query, "If the election were held today, who would you vote for?". Full details of the questionnaire are given in the Appendix. The respondents were allowed to answer at most once per day. The first time they participated in an Xbox poll, respondents were also asked to provide basic demographic information about themselves, including their sex, race, age, education, state, party ID, political ideology, and who they voted for in the 2008 presidential election. In total, 750,148 interviews were conducted, with 345,858 unique respondents – over 30,000 of whom completed five or more polls – making this one of the largest election panel studies ever.
Despite the large sample size, the pool of Xbox respondents is far from being representative of the voting population. Fig. 1 compares the demographic composition of the Xbox participants to that of the general electorate, as estimated via the 2012 national exit poll. The most striking differences are for age and sex. As one might expect, young men dominate the Xbox population: 18- to 29-year-olds comprise 65% of the Xbox dataset, compared to 19% in the exit poll; and men make up 93% of the Xbox sample but only 47% of the electorate. Political scientists have long observed that both age and sex are strongly correlated with voting preferences (Kaufmann & Petrocik, 1999), and indeed these discrepancies are apparent in the unadjusted time series of Xbox voter intent shown in Fig. 2. In contrast to estimates based on traditional, representative polls (indicated by the dotted blue line in Fig. 2), the uncorrected Xbox sample suggests a landslide victory for Mitt Romney, reminiscent of the infamous Literary Digest error.
3. Estimating voter intent with multilevel regression and poststratification. 3.1. Multilevel regression and poststratification [...]
http://bit.ly/nonreppoll
  • 10. Why counting? The good: Shift away from sophisticated statistical methods on small samples to simpler methods on large samples
  • 11. Why counting? The bad: Even simple methods (e.g., counting) are computationally challenging at large scales (1M is easy, 1B a bit less so, 1T gets interesting)
  • 12. Why counting? Claim: Solving the counting problem at scale enables you to investigate many interesting questions in the social sciences
  • 13. Learning to count. This week: Counting at small/medium scales on a single machine
  • 14. Learning to count. This week: Counting at small/medium scales on a single machine. Following weeks: Counting at large scales in parallel
  • 15. Counting, the easy way: Split / Apply / Combine [1]
      • Load dataset into memory
      • Split: Arrange observations into groups of interest
      • Apply: Compute distributions and statistics within each group
      • Combine: Collect results across groups
      [1] http://bit.ly/splitapplycombine
  • 16. The generic group-by operation: Split / Apply / Combine
      for each observation as (group, value):
          place value in bucket for corresponding group
      for each group:
          apply a function over values in bucket
          output group and result
  • 17. The generic group-by operation: Split / Apply / Combine
      for each observation as (group, value):
          place value in bucket for corresponding group
      for each group:
          apply a function over values in bucket
          output group and result
      Useful for computing arbitrary within-group statistics when we have required memory (e.g., conditional distribution, median, etc.)
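The pseudocode on slides 16 and 17 might be sketched in Python roughly as below. The function name `group_by` and the toy (age bin, support) pairs are hypothetical, chosen only for illustration. Note that every value is held in memory, which is the requirement slide 17 points out.

```python
from collections import defaultdict
from statistics import median

def group_by(observations, func):
    """Split (group, value) pairs into buckets, apply func per bucket, combine results."""
    buckets = defaultdict(list)
    for group, value in observations:   # split: place value in its group's bucket
        buckets[group].append(value)
    # apply func within each bucket, then combine into one dict of results
    return {group: func(values) for group, values in buckets.items()}

# Hypothetical toy data: (age bin, support) pairs.
observations = [("18-29", 1), ("18-29", 0), ("65+", 1), ("18-29", 1), ("65+", 1)]

print(group_by(observations, len))      # group sizes
print(group_by(observations, median))   # an arbitrary within-group statistic
```

Because the function applied to each bucket is passed in as an argument, the same skeleton covers counts, medians, or full conditional distributions.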
  • 18. Why counting?
  • 19. Example: Anatomy of the long tail
      Dataset     Users   Items   Rating levels   Observations
      Movielens   100K    10K     10              10M
      Netflix     500K    20K     5               100M
  • 21. Example: Movielens How many ratings are there at each star level? 0 1,000,000 2,000,000 3,000,000 1 2 3 4 5 Rating Numberofratings Jake Hofman (Columbia University) Intro to Counting January 27, 2017 12 / 27
  • 22. Example: Movielens
      [Figure: number of ratings (y-axis, 0 to 3,000,000) by rating (x-axis, 1 to 5)]
        group by rating value
        for each group:
          count # ratings
  • 23. Example: Movielens
      What is the distribution of average ratings by movie?
      [Figure: density (y-axis) of mean rating by movie (x-axis, 1 to 5)]
  • 24. Example: Movielens
      [Figure: density (y-axis) of mean rating by movie (x-axis, 1 to 5)]
        group by movie id
        for each group:
          compute average rating
  • 25. Example: Movielens
      What fraction of ratings are given to the most popular movies?
      [Figure: CDF (y-axis, 0% to 100%) by movie rank (x-axis, 0 to 9,000)]
  • 26. Example: Movielens
      [Figure: CDF (y-axis, 0% to 100%) by movie rank (x-axis, 0 to 9,000)]
        group by movie id
        for each group:
          count # ratings
        sort by group size
        cumulatively sum group sizes
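The four steps for the popularity CDF (count, sort, cumulative sum, normalize) might look like the following in Python. The function name `popularity_cdf` and the toy list of movie ids are assumptions for illustration, not part of the original slides.

```python
from collections import Counter
from itertools import accumulate


def popularity_cdf(movie_ids):
    """Cumulative fraction of ratings covered by the top-k movies, for k = 1..G."""
    counts = Counter(movie_ids)                      # group by movie id, count ratings
    sizes = sorted(counts.values(), reverse=True)    # most popular movie first
    total = sum(sizes)
    # running sum of group sizes, divided by the total number of ratings
    return [running / total for running in accumulate(sizes)]


# Hypothetical movie ids, one entry per rating.
movie_ids = ["a", "b", "a", "c", "a", "b", "a", "d"]
cdf = popularity_cdf(movie_ids)
```

Entry k-1 of the result is the fraction of all ratings that go to the k most-rated movies.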
  • 27. Example: Movielens
      What is the median rank of each user’s rated movies?
      [Figure: number of users (y-axis, 0 to 8,000) by user eccentricity (x-axis)]
  • 28. Example: Movielens
        join movie ranks to ratings
        group by user id
        for each group:
          compute median movie rank
      [Figure: number of users (y-axis, 0 to 8,000) by user eccentricity (x-axis)]
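A rough Python sketch of the join-then-group computation for user eccentricity follows. It assumes that rank 1 is the most-rated movie and that ties are broken by movie id; both choices, along with the name `user_eccentricity` and the sample data, are assumptions made here for illustration.

```python
from collections import Counter, defaultdict
from statistics import median


def user_eccentricity(ratings):
    """ratings: iterable of (user_id, movie_id). Returns {user_id: median movie rank}."""
    ratings = list(ratings)
    counts = Counter(movie for _user, movie in ratings)
    # Rank 1 is the most-rated movie; ties broken by movie id (an arbitrary choice).
    ordered = sorted(counts, key=lambda m: (-counts[m], m))
    rank = {movie: i for i, movie in enumerate(ordered, start=1)}

    ranks_by_user = defaultdict(list)
    for user, movie in ratings:                 # join ranks to ratings, group by user
        ranks_by_user[user].append(rank[movie])
    return {user: median(ranks) for user, ranks in ranks_by_user.items()}


# Hypothetical (user_id, movie_id) pairs.
ratings = [("u1", "a"), ("u2", "a"), ("u3", "a"),
           ("u1", "b"), ("u2", "b"), ("u3", "c")]
eccentricity = user_eccentricity(ratings)
```

Users whose rated movies sit deeper in the popularity ranking get a larger median rank.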
  • 29. Example: Anatomy of the long tail
      Dataset     Users   Items   Rating levels   Observations
      Movielens   100K    10K     10              10M
      Netflix     500K    20K     5               100M
      What do we do when the full dataset exceeds available memory?
  • 30. Example: Anatomy of the long tail
      What do we do when the full dataset exceeds available memory?
      Sampling? Unreliable estimates for rare groups
  • 31. Example: Anatomy of the long tail
      What do we do when the full dataset exceeds available memory?
      Random access from disk? 1000x more storage, but 1000x slower [2]
      [2] Numbers every programmer should know
  • 32. Example: Anatomy of the long tail
      What do we do when the full dataset exceeds available memory?
      Streaming: Read data one observation at a time, storing only needed state
  • 34. The combinable group-by operation
      Streaming
        for each observation as (group, value):
          if new group:
            initialize result
          update result for corresponding group as function of existing result and current value
        for each group:
          output group and result
      Useful for computing a subset of within-group statistics with a limited memory footprint (e.g., min, mean, max, variance)
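As one possible illustration of the streaming pattern, the sketch below keeps a constant-size state per group and updates it with each observation. The slides do not say how variance is updated; Welford's online algorithm is swapped in here as an assumption, and the name `stream_stats` and the sample data are hypothetical.

```python
def stream_stats(observations):
    """One pass over (group, value) pairs with constant-size state per group."""
    state = {}
    for group, value in observations:
        if group not in state:                     # new group: initialize result
            state[group] = {"n": 0, "mean": 0.0, "m2": 0.0,
                            "min": value, "max": value}
        s = state[group]
        s["n"] += 1
        delta = value - s["mean"]
        s["mean"] += delta / s["n"]                # Welford's running mean
        s["m2"] += delta * (value - s["mean"])     # running sum of squared deviations
        s["min"] = min(s["min"], value)
        s["max"] = max(s["max"], value)
    return {
        group: {"min": s["min"], "mean": s["mean"], "max": s["max"],
                "variance": s["m2"] / s["n"]}      # population variance
        for group, s in state.items()
    }


# Hypothetical (movie_id, rating) stream.
stats = stream_stats([("m1", 2), ("m2", 5), ("m1", 4), ("m1", 6)])
```

Memory grows with the number of groups G, not with the number of observations N, which is why only statistics with such an update rule qualify as combinable.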
  • 35. Example: Movielens
      [Figure: number of ratings (y-axis, 0 to 3,000,000) by rating (x-axis, 1 to 5)]
        for each rating:
          counts[rating value]++
  • 36. Example: Movielens
        for each rating:
          totals[movie id] += rating
          counts[movie id]++
        for each group:
          totals[movie id] / counts[movie id]
      [Figure: density (y-axis) of mean rating by movie (x-axis, 1 to 5)]
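The totals-and-counts pseudocode translates fairly directly to Python when ratings are read one line at a time. The sketch assumes a headerless CSV layout of `user_id,movie_id,rating`; that layout, the function name `mean_rating_by_movie`, and the in-memory sample are assumptions, not the actual Movielens file format.

```python
import csv
import io
from collections import defaultdict


def mean_rating_by_movie(lines):
    """Stream over CSV lines, keeping only a running total and count per movie."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user_id, movie_id, rating in csv.reader(lines):
        totals[movie_id] += float(rating)
        counts[movie_id] += 1
    return {movie_id: totals[movie_id] / counts[movie_id] for movie_id in totals}


# Hypothetical data; a real run would pass an open file object instead.
sample = io.StringIO("1,10,4.0\n2,10,5.0\n1,20,2.5\n3,10,3.0\n")
means = mean_rating_by_movie(sample)
```

Since `csv.reader` consumes its input lazily, the full dataset never needs to fit in memory, only the two per-movie dictionaries.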
  • 38. Yet another group-by operation
      Per-group histograms
        for each observation as (group, value):
          histogram[group][value]++
        for each group:
          compute result as a function of histogram
          output group and result
      We can recover arbitrary statistics if we can afford to store counts of all distinct values within each group
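To make the per-group histogram idea concrete, here is a small Python sketch that builds the histograms in one pass and then recovers the median from the counts alone. The helper names `build_histograms` and `median_from_histogram` and the sample ratings are hypothetical.

```python
from collections import Counter, defaultdict


def build_histograms(observations):
    """One pass over (group, value) pairs: histogram[group][value] += 1."""
    histogram = defaultdict(Counter)
    for group, value in observations:
        histogram[group][value] += 1
    return histogram


def median_from_histogram(hist):
    """Median of the multiset described by a {value: count} mapping."""
    n = sum(hist.values())
    lo_idx, hi_idx = (n - 1) // 2, n // 2      # middle position(s), 0-based
    seen = 0
    lo = hi = None
    for value in sorted(hist):
        seen += hist[value]
        if lo is None and seen > lo_idx:
            lo = value
        if seen > hi_idx:
            hi = value
            break
    return (lo + hi) / 2


# Hypothetical (movie_id, rating) pairs.
ratings = [("m1", 3), ("m1", 4), ("m1", 4), ("m1", 5), ("m1", 5),
           ("m2", 2), ("m2", 5)]
histograms = build_histograms(ratings)
medians = {movie: median_from_histogram(h) for movie, h in histograms.items()}
```

Memory scales with the number of distinct (group, value) pairs, at most V*G, rather than with the number of observations.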
  • 39. The group-by operation
      For arbitrary input data:
      Memory   Scenario              Distributions   Statistics
      N        Small dataset         Yes             General
      V*G      Small distributions   Yes             General
      G        Small # groups        No              Combinable
      V        Small # outcomes      No              No
      1        Large # both          No              No
      N = total number of observations
      G = number of distinct groups
      V = largest number of distinct values within group
  • 40. Examples (w/ 8GB RAM)
      Median rating by movie for Netflix
      N ∼ 100M ratings
      G ∼ 20K movies
      V ∼ 10 half-star values
      V*G ∼ 200K: store per-group histograms for arbitrary statistics (scales to arbitrary N, if you’re patient)
  • 41. Examples (w/ 8GB RAM)
      Median rating by video for YouTube
      N ∼ 10B ratings
      G ∼ 1B videos
      V ∼ 10 half-star values
      V*G ∼ 10B: fails because per-group histograms are too large to store in memory
      G ∼ 1B: but there is no (exact) streaming calculation for the median
  • 42. Examples (w/ 8GB RAM)
      Mean rating by video for YouTube
      N ∼ 10B ratings
      G ∼ 1B videos
      V ∼ 10 half-star values
      G ∼ 1B: use streaming to compute combinable statistics
  • 43. The group-by operation
      For pre-grouped input data:
      Memory   Scenario              Distributions   Statistics
      N        Small dataset         Yes             General
      V*G      Small distributions   Yes             General
      G        Small # groups        No              Combinable
      V        Small # outcomes      Yes             General
      1        Large # both          No              Combinable
      N = total number of observations
      G = number of distinct groups
      V = largest number of distinct values within group
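When the input arrives pre-grouped (all rows for a group are adjacent), one group can be processed and discarded before the next begins. The sketch below uses `itertools.groupby` to show this; the function name `stats_for_pregrouped` and the sample data are hypothetical, and the code assumes the input really is sorted or otherwise grouped by key.

```python
from itertools import groupby
from operator import itemgetter
from statistics import median


def stats_for_pregrouped(observations, func):
    """Yield (group, func(values)), assuming each group's rows are adjacent."""
    for group, rows in groupby(observations, key=itemgetter(0)):
        # Only the current group's values are held in memory at any time.
        yield group, func([value for _group, value in rows])


# Hypothetical (movie_id, rating) pairs, already grouped by movie id.
grouped_ratings = [("m1", 3), ("m1", 4), ("m1", 5), ("m2", 2), ("m2", 5)]
medians = list(stats_for_pregrouped(grouped_ratings, median))
```

This version buffers one group's raw values; replacing the list with a per-group count of distinct values would bound memory by V, matching the "Small # outcomes" row of the table.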