The document discusses computer-aided content analysis methods for studying digitally-enabled social movements. It outlines applying supervised machine learning to categorize messages from a Facebook group for Egypt's April 6 Youth Movement. Key points:
1. Categories like offline coordination, online actions, and event reporting are defined to classify a training set of messages.
2. Validation is done using cross-validation, and analysis is applied to the full dataset.
3. Results show peaks in offline coordination before protest dates, but other categories did not change as expected, possibly due to errors in training.
2. Agenda
1. Focus on methods of prior digitally-mediated movement research
2. Outline application to specific case, April 6 Youth Movement
3. Describe and walk through coding procedure
Alexander Hanna, Wisconsin (@alexhanna)
3. The rise of digitally-enabled movements
Digitally-enabled movement - Movements that have incorporated some aspect of online activity through information and communication technologies (ICTs)
Movements leave traces which are both 1) records of events and 2) movement activity in and of themselves
4. Digitally-enabled movement activity
Coordinating people for other (usually offline) activities (e-mobilization in Earl and Kimport 2012)
Online mobilizations (e-tactics, ibid)
Issue discussion and development of discourses (analogous to “free spaces” [Evans and Boyte 1986; Polletta 1999] or counterpublics)
Persuasion and micromobilization (Snow et al. 1986)
5. Previous types of analysis
Case study (e.g. Gurak 1997, Eaton 2010)
Network analysis (e.g. Garrido and Halavais 2003, Bennett, Foot, Xenos 2011)
Volume and group properties (e.g. Caren, Jowers, and Gaby 2012)
6. Problems with these methods
Need to focus on content, but too much data for manual content analysis
Cost and time prohibitive to code by hand
Too many Datas.
7. A solution - computer-aided content analysis
Computer-aided content analysis (also called automated content analysis; textual analysis)
- Goal: extracting information out of text
- Includes word search, statistical machine learning, language modeling
Note: this supplements deep case knowledge, encourages both inductive and deductive approaches
8. Case: April 6 Youth Movement
Facebook group created as solidarity action with Egyptian workers
Proposed actions on April 6, strike date, and May 4, Mubarak’s birthday
9. Classification method of current study
Classification of documents for a number of set categories as a possible tool for study of digitally-enabled movements
Supervised machine learning, reporting proportions of categories in a body of texts (Hopkins and King 2010)
Categories derived from theory, interviews, and coding process
10. Supervised machine learning
“Supervised” means 1) categories known a priori; 2) involves handcoding
“Training set” is handcoded
“Test set” is uncoded, to be coded by algorithm
Think of supervised machine learning like regression analysis
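A minimal sketch of the training set / test set idea, for illustration only. It swaps in a small multinomial naive Bayes classifier written in plain Python, which is not the Hopkins and King (2010) method used in this study, and the messages and the function names (train_naive_bayes, classify) are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(training_set):
    """Learn word counts per category from (text, category) pairs coded by hand."""
    word_counts = defaultdict(Counter)   # category -> word frequencies
    category_counts = Counter()          # category -> number of messages
    vocabulary = set()
    for text, category in training_set:
        words = text.lower().split()
        word_counts[category].update(words)
        category_counts[category] += 1
        vocabulary.update(words)
    return word_counts, category_counts, vocabulary


def classify(text, model):
    """Return the category with the highest log-probability for an uncoded message."""
    word_counts, category_counts, vocabulary = model
    total_messages = sum(category_counts.values())
    best_category, best_score = None, float("-inf")
    for category, n_messages in category_counts.items():
        # Log prior plus add-one smoothed log likelihood of each known word.
        score = math.log(n_messages / total_messages)
        denominator = sum(word_counts[category].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:
                score += math.log((word_counts[category][word] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical hand-coded messages (invented for illustration).
training_set = [
    ("get to tahrir square at noon", "offline coordination"),
    ("meet at the square tomorrow", "offline coordination"),
    ("change your profile picture today", "internet action"),
    ("everyone change your profile picture", "internet action"),
]
model = train_naive_bayes(training_set)

# The test set is uncoded; the algorithm assigns the categories.
test_set = ["meet at tahrir tomorrow", "change profile picture"]
predicted = [classify(message, model) for message in test_set]
```

As with regression, the hand-coded cases fit the model and the fitted model then predicts outcomes for cases that were not coded.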
11. Categories for classification
Offline coordination (e-mobilization)
- Example: “get to Tahrir Square”
Internet action (e.g. “e-tactics”)
- Example: changing profile pictures
Media and press
- Example: links to BBC, al-Jazeera
Reporting on events
- Example: citizen journalism, pictures of events
Request for information
- Example: “What is happening right now in Tahrir?”
12. Expectations
1. Increased offline coordination directly before mobilization dates
2. Increased reporting and press on mobilization dates
...but it is not clear what will happen afterward.
13. Analysis process
1. Data collection
2. Coding training set
3. Reliability testing of training set
4. Data preprocessing
5. Validation
6. Applying analysis across dataset
14. Data collection, coding, preprocessing
Data collection
- Scraping of FB group page March-May 2008
- 64,197 messages, 3,841 unique users
- Messages in Arabic, English, and “Franco”
Human coding of “training set”
- 638 messages, assessed intercoder reliability
Data preprocessing
- Stemming
Generating different parameters per language
- Focusing only on Arabic, Franco
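The slides say intercoder reliability was assessed but do not name the statistic. As one possibility, the sketch below computes Cohen's kappa for two coders; the choice of kappa, the function name, and the example labels are all assumptions made for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders, corrected for agreement expected by chance."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of messages")
    n = len(coder_a)
    # Share of messages on which the two coders chose the same category.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement, from each coder's marginal category frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two coders for the same six messages.
coder_a = ["coordination", "coordination", "press", "press", "reporting", "reporting"]
coder_b = ["coordination", "coordination", "press", "reporting", "reporting", "reporting"]
kappa = cohens_kappa(coder_a, coder_b)
```

A value near 1 indicates agreement well beyond chance; a value near 0 indicates agreement no better than chance.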
15. Validation
k-fold cross validation as a common method of validation
Source: http://www.imtech.res.in/raghava/gpsr/Evaluation_Bioinformatics_Methods.htm
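A rough sketch of how k-fold cross validation partitions a hand-coded set: each message is held out exactly once while the remaining folds are used for training. The function name k_fold_splits, the choice of k = 5, and the random seed are assumptions; 638 is the training set size reported earlier in the slides.

```python
import random


def k_fold_splits(n_items, k, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    if k < 2 or k > n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)        # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test


# 638 hand-coded messages, five folds (k is an arbitrary choice here).
splits = list(k_fold_splits(638, 5))
fold_sizes = [len(test) for _, test in splits]
```

In each round a model would be fit on the train indices and scored on the test indices, and the k scores averaged.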
16. Validation results
Following Hopkins and King (2010), split dataset in half, using one half to estimate the other
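One way to score a split-half check is to compare the estimated category proportions for the held-out half against the proportions in its hand codes. The sketch below assumes mean absolute error as the measure, which the slides do not specify; the labels and the estimated proportions are invented numbers, not results from the study.

```python
from collections import Counter


def category_proportions(labels, categories):
    """Share of messages falling in each category."""
    counts = Counter(labels)
    return {category: counts[category] / len(labels) for category in categories}


def mean_absolute_error(estimated, actual):
    """Average absolute gap between estimated and hand-coded proportions."""
    return sum(abs(estimated[c] - actual[c]) for c in actual) / len(actual)


categories = ["coordination", "press", "reporting"]

# Hypothetical hand codes for the held-out half.
held_out_labels = ["coordination"] * 5 + ["press"] * 3 + ["reporting"] * 2
actual = category_proportions(held_out_labels, categories)

# Hypothetical proportions estimated from the other half.
estimated = {"coordination": 0.4, "press": 0.3, "reporting": 0.3}
error = mean_absolute_error(estimated, actual)
```

The comparison is made on proportions rather than on individual messages because the approach reports category shares in a body of texts.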
21. Discussion
Expectations
- Coordination increased before action
- ...but no other categories did
Possible avenues for error
- Coder misclassification in the training set
- Insufficient information in training set
22. Conclusion
Rise of “big data” necessitates new methods, development of “computational social science” (Lazer et al. 2010)
Drawing on computer-aided methods for content analysis of digitally-enabled movement texts
The process requires extensive prep, theory- and data-informed categories, sufficient case knowledge, validation
Necessary to integrate these methods with existing quantitative and qualitative ones