Presentation at the Workshop on "Small Data and Big Data Controversies and Alternatives: Perspectives from The Sage Handbook of Social Media Research Methods" with Anabel Quan-Haase, Luke Sloan, Diane Rasmussen Pennington, et al.
LINK: http://sched.co/7G5N
1. Who are We Studying:
Bots or Humans?
Anatoliy Gruzd
gruzd@ryerson.ca
@gruzd
Canada Research Chair in Social Media Data Stewardship
Associate Professor, Ted Rogers School of Management
Director, Social Media Lab
Ryerson University
2. Research at the Social Media Lab
• Social Media Analytics
• Social Media Data Stewardship
• Networked Influence
• Online Political Engagement
• Learning Analytics
• Social Media & Health
3. Outline
• Social Media Analytics
• The Rise of the Bots
• Case Study: Social Media Use during the 2014 EuroMaidan Revolution in Ukraine
• Detecting Bots
• Next steps
7. Data -> Visualizations -> Understanding
How to Make Sense of Social Media Data?
Anatoliy Gruzd (Twitter: @gruzd)
8. How to Make Sense of Social Media Data?
Example: Geo-based Analysis
9. How to Make Sense of Social Media Data?
Example: Geo-based + Content Analysis
Tracking Hate Speech on Twitter
Source: http://www.fenuxe.com/tag/geo-coded
10. The Rise of Social Bots
• Who are we studying:
Humans or Bots?
Social Media Data Analytics Challenges
11. Social Bot: software designed to act on the Internet with some level of autonomy
12. Different Types of Bots
• Free music, games, books, downloads
• Jewelry, electronics, vehicles
• Contests, gambling, prizes
• Finance, loans, realty
• Increasing Twitter following
• Diet
• Adult content
(Grier et al., 2010)
17. Platform-reported & Estimated % of Bots
Three major platforms (figure; platform names not recoverable from the slide):
• 1.5B users, 2% fake ≈ 30,000,000 fake accounts
• 300M users, 5% fake ≈ 15,000,000 fake accounts
• 400M users, 8% fake ≈ 32,000,000 fake accounts
Source: http://blogs.wsj.com/digits/2015/06/30/fake-accounts-still-plague-instagram-despite-purge-study-finds/
… but is that everything?
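The estimates above are simple products of reported user counts and estimated fake-account rates; a minimal sketch of that arithmetic (the slide does not name which platform each figure belongs to, so the rows are labeled only by their numbers):

```python
# Estimated fake accounts = reported user base x estimated fake-account rate.
# Pairings follow the slide's arithmetic (e.g. 1.5B x 2% = 30M); which
# platform each row refers to is not stated on the slide.
platforms = [
    {"users": 1_500_000_000, "fake_rate": 0.02},
    {"users": 300_000_000, "fake_rate": 0.05},
    {"users": 400_000_000, "fake_rate": 0.08},
]

for p in platforms:
    p["est_fake"] = round(p["users"] * p["fake_rate"])

# est_fake per row: 30,000,000 / 15,000,000 / 32,000,000
```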
18. Why does it matter if there are bots,
spammers and fakers in our datasets?
Popular topics mentioned in the
14,500 abstracts of journal &
conference papers on “social media”
or “social networking websites”
published since 1999
(Gruzd, 2015)
19. Why does it matter if there are bots,
spammers and fakers in our datasets?
How many of these 14,500
papers took into account the
presence and influence of bots,
spammers or fakers?
(Gruzd, 2015)
20. Case Study: 2014 EuroMaidan Revolution in Ukraine
"2014-02-21 11-04 Euromaidan in Kiev" by Amakuha. Licensed
under CC BY-SA 3.0 via Wikimedia
November 21, 2013 – the Ukrainian government suspended preparations
for signing the trade & association agreement with the EU
Gruzd, A., & Tsyganova, K. (2015). Information Wars and Online Activism During
the 2013/2014 Crisis in Ukraine: Examining the Social Structures of Pro- and Anti-
Maidan Groups. Policy & Internet, 7(2), 121–158. http://doi.org/10.1002/poi3.91
22. Example:
VK Group User Interface – Posts, Likes, Comments…
…Discussion board, Links & Media Files
23. Data Collection
Group               | PRO1 (Pro-Maidan) | PRO2 (Pro-Maidan) | ANTI1 (Anti-Maidan) | ANTI2 (Anti-Maidan)
Num. of Nodes       | 141,542           | 96,402            | 60,506              | 69,029
Num. of Connections | 338,344           | 221,452           | 280,678             | 192,273
• Data collection: 2 most popular (public) Pro-Maidan and Anti-Maidan groups
• Period: February 18 – May 25, 2014
• Used VK Public API
• Communities – information about groups and group members
• Wall – posts and comments
• Likes – “likes” that members and visitors leave on posts
• Friends – group members’ friendship relations
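The four data types listed above map onto public VK API methods (groups.getMembers, wall.get, likes.getList, friends.get). A minimal request-building sketch, with a placeholder group id and API version; a real collection run would also need an access token and pagination over offsets:

```python
from urllib.parse import urlencode

API_BASE = "https://api.vk.com/method/"

def build_request(method: str, **params) -> str:
    """Build a VK API request URL for a given method and query parameters."""
    return API_BASE + method + "?" + urlencode(sorted(params.items()))

# The group id (12345) and API version ("5.131") are placeholders.
# groups.getMembers -> group members; wall.get -> posts and comments;
# likes.getList -> likes on posts; friends.get -> friendship ties.
members_url = build_request("groups.getMembers", group_id=12345,
                            offset=0, count=1000, v="5.131")
wall_url = build_request("wall.get", owner_id=-12345, count=100, v="5.131")
friends_url = build_request("friends.get", user_id=1, v="5.131")
```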
24. What can we learn from structures of friendship networks?
Pro-EuroMaidan group vs. Anti-EuroMaidan group
25. Example: VK Group – Pro-EuroMaidan, Subgroup 3: Marketing / Spam
Spam accounts make up 5% of all group members, but 15% of members with friendship ties in the group: the % of spammers among participants with friends is higher than among all group members.
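The comparison above is a conditional proportion: the spam share within a subset (members with friends) versus the whole group. A toy sketch with made-up member records, not the study's data, that reproduces the 5% vs. 15% pattern:

```python
def spam_share(members):
    """Fraction of accounts flagged as spam in a list of member records."""
    return sum(m["is_spam"] for m in members) / len(members)

# Illustrative data (not the study's): 100 members, 5 flagged as spam;
# 20 members have friendship ties inside the group, 3 of them spam.
members = ([{"is_spam": True,  "has_friends": True}]  * 3 +
           [{"is_spam": True,  "has_friends": False}] * 2 +
           [{"is_spam": False, "has_friends": True}]  * 17 +
           [{"is_spam": False, "has_friends": False}] * 78)

overall = spam_share(members)                                        # 0.05
with_friends = spam_share([m for m in members if m["has_friends"]])  # 0.15
```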
26. Reported & Estimated % of Bot Accounts
(figure repeated from slide 17: 1.5B users × 2%, 300M × 5%, 400M × 8% fake)
… but is that everything?
30. Detecting Bots…
• Photo: color & edge histograms; Color & Edge Directivity Descriptor (CEDD); image similarity
• Message: sensitive words; URLs; duplicates; #hashtags; @replies
• Poster: username; creation date; engagement level
• Social Network: # friends; # following; in/out-degree centrality; clustering
(Yardi et al., '09; Grier et al., '10; Wang, '10; Jin et al., '11)
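The feature groups above can be operationalized as a simple per-account feature extractor. A hypothetical sketch covering the message, poster, and social-network groups (the field names and feature definitions are illustrative choices, not the exact features used in the cited papers):

```python
def extract_features(account, posts):
    """Simple poster, message, and social-network features for one account."""
    texts = [p["text"] for p in posts]
    n = max(len(texts), 1)
    return {
        # Poster: digits in the username often signal auto-generated accounts
        "username_digits": sum(c.isdigit() for c in account["username"]),
        # Message: share of posts with URLs, duplicates, hashtag density
        "url_share": sum("http" in t for t in texts) / n,
        "duplicate_share": 1 - len(set(texts)) / n,
        "hashtags_per_post": sum(t.count("#") for t in texts) / n,
        # Social network: following/friends imbalance
        "follow_ratio": account["following"] / max(account["friends"], 1),
    }

# Illustrative (made-up) account and posts:
account = {"username": "best_deals_4242", "friends": 12, "following": 5000}
posts = [{"text": "WIN prizes! http://spam.example #free"},
         {"text": "WIN prizes! http://spam.example #free"},
         {"text": "Cheap loans http://spam.example #loans #free"}]
features = extract_features(account, posts)
```

A classifier (or a simple threshold rule) would then combine such features to flag likely bot accounts.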
31. Detecting Bots…
Fakers like to post on Fridays!
Frequency of Twitter posts by weekday: fake accounts vs. real accounts (Gurajala et al., 2015)
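Findings like the Friday spike come from comparing per-weekday posting frequencies across account groups; a minimal sketch with made-up dates:

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def weekday_distribution(dates):
    """Fraction of posts per weekday, from ISO date strings."""
    days = Counter(DAY_NAMES[date.fromisoformat(d).weekday()] for d in dates)
    return {day: n / len(dates) for day, n in days.items()}

# Made-up posting dates: three of the four fall on a Friday.
dist = weekday_distribution(["2016-07-01", "2016-07-08",
                             "2016-07-15", "2016-07-04"])
# dist["Friday"] -> 0.75
```

Computing this distribution separately for suspected-fake and known-real accounts, and comparing the two, is the kind of analysis behind the figure above.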
32. Detecting Bots… (bot-detection feature taxonomy repeated from slide 30)
33. Detecting Bots… (bot-detection feature taxonomy repeated from slide 30)
34. Detecting Bots…
Fake accounts tend to be created later in the week.
Frequency of creation days for Twitter accounts: fakers vs. real accounts (Gurajala et al., 2015)
35. Detecting Bots… (bot-detection feature taxonomy repeated from slide 30)
36. Detecting Bots… (bot-detection feature taxonomy repeated from slide 30)
41. Calling on computational researchers to
• Develop and share principles, protocols, tools and
techniques around handling and cleaning social media
data.
• Develop stronger partnerships with social science
researchers to start discussing how to handle bot-like
accounts properly.
• …because the nature of bots and their influence on users' online
behavior is not just a computational issue but also a social science one.
42. Questions to consider
Remove or Keep?
Scenario 1: Marketing-related bots that do not interact with anyone
else in the study group but are just there to grow their follower base
Scenario 2: Automated Twitter accounts designed to repost certain news stories
43. Questions to consider
Remove or Keep?
Scenario 1: Marketing-related bots that do not interact with anyone
else in the study group but are just there to grow their follower base
Scenario 2: Automated Twitter accounts designed to repost Trump's tweets
44. 2017 #SMSociety Theme: Social Media for Social Good or Evil
https://socialmediaandsociety.org
45. Who are We Studying:
Bots or Humans?
Anatoliy Gruzd
gruzd@ryerson.ca
@gruzd
Canada Research Chair in Social Media Data Stewardship
Associate Professor, Ted Rogers School of Management
Director, Social Media Lab
Ryerson University
46. References
• Agichtein, E., Castillo, C., Donato, D., Gionis, A., & Mishne, G. (2008). Finding High-
quality Content in Social Media. In Proceedings of the 2008 International Conference
on Web Search and Data Mining (pp. 183–194). New York, NY, USA: ACM.
• Gruzd, A., & Roy, J. (2014). Investigating Political Polarization on Twitter: A Canadian
Perspective. Policy & Internet, 6(1), 28–45. http://doi.org/10.1002/1944-2866.POI354
• Gruzd, A., & Tsyganova, K. (2015). Information Wars and Online Activism During the
2013/2014 Crisis in Ukraine: Examining the Social Structures of Pro- and Anti-Maidan
Groups. Policy & Internet, 7(2), 121–158. http://doi.org/10.1002/poi3.91
• Grier, C., Thomas, K., Paxson, V., & Zhang, M. (2010). @spam: The underground on
140 characters or less. In Proceedings of the 17th ACM Conference on Computer and
Communications Security (CCS) (pp. 27–37). ACM.
• Wang, A. H. (2010). Don’t follow me: Spam detection in Twitter. In Proceedings of the
2010 International Conference on Security and Cryptography (SECRYPT) (pp. 1–10). IEEE.
• Yardi, S., Romero, D., Schoenebeck, G., & Boyd, D. (2009). Detecting spam in a Twitter
network. First Monday, 15(1).