Presentation at the Workshop on "Small Data and Big Data Controversies and Alternatives: Perspectives from The Sage Handbook of Social Media Research Methods" with Anabel Quan-Haase, Luke Sloan, Diane Rasmussen Pennington, et al.
LINK: http://sched.co/7G5N
1. Who are We Studying:
Bots or Humans?
Anatoliy Gruzd
gruzd@ryerson.ca
@gruzd
Canada Research Chair in Social Media Data Stewardship
Associate Professor, Ted Rogers School of Management
Director, Social Media Lab
Ryerson University
2. Research at the Social Media Lab
• Social Media Analytics
• Social Media Data Stewardship
• Networked Influence
• Online Political Engagement
• Learning Analytics
• Social Media & Health
3. Outline
• Social Media Analytics
• The Rise of the Bots
• Case Study: Social Media Use during the 2014 EuroMaidan Revolution in Ukraine
• Detecting Bots
• Next steps
7. Data -> Visualizations -> Understanding
How to Make Sense of Social Media Data?
Anatoliy Gruzd (Twitter: @gruzd)
8. How to Make Sense of Social Media Data?
Example: Geo-based Analysis
9. How to Make Sense of Social Media Data?
Example: Geo-based + Content Analysis
Tracking Hate Speech on Twitter
Source: http://www.fenuxe.com/tag/geo-coded
10. The Rise of Social Bots
• Who are we studying:
Humans or Bots?
Social Media Data Analytics Challenges
11. Social Bot: software designed to act on the Internet with some level of autonomy
12. Different Types of Bots
• Free music, games, books, downloads
• Jewelry, electronics, vehicles
• Contests, gambling, prizes
• Finance, loans, realty
• Increasing Twitter following
• Diet
• Adult content
(Grier et al., 2010)
17. Platform-reported & Estimated % of Bots
Three major platforms (figure; platform names not recoverable from the slide):
• 1.5B users, 2% fake ≈ 30,000,000 fake accounts
• 300M users, 5% fake ≈ 15,000,000 fake accounts
• 400M users, 8% fake ≈ 32,000,000 fake accounts
Source: http://blogs.wsj.com/digits/2015/06/30/fake-accounts-still-plague-instagram-despite-purge-study-finds/
… but is that everything?
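The estimates above are simple products of reported user counts and estimated fake-account rates; a minimal sketch of that arithmetic (the slide does not name which platform each figure belongs to, so the rows are labeled only by their numbers):

```python
# Estimated fake accounts = reported user base x estimated fake-account rate.
# Pairings follow the slide's arithmetic (e.g. 1.5B x 2% = 30M); which
# platform each row refers to is not stated on the slide.
platforms = [
    {"users": 1_500_000_000, "fake_rate": 0.02},
    {"users": 300_000_000, "fake_rate": 0.05},
    {"users": 400_000_000, "fake_rate": 0.08},
]

for p in platforms:
    p["est_fake"] = round(p["users"] * p["fake_rate"])

# est_fake per row: 30,000,000 / 15,000,000 / 32,000,000
```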
18. Why does it matter if there are bots,
spammers and fakers in our datasets?
Popular topics mentioned in the
14,500 abstracts of journal &
conference papers on “social media”
or “social networking websites”
published since 1999
(Gruzd, 2015)
19. Why does it matter if there are bots,
spammers and fakers in our datasets?
How many of these 14,500
papers took into account the
presence and influence of bots,
spammers or fakers?
(Gruzd, 2015)
20. Case Study: 2014 EuroMaidan Revolution in Ukraine
"2014-02-21 11-04 Euromaidan in Kiev" by Amakuha. Licensed
under CC BY-SA 3.0 via Wikimedia
November 21, 2013 – the Ukrainian government suspended preparations
for signing the trade & association agreement with the EU
Gruzd, A., & Tsyganova, K. (2015). Information Wars and Online Activism During
the 2013/2014 Crisis in Ukraine: Examining the Social Structures of Pro- and Anti-
Maidan Groups. Policy & Internet, 7(2), 121–158. http://doi.org/10.1002/poi3.91
22. Example:
VK Group User Interface – Posts, Likes, Comments…
…Discussion board, Links & Media Files
23. Data Collection
Group               | PRO1 (Pro-Maidan) | PRO2 (Pro-Maidan) | ANTI1 (Anti-Maidan) | ANTI2 (Anti-Maidan)
Num. of Nodes       | 141,542           | 96,402            | 60,506              | 69,029
Num. of Connections | 338,344           | 221,452           | 280,678             | 192,273
• Data collection: 2 most popular (public) Pro-Maidan and Anti-Maidan groups
• Period: February 18 – May 25, 2014
• Used VK Public API
• Communities – information about groups and group members
• Wall – posts and comments
• Likes – “likes” that members and visitors leave on posts
• Friends – group members’ friendship relations
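The four data types listed above map onto public VK API methods (groups.getMembers, wall.get, likes.getList, friends.get). A minimal request-building sketch, with a placeholder group id and API version; a real collection run would also need an access token and pagination over offsets:

```python
from urllib.parse import urlencode

API_BASE = "https://api.vk.com/method/"

def build_request(method: str, **params) -> str:
    """Build a VK API request URL for a given method and query parameters."""
    return API_BASE + method + "?" + urlencode(sorted(params.items()))

# The group id (12345) and API version ("5.131") are placeholders.
# groups.getMembers -> group members; wall.get -> posts and comments;
# likes.getList -> likes on posts; friends.get -> friendship ties.
members_url = build_request("groups.getMembers", group_id=12345,
                            offset=0, count=1000, v="5.131")
wall_url = build_request("wall.get", owner_id=-12345, count=100, v="5.131")
friends_url = build_request("friends.get", user_id=1, v="5.131")
```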
24. What can we learn from structures of friendship networks?
Pro-EuroMaidan group vs. Anti-EuroMaidan group
25. Example: VK Group – Pro-EuroMaidan, Subgroup 3: Marketing / Spam
Spam accounts make up 5% of all group members, but 15% of members with friendship ties in the group: the % of spammers among participants with friends is higher than among all group members.
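The comparison above is a conditional proportion: the spam share within a subset (members with friends) versus the whole group. A toy sketch with made-up member records, not the study's data, that reproduces the 5% vs. 15% pattern:

```python
def spam_share(members):
    """Fraction of accounts flagged as spam in a list of member records."""
    return sum(m["is_spam"] for m in members) / len(members)

# Illustrative data (not the study's): 100 members, 5 flagged as spam;
# 20 members have friendship ties inside the group, 3 of them spam.
members = ([{"is_spam": True,  "has_friends": True}]  * 3 +
           [{"is_spam": True,  "has_friends": False}] * 2 +
           [{"is_spam": False, "has_friends": True}]  * 17 +
           [{"is_spam": False, "has_friends": False}] * 78)

overall = spam_share(members)                                        # 0.05
with_friends = spam_share([m for m in members if m["has_friends"]])  # 0.15
```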
26. Reported & Estimated % of Bot Accounts
(figure repeated from slide 17: 1.5B users × 2%, 300M × 5%, 400M × 8% fake)
… but is that everything?
30. Detecting Bots…
• Photo: color & edge histograms; Color & Edge Directivity Descriptor (CEDD); image similarity
• Message: sensitive words; URLs; duplicates; #hashtags; @replies
• Poster: username; creation date; engagement level
• Social Network: # friends; # following; in/out-degree centrality; clustering
(Yardi et al., '09; Grier et al., '10; Wang, '10; Jin et al., '11)
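The feature groups above can be operationalized as a simple per-account feature extractor. A hypothetical sketch covering the message, poster, and social-network groups (the field names and feature definitions are illustrative choices, not the exact features used in the cited papers):

```python
def extract_features(account, posts):
    """Simple poster, message, and social-network features for one account."""
    texts = [p["text"] for p in posts]
    n = max(len(texts), 1)
    return {
        # Poster: digits in the username often signal auto-generated accounts
        "username_digits": sum(c.isdigit() for c in account["username"]),
        # Message: share of posts with URLs, duplicates, hashtag density
        "url_share": sum("http" in t for t in texts) / n,
        "duplicate_share": 1 - len(set(texts)) / n,
        "hashtags_per_post": sum(t.count("#") for t in texts) / n,
        # Social network: following/friends imbalance
        "follow_ratio": account["following"] / max(account["friends"], 1),
    }

# Illustrative (made-up) account and posts:
account = {"username": "best_deals_4242", "friends": 12, "following": 5000}
posts = [{"text": "WIN prizes! http://spam.example #free"},
         {"text": "WIN prizes! http://spam.example #free"},
         {"text": "Cheap loans http://spam.example #loans #free"}]
features = extract_features(account, posts)
```

A classifier (or a simple threshold rule) would then combine such features to flag likely bot accounts.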
31. Detecting Bots…
Fakers like to post on Fridays!
Frequency of Twitter posts by weekday: fake accounts vs. real accounts (Gurajala et al., 2015)
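Findings like the Friday spike come from comparing per-weekday posting frequencies across account groups; a minimal sketch with made-up dates:

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def weekday_distribution(dates):
    """Fraction of posts per weekday, from ISO date strings."""
    days = Counter(DAY_NAMES[date.fromisoformat(d).weekday()] for d in dates)
    return {day: n / len(dates) for day, n in days.items()}

# Made-up posting dates: three of the four fall on a Friday.
dist = weekday_distribution(["2016-07-01", "2016-07-08",
                             "2016-07-15", "2016-07-04"])
# dist["Friday"] -> 0.75
```

Computing this distribution separately for suspected-fake and known-real accounts, and comparing the two, is the kind of analysis behind the figure above.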
32. Detecting Bots… (bot-detection feature taxonomy repeated from slide 30)
33. Detecting Bots… (bot-detection feature taxonomy repeated from slide 30)
34. Detecting Bots…
Fake accounts tend to be created later in the week.
Frequency of creation days for Twitter accounts: fakers vs. real accounts (Gurajala et al., 2015)
35. Detecting Bots… (bot-detection feature taxonomy repeated from slide 30)
36. Detecting Bots… (bot-detection feature taxonomy repeated from slide 30)
41. Calling on computational researchers to
• Develop and share principles, protocols, tools and
techniques around handling and cleaning social media
data.
• Develop stronger partnerships with social science
researchers to start discussing how to handle bot-like
accounts properly.
• …because the nature of bots and their influence on users' online
behavior is not just a computational issue but also a social science one.
42. Questions to consider
Remove or Keep?
Scenario 1: Marketing-related bots that do not interact with anyone
else in the study group but are just there to grow their follower base
Scenario 2: Automated Twitter accounts designed to repost certain news stories
43. Questions to consider
Remove or Keep?
Scenario 1: Marketing-related bots that do not interact with anyone
else in the study group but are just there to grow their follower base
Scenario 2: Automated Twitter accounts designed to repost Trump's tweets
44. 2017 #SMSociety Theme: Social Media for Social Good or Evil
https://socialmediaandsociety.org
45. Who are We Studying:
Bots or Humans?
Anatoliy Gruzd
gruzd@ryerson.ca
@gruzd
Canada Research Chair in Social Media Data Stewardship
Associate Professor, Ted Rogers School of Management
Director, Social Media Lab
Ryerson University
46. References
• Agichtein, E., Castillo, C., Donato, D., Gionis, A., & Mishne, G. (2008). Finding High-
quality Content in Social Media. In Proceedings of the 2008 International Conference
on Web Search and Data Mining (pp. 183–194). New York, NY, USA: ACM.
• Gruzd, A., & Roy, J. (2014). Investigating Political Polarization on Twitter: A Canadian
Perspective. Policy & Internet, 6(1), 28–45. http://doi.org/10.1002/1944-2866.POI354
• Gruzd, A., & Tsyganova, K. (2015). Information Wars and Online Activism During the
2013/2014 Crisis in Ukraine: Examining the Social Structures of Pro- and Anti-Maidan
Groups. Policy & Internet, 7(2), 121–158. http://doi.org/10.1002/poi3.91
• Grier, C., Thomas, K., Paxson, V., & Zhang, M. (2010). @spam: The underground on
140 characters or less. In Proceedings of the 17th ACM Conference on Computer and
Communications Security (CCS) (pp. 27–37). ACM.
• Wang, A. H. (2010). Don’t follow me: Spam detection in Twitter. In Proceedings of the
2010 International Conference on Security and Cryptography (SECRYPT) (pp. 1–10). IEEE.
• Yardi, S., Romero, D., Schoenebeck, G., & Boyd, D. (2009). Detecting spam in a Twitter
network. First Monday, 15(1).