#1 Access, control, ownership, and interpretation of data on social media platforms like Twitter are interrelated issues that raise questions about power dynamics.
#2 Market forces, legislation, social norms, and code are dynamic regulatory forces that shape how data is accessed, controlled, owned, and interpreted on social media platforms.
#3 While Twitter's terms of service and APIs regulate different levels of access to Twitter data for various actors like users, researchers, and data resellers, questions remain around how representative data accessed through APIs is and whether all stakeholders have equal access to "their" data.
Access, control and ownership of social media data
1. Cornelius Puschmann, Humboldt-Universität zu Berlin
Jean Burgess, Queensland University of Technology
Axel Bruns, Queensland University of Technology
Merja Mahrt, Heinrich-Heine-Universität Düsseldorf
Data Access, Ownership and Control in Social Web Services:
Issues for Twitter Research
ICA 2012
Track: Communication and Technology
Session: Researching Social Media: Ethical and
Methodological Challenges
26 May 2012, Phoenix
2. “There are also significant questions of truth, control, and
power in Big Data studies: researchers have the tools and the
access, while social media users as a whole do not. Their data
were created in highly context-sensitive spaces, and it is entirely
possible that some users would not give permission for their
data to be used elsewhere.”
(boyd & Crawford, 2012, p.12)
3. #1
Access, control, ownership and
interpretation of data are interrelated facets
that raise questions of power.
#2
Market, legislation, social norms and code
are dynamic regulatory forces in social web
platforms.
4. Access (technology) / Control (ability)
TOS (“law”) defines | Data | API (“code”) enables
Ownership (law) / Interpretation (competence)
5. • founded in 2006 by Jack Dorsey
• 140 million active users
• 340 million tweets per day
• source of real-time information on a breadth of issues
from pop culture to politics
• increasingly used as a data source among researchers
(e.g. on election prediction via Twitter: Tumasjan et al., 2010;
Jungherr et al., 2011; Gayo-Avello, 2012)
6. • Twitter’s (future) business model is based on advertising
• ad revenue of $260 million in 2012
• sources of revenue:
  • promoted accounts
  • promoted tweets
  • promoted trends
7. Twitter Rules: “Don’t do what gets us into trouble”
Terms of Service: “What’s yours is yours (but also ours)”
API Rules: “…but only if you know how to get it”
8. The TOS
“By submitting, posting or displaying Content on or through
the Services, you grant us a worldwide, non-exclusive,
royalty-free license (with the right to sublicense) to use,
copy, reproduce, process, adapt, modify, publish, transmit,
display and distribute such Content in any and all media or
distribution methods (now known or later developed).”
“You agree that this license includes the right for Twitter to
make such Content available to other companies,
organizations or individuals who partner with Twitter for
the syndication, broadcast, distribution or publication of
such Content on other media and services, subject to our
terms and conditions for such Content use.”
“We encourage and permit broad re-use of
Content. The Twitter API exists to enable this.”
9. API Rules
“You will not attempt or encourage others to: sell, rent,
lease, sublicense, redistribute, or syndicate access to the
Twitter API or Twitter Content to any third party without
prior written approval from Twitter. If you provide an API
that returns Twitter data, you may only return IDs (including
tweet IDs and user IDs). You may export or extract non-programmatic,
GUI-driven Twitter Content as a PDF or
spreadsheet by using "save as" or similar functionality.
Exporting Twitter Content to a datastore as a service or
other cloud based service, however, is not permitted.”
“Except as permitted through the Services (or these Terms),
you have to use the Twitter API if you want to reproduce,
modify, create derivative works, distribute, sell, transfer,
publicly display, publicly perform, transmit, or otherwise use
the Content or Services.”
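The ID-only clause quoted on this slide has a practical consequence for researchers who want to share a collected dataset: under one reading of the rule, only identifiers may be passed on, and recipients must re-fetch the content through the API themselves. A minimal sketch of that reduction step follows. The record layout (a dict with an `id_str` field) and the function name `dehydrate` are assumptions for illustration, not part of the quoted rules, and the sample records are invented.

```python
from typing import Iterable, List


def dehydrate(tweets: Iterable[dict]) -> List[str]:
    """Reduce collected tweet records to a de-duplicated list of tweet IDs.

    Assumes each record is a dict carrying its ID under "id_str".
    Text, user details, and all other fields are dropped.
    """
    seen = set()
    ids = []
    for tweet in tweets:
        tweet_id = str(tweet["id_str"])
        if tweet_id not in seen:  # keep first occurrence, preserve order
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids


# Invented example records standing in for a locally stored collection.
collected = [
    {"id_str": "1001", "text": "first example tweet", "user": {"id_str": "7"}},
    {"id_str": "1002", "text": "second example tweet", "user": {"id_str": "8"}},
    {"id_str": "1001", "text": "first example tweet", "user": {"id_str": "7"}},
]

shareable_ids = dehydrate(collected)
print(shareable_ids)
```

Whether a given way of sharing such a list is actually permitted depends on the terms in force at the time; the sketch only shows the data reduction.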
10. The APIs
Search API
• similar to site search functionality
• originally a third-party product
• rate-limited
• use of Streaming API for high velocity queries is recommended
REST API
• allows interaction with Twitter similar to an individual user (“core” data)
• rate-limited
• whitelisting was previously possible, now discontinued
Streaming API
• real-time access to information moving through Twitter
• for developers with “data-intensive needs”
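Because both the Search API and the REST API are rate-limited, a collection script has to pace its own requests. The following is a minimal client-side sketch of a sliding-window limiter. The class name, the limit of 3 calls per 60 seconds, and the example queries are placeholders chosen for illustration; they are not Twitter's actual limits, and no real API request is made.

```python
import time
from collections import deque
from typing import Callable, Deque


class WindowRateLimiter:
    """Allow at most max_calls within any sliding window of window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.calls: Deque[float] = deque()  # timestamps of recent calls

    def wait_time(self) -> float:
        """Return seconds to wait before the next call is allowed (0.0 if none)."""
        now = self.clock()
        # Forget calls that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window_seconds:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        # The oldest remembered call determines when a slot frees up.
        return self.window_seconds - (now - self.calls[0])

    def record_call(self) -> None:
        """Register that a request has just been sent."""
        self.calls.append(self.clock())


# Placeholder limit: 3 calls per 60 seconds (hypothetical values).
limiter = WindowRateLimiter(max_calls=3, window_seconds=60.0)
for query in ["#ica12", "#bigdata", "#twitter", "#api"]:
    delay = limiter.wait_time()
    if delay > 0:
        print(f"limit reached, would wait {delay:.0f}s before querying {query}")
        break
    limiter.record_call()
    print(f"querying {query}")  # stands in for a real API request
```

A real client would also have to respect whatever limit information the server returns; this sketch only covers the local bookkeeping.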
11. Intermediaries of Data
• Twitter doesn’t look to analytics as a source of revenue
• providing data is costly in terms of computing resources
• analytics are left to companies like Gnip and DataSift
• these data resellers have little to gain by catering to the
scientific community or Twitter’s users
12. Actors and Options
Actors (columns): Twitter; data reseller (Gnip, DataSift); large data interpreter (organization); small data interpreter (individual); individual user
Data types (rows): log data; historical data; real-time data (all); real-time data (sample)
13. Conclusions
• the exact sample size and quality of any data from
Twitter is unknown (see e.g. Gnip’s PowerTrack)
• TOS and API regulate access to Twitter data for
different actors (users, researchers) on different
levels (access, control, ownership, interpretation)
• for users, the API is the only point of access to
“their” data apart from the web interface
• the implicit audience for virtually all services built on
Twitter data consists of companies
• both users and scholars lacking access to high-
performance computing infrastructure are likely to
be sidelined by the trend towards Big Twitter Data
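The first conclusion can be made concrete with simple arithmetic. Taking the 340 million tweets per day reported earlier in the deck and a nominal 1% sample, a researcher can compute how many tweets a sample should contain and compare that with what was actually collected. The sketch below is an illustration under those assumptions; the function names and the observed count are hypothetical, and the nominal rate says nothing about how the sample is actually drawn.

```python
def expected_sample_size(total_items: int, sampling_rate: float) -> float:
    """Number of items a sample should contain at the given nominal rate."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    return total_items * sampling_rate


def coverage(observed_items: int, total_items: int) -> float:
    """Share of the total volume that was actually observed."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return observed_items / total_items


TWEETS_PER_DAY = 340_000_000   # figure reported in the deck
NOMINAL_RATE = 0.01            # nominal 1% sample
OBSERVED_IN_ONE_DAY = 2_900_000  # hypothetical count from a collection run

expected = expected_sample_size(TWEETS_PER_DAY, NOMINAL_RATE)
actual_share = coverage(OBSERVED_IN_ONE_DAY, TWEETS_PER_DAY)

print(f"expected tweets at nominal rate: {expected:,.0f}")
print(f"observed share of daily volume: {actual_share:.4%}")
```

A gap between the expected and observed figures does not by itself show which one is wrong, since the total volume is also only known from Twitter's own reporting.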
14. images retrieved from Twitter 1% random sample
Thank you for your attention!
Contact: Cornelius Puschmann
puschmann@ibi.hu-berlin.de / @coffee001