This document discusses tools for extracting data from social media platforms, including Netlytic, Netvizz and Socioviz. It explains how to use each tool to extract data from platforms such as Twitter, Facebook, Instagram and YouTube. It also covers the technical and ethical considerations of data extraction and the file formats the tools output, such as .gdf and .csv. It emphasizes that extracted social media data represent human beings and that extraction depends on platform policies.
Data Extraction Tools
Universidade NOVA de Lisboa | NOVA FCSH
iNOVA Media Lab | @CristianCJRuiz
Cristian Jiménez Ruiz
Digital Media Winter Institute 2019 | SMART Data Sprint: Beyond Visible Engagement | Lisbon, Portugal.
1. SMART Goals.
2. API approach.
a. Technical (and ethical) debate.
3. How to extract and collect data.
a. Entry points.
b. How to use Netlytic.
c. How to use DMI: Netvizz.
d. How to use Socioviz.
4. Tool outputs.
Social Media Methods
Internet as a source of data, method, technique (Rogers, 2013)
Internet-related research
Where to find data?
API Approach
Technical (and Ethical) debate
The data we extract represent human beings!
Strong dependence on platform environment, politics and policies.
Technical (and Ethical) debate
Beyond limitations and debate, some key points that shape your research design (Omena, 2018):
1. Extraction and analysis software
2. What are the entry points used to collect social media grammars?
3. What entry points cannot be captured anymore?
4. What grammars (digital objects) can be part of my study?
5. How far back in time can data be retrieved?
6. What are the standard output files?
How to extract and collect data
Entry Points
Native digital objects (Rogers, 2013):
Hashtags, keywords, IDs, locations, likes, links, etc.
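As an illustration of how such digital objects can serve as entry points, here is a minimal Python sketch that pulls hashtags, mentions and links out of post text. The regular expressions are simplified assumptions (platforms apply their own, stricter rules), and the function names, sample posts and the @example_lab handle are hypothetical.

```python
import re
from collections import Counter

# Simplified patterns, assumed for illustration only.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
LINK = re.compile(r"https?://\S+")


def digital_objects(text):
    """Return the hashtags, mentions and links found in one post."""
    return {
        "hashtags": [h.lower() for h in HASHTAG.findall(text)],
        "mentions": [m.lower() for m in MENTION.findall(text)],
        "links": LINK.findall(text),
    }


def count_hashtags(posts):
    """Count how often each hashtag appears across a list of posts."""
    counts = Counter()
    for post in posts:
        counts.update(digital_objects(post)["hashtags"])
    return counts


# Hypothetical sample posts, not real data.
posts = [
    "Day one at the #SMARTDataSprint with @example_lab https://example.org/a",
    "Extracting data for #smartdatasprint #digitalmethods",
]
print(count_hashtags(posts).most_common())
```

Lowercasing the hashtags and mentions groups spelling variants of the same entry point together.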
Netlytic
1. Create/open your account
2. Be aware of your account type
Netlytic: Twitter
Netlytic: Facebook
Netlytic: Instagram
Netlytic: YouTube
Netlytic: Datasets
Netvizz
2017 2018
Netvizz
Socioviz
1. Create/open your account
2. Connect your Twitter account
Socioviz
This tool retrieves data per keyword, not separately per digital object (hashtags, accounts, mentions, etc.).
Output: .gdf, .tab, .csv, etc.
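To show what a .gdf file holds, here is a minimal Python sketch that splits GDF text into node and edge tables. It assumes the common GDF layout (a `nodedef>` header followed by node rows, then an `edgedef>` header followed by edge rows). The sample content and the `parse_gdf` helper are hypothetical, and real exports may carry more columns or different quoting.

```python
import csv

def parse_gdf(text):
    """Split GDF text into node rows and edge rows (lists of dicts)."""
    nodes, edges = [], []
    columns, target = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("nodedef>") or line.startswith("edgedef>"):
            target = nodes if line.startswith("nodedef>") else edges
            header = line.split(">", 1)[1]
            # Each column is declared as "name TYPE"; keep only the name.
            columns = [col.strip().split(" ")[0] for col in header.split(",")]
            continue
        if target is None:
            continue  # ignore anything before the first section header
        values = next(csv.reader([line]))
        target.append(dict(zip(columns, values)))
    return nodes, edges


# Hypothetical sample, not taken from any real export.
SAMPLE_GDF = """nodedef>name VARCHAR,label VARCHAR
a,Alice
b,Bob
c,Carol
edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE
a,b,2.0
b,c,1.0
"""

nodes, edges = parse_gdf(SAMPLE_GDF)
print(len(nodes), "nodes,", len(edges), "edges")
```

All values are kept as strings; converting numeric columns such as `weight` is left to the analysis step.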
What to do with the output?
Open an analysis software package and import your files!
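Before importing a file into analysis software, a quick first look can help confirm that it contains what you expect. The sketch below counts how many edges touch each account in an edge-list .csv. The column names `source` and `target`, the sample rows and the helper name are assumptions for illustration; check the header of your own export.

```python
import csv
import io
from collections import Counter


def degree_from_edge_csv(handle, source_col="source", target_col="target"):
    """Count, for every node, the number of edge rows it appears in."""
    degree = Counter()
    for row in csv.DictReader(handle):
        degree[row[source_col]] += 1
        degree[row[target_col]] += 1
    return degree


# Hypothetical edge list standing in for an exported file.
sample = io.StringIO("source,target\nana,rui\nana,eva\nrui,eva\nana,rui\n")
degree = degree_from_edge_csv(sample)
print(degree.most_common(3))
```

For a real file, replace the `io.StringIO` sample with `open("your_file.csv", newline="", encoding="utf-8")`.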
References
Rieder, B. (2013). Studying Facebook via data extraction: The Netvizz application. In WebSci '13: Proceedings of the 5th Annual ACM Web Science Conference (pp. 346-355). New York: ACM.
Omena, J. J. (2018). The Grammars of Social Media: Thinking platform data under the modes of technicity. Digital Media Winter Institute 2018, SMART Data Sprint: Interpreters of Platform Data, Jan. 29 - Feb. 2, Universidade Nova de Lisboa, Lisbon, Portugal. https://pt.slideshare.net/jannajoceli/the-grammars-of-social-media-thinking-platform-data-under-the-modes-of-technicity
Gruzd, A. (2016). Netlytic: Software for Automated Text and Social Network Analysis. Available at http://Netlytic.org
Rogers, R. (2013). Digital Methods. MIT Press.