1. ASIST Webinar 12/2013
Conducting Twitter Research
Kim Holmberg, PhD
Statistical Cybermetrics Research Group
University of Wolverhampton, UK
(e) kim.holmberg@abo.fi
(w3) http://kimholmberg.fi
2. Cascades, Islands, or Streams?
Time, Topic, and Scholarly Activities in
Humanities and Social Science Research
Indiana University, Bloomington, USA
University of Wolverhampton, UK
Université de Montréal, Canada
3. Cascades, Islands, or Streams?
Integrate several datasets representing a
broad range of scholarly activities
Use methodological and data triangulation
to explore the lifecycle of topics within and
across a range of scholarly activities
Develop transparent tools and techniques
to enable future predictive analyses
5. DATA COLLECTION
Webometric Analyst, for data
collection via Twitter’s API, data
cleaning and analysis
http://lexiurl.wlv.ac.uk/
For detailed instructions visit
http://lexiurl.wlv.ac.uk/searcher/twitter.htm
6. DATA COLLECTION
Other data collection tools
Twitter Archiving Google Spreadsheet (TAGS)
http://mashe.hawksey.info/2013/02/twitter-archive-tagsv5/
HootSuite
http://hootsuite.com/
Or you can write your own script:
https://dev.twitter.com/
http://140dev.com/free-twitter-api-source-code-library/twitter-database-server/
8. DATA EXTRACTION
Use Webometric Analyst to sort the data and
depending on your research goals, to extract
URLs, hashtags or usernames or to remove
stopwords from the tweets
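Outside Webometric Analyst, the same extraction can be approximated in a few lines of Python. This is a minimal sketch, not how Webometric Analyst works internally: it assumes each tweet is a plain string, and the regular expressions and the tiny STOPWORDS set are illustrative assumptions that only approximate Twitter's own rules for URLs, hashtags and @usernames.

```python
import re
from collections import Counter

# Approximate patterns for Twitter entities (assumptions, not Twitter's exact rules).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")

# Tiny illustrative stopword list (hypothetical); use a full list in practice.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "rt"}


def extract_entities(tweet):
    """Return the URLs plus the lowercased hashtags and usernames in one tweet."""
    return {
        "urls": URL_RE.findall(tweet),
        "hashtags": [tag.lower() for tag in HASHTAG_RE.findall(tweet)],
        "usernames": [name.lower() for name in MENTION_RE.findall(tweet)],
    }


def remove_stopwords(tweet):
    """Drop URLs, @usernames and stopwords; return the remaining lowercase words."""
    text = MENTION_RE.sub(" ", URL_RE.sub(" ", tweet))
    words = re.findall(r"#?\w+", text.lower())
    return [word for word in words if word not in STOPWORDS]


tweets = [
    "RT @kholmber: Slides on #Twitter research http://kimholmberg.fi #ASIST",
    "Reading the new #IPCC report on climate change",
]
# Frequency of each hashtag across the whole collection.
hashtag_counts = Counter(tag for t in tweets for tag in extract_entities(t)["hashtags"])
print(hashtag_counts.most_common(3))
```

Counting the extracted hashtags with Counter, as in the last lines, is the simplest way to find the most used hashtags in a collection.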
9. ETHICS
Data collected from social media sites is openly available on the web;
hence it is already fully public and does not raise any ethical concerns
(Wilkinson & Thelwall, 2011). However, in some cases the content of the
tweets, blog entries or comments collected may contain identifiable,
sensitive information. Although such information is already public,
publicizing it by discussing it in an academic article could potentially have
unwanted side effects. Hence, researchers should consider anonymising all data
and treating it confidentially.
Wilkinson, D. & Thelwall, M. (2011). Researching personal information
on the public Web: Methods and ethics. Social Science Computer Review,
29(4), 387-401.
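One way to act on this advice is to replace usernames with stable pseudonyms before analysis. A minimal sketch, assuming each tweet is an (author, text) pair and that the researcher keeps a secret salt; the function names and the user_ prefix are hypothetical. This is pseudonymisation rather than full anonymisation: verbatim tweet text can still be searched for on Twitter, so quoted tweets may still identify their authors.

```python
import hashlib
import re

MENTION_RE = re.compile(r"@(\w+)")


def pseudonym(username, salt):
    """Map a username to a stable code like 'user_1a2b3c4d' (hypothetical scheme).

    The same username always gets the same code, so network structure is kept,
    but the code cannot easily be reversed without the secret salt.
    Eight hex characters keep codes short; use more for very large datasets
    to make collisions between different users unlikely."""
    digest = hashlib.sha256((salt + username.lower()).encode("utf-8")).hexdigest()
    return "user_" + digest[:8]


def anonymise_tweet(author, text, salt):
    """Replace the author and every @mention in the text with pseudonyms."""
    masked = MENTION_RE.sub(lambda m: "@" + pseudonym(m.group(1), salt), text)
    return pseudonym(author, salt), masked


print(anonymise_tweet("Alice", "Thanks @Bob and @carol!", salt="keep-this-secret"))
```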
10. What can we research?
1. Networks (users, words, topics, …)
2. Content (tweets, RTs, hashtags, …)
11. FIRST STEPS
Step 1. What do you want to research?
Step 2. Collect tweets that are relevant for your research
questions
Step 3. Sort and clean the tweets (e.g. separate tweets from
retweets, remove tweets in other languages,
remove spam, remove false positives, ...)
Step 4. Extract the data that you need (e.g. tweeters,
usernames mentioned, hashtags, URLs, ...)
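Parts of Step 3 can be automated. A minimal sketch, assuming each collected tweet is a dict with "id", "text" and "lang" keys (Twitter's JSON tweet objects carry a machine-detected "lang" code). Treating a leading "RT @" as a retweet and dropping tweets that contain researcher-supplied spam_terms are simplifying assumptions; false positives usually still need manual checking.

```python
def is_retweet(tweet):
    """Heuristic: a tweet whose text starts with 'RT @' is treated as a retweet."""
    return tweet["text"].lstrip().upper().startswith("RT @")


def clean_tweets(tweets, language="en", spam_terms=()):
    """Return (originals, retweets) after removing duplicates, tweets in other
    languages and tweets containing any of the researcher's spam_terms."""
    spam = [term.lower() for term in spam_terms]
    originals, retweets = [], []
    seen_ids = set()
    for tweet in tweets:
        if tweet["id"] in seen_ids:
            continue  # same tweet collected twice, e.g. by overlapping searches
        seen_ids.add(tweet["id"])
        if tweet.get("lang") != language:
            continue  # tweet in another language
        if any(term in tweet["text"].lower() for term in spam):
            continue  # spam or false positive, according to the given terms
        (retweets if is_retweet(tweet) else originals).append(tweet)
    return originals, retweets


tweets = [
    {"id": 1, "lang": "en", "text": "New IPCC report out today"},
    {"id": 2, "lang": "en", "text": "RT @ipcc_ch: New IPCC report out today"},
    {"id": 1, "lang": "en", "text": "New IPCC report out today"},
    {"id": 3, "lang": "fi", "text": "IPCC:n raportti julkaistiin"},
    {"id": 4, "lang": "en", "text": "Buy followers cheap! IPCC"},
]
originals, retweets = clean_tweets(tweets, spam_terms=["buy followers"])
print(len(originals), "original tweets,", len(retweets), "retweets")
```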
12. 1 NETWORK ANALYSIS
Possible research questions:
How are different communities related to A connected to each other?
Who is most central/influential (has the most
connections) in a certain network of tweeters?
How is information disseminated in the network?
Who are the actors involved in a certain network?
What kind of local communities are there in a
certain network and what do those communities
represent?
and many more...
14. CREATE THE NETWORK
ALTERNATIVE 1
This creates a network file (.net) based
on the connections between tweeters
and those they mention (@username) in
their tweets.
Detailed instructions on how to create
and analyze conversational networks on
Twitter are available at:
http://lexiurl.wlv.ac.uk/searcher/twitterConversationNetworks.html
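The connections behind such a conversational network can be sketched in Python. This is not Webometric Analyst's code; it assumes tweets are available as (author, text) pairs and counts how often each tweeter mentions each @username, which gives a weighted, directed list of links. The sample data is hypothetical.

```python
import re
from collections import Counter

MENTION_RE = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Count directed links from each tweeter to every @username they mention.

    `tweets` is an iterable of (author, text) pairs. Names are lowercased so
    that @Alice and @alice are the same account; self-mentions are skipped."""
    edges = Counter()
    for author, text in tweets:
        source = author.lower()
        for mentioned in MENTION_RE.findall(text):
            target = mentioned.lower()
            if target != source:
                edges[(source, target)] += 1
    return edges


sample = [
    ("alice", "@bob @carol see you at #ASIST"),
    ("Alice", "ping @Bob"),
    ("carol", "@alice thanks!"),
]
for (source, target), weight in mention_edges(sample).items():
    print(source, target, weight)
```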
15. CREATE THE NETWORK
ALTERNATIVE 2
Sort the data
Then convert the data
into a network file
Source      Target
Username1   Username2
Username1   Username3
Username2   Username3
Username3   Username1
Username3   Username2
Username3   Username4
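Converting a sorted source/target list like this one into a Pajek network file (.net, which Pajek and Gephi can open) could look like the sketch below. It assumes the data is a list of (source, target) pairs; a pair that occurs several times becomes a single arc whose weight is its count.

```python
from collections import Counter


def to_pajek(edges):
    """Build the text of a Pajek .net file from (source, target) pairs.

    Vertices are numbered from 1 in order of first appearance; repeated
    pairs are merged into one arc weighted by how often they occur."""
    index = {}
    for source, target in edges:
        for name in (source, target):
            if name not in index:
                index[name] = len(index) + 1
    lines = [f"*Vertices {len(index)}"]
    lines += [f'{number} "{name}"' for name, number in index.items()]
    lines.append("*Arcs")  # *Arcs = directed links; *Edges would be undirected
    for (source, target), weight in Counter(edges).items():
        lines.append(f"{index[source]} {index[target]} {weight}")
    return "\n".join(lines) + "\n"


# The sorted source/target data from the table above.
edges = [
    ("Username1", "Username2"), ("Username1", "Username3"),
    ("Username2", "Username3"), ("Username3", "Username1"),
    ("Username3", "Username2"), ("Username3", "Username4"),
]
print(to_pajek(edges))
```

For the six pairs in the table, the output lists four vertices followed by six arcs, e.g. "1 2 1" for Username1 mentioning Username2 once. Saving the returned string to a file with a .net extension gives a file that can be loaded into Pajek or Gephi.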
16. OBJECTS OF ANALYSIS
1. An actor’s (person, group, organisation, word, etc.)
position in the network
2. Structure of the network (in relation to other
networks) or subnetworks (clusters)
17. AN ACTOR’S POSITION
Degree centrality
Used to locate actors with
influence in the network or
those that are in a position
where they can spread
information in the network.
Can be divided into in- and
outdegree.
How many other actors can
this actor reach directly?
Other often-used centrality
measures: closeness,
betweenness, eigenvector
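Degree centrality can be computed directly from the same kind of source/target list. A minimal sketch for a directed mention network, assuming (source, target) pairs and counting each distinct pair once: out-degree is the number of actors an actor links to directly, in-degree the number of actors linking to it. Closeness, betweenness and eigenvector centrality are more involved and are available in tools such as Ucinet and Pajek.

```python
from collections import Counter


def degree_centrality(edges):
    """Return {actor: (in-degree, out-degree, total degree)} for a directed
    network given as (source, target) pairs; repeated pairs count once."""
    links = set(edges)
    out_degree = Counter(source for source, _ in links)
    in_degree = Counter(target for _, target in links)
    actors = set(out_degree) | set(in_degree)
    return {a: (in_degree[a], out_degree[a], in_degree[a] + out_degree[a]) for a in actors}


edges = [
    ("Username1", "Username2"), ("Username1", "Username3"),
    ("Username2", "Username3"), ("Username3", "Username1"),
    ("Username3", "Username2"), ("Username3", "Username4"),
]
centrality = degree_centrality(edges)
# List actors from most to least connected.
for actor in sorted(centrality, key=lambda a: centrality[a][2], reverse=True):
    print(actor, centrality[actor])
```

For the example Source/Target data, Username3 has in-degree 2, out-degree 3 and the highest total degree (5).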
18. NETWORK STRUCTURE
Communities in the
network
Reveal the structure of the
network: how the different
actors are distributed and
connected to each other
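Community detection in tools such as Gephi is commonly based on modularity, which measures how much denser the links inside communities are than would be expected by chance. The sketch below only scores a given partition of an undirected, unweighted network using Newman's modularity Q; it does not search for the best partition, which is what algorithms such as the Louvain method do. The example network is hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman's modularity Q of a partition of an undirected, unweighted graph.

    `edges` lists each link once as a (u, v) pair; `community` maps every
    node to a community label. Q = sum over communities c of
    L_c / m - (d_c / (2m))**2, where m is the number of links, L_c the
    number of links inside c and d_c the summed degree of c's nodes."""
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles (a-b-c and d-e-f) joined by the single link c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}
print(modularity(edges, two_groups), modularity(edges, one_group))
```

Splitting this network into its two triangles gives Q = 5/14 (about 0.36), while putting every node into one community gives Q = 0, so the two-community split describes the structure better.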
19. NETWORK ANALYSIS
- tools of the trade
Gephi (for network visualizations)
http://gephi.org/
Ucinet (for network analysis and visualization)
https://sites.google.com/site/ucinetsoftware/
Pajek (for network analysis and visualization)
http://pajek.imfm.si/doku.php
21. Analyzing astrophysicists’ conversational
connections on Twitter
Holmberg, Haustein, Bowman & Peters (work in progress)
[Figure: Percentage of people with different roles in the 7 communities. Stacked bars for Mod0 (n=68), Mod1 (n=88), Mod2 (n=40), Mod3 (n=180), Mod4 (n=30), Mod5 (n=109) and Mod6 (n=3); roles: Other astrophysicists, Other researchers, Researcher, Students, Teacher or educator, Science communicator, Amateur astronomer, Organization or association, Corporative, Other, Unknown.]
22. Climate change on Twitter: topics, communities
and conversations about the IPCC
Pearce, Holmberg, Hellsten & Nerlich (under review).
Three groups coded
according to their stance
on climate change:
• Convinced
• Skeptic
• Neutral
23. 1 NETWORK ANALYSIS
Summary
Step 4. Extract the data that you need (e.g. tweeters and the
usernames they mentioned, following or follower
lists, ...)
Step 5. Convert your data into a network file
Step 6. Visualize and analyse the network
In addition, you may want to calculate social network analysis
measures for the network (e.g. centrality) or code the actors
into suitable categories (e.g. work roles, opinion about
something, etc.)
24. 2 CONTENT ANALYSIS
Possible research questions:
How is topic A discussed on Twitter?
How do certain activities on Twitter correlate with
offline activities?
How popular is A compared with B, based on
visibility on Twitter?
What is the public opinion (of tweeters) about A?
What are tweeters saying about A?
and many more...
28. CONTENT ANALYSIS
- manual coding
Positive-Neutral-Negative
Scientific-Not scientific-Not clear
Skeptic-Convinced-Neutral
Personal-Work related
Astrophysics-Biochemistry-Cheminformatics ...
Pro something-Against something
and many more depending on your research goals...
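Once the tweets have been coded manually, the codes are usually summarised as percentages per group, for example per discipline or per community. A minimal sketch, assuming the coding is stored as (group, code) pairs; the groups and codes in the example are hypothetical.

```python
from collections import Counter, defaultdict


def code_percentages(coded):
    """Turn manually coded tweets into percentage shares of each code per group.

    `coded` is a list of (group, code) pairs, one per coded tweet."""
    counts = defaultdict(Counter)
    for group, code in coded:
        counts[group][code] += 1
    return {
        group: {code: 100 * n / sum(c.values()) for code, n in c.items()}
        for group, c in counts.items()
    }


coded = [
    ("Astrophysics", "Scientific"),
    ("Astrophysics", "Not scientific"),
    ("Astrophysics", "Scientific"),
    ("Astrophysics", "Not clear"),
    ("Economics", "Not scientific"),
]
for group, shares in code_percentages(coded).items():
    print(group, shares)
```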
29. Holmberg, K. & Thelwall, M. (2013). Disciplinary differences in Twitter
scholarly communication. In Proceedings of the 14th International Society
for Scientometrics and Informetrics Conference, Vienna, Austria.
Available at: http://issi2013.org/proceedings.html.
[Figure: Scientific content of the tweets by communication type. Percentage of tweets (0-40%) for Astrophysics, Biochemistry, Digital humanities, Economics and History of science, split into Retweets, Conversations, Links and Other.]
30. CONTENT ANALYSIS
- tools of the trade
VOSviewer (to extract noun phrases from tweets)
http://www.vosviewer.com/
BibExcel (for co-word analysis)
http://www8.umu.se/inforsk/Bibexcel/
Notepad++ (to search and replace in your data)
http://notepad-plus-plus.org/
Screaming Frog SEO Spider (to decode short URLs)
http://www.screamingfrog.co.uk/seo-spider/
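The counting behind a co-word analysis, as done with BibExcel, can be sketched in plain Python; this is not BibExcel's implementation. It assumes each tweet has already been reduced to a list of terms (hashtags, words or noun phrases) and counts in how many tweets each pair of terms appears together. The pair counts can then serve as weighted links of a co-word network. The example terms are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrences(documents):
    """Count in how many tweets each unordered pair of terms co-occurs.

    `documents` is a list of term lists, one per tweet. Duplicate terms in a
    tweet are ignored, so each pair is counted at most once per tweet."""
    pairs = Counter()
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs


documents = [["ipcc", "climate", "ar5"], ["climate", "ipcc"], ["climate", "ipcc", "climate"]]
for (a, b), count in co_occurrences(documents).most_common():
    print(a, b, count)
```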
31. Noun phrases from one of the communities
Analyzing astrophysicists’ conversational connections on Twitter
Holmberg, Haustein, Bowman & Peters (work in progress)
32. TIME SERIES
- tools of the trade
Mozdeh (Persian for "good news")
Visit http://mozdeh.wlv.ac.uk/index.html
for free download and instructions
33. TIME SERIES
Pearce, Holmberg, Hellsten & Nerlich (under review). Climate change on
Twitter: topics, communities and conversations about the IPCC.
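Mozdeh produces such time series itself. For data collected with other tools, a daily count of tweets mentioning a word or hashtag can be sketched as below. It assumes (timestamp, text) pairs with Python datetime timestamps and uses case-insensitive substring matching, so "ipcc" also matches "#IPCC"; the sample tweets are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def daily_counts(tweets, term):
    """Return [(day, count)] of tweets containing `term`, case-insensitively.

    Days between the first and last match with no matching tweets are
    included with a count of 0, so the series has no gaps."""
    term = term.lower()
    counts = Counter(ts.date() for ts, text in tweets if term in text.lower())
    if not counts:
        return []
    first, last = min(counts), max(counts)
    return [(first + timedelta(days=i), counts[first + timedelta(days=i)])
            for i in range((last - first).days + 1)]


tweets = [
    (datetime(2013, 9, 27, 9, 0), "#IPCC report out"),
    (datetime(2013, 9, 27, 15, 30), "reading the ipcc summary"),
    (datetime(2013, 9, 28, 12, 0), "nothing relevant here"),
    (datetime(2013, 9, 29, 8, 0), "IPCC again"),
]
for day, count in daily_counts(tweets, "ipcc"):
    print(day.isoformat(), count)
```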
35. Pope Francis - Jorge Mario Bergoglio
Was mentioned in 9 tweets...
36. ONLINE/OFFLINE
CORRELATIONS
Comparison of Twitter and publication activity and impact
• publications and tweets per day: ρ=−0.339*
• citation rate and tweets per day: ρ=−0.457**
Haustein, Bowman, Holmberg, Larivière & Peters (under review). Astrophysicists on
Twitter: An in-depth analysis of tweeting and scientific publication behavior.
37. ONLINE/OFFLINE
CORRELATIONS
Overall similarity between abstracts and tweets is low
• cosine=0.081
• 4.1% of 50,854 tweet NPs in abstracts
• 16.0% of 12,970 abstract NPs in tweets
Haustein, Bowman, Holmberg, Larivière & Peters (under review). Astrophysicists on
Twitter: An in-depth analysis of tweeting and scientific publication behavior.
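Both kinds of measure on this slide are easy to compute once noun phrases (NPs) have been extracted. A sketch assuming that the cosine is taken over raw NP frequencies and that the percentages count distinct NPs; the original study may have weighted or counted them differently, and the example terms are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(terms_a, terms_b):
    """Cosine similarity between two bags of terms, using raw term frequencies."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def share_found_in(source, other):
    """Percentage of the distinct terms in `source` that also occur in `other`."""
    source_set, other_set = set(source), set(other)
    return 100 * len(source_set & other_set) / len(source_set) if source_set else 0.0


tweet_nps = ["dark matter", "new paper", "coffee", "dark matter"]
abstract_nps = ["dark matter", "galaxy cluster", "weak lensing"]
print(round(cosine(tweet_nps, abstract_nps), 3))
print(round(share_found_in(tweet_nps, abstract_nps), 1), "% of tweet NPs in abstracts")
print(round(share_found_in(abstract_nps, tweet_nps), 1), "% of abstract NPs in tweets")
```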
38. 2 CONTENT ANALYSIS
Summary
Step 4. Extract the data that you need (e.g. hashtags,
usernames, original tweets, ...)
And then, depending on your research goals:
Step 5A. Analyze frequencies (e.g. most used hashtags, etc.)
Step 5B. Classify the tweets manually
Step 5C. Extract the noun phrases and create a co-mention
network of them with VOSviewer
Step 5D. Analyze time series of certain word/hashtag
occurrences
Step 5E. Run sentiment analysis on the tweets
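Step 5E is normally done with dedicated, validated sentiment analysis tools. Purely to illustrate the idea of lexicon-based sentiment analysis, here is a toy sketch that sorts tweets into the positive/neutral/negative classes also used in manual coding; the LEXICON scores and NEGATORS list are hypothetical and far too small for real use.

```python
import re

# Hypothetical toy lexicon: positive words score above 0, negative below 0.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}
NEGATORS = {"not", "no", "never"}


def sentiment(tweet):
    """Sum the word scores, flipping the sign of a scored word that directly
    follows a negator, and return 'positive', 'negative' or 'neutral'."""
    words = re.findall(r"[a-z']+", tweet.lower())
    score = 0
    for i, word in enumerate(words):
        value = LEXICON.get(word, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["I love this new report", "This is not good", "The report was published today"]:
    print(sentiment(text), "|", text)
```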
41. Thank you for your attention
Kim Holmberg
Statistical Cybermetrics Research Group
University of Wolverhampton, UK
kim.holmberg@abo.fi
http://kimholmberg.fi
@kholmber
Acknowledgements
This presentation is based upon work supported by the international funding initiative Digging into Data. Specifically, funding comes
from the National Science Foundation in the United States (Grant No. 1208804), JISC in the United Kingdom, and the Social Sciences and
Humanities Research Council of Canada.