Collecting Twitter data
           Dr. Cornelius Puschmann
   School of Library and Information Science
       Humboldt-University of Berlin /
   Humboldt Institute for Internet and Society
                 16 April 2013
            Royal Statistical Society
Overview
1. Examples of research using Twitter data
2. Twitter's data infrastructure
3. Tools for collecting data
4. Sampling issues
Examples of research using
      Twitter data
•   Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a social
    network or a news media? Proceedings of the 19th International Conference
    on the World Wide Web (WWW ’10) (pp. 591–600). Raleigh, NC.

•   González-Bailón, S., Borge-Holthoefer, J., Rivero, A., & Moreno, Y. (2011). The
    dynamics of protest recruitment through an online network. Scientific
    Reports, 1, 197. doi:10.1038/srep00197

•   Ausserhofer, J., & Maireder, A. (2013). National politics on Twitter: Structures
    and topics of a networked public sphere. Information, Communication &
    Society, 16(3), 291–314. doi:10.1080/1369118X.2012.756050

•   Papacharissi, Z., & De Fatima Oliveira, M. (2012). Affective News and
    Networked Publics: The Rhythms of News Storytelling on #Egypt. Journal of
    Communication, 62(2), 266–282. doi:10.1111/j.1460-2466.2012.01630.x
Example questions
Twitter as a platform
• How can Twitter's structure be described?
Social graph
• Who follows whom?
• How does information spread?
Hashtags, keywords, and geography
• How can the discussion of topic X be characterized?
• Who is participating in discussions on X?
• Where are users discussing X?
Example questions
URLs in Twitter
• How is mass media content discussed?
• How are academic papers cited on Twitter?
Creative approaches
• Where, when, and with what devices do people
  call taxis?

Prediction/application
• Can election results/flu outbreaks/consumption
  patterns be reliably predicted?
#phdchat data set (30k tweets)
visualization of keywords using Gephi
Extracting Twitter data
1. HTTP request: return all data from a given user/hashtag/geolocation/...
2. Application Programming Interface (API)
3. Data (usually in a database or spreadsheet)
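The last step of this chain (API response to spreadsheet) can be sketched in a few lines of Python. This is a minimal illustration, not a collection tool: it makes no HTTP request, and SAMPLE_RESPONSE is an invented, heavily trimmed payload. The field names id_str, created_at, text and user.screen_name follow the tweet objects returned by the API; the function names and column layout are assumptions chosen for this example.

```python
import csv
import io
import json

# Invented, heavily trimmed API response; real payloads carry many more fields.
SAMPLE_RESPONSE = """
[
  {"id_str": "1", "created_at": "Tue Apr 16 10:00:00 +0000 2013",
   "text": "Collecting tweets for #phdchat", "user": {"screen_name": "alice"}},
  {"id_str": "2", "created_at": "Tue Apr 16 10:05:00 +0000 2013",
   "text": "Reading about sampling", "user": {"screen_name": "bob"}}
]
"""

# Column order of the resulting spreadsheet (an assumption for this sketch).
FIELDS = ["id", "created_at", "user", "text"]


def tweets_to_rows(raw_json):
    """Flatten a JSON array of tweet objects into one dict per tweet."""
    rows = []
    for tweet in json.loads(raw_json):
        rows.append({
            "id": tweet["id_str"],
            "created_at": tweet["created_at"],
            "user": tweet["user"]["screen_name"],
            "text": tweet["text"],
        })
    return rows


def rows_to_csv(rows):
    """Serialize the flattened rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(tweets_to_rows(SAMPLE_RESPONSE)))
```

Keeping the ID as a string (id_str) avoids precision loss when the file is later opened in a spreadsheet program.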
[Figure: a tweet as shown in the browser, and the same tweet's source as returned via the API]
Three Twitter APIs
REST API
• traditionally used by most client software
• v1.0 will be phased out in May 2013
• to be replaced by more restrictive v1.1

Streaming API
• public, user, and site streams
• provides data in real time and largely unprocessed as it flows through the platform

Search API
• same functionality as Twitter search
• rate-limited

1) data: tweets, social graph
2) complex tools needed
3) constraints on how much data can be captured
Legal issues: Twitter's terms of service
"By submitting, posting or displaying Content on or through
the Services, you grant us a worldwide, non-exclusive,
royalty-free license (with the right to sublicense) to use,
copy, reproduce, process, adapt, modify, publish, transmit,
display and distribute such Content in any and all media or
distribution methods (now known or later developed)."

                  "You agree that this license includes the right for Twitter to
                  make such Content available to other companies,
                  organizations or individuals who partner with Twitter for
                  the syndication, broadcast, distribution or publication of
                  such Content on other media and services, subject to our
                  terms and conditions for such Content use."

"We encourage and permit broad re-use of
Content. The Twitter API exists to enable this."
Legal issues: API rules
"You will not attempt or encourage others to: sell, rent,
lease, sublicense, redistribute, or syndicate access to the
Twitter API or Twitter Content to any third party without
prior written approval from Twitter. If you provide an API
that returns Twitter data, you may only return IDs (including
tweet IDs and user IDs). You may export or extract non-programmatic,
GUI-driven Twitter Content as a PDF or
spreadsheet by using "save as" or similar functionality.
Exporting Twitter Content to a datastore as a service or
other cloud based service, however, is not permitted."

                  "Except as permitted through the Services (or these Terms),
                  you have to use the Twitter API if you want to reproduce,
                  modify, create derivative works, distribute, sell, transfer,
                  publicly display, publicly perform, transmit, or otherwise use
                  the Content or Services."
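The "only return IDs" clause is often handled by reducing a collected data set to tweet IDs and user IDs before passing it on. The sketch below shows one way such a reduction could look. The sample tweets are invented, and to_id_records and its output keys are hypothetical names for this example; id_str and user.id_str are the ID fields of API tweet objects.

```python
def to_id_records(tweets):
    """Keep only the tweet ID and user ID of each collected tweet."""
    return [
        {"tweet_id": tweet["id_str"], "user_id": tweet["user"]["id_str"]}
        for tweet in tweets
    ]


# Invented sample data standing in for a locally stored collection.
collected = [
    {"id_str": "101", "text": "First tweet",
     "user": {"id_str": "9001", "screen_name": "alice"}},
    {"id_str": "102", "text": "Second tweet",
     "user": {"id_str": "9002", "screen_name": "bob"}},
]

if __name__ == "__main__":
    for record in to_id_records(collected):
        print(record["tweet_id"], record["user_id"])
```

Text, screen names and all other content are dropped, so a recipient would have to retrieve the tweets again through the API.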
Tweet Archivist Desktop
(Windows desktop software)
yourTwapperKeeper
(runs on a dedicated web server)
140kit
(hosted platform for
 academic research)
DataSift/Gnip
(social data resellers)
Sampling approaches
Strategy #1: Sample by hashtag, keyword, user, geographical
location, or other filtering parameters
+ representativeness unclear on multiple levels
- time frame and parameters have to be carefully chosen

Strategy #2: Use the 1% or 10% sample provided by the
Streaming API
+ generally assumed to be representative (of Twitter)
- time frame has to be carefully chosen

Strategy #3: Capture Twitter's entire throughput
+ highly representative (of Twitter)
- technically very difficult/costly
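Why the parameters in Strategy #1 need care can be shown on a toy data set: the same four tweets yield different samples depending on whether one filters by hashtag or by keyword. This is a minimal sketch over invented tweets that are already stored locally; the function names and the simple #\w+ hashtag pattern are assumptions for this example and do not reproduce how Twitter itself matches filter terms.

```python
import re

# Simple pattern for hashtags; an approximation, not Twitter's own tokenizer.
HASHTAG = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the set of lowercased hashtags (without '#') found in a text."""
    return {tag.lower() for tag in HASHTAG.findall(text)}


def sample_by_hashtag(tweets, tag):
    """Keep tweets that carry the given hashtag, ignoring case."""
    wanted = tag.lstrip("#").lower()
    return [t for t in tweets if wanted in hashtags(t["text"])]


def sample_by_keyword(tweets, keyword):
    """Keep tweets whose text contains the keyword anywhere, ignoring case."""
    needle = keyword.lower()
    return [t for t in tweets if needle in t["text"].lower()]


# Invented sample data.
tweets = [
    {"id": 1, "text": "Writing up my thesis #phdchat"},
    {"id": 2, "text": "phdchat is great, no hashtag here"},
    {"id": 3, "text": "#PhDChat starts in an hour"},
    {"id": 4, "text": "Unrelated tweet about #Egypt"},
]

if __name__ == "__main__":
    print([t["id"] for t in sample_by_hashtag(tweets, "#phdchat")])
    print([t["id"] for t in sample_by_keyword(tweets, "phdchat")])
```

The keyword filter also picks up tweet 2, which mentions the term without the hashtag, so the two samples differ in size and composition.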
Summary
1. develop a question/general direction
2. collect data using these or other tools
3. store in a database or spreadsheet (CSV)
4. annotate, analyze and visualize using a variety of tools
   (Excel, Tableau, R, Gephi, NVIVO, ...)
Questions?




http://www.teachthought.com/wp-content/uploads/2012/11/twitter-logo-hashtag.jpg

Más contenido relacionado

La actualidad más candente

The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data FrameworkThe HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data FrameworkRobert H. McDonald
 
Exploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesExploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesLaura Po
 
Hashtag Conversations,Eventgraphs, and User Ego Neighborhoods: Extracting So...
Hashtag Conversations,Eventgraphs, and User Ego Neighborhoods:  Extracting So...Hashtag Conversations,Eventgraphs, and User Ego Neighborhoods:  Extracting So...
Hashtag Conversations,Eventgraphs, and User Ego Neighborhoods: Extracting So...Shalin Hai-Jew
 
#Socialtagging: Defining its role in the academic library
#Socialtagging: Defining its role in the academic library#Socialtagging: Defining its role in the academic library
#Socialtagging: Defining its role in the academic libraryksbertel
 
The Social Semantic Web: An Introduction
The Social Semantic Web: An IntroductionThe Social Semantic Web: An Introduction
The Social Semantic Web: An IntroductionJohn Breslin
 
Introduction to linked data
Introduction to linked dataIntroduction to linked data
Introduction to linked dataLaura Po
 
Searching of Web and Electronic Resources
Searching of Web and Electronic Resources Searching of Web and Electronic Resources
Searching of Web and Electronic Resources Bramesha B
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and TechniquesBernhard Haslhofer
 
What Your Tweets Tell Us About You, Speaker Notes
What Your Tweets Tell Us About You, Speaker NotesWhat Your Tweets Tell Us About You, Speaker Notes
What Your Tweets Tell Us About You, Speaker NotesKrisKasianovitz
 
Social Semantic Web on Facebook Open Graph protocol and Twitter Annotations
Social Semantic Web on Facebook Open Graph protocol and Twitter AnnotationsSocial Semantic Web on Facebook Open Graph protocol and Twitter Annotations
Social Semantic Web on Facebook Open Graph protocol and Twitter AnnotationsMyungjin Lee
 
Altmetrics presentation mla'14 english version
Altmetrics presentation mla'14 english versionAltmetrics presentation mla'14 english version
Altmetrics presentation mla'14 english versionLilian Takahashi Hoffecker
 
On the Value of Temporal Anchor Texts in Wikipedia
On the Value of Temporal Anchor Texts in WikipediaOn the Value of Temporal Anchor Texts in Wikipedia
On the Value of Temporal Anchor Texts in WikipediaNattiya Kanhabua
 
TaaS Workshop 2014, Terminology Trends- First-hand Experience as a Blogger, M...
TaaS Workshop 2014, Terminology Trends- First-hand Experience as a Blogger, M...TaaS Workshop 2014, Terminology Trends- First-hand Experience as a Blogger, M...
TaaS Workshop 2014, Terminology Trends- First-hand Experience as a Blogger, M...TAUS - The Language Data Network
 
Putting Linked Data to Use in a Large Higher-Education Organisation
Putting Linked Data to Use in a Large Higher-Education OrganisationPutting Linked Data to Use in a Large Higher-Education Organisation
Putting Linked Data to Use in a Large Higher-Education OrganisationMathieu d'Aquin
 
Scholarly social media applications platforms for knowledge sharing and net...
Scholarly social media applications   platforms for knowledge sharing and net...Scholarly social media applications   platforms for knowledge sharing and net...
Scholarly social media applications platforms for knowledge sharing and net...tullemich
 
Indexing presentation 2013 06-04
Indexing presentation 2013 06-04Indexing presentation 2013 06-04
Indexing presentation 2013 06-04Louise Spiteri
 

La actualidad más candente (19)

The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data FrameworkThe HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
 
How to share and publish data: resources, law, and policy
How to share and publish data: resources, law, and policyHow to share and publish data: resources, law, and policy
How to share and publish data: resources, law, and policy
 
Exploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesExploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sources
 
Hashtag Conversations,Eventgraphs, and User Ego Neighborhoods: Extracting So...
Hashtag Conversations,Eventgraphs, and User Ego Neighborhoods:  Extracting So...Hashtag Conversations,Eventgraphs, and User Ego Neighborhoods:  Extracting So...
Hashtag Conversations,Eventgraphs, and User Ego Neighborhoods: Extracting So...
 
Ir1
Ir1Ir1
Ir1
 
#Socialtagging: Defining its role in the academic library
#Socialtagging: Defining its role in the academic library#Socialtagging: Defining its role in the academic library
#Socialtagging: Defining its role in the academic library
 
The Social Semantic Web: An Introduction
The Social Semantic Web: An IntroductionThe Social Semantic Web: An Introduction
The Social Semantic Web: An Introduction
 
Introduction to linked data
Introduction to linked dataIntroduction to linked data
Introduction to linked data
 
Searching of Web and Electronic Resources
Searching of Web and Electronic Resources Searching of Web and Electronic Resources
Searching of Web and Electronic Resources
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and Techniques
 
What Your Tweets Tell Us About You, Speaker Notes
What Your Tweets Tell Us About You, Speaker NotesWhat Your Tweets Tell Us About You, Speaker Notes
What Your Tweets Tell Us About You, Speaker Notes
 
Social Semantic Web on Facebook Open Graph protocol and Twitter Annotations
Social Semantic Web on Facebook Open Graph protocol and Twitter AnnotationsSocial Semantic Web on Facebook Open Graph protocol and Twitter Annotations
Social Semantic Web on Facebook Open Graph protocol and Twitter Annotations
 
Altmetrics presentation mla'14 english version
Altmetrics presentation mla'14 english versionAltmetrics presentation mla'14 english version
Altmetrics presentation mla'14 english version
 
Niso library law
Niso library lawNiso library law
Niso library law
 
On the Value of Temporal Anchor Texts in Wikipedia
On the Value of Temporal Anchor Texts in WikipediaOn the Value of Temporal Anchor Texts in Wikipedia
On the Value of Temporal Anchor Texts in Wikipedia
 
TaaS Workshop 2014, Terminology Trends- First-hand Experience as a Blogger, M...
TaaS Workshop 2014, Terminology Trends- First-hand Experience as a Blogger, M...TaaS Workshop 2014, Terminology Trends- First-hand Experience as a Blogger, M...
TaaS Workshop 2014, Terminology Trends- First-hand Experience as a Blogger, M...
 
Putting Linked Data to Use in a Large Higher-Education Organisation
Putting Linked Data to Use in a Large Higher-Education OrganisationPutting Linked Data to Use in a Large Higher-Education Organisation
Putting Linked Data to Use in a Large Higher-Education Organisation
 
Scholarly social media applications platforms for knowledge sharing and net...
Scholarly social media applications   platforms for knowledge sharing and net...Scholarly social media applications   platforms for knowledge sharing and net...
Scholarly social media applications platforms for knowledge sharing and net...
 
Indexing presentation 2013 06-04
Indexing presentation 2013 06-04Indexing presentation 2013 06-04
Indexing presentation 2013 06-04
 

Similar a Collecting Twitter Data

Data Access, Ownership and Control in Social Web Services: Issues for Twitter...
Data Access, Ownership and Control in Social Web Services: Issues for Twitter...Data Access, Ownership and Control in Social Web Services: Issues for Twitter...
Data Access, Ownership and Control in Social Web Services: Issues for Twitter...Cornelius Puschmann
 
Eavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging SiteEavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging SiteShalin Hai-Jew
 
Twitter Terms of Service Explained - Jake White
Twitter Terms of Service Explained - Jake WhiteTwitter Terms of Service Explained - Jake White
Twitter Terms of Service Explained - Jake WhiteJake White
 
The evolution of research on social media
The evolution of research on social mediaThe evolution of research on social media
The evolution of research on social mediaFarida Vis
 
Thou Shalt not Share Collections of Tweets: Should we give a TOS?
Thou Shalt not Share Collections of Tweets: Should we give a TOS?Thou Shalt not Share Collections of Tweets: Should we give a TOS?
Thou Shalt not Share Collections of Tweets: Should we give a TOS?Andrew Long
 
New Methodologies for Capturing and Working with Publicly Available Twitter Data
New Methodologies for Capturing and Working with Publicly Available Twitter DataNew Methodologies for Capturing and Working with Publicly Available Twitter Data
New Methodologies for Capturing and Working with Publicly Available Twitter DataAxel Bruns
 
Insights From Social Media
Insights From Social MediaInsights From Social Media
Insights From Social MediaDr Wasim Ahmed
 
Twitter: Social Network Or News Medium?
Twitter: Social Network Or News Medium?Twitter: Social Network Or News Medium?
Twitter: Social Network Or News Medium?Serge Beckers
 
Twitter: Social Network Or News Medium?
Twitter: Social Network Or News Medium?Twitter: Social Network Or News Medium?
Twitter: Social Network Or News Medium?Serge Beckers
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisFarida Vis
 
Groundhog day: near duplicate detection on twitter
Groundhog day: near duplicate detection on twitterGroundhog day: near duplicate detection on twitter
Groundhog day: near duplicate detection on twitterDan Nguyen
 
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...IRJET Journal
 
Twitter: A Hands On Learning Session for Researchers
Twitter: A Hands On Learning Session for ResearchersTwitter: A Hands On Learning Session for Researchers
Twitter: A Hands On Learning Session for ResearchersKMb Unit, York University
 
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Farida Vis
 
Online data sources and information exposure
Online data sources and information exposureOnline data sources and information exposure
Online data sources and information exposureUniversity of Southampton
 
Fusing text and image for event
Fusing text and image for eventFusing text and image for event
Fusing text and image for eventijma
 
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNING
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNINGSENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNING
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNINGIRJET Journal
 

Similar a Collecting Twitter Data (20)

Data Access, Ownership and Control in Social Web Services: Issues for Twitter...
Data Access, Ownership and Control in Social Web Services: Issues for Twitter...Data Access, Ownership and Control in Social Web Services: Issues for Twitter...
Data Access, Ownership and Control in Social Web Services: Issues for Twitter...
 
Eavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging SiteEavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging Site
 
Twitter Terms of Service Explained - Jake White
Twitter Terms of Service Explained - Jake WhiteTwitter Terms of Service Explained - Jake White
Twitter Terms of Service Explained - Jake White
 
The evolution of research on social media
The evolution of research on social mediaThe evolution of research on social media
The evolution of research on social media
 
Thou Shalt not Share Collections of Tweets: Should we give a TOS?
Thou Shalt not Share Collections of Tweets: Should we give a TOS?Thou Shalt not Share Collections of Tweets: Should we give a TOS?
Thou Shalt not Share Collections of Tweets: Should we give a TOS?
 
New Methodologies for Capturing and Working with Publicly Available Twitter Data
New Methodologies for Capturing and Working with Publicly Available Twitter DataNew Methodologies for Capturing and Working with Publicly Available Twitter Data
New Methodologies for Capturing and Working with Publicly Available Twitter Data
 
Insights From Social Media
Insights From Social MediaInsights From Social Media
Insights From Social Media
 
Chapter7a McHaney
Chapter7a McHaneyChapter7a McHaney
Chapter7a McHaney
 
Twitter: Social Network Or News Medium?
Twitter: Social Network Or News Medium?Twitter: Social Network Or News Medium?
Twitter: Social Network Or News Medium?
 
Twitter: Social Network Or News Medium?
Twitter: Social Network Or News Medium?Twitter: Social Network Or News Medium?
Twitter: Social Network Or News Medium?
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
 
Twet
TwetTwet
Twet
 
Groundhog day: near duplicate detection on twitter
Groundhog day: near duplicate detection on twitterGroundhog day: near duplicate detection on twitter
Groundhog day: near duplicate detection on twitter
 
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
 
Twitter: A Hands On Learning Session for Researchers
Twitter: A Hands On Learning Session for ResearchersTwitter: A Hands On Learning Session for Researchers
Twitter: A Hands On Learning Session for Researchers
 
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
 
Social Networking As A Tool For Learning
Social Networking As A Tool For  LearningSocial Networking As A Tool For  Learning
Social Networking As A Tool For Learning
 
Online data sources and information exposure
Online data sources and information exposureOnline data sources and information exposure
Online data sources and information exposure
 
Fusing text and image for event
Fusing text and image for eventFusing text and image for event
Fusing text and image for event
 
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNING
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNINGSENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNING
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNING
 

Más de Cornelius Puschmann

A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...
A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...
A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...Cornelius Puschmann
 
Digitale Methoden in den Sozial- und Geisteswissenschaften: Chancen und Herau...
Digitale Methoden in den Sozial- und Geisteswissenschaften: Chancen und Herau...Digitale Methoden in den Sozial- und Geisteswissenschaften: Chancen und Herau...
Digitale Methoden in den Sozial- und Geisteswissenschaften: Chancen und Herau...Cornelius Puschmann
 
Twitter as a data source for (socio)linguistic research
Twitter as a data source for (socio)linguistic researchTwitter as a data source for (socio)linguistic research
Twitter as a data source for (socio)linguistic researchCornelius Puschmann
 
Form and Function of Digital Genres of Scholarly Communication: Results of th...
Form and Function of Digital Genres of Scholarly Communication: Results of th...Form and Function of Digital Genres of Scholarly Communication: Results of th...
Form and Function of Digital Genres of Scholarly Communication: Results of th...Cornelius Puschmann
 
Vernetzung, Sichtbarkeit, Information: Nutzungsmotive informeller digitaler K...
Vernetzung, Sichtbarkeit, Information: Nutzungsmotive informeller digitaler K...Vernetzung, Sichtbarkeit, Information: Nutzungsmotive informeller digitaler K...
Vernetzung, Sichtbarkeit, Information: Nutzungsmotive informeller digitaler K...Cornelius Puschmann
 
Knowledge or Credit? The (Un)changing Face of Academic Publishing from the Ph...
Knowledge or Credit? The (Un)changing Face of Academic Publishing from the Ph...Knowledge or Credit? The (Un)changing Face of Academic Publishing from the Ph...
Knowledge or Credit? The (Un)changing Face of Academic Publishing from the Ph...Cornelius Puschmann
 
Wissenschaftliche Blogs: Nutzungsweisen und Nutzer
Wissenschaftliche Blogs: Nutzungsweisen und NutzerWissenschaftliche Blogs: Nutzungsweisen und Nutzer
Wissenschaftliche Blogs: Nutzungsweisen und NutzerCornelius Puschmann
 
Wissenschaftliche Blogs: Schnittstelle zur Öffentlichkeit oder virtueller Elf...
Wissenschaftliche Blogs: Schnittstelle zur Öffentlichkeit oder virtueller Elf...Wissenschaftliche Blogs: Schnittstelle zur Öffentlichkeit oder virtueller Elf...
Wissenschaftliche Blogs: Schnittstelle zur Öffentlichkeit oder virtueller Elf...Cornelius Puschmann
 
Beyond the stars: Interpreting discourse cohesion in Twitter as an indicator ...
Beyond the stars: Interpreting discourse cohesion in Twitter as an indicator ...Beyond the stars: Interpreting discourse cohesion in Twitter as an indicator ...
Beyond the stars: Interpreting discourse cohesion in Twitter as an indicator ...Cornelius Puschmann
 
(Academic) Community Management in the Humanities and Social Sciences for Pub...
(Academic) Community Management in the Humanities and Social Sciences for Pub...(Academic) Community Management in the Humanities and Social Sciences for Pub...
(Academic) Community Management in the Humanities and Social Sciences for Pub...Cornelius Puschmann
 
Doing A Small-Scale Diachronic Twitter User Study
Doing A Small-Scale Diachronic Twitter User StudyDoing A Small-Scale Diachronic Twitter User Study
Doing A Small-Scale Diachronic Twitter User StudyCornelius Puschmann
 
Social data: what it is, who owns it, and why you should care
Social data: what it is, who owns it, and why you should careSocial data: what it is, who owns it, and why you should care
Social data: what it is, who owns it, and why you should careCornelius Puschmann
 
Twitter zwischen Nachrichtenkanal und Mikronarrativ
Twitter zwischen Nachrichtenkanal und MikronarrativTwitter zwischen Nachrichtenkanal und Mikronarrativ
Twitter zwischen Nachrichtenkanal und MikronarrativCornelius Puschmann
 
Studying Twitter conversations as (dynamic) graphs: visualization and structu...
Studying Twitter conversations as (dynamic) graphs: visualization and structu...Studying Twitter conversations as (dynamic) graphs: visualization and structu...
Studying Twitter conversations as (dynamic) graphs: visualization and structu...Cornelius Puschmann
 
Hourly Twitter activity under the #Jan25 hashtag
Hourly Twitter activity under the #Jan25 hashtagHourly Twitter activity under the #Jan25 hashtag
Hourly Twitter activity under the #Jan25 hashtagCornelius Puschmann
 
Elektronisches Publizieren und Open Access für Geistes- und Sozialwissenschaf...
Elektronisches Publizieren und Open Access für Geistes- und Sozialwissenschaf...Elektronisches Publizieren und Open Access für Geistes- und Sozialwissenschaf...
Elektronisches Publizieren und Open Access für Geistes- und Sozialwissenschaf...Cornelius Puschmann
 

Más de Cornelius Puschmann (20)

A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...
A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...
A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...
 
Digitale Methoden in den Sozial- und Geisteswissenschaften: Chancen und Herau...
Digitale Methoden in den Sozial- und Geisteswissenschaften: Chancen und Herau...Digitale Methoden in den Sozial- und Geisteswissenschaften: Chancen und Herau...
Digitale Methoden in den Sozial- und Geisteswissenschaften: Chancen und Herau...
 
Twitter as a data source for (socio)linguistic research
Twitter as a data source for (socio)linguistic researchTwitter as a data source for (socio)linguistic research
Twitter as a data source for (socio)linguistic research
 
Form and Function of Digital Genres of Scholarly Communication: Results of th...
Form and Function of Digital Genres of Scholarly Communication: Results of th...Form and Function of Digital Genres of Scholarly Communication: Results of th...
Form and Function of Digital Genres of Scholarly Communication: Results of th...
 
Vernetzung, Sichtbarkeit, Information: Nutzungsmotive informeller digitaler K...
Vernetzung, Sichtbarkeit, Information: Nutzungsmotive informeller digitaler K...Vernetzung, Sichtbarkeit, Information: Nutzungsmotive informeller digitaler K...
Vernetzung, Sichtbarkeit, Information: Nutzungsmotive informeller digitaler K...
 
The Pragmatics of Retweeting
The Pragmatics of RetweetingThe Pragmatics of Retweeting
The Pragmatics of Retweeting
 
Knowledge or Credit? The (Un)changing Face of Academic Publishing from the Ph...
Knowledge or Credit? The (Un)changing Face of Academic Publishing from the Ph...Knowledge or Credit? The (Un)changing Face of Academic Publishing from the Ph...
Knowledge or Credit? The (Un)changing Face of Academic Publishing from the Ph...
 
Wissenschaftliche Blogs: Nutzungsweisen und Nutzer
Wissenschaftliche Blogs: Nutzungsweisen und NutzerWissenschaftliche Blogs: Nutzungsweisen und Nutzer
Wissenschaftliche Blogs: Nutzungsweisen und Nutzer
 
Was ist ein Wissenschaftsblog?
Was ist ein Wissenschaftsblog?Was ist ein Wissenschaftsblog?
Was ist ein Wissenschaftsblog?
 
Wissenschaftliche Blogs: Schnittstelle zur Öffentlichkeit oder virtueller Elf...
Wissenschaftliche Blogs: Schnittstelle zur Öffentlichkeit oder virtueller Elf...Wissenschaftliche Blogs: Schnittstelle zur Öffentlichkeit oder virtueller Elf...
Wissenschaftliche Blogs: Schnittstelle zur Öffentlichkeit oder virtueller Elf...
 
Beyond the stars: Interpreting discourse cohesion in Twitter as an indicator ...
Beyond the stars: Interpreting discourse cohesion in Twitter as an indicator ...Beyond the stars: Interpreting discourse cohesion in Twitter as an indicator ...
Collecting Twitter Data

  • 1. Collecting Twitter data. Dr. Cornelius Puschmann, School of Library and Information Science, Humboldt-University of Berlin / Humboldt Institute for Internet and Society. 16 April 2013, Royal Statistical Society.
  • 2. Overview
      1. Examples of research using Twitter data
      2. Twitter's data infrastructure
      3. Tools for collecting data
      4. Sampling issues
  • 3. Examples of research using Twitter data
      • Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a social network or a news media? Proceedings of the 19th International Conference on the World Wide Web (WWW ’10) (pp. 591–600). Raleigh, NC.
      • González-Bailón, S., Borge-Holthoefer, J., Rivero, A., & Moreno, Y. (2011). The dynamics of protest recruitment through an online network. Scientific Reports, 1, 197. doi:10.1038/srep00197
      • Ausserhofer, J., & Maireder, A. (2013). National politics on Twitter: Structures and topics of a networked public sphere. Information, Communication & Society, 16(3), 291–314. doi:10.1080/1369118X.2012.756050
      • Papacharissi, Z., & De Fatima Oliveira, M. (2012). Affective News and Networked Publics: The Rhythms of News Storytelling on #Egypt. Journal of Communication, 62(2), 266–282. doi:10.1111/j.1460-2466.2012.01630.x
  • 4. Example questions
      Twitter as a platform
      • How can Twitter's structure be described?
      Social graph
      • Who follows whom?
      • How does information spread?
      Hashtags, keywords, and geography
      • How can the discussion of topic X be characterized?
      • Who is participating in discussions on X?
      • Where are users discussing X?
  • 5. Example questions
      URLs in Twitter
      • How is mass media content discussed?
      • How are academic papers cited on Twitter?
      Creative approaches
      • Where, when, and with what devices do people call taxis?
      Prediction/application
      • Can election results/flu outbreaks/consumption patterns be reliably predicted?
  • 6. #phdchat data set (30k tweets)
  • 8. Extracting Twitter data: HTTP request ("return all data from a given user/hashtag/geolocation/...") → Application Programming Interface (API) → Data (usually in a database or spreadsheet)
  • 10. Three Twitter APIs
      REST API
      • traditionally used by most client software
      • rate-limited
      • v1.0 will be phased out in May 2013
      • to be replaced by more restrictive v1.1
      Streaming API
      • public, user, and site streams
      • provides data in real time and largely unprocessed as it flows through the platform
      Search API
      • same functionality as Twitter search
      Notes
      1) data: tweets, social graph
      2) complex tools needed
      3) constraints on how much data can be captured
  • 11. Legal issues: Twitter's terms of service
      "By submitting, posting or displaying Content on or through the Services, you grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed)."
      "You agree that this license includes the right for Twitter to make such Content available to other companies, organizations or individuals who partner with Twitter for the syndication, broadcast, distribution or publication of such Content on other media and services, subject to our terms and conditions for such Content use."
      "We encourage and permit broad re-use of Content. The Twitter API exists to enable this."
  • 12. Legal issues: API rules
      "You will not attempt or encourage others to: sell, rent, lease, sublicense, redistribute, or syndicate access to the Twitter API or Twitter Content to any third party without prior written approval from Twitter. If you provide an API that returns Twitter data, you may only return IDs (including tweet IDs and user IDs). You may export or extract non-programmatic, GUI-driven Twitter Content as a PDF or spreadsheet by using "save as" or similar functionality. Exporting Twitter Content to a datastore as a service or other cloud based service, however, is not permitted."
      "Except as permitted through the Services (or these Terms), you have to use the Twitter API if you want to reproduce, modify, create derivative works, distribute, sell, transfer, publicly display, publicly perform, transmit, or otherwise use the Content or Services."
  • 13. Tweet Archivist Desktop (Windows desktop software)
  • 14. yourTwapperKeeper (runs on a dedicated web server)
  • 15. 140kit (hosted platform for academic research)
  • 17. Sampling approaches
      Strategy #1: Sample by hashtag, keyword, user, geographical location, or other filtering parameters
      + representativeness unclear
      - time frame and parameters on multiple levels have to be carefully chosen
      Strategy #2: Use the 1% or 10% sample provided by the Streaming API
      + generally assumed to be representative (of Twitter)
      - time frame has to be carefully chosen
      Strategy #3: Capture Twitter's entire throughput
      + highly representative (of Twitter)
      - technically very difficult/costly
  • 18. Summary
      • develop a question/general direction
      • collect data using these or other tools
      • store in a database or spreadsheet (CSV)
      • annotate, analyze and visualize using a variety of tools (Excel, Tableau, R, Gephi, NVIVO, ...)
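
The collect, filter, and store steps in the summary can be sketched in a few lines of Python. This is an illustration only, not the tooling shown in the slides: the sample payload and its field names (`id`, `user`, `text`) are assumptions rather than Twitter's actual response schema, and all function names are hypothetical. The sketch filters by hashtag (sampling strategy #1), writes the result as CSV, and extracts tweet IDs, the only data the quoted API rules allow an API to return.

```python
import csv
import io
import json

# Hypothetical payload standing in for an API response. The field names
# ("id", "user", "text") are assumptions, not Twitter's real schema.
SAMPLE_RESPONSE = """
[
  {"id": 1, "user": "alice", "text": "Writing up my thesis #phdchat"},
  {"id": 2, "user": "bob", "text": "Lunch time"},
  {"id": 3, "user": "carol", "text": "Viva next week! #PhDChat #nervous"}
]
"""


def parse_tweets(raw_json):
    """Turn the JSON text returned by an API call into a list of dicts."""
    return json.loads(raw_json)


def has_hashtag(tweet, hashtag):
    """Case-insensitive check for a hashtag among whitespace-separated tokens."""
    wanted = hashtag.lower()
    tokens = (tok.strip(".,!?:;").lower() for tok in tweet["text"].split())
    return wanted in tokens


def filter_by_hashtag(tweets, hashtag):
    """Sampling strategy #1: keep only tweets that carry the given hashtag."""
    return [t for t in tweets if has_hashtag(t, hashtag)]


def to_csv(tweets, fieldnames=("id", "user", "text")):
    """Serialize tweets as CSV text, ready to save or open in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(tweets)
    return buf.getvalue()


def tweet_ids(tweets):
    """Reduce a collection to tweet IDs only, e.g. for sharing a data set."""
    return [t["id"] for t in tweets]


if __name__ == "__main__":
    tweets = parse_tweets(SAMPLE_RESPONSE)
    sample = filter_by_hashtag(tweets, "#phdchat")
    print(to_csv(sample))
    print(tweet_ids(sample))
```

In a real collection, `SAMPLE_RESPONSE` would be replaced by the body of an HTTP response from whichever API or tool is used, and the CSV text would be written to a file for analysis in Excel, R, or similar.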