Five steps to search and store tweets by keywords
1. Five Steps to Search and Store Tweets by Keywords
• Created by The Curiosity Bits Blog (curiositybits.com)
• With support from Dr. Gregory D. Saxton (http://social-metrics.org/)
2. The output you will get…
Let’s say I want to study Twitter discussions of the missing Malaysian airliner
MH370. I plan to gather all tweets that include the keywords MH370 or
Malaysian.
You will get an ample amount of metadata for each tweet. Here is a breakdown of each metadata field:
name Def.
tweet_id The unique identifier of a tweet
inserted_date When the tweet was downloaded into your database
language The language of the tweet
retweeted_status Is the tweet a retweet?
content The text of the tweet
from_user_screen_name The screen name of the tweet sender
3. name Def.
from_user_followers_count The number of followers the sender has
from_user_friends_count The number of users the sender is following
from_user_listed_count The number of lists the sender appears on
from_user_statuses_count The number of tweets sent by the sender
from_user_description The profile bio of the sender
from_user_location The location of the sender
from_user_created_at When the sender's Twitter account was created
retweet_count How many times the tweet has been retweeted
entities_urls The URLs included in the tweet
entities_urls_count The number of URLs included in the tweet
entities_hashtags The hashtags included in the tweet
entities_hashtags_count The number of hashtags in the tweet
entities_mentions The screen names mentioned in the tweet
4. name Def.
in_reply_to_screen_name The screen name of the user the sender is replying to
in_reply_to_status_id The unique identifier of the tweet being replied to
entities_expanded_urls Complete URLs extracted from shortened URLs
json_output The ENTIRE metadata in JSON format, including metadata not parsed into columns
entities_media_count NA
media_expanded_url NA
media_url NA
media_type NA
video_link NA
photo_link NA
twitpic NA
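The columns above are parsed out of each tweet's raw JSON. As a rough illustration of that mapping, here is a sketch using a made-up tweet dictionary whose field names follow the Twitter REST API v1.1 (the `row` column names come from the tables above; the sample values are invented):

```python
# A made-up tweet object with the v1.1 field layout (not real data).
tweet = {
    "id_str": "449912345678901248",
    "text": "Any news on #MH370?",
    "lang": "en",
    "retweet_count": 3,
    "in_reply_to_screen_name": None,
    "user": {
        "screen_name": "example_user",
        "followers_count": 120,
        "friends_count": 80,
        "listed_count": 2,
        "statuses_count": 540,
        "description": "Aviation watcher",
        "location": "Kuala Lumpur",
        "created_at": "Sat Mar 08 12:00:00 +0000 2014",
    },
    "entities": {
        "hashtags": [{"text": "MH370"}],
        "urls": [],
        "user_mentions": [],
    },
}

# Map the raw JSON onto the column names used in the tables above.
row = {
    "tweet_id": tweet["id_str"],
    "content": tweet["text"],
    "language": tweet["lang"],
    "retweet_count": tweet["retweet_count"],
    "in_reply_to_screen_name": tweet["in_reply_to_screen_name"],
    "from_user_screen_name": tweet["user"]["screen_name"],
    "from_user_followers_count": tweet["user"]["followers_count"],
    "entities_hashtags": [h["text"] for h in tweet["entities"]["hashtags"]],
    "entities_hashtags_count": len(tweet["entities"]["hashtags"]),
}
print(row["entities_hashtags"])  # ['MH370']
```

The full raw dictionary is what ends up in the json_output column, so nothing is lost even for fields that are not parsed into their own columns.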
5. Step 1: Checklist
• Do you know how to install the necessary Python libraries? If not, please review pg. 8 in http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/
• Do you know how to browse and edit an SQLite database through SQLite Database Browser? If not, please review pg. 10-14 in http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/
Download the code: https://drive.google.com/file/d/0Bwwg6GLCW_IPdm1mcHNXeU85Nkk/edit?usp=sharing
7. Step 1: Checklist
Most importantly, we need to install a Twitter-mining library called Twython (https://twython.readthedocs.org/en/latest/index.html), e.g. via pip install twython.
8. Step 2: enter the search terms
You can enter multiple search terms, separated by commas. Note that the last search term also ends with a comma.
You can enter non-English search terms, but make sure the Python script starts with the following block of code:
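The block referenced above is not reproduced in this deck; presumably it is the source-encoding declaration that lets a Python 2 script contain non-ASCII string literals. A minimal sketch of Step 2 under that assumption (the variable name search_terms is mine, not necessarily the one used in the downloadable script):

```python
# -*- coding: utf-8 -*-
# Hypothetical Step 2 setup: a list of search terms. Note the comma
# after the last term, as the slide instructs.
search_terms = [
    'MH370',
    'Malaysian',
    u'马航',  # a non-English term; works thanks to the utf-8 declaration
]
print(search_terms)
```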
9. Step 3: enter your API keys
API Key
API secret
Access token
Access token secret
Enter the key inside the quotation marks
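A sketch of what Step 3 looks like in code, assuming the Twython library named on slide 7. The four placeholder strings stand in for your own credentials from dev.twitter.com; they are not real keys.

```python
# Hypothetical Step 3: paste each credential inside the quotation marks.
API_KEY = 'your-API-key'
API_SECRET = 'your-API-secret'
ACCESS_TOKEN = 'your-access-token'
ACCESS_TOKEN_SECRET = 'your-access-token-secret'

try:
    from twython import Twython  # installed in Step 1
    twitter = Twython(API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
except ImportError:
    twitter = None  # Twython not installed yet; revisit Step 1
```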
10. Step 3: enter your API keys
• Set up your API keys - 1
First, go to https://dev.twitter.com/ and sign in with your Twitter account. Then go to the My Applications page to create an application.
11. Step 3: enter your API keys
• Set up your API keys - 2
Enter any name that makes sense to you.
Enter any text that makes sense to you.
You can enter any legitimate URL; here I put in the URL of my institution.
Same as above: you can enter any legitimate URL; here I put in the URL of my institution.
12. Step 4: change the parameter
result_type is a parameter defined in the Twitter API documentation. Here we set it to recent; we can also set it to mixed or popular.
13. Step 4: change the parameter
Here is a list of parameters you can tweak or add:
https://dev.twitter.com/docs/api/1.1/get/search/tweets
For example, if you want to limit the search to Chinese, you can add lang = 'zh'.
14. Step 4: change the parameter
For another example, if you want to limit the search to tweets sent up until April 1, 2014, you can add until = '2014-04-01'.
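The two examples above can be collected into a single set of search parameters. This is an illustrative sketch (parameter names follow the Twitter v1.1 GET search/tweets documentation linked on slide 13; the actual call needs the authenticated client from Step 3):

```python
# Hypothetical Step 4 parameters, gathered into one dict.
search_params = {
    'q': 'MH370 OR Malaysian',
    'result_type': 'recent',   # or 'mixed' / 'popular'
    'lang': 'zh',              # restrict results to Chinese
    'until': '2014-04-01',     # only tweets up to this date
}
# With a Twython client named `twitter`, the call would be:
#   results = twitter.search(**search_params)
print(search_params)
```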
15. Step 5: set up SQLite database
• When you type in just a file name, the database will be saved in the same folder as the Python script. You can also use a full file path such as sqlite:///C:/xxxx/xxx/MH370.sqlite.
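The tutorial's sqlite:/// URL is an SQLAlchemy-style connection string whose path part is just the database file. As a minimal sketch of Step 5 using only Python's built-in sqlite3 module (the table schema here is a reduced, hypothetical version of the columns listed on slides 2-4):

```python
import sqlite3

# Just a file name, so the database lands next to the script;
# a full path like C:/xxxx/xxx/MH370.sqlite would also work.
conn = sqlite3.connect('MH370.sqlite')
conn.execute("""
    CREATE TABLE IF NOT EXISTS tweets (
        tweet_id TEXT PRIMARY KEY,
        content TEXT,
        from_user_screen_name TEXT,
        retweet_count INTEGER
    )
""")
# Insert one sample row; the PRIMARY KEY keeps re-downloaded
# tweets from being stored twice.
conn.execute(
    "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
    ("449912345678901248", "Any news on #MH370?", "example_user", 3),
)
conn.commit()
rows = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()
print(rows[0])
conn.close()
```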
17. Are we getting all the tweets?
If you run the script daily or twice a day, that should be enough to cover all tweets generated on that day, plus tweets a few days old.
But historical tweets are EXPENSIVE! Tweets older than a week can be purchased through http://gnip.com/.