Five steps to search and store tweets by keywords
1. Five Steps to Search and Store Tweets by Keywords
• Created by The Curiosity Bits Blog (curiositybits.com)
• With support from Dr. Gregory D. Saxton (http://social-metrics.org/)
2. The output you will get…
Let’s say I want to study Twitter discussions of the missing Malaysian airliner
MH370. I plan to gather all tweets that include the keywords MH370 or
Malaysian.
You will get an ample amount of metadata for each tweet. Here is a breakdown of each metadata field:
name Def.
tweet_id The unique identifier of a tweet
inserted_date When the tweet was downloaded into your database
language The language of the tweet
retweeted_status Is the tweet a retweet?
content The text of the tweet
from_user_screen_name The screen name of the tweet sender
3. name Def.
from_user_followers_count The number of followers the sender has
from_user_friends_count The number of users the sender is following
from_user_listed_count The number of lists the sender appears on
from_user_statuses_count The number of tweets sent by the sender
from_user_description The profile bio of the sender
from_user_location The location of the sender
from_user_created_at When the sender's Twitter account was created
retweet_count How many times the tweet has been retweeted
entities_urls The URLs included in the tweet
entities_urls_count The number of URLs included in the tweet
entities_hashtags The hashtags included in the tweet
entities_hashtags_count The number of hashtags in the tweet
entities_mentions The screen names mentioned in the tweet
4. name Def.
in_reply_to_screen_name The screen name of the user the sender is replying to
in_reply_to_status_id The unique identifier of the tweet being replied to
entities_expanded_urls Complete URLs extracted from shortened URLs
json_output The ENTIRE metadata in JSON format, including metadata not parsed into columns
entities_media_count NA
media_expanded_url NA
media_url NA
media_type NA
video_link NA
photo_link NA
twitpic NA
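The columns above are parsed out of each tweet's raw JSON. As a rough illustration of that mapping, here is a sketch using a made-up tweet dictionary whose field names follow the Twitter REST API v1.1 (the `row` column names come from the tables above; the sample values are invented):

```python
# A made-up tweet object with the v1.1 field layout (not real data).
tweet = {
    "id_str": "449912345678901248",
    "text": "Any news on #MH370?",
    "lang": "en",
    "retweet_count": 3,
    "in_reply_to_screen_name": None,
    "user": {
        "screen_name": "example_user",
        "followers_count": 120,
        "friends_count": 80,
        "listed_count": 2,
        "statuses_count": 540,
        "description": "Aviation watcher",
        "location": "Kuala Lumpur",
        "created_at": "Sat Mar 08 12:00:00 +0000 2014",
    },
    "entities": {
        "hashtags": [{"text": "MH370"}],
        "urls": [],
        "user_mentions": [],
    },
}

# Map the raw JSON onto the column names used in the tables above.
row = {
    "tweet_id": tweet["id_str"],
    "content": tweet["text"],
    "language": tweet["lang"],
    "retweet_count": tweet["retweet_count"],
    "in_reply_to_screen_name": tweet["in_reply_to_screen_name"],
    "from_user_screen_name": tweet["user"]["screen_name"],
    "from_user_followers_count": tweet["user"]["followers_count"],
    "entities_hashtags": [h["text"] for h in tweet["entities"]["hashtags"]],
    "entities_hashtags_count": len(tweet["entities"]["hashtags"]),
}
print(row["entities_hashtags"])  # ['MH370']
```

The full raw dictionary is what ends up in the json_output column, so nothing is lost even for fields that are not parsed into their own columns.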
5. Step 1: Checklist
• Do you know how to install the necessary Python libraries? If not, please review pg. 8 in http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/
• Do you know how to browse and edit an SQLite database through SQLite Database Browser? If not, please review pg. 10-14 in http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/
Download the code: https://drive.google.com/file/d/0Bwwg6GLCW_IPdm1mcHNXeU85Nkk/edit?usp=sharing
7. Step 1: Checklist
Most importantly, we need to install a Twitter-mining library called Twython (https://twython.readthedocs.org/en/latest/index.html), e.g. via pip install twython.
8. Step 2: enter the search terms
You can enter multiple search terms, separated by commas. Note that the last search term also ends with a comma.
You can enter non-English search terms, but make sure the Python script starts with the following block of code:
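The block referenced above is not reproduced in this deck; presumably it is the source-encoding declaration that lets a Python 2 script contain non-ASCII string literals. A minimal sketch of Step 2 under that assumption (the variable name search_terms is mine, not necessarily the one used in the downloadable script):

```python
# -*- coding: utf-8 -*-
# Hypothetical Step 2 setup: a list of search terms. Note the comma
# after the last term, as the slide instructs.
search_terms = [
    'MH370',
    'Malaysian',
    u'马航',  # a non-English term; works thanks to the utf-8 declaration
]
print(search_terms)
```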
9. Step 3: enter your API keys
API Key
API secret
Access token
Access token secret
Enter the key inside the quotation marks
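A sketch of what Step 3 looks like in code, assuming the Twython library named on slide 7. The four placeholder strings stand in for your own credentials from dev.twitter.com; they are not real keys.

```python
# Hypothetical Step 3: paste each credential inside the quotation marks.
API_KEY = 'your-API-key'
API_SECRET = 'your-API-secret'
ACCESS_TOKEN = 'your-access-token'
ACCESS_TOKEN_SECRET = 'your-access-token-secret'

try:
    from twython import Twython  # installed in Step 1
    twitter = Twython(API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
except ImportError:
    twitter = None  # Twython not installed yet; revisit Step 1
```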
10. Step 3: enter your API keys
• Set up your API keys - 1
First, go to https://dev.twitter.com/ and sign in with your Twitter account. Then go to the My Applications page to create an application.
11. Step 3: enter your API keys
• Set up your API keys - 2
Enter any name that makes sense to you.
Enter any text that makes sense to you.
You can enter any legitimate URL; here I put in the URL of my institution.
Same as above: you can enter any legitimate URL; here I put in the URL of my institution.
12. Step 4: change the parameter
result_type is a parameter defined in the Twitter API documentation. Here we set it to recent; we can also set it to mixed or popular.
13. Step 4: change the parameter
Here is a list of parameters you can tweak or add:
https://dev.twitter.com/docs/api/1.1/get/search/tweets
For example, if you want to limit the search to Chinese, you can add lang = 'zh'.
14. Step 4: change the parameter
For another example, if you want to limit the search to tweets sent up until April 1, 2014, you can add until = '2014-04-01'.
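The two examples above can be collected into a single set of search parameters. This is an illustrative sketch (parameter names follow the Twitter v1.1 GET search/tweets documentation linked on slide 13; the actual call needs the authenticated client from Step 3):

```python
# Hypothetical Step 4 parameters, gathered into one dict.
search_params = {
    'q': 'MH370 OR Malaysian',
    'result_type': 'recent',   # or 'mixed' / 'popular'
    'lang': 'zh',              # restrict results to Chinese
    'until': '2014-04-01',     # only tweets up to this date
}
# With a Twython client named `twitter`, the call would be:
#   results = twitter.search(**search_params)
print(search_params)
```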
15. Step 5: set up SQLite database
• When you type in just a file name, the database will be saved in the same folder as the Python script. You can also use a full file path such as sqlite:///C:/xxxx/xxx/MH370.sqlite.
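The tutorial's sqlite:/// URL is an SQLAlchemy-style connection string whose path part is just the database file. As a minimal sketch of Step 5 using only Python's built-in sqlite3 module (the table schema here is a reduced, hypothetical version of the columns listed on slides 2-4):

```python
import sqlite3

# Just a file name, so the database lands next to the script;
# a full path like C:/xxxx/xxx/MH370.sqlite would also work.
conn = sqlite3.connect('MH370.sqlite')
conn.execute("""
    CREATE TABLE IF NOT EXISTS tweets (
        tweet_id TEXT PRIMARY KEY,
        content TEXT,
        from_user_screen_name TEXT,
        retweet_count INTEGER
    )
""")
# Insert one sample row; the PRIMARY KEY keeps re-downloaded
# tweets from being stored twice.
conn.execute(
    "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
    ("449912345678901248", "Any news on #MH370?", "example_user", 3),
)
conn.commit()
rows = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()
print(rows[0])
conn.close()
```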
17. Are we getting all the tweets?
If you run the script daily or twice a day, that should be enough to cover all tweets generated on that day, plus tweets a few days old.
But historical tweets are EXPENSIVE! Tweets older than a week can be purchased through http://gnip.com/.