Tweepy is an open source Python package that gives you a convenient way to access the Twitter API with Python. Tweepy includes a set of classes and methods that represent Twitter's models and API endpoints, and it transparently handles various implementation details, such as data encoding and decoding.
1. NADAR SARASWATHI COLLEGE OF ARTS & SCIENCE,THENI.
DEPARTMENT OF CS & IT
PYTHON PROGRAMMING
PRESENTED BY
G.KAVIYA
M.SC(IT)
TOPIC: COLLECTING
INFORMATION FROM
TWITTER.
2. SYNOPSIS:
History of Python
What is Python?
What is Python used for?
Advantages and Disadvantages of Python
Applications of Python
3. History of Python
Guido van Rossum began working on Python in the late 1980s as a
successor to the ABC programming language and first released it in 1991
as Python 0.9.0. Python 2.0 was released in 2000 and introduced new
features such as list comprehensions, cycle-detecting garbage
collection (in addition to reference counting), and Unicode support.
Python 3.0, released in 2008, was a major revision that is not completely
backward-compatible with earlier versions. Python 2 was discontinued
with version 2.7.18 in 2020.
Python consistently ranks as one of the most popular programming
languages.
4. What is Python?
Python is a high-level, interpreted, general-purpose programming
language. Its design philosophy emphasizes code readability with the use
of significant indentation.
Python is dynamically typed and garbage-collected. It supports
multiple programming paradigms, including structured (particularly
procedural), object-oriented and functional programming. It is often
described as a "batteries included" language due to its comprehensive
standard library.
5. What is Python used for?
Python is commonly used for developing websites and software, task
automation, data analysis, and data visualization. Since it’s relatively easy
to learn, Python has been adopted by many non-programmers such as
accountants and scientists, for a variety of everyday tasks, like organizing
finances.
What can you do with Python? Some things include:
Data analysis and machine learning
Web development
Automation or scripting
Software testing and prototyping
Everyday tasks
6. Advantages and Disadvantages of Python
Advantages:
It is easy to learn and use, and it has an extensive library.
It increases productivity.
It is very flexible.
It has a very supportive community.
Disadvantages:
Because Python hides many low-level details, users may find it harder to
move to other programming languages.
Python has a lower execution speed than compiled languages, so programs
can take longer to run.
Because Python is dynamically typed, many errors show up only at runtime.
It is not well suited to memory-intensive programs and mobile applications.
7. Applications of Python:
These are some real-world Python applications:
Web and Internet Development
Desktop GUI Applications
Science and Numeric
Software Development
Education
Database Access
Network Programming
Games and 3D Graphics
Business Application
8. Web and Internet Development:
Python offers many choices for web
development:
Frameworks such as Django and Pyramid.
Micro-frameworks such
as Flask and Bottle.
Advanced content management systems
such as Plone and django CMS.
Python's standard library supports
many Internet protocols:
HTML and XML
JSON
E-mail processing.
Support for FTP, IMAP, and other Internet
protocols.
Easy-to-use socket interface.
And the Package Index has yet more
libraries:
Requests, a powerful HTTP client library.
Beautiful Soup, an HTML parser that can
handle all sorts of oddball HTML.
Feedparser for parsing RSS/Atom feeds.
Paramiko, implementing the SSH2
protocol.
Twisted Python, a framework for
asynchronous network programming.
9. Desktop GUIs:
The Tk GUI library is included with most binary distributions of Python.
Some toolkits that are usable on several platforms are available separately:
wxWidgets
Kivy, for writing multitouch applications.
Qt via pyqt or pyside
Platform-specific toolkits are also available:
GTK+
Microsoft Foundation Classes through the win32 extensions
10. Scientific and Numeric:
Python is widely used in scientific and numeric computing:
SciPy is a collection of packages for mathematics, science, and engineering.
Pandas is a data analysis and modeling library.
IPython is a powerful interactive shell that features easy editing and
recording of a work session, and supports visualizations and parallel
computing.
The Software Carpentry Course teaches basic skills for scientific
computing, running bootcamps and providing open-access teaching
materials.
11. Software Development:
Python is often used as a support language for software developers, for
build control and management, testing, and in many other ways.
SCons for build control.
Buildbot and Apache Gump for automated continuous compilation and
testing.
Roundup or Trac for bug tracking and project management.
12. Education:
Python is a superb language for teaching programming, both at the
introductory level and in more advanced courses.
Books such as How to Think Like a Computer Scientist, Python
Programming: An Introduction to Computer Science, and Practical
Programming.
The Education Special Interest Group is a good place to discuss teaching
issues.
13. Database Access:
This is one of the hottest Python Applications.
With Python, you have:
Custom and ODBC interfaces to MySQL, Oracle, PostgreSQL, MS SQL Server,
and others. These are freely available for download.
Object databases like Durus and ZODB
Standard Database API
14. Network Programming:
With all those possibilities, how would Python slack in network
programming? It does provide support for lower-level network
programming:
Twisted Python: a framework for asynchronous network programming,
mentioned earlier under Web and Internet Development.
An easy-to-use socket interface
15. Games and 3D Graphics:
Safe to say, this one is the most interesting. When people hear someone say
they’re learning Python, the first thing they get asked is — ‘So, did you
make a game yet?’
PyGame and PyKyra are two frameworks for game development with Python.
Apart from these, we also get a variety of 3D-rendering libraries.
If you’re one of those game-developers, you can check out PyWeek, a semi-
annual game programming contest.
16. Business Applications
Python is also used to build ERP and e-commerce systems:
Odoo is an all-in-one management software that offers a range of business
applications that form a complete suite of enterprise management
applications.
Tryton is a three-tier high-level general purpose application platform.
17. Collecting information from Twitter
Introduction:
Twitter is a widely used channel for sharing thoughts, opinions and
experiences. This makes the site a rich source of media and text content
that can be analyzed for insights.
Twitter also offers a feature for filtering tweets about a given subject:
by tracking tweets that contain certain words, you can gather information
about trending topics, people, hashtags or any other theme.
This article describes a way to use this feature from the Python
programming language through the Tweepy library.
18. Tracking
To access Twitter data from code, you need to apply for a Twitter
Developer account and get your own API keys. This process takes a little
time but is required to proceed.
To start coding, create a Python script file and set the variables below
using your keys.
CONSUMER_KEY = 'XXXXXXX'
CONSUMER_SECRET = 'XXXXXXX'
ACCESS_TOKEN = 'XXXXXXX'
ACCESS_TOKEN_SECRET = 'XXXXXXX'
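Keys written directly into a script are easy to leak when the file is shared. As a minimal sketch, the keys could instead be read from environment variables. The variable names and the load_keys helper below are hypothetical, chosen for this example; they are not defined by Twitter or Tweepy.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
KEY_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_keys(environ=os.environ):
    """Return the four credentials in order, or raise if any is missing."""
    missing = [name for name in KEY_NAMES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(environ[name] for name in KEY_NAMES)
```

With the variables exported in the shell, the four constants can then be set in one line: CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET = load_keys()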
19. Tweepy
There are many ways to access the Twitter API with Python. This article
uses the Tweepy library. Install the module with pip:
$ pip install tweepy
Then import the Tweepy module and apply your keys for authentication,
creating a Twitter API object that allows access.
import tweepy
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
20. Streaming
With the Tweepy module you can access and customize the tweet streaming
feature. Because it returns tweets in real time as they are published, it
is useful for collecting a very high volume of tweet data.
Setting tracking behavior
To define what the program does whenever a tweet is published, create a
class that extends StreamListener from Tweepy and override the on_status
method with the desired behavior. (StreamListener belongs to Tweepy 3.x;
Tweepy 4.0 merged it into the Stream class.) Below is an example that just
prints the tweet text.
21. Continue:
class TweetListener(tweepy.StreamListener):
    def on_status(self, tweet):
        print(tweet.text)
Tweepy offers a class called Stream that needs authentication and a
listener in order to be instantiated. So, create a Stream object that
receives the auth attribute from the api variable defined earlier and an
instance of the TweetListener class above.
22. Continue:
listener = TweetListener()
stream = tweepy.Stream(auth=api.auth, listener=listener)
Start stream
Several streaming processes are available through Tweepy. To start
streaming tweets, you can use the filter method of the stream object.
With it, you can track tweets containing any of a list of words, follow
tweets from multiple users and even select the languages to be considered.
23. Continue:
The code below, for example, starts printing tweets written in English that contain words
related to COVID-19 ("coronavirus", "covid", "covid19", "covid-19"). This is just an
example; feel free to change the filter parameters.
# filter parameters
words = ['coronavirus', 'covid', 'covid19', 'covid-19']
languages = ['en']
# streaming...
stream.filter(track=words, languages=languages)
24. Continue:
So far, the script only prints tweets. Once started, it will not end until
it is stopped manually (by pressing CTRL + C or killing the system process),
and it does not record any information. For further analysis, it is
necessary to label and store the data.
25. Auto cancel
One way to achieve the recording feature is to update the TweetListener class with a list
attribute that is filled by the on_status method. Since the streaming process never ends on
its own, it is also necessary to set a threshold that automatically cancels the stream by
returning False from on_status once it is reached.
# set default threshold value
DEFAULT_THRESHOLD = 10

# older listener with changes
class TweetListener(tweepy.StreamListener):
    def __init__(self, threshold=DEFAULT_THRESHOLD):
        super().__init__()
        self.threshold = threshold
        self.tweets = []

    def on_status(self, tweet):
        if len(self.tweets) < self.threshold:
            print(tweet)
            self.tweets.append(tweet)
        else:
            return False
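The stopping rule can be tried out without a Twitter connection. The sketch below copies the on_status logic into a plain class and feeds it from fake_stream, a hypothetical loop that imitates the described behavior (stop as soon as on_status returns False). It is not Tweepy's actual implementation, and the "tweets" are made-up strings.

```python
DEFAULT_THRESHOLD = 10


class ThresholdCollector:
    """Stand-in for the listener: same on_status logic, no Tweepy dependency."""

    def __init__(self, threshold=DEFAULT_THRESHOLD):
        self.threshold = threshold
        self.tweets = []

    def on_status(self, tweet):
        if len(self.tweets) < self.threshold:
            self.tweets.append(tweet)
        else:
            return False  # signal the caller to stop


def fake_stream(listener, statuses):
    """Hypothetical loop imitating a stream: stop when on_status returns False."""
    delivered = 0
    for status in statuses:
        delivered += 1
        if listener.on_status(status) is False:
            break
    return delivered


collector = ThresholdCollector(threshold=3)
delivered = fake_stream(collector, ["t1", "t2", "t3", "t4", "t5"])
print(collector.tweets)  # ['t1', 't2', 't3']
print(delivered)         # 4: the fourth status triggers the stop and is not stored
```

Note that with this logic the stream only stops when one more tweet arrives after the threshold is reached, and that tweet is discarded. Returning False right after appending the last wanted tweet would avoid the extra wait.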
26. Labels and fields
A single tweet carries a lot of data, such as content text, media, favorite
count, owner and so on. For more details, see the page about the Tweet
object in the Twitter Developer docs. Every use case requires different
information, so choose the fields that matter for your case and discard
the rest.
Note that if the tweet text exceeds 140 characters, the text attribute
will be truncated. In this case, the tweet object will have
the extended_tweet attribute. So, to access the full text,
use extended_tweet['full_text'].
27. Continue:
# older listener with changes
class TweetListener(tweepy.StreamListener):
    def __init__(self, threshold=DEFAULT_THRESHOLD):
        super().__init__()
        self.threshold = threshold
        self.tweets = []

    def on_status(self, tweet):
        if len(self.tweets) < self.threshold:
            text = (
                tweet.extended_tweet['full_text']
                if hasattr(tweet, 'extended_tweet')
                else tweet.text
            )
            desired_fields = [tweet.id, text]
            print(desired_fields)
            self.tweets.append(desired_fields)
        else:
            return False
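If more fields are needed, the selection can be moved into a small function. The sketch below is one possible way to do it. select_fields is a hypothetical helper, and the SimpleNamespace objects are made-up stand-ins that only mimic the attribute names of a status (id, created_at, user.screen_name, favorite_count, text, extended_tweet); on a real status, created_at would be a datetime rather than a string.

```python
from types import SimpleNamespace


def select_fields(tweet):
    """Return [id, created_at, screen_name, favorite_count, text] for one status."""
    text = (
        tweet.extended_tweet["full_text"]
        if hasattr(tweet, "extended_tweet")
        else tweet.text
    )
    return [
        tweet.id,
        tweet.created_at,
        tweet.user.screen_name,
        tweet.favorite_count,
        text,
    ]


# Stand-in objects with made-up values, used only to exercise the function.
short_tweet = SimpleNamespace(
    id=1,
    created_at="2020-04-01",
    user=SimpleNamespace(screen_name="alice"),
    favorite_count=0,
    text="a short tweet",
)
long_tweet = SimpleNamespace(
    id=2,
    created_at="2020-04-02",
    user=SimpleNamespace(screen_name="bob"),
    favorite_count=5,
    text="a truncated te",
    extended_tweet={"full_text": "a truncated text, now shown in full"},
)

print(select_fields(short_tweet))
print(select_fields(long_tweet))
```

Inside on_status, the call would then be self.tweets.append(select_fields(tweet)), and the column list used when storing the data would need the same five names.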
28. Storing
At this point, all tracked tweets are stored in the tweets attribute of the TweetListener object.
We can use Pandas to create a DataFrame and save it to a CSV file:
import pandas as pd

columns = ['id', 'text']
output_file = 'tweets.csv'
tweets = pd.DataFrame(listener.tweets, columns=columns)
tweets.to_csv(output_file, index=False)
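If Pandas is not installed, the same rows can be written with the csv module from the standard library. This is a minimal sketch under that assumption: save_tweets and load_tweets are hypothetical helper names, and the sample rows are made up to match the [id, text] shape collected by the listener.

```python
import csv
import os
import tempfile


def save_tweets(rows, path, columns=("id", "text")):
    """Write rows (lists matching columns) to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(rows)


def load_tweets(path):
    """Read the CSV file back as a list of rows (all values come back as strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


# Made-up rows in the same [id, text] shape that the listener collects.
sample_rows = [[1, "first tweet"], [2, "text with, a comma"]]
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "tweets.csv")
    save_tweets(sample_rows, path)
    loaded = load_tweets(path)
print(loaded)
```

The csv writer quotes fields that contain commas, so tweet text survives the round trip; ids are read back as strings and need converting if numbers are required.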