Curiosity Bits - curiositybits.com
• This tutorial is created for social scientists interested in grabbing data from the image-hosting site Imgur (imgur.com).
• To find out more about Python for mining the social web, please visit Curiosity Bits (curiositybits.com).
• Social-Metrics.org also hosts a series of Python tutorials on aggregating and analyzing Twitter/Facebook data. More at social-metrics.org
CURIOSITY BITS©
Get Imgur Data through Python
This tutorial shows you how to download images and the images’ metadata (e.g., image title, description, source, and upload time).
Imgur.com is a popular site for
sharing selfies and images intended
for persuasion.
Images will be saved in a
designated folder.
The metadata will be saved in a SQLite database
Three sources of Imgur images
Imgur images are available through keyword search, reddit timelines, and public albums.
Here are some examples:
• http://imgur.com/search?q=hillary (images from a search using the keyword “hillary”)
• http://imgur.com/r/transtimelines (images from a public reddit timeline called transtimelines)
• http://imgur.com/gallery/ROYAZ (images from a public album called ROYAZ)
Final note before we start: for simplicity, I am laying out only the most essential steps. Previous tutorials provide details about how to set up a Python programming environment; to find them, visit curiositybits.com and click the PYTHON tab.
Steps
1. Install Anaconda Python (with Spyder and IPython Notebook)
2. Install SQLite Browser
3. Install essential Python packages (imgurpython, sqlalchemy; urllib and sqlite3 already ship with Python)
4. Register an Imgur client to get a client ID and client secret
5. Create a SQLite database for images from keyword search
6. Download images from the keyword search
7. Create a SQLite database for images from a reddit timeline
8. Download images from a reddit timeline
9. Create a SQLite database for images from a public album
10. Download images from a public album
Install Anaconda Python
• store.continuum.io/cshop/anaconda/
You can run Python code in Spyder, which is a component of Anaconda Python.
Or use IPython Notebook, a web-based interactive Python environment.
This is what Spyder looks like. The left side of the window displays code. The pane at the bottom right shows you the results.
IPython Notebook lets you run your code from a web page and displays the results in your browser.
Install SQLite Browser
• http://sqlitebrowser.org
Install essential Python packages
• In the Command Prompt, type pip install followed by the package name to install each necessary package.
Required packages
• imgurpython
• sqlalchemy
• urllib (part of the Python standard library; no installation needed)
• sqlite3 (part of the Python standard library; no installation needed)
Type the following commands in the command prompt:
pip install imgurpython
pip install sqlalchemy
urllib and sqlite3 are part of the Python standard library, so they do not need to be installed with pip.
The Python code used in the tutorial
Imgur FetcherV1 is the collection of code used in this tutorial. You can download it at https://github.com/cosmopolitanvan/imgur_curiositybits_v1
On the site, you can also find three examples of SQLite databases. The database files end with the extension .sqlite
Register an Imgur client
Sign in to your Imgur account, go to https://api.imgur.com/oauth2/addclient and get your “security clearance”.
You need the client ID and client secret to grab data from the Imgur API.
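To give a sense of what the client ID is for, here is a minimal sketch (Python 3, standard library only) of how a request to the Imgur API can carry it in an Authorization header. The tutorial's scripts rely on the imgurpython package, which takes care of this for you; the function name, the placeholder ID, and the search URL below are assumptions for illustration, and the request is only built, not sent.

```python
import urllib.request

# Placeholder credential (assumption): substitute the client ID of your own Imgur client.
CLIENT_ID = "YOUR_CLIENT_ID"


def build_imgur_request(url, client_id):
    """Build (but do not send) a request that carries the client ID."""
    headers = {"Authorization": "Client-ID " + client_id}
    return urllib.request.Request(url, headers=headers)


# Illustrative URL (assumption); check the Imgur API documentation for the exact endpoint.
request = build_imgur_request("https://api.imgur.com/3/gallery/search?q=hillary", CLIENT_ID)
print(request.get_header("Authorization"))
```

Keeping the client ID and client secret in one place near the top of a script makes them easy to swap out before sharing the code.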
Create a SQLite database for images from keyword search
Let’s get images with the keyword “hillary”
http://imgur.com/search?q=hillary
We will download all available images, but first, let’s get their metadata. By metadata, I mean attributes related to each image. Examples of metadata include the image title, image description, image upload date, and image link (which is what we will use to download images).
Use the code named Imgur search v1.py
You can right-click the link and download the file with the extension .py. Open the file in Spyder, or copy the entire block of code into IPython Notebook.
Open Imgur search v1.py in Spyder
You don’t have to change anything in this block of code. It is used for importing necessary Python packages. Think of packages as apps running on iOS.
This is where you enter the keyword(s) you want to apply to the search. You can have multiple keywords, wrapped in parentheses and separated by commas.
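As a small illustration of keywords wrapped in parentheses and separated by commas, the sketch below uses a Python tuple. The variable names are hypothetical; check the script for the name it actually uses.

```python
# Hypothetical variable name: several keywords in parentheses, separated by commas.
keywords = ("hillary", "clinton")

# A single keyword needs a trailing comma; without it, Python sees a plain string.
single_keyword = ("hillary",)

for keyword in keywords:
    print("Searching Imgur for:", keyword)
```

The trailing comma matters because looping over a plain string would step through its letters one by one instead of treating it as one keyword.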
This is where you enter
the client ID and client
secret generated in the
previous step.
No tweaking is needed
for this block. But it
gives you a sense of
what data are to be
collected.
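To make the idea of a metadata table concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the tutorial's script (which uses sqlalchemy); the table name, column names, and sample row are assumptions based on the metadata fields mentioned in this tutorial.

```python
import sqlite3


def create_metadata_table(connection):
    """Create a table with one row per image (column names are illustrative)."""
    connection.execute(
        """
        CREATE TABLE IF NOT EXISTS images (
            id TEXT PRIMARY KEY,
            title TEXT,
            description TEXT,
            upload_time INTEGER,
            link TEXT,
            multiple_images TEXT
        )
        """
    )


def save_image_row(connection, row):
    """Insert one image's metadata, replacing any earlier row with the same id."""
    connection.execute(
        "INSERT OR REPLACE INTO images "
        "VALUES (:id, :title, :description, :upload_time, :link, :multiple_images)",
        row,
    )
    connection.commit()


# ":memory:" keeps this demo off the disk; pass a filename to create a real database file.
connection = sqlite3.connect(":memory:")
create_metadata_table(connection)
save_image_row(connection, {
    "id": "abc123",  # made-up sample values
    "title": "Sample title",
    "description": None,
    "upload_time": 1420070400,
    "link": "http://i.imgur.com/abc123.jpg",
    "multiple_images": "NO",
})
stored = connection.execute("SELECT id, link FROM images").fetchall()
print(stored)
```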
Go to line 136. This is where you specify the name of the SQLite database to be saved. If no absolute file path is given, the database will be saved in the same folder as your Python script (Imgur search v1.py).
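If you would rather make the database location explicit, one option is to build the full path yourself. This is a minimal sketch; the function name, folder, and filename are assumptions, not taken from the script.

```python
import os


def resolve_database_path(filename, base_folder):
    """Return filename unchanged if it is absolute, else place it inside base_folder."""
    if os.path.isabs(filename):
        return filename
    return os.path.join(base_folder, filename)


# Hypothetical names for illustration.
database_path = resolve_database_path("imgur_search.sqlite", "my_project")
print(database_path)
```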
Imgur images are indexed into multiple pages. Here, 5 means that we will get five pages of images from the keyword search. You can enter 1, 2, or 3 instead; experiment to see which number gives you an adequate yet manageable amount of data.
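The page setting can be pictured as a simple loop. The sketch below is an assumption about the general pattern rather than the script's actual code: fetch_page stands in for whatever call retrieves one page of results, and page numbers are assumed to start at 0.

```python
def collect_pages(fetch_page, number_of_pages):
    """Gather items from up to number_of_pages pages, stopping early at an empty page."""
    results = []
    for page in range(number_of_pages):
        items = fetch_page(page)
        if not items:
            break  # no more results, so further requests would be wasted
        results.extend(items)
    return results


def fake_fetch_page(page):
    """Stand-in for a real API call: only two pages contain data."""
    pages = {0: ["image_a", "image_b"], 1: ["image_c"]}
    return pages.get(page, [])


collected = collect_pages(fake_fetch_page, 5)
print(collected)
```

Asking for five pages here still returns only three items, because the loop stops as soon as a page comes back empty.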
Now, the tweaking is done! From the menu, choose Run > Configure and hit RUN.
This is the SQLite database that contains all
the metadata.
Notice that the last column is called multiple_images. Most image links end with an image file extension (.jpg, .png, .bmp, .gif). If that is the case, the value in the column will be NO.
However, some links contain multiple images, and for those you will see a “this link contains multiple images…” warning in the column.
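A check like the one behind the multiple_images column can be sketched as follows. This is an illustration under assumptions, not the script's own code: the function name is hypothetical, and the note string is a shortened placeholder for the warning text.

```python
import posixpath
from urllib.parse import urlparse

IMAGE_EXTENSIONS = {".jpg", ".png", ".bmp", ".gif"}

# Shortened placeholder (assumption) for the warning stored in the column.
MULTIPLE_IMAGES_NOTE = "this link contains multiple images"


def multiple_images_flag(link):
    """Return "NO" for a direct image link, else the multiple-images note."""
    path = urlparse(link).path
    extension = posixpath.splitext(path)[1].lower()
    if extension in IMAGE_EXTENSIONS:
        return "NO"
    return MULTIPLE_IMAGES_NOTE


print(multiple_images_flag("http://i.imgur.com/abc123.jpg"))
print(multiple_images_flag("http://imgur.com/a/ROYAZ"))
```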
Download images from the keyword search
Now we have a saved SQLite database with image links. The next step is to use Python to download images from those image links.
Use the .py file named Imgur search_downloader v1.py
Go to line 72, and make sure the filename
there matches the SQLite database we
have just created.
Please create a new folder and enter the folder path here.
Each image filename consists of the image’s unique identifier (id) from the Imgur API.
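The folder and filename logic can be sketched like this. The function name and sample values are assumptions for illustration, and the sketch creates the folder if it is missing rather than expecting you to create it by hand.

```python
import os
import posixpath
import tempfile
from urllib.parse import urlparse


def local_image_path(folder, image_id, link):
    """Return folder/<id><extension>, creating the folder if it does not exist yet."""
    os.makedirs(folder, exist_ok=True)
    extension = posixpath.splitext(urlparse(link).path)[1]
    return os.path.join(folder, image_id + extension)


# A temporary location keeps this demo from cluttering your own folders.
folder = os.path.join(tempfile.mkdtemp(), "imgur_images")
path = local_image_path(folder, "abc123", "http://i.imgur.com/abc123.jpg")
print(path)
```

Naming files after the Imgur id makes it easy to match each downloaded image back to its row in the SQLite database.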
Again, this script downloads images only from links with the extension .jpg, .png, .gif, or .bmp.
If an image link ends with anything but an image extension, the script will print a reminder message.
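Putting the pieces together, a download loop might look like the sketch below. It is a simplified stand-in for the downloader script, written with the standard library only; the table name, column names, and function name are assumptions. The download step is passed in as a parameter so the demo can run without touching the network.

```python
import os
import posixpath
import sqlite3
from urllib.parse import urlparse
from urllib.request import urlretrieve

IMAGE_EXTENSIONS = {".jpg", ".png", ".gif", ".bmp"}


def download_images(connection, folder, download=urlretrieve):
    """Download every direct image link in the database; report and skip the rest."""
    saved, skipped = [], []
    rows = connection.execute("SELECT id, link FROM images ORDER BY rowid").fetchall()
    for image_id, link in rows:
        extension = posixpath.splitext(urlparse(link).path)[1].lower()
        if extension not in IMAGE_EXTENSIONS:
            print("Reminder: not a direct image link, skipped:", link)
            skipped.append(link)
            continue
        target = os.path.join(folder, image_id + extension)
        download(link, target)
        saved.append(target)
    return saved, skipped


# Demo with an in-memory database and a stand-in downloader that only records requests.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE images (id TEXT, link TEXT)")
connection.executemany(
    "INSERT INTO images VALUES (?, ?)",
    [("abc123", "http://i.imgur.com/abc123.jpg"), ("ROYAZ", "http://imgur.com/a/ROYAZ")],
)
requested = []
saved, skipped = download_images(
    connection, "imgur_images", download=lambda link, target: requested.append(link)
)
print(saved, skipped)
```

Leaving the download argument at its default makes the function fetch real files with urlretrieve, in which case the target folder must already exist.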
Now, execute the code and you will find the downloaded images in the folder you have
just created.
Create a SQLite database for images from a reddit timeline
Getting images from a reddit timeline is very similar to getting images from keyword search. This time, we will try executing the Python code in IPython Notebook.
Use the script called Imgur reddit timeline v1.py
Create a new notebook in IPython Notebook and copy and paste the code from Imgur
reddit timeline v1.py
Enter the name of a reddit timeline.
In a URL, a reddit timeline appears as
http://imgur.com/r/buffalo
Enter your client ID and client secret.
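If you only have the URL at hand, the timeline name is the part after /r/. The helper below is a hypothetical convenience for pulling it out, not something the script itself provides.

```python
from urllib.parse import urlparse


def timeline_name_from_url(url):
    """Return the name that follows /r/ in an Imgur reddit timeline URL."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) >= 2 and parts[0] == "r":
        return parts[1]
    raise ValueError("Not a reddit timeline URL: " + url)


timeline = timeline_name_from_url("http://imgur.com/r/buffalo")
print(timeline)
```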
Enter or change the filename of the SQLite
database to be saved
The number of pages to be grabbed…
Click Run Cell.
The output will be saved in your IPython Notebooks folder. By default, it is
…\Documents\IPython Notebooks
Download images from a reddit timeline
Use the .py file named Imgur reddit_downloader v1.py
Load the code in IPython Notebook. As you did when downloading images from keyword search, make sure the file path in the code matches the database you have just created.
Specify the folder path,
and run the code!
Create a SQLite database for images from a public album
This works exactly as in the previous steps. There are only a few places in the script that need tweaking. You need to enter the client ID, the client secret, the name of the album, the file path of the database, and the number of pages to be grabbed. Then you are all good to go!
To create a SQLite database for images from a public album, use the script named Imgur album v1.py
Download images from a public album
To download images from a public album, use the script named Imgur album_downloader v1.py
Have questions?
Contact @cosmopolitanvan on Twitter

On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 

Python Tutorial-Mining imgur images

  • 1. Get Imgur Data through Python (Curiosity Bits - curiositybits.com, CURIOSITY BITS©). This tutorial is created for social scientists interested in grabbing data from the image-hosting site Imgur (imgur.com). To find out more about Python for mining the social web, please visit Curiosity Bits (curiositybits.com). Social-Metrics.org also hosts a series of Python tutorials on aggregating and analyzing Twitter/Facebook data; more at social-metrics.org.
  • 2. This tutorial shows you how to download images and the images’ metadata (e.g., image title, description, source, upload time). Imgur.com is a popular site for sharing selfies and images intended for persuasion.
  • 3. Images will be saved in a designated folder. The metadata will be saved in a SQLite database.
  • 4. Three sources of Imgur images: Imgur images are available through keyword search, reddit timelines and public albums. Here are the examples: http://imgur.com/search?q=hillary (images from a search using the keyword “hillary”); http://imgur.com/r/transtimelines (images from a public reddit timeline called transtimelines); http://imgur.com/gallery/ROYAZ (images from a public album called ROYAZ).
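The three URL shapes above can be told apart with a few lines of Python. The following is a minimal sketch, not part of the tutorial's scripts: the function name `classify_imgur_url` and the labels `"search"`, `"reddit"` and `"album"` are made up for illustration, and the code assumes Python 3.

```python
from urllib.parse import urlparse, parse_qs


def classify_imgur_url(url):
    """Return (source_kind, value) for the three URL shapes in this tutorial.

    The labels "search", "reddit" and "album" are illustrative names only.
    """
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if parts and parts[0] == "search":
        # Keyword search: the keyword is carried in the ?q= query parameter.
        keywords = parse_qs(parsed.query).get("q", [])
        if keywords:
            return ("search", keywords[0])
    elif len(parts) == 2 and parts[0] == "r":
        # Reddit timeline: http://imgur.com/r/<name>
        return ("reddit", parts[1])
    elif len(parts) == 2 and parts[0] == "gallery":
        # Public album: http://imgur.com/gallery/<id>
        return ("album", parts[1])
    raise ValueError("Unrecognized Imgur URL: " + url)


print(classify_imgur_url("http://imgur.com/search?q=hillary"))
print(classify_imgur_url("http://imgur.com/r/transtimelines"))
print(classify_imgur_url("http://imgur.com/gallery/ROYAZ"))
```

The returned value (keyword, timeline name or album ID) is the piece you later type into the corresponding script.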
  • 5. Final note before we start: for simplicity, I am laying out only the most essential steps. Previous tutorials provide details about how to set up a Python programming environment; please visit curiositybits.com and click the PYTHON tab.
  • 6. Steps: 1. Install Anaconda Python (with Spyder and IPython Notebook) 2. Install SQLite Browser 3. Install four essential Python packages (imgurpython, sqlalchemy, urllib, sqlite3) 4. Register an Imgur client to get a client ID and client secret 5. Create a SQLite database for images from keyword search 6. Download images from the keyword search 7. Create a SQLite database for images from a reddit timeline 8. Download images from a reddit timeline 9. Create a SQLite database for images from a public album 10. Download images from a public album
  • 7. Install Anaconda Python: store.continuum.io/cshop/anaconda/ You can run Python code in Spyder, which is a component of Anaconda Python, or use IPython Notebook, a web-based interactive Python environment.
  • 8. Install Anaconda Python: This is what Spyder looks like. The left side of the window displays the code. The pane at the bottom right shows you the results. IPython Notebook runs your code on the web and displays results in your browser.
  • 9. Install SQLite Browser: http://sqlitebrowser.org
  • 10. Install four essential Python packages: In the Command Prompt, use the command pip install, followed by the package name, to install the necessary packages. Required packages: imgurpython, sqlalchemy, urllib, sqlite3. Note that urllib and sqlite3 are part of the Python standard library, so only imgurpython and sqlalchemy need to be installed with pip.
  • 11. Install four essential Python packages: Type the following commands in the Command Prompt: pip install imgurpython, then pip install sqlalchemy. The urllib and sqlite3 modules ship with Python and need no pip install.
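Before moving on, it can help to confirm that all four packages are importable. The sketch below is an illustration only (the helper name `check_packages` is hypothetical) and assumes Python 3; it reports which of the listed modules the current interpreter can find.

```python
import importlib.util


def check_packages(names):
    """Map each top-level module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(["imgurpython", "sqlalchemy", "urllib", "sqlite3"])
for name, available in status.items():
    # "missing" means the package still has to be installed with pip.
    print(name, "OK" if available else "missing")
```

If imgurpython or sqlalchemy is reported as missing, rerun the matching pip install command in the Command Prompt.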
  • 12. The Python code used in the tutorial: Imgur Fetcher V1 is the collection of scripts used in this tutorial. You can download them at https://github.com/cosmopolitanvan/imgur_curiositybits_v1 On the site, you can also find three example SQLite databases. The database files end with the extension .sqlite
  • 13. Register an Imgur client: Sign in to your Imgur account, go to https://api.imgur.com/oauth2/addclient and get your “security clearance”.
  • 14. Register an Imgur client: You need a client ID and a client secret to grab data from the Imgur API.
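The tutorial's scripts have you paste the client ID and client secret directly into the code. As an alternative sketch, the credentials can be read from environment variables so they never end up in a shared script. The variable names `IMGUR_CLIENT_ID` and `IMGUR_CLIENT_SECRET` and both helper functions are assumptions made for this example; `ImgurClient` comes from the imgurpython package, and creating the client may contact the Imgur API.

```python
import os


def load_credentials(env=os.environ):
    """Read the Imgur client ID and secret from a mapping such as os.environ.

    IMGUR_CLIENT_ID and IMGUR_CLIENT_SECRET are hypothetical variable names.
    """
    client_id = env.get("IMGUR_CLIENT_ID")
    client_secret = env.get("IMGUR_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set IMGUR_CLIENT_ID and IMGUR_CLIENT_SECRET first")
    return client_id, client_secret


def make_client(env=os.environ):
    """Build an imgurpython client; requires imgurpython to be installed."""
    from imgurpython import ImgurClient  # third-party, imported lazily

    client_id, client_secret = load_credentials(env)
    return ImgurClient(client_id, client_secret)
```

Calling `make_client()` then gives you a client object without the secret appearing anywhere in the script.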
  • 15. Create a SQLite database for images from keyword search: Let’s get images with the keyword “hillary” (http://imgur.com/search?q=hillary). We will download all available images, but first, let’s get their metadata. By metadata, I mean the attributes related to each image. Examples include image title, image description, image upload date, and image link (which is what we will use to download the images).
  • 16. Use the script named Imgur search v1.py: You can right-click the link and download the file with the extension .py. Open the file in Spyder, or copy the entire block of code into IPython Notebook.
  • 17. Open Imgur search v1.py in Spyder: You don’t have to change anything in this block of code. It imports the necessary Python packages. Think of packages as apps running on iOS.
  • 18. This is where you enter the keyword(s) you want to apply to the search. You can have multiple keywords, wrapped in parentheses and separated by commas.
  • 19. This is where you enter the client ID and client secret generated in the previous step.
  • 20. No tweaking is needed for this block, but it gives you a sense of what data will be collected.
  • 21. Go to line 136; this is where you specify the name of the SQLite database to be saved. If no absolute file path is given, the database will be saved in the same folder as your Python script (Imgur search v1.py).
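To make the database step concrete, here is a minimal sketch of a metadata table. It uses the standard-library sqlite3 module rather than sqlalchemy, so it is not the tutorial's own code. The table name `images` and the column names are assumptions, apart from `multiple_images`, which the tutorial mentions as the last column.

```python
import sqlite3

# Hypothetical schema: one row per image, keyed by the Imgur image id.
SCHEMA = """
CREATE TABLE IF NOT EXISTS images (
    id TEXT PRIMARY KEY,
    title TEXT,
    description TEXT,
    datetime INTEGER,
    link TEXT,
    multiple_images TEXT
)
"""


def open_database(path):
    """Open (or create) the SQLite file at `path` and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def save_image(conn, record):
    """Insert one metadata record (a dict keyed by column name)."""
    conn.execute(
        "INSERT OR REPLACE INTO images "
        "VALUES (:id, :title, :description, :datetime, :link, :multiple_images)",
        record,
    )
    conn.commit()
```

With plain sqlite3, a relative filename such as `open_database("hillary.sqlite")` is resolved against the current working directory, which is the script's folder when you run the script from there.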
  • 22. Imgur images are indexed into multiple pages. Here, 5 means that we will get five pages of images from the keyword search. You can enter 3, 1 or 2 instead; just play around to see what number gives you an adequate yet manageable amount of data.
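The page setting boils down to a loop over page numbers. The sketch below shows that loop in isolation; `collect_pages` and `fetch_page` are hypothetical names, and page numbering is assumed to start at 0. With imgurpython, `fetch_page` could be something like `lambda p: client.gallery_search("hillary", page=p)`, but check that call against the version you have installed.

```python
def collect_pages(fetch_page, n_pages):
    """Call fetch_page(0), fetch_page(1), ... and gather all returned items.

    Stops early if a page comes back empty, since later pages will be too.
    """
    items = []
    for page in range(n_pages):
        batch = fetch_page(page)
        if not batch:
            break
        items.extend(batch)
    return items


# Stand-in for the Imgur API: two pages of results, then nothing.
fake_pages = {0: ["img1", "img2"], 1: ["img3"]}
print(collect_pages(lambda p: fake_pages.get(p, []), 5))
```

Because the loop stops at the first empty page, asking for more pages than exist costs only one extra request.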
  • 23. Now the tweaking is done! From the menu, choose Run > Configure and hit RUN.
  • 24. This is the SQLite database that contains all the metadata.
  • 25. Notice that the last column is called multiple_images. Most image links end with an image file extension (.jpg, .png, .bmp, .gif). If that is the case, the value in the column will be NO. But some links contain multiple images, and for those you will see a “this link contains multiple images…” warning in the column.
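The rule behind the multiple_images column can be sketched as a check on the link's file extension. This is an illustration rather than the tutorial's code: the function name is hypothetical, and because the full warning text is truncated in the slide, the string returned here is only a placeholder.

```python
import os
from urllib.parse import urlparse

IMAGE_EXTENSIONS = {".jpg", ".png", ".bmp", ".gif"}


def multiple_images_flag(link):
    """Return "NO" for a direct image link, otherwise a warning string."""
    extension = os.path.splitext(urlparse(link).path)[1].lower()
    if extension in IMAGE_EXTENSIONS:
        return "NO"
    # Placeholder wording; the script's exact warning is not shown in full.
    return "this link contains multiple images"


print(multiple_images_flag("http://i.imgur.com/abc123.jpg"))
print(multiple_images_flag("http://imgur.com/a/xyz"))
```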
  • 26. Download images from the keyword search: Now we have a saved SQLite database with image links. The next step is to use Python to download images from those links. Use the .py file named Imgur search_downloader v1.py
  • 27. Download images from the keyword search: Go to line 72 and make sure the filename there matches the SQLite database we have just created.
  • 28. Download images from the keyword search: Please create a new folder and enter the folder path here. Each image filename consists of the image’s unique identifier (id) from the Imgur API.
  • 29. Download images from the keyword search: Again, this script downloads images from links with the extension .jpg, .png, .gif or .bmp. If an image link ends with anything but an image extension, the script will print a reminder message.
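The download step described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of Imgur search_downloader v1.py: `download_images` is a hypothetical name, `rows` is assumed to be a sequence of (id, link) pairs read from the database, and the code targets Python 3, where the download helper is `urllib.request.urlretrieve`.

```python
import os
import urllib.request
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".jpg", ".png", ".gif", ".bmp")


def download_images(rows, folder, fetch=urllib.request.urlretrieve):
    """Save each direct image link in `rows` to `folder`, named by image id.

    `rows` is an iterable of (image_id, link) pairs. `fetch` can be swapped
    for a stand-in that takes (link, target_path), e.g. for offline testing.
    Returns (saved_paths, skipped_links).
    """
    os.makedirs(folder, exist_ok=True)
    saved, skipped = [], []
    for image_id, link in rows:
        extension = os.path.splitext(urlparse(link).path)[1].lower()
        if extension not in IMAGE_EXTENSIONS:
            # Mirrors the reminder message for links that are not images.
            print("Skipping (not a direct image link):", link)
            skipped.append(link)
            continue
        target = os.path.join(folder, image_id + extension)
        fetch(link, target)
        saved.append(target)
    return saved, skipped
```

With a database like the one built earlier, `rows` could come from a query such as `SELECT id, link FROM images`, where the table name is again an assumption.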
  • 30. Download images from the keyword search: Now execute the code and you will find the downloaded images in the folder you have just created.
  • 31. Create a SQLite database for images from a reddit timeline: Getting images from a reddit timeline is very similar to getting images from a keyword search. This time, we will try executing the Python code in IPython Notebook. Use the script called Imgur reddit timeline v1.py
  • 32. Create a SQLite database for images from a reddit timeline: Create a new notebook in IPython Notebook, then copy and paste the code from Imgur reddit timeline v1.py
  • 33. Enter the name of a reddit timeline. In URL terms, a reddit timeline is displayed as http://imgur.com/r/buffalo. Enter your client ID and client secret.
  • 34. Enter or change the filename of the SQLite database to be saved.
  • 35. The number of pages to be grabbed…
  • 36. Click Run Cell. The output will be saved in your IPython Notebooks folder. By default, it is …\Documents\IPython Notebooks
  • 37. Download images from a reddit timeline: Use the .py file named Imgur reddit_downloader v1.py
  • 38. Download images from a reddit timeline: Load the code in IPython Notebook. As when downloading images from the keyword search, make sure the file path in the code matches the database you have just created.
  • 39. Download images from a reddit timeline: Specify the folder path, and run the code!
  • 40. Create a SQLite database for images from a public album: This works exactly as in the previous steps. There are only a few places in the script that need tweaking. You need to enter the client ID, the client secret, the name of the album, the file path of the database, and the number of pages to be grabbed. Then you are all good to go! To create a SQLite database for images from a public album, use the script named Imgur album v1.py
  • 41. Download images from a public album: To download images from a public album, use the script named Imgur alubm_downloader v1.py
  • 42. Have questions? Contact @cosmopolitanvan on Twitter.