Mining Social Web APIs with IPython Notebook
Matthew A. Russell - @ptwobrussell - http://MiningTheSocialWeb.com
Data Day Texas - 11 January 2014

Intro

Hello, My Name Is ... Matthew
Background in Computer Science
Data mining & machine learning
CTO @ Digital Reasoning Systems
Data mining; machine learning
Author @ O'Reilly Media
5 published books on technology
Principal @ Zaffra
Selective boutique consulting

Transforming Curiosity Into Insight
An open source software (OSS) project
http://bit.ly/MiningTheSocialWeb2E
A book
http://bit.ly/135dHfs
Accessible to (virtually) everyone
Virtual machine with turn-key coding templates for data science experiments
Think of the book as "premium" support for the OSS project

The Social Web Is All the Rage
World population: ~7B people
Facebook: 1.15B users
Twitter: 500M users
Google+: 343M users
LinkedIn: 238M users
~200M+ blogs (conservative estimate)

Overview
Intro (5 mins)
Module 1 - Virtual Machine & IPython Notebook Overview (10 mins)
Module 2 - Twitter Intro/Overview (45 mins)
Module 3 - Twitter Firehose Analysis with pandas (45 mins)
Module 4 - Overview of other MTSW IPython Notebooks (5 mins)
Wrap Up/Final Q&A (10 mins)

Workshop Objective
To send you away as a social web hacker
Hands-on experience hacking on Twitter data
Empowered to walk away ready to hack on Facebook, LinkedIn, Google+, etc.
Broad working knowledge of popular social web APIs
To have fun and learn a few things

Just a Few More Things
This workshop is...
An adaptation of Chapters 1+9 from Mining the Social Web, 2nd Edition
More of a guided hacking session where you follow along (vs a lecture)
Designed to be very hands-on, not a lecture

I'm available 24/7 this week (and beyond) to help you be successful

Assumptions
At some point in your life, you have
Programmed with Python
Worked with JSON
Made requests and processed responses to/from web servers

Or you want to learn to do these things now...
And you're a quick learner

Module 1: Virtual Machine Setup

Why do you need a VM?
To save time
Because installation and configuration management is harder than it first appears
So that you can focus on the task at hand instead
So that I can support you regardless of your hardware and operating system

But I can do all of that myself...
True...
If you would rather troubleshoot unexpected installation/configuration issues instead of immediately focusing on the real task at hand

At least give it a shot before resorting to your own devices so that you don't have to install specific versions of ~40 Python packages
Including scientific computing tools that require underlying C/C++ code to be compiled
Which requires specific versions of developer libraries to be installed

You get the idea...

The Virtual Machine Experience
Vagrant
A nice abstraction around virtual machine providers
One ring to rule them all
Virtualbox, VMWare, AWS, ...

IPython Notebook
The easiest way to program with Python
A better REPL (interpreter)
Great for hacking

What happens when you vagrant up?
Vagrant follows the instructions in your Vagrantfile
Starts up a Virtualbox instance
Uses Chef to provision it
Installs OS patches/updates
Installs MTSW software dependencies
Starts IPython Notebook server on port 8888

Why Should I Use IPython Notebook?
Because it's great for hacking
And hacking is usually the first step

Because it's great for collaboration
Sharing/publishing results is trivial

Because the UX is as easy as working in a notepad
Think of it as "executable paper"

VM Quick Start Instructions
Go to http://MiningTheSocialWeb.com/quick-start/
Follow the instructions
And watch the screencasts!

Basically:
Install Virtualbox & Vagrant
Run "vagrant up" in a terminal to start a guest VM
Then, go to http://localhost:8888 on your host machine's web browser
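
If the browser shows nothing at that address, a quick check from the host can tell you whether anything is listening on the port yet. The following is only a minimal sketch using the Python standard library; the helper name `port_is_open` is made up for illustration and is not part of the workshop materials.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: it only tests that something accepts
    connections on the port, not that it is the notebook server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Calling `port_is_open()` after "vagrant up" finishes should return True once the notebook server is running, assuming the default port forwarding described above.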

What Could Be Easier?
A hosted version of the VM!
But only for a few hours during this workshop
Because it costs money to run these servers

Go to http://bit.ly/mtsw-ddtx14 and pick a machine
Please do not share the URLs outside of this workshop!
With a cherry on top...

A Hosted Virtual Machine
Is it free?
Perhaps...
...Sign up for the AWS free tier at http://aws.amazon.com/free/
But not right now. Do it later

See this blog post for some inspiration on how to easily build your own AMI from Vagrant boxes
http://wp.me/p3QiJd-3T

One More Thing

There's a new alpha product from O'Reilly Media that hosts IPython Notebooks and other software to enhance reading experiences
I can share out "invites" with any interested volunteers

Module 2: Twitter Intro/Overview

Objectives
Be able to identify Twitter primitives
Understand tweet metadata and how to use it
Learn how to extract entities such as user mentions, hashtags, and URLs
Apply techniques for performing frequency analysis with Python
Be able to plot histograms of Twitter data with IPython Notebook
Learn about a Twitter cookbook that you can easily adapt

Twitter Primitives
Account Types: "Anything"
"Following" Relationships
Favorites
Retweets
Replies
(Almost) No Privacy Controls

API Requests
RESTful requests
Everything is a "resource"
You GET, PUT, POST, and DELETE resources
Standard HTTP "verbs"

Example: GET https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=SocialWebMining

Streaming API filters
JSON responses
Cursors (not quite pagination)
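
To make the request and cursor ideas concrete, here is a rough sketch in plain Python. It only builds the URL for a resource and walks a cursored result set. It does not authenticate or send anything over the network, and real requests need OAuth credentials. The names `build_url`, `iterate_cursor`, and `fetch_page` are hypothetical. The `ids` and `next_cursor` keys follow the shape of v1.1 cursored responses such as followers/ids, where a `next_cursor` of 0 means there are no more pages.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.twitter.com/1.1"


def build_url(resource, **params):
    """Build the GET URL for a resource such as 'statuses/user_timeline'."""
    query = urlencode(sorted(params.items()))
    return f"{API_ROOT}/{resource}.json?{query}"


def iterate_cursor(fetch_page, first_cursor=-1):
    """Yield ids from a cursored resource until next_cursor is 0.

    fetch_page is a hypothetical callable (cursor -> response dict);
    in practice it would wrap an authenticated HTTP request.
    """
    cursor = first_cursor
    while cursor != 0:
        page = fetch_page(cursor)
        yield from page["ids"]
        cursor = page["next_cursor"]


print(build_url("statuses/user_timeline", screen_name="SocialWebMining"))
```

Unlike page numbers, a cursor is an opaque token handed back by the previous response, which is why the loop cannot jump ahead or compute the total number of pages in advance.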

Twitter is an Interest Graph
[Figure: interest graph connecting accounts labeled Johnny Araya, Roberto, Mercedes, Rodolfo Hernández, Ana, Jorge, and Nina]

What's in a Tweet?
140 Characters ...
... Plus ~5KB of metadata!
Authorship
Time & location
Tweet "entities"
Replying, retweeting, favoriting, etc.
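
As a rough illustration of that metadata, the sketch below uses a heavily trimmed, hand-made tweet. The field names follow the v1.1 JSON format, but every value is invented and a real tweet carries many more fields. The `summarize` helper is hypothetical.

```python
from datetime import datetime

# Hand-made example; values are invented for illustration only.
sample_tweet = {
    "id_str": "1",
    "text": "Hello, social web!",
    "created_at": "Sat Jan 11 15:00:00 +0000 2014",
    "user": {"screen_name": "example_user", "followers_count": 42},
    "coordinates": None,
    "in_reply_to_status_id": None,
    "retweet_count": 3,
    "favorite_count": 1,
    "entities": {"hashtags": [], "user_mentions": [], "urls": []},
}


def summarize(tweet):
    """Pull authorship, time, location, and interaction fields from a tweet."""
    return {
        "author": tweet["user"]["screen_name"],
        "created": datetime.strptime(
            tweet["created_at"], "%a %b %d %H:%M:%S %z %Y"
        ),
        "has_location": tweet["coordinates"] is not None,
        "is_reply": tweet["in_reply_to_status_id"] is not None,
        "retweets": tweet["retweet_count"],
        "favorites": tweet["favorite_count"],
    }


print(summarize(sample_tweet))
```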

What are Tweet Entities?
Essentially, the "easy to get at" data in the 140 characters
@usermentions
#hashtags
URLs
multiple variations

(financial) symbols
stock tickers

media
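
Because the API already parses these entities, extracting them is mostly a matter of walking the `entities` field rather than applying regular expressions to the text. A minimal sketch, assuming the v1.1 tweet format. The tweet here is invented and `extract_entities` is a hypothetical helper.

```python
def extract_entities(tweet):
    """Collect the pre-parsed entities from a tweet dict."""
    entities = tweet.get("entities", {})
    return {
        "screen_names": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
        "symbols": [s["text"] for s in entities.get("symbols", [])],
        "media": [m["media_url"] for m in entities.get("media", [])],
    }


# Invented example tweet; only the fields used above are included.
tweet = {
    "text": "Learning from @ptwobrussell at #DataDayTexas $AMZN http://t.co/abc",
    "entities": {
        "user_mentions": [{"screen_name": "ptwobrussell"}],
        "hashtags": [{"text": "DataDayTexas"}],
        "urls": [{"url": "http://t.co/abc",
                  "expanded_url": "http://MiningTheSocialWeb.com"}],
        "symbols": [{"text": "AMZN"}],
    },
}

print(extract_entities(tweet))
```

The "multiple variations" of a URL show up as separate keys on each URL entity (for example the shortened `url` and the `expanded_url`); this sketch keeps only the expanded form.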

Data Mining Is Often Just...

Counting
Comparing
Filtering
Ranking
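
All four operations fit in a few lines with `collections.Counter` from the standard library. The hashtag list below is made up for illustration.

```python
from collections import Counter

# Invented sample data: hashtags pulled from a handful of tweets.
hashtags = ["python", "data", "python", "ipython", "data", "python"]

counts = Counter(hashtags)                                # counting
python_leads = counts["python"] > counts["data"]          # comparing
repeated = {t: n for t, n in counts.items() if n >= 2}    # filtering
ranked = counts.most_common(2)                            # ranking

print(counts, python_leads, repeated, ranked)
```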

Histograms

A chart that is handy for frequency analysis
They look like bar charts...except they're not bar charts
Each value on the x-axis is a range (or "bin") of values
Not categorical data
Each value on the y-axis is the combined frequency of values in each range
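
The binning step is what separates a histogram from a bar chart, and it can be sketched without any plotting library. The retweet counts are invented and the `histogram` helper is hypothetical. In a notebook you would more likely hand the raw values to a plotting library and let it choose the bins.

```python
from collections import Counter


def histogram(values, bin_width):
    """Map each bin's lower edge to the count of values in [edge, edge + bin_width)."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


# Invented retweet counts for nine tweets.
retweet_counts = [0, 3, 7, 12, 15, 18, 41, 44, 95]

# Text rendering: one row per non-empty bin, one '#' per tweet in that bin.
for edge, freq in histogram(retweet_counts, 10).items():
    print(f"{edge:>3}-{edge + 9:<3} {'#' * freq}")
```

Note that the x-axis labels here are ranges (0-9, 10-19, and so on) rather than categories, and empty bins are simply omitted from this text rendering.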

Example: Histogram of Retweets

Social Media Analysis Framework
A memorable four-step process to guide data science experiments:
Aspire
To test a hypothesis (answer a question)

Acquire
Get the data

Analyze
Count things

Summarize
Plot the results

Exercises
Review Python idioms in the "Appendix C (Python Tips & Tricks)" notebook
Follow the setup instructions in the "Chapter 1 (Mining Twitter)" notebook
Fill in Example 1-1 with credentials and begin work
See https://vimeo.com/79220146 for a helpful video
Execute each example sequentially
Customize queries, explore tweet metadata, count tweet entities, etc.
Explore the "Chapter 9 (Twitter Cookbook)" notebook
In particular, check out Example 9-8 (Twitter's Streaming API)

Module 3: Twitter Firehose Analysis with pandas

Objectives

To understand how to capture data from Twitter's firehose
To understand basic pandas usage for tweets
To work through a data science experiment with a systematic 4-step process
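
As a taste of the capture step, here is a minimal sketch of reading tweets from a stream, assuming newline-delimited JSON with blank keep-alive lines. The stream is faked with `io.StringIO` so the example runs offline. `read_tweets` is a hypothetical helper, not code from the workshop notebook.

```python
import io
import json


def read_tweets(stream):
    """Yield one dict per non-blank line of a newline-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if not line:
            # Blank lines act as keep-alives and carry no data.
            continue
        yield json.loads(line)


# Invented stand-in for a live streaming connection.
fake_stream = io.StringIO(
    '{"text": "first", "user": {"screen_name": "a"}}\n'
    "\n"
    '{"text": "second", "user": {"screen_name": "b"}}\n'
)

tweets = list(read_tweets(fake_stream))
print(len(tweets), [t["user"]["screen_name"] for t in tweets])
```

A list of dicts like `tweets` is a convenient starting point for analysis, since it can be passed directly to `pandas.DataFrame`.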

Social Media Analysis Framework

Remember:
Aspire
Acquire
Analyze
Summarize

Understanding the Reaction to Amazon Prime Air

Open up the notebook entitled __Understanding the Reaction to Amazon Prime Air.ipynb and follow along
Or, visit http://bit.ly/mtsw-amazon-prime-air and follow along if you're just joining us

Module 4: Overview of other MTSW IPython Notebooks

Mining the Social Web ToC
Chapter 1 - Mining Twitter
Chapter 2 - Mining Facebook
Chapter 3 - Mining LinkedIn
Chapter 4 - Mining Google+
Chapter 5 - Mining Web Pages
Chapter 6 - Mining Mailboxes
Chapter 7 - Mining GitHub
Chapter 8 - Mining the Semantically Marked-Up Web
Chapter 9 - Twitter Cookbook

A Recommendation

Bookmark http://nbviewer.ipython.org
Take note of Mining the Social Web under "Books"
Notice lots of other terrific notebooks, too

Wrap Up / Final Q&A

Helpful Links & Free Stuff
http://MiningTheSocialWeb.com
Mining the Social Web 2E Chapter 1 (Chimera)
http://bit.ly/13XgNWR
Source Code (GitHub)
http://bit.ly/MiningTheSocialWeb2E
http://bit.ly/1fVf5ej (numbered examples)
Screencasts (Vimeo)
http://bit.ly/mtsw2e-screencasts

Mining Social Web APIs with IPython Notebook - Data Day Texas 2014

  • 1. 1 Mining Social Web APIs with IPython Notebook Matthew A. Russell - @ptwobrussell - http://MiningTheSocialWeb.com Data Day Texas - 11 January 2014
  • 3. 3 Hello, My Name Is ... Matthew Background in Computer Science Data mining & machine learning CTO @ Digital Reasoning Systems Data mining; machine learning Author @ O'Reilly Media 5 published books on technology Principal @ Zaffra Selective boutique consulting
  • 4. 4 Transforming Curiosity Into Insight An open source software (OSS) project http://bit.ly/MiningTheSocialWeb2E A book http://bit.ly/135dHfs Accessible to (virtually) everyone Virtual machine with turn-key coding templates for data science experiments Think of the book as "premium" support for the OSS project
  • 5. 5 The Social Web Is All the Rage World population: ~7B people Facebook: 1.15B users Twitter: 500M users Google+ 343M users LinkedIn: 238M users ~200M+ blogs (conservative estimate)
  • 6. 6 Overview Intro (5 mins) Module 1 - Virtual Machine & IPython Notebook Overview (10 mins) Module 2 - Twitter Intro/Overview (45 mins) Module 3 - Twitter Firehose Analysis with pandas (45 mins) Module 4 - Overview of other MTSW IPython Notebooks (5 mins) Wrap Up/Final Q&A (10 mins)
  • 7. 7 Workshop Objective To send you away as a social web hacker Hands-on experience hacking on Twitter data Empowered to walk away ready for on Facebook, LinkedIn, Google+, etc. Broad working knowledge popular social web APIs To have fun and learn a few things
  • 8. 8 Just a Few More Things This workshop is... An adaptation of Chapters 1+9 from Mining the Social Web, 2nd Edition More of a guided hacking session where you follow along (vs a lecture) Designed to be very hands-on, not a lecture I'm available 24/7 this week (and beyond) to help you be successful
  • 9. 9 Assumptions At some point in your life, you have Programmed with Python Worked with JSON Made requests and processed responses to/from web servers Or you want to learn to do these things now... And you're a quick learner
  • 10. 10 Module 1: Virtual Machine Setup
  • 11. 11 Why do you need a VM? To save time Because installation and configuration management is harder than it first appears So that you can focus on the task at hand instead So that I can support you regardless of your hardware and operating system
  • 12. 12 But I can do all of that myself... True... If you would rather troubleshoot unexpected installation/configuration issues instead of immediately focusing on the real task at hand At least give it a shot before resorting to your own devices so that you don't have to install specific versions of ~40 Python packages Including scientific computing tools that require underlying C/C++ code to be compiled Which requires specific versions of developer libraries to be installed You get the idea...
  • 13. 13 The Virtual Machine Experience Vagrant A nice abstraction around virtual machine providers One ring to rule them all Virtualbox, VMWare, AWS, ... IPython Notebook The easiest way to program with Python A better REPL (interpreter) Great for hacking
  • 14. 14 What happens when you vagrant up? Vagrant follows the instructions in your Vagrantfile Starts up a Virtualbox instance Uses Chef to provision it Installs OS patches/updates Installs MTSW software dependencies Starts IPython Notebook server on port 8888
  • 15. 15 Why Should I Use IPython Notebook? Because it's great for hacking And hacking is usually the first step Because it's great for collaboration Sharing/publishing results is trivial Because the UX is as easy as working in a notepad Think of it as "executable paper"
  • 16. 16
  • 17. 17
  • 18. 18 VM Quick Start Instructions Go to http://MiningTheSocialWeb.com/quick-start/ Follow the instructions And watch the screencasts! Basically: Install Virtualbox & Vagrant Run "vagrant up" in a terminal to start a guest VM Then, go to http://localhost:8888 on your host machine's web browser
  • 19. 19 What Could Be Easier? A hosted version of the VM! But only for a few hours during this workshop Because it costs money to run these servers Go to http://bit.ly/mtsw-ddtx14 and pick a machine Please do not share the URLs outside of this workshop! With a cherry on top...
  • 20. 20 A Hosted Virtual Machine Is it free? Perhaps... ...Sign-up for the AWS free tier at http://aws.amazon.com/free/ But not right now. Do it later See this blog post for some inspiration on how to easily build your own AMI from Vagrant boxes http://wp.me/p3QiJd-3T
  • 21. 21 One More Thing There's a new alpha product from O'Reilly Media that hosts IPython Notebooks and other software to enhance reading experiences I can share out "invites" with any interested volunteers
  • 22. 22 Module 2: Twitter Intro/Overview
  • 23. 23 Objectives Be able to identify Twitter primitives Understand tweet metadata and how to use it Learn how to extract entities such as user mentions, hashtags, and URLs Apply techniques for performing frequency analysis with Python Be able to plot histograms of Twitter data with IPython Notebook Learn about a Twitter cookbook that you can easily adapt
  • 24. 24 Twitter Primitives Accounts Types: "Anything" "Following" Relationships Favorites Retweets Replies (Almost) No Privacy Controls
  • 25. 25 API Requests RESTful requests Everything is a "resource" You GET, PUT, POST, and DELETE resources Standard HTTP "verbs" Example: GET https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=SocialWebMining Streaming API filters JSON responses Cursors (not quite pagination)
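A minimal sketch of the two ideas on this slide: building the URL for a GET request against a resource, and walking a cursored result set. The helper names `build_resource_url` and `walk_cursor` and the `FAKE_PAGES` data are hypothetical and only for illustration. Real requests to the v1.1 API also need OAuth credentials, which this sketch leaves out, so nothing here touches the network.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.twitter.com/1.1"


def build_resource_url(resource, **params):
    """Return the URL for a GET request against a REST resource."""
    url = "{0}/{1}.json".format(API_ROOT, resource)
    query = urlencode(sorted(params.items()))
    return url + "?" + query if query else url


def walk_cursor(fetch_page):
    """Collect ids across cursored pages.

    fetch_page is a hypothetical callable that takes a cursor value and
    returns a dict with "ids" and "next_cursor" keys. Cursoring starts
    at -1 and stops when next_cursor comes back as 0.
    """
    ids, cursor = [], -1
    while cursor != 0:
        page = fetch_page(cursor)
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
    return ids


# Made-up pages standing in for real API responses.
FAKE_PAGES = {
    -1: {"ids": [1, 2], "next_cursor": 10},
    10: {"ids": [3], "next_cursor": 0},
}

timeline_url = build_resource_url("statuses/user_timeline",
                                  screen_name="SocialWebMining")
all_ids = walk_cursor(FAKE_PAGES.__getitem__)

print(timeline_url)
print(all_ids)
```

The cursor loop is why the slide says "not quite pagination": you never ask for page N, you only follow the `next_cursor` value handed back by the previous response.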
  • 26. 26 Twitter is an Interest Graph Johnny Araya Roberto Mercedes Rodolfo Hernández Ana Jorge Nina
  • 27. 27 What's in a Tweet? 140 Characters ... ... Plus ~5KB of metadata! Authorship Time & location Tweet "entities" Replying, retweeting, favoriting, etc.
  • 28. 28 What are Tweet Entities? Essentially, the "easy to get at" data in the 140 characters @usermentions #hashtags URLs multiple variations (financial) symbols stock tickers media
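As a rough illustration of why entities are "easy to get at", the sketch below pulls them out of a tweet's `entities` field without parsing the text at all. The `sample_tweet` is made up and trimmed to the few keys used here (a real tweet carries far more metadata), and `extract_entities` is a hypothetical helper name.

```python
# A made-up, heavily trimmed tweet; real API responses have many more fields.
sample_tweet = {
    "text": "Hacking on #IPython with @ptwobrussell http://t.co/abc $AMZN",
    "entities": {
        "user_mentions": [{"screen_name": "ptwobrussell"}],
        "hashtags": [{"text": "IPython"}],
        "urls": [{"expanded_url": "http://MiningTheSocialWeb.com"}],
        "symbols": [{"text": "AMZN"}],
    },
}


def extract_entities(tweet):
    """Return the entities of one tweet as plain lists of strings."""
    entities = tweet.get("entities", {})
    return {
        "screen_names": [m["screen_name"]
                         for m in entities.get("user_mentions", [])],
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
        "symbols": [s["text"] for s in entities.get("symbols", [])],
    }


extracted = extract_entities(sample_tweet)
print(extracted)
```

Using `.get(..., [])` keeps the helper from failing on tweets that lack a given entity type.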
  • 29. 29 Data Mining Is Often Just... Counting Comparing Filtering Ranking
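The four verbs on this slide map closely onto `collections.Counter` from the standard library. A small sketch, using a made-up list of hashtags in place of real tweet data:

```python
from collections import Counter

# Made-up hashtags standing in for entities pulled from real tweets.
hashtags = ["python", "data", "python", "ipython", "data", "python"]

# Counting: how often does each hashtag occur?
counts = Counter(hashtags)

# Comparing: is one hashtag more common than another?
python_beats_data = counts["python"] > counts["data"]

# Filtering: keep only hashtags seen at least twice.
frequent = {tag: n for tag, n in counts.items() if n >= 2}

# Ranking: the two most common hashtags, most frequent first.
ranked = counts.most_common(2)

print(counts)
print(python_beats_data)
print(frequent)
print(ranked)
```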
  • 30. 30 Histograms A chart that is handy for frequency analysis They look like bar charts...except they're not bar charts Each value on the x-axis is a range (or "bin") of values Not categorical data Each value on the y-axis is the combined frequency of values in each range
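To make the difference from a bar chart concrete, here is a sketch that bins numeric values into fixed-width ranges and counts how many fall into each one. The retweet counts are made up, and `bin_counts` is a hypothetical helper; in the notebooks you would more likely hand the raw values to a plotting library and let it choose the bins.

```python
def bin_counts(values, bin_width):
    """Group non-negative integers into fixed-width bins.

    Returns a sorted list of (bin_start, frequency) pairs, where each
    bin covers bin_start through bin_start + bin_width - 1.
    """
    bins = {}
    for value in values:
        start = (value // bin_width) * bin_width
        bins[start] = bins.get(start, 0) + 1
    return sorted(bins.items())


# Made-up retweet counts for ten tweets.
retweet_counts = [0, 1, 3, 4, 7, 12, 15, 18, 22, 5]
histogram = bin_counts(retweet_counts, 10)

# Crude text rendering: one "#" per tweet in each range.
for start, frequency in histogram:
    print("{0:>2}-{1:<2} {2}".format(start, start + 9, "#" * frequency))
```

Each x-axis label is a range such as 0-9 rather than a category, and each bar's height is the combined frequency of every value inside that range.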
  • 32. 32 Social Media Analysis Framework A memorable four-step process to guide data science experiments: Aspire To test a hypothesis (answer a question) Acquire Get the data Analyze Count things Summarize Plot the results
  • 33. 33 Exercises Review Python idioms in the "Appendix C (Python Tips & Tricks)" notebook Follow the setup instructions in the "Chapter 1 (Mining Twitter)" notebook Fill in Example 1-1 with credentials and begin work See https://vimeo.com/79220146 for a helpful video Execute each example sequentially Customize queries, explore tweet metadata, count tweet entities, etc. Explore the "Chapter 9 (Twitter Cookbook)" notebook In particular, check out Example 9-8 (Twitter's Streaming API)
  • 34. 34 Module 3: Twitter Firehose Analysis with pandas
  • 35. 35 Objectives To understand how to capture data from Twitter's firehose To understand basic pandas usage for tweets To work through a data science experiment with a systematic 4-step process
  • 36. 36 Social Media Analysis Framework Remember: Aspire Acquire Analyze Summarize
  • 37. 37 Understanding the Reaction to Amazon Prime Air Open up the notebook entitled __Understanding the Reaction to Amazon Prime Air.ipynb and follow along Or, visit http://bit.ly/mtsw-amazon-prime-air and follow along if you're just joining us
  • 38. 38 Module 4: Overview of other MTSW IPython Notebooks
  • 39. 39 Mining the Social Web ToC Chapter 1 - Mining Twitter Chapter 2 - Mining Facebook Chapter 3 - Mining LinkedIn Chapter 4 - Mining Google+ Chapter 5 - Mining Web Pages Chapter 6 - Mining Mailboxes Chapter 7 - Mining GitHub Chapter 8 - Mining the Semantically Marked-Up Web Chapter 9 - Twitter Cookbook
  • 40. 40 A Recommendation Bookmark http://nbviewer.ipython.org Take note of Mining the Social Web under "Books" Notice lots of other terrific notebooks, too
  • 41. 41 Wrap Up / Final Q&A
  • 42. 42 Helpful Links & Free Stuff http://MiningTheSocialWeb.com Mining the Social Web 2E Chapter 1 (Chimera) http://bit.ly/13XgNWR Source Code (GitHub) http://bit.ly/MiningTheSocialWeb2E http://bit.ly/1fVf5ej (numbered examples) Screencasts (Vimeo) http://bit.ly/mtsw2e-screencasts