SlideShare a Scribd company logo
1 of 19
Download to read offline
Twitter: The Tweet, the Whole Tweet,
    and Nothing but the Tweet #2



                    charsyam@naver.com
Data Mining
Discover New Knowledge
Discover New Knowledge
from Existing Information
What do #TeaParty and
    #JustinBieber
   have in common
Tools: Pymongo, MongoDB
apt-get install python-dev
pip install pymongo
Get Tweets
from pymongo.connection import Connection
import sys
import tweepy

connection = Connection("localhost")

db = connection.foo

import tweepy
api = tweepy.API()
tweets = api.search('#JustinBieber', rpp=100)
for tweet in tweets:
   db.foo.save(tweet.__getstate__())
Insert TO MongoDB
from pymongo.connection import Connection
import sys
import tweepy

connection = Connection("localhost")

db = connection.foo

import tweepy
api = tweepy.API()
for num in range(1,16):
    tweets = api.search('#JustinBieber', rpp=100, page=num)
    for tweet in tweets:
       db.foo.save(tweet.__getstate__())
Count Frequency in mongo

   MAP
map = function(){
   words = this.text.split(' ');
   for ( i in words ){
      emit({ key: words[i] }, {count: 1});
   }
};
Count Frequency in mongo

   REDUCE
reduce = function (key, values) {
   var count = 0;
   values.forEach(function (v) {count += v.count;});
   return {count:count};
}
Count Frequency in mongo

   EXECUTE
res = db.foo.mapReduce( map, reduce, {out: "mystring"});
Count Frequency in mongo

   RESULT
{ "_id" : { "key" : "#1000ADay" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#1000aday" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#500ADay" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#500aday" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#AutoFollow" }, "value" : { "count" : 1 } }
{ "_id" : { "key" : "#Bieber" }, "value" : { "count" : 1 } }
Get From MongoDB
from pymongo.connection import Connection
import sys
import tweepy

connection = Connection("localhost")

db = connection.foo

cursor = db.mystring.find()
for d in cursor:
   print d
What Entities Co-Occur
  Most Often with
  #JustinBieber and
 #TeaParty Tweets?
intersection
import sys
from sets import Set

if __name__=='__main__':
   r1 = open( sys.argv[1] )
   r2 = open( sys.argv[2] )

   s1 = Set()
   s2 = Set()
   for line in r1.readlines():
       key = line.split()
       if( len(key) > 0 ):
           s1.add(key[0])

   for line in r2.readlines():
      key = line.split()
      if( len(key) > 0 ):
          s2.add(key[0])

   s3 = s1.intersection(s2)
   print len(s1)
   print len(s2)
   print len(s3)
On Average, Do #JustinBieber
 or #TeaParty Tweets Have
      More Hashtags?
Which Get Retweeted More
 Often: #JustinBieber or
       #TeaParty?
How Much Overlap Exists
Between the Entities of
    #TeaParty and
 #JustinBieber Tweet?
Thank You!

More Related Content

Viewers also liked

Viewers also liked (20)

Twemproxy flow
Twemproxy flowTwemproxy flow
Twemproxy flow
 
Opensource sw day
Opensource sw dayOpensource sw day
Opensource sw day
 
Redis sentinelinternals deview
Redis sentinelinternals deviewRedis sentinelinternals deview
Redis sentinelinternals deview
 
No sql preview
No sql previewNo sql preview
No sql preview
 
Guid python
Guid pythonGuid python
Guid python
 
Mongo db 잡학상식
Mongo db 잡학상식Mongo db 잡학상식
Mongo db 잡학상식
 
Libtcc and gwan
Libtcc and gwanLibtcc and gwan
Libtcc and gwan
 
Python andselenium
Python andseleniumPython andselenium
Python andselenium
 
To become Open Source Contributor
To become Open Source ContributorTo become Open Source Contributor
To become Open Source Contributor
 
Hide method
Hide methodHide method
Hide method
 
Facebook fql and tweepy
Facebook fql and tweepyFacebook fql and tweepy
Facebook fql and tweepy
 
Libcloud
LibcloudLibcloud
Libcloud
 
Elastic webservice
Elastic webserviceElastic webservice
Elastic webservice
 
Cell architecture
Cell architectureCell architecture
Cell architecture
 
OpenSource Contributor
OpenSource ContributorOpenSource Contributor
OpenSource Contributor
 
Superficial mongo db
Superficial mongo dbSuperficial mongo db
Superficial mongo db
 
Facebook f8 seoul
Facebook f8 seoulFacebook f8 seoul
Facebook f8 seoul
 
Twitter6
Twitter6Twitter6
Twitter6
 
50 tips & tricks for mongo db developers
50 tips & tricks for mongo db developers50 tips & tricks for mongo db developers
50 tips & tricks for mongo db developers
 
Libcloud and j clouds
Libcloud and j cloudsLibcloud and j clouds
Libcloud and j clouds
 

Similar to Twitter6

Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook
Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook
Artur Rodrigues
 
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)MongoSF
 
Basic NLP with Python and NLTK
Basic NLP with Python and NLTKBasic NLP with Python and NLTK
Basic NLP with Python and NLTKFrancesco Bruni
 
Go vs C++ - CppRussia 2019 Piter BoF
Go vs C++ - CppRussia 2019 Piter BoFGo vs C++ - CppRussia 2019 Piter BoF
Go vs C++ - CppRussia 2019 Piter BoFTimur Safin
 
Python postgre sql a wonderful wedding
Python postgre sql   a wonderful weddingPython postgre sql   a wonderful wedding
Python postgre sql a wonderful weddingStéphane Wirtel
 
CoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love AffairCoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love AffairMark
 
Small pieces loosely joined
Small pieces loosely joinedSmall pieces loosely joined
Small pieces loosely joinedennui2342
 
Mp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook gameMp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook gameMontreal Python
 
Frsa
FrsaFrsa
Frsa_111
 
WordPress Realtime - WordCamp São Paulo 2015
WordPress Realtime - WordCamp São Paulo 2015WordPress Realtime - WordCamp São Paulo 2015
WordPress Realtime - WordCamp São Paulo 2015Fernando Daciuk
 
Advanced Web Technology ass.pdf
Advanced Web Technology ass.pdfAdvanced Web Technology ass.pdf
Advanced Web Technology ass.pdfsimenehanmut
 
Parallel Computing With Dask - PyDays 2017
Parallel Computing With Dask - PyDays 2017Parallel Computing With Dask - PyDays 2017
Parallel Computing With Dask - PyDays 2017Christian Aichinger
 
Using the following code Install Packages pip install .pdf
Using the following code Install Packages   pip install .pdfUsing the following code Install Packages   pip install .pdf
Using the following code Install Packages pip install .pdfpicscamshoppe
 

Similar to Twitter6 (20)

Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook
Python na Infraestrutura 
MySQL do Facebook

Python na Infraestrutura 
MySQL do Facebook

 
python codes
python codespython codes
python codes
 
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
Map/reduce, geospatial indexing, and other cool features (Kristina Chodorow)
 
Procesamiento del lenguaje natural con python
Procesamiento del lenguaje natural con pythonProcesamiento del lenguaje natural con python
Procesamiento del lenguaje natural con python
 
ECE-PYTHON.docx
ECE-PYTHON.docxECE-PYTHON.docx
ECE-PYTHON.docx
 
Basic NLP with Python and NLTK
Basic NLP with Python and NLTKBasic NLP with Python and NLTK
Basic NLP with Python and NLTK
 
Go vs C++ - CppRussia 2019 Piter BoF
Go vs C++ - CppRussia 2019 Piter BoFGo vs C++ - CppRussia 2019 Piter BoF
Go vs C++ - CppRussia 2019 Piter BoF
 
Python postgre sql a wonderful wedding
Python postgre sql   a wonderful weddingPython postgre sql   a wonderful wedding
Python postgre sql a wonderful wedding
 
CoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love AffairCoffeeScript - A Rubyist's Love Affair
CoffeeScript - A Rubyist's Love Affair
 
Small pieces loosely joined
Small pieces loosely joinedSmall pieces loosely joined
Small pieces loosely joined
 
Mp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook gameMp24: The Bachelor, a facebook game
Mp24: The Bachelor, a facebook game
 
Frsa
FrsaFrsa
Frsa
 
WordPress Realtime - WordCamp São Paulo 2015
WordPress Realtime - WordCamp São Paulo 2015WordPress Realtime - WordCamp São Paulo 2015
WordPress Realtime - WordCamp São Paulo 2015
 
PyLecture2 -NetworkX-
PyLecture2 -NetworkX-PyLecture2 -NetworkX-
PyLecture2 -NetworkX-
 
Python slide
Python slidePython slide
Python slide
 
Progr2
Progr2Progr2
Progr2
 
Advanced Web Technology ass.pdf
Advanced Web Technology ass.pdfAdvanced Web Technology ass.pdf
Advanced Web Technology ass.pdf
 
Parallel Computing With Dask - PyDays 2017
Parallel Computing With Dask - PyDays 2017Parallel Computing With Dask - PyDays 2017
Parallel Computing With Dask - PyDays 2017
 
Using the following code Install Packages pip install .pdf
Using the following code Install Packages   pip install .pdfUsing the following code Install Packages   pip install .pdf
Using the following code Install Packages pip install .pdf
 
Python basic
Python basicPython basic
Python basic
 

More from DaeMyung Kang

How to use redis well
How to use redis wellHow to use redis well
How to use redis wellDaeMyung Kang
 
The easiest consistent hashing
The easiest consistent hashingThe easiest consistent hashing
The easiest consistent hashingDaeMyung Kang
 
How to name a cache key
How to name a cache keyHow to name a cache key
How to name a cache keyDaeMyung Kang
 
Integration between Filebeat and logstash
Integration between Filebeat and logstash Integration between Filebeat and logstash
Integration between Filebeat and logstash DaeMyung Kang
 
How to build massive service for advance
How to build massive service for advanceHow to build massive service for advance
How to build massive service for advanceDaeMyung Kang
 
Massive service basic
Massive service basicMassive service basic
Massive service basicDaeMyung Kang
 
Data Engineering 101
Data Engineering 101Data Engineering 101
Data Engineering 101DaeMyung Kang
 
How To Become Better Engineer
How To Become Better EngineerHow To Become Better Engineer
How To Become Better EngineerDaeMyung Kang
 
Kafka timestamp offset_final
Kafka timestamp offset_finalKafka timestamp offset_final
Kafka timestamp offset_finalDaeMyung Kang
 
Kafka timestamp offset
Kafka timestamp offsetKafka timestamp offset
Kafka timestamp offsetDaeMyung Kang
 
Data pipeline and data lake
Data pipeline and data lakeData pipeline and data lake
Data pipeline and data lakeDaeMyung Kang
 
webservice scaling for newbie
webservice scaling for newbiewebservice scaling for newbie
webservice scaling for newbieDaeMyung Kang
 

More from DaeMyung Kang (20)

Count min sketch
Count min sketchCount min sketch
Count min sketch
 
Redis
RedisRedis
Redis
 
Ansible
AnsibleAnsible
Ansible
 
Why GUID is needed
Why GUID is neededWhy GUID is needed
Why GUID is needed
 
How to use redis well
How to use redis wellHow to use redis well
How to use redis well
 
The easiest consistent hashing
The easiest consistent hashingThe easiest consistent hashing
The easiest consistent hashing
 
How to name a cache key
How to name a cache keyHow to name a cache key
How to name a cache key
 
Integration between Filebeat and logstash
Integration between Filebeat and logstash Integration between Filebeat and logstash
Integration between Filebeat and logstash
 
How to build massive service for advance
How to build massive service for advanceHow to build massive service for advance
How to build massive service for advance
 
Massive service basic
Massive service basicMassive service basic
Massive service basic
 
Data Engineering 101
Data Engineering 101Data Engineering 101
Data Engineering 101
 
How To Become Better Engineer
How To Become Better EngineerHow To Become Better Engineer
How To Become Better Engineer
 
Kafka timestamp offset_final
Kafka timestamp offset_finalKafka timestamp offset_final
Kafka timestamp offset_final
 
Kafka timestamp offset
Kafka timestamp offsetKafka timestamp offset
Kafka timestamp offset
 
Data pipeline and data lake
Data pipeline and data lakeData pipeline and data lake
Data pipeline and data lake
 
Redis acl
Redis aclRedis acl
Redis acl
 
Coffee store
Coffee storeCoffee store
Coffee store
 
Scalable webservice
Scalable webserviceScalable webservice
Scalable webservice
 
Number system
Number systemNumber system
Number system
 
webservice scaling for newbie
webservice scaling for newbiewebservice scaling for newbie
webservice scaling for newbie
 

Recently uploaded

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Recently uploaded (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Twitter6

  • 1. Twitter: The Tweet, the Whole Tweet, and Nothing but the Tweet #2 charsyam@naver.com
  • 4. Discover New Knowledge from Existing Information
  • 5. What do #TeaParty and #JustinBieber have in common
  • 6. Tools: Pymongo, MongoDB apt-get install python-dev pip install pymongo
  • 7. Get Tweets from pymongo.connection import Connection import sys import tweepy connection = Connection("localhost") db = connection.foo import tweepy api = tweepy.API() tweets = api.search('#JustinBieber', rpp=100) for tweet in tweets: db.foo.save(tweet.__getstate__())
  • 8. Insert TO MongoDB from pymongo.connection import Connection import sys import tweepy connection = Connection("localhost") db = connection.foo import tweepy api = tweepy.API() for num in range(1,16): tweets = api.search('#JustinBieber', rpp=100, page=num) for tweet in tweets: db.foo.save(tweet.__getstate__())
  • 9. Count Frequency in mongo MAP map = function(){ words = this.text.split(' '); for ( i in words ){ emit({ key: words[i] }, {count: 1}); } };
  • 10. Count Frequency in mongo REDUCE reduce = function (key, values) { var count = 0; values.forEach(function (v) {count += v.count;}); return {count:count}; }
  • 11. Count Frequency in mongo EXECUTE res = db.foo.mapReduce( map, reduce, {out: "mystring"});
  • 12. Count Frequency in mongo RESULT { "_id" : { "key" : "#1000ADay" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#1000aday" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#500ADay" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#500aday" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#AutoFollow" }, "value" : { "count" : 1 } } { "_id" : { "key" : "#Bieber" }, "value" : { "count" : 1 } }
  • 13. Get From MongoDB from pymongo.connection import Connection import sys import tweepy connection = Connection("localhost") db = connection.foo cursor = db.mystring.find() for d in cursor: print d
  • 14. What Entities Co-Occur Most Often with #JustinBieber and #TeaParty Tweets?
  • 15. intersection import sys from sets import Set if __name__=='__main__': r1 = open( sys.argv[1] ) r2 = open( sys.argv[2] ) s1 = Set() s2 = Set() for line in r1.readlines(): key = line.split() if( len(key) > 0 ): s1.add(key[0]) for line in r2.readlines(): key = line.split() if( len(key) > 0 ): s2.add(key[0]) s3 = s1.intersection(s2) print len(s1) print len(s2) print len(s3)
  • 16. On Average, Do #JustinBieber or #TeaParty Tweets Have More Hashtags?
  • 17. Which Get Retweeted More Often: #JustinBieber or #TeaParty?
  • 18. How Much Overlap Exists Between the Entities of #TeaParty and #JustinBieber Tweet?