Lecture @ International Hellenic University
Thessaloniki, 8 May 2014
Social Media Crawling and Mining
Overview of Hands-on Workshop
Symeon (Akis) Papadopoulos, Manos Schinas, Katerina Iliakopoulou,
Yiannis Kompatsiaris
Information Technologies Institute (ITI)
Centre for Research & Technologies Hellas (CERTH)
IHU SocialSensor Seminar – May 2014 CERTH-ITI
Streams Manager
Supports search by:
1. Keywords
2. Users
3. Locations
Supports storage to:
1. MongoDB
2. Solr
Supports retrieval from:
1. Twitter
2. Facebook
3. Google+, etc.
Configuration files: input.conf.xml, streams.conf.xml
Streams Manager
How to run:
java -jar StreamsManager.jar streams.conf.xml input.conf.xml
Items, MediaItems and StreamUsers
Item class
Basic fields:
String id
String title
String[] tags
long publicationTime
String uid
String reference
String referenceUserId
String[] mentions
MediaItem class
Basic fields:
String id
String title
String[] tags
long publicationTime
String uid
String reference
Items, MediaItems and StreamUsers
StreamUser class
Basic fields:
String id
String username
String url
int items
long followers
long friends
Getters / Setters for each field
MongoDB – Import Data
mongoimport -h localhost -d Snow14 -c Items --file ../../Items
mongoimport -h localhost -d Snow14 -c MediaItems --file ../../MediaItems
MongoDB – Direct Queries
1. Find an Item by its id
db.Items.find({"id" : "Twitter#438612090748416"})
2. Find all Items posted before a certain date
db.Items.find({"publicationTime" : {$lt : 1393408367000}})
3. Find a Media Item by its reference
db.MediaItems.find({"reference" : "Twitter#438612090748416"})
4. Find all Users with at least 1000 followers
db.StreamUsers.find({"followers" : {$gte : 1000}})
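The value 1393408367000 in query 2 looks like a Unix timestamp in milliseconds; this is an assumption based on the `long publicationTime` field, not something the slides state. Under that assumption, a cutoff date could be converted to the number used in the query with the standard java.time API. The class and method names below are illustrative only.

```java
import java.time.Instant;

// Illustrative helper; assumes publicationTime holds epoch milliseconds (UTC).
final class PublicationTimeSketch {

    // Converts an ISO-8601 UTC instant such as "2014-02-26T09:52:47Z" to epoch milliseconds.
    static long toEpochMillis(String isoInstant) {
        return Instant.parse(isoInstant).toEpochMilli();
    }

    // Converts epoch milliseconds back to an ISO-8601 UTC string for inspection.
    static String toIso(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        long cutoff = toEpochMillis("2014-02-26T09:52:47Z");
        // The resulting number can be pasted into a shell query such as
        // db.Items.find({"publicationTime" : {$lt : <cutoff>}})
        System.out.println(cutoff + " <-> " + toIso(cutoff));
    }
}
```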
MongoDB – Query using DAO classes
1. Create an instance of ItemDAO to retrieve items
ItemDAO itemDAO = new ItemDAOImpl("localhost", "Snow14", "Items");
2. Create an instance of MediaItemDAO to retrieve media items
MediaItemDAO mediaItemDAO =
new MediaItemDAOImpl("localhost", "Snow14", "MediaItems");
3. Create an instance of StreamUserDAO to retrieve users
StreamUserDAO userDAO =
new StreamUserDAOImpl("localhost", "Snow14", "StreamUsers");
MongoDB – Query using DAO classes
1. Find an Item by its id
itemDAO.getItem("Twitter#438612090748416");
2. Find a Media Item by its reference
List<String> items = new ArrayList<String>();
items.add("Twitter#438612090748416");
mediaItemDAO.getMediaItemsForItems(items, image, 20);
3. Find the 1000 latest Items
itemDAO.getLatestItems(1000);
MongoDB – Generic queries & Iteration
Use the BasicDBObject class to represent JSON objects,
e.g. {"id" : "Twitter#1234567"} ->
BasicDBObject query = new BasicDBObject("id", "Twitter#1234567");
List<Item> items = itemDAO.getItems(query);
To iterate:
ItemIterator it = itemDAO.getIterator(query);
Use the methods hasNext() and next() to iterate over
the collection of Items.
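The hasNext()/next() pattern mentioned above is the same one used by the standard java.util.Iterator. The sketch below uses a plain Iterator as a stand-in for ItemIterator, whose exact interface is not shown on the slides, so the types here are assumptions rather than the SocialSensor API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in example: a java.util.Iterator plays the role of ItemIterator.
final class IterationSketch {

    // Drains an iterator into a list, one element at a time.
    static <T> List<T> collect(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {      // true while unread elements remain
            out.add(it.next());     // returns the next element and advances
        }
        return out;
    }

    public static void main(String[] args) {
        Iterator<String> ids = Arrays.asList("Twitter#1", "Twitter#2").iterator();
        System.out.println(collect(ids));
    }
}
```

Iterating this way keeps only one element in hand at a time, which may matter when a query matches more Items than fit comfortably in a single list.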
Solr – Query using SocialSensor wrappers
1. Create an instance of SolrItemHandler to index and retrieve
items
SolrItemHandler itemHandler =
SolrItemHandler.getInstance(
"http://localhost:8080/solr/Items");
2. Create an instance of SolrMediaItemHandler to index and
retrieve media items
SolrMediaItemHandler mediaItemHandler =
SolrMediaItemHandler.getInstance(
"http://localhost:8080/solr/MediaItems");
Solr – Use of UI and SocialSensor wrappers
Assignment #1
Index all the items from MongoDB to Solr
Fill in the method eu.socialsensor.ihu_workshop.indexItems
Assignment #2
Run the following queries to get relevant Items
Q1: terror attack
Q2: Crimea
Q3: Bitcoin
Basic Social Media Analytics
Assignment #1
1. Find the N most frequent hashtags in a list of Items
   1. Process all items in the list one by one
   2. Create a map of all detected hashtags and their number of
      occurrences
   3. Select the N hashtags with the highest counts
2. Find the N most frequent terms in a list of Items using
   tokenization
3. Find the N most retweeted tweets in the dataset
   1. Process all items in the collection one by one
   2. Create a map from each item (item id) to its number of retweets
   3. Select the N items with the highest values
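The counting steps for task 1 can be sketched in plain Java as follows. This is a minimal sketch, not the workshop solution: it assumes each Item's hashtags are available as the `String[] tags` array listed earlier, it lower-cases tags before counting, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; input is one String[] of tags per item.
final class TopHashtagsSketch {

    // Steps 1 and 2: walk every item's tags and count occurrences (case-insensitive).
    static Map<String, Integer> countTags(List<String[]> tagsPerItem) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] tags : tagsPerItem) {
            if (tags == null) continue;            // items without tags are skipped
            for (String tag : tags) {
                counts.merge(tag.toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Step 3: sort by count (descending), break ties alphabetically, keep the first n keys.
    static List<String> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        List<String[]> tags = Arrays.asList(
                new String[] {"crimea", "bitcoin"},
                new String[] {"Crimea"},
                new String[] {"ukraine", "crimea"});
        System.out.println(topN(countTags(tags), 2));
    }
}
```

The same topN helper could serve task 3 if the map goes from item id to retweet count instead of from hashtag to occurrences.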
Basic Social Media Analytics
Assignment #1
4. Find N top users based on:
a) Number of posted items
b) Aggregated number of retweets
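One way to approach both rankings is to aggregate per user id and then sort. The sketch below is an illustration under assumptions: the Post class is a hypothetical minimal view of an item, and its retweets field is assumed, since no retweet counter appears among the basic Item fields listed earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TopUsersSketch {

    // Hypothetical minimal view of an item: author id plus an assumed retweet count.
    static final class Post {
        final String uid;
        final long retweets;
        Post(String uid, long retweets) { this.uid = uid; this.retweets = retweets; }
    }

    // (a) Number of posted items per user.
    static Map<String, Long> itemsPerUser(List<Post> posts) {
        Map<String, Long> totals = new HashMap<>();
        for (Post p : posts) totals.merge(p.uid, 1L, Long::sum);
        return totals;
    }

    // (b) Retweets summed over all items of each user.
    static Map<String, Long> retweetsPerUser(List<Post> posts) {
        Map<String, Long> totals = new HashMap<>();
        for (Post p : posts) totals.merge(p.uid, p.retweets, Long::sum);
        return totals;
    }

    // Returns the n user ids with the highest score; ties are broken by user id.
    static List<String> topN(Map<String, Long> scores, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(scores.entrySet());
        entries.sort((a, b) -> {
            int byScore = Long.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        List<Post> posts = Arrays.asList(
                new Post("Twitter#1", 5), new Post("Twitter#2", 12), new Post("Twitter#1", 1));
        System.out.println("Most active: " + topN(itemsPerUser(posts), 1));
        System.out.println("Most retweeted: " + topN(retweetsPerUser(posts), 1));
    }
}
```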
Basic Social Media Analytics
Assignment #1
5. Create an activity timeline for the tweets in the dataset
and for the set of original tweets
6. Create the timeline of the tweets that contain a hashtag
(or keyword) of your choice
7. Try to visualize the timelines you have created in the
previous steps.
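An activity timeline can be approximated by counting items per fixed-width time bucket. The sketch below assumes publicationTime values are epoch milliseconds and uses a hypothetical class name; the bucket width is a free parameter, and the text bar chart in main is only a crude stand-in for a real visualization.

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

final class TimelineSketch {

    // Counts items per bucket; keys are bucket start times (epoch ms), kept in time order.
    static TreeMap<Long, Integer> timeline(long[] publicationTimes, long bucketMillis) {
        TreeMap<Long, Integer> buckets = new TreeMap<>();
        for (long t : publicationTimes) {
            long start = Math.floorDiv(t, bucketMillis) * bucketMillis; // round down to bucket start
            buckets.merge(start, 1, Integer::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long[] times = {1393408367000L, 1393408400000L, 1393412500000L};
        for (Map.Entry<Long, Integer> e : timeline(times, hour).entrySet()) {
            StringBuilder bar = new StringBuilder();
            for (int i = 0; i < e.getValue(); i++) bar.append('#');
            System.out.println(Instant.ofEpochMilli(e.getKey()) + " " + bar);
        }
    }
}
```

For the timelines of original tweets or of a chosen hashtag, the list of timestamps would be filtered before it is passed to timeline.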
Detection of Trending Topics and Events
What is a trending topic?
Keywords, n-grams, named entities, or phrases that are shared
a lot on social media during a certain period of time.
Detection of Trending Topics and Events
Assignment #2
Feature-pivot topic detection using hashtags
1. Baseline method: Split the data into timeslots of the same
length. Calculate the most frequent hashtags of each timeslot.
2. Calculate the most trending hashtags by comparing the
current frequency of a hashtag with its frequencies in the previous
timeslots.
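The slides leave the comparison in step 2 open. One simple possibility, shown here purely as an illustration, is to divide a hashtag's count in the current timeslot by its mean count over the previous timeslots, with 1 added to the denominator to avoid division by zero. The scoring formula and all names below are assumptions, not the method used in the workshop.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TrendingSketch {

    // slots: hashtag counts per timeslot, oldest first; the last element is the current slot.
    // Returns, for each hashtag of the current slot, current / (mean of previous slots + 1).
    static Map<String, Double> trendScores(List<Map<String, Integer>> slots) {
        Map<String, Double> scores = new HashMap<>();
        if (slots.isEmpty()) return scores;
        Map<String, Integer> current = slots.get(slots.size() - 1);
        int previous = slots.size() - 1;
        for (Map.Entry<String, Integer> e : current.entrySet()) {
            double sum = 0;
            for (int i = 0; i < previous; i++) {
                sum += slots.get(i).getOrDefault(e.getKey(), 0);
            }
            double mean = previous == 0 ? 0 : sum / previous;
            scores.put(e.getKey(), e.getValue() / (mean + 1.0)); // +1 smooths unseen hashtags
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Integer> earlier = new HashMap<>();
        earlier.put("bitcoin", 10);
        Map<String, Integer> now = new HashMap<>();
        now.put("bitcoin", 10);
        now.put("crimea", 8);
        System.out.println(trendScores(Arrays.asList(earlier, now)));
    }
}
```

Under this score a hashtag that is steadily frequent ranks below one that suddenly appears, which is the difference between the baseline in step 1 and the trending measure in step 2.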
Detection of Trending Topics and Events
Assignment #2
Document-pivot event detection by clustering tweets
Cluster “similar” tweets to create groups of tweets that
represent candidate events.
The similarity between two tweets could be a combination of
similarity measures across different dimensions, e.g. textual
similarity, time and space proximity, etc.
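As an illustration of such a combination, the sketch below mixes a Jaccard overlap of word sets (textual similarity) with an exponentially decaying time proximity in a weighted sum. The choice of measures, the weights, and every name in the code are assumptions made for the example; space proximity is left out, and the SocialSensor clustering may work differently.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class TweetSimilaritySketch {

    // Lower-cases the text and splits it on anything that is not a letter or digit.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) result.add(t);
        }
        return result;
    }

    // Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return (double) common.size() / union;
    }

    // 1.0 for identical timestamps, decaying towards 0 as the gap grows.
    static double timeProximity(long t1, long t2, double scaleMillis) {
        return Math.exp(-Math.abs(t1 - t2) / scaleMillis);
    }

    // Weighted sum of textual similarity and time proximity; textWeight lies in [0, 1].
    static double similarity(String text1, long t1, String text2, long t2,
                             double textWeight, double scaleMillis) {
        return textWeight * jaccard(tokens(text1), tokens(text2))
                + (1 - textWeight) * timeProximity(t1, t2, scaleMillis);
    }

    public static void main(String[] args) {
        double s = similarity("Crimea crisis now", 0L, "#Crimea crisis update", 60_000L,
                0.7, 3_600_000.0);
        System.out.println(s);
    }
}
```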
Detection of Trending Topics and Events
Assignment #2
Frequency-pivot event detection by clustering tweets
1. Run the document-pivot clustering provided by SocialSensor to
create a set of candidate events.
2. For each candidate event, find a list of representative hashtags.
3. Try to calculate a measure of “trendiness” for each event.
