mining the social web
Aristides Gionis
Michael Mathioudakis
firstname.lastname@aalto.fi
Aalto University
Spring 2015
  
social web
facebook, twitter, linkedin
foursquare, flickr, instagram
pinterest, youtube, ustream
github, stackoverflow, wikipedia
  
social web
websites and platforms that enable users to
produce content
    blog posts, ‘status’ messages, videos, pictures, podcasts
consume content
    read text - blog posts, ‘status’ messages
    listen to podcasts, watch videos
interact with each other
    comment on each other’s posts, ‘like’ or rate items
  
mining the social web
a lot of users... a lot of data...
what could we learn*?
* assuming we have the data - more on that later
gain insights into...
social behavior
    how many connections does an average person have?
    do people connect with like-minded people?
political sentiment
    what do people think about current political issues?
how we experience our cities
    what’s the best neighborhood for food/nightlife?
how we build our careers
    how often do people change careers?
    how beneficial is it to ‘network’ professionally?
other?
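The two social-behavior questions above can be made concrete on a toy graph. The sketch below is only an illustration: the edge list, the user names, and the `interest` attribute are hypothetical, and "like-minded" is approximated here by a single shared attribute, which is an assumption rather than a method from the lecture.

```python
from collections import defaultdict

# Hypothetical toy data: undirected friendship edges and one attribute per user.
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan"), ("dan", "eve")]
interest = {"ann": "jazz", "bob": "jazz", "cat": "jazz", "dan": "rock", "eve": "rock"}


def average_degree(edges):
    """Average number of connections per user that appears in at least one edge."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(degree.values()) / len(degree)


def same_attribute_fraction(edges, attribute):
    """Fraction of edges whose two endpoints share the same attribute value."""
    same = sum(1 for u, v in edges if attribute[u] == attribute[v])
    return same / len(edges)


print(average_degree(edges))
print(same_attribute_fraction(edges, interest))
```

A high same-attribute fraction alone does not show that people seek out like-minded others. It would need to be compared against a baseline, for example the fraction obtained after randomly shuffling the attribute values among users.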
  
mining the social web
there is already research that explores those questions
we will discuss some of it now and in the next two lectures
  
twitter
• a social sensor
– social network + news media
– what is happening?
– where, who?
– trends
– events
– opinions
– political views
– sentiments
– demographics
  
twitter studies
• finding news events and stories
• detecting trends
• predicting consumer behavior
• predicting stock market(!)
• disaster response
• rumor analysis and credibility assessment
• influence analysis
• political analysis
– polarization, bias of news media
• sociology studies
– sentiment vs. demographics, gender inequality
  
• photo sharing + social network
• photos contain additional information
– tags
– geolocation
– comments, favorites
– assigned to groups
  
Eric Fischer

recommend tourist itineraries
  
foursquare
• location-based social network
• users check in to different locations
• locations have types (hierarchy)
– restaurant, sport venue, museum, college, …
• questions:
– where do people hang out?
– where do events take place?
– do friends influence each other?
  
when/where people check in?
[Figure (a): hourly check-in frequency by hour of day for New York, London, Barcelona, Helsinki, and the total.]
(a) Hourly check-in frequency during the day. The activity is at its lowest around [...] a.m., and after that there are three peaks: one when people go to work in the morning, one in the middle of the day, and the last one at the end of the evening. Yet, depending on the city, these peaks do not happen at the same time, nor with the same intensity. Therefore, instead of working directly with the raw values of features, we use the number of standard deviations from the mean, or z-score.
[Figure: venues clustered by time of check-ins; time clusters in Paris; axes: hour, percentage.]
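The z-score mentioned in the caption above can be sketched as follows. This is a minimal illustration, assuming the normalization is applied per city to its hourly check-in counts. The city names and counts are hypothetical, and the population standard deviation is used here as an assumption, since the source does not say which variant the authors used.

```python
from statistics import mean, pstdev


def zscores(values):
    """Express each value as a number of standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant profile has no variation; report zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical check-in counts for four time slots in two cities.
# city_b has ten times the volume of city_a but the same daily shape.
hourly_checkins = {
    "city_a": [2, 4, 6, 8],
    "city_b": [20, 40, 60, 80],
}

normalized = {city: zscores(counts) for city, counts in hourly_checkins.items()}
for city, profile in normalized.items():
    print(city, [round(z, 3) for z in profile])
```

After normalization the two profiles coincide, which is the point of the transformation: cities with very different check-in volumes become comparable by the shape of their activity rather than by its scale.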
  
when/where people check in?
(a) Venues in Paris and Barcelona with lowest and highest user entropy.

Lowest user entropy
City       Name                       Category             Entropy
Barcelona  Castellers de Barcelona    Non-Profit           0.0139
Barcelona  Café de la Pompeu          Café                 0.0172
Barcelona  Ràdio                      Radio Station        0.0176
Paris      Boutique Orange            Electronics Store    0.0099
Paris      Métro Goncourt []          Subway               0.0105
Paris      Blue Acacia                Office               0.0112

Highest user entropy
City       Name                       Category             Entropy
Barcelona  Plaça de Catalunya         Plaza                0.5835
Barcelona  Sants Estació              Train Station        0.6298
Barcelona  Sagrada Família            Government Building  0.6309
Barcelona  Camp Nou                   Stadium              0.6852
Paris      Gare SNCF : Gare de Lyon   Train Station        0.6725
Paris      Gare SNCF : Paris Nord     Train Station        0.6911
Paris      Musée du Louvre            Museum               0.6924
Paris      Tour Eiffel                Government Building  0.7167
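The slide does not define user entropy, so the following is only one plausible reading: the Shannon entropy of how a venue's check-ins are distributed over users, divided by the logarithm of the number of distinct users so that the value falls between 0 and 1. Both the normalization and the sample user IDs are assumptions made for illustration.

```python
import math
from collections import Counter


def user_entropy(checkin_users):
    """Normalized Shannon entropy of a venue's check-ins over users.

    checkin_users holds one user ID per check-in at the venue.
    Returns 0.0 when a single user accounts for all check-ins and
    1.0 when check-ins are spread evenly over all visiting users.
    """
    counts = Counter(checkin_users)
    if len(counts) <= 1:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Dividing by log(number of distinct users) is an assumed normalization.
    return entropy / math.log(len(counts))


# Hypothetical venues: an office dominated by one regular, and a station with many visitors.
office = ["u1"] * 9 + ["u2"]
station = ["u1", "u2", "u3", "u4"]

print(round(user_entropy(office), 4))
print(round(user_entropy(station), 4))
```

Under this reading, venues visited repeatedly by the same few people (offices, a local café) score low, while venues that draw many one-time visitors (train stations, landmarks) score high, which matches the pattern in the table.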
  
data sources less obvious
traffic sensors

detecting events with traffic sensors

project ideas less obvious
  
your project
come up with a project idea
implement it!
report on your results and findings
  
types of projects
• form a hypothesis and set out to test it
– are rich people happier?
• start with an interesting question
– which are hipster neighborhoods in my city?
• start with a business idea
– recommend relevant music to music listeners
– recommend clothes to music listeners
• start with a problem that you (think) can solve
– how to identify trends in space and time?
• start with a cool dataset and explore it
  
your project
analyze data
set a goal for your project
    (what’s the question you want to answer)
study related literature
    (what has / hasn’t been done already? or you think you can do it better)
collect data
    (some data are more difficult to come by)
results
evaluation
    (have you answered the question asked originally? possible improvements? future work?)
  
coming up with a project idea
• conferences: SIGKDD, ICWSM, WWW, WSDM
• themes
– urban computing, trend / event detection, social networks, political sentiment, privacy
– other
• google scholar
• talk with us
office hours: Mon, 14:15-15:30 and by appointment
  
collecting the data
• what data are available?
– different platforms share different data about their users’ activity
– browse dev sites of social networks and find out about privacy policies and APIs
– browse public data repositories
– the data mining group has data for blog posts, twitter, google+, facebook, foursquare
• code
Mining the Social Web (github)
https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition
  
schedule
• Today: overview
• February 2nd: discuss literature (Aris)
• February 9th: discuss literature (Michael)
• February 16th and 23rd: present project proposals
• March 30th: students submit progress report
• March 30th and April 6th: intermediate presentations
• May 4th and May 11th: final presentations
• May 15th: final report due
  
final report
• introduction
• related work
• problem statement
• proposed technique (algorithms)
• data description
• empirical evaluation
– results
– comparison with state of the art
• future work
  
grading
• originality (has it been done before)
• potential impact (how interesting it is and why)
• rigorousness of proposed technique
• reproducibility (public code)
• presentation
• teams of 2 are encouraged
• presentations and reports are required
• surveys of existing techniques are ok, too
  
  
until then...
browse literature
see papers posted on noppa for a sample
conferences KDD, ICWSM, WWW, WSDM
google scholar
dev websites, for example...
https://dev.twitter.com, https://developers.facebook.com,
https://developer.github.com/, https://developer.foursquare.com
code samples,
https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition
data repositories,
http://snap.stanford.edu/, http://icwsm.org/2013/datasets/datasets/,
http://wadam-data.dis.uniroma1.it
and talk to us!
  
see you next week!
Aristides Gionis
Michael Mathioudakis
contact: firstname.lastname@aalto.fi
Office Hours: Mon, 14:15-15:30 and by appointment
  

Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 

Mining Social Media Data for Insights

  • 1. mining the social web. Aristides Gionis, Michael Mathioudakis (firstname.lastname@aalto.fi), Aalto University, Spring 2015
  • 2. social web: facebook, twitter, linkedin, foursquare, flickr, instagram, pinterest, youtube, ustream, github, stackoverflow, wikipedia
  • 3. social web: websites and platforms that enable users to
      – produce content: blog posts, ‘status’ messages, videos, pictures, podcasts
      – consume content: read text (blog posts, ‘status’ messages), listen to podcasts, watch videos
      – interact with each other: comment on each other’s posts, ‘like’ or rate items
  • 4. mining the social web: a lot of users... a lot of data... what could we learn*? (* assuming we have the data; more on that later)
      gain insights into...
      – social behavior: how many connections does an average person have? do people connect with like-minded people?
      – political sentiment: what do people think about current political issues?
      – how we experience our cities: what’s the best neighborhood for food/nightlife?
      – how we build our careers: how often do people change careers? how beneficial is it to ‘network’ professionally?
      – other?
  • 5. mining the social web: there is already research that explores those questions; we will discuss some of it now and in the next two lectures
  • 6. twitter
      •  a social sensor
          – social network + news media
          – what is happening?
          – where, who?
          – trends
          – events
          – opinions
          – political views
          – sentiments
          – demographics
  • 7. twitter studies
      •  finding news events and stories
      •  detecting trends
      •  predicting consumer behavior
      •  predicting stock market(!)
      •  disaster response
      •  rumor analysis and credibility assessment
      •  influence analysis
      •  political analysis
          – polarization, bias of news media
      •  sociology studies
          – sentiment vs. demographics, gender inequality
  • 8.
      •  photo sharing + social network
      •  photos contain additional information
          – tags
          – geolocation
          – comments, favorites
          – assigned to groups
  • 12. foursquare
      •  location-based social network
      •  users check in to different locations
      •  locations have types (hierarchy)
          – restaurant, sport venue, museum, college, …
      •  questions:
          – where do people hang out?
          – where do events take place?
          – do friends influence each other?
  • 13. when/where people check in?
      [Figure (a): Hourly check-in frequency during the day, for New-York, London, Barcelona, Helsinki and Total.]
      The activity is at its lowest around a.m. and after that, there are three peaks: one when people go to work in the morning, one in the middle of the day and the last one at the end of the evening. Yet, depending on the city, these peaks do not happen at the same time, nor with the same intensity. Therefore, instead of working directly with the raw values of features, we use the number of standard deviations, or z-score.
      [Figure: Venues clustered by time of check-ins (time clusters in Paris; axes: hour, percentage).]
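The z-score idea on this slide can be illustrated with a minimal sketch. It assumes the feature is a list of 24 hourly check-in counts per city and that each city is standardized against its own mean and standard deviation; the slide does not say exactly which features are standardized or how. The function name `hourly_zscores` and the counts below are hypothetical.

```python
from statistics import mean, pstdev


def hourly_zscores(counts):
    """Return one z-score per hour: (count - mean) / standard deviation.

    Assumption: `counts` holds hourly check-in counts for a single city.
    """
    mu = mean(counts)
    sigma = pstdev(counts)  # population standard deviation
    if sigma == 0:
        # A perfectly flat profile has no peaks; avoid dividing by zero.
        return [0.0 for _ in counts]
    return [(c - mu) / sigma for c in counts]


# Hypothetical, made-up hourly counts (not data from the slides).
small_city = [2, 1, 1, 1, 2, 4, 9, 14, 12, 8, 7, 10,
              15, 13, 8, 7, 8, 11, 16, 18, 15, 10, 6, 3]
# A city with the same daily rhythm but twice the activity.
large_city = [2 * c for c in small_city]

z_small = hourly_zscores(small_city)
z_large = hourly_zscores(large_city)
```

In this sketch the two cities differ only in overall volume, so their z-score profiles coincide, which is the point of not comparing raw counts.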
  • 14. when/where people check in?
      City       Name                       Category             Entropy
      Barcelona  Castellers de Barcelona    Non-Profit           0.0139
      Barcelona  Café de la Pompeu          Café                 0.0172
      Barcelona  Ràdio                      Radio Station        0.0176
      Paris      Boutique Orange            Electronics Store    0.0099
      Paris      Métro Goncourt []          Subway               0.0105
      Paris      Blue Acacia                Office               0.0112
      Barcelona  Plaça de Catalunya         Plaza                0.5835
      Barcelona  Sants Estació              Train Station        0.6298
      Barcelona  Sagrada Família            Government Building  0.6309
      Barcelona  Camp Nou                   Stadium              0.6852
      Paris      Gare SNCF : Gare de Lyon   Train Station        0.6725
      Paris      Gare SNCF : Paris Nord     Train Station        0.6911
      Paris      Musée du Louvre            Museum               0.6924
      Paris      Tour Eiffel                Government Building  0.7167
      (a) Venues in Paris and Barcelona with lowest and highest user entropy.
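The slide does not define user entropy, so the following is only a sketch under an assumption: it takes the Shannon entropy of how a venue's check-ins are distributed over users, optionally divided by the log of the number of distinct users. The normalization behind the table values may differ. The function name `user_entropy` and the user ids are hypothetical.

```python
import math
from collections import Counter


def user_entropy(checkin_user_ids, normalize=True):
    """Shannon entropy of the per-user share of a venue's check-ins.

    Assumption: one list entry per check-in, holding the user's id.
    Low values mean a few users dominate; high values mean many
    users contribute roughly equally.
    """
    counts = Counter(checkin_user_ids)
    if len(counts) < 2:
        # No check-ins, or a single user: no uncertainty about who checks in.
        return 0.0
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    if normalize:
        # Assumed normalization: divide by the maximum possible entropy.
        entropy /= math.log(len(counts))
    return entropy


# Hypothetical check-in logs (not data from the slides).
office = ["alice"] * 48 + ["bob", "carol"]        # dominated by one regular
station = ["u%d" % i for i in range(50)]          # 50 different visitors

office_entropy = user_entropy(office)
station_entropy = user_entropy(station)
```

Under this assumed definition, an office visited mostly by one regular scores low and a station visited once each by many people scores high, matching the pattern in the table.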
  • 15. data sources (less obvious): traffic sensors
  • 16. detecting events with traffic sensors
  • 17. project ideas (less obvious)
  • 18. your project: come up with a project idea; implement it! report on your results and findings
  • 19. types of projects
      •  form a hypothesis and set out to test it
          – are rich people happier?
      •  start with an interesting question
          – which are hipster neighborhoods in my city?
      •  start with a business idea
          – recommend relevant music to music listeners
          – recommend clothes to music listeners
      •  start with a problem that you (think) can solve
          – how to identify trends in space and time?
      •  start with a cool dataset and explore it
  • 20. your project
      1  set a goal for your project (what’s the question you want to answer)
      2  study related literature (what has / hasn’t been done already? or you think you can do it better)
      3  collect data (some data are more difficult to come by)
      4  analyze data
      5  results
      6  evaluation (have you answered the question asked originally? possible improvements? future work?)
  • 21. coming up with a project idea
      •  conferences: SIGKDD, ICWSM, WWW, WSDM
      •  themes
          – urban computing, trend / event detection, social networks, political sentiment, privacy
          – other
      •  google scholar
      •  talk with us
          office hours: Mon, 14:15-15:30 and by appointment
  • 22. collecting the data
      •  what data are available?
          – different platforms share different data about their users’ activity
          – browse dev sites of social networks; find out about privacy policies and APIs
          – browse public data repositories
          – the data mining group has data for blog posts, twitter, google+, facebook, foursquare
      •  code
          Mining the Social Web (github)
          https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition
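As a small illustration of working with data from a public repository, the sketch below computes the average number of connections per person from an edge list. It assumes a plain-text format with one whitespace-separated pair of node ids per line and `#` comment lines, which many public graph datasets use, and it treats every edge as undirected. The function name `average_degree` and the toy edges are hypothetical.

```python
from collections import defaultdict


def average_degree(edge_lines):
    """Average number of distinct neighbors per node in an edge list.

    Assumptions: each non-comment line holds two node ids separated by
    whitespace, and connections are treated as undirected.
    """
    neighbors = defaultdict(set)
    for line in edge_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        u, v = line.split()[:2]
        if u == v:
            continue  # ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)
    if not neighbors:
        return 0.0
    return sum(len(n) for n in neighbors.values()) / len(neighbors)


# Hypothetical toy edge list (not a real dataset).
sample_lines = """# FromNodeId\tToNodeId
1\t2
1\t3
2\t3
3\t4
""".splitlines()

avg = average_degree(sample_lines)
```

For a real file, the same function can be given an open file object, since it only iterates over lines.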
  • 23. schedule
      •  Today: overview
      •  February 2nd: discuss literature (Aris)
      •  February 9th: discuss literature (Michael)
      •  February 16th and 23rd: present project proposals
      •  March 30th: students submit progress report
      •  March 30th and April 6th: intermediate presentations
      •  May 4th and May 11th: final presentations
      •  May 15th: final report due
  • 24. final report
      •  introduction
      •  related work
      •  problem statement
      •  proposed technique (algorithms)
      •  data description
      •  empirical evaluation
          – results
          – comparison with state of the art
      •  future work
  • 25. grading
      •  originality (has it been done before)
      •  potential impact (how interesting it is and why)
      •  rigorousness of proposed technique
      •  reproducibility (public code)
      •  presentation
      •  teams of 2 are encouraged
      •  presentations and reports are required
      •  surveys of existing techniques are ok, too
  • 26. schedule
      •  Today: overview
      •  February 2nd: discuss literature (Aris)
      •  February 9th: discuss literature (Michael)
      •  February 16th and 23rd: students present project proposals
      •  March 30th: students submit progress report
      •  March 30th and April 6th: intermediate presentations
      •  May 4th and May 11th: final presentations
      •  May 15th: final report due
  • 27. until then...
      browse literature
          see papers posted on noppa for a sample
          conferences: KDD, ICWSM, WWW, WSDM
          google scholar
      dev websites, for example...
          https://dev.twitter.com, https://developers.facebook.com, https://developer.github.com/, https://developer.foursquare.com
      code samples,
          https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition
      data repositories,
          http://snap.stanford.edu/, http://icwsm.org/2013/datasets/datasets/, http://wadam-data.dis.uniroma1.it
      and talk to us!
  • 28. see you next week! Aristides Gionis, Michael Mathioudakis. contact: firstname.lastname@aalto.fi. Office Hours: Mon, 14:15-15:30 and by appointment