Towards Context-Aware Search and Analysis on Social Media Data
Leon Derczynski
Bin Yang 杨彬
Christian S. Jensen
Evolution of communication

Functional utterances

Vowels

Velar closure: consonants

Speech

New modality: writing
Digital text

E-mail

Social media

   → Increased machine-readable information
Social Media = Big Data
Gartner "3V" definition:

1. Volume

2. Velocity

3. Variety

High volume & velocity of messages:

   Twitter has    ~200 000 000 users per month
   They write     ~500 000 000 messages per day

Massive variety:
  Stock markets;
  Earthquakes;
  Social arrangements;
  … Bieber
What is machine-readable now?
Messages now contain

-   not only linguistic content

-   but also:
       Links (e.g. URI)
       Topic markers (e.g. hashtags)
       Meta-information

What kind of meta-information?

    User profile (including home location)
    Images
    Messages replied to
    Message language

    Time of message
    Location of message
What resources do we have now?


Large, content-rich, linked, digital streams of human communication

We transfer knowledge via communication

Sampling communication gives a sample of human knowledge


          "You've only done that which you can communicate"


The metadata (time – place – imagery) gives a richer resource:


      → A sampling of human behaviour
What can we do with this resource?
Context increases the data's richness

Increased richness enables novel applications

Time and Place are interesting parts of message context




1. What kinds of applications are there?

2. What are the practical challenges?
Temporal Context
Messages have timestamps

Two temporal retrieval scenarios:

      1. Historical analyses

      2. Emerging data
Historical search
Ability to retrieve from archives: longitudinal query mode [0]

Retrieve information on:

      ●   Lifecycles of socially connected groups

      ●   Precursors to events, analysed post-hoc





0. Weikum et al. 2011: "Longitudinal analytics on web archive data: It’s about time", Proc. CIDR
Historical search
Retrospective analysis of cause and effect




                                     "There's a dead crow in my garden"



Social media mentions of dead crows predict West Nile Virus (WNV) cases in humans [1]




1. Sugumaran & Voss 2012: "Real-time spatio-temporal analysis of West Nile Virus using Twitter Data", Proc. Int'l Conference on Computing for Geospatial Research and Applications
Emerging search
Data emerging at high velocity:

      185 000 documents per minute

Gives a high temporal density




Search over this info enables:

      ●   Live coverage of events

      ●   Real-time identification of emerging events [2]



2. Cohen et al. 2011: "Computational journalism: A call to arms to database researchers", Proc. CIDR
Temporal indexing
What are our requirements?

   ●   High-frequency document creation

   ●   Temporal cross-sections of varying size

   ●   Time-sensitive TF-IDF: stopwords are fluid, since which terms are common changes over time



How can we do this? Open challenge:

   ●   Tree-based indexes are hard to distribute

   ●   Maybe with adaptive multi-resolution grids?
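The "stopwords are fluid" requirement can be made concrete by keeping document frequencies per time window: a term that floods one window (say, during an earthquake) gets a low IDF there but keeps a high IDF elsewhere. The sketch below is a minimal, single-machine illustration under assumed choices (hourly buckets, whitespace tokenisation, a scikit-learn-style smoothed IDF); `TimeBucketedIndex` is a hypothetical name, and the sketch does not address the distribution problem raised above.

```python
import math
from collections import defaultdict


def bucket_of(timestamp: int, width: int) -> int:
    """Map a Unix timestamp (seconds) to a fixed-width time bucket."""
    return timestamp // width


class TimeBucketedIndex:
    """Hypothetical document-frequency store partitioned by time bucket."""

    def __init__(self, width: int = 3600):
        self.width = width  # bucket width in seconds (assumed: one hour)
        self.doc_count = defaultdict(int)  # bucket -> number of documents
        self.doc_freq = defaultdict(lambda: defaultdict(int))  # bucket -> term -> df

    def add(self, timestamp: int, text: str) -> None:
        b = bucket_of(timestamp, self.width)
        self.doc_count[b] += 1
        for term in set(text.lower().split()):  # count each term once per document
            self.doc_freq[b][term] += 1

    def idf(self, term: str, start: int, end: int) -> float:
        """Smoothed IDF of `term` over the buckets covering [start, end)."""
        first, last = bucket_of(start, self.width), bucket_of(end - 1, self.width)
        buckets = [b for b in range(first, last + 1) if b in self.doc_count]
        n = sum(self.doc_count[b] for b in buckets)
        df = sum(self.doc_freq[b].get(term, 0) for b in buckets)
        return math.log((1 + n) / (1 + df)) + 1.0


idx = TimeBucketedIndex()
for t in range(10):
    idx.add(t, "earthquake now")  # first hour: the term is in every message
idx.add(3600, "earthquake")
for t in range(9):
    idx.add(3601 + t, "coffee time")  # second hour: the term is rare
print(idx.idf("earthquake", 0, 3600), idx.idf("earthquake", 3600, 7200))
```

Queries over wider cross-sections just sum more buckets, which is one reason fixed-width buckets pair naturally with the multi-resolution idea: coarser buckets could pre-aggregate the finer ones.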
Spatial Context
Demand for spatial information:

      20% of all Google searches

      53% of Bing mobile searches

Heterogeneous spatial context sources

      GPS locations (most reliable)

      Origin bounding boxes (e.g. city)

      User profile text (reliability doubtful) [3]

      Author's friends' locations [4]

3. Hecht et al. 2011: "Tweets from Justin Bieber’s Heart: The Dynamics of the “Location” Field in User Profiles", Proc. ACM CHI
4. Rout et al. 2013: "Where's @wally? A Graph Based Method for Geolocating Users in Social Networks", Proc. ACM Hypertext
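Because these sources differ in reliability, one simple way to use them is a fallback chain: take the GPS point when present, otherwise the centre of the origin bounding box, otherwise a majority vote over the friends' known home cities. This is a hypothetical sketch, not the graph-based method of Rout et al.; profile text is deliberately skipped given its doubtful reliability, `resolve_location` and the gazetteer `city_coords` are invented for illustration, and the bounding-box centre ignores boxes that cross the antimeridian.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

LatLon = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def resolve_location(gps: Optional[LatLon], bbox: Optional[BBox],
                     friend_cities: Sequence[str],
                     city_coords: Dict[str, LatLon]) -> Optional[Tuple[str, LatLon]]:
    """Return (source, (lat, lon)) from the most reliable source available, else None."""
    if gps is not None:  # most reliable: device GPS fix
        return ("gps", gps)
    if bbox is not None:  # coarser: centre of the origin bounding box
        min_lat, min_lon, max_lat, max_lon = bbox
        return ("bbox", ((min_lat + max_lat) / 2, (min_lon + max_lon) / 2))
    known = [c for c in friend_cities if c in city_coords]  # ignore unknown place names
    if known:  # weakest: the city most of the author's friends list as home
        city, _ = Counter(known).most_common(1)[0]
        return ("friends", city_coords[city])
    return None


coords = {"Aarhus": (56.16, 10.20), "Copenhagen": (55.68, 12.57)}  # hypothetical gazetteer
print(resolve_location(None, None, ["Aarhus", "Copenhagen", "Copenhagen", "Atlantis"], coords))
```

Returning the source label alongside the point lets later ranking steps treat a friends-derived location more cautiously than a GPS fix.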
Spatial Keyword Search
How can we query a set of social media messages?

   Treat as a set of objects, each having
      Text
      Location

   Query parameters:
     Query text
     Query location

Given query and set of messages, rank by similarity:

   Text similarity (cosine, Siamese networks, oriented PCA)
   Separating distance (Haversine, Manhattan, eco-routed)
   Blend the two with a balancing coefficient


   (just like conventional spatial keyword search)
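The ranking described on this slide can be sketched as a weighted sum of a spatial proximity score and a textual similarity score. The sketch assumes bag-of-words cosine similarity, haversine distance turned into a proximity in [0, 1] by linear decay up to a hypothetical cutoff `max_km`, and a balancing coefficient `alpha`; the other measures listed (Siamese networks, oriented PCA, Manhattan or eco-routed distance) would slot into the same two functions.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of two whitespace-tokenised bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[term] for term, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def haversine_km(p, q) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def blended_score(query_text, query_loc, msg_text, msg_loc, alpha=0.5, max_km=10.0):
    """alpha * spatial proximity + (1 - alpha) * text similarity."""
    proximity = max(0.0, 1.0 - haversine_km(query_loc, msg_loc) / max_km)
    return alpha * proximity + (1 - alpha) * cosine_similarity(query_text, msg_text)


def rank(query_text, query_loc, messages, alpha=0.5, max_km=10.0):
    """Sort (text, (lat, lon)) messages by descending blended score."""
    return sorted(messages,
                  key=lambda m: blended_score(query_text, query_loc, m[0], m[1], alpha, max_km),
                  reverse=True)


msgs = [("shoe shopping", (55.70, 12.56)),
        ("good bar here", (55.70, 12.56)),
        ("good bar", (56.16, 10.20))]  # hypothetical messages
print(rank("good bar", (55.70, 12.56), msgs)[0])
```

A message that matches the text perfectly but lies beyond `max_km` gets no spatial credit at all, so the choice of cutoff matters as much as `alpha`.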
Spatial Keyword Search
Query: "good bar in north copenhagen"

Issued from the query location

Five candidate messages (A to E)

Query region established

Rank by blend of location and textual similarity

           Message                                          Location  Text
       A   So drunk last night at @BarSyv                   0.7       0.6
       B   Out shoe shopping!!! #louboutintime              0.9       0.0
       C   Who pays $9 for a beer?!                         0.6       0.5
       D   wow found cph's greatest cocktail bar lol        0.1       1.0
       E   Traffic. Traffic everywhere. Need a drink.       0.4       0.2
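The table's own scores show how much the balancing coefficient changes the outcome. The sketch below reuses the location and text scores from this slide; the slide does not say which coefficient it used, so the two `alpha` values here are assumptions chosen to contrast a location-heavy and a text-heavy setting.

```python
# (label, message, location score, text score), taken from the table on this slide
CANDIDATES = [
    ("A", "So drunk last night at @BarSyv", 0.7, 0.6),
    ("B", "Out shoe shopping!!! #louboutintime", 0.9, 0.0),
    ("C", "Who pays $9 for a beer?!", 0.6, 0.5),
    ("D", "wow found cph's greatest cocktail bar lol", 0.1, 1.0),
    ("E", "Traffic. Traffic everywhere. Need a drink.", 0.4, 0.2),
]


def ranking(alpha: float) -> list:
    """Labels ordered by alpha * location + (1 - alpha) * text, best first."""
    ordered = sorted(CANDIDATES, key=lambda c: alpha * c[2] + (1 - alpha) * c[3], reverse=True)
    return [label for label, _, _, _ in ordered]


print(ranking(0.6))  # location-heavy: ['A', 'C', 'B', 'D', 'E']
print(ranking(0.2))  # text-heavy:     ['D', 'A', 'C', 'E', 'B']
```

With the location-heavy weighting the nearby shoe-shopping message B outranks the cocktail-bar message D; weighting text more heavily puts D first and B last.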
Continuous Spatial Queries
Social media scenario characterised by:

   Streaming data

   New spatial objects constantly appearing

Two new spatial keyword query types:

   Static Continuous (SCSKQ)
      - Fixed query location
      - Tracks newly appearing objects

   Moving Continuous (MCSKQ)
     - Query location moves along a path
     - Result updated with new objects

Novel part: fresh objects continuously introduced
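One way to picture the two query types: a static query can keep a size-k min-heap and only compare each fresh object against the current k-th best, while a moving query must re-score the objects it has already seen whenever its location changes. The sketch is hypothetical throughout: planar coordinates in km, keyword-overlap text similarity, a `1 / (1 + distance)` proximity term, and an unbounded history where a real stream processor would expire old objects.

```python
import heapq
import math


class ContinuousSpatialKeywordQuery:
    """Hypothetical top-k continuous spatial keyword query over a message stream."""

    def __init__(self, keywords, location, k=3, alpha=0.5):
        self.keywords = {w.lower() for w in keywords}
        self.location = location  # (x, y) in km, planar for simplicity
        self.k = k
        self.alpha = alpha
        self.seen = []  # every object so far; only needed for the moving case
        self.heap = []  # min-heap of (score, arrival index, text), size <= k

    def score(self, text, xy):
        words = set(text.lower().split())
        text_sim = len(words & self.keywords) / len(self.keywords) if self.keywords else 0.0
        proximity = 1.0 / (1.0 + math.dist(self.location, xy))
        return self.alpha * proximity + (1 - self.alpha) * text_sim

    def on_new_object(self, text, xy):
        """Static case (SCSKQ): a fresh object can only displace the current k-th best."""
        entry = (self.score(text, xy), len(self.seen), text)
        self.seen.append((text, xy))
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, entry)
        elif entry[0] > self.heap[0][0]:
            heapq.heapreplace(self.heap, entry)

    def move_to(self, location):
        """Moving case (MCSKQ): every stored object is re-scored at the new location."""
        self.location = location
        entries = [(self.score(t, xy), i, t) for i, (t, xy) in enumerate(self.seen)]
        self.heap = heapq.nlargest(self.k, entries)
        heapq.heapify(self.heap)

    def results(self):
        """Current top-k texts, best first."""
        return [text for _, _, text in sorted(self.heap, reverse=True)]


q = ContinuousSpatialKeywordQuery(["bar"], (0.0, 0.0), k=2)
q.on_new_object("good bar", (0.0, 0.0))
q.on_new_object("shoes", (0.0, 0.0))
q.on_new_object("bar far away", (10.0, 0.0))
print(q.results())
q.move_to((10.0, 0.0))
print(q.results())
```

The arrival index in each heap entry breaks score ties, so the heap never has to compare message texts.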
Location Diversity
Location data unreliable

Reliability of location data... is also unreliable

"There are known knowns... we also know there are known unknowns...
            but there are also unknown unknowns" (Donald Rumsfeld)

Textual location mentions require disambiguation:


   ●   In profile
   ●   In messages
   ●   In queries




Requirement: rank vague points given a vague query
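One hedged way to rank vague points given a vague query is to model both the query location and each message location as isotropic Gaussians whose spread depends on where the location came from, then score each message by the probability density that the two positions coincide. The per-source spreads in `SOURCE_SIGMA_KM` are invented placeholders, coordinates are planar km, and the Gaussian model itself is an assumption rather than something the slide prescribes.

```python
import math

# Hypothetical positional uncertainty (standard deviation, km) per location source.
SOURCE_SIGMA_KM = {"gps": 0.05, "bbox": 5.0, "profile": 25.0}


def coincidence_density(query_xy, query_sigma, msg_xy, msg_sigma):
    """Density at zero of (query position - message position), both isotropic Gaussians.

    The difference of two independent isotropic Gaussians is Gaussian with the
    variances summed, so precise nearby points score highest, while vague points
    decay more slowly with distance than precise ones.
    """
    var = query_sigma ** 2 + msg_sigma ** 2
    d2 = math.dist(query_xy, msg_xy) ** 2
    return math.exp(-d2 / (2 * var)) / (2 * math.pi * var)


def rank_vague(query_xy, query_sigma, messages):
    """messages: (label, (x, y), source); returns labels, most plausible match first."""
    def key(m):
        _, xy, source = m
        return coincidence_density(query_xy, query_sigma, xy, SOURCE_SIGMA_KM[source])
    return [m[0] for m in sorted(messages, key=key, reverse=True)]


candidates = [("far_gps", (30.0, 0.0), "gps"), ("near_profile", (0.5, 0.0), "profile"),
              ("near_gps", (0.5, 0.0), "gps"), ("far_profile", (30.0, 0.0), "profile")]
print(rank_vague((0.0, 0.0), 1.0, candidates))
```

The far profile-derived point outranks the far GPS point: with a 25 km spread it cannot be ruled out, whereas the precise GPS point is confidently in the wrong place.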
Willingness to travel
Determines useful search radius

Based on mode of transport (radius per mode, in increasing order):
   14.9 km, 22.0 km, 40.6 km, 61.5 km, >100 km

Different for different classes of Point of Interest (POI)?


Spatio-temporal (ST) social media = huge dataset

   Easy data collection

   Useful for e.g. town planning
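These willingness-to-travel radii can serve directly as a hard cut-off on candidate results. Since only the radii are listed here, the sketch indexes them by position rather than by mode name; treating ">100 km" as unbounded, the planar-km coordinates, the example POIs and the name `within_travel_radius` are all assumptions.

```python
import math

# Willingness-to-travel radii (km) from the slide, smallest to largest;
# ">100 km" is treated here as unbounded, which is an assumption.
TRAVEL_RADII_KM = [14.9, 22.0, 40.6, 61.5, math.inf]


def within_travel_radius(query_xy, candidates, mode_index):
    """Keep (name, (x, y)) candidates no farther than the chosen mode's radius (planar km)."""
    radius = TRAVEL_RADII_KM[mode_index]
    return [name for name, xy in candidates if math.dist(query_xy, xy) <= radius]


pois = [("cafe", (10.0, 0.0)), ("museum", (30.0, 0.0)), ("beach", (200.0, 0.0))]  # hypothetical
print(within_travel_radius((0.0, 0.0), pois, 0))
print(within_travel_radius((0.0, 0.0), pois, 2))
print(within_travel_radius((0.0, 0.0), pois, 4))
```

If radii turn out to differ by POI class, as the slide asks, the list could become a lookup keyed by both mode and class.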
Spatio-temporal Challenges
We've seen temporal and spatial challenges; let's combine!

Given all these spatio-temporal utterances, what can we do?

   - Spatial gives relevance from physical or travel proximity

   - Temporal gives relevance from recency and historical context



Adding text to the spatio-temporal points gives


             explicit semantic context


Not only are ST patterns in the data, we are told what they mean!
Topic-based Retrieval
Retrieving results on a topic is useful: "Tell me about X"

Specific terms vary between places and over time



Over time: en.wikipedia.org/wiki/President_of_the_United_States (2007 vs. 2011)

Over place: "Jelly" (England English vs. US English)




    … Spatio-temporally sensitive indexing?
Sentiment Monitoring
Measure how attitudes change over time and over location

Business uses:      where to send marketing

Political uses:     data-driven democratic... campaigning

Governance uses: what are citizen priorities in a region

Temporal dimension enables tracking of trends and reactions



   (Map legend: red = upbeat; blue = complaint; no normalisation for vocality!)
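Without normalisation for vocality, busy regions dominate such a map simply because more people post there. A minimal sketch of the normalised alternative aggregates per (region, time period) cell and reports the share of upbeat messages rather than raw counts; the labels are assumed to come from some upstream sentiment classifier, which is out of scope, and `sentiment_share` and the example data are hypothetical.

```python
from collections import defaultdict


def sentiment_share(messages):
    """messages: iterable of (region, period, label), label in {"upbeat", "complaint"}.

    Returns {(region, period): fraction of upbeat messages}, so cells with very
    different message volumes become directly comparable. Unknown labels raise KeyError.
    """
    counts = defaultdict(lambda: {"upbeat": 0, "complaint": 0})
    for region, period, label in messages:
        counts[(region, period)][label] += 1
    return {cell: c["upbeat"] / (c["upbeat"] + c["complaint"]) for cell, c in counts.items()}


stream = ([("centre", "week1", "upbeat")] * 80 + [("centre", "week1", "complaint")] * 20
          + [("suburb", "week1", "upbeat")] * 4 + [("suburb", "week1", "complaint")] * 1
          + [("suburb", "week2", "upbeat")] * 1 + [("suburb", "week2", "complaint")] * 4)
shares = sentiment_share(stream)
print(shares)
```

The busy centre and the quiet suburb come out equally upbeat in week 1 despite a twentyfold difference in volume, and keying cells by period is what exposes the suburb's change of mood in week 2.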
Local Computational Journalism
Social media is quick

Social media is uncurated

"Citizen Journalism"


News has relevance scope:
  Recency
  Proximity


Different events relevant in different contexts:
    Rain in London
    Rain in Addis Ababa

Automatic event detection [5], and also reporting!
5. Ritter et al. 2012: "Open domain event extraction from Twitter", Proc. ACM SIGKDD
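A very simple baseline for spotting locally relevant events, far simpler than the open-domain extraction of Ritter et al., is to compare each region's current count for a term against that region's own history, so the same term can be routine in one place and a burst in another, in the spirit of the rain example. The z-score test, the threshold of 3, the region names and the counts are all hypothetical.

```python
import statistics


def burst_z(history, current):
    """z-score of the current window's count against the region's own past windows."""
    mean = statistics.mean(history)
    sd = statistics.pstdev(history)
    if sd == 0:
        return float("inf") if current > mean else 0.0
    return (current - mean) / sd


def detect_bursts(windows, threshold=3.0):
    """windows: {(region, term): (past counts, current count)} -> keys that burst."""
    return [key for key, (history, current) in windows.items()
            if burst_z(history, current) > threshold]


windows = {
    ("city_x", "rain"): ([50, 55, 45, 60, 40], 52),  # hypothetical: rain is routine here
    ("city_y", "rain"): ([1, 0, 2, 1, 1], 40),       # hypothetical: rain is rarely mentioned
}
print(detect_bursts(windows))
```

Per-region baselines are what make relevance local here; a single global baseline would flag whichever region is noisiest.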
Summary

Social media is a rich source of "big data"

A small sampling of all human discourse

It comes with temporal and spatial context


Context-aware search and analysis is very demanding!

   - Novel, powerful applications

   - Wide variety of domains

   - An open set of challenges
Thank you!


Thank you for listening!

   Do you have any questions?

INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 

Towards Context-Aware Search and Analysis on Social Media Data

  • 1. Towards Context-Aware Search and Analysis on Social Media Data. Leon Derczynski, Bin Yang (杨彬), Christian S. Jensen.
  • 2. Evolution of communication: functional utterances → vowels → velar closure (consonants) → speech → new modality: writing → digital text → e-mail → social media. Trend: increasingly machine-readable information.
  • 3. Social Media = Big Data. Gartner ''3V'' definition: (1) Volume, (2) Velocity, (3) Variety. High volume and velocity of messages: Twitter has ~20 000 000 users per month, and they write ~500 000 000 messages per day. Massive variety: stock markets; earthquakes; social arrangements; … Bieber.
  • 4. What is machine-readable now? Messages now contain not only linguistic content but also links (e.g. URIs), topic markers (e.g. hashtags) and meta-information. What kind of meta-information? User profile (including home location); images; messages replied to; message language; time of message; location of message.
  • 5. What resources do we have now? Large, content-rich, linked, digital streams of human communication. We transfer knowledge via communication, so sampling communication gives a sample of human knowledge: ''You've only done that which you can communicate''. The metadata (time, place, imagery) gives a richer resource → a sampling of human behaviour.
  • 6. What can we do with this resource? Context increases the data's richness, and increased richness enables novel applications. Time and place are interesting parts of message context. (1) What kinds of applications are there? (2) What are the practical challenges?
  • 7. Temporal Context. Messages have timestamps. Two temporal retrieval scenarios: (1) historical analyses; (2) emerging data.
  • 8. Historical search. Ability to retrieve from archives: the longitudinal query mode [0]. Retrieve information on: ● the lifecycle of socially connected groups; ● precursors to events, analysed post hoc. [0] Weikum et al. 2011: ''Longitudinal analytics on web archive data: It's about time'', Proc. CIDR.
  • 9. Historical search. Retrospective analyses of cause and effect: ''There's a dead crow in my garden''. Social media mentions of dead crows predict West Nile Virus (WNV) in humans [1]. [1] Sugumaran & Voss 2012: ''Real-time spatio-temporal analysis of West Nile Virus using Twitter Data'', Proc. Int'l Conference on Computing for Geospatial Research and Applications.
  • 10. Emerging search. Data emerges at high velocity: 185 000 documents per minute, giving a high temporal density. Search over this information enables: ● live coverage of events; ● real-time identification of emerging events [2]. [2] Cohen et al. 2011: ''Computational journalism: A call to arms to database researchers'', Proc. CIDR.
  • 11. Temporal indexing. What are our requirements? ● High-frequency document creation; ● temporal cross-sections of varying size; ● time-sensitive TF/IDF, because stopwords are fluid (which terms are too common to be informative changes over time). How can we do this? It is an open challenge: ● tree indexing is hard to distribute; ● maybe with adaptive multi-resolution grids?
  • 12. Spatial Context. Demand for spatial information: 20% of all Google searches; 53% of Bing mobile searches. Heterogeneous spatial context sources: GPS locations (most reliable); origin bounding boxes (e.g. city); user profile text??? [3]; the author's friends' locations [4]. [3] Hecht et al. 2011: ''Tweets from Justin Bieber's Heart: The Dynamics of the “Location” Field in User Profiles'', Proc. ACM CHI. [4] Rout et al. 2013: ''Where's @wally? A Graph Based Method for Geolocating Users in Social Networks'', Proc. ACM Hypertext.
  • 13. Spatial Keyword Search. How can we query a set of social media messages? Treat it as a set of objects, each having a text and a location. Query parameters: query text and query location. Given a query and a set of messages, rank by similarity: text similarity (cosine, Siamese learning net, oriented PCA) and separating distance (Haversine, Manhattan, eco-routed). Blend the two with a balancing coefficient, just as in conventional spatial keyword search.
  • 14. Spatial Keyword Search (example). Query: ''good bar in north copenhagen'', issued from a location; a query region is established around it. Five candidate messages (A to E) are ranked by a blend of location and textual similarity:
      Message                                        Location sim.  Text sim.
      A: So drunk last night at @BarSyv              0.7            0.6
      B: Out shoe shopping!!! #louboutintime         0.9            0.0
      C: Who pays $9 for a beer?!                    0.6            0.5
      D: wow found cph's greatest cocktail bar lol   0.1            1.0
      E: Traffic. Traffic everywhere. Need a drink.  0.4            0.2
  • 15. Continuous Spatial Queries. The social media scenario is characterised by streaming data: new spatial objects constantly appear. Two new spatial keyword query types: ● Static Continuous (SCSKQ): fixed query location; tracks newly appearing objects. ● Moving Continuous (MCSKQ): the query location moves along a path; the result is updated with new objects. Novel part: fresh objects are continuously introduced.
  • 16. Location Diversity. Location data is unreliable, and the reliability of location data... is also unreliable. ''There are known knowns... we also know there are known unknowns... but there are also unknown unknowns'' (Donald Rumsfeld). Text mentions of places require disambiguation: ● in profiles; ● in messages; ● in queries. The requirement is to rank vague points given a vague query.
  • 17. Willingness to travel. This determines a useful search radius, which depends on the mode of transport: 14.9 km, 22.0 km, 40.6 km, 61.5 km, >100 km. Is it different for different classes of Point Of Interest? Spatio-temporal (ST) social media is a huge dataset: data collection is easy, and it is useful for e.g. town planning.
  • 18. Spatio-temporal Challenges. We've seen temporal and spatial challenges; let's combine them! Given all these spatio-temporal utterances, what can we do? Spatial context gives relevance from physical or travel proximity; temporal context gives relevance from recency and history. Adding text to the spatio-temporal points gives explicit semantic context: not only are ST patterns in the data, we are told what they mean!
  • 19. Topic-based Retrieval. Retrieving results on a topic is useful: ''Tell me about X''. Specific terms vary between places (e.g. ''jelly'' in England English vs. US English) and over time (e.g. en.wikipedia.org/wiki/President_of_the_United_States in 2007 vs. 2011) … Spatio-temporally sensitive indexing?
  • 20. Sentiment Monitoring. Measure how attitudes change over time and across locations. Business uses: where to send marketing. Political uses: data-driven democratic... campaigning. Governance uses: what citizens' priorities are in a region. The temporal dimension enables tracking of trends and reactions. (Map: red = upbeat; blue = complaint; no normalisation for vocality!)
  • 21. Local Computational Journalism. Social media is quick, and social media is uncurated: ''Citizen Journalism''. News has a relevance scope: recency and proximity. Different events are relevant in different contexts: rain in London vs. rain in Addis Ababa. Automatic event detection [5], and also reporting! [5] Ritter et al. 2012: ''Open domain event extraction from Twitter'', Proc. ACM SIGKDD.
  • 22. Summary. Social media is a rich source of ''big data'': a small sampling of all human discourse, and it comes with temporal and spatial context. Context-aware search and analysis is very demanding! ● Novel, powerful applications; ● a wide variety of domains; ● an open set of challenges.
  • 23. Thank you! Thank you for listening! Do you have any questions?
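Slides 3 and 10 quote message rates in different units (~500 000 000 messages per day; 185 000 documents per minute). A small unit-conversion sketch in Python, offered only as an illustration, puts them on a common footing:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86 400
MINUTES_PER_DAY = 24 * 60       # 1 440


def per_second(per_day: float) -> float:
    """Convert a daily message count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


def per_minute(per_day: float) -> float:
    """Convert a daily message count into an average per-minute rate."""
    return per_day / MINUTES_PER_DAY


def per_day_from_minute(per_min: float) -> float:
    """Convert a per-minute rate into a daily total, assuming a constant rate."""
    return per_min * MINUTES_PER_DAY


if __name__ == "__main__":
    print(f"{per_second(500_000_000):,.0f} messages/s")          # 5,787 messages/s
    print(f"{per_minute(500_000_000):,.0f} messages/min")        # 347,222 messages/min
    print(f"{per_day_from_minute(185_000):,.0f} documents/day")  # 266,400,000 documents/day
```

These are daily averages; because real traffic is not constant, peak rates are higher than the averages.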
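Slides 10 and 21 mention real-time identification of emerging events and automatic event detection [5]. The sketch below does not reproduce Ritter et al.'s method. It swaps in a much simpler z-score burst detector over fixed time windows, and the thresholds (z, min_count, min_std) are illustrative assumptions:

```python
import statistics
from collections import Counter


def bursty_terms(history, current, z=3.0, min_count=5, min_std=1.0):
    """Flag terms whose count in the current window is unusually high.

    history: list of Counter, one per past window; current: Counter for the
    latest window. A term is flagged when its current count exceeds
    mean + z * std of its past counts. The std is floored at min_std so that
    terms with a perfectly flat history are not flagged for tiny increases.
    """
    flagged = []
    for term, count in current.items():
        if count < min_count:
            continue  # ignore rare terms: too noisy to call a burst
        past = [window[term] for window in history]  # Counter returns 0 for unseen terms
        mean = statistics.fmean(past) if past else 0.0
        std = statistics.pstdev(past) if past else 0.0
        if count > mean + z * max(std, min_std):
            flagged.append(term)
    return sorted(flagged)
```

For example, a term such as "earthquake" that never appeared in previous windows but suddenly occurs 40 times is flagged, while steady terms such as "rain" or "traffic" are not.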
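Slide 11 asks for time-sensitive TF/IDF because stopwords are fluid. One way to illustrate this, assumed here rather than taken from the talk, is to compute document frequency only over the documents in the queried time window. A term that floods one window (e.g. during an event) then gets a low IDF there, just like a stopword. The class and method names are hypothetical:

```python
import bisect
import math
from collections import Counter


class WindowedIndex:
    """Timestamped bag-of-words documents; IDF is computed per time window,
    so which terms behave like stopwords can change from window to window."""

    def __init__(self) -> None:
        self._times: list[float] = []   # sorted timestamps
        self._docs: list[Counter] = []  # term counts, parallel to _times

    def add(self, timestamp: float, text: str) -> None:
        # Insert in timestamp order; tolerates late arrivals (O(n) per insert).
        i = bisect.bisect_right(self._times, timestamp)
        self._times.insert(i, timestamp)
        self._docs.insert(i, Counter(text.lower().split()))

    def _window(self, start: float, end: float) -> list[Counter]:
        # Documents with start <= timestamp < end.
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, end)
        return self._docs[lo:hi]

    def idf(self, term: str, start: float, end: float) -> float:
        docs = self._window(start, end)
        df = sum(1 for d in docs if term in d)
        # Smoothed IDF: never divides by zero, and equals 1.0 when every document has the term.
        return math.log((1 + len(docs)) / (1 + df)) + 1.0

    def tfidf(self, term: str, text: str, start: float, end: float) -> float:
        tf = Counter(text.lower().split())[term]
        return tf * self.idf(term, start, end)
```

This single-process sorted list only illustrates the windowed statistics. It does not address the slide's open problem of a distributable index (the slide suggests adaptive multi-resolution grids).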
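Slide 12 lists heterogeneous location sources and says only that GPS is the most reliable. The sketch below tries the sources in the order the slide lists them; that ordering beyond GPS, the Message fields, and the tiny gazetteer are all hypothetical assumptions. The friends fallback is a crude per-coordinate median, not the graph-based method of Rout et al. [4]:

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Optional

LatLon = tuple[float, float]

# Hypothetical gazetteer mapping profile strings to coordinates.
GAZETTEER: dict[str, LatLon] = {"copenhagen": (55.676, 12.568)}


@dataclass
class Message:
    gps: Optional[LatLon] = None
    bbox: Optional[tuple[float, float, float, float]] = None  # (min_lat, min_lon, max_lat, max_lon)
    profile_location: Optional[str] = None
    friend_locations: list[LatLon] = field(default_factory=list)


def resolve_location(msg: Message, gazetteer: dict[str, LatLon] = GAZETTEER):
    """Return ((lat, lon), source) from the first usable source, or (None, "unknown")."""
    if msg.gps is not None:
        return msg.gps, "gps"
    if msg.bbox is not None:
        # Centre of the bounding box (does not handle boxes crossing the antimeridian).
        min_lat, min_lon, max_lat, max_lon = msg.bbox
        return ((min_lat + max_lat) / 2, (min_lon + max_lon) / 2), "bbox"
    if msg.profile_location:
        key = msg.profile_location.strip().lower()
        if key in gazetteer:
            return gazetteer[key], "profile"
        # Free-text profiles such as "Justin Bieber's heart" fall through.
    if msg.friend_locations:
        # Per-coordinate median of friends' locations: a crude stand-in for graph-based methods.
        lats, lons = zip(*msg.friend_locations)
        return (median(lats), median(lons)), "friends"
    return None, "unknown"
```

Returning the source name alongside the point lets downstream ranking treat, say, a GPS fix differently from a guess based on friends, in the spirit of slide 16's "rank vague points".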
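Slide 13 ranks messages by blending text similarity with separating distance through a balancing coefficient. The sketch below picks two of the options the slide names: cosine similarity on bag-of-words term counts, and Haversine distance. The linear distance-to-proximity conversion, the parameter names alpha and max_dist_km, and their defaults are assumptions for illustration:

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words term-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[term] for term, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def blended_score(query_text, query_loc, text, loc, alpha=0.5, max_dist_km=10.0):
    """alpha weights text similarity; (1 - alpha) weights proximity.
    Proximity falls linearly from 1 at the query point to 0 at max_dist_km."""
    proximity = max(0.0, 1.0 - haversine_km(*query_loc, *loc) / max_dist_km)
    return alpha * cosine_sim(query_text, text) + (1 - alpha) * proximity


def rank(query_text, query_loc, messages, alpha=0.5, max_dist_km=10.0):
    """messages: list of (text, (lat, lon)). Returns (score, text) pairs, best first."""
    scored = [(blended_score(query_text, query_loc, t, l, alpha, max_dist_km), t) for t, l in messages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Swapping in a learned text similarity or a routed travel distance, as the slide suggests, only changes cosine_sim or haversine_km; the blending step stays the same.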
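Slide 14's table gives location and text similarities for messages A to E but no blended scores. Assuming a linear blend, score = alpha × text + (1 − alpha) × location, the sketch below shows how the ranking depends on the balancing coefficient. The alpha values 0.3 and 0.7 are chosen only for illustration:

```python
# (location similarity, text similarity) per message, copied from the slide 14 table.
SCORES = {
    "A": (0.7, 0.6),
    "B": (0.9, 0.0),
    "C": (0.6, 0.5),
    "D": (0.1, 1.0),
    "E": (0.4, 0.2),
}


def blend(loc_sim: float, text_sim: float, alpha: float) -> float:
    """alpha weights text similarity; (1 - alpha) weights location similarity."""
    return alpha * text_sim + (1 - alpha) * loc_sim


def ranking(alpha: float) -> list[str]:
    """Message labels ordered from best to worst blended score."""
    return sorted(SCORES, key=lambda m: blend(*SCORES[m], alpha), reverse=True)


if __name__ == "__main__":
    print(0.3, ranking(0.3))  # 0.3 ['A', 'B', 'C', 'D', 'E']
    print(0.7, ranking(0.7))  # 0.7 ['D', 'A', 'C', 'B', 'E']
```

With a location-heavy weighting, the nearby message A wins and even the irrelevant shoe-shopping message B ranks second. With a text-heavy weighting, the distant but on-topic cocktail-bar message D comes first.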
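Slide 15's static continuous query (SCSKQ) keeps a fixed query location while new objects keep arriving. A minimal sketch, assuming a top-k result and an arbitrary scoring function (for instance a blended text/location score against the fixed query), can maintain the result incrementally with a min-heap. The class name is hypothetical, and expiry of old messages is not handled:

```python
import heapq
import itertools
from typing import Any, Callable


class StaticContinuousTopK:
    """Top-k result for a query with a fixed location, maintained incrementally
    as new objects stream in. Because the query never moves, an object's score
    never changes, so each arrival needs one comparison with the weakest result."""

    def __init__(self, k: int, score: Callable[[Any], float]) -> None:
        self.k = k
        self.score = score
        self._heap: list = []          # min-heap of (score, seq, obj); root = weakest result
        self._seq = itertools.count()  # tie-breaker so objects are never compared directly

    def push(self, obj: Any) -> bool:
        """Offer a newly arrived object; return True if it enters the result."""
        entry = (self.score(obj), next(self._seq), obj)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
            return True
        if entry[0] > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the weakest, insert the new one
            return True
        return False

    def results(self) -> list:
        """Current result, best first."""
        return [obj for _, _, obj in sorted(self._heap, reverse=True)]
```

For the moving variant (MCSKQ) the query location changes, so previously computed scores go stale and this one-comparison shortcut does not apply directly.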
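Slide 20's map carries the caveat "no normalisation for vocality". One simple normalisation, assumed here and not taken from the talk, is to average sentiment per message within each (region, period) cell instead of summing it. Region and period labels and per-message sentiment scores are assumed to be computed upstream:

```python
from collections import defaultdict


def sentiment_by_cell(messages):
    """messages: iterable of (region, period, sentiment), sentiment in [-1, 1].
    Returns {(region, period): (raw_sum, mean, count)}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for region, period, sentiment in messages:
        totals[(region, period)] += sentiment
        counts[(region, period)] += 1
    # The mean divides out message volume, so a very vocal region no longer
    # dominates simply by posting more.
    return {cell: (totals[cell], totals[cell] / counts[cell], counts[cell]) for cell in totals}
```

For example, 100 mildly positive messages (0.1 each) outweigh 5 strongly positive ones (0.8 each) by raw sum, but the per-message mean shows the second region is more upbeat. Normalising per user instead of per message is another option when a few accounts post most of the messages.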