Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams

Data Bridges Workshop, Inria, Paris, April 12th 2012



                        Fabian Abel, Claudia Hauff, Geert-Jan Houben,
                                           Richard Stronkman, Ke Tao
                              Web Information Systems, TU Delft, the Netherlands

        Delft University of Technology
200,000,000
  number of tweets published per day



 Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams   2
Pukkelpop 2011




                 People tweet about everything, everywhere :-)
200,000,000
Pukkelpop 2011
became a tragedy

81,000 tweets in four hours

Filtering
Search & Browsing
Challenges
  1. (Automatic) Filtering: Given a topic (e.g. expressed via
     some keywords), how can one automatically identify
     those tweets that are relevant to the topic?

  2. Search & Browsing: How can one improve search and
     browsing capabilities so that users can explore
     information in the streams of tweets (that are relevant for
     a topic)?
[Figure: Twitter streams → Filtering (driven by a topic) → Search & Browsing (driven by an information need); Twinder is the filtering and search framework]
1. Filtering of Twitter streams
Filtering on Twitter
Query: www2012
Typical approach: keyword-based matching



Are there further features that can be used as
indicators for estimating the relevance of a tweet
for a topic?
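The typical approach can be sketched as plain keyword matching. A minimal Python sketch, assuming relevance is simply the fraction of query terms that also occur in the tweet; `tokenize` and `keyword_relevance` are illustrative names, not Twinder's actual scoring function. The second example shows the limitation raised above: a clearly related tweet scores zero because it never uses the query keyword.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def keyword_relevance(query: str, tweet: str) -> float:
    """Fraction of distinct query terms that also occur in the tweet (0.0 to 1.0)."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    tweet_terms = set(tokenize(tweet))
    return len(query_terms & tweet_terms) / len(query_terms)


tweets = [
    "Looking forward to #www2012 in Lyon!",
    "Great talk at the WWW conference today",
]
for t in tweets:
    # The hashtag "#www2012" is tokenized to "www2012" and matches the query;
    # the second tweet is on topic but shares no keyword with it.
    print(f"{keyword_relevance('www2012', t):.2f}  {t}")
```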

Syntactical feature: hashtags
Is a tweet more relevant if it contains a #hashtag?

  Hypothesis: tweets that contain hashtags are more likely
  to be relevant than tweets that do not contain hashtags.




Syntactical feature: URLs
Is a tweet that contains a URL more relevant?

  Hypothesis: tweets that contain a URL are more likely to
  be relevant than tweets that do not contain a URL.




Syntactical feature: “mentions”
 Is a tweet that mentions @somebody more relevant?

   Hypothesis: tweets that are formulated as a reply to another
   tweet are less likely to be relevant than other tweets.




Syntactical feature: length
Does the length of a tweet influence its relevance for a topic?

Example: 54 characters (9 words) vs. 140 characters (20 words)


  Hypothesis: the longer a tweet, the more likely it is to be
  relevant and interesting.
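The four syntactical features (named hasHashtag, hasURL, isReply and length in the feature-importance chart later in the deck) can be computed from the raw text alone. A rough sketch, assuming only the tweet text is available; in particular `isReply` is a heuristic here (the tweet starts with @user), whereas the reply metadata delivered with a tweet would be more reliable. The regular expressions are simplifications.

```python
import re

# Simplified text patterns (assumptions, not Twitter's own entity parsing).
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")
LEADING_MENTION = re.compile(r"\s*@\w+")


def syntactic_features(tweet: str) -> dict:
    """Topic-insensitive syntactical features computed from the tweet text."""
    return {
        "hasHashtag": HASHTAG.search(tweet) is not None,
        "hasURL": URL.search(tweet) is not None,
        # Heuristic: a tweet that begins with @user is treated as a reply.
        "isReply": LEADING_MENTION.match(tweet) is not None,
        "length_chars": len(tweet),
        "length_words": len(tweet.split()),
    }


print(syntactic_features("@bob see you at #www2012 http://www2012.org"))
print(syntactic_features("Nice weather in Delft today"))
```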

Overview of features
Topic-sensitive and topic-insensitive features

Topic sensitive: keyword-based relevance
Topic insensitive: syntactical features

What about the semantics?
Semantic features: number of entities
Find semantics in a tweet to estimate the relevance

Example entities: dbp:Tim_Berners-Lee, dbp:World_Wide_Web, dbp:WWW_Conference, dbp:France, dbp:Lyon

  Hypothesis: the more entities a tweet mentions, the more
  likely it is to be relevant and interesting.

Semantic features: diversity
The types of entities featured in a tweet matter

Example: a tweet mentioning a Person, a Thing, an Event and two Places
  vs.
"I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon." (six Places)

  Hypothesis: the higher the diversity of entities that are
  mentioned in a tweet, the more likely it is to be relevant.
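Both semantic features above reduce to counting over entity annotations. A minimal sketch, assuming an entity linker (not shown) has already returned `(entity URI, entity type)` pairs for each tweet; the types for the mixed example are assigned for illustration to match the type labels on the diversity slide. Diversity is taken here as the number of distinct types, one simple choice among several (an entropy over types would be another).

```python
# Hypothetical input: (entity URI, entity type) pairs produced by some entity linker.
Annotation = tuple[str, str]


def num_entities(annotations: list[Annotation]) -> int:
    """Number of distinct entities mentioned in the tweet."""
    return len({uri for uri, _ in annotations})


def diversity(annotations: list[Annotation]) -> int:
    """Number of distinct entity types among those entities."""
    return len({etype for _, etype in annotations})


cities = [(f"dbp:{city}", "Place")
          for city in ["Paris", "Bordeaux", "Grenoble", "Nice", "Marseille", "Lyon"]]
mixed = [
    ("dbp:Tim_Berners-Lee", "Person"),
    ("dbp:World_Wide_Web", "Thing"),
    ("dbp:WWW_Conference", "Event"),
    ("dbp:France", "Place"),
    ("dbp:Lyon", "Place"),
]
for name, ann in [("cities", cities), ("mixed", mixed)]:
    print(name, "entities:", num_entities(ann), "diversity:", diversity(ann))
```

The city tweet mentions more entities, but the mixed tweet is more diverse, which is exactly the contrast the two hypotheses separate.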


Semantic features: sentiment
Opinions expressed in tweets are interesting

:-)      Looking forward to the WWW conference :-) Yes!
neutral  I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.
:-(      Why are the big players not releasing query logs to the WWW community? :-( #fail

           Hypothesis: the likelihood of a tweet’s relevance is
           influenced by its sentiment polarity.
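The slide does not say how polarity was determined. As a stand-in, this sketch classifies the three example tweets with a crude emoticon count; the emoticon lists are an assumption, and a real system would more likely use a sentiment lexicon or a trained classifier.

```python
# Illustrative emoticon lists (assumption), not the lexicon used in the study.
POSITIVE = (":-)", ":)")
NEGATIVE = (":-(", ":(")


def polarity(tweet: str) -> str:
    """Return 'positive', 'negative' or 'neutral' by comparing emoticon counts."""
    pos = sum(tweet.count(e) for e in POSITIVE)
    neg = sum(tweet.count(e) for e in NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


examples = [
    "Looking forward to the WWW conference :-) Yes!",
    "I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.",
    "Why are the big players not releasing query logs to the WWW community? :-( #fail",
]
for t in examples:
    print(polarity(t), "|", t)
```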


Semantic relatedness
Exploit semantics to relate the query to tweets

Example: dbp:International_World_Wide_Web_Conference and dbp:Tim_Berners-Lee
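The slide does not specify how relatedness is measured. As an assumption, this sketch scores it by graph distance: entities are nodes in a small, hand-made link graph (standing in for links in a knowledge base such as DBpedia), and a tweet's relatedness to the query entity is 1 / (1 + shortest-path length) to its closest entity. `LINKS`, `hops` and `relatedness` are hypothetical names.

```python
from collections import deque
from typing import Optional

# Hypothetical, hand-made link graph; NOT real DBpedia data.
LINKS: dict[str, set[str]] = {
    "dbp:International_World_Wide_Web_Conference": {"dbp:World_Wide_Web"},
    "dbp:World_Wide_Web": {"dbp:International_World_Wide_Web_Conference",
                           "dbp:Tim_Berners-Lee"},
    "dbp:Tim_Berners-Lee": {"dbp:World_Wide_Web"},
    "dbp:France": {"dbp:Lyon"},
    "dbp:Lyon": {"dbp:France"},
}


def hops(start: str, goal: str) -> Optional[int]:
    """Shortest-path length from start to goal via breadth-first search, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in LINKS.get(node, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def relatedness(query_entity: str, tweet_entities: list[str]) -> float:
    """Best 1 / (1 + hops) over the tweet's entities; 0.0 if none is connected."""
    best = 0.0
    for entity in tweet_entities:
        d = hops(query_entity, entity)
        if d is not None:
            best = max(best, 1.0 / (1 + d))
    return best


query = "dbp:International_World_Wide_Web_Conference"
print(relatedness(query, ["dbp:Tim_Berners-Lee"]))  # two hops away
print(relatedness(query, ["dbp:Lyon"]))             # not connected in this toy graph
```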
Overview of features
So far, we have 4 types of features.

Topic sensitive: keyword-based, semantic-based, context?
Topic insensitive: syntactical, semantics, context?

What kind of contextual features might be helpful?
Contextual feature: authority of the publisher
 It matters who published a tweet




     Hypothesis: the higher the number of tweets that have
     been published by the creator of a tweet, the more likely
     it is that the tweet is relevant.
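A minimal sketch of the authority feature, assuming the count is taken over the tweets observed in the collected stream; a user's lifetime tweet count from their profile would be an alternative, and the slide does not say which was used. The stream below is hypothetical.

```python
from collections import Counter

# Hypothetical stream of (author, text) pairs.
stream = [
    ("alice", "Keynote at #www2012 starts now"),
    ("alice", "Slides of our talk are online"),
    ("bob", "lunch"),
    ("alice", "Great demo session"),
]

# Number of tweets each author has published in the observed stream.
tweets_per_author = Counter(author for author, _ in stream)


def authority(author: str) -> int:
    """Tweets published by this author in the stream (0 for unseen authors)."""
    return tweets_per_author[author]


print(authority("alice"), authority("bob"), authority("carol"))
```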


Contextual feature: time w.r.t. query
When was a tweet published?
 Hypothesis: the lower the temporal distance between the
 query time and the creation time of a tweet, the more likely
 the tweet is to be relevant to the topic.

Example: tweet created on March 31, query issued on April 16
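The temporal feature is a difference of timestamps. A minimal sketch using the two dates from the slide; the year (2012) is an assumption, and turning the distance into a score (for example a decay function) is left to the learned model.

```python
from datetime import datetime


def temporal_distance_days(tweet_created: datetime, query_time: datetime) -> float:
    """Days between tweet creation and query time (positive if the tweet is older)."""
    return (query_time - tweet_created).total_seconds() / 86400


# Dates from the slide; the year is assumed.
tweet_created = datetime(2012, 3, 31)
query_time = datetime(2012, 4, 16)
print(temporal_distance_days(tweet_created, query_time))  # 16.0
```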
Summary of Features

Topic sensitive: keyword-based, semantic-based, context-based
Topic insensitive: syntactical, semantics, context
Results
Achieved for the TREC Microblog Challenge

Features                  Precision   Recall   F-measure
keyword relevance         0.3040      0.2924   0.2981
without semantics         0.3363      0.4828   0.3965
semantic relevance        0.3053      0.2931   0.2991
all features              0.3674      0.4736   0.4138




 Overall, by applying all features we achieve a precision
 of over 35% and a recall of over 45%.
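Assuming the F-measure column follows the standard definition F = 2PR / (P + R) (the slide does not state it), recomputing two rows of the results table from their precision and recall reproduces the reported values to four decimals.

```python
def f_measure(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) taken from the results table.
rows = {
    "keyword relevance": (0.3040, 0.2924, 0.2981),
    "all features": (0.3674, 0.4736, 0.4138),
}
for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed {f_measure(p, r):.4f}, reported {reported:.4f}")
```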
Importance of Features
[Figure: feature weights (axis from -1 to 2) in six panels. Topic-sensitive: Keyword-based (keyword-based relevance), Semantic-based (relevance, relatedness), Context-based (temporal context). Topic-insensitive: Syntactical (hasHashtag, hasURL, isReply, length), Semantics (#entities, diversity, sentiment), Context (social context).]

Semantic relatedness, URLs, !isReply, diversity and sentiment are good indicators for estimating the relevance of a tweet.
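The importance chart implies one learned weight per feature, but the slides do not name the learning method. As an assumption, logistic regression is a common choice for this kind of binary relevance decision; the sketch fits one with plain batch gradient descent on hypothetical toy labels for two syntactical features. The toy data was constructed so that URLs co-occur with relevance and replies with non-relevance, so the weight signs illustrate the mechanism, not the study's results.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=3000):
    """Fit weights and bias by batch gradient descent on the average log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Error for this example: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


FEATURES = ["hasURL", "isReply"]
# Hypothetical toy data (1 = relevant), grouped by (hasURL, isReply) combination.
X = [[1, 0]] * 4 + [[1, 1]] * 3 + [[0, 0]] * 4 + [[0, 1]] * 4
y = [1, 1, 1, 0] + [1, 0, 0] + [1, 1, 0, 0] + [1, 0, 0, 0]

weights, bias = train_logistic_regression(X, y)
for name, weight in zip(FEATURES, weights):
    print(f"{name}: {weight:+.2f}")
```

A positive weight raises the estimated probability of relevance and a negative one lowers it, which is how a bar above or below zero in the chart can be read.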
2. Search & Browsing in Twitter Streams
Idea: Faceted Search

Current Query: Eindhoven, Music
Results:
1. Yskiddd: Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords2
2. Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL
3. sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents

Expand Query:
   Suggestions:
   + Guilty Simpson
   + Area51
   Locations more...
   Events more...
   Music Artists:
   + Guilty Simpson
   + Bryan Adams
   + Elton John
   + Golden Earring
   more...
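The mock-up can be read as set operations over facet-value pairs. A minimal sketch, assuming each tweet has already been enriched with `(facet, value)` pairs; the tweets and facet names here are hypothetical. A query is a set of pairs, results are the tweets carrying all of them, and the suggestions are the most frequent remaining pairs in those results.

```python
from collections import Counter

# Hypothetical enriched tweets; facet names and annotations are illustrative.
TWEETS = [
    {"id": 1, "facets": {("Location", "Eindhoven"), ("Topic", "Music"),
                         ("Music Artist", "Guilty Simpson"), ("Venue", "Area51")}},
    {"id": 2, "facets": {("Location", "Eindhoven"), ("Topic", "Music"),
                         ("Music Artist", "Bryan Adams")}},
    {"id": 3, "facets": {("Location", "Eindhoven"), ("Topic", "Music"),
                         ("Music Artist", "Guilty Simpson")}},
    {"id": 4, "facets": {("Location", "Delft"), ("Topic", "Music"),
                         ("Music Artist", "Elton John")}},
]


def search(query: set, tweets: list) -> list:
    """Tweets annotated with every facet-value pair in the query."""
    return [t for t in tweets if query <= t["facets"]]


def facet_counts(results: list, query: set) -> Counter:
    """Frequency of each facet-value pair in the results, excluding pairs already queried."""
    counts = Counter()
    for t in results:
        counts.update(t["facets"] - query)
    return counts


query = {("Location", "Eindhoven"), ("Topic", "Music")}
results = search(query, TWEETS)
print([t["id"] for t in results])
print(facet_counts(results, query).most_common())

# Expanding the query with a suggested facet value narrows the results.
expanded = query | {("Music Artist", "Guilty Simpson")}
print([t["id"] for t in search(expanded, TWEETS)])
```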
Adaptive Faceted Search
• How to represent the content of a tweet? (facet extraction)
• How to adapt the facet-value pair ranking to the current demands of the user? (query suggestions)

[Figure: layered architecture, top to bottom: user → Adaptive Faceted Search → User and Context Modeling → Semantic Enrichment → Twitter posts]
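The slide frames adaptation as re-ranking facet-value pairs to the user's current demands but gives no formula. A minimal sketch, assuming a linear mix of how often a pair occurs in the current results and a per-user interest weight supplied by a user model; `rank_facet_values`, `interests` and `alpha` are hypothetical names and the numbers are made up.

```python
from collections import Counter


def rank_facet_values(counts: Counter, interests: dict, alpha: float = 0.5) -> list:
    """Order facet-value pairs by a mix of result frequency and user interest.

    alpha = 1.0 ignores the user; alpha = 0.0 ranks by interest only.
    """
    total = sum(counts.values()) or 1
    score = {fv: alpha * c / total + (1 - alpha) * interests.get(fv, 0.0)
             for fv, c in counts.items()}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(score, key=lambda fv: (-score[fv], fv))


counts = Counter({("Music Artist", "Guilty Simpson"): 2,
                  ("Music Artist", "Bryan Adams"): 1,
                  ("Venue", "Area51"): 1})
# Hypothetical user model: interest weights in [0, 1].
interests = {("Music Artist", "Bryan Adams"): 0.9}

print(rank_facet_values(counts, {}))         # non-personalized
print(rank_facet_values(counts, interests))  # personalized
```

With an empty user model the ranking falls back to plain frequency, which gives a natural non-personalized baseline to compare against.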
Facet Extraction and Semantic Enrichment
powered by [logo not preserved]

Tweet-based enrichment:
  "@bob: Julian Assange got arrested" → Julian Assange
Link-based enrichment:
  Linked article "Julian Assange arrested: Julian Assange, the founder of WikiLeaks, is under arrest in London…" → Julian Assange, WikiLeaks, London
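The difference between tweet-based and link-based enrichment is which texts are fed to entity extraction. A minimal sketch, assuming a hypothetical dictionary-based spotter in place of a real entity-extraction service and assuming the linked page text has already been fetched.

```python
# Hypothetical dictionary-based entity spotter: surface form -> facet type.
KNOWN_ENTITIES = {
    "Julian Assange": "Person",
    "WikiLeaks": "Organization",
    "London": "Location",
}


def extract_entities(text: str) -> set[tuple[str, str]]:
    """(facet type, entity) pairs whose surface form occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return {(etype, name) for name, etype in KNOWN_ENTITIES.items()
            if name.lower() in lowered}


def enrich(tweet_text: str, linked_pages: tuple[str, ...] = ()) -> set[tuple[str, str]]:
    """Tweet-based enrichment, plus link-based enrichment from already-fetched linked pages."""
    entities = extract_entities(tweet_text)
    for page in linked_pages:
        entities |= extract_entities(page)
    return entities


tweet = "@bob: Julian Assange got arrested"
article = ("Julian Assange arrested: Julian Assange, the founder of WikiLeaks, "
           "is under arrest in London…")
print(sorted(enrich(tweet)))              # tweet-based only
print(sorted(enrich(tweet, (article,))))  # tweet-based + link-based
```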
Faceted search vs. hashtag-based (keyword) search
Faceted search based on semantic enrichment of tweets significantly outperforms hashtag-based search.
Impact of link-based enrichment
• Personalized strategy significantly outperforms the baseline
• Link-based enrichment improves quality for both strategies
Twitcident application

Twitcident: applying filtering and search functionality to distill information
from Twitter during incidents (e.g. fires, extreme weather situations)
Twitcident Pipeline
[Figure: pipeline stages Automatic Filtering and Search & Browsing]
Faceted Search
[Screenshot: faceted search over the filtered Twitter stream]
Real-time visualizations
Could we see it coming?
[Figure: timeline annotated "Popular artist made a joke about the weather" and "Impact storm"]

Term usage 25 minutes before the incident:
     1.   heavy weather, hail balls, lightning, pitch black…
     2.   drama, panic, hell, serious, extreme…
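The term-usage observation amounts to counting, for a list of watch terms, how many tweets in the 25 minutes before the incident mention each term. A minimal sketch; the tweets, the placeholder timestamp and the plain substring matching are hypothetical simplifications, not Twitcident's implementation.

```python
from collections import Counter
from datetime import datetime, timedelta


def term_usage_before(tweets, incident, terms, window=timedelta(minutes=25)):
    """Count tweets mentioning each term within [incident - window, incident)."""
    start = incident - window
    counts = Counter()
    for created, text in tweets:
        if start <= created < incident:
            lowered = text.lower()
            for term in terms:
                if term in lowered:
                    counts[term] += 1
    return counts


incident = datetime(2024, 1, 1, 12, 0)  # placeholder timestamp, not the real incident time
tweets = [
    (incident - timedelta(minutes=40), "Sunny afternoon at the festival"),
    (incident - timedelta(minutes=20), "Heavy weather coming, the sky is pitch black"),
    (incident - timedelta(minutes=10), "Hail balls and lightning, panic near the stage"),
    (incident - timedelta(minutes=5), "Lightning everywhere!"),
    (incident + timedelta(minutes=5), "Panic, tents collapsed"),
]
terms = ["heavy weather", "hail balls", "lightning", "pitch black", "panic"]
print(term_usage_before(tweets, incident, terms).most_common())
```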




Spotting eyewitnesses
Real-time information from eyewitnesses
Summary
Automatic Filtering of Tweets: [#MSM@WWW ’12]
• Topic-sensitive and topic-insensitive features
• Semantic features (semantic relatedness, diversity, sentiment) are beneficial
Search and browsing: [ISWC ’11]
• Faceted Search
• Personalization & contextualization help
Application: [Hypertext ’12, Demo@WWW ’12]
• Twitcident: fulfilling information needs during incidents
Future work:
• Weak signal detection based on tweets
• Duplicate detection
Thank you!


                 @fabianabel
                 http://wis.ewi.tudelft.nl/

Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Último (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

  • 1. Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams. Data Bridges Workshop, Inria, Paris, April 12th 2012. Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao. Web Information Systems, TU Delft, the Netherlands. Delft University of Technology.
  • 2. 200,000,000: number of tweets published per day
  • 3. Pukkelpop 2011: People tweet about everything, everywhere :-)
  • 4. 200,000,000. Pukkelpop 2011 became a tragedy: 81,000 tweets in four hours. Filtering. Search & Browsing.
  • 5. Challenges: 1. (Automatic) Filtering: Given a topic (e.g. expressed via some keywords), how can one automatically identify the tweets that are relevant to the topic? 2. Search & Browsing: How can one improve search and browsing capabilities so that users can explore information in the streams of tweets that are relevant to a topic? Diagram: Twinder, a filtering and search framework, comprising Filtering and Search & Browsing; inputs: Twitter streams, topic, information need.
  • 6. Part 1: Filtering of Twitter streams (framework diagram: Filtering and Search & Browsing; inputs: Twitter streams, topic, information need).
  • 7. Filtering on Twitter. Query: www2012. Typical approach: keyword-based matching. Are there further features that can be used as indicators for estimating the relevance of a tweet to a topic?
  • 8. Syntactical feature: hashtags. Is a tweet more relevant if it contains a #hashtag? Hypothesis: tweets that contain hashtags are more likely to be relevant than tweets that do not.
  • 9. Syntactical feature: URLs. Is a tweet that contains a URL more relevant? Hypothesis: tweets that contain a URL are more likely to be relevant than tweets that do not.
  • 10. Syntactical feature: “mentions”. Is a tweet that mentions @somebody more relevant? Hypothesis: tweets formulated as a reply to another tweet are less likely to be relevant than other tweets.
  • 11. Syntactical feature: length. Does the length of a tweet influence its relevance to a topic? Example: 54 characters (9 words) vs. 140 characters (20 words). Hypothesis: the longer a tweet, the more likely it is to be relevant and interesting.
  • 12. Overview of features: topic-sensitive and topic-insensitive features. Topic-sensitive: keyword-based relevance. Topic-insensitive: syntactical features. What about the semantics?
  • 13. Semantic features: number of entities. Find semantics in a tweet to estimate its relevance, e.g. dbp:Tim_Berners-Lee, dbp:World_Wide_Web, dbp:WWW_Conference, dbp:France, dbp:Lyon. Hypothesis: the more entities a tweet mentions, the more likely it is to be relevant and interesting.
  • 14. Semantic features: diversity. The types of entities that a tweet features matter: a tweet whose entities are of different types (Event, Person, Thing, Place) vs. “I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.” (six entities, all of type Place). Hypothesis: the higher the diversity of the entities mentioned in a tweet, the more likely it is to be relevant.
  • 15. Semantic features: sentiment. Opinions expressed in tweets are interesting: “Looking forward to the WWW conference :-) Yes!” (positive) vs. “I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.” (neutral) vs. “Why are the big players not releasing query logs to the WWW community? :-( #fail” (negative). Hypothesis: the likelihood of a tweet’s relevance is influenced by its sentiment polarity.
  • 16. Semantic relatedness: exploit semantics to relate the query to tweets, e.g. dbp:International_World_Wide_Web_Conference and dbp:Tim_Berners-Lee.
  • 17. Overview of features. By now, we have four types of features: topic-sensitive (keyword-based, semantic-based) and topic-insensitive (syntactical, semantics). Context? What kinds of contextual features might be helpful?
  • 18. Contextual feature: authority of the publisher. It matters who published a tweet. Hypothesis: the more tweets the creator of a tweet has published, the more likely it is that the tweet is relevant.
  • 19. Contextual feature: time w.r.t. the query. When was a tweet published? Hypothesis: the smaller the temporal distance between the query time and the creation time of a tweet, the more likely the tweet is relevant to the topic. (Timeline: tweet, query; March 31, April 16.)
  • 20. Summary of features: topic-sensitive (keyword-based, semantic-based, context-based) and topic-insensitive (syntactical, semantics, context).
  • 21. Results achieved for the TREC Microblog Challenge:
      Features             Precision   Recall   F-measure
      keyword relevance    0.3040      0.2924   0.2981
      without semantics    0.3363      0.4828   0.3965
      semantic relevance   0.3053      0.2931   0.2991
      all features         0.3674      0.4736   0.4138
    Overall, applying all features achieves a precision above 35% and a recall above 45%.
  • 22. Importance of features. [Figure: importance of each feature, plotted on an axis from -1 to 2, in six panels. Keyword-based (topic-sensitive): keyword-based relevance. Syntactical (topic-insensitive): hasHashtag, hasURL, isReply, length. Semantic-based (topic-sensitive): semantic relatedness. Semantics (topic-insensitive): #entities, diversity, sentiment. Context-based (topic-sensitive): temporal context. Context (topic-insensitive): social context.] Semantic relatedness, URLs, !isReply, diversity and sentiment are good indicators for estimating the relevance of a tweet.
  • 23. Part 2: Search & Browsing in Twitter streams (framework diagram: Filtering and Search & Browsing; inputs: Twitter streams, topic, information need).
  • 24. Idea: faceted search. Mock-up: current query “Eindhoven Music + Guilty Simpson + Area51”, with query-expansion suggestions grouped by facet (Locations, Events, Music Artists: Guilty Simpson, Bryan Adams, Elton John, Golden Earring; Intents; each with “more...”). Results: 1. Yskiddd: “Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords2” 2. Usee123: “Cool #EV3door7980 !!! http://bit.ly/igyyRhL” 3. sanmiquelmusic: “This Saturday I'm joining @KrusadersMusic to …”
  • 25. Adaptive faceted search. Components: Twitter posts, Semantic Enrichment, User and Context Modeling, and Adaptive Faceted Search (facet extraction, query suggestions) facing the user. Open questions: How to represent the content of a tweet? How to adapt the facet-value pair ranking to the current demands of the user?
  • 26. Facet extraction and semantic enrichment. Tweet-based enrichment: “@bob: Julian Assange got arrested” yields Julian Assange. Link-based enrichment: the linked article (“Julian Assange, the founder of WikiLeaks, is under arrest in London…”) additionally yields WikiLeaks and London.
  • 27. Faceted search vs. hashtag-based (keyword) search: faceted search based on semantic enrichment of tweets significantly outperforms hashtag-based search.
  • 28. Impact of link-based enrichment: the personalized strategy significantly outperforms the baseline, and link-based enrichment improves quality for both strategies.
  • 29. Twitcident application: applying filter & search functionality to distill information from Twitter during incidents (e.g. fires, extreme weather situations). Framework diagram: Filtering and Search & Browsing; inputs: Twitter streams, topic, information need.
  • 30. 200,000,000. Pukkelpop 2011 became a tragedy: 81,000 tweets in four hours. Filtering. Search & Browsing.
  • 31. Twitcident pipeline: Automatic Filtering; Search & Browsing.
  • 34. Could we see it coming? (Chart of term usage over time, annotated “Popular artist made a joke about the weather” and “Impact storm”.) Term usage 25 minutes before the incident: 1. heavy weather, hail balls, lightning, pitch black… 2. drama, panic, hell, serious, extreme…
  • 36. Real-time information from eyewitnesses
  • 37. Summary. Automatic filtering of tweets [#MSM@WWW ’12]: topic-sensitive and topic-insensitive features; semantic features (semantic relatedness, diversity, sentiment) are beneficial. Search and browsing [ISWC ’11]: faceted search; personalization & contextualization help. Application [Hypertext ’12, Demo@WWW’12]: Twitcident, fulfilling information needs during incidents. Future work: weak signal detection based on tweets; duplicate detection.
  • 38. Thank you! @fabianabel http://wis.ewi.tudelft.nl/
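The topic-insensitive and contextual features from slides 8-19 are simple to compute once a tweet's entities are known. The following is a minimal sketch, not the authors' implementation: the `Tweet` class, the regular expressions, treating a tweet that starts with `@user` as a reply, and measuring diversity as the number of distinct entity types are all assumptions made for illustration. Entity recognition, sentiment and semantic relatedness need external services and are left out, as is the way the features are weighted into a relevance score.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical patterns approximating Twitter's hashtag and URL syntax.
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


@dataclass
class Tweet:
    text: str
    created_at: datetime
    author_tweet_count: int  # tweets published so far by the tweet's creator


def tweet_features(tweet, query_time, entity_types):
    """Compute syntactical, semantic-count and contextual features for one tweet.

    entity_types lists the type of every entity recognised in the tweet
    (e.g. "Place", "Person"); entity recognition itself is not shown here.
    """
    text = tweet.text
    return {
        "hasHashtag": HASHTAG_RE.search(text) is not None,
        "hasURL": URL_RE.search(text) is not None,
        # Approximation (assumption): a tweet starting with @user is a reply.
        "isReply": text.lstrip().startswith("@"),
        "length": len(text),
        "numEntities": len(entity_types),
        # One possible diversity measure: number of distinct entity types.
        "diversity": len(set(entity_types)),
        # Temporal distance between query time and tweet creation, in hours.
        "hoursBeforeQuery": (query_time - tweet.created_at).total_seconds() / 3600,
        # Social context: how active the publisher is.
        "authorTweetCount": tweet.author_tweet_count,
    }


# Usage with the "six cities" tweet from slide 14 (dates and counts are illustrative).
cities = Tweet("I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.",
               created_at=datetime(2012, 3, 31, 12, 0), author_tweet_count=250)
features = tweet_features(cities, datetime(2012, 4, 1, 12, 0), ["Place"] * 6)
print(features)
```

The feature names mirror the labels on slide 22 (hasHashtag, hasURL, isReply, length) so that the sketch can be read against the importance chart.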

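The F-measure column on slide 21 is the balanced F-measure, the harmonic mean of precision P and recall R: F = 2PR / (P + R). The short check below recomputes F from the reported precision and recall; it agrees with the reported values up to rounding in the last digit (the reported figures were presumably computed from unrounded P and R).

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on slide 21.
results = {
    "keyword relevance": (0.3040, 0.2924),
    "without semantics": (0.3363, 0.4828),
    "semantic relevance": (0.3053, 0.2931),
    "all features": (0.3674, 0.4736),
}

for name, (p, r) in results.items():
    print(f"{name:20s} F = {f_measure(p, r):.4f}")
```

The harmonic mean penalises imbalance: "without semantics" has lower precision than "all features" but a slightly higher recall, and still ends up with the lower F-measure.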
Speaker notes

  1. Motivation: information overload; personalised, “better” search.
  2. Make food and drink vouchers worth 75 to 100 euros available.
  3. Make food and drink vouchers worth 75 to 100 euros available.
  4. Traditional Twitter search. Highlight what keyword matching means: the keywords in the search query and in the tweets.
  5. Title -> syntactical features. Box in the tweet. Green boxes for the hypotheses. Flow from keyword-based relevance to … Slides 5-8: flow.
  6. Subtitle -> question?
  7. Introduce the usage of @, including mentions and replies. Reply tweets frequently occur in private conversations; therefore, formulate a hypothesis specifically about reply tweets.
  8. Tweet: “The 21st International World Wide Web Conference #www2012 will take place in Lyon, France April 16-20 2012 @www2012Lyon www2012.wwwconference.org”. Subtitle: question. One short, one long: comparison.
  9. Fade in the question later.
  10. Fade in the entities one by one.
  11. Fade in the entities one by one.
  12. Fade in the entities one by one.
  13. Do not highlight www, lyon, france.
  14. 18: Can we utilize the contextual features?
  15. Titles,
  16. Timeline
  17. Number of features.
  18. Comparison. Fade in the pairs. Highlight. Textbox -> conclusion (precision).
  19. Very time consuming and overwhelming indeed!
  20. Entity extraction, semantic enrichment and relation discovery.
  21. Make food and drink vouchers worth 75 to 100 euros available.
  22. Case #1: early warning
  23. Case 1: enforcement (situational picture around the incident)
  24. Our framework extracts typed entities from enriched tweets/news and provides strategies for detecting semantic (trending) relationships between entities. We investigated the precision and recall of the relation detection strategies, analyzed how the strategies perform for each type of relationship, and evaluated the quality and speed of discovering trending relationships that may have limited temporal validity. Research questions: Which strategy performs best in detecting relationships between entities? Does the accuracy depend on the type of entities involved in a relation? How do the strategies perform for discovering relationships that have temporal constraints, and how fast can the strategies detect (trending) relationships?
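The adaptive faceted search on slides 24-26 and the framework in the note above both work on typed entities extracted from enriched tweets. The sketch below shows one simple way such entities can drive faceted search, under stated assumptions: each tweet has already been enriched into a set of (facet, value) pairs, suggestions are ranked by raw frequency in the current result set, and selecting a pair narrows the results. The functions `rank_facet_values` and `refine` and the toy tweets are hypothetical; the user and context modeling that the slides use to adapt the ranking is not modeled here.

```python
from collections import Counter

# Hypothetical toy data: (tweet text, set of (facet, value) pairs).
# For the first tweet, link-based enrichment is assumed to add WikiLeaks and London.
TWEETS = [
    ("@bob: Julian Assange got arrested",
     frozenset({("Person", "Julian Assange"), ("Organization", "WikiLeaks"),
                ("Location", "London")})),
    ("WikiLeaks founder arrested in London",
     frozenset({("Organization", "WikiLeaks"), ("Location", "London")})),
    ("Guilty Simpson will be performing at Area51 in Eindhoven",
     frozenset({("Music Artist", "Guilty Simpson"), ("Location", "Eindhoven")})),
]


def rank_facet_values(results, selected=frozenset(), k=5):
    """Rank facet-value pairs by the number of results that contain them.

    Pairs the user has already selected are left out of the suggestions.
    """
    counts = Counter()
    for _text, pairs in results:
        counts.update(pairs - selected)
    return counts.most_common(k)


def refine(results, pair):
    """Keep only the results annotated with the chosen facet-value pair."""
    return [(text, pairs) for text, pairs in results if pair in pairs]


# Usage: the user selects Location = London; suggestions are recomputed
# on the narrowed result set.
chosen = ("Location", "London")
narrowed = refine(TWEETS, chosen)
suggestions = rank_facet_values(narrowed, selected=frozenset({chosen}))
print(suggestions)
```

Frequency is only a baseline ranking; an adaptive variant would reweight the counts using a model of the user's interests and the current context.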