Twitris – System for Understanding Perceptions From Social Data

http://twitris.knoesis.org/

Ohio Center of Excellence in Knowledge Enabled Computing (Kno.e.sis)
Wright State University, Dayton, OH

Twitris - Motivation

1.  Information overload
    •  WHAT to be aware of
    •  Multiple storylines about the same event
       (Image: http://bit.ly/etFezl)

2.  Evolution of citizen observation
    •  with location, time and occurrence of other events

3.  Big picture of the event
    –  How to find out
       •  Location- and time-based interesting facts for an event from Twitter
       •  Event-related information from other sources (images, videos, news and Wikipedia articles)

Twitris: Twitter + Tetris

•  Twitris lets you browse citizen reports using social perceptions as the fulcrum
   –  What is being said about an event (theme)
   –  Where (spatial)
   –  When (temporal)

•  Contextual information from web resources like news, Wikipedia articles, Flickr, TwitPic and YouTube

•  Study diversity and change in perceptions

Twitris Architecture

[Figure: Twitris architecture diagram with components numbered 1 to 4]

Data Collection and Preprocessing: Semi-automated Tweet Crawler

Extract topically relevant tweets using the Twitter search API and search keywords
   –  Because tweets are not pre-categorized!

Strategy: semi-automated, multithreaded, continuous tweet crawler
   •  Start with manually selected keywords (seed)
   •  Crawl using keywords and hashtags
   •  Periodically update the keywords used for the crawl (to capture evolution of the topic)
   •  Continue the crawl

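The crawl strategy above can be sketched as a loop that searches, collects and then widens its keyword set. This is a minimal single-threaded sketch, not the Twitris crawler: `crawl_rounds`, `fake_search`, `CORPUS` and the rule "adopt the most frequent new hashtags" are all assumptions made for illustration, and the in-memory search function stands in for a real search API call.

```python
from collections import Counter
from typing import Callable, Iterable


def crawl_rounds(
    seed_keywords: Iterable[str],
    search: Callable[[str], list],
    rounds: int = 3,
    new_terms_per_round: int = 2,
):
    """Run a fixed number of crawl rounds, widening the keyword set each round."""
    keywords = set(seed_keywords)
    seen = set()  # tweet texts already collected (de-duplication)
    collected = []
    for _ in range(rounds):
        hashtag_counts = Counter()
        for keyword in sorted(keywords):
            for tweet in search(keyword):
                if tweet in seen:
                    continue
                seen.add(tweet)
                collected.append(tweet)
                hashtag_counts.update(
                    word.lower() for word in tweet.split() if word.startswith("#")
                )
        # Keyword update (assumed rule): adopt the most frequent hashtags not
        # yet tracked. In a semi-automated setup a person would review these.
        new_terms = [h for h, _ in hashtag_counts.most_common() if h not in keywords]
        keywords.update(new_terms[:new_terms_per_round])
    return keywords, collected


# Hypothetical in-memory stand-in for a search API call.
CORPUS = [
    "Protests continue #iranelection #tehran",
    "Huge crowd in the square #tehran",
    "Results disputed #iranelection",
]


def fake_search(keyword):
    return [tweet for tweet in CORPUS if keyword in tweet.lower()]


keywords, tweets = crawl_rounds(["#iranelection"], fake_search, rounds=2)
```

In this toy run the second round picks up a tweet that the seed keyword alone would have missed, which is the point of updating keywords periodically.
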
Data Collection and Preprocessing: Metadata Extraction

•  Tweet published date-time, author, location
•  Location where the tweet originated
   –  From the tweet
   –  From the author's profile
      •  Location: Dayton, OH (resolved by the Google geocoder service)
      •  Location: “best place in the world” (fail!)
•  Location geocode lookup
•  Cache (location, latitude, longitude) for speedup

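The lookup-plus-cache step might look like the sketch below. It assumes a geocoder that returns a (latitude, longitude) pair or `None`. `GeocodeCache`, `fake_geocode` and `KNOWN_PLACES` are hypothetical names, and the coordinates are approximate illustrative values rather than output from any real service.

```python
from typing import Callable, Optional, Tuple

Coordinates = Tuple[float, float]


class GeocodeCache:
    """Memoize location-string lookups so each distinct string is geocoded once."""

    def __init__(self, geocode: Callable[[str], Optional[Coordinates]]):
        self._geocode = geocode
        self._cache = {}
        self.lookups = 0  # number of calls that actually reached the geocoder

    def resolve(self, location: str) -> Optional[Coordinates]:
        key = " ".join(location.lower().split())  # normalize case and spacing
        if key not in self._cache:
            self.lookups += 1
            self._cache[key] = self._geocode(key)  # failures (None) are cached too
        return self._cache[key]


# Hypothetical stand-in for a geocoding service; coordinates are approximate.
KNOWN_PLACES = {"dayton, oh": (39.76, -84.19)}


def fake_geocode(location):
    return KNOWN_PLACES.get(location)


cache = GeocodeCache(fake_geocode)
first = cache.resolve("Dayton, OH")
second = cache.resolve("  dayton,   OH ")  # served from the cache
unresolved = cache.resolve("best place in the world")  # free-text profile field
```

Caching failed lookups as well avoids repeatedly querying the service for free-text profile locations that will never resolve.
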
Key Phrase Extraction: 1. Spatio-Temporal Clustering

•  Objective: from a volume of tweets to event-descriptive key phrases, preserving the spatio-temporal-thematic aspects of social perceptions

1.  Spatio-temporal clustering
    –  Group observations based on location and time
    –  Global events (Iran Election Protest, Japan Earthquake)
       •  clusters by country and day
    –  Local events (Healthcare reform debate, Austin plane crash)
       •  clusters by state and day

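Grouping by location and day, as described above, amounts to bucketing tweets under a (region, date) key. The sketch below assumes each tweet is a dict with hypothetical `text`, `country`, `state` and `created_at` fields; the sample tweets and dates are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def cluster_tweets(tweets, spatial_key):
    """Bucket tweet texts by (region, calendar day).

    spatial_key is "country" for global events or "state" for local ones.
    """
    clusters = defaultdict(list)
    for tweet in tweets:
        region = tweet.get(spatial_key)
        if region is None:  # location could not be resolved
            continue
        clusters[(region, tweet["created_at"].date())].append(tweet["text"])
    return dict(clusters)


# Made-up sample data.
TWEETS = [
    {"text": "Town hall on health care", "country": "US", "state": "OH",
     "created_at": datetime(2009, 8, 20, 9, 30)},
    {"text": "Health care debate tonight", "country": "US", "state": "OH",
     "created_at": datetime(2009, 8, 20, 18, 5)},
    {"text": "Reform rally downtown", "country": "US", "state": "TX",
     "created_at": datetime(2009, 8, 21, 12, 0)},
]

by_state = cluster_tweets(TWEETS, "state")
by_country = cluster_tweets(TWEETS, "country")
```
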
Key Phrase Extraction: Spatio-Temporal Clustering

[Figure: Twitris interface with temporal navigation and spatial markers]

Key Phrase Extraction: 2. N-gram Generation

“President Obama in trying to regain control of the health-care debate will likely shift his pitch in September”

1-grams: President, Obama, in, trying, to, regain, ...
2-grams: “President Obama”, “Obama in”, “in trying”, “trying to”, ...
3-grams: “President Obama in”, “Obama in trying”, “in trying to”, ...

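The n-gram lists above come from sliding a window of n tokens over the tweet. A minimal sketch, assuming plain whitespace tokenization (the slides do not say which tokenizer Twitris uses):

```python
def ngrams(text, n):
    """Return all contiguous n-token sequences of text, joined with spaces."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


SENTENCE = (
    "President Obama in trying to regain control of the health-care debate "
    "will likely shift his pitch in September"
)

unigrams = ngrams(SENTENCE, 1)
bigrams = ngrams(SENTENCE, 2)
trigrams = ngrams(SENTENCE, 3)
```
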
Key Phrase Extraction: 3. N-gram Weight Calculation

An n-gram's weight is calculated from:

1.  Thematic importance
    –  redundancy: statistically discriminatory in nature
    –  variability: contextually important
2.  Spatial importance (local vs. global popularity)
3.  Temporal importance (always popular vs. currently trending)

Key Phrase Extraction: 3.1.A Thematic Importance of an N-gram

A.  Exploiting redundancy
    1.  TF-IDF of the n-gram (Lucene index)
    2.  Amplify by the fraction of nouns in the n-gram (Stanford Natural Language Parser)
    3.  Amplify by the fraction of non-stop words (‘going to try’)
    4.  Pick the higher-order n-gram (for overlapping segments with the same TF-IDF)
    5.  Select the top 5 n-grams for further analysis

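A rough sketch of the redundancy steps above, in plain Python rather than Lucene and the Stanford parser. Several things here are assumptions, not details from the slides: the TF-IDF variant, the multiplicative form of the "amplify" steps, the caller-supplied noun set standing in for a part-of-speech tagger, the tiny stop-word list, and the tie-break on n-gram length, which only approximates step 4 because it does not check that segments overlap.

```python
import math

STOP_WORDS = {"a", "an", "the", "to", "in", "of", "is", "going"}  # illustrative only


def count(ngram, tokens):
    """Number of times the token tuple ngram occurs in the token list."""
    n = len(ngram)
    return sum(1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == ngram)


def redundancy_score(ngram, docs, nouns):
    """TF-IDF amplified by noun fraction and non-stop-word fraction (assumed form)."""
    tf = sum(count(ngram, doc) for doc in docs)
    df = sum(1 for doc in docs if count(ngram, doc) > 0)
    idf = math.log(len(docs) / df) if df else 0.0
    noun_fraction = sum(word in nouns for word in ngram) / len(ngram)
    content_fraction = sum(word not in STOP_WORDS for word in ngram) / len(ngram)
    return tf * idf * (1 + noun_fraction) * (1 + content_fraction)


def top_ngrams(candidates, docs, nouns, k=5):
    """Rank by score; on equal scores prefer the longer (higher-order) n-gram."""
    ranked = sorted(
        candidates,
        key=lambda g: (redundancy_score(g, docs, nouns), len(g)),
        reverse=True,
    )
    return ranked[:k]


# Made-up tokenized tweets and noun set.
DOCS = [
    ["health", "care", "debate"],
    ["health", "care", "reform"],
    ["going", "to", "try"],
    ["plane", "crash"],
]
NOUNS = {"health", "care", "debate", "reform", "plane", "crash"}

health_care = redundancy_score(("health", "care"), DOCS, NOUNS)
going_to = redundancy_score(("going", "to"), DOCS, NOUNS)
best = top_ngrams([("going", "to"), ("health",), ("health", "care")], DOCS, NOUNS)
```

With this data "health" and "health care" tie on score, so the length tie-break surfaces the bigram first.
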
Key Phrase Extraction: 3.1.B Thematic Importance of an N-gram

B.  Exploiting variability
    –  Contextually relevant words boost statistical importance
    •  Focus word (fw): the n-gram
    •  Associated words (aw_i): top 5 co-occurring words in the spatio-temporal set of tweets
    •  Association strength: pointwise mutual information (PMI)

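Pointwise mutual information compares how often two words co-occur with how often they would co-occur if independent: PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ). The sketch below estimates the probabilities from tweet-level co-occurrence within one spatio-temporal set. It treats the focus word as a single token and represents each tweet as a set of tokens, both simplifying assumptions, and it does not show how the PMI values feed back into the thematic score, since the slides do not give that formula.

```python
import math
from collections import Counter


def pmi(focus, word, tweets):
    """PMI of two tokens, estimated from tweet-level co-occurrence counts."""
    n = len(tweets)
    with_focus = sum(1 for t in tweets if focus in t)
    with_word = sum(1 for t in tweets if word in t)
    with_both = sum(1 for t in tweets if focus in t and word in t)
    if with_both == 0:
        return float("-inf")  # never seen together
    return math.log2((with_both / n) / ((with_focus / n) * (with_word / n)))


def top_associated(focus, tweets, k=5):
    """The k words that co-occur most often with focus, each with its PMI."""
    co_counts = Counter(w for t in tweets if focus in t for w in t if w != focus)
    return [(w, pmi(focus, w, tweets)) for w, _ in co_counts.most_common(k)]


# Made-up token sets for one spatio-temporal cluster.
TWEET_TOKENS = [
    {"obama", "healthcare"},
    {"obama", "healthcare"},
    {"obama", "election"},
    {"weather"},
]

associated = top_associated("obama", TWEET_TOKENS)
```
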
Key Phrase Extraction: 3.2 Thematic-Temporal Importance

•  Temporal importance of the n-gram
   –  always popular vs. currently trending
•  Certain descriptors always dominate observations
   –  e.g. “Obama” and “President” in the US presidential election
•  To allow less popular but interesting descriptors to surface, we discount the thematic score in proportion to recent popularity
•  Spatio-temporal-thematic score of a descriptor = thematic score - spatio-temporal discounts

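One way to read the temporal discount above is as a penalty that grows with how popular a descriptor has been on recent days. The sketch below is an assumed form, not the formula used by Twitris: "recent popularity" is taken to be the mean fraction of tweets mentioning the descriptor over the preceding days, and `alpha` is a hypothetical tuning constant.

```python
def thematic_temporal_score(thematic_score, recent_popularity, alpha=0.5):
    """Thematic score minus a discount proportional to mean recent popularity.

    recent_popularity: fractions (0..1) of tweets mentioning the descriptor
    on each of the preceding days (assumed definition).
    """
    if not recent_popularity:
        return thematic_score  # no history, nothing to discount
    mean_popularity = sum(recent_popularity) / len(recent_popularity)
    return thematic_score - alpha * mean_popularity


# "obama": high thematic score, but dominant on every recent day.
always_popular = thematic_temporal_score(1.0, [0.8, 0.9, 1.0])
# A newer descriptor: lower thematic score, almost absent until now.
newly_trending = thematic_temporal_score(0.6, [0.0, 0.1])
```

Under these made-up numbers the newly trending descriptor ends up above the always-popular one, which is the behavior the slide describes.
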
Key Phrase Extraction: 3.3 Thematic-Temporal-Spatial Importance

•  Descriptors that occur all over the world are not as interesting as those local to a region
   –  (local vs. global popularity)
•  Discount the thematic-temporal score in proportion to the number of non-local spatial sets that mention the descriptor
•  The result is the final Spatio-Temporal-Thematic (STT) weight of an n-gram

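The spatial discount can be sketched the same way: count the other spatial sets that mention the descriptor and subtract a penalty per set. The linear form and the constant `beta` are assumptions for illustration, since the slide's formula did not survive in the text. `REGION_DESCRIPTORS` is made-up data.

```python
def stt_weight(thematic_temporal, descriptor, local_region, region_descriptors, beta=0.1):
    """Thematic-temporal score minus a discount per non-local set mentioning it."""
    non_local_mentions = sum(
        1
        for region, descriptors in region_descriptors.items()
        if region != local_region and descriptor in descriptors
    )
    return thematic_temporal - beta * non_local_mentions


# Made-up descriptor sets per spatial cluster on one day.
REGION_DESCRIPTORS = {
    "US": {"obama", "healthcare"},
    "UK": {"obama"},
    "IN": {"obama"},
}

global_term = stt_weight(1.0, "obama", "US", REGION_DESCRIPTORS)
local_term = stt_weight(1.0, "healthcare", "US", REGION_DESCRIPTORS)
```
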
Key Phrase Extraction: Results

[Figure: TF-IDF vs. Spatio-Temporal-Thematic (STT) scores of descriptors]

Key Phrase Extraction: Example

•  Objective: from a volume of tweets to event-descriptive key phrases, preserving the spatio-temporal-thematic aspects of social perceptions

Analysis of Embedded Links

•  Due to the 140-character tweet size limit, people are increasingly integrating hyperlinks into tweets (articles, blogs, images, videos)
•  Steps:
   –  Extraction and resolution of links
   –  Provide hyperlinks to articles and blogs
   –  Check semantic relevance of images and videos
      •  Based on title and description

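The extraction and relevance steps above might be sketched as follows. Resolving shortened links needs a network request and is left out. The URL pattern is deliberately simple, and keyword overlap is an assumed stand-in for whatever semantic relevance measure Twitris applies to titles and descriptions. `extract_links` and `is_relevant` are hypothetical names.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")


def extract_links(tweet):
    """Find URLs in a tweet and trim punctuation that trails them."""
    return [url.rstrip(".,;:!?)\"'") for url in URL_PATTERN.findall(tweet)]


def is_relevant(title, description, event_keywords, min_overlap=1):
    """Keyword-overlap check on an image's or video's title and description."""
    words = set(re.findall(r"[a-z0-9]+", f"{title} {description}".lower()))
    keywords = {keyword.lower() for keyword in event_keywords}
    return len(words & keywords) >= min_overlap


links = extract_links("Photos from the rally http://bit.ly/etFezl, more soon")
on_topic = is_relevant("Health care rally", "Crowd in Dayton", {"rally", "protest"})
off_topic = is_relevant("Cute cat", "", {"rally", "protest"})
```
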
External Context for Understanding Event

•  Wikipedia articles
•  Related news

Twitris: Widgets

Sentiment Analysis

•  using statistical and machine learning techniques

Entity-Relationship Graph

•  using semantically annotated DBpedia entities mentioned in the tweets

Tweet Traffic Analysis

•  Event popularity over a period of time

Twitris: Functional Overview

Twitris: Demo, Quick Show

•  http://twitris.knoesis.org/

Ongoing Work

Continuous Semantics

•  Domain models to enhance understanding of the content

Coordination

•  Coordinating needs and resources in disaster situations
   –  Analyze SMS and Web reports from the disaster location
   –  Use domain models for efficient and timely coordination
   (Image: http://bit.ly/hcp4PG)

Twitris Team

•  Meena Nagarajan
•  Amit Sheth
•  Hemant Purohit
•  Ashutosh Jadhav
•  Lu Chen
•  Pramod Anantharam
•  Pavan Kapanipathi

* All trademarks belong to their respective owners.

Thanks!

Questions?

Twitris - Web Information System 2011 Course

  • 1. Twitris – System for Understanding Perceptions From Social Data http://twitris.knoesis.org/ Ohio Center of Excellence in Knowledge Enabled Computing (Kno.e.sis) Wright State University, Dayton, OH 1
  • 2. Twitris - Motivation 1.  Information Overload" •  WHAT to be aware of" •  Multiple Storylines about same event!!" Image: http://bit.ly/etFezl 2
  • 3. Twitris - Motivation 2. Evolution of Citizen Observation" •  with location, time and occurrence of other events" 3
  • 4. Twitris - Motivation 3. Big picture of the event" –  How to find out " •  Location and time based interesting facts for an event from Twitter" •  Event related information from other sources (images, videos, news and Wikipedia articles)" " 4
  • 5. Twitris: Twitter + Tetris •  Twitris lets you browse citizen reports using social perceptions as the fulcrum" –  What is being said about an event (theme)" –  Where (spatial)" –  When (temporal)" •  Contextual information from web resources like news, Wikipedia articles, Flickr, TwitPic and Youtube" •  Study diversity and change in perceptions" 5
  • 7. Data Collection and Preprocessing: Semi-automated Tweet Crawler Extract topically relevant tweets using Twitter search API and search keywords" –  Because tweets are not pre-categorized!" Strategy: Semi-automated Multithread Continuous " " Tweet Crawler" " l  Start with manually selected keywords (seed)" l  Crawl using keywords, hashtags" l  Periodically update keywords used for crawl " (to capture evolution of the topic)" l  Continue crawl" 7
  • 8. Data Collection and Preprocessing: Metadata Extraction •  Tweet published date-time, author, location" •  Location from where tweet is originated" −  From the tweet" −  From authorʼs profile" •  Location: Dayton, OH (Google geocoder service)" •  Location: “best place in the world” (fail!)" •  Location Geocode lookup" •  Cache (location, latitude, longitude) for speedup" " 8
  • 9. Key Phrase Extraction: 1. Spatio-Temporal Clustering •  Objective: from volume of tweets to event descriptive key phrases, preserving spatio-temporal-thematic aspects of social perceptions! " 1.  Spatio-temporal clustering" " –  Group observations based on location and time" " –  Global events (Iran Election Protest, Japan Earthquake)" •  clusters by country and day" " –  Local events (Heathcare reform debate, Austin Plane crash)" •  clusters by state and day" 9
  • 10. Key Phrase Extraction: Spatio-Temporal clustering Temporal navigation Spatial Markers 10
  • 11. Key Phrase Extraction: " 2. N-gram generation " " " " " " “President Obama in trying to regain control of the health-care debate will likely shift his pitch in September”" "1-grams: President, Obama, in, trying, to, regain, ..." "2-grams: “President Obama”, “Obama in”, “in trying”, “trying to”... "3-grams: “President Obama in”, “Obama in trying”; “in trying to”..." 11
  • 12. Key Phrase Extraction: 3. n-gram Weight Calculation A n-gramʼs weight is calculated by" " 1.  Thematic Importance" –  redundancy: statistically discriminatory in nature" –  variability: contextually important" 2.  Spatial Importance (local vs. global popularity)" 3.  Temporal Importance (always popular vs. currently trending)" " 12
  • 13. Key Phrase Extraction: 3.1.A Thematic Importance of a n-gram A.  Exploiting Redundancy" 1.  TF-IDF of n-gram (Lucene Index)" 2.  Amplify by fraction of nouns in the n-gram (Stanford Natural Language Parser)" 3.  Amplify by fraction of non-stop words (ʻgoing to tryʼ)" 4.  Pick higher order n-gram (for overlapping segments and same TF-IDF)" 5.  Select top 5 n-grams for further analysis"
  • 14. Key Phrase Extraction: 3.1.B Thematic Importance of an n-gram
      B.  Exploiting Variability
         –  Contextually relevant words boost statistical importance
            •  Focus word (fw): the n-gram
            •  Associated words (aw_i): top 5 co-occurring words in the spatio-temporal set of tweets
            •  Association strength: pointwise mutual information
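Pointwise mutual information between a focus n-gram and a candidate associated word can be estimated from tweet-level co-occurrence counts, PMI(fw, aw) = log2( P(fw, aw) / (P(fw) P(aw)) ). The sketch below assumes tweets are already tokenized and belong to one spatio-temporal set; how the PMI values are folded back into the thematic score is not specified on the slide and is left out. The helper names and sample tweets are illustrative.

```python
import math
from collections import Counter


def contains(tokens, phrase):
    """True if the space-separated phrase occurs as contiguous tokens."""
    return f" {phrase} " in f" {' '.join(tokens)} "


def top_associated(tweets, focus, k=5):
    """Top-k words co-occurring with the focus n-gram (by tweet count)."""
    focus_tokens = set(focus.split())
    counts = Counter()
    for tokens in tweets:
        if contains(tokens, focus):
            counts.update(set(tokens) - focus_tokens)
    return [word for word, _ in counts.most_common(k)]


def pmi(tweets, focus, word):
    """PMI (base 2) of focus n-gram and word, using tweet-level probabilities."""
    n = len(tweets)
    n_focus = sum(contains(t, focus) for t in tweets)
    n_word = sum(word in t for t in tweets)
    n_both = sum(contains(t, focus) and word in t for t in tweets)
    if n_both == 0:
        return float("-inf")  # never seen together
    return math.log2((n_both / n) / ((n_focus / n) * (n_word / n)))


tweets = [
    ["health", "care", "debate", "obama"],
    ["health", "care", "reform"],
    ["obama", "speech"],
    ["weather", "today"],
]
```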
  • 15. Key Phrase Extraction: 3.2 Thematic-Temporal Importance
      •  Temporal importance of the n-gram
         •  always popular vs. currently trending
      •  Certain descriptors always dominate observations
         –  Obama, President in the US presidential election
      •  To allow less popular, interesting descriptors to surface, we discount the thematic score in proportion to recent popularity
      •  Spatio-temporal-thematic score of a descriptor = thematic score - spatio-temporal discounts
  • 16. Key Phrase Extraction: 3.3 Thematic-Temporal-Spatial Importance
      •  Descriptors that occur all over the world are not as interesting as those local to a region
         –  (local vs. global popularity)
      •  Discount the thematic-temporal score in proportion to the number of spatial sets (not local) that mention the descriptor
      •  Final Spatio-Temporal-Thematic (STT) weight of an n-gram: thematic score minus the temporal and spatial discounts
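The two discounts can be combined as in the sketch below. Only the overall shape (thematic score minus discounts) comes from the slides. The specific discount forms, the weights `alpha` and `beta`, and the function name `stt_weight` are assumptions chosen for illustration.

```python
def stt_weight(thematic, recent_scores, other_sets_mentioning, other_sets_total,
               alpha=0.5, beta=0.5):
    """Spatio-temporal-thematic weight under assumed discount forms.

    thematic: thematic score of the descriptor in this (region, day) cluster
    recent_scores: its thematic scores on preceding days (recent popularity)
    other_sets_mentioning / other_sets_total: non-local spatial sets that
        mention the descriptor, out of all non-local sets
    """
    # Temporal discount: proportional to average recent popularity (assumed form).
    if recent_scores:
        temporal_discount = alpha * sum(recent_scores) / len(recent_scores)
    else:
        temporal_discount = 0.0
    thematic_temporal = thematic - temporal_discount

    # Spatial discount: proportional to the share of other regions that
    # also mention the descriptor (assumed form).
    if other_sets_total:
        spread = other_sets_mentioning / other_sets_total
    else:
        spread = 0.0
    spatial_discount = beta * thematic_temporal * spread

    return thematic_temporal - spatial_discount


always_popular = stt_weight(10.0, [8.0, 6.0], other_sets_mentioning=3, other_sets_total=4)
local_and_new = stt_weight(10.0, [], other_sets_mentioning=0, other_sets_total=4)
```

Under these assumptions a descriptor that was already popular and is mentioned in most other regions drops well below a fresh, local one with the same thematic score.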
  • 17. Key Phrase Extraction: Results: TF-IDF vs. Spatio-Temporal-Thematic (STT) scores of descriptors
  • 18. Key Phrase Extraction: Example
      •  Objective: go from a volume of tweets to event-descriptive key phrases, preserving the spatio-temporal-thematic aspects of social perceptions
  • 19. Analysis of Embedded Links
      •  Due to the 140-character tweet size limit, people are increasingly integrating hyperlinks into tweets (articles, blogs, images, videos)
      •  Steps:
         –  Extraction and resolution of links
         –  Provide hyperlinks to articles, blogs
         –  Check semantic relevance for images and videos
            •  Based on title and description
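The extraction and resolution steps can be sketched with the standard library. The regular expression and the trailing-punctuation cleanup are simplifying assumptions, and `resolve_link` needs network access; some servers reject `HEAD` requests, so a production crawler would need a fallback.

```python
import re
import urllib.request

URL_PATTERN = re.compile(r"https?://\S+")


def extract_links(tweet_text):
    """Find http(s) links in a tweet and trim trailing punctuation."""
    return [url.rstrip(".,;:!?)\"'") for url in URL_PATTERN.findall(tweet_text)]


def resolve_link(url, timeout=5.0):
    """Follow redirects (e.g. from a URL shortener) and return the final URL."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


links = extract_links(
    "Reform debate http://bit.ly/etFezl, see also https://example.org/a."
)
```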
  • 20. External Context for Understanding Event
      •  Wikipedia articles
      •  Related news
  • 22. Sentiment Analysis
      •  using statistical and machine learning techniques
  • 23. Entity-Relationship Graph
      •  using semantically annotated DBpedia entities mentioned in the tweets
  • 24. Tweet Traffic Analysis
      •  Event popularity over a period of time
  • 25. Twitris: Functional Overview
  • 26. Twitris: Demo, Quick Show
      •  http://twitris.knoesis.org/
  • 28. Continuous Semantics
      Domain models to enhance understanding of the content
  • 29. Coordination
      •  Coordinating needs and resources in disaster situations
         –  Analyze SMS and Web reports from the disaster location
         –  Use domain models for efficient and timely coordination
      Image: http://bit.ly/hcp4PG
  • 30. Twitris Team
      Meena Nagarajan, Amit Sheth, Hemant Purohit, Ashutosh Jadhav, Lu Chen, Pramod Anantharam, Pavan Kapanipathi
  • 31. References
      1.  Twitris: Twitter through space, time and theme. http://twitris.knoesis.org
      2.  Nagarajan, M., Gomadam, K., Sheth, A.P., Ranabahu, A., Jadhav, A., Mutharaju, R.: Spatio-temporal-thematic analysis of citizen-sensor data - challenges and experiences. In: Web Information Systems Engineering. (2009)
      3.  Ashutosh Jadhav, Wenbo Wang, Raghava Mutharaju, Pramod Anantharam, Vinh Nguyen, Amit P. Sheth, Karthik Gomadam, Meenakshi Nagarajan, and Ajith Ranabahu, Twitris: Socially Influenced Browsing, Semantic Web Challenge 2009, 8th International Semantic Web Conference, Oct. 25-29 2009, Washington, DC, USA
      4.  A. Sheth, Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness, February 17, 2009
      5.  A. Sheth, Citizen Sensing, Social Signals, and Enriching Human Experience, IEEE Internet Computing, July/August 2009.
      6.  Thomas, C., Mehra, P., Brooks, R., Sheth, A.P.: Growing fields of interest – using an expand and reduce strategy for domain model extraction. In: Web Intelligence. (2008) 496–502
      7.  Mendes PN, Passant A, Kapanipathi P, Sheth AP, 'Linked Open Social Signals,' WI2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI-10), Toronto, Canada, Aug. 31 to Sep. 3, 2010
      8.  Meenakshi Nagarajan, Hemant Purohit, Amit Sheth. A Qualitative Examination of Topical Tweet and Retweet Practices. 4th Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2010
      * All the trademarks belong to their respective owners
  • 32. Thanks! Questions?