Twitris - Web Information System 2011 Course
1. Twitris – System for Understanding
Perceptions From Social Data
http://twitris.knoesis.org/
Ohio Center of Excellence in Knowledge Enabled
Computing (Kno.e.sis)
Wright State University, Dayton, OH
1
2. Twitris - Motivation
1. Information Overload
• WHAT to be aware of
• Multiple storylines about the same event!
Image: http://bit.ly/etFezl
3. Twitris - Motivation
2. Evolution of Citizen Observation
• with location, time and occurrence of other events
4. Twitris - Motivation
3. Big picture of the event
– How to find out
• Location- and time-based interesting facts for an event from Twitter
• Event-related information from other sources (images, videos, news and Wikipedia articles)
5. Twitris: Twitter + Tetris
• Twitris lets you browse citizen reports using social perceptions as the fulcrum
– What is being said about an event (theme)
– Where (spatial)
– When (temporal)
• Contextual information from web resources like news, Wikipedia articles, Flickr, TwitPic and YouTube
• Study diversity and change in perceptions
7. Data Collection and Preprocessing:
Semi-automated Tweet Crawler
Extract topically relevant tweets using the Twitter search API and search keywords
– Because tweets are not pre-categorized!
Strategy: Semi-automated Multithreaded Continuous Tweet Crawler
• Start with manually selected keywords (seed)
• Crawl using keywords, hashtags
• Periodically update keywords used for crawl (to capture evolution of the topic)
• Continue crawl
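The crawl strategy above can be sketched roughly as follows. This is a minimal single-threaded illustration, not the Twitris crawler itself: `fake_search` is a hypothetical stand-in for the Twitter search API, and the rule for refreshing keywords (adopt the most frequent new hashtags) is an assumption, since the slide only says keywords are updated periodically.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")

# Hypothetical in-memory data standing in for live search results.
FAKE_TWEETS = [
    "Protest in Tehran today #iranelection #neda",
    "Votes being recounted #iranelection #gr88",
    "Remembering #neda",
]


def fake_search(keyword):
    """Stand-in for a search API call: case-insensitive substring match."""
    return [t for t in FAKE_TWEETS if keyword.lower() in t.lower()]


def update_keywords(keywords, tweets, top_k=2):
    """Assumed refresh rule: add the top_k most frequent unseen hashtags."""
    counts = Counter(
        tag.lower() for text in tweets for tag in HASHTAG.findall(text)
    )
    new = [tag for tag, _ in counts.most_common() if tag not in keywords]
    return keywords | set(new[:top_k])


def crawl(seed, search, rounds=2):
    """Start from seed keywords, crawl, refresh keywords, and continue."""
    keywords = set(seed)
    collected = set()  # a set avoids storing the same tweet twice
    for _ in range(rounds):
        batch = [t for kw in sorted(keywords) for t in search(kw)]
        collected.update(batch)
        keywords = update_keywords(keywords, batch)
    return keywords, collected


keywords, collected = crawl(["iranelection"], fake_search)
```

In this toy run the seed keyword only reaches two tweets in the first round; the third becomes reachable once the hashtags found in that round are added. In a semi-automated setup a person would presumably review the suggested keywords before they are used.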
8. Data Collection and Preprocessing:
Metadata Extraction
• Tweet published date-time, author, location
• Location from which the tweet originated
− From the tweet
− From the author's profile
• Location: Dayton, OH (Google geocoder service)
• Location: “best place in the world” (fail!)
• Location geocode lookup
• Cache (location, latitude, longitude) for speedup
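The caching idea above can be illustrated with a small wrapper around a geocoding call. This is a sketch under assumptions: `fake_lookup` is a hypothetical stand-in for the remote geocoder, the coordinates are approximate illustrative values, and the key normalization is one possible choice.

```python
from typing import Callable, Dict, Optional, Tuple

Coords = Optional[Tuple[float, float]]


class CachedGeocoder:
    """Caches (location -> latitude, longitude), including failed lookups."""

    def __init__(self, lookup: Callable[[str], Coords]):
        self._lookup = lookup
        self._cache: Dict[str, Coords] = {}
        self.remote_calls = 0  # counts how often the slow lookup ran

    def geocode(self, location: str) -> Coords:
        # Normalize case and whitespace so "Dayton, OH" and
        # " dayton,  oh" share one cache entry.
        key = " ".join(location.lower().split())
        if key not in self._cache:
            self.remote_calls += 1
            self._cache[key] = self._lookup(key)
        return self._cache[key]


def fake_lookup(name: str) -> Coords:
    """Hypothetical geocoder: knows one city, fails on everything else."""
    table = {"dayton, oh": (39.76, -84.19)}
    return table.get(name)


geocoder = CachedGeocoder(fake_lookup)
first = geocoder.geocode("Dayton, OH")
second = geocoder.geocode(" dayton,  oh")
unknown = geocoder.geocode("best place in the world")
```

Caching the failures as well means free-text profile locations that cannot be resolved are not looked up again on every tweet.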
9. Key Phrase Extraction:
1. Spatio-Temporal Clustering
• Objective: from volume of tweets to event-descriptive key phrases, preserving spatio-temporal-thematic aspects of social perceptions
1. Spatio-temporal clustering
– Group observations based on location and time
– Global events (Iran Election Protest, Japan Earthquake)
• clusters by country and day
– Local events (Healthcare reform debate, Austin plane crash)
• clusters by state and day
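Grouping by location and day can be sketched as a simple bucketing step. The tweet records and field names (`country`, `state`, `time`, `text`) below are hypothetical; the slides do not describe the actual data model.

```python
from collections import defaultdict
from datetime import datetime


def spatio_temporal_clusters(tweets, region_field):
    """Bucket tweet texts by (region, calendar day).

    region_field would be "country" for global events and
    "state" for local ones.
    """
    clusters = defaultdict(list)
    for tweet in tweets:
        key = (tweet[region_field], tweet["time"].date())
        clusters[key].append(tweet["text"])
    return dict(clusters)


# Hypothetical sample records.
SAMPLE = [
    {"text": "Crowds gather in Tehran", "country": "IR", "state": None,
     "time": datetime(2009, 6, 13, 9, 30)},
    {"text": "More protests tonight", "country": "IR", "state": None,
     "time": datetime(2009, 6, 13, 21, 5)},
    {"text": "Watching the news from Dayton", "country": "US", "state": "OH",
     "time": datetime(2009, 6, 13, 12, 0)},
]

by_country = spatio_temporal_clusters(SAMPLE, "country")
```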
11. Key Phrase Extraction:
2. N-gram generation
“President Obama in trying to regain control of the health-care debate will likely shift his pitch in September”
1-grams: President, Obama, in, trying, to, regain, ...
2-grams: “President Obama”, “Obama in”, “in trying”, “trying to”, ...
3-grams: “President Obama in”, “Obama in trying”, “in trying to”, ...
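The n-gram lists above can be reproduced with a sliding window over the tokens. Splitting on whitespace is a simplifying assumption; the tokenizer used by Twitris is not stated in the slides.

```python
def ngrams(text, n):
    """Return all contiguous n-word phrases in text, in order."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


SENTENCE = (
    "President Obama in trying to regain control of the health-care "
    "debate will likely shift his pitch in September"
)

unigrams = ngrams(SENTENCE, 1)
bigrams = ngrams(SENTENCE, 2)
trigrams = ngrams(SENTENCE, 3)
```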
12. Key Phrase Extraction:
3. n-gram Weight Calculation
An n-gram's weight is calculated from:
1. Thematic Importance
– redundancy: statistically discriminatory in nature
– variability: contextually important
2. Spatial Importance (local vs. global popularity)
3. Temporal Importance (always popular vs. currently trending)
13. Key Phrase Extraction:
3.1.A Thematic Importance of an n-gram
A. Exploiting Redundancy
1. TF-IDF of n-gram (Lucene index)
2. Amplify by fraction of nouns in the n-gram (Stanford Natural Language Parser)
3. Amplify by fraction of non-stop words (‘going to try’)
4. Pick higher-order n-gram (for overlapping segments and same TF-IDF)
5. Select top 5 n-grams for further analysis
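Steps 1 to 3 and 5 above might be combined along the following lines. This is only a sketch: the slides do not say how the amplification is applied, so multiplying by (1 + fraction) is an assumption, the TF-IDF values and noun flags are supplied by hand rather than computed with Lucene or a parser, and step 4 (preferring higher-order n-grams on ties) is left out.

```python
def redundancy_score(tfidf, tokens, noun_flags, stop_words):
    """Amplify a TF-IDF value by noun and non-stop-word fractions.

    Assumed form: tfidf * (1 + noun fraction) * (1 + non-stop fraction).
    """
    noun_frac = sum(noun_flags) / len(tokens)
    content_frac = sum(t.lower() not in stop_words for t in tokens) / len(tokens)
    return tfidf * (1 + noun_frac) * (1 + content_frac)


def top_ngrams(candidates, stop_words, k=5):
    """Rank candidate n-grams by amplified score and keep the best k."""
    scored = [
        (redundancy_score(c["tfidf"], c["tokens"], c["noun_flags"], stop_words),
         " ".join(c["tokens"]))
        for c in candidates
    ]
    scored.sort(reverse=True)
    return [phrase for _, phrase in scored[:k]]


# Hypothetical stop list and candidates with hand-assigned values.
STOP_WORDS = {"going", "to", "try"}
CANDIDATES = [
    {"tokens": ["going", "to", "try"], "tfidf": 1.0,
     "noun_flags": [False, False, False]},
    {"tokens": ["President", "Obama"], "tfidf": 1.0,
     "noun_flags": [True, True]},
]

ranked = top_ngrams(CANDIDATES, STOP_WORDS)
```

With equal TF-IDF, the all-noun phrase is amplified to four times its base value while ‘going to try’ is left unchanged, so the former ranks first.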
14. Key Phrase Extraction:
3.1.B Thematic Importance of an n-gram
B. Exploiting Variability
– Contextually relevant words boost statistical importance
• Focus word (fw): the n-gram
• Associated words (aw_i): top 5 co-occurring words in the spatio-temporal set of tweets
• Association strength: pointwise mutual information (PMI)
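Pointwise mutual information between a focus n-gram and an associated word can be estimated from tweet-level co-occurrence, as in the sketch below. The standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))) is used; estimating the probabilities as fractions of tweets in one spatio-temporal set, and the sample tweets themselves, are assumptions for illustration.

```python
import math


def pmi(focus, word, tweets):
    """PMI between a focus phrase and a word over a set of tweets.

    Probabilities are estimated as the fraction of tweets containing
    the phrase, the word, or both.
    """
    n = len(tweets)
    with_focus = with_word = with_both = 0
    for text in tweets:
        lowered = text.lower()
        has_focus = focus.lower() in lowered          # phrase match
        has_word = word.lower() in lowered.split()    # whole-token match
        with_focus += has_focus
        with_word += has_word
        with_both += has_focus and has_word
    if with_both == 0:
        return float("-inf")  # never co-occur
    return math.log2((with_both / n) / ((with_focus / n) * (with_word / n)))


# Hypothetical tweets from one spatio-temporal cluster.
TWEETS = [
    "health care debate heats up",
    "health care reform debate",
    "obama speech tonight",
    "debate over budget",
]

pmi_debate = pmi("health care", "debate", TWEETS)
pmi_reform = pmi("health care", "reform", TWEETS)
```

Here "reform" scores higher than "debate": it appears only alongside the focus phrase, whereas "debate" also appears in an unrelated tweet.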
15. Key Phrase Extraction:
3.2 Thematic-Temporal Importance
• Temporal importance of the n-gram
• always popular vs. currently trending
• Certain descriptors always dominate observations
– Obama, President in the US presidential election
• To allow less popular, interesting descriptors to surface, we discount the thematic score in proportion to recent popularity
• Spatio-temporal-thematic score of a descriptor
= thematic score - spatio-temporal discounts
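The temporal discount can be pictured with a small calculation. The slide only states that the discount is proportional to recent popularity; measuring recent popularity as the mean thematic score over previous days and the factor `alpha` are both assumptions made for this sketch.

```python
def temporal_discounted(thematic_today, recent_scores, alpha=0.5):
    """Subtract a discount proportional to recent popularity.

    recent_scores: the descriptor's thematic scores on previous days
    (empty if it was not seen before). alpha is a hypothetical factor.
    """
    if recent_scores:
        recent_popularity = sum(recent_scores) / len(recent_scores)
    else:
        recent_popularity = 0.0
    return thematic_today - alpha * recent_popularity


# An always-popular descriptor versus a newly trending one
# (scores are made up for illustration).
always_popular = temporal_discounted(10.0, [10.0, 10.0])
newly_trending = temporal_discounted(6.0, [])
```

In this example the newly trending descriptor ends up above the always-popular one even though its raw thematic score is lower.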
16. Key Phrase Extraction:
3.3 Thematic-Temporal-Spatial Importance
• Descriptors that occur all over the world are not as interesting as those local to a region
– (local vs. global popularity)
• Discount the thematic-temporal score in proportion to the number of spatial sets (not local) that mention the descriptor
• Final Spatio-Temporal-Thematic (STT) weight of an n-gram: thematic score minus the temporal and spatial discounts
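The spatial discount can be sketched in the same spirit. Counting the non-local spatial sets that mention the descriptor follows the slide; the factor `beta`, the region names, and the scores are hypothetical.

```python
def spatial_discounted(score, descriptor, local_region,
                       descriptors_by_region, beta=1.0):
    """Subtract a discount proportional to the number of other
    spatial sets that also mention the descriptor."""
    other_mentions = sum(
        1
        for region, descriptors in descriptors_by_region.items()
        if region != local_region and descriptor in descriptors
    )
    return score - beta * other_mentions


# Hypothetical descriptor sets per spatial cluster.
BY_REGION = {
    "OH": {"health care", "town hall"},
    "TX": {"health care", "plane crash"},
    "CA": {"health care"},
}

global_term = spatial_discounted(5.0, "health care", "OH", BY_REGION)
local_term = spatial_discounted(5.0, "town hall", "OH", BY_REGION)
```

A descriptor mentioned in every region loses more of its score than one that appears only locally.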
17. Key Phrase Extraction: Results
TFIDF vs. Spatio-Temporal-Thematic (STT) Scores of Descriptors
18. Key Phrase Extraction: Example
• Objective: from volume of tweets to event-descriptive key phrases, preserving spatio-temporal-thematic aspects of social perceptions
19. Analysis of Embedded Links
• Due to the 140-character tweet size limit, people are increasingly integrating hyperlinks into tweets (articles, blogs, images, video)
• Steps:
– Extraction and resolution of links
– Provide hyperlinks to articles, blogs
– Check semantic relevance for images and videos
• Based on title and description
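The extraction and relevance steps above might look roughly like this. Resolving shortened links requires a network request and is omitted; the keyword-overlap test is a crude stand-in for whatever semantic relevance check Twitris applies, and the example URLs are placeholders.

```python
import re

URL_RE = re.compile(r"https?://\S+")


def extract_links(tweet):
    """Pull URLs out of tweet text, trimming trailing punctuation."""
    return [url.rstrip(".,!?)") for url in URL_RE.findall(tweet)]


def is_relevant(title, description, keywords):
    """Assumed relevance test: does the media title or description
    contain at least one event keyword as a whole word?"""
    words = set(re.findall(r"\w+", f"{title} {description}".lower()))
    return any(keyword.lower() in words for keyword in keywords)


links = extract_links(
    "Photos from the rally http://bit.ly/abc123, more at https://example.com/x."
)
relevant = is_relevant("Tehran rally", "Crowds after the election",
                       ["election", "protest"])
irrelevant = is_relevant("My cat", "Sleeping on the sofa",
                         ["election", "protest"])
```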
20. External Context for
Understanding Event
• Wikipedia articles
• Related news
29. Coordination
• Coordinating needs and resources in a disaster situation
– Analyze SMS and Web reports from the disaster location
– Use domain models for efficient and timely coordination
Image: http://bit.ly/hcp4PG
30. Twitris Team
Meena Nagarajan
Amit Sheth
Hemant Purohit
Ashutosh Jadhav
Lu Chen
Pramod Anantharam
Pavan Kapanipathi
31. References
1. Twitris: Twitter through space, time and theme. http://twitris.knoesis.org
2. Nagarajan, M., Gomadam, K., Sheth, A.P., Ranabahu, A., Jadhav, A., Mutharaju, R.: Spatio-temporal-thematic analysis of citizen-sensor data - challenges and experiences. In: Web Information Systems Engineering. (2009)
3. Ashutosh Jadhav, Wenbo Wang, Raghava Mutharaju, Pramod Anantharam, Vinh Nguyen, Amit P. Sheth, Karthik Gomadam, Meenakshi Nagarajan, and Ajith Ranabahu, Twitris: Socially Influenced Browsing, Semantic Web Challenge 2009, 8th International Semantic Web Conference, Oct. 25-29 2009, Washington, DC, USA
4. A. Sheth, Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness, February 17, 2009
5. A. Sheth, Citizen Sensing, Social Signals, and Enriching Human Experience, IEEE Internet Computing, July/August 2009.
6. Thomas, C., Mehra, P., Brooks, R., Sheth, A.P.: Growing fields of interest – using an expand and reduce strategy for domain model extraction. In: Web Intelligence. (2008) 496–502
7. Mendes PN, Passant A, Kapanipathi P, Sheth AP, 'Linked Open Social Signals,' WI2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI-10), Toronto, Canada, Aug. 31 to Sep. 3, 2010
8. Meenakshi Nagarajan, Hemant Purohit, Amit Sheth. A Qualitative Examination of Topical Tweet and Retweet Practices. 4th Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2010
* All the trademarks belong to their respective owners