Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications
Tutorial at WWW2011, Hyderabad, India
March 28, 2011
Outline
• Citizen Sensing: Overview, Social Signals, Enablers
• Role of Social Media: Activism, Journalism, Business Intelligence, Global Development
• Development-Centric Platforms: Beginnings, Architectures and Possibilities
• Systematic Study of Social Media: Spatio-Temporal-Thematic + People-Content-Network Analysis
• Trustworthiness in Social Media
• Mobile Social Computing
• Citizen Sensing @ Real-time
• Research Application: Twitris
• Conclusion & Future Work
Acknowledgements
• Selvam Velmurugan (Kiirti, eMoksha NGOs)
• Meena Nagarajan (Content Analysis)
• Hemant Purohit (People & Network Analysis)
• Amit Sheth (Semantic Web)
• Ashutosh Jadhav (Event Analysis)
• Lu Chen (Sentiment Analysis)
• Pramod Anantharam (Social & Sensor Web)
• Pavan Kapanipathi (Real-Time Web)
Preliminaries
• Tutorial description: http://www2011india.com/tutorialstr27.html and http://knoesis.org/library/resource.php?id=1030
• Lots of breadth (many examples), some depth (few algorithms), mainly to convey insights
• Twitter > Myspace/Facebook > SMS
  – Each has different reach/focus/importance
• Given the time, only parts will be covered today!
• Citations and further reading are at the bottom of slides and at the end
• Images belong to their copyright holders. Copyright information for images, where available, is at the end.
Aim
• What are the research opportunities and technical challenges in gaining insights from, and making use of, social media content (esp. citizen sensing)?
• Provide a structure to a vast array of issues
• Breadth, not depth
Citizen Sensing
• The common person (a citizen of the Internet) is able to use Web 2.0 and social networks
• The human-centric activity** of observing, reporting and disseminating information (facts, opinions, views) via text, audio, video and built-in device sensors (and smart devices)
  – ** direct/indirect, collective/individual
• Human-in-the-loop (participatory) sensing + Web 2.0 + Mobile computing = Emergence of Citizen-Sensor networks
Image: http://bit.ly/hmZe42
A. Sheth, 'Citizen Sensing, Social Signals, and Enriching Human Experience', IEEE Internet Computing, July/August 2009, pp. 80-85.
Social Signals
• Understanding meaningful citizen sensor observations
• Social Signal Processing: Aggregation, Enhancement, Analysis, Visualization, and Interpretation
• Citizen-Sensor network: immense potential to disseminate social signals quickly and in real time
A. Sheth, 'Citizen Sensing, Social Signals, and Enriching Human Experience', IEEE Internet Computing, July/August 2009, pp. 80-85.
Image: http://bit.ly/gWHSjD
Enablers: Mobile Devices & Ubiquitous Connectivity
• 1B+ Internet-connected mobile devices (2010)
• Smartphones > Notebooks + Netbooks (2010E)
• 500K+ mobile phone applications
• 74% of mobile phone users (2.4B) worldwide used SMS (2007)
• The mobile device might qualify as humankind's primary tool
• Redefines the way we engage with people, information, etc.
• Mobile is global: ubiquity, 24x7
• Built-in sensors: environmental, biometric/biomedical, ...
Enablers: Web 2.0 & Social Media
• 500M+ Facebook users
• 100M+ Twitter users, 85M+ tweets/day
• Internet users: 1.8 billion
• A large variety of social media and traditional media interact, creating a potent mixture
Types of UGC:
• Facebook (multimedia)
• YouTube (videos)
• Flickr (images)
• Blogs (text)
• Ping (social network for music)
Image: http://bit.ly/euLETT
Citizen Sensing Overview, Social Signals, Enablers Role of Social Media (important classes of applications) Activism, Journalism, Business Intelligence, Global Development Development-Centric Platforms Beginnings, Architectures and Possibilities Systematic Study of Social Media Spatio-Temporal-Thematic + People-Content-Network Analysis Trustworthiness in Social Media Mobile Social Computing Citizen Sensing @ Real-time Research Application: Twitris Conclusion & Future Work 12 Outline
Citizen Sensors in Action
• Mumbai Terror Attack
• Iran Election 2009
• Haiti Earthquake 2010
• US Healthcare Debate 2009
Image: http://huff.to/hp0OhA
Revolution 2.0: Political/Social Activism
• Ghonim, who has been a figurehead for the movement against the Egyptian government, told Blitzer “If you want to liberate a government, give them the internet.” [...] Ghonim replied succinctly “Ask Facebook.”
• http://cnn.com/video/?/video/world/2011/02/13/nr.social.media.revolution.cnn
• http://cnn.com/video/?/video/tech/2011/02/11/barnett.egypt.social.media.cnn
• An Egyptian anti-government demonstrator sleeps on the pavement under spray paint that reads 'Al-Jazeera' and 'Facebook' at Cairo's Tahrir Square on February 7, 2011. http://www.cbsnews.com/stories/2011/02/15/eveningnews/main20032118.shtml
Citizen Journalism, Twitter Journalism
Images: http://bit.ly/9GVfPQ, http://bit.ly/hmrTYV
News is increasingly Social
• Social News
• Social Media and Global Media are intertwined.
Business Intelligence: Trend Spotting, Forecasting, Brand Tracking, Targeted Advertising
• Sysomos (http://www.sysomos.com/): business intelligence by engaging, measuring and understanding activities in social media
• Trendspotting (http://trendspotting.com): detecting, analyzing and evaluating trends for business
• Simplify (http://simplify360.com/): a collaborative platform to monitor, measure and engage customers using social media
• Shoutlet (http://www.shoutlet.com/): managing social media marketing communication using a single platform
• Reputation.com (http://www.reputationdefender.com/): preserves privacy and defends reputation by protecting against attacks on personal information
Image: http://bit.ly/eAebBb
Social Development (Education, Health, eGov)
• LiveMocha (http://www.livemocha.com/): online language learning tool with social engagement, bridging the gap!
• Soliya (http://www.soliya.net/): dialogue between students from diverse backgrounds across the globe using the latest multimedia technologies
• Project Einstein (http://digital-democracy.org/what-we-do/programs/): a photography-based digital pen pal program connecting youths in refugee camps to the world
• PatientsLikeMe (http://mashable.com/2010/07/13/social-media-health-trends/): facilitates sharing health profiles, finding patients with similar ailments, and learning from discussions
• TrialX (http://trialx.com/): finding clinical trials of new treatments and connecting with clinical trial investigators
Image: http://bit.ly/ayyjlU
Collaboration
We “simply do not have enough genes to program the brain fully in advance,” so we must work together, extending and supporting our own intelligence with “social prosthetic” systems that make up for our missing cognitive and emotional capacities: “Evolution has allowed our brains to be configured during development so that we are ‘plug compatible’ with other humans, so that others can help us extend ourselves.” - Harvard "Group Brain Project"
Beginnings
• Open Source: Linux, Apache
• Social Networks: Facebook, Twitter, MySpace
• Crowd Sourcing: Wikipedia, Kiva, Ushahidi, Kiirti, SwiftRiver, Sahana
• Collaborative Governance: Peer-to-Patent, e-Democracia
Popular Initiatives
• Facebook + Twitter
  – Iran post-election protests
  – Tunisia and Egypt uprisings
• Ushahidi
  – Kenyan post-election violence
  – India, Lebanon, Afghanistan, and Sudan elections
  – Haiti Earthquake
  – Pakistan Floods
• Kiirti
  – BBMP election monitoring
  – Bangalore AutoWatch
FixOurCity - Chennai
• Built on top of the FixMyCity open-source codebase
• Stage I
  – Report by Area/Ward and Street
  – Integration with Google Maps
  – Displays Ward member name/contact details
  – Select category of issue, description and severity
  – Confirmation through email to avoid misuse
• Stage II/III
  – Normalize incoming reports to official wards and categories
  – Integration with the Corporation website to allow auto-forwarding and updating of reports
Ushahidi
• Information Collection: SMS (FrontlineSMS, Clickatell), Email, Web
• Visualization/Interactive Mapping: Timeline, Category, Geo-spatial
• Alerts: Geo-spatial
• Admin: User Management, Report Moderation/Creation, Site Statistics
SwiftRiver
• Filtering and verification of real-time data from channels like Twitter, SMS, Email and RSS feeds.
• Offers organizations an easy way to apply semantic analysis and verification algorithms to different sources of information.
  – Speed up the process of managing real-time data streams (email, web, SMS, Twitter)
  – Add elusive context (location, historical data) and history (reputation of sources) to online research
  – Offer a dashboard for monitoring multiple channels of information
  – Offer advanced aggregation and analytic tools, online or offline
  – Give the user control over advanced curation tools and filters
SwiftRiver Architecture - I 26
SwiftRiver Architecture - II 27
Sahana
• Free and Open Source Disaster Management system.
• A web-based collaboration tool that addresses the common coordination problems during a disaster between the Government, civil society (NGOs) and the victims themselves.
Sahana
• Mapping: Situation Awareness & Geospatial Analysis.
• Messaging: Sends & Receives Alerts via Email & SMS.
• Document Library: A library of digital resources, such as Photos & Office documents.
• Missing Persons Registry: Report and Search for Missing Persons.
• Disaster Victim Identification
• Requests Management: Tracks requests for aid and matches them against donors who have pledged aid.
• Shelter Registry: Tracks the location, distribution, capacity and breakdown of victims in Shelters.
• Hospital Management System: Hospitals can share information on resources & needs.
• Organization Registry: "Who is doing What & Where". Allows relief agencies to coordinate their activities.
• Ticketing: Master Message Log to process incoming reports & requests.
• Delphi Decision Maker: Supports the decision making of large groups of Experts.
Peer to Patent Peer To Patent opens the patent examination process to public participation for the first time. It is an online system that aims to improve the quality of issued patents by enabling the public to supply the USPTO with information relevant to assessing the claims of pending patent applications. 30
http://www.peertopatent.org/video/p2p640/VideoPlayer.html 31 Peer to Patent - Video
Kiirti Allows you to set up your own instance of the Ushahidi Platform without having to install it on your own web server.  Provides pre-integrated Voice and SMS reporting capabilities within India. 32
33 Kiirti – Home Page
34 Kiirti – User Interaction Flow
Kiirti - Flywheel of Engagement 35
Future Possibilities
• Online Dispute Resolution
  – 30M+ pending cases in India's courts
• Public Policy Reviews
• Crisis Management
• Effective Local Governance
Challenges
• Information overload
• Processing and de-duping messages
• Accessibility (e.g. network congestion, access points, …)
• Incorrect or partial data
• Trustworthiness of source (e.g. influence, reputation, …)
• Metadata extraction (e.g. geo data, named entities, sentiment/opinion, …)
• Collaboration
• Policy discussions
• Structure or hierarchy
Dimensions of Systematic Study of Social Media
• Spatio-Temporal-Thematic + People-Content-Network
Social Information Processing
• "Who says what, to whom, why, to what extent and with what effect?" [Lasswell]
• Network: social structure emerges from the aggregate of relationships (ties)
• People: poster identities, the active effort of accomplishing interaction
• Content: studying the content of communication
Studying Online Human Social Dynamics
• How does the (semantics or style of) content fit into the observations made about the network?
• Often, the three-dimensional dynamic of people, content and link structure is what shapes the social dynamic.
• Example: how do the topic of discussion, the emotional charge of a conversation, the presence of an expert and the connections between participants together explain information propagation in a social network?
Image: http://bit.ly/dFzjU2
Why People-Content-Network  + Spatial-Temporal-Thematic metadata?(Example of Understanding Crisis Data) 42
Metadata/Annotations Metadata: an organized way to study Types  Creation/extraction and storage Use  43 Image: http://www.biowisdom.com/tag/metadata/
Metadata Infrastructure:  Example for Tweet Annotation (mapped out tweet) 44 Image: http://rww.to/9zyoQa
http://www.readwriteweb.com/archives/what_twitter_annotations_mean.php
People Metadata: Variety of Self-expression Modes on Multiple Social Media Platforms
• Explicit information from user profiles
  – User names, pictures, videos, links, demographic information, group memberships, ...
• Implicit information from user attention metadata
  – Page views, Facebook 'Likes', comments; Twitter 'Follows', retweets, replies, ...
People Metadata: Various Types Identification Interests Activity Network 48
People Metadata: Continued 49
People Metadata: Continued Web Presence: -  User affiliations -  KLOUT Score – influence measure  (www.klout.com) 50
Content Metadata
1. Content-independent metadata
   • date, location, author, etc.
2. Content-dependent metadata
   a. Direct content-based metadata
      i. Explicit/mentioned content metadata
         • named entities in content
      ii. Implicit/inferred content metadata
         • related named entities from knowledge sources
   b. Indirect content-based metadata (external metadata)
      • context inferred from URLs in content (images, links to articles, Foursquare check-ins, etc.)
V. Kashyap and A. Sheth, 'Semantic Heterogeneity in Global Information Systems: The Role of Metadata, Context and Ontologies,' in Cooperative Information Systems: Current Trends and Directions, M. Papazoglou and G. Schlageter (Eds.), Academic Press, 1998, pp. 139-178.
Content Metadata: Content Independent
• For Tweets
  – Published date and time
  – Location (where the tweet was generated from)
  – Tweet posting method (smartphone, twitter.com, clients for Twitter)
  – Author information
• For SMS
  – Publish date and time
  – Location (where the SMS is generated)
  – Receiver (NGO, government organization)
  – Carrier information (available on request)
Content Metadata:  Content Dependent  (Tweet) 53 Direct Content-based Metadata Indirect content-based metadata (External metadata)
Content Metadata:  Content Dependent  (SMS) Direct Content-based Metadata 54
Network Metadata Connections/Relationships matter! (foundation for the network) 55
Metadata: Creation, Extraction and Storage 56
Metadata Creation & Extraction Extracted Metadata  Directly visible information from the user profile, tweet content & community structure Created Metadata  After processing information in the user profile, content and/or network structure 57
An Example
• Length: 109 characters
• General topic: Egypt protest
This poor {sentiment_expression: {target: “Lara Logan”, polarity: “negative”}} woman! RT @THR CBS News' {entity:{type=“News Agency”}} Lara Logan {entity:{type=“Person”}} Released From Hospital {entity:{type=“Hospital”}} After Egypt {entity:{type=“Country”}} Assault {topic} http://bit.ly/dKWTY0 {external_URL}
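One way to hold such annotations is as stand-off records: a character span plus a label, kept apart from the tweet text. The sketch below is a minimal illustration under that assumption. The `Annotation` record and its field names are hypothetical, not a Twitter or Twitris schema, and the sentiment span is assumed to be the single word "poor".

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    start: int   # offset of the first annotated character
    end: int     # offset one past the last annotated character
    kind: str    # "entity", "sentiment_expression", "topic" or "external_URL"
    attrs: dict = field(default_factory=dict)  # e.g. {"type": "Person"}

# The tweet from the slide with the inline markup removed.
TWEET = ("This poor woman! RT @THR CBS News' Lara Logan Released From "
         "Hospital After Egypt Assault http://bit.ly/dKWTY0")

def annotate(text, phrase, kind, **attrs):
    """Wrap the first occurrence of `phrase` in an Annotation."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return Annotation(start, start + len(phrase), kind, attrs)

ANNOTATIONS = [
    annotate(TWEET, "poor", "sentiment_expression",
             target="Lara Logan", polarity="negative"),
    annotate(TWEET, "CBS News", "entity", type="News Agency"),
    annotate(TWEET, "Lara Logan", "entity", type="Person"),
    annotate(TWEET, "Hospital", "entity", type="Hospital"),
    annotate(TWEET, "Egypt", "entity", type="Country"),
    annotate(TWEET, "Assault", "topic"),
    annotate(TWEET, "http://bit.ly/dKWTY0", "external_URL"),
]

def spans_of(kind):
    """Return the surface strings of all annotations of one kind."""
    return [TWEET[a.start:a.end] for a in ANNOTATIONS if a.kind == kind]
```

Keeping offsets rather than inline tags leaves the original text untouched, so `len(TWEET)` still gives the 109 characters stated on the slide, and several annotators can add records independently.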
Why Is the Semantic Web a Standard for Social Metadata?
• Rich Snippets, Open Graph: RDFa, Semantic Web based social data standards
• Relationships/connections play a central role (not just hyperlinks as in Web data), so treating relationships as first-class objects is important
• Semantic Web technologies and standards provide better techniques to capture and represent metadata and relationships
Semantic Web in One Slide
• Representing Semantic Web data: RDF, with relationships as first-class objects: <subject, predicate, object>
• Representing knowledge and agreements: nomenclature, taxonomy, folksonomy, ontology (OWL)
• Annotation: RDFa, XLink, model reference
• Web of Data: Linked Open Data
• Querying: SPARQL
• Rules: SWRL, RIF
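To make the <subject, predicate, object> idea concrete, here is a small plain-Python sketch that stores triples as tuples and answers a two-pattern query by joining on a shared variable, which is roughly what a SPARQL basic graph pattern does. The `ex:` and `tweet:` names are made-up identifiers for illustration, and the sketch uses no RDF library; a real deployment would use an RDF store and SPARQL.

```python
# Each triple is (subject, predicate, object). The vocabulary is hypothetical.
TRIPLES = {
    ("tweet:1", "ex:mentions", "ex:Lara_Logan"),
    ("tweet:1", "ex:hasTopic", "ex:Egypt_protest"),
    ("ex:Lara_Logan", "rdf:type", "ex:Person"),
    ("ex:Lara_Logan", "ex:worksFor", "ex:CBS_News"),
}

def match(pattern, triples=TRIPLES):
    """Return triples that agree with `pattern` on every non-None position.

    None acts as a wildcard, playing the role of a SPARQL variable.
    """
    return sorted(t for t in triples
                  if all(p is None or p == v for p, v in zip(pattern, t)))

def people_mentioned(tweet):
    """Join two patterns: things the tweet mentions that are typed ex:Person."""
    mentioned = {o for _, _, o in match((tweet, "ex:mentions", None))}
    return sorted(m for m in mentioned
                  if match((m, "rdf:type", "ex:Person")))
```

Because the relationship itself is a value in the triple, `ex:mentions` and `ex:worksFor` can be queried just like entities, which is the sense in which relationships are first-class objects.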
How to Save and Use Metadata?
• Store metadata as data and use standard database techniques
  – Use filtering and clustering, summarization, statistics: implicit semantics
• Semantics = meaning
• Richer representation, support for relationships, context
• Supports use of background knowledge
• Better integration, powerful analysis
• Use of RDF data stores/LOD
Semantics: the implicit, the formal and the powerful
Social metadata on the Web [H. Dacquin]
Metadata Extraction from Informal Text 63 Meena Nagarajan,‘Understanding User-Generated Content on Social Media,’ Ph.D. Dissertation, Wright State University, 2010
64 Characteristics of Text on Social Media
The Formality of Text 65
Content Analysis: Typical Sub-tasks
• Recognize key entities mentioned in content
  – Information Extraction (entity recognition, anaphora resolution, entity classification, ...)
  – Discovery of semantic associations between entities
• Topic Classification, Aboutness of content
  – What is the content about?
• Intention Analysis
  – Why did they share this content?
• Sentiment/Opinion Mining
  – What opinions are people conveying via the content?
• Author Profiling
  – What can we infer about the author from the content he posts?
• Context (external to content) extraction
  – URL extraction, analyzing external content
Research Efforts, Contributions in this space
• Examining the usefulness of multiple context cues for text mining algorithms
  – Compensating for informal, highly variable language and lack of context
  – Using context cues: document corpus, syntactic and structural cues, social medium, external domain knowledge, …
• In this talk, highlighting sample metadata creation tasks:
  – NER
  – Key Phrase Extraction
  – Intention
  – Sentiment/Opinion Mining
Named Entity Recognition I loved <movie> the hangover </movie>! Key Phrase Extraction 68 Part 1: NER, Key Phrase Extraction
Multiple Context Cues Utilized for NER in Blogs and MySpace Forums 69 Meena Nagarajan,‘Understanding User-Generated Content on Social Media,’ Ph.D. Dissertation, Wright State University, 2010
70 Multiple Context Cues Utilized for Keyphrase Extraction from Twitter, Facebook and MySpace Meena Nagarajan,‘Understanding User-Generated Content on Social Media,’ Ph.D. Dissertation, Wright State University, 2010
Focus, Impact We focus on techniques that exploit content and context aspects on social media platforms Our methods highlight a combination of top-down, bottom-up analysis for informal text Statistical NLP, ML algorithms over large corpora (bottom-up) Models and rich knowledge bases in a domain(top-down) 71
NAMED ENTITY RECOGNITION 72
Named Entity Recognition
Task of NER: identifying and classifying tokens
• “I loved your music Yesterday!”: Yesterday is an album
• “It was THE HANGOVER of the year..lasted forever..”: The Hangover is not a movie
• “So I went to the movies..bad choice picking “GI Jane” worse now”: GI Jane is a movie
NER in prior work vs. NER for Informal Text 74
Cultural Named Entities
• NER focus in this work: Cultural Named Entities
• Artifacts of culture
  – Names of books, music albums, films, video games, etc.
• Common words in a language
  – The Lord of the Rings, Lips, Crash, Up, Wanted, Today, Twilight, Dark Knight, …
What makes cultural entity extraction challenging?
• Varied senses, several poorly documented
  – Star Trek: movies, TV series, media franchise .. and cuisines!
• Changing contexts with recent events
  – The Dark Knight is a movie; it is also a reference to Obama and the health care policy
• Comprehensive sense definitions, enumeration of contexts and labeled corpora for all senses are unrealistic expectations when building an NER system
• NER: relaxing the closed-world sense assumptions
A Spot and Disambiguate Paradigm
• NER is generally a sequential prediction problem
  – NER system that achieves 90.8 F1 score on the CoNLL-2003 NER shared task (PER, LOC, ORGN entities)
• Starting off with a dictionary or list of entities we want to spot
• Spot, then disambiguate in context (natural language, domain knowledge cues)
• Binary Classification
  – Is this mention of “the hangover” in a sentence referring to a movie?
CoNLL 2003 -- http://www.cnts.ua.ac.be/conll2003/ner/
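The two stages can be sketched as follows: a dictionary spotter that finds every candidate mention, followed by a yes/no decision per mention. This is only an illustration of the paradigm. The title list and cue words are hypothetical, and the cue-word test stands in for the trained binary classifier a real system would use.

```python
import re

MOVIE_TITLES = ["the hangover", "gi jane"]  # hypothetical entity dictionary
MOVIE_CUES = {"movie", "movies", "film", "watched", "cinema", "dvd"}  # hypothetical cues

def spot(text, titles=MOVIE_TITLES):
    """Stage 1: find every dictionary title in the text, ignoring case."""
    hits = []
    for title in titles:
        pattern = r"\b" + re.escape(title) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            hits.append((m.start(), m.end(), title))
    return sorted(hits)

def is_movie_mention(text, start, end, cues=MOVIE_CUES):
    """Stage 2: binary decision based on words outside the spotted span."""
    context = text[:start] + " " + text[end:]
    words = set(re.findall(r"[a-z]+", context.lower()))
    return bool(words & cues)

def spot_and_disambiguate(text):
    """Return (title, is_movie) for every spotted mention."""
    return [(title, is_movie_mention(text, s, e)) for s, e, title in spot(text)]
```

For example, `spot_and_disambiguate("It was THE HANGOVER of the year..lasted forever..")` spots the title but rejects it as a movie mention, because no cue word appears in the context.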
Cultural NER - Two Flavors 80
(a) Multiple Senses in the Same Domain 81
Algorithm Preliminaries
• Problem definition: Cultural Entity Identification (music albums, tracks)
  – e.g. Smile (Lily Allen), Celebration (Madonna)
• Corpus: MySpace comments
  – Context-poor utterances, e.g. “Happy 25th Lilly, Alfie is funny”
• Goal: Semantic annotation of music named entities (w.r.t. MusicBrainz)
MusicBrainz Schema
Using a Knowledge Resource for NER is not straight-forward.. 83
Approach Overview
Which ‘Merry Christmas’? ‘So Good’ is also a song!
• Scoped relationship graphs
  – Using context cues from the content, webpage title, URL, … (e.g. new Merry Christmas tune)
  – Reduce potential entity spot size (e.g. new albums/songs)
• Generate candidate entities
• Spot and Disambiguate
Sample Real-world Constraints
Which ‘Merry Christmas’? ‘So Good’ is also a song!
• Career restrictions: “release your third album already..”
• Recent album restrictions: “I loved your new album..”
• Artist age restrictions: “happy 25th rihanna, loved alfie btw..”
• etc.
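Scoping can be pictured as filtering a knowledge base by whatever restrictions the context supports, before any spotting happens. The sketch below assumes a tiny in-memory catalog with invented artists and years (it is not MusicBrainz data), and the `scope` function and its parameters are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Release:
    artist: str
    title: str
    kind: str   # "album" or "track"
    year: int

# Invented catalog entries, for illustration only.
CATALOG = [
    Release("Artist A", "Merry Christmas", "album", 1994),
    Release("Artist B", "Merry Christmas", "track", 2008),
    Release("Artist B", "So Good", "track", 2009),
    Release("Artist B", "First Light", "album", 2005),
]

def scope(catalog, artist=None, kind=None, newest_only=False):
    """Keep only the releases consistent with cues found around a comment."""
    kept = [r for r in catalog
            if (artist is None or r.artist == artist)
            and (kind is None or r.kind == kind)]
    if newest_only and kept:
        latest = max(r.year for r in kept)
        kept = [r for r in kept if r.year == latest]
    return kept
```

Unscoped, "Merry Christmas" has two senses in this catalog; restricting to the artist whose page the comment appears on leaves one, and a cue such as "new album" narrows the candidates further via `kind="album", newest_only=True`.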
Scoping via Real-world Restrictions 87
Scoped Entity Lists
• User comments are on MySpace artist pages
  – Contextual restriction: artist name
  – Assumption: no other artist/work mention
• Naive spotter has the advantage of spotting all possible mentions (modulo spelling errors)
  – Generates several false positives, e.g. “this is bad news, ill miss you MJ”
But there are also non-music mentions
• Challenge 1: Several senses in the same domain
  – Scoping relationship graphs narrows possible senses
  – Solves the named entity identification problem partially
• Challenge 2: Non-music mentions
  – Got your new album Smile. Loved it!
  – Keep your SMILE on!
Using Language Features to eliminate incorrect mentions
• Syntactic features: POS tags, typed dependencies, …
• Word-level features: capitalization, quotes
• Domain-level features
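The word-level features are simple to compute from the raw text around a spot. The following sketch covers only those (capitalization, quotes, previous word); syntactic features such as POS tags and typed dependencies need a parser and are left out. Function and feature names are illustrative assumptions, not the feature set of the cited work.

```python
QUOTES = "\"'“”‘’"

def _is_quote(ch):
    # An empty string is a substring of everything, so check it explicitly.
    return ch != "" and ch in QUOTES

def spot_features(text, start, end):
    """Word-level features for the candidate mention text[start:end]."""
    mention = text[start:end]
    before = text[:start].rstrip()
    after = text[end:].lstrip()
    prev_words = before.rstrip(QUOTES).split()
    return {
        "all_caps": mention.isupper(),
        "capitalized": mention[:1].isupper(),
        "in_quotes": _is_quote(before[-1:]) and _is_quote(after[:1]),
        "starts_sentence": before == "" or before[-1] in ".!?",
        "prev_word": prev_words[-1].lower() if prev_words else "",
    }

def features_for(text, phrase):
    """Convenience wrapper: locate `phrase` case-insensitively, then featurize."""
    start = text.lower().index(phrase.lower())
    return spot_features(text, start, start + len(phrase))
```

On the two slide examples, "Got your new album Smile. Loved it!" yields `prev_word == "album"` with a capitalized mention, while "Keep your SMILE on!" yields `all_caps == True` and `prev_word == "your"`; a supervised learner can weigh such signals.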
Supervised Learners 91
Hand-labeling - Fairly Subjective
• 1800+ spots in MySpace user comments from artist pages
• Manual annotations for a post: “Keep your <track>SMILE</track> on!”
  – valid album/track named entity (good spot)
  – invalid named entity (bad spot)
  – hard to tell (inconclusive)
• 4-way annotator agreement shows that agreeing on the accuracy of a spot is hard even for domain experts
  – Madonna: 90% agreement
  – Rihanna: 84% agreement
  – Lily Allen: 53% agreement (many named entities of ambiguous nature and usage)
Combining a Dictionary Spotter + NLP Analytics
Daniel Gruhl, Meena Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, ‘Context and Domain Knowledge Enhanced Entity Spotting in Informal Text,’ The 8th International Semantic Web Conference, 2009: 260-276
Lessons Learned - NER on Social Media Text using a Knowledge Base
• Intelligent pruning of a knowledge base goes a long way in improving precision
• Two-stage approach: chaining NL learners over the results of domain-model-based spotters
  – Improves accuracy by up to a further 50%
  – Allows the more time-intensive NLP analytics to run on less than the full set of input data
Music NER application: BBC SoundIndex (IBM Almaden), Pulse of the Online Music Populace
Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: ‘Multimodal Social Intelligence in a Real-Time Dashboard System,’ special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media", 2010
Project: http://www.almaden.ibm.com/cs/projects/iis/sound/
The Vision http://www.almaden.ibm.com/cs/projects/iis/sound/ 96
Several Insights
• Trending popularity of artists
• Trending topics in artist pages
• Only 4% negative sentiments; perhaps ignore the Sentiment Annotator on this data source?
• Ignoring spam can change the ordering of popular artists
Predictive Power of Data
• Billboard's Top 50 Singles chart during the week of Sept 22-28 ’07 vs. MySpace popularity charts.
• User study indicated 2:1 and up to 7:1 (younger age groups) preference for the MySpace list.
• Challenging traditional polling methods!
KEY PHRASE EXTRACTION 100
Key Phrase Extraction - Example Key phrases extracted from prominent discussions on Twitter around the 2009 Health Care Reform debate and 2008 Mumbai Terror Attack on one day 101
Key Phrase Extraction from Social Media Text
• Different from Information Extraction
  – Key phrase extraction does not concern itself with classification into a type
• Extracting vs. assigning key phrases
  – Focus: key phrase extraction
• Prior work focus: extracting phrases that summarize a document (a news article, a web page, a journal article, a book, ...)
• Focus here: summarizing multiple documents (UGC) around the same event/topic of interest
Key Phrase Extraction on Social Media Content has some differences
1. Need to preserve/isolate the social behind the social data in summarizing key phrases
   – What is said in Egypt vs. the USA should be viewed in isolation
2. Need to account for redundancy, variability and off-topic content
   – “Met up with mom for lunch, she looks lovely as ever, good genes .. Thanks Nike, I love my new Gladiators ..smooth as a feather. I burnt all the calories of Italian joy in one run.. if you are looking for good Italian food on Main, Buca is the place to go.”
Where is the Social and Cultural Logic in UGC?
• Thematic components: similar messages convey similar ideas
• Space, time metadata: role of community and geography in communication
• Poster attributes: age, gender and socio-economic status reflect similar perceptions
• ‘Social applies to data as well as metadata’
Features used in social Key Phrase extraction (common to prior efforts)
• Focus: n-grams, spatio-temporal metadata (social components)
• Syntactic cues: in quotes, italics, bold; in document headers; phrases collocated with acronyms
• Document and structural cues: two-word phrases, appearing at the beginning of a document, frequency, presence in multiple similar documents, etc.
• Linguistic cues: stemmed form of a phrase, phrases that are simple and compound nouns in sentences, etc.
Key Phrase Extraction Overview
“President Obama in trying to regain control of the health-care debate will likely shift his pitch in September”
• 1-grams: “President”, “Obama”, “in”, “trying”, “to”, “regain”, ...
• 2-grams: “President Obama”, “Obama in”, “in trying”, “trying to”, ...
• 3-grams: “President Obama in”, “Obama in trying”, “in trying to”, ...
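Generating the candidate n-grams is a sliding window over the token sequence. A minimal sketch, assuming plain whitespace tokenization (real informal text would need a more careful tokenizer):

```python
def ngrams(text, n):
    """Return all contiguous n-token phrases of `text`, in order."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

SENTENCE = ("President Obama in trying to regain control of the health-care "
            "debate will likely shift his pitch in September")
```

With 18 tokens in the sentence, `ngrams(SENTENCE, 2)` has 17 entries starting with "President Obama", and `ngrams(SENTENCE, 3)` has 16 starting with "President Obama in". Each n-gram is then a candidate descriptor to be weighted.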
A descriptor is an n-gram weighted by:
• Thematic Importance
  – Redundancy: statistically discriminatory in nature
  – Variability: contextually important
• Spatial Importance (local vs. global popularity)
• Temporal Importance (always popular vs. currently trending)
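To show how the three kinds of importance might combine, here is a deliberately simple scoring sketch. The formulas are assumptions chosen for illustration (raw frequency for thematic, a log ratio of regions for spatial, today's count over the earlier average for temporal); they are not the weighting published in the cited spatio-temporal-thematic work, and the observations are invented.

```python
import math
from collections import Counter

# Toy observations (phrase, region, day); invented for illustration only.
OBSERVATIONS = (
    [("health care", "US", 1)] * 3
    + [("health care", "US", 2)] * 3
    + [("health care", "IN", 2)]
    + [("foreign relations", "US", 2)] * 3
)

def descriptor_score(phrase, region, day, observations):
    """Combine thematic, spatial and temporal weights into one score."""
    counts = Counter(observations)
    # Thematic: how often the phrase occurs in this region on this day.
    thematic = counts[(phrase, region, day)]
    # Spatial: phrases seen in fewer regions are more locally distinctive.
    all_regions = {r for _, r, _ in observations}
    phrase_regions = {r for p, r, _ in observations if p == phrase}
    if not phrase_regions:
        return 0.0
    spatial = 1.0 + math.log(len(all_regions) / len(phrase_regions))
    # Temporal: today's count relative to the average over earlier days.
    earlier_days = sorted({d for _, _, d in observations if d < day})
    past = [counts[(phrase, region, d)] for d in earlier_days]
    baseline = sum(past) / len(past) if past else 0.0
    temporal = thematic / (1.0 + baseline)
    return thematic * spatial * temporal
```

In the toy data both phrases occur three times in the US on day 2, so frequency alone cannot separate them. "foreign relations" scores higher here because it is new on day 2 and confined to one region, whereas "health care" was already popular and appears in both regions.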
108 TF-IDF vs. Spatio-temporal-thematic scores rank phrases differently Foreign relations surfaces up M. Nagarajan et al., Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences, Tenth International Conference on Web Information Systems Engineering, Oct 5-7, 2009: 539-553
Next task : Eliminating Off-topic Content Frequency based heuristics will not eliminate off-topic content that is ALSO POPULAR 109 Popular Key phrases “single”, “Jesus” are unrelated to Madonna’s music M. Nagarajan et al., Monetizing User Activity on Social Networks - Challenges and Experiences, 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18 2009: 92-99
Eliminating off-topic content: Example
• “Yeah i know this a bit off topic but the other electronics forum is dead right now. im looking for a good camcorder, somethin not to large that can record in full HD only ones so far that ive seen are sonys”
• “Canon HV20. Great little cameras under $1000.”
Possible relevant phrases are: ['camcorder', 'canon hv20', 'little camera', 'hd', 'cameras', 'canon']
Eliminating off-topic content: Approach Overview
• Assume one or more seed words (from a domain knowledge base): C1 - ['camcorder']
• Extracted key words/phrases: C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']
• Gradually expand C1 by adding phrases from C2 that are strongly associated with C1
• Mutual Information based algorithm [WISE2009]
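The expansion step can be sketched with pointwise mutual information (PMI) over phrase co-occurrence in posts. This is a simplified stand-in for the mutual-information algorithm cited as [WISE2009], not a reproduction of it: the co-occurrence data, the `threshold` value and the function names are all assumptions made for illustration.

```python
import math

# Invented posts, each reduced to the set of key phrases it contains.
POSTS = [
    {"camcorder", "hd", "canon hv20"},
    {"camcorder", "hd", "cameras"},
    {"canon hv20", "canon", "cameras"},
    {"canon", "cameras", "little camera"},
    {"camcorder", "electronics forum", "somethin", "ive"},
    {"electronics forum", "somethin", "ive"},
    {"electronics forum", "ive", "offtopic"},
    {"somethin", "ive", "offtopic"},
]

C2 = ["electronics forum", "hd", "camcorder", "somethin", "ive", "canon",
      "little camera", "canon hv20", "cameras", "offtopic"]

def pmi(a, b, posts):
    """Pointwise mutual information of phrases a and b co-occurring in a post."""
    n_a = sum(a in p for p in posts)
    n_b = sum(b in p for p in posts)
    n_ab = sum(a in p and b in p for p in posts)
    if n_ab == 0:
        return float("-inf")  # never seen together
    return math.log(n_ab * len(posts) / (n_a * n_b))

def expand(seeds, candidates, posts, threshold=0.2):
    """Grow the seed set with candidates strongly associated with any member."""
    topical = list(seeds)
    remaining = [c for c in candidates if c not in topical]
    grew = True
    while grew:  # repeat until a full pass adds nothing
        grew = False
        for cand in list(remaining):
            if max(pmi(cand, t, posts) for t in topical) > threshold:
                topical.append(cand)
                remaining.remove(cand)
                grew = True
    return topical
```

On this toy data, `expand(["camcorder"], C2, POSTS)` ends with the six phrases the slide lists as relevant, while 'electronics forum', 'somethin', 'ive' and 'offtopic' stay out even though they are frequent, because they do not co-occur with the topical set more than chance would predict.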
Key Phrases & Aboutness - Evaluations
• Are the key phrases we extracted topical and good indicators of what the content is about?
  – If they are, they should act as effective index/search phrases and return relevant content
• Evaluation application: Targeted Content Delivery
Targeted Content Delivery -Evaluations We took 12K posts from MySpace and Facebook Electronics forums Extracted Baseline phrases using Yahoo Term Extractor Extracted phrases using the Key phrase extraction, elimination algorithm described earlier Generated Targeted Content from Google AdSense Asked users if the delivered content matched the posts 113
Targeted Content for all content vs. extracted key phrases 114
User Studies and Results 115
Social Key Phrase Extraction : Impact, Contributions TFIDF + social contextual cues yield more useful phrases that preserve social perceptions Corpus + seeds from a domain knowledge base eliminate off-topic phrases effectively 116
INTENTION MINING 117
Why do people share? Outside of the psychological incentives, broadly, people share to Seek Information OR Share Information If we understand the intent behind a post, we can build systems that respond to it better Focus of our work: Understand intent to deliver targeted content Use case: Online Content-Targeted Advertisements on Social Media Platforms 118
Circa 2009 -Content-based Ads 119
Today – Content-based Ads on Profiles 120
What is going on here..
• But interests on profiles do not translate to purchase intents
  – Interests are often outdated..
  – Intents are rarely stated on a profile..
• Some profile data does seem to work
  – Example: new store openings, sales targeted at location information in a profile
But Monetizable Intents are Elsewhere, away from their profiles.. 122
Showing clear intents on MySpace posts but no relevant ads.. 123
Targeted Content-based Advertising
• Non-trivial
• Non-policed content
  – Brand image, unfavorable sentiments
• People are there to network
  – User attention to ads is not guaranteed
• Informal, casual nature of content
  – People are sharing experiences and events
  – Main message overloaded with off-topic content [1]
“I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :(”
[1] Y. Zhang, A. C. Surendran, J. C. Platt, and M. Narasimhan, ‘Learning from Multi-topic Web Documents for Contextual Advertisement,’ KDD 2008
Focus: Discuss Methodology, Preliminary Results in…
• Identifying intents behind user posts on social networks
  – Identify content with monetization potential
• Identifying keywords for advertising in user-generated content
  – Considering interpersonal communication & off-topic chatter
M. Nagarajan et al., ‘Monetizing User Activity on Social Networks - Challenges and Experiences,’ 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18 2009: 92-99
Investigations
• User studies
  – Hard to compare activity-based ads to the state of the art
  – Impressions to clickthroughs
  – How well are we able to identify monetizable posts?
  – How targeted are ads generated using our keywords vs. the entire user-generated content?
M. Nagarajan et al., ‘Monetizing User Activity on Social Networks - Challenges and Experiences,’ 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18 2009: 92-99
Identifying Intents on SM is different from that on the Web.. Scribe Intent not same as Web Search Intent1 People write sentences, not keywords or phrases Presence of a keyword does not imply navigational / transactional intents –  ‘am thinking of getting X’ (transactional) 	–  ‘I like my new X’ (information sharing) 	–  ‘what do you think about X’ (information seeking) Useful here would be to identify: Transactional and Information Seeking intents 1B. J. Jansen, D. L. Booth, and A. Spink, “Determining the informational, navigational, and transactional intent of web queries,”Inf. Process. Manage., vol. 44, no. 3, 2008. 127
Not Focusing on the entity but Action Patterns surrounding the entity “where can I find a chotto psp cam” – User post also has an entity, which is a plus but not the main target of intent identification.. Goal is to study how questions are asked and not topic words that indicate what the question is about 128
Conceptual Overview Bootstrapping to learn IS patterns Take a set of user posts from SNSs Not annotated for presence or absence of any intent 129
Bootstrapping to learn IS patterns Generate a universal set of n-gram patterns; freq > f S = set of all 4-grams; freq > 3 130
Bootstrapping to learn IS patterns Generate set of candidate patterns from seed words  (why,when,where,how,what) Sc= all 4-grams in S that extract seed words 131
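The two steps above (build the universal 4-gram set S, then keep the 4-grams that contain a seed word as Sc) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the function names and the toy posts are assumptions; only the 4-gram length, the freq > 3 cutoff and the seed words come from the slides.

```python
from collections import Counter

# Seed words listed on the slide.
SEED_WORDS = {"why", "when", "where", "how", "what"}


def four_grams(post):
    """Return all word 4-grams of a post (naive lowercase/whitespace tokenizer, an assumption)."""
    tokens = post.lower().split()
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


def candidate_patterns(posts, min_freq=3):
    """Build S (4-grams with freq > min_freq) and Sc (members of S containing a seed word)."""
    counts = Counter(gram for post in posts for gram in four_grams(post))
    universal = {gram for gram, count in counts.items() if count > min_freq}
    candidates = {gram for gram in universal if SEED_WORDS & set(gram)}
    return universal, candidates


# Toy, made-up posts purely for illustration.
posts = (
    ["does anyone know how to fix this"] * 4
    + ["i like my new phone a lot"] * 4
    + ["where do i find a cheap laptop"] * 2
)
universal, candidates = candidate_patterns(posts)
```

In the slides a user then hand-picks about 10 seed patterns from Sc; that manual step is not modeled here.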
Bootstrapping to learn IS patterns User picks 10 seed patterns from Sc Sis= ‘does anyone know how’, ‘where do I find’, ‘someone  tell me where’…	 132
Bootstrapping to learn IS patterns      Gradually expand Sis by adding       Information Seeking patterns from Sc 133
Bootstrapping to learn IS patterns For every pis in Sis generate set of filler patterns 134
Bootstrapping to learn IS patterns ‘.* anyone know how’ ‘does .* know how’ ‘does anyone .* how’ ‘does anyone know .*’ Look for patterns in Sc Functional compatibility of filler Empirical support for filler 135
Expanding the Pattern Pool Functional properties / communicative functions of words From a subset of LIWC1 – cognitive mechanical (e.g., if, whether, wondering, find) • ‘I am thinking about getting X’ – adverbs (e.g., how, somehow, where) – impersonal pronouns (e.g., someone, anybody, whichever) • ‘Someone tell me where can I find X’ 1 Linguistic Inquiry Word Count, LIWC, http://liwc.net 136
Example - Acquiring New Intent Patterns.. • ‘does * know how’ – ‘does someone know how’ • Functional Compatibility - Impersonal pronouns • Empirical Support – 1/3 – ‘does somebody know how’ • Functional Compatibility - Impersonal pronouns • Empirical Support – 0 • Pattern Retained – ‘does john know how’ • Pattern discarded Sc = {‘does anyone know how’, ‘where do I find’, ‘someone tell me where’} • pis = ‘does anyone know how’ 137
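The single-word substitution in this example can be sketched as below. It is a rough illustration under stated assumptions: `FUNCTION_CLASSES` is a tiny hand-made stand-in for the LIWC categories (the real dictionary is not reproduced here), the function names are hypothetical, and only the functional-compatibility check is modeled; the empirical-support measure from the slide is left out because its exact definition is in the paper, not the slides.

```python
# Hypothetical stand-in for a few LIWC-style functional categories.
FUNCTION_CLASSES = {
    "impersonal_pronoun": {"anyone", "someone", "somebody", "anybody"},
    "adverb": {"how", "somehow", "where"},
    "cognitive": {"know", "find", "wondering", "thinking"},
}


def word_class(word):
    """Return the functional class of a word, or None if it has none."""
    for name, words in FUNCTION_CLASSES.items():
        if word in words:
            return name
    return None


def filler_patterns(pattern):
    """Replace each word in turn with a wildcard: 'does .* know how', ..."""
    tokens = pattern.split()
    return [(i, tokens[:i] + [".*"] + tokens[i + 1:]) for i in range(len(tokens))]


def matches(filler, candidate_tokens):
    """True if the candidate agrees with the filler pattern outside the wildcard slot."""
    return len(filler) == len(candidate_tokens) and all(
        f == ".*" or f == c for f, c in zip(filler, candidate_tokens)
    )


def expand(seed, candidate_pool):
    """Keep candidates whose substituted word shares a functional class with the seed's word."""
    seed_tokens = seed.split()
    accepted = []
    for slot, filler in filler_patterns(seed):
        seed_class = word_class(seed_tokens[slot])
        for candidate in candidate_pool:
            cand_tokens = candidate.split()
            if candidate == seed or not matches(filler, cand_tokens):
                continue
            if seed_class is not None and seed_class == word_class(cand_tokens[slot]):
                accepted.append(candidate)
    return accepted


pool = ["does someone know how", "does somebody know how", "does john know how", "where do i find"]
new_patterns = expand("does anyone know how", pool)
```

With this toy pool, 'does john know how' is rejected because 'john' has no functional class, mirroring the discarded pattern on the slide.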
Finer Details of the Approach are in the paper.. Iterative algorithm, single-word substitutions, functional usage and empirical support conservatively expand the intent-seeking pool of patterns.. Infusing new patterns and seed words Stopping conditions 138 M. Nagarajan et al., Monetizing User Activity on Social Networks - Challenges and Experiences, 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18 2009: 92-99
Sample Extracted Patterns 139
Identifying Monetizable Posts Information Seeking patterns just described are generated offline Finding the Information seeking intent score of a post – Extract and compare patterns in posts with extracted information seeking patterns
Using LIWC ‘Money’ dictionary: 173 words and word forms indicative of transactions, e.g., trade, deal, buy, sell, worth, price etc. 140
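A minimal sketch of scoring a post along the two signals on this slide: matches against the learned information-seeking patterns, and hits in a money-related word list. The pattern list and word list are small illustrative samples taken from the slides (not the 173-entry LIWC dictionary), and the rule for combining the two scores in `is_monetizable` is an assumption, since the slides do not state one.

```python
import re

# Sample patterns and money words quoted on the slides; the real lists are much larger.
IS_PATTERNS = ["does anyone know how", "where do i find", "someone tell me where"]
MONEY_WORDS = {"trade", "deal", "buy", "sell", "worth", "price"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def information_seeking_score(post):
    """Count how many known information-seeking patterns occur in the post."""
    padded = " " + " ".join(tokenize(post)) + " "
    return sum(1 for pattern in IS_PATTERNS if f" {pattern} " in padded)


def transactional_score(post):
    """Count tokens that appear in the money-related word list."""
    return sum(1 for token in tokenize(post) if token in MONEY_WORDS)


def is_monetizable(post):
    """Assumed decision rule: any information-seeking or transactional evidence."""
    return information_seeking_score(post) > 0 or transactional_score(post) > 0
```

For example, `is_monetizable("Does anyone know how to buy a cheap camcorder?")` is true on both signals, while a post about food poisoning scores zero on each.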
Benchmarking with FB Marketplace Training corpus 8000 user posts from MySpace Computers, Electronics, Gadgets forum
309 unique new patterns, 263 unambiguous • Testing patterns for recall using ‘To buy’ Facebook Marketplace where all posts are information seeking – extracted patterns average 81% recall 141
Next task: Identifying Keywords for Advertizing Identifying keywords in monetizable posts – Plethora of work in this space Off-topic noise removal is our focus I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :( 142
Conceptual Overview (also see slides in Key Phrase elimination section) – C1 - ['camcorder'] – C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic'] – Relatedness determined using information gain – Using the Web as a corpus, domain independent 143
Example: Off-topic Chatter Elimination •  C1 -['camcorder'] •  C2 -['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic'] •  Informative words 	['camcorder', 'canon hv20', 'little camera', 'hd', 'cameras', 'canon'] 144
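The filtering idea (keep only the C2 phrases that are strongly related to the C1 keywords) can be sketched as below. Important assumptions: the slides say relatedness is determined using information gain over the Web as a corpus; this sketch swaps in pointwise mutual information as a simpler stand-in, and every hit count in `HITS` and `JOINT_HITS` is a made-up toy number, not a measurement. The threshold is likewise hypothetical.

```python
import math

# Toy, invented page counts standing in for Web-corpus statistics.
TOTAL_PAGES = 1_000_000
HITS = {"camcorder": 5000, "canon hv20": 800, "hd": 20000,
        "somethin": 3000, "ive": 40000, "offtopic": 1000}
JOINT_HITS = {("camcorder", "canon hv20"): 600, ("camcorder", "hd"): 1500,
              ("camcorder", "somethin"): 10, ("camcorder", "ive"): 150,
              ("camcorder", "offtopic"): 4}


def pmi(seed, word):
    """Pointwise mutual information (base 2) from the toy counts; a stand-in for information gain."""
    joint = JOINT_HITS.get((seed, word), 0)
    if joint == 0:
        return float("-inf")
    return math.log2(joint * TOTAL_PAGES / (HITS[seed] * HITS[word]))


def informative(seeds, candidates, threshold=1.0):
    """Keep the seeds plus every candidate related to at least one seed above the threshold."""
    kept = list(seeds)
    for word in candidates:
        if word in seeds:
            continue
        if max(pmi(seed, word) for seed in seeds) >= threshold:
            kept.append(word)
    return kept


c1 = ["camcorder"]
c2 = ["hd", "somethin", "ive", "canon hv20", "offtopic"]
topical = informative(c1, c2)
```

With these toy counts the chatter words ('somethin', 'ive', 'offtopic') fall below the threshold while 'hd' and 'canon hv20' survive, which is the qualitative behavior the slide illustrates.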
Evaluations- User Study Keywords from 60 monetizable user posts 	–  Monetizable intent, at least 3 keywords in content 	–  45 MySpace Forums, 15 Facebook Marketplace, 30 graduate students 	–  10 sets of 6 posts each 	–  Each set evaluated by 3 randomly selected users •	Monetizable intents? 	–  All 60 posts voted as unambiguously information seeking in intent 145
Effectiveness of using topical keywords • Google AdSense ads for user post vs. extracted topical keywords 146
Instructions – User Study 147
Result - 2X Relevant Impressions Users picked ads relevant to the post – At least 50% inter-evaluator agreement For the 60 posts – Total of 144 ad impressions – 17% of ads picked as relevant For the topical keywords – Total of 162 ad impressions – 40% of ads picked as relevant 148
Evaluations: Profile Ads vs. Activity Ads •	Are ads generated from activity more interesting than those generated from user profiles? Gather user’s profile information 	–  Interests, hobbies, TV shows.. (non-demographic information) •	Ask them to submit a post (simulating their social media entry) 	–  Looking to buy and why (induce off-topic content) •	Generate ads from profiles, from post (keywords) 149
Result - 8X more interest for non-profile ads.. •	Using profile ads 	–  Total of 56 ad impressions 	–  7% of ads generated interest •	Using authored posts 	–  Total of 56 ad impressions 	–  43% of ads generated interest •	Using topical keywords from authored posts 	–  Total of 59 ad impressions 	–  59% of ads generated interest 150
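The "2X" and "8X" headline figures follow from simple ratios of the percentages reported on the two result slides. A small worked check, using only those reported percentages (the helper name `lift` is ours):

```python
def lift(baseline_pct, treatment_pct):
    """Ratio of two percentages, i.e. how many times larger the treatment rate is."""
    return treatment_pct / baseline_pct


# Relevant impressions: full post content (17%) vs. topical keywords (40%).
relevance_lift = lift(17, 40)

# Interest: profile ads (7%) vs. authored posts (43%) vs. topical keywords from posts (59%).
posts_vs_profile = lift(7, 43)
keywords_vs_profile = lift(7, 59)
```

The ratios come out at roughly 2.4, 6.1 and 8.4 respectively, which is consistent with the rounded "2X" and "8X" in the slide titles.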
To note… – Monetization potential in user activity – Improvement for Ad programs in terms of relevant impressions – Verbose content – Status updates, notes, community and event memberships… – One size may not fit all 151
To note… A world between relevant impressions and click throughs – Objectionable content, vocabulary impedance, Ad placement, network behavior – In a pipeline of other community efforts – Cannot custom send information to Google AdSense 152
SENTIMENT / OPINION MINING 153
Content Analysis: Sentiment Analysis/Opinion Mining Two main types of information we can learn from user-generated content: fact vs. opinion Much of social media text (e.g., blogs, Twitter, Facebook) is a mix of facts and opinions. For example: "Latest news: Mobile web services not working in #Bahrain and Internet is extremely slow #feb14 {fact} ... looks like they "learned" from #Egypt {opinion}"
Sentiment Analysis: Motivation Why do people oppose health care reform? What customers complain about? Which movie should I see? 155 Image: http://bit.ly/eZtKBF
Sentiment Analysis: Tasks Example: “How awful that many #Egyptian artifacts are in danger of being Destroyed. What Zahi Hawass must be thinking #jan25” Classification: Overall sentiment polarity [Pang et al. 2002], [Turney 2002], etc. The overall polarity is positive, neutral or negative (at the document/sentence/word level) For the example: overall polarity is negative Target-specific sentiment polarity [Yi et al. 2003], [Hu et al. 2004], etc. The polarity toward the given target is positive, neutral or negative For the example: polarity is "negative" for the target "Egyptian artifacts"; polarity is "neutral" for the target "Zahi Hawass" 156
Sentiment Analysis: Tasks Example: “How awful that many #Egyptian artifacts are in danger of being Destroyed. What Zahi Hawass must be thinking #jan25” Identification & Extraction: opinion [Dave et al. 2003] etc. opinion holder [Bethard et al. 2004] etc. opinion target [Hu et al. 2004] etc. For the example: opinion="awful", opinion holder="the author", target="Egyptian artifacts are in danger" opinion="must be thinking", opinion holder="the author", target="Zahi Hawass" 157
Sentiment Analysis: Classification Supervised [Pang et al. 2002] etc. Labeled training data: e.g., product reviews, movie reviews, etc. Features: e.g., term-based, part-of-speech, syntactic relations, etc. Learning strategies: e.g., SVMs, Naive Bayes, .. Unsupervised [Turney 2002] etc. Lexicon-based approach [Hu et al. 2004], [Ding et al. 2008] etc. Using a sentiment lexicon of positive/negative sentiment words Bootstrapping [Thelen et al. 2002] etc. Iteratively trains and evaluates a classifier, starting from an unannotated corpus and a few predefined seed words. The task of extracting the opinion/holder/target is similar to the traditional IE task. Key distinction: the relations between opinion and opinion target are considered important. 158
159 Sentiment Analysis: Identification & Extraction
Proximity [Hu et al. 2004] etc.
extract the nearby adjectives modifying the target topic as opinion clues
Syntactic dependency [Popescu et al. 2005] etc.
employ a language parser to compute the syntactic dependencies and extract the opinion clues for a given target topic
Co-occurrence [Choi et al. 2009] etc.
heuristic: the more frequently a candidate opinion target co-occurs with any opinion clues, the more likely it is the real opinion target
Prepared patterns/rules [Kobayashi et al. 2004] etc.
using a set of predefined extraction patterns/rules
Highlight the potential of text streams as a substitute and supplement for traditional polling. Connect public opinion measured from polls with sentiment measured from tweets. Lexicon-based approach for sentiment analysis of tweets Within topic tweets, count messages containing positive and negative words defined by the sentiment lexicon 160
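The lexicon-based counting described on this slide (within topic tweets, count the messages containing positive and negative lexicon words) can be sketched as follows. The two word sets are tiny placeholders rather than a real sentiment lexicon, topic matching by a single keyword is a simplification, and summarizing the counts as a positive-to-negative ratio is an assumption about how the counts might be aggregated.

```python
import re

# Placeholder lexicon entries; a real study would load a published sentiment lexicon.
POSITIVE = {"good", "great", "love", "hope"}
NEGATIVE = {"bad", "awful", "hate", "fear"}


def tokens(text):
    """Return the set of lowercase word tokens in a tweet."""
    return set(re.findall(r"[a-z']+", text.lower()))


def sentiment_counts(tweets, topic):
    """Count topic tweets containing any positive word and any negative word."""
    positive = negative = 0
    for tweet in tweets:
        words = tokens(tweet)
        if topic not in words:
            continue
        if words & POSITIVE:
            positive += 1
        if words & NEGATIVE:
            negative += 1
    return positive, negative


def sentiment_ratio(tweets, topic):
    """Positive-to-negative message ratio for a topic, or None if no negative messages."""
    positive, negative = sentiment_counts(tweets, topic)
    return positive / negative if negative else None


tweets = [
    "I love the new jobs report",
    "awful jobs numbers again",
    "jobs are great but I fear layoffs",
    "great weather today",
]
counts = sentiment_counts(tweets, "jobs")
```

Note that a tweet containing both positive and negative words is counted on both sides, and off-topic tweets are ignored entirely.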
Sentiment Analysis: Predicting the Future With Social Media [Asur et al. 2010] Use tweets to forecast box-office revenues for movies. Train a language model classifier for sentiment classification of tweets. Findings: The prediction model using the rate at which tweets are created about a movie outperforms the market-based methods. The sentiments present in tweets can be used to improve the prediction. 161
Sentiment Analysis: Target-specific Opinion Identification & Classification of Tweets-Unsupervised Approach [kno.e.sis ongoing work] Simple lexicon-based method doesn't work well. Target of “sexy” is “Helena” Target of “terrific” is “reviews” “free” is not opinionated in  movie domain.  Target of “loving” is “telling” “well” in “as well” is not  opinionated Observations: The opinion clues may not be toward the given target (1,2,3,6) The opinion clues are domain and context dependent (5,7) Single words are not enough (4,7,8) 162
Domain and context-aware sentiment lexicon generation (here take the movie domain as example) General subjective lexicon Commonly used subjective lexicon + polar slangs learned from dictionary Select candidate opinion clues from the domain-specific corpus based on the general lexicon word + surrounding context E.g., {“free”, “free movie”, “free movie streaming”... }, {“must”, “must see”, “a must see”, “must see movie”…} , {“well”, “as well”, “well done”… } Identify the opinion clues and their polarity Utilize information from multiple sources, including the corpus, domain knowledge (e.g., freebase, imdb), general lexicon, etc. Bootstrapping + statistical model  E.g., <“must”, “must see”, positive>; <“well”, “well done”, positive>    Sentiment Analysis: Target-specific Opinion Identification & Classification of Tweets-Unsupervised Approach [kno.e.sis ongoing work] 163
164 Sentiment Analysis: Target-specific Opinion Identification & Classification of Tweets-Unsupervised Approach [kno.e.sis ongoing work]
Predefined rules
When generating the domain and context-aware sentiment lexicon, use a set of predefined rules to select toward-target candidate opinion clues
Syntactic dependencies
When using the generated lexicon to extract target-specific opinion, for each pair of <target, opinion clue> in one tweet, determine whether the opinion clue is toward the target based on their syntactic dependency.
E.g., Loved the King's Speech. Funny, moving...Colin Firth is so amazing. I know, you already knew that. (“amazing” won’t be extracted since nsubj(amazing, Firth))
We also use predefined rules and proximity as a complement
 from Content Analysis
 from Network Analysis
 Merge of two approaches
 People Analysis showing use of Merger approach (Content+Network) and derived metadata
 Finding Influential Users
 Finding User Types & Affiliation
 Measuring Social Engagement 165
People Analysis: Extracting People Metadata 166
People Analysis: Using Content to Derive People Metadata Personality Signals Extrovert, agreeable, open etc Blogs, Style of Writing Loose and periodic sentence, connotation etc. Psychometric analysis of content Knowledge, abilities, attitude etc. Sample study: Gendered writing styles online [Ellison et al. 2006, Nagarajan et al. 2009, ICWSM  etc.] Self-expression tends towards attempting homophily in online dating profiles, given the tendency to 'imitate and impress' in courtship 167 Image: http://bit.ly/JZ6eF Read: ‘How’ people write @Kno.e.sis
People Analysis: Using Network to Derive People Metadata Interesting questions to ask: Who are the most popular people* in the network Who are the most influential people in the network What are the types of people in the network Who are the most active people in the communities Who are the bridges between communities in the network, etc. (*People may also refer to an organization) 168 Metadata from Network: e.g., an influential node in the network will be a function of time and of the interests of its audience.
People Analysis: Influence Adding Flavor of Context Analysis
For individuals to become influential they must not only obtain attention and thus be popular, but also overcome user passivity. [Romero et al. 2010]
Interest Similarity
Homophily causing Reciprocity on Twitter [TwitterRank, Weng et al. 2010]
Klout Score - True Reach, Amplification [http://klout.com] By Link Analysis Algorithms HITS [Kleinberg 1999] & variants PageRank [Brin et al. 1998] & variants etc.. Links not sufficient! Audience size doesn’t prove influence on Twitter [Million Follower Fallacy, Cha et al. 2010] 169 Image: http://bit.ly/9pfTO4
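As a concrete instance of the link-analysis family named on this slide, here is a minimal power-iteration PageRank over a follower graph. It is a textbook-style sketch, not the method of any cited paper: the damping factor 0.85, the iteration count and the toy `follows` graph are all assumptions. An edge u -> v means "u follows v", so rank flows toward followed accounts.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Plain power-iteration PageRank; graph maps each node to the nodes it links to."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy follower graph (made up): alice, bob and dave follow carol; carol follows alice.
follows = {"alice": ["carol"], "bob": ["carol"], "carol": ["alice"], "dave": ["carol", "alice"]}
ranks = pagerank(follows)
```

As the slide cautions, link structure alone is not sufficient: a score like this captures audience and attention, not whether followers actually act on what the account posts.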
People Analysis: User types & Affiliation Blogger, Scientist, Journalist, Artist, Trustee, Company X in Domain Y.. - Multiple types and affiliations! User interest mining Key Phrase Extraction followed by semantic association on user bio, tweets, lists, favorite posts Twitter Study [Banerjee et al. 2009]
Web Presence: Use of Web & Knowledge bases (Wikipedia, Blogs) to build context for user types
Entity Spotting & Extraction, followed by Semantic Association and Similarity with user-type context 170 Image: kahunainstitute.com *Read Semantics driven Social Media Analysis @ Kno.e.sis
People Analysis: Social Engagement 171 Imagine a crisis scenario such as the Haiti (2010) or Japan (2011) earthquake
 How effectively can the community of people talking about this event online grow to reach potential donors and people in need of resources (food, water, first aid etc.)?
 What are the best possible ways to communicate between resource providers and people in need of resources?
 How can teams coordinate well, from volunteers at a victim site to managers in the organizational structure sitting in offices?
NETWORK ANALYSIS - Deriving Network Metadata  Interesting questions  Network Analysis – Methods  Models  Metrics  Network Analysis – Algorithms  Graph Partitioning, Traversal  Community Discovery, Evolution  Social Network Analysis  Diffusion  Homophily  Study of 3-D Dynamics (People-Content-Network) - Analysis & Visualization tools 173
Network Analysis “To Discover How A, Who is in Touch with B and C, Is Affected by the Relation Between B & C” - John Barnes Interesting questions to ask: How communities form around topics - growth & evolution What are the effects of the presence of influential participants in the communities What are the effects of content nature (or sentiment, opinions) flowing in the network on the community life What is the community structure: degree of separation and sub-communities 174 Foundation of network:
Connections/Relationships Image: http://www.onasurveys.com/
Network Analysis: Methods Network Modeling Approaches  Random graph model (Erdos-Renyi model) start with n vertices and add edges between them at random Small-world model most nodes are not neighbors of one another, but they can be reached from every other by a small number of hops or steps (Small World Phenomenon) Scale-free model  degree distribution follows a power law, i.e., frequency of degree varies as a power of  its size 175 Image: http://www.kudosdynamics.com/ Important Literature:      [Wasserman et al. 1992, Watts et al. 1998, Albert et al. 2002, Newman et al. 2006, Marin et al. 2010, Easley et al. 2010]
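The random graph model ("start with n vertices and add edges between them at random") can be sketched in a few lines. This uses the G(n, p) variant, in which each possible edge is included independently with probability p; choosing that variant, and the function and parameter names, are assumptions, since the slide does not say which formulation is meant.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Undirected G(n, p) random graph as an adjacency dict of sets."""
    rng = random.Random(seed)
    adjacency = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        # Include each of the n*(n-1)/2 possible edges independently with probability p.
        if rng.random() < p:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def edge_count(adjacency):
    """Number of undirected edges (each edge is stored twice in the adjacency dict)."""
    return sum(len(neighbors) for neighbors in adjacency.values()) // 2


sample = erdos_renyi(50, 0.1, seed=7)
```

Degrees in such a graph cluster around the mean (n - 1) * p, in contrast to the heavy-tailed power-law degree distribution of the scale-free model listed on the same slide.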
Network Analysis: Methods 176 Network Structure metrics Centrality, Connected Component, Avg. Degree, Clustering Coefficient, Avg. Path Length, Bridge, Cohesion, Prestige, Reciprocity etc. Social Network Analysis methods
 Clusters (Cliques and extensions, Communities) Image: http://www.kudosdynamics.com/ Important Literature:      [Wasserman et al. 1992, Watts et al. 1998, Albert et al. 2002, Newman et al. 2006, Marin et al. 2010, Easley et al. 2010]
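Three of the structure metrics listed above (average degree, clustering coefficient, average path length) are simple enough to compute directly on a small undirected graph. The sketch below uses standard textbook definitions; the adjacency-dict representation and the toy graph are assumptions for illustration.

```python
from collections import deque


def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS from each node)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
```

A high clustering coefficient combined with a short average path length is the signature of the small-world model described on the previous slide.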
Network Analysis: Algorithms  Graph Partitioning & Traversal Goal: Best time-complexity & reachability Generally follows Greedy paths     e.g., K-way multilevel Partitioning, Bron-Kerbosch, K-plex, K-core or N-cliques, DFS, BFS, MST Community Discovery, growth, evolution Based on relationship types (e.g., signed network), geography/location based, interest based etc. Generally follow cluster analysis     e.g., Hierarchical clustering algorithms – Top-down, bottom-up      Further Reading: Modularity Maximization [Newman et al. 2006] Algorithms comparison survey [Balakrishnan et al. 2006] Online Communities [Preece 2001] 177 "We dream in Graph and We analyze in Matrix” - Barry Wellman, INSNA 
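Of the algorithms named on this slide, Bron-Kerbosch (maximal clique enumeration) is compact enough to show in full. This is the basic version without pivoting, written as a generator; the adjacency-dict input format and the toy graph are assumptions.

```python
def bron_kerbosch(adj, r=None, p=None, x=None):
    """Yield every maximal clique of an undirected graph (basic Bron-Kerbosch, no pivoting).

    r: current clique, p: candidate nodes that can extend it, x: nodes already processed.
    """
    r = set() if r is None else r
    p = set(adj) if p is None else p
    x = set() if x is None else x
    if not p and not x:
        # Nothing can extend r and nothing already covers it: r is maximal.
        yield frozenset(r)
        return
    for v in list(p):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p.remove(v)
        x.add(v)


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
cliques = set(bron_kerbosch(graph))
```

The worst-case running time is exponential in the number of nodes, so on large social graphs this is usually applied to neighborhoods or combined with pivoting and degeneracy ordering.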
Social Network Analysis: Diffusion & Homophily Social Network Analysis (Interested in information flow) Can we predict user actions? Understanding dynamics is challenging! Why study Diffusion Maximizing Spread (Opinion, Innovation, Recommendation) Outbreak Detection (e.g., disease) Diffusion Behavior Power Law distribution [Leskovec et al. 2007] Factors impacting Diffusion User Homophily – tendency toward similar behavior [McPherson et al. 2001] Sampling strategy [Choudhury et al. 2010], content nature [Nagarajan et al. 2010] etc. 178 Image: http://bit.ly/fGkIBK
Study of 3-D Dynamics - People, Content, Network Intra Community Activity and connectivity How well connected are individual nodes (People) What keeps them strongly connected over time (Relationship types - Knowledge of Content) 179 Will the two communities coordinate well during an event - crisis or disaster? - Interplay between all three dimensions – P, C, N
Any bridges to connect to the other community? (People)
Any Similarity in actions with the other community (Can Content help?) Image: http://themelis-cuiper.com
Study of 3-D Dynamics - People, Content, Network Metadata: a powerful tool to explore these dynamics* Studies in this direction A Qualitative Examination of Topical Tweet and Retweet Practices [Nagarajan et al. 2010] How content dictates the network flow User-Community Engagement by Multi-faceted Features: A Case Study on Twitter [Purohit et al. 2011] [TO BE PRESENTED TOMORROW IN SoME'11] What factors impact user engagement in topic discussion 180 *Read People-Content-Network Analysis @ Kno.e.sis
Graphs showing sparse (A) and dense (B) RT networks and their corresponding follower graphs for 'call for action' and 'information sharing' type of tweets M. Nagarajan, H. Purohit, and A. Sheth,  ’A Qualitative Examination of Topical Tweet and Retweet Practices,’ 4th Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2010 181
Analysis & Visualization Tools Network WorkBench (NWB) Truthy  Graph-tool Orange Pajek Tulip            …. Many tools!! Resource: http://en.wikipedia.org/wiki/Social_network_analysis_software 182 Image:http://truthy.indiana.edu/
Citizen Sensing Overview, Social Signals, Enablers Role of Social Media Activism, Journalism, Business Intelligence, Global Development Development-Centric Platforms Beginnings, Architectures and Possibilities Systematic Study of Social Media Spatio-Temporal-Thematic + People-Content-Network Analysis Trustworthiness in Social Media Mobile Social Computing Citizen Sensing @ Real-time Research Application: Twitris Conclusion & Future Work 183 Outline
Trustworthiness in Social Media Why?
In Disaster scenarios (e.g. Haiti earthquake, Gulf oil spill)
For Political revolution (e.g. Egypt political crisis)
In Political and Social policies (e.g. health care reforms) What?
Remove off-topic content
(How?) Detect spam and misleading content.
Assess data quality of on-topic content

Más contenido relacionado

La actualidad más candente

twitter_visual_chi2016
twitter_visual_chi2016twitter_visual_chi2016
twitter_visual_chi2016Cat Yao
 
Digital Trails Dave King 1 5 10 Part 1 D3
Digital Trails   Dave King   1 5 10   Part 1 D3Digital Trails   Dave King   1 5 10   Part 1 D3
Digital Trails Dave King 1 5 10 Part 1 D3Dave King
 
Twitter And Social Justice
Twitter And Social JusticeTwitter And Social Justice
Twitter And Social JusticeJodi Sperber
 
The Reader To Leader Framework Motivating Technology Mediated So
The Reader To Leader Framework Motivating Technology Mediated SoThe Reader To Leader Framework Motivating Technology Mediated So
The Reader To Leader Framework Motivating Technology Mediated SoPath of the Blue Eye Project
 
Fredrick Ishengoma - Online Social Networks and Terrorism 2.0 in Developing C...
Fredrick Ishengoma - Online Social Networks and Terrorism 2.0 in Developing C...Fredrick Ishengoma - Online Social Networks and Terrorism 2.0 in Developing C...
Fredrick Ishengoma - Online Social Networks and Terrorism 2.0 in Developing C...Fredrick Ishengoma
 
How information spreads on social networks when unexpected events occur
How information spreads on social networks when unexpected events occurHow information spreads on social networks when unexpected events occur
How information spreads on social networks when unexpected events occurFarida Vis
 
3 ways to engage citizens using social media
3 ways to engage citizens using social media3 ways to engage citizens using social media
3 ways to engage citizens using social mediaGohar Khan
 
Marsden Disinformation Algorithms #IGF2019
Marsden Disinformation Algorithms #IGF2019 Marsden Disinformation Algorithms #IGF2019
Marsden Disinformation Algorithms #IGF2019 Chris Marsden
 
Technology, human rights & movement building around the world
Technology, human rights & movement building around the worldTechnology, human rights & movement building around the world
Technology, human rights & movement building around the worldTechSoup Canada
 
A review for the online social networks literature
A review for the online social networks literatureA review for the online social networks literature
A review for the online social networks literatureAlexander Decker
 
Social media in government - presentation to NSW Health
Social media in government - presentation to NSW HealthSocial media in government - presentation to NSW Health
Social media in government - presentation to NSW HealthCraig Thomler
 
Enhancement of Privacy and User Interaction in a Social Network with the Aid ...
Enhancement of Privacy and User Interaction in a Social Network with the Aid ...Enhancement of Privacy and User Interaction in a Social Network with the Aid ...
Enhancement of Privacy and User Interaction in a Social Network with the Aid ...rahulmonikasharma
 
Future Internet - Webinar UNIFACS Laureate 2015 - With Access Link
Future Internet - Webinar UNIFACS Laureate 2015 - With Access LinkFuture Internet - Webinar UNIFACS Laureate 2015 - With Access Link
Future Internet - Webinar UNIFACS Laureate 2015 - With Access LinkJoberto Martins
 
Web 2.0 Collective Intelligence - How to use collective intelligence techniqu...
Web 2.0 Collective Intelligence - How to use collective intelligence techniqu...Web 2.0 Collective Intelligence - How to use collective intelligence techniqu...
Web 2.0 Collective Intelligence - How to use collective intelligence techniqu...Paul Gilbreath
 
20130318 socialmediaingov-craigthomler-130319003343-phpapp02
20130318 socialmediaingov-craigthomler-130319003343-phpapp0220130318 socialmediaingov-craigthomler-130319003343-phpapp02
20130318 socialmediaingov-craigthomler-130319003343-phpapp02Fayecel Abdelkarim
 
Lesson 1 2 Edited
Lesson 1 2 EditedLesson 1 2 Edited
Lesson 1 2 EditedJuvywen
 

La actualidad más candente (16)

twitter_visual_chi2016
twitter_visual_chi2016twitter_visual_chi2016
twitter_visual_chi2016
 
Digital Trails Dave King 1 5 10 Part 1 D3
Digital Trails   Dave King   1 5 10   Part 1 D3Digital Trails   Dave King   1 5 10   Part 1 D3
Digital Trails Dave King 1 5 10 Part 1 D3
 
Twitter And Social Justice
Twitter And Social JusticeTwitter And Social Justice
Twitter And Social Justice
 
The Reader To Leader Framework Motivating Technology Mediated So
The Reader To Leader Framework Motivating Technology Mediated SoThe Reader To Leader Framework Motivating Technology Mediated So
The Reader To Leader Framework Motivating Technology Mediated So
 
Fredrick Ishengoma - Online Social Networks and Terrorism 2.0 in Developing C...
Fredrick Ishengoma - Online Social Networks and Terrorism 2.0 in Developing C...Fredrick Ishengoma - Online Social Networks and Terrorism 2.0 in Developing C...
Fredrick Ishengoma - Online Social Networks and Terrorism 2.0 in Developing C...
 
How information spreads on social networks when unexpected events occur
How information spreads on social networks when unexpected events occurHow information spreads on social networks when unexpected events occur
How information spreads on social networks when unexpected events occur
 
3 ways to engage citizens using social media
3 ways to engage citizens using social media3 ways to engage citizens using social media
3 ways to engage citizens using social media
 
Marsden Disinformation Algorithms #IGF2019
Marsden Disinformation Algorithms #IGF2019 Marsden Disinformation Algorithms #IGF2019
Marsden Disinformation Algorithms #IGF2019
 
Technology, human rights & movement building around the world
Technology, human rights & movement building around the worldTechnology, human rights & movement building around the world
Technology, human rights & movement building around the world
 
A review for the online social networks literature
A review for the online social networks literatureA review for the online social networks literature
A review for the online social networks literature
 
Social media in government - presentation to NSW Health
Social media in government - presentation to NSW HealthSocial media in government - presentation to NSW Health
Social media in government - presentation to NSW Health
 
Enhancement of Privacy and User Interaction in a Social Network with the Aid ...
Enhancement of Privacy and User Interaction in a Social Network with the Aid ...Enhancement of Privacy and User Interaction in a Social Network with the Aid ...
Enhancement of Privacy and User Interaction in a Social Network with the Aid ...
 
Future Internet - Webinar UNIFACS Laureate 2015 - With Access Link
Future Internet - Webinar UNIFACS Laureate 2015 - With Access LinkFuture Internet - Webinar UNIFACS Laureate 2015 - With Access Link
Future Internet - Webinar UNIFACS Laureate 2015 - With Access Link
 
Web 2.0 Collective Intelligence - How to use collective intelligence techniqu...
Web 2.0 Collective Intelligence - How to use collective intelligence techniqu...Web 2.0 Collective Intelligence - How to use collective intelligence techniqu...
Web 2.0 Collective Intelligence - How to use collective intelligence techniqu...
 
20130318 socialmediaingov-craigthomler-130319003343-phpapp02
20130318 socialmediaingov-craigthomler-130319003343-phpapp0220130318 socialmediaingov-craigthomler-130319003343-phpapp02
20130318 socialmediaingov-craigthomler-130319003343-phpapp02
 
Lesson 1 2 Edited
Lesson 1 2 EditedLesson 1 2 Edited
Lesson 1 2 Edited
 

Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications

  • 1. Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications. Tutorial at WWW2011, Hyderabad, India, March 28, 2011
  • 2. Outline: Citizen Sensing (Overview, Social Signals, Enablers); Role of Social Media (Activism, Journalism, Business Intelligence, Global Development); Development-Centric Platforms (Beginnings, Architectures and Possibilities); Systematic Study of Social Media (Spatio-Temporal-Thematic + People-Content-Network Analysis); Trustworthiness in Social Media; Mobile Social Computing; Citizen Sensing @ Real-time; Research Application: Twitris; Conclusion & Future Work
  • 4. Selvam Velmurugan (Kiirti, eMoksha NGOs); Meena Nagarajan (Content Analysis); Hemant Purohit (People & Network Analysis); Amit Sheth (Semantic Web); Ashutosh Jadhav (Event Analysis); Lu Chen (Sentiment Analysis); Pramod Anantharam (Social & Sensor Web); Pavan Kapanipathi (Real Time Web)
  • 5. Preliminaries. Tutorial description: http://www2011india.com/tutorialstr27.html and http://knoesis.org/library/resource.php?id=1030. Lots of breadth (many examples), some depth (few algorithms), mainly to convey insights. Twitter > Myspace/Facebook > SMS: each has different reach/focus/importance. Given the time, only parts will be covered today! Citations and further reading are at the bottom of slides and at the end. Images belong to their copyright holders; copyright information for images, where available, is at the end.
  • 6. Aim: What are the research opportunities and technical challenges in gaining insights from, and making use of, social media content (esp. citizen sensing)? Provide a structure to a vast array of issues. Breadth, not depth.
  • 7. Outline: Citizen Sensing (Overview, Social Signals, Enablers); Role of Social Media (Activism, Journalism, Business Intelligence, Global Development); Development-Centric Platforms (Beginnings, Architectures and Possibilities); Systematic Study of Social Media (Spatio-Temporal-Thematic + People-Content-Network Analysis); Trustworthiness in Social Media; Mobile Social Computing; Citizen Sensing @ Real-time; Research Application: Twitris; Conclusion & Future Work
  • 8. Citizen Sensing. The common person (a citizen of the Internet) is able to use Web 2.0 and social networks. Citizen sensing is the human-centric activity** of observing, reporting and disseminating information (facts, opinions, views) via text, audio, video and built-in device sensors (and smart devices). ** direct/indirect, collective/individual. Human-in-the-loop (participatory) sensing + Web 2.0 + mobile computing = emergence of citizen-sensor networks. Image: http://bit.ly/hmZe42; reference: A. Sheth, 'Citizen Sensing, Social Signals, and Enriching Human Experience', IEEE Internet Computing, July/August 2009, pp. 80-85.
  • 9. Social Signals. Understanding meaningful citizen sensor observations. Social Signal Processing: aggregation, enhancement, analysis, visualization and interpretation. Citizen-sensor network: immense potential to disseminate social signals quickly and in real time. A. Sheth, 'Citizen Sensing, Social Signals, and Enriching Human Experience', IEEE Internet Computing, July/August 2009, pp. 80-85. Image: http://bit.ly/gWHSjD
  • 10.
  • 11. 1+B with internet connected mobile devices (2010)
  • 12. Smartphones > Notebooks + Netbooks (2010E)
  • 13. 500K+ mobile phone applications
  • 14. 74% of mobile phone users (2.4B) worldwide used SMS (2007). The mobile device might qualify as humankind's primary tool; it redefines the way we engage with people, information, etc. Enablers: Mobile Devices & Ubiquitous Connectivity. Mobile is global. Ubiquity, 24x7. Built-in sensors: environmental, biometric/biomedical, ...
  • 15.
  • 17. YouTube (videos)
  • 18. Flickr (images)
  • 19. Blogs (text)
  • 20. Ping (social network for music). Image: http://bit.ly/euLETT
  • 21. Outline: Citizen Sensing (Overview, Social Signals, Enablers); Role of Social Media, important classes of applications (Activism, Journalism, Business Intelligence, Global Development); Development-Centric Platforms (Beginnings, Architectures and Possibilities); Systematic Study of Social Media (Spatio-Temporal-Thematic + People-Content-Network Analysis); Trustworthiness in Social Media; Mobile Social Computing; Citizen Sensing @ Real-time; Research Application: Twitris; Conclusion & Future Work
  • 22. Citizen Sensors in Action: Mumbai Terror Attack; Iran Election 2009; Haiti Earthquake 2010; US Healthcare Debate 2009. Image: http://huff.to/hp0OhA
  • 23.
  • 24. Citizen Journalism; Twitter Journalism. Images: http://bit.ly/9GVfPQ, http://bit.ly/hmrTYV
  • 25. News is increasingly social: social news. Social media and global media are intertwined.
  • 26. Business Intelligence: Trend Spotting, Forecasting, Brand Tracking, Targeted Advertising. Sysomos (http://www.sysomos.com/): business intelligence by engaging, measuring and understanding activities in social media. Trendspotting (http://trendspotting.com): detecting, analyzing and evaluating trends for business. Simplify (http://simplify360.com/): a collaborative platform to monitor, measure and engage customers using social media. Shoutlet (http://www.shoutlet.com/): managing social media marketing communication using a single platform. Reputation.com (http://www.reputationdefender.com/): preserves privacy and defends reputation by protecting against attacks on personal information. Image: http://bit.ly/eAebBb
  • 27. Social Development (Education, Health, eGov). LiveMocha (http://www.livemocha.com/): online language learning tool with social engagement, bridging the gap. Soliya (http://www.soliya.net/): dialogue between students from diverse backgrounds across the globe using the latest multimedia technologies. Project Einstein (http://digital-democracy.org/what-we-do/programs/): a photography-based digital pen-pal program connecting youths in refugee camps to the world. PatientsLikeMe (http://mashable.com/2010/07/13/social-media-health-trends/): facilitates sharing health profiles, finding patients with similar ailments and learning from discussions. TrialX (http://trialx.com/): finding clinical trials of new treatments and connecting with clinical trial investigators. Image: http://bit.ly/ayyjlU
  • 28. Outline: Citizen Sensing (Overview, Social Signals, Enablers); Role of Social Media (Activism, Journalism, Business Intelligence, Global Development); Development-Centric Platforms (Beginnings, Architectures and Possibilities); Systematic Study of Social Media (Spatio-Temporal-Thematic + People-Content-Network Analysis); Trustworthiness in Social Media; Mobile Social Computing; Citizen Sensing @ Real-time; Research Application: Twitris; Conclusion & Future Work
  • 29. Collaboration: We “simply do not have enough genes to program the brain fully in advance,” we must work together, extending and supporting our own intelligence with “social prosthetic” systems that make up for our missing cognitive and emotional capacities: “Evolution has allowed our brains to be configured during development so that we are ‘plug compatible’ with other humans, so that others can help us extend ourselves.” - Harvard "Group Brain Project"
  • 30. Beginnings. Open Source: Linux, Apache. Social Networks: Facebook, Twitter, MySpace. Crowd Sourcing: Wikipedia, Kiva, Ushahidi, Kiirti, SwiftRiver, Sahana. Collaborative Governance: Peer-to-Patent, e-Democracia.
  • 31. Popular Initiatives. Facebook + Twitter: Iran post-election protests; Tunisia and Egypt uprisings. Ushahidi: Kenyan post-election violence; India, Lebanon, Afghanistan and Sudan elections; Haiti earthquake; Pakistan floods. Kiirti: BBMP election monitoring; Bangalore AutoWatch.
  • 32. FixOurCity - Chennai. Built on top of the FixMyCity open-source codebase. Stage I: report by area/ward and street; integration with Google Maps; displays ward member name and contact details; select category of issue, description and severity; confirmation through email to avoid misuse. Stage II/III: normalize incoming reports to official wards and categories; integration with the Corporation website to allow auto-forwarding and updating of reports.
  • 33. Ushahidi. Information Collection: SMS (FrontlineSMS, Clickatell), Email, Web. Visualization/Interactive Mapping: Timeline, Category, Geo-spatial. Alerts: Geo-spatial. Admin: User Management, Report Moderation/Creation, Site Statistics.
  • 34. SwiftRiver. Filtering and verification of real-time data from channels like Twitter, SMS, email and RSS feeds. Offers organizations an easy way to apply semantic analysis and verification algorithms to different sources of information. Goals: speed up the process of managing real-time data streams (email, web, SMS, Twitter); add elusive context (location, historical data) and history (reputation of sources) to online research; offer a dashboard for monitoring multiple channels of information; offer advanced aggregation and analytic tools, on or offline; give the user control over advanced curation tools and filters.
  • 37. Sahana: Free and Open Source Disaster Management system. A web-based collaboration tool that addresses the common coordination problems during a disaster between government, civil society (NGOs) and the victims themselves.
  • 38. Sahana modules. Mapping: situation awareness and geospatial analysis. Messaging: sends and receives alerts via email and SMS. Document Library: a library of digital resources, such as photos and office documents. Missing Persons Registry: report and search for missing persons. Disaster Victim Identification. Requests Management: tracks requests for aid and matches them against donors who have pledged aid. Shelter Registry: tracks the location, distribution, capacity and breakdown of victims in shelters. Hospital Management System: hospitals can share information on resources and needs. Organization Registry: "Who is doing What & Where"; allows relief agencies to coordinate their activities. Ticketing: master message log to process incoming reports and requests. Delphi Decision Maker: supports the decision making of large groups of experts.
  • 39. Peer to Patent. Peer To Patent opens the patent examination process to public participation for the first time. It is an online system that aims to improve the quality of issued patents by enabling the public to supply the USPTO with information relevant to assessing the claims of pending patent applications.
  • 41. Kiirti. Allows you to set up your own instance of the Ushahidi Platform without having to install it on your own web server. Provides pre-integrated voice and SMS reporting capabilities within India.
  • 42. Kiirti – Home Page
  • 43. Kiirti – User Interaction Flow
  • 44. Kiirti – Flywheel of Engagement
  • 45. Future Possibilities: Online Dispute Resolution (30M+ pending cases in India's courts); Public Policy Reviews; Crisis Management; Effective Local Governance.
  • 46. Challenges: information overload; processing and de-duping messages; accessibility (e.g. network congestion, access points, …); incorrect or partial data; trustworthiness of source (e.g. influence, reputation, …); metadata extraction (e.g. geo data, named entities, sentiment/opinion, …); collaboration; policy discussions; structure or hierarchy.
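The "processing and de-duping messages" challenge can be illustrated with a minimal sketch. It assumes that near-duplicate reports (retweets, forwarded messages) differ mainly in case, punctuation, URLs and a leading "RT @user" prefix. The helper names `normalize`, `jaccard` and `dedupe`, the sample reports and the 0.8 threshold are illustrative assumptions, not something taken from the tutorial or from any of the platforms it describes.

```python
import re

def normalize(text):
    """Lowercase, drop URLs, retweet prefixes and punctuation; return a token set."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)    # strip URLs
    text = re.sub(r"\brt\s+@\w+:?", " ", text)   # strip 'RT @user' prefixes
    return frozenset(re.findall(r"[a-z0-9']+", text))

def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def dedupe(messages, threshold=0.8):
    """Keep the first message of every group of near-duplicates."""
    kept = []  # (original text, token set) pairs
    for msg in messages:
        tokens = normalize(msg)
        if all(jaccard(tokens, seen) < threshold for _, seen in kept):
            kept.append((msg, tokens))
    return [msg for msg, _ in kept]

# Hypothetical incoming reports
reports = [
    "Bridge collapsed near Main Road, need help",
    "RT @user1: Bridge collapsed near Main Road, need help!",
    "Water supply cut in Ward 12",
]
unique_reports = dedupe(reports)
```

The pairwise comparison is quadratic in the number of kept messages, so a real-time stream would likely need an index (for example hashing the normalized token set) in front of it.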
  • 47. Outline: Citizen Sensing (Overview, Social Signals, Enablers); Role of Social Media (Activism, Journalism, Business Intelligence, Global Development); Development-Centric Platforms (Beginnings, Architectures and Possibilities); Systematic Study of Social Media (Spatio-Temporal-Thematic + People-Content-Network Analysis); Trustworthiness in Social Media; Mobile Social Computing; Citizen Sensing @ Real-time; Research Application: Twitris; Conclusion & Future Work
  • 48. Dimensions of Systematic Study of Social Media: Spatio-Temporal-Thematic + People-Content-Network
  • 49. Social Information Processing. "Who says what, to whom, why, to what extent and with what effect?" [Lasswell]. Network: social structure emerges from the aggregate of relationships (ties). People: poster identities, the active effort of accomplishing interaction. Content: studying the content of communication.
  • 50. Studying Online Human Social Dynamics. How does the (semantics or style of) content fit into the observations made about the network? Often, the three-dimensional dynamic of people, content and link structure is what shapes the social dynamic. Example: how do the topic of discussion, the emotional charge of a conversation, the presence of an expert and the connections between participants together explain information propagation in a social network? Image: http://bit.ly/dFzjU2
  • 51. Why People-Content-Network + Spatial-Temporal-Thematic metadata? (Example of Understanding Crisis Data)
  • 52. Metadata/Annotations. Metadata: an organized way to study. Types; creation/extraction and storage; use. Image: http://www.biowisdom.com/tag/metadata/
  • 53. Metadata Infrastructure: Example for Tweet Annotation (mapped out tweet). Image: http://rww.to/9zyoQa
  • 55.
  • 56. People Metadata: Variety of Self-expression Modes on Multiple Social Media Platforms. Explicit information from user profiles: user names, pictures, videos, links, demographic information, group memberships, ... Implicit information from user attention metadata: page views, Facebook 'Likes', comments; Twitter 'Follows', retweets, replies, ...
  • 57. People Metadata: Various Types. Identification; Interests; Activity; Network.
  • 59. People Metadata: Continued. Web presence: user affiliations; Klout score, an influence measure (www.klout.com).
  • 60. Content Metadata. 1. Content-independent metadata: date, location, author, etc. 2. Content-dependent metadata: (a) direct content-based metadata: (i) explicit/mentioned content metadata, i.e. named entities in content; (ii) implicit/inferred content metadata, i.e. related named entities from knowledge sources; (b) indirect content-based metadata (external metadata): context inferred from URLs in content (images, links to articles, Foursquare check-ins, etc.). V. Kashyap and A. Sheth, 'Semantic Heterogeneity in Global Information Systems: The Role of Metadata, Context and Ontologies,' in Cooperative Information Systems: Current Trends and Directions, M. Papazoglou and G. Schlageter (Eds.), Academic Press, 1998, pp. 139-178.
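One way to picture the content metadata categories on this slide is as a small record type. The sketch below is only an illustration: the class and field names are hypothetical, and the sample values are made up to show where each kind of metadata would go.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional

@dataclass
class ContentIndependent:
    """Metadata available without reading the text itself."""
    date: Optional[str] = None
    location: Optional[str] = None
    author: Optional[str] = None

@dataclass
class ContentDependent:
    """Metadata derived from the text or from what it links to."""
    # Direct, explicit: named entities mentioned in the content
    explicit_entities: List[str] = field(default_factory=list)
    # Direct, implicit: related entities inferred from knowledge sources
    inferred_entities: List[str] = field(default_factory=list)
    # Indirect (external): context taken from URLs in the content
    external_context: List[str] = field(default_factory=list)

@dataclass
class ContentMetadata:
    independent: ContentIndependent
    dependent: ContentDependent

# Illustrative values only
meta = ContentMetadata(
    independent=ContentIndependent(date="2011-03-28", location="Hyderabad",
                                   author="example_user"),
    dependent=ContentDependent(
        explicit_entities=["Egypt"],
        inferred_entities=["Cairo"],
        external_context=["http://example.org/article"],
    ),
)
record = asdict(meta)  # nested dict, convenient for storage or serialization
```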
  • 61.
  • 62. Content Metadata: Content Dependent (Tweet). Direct content-based metadata; indirect content-based metadata (external metadata).
  • 63. Content Metadata: Content Dependent (SMS). Direct content-based metadata.
  • 64. Network Metadata. Connections/relationships matter! (foundation for the network)
  • 66. Metadata Creation & Extraction. Extracted metadata: directly visible information from the user profile, tweet content and community structure. Created metadata: obtained after processing information in the user profile, content and/or network structure.
  • 67. An Example. Length: 109 characters. General topic: Egypt protest. This poor {sentiment_expression: {target: “Lara Logan”, polarity: “negative”}} woman! RT @THR CBS News' {entity:{type=“News Agency”}} Lara Logan {entity:{type=“Person”}} Released From Hospital {entity:{type=“Hospital”}} After Egypt {entity:{type=“Country”}} Assault {topic} http://bit.ly/dKWTY0 {external_URL}
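The inline markup in this example can also be kept apart from the text as standoff annotations, that is, character offsets plus labels. The sketch below reconstructs the tweet from the slide and looks up each annotated phrase. The `to_standoff` helper and the record layout are assumptions for illustration, as is the choice of "poor" as the exact span of the sentiment expression.

```python
# Tweet text reconstructed from the slide (109 characters)
tweet = ("This poor woman! RT @THR CBS News' Lara Logan Released From "
         "Hospital After Egypt Assault http://bit.ly/dKWTY0")

# (surface text, label, attributes) following the slide's inline markup
annotations = [
    ("poor", "sentiment_expression", {"target": "Lara Logan", "polarity": "negative"}),
    ("CBS News", "entity", {"type": "News Agency"}),
    ("Lara Logan", "entity", {"type": "Person"}),
    ("Hospital", "entity", {"type": "Hospital"}),
    ("Egypt", "entity", {"type": "Country"}),
    ("Assault", "topic", {}),
    ("http://bit.ly/dKWTY0", "external_URL", {}),
]

def to_standoff(text, annotations):
    """Turn (surface, label, attrs) triples into offset-based records."""
    records = []
    for surface, label, attrs in annotations:
        start = text.find(surface)  # first occurrence is enough for this example
        if start == -1:
            raise ValueError("annotated phrase not found: " + surface)
        records.append({"start": start, "end": start + len(surface),
                        "label": label, **attrs})
    return records

records = to_standoff(tweet, annotations)
entities = [tweet[r["start"]:r["end"]] for r in records if r["label"] == "entity"]
```

Keeping offsets separate from the text leaves the original message untouched, so several annotation layers (entities, sentiment, topics) can coexist over the same content.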
  • 68. Why is the Semantic Web a Standard for Social Metadata? Rich Snippets, Open Graph and RDFa are Semantic Web based social data standards. Relationships/connections play a central role (not just hyperlinks as in Web data), so treating relationships as first-class objects is important. Semantic Web technologies and standards provide better techniques to capture and represent metadata and relationships.
  • 69. Semantic Web in One Slide. Representing Semantic Web data: RDF, with relationships as first-class objects, <subject, predicate, object>. Representing knowledge and agreements: nomenclature, taxonomy, folksonomy, ontology (OWL). Annotation: RDFa, XLink, model reference. Web of Data: Linked Open Data. Querying: SPARQL. Rules: SWRL, RIF.
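To make the <subject, predicate, object> idea concrete, here is a minimal sketch that stores triples as plain Python tuples and answers a single triple pattern. It is a standard-library stand-in rather than a real RDF store or SPARQL engine; the `ex:` identifiers and the `match` helper are hypothetical.

```python
# Each relationship is a first-class <subject, predicate, object> triple
triples = {
    ("ex:tweet1", "ex:mentions", "ex:LaraLogan"),
    ("ex:tweet1", "ex:hasTopic", "ex:EgyptProtest"),
    ("ex:tweet2", "ex:mentions", "ex:LaraLogan"),
    ("ex:LaraLogan", "rdf:type", "ex:Person"),
}

def match(pattern, store):
    """Yield triples matching (s, p, o); None is a wildcard, like a query variable."""
    for triple in store:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple

# Which resources mention Lara Logan?
mentions = sorted(s for s, _, _ in match((None, "ex:mentions", "ex:LaraLogan"), triples))
```

With a suitable prefix declaration, the same question would look roughly like `SELECT ?s WHERE { ?s ex:mentions ex:LaraLogan }` in SPARQL.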
  • 70.
  • 71. Semantics = meaning
  • 72. Richer representation, support for relationships, context
  • 73. Supports use of background knowledge
  • 74. Better integration, powerful analysis
  • 75. Use of RDF data stores/LOD 
  • 76. Semantics: the implicit, the formal and the powerful
  • 77.
  • 78. Metadata Extraction from Informal Text. Meena Nagarajan, 'Understanding User-Generated Content on Social Media,' Ph.D. Dissertation, Wright State University, 2010
  • 79. Characteristics of Text on Social Media
  • 80. The Formality of Text
  • 81.
  • 82. What opinions are people conveying via the content?
  • 84. What can we infer about the author from the content he posts?
  • 85. Context (external to content) extraction
  • 86. URL extraction, analyzing external content. Recognize key entities mentioned in content: Information Extraction (entity recognition, anaphora resolution, entity classification, ...). Discovery of semantic associations between entities. Topic classification, aboutness of content: what is the content about? Intention analysis: why did they share this content?
  • 87. Research Efforts, Contributions in this Space. Examining the usefulness of multiple context cues for text mining algorithms. Compensating for informal, highly variable language and lack of context. Using context cues: document corpus, syntactic and structural cues, social medium, external domain knowledge, … In this talk, highlighting sample metadata creation tasks: NER, key phrase extraction, intention, sentiment/opinion mining.
  • 88. Part 1: NER, Key Phrase Extraction. Named Entity Recognition: I loved <movie> the hangover </movie>! Key Phrase Extraction.
  • 89. Multiple Context Cues Utilized for NER in Blogs and MySpace Forums. Meena Nagarajan, 'Understanding User-Generated Content on Social Media,' Ph.D. Dissertation, Wright State University, 2010
  • 90. Multiple Context Cues Utilized for Keyphrase Extraction from Twitter, Facebook and MySpace. Meena Nagarajan, 'Understanding User-Generated Content on Social Media,' Ph.D. Dissertation, Wright State University, 2010
  • 91. Focus, Impact. We focus on techniques that exploit content and context aspects on social media platforms. Our methods highlight a combination of top-down and bottom-up analysis for informal text: statistical NLP and ML algorithms over large corpora (bottom-up); models and rich knowledge bases in a domain (top-down).
  • 93. Named Entity Recognition “I loved your music Yesterday!” Yesterday is an album “It was THEHANGOVER of the year..lasted forever.. The Hangover is not a movie So I went to the movies..badchoice picking “GI Jane”worse now” GI Jane is a movie 73 Task of NER : Identifying and classifying tokens
  • 94. NER in prior work vs. NER for Informal Text
  • 95. Cultural Named Entities. NER focus in this work: cultural named entities, i.e. artifacts of culture such as the names of books, music albums, films, video games, etc. Many are also common words in a language: The Lord of the Rings, Lips, Crash, Up, Wanted, Today, Twilight, Dark Knight…
  • 96. What Makes Cultural Entity Extraction Challenging. Varied senses, several poorly documented: Star Trek covers movies, a TV series, a media franchise... and cuisines! Changing contexts with recent events: The Dark Knight is a movie, but it is also a reference to Obama and the health care policy. Comprehensive sense definitions, enumeration of contexts and labeled corpora for all senses are unrealistic expectations when building a NER system. NER here relaxes the closed-world sense assumptions.
  • 97. NER in prior work vs. NER for Informal Text
  • 98.
  • 99. Starting off with a dictionary or list of entities we want to spot
  • 100. Spot, then disambiguate in context (natural language, domain knowledge cues)
  • 102. Is this mention of “the hangover” in a sentence referring to a movie? CoNLL 2003: http://www.cnts.ua.ac.be/conll2003/ner/
  • 103. NER in prior work vs. NER for Informal Text
  • 104. Cultural NER: Two Flavors
  • 105. (a) Multiple Senses in the Same Domain
  • 106. Algorithm Preliminaries. Problem definition: cultural entity identification for music albums and tracks, e.g. Smile (Lily Allen), Celebration (Madonna). Corpus: MySpace comments, which are context-poor utterances, e.g. “Happy 25th Lilly, Alfie is funny”. Goal: semantic annotation of music named entities (w.r.t. the MusicBrainz schema).
  • 107. Using a Knowledge Resource for NER is not straightforward
  • 108. Approach Overview. Which ‘Merry Christmas’? ‘So Good’ is also a song! Scoped relationship graphs: use context cues from the content, webpage title, URL… (e.g. “new Merry Christmas tune”) and reduce the potential entity spot size (e.g. to new albums/songs). Steps: generate candidate entities, then spot and disambiguate.
  • 109. Sample Real-world Constraints. Which ‘Merry Christmas’? ‘So Good’ is also a song! Career restrictions: “release your third album already..” Recent album restrictions: “I loved your new album..” Artist age restrictions: “happy 25th rihanna, loved alfie btw..” etc.
  • 110.
  • 111. Scoping via Real-world Restrictions
  • 112. Scoped Entity Lists. User comments are on MySpace artist pages. Contextual restriction: artist name. Assumption: no other artist/work is mentioned. A naive spotter has the advantage of spotting all possible mentions (modulo spelling errors), but it generates several false positives, e.g. “this is bad news, ill miss you MJ”.
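The scoped naive spotter described above can be sketched as a dictionary lookup restricted to the artist whose page a comment appears on. This is only an illustration under stated assumptions: the `ENTITIES_BY_ARTIST` table is hand-made for the example (a real system would derive it from a knowledge base such as MusicBrainz), and `naive_spot` is a hypothetical helper, not the authors' implementation.

```python
import re

# Hypothetical scoped entity lists, keyed by the artist page the comment is on.
ENTITIES_BY_ARTIST = {
    "Lily Allen": ["Smile", "Alfie"],
    "Madonna": ["Celebration", "Merry Christmas"],
}


def naive_spot(comment: str, artist: str) -> list[str]:
    """Return every entity of `artist` that occurs in `comment`.

    Matching is case-insensitive and on word boundaries, so it finds all
    literal mentions but cannot tell a track name from an ordinary word.
    """
    spots = []
    for name in ENTITIES_BY_ARTIST.get(artist, []):
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, comment, flags=re.IGNORECASE):
            spots.append(name)
    return spots
```

Scoping removes senses that belong to other artists, yet `naive_spot("Keep your SMILE on!", "Lily Allen")` still reports `Smile`; that kind of false positive is what a later disambiguation stage has to reject.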
  • 113. But there are also non-music mentions. Challenge 1: several senses in the same domain. Scoping relationship graphs narrows the possible senses and solves the named entity identification problem partially. Challenge 2: non-music mentions. “Got your new album Smile. Loved it!” vs. “Keep your SMILE on!”
  • 114. Using Language Features to Eliminate Incorrect Mentions. Syntactic features: POS tags, typed dependencies... Word-level features: capitalization, quotes. Domain-level features.
  • 116. Hand-labeling: Fairly Subjective. 1800+ spots in MySpace user comments from artist pages. Manual annotations for a post such as “Keep your <track>SMILE</track> on!”: valid album/track named entity (good spot), invalid named entity (bad spot), hard to tell (inconclusive). 4-way annotator agreement shows that agreeing on the accuracy of a spot is hard even for domain experts: Madonna 90% agreement, Rihanna 84% agreement, Lily Allen 53% agreement (many named entities of ambiguous nature and usage).
  • 117. Combining a Dictionary Spotter + NLP Analytics. Daniel Gruhl, Meena Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, ‘Context and Domain Knowledge Enhanced Entity Spotting in Informal Text,’ The 8th International Semantic Web Conference, 2009: 260-276
  • 118. Lessons Learned: NER on Social Media Text using a Knowledge Base. Intelligent pruning of a knowledge base goes a long way in improving precision. Two-stage approach: chaining NL learners over the results of domain-model-based spotters improves accuracy by up to a further 50% and allows the more time-intensive NLP analytics to run on less than the full set of input data.
  • 119. Music NER application: BBC SoundIndex (IBM Almaden), Pulse of the Online Music Populace. Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: ‘Multimodal Social Intelligence in a Real-Time Dashboard System,’ special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media", 2010. Project: http://www.almaden.ibm.com/cs/projects/iis/sound/
  • 121.
  • 122. Several Insights. Trending popularity of artists. Trending topics in artist pages. Only 4% negative sentiments; perhaps ignore the Sentiment Annotator on this data source? Ignoring spam can change the ordering of popular artists.
  • 123. Predictive Power of Data. Billboard's Top 50 Singles chart during the week of Sept 22-28 ’07 vs. MySpace popularity charts. A user study indicated a 2:1, and up to 7:1 (younger age groups), preference for the MySpace list. Challenging traditional polling methods!
  • 125. Key Phrase Extraction: Example. Key phrases extracted from prominent discussions on Twitter around the 2009 Health Care Reform debate and the 2008 Mumbai Terror Attack on one day.
  • 126. Key Phrase Extraction from Social Media Text. Different from Information Extraction: key phrase extraction does not concern itself with classification into a type. Extracting vs. assigning key phrases; the focus here is extraction. Prior work focused on extracting phrases that summarize a single document (a news article, a web page, a journal article, a book...). The focus here is summarizing multiple documents (UGC) around the same event or topic of interest.
  • 127. Key Phrase Extraction on Social Media Content Has Some Differences. 1. Need to preserve/isolate the social behind the social data when summarizing key phrases: what is said in Egypt vs. the USA should be viewed in isolation. 2. Need to account for redundancy, variability and off-topic content: “Met up with mom for lunch, she looks lovely as ever, good genes .. Thanks Nike, I love my new Gladiators ..smooth as a feather. I burnt all the calories of Italian joy in one run.. if you are looking for good Italian food on Main, Buca is the place to go.”
  • 128. Where is the Social and Cultural Logic in UGC? Thematic components: similar messages convey similar ideas. Space and time metadata: the role of community and geography in communication. Poster attributes: age, gender and socio-economic status reflect similar perceptions. ‘Social applies to data as well as metadata’.
  • 129. Features Used in Social Key Phrase Extraction (common to prior efforts). Focus: n-grams, spatio-temporal metadata (social components). Syntactic cues: in quotes, italics, bold; in document headers; phrases collocated with acronyms. Document and structural cues: two-word phrases, appearing at the beginning of a document, frequency, presence in multiple similar documents, etc. Linguistic cues: stemmed form of a phrase, phrases that are simple and compound nouns in sentences, etc.
  • 130. Key Phrase Extraction Overview. “President Obama in trying to regain control of the health-care debate will likely shift his pitch in September”. 1-grams: President, Obama, in, trying, to, regain, ... 2-grams: “President Obama”, “Obama in”, “in trying”, “trying to”... 3-grams: “President Obama in”, “Obama in trying”, “in trying to”...
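The n-gram enumeration on this slide can be reproduced with a few lines of Python. This is a minimal sketch that assumes plain whitespace tokenization; the `ngrams` helper name is chosen for the example and the slides do not say which tokenizer was actually used.

```python
def ngrams(text: str, n: int) -> list[str]:
    """Return all contiguous n-word phrases of `text`, in order."""
    tokens = text.split()  # assumption: whitespace tokenization
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = ("President Obama in trying to regain control of the "
            "health-care debate will likely shift his pitch in September")

unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)
```

Every n-gram is a candidate key phrase; the scoring steps on the following slides decide which candidates are kept.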
  • 131.
  • 133. Variability: contextually important. Spatial importance (local vs. global popularity). Temporal importance (always popular vs. currently trending).
  • 134. TF-IDF vs. spatio-temporal-thematic scores rank phrases differently: “foreign relations” surfaces. M. Nagarajan et al., Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences, Tenth International Conference on Web Information Systems Engineering, Oct 5-7, 2009: 539-553
  • 135. Next task: Eliminating Off-topic Content. Frequency-based heuristics will not eliminate off-topic content that is ALSO POPULAR. Popular key phrases “single” and “Jesus” are unrelated to Madonna’s music. M. Nagarajan et al., Monetizing User Activity on Social Networks - Challenges and Experiences, 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18 2009: 92-99
  • 136. Eliminating Off-topic Content: Example. “Yeah i know this a bit off topic but the other electronics forum is dead right now. im looking for a good camcorder, somethin not to large that can record in full HD only ones so far that ive seen are sonys” “Canon HV20. Great little cameras under $1000.” Possible relevant phrases are: ['camcorder', 'canon hv20', 'little camera', 'hd', 'cameras', 'canon']
  • 137. Eliminating Off-topic Content: Approach Overview. Assume one or more seed words (from a domain knowledge base): C1 - ['camcorder']. Extracted key words/phrases: C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']. Gradually expand C1 by adding phrases from C2 that are strongly associated with C1. Mutual-information-based algorithm [WISE2009].
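To make the "gradually expand C1" step concrete, here is a small sketch that uses pointwise mutual information (PMI) over phrase co-occurrence in posts. It is an assumption-laden illustration, not the [WISE2009] algorithm: the `pmi` and `expand` functions, the zero threshold and the idea of representing each post as a set of phrases are all choices made for this example.

```python
import math


def pmi(a: str, b: str, docs: list[set[str]]) -> float:
    """Pointwise mutual information of phrases a and b over a list of posts,
    where each post is represented as the set of phrases it contains."""
    n = len(docs)
    p_a = sum(a in d for d in docs) / n
    p_b = sum(b in d for d in docs) / n
    p_ab = sum(a in d and b in d for d in docs) / n
    if p_ab == 0:
        return float("-inf")  # never co-occur: no evidence of association
    return math.log(p_ab / (p_a * p_b))


def expand(seeds, candidates, docs, threshold=0.0):
    """Grow the seed set with candidates whose best PMI against any
    already-selected phrase exceeds `threshold`; repeat until stable."""
    selected = list(seeds)
    changed = True
    while changed:
        changed = False
        for cand in candidates:
            if cand in selected:
                continue
            if max(pmi(cand, s, docs) for s in selected) > threshold:
                selected.append(cand)
                changed = True
    return selected
```

With a corpus in which 'hd' and 'canon' co-occur with 'camcorder' while 'electronics forum' and 'offtopic' only co-occur with each other, the expansion keeps the camera-related phrases and leaves the chatter out.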
  • 138. Key Phrases & Aboutness: Evaluations. Are the key phrases we extracted topical and good indicators of what the content is about? If they are, they should act as effective index/search phrases and return relevant content. Evaluation application: Targeted Content Delivery.
  • 139. Targeted Content Delivery: Evaluations. We took 12K posts from MySpace and Facebook Electronics forums. Extracted baseline phrases using Yahoo Term Extractor. Extracted phrases using the key phrase extraction and elimination algorithm described earlier. Generated targeted content from Google AdSense. Asked users if the delivered content matched the posts.
  • 140. Targeted Content for all content vs. extracted key phrases
  • 141. User Studies and Results
  • 142. Social Key Phrase Extraction: Impact, Contributions. TF-IDF + social contextual cues yield more useful phrases that preserve social perceptions. Corpus + seeds from a domain knowledge base eliminate off-topic phrases effectively.
  • 144. Why do people share? Outside of the psychological incentives, broadly, people share to seek information OR share information. If we understand the intent behind a post, we can build systems that respond to it better. Focus of our work: understand intent to deliver targeted content. Use case: online content-targeted advertisements on social media platforms.
  • 146. Today – Content-based Ads on Profiles 120
  • 147.
  • 148. But interests on profiles do not translate to purchase intents: interests are often outdated, and intents are rarely stated on a profile. Some profile data does seem to work, for example new store openings and sales targeted at location information in a profile.
  • 149. But monetizable intents are elsewhere, away from their profiles
  • 150. Showing clear intents on MySpace posts but no relevant ads
  • 151. Targeted Content-based Advertising is Non-trivial. Non-policed content: brand image, unfavorable sentiments. People are there to network: user attention to ads is not guaranteed. Informal, casual nature of content: people are sharing experiences and events. Main message overloaded with off-topic content: “I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :(” Zhang, Y., Surendran, A. C., Platt, J. C., and Narasimhan, M., ‘Learning from Multi-topic Web Documents for Contextual Advertisement,’ KDD 2008
  • 152. Focus: Discuss Methodology and Preliminary Results in: identifying intents behind user posts on social networks (identify content with monetization potential); identifying keywords for advertising in user-generated content (considering interpersonal communication and off-topic chatter). M. Nagarajan et al., ‘Monetizing User Activity on Social Networks - Challenges and Experiences,’ 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18 2009: 92-99
  • 153. Investigations. User studies: hard to compare activity-based ads to the state of the art; impressions to clickthroughs. How well are we able to identify monetizable posts? How targeted are ads generated using our keywords vs. the entire user-generated content? M. Nagarajan et al., ‘Monetizing User Activity on Social Networks - Challenges and Experiences,’ 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18 2009: 92-99
  • 154. Identifying Intents on Social Media is Different from that on the Web. Scribe intent is not the same as Web search intent [1]. People write sentences, not keywords or phrases. Presence of a keyword does not imply navigational/transactional intents: ‘am thinking of getting X’ (transactional); ‘I like my new X’ (information sharing); ‘what do you think about X’ (information seeking). Useful here would be to identify transactional and information-seeking intents. [1] B. J. Jansen, D. L. Booth, and A. Spink, “Determining the informational, navigational, and transactional intent of web queries,” Inf. Process. Manage., vol. 44, no. 3, 2008.
  • 155. Not Focusing on the Entity but on Action Patterns Surrounding the Entity. “where can I find a chotto psp cam”: the user post also has an entity, which is a plus but not the main target of intent identification. The goal is to study how questions are asked, not the topic words that indicate what the question is about.
  • 156. Conceptual Overview. Bootstrapping to learn IS (information-seeking) patterns. Take a set of user posts from SNSs, not annotated for presence or absence of any intent.
  • 157. Bootstrapping to learn IS patterns. Generate a universal set of n-gram patterns with freq > f: S = set of all 4-grams with freq > 3.
  • 158. Bootstrapping to learn IS patterns. Generate a set of candidate patterns from seed words (why, when, where, how, what): Sc = all 4-grams in S that extract seed words.
  • 159. Bootstrapping to learn IS patterns. User picks 10 seed patterns from Sc: Sis = ‘does anyone know how’, ‘where do I find’, ‘someone tell me where’…
  • 160. Bootstrapping to learn IS patterns. Gradually expand Sis by adding information-seeking patterns from Sc.
  • 161. Bootstrapping to learn IS patterns. For every pis in Sis, generate a set of filler patterns.
  • 162.
  • 163. Expanding the Pattern Pool. Functional properties / communicative functions of words, from a subset of LIWC [1]: cognitive mechanical (e.g., if, whether, wondering, find), as in ‘I am thinking about getting X’; adverbs (e.g., how, somehow, where); impersonal pronouns (e.g., someone, anybody, whichever), as in ‘Someone tell me where can I find X’. [1] Linguistic Inquiry and Word Count, LIWC, http://liwc.net
  • 164. Example: Acquiring New Intent Patterns. Sc = {‘does anyone know how’, ‘where do I find’, ‘someone tell me where’}; pis = ‘does anyone know how’; filler pattern: ‘does * know how’. Candidate ‘does someone know how’: functional compatibility (impersonal pronouns), empirical support 1/3. Candidate ‘does somebody know how’: functional compatibility (impersonal pronouns), empirical support 0; pattern retained. Candidate ‘does john know how’: pattern discarded.
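The bootstrapping steps above (frequent 4-grams, seed-word candidates, filler patterns, functional compatibility) can be sketched as follows. This is a simplified reading of the slides, not the published algorithm: the `IMPERSONAL_PRONOUNS` set is a hypothetical stand-in for a LIWC category, empirical support is not modelled, and all function names are invented for the example.

```python
from collections import Counter

SEED_WORDS = {"why", "when", "where", "how", "what"}
# Hypothetical stand-in for one LIWC-like functional category.
IMPERSONAL_PRONOUNS = {"anyone", "someone", "somebody", "anybody"}


def four_grams(post: str) -> list[tuple[str, ...]]:
    tokens = post.lower().split()
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


def candidate_patterns(posts: list[str], min_freq: int) -> set[tuple[str, ...]]:
    """S_c: 4-grams occurring more than `min_freq` times that contain a seed word."""
    counts = Counter(g for post in posts for g in four_grams(post))
    return {g for g, c in counts.items() if c > min_freq and SEED_WORDS & set(g)}


def filler_patterns(pattern: tuple[str, ...]) -> list[tuple[str, ...]]:
    """Replace each word in turn by '*', e.g. ('does', '*', 'know', 'how')."""
    return [pattern[:i] + ("*",) + pattern[i + 1:] for i in range(len(pattern))]


def expand(seed: tuple[str, ...], candidates: set[tuple[str, ...]]) -> set[tuple[str, ...]]:
    """Keep candidates that fit one filler pattern of `seed` and whose
    substituted word is in the same functional category as the original."""
    retained = set()
    for i, filler in enumerate(filler_patterns(seed)):
        for cand in candidates:
            if cand == seed:
                continue
            fits = all(f == "*" or f == c for f, c in zip(filler, cand))
            compatible = (seed[i] in IMPERSONAL_PRONOUNS
                          and cand[i] in IMPERSONAL_PRONOUNS)
            if fits and compatible:
                retained.add(cand)
    return retained
```

Under these assumptions, 'does someone know how' is retained for the seed 'does anyone know how' because both fillers are impersonal pronouns, while 'does john know how' is discarded.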
  • 165. Finer Details of the Approach are in the paper.. Iterative algorithm, single-word substitutions, functional usage and empirical support conservatively expand the intent-seeking pool of patterns.. Infusing new patterns and seed words Stopping conditions 138 M. Nagarajan et al., Monetizing User Activity on Social Networks - Challenges and Experiences, 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18 2009: 92-99
  • 167.
  • 168. Using the LIWC ‘Money’ dictionary: 173 words and word forms indicative of transactions, e.g., trade, deal, buy, sell, worth, price, etc.
  • 169.
  • 170. 309 unique new patterns, 263 unambiguous. Testing patterns for recall using the ‘To buy’ Facebook Marketplace, where all posts are information seeking: extracted patterns average 81% recall.
  • 171. Next task: Identifying Keywords for Advertising. Identifying keywords in monetizable posts: plethora of work in this space. Off-topic noise removal is our focus. “I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :(”
  • 172.
  • 173. Example: Off-topic Chatter Elimination. C1 - ['camcorder']. C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']. Informative words: ['camcorder', 'canon hv20', 'little camera', 'hd', 'cameras', 'canon']
  • 174. Evaluations: User Study. Keywords from 60 monetizable user posts (monetizable intent, at least 3 keywords in content): 45 MySpace Forums, 15 Facebook Marketplace. 30 graduate students; 10 sets of 6 posts each; each set evaluated by 3 randomly selected users. Monetizable intents? All 60 posts voted as unambiguously information seeking in intent.
  • 175. Effectiveness of using topical keywords. Google AdSense ads for the user post vs. extracted topical keywords.
  • 177. Result: 2X Relevant Impressions. Users picked ads relevant to the post, with at least 50% inter-evaluator agreement. For the 60 posts: total of 144 ad impressions, 17% of ads picked as relevant. For the topical keywords: total of 162 ad impressions, 40% of ads picked as relevant.
  • 178. Evaluations: Profile Ads vs. Activity Ads. Are ads generated from activity more interesting than those generated from user profiles? Gather the user’s profile information: interests, hobbies, TV shows... (non-demographic information). Ask them to submit a post (simulating their social media entry): what they are looking to buy and why (to induce off-topic content). Generate ads from profiles and from the post (keywords).
  • 179. Result: 8X more interest for non-profile ads. Using profile ads: total of 56 ad impressions, 7% of ads generated interest. Using authored posts: total of 56 ad impressions, 43% of ads generated interest. Using topical keywords from authored posts: total of 59 ad impressions, 59% of ads generated interest.
  • 180.
  • 181.
  • 182. SENTIMENT / OPINION MINING 153
  • 183. Content Analysis: Sentiment Analysis/Opinion Mining. Two main types of information we can learn from user-generated content: fact vs. opinion. Much of social media text (e.g., blogs, Twitter, Facebook) is a mix of facts and opinions. For example: "Latest news: Mobile web services not working in #Bahrain and Internet is extremely slow #feb14 {fact}... looks like they "learned" from #Egypt {opinion}"
  • 184. Sentiment Analysis: Motivation. Why do people oppose health care reform? What do customers complain about? Which movie should I see? Image: http://bit.ly/eZtKBF
  • 185. Sentiment Analysis: Tasks. Example: “How awful that many #Egyptian artifacts are in danger of being Destroyed. What Zahi Hawass must be thinking #jan25”. Classification. Overall sentiment polarity [Pang et al. 2002], [Turney 2002], etc.: the overall polarity is positive, neutral or negative (at the document/sentence/word level); for the example, the overall polarity is negative. Target-specific sentiment polarity [Yi et al. 2003], [Hu et al. 2004], etc.: the polarity toward the given target is positive, neutral or negative; for the example, the polarity is "negative" for the target "egyptian artifacts" and "neutral" for the target "Zahi Hawass".
  • 186. Sentiment Analysis: Tasks. Example: “How awful that many #Egyptian artifacts are in danger of being Destroyed. What Zahi Hawass must be thinking #jan25”. Identification & Extraction: opinion [Dave et al. 2003], opinion holder [Bethard et al. 2004], opinion target [Hu et al. 2004], etc. For the example: opinion="awful", opinion holder="the author", target="egyptian artifacts are in danger"; opinion="must be thinking", opinion holder="the author", target="Zahi Hawass".
  • 187. Sentiment Analysis: Classification. Supervised [Pang et al. 2002], etc. Labeled training data: e.g., product reviews, movie reviews, etc. Features: e.g., term-based, part-of-speech, syntactic relations, etc. Learning strategies: e.g., SVMs, Naive Bayes, ... Unsupervised [Turney 2002], etc. Lexicon-based approach [Hu et al. 2004], [Ding et al. 2008], etc.: uses a sentiment lexicon of positive/negative sentiment words. Bootstrapping [Thelen et al. 2002], etc.: iteratively trains and evaluates a classifier, starting from an unannotated corpus and a few predefined seed words. The task of extracting the opinion/holder/target is similar to the traditional IE task; the key distinction is that the relations between opinion and opinion target are considered important.
  • 188.
  • 189. Proximity [Hu et al. 2004] etc.
  • 190. extract the nearby adjectives modifying the target topic as opinion clues
  • 191. Syntactic dependency [Popescu et al. 2005] etc.
  • 192. employed language parser to compute the syntactic dependencies to extract the opinion clues with a given target topic
  • 194. heuristics: the more frequently a candidate opinion target co-occurs with any opinion clues, the more likely it is the real opinion target
  • 196.
  • 197. Highlight the potential of text streams as a substitute and supplement for traditional polling. Connect public opinion measured from polls with sentiment measured from tweets. Lexicon-based approach for sentiment analysis of tweets: within topic tweets, count messages containing positive and negative words defined by the sentiment lexicon.
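The counting step of the lexicon-based approach can be sketched in a few lines. The tiny `POSITIVE` and `NEGATIVE` word sets, the regex tokenizer and the ratio as the daily score are all assumptions for illustration; the slide does not specify the lexicon or how counts were combined.

```python
import re

# Toy lexicon, invented for the example; real work would use a published
# sentiment lexicon.
POSITIVE = {"good", "great", "love", "hope"}
NEGATIVE = {"bad", "awful", "hate", "fear"}


def tokenize(message: str) -> set[str]:
    return set(re.findall(r"[a-z']+", message.lower()))


def sentiment_counts(messages: list[str]) -> tuple[int, int]:
    """Count messages containing at least one positive word and messages
    containing at least one negative word (a message may count for both)."""
    pos = sum(1 for m in messages if POSITIVE & tokenize(m))
    neg = sum(1 for m in messages if NEGATIVE & tokenize(m))
    return pos, neg


def sentiment_ratio(messages: list[str]) -> float:
    """Positive-to-negative message ratio; infinite if nothing is negative."""
    pos, neg = sentiment_counts(messages)
    return pos / neg if neg else float("inf")
```

Computed per day over topic-filtered tweets, such a ratio gives a time series that can be compared against poll results.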
  • 198. Sentiment Analysis: Predicting the Future With Social Media [Asur et al. 2010]. Use tweets to forecast box-office revenues for movies. Train a language model classifier for sentiment classification of tweets. Findings: the prediction model using the rate at which tweets are created about a movie outperforms the market-based methods. The sentiments present in tweets can be used to improve the prediction.
  • 199. Sentiment Analysis: Target-specific Opinion Identification & Classification of Tweets, Unsupervised Approach [kno.e.sis ongoing work]. A simple lexicon-based method doesn't work well. Target of “sexy” is “Helena”. Target of “terrific” is “reviews”. “free” is not opinionated in the movie domain. Target of “loving” is “telling”. “well” in “as well” is not opinionated. Observations: the opinion clues may not be toward the given target (1,2,3,6); the opinion clues are domain and context dependent (5,7); single words are not enough (4,7,8).
  • 200. Sentiment Analysis: Target-specific Opinion Identification & Classification of Tweets, Unsupervised Approach [kno.e.sis ongoing work]. Domain and context-aware sentiment lexicon generation (taking the movie domain as an example). General subjective lexicon: commonly used subjective lexicon + polar slang learned from a dictionary. Select candidate opinion clues from the domain-specific corpus based on the general lexicon word + surrounding context, e.g., {“free”, “free movie”, “free movie streaming”...}, {“must”, “must see”, “a must see”, “must see movie”…}, {“well”, “as well”, “well done”…}. Identify the opinion clues and their polarity: utilize information from multiple sources, including the corpus, domain knowledge (e.g., Freebase, IMDb), general lexicon, etc.; bootstrapping + statistical model, e.g., <“must”, “must see”, positive>; <“well”, “well done”, positive>.
  • 201.
  • 203. When generating the domain and context-aware sentiment lexicon, use a set of predefined rules to select toward-target candidate opinion clues
  • 205. When using the generated lexicon to extract target-specific opinion, for each pair of <target, opinion clues> in one tweet, determine whether the opinion clues is toward the target based on their syntactic dependency.
  • 206. E.g., “Loved the King's Speech. Funny, moving... Colin Firth is so amazing. I know, you already knew that.” (“amazing” won’t be extracted for the movie target since nsubj(amazing, Firth), i.e. its subject is Firth.)
  • 207.
  • 208. from Content Analysis
  • 209. from Network Analysis
  • 210. Merge of two approaches
  • 211. People Analysis showing use of Merger approach (Content+Network) and derived metadata
  • 213. Finding User Types & Affiliation
  • 214. Measuring Social Engagement 165
  • 215. People Analysis: Extracting People Metadata 166
  • 216. People Analysis: Using Content to Derive People Metadata. Personality signals: extrovert, agreeable, open, etc. Blogs, style of writing: loose and periodic sentences, connotation, etc. Psychometric analysis of content: knowledge, abilities, attitude, etc. Sample study: gendered writing styles online [Ellison et al. 2006, Nagarajan et al. 2009, ICWSM etc.]: self-expression tends towards attempting homophily in online dating profiles, given the tendency to 'imitate and impress' in courtship. Image: http://bit.ly/JZ6eF Read: ‘How’ people write @Kno.e.sis
  • 217.
  • 218.
  • 219. For individuals to become influential they must not only obtain attention and thus be popular, but also overcome user passivity.  [Romero et al. 2010]
  • 221. Homophily causing Reciprocity on Twitter [TwitterRank, Weng et al. 2010]
  • 222. Klout Score: True Reach, Amplification [http://klout.com]. By link analysis algorithms: HITS [Kleinberg 1999] & variants, PageRank [Brin et al. 1998] & variants, etc. Links are not sufficient! Audience size doesn’t prove influence on Twitter [Million Follower Fallacy, Cha et al. 2010]. Image: http://bit.ly/9pfTO4
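For readers unfamiliar with link analysis, a basic PageRank power iteration over a follower graph looks roughly like the sketch below. It is a textbook-style illustration, not the formulation of any specific paper cited on the slide; the damping factor 0.85, the fixed iteration count and the toy `follows` graph are assumptions for the example.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank.

    `graph` maps every node to the list of nodes it links to (here: the
    accounts it follows). Every node must appear as a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    new[w] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank


# Toy follower graph: an edge u -> v means "u follows v".
follows = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(follows)
```

In this toy graph "c" ranks highest because three accounts follow it. As the slide cautions, such link-only scores measure attention rather than demonstrated influence.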
  • 223.
  • 224. Web Presence: Use of Web & Knowledge bases (Wikipedia, Blogs) to build context for user types
  • 225. Entity Spotting & Extraction, followed by Semantic Association and Similarity with user-type context. Image: kahunainstitute.com *Read Semantics driven Social Media Analysis @ Kno.e.sis
  • 226.
  • 227. How effectively can the community of people talking about this event online grow to reach potential donors and people in need of resources (food, water, first aid, etc.)?
  • 228. What are the best possible ways to communicate between resource providers and people in need of resources?
  • 229.
  • 230. NETWORK ANALYSIS: Deriving Network Metadata. Interesting questions. Network Analysis, Methods: models, metrics. Network Analysis, Algorithms: graph partitioning, traversal; community discovery, evolution. Social Network Analysis: diffusion, homophily. Study of 3-D Dynamics (People-Content-Network). Analysis & Visualization tools.
  • 231.
  • 233. Network Analysis: Methods. Network modeling approaches. Random graph model (Erdős-Rényi model): start with n vertices and add edges between them at random. Small-world model: most nodes are not neighbors of one another, but any node can be reached from every other by a small number of hops or steps (Small World Phenomenon). Scale-free model: the degree distribution follows a power law, i.e., the fraction of nodes with degree k varies as a power of k. Image: http://www.kudosdynamics.com/ Important Literature: [Wasserman et al. 1992, Watts et al. 1998, Albert et al. 2002, Newman et al. 2006, Marin et al. 2010, Easley et al. 2010]
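The random graph model is the simplest of the three to generate. The sketch below uses the G(n, p) variant, in which each possible edge is added independently with probability p; the function names and the choice of that variant are assumptions for illustration.

```python
import random
from itertools import combinations


def erdos_renyi(n: int, p: float, seed=None) -> list[tuple[int, int]]:
    """Start with n vertices and add each of the n*(n-1)/2 possible
    undirected edges independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


def degrees(n: int, edges: list[tuple[int, int]]) -> list[int]:
    """Degree of every vertex, e.g. to inspect the degree distribution."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = erdos_renyi(100, 0.05, seed=42)
degree_list = degrees(100, edges)
```

Degrees in such a graph cluster around the mean p(n-1), which is one way to contrast it with the heavy-tailed degree distribution of a scale-free network.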
  • 234.
  • 235. Clusters (Cliques and extensions, Communities). Image: http://www.kudosdynamics.com/ Important Literature: [Wasserman et al. 1992, Watts et al. 1998, Albert et al. 2002, Newman et al. 2006, Marin et al. 2010, Easley et al. 2010]
  • 236. Network Analysis: Algorithms. Graph partitioning & traversal. Goal: best time complexity & reachability. Generally follows greedy paths, e.g., K-way multilevel partitioning, Bron-Kerbosch, K-plex, K-core or N-cliques, DFS, BFS, MST. Community discovery, growth, evolution: based on relationship types (e.g., signed network), geography/location, interest, etc. Generally follows cluster analysis, e.g., hierarchical clustering algorithms (top-down, bottom-up). Further reading: Modularity Maximization [Newman et al. 2006], algorithms comparison survey [Balakrishnan et al. 2006], Online Communities [Preece 2001]. "We dream in Graph and We analyze in Matrix" - Barry Wellman, INSNA
  • 237. Social Network Analysis: Diffusion & Homophily. Social network analysis (interested in information flow): can we predict user actions? Understanding dynamics is challenging! Why study diffusion: maximizing spread (opinion, innovation, recommendation); outbreak detection (e.g., disease). Diffusion behavior: power law distribution [Leskovec et al. 2007]. Factors impacting diffusion: user homophily, i.e. similar behavior tendency [McPherson et al. 2001]; sampling strategy [Choudhury et al. 2010]; content nature [Nagarajan et al. 2010], etc. Image: http://bit.ly/fGkIBK
  • 238.
  • 239. Any bridges to connect to the other community? (People)
  • 240. Any similarity in actions with the other community? (Can Content help?) Image: http://themelis-cuiper.com
  • 241. Study of 3-D Dynamics: People, Content, Network. Metadata is a powerful tool to explore these dynamics*. Studies in this direction: A Qualitative Examination of Topical Tweet and Retweet Practices [Nagarajan et al. 2010], on how content dictates the network flow; User-Community Engagement by Multi-faceted Features: A Case Study on Twitter [Purohit et al. 2011] [TO BE PRESENTED TOMORROW IN SoME'11], on what factors impact user engagement in topic discussion. *Read People-Content-Network Analysis @ Kno.e.sis
  • 242. Graphs showing sparse (A) and dense (B) RT networks and their corresponding follower graphs for 'call for action' and 'information sharing' type of tweets. M. Nagarajan, H. Purohit, and A. Sheth, ’A Qualitative Examination of Topical Tweet and Retweet Practices,’ 4th Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2010
  • 243. Analysis & Visualization Tools. Network WorkBench (NWB), Truthy, Graph-tool, Orange, Pajek, Tulip… Many tools!! Resource: http://en.wikipedia.org/wiki/Social_network_analysis_software Image: http://truthy.indiana.edu/
  • 245.
  • 246. In Disaster scenarios (e.g. Haiti earthquake, Gulf oil spill)
  • 247. For Political revolution (e.g. Egypt political crisis)
  • 248.
  • 250. (How?) Detect spam and misleading content.
  • 251. Assess data quality of on-topic content
  • 252. (How?) Trust models to assess trustworthiness of content.
  • 254. Email spam drops while social media spam surges².
  • 255. E.g., spam on Twitter:
  • 256. 2% of the [100 million]¹ tweets per day are spam! Graph depicting the % of spam tweets per day on Twitter against time. Image: http://blog.twitter.com/2010/03/state-of-twitter-spam.html ¹http://techcrunch.com/2010/09/14/twitter-seeing-90-million-tweets-per-day/ ¹http://blog.twitter.com/2011/03/numbers.html ²http://www.allspammedup.com/2011/03/email-spam-drops-as-social-media-spam-surges/
  • 257. Spam in Social Networks: Spamming on Twitter. Gaining popularity: use of popular topic-related keywords (e.g., hashtags of trending topics) to propagate off-topic content. Launching malicious attacks: phishing attacks, viruses, malware etc. Misleading the masses: propagating false information [Mustafaraj et al. 2010]; astroturf campaigns during political elections¹. ¹http://truthy.indiana.edu/
  • 259. Spam in Social Networks: Spam Detection. Content-based features: content size and repetition, URL usage, spam words [Benevenuto et al. 2010]. Metadata-based features: account information, behavior [Ratkiewicz et al. 2010]. Network-based features: provenance (e.g., content from a reliable source); community voting (e.g., RT in the case of Twitter); author status in the network (e.g., number of followers, friends, PageRank [Brin et al. 1998], influence [Weng et al. 2010]). Note: removal of spam doesn't guarantee trustworthiness.
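The content-based and network-based cues listed on this slide can be turned into a feature vector before any classifier is trained. The following is only a minimal sketch under stated assumptions: the spam-word list, the function names, and the choice of features are hypothetical illustrations, not the feature sets of the cited papers.

```python
import re
from collections import Counter

# Hypothetical word list; a real system would curate or learn this.
SPAM_WORDS = {"free", "win", "click", "prize"}
URL_PATTERN = re.compile(r"https?://\S+")


def content_features(text, recent_texts=()):
    """Content-based cues: size, repetition, URL usage, spam words."""
    without_urls = URL_PATTERN.sub(" ", text.lower())
    tokens = re.findall(r"[#@]?[a-z0-9']+", without_urls)
    return {
        "length": len(text),
        "url_count": len(URL_PATTERN.findall(text)),
        "spam_word_count": sum(token in SPAM_WORDS for token in tokens),
        "hashtag_count": sum(token.startswith("#") for token in tokens),
        # How often the identical text appears among the author's recent posts
        "repetitions": Counter(recent_texts)[text],
    }


def network_features(followers, friends):
    """Author-status cues taken from the follower graph."""
    return {
        "followers": followers,
        "friends": friends,
        "follower_friend_ratio": followers / max(friends, 1),
    }
```

Such feature dictionaries would typically be fed to a supervised classifier; as the slide notes, filtering spam this way says nothing yet about whether the remaining content is trustworthy.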
  • 260. Trust in Social Networks: Reputation, policy, evidence, and provenance are used to derive trustworthiness [Artz et al. 2007]. Illustrative examples of online cues used for trust assessment: Wikipedia: article size, number of references, author, edit history, age of the article, edit frequency, etc. [Dondi et al. 2006, Moturu et al. 2009]. Product reviews: number of helpful and very helpful ratings, author expertise, sentiments in comments received for a review, etc. [Liu et al. 2008].
  • 261. Trust in Social Networks. Gleaning primitive (edge) trust: the trust value between two nodes is quantified using numbers, e.g., [0,1] or [-1,1], or a partial ordering [Thirunarayan et al. 2009, Jøsang et al. 2002, Ganeriwal et al. 2008]. Gleaning composite (path) trust: propagation via chaining and aggregation (transitivity) [Golbeck et al. 2006, Sun et al. 2006]. Some popular algorithms for trust computation: EigenTrust [Kamvar et al. 2003], Spreading Activation [Ziegler et al. 2004], SUNNY [Kuter et al. 2007], etc.
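Chaining and aggregation can be illustrated on a toy graph. This sketch assumes edge trust in [0, 1], multiplies values along a path (chaining) and keeps the strongest path (aggregation). Both operators are just one common choice, not the specific operators of the cited algorithms, and the graph and its numbers are invented for illustration.

```python
# Edge trust values in [0, 1]; graph and numbers are made up for illustration.
EDGE_TRUST = {
    ("anna", "bob"): 0.8,
    ("bob", "dick"): 1.0,
    ("anna", "carol"): 0.5,
    ("carol", "dick"): 0.6,
}


def simple_paths(source, target, visited=None):
    """Yield every cycle-free path from source to target as a list of nodes."""
    visited = (visited or []) + [source]
    if source == target:
        yield visited
        return
    for (u, v) in EDGE_TRUST:
        if u == source and v not in visited:
            yield from simple_paths(v, target, visited)


def path_trust(path):
    """Chaining: multiply edge trust along the path."""
    trust = 1.0
    for u, v in zip(path, path[1:]):
        trust *= EDGE_TRUST[(u, v)]
    return trust


def composite_trust(source, target):
    """Aggregation: keep the strongest path; averaging is another option."""
    values = [path_trust(p) for p in simple_paths(source, target)]
    return max(values) if values else None
```

Multiplication makes trust decay with path length, which matches the intuition that a recommendation passed through more intermediaries carries less weight.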
  • 262. Trust in Social Networks: Trust Ontology¹. Captures the semantics of trust; enables representation of and reasoning with trust. The semantics of trust specifies, for a given trustor and trustee, the following features: {type, value, scope, process}. ¹P. Anantharam, C. A. Henson, K. Thirunarayan, and A. P. Sheth, 'Trust Model for Semantic Sensor and Social Networks: A Preliminary Report', National Aerospace & Electronics Conference (NAECON), Dayton Ohio, July 14-16th, 2010.
  • 264. Referral Trust – Agent a1 trusts agent a2’s ability to recommend another agent.
  • 265. Functional Trust – Agent a1 trusts agent a2’s ability.
  • 266. Non-Functional Trust – Agent a1 distrusts agent a2’s ability.
  • 267. Trust Value - E.g., Star rating, numeric value or partial ordering.
  • 268. Trust Scope - E.g., recommendation for a car mechanic, babysitter or a movie. ¹K. Thirunarayan, Dharan K. Althuru, Cory A. Henson, and Amit P. Sheth, “A Local Qualitative Approach to Referral and Functional Trust,” In: Proceedings of the 4th Indian International Conference on Artificial Intelligence (IICAI-09), pp. 574-588, December 2009.
  • 270. Trust Ontology (example). Anna's car is in terrible shape; Dick is a certified mechanic; Bob has experience with cars. Nodes in the figure: Anna, Bob, Dick, Ben. Trust edges in the figure: {type: referral, process: reputation, scope: car mechanic, value: 8}; {type: functional, process: policy (ASE certified), scope: car mechanic, value: 10}; {type: non-functional, process: reputation, scope: car mechanic, value: 3}.
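The {type, value, scope, process} features of the trust ontology map naturally onto a small record type. The sketch below uses the values from the car-mechanic example; which agent is trustor and trustee on each edge is an assumed reading of the flattened figure (Anna to Bob, Bob to Dick, Bob to Ben), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trust:
    trustor: str
    trustee: str
    trust_type: str  # "referral", "functional" or "non-functional"
    process: str     # e.g. "reputation" or "policy"
    scope: str
    value: int


# Values come from the slide; the trustor/trustee pairing is an assumption.
RELATIONS = [
    Trust("Anna", "Bob", "referral", "reputation", "car mechanic", 8),
    Trust("Bob", "Dick", "functional", "policy", "car mechanic", 10),
    Trust("Bob", "Ben", "non-functional", "reputation", "car mechanic", 3),
]


def recommended(trustor, scope):
    """Follow referral edges, then return agents the referee functionally trusts."""
    referees = [
        r.trustee for r in RELATIONS
        if r.trustor == trustor and r.trust_type == "referral" and r.scope == scope
    ]
    return [
        r.trustee for r in RELATIONS
        if r.trustor in referees and r.trust_type == "functional" and r.scope == scope
    ]
```

Under that reading, `recommended("Anna", "car mechanic")` walks Anna's referral trust in Bob and returns the mechanic Bob functionally trusts, while the scope argument keeps the recommendation from leaking into unrelated domains.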
  • 273. Peer Power: Mobiles can create social movements based on peer influence. http://www.technologyreview.com/business/26654/?a=f
  • 274. Mobile Social Computing (Cont.). Personalized branding: advertising is rapidly becoming personalized based on an individual's needs and preferences. Mobiles in social development are becoming an integral part of development: coordination in disaster situations; health care delivery, especially in developing countries; elections and other forms of political expression. Mobile payments will be widespread. Social networking is accelerating the growth of mobile.
  • 275. Integrating Social and Sensor Networks. Machine sensor observations are quantitative in nature, while human observations can be both qualitative and quantitative. Benefits of combining observations from humans and machine sensors: complementary evidence; corroborative evidence. Applications of integrating heterogeneous sensor observations: situation awareness, by using human observations to interpret machine sensor observations; enhancing trustworthiness using corroborative evidence.
  • 277. Citizen Sensing @ Real-time 200
  • 278. Real-Time Motivation: People can't wait for information. 500 years ago: a single lifetime. 20 years ago: the next day or two (television, newspapers). Presently: minutes are not considered fast enough (digital media, social media). Image: http://bit.ly/fg8EI3
  • 279. Real-Time Social Media: Is real-time the future of the Web? Social media for the real-time Web: disaster management: Ushahidi (www.ushahidi.org); real-time markets: RealTimeMarkets (http://www.realtimemarkets.com/); brand tracking: Twarql (http://wiki.knoesis.org/index.php/Twarql); movie reviews: Flicktweets (www.flicktweets.com).
  • 280. Scenario: CNN-IBN, Feb 2011, Journalist. Image: http://bit.ly/hlIutz
  • 281. Challenges. Information overload: the number of tweets that contained the words "Egypt," "Yemen," or "Tunisia" increased more than tenfold after January 23rd: there were 122,319 tweets between January 16 and 23 containing these terms, and 1.3 million tweets between January 24 and January 30. Real time: can we extract social signals (through analysis) as data is generated? http://blog.sysomos.com/2011/01/31/egyptian-crisis-twitte/
  • 282. A Semantic Web Approach. Expressive description of the information need: using SPARQL (instead of traditional keyword search). Flexibility on the point of view: ability to "slice and dice" the data in several dimensions: thematic, spatial, temporal, sentiment etc. Streaming data with background knowledge: enables automatic evolution and serendipity. Scalable real-time delivery: using sparqlPuSH (SFSW'10).
  • 284. Architecture: Linked Open Social Signals @ Kno.e.sis. P. Mendes, A. Passant, P. Kapanipathi, and A. Sheth, ‘Linked Open Social Signals,’ WI2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI-10), Toronto, Canada, Aug. 31 to Sep. 3, 2010.
  • 287. User profile: User Name, Location, Time etc..
  • 290. Semantic Publisher: Virtuoso is used to filter triples. Queries formulated by the users are stored. The SPARQL protocol over HTTP is used to access RDF from the store. Data from the tweet is combined with the background knowledge in the RDF store.
  • 291. Application Server & Distribution Hub 213
  • 292. Application Server & Distribution Hub. Distribution Hub: PUSH model (PubSubHubbub protocol); pushes the tweets to the Application Server. Application Server: delivers data to the clients; RSS; enables concept feeds.
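The push model on this slide differs from polling in that the hub calls its subscribers as soon as new content arrives. The toy below shows only that control flow, in one process: the real PubSubHubbub protocol works over HTTP callbacks, and the class and method names here are hypothetical.

```python
from collections import defaultdict


class DistributionHub:
    """Toy in-memory hub: subscribers register one callback per topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, item):
        # Push: notify every subscriber immediately instead of waiting for a poll
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(item)
        return len(callbacks)


class ApplicationServer:
    """Collects pushed items into per-topic feeds that clients can read."""

    def __init__(self):
        self.feeds = defaultdict(list)

    def receiver(self, topic):
        def handle(item):
            self.feeds[topic].append(item)
        return handle
```

A client-facing feed (for example RSS) could then be rendered from `ApplicationServer.feeds` without the server ever having to query the hub.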
  • 293. Use Case 4: Brand Tracking - Competitors. Background knowledge (e.g., DBpedia): IPhone, HPTabletPC, category:Wi-Fi, category:Touchscreen. Example tweet: "@anonymized This Ipad is super cool. Is it better than" (as shown in the figure). Graph pattern from the figure labels: ?tweet moat:taggedWith dbpedia:IPad; dbpedia:IPad skos:subject ?category; ?competitor skos:subject ?category.
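The competitor use case is a join of three triple patterns over tweets and background knowledge. The sketch below evaluates that join with a tiny pure-Python pattern matcher rather than a real SPARQL engine. The triples are invented: in particular, which product belongs to which category is an assumption, since the figure's layout did not survive extraction.

```python
# Toy triple store; the category assignments are assumed for illustration.
TRIPLES = {
    ("tweet:1", "moat:taggedWith", "dbpedia:IPad"),
    ("dbpedia:IPad", "skos:subject", "category:Touchscreen"),
    ("dbpedia:IPad", "skos:subject", "category:Wi-Fi"),
    ("dbpedia:IPhone", "skos:subject", "category:Touchscreen"),
    ("dbpedia:HPTabletPC", "skos:subject", "category:Wi-Fi"),
}

# The three patterns named on the slide; terms starting with "?" are variables.
PATTERN = [
    ("?tweet", "moat:taggedWith", "dbpedia:IPad"),
    ("dbpedia:IPad", "skos:subject", "?category"),
    ("?competitor", "skos:subject", "?category"),
]


def match(pattern, binding):
    """Unify one triple pattern with every stored triple."""
    for triple in TRIPLES:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to something else
            elif term != value:
                break  # constant does not match
        else:
            yield new


def solve(patterns, binding=None):
    """Join the patterns left to right, extending the variable bindings."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for extended in match(patterns[0], binding):
        yield from solve(patterns[1:], extended)


def competitors():
    found = {b["?competitor"] for b in solve(PATTERN)}
    return sorted(found - {"dbpedia:IPad"})
```

Because the shared `?category` variable links the tagged product to other products, competitors are found even though no tweet mentions them by name, which keyword search alone could not do.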
  • 294. Demonstration: Social Media & News. Topic: HealthCare. 1242 articles from Nytimes; around 800,000 tweets. President Obama lays out plan for Health care reform in Speech to Joint Session of Congress (10th Sept Timeline.com). Obama taking an active role in Health talks in pursuing his proposed overhaul of health care system (13th Aug Nytimes).
  • 295. Twarql on Linked Open Data 217 Image: http://bit.ly/aTOV1r 217
  • 296. Twarql (Real-Time Event Following). Motivation: rapidly evolving events. Examples: Egypt protests, health care debate, Iran election.
  • 297. Adapt to the changes of the event. Approach: Continuous Semantics. Continuous Semantics @ Kno.e.sis: A. Sheth, C. Thomas, and P. Mehra, ‘Continuous Semantics to Analyze Real-Time Data,’ IEEE Internet Computing, 14 (6), November-December 2010, pp. 84-89.
  • 298. Twarql (Real-Time Event Following). Twarql: delivers tweets in real time; leverages background knowledge; constraint (SPARQL query). Doozer: mines Wikipedia; automatically created domain model. http://wiki.knoesis.org/index.php/Twarql http://knoesis.wright.edu/research/ModelCreation/
  • 299. Continuous Semantics (Cycle). (1) Filter tweets (Twarql): leverage automatically created domain models. (2) Extract event descriptors (Twarql) from the filtered tweets. (3) Create domain models (Doozer): use the extracted event descriptors, and go back to (1).
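The three-step cycle can be sketched as a loop in which the domain model used for filtering grows from the descriptors found in the tweets it lets through. Everything here is a stand-in: keyword sets replace the SPARQL-based filtering of Twarql, term frequency replaces its descriptor extraction, and a set union replaces Doozer's Wikipedia-based model creation.

```python
from collections import Counter

STOPWORDS = {"the", "in", "a", "of", "to", "and", "is", "at"}


def filter_tweets(tweets, model):
    """Step 1: keep tweets that mention any term of the current domain model."""
    return [t for t in tweets if model & set(t.lower().split())]


def extract_descriptors(tweets, top_n=2):
    """Step 2: most frequent non-stopword terms of the filtered tweets."""
    counts = Counter(
        word for t in tweets for word in t.lower().split() if word not in STOPWORDS
    )
    return {word for word, _ in counts.most_common(top_n)}


def update_model(model, descriptors):
    """Step 3: stand-in for model creation; here the new terms are simply merged."""
    return model | descriptors


def follow_event(seed_terms, batches):
    """Run the cycle once per incoming batch of tweets."""
    model = set(seed_terms)
    for batch in batches:
        relevant = filter_tweets(batch, model)
        model = update_model(model, extract_descriptors(relevant))
    return model
```

Starting from a single seed term, the model picks up terms that co-occur with it, so later batches are matched even when they no longer mention the seed: this is the adaptation to a changing event that the slide describes.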
  • 300. Twarql (Real-Time Event Following): Architecture (new addition)
  • 303. Twitris - Motivation 1. Information overload: multiple events around us; WHAT to be aware of; multiple storylines about the same event! Image: http://bit.ly/etFezl
  • 304. Twitris - Motivation 2. Evolution of Citizen Observation with location and time  226
  • 305. Twitris - Motivation 3. Semantics of social perceptions: what is being said about an event (theme), where (spatial), when (temporal). Twitris lets you browse citizen reports using social perceptions as the fulcrum.
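Browsing by what, where and when amounts to slicing reports by location and time and then summarizing the theme of each slice. The sketch below uses raw term counts as the thematic summary; that is a deliberate simplification chosen for illustration, not the descriptor extraction used by Twitris, and the function name and input layout are assumptions.

```python
from collections import Counter, defaultdict


def summarize(reports, top_n=1):
    """Group (place, day, text) reports by (place, day); return top terms per slice."""
    slices = defaultdict(Counter)
    for place, day, text in reports:
        slices[(place, day)].update(text.lower().split())
    return {
        key: [word for word, _ in counts.most_common(top_n)]
        for key, counts in slices.items()
    }
```

For example, calling `summarize` on reports from two cities on the same day yields one list of dominant terms per city, which is the kind of per-location, per-day view the slide motivates.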
  • 306. Twitris: Semantic Social Web Mash-up Facilitates understanding of multi-dimensional social perceptions over SMS, Tweets, multimedia Web content, electronic news media 228 228
  • 307. Twitris: Architecture. Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh Jadhav, ‘Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences,’ Tenth International Conference on Web Information Systems Engineering, 539 - 553, Oct 5-7, 2009.
  • 310. Twitris: Event Summarization 2  Sentiment Analysis using statistical and machine learning techniques  232
  • 311. Twitris: Event Summarization 3 Entity-relationship graph  using semantically annotated DBpedia entities mentioned in the tweets  233
  • 312. Twitris: Demo, Quick Show. http://twitris.knoesis.org/ Many other interesting efforts, e.g.: Vivek K. Singh, Mingyan Gao, and Ramesh Jain. 2010. From microblogs to social images: event analytics for situation assessment. In Proceedings of the international conference on Multimedia information retrieval (MIR '10). ACM, New York, NY, USA, 433-436.
  • 314. Conclusion & Future Work: Citizen Sensing; Role of Social Media; Development-Centric Platforms; Systematic Study of Social Media (Spatio-Temporal-Thematic + People-Content-Network Analysis); Trustworthiness in Social Media; Mobile Social Computing; Citizen Sensing @ Real-time; Research Application: Twitris.
  • 315. Twitris: Coordination Great role in military and NGO rescue operations during emergencies: Haiti and Chile Earthquakes 237
  • 316. Twitris: Coordination Coordinating needs and resources in disaster situation Analyze SMS and Web reports from disaster location Use domain models for efficient and timely coordination 238 Image: http://bit.ly/hcp4PG 238
  • 317. Twitris: Community Formation. Homophily in society; bond of common interest; TRUST factor.
  • 318. Conclusions: Do you have a sense of the immense opportunity in analyzing citizen sensing for useful social signals? Do you appreciate the broad range of issues and challenges? Did we present examples and a few insights into how to address some unique challenges? Did the spatio-temporal-thematic and people-content-network dimensions present a reasonable way to organize the vast number of relevant research challenges and techniques? Do you have more material to follow up on topics of interest?
  • 319. References [Mustafaraj et al. 2010] E. Mustafaraj and P. Metaxas, ‘From Obscurity to Prominence in Minutes: Political Speech and Real-Time Search,’ In: Proceedings of the WebSci10: Extending the Frontiers of Society On-Line, April 2010. [Anantharam et al. 2010] P. Anantharam, C. A. Henson, K. Thirunarayan, and A. P. Sheth, ‘Trust Model for Semantic Sensor and Social Networks: A Preliminary Report,’ National Aerospace & Electronics Conference (NAECON), Dayton Ohio, July 14-16th, 2010. [Thirunarayan et al. 2009] K. Thirunarayan, D. K. Althuru, C. A. Henson, and A. P. Sheth, ‘A Local Qualitative Approach to Referral and Functional Trust,’ In: Proceedings of the 4th Indian International Conference on Artificial Intelligence (IICAI-09), pp. 574-588, December 2009. [Gruhl et al. 2010] D. Gruhl, M. Nagarajan, J. Pieper, C. Robson, and A. Sheth, ‘Multimodal Social Intelligence in a Real-Time Dashboard System,’ in a special issue of the VLDB Journal on ‘Data Management and Mining for Social Networks and Social Media,’ 2010.
  • 320. References 2 [O’Connor et al. 2010] B. O’Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith, ‘From Tweets to polls: Linking text sentiment to public opinion time series,’ In International AAAI Conference on Weblogs and Social Media, Washington, D.C., 2010. [Asur et al. 2010] S. Asur and B. A. Huberman, ‘Predicting the Future With Social Media,’ 2010. http://arxiv.org/abs/1003.5699 [Sheth et al. 2009] A. Sheth, ‘Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness,’ February 17, 2009. [Brin et al. 1998] S. Brin and L. Page, ‘The anatomy of a large-scale hypertextual Web search engine,’ Computer Networks and ISDN Systems, Vol 30, 1-7, 1998.
  • 321. References 3 [Sheth et al. 2010] A. Sheth, C. Thomas, and P. Mehra, ‘Continuous Semantics to Analyze Real-Time Data,’ IEEE Internet Computing, 14 (6), November-December 2010, pp. 84-89. [Nagarajan et al. 2010] M. Nagarajan, H. Purohit, and A. Sheth, ‘A Qualitative Examination of Topical Tweet and Retweet Practices,’ 4th Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2010. [Romero et al. 2010] D. Romero, W. Galuba, S. Asur, and B. Huberman, ‘Influence and Passivity in Social Media,’ Arxiv preprint, arXiv:1008.1253, 2010. [Leskovec et al. 2009] J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney, ‘Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters,’ Internet Mathematics, 6(1):29-123, 2009. [Cha et al. 2010] M. Cha, H. Haddadi, F. Benevenuto, and K. Gummadi, ‘Measuring user influence in twitter: The million follower fallacy,’ In ICWSM, 2010. [Pang et al. 2008] B. Pang and L. Lee, ‘Opinion mining and sentiment analysis,’ Foundations and Trends in Information Retrieval, 2(1-2):1-135, 2008.
  • 322. References 4 [Pang et al. 2002] B. Pang, L. Lee, and S. Vaithyanathan, ‘Thumbs up? Sentiment classification using machine learning techniques,’ In Proceedings of EMNLP, pages 79–86, 2002. [Turney 2002] P. Turney, ‘Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews,’ In Proceedings of ACL, pages 417–424, 2002. [Thelen et al. 2002] M. Thelen and E. Riloff, ‘A bootstrapping method for learning semantic lexicons using extraction pattern contexts,’ In Proceedings of EMNLP, pages 214-221, 2002. [Yi et al. 2003] J. Yi, T. Nasukawa, R. Bunescu, and W. Niblack, ‘Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques,’ In Proceedings of ICDM, 2003. [Nagarajan et al. 2009, WI] M. Nagarajan, K. Baid, A. Sheth, and S. Wang, ‘Monetizing User Activity on Social Networks - Challenges and Experiences,’ 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18 : pp. 92-99, 2009 244
  • 323. References 5 [Hu et al. 2004] M. Hu and B. Liu, ‘Mining and summarizing customer reviews,’ In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pages 168–177, 2004. [Ding et al. 2008] X. Ding, B. Liu, and P. S. Yu, ‘A holistic lexicon-based approach to opinion mining,’ In Proceedings of WSDM, 2008. [Jijkoun et al. 2010] V. Jijkoun, M. d. Rijke, and W. Weerkamp, ‘Generating Focused Topic-specific Sentiment Lexicons,’ In Proceedings of ACL, 2010. [Bethard et al. 2004] S. Bethard, H. Yu, A. Thornton, V. Hatzivassiloglou, and D. Jurafsky, ‘Automatic extraction of opinion propositions and their holders,’ In Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text, 2004. [Hu et al. 2004] M. Hu and B. Liu, ‘Mining opinion features in customer reviews,’ In Proceedings of AAAI, pages 755-760, 2004. [Dave et al. 2003] K. Dave, S. Lawrence, and D. M.Pennock, ‘Mining the peanut gallery: Opinion extraction and semantic classification of product reviews,’ In Proceedings of WWW, pages 519–528, 2003. 245
  • 324. References 6 [Popescu et al. 2005] A.-M. Popescu and O. Etzioni, ‘Extracting product features and opinions from reviews,’ In Proceedings of HLT/EMNLP, 2005. [Choi et al. 2009] Y. Choi, Y. Kim, and S.-H. Myaeng, ‘Domain-specific sentiment analysis using contextual feature generation,’ In Proceedings of the International CIKM workshop on Topic-sentiment analysis for mass opinion (TSA), 2009. [Kobayashi et al. 2004] N. Kobayashi, K. Inui, Y. Matsumoto, K. Tateishi, and T. Fukushima, ‘Collecting evaluative expressions for opinion extraction,’ In Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP), 2004. [A. Sheth 2009] A. Sheth, ‘Citizen Sensing, Social Signals, and Enriching Human Experience,’ IEEE Internet Computing, July/August 2009, pp. 80-85. [Nagarajan et al. 2009, WISE] M. Nagarajan, K. Gomadam, A. Sheth, A. Ranabahu, R. Mutharaju, and A. Jadhav, ‘Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences,’ Tenth International Conference on Web Information Systems Engineering, 539 - 553, Oct 5-7, 2009.
  • 325. References 7 [Kleinberg 1999] J. Kleinberg, ‘Authoritative sources in a hyperlinked environment,’ Journal of the ACM 46 (5): 604-632, 1999. [Albert et al. 2002] R. Albert and A.L. Barabasi, ‘Statistical Mechanics of Complex Networks,’ Rev. Modern Physics, vol. 74, no. 1, pp. 47-97, 2002. [Weng et al. 2010] J. Weng, E. Lim, J. Jiang, and Q. He, ‘TwitterRank: finding topic-sensitive influential twitterers,’ WSDM, 2010. [Banerjee et al. 2009] N. Banerjee, D. Chakraborty, K. Dasgupta, S. Mittal, A. Joshi, S. Nagar, A. Rai, and S. Madan, ‘User interests in social media sites: an exploration with micro-blogs,’ CIKM '09. [Ritter et al. 2010] A. Ritter, C. Cherry, and B. Dolan, ‘Unsupervised modeling of Twitter conversations,’ In Human Language Technologies: ACL (HLT '10), 2010. [Watts et al. 1998] D.J. Watts and S.H. Strogatz, ‘Collective dynamics of 'small-world' networks,’ Nature 393 (6684): 409–10, 1998. [Newman et al. 2006] M. E. J. Newman and D. J. Watts, ‘The structure and dynamics of network,’ Princeton University Press, 2006. [Wasserman et al. 1992] Wasserman and Faust, ‘Social Network Analysis,’ 1992.
  • 326. References 8 [Easley et al. 2010] D. Easley and J. Kleinberg, ‘Networks, Crowds, and Markets: Reasoning About a Highly Connected World,’ Cambridge University Press, 2010. [Marin et al. 2010] A. Marin and B. Wellman, ‘Handbook of Social Network Analysis,’ 2010. [Balakrishnan 2006] H. Balakrishnan, ‘Algorithms for Discovering Communities in Complex Networks,’ Ph.D. Dissertation, University of Central Florida, Orlando, FL, USA. Advisor(s): Narsingh Deo. 2006. [Choudhury et al. 2010] M. D. Choudhury, Y-R. Lin, H. Sundaram, K. S. Candan, L. Xie, and A. Kelliher, ‘How Does the Sampling Strategy Impact the Discovery of Information Diffusion in Social Media?,’ ICWSM 2010. [Leskovec et al. 2007] J. Leskovec, L. A. Adamic, and B. A. Huberman, ‘The dynamics of viral marketing,’ ACM Trans. Web 1, 1, Article 5, May 2007. [Purohit et al. 2011] H. Purohit, Y. Ruan, A. Joshi, S. Parthasarathy, and A. Sheth, ‘Understanding User-Community Engagement by Multi-faceted Features: A Case Study on Twitter,’ To appear in SoME'11 (Workshop on Social Media Engagement, in conjunction with WWW 2011).
  • 327. References 9 [Pablo et al. 2010] P. Mendes, A. Passant, P. Kapanipathi, and A. Sheth, ‘Linked Open Social Signals,’ WI2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI-10), Toronto, Canada, Aug. 31 to Sep. 3, 2010. [Pablo et al. 2010] P. Mendes, P. Kapanipathi, and A. Passant, ‘Twarql: Tapping into the Wisdom of the Crowd,’ Triplification Challenge 2010 at 6th International Conference on Semantic Systems (I-SEMANTICS), Graz, Austria, 1-3 September 2010. (Winner of Triplification Challenge 2010) http://www.nascio.org/events/2009Midyear/documents/NASCIO-KeynoteNoveck.pdf http://citizensensing.posterous.com [Ellison et al. 2006] N. Ellison, R. Heino, and J. Gibbs, ‘Managing Impressions Online: Self-Presentation Processes in the Online Dating Environment,’ Journal of Computer-Mediated Communication 11(2):415–441, 2006.
  • 329. References 10 [Dondi et al. 2006] P. Dondio, S. Barrett, S. Weber, and J. Seigneur, ‘Extracting Trust from Domain Analysis: A Case Study on the Wikipedia Project,’ In: Autonomic and Trusted Computing, Vol. 4158 (2006), pp. 362-373. [Liu et al. 2008] H. Liu, E. P. Lim, H. W. Lauw, M. T. Le, A. Sun, J. Srivastava, and Y. A. Kim, ‘Predicting trusts among users of online communities: an epinions case study,’ In: EC '08: Proceedings of the 9th ACM conference on Electronic commerce (2008), pp. 310-319. [Benevenuto et al. 2010] F. Benevenuto, G. Magno, T. Rodrigues, and V. Almeida, ‘Detecting Spammers on Twitter,’ In Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS), July 2010. [Ratkiewicz et al. 2010] J. Ratkiewicz, M. Conover, M. Meiss, B. Gonçalves, S. Patil, A. Flammini, and F. Menczer, ‘Detecting and Tracking the Spread of Astroturf Memes in Microblog Streams,’ CoRR, Vol. abs/1011.3768, 16th November 2010. [Ziegler et al. 2004] C.-N. Ziegler and G. Lausen, ‘Spreading Activation Models for Trust Propagation,’ In Proceedings of the IEEE International Conference on e-Technology, e-Commerce, and e-Service, 2004.
  • 330. References 11 [Moturu et al. 2009] S. T. Moturu and H. Liu, ‘Evaluating the Trustworthiness of Wikipedia Articles through Quality and Credibility,’ The 5th International Symposium on Wikis and Open Collaboration, Florida, Oct 25-27, 2009. [Jøsang et al. 2002] A. Jøsang and R. Ismail, ‘The Beta Reputation System,’ Proceedings of the 15th Bled Conference on Electronic Commerce, Bled, Slovenia, 17-19 June 2002. [Ganeriwal et al. 2008] S. Ganeriwal, L. K. Balzano, and M. B. Srivastava, ‘Reputation-based framework for high integrity sensor networks,’ ACM Trans. Sen. Netw., Volume 4, issue 3, 1-37, June 2008. [Sun et al. 2006] Y. L. Sun, W. Yu, Z. Han, and K.J.R. Liu, ‘Information theoretic framework of trust modeling and evaluation for ad hoc networks,’ Selected Areas in Communications, IEEE Journal on, vol. 24, no. 2, pp. 305-317, Feb. 2006. [Golbeck et al. 2005] J. Golbeck and J. Hendler, ‘Inferring Trust Relationships in Web-Based Social Networks,’ ACM Transactions on Internet Technology, Vol. 7, 2005.
  • 331. About us Meena Nagarajan is a research staff scientist at the IBM Almaden Research Center. Meena completed her dissertation on "Understanding User-Generated Content on Social Media" in 2010 at Wright State University Center of Excellence on Knowledge-enabled Computing (Kno.e.sis). She has collaborated with IBM Research, Microsoft Research, Marti Hearst at University of Berkeley and HP Labs. She has published extensively and served on Program Committees for key conferences. She also has the rare distinction of being invited to give a keynote (http://knoesis.org/library/resource.php?id=731) and chair a panel before completing her Ph.D., and was selected for the prestigious NSF CI Fellows award. Her research has played a key role in state-of-the-art social media and analytics systems such as BBC SoundIndex.   Amit Sheth is an educator, researcher and entrepreneur. He is the LexisNexis Ohio Eminent Scholar at the Wright State University, Dayton OH. He directs Kno.e.sis - the Ohio Center of Excellence in Knowledge-enabled Computing (http://knoesis.org, http://bit.ly/coe-k), which works on topics in Semantic, Social, Sensor, and Services computing over the Web, with the goal of advancing from the information age to the meaning age. Prof. Sheth is an IEEE fellow and is one of the most highly cited authors in Computer Science (h-index = 68, http://bit.ly/CS-h) and World Wide Web (http://bit.ly/mas-www). He is Editor-in-Chief of the International Journal of Semantic Web & Information Systems, joint Editor-in-Chief of Distributed & Parallel Databases, series co-editor of two Springer book series and serves on several editorial boards. By licensing his funded university research, he has also founded and managed two successful companies. Several commercial products and many operationally deployed applications have resulted from his R&D.
More about Amit: http://knoesis.org/amit   Selvam Velmurugan is a technology entrepreneur, social entrepreneur and a leader/evangelist in the use of social technology for development. He serve

Editor's notes

  1. Got carried away with coverage and content – too much material for 3 hours – so the remaining content can be used as background
  2. Got carried away with coverage and content – too much material for 3 hours – so the remaining content can be used as background
  3. We want to understand meaningful citizen sensor observations (social signals)
  4. Source for Stats
  5. Many media companies use Facebook and Twitter as news-delivery platform. Many individuals rely on them as news source. News is increasingly social.
  6. A tweet is filled with metadata: information about when it was sent, by whom, using what Twitter application, and so on. Source: http://www.readwriteweb.com/archives/this_is_what_a_tweet_looks_like.php
  7. “isn't much of life's meaning found in the play between limits and the infinite?”
  8. Link to media files, context about annotation, a special option to write reviews of movies, books, or links you're sharing. The ISBN of the book, a link to a preview of the movie and the number of stars in your rating could be included in the Tweet Annotations. Any way you can classify, describe, append or otherwise enrich a Tweet with words or numbers can be included in Annotations.
  9. Interest level:(Based on Description info, lists and fav. tweets)
  10. Semantic metadata, relationships: Inferred?
  11. Structure-level metadata: Community size (showing scale: global vs. local); Community growth rate (popularity estimation for a topic); Largest strongly connected component size (measuring reachability in the directed graph); Number of weakly connected components and max. size (distribution of pre-existing follower-followee network connections; showing nature: loose vs. compact); Average degree of separation (how many hops between two authors); Clustering coefficient (showing the likelihood of association). Relationship-level metadata: Type of relationship (topic/content, based on retweet, entity etc.; follower/followee, based on structure); Relationship strength (strong vs. weak ties based on activity/communication between users; % tie strength); User homophily [homophily (i.e., "love of the same") is the tendency of individuals to associate and bond with similar others] based on a certain characteristic (e.g., location, interest etc.; % of users showing similar behavior); Reciprocity: mutual relationship (% of users following back their followers); Active community/ties (how active the communication between users is, or how active the relationship ties are; average of tie strength based on activity).
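Two of the measures in this note, reciprocity and the clustering coefficient, are easy to compute from a list of follow edges. The sketch below is an illustration under assumptions: reciprocity is computed per edge (the share of follow edges that are returned), which is a close variant of the per-user definition in the note, and the clustering coefficient is the standard local one on the undirected view of the graph.

```python
from collections import defaultdict


def reciprocity(follows):
    """Share of follow edges (u, v) whose reverse edge (v, u) also exists."""
    edges = set(follows)
    if not edges:
        return 0.0
    return sum((v, u) in edges for u, v in edges) / len(edges)


def clustering_coefficient(follows, node):
    """Local clustering: linked neighbor pairs divided by possible neighbor pairs."""
    neighbors = defaultdict(set)
    for u, v in follows:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    nbrs = sorted(neighbors[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in neighbors[nbrs[i]]
    )
    return links / (k * (k - 1) / 2)
```

A high clustering coefficient around an author suggests a compact community in which that author's contacts also know each other, while low reciprocity points to a broadcast-style following.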
  12. Pat Hayes
  13. Add some examples of how people store such semantic metadata…When you put social data as LOD, talk about technologies -
  14. Building on foundations of: Statistical Natural Language Processing; Information Extraction; Semantic Web/Knowledge Representation. We will talk about key issues in extracting metadata from informal text and how it varies from what has been done in more well-structured text like news articles etc.
  15. Social Media text is informal for various reasons.. Read red points
  16. Recently two researchers came up with a score to formalize the contextual nature of text and therefore the formality of text. The more context available, the more formal the text. We used the same score on SM text and found that … --- The score is too limited and probably outdated: it does not consider full sentences/structure and does not consider links; similarly, a network-related score would be good to have.
  17. What the two tasks look like in terms of outputs they produce
  18. For two types of NE (movie and music) over two types of SM text. Using those cues.
  19. Focus only on one row at a time Cultural entity defn in next slide
  20. What makes cultural entity extraction more difficult
  21. There are two flavors to the cultural entity recognition problem: where the same entity appears in multiple senses in the same domain, and where the same entity appears in multiple senses in different domains.
  22. Focusing on the first flavor
  23. Same song occurs as multiple instances in Music Brainz (knowledge base)
  24. Sample real world constraints hard-coded – this work was an experiment into scoping using real-world constraints
  25. As you chop away the domain model your accuracy increases…
  26. This is an application of the NER work
  27. Conclusion in RED
  28. We have come a long way but still room for improvement
  29. A fact can be proven; an opinion cannot. An opinion is normally a subjective statement that is based on people's thoughts, feelings and understandings.
  30. Social media serves as a platform for people to speak their mind more freely, which lead to a growing volume of opinionated data that can be used by:  (1) individuals for suggestion and recommendation(2) companies and organizations for marketing strategies and other decision making process(3) government for monitoring social phenomenons, being aware of potential dangerous situations, etc.
  31. For the task of classification, supervised learning or unsupervised learning techniques can be used. For the review-like data, e.g., movie review or product review, it&apos;s easy to get training data from website like imdb or Amazon. The sentiment classification is different from traditional topic classification since they have different features involved.lexicon-based approach: first creating a sentiment lexicon, and then determining the polarity of a text via some function based on the positive and negative clues within the text, as determined by the lexicon. The idea of bootstrapping is to use the output of an available initial classifier to create labeled data, to which a supervised learning algorithm may be applied. The task of extracting the opinion/holder/target is similar to the traditional information extraction task. The difference is that for this task, the relations between opinion and opinion target are considered important.E.g. proximity, the opinion expression is assumed to be closed to the opinion target in the text. Based on this assumption, if the opinion target is given, then the nearby adjectives can be extracted as opinion candidates.Other possible ways to model the relations between opinion and opinion target include: syntactic dependency, co-occurrence, or manually prepared patterns/rules 
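The lexicon-based approach and the proximity assumption can be sketched as follows. This is only an illustration under stated assumptions: the tiny lexicon is hypothetical, the polarity function is a plain count difference (one possible choice of "some function"), and lexicon words stand in for adjectives because no part-of-speech tagger is used.

```python
import re

# Hypothetical toy lexicon; a real sentiment lexicon would be far larger.
POSITIVE = {"great", "good", "amazing", "love"}
NEGATIVE = {"bad", "boring", "awful", "hate"}


def tokenize(text):
    """Lowercase word tokens; deliberately crude."""
    return re.findall(r"[a-z']+", text.lower())


def polarity(text):
    """Label a text by (positive clue count) minus (negative clue count)."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def opinion_candidates(text, target, window=3):
    """Lexicon words within `window` tokens of a single-word target (proximity assumption)."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target.lower():
            lo, hi = max(0, i - window), i + window + 1
            hits.extend(t for t in tokens[lo:hi] if t in POSITIVE | NEGATIVE)
    return hits


print(polarity("The plot was boring but the cast was great and I love it"))
print(opinion_candidates("The soundtrack was amazing but the popcorn was awful",
                         "soundtrack", window=2))
```

The second call keeps only the clue near the given target and ignores the one attached to a different object, which is the point of modeling the opinion-target relation.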
  32. In this paper, the authors connect public opinion measured from polls with sentiment measured from tweets. They find that a relatively simple sentiment detector based on tweets replicates consumer confidence and presidential job approval polls. The results highlight the potential of text streams as a substitute and supplement for traditional polling. Positive and negative words are defined by the subjectivity lexicon from OpinionFinder, a word list containing about 1,600 and 1,200 words marked as positive and negative, respectively (Wilson, Wiebe, and Hoffmann, 2005). A message is defined as positive if it contains any positive word, and negative if it contains any negative word. (This allows for messages to be both positive and negative.)
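The message-labeling rule above is simple enough to sketch directly. The word sets below are hypothetical stand-ins for the OpinionFinder lexicon, and aggregating a day of messages into a positive-to-negative count ratio is an assumption added here for illustration; the note itself does not say how labels are aggregated.

```python
# Hypothetical stand-ins for the positive/negative lexicon entries.
POSITIVE = {"good", "hope", "strong"}
NEGATIVE = {"bad", "fear", "weak"}


def label(message):
    """Return (is_positive, is_negative); a message can be both."""
    words = set(message.lower().split())
    return bool(words & POSITIVE), bool(words & NEGATIVE)


def daily_ratio(messages):
    """Positive message count over negative message count (assumed aggregation)."""
    pos = neg = 0
    for message in messages:
        is_pos, is_neg = label(message)
        pos += is_pos
        neg += is_neg
    return pos / neg if neg else None  # undefined when nothing is negative


# Hypothetical messages for one day.
day = [
    "good jobs report but weak wages",
    "strong hope for recovery",
    "fear of layoffs",
    "hope is back",
]
print(label(day[0]))
print(daily_ratio(day))
```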
  33. In this paper, the authors demonstrate how social media content can be used to predict real-world outcomes. In particular, they use tweets to forecast box-office revenues for movies. The results show that a prediction model using the rate at which tweets are created about a movie outperforms market-based methods, and that the sentiments present in tweets about a movie can be used to improve the prediction. The intuition is that a movie that has far more positive than negative tweets is likely to be successful. For the task of sentiment classification of tweets, they use a supervised classifier, "DynamicLMClassifier", from LingPipe. Each tweet in the training set is labeled as positive, negative or neutral by workers from Amazon Mechanical Turk. The classifier is trained using the n-gram model; in their work, they use n=8. They find that the sentiments do provide improvements, although they are not as important as the rate of tweets themselves.
  34. One of the most attractive advantages of unsupervised approaches is that they do not require training data. Many sentiment analysis applications for social media content use a simple lexicon-based method. However, for the problem of target-specific sentiment analysis, it doesn't work. Consider a simple lexicon-based method that uses a general sentiment lexicon containing positive/negative/neutral words in the general sense. (1) For the task "find tweets containing positive opinions about a specific topic", such as a movie, the results will look like what the table shows. However, tweets 2, 3, 5, 6 and 7 don't contain opinions about the movie. (2) For the task of extracting the opinion clues/expressions, the right answers should be as shown in the other picture. However, the simple lexicon-based method might return all the words in orange in the table.
  35. We create a general subjective lexicon which contains subjective words in the general sense. This lexicon is created by extending a commonly used subjective lexicon with slang learned from Urban Dictionary. This general lexicon is used to select candidate sentiment units. A bootstrapping method is used to learn domain-dependent sentiment clues from a domain-specific corpus. Most current lexicons contain only words. We employ statistical models to find words, phrases and patterns which can be used as sentiment clues, such as "must see", "want my money/time back" and "don't miss it" in the movie domain. For the task of identifying opinions towards the given target, we use a syntactic rule-based method as well as a proximity model. Since the informal language structures of tweets cause difficulties for the parser, our method requires only a very shallow syntactic parse of tweets.
  36. Refs: http://en.wikipedia.org/wiki/Writing_style and http://en.wikipedia.org/wiki/Psychometrics
  37. Metadata from network analysis: not sufficient to answer the above questions unless we consider context, and hence a merged approach (content + network) is better.
  38. Example scenario: a buyer wants to buy a movie DVD. There are multiple influencers!
      - Key influencers: media experts
      - Peer influencers: his colleagues (the people the buyer interacts with face-to-face daily)
      - Social influencers: his social circle
      Now how do we find them? Link analysis based on structure alone is not sufficient, because SOCIAL MEDIA IS HIGHLY COMPLEX. That's why we need additional context analysis in play.
      - Popularity is NOT influence! We need to understand the audience; their activity level and interests are of greater importance.
      - Homophily (the tendency to follow similar behavior) limits people's social worlds in a way that has powerful implications for the information they receive, the attitudes they form, and the interactions they experience.
      KLOUT (http://klout.com/kscore):
      - Reach: Are your tweets interesting and informative enough to build an audience? How far has your content been spread across Twitter?
      - Amplification Probability: the likelihood that your content will be acted upon.
  39. Multiple types of users: HOW DO WE FIND OUT THESE TYPES? Does the external web presence (background knowledge) of a user tell us more than the limited context available in the network?
  40. User engagement levels: applications in coordination activities Connecting the dots here with NGO initiatives (*presented by Selvam)
  41. User engagement levels: applications in coordination activities. Connecting the dots here with NGO initiatives (*presented by Selvam). Do not limit this to active vs. passive in general; be specific to a topic and then say active/passive w.r.t. that topic (e.g., active for 'Biology info' vs. passive for 'comp. sci. info.').
  42. Connections/Relationships- Implicit content features
  43. What we want to achieve with network analysis for social media: graph traversal, for understanding reachability between people; community formation and sustainability for people.
  44. We want to analyze these social networks to support various social science studies: diffusion; homophily (tendency to follow similar behavior) based on a certain characteristic (demographic, interest, etc.); and what makes the dynamics different here (factors).
  45. The authoritative nature of the poster or the volume of follower connections did not predict the re-tweet behavior associated with the tweets!
  46. Spammers are diverting their attention to social media sites.
  47. This tweet was by Kenneth Cole at the time of the Egyptian revolution. Though it uses a hashtag that was used to mark tweets on the Egypt crisis (#Cairo), the link it contains is not connected to the Egypt crisis.
  48. This article was published on the Guardian website in Feb 2010. In this article, the Director of BBC, Peter Horrocks, states that journalists should use social media as the primary source of information. He had taken over the position of Director a week earlier. Now let us consider a scenario where a journalist who wants to follow social signals wants to analyze what news is stirring up today at a particular location. There is a problem with this: information overload.
  49. This use case requires merging streaming data with background knowledge (e.g., from DBpedia). Examples of ?category include category:Wi-Fi devices and category:Touchscreen portable media players, amongst others. As a result, without having to list all products of interest as keywords to filter a stream, a user is able to leverage relationships in background knowledge to more effectively narrow down the stream of tweets to a subset of interest.
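The idea can be sketched without a live knowledge base. In the snippet below, a small in-memory dictionary stands in for the background knowledge; the entity labels and their category memberships are invented for illustration and are not taken from DBpedia, and the function names are hypothetical. A real system would obtain the entity list by querying the knowledge base for members of the chosen categories.

```python
# Hypothetical excerpt of background knowledge: entity label -> categories.
BACKGROUND = {
    "ipod touch": {"Touchscreen portable media players", "Wi-Fi devices"},
    "kindle": {"Wi-Fi devices"},
    "walkman": {"Portable media players"},
}


def entities_in(categories):
    """Labels of all entities that belong to at least one wanted category."""
    wanted = set(categories)
    return {label for label, cats in BACKGROUND.items() if cats & wanted}


def filter_stream(tweets, categories):
    """Yield tweets mentioning any entity from the wanted categories."""
    labels = entities_in(categories)
    for tweet in tweets:
        text = tweet.lower()
        if any(label in text for label in labels):
            yield tweet


# Hypothetical stream of tweets.
tweets = [
    "Loving my new Kindle",
    "Old Walkman still works",
    "iPod Touch battery died",
]
print(list(filter_stream(tweets, ["Wi-Fi devices"])))
```

The user names one category and never has to enumerate product keywords, which is the benefit the note describes.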
  50. How news articles. We collected the output of our system for the healthcare topic over a time period. We also collected articles from the NYTimes for the same period and used them as input to our extraction pipeline. We then plotted the occurrence of entities in tweets and in NYTimes articles, and found that the two align very well. Finally, we looked up the events that occurred during the peaks of our time plot on timeline.com and nytimes.com and found this result.
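One way to put a number on how well the two occurrence curves agree is a Pearson correlation over per-day entity counts. This is an assumption for illustration; the note only describes plotting the curves. The counts below are hypothetical, not measured data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length count series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily mention counts for one entity (not measured data).
tweet_counts = [12, 40, 95, 30, 18]
article_counts = [1, 3, 8, 2, 2]
r = pearson(tweet_counts, article_counts)
print(round(r, 2))
```

A value near 1 means the peaks in tweets and in news articles fall on the same days.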