http://forum.stanford.edu/events/2012mobi.php
Title: Location and Language in Social Media
Ed H. Chi
Staff Research Scientist, Google Research
(work done at [Xerox] PARC)
Abstract:
Despite the widespread adoption of social media internationally,
little research has investigated the differences among users of
different languages. Moreover, we know relatively little about how
people reveal their location information. In this talk, I will
outline our recent characterization studies on how users of differing
geographical locations and languages use social media.
First, on geographical location: we found that 34% of users did not
provide real location information on Twitter, frequently entering
fake locations or sarcastic comments that can fool traditional
geographic information tools. We also performed a simple machine learning
experiment to determine whether we could identify a user’s location
solely from the content of that user’s tweets.
Second, on language: examining users of the top 10 languages, we
discovered cross-language differences in the adoption of features such as
URLs, hashtags, mentions, replies, and retweets.
We discuss our work’s implications for research on large-scale social
systems and for the design of cross-cultural communication tools.
Homepage:
edchi.net
Speaker Bio:
Ed H. Chi is a Staff Research Scientist at Google. Until recently, he
was the Area Manager and a Principal Scientist at Palo Alto Research
Center's Augmented Social Cognition Group. He led the group in
understanding how Web 2.0 and social computing systems help groups of
people remember, think, and reason. Ed completed his three degrees
(B.S., M.S., and Ph.D.) in 6.5 years at the University of Minnesota, and
has been doing research on user interface software systems since 1993.
He has been featured and quoted in the press, including the Economist,
Time Magazine, LA Times, and the Associated Press.
He holds 20 patents and has published over 90 research articles. His
best-known past project is the study of Information Scent: understanding
how users navigate and understand the Web and information environments. He
also led a group of researchers at PARC to understand the underlying
mechanisms in online social systems such as Wikipedia and social
tagging sites. He has also worked on information visualization,
computational molecular biology, ubicomp, and recommendation/search
engines, and has won awards for both teaching and research. In his spare
time, Ed is an avid Taekwondo martial artist, photographer, and
snowboarder.
Location and Language in Social Media (Stanford Mobi Social Invited Talk)
1. Stanford Mobi Social Workshop 2012 | Invited Talk
Location and Language Use in Social Media
Ed H. Chi
Google Research
Work done while at Palo Alto Research Center (Xerox PARC)
2012-04-04 Stanford Mobi Social Workshop 2012 Invited Talk 1
2. What can you do with all this data?
Google Trends
Trendalyzer
Big Data Analytics
Google Analytics
Google Website Optimizer
3. Model-Driven and Living Laboratory Approach
Characterization and Modeling
Unlock Understanding of Collective Intelligence
Intelligent UI and Data-mining
Productization
Living Laboratory Applications / Products
28. Backstrom et al. 2010
29. Assumptions about the Location Field
1. Strongly-typed geo information
2. Little noise
3. Good precision
30. 27.7 million English tweets (collected early 2010)
→ 4.6 million Twitter users
→ 990K+ active Twitter users
→ Extracted their location field entries
→ Randomly sampled 10,000 entries
→ Removed automatically populated lat/lon entries (1154)
→ 8846 manually entered Twitter location field entries
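The last pipeline step drops location fields that a client filled in automatically with coordinates, leaving 10,000 − 1154 = 8846 manually entered entries. The slides do not say how those entries were detected; the minimal sketch below assumes (our assumption) that they are bare decimal "lat,lon" pairs, optionally preceded by a short client tag such as the hypothetical "iPhone:".

```python
import re

# Assumption (not from the slides): an auto-populated entry is a decimal
# "lat,lon" pair, optionally preceded by a client tag such as "iPhone:".
LATLON = re.compile(r"^\s*(?:[^\s:]+:\s*)?(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)\s*$")

def is_auto_populated(entry):
    """True if the entry looks like machine-written coordinates."""
    m = LATLON.match(entry)
    if not m:
        return False
    lat, lon = float(m.group(1)), float(m.group(2))
    return -90 <= lat <= 90 and -180 <= lon <= 180

def manual_entries(entries):
    """Keep only entries a person presumably typed by hand."""
    return [e for e in entries if not is_auto_populated(e)]

print(manual_entries(["San Francisco", "iPhone: 37.77,-122.41", "The Moon"]))
# ['San Francisco', 'The Moon']
print(10_000 - 1_154)  # 8846 entries remain, as on the slide
```

The range check rejects digit pairs that cannot be coordinates, so a string like "95.0,10.0" is kept as a manual entry.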
31. 8846 Manually Entered Twitter Location Field Entries
Two Coders
Powered by human knowledge, the Internet, friends + family, etc.
89%+ Agreement
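Two coders labeled the entries with 89%+ agreement. A minimal sketch of raw percent agreement follows, plus Cohen's kappa as a common chance-corrected complement; the slide reports only raw agreement, and the labels in the example are hypothetical.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Fraction of items to which both coders gave the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty list of items")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_e is chance agreement."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:  # both coders used one identical label for every item
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for ten location-field entries.
a = ["geo"] * 6 + ["none"] * 2 + ["non-geo"] * 2
b = ["geo"] * 5 + ["non-geo"] + ["none"] * 2 + ["non-geo"] * 2
print(percent_agreement(a, b))       # 0.9
print(round(cohens_kappa(a, b), 3))  # 0.833
```

Kappa comes out lower than raw agreement because it discounts matches the coders would reach by chance given how often each label is used.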
32. Study 1: “Geographicness”
Data quality of the location field
66% Some valid geographic information
18% Nothing entered
16% Non-valid geographic information
Example entries: “850 n.benson ave upland ca”, “JoviLand, CA”, “Middle Earth”, “San Francisco”, “Global Citizen”, “New Mexico”, “The Moon”, “the panhandle”, “Worldwide”, “Novi Sad, Serbia, Europe”, “kcmo – call the popo”
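The three shares on this slide partition the coded sample. Assuming that the abstract's "34% did not provide real location information" combines the two categories without usable geography, the arithmetic checks out:

```python
# Category shares printed on slide 32 (percent of coded location-field entries).
SHARES = {
    "nothing entered": 18,
    "non-valid geographic information": 16,
    "some valid geographic information": 66,
}

assert sum(SHARES.values()) == 100  # the three categories cover the whole sample

# Assumption: the abstract's 34% is the sum of the two categories
# that carry no real location.
no_real_location = SHARES["nothing entered"] + SHARES["non-valid geographic information"]
print(f"{no_real_location}% of users gave no real location")  # 34%
```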
33. Study 1: Non-Geo Information
Types of non-geographic information entered into the location field

Information Type                              # of Users
Popular Culture Reference                     195 (12.9%)
Privacy-Oriented                               18 (1.2%)
Insulting or Threatening to Reader             69 (4.6%)
Non-Earth Location                             75 (5.0%)
Negative Emotion Towards Current Location      48 (3.2%)
Sexual in Nature                               49 (3.2%)
35. Study 1: Popular Culture References
Non-geographic information in the location field in users’ profiles
“BieberTown”
“My World”
“belieber wonderland”
“JaeJoongs heart”
“Next to Waldo :D”
“somewhere in Glambertville”
“Los Angeles, 2019 (GET IT?)”
“Schrute Farms”
37. Study 1: Privacy References
Non-geographic information in the location field in users’ profiles
“Stalker City”
“Stalking me here isnt enough?”
“MindingMyOwn”
“For me to know n u to find out”
“NONE YA BISNESS”
“UM…STALKER!!”
“kgb answers”
38. Study 1: Implications
Geocoder (strongly-typed geographic information required) → Latitude and longitude coordinates
39. Study 1: Quality Implications
16% non-valid geographic information → Geocoder (strongly-typed geographic information required) → Latitude and longitude coordinates
40. Study 1: Quality Implications
16% non-valid geographic information → Yahoo! Geocoder
“Stalker City”, “NONE YA BISNESS”, “Justin Biebers Heart”, “The Void”, “Redneck Hell”, “In the Middle of Nowhere”, “yer mum”, “BSNBC”, “in God’s Graces”, etc.
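Slides 38 to 40 make the point that a geocoder expects strongly typed geographic input, so non-valid entries can still come back as coordinates. The toy below illustrates that failure mode and one simple guard, a minimum similarity cutoff. It uses a hypothetical three-entry gazetteer with approximate coordinates and fuzzy matching from Python's `difflib`. It does not model Yahoo!'s geocoder.

```python
import difflib

# Hypothetical mini-gazetteer: lowercase place name -> approximate (lat, lon).
GAZETTEER = {
    "san francisco": (37.77, -122.42),
    "new mexico": (34.52, -105.87),
    "novi sad": (45.27, 19.83),
}

def naive_geocode(text):
    """Return the closest gazetteer entry no matter how poor the match is,
    as a geocoder that assumes strongly typed geographic input might."""
    best = difflib.get_close_matches(text.lower(), GAZETTEER, n=1, cutoff=0.0)[0]
    return best, GAZETTEER[best]

def guarded_geocode(text, cutoff=0.8):
    """Only return coordinates when the string closely matches a known place."""
    matches = difflib.get_close_matches(text.lower(), GAZETTEER, n=1, cutoff=cutoff)
    return (matches[0], GAZETTEER[matches[0]]) if matches else None

for entry in ("San Francisco", "Stalker City", "NONE YA BISNESS"):
    print(entry, "->", naive_geocode(entry), "|", guarded_geocode(entry))
```

The naive version assigns coordinates to "Stalker City", while the guarded version returns `None` for it. A cutoff alone cannot catch real place names used sarcastically, which is why the slides argue for treating the field as noisy.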
48. Position Information
Implicitly revealed
Self-reported
Sensor-based: Global Positioning System (GPS), WiFi access points, cell phone towers
49. Study 2: Country Experiments
Uniform sampling → Naïve Bayes classifier → United States? Canada? United Kingdom? Australia?
72.10% accuracy, 2.91x better than random
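The slides name a Naïve Bayes classifier but not its features or smoothing. The minimal sketch below assumes a multinomial bag-of-words model over each user's tweet text with add-one (Laplace) smoothing. The class name `NaiveBayes` and the toy training documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over lowercase word tokens, add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def log_posterior(self, doc, label):
        """Unnormalized log P(label | doc) = log P(label) + sum of log P(word | label)."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        for w in doc.lower().split():
            if w in self.vocab:  # words unseen in training are skipped
                p = (self.word_counts[label][w] + 1) / (self.totals[label] + len(self.vocab))
                score += math.log(p)
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))

# Hypothetical toy training data; one "document" per user's tweets.
docs = [
    "flames game tonight in calgary", "cold morning in calgary",
    "afl finals in brisbane", "sunny brisbane beach",
    "clegg on the news", "tea with clegg",
    "yelp review of a diner", "found it on yelp",
]
labels = ["Canada", "Canada", "Australia", "Australia", "UK", "UK", "USA", "USA"]

model = NaiveBayes().fit(docs, labels)
print(model.predict("heading to calgary"))  # Canada
print(model.predict("afl scores"))          # Australia
```

Working in log space avoids numeric underflow when a user has many tweets.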
50. Study 2: Country Experiments
Demographically proportional sampling → Naïve Bayes classifier → United States? Canada? United Kingdom? Australia?
88.86% accuracy, 1.08x better than random
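The two sampling regimes explain why a higher accuracy can come with a smaller "better than random" factor. Uniform sampling draws equal numbers per class, so chance stays low. Proportional sampling keeps a dominant class, so even always guessing that class scores well. A minimal sketch follows. The helper names, the 70/10/10/10 target shares and the user counts are hypothetical, and the slides do not specify which demographic figures were used.

```python
import random
from collections import Counter, defaultdict

def group_by_label(users):
    """Bucket (user_id, label) pairs by label, e.g. by country."""
    groups = defaultdict(list)
    for user in users:
        groups[user[1]].append(user)
    return groups

def uniform_sample(users, per_class, rng):
    """Equal number of users per class, regardless of how common the class is."""
    sample = []
    for members in group_by_label(users).values():
        sample.extend(rng.sample(members, per_class))
    return sample

def proportional_sample(users, size, shares, rng):
    """Class c contributes round(size * shares[c]) users (hypothetical target shares)."""
    groups = group_by_label(users)
    sample = []
    for label, share in shares.items():
        sample.extend(rng.sample(groups[label], round(size * share)))
    return sample

def majority_baseline(sample):
    """Accuracy of always guessing the most common class in the sample."""
    counts = Counter(label for _, label in sample)
    return max(counts.values()) / len(sample)

rng = random.Random(0)
countries = ["US"] * 80 + ["CA"] * 20 + ["UK"] * 20 + ["AU"] * 20
users = [(f"u{i}", c) for i, c in enumerate(countries)]
shares = {"US": 0.7, "CA": 0.1, "UK": 0.1, "AU": 0.1}
print(majority_baseline(uniform_sample(users, 10, rng)))                # 0.25
print(majority_baseline(proportional_sample(users, 20, shares, rng)))  # 0.7
```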
51. Study 2: State Experiments
Uniform sampling → Naïve Bayes classifier → California? Arkansas? New York? Washington? Texas? …
30.28% accuracy, 5.45x better than random
52. Study 2: State Experiments
Demographically proportional sampling → Naïve Bayes classifier → California? Arkansas? New York? Washington? Texas? …
27.31% accuracy, 1.81x better than random
53. Study 2: Predictive Words
Word          Geography         Predictiveness
calgary       Canada            419.42
brisbane      Australia         137.29
coolcanuck    Canada             78.28
afl           Australia          56.24
clegg         UK                 35.49
cbc           Canada             29.40
yelp          USA                19.80

Word          Geography         Predictiveness
elk           Colorado           90.74
redsox        Massachusetts      41.18
biggbi        Michigan           24.26
gamecock      South Carolina     16.00
crawfish      Louisiana          14.87
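The slides do not define the "Predictiveness" score. As a hypothetical proxy only, one can compare how often a word appears in one region against all other regions pooled, with add-one smoothing. The sketch below does that. Its numbers are not comparable to the table above, and the word counts are invented for illustration.

```python
from collections import Counter

def predictiveness(word_counts, word, region, alpha=1.0):
    """Hypothetical proxy score: smoothed P(word | region) / P(word | other regions).

    word_counts maps each region to a Counter of word frequencies.
    Scores well above 1 mean the word is far more common in `region`.
    """
    vocab_size = len(set().union(*word_counts.values()))
    inside = word_counts[region]
    outside = Counter()
    for other, counts in word_counts.items():
        if other != region:
            outside.update(counts)
    p_in = (inside[word] + alpha) / (sum(inside.values()) + alpha * vocab_size)
    p_out = (outside[word] + alpha) / (sum(outside.values()) + alpha * vocab_size)
    return p_in / p_out

# Hypothetical word counts, not the study's data.
counts = {
    "Canada": Counter({"calgary": 8, "hockey": 4, "coffee": 4}),
    "USA": Counter({"yelp": 6, "coffee": 6, "hockey": 2}),
    "UK": Counter({"clegg": 5, "coffee": 5}),
}
# A place name scores highest; a word common everywhere falls below 1.
for w in ("calgary", "hockey", "coffee"):
    print(w, round(predictiveness(counts, w, "Canada"), 2))
```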
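54. Tweets Have Implicit Location Information
Country, uniform sampling: 2.91x better than random
Country, demographically proportional sampling: 1.08x better than random
State, uniform sampling: 5.45x better than random
State, demographically proportional sampling: 1.81x better than random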
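Slides 49 to 52 report each accuracy together with a "better than random" factor, but they do not say how the random baseline was computed. Dividing accuracy by the reported factor recovers the baseline each result implies. For the uniformly sampled four-country task this gives roughly 24.8%, close to the 25% of guessing among four balanced classes. For the proportional country task it gives roughly 82%, which is why 88.86% accuracy is only 1.08x better than random.

```python
def lift(accuracy, baseline):
    """How many times better than the random baseline a classifier does."""
    return accuracy / baseline

def implied_baseline(accuracy, reported_lift):
    """Back out the random-baseline accuracy implied by a reported lift."""
    return accuracy / reported_lift

RESULTS = {  # (accuracy %, reported lift) as printed on slides 49 to 52
    "country, uniform": (72.10, 2.91),
    "country, proportional": (88.86, 1.08),
    "state, uniform": (30.28, 5.45),
    "state, proportional": (27.31, 1.81),
}
for name, (acc, x) in RESULTS.items():
    print(f"{name}: implied baseline ≈ {implied_baseline(acc, x):.1f}%")
```

Because the reported factors are rounded to two decimals, the implied baselines are approximate.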
55. This needs to be considered in the context of implicit location disclosure!
56. Contributions
1. First characterization study of user location field behavior
2. Location field behavior is much more complex than has been assumed
3. The complexity has implications for geography-related HCI technologies
4. Location field behavior must be considered along with implicit disclosure behavior