Powerpoint exploring the locations used in television show Time Clash
ASC Research given at the PARC Forum on 2008-05-01
1. Ed H. Chi
Augmented Social Cognition Area
Palo Alto Research Center
Peter Pirolli, Lichan Hong, Bongwon Suh, Les Nelson, Rowan Nairn
Alumni: Raluca Budiu, Bryan Pendleton, Niki Kittur, Todd Mytkowicz
Image from: http://www.flickr.com/photos/ourcommon/480538715/
2008-05-01 Ed H. Chi ASC Overview 1
6. 12 years of work in foraging and sensemaking
Information Scent
– WUFIS / IUNIS (Basic scent modeling algorithms) [CHI2000, 2001]
– Bloodhound (Simulation of web navigation) [CHI2003]
– LumberJack (Log analysis of user needs) [CHI2002]
Foraging
– ScentTrails [TOCHI2003]
– ScentIndex [CHI2004]
– ScentHighlight [IUI2005]
– Visual foraging of highlighted text [to appear, HCII]
– Proximal Search [to be published]
Sensemaking
– Visualization of Web Ecologies [CHI98]
– Visualization Spreadsheets [Infovis97, Infovis99]
11. Groups utilize systems to make sense of and share complex topics and materials.
Wikipedia (social status)
Slashdot (karma points)
WikiHow.com
Lostpedia.com
12. Systems that evolve structures
that can be used to organize
information.
Del.icio.us
Flickr
YouTube
Friendster
13. Counting votes
– A way to increase signal‐to‐noise ratio
– Information faddishness
Examples:
– Digg.com
– Most bookmarked items on del.icio.us
– Estimating the weight of an ox or
temperature of a room
– The true value of a stock
– PageRank or Hub / Authority algorithms
14. Voting systems | Col. Information Structures | Collaborative Co-Creation
[Diagram: example systems placed along an axis of heavier collaboration: Digg.com, eHow.com, Wikipedia, IBM dogear, PageRank, Slashdot, Naver, Del.icio.us, Flickr]
15. Voting systems | Col. Information Structures | Collaborative Co-Creation
[Diagram: example systems placed along an axis of heavier collaboration: Digg.com, eHow.com, Wikipedia, IBM dogear, PageRank, Slashdot, Naver, Del.icio.us, Flickr]
Voting systems: understanding of micro-economics
• of foraging [PARC]
• Personal vs. group [Huberman, Adamic]
• Wisdom of Crowd [Surowiecki]
• Information cascades [Anderson and Holt]
Col. Information Structures: understanding of info and social networks
• Tag network analysis [PARC, Golder, Yahoo]
• Structural holes (info brokerage) [Burt]
• Network constraints and structure [various]
• Semantics of semiotic structures / words [IR, LSA]
Collaborative Co-Creation: understanding of conflicts and coordination
• Wikipedia coordination costs [PARC]
• Invisible Colleges [Sandstrom]
• Interference effects [Pirolli]
• Co-laboratories [Olson and Olson]
• Community networks / Col. Problem solving [Carroll]
16. Cognition: the ability to remember, think, and reason; the faculty of
knowing.
Social Cognition: the ability of a group to remember, think, and
reason; the construction of knowledge structures by a group.
– (not quite the same as in the branch of psychology that studies the
cognitive processes involved in social interaction, though included)
Augmented Social Cognition: Supported by systems, the
enhancement of the ability of a group to remember, think, and
reason; the system‐supported construction of knowledge
structures by a group.
17. Characterization Models
Evaluations Prototypes
18. Characterization Models
Evaluations Prototypes
23. “Wikipedia is the best thing ever. Anyone in the world can write
anything they want about any subject, so you know you’re getting the
best possible information.”
– Steve Carell, The Office
24. Understanding coordination costs is vital for the long‐term
viability of collaborative information environments
Data:
– Entire dump on July 2, 2006
– 58 million revisions
– 4.7 million wiki pages
– 2.4 million article pages
– 800 gigabytes
32. Increase in proportion of edits to user talk
[Chart: edit proportion (y-axis, 0 to 0.2) by year (2001 to 2006); callout: 8%]
33. Increase in proportion of edits to user talk
Increase in proportion of edits to procedure
[Chart: edit proportion (y-axis, 0 to 0.2) by year (2001 to 2006); callout: 11%]
35. Increase in proportion of edits that are reverts
Increase in proportion of edits reverting vandalism
[Chart: % edits (marked vandalism); edit proportion (y-axis, 0 to 0.03) by year (2001 to 2005); callout: 1-2%]
36. Conflict and coordination costs are growing
– Less direct work (articles)
+ More indirect work (article talk, user, procedure)
+ More maintenance work (reverts, vandalism)
[Chart: percentage of total edits (y-axis, 60% to 100%) by year (2001 to 2006), stacked by category: Article, Article Talk, User, User Talk, Other, Maintenance]
37. Characterization Models
Evaluations Prototypes
38. Conflict is growing at the global level, and we have
some idea about where it is.
But what defines conflict inside Wikipedia?
Build a characterization model of article conflict
– Identify metrics relevant to conflict
– Automatically identify high‐conflict articles
39. “Controversial” tag
Use # revisions tagged controversial
40. Possible metrics for identifying conflict in articles
Metric type                     Page type
Revisions (#)                   Article, talk, article/talk
Page length                     Article, talk, article/talk
Unique editors                  Article, talk, article/talk
Unique editors / revisions      Article, talk
Links from other articles       Article, talk
Links to other articles         Article, talk
Anonymous edits (#, %)          Article, talk
Administrator edits (#, %)      Article, talk
Minor edits (#, %)              Article, talk
Reverts (#, by unique editors)  Article
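As a rough illustration of how a few of these metrics could be computed for one page, here is a minimal Python sketch. The `Revision` record, its field names, and the sample history are hypothetical. Treating any revision whose text exactly matches an earlier revision as a revert is an assumption made for this sketch, not necessarily the method used in the study.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Revision:
    """Hypothetical revision record; the field names are illustrative only."""
    editor: str       # user name, or an IP address for an anonymous edit
    anonymous: bool
    minor: bool
    text: str         # full page text after this revision


def page_metrics(revisions):
    """Compute a handful of the listed metrics for one page's history."""
    if not revisions:
        raise ValueError("need at least one revision")
    n = len(revisions)
    seen_digests = set()
    reverts = 0
    reverting_editors = set()
    for rev in revisions:
        digest = hashlib.sha1(rev.text.encode("utf-8")).hexdigest()
        # Assumption: text identical to an earlier revision counts as a revert.
        if digest in seen_digests:
            reverts += 1
            reverting_editors.add(rev.editor)
        seen_digests.add(digest)
    anonymous = sum(r.anonymous for r in revisions)
    minor = sum(r.minor for r in revisions)
    unique_editors = len({r.editor for r in revisions})
    return {
        "revisions": n,
        "page_length": len(revisions[-1].text),
        "unique_editors": unique_editors,
        "unique_editors_per_revision": unique_editors / n,
        "anonymous_edits": anonymous,
        "anonymous_pct": 100.0 * anonymous / n,
        "minor_edits": minor,
        "minor_pct": 100.0 * minor / n,
        "reverts": reverts,
        "reverting_editors": len(reverting_editors),
    }


# Made-up history: the third revision restores the text of the first.
history = [
    Revision("alice", False, False, "Intro."),
    Revision("10.0.0.7", True, False, "Intro. Disputed claim."),
    Revision("bob", False, False, "Intro."),
    Revision("alice", False, True, "Intro, expanded."),
]
metrics = page_metrics(history)
```

Link counts and administrator edits are left out because they need data beyond a single page's revision list.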
46. [Diagram labels: Topics, Concepts, Documents, Users, Noise, Tags T1…Tn, Encoding, Decoding]
47. How do we evaluate a tagging system?
Given a tag vocabulary, how effective is it in describing a
set of URLs?
Approach:
– Crawled the del.icio.us bookmark set
– Information theory provides a nice framework for analysis
48. Measures the uncertainty about a particular event associated with a
probability distribution
Thought experiment: drawing colored balls out of a box
– Maximum when p is uniform, no single color predominates
– Minimum when p is 1, only one color
Entropy measures the amount of information associated with a
drawn ball.
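The colored-ball thought experiment can be checked numerically. The sketch below assumes the standard Shannon entropy in bits, H = −Σ p log2 p. The slide's own formula did not survive extraction, and the box contents here are made up.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of a distribution given as raw counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


# Hypothetical boxes of 20 balls each.
uniform_box = Counter(red=5, green=5, blue=5, yellow=5)   # no color predominates
skewed_box = Counter(red=17, green=1, blue=1, yellow=1)   # one color dominates
single_color_box = Counter(red=20)                        # only one color

h_uniform = entropy(uniform_box.values())
h_skewed = entropy(skewed_box.values())
h_single = entropy(single_color_box.values())
```

With four equally likely colors the entropy is at its maximum of 2 bits, the skewed box falls between 0 and 2 bits, and the single-color box gives 0 bits.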
49. Entropy increases when
– (a) total number of events x increases
– (b) distribution on X becomes more uniform
Conditional Entropy, H(Y|X)
– Measures how much entropy a random variable Y has
remaining if we have already learned completely the value
of a second variable X.
– Can be understood by thinking about the joint entropy
H(Y|X) = H(X,Y) – H(X)
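The identity above can be tried on small samples. This is a minimal sketch that estimates both entropies from observed (x, y) pairs; the function names and sample data are illustrative.

```python
import math
from collections import Counter


def entropy_of(counter):
    """Shannon entropy, in bits, of the empirical distribution in a Counter."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values() if c)


def conditional_entropy(pairs):
    """H(Y|X) = H(X,Y) - H(X), estimated from a list of (x, y) observations."""
    joint = Counter(pairs)
    marginal_x = Counter(x for x, _ in pairs)
    return entropy_of(joint) - entropy_of(marginal_x)


# Y is fully determined by X: nothing is left to learn once X is known.
determined = [("a", 1), ("a", 1), ("b", 2), ("b", 2)]
# Y is independent of X: knowing X removes none of Y's one bit of uncertainty.
independent = [("a", 1), ("a", 2), ("b", 1), ("b", 2)]

h_determined = conditional_entropy(determined)
h_independent = conditional_entropy(independent)
```

The first sample yields 0 bits and the second yields 1 bit.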
2008-03-28 ICWSM Poster 49
53. Source: Hypertext 2008 study on del.icio.us (Chi & Mytkowicz)
54. Entropy can be used effectively as a measure for social
tagging systems.
As a map, over time, social tagging systems seem to
lose their ability to guide users efficiently.
– However, there are ways to deal with this pressure.
55. Characterization Models
Evaluations Prototypes
60. Factual accuracy
Motives of editors
Uncertain expertise
Volatility
Spotty coverage
Unproven/non‐independent source
61. Social translucence for effective communication and collaboration
[Erickson and Kellogg 2002]
– Make socially significant information visible and salient
– Support awareness of the rules and constraints
– Accountability for actions
Wikis can be a prime candidate
– Every edit is logged and retrievable
– WikiScanner.com: analyze anonymous IP edits
– WikiRage.com: top edits
64. List of every edit that a user made
Let readers examine each individual revision for validity, which is hard to accomplish
when only provided with aggregate visual summaries.
65. Surfacing hidden social context to users
For readers
– Any incidents in the past, e.g., a sudden burst of edits?
– Who are the editors?
– What are their motivations / points of view / expertise / topics of
interest?
– Help them judge the quality/trustworthiness/usefulness of an
article
For writers
– Measure expertise / contribution / reputation
– Motivate them to be more active / responsible (?)
67. Interaction costs determine number of people who participate
Surplus of attention & motivation at small transaction costs
Therefore… Important to keep interaction costs low
[Chart: # people willing to produce for “free” (y-axis) vs. cost of participation (x-axis)]
68. In situ tagging while reading
– No new window
– Clicking vs typing
Tagging + highlighting
69. Intuition: sub‐doc nuggets useful
– Entities, facts, concepts, paragraphs
Annotations attached to paragraphs
Portable across pages and other contents (e.g.
Word documents)
– Dynamic pages
– Duplicate content
78. Crowdsourcing [collaborative co‐creation]
– Is there a wisdom of the crowd in Wikipedia?
Collective Intelligence [folksonomy]
– Are social tags collectively gathered useful for organization of a large
document collection?
Collective Averaging [social attention]
– Do voting systems identify the best quality and most interesting
information for that community?
Participation Architecture [AJAX]
– Does lowering the interaction cost barrier increase participation
productively?
Expertise finding [social networking]
– Does finding experts through a social network get you to better quality
information sooner?
79. Research Vision: Understand how social computing
systems can enhance the ability of a group of
people to remember, think, and reason.
Living Laboratory: Create applications that harness
collective intelligence to improve knowledge
capture, transfer, and discovery.
http://asc‐parc.blogspot.com
http://www.edchi.net
echi@parc.com
Image from: http://www.flickr.com/photos/ourcommon/480538715/
Editor's Notes
PARC FORUM *this week*: Thursday May 1, 4:00 – 5:00 pm, George E. Pake Auditorium at Palo Alto Research Center (www.parc.com/directions)
TITLE: "Enhancing the Social Web through Augmented Social Cognition research"
SPEAKER: Ed Chi, PARC Augmented Social Cognition group
ABSTRACT: We are experiencing the new Social Web, where people share, communicate, commiserate, and conflict with each other. As evidenced by Wikipedia and del.icio.us, Web 2.0 environments are turning people into social information foragers and sharers. Users interact to resolve conflicts and jointly make sense of topic areas from “Obama vs. Clinton” to “Islam.”
PARC's Augmented Social Cognition researchers, who come from cognitive psychology, computer science, HCI, sociology, and other disciplines, focus on understanding how to “enhance a group of people’s ability to remember, think, and reason”. Through Web 2.0 systems like social tagging, blogs, Wikis, and more, we can finally study, in detail, these types of enhancements on a very large scale.
In this Forum, we summarize recent PARC work and early findings on: (1) how conflict and coordination have played out in Wikipedia, and how social transparency might affect reader trust; (2) how decreasing interaction costs might change participation in social tagging systems; and (3) how computation can help organize user-generated content and metadata.
ABOUT THE SPEAKER: Ed H. Chi is a senior research scientist and area manager of PARC's Augmented Social Cognition group. His previous work includes understanding Information Scent (how users navigate and make sense of information environments like the Web), as well as developing information visualizations such as the "Spreadsheet for Visualization" (which allows users to explore data through a spreadsheet metaphor where each cell holds an entire data set with a full-fledged visualization). He has also worked on computational molecular biology, ubiquitous computing systems, and recommendation and personalized search engines. Ed has over 19 patents and has been conducting research on user interface software systems since 1993. He has been quoted in the Economist, Time Magazine, LA Times, Slate, and the Associated Press. Ed completed his B.S., M.S., and Ph.D. degrees from the University of Minnesota between 1992 and 1999. In his spare time, he is an avid Taekwondo black belt, photographer, and snowboarder.
This is the final talk in our "Going Beyond Web 2.0" speaker series. Previous talks in this series, as well as other recent Forum talks, are available online at www.parc.com/forums.
Making sense of this area
Voting systems: faddishness of information, social dashboards
Col. info. structures: explicit social networks
Collaborative co-creation
Voting systems: faddishness of information, social dashboards
Col. info. structures: explicit social networks
Collaborative creation
This clip is from a comedy show, but it raises a serious question as well. What does happen when you have millions of people with different viewpoints all editing the same content? Well, you get a lot of conflict. I’m going to briefly go through an example of conflict that occurred on one of the most heavily edited pages in Wikipedia, which is, <pause>, you guessed it, about our own George W.
This main article page is all a casual browser would see when visiting the site. However, for each article in Wikipedia there is a corresponding talk page
...and it’s this talk page where much of the discussion and conflict occurs. For example, here’s a conflict started by user Duke53, who believes that the age at which George W received a DUI is an important fact that belongs in the main article. Meanwhile others argue that the information is unencyclopedic, and that Duke53’s continued re-adding of it constitutes vandalism. So even this seemingly small issue has sparked a major controversy, which continues well past what you see on this page. In fact, in Wikipedia each user has their own user page, and the argument spills over to Duke53’s user talk page
here it turns into a discussion of the policies for conflict resolution and what is considered vandalism. Wikipedia has a large number of pages dedicated just to policies and procedures such as conflict resolution,
which themselves are fluid and changing over time. In fact, some of the most heated debates take place on the talk pages
for these policies. These policies and procedures are so important that an admin we surveyed said:
So what we see is that the proportion of edits going to article pages is decreasing to around 70% of all edits, meaning there is less direct work being done.
But vandalism still only accounts for about 1% of all edits.
Paste controversial tag picture here
Figure depicting CRC
Selected a set of page metrics which we could scale to compute across large numbers of pages.
This graph is just running the model on the list of controversial topics; it is not cross-validation. Its R-squared is actually 0.897.
Especially interesting: unique editors DECREASE conflict. Anonymous edits are bad when on the discussion page but not the article page.
Change to 1,2,3,4... and up/down arrows
TALK ABOUT MODEL IMPLICATIONS FOR SEARCH
How do we evaluate a search engine?
Add idea of black box and tell story from the side of the box getting tags
4 possibilities:
Vocabulary saturation! The plot shows a marked increase in the entropy of the tag distribution H(T) up until week 75 (mid-2005), at which point the entropy measure hits a plateau. Since the total number of tags keeps increasing, tag entropy can only stay constant in the plateau by having the tag probability distribution become less uniform. What this suggests is that users are having a hard time coming up with “unique” tags. That is to say, a user is more likely to add a tag to del.icio.us that is already popular in the system than to add a tag that is relatively obscure.
What’s perhaps the most telling data of all is the entropy of documents conditional on tags, H(D|T), which is increasing rapidly (see Figure 4). What this means is that, even after knowing completely the value of tags, the entropy of the document is still increasing. Conditional Entropy asks the question: “Given that I know a set of tags, how much uncertainty regarding the document set that I was referencing with those tags remains?” This measure gives us a method for analyzing how useful a set of tags is at describing a document set. The fact that this curve is strictly increasing suggests that the specificity of any given tag is decreasing. That is to say, as a navigation aid, tags are becoming harder and harder to use. We are moving closer and closer to the proverbial “needle in a haystack” where any single tag references too many documents to be considered useful.
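To make the "specificity of a tag" reading concrete, here is a small sketch that computes H(D|T) as the tag-probability-weighted entropy of the documents filed under each tag. This is the standard equivalent of H(D,T) − H(T). The bookmark data and names are invented for illustration and are not taken from the del.icio.us crawl.

```python
import math
from collections import Counter, defaultdict


def doc_entropy_given_tags(bookmarks):
    """H(D|T) in bits from (document, tag) pairs: sum over tags t of p(t) * H(D | T=t)."""
    docs_by_tag = defaultdict(Counter)
    for doc, tag in bookmarks:
        docs_by_tag[tag][doc] += 1
    total = len(bookmarks)
    h = 0.0
    for doc_counts in docs_by_tag.values():
        n_tag = sum(doc_counts.values())
        # Entropy of the documents that share this one tag.
        h_tag = -sum((c / n_tag) * math.log2(c / n_tag) for c in doc_counts.values())
        h += (n_tag / total) * h_tag
    return h


# Hypothetical "early" system: every tag points at exactly one document.
specific_tags = [("d1", "python"), ("d2", "recipes"), ("d3", "travel"), ("d4", "music")]
# Hypothetical "later" system: one popular tag is reused for every document.
popular_tag = [("d1", "web"), ("d2", "web"), ("d3", "web"), ("d4", "web")]

h_specific = doc_entropy_given_tags(specific_tags)
h_popular = doc_entropy_given_tags(popular_tag)
```

With fully specific tags, knowing the tag pins down the document (0 bits remain). When one tag covers all four documents, 2 bits of uncertainty remain, which is the "needle in a haystack" direction described above.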
Figure 6 shows the number of tags per bookmark over time. The trend is clearly increasing, complementing the increase in navigation difficulty.
What is the valuable problem addressed by this research program? What is the target (user, company, application, market), what is our place in the value chain, and what is the business model to bring value to the target and PARC?
As you can tell from my demo, what is being tagged are paragraphs. This is based on our intuition that although there are cases where it makes sense to tag the whole document, there are many other cases where the interesting nuggets of information are at the sub-document level, for example, entities, facts, concepts, and paragraphs. Our implementation focuses on paragraphs for now. The key idea is that we compute a unique fingerprint for each paragraph that we encounter. Currently, we use the Secure Hash Algorithm to compute the paragraph fingerprint. We will explore other approaches in the future. This simple idea of a paragraph fingerprint has also been picked up by other projects in UbiDocs.
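A minimal sketch of the paragraph-fingerprint idea, under stated assumptions: the notes say only "Secure Hash Algorithm", so SHA-1 is assumed here. The whitespace and case normalization, the in-memory `annotations` store, and all function names are hypothetical, not SparTag.us internals.

```python
import hashlib
import re


def paragraph_fingerprint(paragraph: str) -> str:
    """Fingerprint a paragraph by hashing its normalized text (SHA-1 assumed)."""
    # Assumption: collapse whitespace and ignore case so trivial markup
    # differences between sites do not change the fingerprint.
    normalized = re.sub(r"\s+", " ", paragraph).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


# Hypothetical annotation store keyed by fingerprint rather than by URL.
annotations = {}


def annotate(paragraph: str, tag: str) -> None:
    annotations.setdefault(paragraph_fingerprint(paragraph), []).append(tag)


def annotations_for(paragraph: str):
    return annotations.get(paragraph_fingerprint(paragraph), [])


# The same story text as it might appear on two different sites.
on_site_a = "The bridge  collapsed during\nthe evening rush hour."
on_site_b = "The bridge collapsed during the evening rush hour."

annotate(on_site_a, "news")
tags_seen_on_b = annotations_for(on_site_b)
```

Because annotations are keyed by the paragraph's fingerprint rather than its URL, a tag made on one site shows up wherever the same paragraph text appears.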
Here is an example of duplicate content. Here we have a story at Forbes.com which is about the recent tragedy happening in Minnesota and I annotated part of the story. Here on a different web site, the same story appears and my annotations show up too.
As I browse the web and annotate the pages, one of the things that SparTag.us automatically created for me is a notebook which contains all the paragraphs that I have annotated. Here it shows when I annotated this paragraph. Here is an option that allows me to make my annotations on this paragraph private. Here are the URLs that I have visited and that contain this paragraph. And I can search my notebook against the tags that I specified, the text that I highlighted, the text of the paragraphs that I annotated, or the URLs. By the way, this last one was suggested by Prateek, who was a subject in our last user study. And here is a tag cloud, which is really a representation of what kind of keywords I have been using as tags.
The way that we support social sharing is through a simple user interface like this. Here I designate myself as a fan of Ed, which means that I can see his annotations. When I go to this web page, I see that Ed has been here before and decided to leave some annotations. Of course, I can highlight or tag this paragraph too. Now, if I don’t want to be Ed’s fan anymore, I can remove his name from my friend list. And his annotations disappear too. And because this is done in AJAX, there is no need to reload the page.
A nice thing about SparTag.us is that when you come to a web page, it sort of tells you what may be interesting to pay attention to. Here it reminds me that these are two paragraphs that I have annotated. Here I see that Ed has annotated this paragraph.
There are really two facets of tagging. The first is encoding: when you encounter a document, have read or skimmed it and have to generate a few words that describe it. The second side of tagging is retrieval: you find a new document that has several tags attached to it, and you read those tags and the document. The tags may give you an idea about what the document is about. I am going to come back to this distinction later.
Posing the right questions is half of the work.