2. “Traditional” personalization on the World Wide Web
[Bar chart] Personalization features offered, as percent of 44 companies interviewed (multiple responses accepted). Source: Forrester Research
• Tailored email alerts: 64%
• Customized content: 48%
• Account access: 48%
• Personal productivity tools: 23%
• Wish lists: 23%
• Product recommendations: 23%
• Saved links: 20%
• Express transactions: 16%
• Targeted marketing/advertising: 11%
• Custom pricing: 9%
• Personalized content through non-PC devices: 7%
• News clipping services: 5%
3. More recent deployed personalization
• Personalized search
• Web courses that tailor their teaching strategy to each individual student
• Information and recommendations from portable devices that consider users’ location and habits
• Personalized news (delivered to mobile devices)
• Product descriptions whose complexity is geared to the presumed level of user expertise
• Tailored presentations that take into account the user’s preferences regarding product presentation and media types (text, graphics, video)
4. Current personalization methods (in 30 seconds)
Data sources
• Explicit user input
• User interaction logs
Methods
• Assignment to user groups
• Rule-based inferences
• Machine learning
Storage of data about users
• Persistent user profile, updated over time
(a minimal code sketch of these ingredients follows below)
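To make these ingredients concrete, here is a minimal sketch in Python (all names, rules, and thresholds are hypothetical, not from the slides): a persistent profile that stores explicit input and interaction logs, applies a rule-based inference, and assigns the user to a group.

```python
import json
import time
from pathlib import Path

PROFILE_PATH = Path("profile.json")  # persistent user profile, kept across sessions

def load_profile():
    if PROFILE_PATH.exists():
        return json.loads(PROFILE_PATH.read_text())
    return {"explicit": {},   # explicit user input (age, stated interests, ...)
            "log": [],        # user interaction logs (visited pages)
            "inferred": {}}   # results of rule-based inferences

def log_visit(profile, url):
    profile["log"].append({"url": url, "ts": time.time()})

def apply_rules(profile):
    # Rule-based inference: three or more visits to sports pages
    # suggest an interest in sports (a made-up example rule).
    if sum("/sports/" in e["url"] for e in profile["log"]) >= 3:
        profile["inferred"]["interest:sports"] = True

def assign_group(profile):
    # Assignment to user groups from explicit demographic data.
    age = profile["explicit"].get("age")
    return "student" if age is not None and age < 26 else "general"

profile = load_profile()
profile["explicit"]["age"] = 23  # explicit user input
for i in range(3):
    log_visit(profile, f"https://example.com/sports/article-{i}")
apply_rules(profile)
print(assign_group(profile), profile["inferred"])
PROFILE_PATH.write_text(json.dumps(profile))  # update the profile over time
```

Machine learning over such logs (e.g., learning interest weights from the clickstream) would plug in where `apply_rules` sits.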
5. Web personalization delivers benefits for
both users and web vendors
Jupiter Communications, 1998: Personalization at 25 consumer e-commerce
sites increased the number of new customers by 47% in the first year, and
revenues by 52%.
Nielsen NetRatings, 1999:
• Registered visitors to portal sites spend over 3 times longer at their home
portal than other users, and view 3 to 4 times more pages at their portal
• E-commerce sites offering personalized services convert significantly more
visitors into buyers than those that don’t.
Choicestream 2004, 2005:
• 80% interested in personalized content
• 60% willing to spend at least 2 minutes answering questions about themselves
Downside: Personalized sites collect significantly more personal data than regular websites, often in a very inconspicuous manner.
6. Many computer users are concerned
about their privacy online
Percentage of users who reported:
• being extremely or very concerned about divulging personal information online:
67% (Forrester 1999), 74% (AARP 2000)
• being (extremely) concerned about being tracked online:
77% (AARP 2000)
• leaving web sites that required registration information:
41% (Boston Consulting 1997)
• having entered fake registration information:
40% (GVU 1998), 27% (Boston Consulting 1997), 32% (Forrester 1999)
• having refrained from shopping online due to privacy concerns, or bought less:
32% (Forrester 1999), 32%/35%/54% (IBM 1999), 24% (AARP 2000)
• wanting internet sites to ask for permission to use personal data: 81% (Pew 2000)
• being willing to give out personal data for getting something valuable in return:
31% (GVU 1998), 30% (Forrester 1999), 51% (Personalization Consortium)
7. Privacy surveys do not seem to predict people’s privacy-related actions very well (Harper and Singleton 2001; Personalization Consortium)
• In several privacy studies in e-commerce contexts, discrepancies have already been observed between users stating high privacy concerns but subsequently disclosing personal data carelessly.
• Several authors therefore challenge the genuineness of such reported privacy attitudes and emphasize the need for experiments that allow for an observation of actual online disclosure behavior.
8. Either Personalization or Privacy?
• Personal data of computer users are indispensable for personalized interaction
• Computer users are reluctant to give out personal data
☛ Tradeoff between privacy and personalization?
9. The tension between privacy and personalization is more complex than that…
• Indirect relationship between privacy and personalization
• Situation-dependent
• Many mitigating factors
People use a “privacy calculus” to decide whether or not to disclose personal data, e.g. for personalization purposes.
10. Privacy-Enhanced Personalization
How can personalized systems maximize their personalization benefits while at the same time complying with the privacy constraints that are in effect?
Can we have good personalization and good privacy at the same time?
11. Privacy constraints,
and how to deal with them
Privacy constraints
A. People’s privacy preferences in a given situation
(and factors that influence them)
B. Privacy norms (laws, self-regulation, principles)
Reconciliation of privacy and personalization
1. Use of privacy-enhancing technology
2. Privacy-minded user interaction design
12. Privacy constraints,
and how to deal with them
Privacy constraints
A. People’s privacy preferences in a given situation
(and factors that influence them)
B. Privacy norms (laws, self-regulation, principles)
Reconciliation of privacy and personalization
1. Use of privacy-enhancing technology
2. Privacy-minded user interaction design
13. A. Factors that influence people’s willingness to disclose information
Information type
– Basic demographic and lifestyle information, personal tastes, hobbies
– Internet behavior and purchases
– Extended demographic information
– Financial and contact information
– Credit card and social security numbers
Data values
– Willingness to disclose certain data decreases with deviance from the group average
– Confirmed for age, weight, salary, spousal salary, credit rating and amount of savings
14. Valuation of personalization
“the consumers’ value for personalization is almost two times […] more influential than the consumers’ concern for privacy in determining usage of personalization services” (Chellappa and Sin 2005)
What types of personalization do users value?
• time savings, monetary savings, pleasure
• customized content, remembering preferences
15. Trust
Trust positively affects both people’s stated willingness to provide personal information to websites and their actual information disclosure to experimental websites.
Antecedents of trust
• Past experience (one’s own and that of others); established, long-term relationships are especially important
• Design and operation of a website
• Reputation of the website operator
• Presence of a privacy statement / seal
• Awareness of and control over the use of personal data
16. Privacy constraints,
and how to deal with them
Privacy constraints
A. People’s privacy preferences in a given situation
(and factors that influence them)
B. Privacy norms (laws, self-regulation, principles)
Reconciliation of privacy and personalization
1. Use of privacy-enhancing technology
2. Privacy-minded user interaction design
17. B. Privacy norms
• Privacy laws: more than 40 countries and 100 states and provinces worldwide
• Industry self-regulation: companies, industry sectors (NAI)
• Privacy principles
– supra-national (OECD, APEC)
– national (Australia, Canada, New Zealand…)
– member organizations (ACM)
Several privacy laws disallow a number of frequently used personalization methods unless the user consents.
18. Privacy constraints,
and how to deal with them
Privacy constraints
A. People’s privacy preferences in a given situation
(and factors that influence them)
B. Privacy norms (laws, self-regulation, principles)
Reconciliation of privacy and personalization
1. Use of privacy-enhancing technology
2. Privacy-minded user interaction design
20. Privacy-Enhancing Technology for Personalized Systems
Caveats
• PETs are not “complete solutions” to the privacy problems posed by personalized systems
• Their presence is also unlikely to “charm away” users’ privacy concerns
PETs should be used judiciously, within a user-oriented system design that takes both privacy attitudes and privacy norms into account.
21. Pseudonymous users and user models
Users of personalized systems can receive full personalization and yet remain anonymous: users are unidentifiable, unlinkable and unobservable for third parties, but linkable for the personalized system.
Pros (+)
• In most cases, privacy laws no longer apply when users cannot be identified with reasonable means.
• Pseudonymous users may be more inclined to disclose personal data (but this still needs to be shown!), leading to better personalization.
Cons (–)
• Anonymity is currently difficult and/or tedious to preserve when payments, physical goods and non-electronic services are exchanged.
• Harbors the risk of misuse.
• Hinders vendors from cross-channel personalization.
• Anonymity of database entries, web trails, query terms, ratings and texts can be surprisingly well defeated by an attacker who has access to identified data that can be partly matched with the “anonymous” data.
A minimal sketch of pseudonymous profile storage follows below.
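Here is a minimal sketch of the idea (hypothetical names, Python): the server keys the user model by a random pseudonym that only the client holds, so sessions remain linkable for the personalized system but cannot be tied to the user's real identity by third parties who see only the server's data.

```python
import secrets

# Server-side store: profiles are keyed by pseudonym, never by identity.
profiles = {}

def new_pseudonym():
    # A random, meaningless identifier: linkable for the personalized
    # system (the client presents it on every visit), but unlinkable
    # to the user's real identity from the server's data alone.
    return secrets.token_hex(16)

def record_rating(pseudonym, item, rating):
    profiles.setdefault(pseudonym, {})[item] = rating

# Client side: the pseudonym is generated and stored only on the
# user's machine; the server never learns who is behind it.
my_pseudonym = new_pseudonym()
record_rating(my_pseudonym, "book-4711", 5)
print(profiles[my_pseudonym])
```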
22. Client-side personalization
Users’ data are all located at the client rather than the server side. All personalization processes that rely on these data are carried out exclusively at the client side.
Pros (+)
• A website with personalization at the client side only is generally not subject to privacy laws.
• Users may be more inclined to disclose their personal data if personalization is performed locally, upon locally stored data only.
Cons (–)
• Popular user modeling and personalization methods that rely on an analysis of data from the whole user population can no longer be applied, or will have to be radically redesigned.
• Server-side personalization algorithms often incorporate confidential rules / methods and must be protected from unauthorized access (e.g., through trusted computing platforms).
A minimal sketch of client-side personalization follows below.
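A minimal sketch under these assumptions (hypothetical names; Python standing in for code running on the user's machine): the profile lives in a local file, generic catalog items come from the server, and the interest-based ranking happens entirely on the client.

```python
import json
from pathlib import Path

PROFILE = Path("local_profile.json")  # user data stays on the client

def load_profile():
    return json.loads(PROFILE.read_text()) if PROFILE.exists() else {"views": {}}

def record_view(profile, category):
    profile["views"][category] = profile["views"].get(category, 0) + 1
    PROFILE.write_text(json.dumps(profile))  # persisted locally only

def rank_items(profile, items):
    # Personalization runs locally: the ranking by the user's
    # interests never leaves the client.
    return sorted(items, key=lambda i: -profile["views"].get(i["category"], 0))

profile = load_profile()
record_view(profile, "travel")
catalog = [{"title": "Cooking 101", "category": "food"},
           {"title": "Hiking the Alps", "category": "travel"}]
print([i["title"] for i in rank_items(profile, catalog)])
```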
23. Privacy-enhancing techniques for collaborative filtering
• Traditional collaborative filtering systems collect large amounts of user data in a central repository (product ratings, purchased products, visited web pages).
• Central repositories may not be trustworthy and can be targets for attacks.
Privacy-protecting techniques
• Distribution: no central repository, but interacting distributed clusters that each contain information about a few users only.
• Aggregation of encrypted data:
– Users privately maintain their own individual ratings.
– A community of users computes an aggregate of their private data with homomorphic encryption and peer-to-peer communication.
– The aggregate allows personalized recommendations to be generated at the client side.
• Perturbation: ratings are systematically altered before submission to the central server, to hide users’ true values from the server (see the sketch below).
• Obfuscation:
– A certain percentage of users’ ratings are replaced by different values before the ratings are submitted to a central server for collaborative filtering.
– Users can then “plausibly deny” the accuracy of any of their data.
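A minimal sketch of the perturbation variant (Python; the noise model and parameters are illustrative assumptions, not from the slides): each user adds zero-mean noise to their ratings before submission, so individual values are hidden while the noise roughly cancels in aggregates computed over many users.

```python
import random

def perturb(ratings, sigma=1.0):
    # Perturbation: add zero-mean Gaussian noise to each rating before
    # submission, so the server never sees a user's true values.
    return [r + random.gauss(0, sigma) for r in ratings]

# Many users submit perturbed ratings for the same item; because the
# noise has zero mean, the item's average stays usable for
# collaborative filtering even though individual ratings are hidden.
true_ratings = [random.choice([1, 2, 3, 4, 5]) for _ in range(10_000)]
submitted = perturb(true_ratings)
print(sum(true_ratings) / len(true_ratings),
      sum(submitted) / len(submitted))  # the two means are close
```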
24. Privacy constraints,
and how to deal with them
Privacy constraints
A. People’s privacy preferences in a given situation
(and factors that influence them)
B. Privacy norms (laws, self-regulation, principles)
Reconciliation of privacy and personalization
1. Use of privacy-enhancing technology
2. Privacy-minded user interaction design
25. 2. Privacy-minded user interaction design
Two experiments on the privacy-minded communication of websites’ privacy practices
26. Current “industry standard”: Privacy statements
Privacy statements…
• are regarded as very important by 76% (DTI 2001)
• make 55% more comfortable providing personal information (Roy Morgan 2001, Gartner 2001)
• are claimed to be viewed by 73% [“always” by 26%] (Harris Interactive 2001)
• are actually viewed by only 0.5% (Kohavi 2001) to 1% (Reagan 2001)
• are several readability levels too difficult (Lutz 04, Jensen & Potts 04)
27. Our counterproposal: A design pattern for personalized websites that collect user data
Design patterns are descriptions of best practices based on research and application experience; they give designers guidelines for the efficient and effective design of user interfaces.
Every personalized site that collects user data should include the following elements on every page:
1. Traditional link to a global privacy statement
• Still necessary for legal reasons
2. Additionally, contextualized local communication of privacy practices and personalization benefits
• Break long privacy policies into small, understandable pieces
• Relate them specifically to the current context
• Explain privacy practices as well as personalization benefits
(a sketch of such per-field contextual notes follows below)
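As an illustration only (all field names and explanation texts below are invented, not taken from the studied bookstore), the pattern amounts to pairing every data-entry field with its own local privacy-practice and benefit note, plus the traditional policy link:

```python
# Each data-entry field carries its own local explanation, broken out
# of the global privacy policy (hypothetical field names and texts).
FIELDS = {
    "favorite_genres": {
        "practice": "Stored under your account; never shared with third parties.",
        "benefit":  "Lets us recommend new releases in genres you enjoy.",
    },
    "email": {
        "practice": "Used only to send alerts you explicitly request.",
        "benefit":  "You can receive tailored price-drop notifications.",
    },
}

def render_field(name):
    field = FIELDS[name]
    return (f"[{name}: ________]\n"
            f"  Why we ask: {field['benefit']}\n"
            f"  What we do with it: {field['practice']}\n"
            f"  Full policy: /privacy")  # traditional global link

print(render_field("favorite_genres"))
```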
28. An example webpage based on the proposed design pattern
[Screenshot, annotated with:]
• Traditional link to a privacy statement
• Explanation of privacy practices
• Explanation of personalization benefits
29. The tested “experimental new version of an online bookstore”
[Screenshot, annotated with:]
• Contextualized short description of relevant privacy practices (taken from the original privacy statement)
• Contextualized short description of relevant personalization benefits (derived from the original privacy statement)
• Links to the original privacy statement (split into privacy, security and personalization notices)
• “Selection counter”
30. Traditional version, with link to privacy statement only
[Screenshot, annotated with:]
• Links to the original privacy statement (split into privacy, security and personalization notices)
• “Selection counter”
31. Why not simply ask users what they like better?
• Inquiry-based empirical studies reveal aspects of users’ rationale that cannot be inferred from mere observation.
• Observational empirical studies reveal actual user behavior, which may differ from users’ stated behavior.
This divergence seems to be quite substantial in the case of stated privacy preferences vs. actual privacy-related behavior.
32. Experimental Procedures (partly based on deception)
1. Instructions to subjects
• “Usability test with a new version of a well-known online book retailer”
• Answering questions would allegedly yield better book recommendations
• No obligation to answer any question, but answering helps produce better recommendations
• Data that subjects entered would purportedly be available to the company
• Possibility to buy one of the recommended books at a 70% discount
• Reminder that if they buy a book, their ID card and bank/credit card would be checked (subjects were instructed beforehand to bring these documents if they wished to buy)
2. Answering interest questions in order to “filter the selection set” (anonymous)
• 32 questions with 86/64 answer options were presented (some free-text)
• Most questions were about users’ interests (a very few were fairly sensitive)
• All “make sense” in the context of filtering books that are interesting for readers
• Answering questions decreased the “selection counter” in a systematic manner
• After nine pages of data entry, users were encouraged to review their entries, and then to view those books that purportedly matched their interests
33. Experimental Procedures (cont’d)
3. “Recommendation” of 50 books (anonymous)
• 50 predetermined and invariant books were displayed (popular fiction, politics, travel, sex and health advisories)
• Selected based on their low price and their presumed attractiveness to students
• Prices of all books were visibly marked down by 70%, resulting in out-of-pocket expenses between €2 and €12 for a book purchase
• Extensive information on every book was available
4. Purchase of one book (identified)
• Subjects could purchase one book if they wished
• Those who did were asked for their names, shipping and payment data (bank account or credit card charge)
5. Completing questionnaires
6. Verification of name, address and bank data (if a book was purchased)
34. Subjects and experimental design
• 58 economics/business/MIS students of Humboldt University, Berlin, Germany (data of 6 students were eventually discarded)
• Randomly assigned to one of two system versions:
a) 26 used the traditional version (which only had links to global descriptions of privacy, personalization and security)
b) 26 used the contextualized version (with additional local information on privacy practices and personalization benefits for each entry field)
Hypotheses:
– Users will be more willing to share personal data
– Users will regard the enhanced site more favorably in terms of privacy
35. Results
Observations:
• Questions answered: 84% (control) vs. 91% (with contextual explanations), +8%, p = 0.001
• Answers given: 56% vs. 67%, +20%, p = 0.001
• Book buyers: 58% vs. 77%, +33%, p = 0.07
Perceptions:
• “Privacy has priority”: 3.34 vs. 3.95, +18%, p = 0.01
• “Data is used responsibly”: 3.62 vs. 3.91, +8%, p = 0.12
• “Data allowed store to select better books”: 2.85 vs. 3.40, +19%, p = 0.035
36. Privacy, Trust and Personalization
[Diagram: a model with positive (+) links among the factors Understanding, Control of own data, Trust, Perceived privacy, Perceived benefits, Willingness to disclose personal data, Quality of personalization, and Purchases.]
37. Other factors to be studied
[Results table from slide 35 repeated]
• Site reputation
• Is privacy or benefit disclosure more important?
• Stringency of privacy practices
• Permanent visibility of contextual explanations
• References to the full privacy policy
38. Privacy Finder (Cranor et al.)
P3P
• P3P is a W3C standard that allows a website to create a machine-readable version of its privacy policy
• P3P policies can be automatically downloaded and evaluated against an individual’s privacy preferences
Privacy Finder is a search engine that
• displays the search results
• evaluates the P3P policies of the retrieved sites
• compares them with the user’s individual preferences
• displays this information graphically
A simplified sketch of such a policy-vs-preferences check follows below.
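The sketch below (Python) is a much-simplified stand-in for the real P3P/APPEL semantics: the practice vocabulary and matching logic are invented for illustration. Each retrieved site's declared data practices are compared against the user's preferences and summarized per result, as Privacy Finder does graphically.

```python
# Simplified stand-in for P3P: a site's policy and a user's preferences
# are expressed over the same (made-up) vocabulary of data practices.
user_prefs = {"contact": "deny", "purchase": "allow", "browsing": "ask"}

def evaluate(policy):
    # Look up the user's preference for each practice the site declares;
    # unknown practices default to "ask".
    verdicts = [user_prefs.get(practice, "ask") for practice in policy]
    if "deny" in verdicts:
        return "conflict"
    return "ok" if all(v == "allow" for v in verdicts) else "ask user"

search_results = [
    {"url": "https://shop-a.example", "policy": ["purchase"]},
    {"url": "https://shop-b.example", "policy": ["purchase", "contact"]},
]
for r in search_results:
    print(r["url"], "->", evaluate(r["policy"]))
```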
39. Will online shoppers pay a premium for privacy? (Tsai et al., WEIS 2007)
• Participants were adults from the general Pittsburgh population (12.5% privacy-unconcerned respondents were screened out)
• They received a flat fee of $45 and had to buy a battery pack and a sex toy online (each about $15 with shipping); they could keep the rest
• Three conditions, each with 16 subjects
• Subjects had to enter a predetermined search expression and were shown a list of real search results that were artificially ranked and supplemented with additional information:
– in condition no-info: total price only (with shipping)
– in condition irrelevant-info: total price, accessibility rating
– in condition privacy-info: total price, privacy rating
42. Results
1. Between conditions no-info and irrelevant-info: no significant difference in average purchase price.
2. Between conditions no-info and privacy-info: users with privacy information paid an average premium of $0.59 for batteries and $0.62 for sex toys (p < 0.0001).
3. In the absence of prominent privacy information, people purchase where the price is lowest. Percentage of users who bought at the cheapest site:
• no-info: 83% (batteries), 80% (sex toys)
• irrelevant-info: 75% (batteries), 67% (sex toys)
• privacy-info: 22% (batteries), 33% (sex toys)
44. There is no magic bullet for
reconciling personalization with privacy
Effort is comparable to
… making systems secure
… making systems fast
… making systems reliable
45. Privacy-Enhanced Personalization: Need for a process approach
1. Gain the user’s trust
– Respect the user’s privacy attitude (and let the user know)
• Respect privacy laws / industry privacy agreements
– Provide benefits (including optimal personalization within the given privacy constraints)
– Increase the user’s understanding (don’t do magic)
– Use trust-enhancing methods
– Give users control
– Use privacy-enhancing technology (and let the user know)
2. Then be patient: most users will incrementally come forward with personal data / permissions if the usage purpose for the data and the ensuing benefits are clear and valuable enough to them.
47. Existing approaches for catering to privacy constraints
• Largest permissible denominator (e.g., Disney)
– Infeasible if a large number of jurisdictions are involved, since the largest permissible denominator would be very small
– Individual preferences not taken into account
• Different country/region versions (e.g., IBM)
– Infeasible as soon as the number of countries/regions, and hence the number of different versions of the personalized system, increases
– Individual preferences not taken into account
• Anonymous personalization (users are not identified)
– Nearly full personalization possible
– Harbors the risk of misuse
– Somewhat difficult to implement if physical shipments are involved
– Practical extent of protection unclear
– Individual user preferences not taken into account
48. Our approach
Seeking a mechanism to dynamically select user modeling components that comply with the currently prevailing privacy constraints.
49. User modeling component pool
Different methods differ in their data requirements, quality of predictions, and also their privacy implications.
• UMC1: clustering
• UMC2: rule-based reasoning
• UMC3: fuzzy reasoning with uncertainty
• UMC4: rule-based reasoning
• UMC5: fuzzy reasoning with uncertainty
• UMC6: incremental machine learning
• UMC7: one-time machine learning across sessions
• UMC8: one-time machine learning + fuzzy reasoning with uncertainty
[The original table also marks, for each component, which combination of demographic data, user-supplied data, and visited pages it uses.]
50. Product line architecture
• “The common architecture for a set of related products or systems developed by an organization.” [Bosch, 2000]
• A PLA includes
– a stable core: basic functionalities
– options: optional features/qualities
– variants: alternative features/qualities
• Dynamic runtime selection (van der Hoek 2002): a particular architecture instance is selected from the product-line architecture (see the selection sketch below)
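A minimal sketch of such dynamic runtime selection (Python; the component descriptions and data-source assignments are hypothetical, since the pool table's exact assignments are not given here): only those components whose data requirements are permitted under the prevailing privacy constraints are instantiated.

```python
# Hypothetical component pool: each entry names a user modeling
# component, its method, and the data sources it requires.
POOL = [
    {"name": "UMC1", "method": "clustering",           "needs": {"demographic"}},
    {"name": "UMC2", "method": "rule-based reasoning", "needs": {"user-supplied"}},
    {"name": "UMC6", "method": "incremental ML",       "needs": {"visited-pages"}},
    {"name": "UMC7", "method": "one-time ML",          "needs": {"visited-pages",
                                                                 "user-supplied"}},
]

def select_components(allowed_data):
    # Instantiate only those variants of the product line whose
    # required inputs are all permitted under the current constraints
    # (laws, industry regulations, and the user's own preferences).
    return [c for c in POOL if c["needs"] <= allowed_data]

# Example: the user consented to supplying data themselves, and
# demographic data may be used, but clickstream logging is forbidden.
allowed = {"demographic", "user-supplied"}
print([c["name"] for c in select_components(allowed)])  # ['UMC1', 'UMC2']
```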
56. Roadmap for Privacy-Enhanced
Personalization Research
• Study the impacts of privacy laws, industry
regulations and individual privacy preferences on
the admissibility of personalization methods
• Provide optimal personalization while respecting
privacy constraints
• Apply state-of-the-art industry practice for
managing the combinatorial complexity of privacy
constraints
57. Readings…
• A. Kobsa: Privacy-Enhanced Web Personalization. In P. Brusilovsky, A. Kobsa, W. Nejdl, eds.: The Adaptive Web: Methods and Strategies of Web Personalization. Springer Verlag.
• A. Kobsa: Privacy-Enhanced Personalization. Communications of the ACM, Aug. 2007.