1. Identity vs Reputation
What You Will Learn
This paper covers individual identities on the WWW and how tracking users’ interactions can improve
their experience without sacrificing their privacy.
Introduction
Web users have become increasingly savvy about protecting their identity and privacy. At the same
time, web site operators have become savvy about amassing large amounts of customer data and
finding trends to customize user experiences and offerings. To be successful at this, web site operators
need to respect the privacy needs of their users while collecting the information they need to improve
their business. Violation of users’ privacy can result in the loss of the customer as well as government
intervention. This paper provides one approach to meeting these seemingly conflicting goals.
Definitions
Defining the terms around identity and privacy is of critical importance.
Online Identity: A person’s distinct individual online persona. It usually doesn’t include any Personally
Identifiable Information (PII) but consists of a shell – including a nickname and an avatar.
Authentication: The process in which a person’s identity is confirmed online using a verifiable source to
admit them into an online community or website.
Verifiable Source: A verifiable source may be as simple as providing an email address, or may be as
significant as providing a credit card number.
Authorization: The process by which a person becomes approved to enter a website or program,
usually with a user name and/or password.
Personally Identifiable Information (PII): A term used in privacy and legal fields that refers to any
information that can identify a person as a specific individual, such as name, postal or email address,
phone number, occupation, or personal interests. It does not include web pages viewed or links clicked
on, web search terms, time spent on a site, response to advertisements, or system settings such as the
browser used, speed of connection and zip code.
Sensitive Personal Information (SPI): Any information that would permit access to a person’s financial
account, including account number, credit or debit card number, in combination with any required
security code, access code or password.
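The definitions above can be turned into a simple field classifier. The sketch below is illustrative only: the field names and the `Category`, `classify`, and `partition` helpers are hypothetical and not part of this paper. As a conservative simplification, it treats each financial credential as SPI on its own, even though the definition speaks of credentials in combination.

```python
from enum import Enum
from typing import Dict


class Category(Enum):
    PII = "personally identifiable information"
    SPI = "sensitive personal information"
    NON_PII = "usage data outside the PII definition"
    UNCLASSIFIED = "needs manual review"


# Hypothetical field names, grouped according to the definitions above.
PII_FIELDS = {"name", "postal_address", "email_address", "phone_number",
              "occupation", "personal_interests"}
SPI_FIELDS = {"account_number", "credit_card_number", "debit_card_number",
              "security_code", "access_code", "password"}
NON_PII_FIELDS = {"pages_viewed", "links_clicked", "search_terms",
                  "time_on_site", "ad_responses", "browser",
                  "connection_speed", "zip_code"}


def classify(field_name: str) -> Category:
    """Map a field name to a category; SPI is checked first as the strictest."""
    if field_name in SPI_FIELDS:
        return Category.SPI
    if field_name in PII_FIELDS:
        return Category.PII
    if field_name in NON_PII_FIELDS:
        return Category.NON_PII
    return Category.UNCLASSIFIED


def partition(record: Dict[str, object]) -> Dict[Category, Dict[str, object]]:
    """Split one record into per-category buckets so each can be stored separately."""
    buckets: Dict[Category, Dict[str, object]] = {c: {} for c in Category}
    for name, value in record.items():
        buckets[classify(name)][name] = value
    return buckets
```

Anything that lands in `Category.UNCLASSIFIED` is left for a person to review, rather than being silently treated as safe to collect.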
2. Why Privacy is Important / Why it Matters
The rise of web based applications has made it easy for companies to determine information about their
customers – ranging from their basic demographics to their personal preferences. While this information
can be gathered explicitly through surveys and forms, it can also be inferred through the user’s actions.
The results may benefit the user in the end, but the method may make them uncomfortable and cause
them to leave the website. The ultimate challenge is balancing the needs of both parties. At the end of
the day though, privacy is measured by the end consumer’s reaction to their experience.
What The User Reveals
In order to meet the requirements of the Children’s Online Privacy Protection Act (COPPA) of 1998,
users may be asked to enter their birthday to verify their eligibility to access certain content. Users
are comfortable revealing this and other basic demographics in order to access many of their favorite
sites. They do, however, make a conscious decision to limit what they reveal on a site to what they feel is
necessary for the experience. When prompted for information that they feel is unnecessary, users will
typically provide incorrect information about such things as their birthday or gender.
At the same time, when it comes to social networking sites such as LinkedIn and Facebook, there is a
social norm which causes people to reveal much more accurate information. When there are personal
relationships involved, people feel compelled to provide their real birth date or gender information.
When pictures can be uploaded, the accuracy of the basic information increases even more since
deceptions are more likely to be uncovered.
Beyond the basic information, the accuracy of what users reveal about themselves is influenced far more
by social status and peer pressure than by anything else. Stereotypes can be readily found in
individual profiles: for example, men expressing interest in action movies and sports, college students
talking about parties, and women liking romantic movies.
Another area that causes concern for individuals is what they reveal from a financial perspective. As a
result, they often provide false information. Beyond the basic PII information, users may misrepresent
their financial status to boost their self-esteem or to assert themselves as a member of a particular
group. Ironically, this is one piece of information that companies are most interested in to ensure that
they target the right product to the right user.
The most useful information is what a user does when online. Some of the obvious examples are
purchasing choices that are indicative of gender such as a purse or a wallet. More subtle ones come
from participation in groups that have an obvious bias such as a retirees’ discussion group or a visit to an
ecological travel site. These actions when combined with photos that a person may have on his/her Flickr
account or messages posted to online discussion groups can provide a more complete understanding
of an individual.
What is significant here is that the information doesn’t have to come from the user directly. The ability of
Facebook users to tag a photo with all the people in it means that this information can be made available
without the user taking any action.
3. Historical Mistakes
The stakes are huge for companies to get identity and privacy right. Over the past few years, a number
of high-profile incidents where PII or SPI was accidentally revealed to the public have been broadly
publicized.
In 2006, America Online (AOL) released the records of 20 million search keywords entered by
approximately 650,000 of its users over a three-month period. While the users were not personally
identified, per se, their searches contained a wealth of PII. Within only a few days, New York Times
journalists had determined the identities of many of the searchers, and with permission, revealed
the identity of one of the users.1 That user, a 62-year-old Georgia woman, had conducted over 300
searches that were traced back to her, some of which were embarrassing to her. The AOL incident was
devastating to the company.
Similarly in 2006, Netflix released over 100 million movie ratings made by 500,000 of the company’s
subscribers. To protect its customers’ privacy, the data was made anonymous by removing any personal
details. Only a few weeks later, Arvind Narayanan and Vitaly Shmatikov announced that they had
de-anonymized the data by comparing it with publicly available ratings on a movie database called the
Internet Movie Database.2
Most recently, Facebook faced an uproar of criticism over its Beacon advertising program, which pulls
information from external websites and shares that information with Facebook users’ friends.
Controversy swiftly followed Beacon’s launch over privacy concerns because the mechanism to opt in or
out of the program was not clear. Fortunately for Facebook, the concern over Beacon did not doom the
program. In fact, it continues to operate today, but with a higher level of control given to end users
to permit the sharing of their information.
Customer Benefits
While data gathering primarily provides feedback to advertisers and content providers about trends and
product interest, it also provides a significant benefit to all users. When users express similar
interests, content providers can respond by creating new products or modifying old products to meet the
newly discovered interests.
This is most evident in the local grocery store, which pays extremely close attention to the aggregate
buying habits of its customers in order to ensure that the right products are always on the shelves. No
company wants to create a product for a single user or even track the habits of one person. They are
looking for a solution that will maximize profits through the largest audience possible.
[Figure: Customer Driven Quality Improvement Process. Legible labels: Define Goals, Best Practices, Metrics.]
At the same time, users expect more of a personalized, intuitive experience. Users see a real
improvement in their experience when they perceive value in return for what they reveal. Users are
willing to let Amazon track their purchasing habits because they get better recommendations as a
result. They provide accurate ratings to Netflix in order to improve the quality of the movies that it
suggests to them. The key to all of this is making it obvious to the user that they are the ones
benefiting.
With this in mind it is important for web site operators to remember that the personal data belongs to
the end user. If the user perceives sufficient value for providing the information they will readily
reveal it. If users are forced to reveal information before they are ready, they will either provide
inaccurate information or choose to go elsewhere; in either case the loss falls on the web site, which
either pollutes its trend data or loses the customer. A better approach would be to let users clearly
retain their privacy, stay with the site, and accept lower-quality recommendations.
Eventually when the user hears about or otherwise realizes the value of the sharing, they will gladly
provide accurate information. In return these users expect that information to be kept private. It is
when this trust is broken that users will react – when this reaction becomes an uproar, the government
gets involved and creates new laws to ensure that privacy is protected.
4. Privacy Laws and Reactions
Privacy laws in the United States and across the globe are inconsistent and continue to evolve. In
contrast to the European Union, in the United States there is no over-arching privacy law in place.
Instead, the United States takes a more laissez-faire approach that targets specific sectors, relying on
a combination of legislation, regulation, and self-regulation. For example, U.S. laws are in place to
address medical privacy, financial institution privacy and children’s privacy.
The EU has a comprehensive law4 reflecting the EU’s philosophy that while data processing is beneficial,
an individual’s fundamental privacy rights must be protected. Many consider the EU to have the most
restrictive privacy laws of any jurisdiction worldwide. Importantly, the EU regulations are implemented
by each individual member state, which has led to different interpretations and governing regulations.
While privacy has historically been given low priority in Asia, economic concerns, in particular the
desire to establish consumer trust in online commerce, have driven a surge in attention to privacy there
(see “Asia: the new Thought Leader in Privacy?”). In 2004, the Asia Pacific Economic Cooperation group
(APEC) approved a set of non-binding privacy principles to assist governments in passing comprehensive
privacy legislation. In contrast, Central American nations tend to take a sectoral approach to privacy
laws.5 6
Privacy laws and regulations continue to react to the marketplace, with new technologies and processes
leading to more stringent regulation. For instance, the recent emergence of behavioral targeting has
raised the ire of privacy regulators. Service providers along with two companies, Phorm in the UK and
NebuAd in the US, have recently found themselves embroiled in controversy over plans to target customers
with advertisements based on their prior web surfing behaviors.7 8 Both companies planned to install
deep packet inspection equipment on ISP networks that would monitor subscribers’ online activities,
build behavioral profiles, and sell the profiles to advertisers who could use the profiles to deliver
targeted ads.
Privacy regulators in the EU and the United States questioned whether the companies obtained informed
consent from end users. BT deployed its system without the knowledge of affected users. European Union
Communications Commissioner, Viviane Reding, voiced her concern that the practice breached the EU
Privacy and Electronic Communications Regulations 2003 (PECR), which implement European Directives on
wiretapping, saying “[i]t is very clear in E.U. directives that unless someone specifically gives
authorization (to track consumer activity on the Web) then you don’t have the right to do that.”
Charter Communications notified affected customers, who could opt out of the program. However, public
interest groups claimed the opt-out system did not prevent users’ activities from being monitored. Two
members of the United States House of Representatives wrote a letter to Charter expressing their concern
that “[a]ny service to which a subscriber does not affirmatively subscribe and that can result in the
collection of information about the web-related habits and interests of a subscriber, and achieves any
of these results with the ‘prior written or electronic consent of the subscriber,’ raises substantial
questions related to Section 631 [of the Communications Act].” Behavioral targeting has advanced over
the years to provide a much more complete view of users’ behaviors. While the behavioral targeting
industry has attempted to educate consumers on the benefits of having content tailored to individuals,
there are still many concerns over transparency, the ability to easily opt out, and how opt-out data is
discarded. As a result, regulators and lawmakers have proposed legislation and regulation to address the
privacy concerns around behavioral targeting.
5. Stratification
It is possible for the industry to meet all of these regulations, the desires of a company, and the needs
of the user by taking a layered approach to the information about a user and the collective actions of a
community. To accomplish this the concept of who a user is can be broken down into three levels.
1. Identity – Used for authentication and authorization
2. Profile (or Persona) – Used to describe an individual
3. Uniquity – Unique identifier used to collect actions
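The three levels listed above can be pictured as three separate record types. This is a minimal sketch under stated assumptions: the class and field names (`Identity`, `Profile`, `Uniquity`, `credential_hash`, `uniquity_id`) and the use of a random UUID as the unique identifier are illustrative choices, not something the paper prescribes.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class Identity:
    """Level 1: used only for authentication and authorization."""
    user_id: str
    credential_hash: bytes  # a derived value, never the raw password


@dataclass
class Profile:
    """Level 2: describes the individual; any PII would live here."""
    nickname: str
    avatar: str = ""
    interests: List[str] = field(default_factory=list)


@dataclass
class Uniquity:
    """Level 3: an opaque identifier plus the actions collected under it."""
    # Assumption: a random hex UUID stands in for the unique identifier.
    uniquity_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    actions: List[str] = field(default_factory=list)

    def record(self, action: str) -> None:
        """Append one action; no identity or profile data is stored here."""
        self.actions.append(action)


# Usage: actions accumulate under the opaque identifier only.
shopper = Uniquity()
shopper.record("viewed:hiking-boots")
shopper.record("purchased:trail-map")
```

Keeping the third record free of any identity or profile field is the point of the design: trend analysis can run over `Uniquity` records alone.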
Identity
At the highest level is a user’s identity. This is how users prove that they are who they say they are. It is
often represented as a combination of a userid and a password, but also can be authenticated through
identification cards, biometrics, certificates, encryption keys, or other security mechanisms. To a user,
this is the most valuable thing that they have because if someone else gets it from them, the user
stands to lose a lot. Given their value, these authentication credentials are a common target of phishing
attacks.
This identity is often shared among multiple web sites, particularly when a site’s default identity is an
email address to which it sends a verification message, a step that requires no third-party
involvement. OpenID technologies allow a user to use a common identity to
access multiple sites. One weakness in using an email address/password combination to authenticate
a user is that a compromise of any one site may expose the user’s identity on every site where the same
combination is used.
To a web site operator, the identity portion has very little value other than to authenticate the user.
However, protecting it requires attention to security of not just the data but the actual mechanisms of
authentication. This is necessary to give confidence to the end user that their personal information
can’t be compromised.
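One common way to limit the damage when a site is compromised is to store only a salted, slow hash of each password rather than the password itself. The sketch below uses PBKDF2 from the Python standard library. The iteration count and the function names `hash_password` and `verify_password` are assumptions made for illustration; the paper does not specify a mechanism.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 200_000  # illustrative work factor (an assumption); tune for real use


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key). Store both; never store the password."""
    if salt is None:
        salt = secrets.token_bytes(16)  # fresh random salt for each user
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)
```

Because each user gets a different salt, two users with the same password produce different stored values, and an attacker who steals the table still has to guess each password separately.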
Profile
Given the identity, the user has access to their profile(s), where all of their PII and information about
their friends, interests, groups and preferences are stored. This information is still valuable to the
user: if it is stolen, someone can impersonate the user, with the amount of risk depending on how much SPI
is taken.
It is important to note that the relationship between an Identity and a Profile is one way. Given an
Identity, it is possible to determine a profile, but starting from a profile does not yield the credentials
that the user gave to create it. This one-way relationship also works in that a user may have multiple
profiles based on the situation that they are in. For example, the user may choose to have a different
public name or picture on a team’s fan site versus when they are on a cooking related site.
The amount of information that they reveal in their profile can vary from site to site based on the user’s
perceived value from the site. Furthermore, as a user creates multiple profiles for the different sites that
they want to participate in, it is incumbent on the user to keep them in sync.
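The one-way relationship between an Identity and its Profiles can be approximated with a keyed hash. In this sketch the storage key for a profile is derived from the user id and a context name using HMAC-SHA256. The names `profile_key` and `ProfileStore`, the `site_secret` parameter, and the choice of HMAC are all assumptions for illustration; the paper does not say how the relationship is implemented.

```python
import hashlib
import hmac
from typing import Dict, List, Optional


def profile_key(site_secret: bytes, user_id: str, context: str) -> str:
    """Derive an opaque storage key from an identity and a context (e.g. a site)."""
    message = (context + "\x00" + user_id).encode("utf-8")
    return hmac.new(site_secret, message, hashlib.sha256).hexdigest()


class ProfileStore:
    """Keeps profiles under derived keys, so stored rows never carry the user id."""

    def __init__(self, site_secret: bytes) -> None:
        self._secret = site_secret
        self._profiles: Dict[str, Dict[str, str]] = {}

    def save(self, user_id: str, context: str, profile: Dict[str, str]) -> None:
        self._profiles[profile_key(self._secret, user_id, context)] = dict(profile)

    def load(self, user_id: str, context: str) -> Optional[Dict[str, str]]:
        return self._profiles.get(profile_key(self._secret, user_id, context))

    def stored_keys(self) -> List[str]:
        return list(self._profiles)


# Usage: one identity, two profiles for two different situations.
store = ProfileStore(b"demo-secret")
store.save("pat@example.com", "fan-site", {"nickname": "GoTeam"})
store.save("pat@example.com", "cooking", {"nickname": "ChefPat"})
```

Starting from the identity, a profile is found by recomputing the key. Starting from a stored key alone, the identity is not recoverable. The caveat is that anyone who holds the secret and can guess candidate user ids can recompute keys, so the secret needs the same care as the credentials themselves.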
Uniquity
At the lowest level we propose the concept of uniquity, which represents a collection of a user’s actions.
It is important that it does not contain any PII. What it does contain is a collection of actions that an
anonymous user has taken. Like the relationship between the Identity and the Profile, you cannot get