3. Secondary Use/Disclosure
[Diagram: collection from individuals by the custodian; use by the custodian or its agent; disclosure to a recipient]
Electronic Health Information Laboratory, CHEO Research Institute, 401 Smyth Road, Ottawa K1H 8L1, Ontario; www.ehealthinformation.ca
4. Data Flows
• Mandatory disclosures
• Uses by an agent for secondary
purposes
• Permitted discretionary disclosures for
secondary purposes
• Other disclosures for secondary
purposes
5. Obtaining Consent - I
• Sometimes it is not possible or
practical to obtain consent:
– Making contact to obtain consent may
reveal the individual’s condition to others
against their wishes
– The size of the population may be too large
to obtain consent from everyone
– Many patients may have relocated or died
6. Obtaining Consent - II
– There may be a lack of existing or
continuing relationship with the patients
– There is a risk of inflicting psychological,
social or other harm by contacting
individuals or their families in delicate
circumstances
– It would be difficult to contact individuals
through advertisements and other public
notices
7. Impact of Obtaining Consent
• When explicit consent is used,
consenters and non-consenters
differ on:
– age, sex, race, marital status, educational
level, socioeconomic status, health status,
mortality, lifestyle factors, functioning
• The consent rate for express consent
varied from 16% to 93%
8. Limiting Principles
• Do not collect, use, or disclose PHI if
other information will serve the
purpose
• For example, even if it is easier to
disclose a whole record, that should
not be done if less information will
reasonably satisfy the purpose
• De-identification would be one element
in limiting the amount of PHI that is
collected/used/disclosed
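As a minimal sketch of the limiting principle, assuming records are Python dicts and that the fields each purpose needs have already been agreed on, a disclosure can be reduced to a purpose-specific subset of fields. The purpose names, field names, and values below are hypothetical.

```python
# Hypothetical mapping from a secondary purpose to the fields it actually needs.
PURPOSE_FIELDS = {
    "readmission_study": {"age_group", "diagnosis", "length_of_stay"},
    "billing_audit": {"encounter_id", "billing_code"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Return only the fields the purpose needs, never the whole record."""
    allowed = PURPOSE_FIELDS[purpose]  # raises KeyError for an unknown purpose
    return {field: value for field, value in record.items() if field in allowed}


full_record = {
    "name": "Jane Doe",
    "health_card": "1234-567-890",
    "age_group": "40-49",
    "diagnosis": "J45",
    "length_of_stay": 3,
    "billing_code": "A007",
    "encounter_id": 17,
}

print(minimize(full_record, "readmission_study"))
```

An unknown purpose raises a KeyError instead of falling back to the full record, so the failure mode is to disclose nothing.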
9. Breaches
• In many large research hospitals and
hospital networks it is simply not
possible to control and manage all of
the databases and data sets that are
created, used, and disclosed for
research
• Breach frequency and severity are
growing
• However, de-identification provides one
way to manage these risks
10. Trust
• Patients change their behavior if they
perceive a threat to privacy
• This can have a negative impact on the
quality of the data that is used for
research
12. Deloitte Survey (2007)
• N=827 respondents in North America
• 43% reported more than 10 privacy breaches
within the last 12 months in their
organizations
• Over 85% reported at least one privacy
breach
• Over 63% reported multiple privacy breaches
requiring notification
• Breaches involving 1000+ records were
reported by 34% of respondents
13. Verizon Study
• Based on forensic engagements conducted by
Verizon
• Breaches resulting from external sources:
73%
• Caused by insiders: 18%
• Implicated business partners: 39%
• These percentages sum to more than 100%
because a single breach can involve more
than one source
• The median number of records involved in an
insider breach was 10 times that of an
external breach
• The biggest causes are errors and hackers
15. HIMSS Leadership Survey
• Survey of healthcare IT executives, n=307
• Conducted in the 2007-2008 timeframe
• 24% of respondents reported that they have
had a security breach in their organization in
the last 12 months
• 16% of respondents reported that they have
had a security breach in their organization in
the last 6 months
• Half indicated that an internal security breach
is a concern to their organizations
16. HIMSS Analytics Report
• IT executives and security officers at
healthcare institutions; n=263
• Half of respondents are concerned with
internal inadvertent access to patient data
• 13% indicated that their organization has had
a security breach in the last 12 months
• 80% of these were internal breaches
17. Medical Record Breaches 2008
• For all of 2008 (datalossdb.org)
• 83 breaches involving medical records (14%
of total)
• Approx. 7.2 million records involved in these
breaches (21.5% of all records)
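The two percentages also let us back out rough 2008 totals for datalossdb.org. This is a back-of-the-envelope sketch that takes the rounded percentages at face value, so every derived number is approximate; it suggests that medical-record breaches were, on average, larger than the typical breach.

```python
# Figures from the slide (datalossdb.org, 2008); both percentages are rounded,
# so everything derived here is approximate.
medical_breaches = 83
medical_share_of_breaches = 0.14
medical_records = 7.2e6
medical_share_of_records = 0.215

total_breaches = medical_breaches / medical_share_of_breaches
total_records = medical_records / medical_share_of_records

avg_medical = medical_records / medical_breaches  # records per medical breach
avg_overall = total_records / total_breaches      # records per breach overall

print(f"implied total breaches: ~{total_breaches:.0f}")
print(f"implied total records: ~{total_records / 1e6:.1f} million")
print(f"records per breach: medical ~{avg_medical:,.0f}, overall ~{avg_overall:,.0f}")
```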
18. Does this Happen Here?
• Do you know of any cases where computer
equipment was stolen from a hospital? Did this
equipment contain personal health information?
• Do you know of any cases where memory sticks with
data on them were lost?
• Does anyone email data to their Hotmail or Gmail
accounts so that they can access them from home
or while travelling?
• Do people still share passwords?
19. Known Data Leaks
• PHI on second hand computers
• Leaks through peer-to-peer file sharing networks
• PowerPoint files on the Internet
• Password protected files sent by email
20. Identity Theft
• William Ernst Black (Edmonton 1999)
• The creation of identity packages using
information about dead children who were
living in one jurisdiction but died in another
($37k for each identity package)
• Example: drug smuggler was caught with
these identity packages
• Example: American getting free medical care
in Canada
21. Patient Concerns
• There is evidence (from surveys) that members of the general
public have changed their behavior to adjust for
perceived privacy risks to their PHI:
– 15% to 17% of US adults
– 11% to 13% of Canadian adults
• There is also evidence that vulnerable populations
exhibit similar behaviors (e.g., adolescents, people
with HIV or at high risk for HIV, those undergoing
genetic testing, mental health patients and battered
women)
22. Behavior Change - I
• Going to another doctor
• Paying out of pocket when insured to avoid
disclosure
• Not seeking care to avoid disclosure to an employer
or to not be seen entering a clinic by other members
of the community
• Giving inaccurate or incomplete information on
medical history
• Asking a doctor not to record a health problem or
to record a less serious or less embarrassing one
23. Behavior Change - II
• 87% of US physicians reported that a patient
had asked them not to include certain
information in their record
• 78% of US physicians reported that they
have withheld information due to privacy
concerns
25. Asymmetry Principle - I
• Trust is hard to gain but easy to lose:
– Negative events/news carry more weight than
positive ones (negativity bias); negative
information is seen as more diagnostic
– Avoiding loss – people weight negative
information more heavily in an effort to avoid loss
– Sources of negative information appear more
credible (positive information seems self-serving)
26. Asymmetry Principle - II
– People interpret information according to their
prior beliefs: if they have negative prior beliefs,
then negative events will reinforce those beliefs and
positive events will have little impact
– Undecided individuals tend to be affected more
by negative information
– People with positive prior beliefs may feel
betrayed by negative information/events
27. Canadian Public - 2007
[Bar chart by region (Total, BC, Alberta, Prairies, Ont, Que, Atlantic, Territories); bar values range from 34 to 46 on a 0-100 axis]
In your opinion, how safe and secure is the health
information which EXISTS about you?
(5-7 on a 7 pt scale)
28. Canadian Public - 2003
[Bar chart of responses on a 0-100 axis: Agree (5-7), Neither (4), Disagree (1-3), DK/NR]
I really worry that my personal health information
might be used for other purposes in the future
which have little to do with my health
29. How not to De-identify
• Just removing the name and address
information is not enough
• It is quite easy to re-identify
individuals from the other data that is
left
• There are a number of public, real-life
examples of re-identification actually
happening
30. Example Data With PHI
31. Types of Variables
• Identifying variables: variables that
can directly identify a patient
• Quasi-identifiers: variables that can
indirectly identify a patient
• Sensitive variables: sensitive clinical
information that the patient would not
want to be known beyond the circle of
care
32. De-identified Data?
33. Examples of Re-identification
34. Examples of Re-identification
35. Examples of Re-identification
36. Examples of Re-identification
37. User #4417749
• “tea for good health”
• “numb fingers”, “hand tremors”
• “dry mouth”
• “60 single men”
• “dog that urinates on everything”
• “landscapers in Lilburn Ga”
• “homes sold in shadow lake subdivision
gwinnett county georgia”
38. Thelma Arnold
• 62-year-old widow
living in Lilburn, Ga.,
re-identified by the
New York Times
• She has three dogs
39. What Happened Next?
• Maureen Govern, CTO of AOL, “resigns”
• Abdur Chowdhury, the AOL researcher who
released the data, was fired
• Abdur’s boss in the research
department was fired
• Big embarrassment for AOL
40. Examples of Re-identification
41. Examples of Re-identification
42. Examples of Re-identification
43. Uniqueness in the US Population
• Studies show that between 63% and
87% of the US population is unique on
their date of birth + ZIP code + gender
• Uniqueness makes it quite easy to
re-identify individuals using a variety of
techniques
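Uniqueness on a set of quasi-identifiers is straightforward to measure for a given data set. A minimal sketch, assuming each record is a dict and counting a record as unique when no other record shares its exact combination of values; the toy records are invented.

```python
from collections import Counter


def fraction_unique(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination occurs exactly once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Toy records, invented for illustration.
people = [
    {"dob": "1957-03-12", "zip": "30047", "sex": "M"},
    {"dob": "1957-03-12", "zip": "30047", "sex": "F"},
    {"dob": "1978-07-01", "zip": "30047", "sex": "M"},
    {"dob": "1978-07-01", "zip": "30047", "sex": "M"},
    {"dob": "1968-12-09", "zip": "30044", "sex": "F"},
]

print(fraction_unique(people, ["zip", "sex"]))         # coarser key
print(fraction_unique(people, ["dob", "zip", "sex"]))  # adds date of birth
```

Here adding date of birth raises uniqueness from 0.4 to 0.6. Note that this measures uniqueness within the data set itself; the population figures on the slide require population-level counts.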
44. Uniqueness in Canadian Population
[Chart: percent of the Canadian population that is unique (0% to 100%) versus the number of characters in the postal code (1 to 6), for four combinations: PC; PC + Gender; PC + DoB; PC + DoB + Gender]
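The same effect can be reproduced in miniature by truncating the postal code to its first n characters and measuring uniqueness at each length, with and without gender. A sketch under stated assumptions: the five records are invented, and uniqueness is measured within the toy data set rather than in the Canadian population.

```python
from collections import Counter


def percent_unique(keys):
    """Percent of entries in `keys` that occur exactly once."""
    counts = Counter(keys)
    return 100.0 * sum(1 for k in keys if counts[k] == 1) / len(keys)


# Toy records, invented; postal codes stored without the space.
records = [
    {"pc": "K0J1P0", "sex": "M"},
    {"pc": "K0J1P0", "sex": "F"},
    {"pc": "K0J1T0", "sex": "F"},
    {"pc": "K0J2A0", "sex": "F"},
    {"pc": "K2P1L4", "sex": "M"},
]

results = {}
for n in range(1, 7):
    # Generalize the postal code by keeping only its first n characters.
    pc_only = percent_unique([r["pc"][:n] for r in records])
    pc_sex = percent_unique([(r["pc"][:n], r["sex"]) for r in records])
    results[n] = (pc_only, pc_sex)
    print(n, pc_only, pc_sex)
```

Uniqueness rises as more characters are kept, and adding gender never lowers it; the same reasoning, run in reverse, is why truncating postal codes is a common generalization step.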
45. Example
• This example shows the risk of re-
identification using just demographics
46. Types of Disclosure
• Identity Disclosure: being able to
determine the identity associated with
a record
• Attribute Disclosure: discovering
something new about an individual
known to be in the database
47. Disclosure and Invasion-of-Privacy
• An important first criterion is deciding
on the sensitivity of the data and the
potential for harm to the patients from
a secondary use/disclosure
• If the invasion-of-privacy is deemed
low then there may not be a need to
de-identify the data
48. Invasion-of-Privacy - I
• The personal information in the Data is
highly detailed
• The information in the Data is of a
highly sensitive and personal nature
• The information in the Data comes
from a highly sensitive context
• Many people would be affected if there
was a Data breach or the Data was
processed inappropriately by the
recipient/agent
49. Invasion-of-Privacy - II
• If there was a Data breach or the Data
was processed inappropriately by the
recipient/agent, that may cause direct
and quantifiable damages and
measurable injury to the patients
• If the recipient/agent is located in a
different jurisdiction, there is a
possibility, for practical purposes, that
the data sharing agreement will be
difficult to enforce
50. Invasion-of-Privacy – Consent - I
• There is a provision in the relevant
legislation permitting the
disclosure/use of the Data without the
consent of the patients
• The Data was unsolicited or given
freely or voluntarily by the patients
with little expectation of it being
maintained in total confidence
51. Invasion-of-Privacy – Consent - II
• The patients have provided express
consent that their Data can be
disclosed for this secondary Purpose
when it was originally collected or at
some point since then
• The custodian has consulted well-
defined groups or communities
regarding the disclosure of the Data
and had a positive response
52. Invasion-of-Privacy – Consent - III
• A strategy for informing/notifying the
public about potential disclosures for
the recipient’s secondary Purpose was
in place when the data was collected or
since then
• Obtaining consent from the individuals
at this point is inappropriate or
impractical
53. Identity Disclosure
• Three common types:
– Prosecutor risk
– Journalist risk
– Rareness
• All three are concerned with the risk of
re-identifying a single individual
54. Prosecutor vs. Journalist
• If either of the following is true then
prosecutor risk is relevant:
– The data represents the whole population
such that everyone is known to be in it, or
the sampling fraction is very high
– If not the whole population, it is possible
for an intruder to know that a particular
person has a record in the data
• Patient may self-reveal
• Data collection method is revealing
• Otherwise journalist risk is relevant
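The decision rule can be written as a small function, reading the two conditions as alternatives (the second begins "If not the whole population"). A minimal sketch: the slide gives no number for a "very high" sampling fraction, so the 0.9 default is a placeholder assumption, and the `membership_knowable` flag is a hypothetical stand-in for cases such as a patient self-revealing or a revealing data collection method.

```python
def relevant_risk(sampling_fraction: float,
                  membership_knowable: bool,
                  high_fraction: float = 0.9) -> str:
    """Decide whether prosecutor or journalist risk applies to a release.

    sampling_fraction: share of the population in the data (1.0 = everyone).
    membership_knowable: an intruder can learn that a specific person is in
        the data (e.g., the patient self-reveals, or the collection method
        is revealing).
    high_fraction: placeholder cut-off for a "very high" sampling fraction;
        0.9 is an assumption, not a value from the slide.
    """
    if sampling_fraction >= high_fraction or membership_knowable:
        return "prosecutor"
    return "journalist"


print(relevant_risk(1.0, False))   # whole population
print(relevant_risk(0.05, True))   # small sample, but membership is known
print(relevant_risk(0.05, False))  # small sample, membership unknown
```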
55. Prosecutor Risk - I
• The intruder has background
information about a specific individual
known to be in the database
• The amount of background information
will depend on the intruder
• The intruder is attempting to find the
record belonging to that individual in
the database
56. Prosecutor Risk - II
• Examples of intruders:
– Neighbor
– Ex-spouse
– Employer
– Relative
57. Example
Date of Birth Gender Postal Code Diagnosis
12/03/1957 M K0J 1P0 …
01/7/1978 M K0J 1P0 …
09/12/1968 F K0J 1P0 …
17/08/1987 F K0J 1P0 …
25/02/1974 F K0J 1T0 …
23/05/1985 M K0J 1T0 …
14/03/1965 F K0J 2A0 …
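The probability that a prosecutor picks the right record is commonly measured as one over the number of records that share the target's quasi-identifier values (the record's equivalence class). A minimal sketch applying that measure to the example table above; which columns count as quasi-identifiers is the analyst's assumption, and the diagnosis column is omitted.

```python
from collections import Counter

# Rows from the example table; dates are DD/MM/YYYY as shown on the slide.
rows = [
    ("12/03/1957", "M", "K0J 1P0"),
    ("01/7/1978", "M", "K0J 1P0"),
    ("09/12/1968", "F", "K0J 1P0"),
    ("17/08/1987", "F", "K0J 1P0"),
    ("25/02/1974", "F", "K0J 1T0"),
    ("23/05/1985", "M", "K0J 1T0"),
    ("14/03/1965", "F", "K0J 2A0"),
]
COLUMNS = ("dob", "gender", "postal_code")


def prosecutor_risks(rows, qis):
    """Per-record re-identification probability: 1 / size of its equivalence class."""
    idx = [COLUMNS.index(q) for q in qis]
    keys = [tuple(row[i] for i in idx) for row in rows]
    sizes = Counter(keys)
    return [1 / sizes[k] for k in keys]


for qis in (["gender", "postal_code"], ["dob", "gender", "postal_code"]):
    risks = prosecutor_risks(rows, qis)
    print(qis, "max risk:", max(risks), "records at risk 1:", risks.count(1.0))
```

On gender and postal code alone, three of the seven patients are already unique; adding date of birth makes all seven unique.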
58. Selecting Variables – Prosecutor - I
• In the best case assumption, a
neighbor would know:
– Address and telephone information about
the VIP
– Household and dwelling information
(number of children, value of property,
type of property)
– Key dates (births, deaths, weddings)
– Visible characteristics: gender, race,
ethnicity, language spoken at home,
weight, height, physical disabilities
– Profession
59. Selecting Variables – Prosecutor - II
• What would an ex-spouse know:
– The same things that a neighbor would
know
– Basic medical history (allergies, chronic
diseases)
– Income, years of schooling
• All of these variables would be
considered quasi-identifiers if they
appear in the database
60. Journalist Risk
• The journalist is not looking for a
specific person – re-identifying any
person will do
• The journalist has access to a database
that s/he can use for matching
• This is called an identification database
61. Journalist Matching Example
[Diagram: a Medical Database (clinical and lab data) and an Identification DB (name, initials, address, telephone no.) are linked on the quasi-identifiers they share: DoB, gender, postal code]
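The matching step amounts to a join on the shared quasi-identifiers. A minimal sketch, assuming both databases are lists of dicts with identically coded DoB, gender, and postal code; all names and records are invented. A medical record is treated as re-identified only when exactly one identity shares its values.

```python
from collections import defaultdict

# Quasi-identifiers shared by both databases (assumed identically coded).
QIS = ("dob", "gender", "postal_code")


def link(medical_db, identification_db):
    """Return (name, diagnosis) pairs for medical records that match
    exactly one identity on the shared quasi-identifiers."""
    index = defaultdict(list)
    for person in identification_db:
        index[tuple(person[q] for q in QIS)].append(person["name"])
    matches = []
    for rec in medical_db:
        names = index.get(tuple(rec[q] for q in QIS), [])
        if len(names) == 1:  # a unique match is a re-identification
            matches.append((names[0], rec["diagnosis"]))
    return matches


medical = [
    {"dob": "1957-03-12", "gender": "M", "postal_code": "K0J 1P0", "diagnosis": "diabetes"},
    {"dob": "1968-12-09", "gender": "F", "postal_code": "K0J 1P0", "diagnosis": "depression"},
]
identification = [
    {"name": "A. Smith", "dob": "1957-03-12", "gender": "M", "postal_code": "K0J 1P0"},
    {"name": "B. Jones", "dob": "1968-12-09", "gender": "F", "postal_code": "K0J 1P0"},
    {"name": "C. Brown", "dob": "1968-12-09", "gender": "F", "postal_code": "K0J 1P0"},
]

print(link(medical, identification))
```

The second medical record matches two identities, so it is not re-identified with certainty; this is why rareness in the identification database matters.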
62. Assessing Journalist Risk
• In general, we want to know how rare
the quasi-identifier values would be in
the population (e.g.,
homeowners/professionals/civil
servants in the geographic area of
interest)
• If the combination is not rare then
the journalist risk is small
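One common way to quantify this is to take, for each released record, one over the number of people in the population (or in the identification database) who share its quasi-identifier values. A sketch assuming such a population list is available for the area and group of interest; the toy data are invented.

```python
from collections import Counter


def journalist_risk(released, population, qis):
    """Risk per released record: 1 / number of people in the population who
    share its quasi-identifier values (0.0 if nobody in the population matches)."""
    pop_counts = Counter(tuple(p[q] for q in qis) for p in population)
    risks = []
    for r in released:
        matches = pop_counts[tuple(r[q] for q in qis)]
        risks.append(1 / matches if matches else 0.0)
    return risks


# Invented population for one area, e.g. taken from an identification database.
population = [{"gender": "F", "yob": 1968}] * 4 + [{"gender": "M", "yob": 1957}]
released = [{"gender": "F", "yob": 1968}, {"gender": "M", "yob": 1957}]

print(journalist_risk(released, population, ["gender", "yob"]))
```

The woman born in 1968 shares her values with three others, so her risk is 0.25; the man born in 1957 is rare in the population, so his risk is 1.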
63. Selecting Variables – Journalist - I
• Depends on what information can be
obtained in an identification database
• For an external intruder, likely
variables are those available in public
registries:
– Key dates (birth, death, marriage)
– Profession
– Home address and telephone number
– Type of dwelling
– Gender, ethnicity, race
– Income if a highly paid public servant
64. Selecting Variables – Journalist - II
• Assume that an internal intruder would
be able to get all relevant
administrative data:
– Key dates (birth, death, admission,
discharge, visit)
– Gender, address, telephone number
65. Inference of Variables - I
• Even if a particular quasi-identifier
is not known to the intruder (prosecutor
risk), not available in an identification
database (journalist risk), or not
included in the disclosed database (all
three risks), it may be possible to
infer it from other variables
• Variables that can be inferred should
be treated as quasi-identifiers
66. Inference of Variables - II
• Inferred variables should be added to
the disclosed database if they are not
there, because they may be used in a
re-identification attack, and you want
to take them into account during risk
assessment
67. Inference Examples
• Gender, ethnicity, religious origin from
name
• Age from graduation date
• Profession from payer of insurance
claim (e.g., civil servants have a single
health insurer)
• Age and gender from a diagnostic or
lab code (e.g., mammogram or PSA test)
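Following the advice to add inferred variables to the disclosed data, an inferred gender can be materialized before the risk assessment. A minimal sketch for the last example, assuming each record carries a list of lab/diagnostic codes; the code names and the mapping are hypothetical placeholders, not a real coding system.

```python
# Hypothetical code names; a real coding system and its mapping would differ,
# and such codes strongly suggest, rather than prove, gender.
CODE_IMPLIES_GENDER = {"MAMMOGRAM": "F", "PSA": "M"}


def add_inferred_gender(records):
    """Return copies of the records with an inferred 'gender' quasi-identifier
    added wherever the record's codes point to exactly one gender."""
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is not modified
        if "gender" not in rec:
            implied = {CODE_IMPLIES_GENDER[c] for c in rec["codes"]
                       if c in CODE_IMPLIES_GENDER}
            if len(implied) == 1:
                rec["gender"] = implied.pop()
        out.append(rec)
    return out


data = [
    {"id": 1, "codes": ["PSA", "CBC"]},
    {"id": 2, "codes": ["MAMMOGRAM"]},
    {"id": 3, "codes": ["CBC"]},
]
for rec in add_inferred_gender(data):
    print(rec)
```

Records whose codes imply nothing (or imply conflicting values) are left without a gender, so the inference stays conservative.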
68. Rareness
• If individuals are rare on the
quasi-identifiers, then they are at higher
prosecutor and journalist re-identification
risk
• If an individual has a rare and visible
characteristic/feature, then that also
makes them easier to re-identify (e.g.,
put an ad on the radio)
69. Attribute Disclosure
• Can occur if there is very little
variation in the sensitive variables
• The data set can represent a whole
population or some subset
• Learn something new about a person
without actually finding which record
belongs to them
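A simple check is to group records by their quasi-identifiers and flag groups whose sensitive values are all the same. This corresponds to what the literature calls (distinct) l-diversity with l = 2, a term the slide does not use. A minimal sketch with invented records.

```python
from collections import defaultdict


def homogeneous_classes(records, qis, sensitive):
    """Map each QI combination whose records all share a single sensitive
    value to that value; an intruder who knows someone is in such a group
    learns the value without finding the exact record."""
    values = defaultdict(set)
    for r in records:
        values[tuple(r[q] for q in qis)].add(r[sensitive])
    return {key: next(iter(v)) for key, v in values.items() if len(v) == 1}


records = [
    {"gender": "F", "pc": "K0J 1P0", "dx": "HIV"},
    {"gender": "F", "pc": "K0J 1P0", "dx": "HIV"},
    {"gender": "M", "pc": "K0J 1P0", "dx": "asthma"},
    {"gender": "M", "pc": "K0J 1P0", "dx": "diabetes"},
]

print(homogeneous_classes(records, ["gender", "pc"], "dx"))
```

Knowing only that a woman from K0J 1P0 is in the data reveals her diagnosis, even though she matches two records.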
70. A Pragmatic Approach
• It is important to ensure that the
quasi-identifiers are plausible for the
data and the recipients of the data
• If you select many quasi-identifiers
then that will, by definition, increase the
re-identification risk
• Ideally, each selected quasi-identifier
should be associated with a realistic
re-identification scenario
71. Constructing an Identification DB
• This may be a single physical database
or a join of multiple sources together
to construct a virtual database
• It will have the quasi-identifiers as well
as identity information, but will not
have the sensitive information (e.g.,
clinical or financial details)
• The sources may be public and free,
public and for a fee, or fully
commercial
72. Examples of Identification DBs - I
• Examples of such databases or sources
(Canada):
– Obituaries: available from newspapers and
funeral homes; there are obituary
aggregator sites that make this simple
– PPSR: Personal Property Security
Registration; contains information on loans
secured by property (e.g., cars)
– Land Registry: information on house
ownership
73. Examples of Identification DBs - II
– Membership Lists: provide comprehensive
listings of professionals (e.g., doctors,
lawyers, civil servants)
– Salary Disclosure Reports: provided by
governments for those earning higher than
a certain threshold
– White Pages: public telephone directory
– Job Sites: CVs posted in public and closed
job web sites
– Donations: disclosures of donations to
political parties (include address)
74. Voter Lists - I
• Cannot legally be used for purposes
outside of an election (in Canada)
• But a charity allegedly supporting a
terrorist group (Tamil Tigers) was
found by the RCMP to have Canadian
voter lists
• Volunteers do not necessarily destroy
or dispose of the lists after an election
(and in many cases do not sign
anything before they get them)
75. Voter Lists - II
• It is not expensive (or difficult) to
become a candidate in an election and
get the voter list:
– Alberta: $500
– BC: $100
– NB: $100 (+nominated by 25 electors)
– Ontario: $100
– Quebec: $0 (+nominated by 100 electors)
• Canadian voter lists do not contain the
DoB (yet)
76. Economics of Identification DBs
• Some data sources have a fee for each
individual record/search
• This makes the cost of creating an
identification database quite high
• This may impose a large economic
burden on an intruder and act as a
deterrent from creating identification
databases
77. Internal Identification Databases
• An internal intruder may have access
to administrative databases that can
act as Identification DB
• For example, in a hospital an internal
intruder may have access to all
admissions; this is not sensitive data,
so it is less protected, but it has enough
demographics to serve as a good
identification database
• This puts internal intruders at a huge
advantage
78. Internal Access
• An internal intruder can get access to
such an administrative database:
– had access in a previous position but that access
was not revoked
– people in the organization share access credentials,
so the intruder can use someone else’s credentials
to get the administrative database
– has access as part of his/her job and there are no
audit trails
– internal systems are not well protected because
internal people are trusted, and the intruder knows how
to break into the system to get the data
79. Public Registries
• In the following slides I will explain
how to create identification databases
from public registries in Canada
80. Professional Groups - I
We can construct identification databases for specific
professional groups
[Diagram: Membership Lists, PPSR, White Pages]
81. Professional Groups - II
• College of Physicians and Surgeons of Ontario
• Law Society of Upper Canada
• Professional Engineers Ontario
• College of Occupational Therapists
• College of Physical Therapists
• Public servants (eg, GEDS)
• …….
82. What is the success rate?
• Ability to get home postal codes (source: PPSR and telephone directory): CPSO 60%, LSUC 45%
• Ability to get practice/firm postal codes (source: CPSO/LSUC): CPSO 100%, LSUC 100%
• Ability to get date of birth (source: PPSR): CPSO 40%, LSUC 45%
• Ability to get gender (source: CPSO/genderizing LSUC): CPSO 100%, LSUC 100%
• Ability to get initials (source: CPSO/LSUC): CPSO 100%, LSUC 100%
83. What is the success rate by gender?
MALE
• Ability to get home postal codes (source: PPSR and telephone directory): CPSO 63%, LSUC 48%
• Ability to get date of birth (source: PPSR): CPSO 45%, LSUC 48%
FEMALE
• Ability to get home postal codes (source: PPSR and telephone directory): CPSO 49%, LSUC 40%
• Ability to get date of birth (source: PPSR): CPSO 29%, LSUC 40%
84. Homeowners
We can construct identification databases for specific
postal codes
[Figure: Canada Post, Land Registry, PPSR, and White Pages as sources for the identification database]
85. What is the success rate?
• Ability to get initials: Ottawa 93%, Toronto 100%
• Ability to get DoB: Ottawa 33%, Toronto 40%
• Ability to get telephone number: Ottawa 80%, Toronto 50%
• Ability to get gender: Ottawa 87%, Toronto 95%
86. Re-id Risk for Homeowners
• The number of households per postal code is quite small (Ottawa: 15; Toronto: 20)
• The individuals (homeowners) were unique on common combinations of quasi-identifiers (eg, gender and DoB)
• For these individuals re-identification risk is very high
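To make "unique on common combinations of quasi-identifiers" concrete, here is a minimal sketch that counts how many records in one postal code are the only member of their (gender, DoB) group. The records are invented for illustration, and the per-record probability of 1 / (size of the group) follows the cell-size reading of re-identification probability used in the threshold precedents later in this deck.

```python
from collections import Counter

# Hypothetical homeowners in a single postal code: (gender, date of birth).
# Values are invented for illustration only.
records = [
    ("F", "1962-03-14"),
    ("M", "1958-11-02"),
    ("M", "1971-07-30"),
    ("F", "1980-01-09"),
    ("M", "1958-11-02"),
]


def unique_fraction(rows):
    """Fraction of rows whose quasi-identifier combination occurs exactly once."""
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1) / len(rows)


def reid_probability(rows):
    """Per-record re-identification probability: 1 / size of its equivalence class."""
    counts = Counter(rows)
    return [1 / counts[row] for row in rows]


print(unique_fraction(records))   # 3 of 5 rows are unique -> 0.6
print(reid_probability(records))  # unique rows have probability 1.0
```

With only 15 to 20 households per postal code, most (gender, DoB) combinations are likely to be singletons, which is why the risk for these individuals is so high.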
87. Civil Servants - I
• GEDS (Government Electronic Directory Services) is on the Internet
• There are 386,630 individuals in the federal government (159,652 in Ontario and 28,046 in Alberta)
• GEDS has approx. 170,000 entries
• Incomplete because organizations can opt out, some individuals need to opt in, and some employees and orgs are exempted (eg, CSIS, DND)
88. Civil Servants - II
• We selected a sample of 40 individuals
in health care related federal
departments in Ontario
• Able to get home address for 50%,
home telephone number for 40%,
gender for 100%, DoB for 22.5%
• Provincial governments have similar
sources
89. Re-identification Threshold
• There is a spectrum of re-identification risk
• When does the probability of re-identification become so high that the information is deemed identifiable?
• Canadian privacy law tends not to be
precise about this
• Gordon case: serious possibility test
90. Canadian Definitions - I
• Ontario PHIPA: “Identifying information” means information that identifies an individual or for which it is reasonably foreseeable in the circumstances that it could be utilized, either alone or with other information, to identify an individual.
• Nfld PPHI: “Identifying information” means information that identifies an individual or for which it is reasonably foreseeable in the circumstances that it could be utilized either alone or together with other information to identify an individual.
• Sask THIPA: “De-identified personal health information” means personal health information from which any information that may reasonably be expected to identify an individual has been removed.
91. Canadian Definitions - II
• Alberta HIA: “Individually identifying” means that the identity of the individual who is the subject of the information can be readily ascertained from the information; “nonidentifying” means that the identity of the individual who is the subject of the information cannot be readily ascertained from the information.
• NB PPIA: “Identifiable individual” means an individual can be identified by the contents of the information because the information includes the individual’s name, makes the individual’s identity obvious, or is likely in the circumstances to be combined with other information that includes the individual’s name or makes the individual’s identity obvious.
92. Re-identification Risk Spectrum
93. Re-identification Threshold
• Privacy legislation treats the threshold in two ways:
– Discretionary/permitted disclosures and uses: the threshold can be anywhere along the spectrum
– Only de-identified information may be disclosed without consent: information is either identifiable or not; there is no spectrum
• Any systematic approach to dealing with thresholds must cover both
94. Threshold Precedents - I
• We will use healthcare precedents as an indication of the risk that society has agreed to take:
– The largest probability of re-identification that is used in any policy or guideline document in Canada or the US is 0.33
– If the probability is > 0.33 then the information would certainly be considered identifiable
95. Threshold Precedents - II
– The most common probability of re-identification used in disclosure control of health data is 0.2 (cell size of 5)
– It makes sense that a value of 0.2 would be used as a “default” risk
• Below 0.33 there are many degrees of de-identification
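The precedents map directly to minimum cell sizes: a probability of 0.2 means every record must share its quasi-identifier values with at least 5 records, and 0.33 (read here as 1/3) with at least 3. A minimal sketch, assuming the worst-case risk of a data set is 1 divided by the size of its smallest equivalence class; the data set is hypothetical. Exact fractions are used so that 1/3 is not rounded against a 0.33 cutoff.

```python
import math
from collections import Counter
from fractions import Fraction

# Thresholds from the precedents above, written as exact fractions.
LARGEST_PRECEDENT = Fraction(1, 3)  # "0.33", i.e. a minimum cell size of 3
DEFAULT_THRESHOLD = Fraction(1, 5)  # 0.2, i.e. a minimum cell size of 5


def min_cell_size(threshold):
    """Smallest equivalence-class size k such that 1/k does not exceed the threshold."""
    return math.ceil(1 / threshold)


def max_reid_risk(rows):
    """Worst-case re-identification probability: 1 / size of the smallest class."""
    return Fraction(1, min(Counter(rows).values()))


# Hypothetical data set with two equivalence classes of sizes 3 and 4.
rows = [("M", "40-49")] * 3 + [("F", "40-49")] * 4

print(min_cell_size(LARGEST_PRECEDENT), min_cell_size(DEFAULT_THRESHOLD))  # 3 5
print(max_reid_risk(rows))                       # 1/3
print(max_reid_risk(rows) <= LARGEST_PRECEDENT)  # True
print(max_reid_risk(rows) <= DEFAULT_THRESHOLD)  # False
```

The same data set is acceptable under the largest precedent but not under the 0.2 default.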
96. Example
• The choice of threshold has a significant impact on risk assessment results
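One way to see the impact is to count the records that sit in equivalence classes smaller than the minimum cell size implied by each threshold, since those are the records that would need further de-identification. A sketch on an invented data set of 15 records; the quasi-identifiers (gender, age band, region) are assumptions chosen for illustration.

```python
from collections import Counter


def share_at_risk(rows, k):
    """Fraction of records whose equivalence class has fewer than k members."""
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] < k) / len(rows)


# Hypothetical quasi-identifiers: (gender, age band, region). Class sizes 6, 4, 3, 2.
rows = (
    [("F", "30-39", "K")] * 6
    + [("M", "30-39", "K")] * 4
    + [("F", "40-49", "K")] * 3
    + [("M", "40-49", "M")] * 2
)

# Threshold 0.33 -> minimum cell size 3; threshold 0.2 -> minimum cell size 5.
for label, k in [("threshold 0.33 (k=3)", 3), ("threshold 0.2 (k=5)", 5)]:
    print(label, round(share_at_risk(rows, k), 2))
```

Here only 2 of 15 records fail at 0.33, while 9 of 15 fail at 0.2.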
97. De-identification Techniques
[Figure: de-identification techniques D1, D2, and D3 for identifying and quasi-identifying variables: randomization, coding, suppression, heuristics, analytics]
98. Examples of Analytics
• Table aggregation – disclose only summary tables
• Generalization
• Record or variable suppression
• Geographic aggregation
• Sub-sampling
• Adding noise
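Two of these analytics, generalization and record suppression, are often combined: generalize first, then drop the records that remain in small classes. A minimal sketch, assuming date of birth is generalized to year and postal code to its Forward Sortation Area (first three characters), with a hypothetical minimum class size k = 3; the records are invented.

```python
from collections import Counter


def generalize(record):
    """Generalization: keep gender, reduce date of birth to year, postal code to FSA."""
    gender, date_of_birth, postal_code = record
    return (gender, date_of_birth[:4], postal_code[:3])


def suppress_small_classes(rows, k):
    """Record suppression: drop rows whose equivalence class has fewer than k members."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] >= k]


# Hypothetical raw records: (gender, date of birth, postal code).
raw = [
    ("F", "1970-02-11", "K1H 8L1"),
    ("F", "1970-09-30", "K1H 2B4"),
    ("F", "1970-05-05", "K1H 7W9"),
    ("M", "1985-12-01", "K2P 1A1"),
]

generalized = [generalize(r) for r in raw]
released = suppress_small_classes(generalized, k=3)

print(len(set(raw)), len(set(generalized)))  # 4 2
print(released)  # three copies of ('F', '1970', 'K1H')
```

Suppression alone on the raw records would remove everything, because every raw record is unique; generalizing first keeps three of the four.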
99. Common De-identification Heuristic
• If a geographic area has a small population, then:
– Suppress all data from that area, or
– Aggregate the geographic area
• Applied to a variety of data sets, including public health data sets
• For many applications this heuristic results in significant loss of data or imperils analysis
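The heuristic itself is simple to state in code. The sketch below releases areas at or above a population cutoff and either suppresses the smaller ones or pools them into a coarser area labelled by their first character; that pooling rule, the area codes, and the populations are all hypothetical choices for illustration. A pooled area that is still below the cutoff is suppressed too.

```python
def suppress_or_aggregate(area_population, cutoff, aggregate=True):
    """Apply a population-cutoff heuristic to geographic areas.

    Areas at or above the cutoff are released unchanged. Smaller areas are either
    suppressed (aggregate=False) or pooled into a coarser "<first char> (other)"
    area, which is itself suppressed if it still falls below the cutoff.
    """
    released = {}
    pooled = {}
    for area, population in area_population.items():
        if population >= cutoff:
            released[area] = population
        elif aggregate:
            coarser = f"{area[0]} (other)"
            pooled[coarser] = pooled.get(coarser, 0) + population
        # otherwise the area is suppressed: it is simply not released
    for coarser, population in pooled.items():
        if population >= cutoff:
            released[coarser] = population
    return released


# Hypothetical area populations with a 20,000 cutoff.
populations = {"K1H": 25_000, "K2P": 12_000, "K4M": 9_000, "K0A": 4_000}
print(suppress_or_aggregate(populations, 20_000))
print(suppress_or_aggregate(populations, 20_000, aggregate=False))
```

Either way, three of the four original areas disappear as separate units, which is the data loss the slide warns about.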
100. Examples
• HIPAA: 20k rule
• Census Bureau: 100k rule
• Statistics Canada: 70k rule
• British Census: 120k rule
101. The Problem
• Such generic rules ignore the specific
variables that are included in a data
set
• A smaller cutoff should be used if few
variables are in a data set
• A larger cutoff should be used if many
variables are in a data set
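Why the cutoff should depend on the variables: each added quasi-identifier splits equivalence classes further, so the same population is riskier when more variables are released. A toy illustration on four invented records; the column names are assumptions.

```python
from collections import Counter


def unique_share(rows, columns):
    """Share of rows that are unique on the given subset of columns."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


# Hypothetical records.
rows = [
    {"gender": "F", "birth_year": 1970, "fsa": "K1H"},
    {"gender": "F", "birth_year": 1970, "fsa": "K2P"},
    {"gender": "M", "birth_year": 1970, "fsa": "K1H"},
    {"gender": "M", "birth_year": 1985, "fsa": "K1H"},
]

for cols in (["gender"], ["gender", "birth_year"], ["gender", "birth_year", "fsa"]):
    print(cols, unique_share(rows, cols))  # uniqueness rises as variables are added
```

With gender alone nobody is unique; with all three variables everybody is.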
102. Automation - I
103. Automation - II
104. Our GAPS Models
(FSA = forward sortation areas; Pop = population)
Province | Our GAPS Models (FSA / Pop) | 20,000 Cutoff (FSA / Pop) | 70,000 Cutoff (FSA / Pop) | 100,000 Cutoff (FSA / Pop)
Alberta | 55% / 84% | 38% / 71% | 1.4% / 5% | 0 / 0
British Columbia | 68% / 87% | 46% / 70% | 1.1% / 4% | 0 / 0
Manitoba | 59% / 88% | 39% / 68% | 0 / 0 | 0 / 0
New Brunswick | 20% / 51% | 4.5% / 19% | 0 / 0 | 0 / 0
Newfoundland | 55% / 83% | 30% / 62% | 0 / 0 | 0 / 0
Nova Scotia | 47% / 82% | 16% / 43% | 0 / 0 | 0 / 0
Ontario | 69% / 91% | 49% / 76% | 1.4% / 5% | 0.2% / 1%
PEI | 57% / 90% | 43% / 79% | 0 / 0 | 0 / 0
Quebec | 59% / 84% | 36% / 63% | 1% / 5% | 0.25% / 0
Saskatchewan | 60% / 93% | 49% / 84% | 2% / 7% | 0 / 2%
105. Risk Methodology
• De-identification by itself is not
sufficient:
– Using low thresholds results in rapid data
quality deterioration
– Using high thresholds is perceived as too
risky
– We want to create incentives for the data
recipients to improve their security and
privacy practices
• The methodology allows you to select and justify a threshold
106. Managing Re-identification Risk
[Figure: amount of de-identification and risk exposure in relation to mitigating controls, invasion of privacy, and motives & capacity]
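107. The Tradeoffs
Mitigating controls vs. ability to re-identify the data:
• Low mitigating controls, low ability to re-identify: balanced
• Low mitigating controls, high ability to re-identify: dangerous
• High mitigating controls, low ability to re-identify: conservative
• High mitigating controls, high ability to re-identify: balanced
• Higher mitigating controls mean a higher cost burden on the data recipient; a lower ability to re-identify means lower data quality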
108. Steps in Risk Methodology
• The methodology has two steps to
evaluate the overall risks
• First we determine the probability of a
re-identification attempt
• Then we determine the re-identification risk threshold to use
109. Determining Pr Re-identification Attempts
110. Determining Risk Threshold to Use
111. Implementation of Methodology
• An important component of this methodology is the ability to audit the data recipient/agent receiving the data
• Update audits are performed regularly
• Data sharing agreements are put in place for external recipients and external agents (internal ones are usually covered by employment agreements)
• The elements in the security maturity profile are part of the data sharing agreement
112. Compliance Audits
• The audits use a publicly available
checklist
• Audit results would be generally accepted so that recipients do not need to get audited repeatedly for different disclosures
• Intended to be rapid (one or two day
on-site) and cheap ($1k to $2k)
113. Example - Pharmacy Data
• Request to CHEO from a commercial data broker for prescription data
• Concern that this data could potentially
identify patients
• We performed a study to evaluate re-
identification risk and come up with an
anonymous version of the data
114. Prescription Records Example
Original fields:
• Patient age in days
• Gender
• Forward Sortation Area
• Admission date
• Discharge date
• Diagnosis
• Dispensed drug
Anonymized version:
• Patient gender
• Length of stay in days
• Quarter and year of admission
• Patient’s region (first character of the postal code)
• Patient’s age in weeks
• Diagnosis
• Dispensed drug
Conditions:
• Regular third-party privacy/security audits
• Breach notification protocols must be in place
• Restrictions on further distribution of raw data
• Data destruction provisions
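The field transformations in this example can be sketched as a single record-level function. The field names, the floor division used for age in weeks, and the sample record are assumptions made for illustration. The slide lists the target fields but not how they were computed.

```python
from datetime import date


def deidentify(record):
    """Map an original record to the anonymized fields listed above (a sketch)."""
    admission = record["admission_date"]
    quarter = (admission.month - 1) // 3 + 1
    return {
        "gender": record["gender"],
        "age_weeks": record["age_days"] // 7,  # assumption: whole weeks, rounded down
        "length_of_stay_days": (record["discharge_date"] - admission).days,
        "admission_quarter": f"{admission.year}-Q{quarter}",
        "region": record["fsa"][0],  # first character of the postal code
        "diagnosis": record["diagnosis"],
        "dispensed_drug": record["dispensed_drug"],
    }


# Hypothetical input record.
example = {
    "age_days": 400,
    "gender": "F",
    "fsa": "K1H",
    "admission_date": date(2008, 5, 3),
    "discharge_date": date(2008, 5, 10),
    "diagnosis": "J45",  # hypothetical diagnosis code
    "dispensed_drug": "salbutamol",
}
print(deidentify(example))
```

Exact admission and discharge dates never leave the function. Only the length of stay and the admission quarter are disclosed, and the FSA is reduced to its region.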
115. An Example Deployment
116. An Example Deployment
117. An Example Deployment