NISO Lightning Overview:
Identification & “Anonymization”
Micah Altman
Director of Research
MIT Libraries
Prepared for
NISO Workshop on Patron Privacy
Online
May 2015
DISCLAIMER
These opinions are my own; they are not the
opinions of MIT, Brookings, any of the project
funders, nor (with the exception of co-authored
previously published work) my collaborators
Secondary disclaimer:
“It’s tough to make predictions, especially about
the future!”
-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston
Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert
Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan
Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel,
Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.
Collaborators & Co-Conspirators
 Privacy Tools for Sharing Research Data Team
(Salil Vadhan, P.I.)
http://privacytools.seas.harvard.edu/people
 Research Support
Supported in part by NSF grant CNS-1237235
Related Work
Main Project:
 Privacy Tools for Sharing Research Data
http://privacytools.seas.harvard.edu/
Related publications:
 Novak, K., Altman, M., Broch, E., Carroll, J. M., Clemins, P. J., Fournier, D., Laevart,
C., et al. (2011). Communicating Science and Engineering Data in the Information
Age. Computer Science and Telecommunications. National Academies Press
 Vadhan, S., et al. 2011. “Re: Advance Notice of Proposed Rulemaking: Human
Subjects Research Protections.”
 Altman, M., D. O’Brien, S. Vadhan, A. Wood. 2014. “Big Data Study: Request for
Information.”
 O'Brien, et al. 2015. “When Is Information Purely Public?” (Mar. 27, 2015) Berkman
Center Research Publication No. 2015-7.
 Wood, et al. 2014. “Long-Term Longitudinal Studies” (July 22, 2014). Berkman Center
Research Publication No. 2014-12.
Slides and reprints available from:
informatics.mit.edu
Identifiable private information is common
 Birth date + zipcode +
gender uniquely identify
~87% of people in the U.S.
 Can predict social security
number using
birthdate/place
 Tables, graphs and maps
can reveal identifiable
information
 People have been identified
through movie rankings,
search strings, writing
style…
Brownstein, et al., 2006, NEJM 355(16)
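As a rough illustration of the first point, the sketch below counts how many records in a small table are unique on birth date, ZIP code, and gender. The records and the helper name `unique_fraction` are invented for this example and are not taken from the cited studies.

```python
from collections import Counter

# Hypothetical toy records: (birthdate, zipcode, gender). Illustrative only.
records = [
    ("1961-01-01", "02145", "M"),
    ("1961-02-02", "02138", "M"),
    ("1972-03-25", "94041", "F"),
    ("1972-03-25", "94041", "F"),
    ("1989-08-08", "02138", "F"),
]

def unique_fraction(rows):
    """Return the share of rows whose quasi-identifier combination occurs exactly once."""
    counts = Counter(rows)
    unique = sum(1 for row in rows if counts[row] == 1)
    return unique / len(rows)

print(unique_fraction(records))
```

A record that is unique on these three fields can be re-identified by anyone who knows them, even though no name appears in the table.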
Privacy is not Confidentiality…
(defining basic terms)
 Privacy
Control over extent and circumstances of sharing
 Confidentiality
Control over disclosure of information
 Sensitive information
Information that would cause harm if improperly
disclosed
(to individual, institution, social group, or society)
 Private personally identifiable information
 Not already purely public
 Directly or indirectly linkable to an identifiable individual
 Possibly using externally available information
Legal Constraints are Complicated
[Diagram of legal constraints]
Areas: Contract; Intellectual Property; Access Rights; Confidentiality
Items shown: Copyright; Fair Use; DMCA; Database Rights; Moral Rights; Intellectual Attribution; Trade Secret; Patent; Trademark; Common Rule (45 CFR 46); HIPAA; FERPA; EU Privacy Directive; Privacy Torts (Invasion, Defamation); Rights of Publicity; Sensitive but Unclassified; Potentially Harmful (Archeological Sites, Endangered Species, Animal Testing, …); Classified; FOIA; CIPSEA; State Privacy Laws; EAR; State FOI Laws; Journal Replication Requirements; Funder Open Access; Contract; License; Click-Wrap; TOU; ITA; Export Restrictions
Laws define “anonymized” differently
FERPA
 Identification criteria: direct; indirect; linked; bad intent
 Sensitivity criteria: any non-directory information
HIPAA
 Identification criteria: direct/indirect (18 identifiers), OR statistician verifies minimal risk, AND no actual knowledge of identified individual
 Sensitivity criteria: any medical information
Common Rule
 Identification criteria: direct; indirect / linked, if “readily identifiable”
 Sensitivity criteria: private information, based on harm
MA 201 CMR 17
 Identification criteria: first initial + last name
 Sensitivity criteria: financial, state, federal identifiers
Different definitions of identifiability
Record-linkage
• “where’s waldo”
• Match a real person to
precise record in a database
• Examples: direct identifiers.
• Caveats: Satisfies
compliance for specific laws,
but not generally; substantial
potential for harm remains
Indistinguishability
+ Heterogeneity
• “hiding in the crowd”
• People can be matched only
to cluster of records
• Based on quasi-ids
• Sensitive attributes must
also vary
• Examples: k-anonymity, l-diversity,
attribute disclosure
• Caveats: Potential for
substantial harms may
remain
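The “hiding in the crowd” idea can be sketched as a check on a released table. The rows, the field layout, and the helper `k_and_l` are hypothetical; the l value computed here is the simplest "distinct values" form of l-diversity, not the only definition in use.

```python
from collections import defaultdict

# Hypothetical released rows: (birth_year, zip_prefix, gender, sensitive_value).
rows = [
    ("1961", "021", "M", 0),
    ("1961", "021", "M", 0),
    ("1972", "940", "M", 0),
    ("1972", "940", "M", 1),
    ("1972", "940", "M", 2),
]

def k_and_l(rows):
    """Return (k, l): the smallest group size, and the smallest number of
    distinct sensitive values, over groups sharing the same quasi-identifiers."""
    groups = defaultdict(list)
    for *quasi_ids, sensitive in rows:
        groups[tuple(quasi_ids)].append(sensitive)
    k = min(len(values) for values in groups.values())
    l = min(len(set(values)) for values in groups.values())
    return k, l

print(k_and_l(rows))
```

Here every person hides among at least two records, yet the first group shares a single sensitive value, so knowing someone is in that group reveals the value anyway. This is why the sensitive attributes must also vary.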
Learning
• “privacy, guaranteed”
• Formally bound the total
learning about any individual
that can occur from a query
• Examples: differential
privacy, zero-knowledge
proofs
• Caveats: Challenging to
implement, requires
interactive system
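A minimal sketch of one differential privacy technique, the Laplace mechanism applied to a counting query, is shown below. The function names and the sample data are assumptions made for illustration. Plain floating-point noise like this is known to be unsafe for real deployments, so treat it as a teaching sketch only.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(values, predicate, epsilon, rng):
    """Return a noisy count. Adding or removing one person changes a count
    by at most 1, so the noise scale is 1 / epsilon."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical data: number of crimes committed per person.
crimes = [0, 0, 1, 1, 2, 4]
rng = random.Random()
print(private_count(crimes, lambda c: c > 0, epsilon=0.5, rng=rng))
```

Smaller values of `epsilon` give stronger protection and noisier answers. Each query spends part of a privacy budget that the system must track across queries.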
How many things are wrong with this picture?
Name      SSN    Birthdate  Zipcode  Gender  Favorite Ice Cream  # of crimes committed
A. Jones  12341  01011961   02145    M       Raspberry           0
B. Jones  12342  02021961   02138    M       Pistachio           0
C. Jones  12343  11111972   94043    M       Chocolate           0
D. Jones  12344  12121972   94043    M       Hazelnut            0
E. Jones  12345  03251972   94041    F       Lemon               0
F. Jones  12346  03251972   02127    F       Lemon               1
G. Jones  12347  08081989   02138    F       Peach               1
H. Smith  12348  01011973   63200    F       Lime                2
I. Smith  12349  02021973   63300    M       Mango               4
J. Smith  12350  02021973   63400    M       Coconut             16
K. Smith  12351  03031974   64500    M       Frog                32
L. Smith  12352  04041974   64600    M       Vanilla             64
M. Smith  12353  04041974   64700    F       Pumpkin             128
N. Smith  12354  04041974   64800    F       Allergic            256
What’s wrong with this picture?
Column annotations: Identifier; Sensitive; Private Identifier; Private Identifier; Identifier; Sensitive
Other annotations: Unexpected Response?; Mass resident; FERPA too?; Californian; Twins, separated at birth?
Common Approach: Suppress Information for
Data Release
Published Outputs
* Jones * * 1961 021*
* Jones * * 1961 021*
* Jones * * 1972 9404*
* Jones * * 1972 9404*
* Jones * * 1972 9404*
Modal Practice
“The correlation between
X and Y was large and
statistically
significant”
Summary statistics
Contingency table
Public use sample microdata
Information Visualization
Help, help, I’m being suppressed…
Name      SSN    Birthdate  Zipcode  Gender  Favorite Ice Cream  # of crimes committed
[Name 1]  12341  *1961      021*     M       Raspberry           .1
[Name 2]  12342  *1961      021*     M       Pistachio           -.1
[Name 3]  12343  *1972      940*     M       Chocolate           0
[Name 4]  12344  *1972      940*     M       Hazelnut            0
[Name 5]  12345  *1972      940*     F       Lemon               .6
[Name 6]  12346  *1972      021*     F       Lemon               .6
[Name 7]  12347  *1989      021*     *       Peach               64.6
[Name 8]  12348  *1973      632*     F       Lime                3
[Name 9]  12349  *1973      633*     M       Mango               3
Techniques labeled on the slide (by row or variable): Synthetic; Global Recode; Local Suppression; Aggregation + Perturbation
Traditional Static Suppression
 Data reduction
 Observation
 Measure
 Cell
 Perturbation
 Microaggregation
 Rule-based data
swapping
 Adding noise
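Three of these techniques can be sketched on a single record. The field names, helper functions, and noise range below are assumptions chosen for illustration; real disclosure-limitation tools apply such rules across the whole table and check the resulting risk.

```python
import random

def global_recode(birthdate, zipcode):
    """Coarsen an MMDDYYYY birth date to its year and a 5-digit ZIP to its first three digits."""
    return "*" + birthdate[-4:], zipcode[:3] + "*"

def local_suppress(value, should_suppress):
    """Replace a single risky cell with '*'."""
    return "*" if should_suppress else value

def add_noise(count, rng, spread=0.5):
    """Perturb a numeric measure with uniform noise in [-spread, spread]."""
    return count + rng.uniform(-spread, spread)

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
record = {"birthdate": "08081989", "zipcode": "02138", "gender": "F", "crimes": 1}
year, zip3 = global_recode(record["birthdate"], record["zipcode"])
released = {
    "birthdate": year,
    "zipcode": zip3,
    "gender": local_suppress(record["gender"], should_suppress=True),
    "crimes": add_noise(record["crimes"], rng),
}
print(released)
```

Each step removes detail that an analyst might have needed, which is the utility cost discussed next.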
Suppression reduces utility
 Common approach of anonymizing/suppressing data
reduces usefulness
 Minimizing disclosure in the presence of large
external data sources reduces usefulness a lot
 Anonymized data is not simply less informative -- it
typically yields biased analyses
New Data – New Challenges
 How to deidentify without completely
destroying the data?
 The “Netflix Problem”: large, sparse datasets that
overlap can be probabilistically linked [Narayanan
and Shmatikov 2008]
 The “GIS Problem”: fine geo-spatial-temporal data are
impossible to mask when correlated with external
data [Zimmerman 2008]
 The “Facebook Problem”: possible to identify
masked network data if only a few nodes are
controlled [Backstrom, et al. 2007]
 The “Blog Problem”: pseudonymous
communication can be linked through textual
analysis [Novak, et al. 2004]
[For more examples see Vadhan, et al. 2010]
Source: [Calabrese 2008; Real
Time Rome Project 2007]
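The linkage risk behind these problems can be sketched in its simplest, exact-match form. The attacks cited above are probabilistic and work on much sparser data. Both tables, the field names, and the helper `link` are hypothetical.

```python
# Hypothetical "de-identified" release: names removed, quasi-identifiers kept.
released = [
    {"birth_year": "1989", "zip3": "021", "gender": "F", "crimes": 1},
    {"birth_year": "1972", "zip3": "940", "gender": "M", "crimes": 0},
    {"birth_year": "1972", "zip3": "940", "gender": "M", "crimes": 0},
]
# Hypothetical external source, such as a public directory.
external = [
    {"name": "G. Jones", "birth_year": "1989", "zip3": "021", "gender": "F"},
    {"name": "C. Jones", "birth_year": "1972", "zip3": "940", "gender": "M"},
]

KEYS = ("birth_year", "zip3", "gender")

def link(released, external):
    """Return {name: released_row} for external people who match exactly one released row."""
    matches = {}
    for person in external:
        key = tuple(person[k] for k in KEYS)
        hits = [row for row in released if tuple(row[k] for k in KEYS) == key]
        if len(hits) == 1:
            matches[person["name"]] = hits[0]
    return matches

print(link(released, external))
```

Only the person whose combination of attributes is unique in the release is re-identified. The other external record matches two rows and stays ambiguous.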
Little Data – Big World
 The “Favorite Ice Cream” problem
-- public information that is not risky can help us
learn information that is risky
 The “Doesn’t Stay in Vegas” problem
-- information shared locally can be found anywhere
 The “Data Exhaust problem”
-- wherever you go, there you are, and your data too!
Algorithmic Discrimination
• Emergent behavior of algorithms, big data, and behavior
 discrimination on private personal characteristics
Information Science Approach:
Manage Privacy & Confidentiality Lifecycle
 Collection:
 Consent/licensing terms
 Methods
 Measures
 Storage
 Systems information
security
 Data structures and
partitioning
 Dissemination
 Vetting
 Disclosure limitation
 Data use agreements
[Lifecycle diagram]
Stages: Creation/Collection; Storage/Ingest; Processing; Internal Sharing; Analysis; External dissemination/publication; Re-use; Long-term access
Other labels: Research methods; Data Management Systems; Legal / Policy Frameworks; Statistical / Computational Frameworks
Hybrid Approaches
 Collection limitations
 Limitations on collection
 Inform and consent
 Data enclaves – physically restrict access to data
 Examples: ICPSR, Census Research Data Center
 May include availability of synthetic data as an aid to preparing model specifications
 Advantages: extensive human auditing, vetting; information security threats much reduced
 Disadvantages: expensive, slow, inconvenient to access
 Controlled remote access
 Varies from remote access to all data and output to human vetting of output
 Restrictions on use, easier to enforce
 Advantages: auditable, potential to impose human review, potential to limit analysis
 Disadvantages: complex to implement, slow
 Model servers
 Mediated remote access – analysis limited to designated models
 Advantages: faster, no human in loop
 Disadvantage: statistical methods for ensuring model safety are immature – residuals,
categorical variables, dummy variables are all risky; very limited set of models currently
supported; complex to implement
 Experimental approaches
 Personal Data Stores
 Data Auditing and Accountability
Questions?
Web:
informatics.mit.edu
Creative Commons License
This work, Managing Confidential
Information in Research, by Micah Altman
(http://redistricting.info) is licensed under
the Creative Commons Attribution-Share
Alike 3.0 United States License. To view a
copy of this license, visit
http://creativecommons.org/licenses/by-sa/3.0/us/
or send a letter to Creative
Commons, 171 Second Street, Suite 300,
San Francisco, California, 94105, USA.

Web & Social Media Analytics Previous Year Question Paper.pdf
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 

Micah Altman NISO privacy in library systems

  • 1. NISO Lightning Overview: Identification & “Anonymization”. Micah Altman, Director of Research, MIT Libraries. Prepared for the NISO Workshop on Patron Privacy, Online, May 2015.
  • 2. DISCLAIMER: These opinions are my own; they are not the opinions of MIT, Brookings, any of the project funders, nor (with the exception of co-authored previously published work) my collaborators. Secondary disclaimer: “It’s tough to make predictions, especially about the future!” -- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.
  • 3. Collaborators & Co-Conspirators
      • Privacy Tools for Sharing Research Data Team (Salil Vadhan, P.I.): http://privacytools.seas.harvard.edu/people
      • Research Support: supported in part by NSF grant CNS-1237235
  • 4. Related Work
      Main project:
      • Privacy Tools for Sharing Research Data, http://privacytools.seas.harvard.edu/
      Related publications:
      • Novak, K., Altman, M., Broch, E., Carroll, J. M., Clemins, P. J., Fournier, D., Laevart, C., et al. (2011). Communicating Science and Engineering Data in the Information Age. Computer Science and Telecommunications. National Academies Press.
      • Vadhan, S., et al. 2011. “Re: Advance Notice of Proposed Rulemaking: Human Subjects Research Protections.”
      • Altman, M., D. O’Brien, S. Vadhan, A. Wood. 2014. “Big Data Study: Request for Information.”
      • O'Brien, et al. 2015. “When Is Information Purely Public?” (Mar. 27, 2015). Berkman Center Research Publication No. 2015-7.
      • Wood, et al. 2014. “Long-Term Longitudinal Studies” (July 22, 2014). Berkman Center Research Publication No. 2014-12.
      Slides and reprints available from: informatics.mit.edu
  • 5. Identifiable private information is common
      • Birth date + ZIP code + gender uniquely identify ~87% of people in the U.S.
      • Social security numbers can be predicted from birth date and place
      • Tables, graphs and maps can reveal identifiable information
      • People have been identified through movie rankings, search strings, writing style…
      Source: Brownstein, et al., 2006, NEJM 355(16)
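The uniqueness claim on slide 5 can be checked on any dataset by counting how many records have a combination of quasi-identifiers that no other record shares. The following is a minimal sketch, not the method behind the ~87% figure; the records and the `unique_fraction` helper are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy records: (birthdate, zip_code, gender). Not real data.
records = [
    ("1961-01-01", "02145", "M"),
    ("1961-02-02", "02138", "M"),
    ("1972-03-25", "94041", "F"),
    ("1972-03-25", "94041", "F"),
    ("1989-08-08", "02138", "F"),
]


def unique_fraction(rows):
    """Return the share of rows whose quasi-identifier combination occurs exactly once."""
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)


# Three of the five toy records are unique on (birthdate, zip_code, gender).
print(unique_fraction(records))  # 0.6
```

A high fraction suggests that removing names alone leaves most records re-identifiable by anyone who holds the same three attributes from another source.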
  • 6. Privacy is not Confidentiality… (defining basic terms)
      • Privacy: control over the extent and circumstances of sharing
      • Confidentiality: control over the disclosure of information
      • Sensitive information: information that would cause harm if improperly disclosed (to an individual, institution, social group, or society)
      • Private personally identifiable information:
          • Not already purely public
          • Directly or indirectly linkable to an identifiable individual
          • Possibly using externally available information
  • 7. Legal Constraints are Complicated
      Diagram labels: Contract; Intellectual Property; Access Rights; Confidentiality; Copyright; Fair Use; DMCA; Database Rights; Moral Rights; Intellectual Attribution; Trade Secret; Patent; Trademark; Common Rule (45 CFR 46); HIPAA; FERPA; EU Privacy Directive; Privacy Torts (Invasion, Defamation); Rights of Publicity; Sensitive but Unclassified; Potentially Harmful (Archeological Sites, Endangered Species, Animal Testing, …); Classified; FOIA; CIPSEA; State Privacy Laws; EAR; State FOI Laws; Journal Replication Requirements; Funder Open Access; Contract; License; Click-Wrap; TOU; ITA; Export Restrictions
  • 8. Laws define “anonymized” differently
      Criteria       | FERPA                                | HIPAA                                                                                                                    | Common Rule                                        | MA 201 CMR 17
      Identification | Direct; indirect; linked; bad intent | Direct/indirect: 18 identifiers, OR statistician verifies minimal risk, AND no actual knowledge of identified individual | Direct; indirect/linked, if “readily identifiable” | First initial + last name
      Sensitivity    | Any non-directory information        | Any medical information                                                                                                  | Private information, based on harm                 | Financial, state, federal identifiers
  • 9. Different definitions of identifiability
      Record-linkage (“where’s Waldo”)
          • Match a real person to a precise record in a database
          • Examples: direct identifiers
          • Caveats: satisfies compliance for specific laws, but not generally; substantial potential for harm remains
      Indistinguishability + Heterogeneity (“hiding in the crowd”)
          • People can be matched only to a cluster of records
          • Based on quasi-identifiers
          • Sensitive attributes must also vary
          • Examples: k-anonymity, l-diversity, attribute disclosure
          • Caveats: potential for substantial harms may remain
      Learning (“privacy, guaranteed”)
          • Formally bound the total learning about any individual that can occur from a query
          • Examples: differential privacy, zero-knowledge proofs
          • Caveats: challenging to implement, requires an interactive system
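The “hiding in the crowd” definition on slide 9 can be sketched with two small checks: k-anonymity (every quasi-identifier combination is shared by at least k records) and l-diversity (each such group contains at least l distinct sensitive values). This is an illustrative sketch only; the field names `birth_year`, `zip3` and `crimes`, and the sample rows, are assumptions rather than anything defined in the slides.

```python
from collections import defaultdict


def k_anonymity(rows, quasi_ids):
    """Return the size of the smallest group of rows sharing the same quasi-identifier values."""
    group_sizes = defaultdict(int)
    for row in rows:
        group_sizes[tuple(row[q] for q in quasi_ids)] += 1
    return min(group_sizes.values())


def l_diversity(rows, quasi_ids, sensitive):
    """Return the smallest number of distinct sensitive values found in any quasi-identifier group."""
    group_values = defaultdict(set)
    for row in rows:
        group_values[tuple(row[q] for q in quasi_ids)].add(row[sensitive])
    return min(len(values) for values in group_values.values())


# Hypothetical generalized records (birth year and 3-digit ZIP prefix only).
rows = [
    {"birth_year": "1961", "zip3": "021", "crimes": 0},
    {"birth_year": "1961", "zip3": "021", "crimes": 0},
    {"birth_year": "1972", "zip3": "940", "crimes": 0},
    {"birth_year": "1972", "zip3": "940", "crimes": 1},
]

print(k_anonymity(rows, ["birth_year", "zip3"]))            # 2
print(l_diversity(rows, ["birth_year", "zip3"], "crimes"))  # 1
```

The toy data is 2-anonymous but only 1-diverse: anyone known to be in the 1961/021 group is revealed to have committed no crimes, which is why the slide requires that sensitive attributes also vary.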
  • 10. How many things are wrong with this picture?
      Name     | SSN   | Birthdate | Zipcode | Gender | Favorite Ice Cream | # of crimes committed
      A. Jones | 12341 | 01011961  | 02145   | M      | Raspberry          | 0
      B. Jones | 12342 | 02021961  | 02138   | M      | Pistachio          | 0
      C. Jones | 12343 | 11111972  | 94043   | M      | Chocolate          | 0
      D. Jones | 12344 | 12121972  | 94043   | M      | Hazelnut           | 0
      E. Jones | 12345 | 03251972  | 94041   | F      | Lemon              | 0
      F. Jones | 12346 | 03251972  | 02127   | F      | Lemon              | 1
      G. Jones | 12347 | 08081989  | 02138   | F      | Peach              | 1
      H. Smith | 12348 | 01011973  | 63200   | F      | Lime               | 2
      I. Smith | 12349 | 02021973  | 63300   | M      | Mango              | 4
      J. Smith | 12350 | 02021973  | 63400   | M      | Coconut            | 16
      K. Smith | 12351 | 03031974  | 64500   | M      | Frog               | 32
      L. Smith | 12352 | 04041974  | 64600   | M      | Vanilla            | 64
      M. Smith | 12353 | 04041974  | 64700   | F      | Pumpkin            | 128
      N. Smith | 12354 | 04041974  | 64800   | F      | Allergic           | 256
  • 11. What’s wrong with this picture?
      The table from slide 10, annotated.
      Column annotations: Identifier; Sensitive; Private Identifier; Private Identifier; Identifier; Sensitive
      Callouts: Unexpected response?; Mass resident; FERPA too?; Californian; Twins, separated at birth?
  • 12. Common Approach: Suppress Information for Data Release
      Example of suppressed records:
          * Jones | * | * 1961 | 021*
          * Jones | * | * 1961 | 021*
          * Jones | * | * 1972 | 9404*
          * Jones | * | * 1972 | 9404*
          * Jones | * | * 1972 | 9404*
      Published outputs:
          • Modal practice: “The correlation between X and Y was large and statistically significant”
          • Summary statistics
          • Contingency table
          • Public use sample microdata
          • Information visualization
  • 13. Help, help, I’m being suppressed…
      Name     | SSN   | Birthdate | Zipcode | Gender | Favorite Ice Cream | # of crimes committed
      [Name 1] | 12341 | *1961     | 021*    | M      | Raspberry          | .1
      [Name 2] | 12342 | *1961     | 021*    | M      | Pistachio          | -.1
      [Name 3] | 12343 | *1972     | 940*    | M      | Chocolate          | 0
      [Name 4] | 12344 | *1972     | 940*    | M      | Hazelnut           | 0
      [Name 5] | 12345 | *1972     | 940*    | F      | Lemon              | .6
      [Name 6] | 12346 | *1972     | 021*    | F      | Lemon              | .6
      [Name 7] | 12347 | *1989     | 021*    | *      | Peach              | 64.6
      [Name 8] | 12348 | *1973     | 632*    | F      | Lime               | 3
      [Name 9] | 12349 | *1973     | 633*    | M      | Mango              | 3
      Diagram labels: Row; Var; Synthetic; Global Recode; Local Suppression; Aggregation + Perturbation
      Traditional static suppression:
          • Data reduction: observation; measure; cell
          • Perturbation: microaggregation; rule-based data swapping; adding noise
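The global recoding and suppression steps shown on slide 13 can be sketched as two small functions. This is a rough illustration under stated assumptions: birth dates are taken to be MMDDYYYY strings as in the slide 10 table, and the function names and the use of "*" for a suppressed name are hypothetical choices, not the author's procedure.

```python
def global_recode(record):
    """Coarsen quasi-identifiers: keep only the birth year and the first three ZIP digits."""
    out = dict(record)
    out["birthdate"] = "*" + record["birthdate"][-4:]  # MMDDYYYY -> *YYYY
    out["zipcode"] = record["zipcode"][:3] + "*"       # 02145 -> 021*
    return out


def suppress(record, fields):
    """Replace the listed fields with '*' so their values are not released."""
    out = dict(record)
    for field in fields:
        out[field] = "*"
    return out


# Hypothetical input row, shaped like the first row of the slide 10 table.
record = {"name": "A. Jones", "birthdate": "01011961", "zipcode": "02145", "gender": "M"}

released = suppress(global_recode(record), ["name"])
print(released)
# {'name': '*', 'birthdate': '*1961', 'zipcode': '021*', 'gender': 'M'}
```

Recoding applies the same coarsening to every row, while suppression blanks individual values; both discard information, which is the utility cost the next slide describes.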
  • 14. Suppression reduces utility
      • The common approach of anonymizing/suppressing data reduces usefulness
      • Minimizing disclosure in the presence of large external data sources reduces usefulness a lot
      • Anonymized data is not simply less informative -- it typically yields biased analyses
  • 15. New Data – New Challenges
      • How to deidentify without completely destroying the data?
      • The “Netflix Problem”: large, sparse datasets that overlap can be probabilistically linked [Narayanan and Shmatikov 2008]
      • The “GIS” problem: fine geo-spatial-temporal data is impossible to mask when correlated with external data [Zimmerman 2008]
      • The “Facebook Problem”: it is possible to identify masked network data if only a few nodes are controlled [Backstrom, et al. 2007]
      • The “Blog Problem”: pseudonymous communication can be linked through textual analysis [Novak, et al. 2004]
      [For more examples see Vadhan, et al. 2010]
      Source: [Calabrese 2008; Real Time Rome Project 2007]
  • 16. Little Data – Big World
      • The “Favorite Ice Cream” problem -- public information that is not risky can help us learn information that is risky
      • The “Doesn’t Stay in Vegas” problem -- information shared locally can be found anywhere
      • The “Data Exhaust” problem -- wherever you go, there you are, and your data too!
  • 17. Algorithmic Discrimination
      • Emergent behavior of algorithms, big data, and behavior → discrimination on private personal characteristics
  • 18. Information Science Approach: Manage Privacy & Confidentiality Lifecycle
      • Collection: consent/licensing terms; methods; measures
      • Storage: systems information security; data structures and partitioning
      • Dissemination: vetting; disclosure limitation; data use agreements
      Lifecycle stages: Creation/Collection; Storage/Ingest; Processing; Internal Sharing; Analysis; External dissemination/publication; Re-use; Long-term access
      Frameworks: Research methods; Data Management Systems; Legal / Policy Frameworks; Statistical / Computational Frameworks
  • 19. Hybrid Approaches
      • Collection limitations
          • Limitations on collection
          • Inform and consent
      • Data enclaves – physically restrict access to data
          • Examples: ICPSR, Census Research Data Center
          • May include availability of synthetic data as an aid to preparing model specifications
          • Advantages: extensive human auditing, vetting; information security threats much reduced
          • Disadvantages: expensive, slow, inconvenient to access
      • Controlled remote access
          • Varies from remote access to all data and output to human vetting of output
          • Restrictions on use, easier to enforce
          • Advantages: auditable, potential to impose human review, potential to limit analysis
          • Disadvantages: complex to implement, slow
      • Model servers
          • Mediated remote access – analysis limited to designated models
          • Advantages: faster, no human in loop
          • Disadvantages: statistical methods for ensuring model safety are immature (residuals, categorical variables, dummy variables are all risky); very limited set of models currently supported; complex to implement
      • Experimental approaches
          • Personal Data Stores
          • Data Auditing and Accountability
  • 20. Questions? Web: informatics.mit.edu
  • 21. Creative Commons License: This work, Managing Confidential Information in Research, by Micah Altman (http://redistricting.info), is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Editor's Notes

  1. This work, Managing Confidential Information in Research, by Micah Altman (http://redistricting.info), is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  2. The structure and design of digital storage systems is a cornerstone of digital preservation. To better understand ongoing storage practices of organizations committed to digital preservation, the National Digital Stewardship Alliance conducted a survey of member organizations. This talk discusses findings from this survey, common gaps, and trends in this area. (I also have a little fun highlighting the hidden assumptions underlying Amazon Glacier's reliability claims. For more on that see this earlier post: http://drmaltman.wordpress.com/2012/11/15/amazons-creeping-glacier-and-digital-preservation )