This document summarizes a presentation on big data given by Sir Mark Walport, the UK's Chief Scientific Adviser. It discusses the opportunities and risks of big data, including how it can improve health and infrastructure but also enable privacy violations. While data can be anonymized, it is difficult to fully protect privacy due to the ability to match anonymous data with other public datasets. Both utopian and dystopian futures are possible depending on how data is governed and balanced with individual privacy. Moving forward will require advances in technology, open communication, and governance measures to control data access.
1. Big Data: Big Opportunity, Big Brother or Big
Trouble?
Oxford Martin School, December 3rd 2013
Sir Mark Walport, Chief Scientific Adviser to HM Government
2. The future will not be a repetition
of the past.
James Martin, 1933-2013
Writing in 1978
Credit: Oxford Martin School
Those who cannot remember the
past are condemned to repeat it.
George Santayana, 1863-1952
Writing in 1905
2 Big data and privacy
PD
3. • Florence Nightingale, Crimean War
nurse and pioneer of statistics. In
the 1890s she tried to get a
Professorship of Statistics
established at Oxford University,
specifically for applying statistical
analysis to social problems.
• At the time the scheme came to
nothing, but her vision is now
realised all over the world.
• Oxford began teaching Applied
Statistics in 1947, and appointed its
first Professor of Mathematical
Statistics in 1962.
PD
4. Overview
1. Identity and identification
2. The promise of big data – opportunities
and risks
3. What about privacy?
4. Where is all of this headed, and what do
we need to do?
5. Identity – the sameness of a person or thing at
all times or in all circumstances; the condition
or fact that a person or thing is itself and not
something else; individuality, personality.
Identity – this is what makes me me
Credit: Wellcome Collection
6. Identification – The determination of identity;
the action or process of determining what a
thing is; the recognition of a thing as being
what it is
Identification – I will find out who you are
Credit: Wellcome Collection
7. Society doesn’t work in the absence of
identifiers. So who needs to know about us?
Credit: Getty
Credit: imagezone
Family and friends
Credit: Getty
Civic sector
Credit: Getty
Business
Government
8. We manage our relationships by selective
disclosure of data - multiple identities
Age
Financial status
Place attachments
Profession
Nationality
Hobbies
Ethnicity
Family role
Religion
Community & friendship
9. The outside world uses different approaches
to identify us
Direct disclosures
• Passport
• Driving licence
• Work pass
• NHS number
• National Insurance Number
Credit: Mark Yuill
Credentials and tokens
• PIN
• Password
• RFID embedded device
Credit: Shutterstock
10. What is personal information?
Hard to define, but ultimately information that enables particular
attributes to be linked to a unique individual.
Direct
Face
Fingerprint
DNA
Indirect
Name
Address
Postcode
Workplace
Club
11. Some attributes are more or less
sensitive in different contexts
• Age
• Sex
• Nationality
• Religion
• Health
• Education
• Financial
• Football Team
Richard Nixon's application to the FBI, 1937.
Released under FOI. Contains lots of (redacted)
sensitive health information.
12. Information Technology and the web have
created new opportunities to create identities
Anonymous
Pseudonymous
Real
13. The next generation of products will generate yet
more data – the internet of things
Credit: tedeytan/CC-BY-SA-2.0
Credit: MIT Media Lab
Credit: MarkDoliner/CC-BY-2.0
Credit: LG
14. The data is used by each of us for our personal
utility
Finding things out
Telling other people things
Listening and watching things
Navigating the real world
Navigating fictional worlds
Buying and selling stuff
Playing games
Storing stuff
Recording our lives and those of friends/families
Socialising with others
Stealing things
Plotting and causing harm
15. Information technology has created new ways of
locating or finding us
Image: iPhone tracking data
The consequence of all of this is that we are giving out a lot
of information that others can then use…
16. Smart meters produce detailed data on energy
consumption
17. The price of the utility is that we are generating
data on a massive scale
18. Lots of other people are interested in our data. Who
knows the most about us?
Government
ONS
HMRC
NHS
Corporations
Google
Experian
Loyalty Cards
19. How do they use it? Retail suppliers.
• Our data is used to provide
individual services.
• But it is also aggregated for
wholesale purposes - and
they give or sell the
wholesale data to other
organisations.
Credit: Lotus Head/CC-BY-SA-2.5
…and do we know how they use it?
Credit: Tesco
20. The myth of consent - do we really read and
understand the full terms and conditions?
Credit: Google
In 2008, researchers calculated it would take 76 working days to read all
the privacy policies you encounter in a year. If everyone in the US did so,
it would cost the country more than the GDP of Florida.
21. How do they use it? Government
Voting
Credit: ClassicStock
Taxes
Credit: Phillip Ingham/CC-BY-ND-2.0
Planning
Credit: iStockphoto
Law enforcement
Credit: South Yorkshire Police
23. Who else uses it?
• Future employers
• Hostile and
competing foreign
states
• Criminals and
terrorists
• Journalists
Credit: Getty
24. How do the wholesale collectors of data add
value to it?
25. What more can we do?
Societal Level
Improving Health (and research in general)
Understanding and optimising business processes
Improving and optimising cities and countries
Optimising Machine and Device Performance
Understanding, targeting, and serving customers
Improving Security and Law Enforcement
Individual Level
Personal quantification and performance optimisation
Improving sports performance
26. Improving health: diabetes in Scotland
• Total Scottish Population 5.2m
• People with diabetes: 251,132
(4.9%)
• People with Type 1 DM: ~27,000
(0.5%)
• All patients nationally are
registered on a single register,
the SCI-DC register
• SCI-DC used in all 38 hospitals
• Nightly capture of data from all
1043 primary care practices across
Scotland
Courtesy of Andrew Morris
27. Getting about: Citymapper
• An app for New York and London, which links all transport
systems together so you can easily discover the best way to
get from where you are to where you want to be.
28. Improving infrastructure: Streetbump
Credit: Streetbump
• A project in Boston, a city plagued by potholes and other street
maintenance issues.
• People can report problems in various easy ways, including an app
that automatically detects bumps driven over.
• Highly successful, the critical element being an efficient system for
getting maintenance crews to the sites of reported issues.
29. What about the potential harms?
• UK research with 58,000 US
volunteers found that algorithms
based on Facebook “likes”, which
are often public, can predict
personality traits.
• 95% accurate in distinguishing
African-American from
Caucasian-American and 85% for
differentiating Republican from
Democrat.
• Some odd links as well: liking curly
fries correlated with high
intelligence…
Credit: BBC
30. Dangers of releasing data into the wild
• AOL released anonymised search data
for research purposes.
• Journalists were able to pick up
clues to name and location, then
triangulate with embarrassing search
queries.
• Programme was halted, its initiators
sacked.
• Netflix released anonymised film rental data
and set a $1m prize, hoping to improve
recommendation algorithms.
• People’s viewing taste beyond usual
blockbusters is highly individual.
• Triangulating with IMDb data, bloggers
identified individual users and were able
to reveal their full list of rentals, not just
those they had “rated”.
32. Privacy controls are not binary but fall on
spectra
Obfuscation: from openly identifiable to anonymised to the point of losing valuable content
Access / Environment: from free on the internet (everyone) to locked in a steel-lined room (accredited researcher)
Governance and accountability: from little legislation to highly legislated
33. A taxonomy of obfuscation
Anonymisation: Remove all identifiers such
that it is impossible to identify an individual
Encryption: Prevent data from being read without
unlocking. In theory encrypted databases can
be analysed without breaking the encryption, but in practice they cannot be used for anything
but trivial uses
Credit: University of
Regensburg
Tokenisation or pseudonymisation: remove
as much of the 'personal' information as
possible, and link back to the personal information via an independent,
securely held database
Credit: Robbie Cooper
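The tokenisation approach described on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the function name `pseudonymise`, the field names (`nhs_number`, `name`) and the toy records are all invented for the example.

```python
import secrets


def pseudonymise(records, id_field="nhs_number", direct_fields=("name",)):
    """Return (research_rows, lookup).

    research_rows carry a random token instead of direct identifiers;
    lookup maps token -> original identifier and is meant to be stored
    separately, under tighter access control (an assumption of this sketch).
    """
    token_for = {}  # identifier -> token, so one person keeps one token
    lookup = {}     # token -> identifier, the separately held link
    research_rows = []
    for record in records:
        identifier = record[id_field]
        if identifier not in token_for:
            token = secrets.token_hex(8)  # random, not derived from the identifier
            token_for[identifier] = token
            lookup[token] = identifier
        row = {
            key: value
            for key, value in record.items()
            if key != id_field and key not in direct_fields
        }
        row["token"] = token_for[identifier]
        research_rows.append(row)
    return research_rows, lookup


# Hypothetical toy records, invented for illustration only.
records = [
    {"nhs_number": "A1", "name": "Alice", "condition": "diabetes"},
    {"nhs_number": "B2", "name": "Bob", "condition": "asthma"},
    {"nhs_number": "A1", "name": "Alice", "condition": "hypertension"},
]
research_rows, lookup = pseudonymise(records)
```

The research rows keep one consistent token per person, so records can still be linked for analysis, while the route back to a named individual exists only in `lookup`. The attributes that remain may still identify someone when combined with other datasets.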
34. Obfuscation - differential privacy
• Differential privacy: the database itself remains pure, but a small amount
of noise is added to the final answer of each query, to prevent identification
of a single record.
• Good for many situations, but not for small populations or finding needles
in haystacks, such as the common factors behind a rare disease.
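One common way to add noise to a query answer is the Laplace mechanism, sketched below for a simple counting query. The slide does not say which mechanism it has in mind, so treat this as an assumed illustration: the names `laplace_noise` and `noisy_count`, the privacy parameter `epsilon` and the toy population are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws, each with mean
    # `scale`, is Laplace-distributed around zero with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=None):
    """Answer 'how many records satisfy predicate?' with Laplace noise."""
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one record changes a count by at most 1,
    # so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: 5,000 people with a common condition, 3 with a rare one.
population = ["common"] * 5000 + ["rare"] * 3 + ["none"] * 20000
rng = random.Random(0)
common = noisy_count(population, lambda r: r == "common", epsilon=0.5, rng=rng)
rare = noisy_count(population, lambda r: r == "rare", epsilon=0.5, rng=rng)
```

With `epsilon=0.5` the noise has scale 2. An error of that size barely matters for a count of 5,000 but can swamp a count of 3, which is why the approach struggles with small populations.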
35. Access and environment: safe havens
• A safe haven for data is more
like a traditional library, where
controlled access is granted to
people who have the right
credentials.
• You lose some of the benefit of
making data freely available over
the internet, but the risk of
malicious use is greatly reduced.
Credit: QTS
• The Administrative Data
Research Network is a scheme
to make HMG data available in
safe havens.
36. Governance: data protection legislation
• Harm can be done by sharing and not
sharing data
• The Data Protection Act is rarely the
real barrier to sharing data for the
protection of individuals
• The DPA provides exemptions for
research, which would be tightened
significantly by the proposed EU Data
Protection Regulation, making some
current medical research illegal. A major
concern.
Credit: EU dpi
37. Laws have borders – data does not
Map showing undersea internet cables
38. Even if a dataset is effectively anonymised on its own
(and this is very difficult), once freely available it can be
“decrypted” by finding overlaps with other datasets.
These could be a mixture of public and private datasets.
The bottom line: it is very hard to guarantee privacy
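The overlap problem can be illustrated with a small Python sketch that joins an "anonymised" table to a public one on shared attributes. Everything here is an assumption made for illustration: the function name `reidentify`, the choice of postcode, birth year and sex as the overlapping columns, and the toy rows.

```python
from collections import defaultdict

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")  # assumed shared columns


def reidentify(anonymised_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, anonymised_row) pairs where the overlap is unique."""
    names_by_key = defaultdict(list)
    for person in public_rows:
        names_by_key[tuple(person[k] for k in keys)].append(person["name"])
    matches = []
    for row in anonymised_rows:
        candidates = names_by_key.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # only one public record fits this row
            matches.append((candidates[0], row))
    return matches


# Hypothetical toy data, invented for illustration only.
anonymised = [
    {"postcode": "OX1", "birth_year": 1970, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "OX2", "birth_year": 1985, "sex": "M", "diagnosis": "asthma"},
]
public = [
    {"name": "Alice Example", "postcode": "OX1", "birth_year": 1970, "sex": "F"},
    {"name": "Bob Example", "postcode": "OX2", "birth_year": 1985, "sex": "M"},
    {"name": "Carl Example", "postcode": "OX2", "birth_year": 1985, "sex": "M"},
]
matches = reidentify(anonymised, public)
```

In this toy case the first row is re-identified because only one public record shares its attributes. The second row stays ambiguous only because two people happen to share the same combination.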
39. Where is all of this headed, and
what do we need to do?
Credit: Arne Hückelheim/CC-BY-SA-3.0
40. There are some tough challenges
• The digital infrastructure creates new threats
and vulnerabilities
• Security considerations were not planned into
the internet and web
• The keys to cryptography are only as secure as
those that hold them – importance of human
science
• Who watches the watchers?
• Should big data be on the National Risk
Register?
PD
Juvenal: Roman poet to whom Quis
custodiet ipsos custodes? is
attributed.
41. Balancing risks
• Don't underplay risk
of releasing data: the
challenge is to
balance utility and
privacy
• Recognise that
people who will re-identify
are extremely
able and may have
powerful hardware at
their disposal.
Source: stewardshipcommunity.com
42. What will be the effect on people?
Autonomy
Privacy
Disclosure
Credit: Shutterstock
Credit: Shutterstock
43. What will be the effect on people?
• It is impossible to completely
erase a digital past.
• Future generations may
require the right to be forgiven
rather than the right to be
forgotten.
• Young people are already
becoming more protective of
their data and abandoning
Facebook for Snapchat,
WhatsApp and other platforms.
44. There are utopian and dystopian futures
• Utopia: Knowledge to all,
educating the world,
accountability and
sustainability.
PD
JMW Turner, The Rise of the Carthaginian Empire, 1815
• Dystopia: end of individuality,
disrupted fabric of society,
childhood play disrupted,
monopoly of the state in law
enforcement disrupted, loss of
trust in service providers.
Credit: Friman/CC-BY-SA-3.0
Presidio Modelo prison, Cuba (abandoned)
45. How do we move forward?
Technology
Continue to strongly
support science and skills
agenda.
Communication
Don't underplay risk of
releasing data: challenge
to balance utility and
privacy
Governance
Reduce risk by choice of
environment - safe
havens with penalties:
control environment
proportional to risk of
harm
46. Final messages
• There is no going back – the world has been shaped
by the digital revolution
• There are new tools for understanding
ourselves and the world
• Huge economic opportunities
• There are unforeseen benefits and harms
47. Final messages
• The internet has no borders
• There will be ever more scope for crime and terrorism in
cyberspace
• UK has great strength in cyber security
• We must stay at the leading edge, develop
proportionate regulation, legislation and accountability.
• Need a sophisticated level of debate.
48. @uksciencechief
www.bis.gov.uk/go-science
Every effort has been made to trace copyright holders and to obtain their permission for the use of copyright material.
We apologise for any errors or omissions in the included attributions and would be grateful if notified of any corrections
that should be incorporated in future versions of this slide set. We can be contacted through enquiries@bis.gsi.gov.uk .