Big Data: Big Opportunity, Big Brother or Big
Trouble?
Oxford Martin School, December 3rd 2013

Sir Mark Walport, Chief Scientific Adviser to HM Government
The future will not be a repetition
of the past.
James Martin, 1933-2013
Writing in 1978

Credit: Oxford Martin School

Those who cannot remember the
past are condemned to repeat it.
George Santayana, 1863-1952
Writing in 1905

2 Big data and privacy

PD
• Florence Nightingale, Crimean War nurse and pioneer of statistics. In the 1890s she tried to get a Professorship of Statistics established at Oxford University, specifically for applying statistical analysis to social problems.
• At the time the scheme came to nothing, but her vision is now realised all over the world.
• Oxford began teaching Applied Statistics in 1947, and appointed its first Professor of Mathematical Statistics in 1962.

PD
Overview

1. Identity and identification
2. The promise of big data – opportunities
and risks
3. What about privacy?
4. Where is all of this headed, and what do
we need to do?
Identity – the sameness of a person or thing at
all times or in all circumstances; the condition
or fact that a person or thing is itself and not
something else; individuality, personality.

Identity – this is what makes me me
Credit: Wellcome Collection
Identification – The determination of identity;
the action or process of determining what a
thing is; the recognition of a thing as being
what it is

Identification – I will find out who you are
Credit: Wellcome Collection
Society doesn’t work in the absence of
identifiers. So who needs to know about us?

Credit: Getty

Credit: imagezone

Family and friends

Credit: Getty

Civic sector

Credit: Getty

Business

Government
We manage our relationships by selective
disclosure of data - multiple identities
• Age
• Financial status
• Place attachments
• Profession
• Nationality
• Hobbies
• Ethnicity
• Family role
• Religion
• Community & friendship

The outside world uses different approaches
to identify us
Direct disclosures
• Passport
• Driving licence
• Work pass
• NHS number
• National Insurance Number

Credit: Mark Yuill

Credentials and tokens
• PIN
• Password
• RFID embedded device
Credit: Shutterstock

What is personal information?

Hard to define, but ultimately information that enables particular
attributes to be linked to a unique individual.

Direct
• Face
• Fingerprint
• DNA

Indirect
• Name
• Address
• Postcode
• Workplace
• Club
Some attributes are more or less
sensitive in different contexts

• Age
• Sex
• Nationality
• Religion
• Health
• Education
• Financial
• Football Team
Richard Nixon's application to the FBI, 1937.
Released under FOI. Contains lots of (redacted)
sensitive health information.

Information Technology and the web have
created new opportunities to create identities

Anonymous

Pseudonymous

Real
The next generation of products will generate yet
more data – the internet of things

Credit: tedeytan/CC-BY-SA-2.0

Credit: MIT Media Lab

Credit: MarkDoliner/CC-BY-2.0

Credit: LG

The data is used by each of us for our personal
utility

• Finding things out
• Telling other people things
• Listening and watching things
• Navigating the real world
• Navigating fictional worlds
• Buying and selling stuff
• Playing games
• Storing stuff
• Recording our lives and those of friends/families
• Socialising with others
• Stealing things
• Plotting and causing harm

Information technology has created new ways of
locating or finding us
Image: iPhone tracking data

The consequence of all of this is that we are giving a lot
of information out that others can then use…
Smart meters produce detailed data on energy
consumption

The price of the utility is that we are generating
data on a massive scale

Lots of other people are interested in our data. Who
knows the most about us?
Government
• ONS
• HMRC
• NHS

Corporations
• Google
• Experian
• Loyalty cards

How do they use it? Retail suppliers.

• Our data is used to provide
individual services.
• But is also aggregated for
wholesale purposes - and
they give or sell the
wholesale data to other
organisations.

Credit: Lotus Head/CC-BY-SA-2.5

…and do we know how they use it?
Credit: Tesco

The myth of consent - do we really read and
understand the full terms and conditions?

Credit: Google

In 2008, researchers calculated it would take 76 working days to read all
the privacy policies you encounter in a year. If everyone in the US did so,
it would cost the country more than the GDP of Florida.

How do they use it? Government
Voting

Credit: ClassicStock

Taxes

Credit: Phillip Ingham/CC-BY-ND-2.0


Planning

Credit: iStockphoto

Law enforcement

Credit: South Yorkshire Police
National security

Credit: The Telegraph


Credit: The Guardian
Who else uses it?

• Future employers
• Hostile and
competing foreign
states
• Criminals and
terrorists
• Journalists


Credit: Getty
How do the wholesale collectors of data add
value to it?

What more can we do?

Societal Level
Improving Health
(and research in
general)

Understanding and
optimising business
processes

Improving and
optimising cities and
countries

Optimising Machine
and Device
Performance

Understanding,
targeting, and
serving customers

Improving Security
and Law
Enforcement


Individual Level
Personal
quantification and
performance
optimisation

Improving sports
performance
Improving health: diabetes in Scotland

• Total Scottish Population 5.2m
• People with diabetes: 251,132
(4.9%)
• People with Type 1 DM: ~27,000
(0.5%)
• All patients nationally are
registered onto a single register;
the SCI-DC register
• SCI-DC used in all 38 hospitals
• Nightly capture of data from all
1043 primary care practices across
Scotland
Courtesy of Andrew Morris

Getting about: Citymapper

• An app for New York and London, which links all transport
systems together so you can easily discover the best way to
get from where you are to where you want to be.

Improving infrastructure: Streetbump

Credit: Streetbump

• A project in Boston, a city plagued by potholes and other street
maintenance issues.
• People can report problems in various easy ways, including an app
that automatically detects bumps driven over.
• Highly successful, the critical element being an efficient system for
getting maintenance crews to the sites of reported issues.
What about the potential harms?

• UK research with 58,000 US
volunteers found that algorithms
based on Facebook “likes”, which
are often public, can predict
personality traits.
• 95% accurate in distinguishing
African-American from
Caucasian-American and 85% for
differentiating Republican from
Democrat.
• Some odd links as well. Curly
fries correlated with high
intelligence…

Credit: BBC

Dangers of releasing data into the wild

• AOL released anonymised search data
for research purposes.
• Journalists were able to pick up
clues to name and location, then
triangulate with embarrassing search
queries.
• Programme was halted, its initiators
sacked.


• Netflix released anonymised film rental data
and set a $1m prize, hoping to improve
recommendation algorithms.
• People’s viewing taste beyond usual
blockbusters is highly individual.
• Triangulating with IMDB data, bloggers
identified individual users and were able
to reveal their full list of rentals, not just
those they had “rated”.
What about privacy?
Credit: mkabakov
Privacy controls are not binary but fall on
spectra

• Obfuscation: from openly identifiable to anonymised to the point of losing valuable content
• Access / Environment: from free on the internet (everyone) to locked in a steel-lined room (accredited researcher)
• Governance and accountability: from little legislation to highly legislated
A taxonomy of obfuscation

Anonymisation: Remove all identifiers such
that it is impossible to identify an individual
Encryption: Prevent the data from being read without
unlocking. In theory, encrypted databases can
be analysed without breaking the encryption, but in practice
this works only for trivial uses

Credit: University of
Regensburg

Tokenisation or pseudonymisation: remove
as much of the 'personal' information as
possible, and link back to the personal data via an
independent, securely held database
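
A minimal Python sketch of tokenisation as described above, under stated assumptions: the function name `pseudonymise`, the field name `nhs_number` and the sample records are hypothetical and are not taken from the slides. Each direct identifier is swapped for a random token, and the token-to-identifier table is returned separately so that it can be held in an independent, secure store.

```python
import secrets


def pseudonymise(records, id_field):
    """Replace a direct identifier with a random token.

    Returns (pseudonymised_records, key_table). The key table maps each
    token back to the original identifier and is meant to be stored apart
    from the released records.
    """
    token_for = {}   # identifier -> token, so one person keeps one token
    key_table = {}   # token -> identifier, to be held separately and securely
    released = []
    for record in records:
        identifier = record[id_field]
        if identifier not in token_for:
            token = secrets.token_hex(8)  # random, not derived from the identifier
            token_for[identifier] = token
            key_table[token] = identifier
        clean = {k: v for k, v in record.items() if k != id_field}
        clean["token"] = token_for[identifier]
        released.append(clean)
    return released, key_table


# Invented example records, for illustration only.
visits = [
    {"nhs_number": "000 000 0001", "condition": "asthma"},
    {"nhs_number": "000 000 0002", "condition": "diabetes"},
    {"nhs_number": "000 000 0001", "condition": "hay fever"},
]
released, key_table = pseudonymise(visits, "nhs_number")
```

The released records can still be grouped per person through the token, while re-linking to the real identifier needs access to the key table. The remaining fields may still identify someone, so this reduces risk rather than removing it.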

Credit: Robbie Cooper

Obfuscation - differential privacy

• Differential privacy: the database itself remains pure, but a small amount
of noise is added to the final answer of each query, to prevent identification
of a single record.
• Good for many situations, but not for small populations or finding needles
in haystacks, such as the common factors behind a rare disease.
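
The noise-adding idea above can be sketched in Python. The slides do not name a mechanism; this sketch assumes the commonly used Laplace mechanism for a counting query, and the names `noisy_count` and `epsilon` and the sample data are illustrative assumptions. The stored records are never altered; only the answer to the query is perturbed.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=None):
    """Answer 'how many records match?' with Laplace noise added.

    Adding or removing one record changes a count by at most 1, so the
    noise scale is 1 / epsilon. A smaller epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Invented data: 100 people aged 0 to 99.
people = [{"age": age} for age in range(100)]
answer = noisy_count(people, lambda p: p["age"] < 30, epsilon=1.0)
```

With a true count of 30 the noise is small relative to the answer. With a true count of 1 or 2 the same noise swamps it, which is the small-population limitation noted above.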
Access and environment: safe havens

• A safe haven for data is more
like a traditional library, where
controlled access is granted to
people who have the right
credentials.
• You lose some of the benefit of
making data freely available over
the internet, but the risk of
malicious use is greatly reduced.
Credit: QTS


• The Administrative Data
Research Network is a scheme
to make HMG data available in
safe havens.
Governance: data protection legislation

• Harm can be done by sharing and not
sharing data
• The Data Protection Act is rarely the
real barrier to sharing data for the
protection of individuals
• The DPA provides exemptions for
research. The proposed EU Data
Protection Regulation would tighten
these significantly, making some
current medical research illegal: a major
concern.


Credit: EU dpi
Laws have borders – data does not

Map showing undersea internet cables

Even if a dataset is effectively anonymised on its own,
which is very difficult, once it is freely available it can be
“decrypted” (re-identified) by finding overlaps with other datasets.
These could be a mixture of public and private datasets.

The bottom line: it is very hard to guarantee privacy
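
How overlaps between datasets undo anonymisation can be sketched in Python. Everything in this sketch is an assumption made for illustration: the function name `link_records`, the choice of postcode, birth year and sex as the overlapping fields, and the invented records. A released row is treated as re-identified when exactly one named row in the second dataset shares all of the overlapping fields.

```python
def link_records(released, auxiliary, quasi_identifiers, name_field="name"):
    """Attach names to 'anonymised' rows that match exactly one auxiliary row."""
    index = {}
    for row in auxiliary:
        key = tuple(row[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(row)

    reidentified = []
    for row in released:
        key = tuple(row[q] for q in quasi_identifiers)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # a unique overlap pins down one person
            reidentified.append({**row, name_field: candidates[0][name_field]})
    return reidentified


# Invented records, for illustration only.
released = [
    {"postcode": "OX1", "birth_year": 1950, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "OX2", "birth_year": 1980, "sex": "M", "diagnosis": "diabetes"},
]
public = [
    {"name": "Person A", "postcode": "OX1", "birth_year": 1950, "sex": "F"},
    {"name": "Person B", "postcode": "OX2", "birth_year": 1980, "sex": "M"},
    {"name": "Person C", "postcode": "OX2", "birth_year": 1980, "sex": "M"},
]
matches = link_records(released, public, ["postcode", "birth_year", "sex"])
```

Here the first released row matches only Person A and is re-identified, while the second matches two people and stays ambiguous. The more fields two datasets share, the fewer people each combination fits.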

Where is all of this headed, and
what do we need to do?
Credit: Arne Hückelheim/CC-BY-SA-3.0
There are some tough challenges
• The digital infrastructure creates new threats
and vulnerabilities
• Security considerations were not planned into
the internet and web
• The keys to cryptography are only as secure as
those who hold them – importance of human
science
• Who watches the watchers?
• Should big data be on the National Risk
Register?

PD
Juvenal: Roman poet to whom Quis
custodiet ipsos custodes? is
attributed.
Balancing risks

• Don't underplay risk
of releasing data: the
challenge is to
balance utility and
privacy
• Recognise that
people who set out to re-identify
data are extremely
able and may have
powerful hardware at
their disposal.
Source: stewardshipcommunity.com

What will be the effect on people?

Autonomy

Privacy

Disclosure

Credit: Shutterstock


Credit: Shutterstock
What will be the effect on people?

• It is impossible to completely
erase a digital past.
• Future generations may
require the right to be forgiven
rather than the right to be
forgotten.
• Young people are already
becoming more protective of
their data and abandoning
Facebook for Snapchat,
WhatsApp and other platforms.

There are utopian and dystopian futures

• Utopia: Knowledge to all, educating the world, accountability and sustainability.

PD

JMW Turner, The Rise of the Carthaginian Empire, 1815

• Dystopia: end of individuality, disrupted fabric of society, childhood play disrupted, monopoly of the state in law enforcement disrupted, loss of trust in service providers.
Credit: Friman/CC-BY-SA-3.0

Presidio Modelo prison, Cuba (abandoned)

How do we move forward?

Technology

Continue to strongly
support science and skills
agenda.

Communication

Don't underplay risk of
releasing data: challenge
to balance utility and
privacy

Governance

Reduce risk by choice of
environment - safe
havens with penalties:
control environment
proportional to risk of
harm

Final messages
• There is no going back – the world has been shaped
by the digital revolution
• There are new tools for understanding
ourselves and the world
• Huge economic opportunities
• There are unforeseen benefits and harms
Final messages
• The internet has no borders
• There will be ever more scope for crime and terrorism in
cyberspace
• UK has great strength in cyber security
• We must stay at the leading edge, develop
proportionate regulation, legislation and accountability.
• Need a sophisticated level of debate.
@uksciencechief
www.bis.gov.uk/go-science

Every effort has been made to trace copyright holders and to obtain their permission for the use of copyright material.
We apologise for any errors or omissions in the included attributions and would be grateful if notified of any corrections
that should be incorporated in future versions of this slide set. We can be contacted through enquiries@bis.gsi.gov.uk.

Making sense of big data

  • 1. Big Data: Big Opportunity, Big Brother or Big Trouble? Oxford Martin School, December 3rd 2013 Sir Mark Walport, Chief Scientific Adviser to HM Government
  • 2. The future will not be a repetition of the past. James Martin, 1933-2013 Writing in 1978 Credit: Oxford Martin School Those who cannot remember the past are condemned to repeat it. George Santayana, 1863-1952 Writing in 1905 2 Big data and privacy PD
  • 3. • Florence Nightingale, Crimean War nurse and pioneer of statistics. In the 1890s she tried to get a Professorship of Statistics established at Oxford University, specifically for applying statistical analysis to social problems. • At the time the scheme came to nothing, but her vision is now realised all over the world. • Oxford began teaching Applied Statistics in 1947, and appointed its first Professor of Mathematical Statistics in 1962. 3 Big data and privacy PD
  • 4. Overview 1. Identity and identification 2. The promise of big data – opportunities and risks 3. What about privacy? 4.Where is all of this headed, and what do we need to do?
  • 5. Identity – the sameness of a person or thing at all times or in all circumstances; the condition or fact that a person or thing is itself and not something else; individuality, personality. Identity – this is what makes me me Credit: Wellcome Collection
  • 6. Identification – The determination of identity; the action or process of determining what a thing is; the recognition of a thing as being what it is Identification – I will find out who you are Credit: Wellcome Collection
  • 7. Society doesn’t work in the absence of identifiers. So who needs to know about us? Family and friends; Civic sector; Business; Government. Credits: Getty (three images), imagezone
  • 8. We manage our relationships by selective disclosure of data - multiple identities: Age; Financial status; Place attachments; Profession; Nationality; Hobbies; Ethnicity; Family role; Religion; Community & friendship
  • 9. The outside world uses different approaches to identify us. Direct disclosures: • Passport • Driving licence • Work pass • NHS number • National Insurance Number (Credit: Mark Yuill). Credentials and tokens: • PIN • Password • RFID embedded device (Credit: Shutterstock)
  • 10. What is personal information? Hard to define, but ultimately information that enables particular attributes to be linked to a unique individual. Direct: Face; Fingerprint; DNA. Indirect: Name; Address; Postcode; Workplace; Club
  • 11. Some attributes are more or less sensitive in different contexts • Age • Sex • Nationality • Religion • Health • Education • Financial • Football Team. Image: Richard Nixon's application to the FBI, 1937. Released under FOI. Contains lots of (redacted) sensitive health information.
  • 12. Information Technology and the web have created new opportunities to create identities: Anonymous; Pseudonymous; Real
  • 13. The next generation of products will generate yet more data – the internet of things Credit: tedeytan/CC-BY-SA-2.0 Credit: MIT Media Lab Credit: MarkDoliner/CC-BY-2.0 Credit: LG
  • 14. The data is used by each of us for our personal utility: Finding things out; Telling other people things; Listening and watching things; Navigating the real world; Navigating fictional worlds; Buying and selling stuff; Playing games; Storing stuff; Recording our lives and those of friends/families; Socialising with others; Stealing things; Plotting and causing harm
  • 15. Information technology has created new ways of locating or finding us. Image: iPhone tracking data. The consequence of all of this is that we are giving out a lot of information that others can then use…
  • 16. Smart meters produce detailed data on energy consumption
  • 17. The price of the utility is that we are generating data on a massive scale
  • 18. Lots of other people are interested in our data. Who knows the most about us? Government: ONS; HMRC; NHS. Corporations: Google; Experian; Loyalty cards
  • 19. How do they use it? Retail suppliers. • Our data is used to provide individual services. • But it is also aggregated for wholesale purposes - and they give or sell the wholesale data to other organisations. Credit: Lotus Head/CC-BY-SA-2.5 …and do we know how they use it? Credit: Tesco
  • 20. The myth of consent - do we really read and understand the full terms and conditions? In 2008, researchers calculated it would take 76 working days to read all the privacy policies you encounter in a year. If everyone in the US did so, it would cost the country more than the GDP of Florida. Credit: Google
  • 21. How do they use it? Government: Voting (Credit: ClassicStock); Taxes (Credit: Phillip Ingham/CC-BY-ND-2.0); Planning (Credit: iStockphoto); Law enforcement (Credit: South Yorkshire Police)
  • 22. National security Credit: The Telegraph Credit: The Guardian
  • 23. Who else uses it? • Future employers • Hostile and competing foreign states • Criminals and terrorists • Journalists Credit: Getty
  • 24. How do the wholesale collectors of data add value to it?
  • 25. What more can we do? Societal level: Improving health (and research in general); Understanding and optimising business processes; Improving and optimising cities and countries; Optimising machine and device performance; Understanding, targeting, and serving customers; Improving security and law enforcement. Individual level: Personal quantification and performance optimisation; Improving sports performance
  • 26. Improving health: diabetes in Scotland • Total Scottish population: 5.2m • People with diabetes: 251,132 (4.9%) • People with Type 1 DM: ~27,000 (0.5%) • All patients nationally are registered on a single register, the SCI-DC register • SCI-DC used in all 38 hospitals • Nightly capture of data from all 1043 primary care practices across Scotland. Courtesy of Andrew Morris
  • 27. Getting about: Citymapper • An app for New York and London, which links all transport systems together so you can easily discover the best way to get from where you are to where you want to be.
  • 28. Improving infrastructure: Streetbump • A project in Boston, a city plagued by potholes and other street maintenance issues. • People can report problems in various easy ways, including an app that automatically detects bumps driven over. • Highly successful, the critical element being an efficient system for getting maintenance crews to the sites of reported issues. Credit: Streetbump
  • 29. What about the potential harms? • UK research with 58,000 US volunteers found that algorithms based on Facebook “likes”, which are often public, can predict personality traits. • 95% accurate in distinguishing African-American from Caucasian-American and 85% accurate in differentiating Republican from Democrat. • Some odd links as well. Liking curly fries correlated with high intelligence… Credit: BBC
  • 30. Dangers of releasing data into the wild. Search data: • Released anonymised search data for research purposes. • Journalists were able to pick up clues to name and location, then triangulate with embarrassing search queries. • Programme was halted, its initiators sacked. Film rental data: • Released anonymised film rental data and set a $1m prize, hoping to improve recommendation algorithms. • People’s viewing taste beyond usual blockbusters is highly individual. • Triangulating with IMDB data, bloggers identified individual users and were able to reveal their full list of rentals, not just those they had “rated”.
  • 32. Privacy controls are not binary but fall on spectra. Obfuscation: from openly identifiable to anonymised to the point of losing valuable content. Access / Environment: from free on the internet (everyone) to locked in a steel-lined room (accredited researcher). Governance and accountability: from little legislation to highly legislated.
  • 33. A taxonomy of obfuscation. Anonymisation: remove all identifiers such that it is impossible to identify an individual. Encryption: prevent the data from being read without unlocking; in theory encrypted databases can be analysed without breaking the encryption, but in practice they cannot be used for anything but trivial uses. Tokenisation or pseudonymisation: remove as much of the 'personal' information as possible, and link back to it via an independent, securely held database. Credits: University of Regensburg; Robbie Cooper
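
The tokenisation or pseudonymisation idea on slide 33 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a method described in the talk: the keyed hash (HMAC-SHA256), the field name `nhs_number`, the function `pseudonymise`, and the example records are all hypothetical choices made for the example.

```python
import hashlib
import hmac


def pseudonymise(records, secret_key, id_field="nhs_number"):
    """Split records into a research table and a separately held lookup table.

    Assumption: `id_field` holds the only direct identifier in each record.
    """
    research_rows = []  # rows for analysts: direct identifier removed
    lookup = {}         # token -> identifier; to be stored independently
    for record in records:
        identifier = record[id_field]
        # Keyed hash: without `secret_key`, a token cannot be recomputed
        # from a guessed identifier.
        token = hmac.new(secret_key, identifier.encode("utf-8"),
                         hashlib.sha256).hexdigest()[:16]
        lookup[token] = identifier
        row = {k: v for k, v in record.items() if k != id_field}
        row["token"] = token
        research_rows.append(row)
    return research_rows, lookup


# Made-up example records (not real data).
records = [
    {"nhs_number": "0000000001", "age": 54, "diagnosis": "type 2 diabetes"},
    {"nhs_number": "0000000002", "age": 31, "diagnosis": "type 1 diabetes"},
]
rows, lookup = pseudonymise(records, secret_key=b"hypothetical-secret")
```

Only `rows` would be handed to researchers; `lookup` and the key stay with the data custodian. The remaining attributes (age, diagnosis) can still act as indirect identifiers, so a pseudonymised table is not necessarily anonymous.
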
  • 34. Obfuscation - differential privacy • Differential privacy: the database itself remains pure, but a small amount of noise is added to the final answer of each query, to prevent identification of a single record. • Good for many situations, but not for small populations or finding needles in haystacks, such as the common factors behind a rare disease.
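
The noise-adding idea on slide 34 can be illustrated with the Laplace mechanism, a standard way of making a counting query differentially private. The sketch below is an illustration under stated assumptions, not something shown in the talk: the function names, the `epsilon` value, and the toy records are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential variables with mean
    # `scale` follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng):
    """Answer 'how many records match?' with noise added to the result only."""
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1
    # (sensitivity 1), so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Toy data: 1,000 made-up records, 3 of which have a rare condition.
records = [{"rare": i < 3, "common": i % 2 == 0} for i in range(1000)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

common = noisy_count(records, lambda r: r["common"], epsilon=0.5, rng=rng)
rare = noisy_count(records, lambda r: r["rare"], epsilon=0.5, rng=rng)
print(f"common condition: about {common:.1f} (true value 500)")
print(f"rare condition:   about {rare:.1f} (true value 3)")
```

The stored records are never altered; only the answers are perturbed. With `epsilon=0.5` the noise scale is 2, which is negligible against a count of 500 but comparable to a count of 3. That is the small-population limitation the slide mentions.
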
  • 35. Access and environment: safe havens • A safe haven for data is more like a traditional library, where controlled access is granted to people who have the right credentials. • You lose some of the benefit of making data freely available over the internet, but the risk of malicious use is greatly reduced. • The Administrative Data Research Network is a scheme to make HMG data available in safe havens. Credit: QTS
  • 36. Governance: data protection legislation • Harm can be done both by sharing and by not sharing data. • The Data Protection Act (DPA) is rarely the real barrier to sharing data for the protection of individuals. • The DPA provides exemptions for research; the proposed EU Data Protection Regulation would tighten these significantly, making some current medical research illegal. A major concern. Credit: EU dpi
  • 37. Laws have borders – data does not. Image: map showing undersea internet cables
  • 38. Even if a dataset is effectively anonymised on its own, and this is very difficult, if freely available it can be “decrypted” by finding overlaps with other datasets. These could be a mixture of public and private datasets. The bottom line: it is very hard to guarantee privacy.
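
The overlap problem on slide 38 can be made concrete with a small linkage sketch in Python. Everything here is an assumption for illustration: the function `link_records`, the shared attributes (postcode, birth year, sex), and the records themselves are made up and do not come from the talk.

```python
from collections import defaultdict


def link_records(anonymised, public, keys):
    """Return {row index in `anonymised`: name} for rows that match exactly
    one person in `public` on the shared attributes in `keys`."""
    index = defaultdict(list)
    for person in public:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = {}
    for i, row in enumerate(anonymised):
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique overlap re-identifies the row
            matches[i] = candidates[0]
    return matches


# Made-up data: an "anonymised" release with names removed ...
anonymised = [
    {"postcode": "OX1", "birth_year": 1970, "sex": "F", "condition": "asthma"},
    {"postcode": "OX2", "birth_year": 1985, "sex": "M", "condition": "diabetes"},
]
# ... and a separate public list that shares three attributes with it.
public = [
    {"name": "Person A", "postcode": "OX1", "birth_year": 1970, "sex": "F"},
    {"name": "Person B", "postcode": "OX2", "birth_year": 1985, "sex": "M"},
    {"name": "Person C", "postcode": "OX2", "birth_year": 1985, "sex": "M"},
]
matches = link_records(anonymised, public, ["postcode", "birth_year", "sex"])
```

Here row 0 is linked to "Person A" because the combination of attributes is unique, while row 1 stays ambiguous because two people share it. The more attributes two datasets have in common, the more combinations become unique.
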
  • 39. Where is all of this headed, and what do we need to do? Credit: Arne Hückelheim/CC-BY-SA-3.0
  • 40. There are some tough challenges • The digital infrastructure creates new threats and vulnerabilities • Security considerations were not planned into the internet and web • The keys to cryptography are only as secure as those who hold them – importance of human science • Who watches the watchers? • Should big data be on the National Risk Register? Image (PD): Juvenal, the Roman poet to whom “Quis custodiet ipsos custodes?” is attributed.
  • 41. Balancing risks • Don't underplay the risk of releasing data: the challenge is to balance utility and privacy • Recognise that the people who will attempt re-identification are extremely able and may have powerful hardware at their disposal. Source: stewardshipcommunity.com
  • 42. What will be the effect on people? Autonomy; Privacy; Disclosure. Credits: Shutterstock (two images)
  • 43. What will be the effect on people? • It is impossible to completely erase a digital past. • Future generations may require the right to be forgiven rather than the right to be forgotten. • Young people are already becoming more protective of their data and abandoning Facebook for Snapchat, WhatsApp and other platforms.
  • 44. There are utopian and dystopian futures • Utopia: Knowledge to all, educating the world, accountability and sustainability. Image (PD): JMW Turner, The Rise of the Carthaginian Empire, 1815 • Dystopia: end of individuality, disrupted fabric of society, childhood play disrupted, monopoly of the state in law enforcement disrupted, loss of trust in service providers. Image: Presidio Modelo prison, Cuba (abandoned). Credit: Friman/CC-BY-SA-3.0
  • 45. How do we move forward? Technology: Continue to strongly support science and skills agenda. Communication: Don't underplay risk of releasing data: challenge to balance utility and privacy. Governance: Reduce risk by choice of environment - safe havens with penalties: control environment proportional to risk of harm.
  • 46. Final messages • There is no going back – the world has been shaped by the digital revolution • There are new tools for understanding ourselves and the world • Huge economic opportunities • There are unforeseen benefits and harms
  • 47. Final messages • The internet has no borders • There will be ever more scope for crime and terrorism in cyberspace • UK has great strength in cyber security • We must stay at the leading edge, develop proportionate regulation, legislation and accountability. • Need a sophisticated level of debate.
  • 48. @uksciencechief www.bis.gov.uk/go-science Every effort has been made to trace copyright holders and to obtain their permission for the use of copyright material. We apologise for any errors or omissions in the included attributions and would be grateful if notified of any corrections that should be incorporated in future versions of this slide set. We can be contacted through enquiries@bis.gsi.gov.uk .