GDPR and Hadoop
The elephant in the room
Janosch Woschitz
2017-09-27
2
• GDPR Overview
• Rights of the data subject
• Challenges within Hadoop ecosystem
• Technical considerations
Agenda
3
• Complex and detailed topic
• This is NOT legal advice
• A lot of opinions and interpretations about
GDPR
• Talk is not covering all aspects of GDPR
• Process matters, documentation is your
friend
Disclaimer
Take it with a grain of salt
4
“Regulation (EU) 2016/679 of the European Parliament [...] on the protection of natural persons with
regard to the processing of personal data and on the free movement of such data, and repealing
Directive 95/46/EC (General Data Protection Regulation)”
• Establishes data protection as a fundamental right
• Creates unified data protection law for all EU member states
• Enables EU citizens to be in control of their personal data
General Data Protection Regulation
GDP what?
- Official title of the GDPR, http://eur-lex.europa.eu/eli/reg/2016/679/oj
5
• Applies if the data controller or processor (organization) or the data
subject (person) is based in the EU
• Applies to organizations based outside the European Union if they
offer goods or services to, or monitor the behaviour of, people in the EU (Art. 3)
• Employees might be EU citizens as well
General Data Protection Regulation
Who is affected?
6
• Officially published on May 4th 2016
• Applicable from May 25th 2018 across the EU (including UK)
• “Regulation” instead of “Directive” → no need for national
implementing legislation, directly applicable to all EU countries
• First evaluation and review by the Commission due by May 25th 2020 (Art. 97)
General Data Protection Regulation
When does it happen?
7
• Better data protection and portability for consumers
• Fines for non-compliance can be
– up to €10M or 2% of worldwide annual turnover, whichever is higher, for less severe violations
– up to €20M or 4% of worldwide annual turnover, whichever is higher, for more severe violations
• Any individual has the right to raise a complaint against any
organisation (Art. 77)
General Data Protection Regulation
Why should I care?
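The fine ceilings on this slide boil down to simple arithmetic. The sketch below assumes the "whichever is higher" rule of Art. 83(4) and 83(5); the function name and the turnover figures are illustrative, and the result is only the upper bound, not the fine an authority would actually impose.

```python
def max_fine_eur(annual_turnover_eur: int, severe: bool) -> float:
    """Upper bound of an administrative fine (illustrative helper, not legal advice)."""
    # Fixed cap and turnover percentage for the two tiers of Art. 83.
    fixed_cap, percent = (20_000_000, 4) if severe else (10_000_000, 2)
    # The higher of the two amounts is the ceiling.
    return max(fixed_cap, annual_turnover_eur * percent / 100)


if __name__ == "__main__":
    # Hypothetical company with 1 billion EUR worldwide annual turnover.
    print(max_fine_eur(1_000_000_000, severe=True))
    # Hypothetical company with 100 million EUR turnover, lower tier.
    print(max_fine_eur(100_000_000, severe=False))
```

For small organizations the fixed amount dominates; for large ones the percentage does.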
8
Privacy by design
Better data protection, you said?
• Privacy by design and by default, essential data protection
• Breach notification within 72 hours
• Data minimization and access limitation
• Data Protection Officer (DPO) and Data Protection Impact Assessments
(DPIAs)
• Active, specific and unambiguous consent
“the controller shall [...] implement appropriate technical and organisational measures [...] in an
effective manner [...] in order to meet the requirements of this Regulation and protect the rights of
data subjects.” - Article 25, GDPR
9
Personal data?
https://pixabay.com/en/family-drawing-children-cat-paper-879432/
10
Personal data (examples)
It all depends on context
• Location or web surfing data
• Video surveillance and images
• Personal interests or behavioural patterns
• A child's drawing depicting its family
• Publication of x-ray plates together with the patient's first name
• Damage caused by graffiti in public transportation
• X1234 drinks a glass of wine more than 3 times a week, drives a
Bentley and has a Windows 10 phone
11
Source: Facebook
• Right of access and data portability
– free of charge
– structured, commonly used and machine readable
• Right to erasure
– “without undue delay”
• Right to object, to restrict, to rectify, ...
Data citizen rights
Rights of the data subject
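A minimal sketch of what "structured, commonly used and machine readable" could look like for an access or portability request. It assumes every dataset is a list of dictionaries keyed by a `userId` field; the function name and dataset names are hypothetical.

```python
import json


def export_subject_data(user_id, datasets):
    """Collect all rows of one data subject across datasets and return JSON text."""
    export = {
        name: [row for row in rows if row.get("userId") == user_id]
        for name, rows in datasets.items()
    }
    return json.dumps({"userId": user_id, "data": export}, indent=2, sort_keys=True)


if __name__ == "__main__":
    datasets = {
        "customers": [{"userId": 123, "firstName": "Janosch"}],
        "locations": [{"userId": 123, "lat": 52.510781, "lon": 13.371735}],
    }
    print(export_subject_data(123, datasets))
```

On a real cluster the hard part is not the serialization but finding every copy of the subject's data.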
GDPR and Hadoop
13
Hadoop ecosystem & beyond
The known Hadoopverse (excerpt)
and much more ...
14
Data processing on Hadoop
Bird’s eye view
• Various data sources and ingestion tools
• Diverse input formats, structured & unstructured
• Diverse processing tools
• Liberal data access, local data science
• Write-append and immutable data structures
• Redundant data
Ingest Process Access
15
Challenges by
example
• Customer data from
RDBMS to HDFS
• Streaming device
location data to
Kafka
16
Challenges by example
Ingest table from RDBMS
[Diagram: daily import (e.g. via sqoop) from the RDBMS ("Smaller Data") into Hadoop ("Big Data"); the snapshots for today, -1 day and -2 days each hold a copy of the record below]
“userId”: 123
“firstName”: “Janosch”
“dateOfBirth”: “1984-01-01”
17
Problems & Solution approaches
• Right to be forgotten
• Access limitation
• Bound to consent
• ...
• Anonymization
• Hashing
• Encryption
• ...
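The "Hashing" approach can be sketched with a keyed hash (HMAC-SHA256) from the Python standard library. The field names follow the example record used in this talk; the secret key and the choice of HMAC are assumptions. Note that a keyed hash yields pseudonymized data, not anonymized data: whoever holds the key can recompute the mapping.

```python
import hashlib
import hmac

# Fields treated as personal data in this sketch (assumption).
PII_FIELDS = ("firstName", "dateOfBirth")


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a value with a keyed, deterministic digest."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, secret_key: bytes) -> dict:
    """Hash only the PII fields; keep the others (e.g. userId) for joins."""
    return {
        field: pseudonymize(str(value), secret_key) if field in PII_FIELDS else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    key = b"hypothetical-secret-kept-outside-the-cluster"
    record = {"userId": 123, "firstName": "Janosch", "dateOfBirth": "1984-01-01"}
    print(pseudonymize_record(record, key))
```

A plain unkeyed hash of low-entropy values such as first names or birth dates can be reversed by brute force, which is why a secret key is used here.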
18
Challenges by example
Encrypt, a.k.a. Lost Key Pattern
[Diagram: daily import (e.g. via sqoop) into snapshots for today, -1 day and -2 days; the plain-text record for userId 123 (firstName “Janosch”, dateOfBirth “1984-01-01”) is shown next to its encrypted form below, with a key labelled 123]
“userId”: 123
“firstName”: “54DCF13E4...”
“dateOfBirth”: “D3DFBCE...”
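A sketch of the Lost Key Pattern: each user's PII fields are encrypted with a per-user key, and an erasure request is served by deleting that key, which leaves every immutable snapshot unreadable. The `KeyStore` class is a hypothetical stand-in for a KMS or Vault-like service, and the HMAC-based stream cipher is a toy stand-in chosen only to keep the example inside the standard library; a real system would use an authenticated cipher such as AES-GCM from a vetted library.

```python
import hashlib
import hmac
import secrets

PII_FIELDS = ("firstName", "dateOfBirth")


class KeyStore:
    """Hypothetical per-user key store (stand-in for a KMS or Vault)."""

    def __init__(self):
        self._keys = {}

    def key_for(self, user_id):
        return self._keys.setdefault(user_id, secrets.token_bytes(32))

    def get(self, user_id):
        return self._keys.get(user_id)

    def forget(self, user_id):
        self._keys.pop(user_id, None)  # "losing" the key serves the erasure request


def _keystream(key, nonce, length):
    # Toy keystream: HMAC-SHA256 in counter mode. NOT production cryptography.
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_field(key, plaintext):
    nonce = secrets.token_bytes(16)
    data = plaintext.encode("utf-8")
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))
    return (nonce + cipher).hex()


def decrypt_field(key, token):
    raw = bytes.fromhex(token)
    nonce, cipher = raw[:16], raw[16:]
    plain = bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
    return plain.decode("utf-8")


def protect(record, keys):
    key = keys.key_for(record["userId"])
    return {f: encrypt_field(key, v) if f in PII_FIELDS else v for f, v in record.items()}


def reveal(record, keys):
    key = keys.get(record["userId"])
    if key is None:
        return None  # key deleted: the snapshot still exists but cannot be read
    return {f: decrypt_field(key, v) if f in PII_FIELDS else v for f, v in record.items()}
```

Usage: call `protect` at ingestion time for every daily snapshot, `reveal` on authorized reads, and `KeyStore.forget(user_id)` when the user asks to be forgotten. The key store itself then becomes the system that needs the strictest access control and backup policy.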
19
Challenges by example
Deletion in log based systems
[Diagram: Edge device (deviceId: 123) pushes data to Kafka topic → Kafka topic → Consumer]
• Message payload: “deviceId”: 123, “lat”: 52.510781, “lon”: 13.371735
• Kafka topic, offsets 2 to 6: key 456 → A; key 123 → B, C, D, ∅
• Consumer: B, C, D, ∅
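The deletion example can be illustrated with a small in-memory simulation of log compaction, assuming that ∅ on the slide denotes a tombstone (a message with a null value) and that the topic uses `cleanup.policy=compact`. This is not Kafka client code; the ordering of the messages across offsets 2 to 6 is an assumption.

```python
def compact(log):
    """Simulate compaction: keep the latest value per key, drop tombstoned keys."""
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    survivors = [(o, k, v) for k, (o, v) in latest.items() if v is not None]
    return sorted(survivors)


if __name__ == "__main__":
    # (offset, key, value); None plays the role of the tombstone.
    log = [(2, 456, "A"), (3, 123, "B"), (4, 123, "C"), (5, 123, "D"), (6, 123, None)]
    print(compact(log))
```

In a real cluster compaction runs asynchronously, and tombstones are themselves kept for a while (`delete.retention.ms`), so a consumer that already read B, C and D still holds that data downstream. The tombstone only cleans up the topic itself.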
20
Challenges by example
Encrypt on write
[Diagram: Edge device (deviceId: 123) pushes data to Kafka topic → Kafka topic → Consumer]
• Message payload: “deviceId”: 123, “lat”: 52.510781, “lon”: 13.371735
• Kafka topic, offsets 1 to 5: key 123 → D4, Z3, 6H, N7; key 456 → T3
• Consumer: A, B, C, D
• 123 → ?
21
Vendor recommendations
Distributions to the rescue!
• Hortonworks - "GDPR: The Good, Bad and Ugly", Jun 20 2017
• Cloudera - "Simplify your response to GDPR", Aug 24 2017
• GDPR compliance via partner solutions
• Only partial answers
Source: Cloudera Inc.
22
GDPR recommendations simplified
Kudu
Sentry
Navigator
Data Science
Workbench
HDFS / ...
Ranger
Atlas
Zeppelin
+ lots of partner solutions
23
Data privacy and open source
Pragmatic considerations
• Secured cluster
• Raw data in encryption zones with very limited access
• Anonymize for further processing wherever possible
• Proper retention policies, batch delete requests and perform regular
clean-ups
• Integrate with Atlas and Ranger → tagging, filtering and masking
• Custom solutions for glue and missing pieces
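Retention policies and batched delete requests can be sketched as two small functions. The date-partitioned layout, the `userId` field, and the 30-day retention period are assumptions for illustration; on a real cluster the rewrite step would be a Spark or Hive job over files in HDFS.

```python
from datetime import date, timedelta


def expired_partitions(partition_dates, today, retention_days):
    """Return the daily partitions that are older than the retention period."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in partition_dates if d < cutoff)


def apply_erasure_batch(rows, user_ids_to_erase):
    """Rewrite a partition once for a whole batch of collected erasure requests."""
    erase = set(user_ids_to_erase)
    return [row for row in rows if row["userId"] not in erase]


if __name__ == "__main__":
    partitions = [date(2017, 9, 26), date(2017, 8, 1), date(2017, 8, 28)]
    print(expired_partitions(partitions, today=date(2017, 9, 27), retention_days=30))

    rows = [{"userId": 123, "firstName": "Janosch"}, {"userId": 456, "firstName": "Alex"}]
    print(apply_erasure_batch(rows, user_ids_to_erase=[123]))
```

Batching matters because immutable files must be rewritten in full: collecting requests and rewriting each partition once per cycle is far cheaper than one rewrite per request.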
24
Summary
• No comprehensive open-source solution available
• Proprietary services target specific problem domains, integration still
necessary
• It will take some time until the legal dust has settled
• Idea: Avro (logical types) + Vault (or similar) + Ranger + Atlas?
The road ahead
25
© 2017 Teradata
26
Hadoop Security Primer
In just one slide
• Authentication - Kerberos
• Authorization - Ranger, Sentry, ACLs
• Auditing / Monitoring - Ranger, Navigator, ...
• Encryption of data in motion - TLS, SASL, ...
• Encryption of data at rest - Encryption zones (KMS), Navigator, SEDs, ...
• Hadoop Security (Ben Spivey, Joey Echeverria)
• Hadoop and Kerberos: The Madness beyond the Gate
27
Personal data
According to GDPR
“any information relating to an identified or identifiable natural person (‘data
subject’);
An identifiable natural person is one who can be identified, directly or indirectly,
in particular by reference to an identifier such as a name, an identification
number, location data, an online identifier or to one or more factors specific to
the physical, physiological, genetic, mental, economic, cultural or social identity
of that natural person.”
- Article 4, GDPR

GDPR and Hadoop

  • 1. GDPR and Hadoop: The elephant in the room. Janosch Woschitz, 2017-09-27
  • 2. Agenda • GDPR Overview • Rights of the data subject • Challenges within Hadoop ecosystem • Technical considerations
  • 3. Disclaimer: Take it with a grain of salt • Complex and detailed topic • This is NOT legal advice • A lot of opinions and interpretations about GDPR • Talk is not covering all aspects of GDPR • Process matters, documentation is your friend
  • 4. General Data Protection Regulation: GDP what? “Regulation (EU) 2016/679 of the European Parliament [...] on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation)” - Official title of the GDPR, http://eur-lex.europa.eu/eli/reg/2016/679/oj • Establishes data protection as a fundamental right • Creates unified data protection law for all EU member states • Enables EU citizens to be in control of their personal data
  • 5. General Data Protection Regulation: Who is affected? • Applies if the data controller or processor (organization) or the data subject (person) is based in the EU • Applies to organizations based outside the European Union if they process or monitor personal data of EU citizens • Employees might be EU citizens as well
  • 6. General Data Protection Regulation: When does it happen? • Officially published on May 4th 2016 • Applicable from May 25th 2018 across the EU (including UK) • “Regulation” instead of “Directive” → no need for national implementing legislation, directly applicable to all EU countries • Evaluated and reviewed on May 25th 2020
  • 7. General Data Protection Regulation: Why should I care? • Better data protection and portability for consumers • Fines for non-compliance will be – up to €10M or 2% revenue for minor violations – up to €20M or 4% revenue for major violations • Any individual has the right to raise a complaint against any organisation (Art. 77)
  • 8. Privacy by design: Better data protection, you said? • Privacy by design and by default, essential data protection • Breach notification within 72 hours • Data minimization and access limitation • Data Protection Officer (DPO) and Data Privacy Impact Assessments (DPIAs) • Active, specific and unambiguous consent “the controller shall [...] implement appropriate technical and organisational measures [...] in an effective manner [...] in order to meet the requirements of this Regulation and protect the rights of data subjects.” - Article 25, GDPR
  • 10. Personal data (examples): It all depends on context • Location or web surfing data • Video surveillance and images • Personal interests or behavioural patterns • A child's drawing depicting its family • Publication of x-ray plates together with the patient's first name • Damage caused by graffiti in public transportation • X1234 drinks a glass of wine more than 3 times a week, drives a Bentley and has a Windows 10 phone
  • 11. Data citizen rights: Rights of the data subject • Right of access and data portability – free of charge – structured, commonly used and machine readable • Right to erasure – “without undue delay” • Right to object, to restrict, to rectify, ... (Source: Facebook)
  • 13. Hadoop ecosystem & beyond: The known Hadoopverse (excerpt) and much more ...
  • 14. Data processing on Hadoop: Bird’s eye view • Various data sources and ingestion tools • Diverse input formats, structured & unstructured • Diverse processing tools • Liberal data access, local data science • Write-append and immutable data structures • Redundant data (diagram stages: Ingest, Process, Access)
  • 15. Challenges by example • Customer data from RDBMS to HDFS • Streaming device location data to Kafka
  • 16. Challenges by example: Ingest table from RDBMS [Figure: daily import (e.g. via sqoop) from the RDBMS (“Smaller Data”) into Hadoop (“Big Data”); the same record { “userId”: 123, “firstName”: “Janosch”, “dateOfBirth”: “1984-01-01” } appears in the source table and in each daily snapshot: today, -1 day, -2 days]
  • 17. Problems & Solution approaches. Problems: • Right to be forgotten • Access limitation • Bound to consent • ... Solution approaches: • Anonymization • Hashing • Encryption • ...
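The "Hashing" approach listed on slide 17 could be sketched roughly as below. This is only an illustration under assumptions: the helper names (`pseudonymize`, `generalize_birth_date`, `prepare_for_analytics`), the output field names and the use of a keyed hash (HMAC-SHA256) are not from the talk. Note also that a keyed hash yields pseudonymized data, which GDPR still treats as personal data as long as the secret exists.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    A keyed hash is used instead of a plain hash so that small ID spaces
    cannot simply be brute-forced by someone who does not hold the secret.
    """
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_birth_date(date_of_birth: str) -> str:
    """Reduce an ISO date (YYYY-MM-DD) to the year only."""
    return date_of_birth[:4]


def prepare_for_analytics(record: dict, secret: bytes) -> dict:
    """Hypothetical transformation applied before data leaves the raw zone.

    The first name is dropped, the userId is replaced by a stable reference,
    and the birth date is generalized to a year.
    """
    return {
        "userRef": pseudonymize(str(record["userId"]), secret),
        "birthYear": generalize_birth_date(record["dateOfBirth"]),
    }


# Example record taken from the slides; the secret is a placeholder.
record = {"userId": 123, "firstName": "Janosch", "dateOfBirth": "1984-01-01"}
analytics_row = prepare_for_analytics(record, secret=b"replace-with-managed-secret")
```

Because the reference is stable for a given secret, joins across datasets keep working; discarding the secret breaks the link back to the original userId.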
  • 18. Challenges by example: Encrypt, a.k.a. Lost Key Pattern [Figure: daily import (e.g. via sqoop) of the record { “userId”: 123, “firstName”: “Janosch”, “dateOfBirth”: “1984-01-01” } into the snapshots today, -1 day, -2 days; one stored copy is shown with encrypted personal fields: “userId”: 123, “firstName”: “54DCF13E4...”, “dateOfBirth”: “D3DFBCE...”, next to a key labelled 123]
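A minimal sketch of the Lost Key Pattern from slide 18, assuming one key per userId held in a separate key store: personal fields are encrypted before they land in immutable snapshots, and erasure is done by deleting the key rather than rewriting every copy. The `KeyStore` class, the function names and the field list are hypothetical. The cipher is a toy SHA-256 keystream used only to keep the example free of dependencies; a real system would use an authenticated cipher such as AES-GCM and a proper key management service.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). NOT production cryptography."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: str) -> str:
    nonce = secrets.token_bytes(16)
    data = plaintext.encode("utf-8")
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))
    return (nonce + cipher).hex()


def decrypt(key: bytes, token: str) -> str:
    raw = bytes.fromhex(token)
    nonce, cipher = raw[:16], raw[16:]
    data = bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
    return data.decode("utf-8")


class KeyStore:
    """Hypothetical in-memory key store holding one key per userId."""

    def __init__(self):
        self._keys = {}

    def key_for(self, user_id):
        # Create the key on first use, reuse it afterwards.
        return self._keys.setdefault(user_id, secrets.token_bytes(32))

    def get(self, user_id):
        return self._keys.get(user_id)

    def forget(self, user_id):
        # Erasure request: drop the key, leave the snapshots untouched.
        self._keys.pop(user_id, None)


PERSONAL_FIELDS = ("firstName", "dateOfBirth")  # assumed field list


def protect(record: dict, store: KeyStore) -> dict:
    """Encrypt the personal fields of a record with the user's key."""
    key = store.key_for(record["userId"])
    return {k: encrypt(key, v) if k in PERSONAL_FIELDS else v
            for k, v in record.items()}


def reveal(record: dict, store: KeyStore):
    """Decrypt a record, or return None once the user's key is gone."""
    key = store.get(record["userId"])
    if key is None:
        return None
    return {k: decrypt(key, v) if k in PERSONAL_FIELDS else v
            for k, v in record.items()}


store = KeyStore()
source = {"userId": 123, "firstName": "Janosch", "dateOfBirth": "1984-01-01"}
snapshots = [protect(source, store) for _ in range(3)]  # today, -1 day, -2 days
store.forget(123)  # every snapshot becomes unreadable at once
```

The design choice is that a single small delete in the key store stands in for deletes across many immutable files; the encrypted bytes remain on disk but can no longer be turned back into personal data.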
  • 19. Challenges by example: Deletion in log based systems [Figure: edge device (deviceId: 123) pushes data to Kafka topic, e.g. { “deviceId”: 123, “lat”: 52.510781, “lon”: 13.371735 }; the Kafka topic holds the keyed messages 123 B, 456 A, 123 C, 123 D and 123 ∅ at offsets 2 to 6; the consumer reads B, C, D, ∅]
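One way to read the figure on slide 19 is Kafka's log compaction: on a topic with `cleanup.policy=compact`, a record with a null value (the ∅) acts as a tombstone for its key, and compaction eventually keeps only the latest record per key. The sketch below simulates that behaviour in plain Python; it does not talk to Kafka, and the message order in `log` is an assumption, since the slide layout is not fully recoverable.

```python
from typing import List, Optional, Tuple

# (key, value); a value of None plays the role of a Kafka tombstone.
Record = Tuple[int, Optional[str]]


def compact(log: List[Record], drop_tombstones: bool = False) -> List[Record]:
    """Keep only the latest record per key, preserving log order.

    With drop_tombstones=True the tombstones themselves are removed too,
    which mirrors what Kafka does after delete.retention.ms has passed.
    """
    last_index = {key: i for i, (key, _) in enumerate(log)}
    kept = [rec for i, rec in enumerate(log) if last_index[rec[0]] == i]
    if drop_tombstones:
        kept = [rec for rec in kept if rec[1] is not None]
    return kept


# Messages from the slide: device 123 sends B, C, D and is then deleted (∅).
log = [(123, "B"), (456, "A"), (123, "C"), (123, "D"), (123, None)]

after_compaction = compact(log)                      # tombstone still visible
after_retention = compact(log, drop_tombstones=True)  # key 123 fully gone
```

Compaction only cleans the topic itself. A consumer that already read B, C and D still holds those values and has to act on the tombstone on its own.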
  • 20. Challenges by example: Encrypt on write [Figure: edge device (deviceId: 123) pushes data to Kafka topic, e.g. { “deviceId”: 123, “lat”: 52.510781, “lon”: 13.371735 }; the Kafka topic holds the encrypted messages 123 D4, 123 Z3, 456 T3, 123 6H and 123 N7 at offsets 1 to 5; the consumer reads A, B, C, D; one message is shown as 123 ?]
  • 21. Vendor recommendations: Distributions to the rescue! • Hortonworks - "GDPR: The Good, Bad and Ugly", Jun 20 2017 • Cloudera - "Simplify your response to GDPR", Aug 24 2017 • GDPR compliance via partner solutions • Only partial answers (Source: Cloudera Inc.)
  • 22. GDPR recommendations simplified: Kudu, Sentry, Navigator, Data Science Workbench, HDFS / ..., Ranger, Atlas, Zeppelin + lots of partner solutions
  • 23. Data privacy and open source: Pragmatic considerations • Secured cluster • Raw data in encryption zones with very limited access • Anonymize for further processing wherever possible • Proper retention policies, batch delete requests and perform regular clean-ups • Integrate with Atlas and Ranger → tagging, filtering and masking • Custom solutions for glue and missing pieces
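The retention and batch-delete bullet on slide 23 might look roughly like this for the daily snapshots from the earlier ingest example. It is a sketch under assumptions: the function names, the two-day retention window and the in-memory record lists are hypothetical, and actual deletion on HDFS or in a table format would go through the respective tooling.

```python
from datetime import date, timedelta


def expired_partitions(partition_dates, today, retention_days):
    """Return the daily partitions that are older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in partition_dates if d < cutoff)


def rewrite_without(records, erased_user_ids):
    """Batch-apply collected erasure requests while rewriting a partition."""
    return [r for r in records if r["userId"] not in erased_user_ids]


today = date(2017, 9, 27)
partitions = [today - timedelta(days=n) for n in range(5)]  # today .. -4 days

# Partitions outside the assumed 2-day window are dropped as a whole.
to_delete = expired_partitions(partitions, today, retention_days=2)

# Partitions that are kept get rewritten without the erased users.
pending_erasures = {123}
snapshot = [
    {"userId": 123, "firstName": "Janosch", "dateOfBirth": "1984-01-01"},
    {"userId": 456, "firstName": "Example", "dateOfBirth": "1990-05-05"},
]
cleaned = rewrite_without(snapshot, pending_erasures)
```

Collecting erasure requests and applying them in one pass per partition keeps the number of rewrites on append-only storage low, at the cost of some delay that has to fit within "without undue delay".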
  • 24. Summary: The road ahead • No comprehensive open-source solution available • Proprietary services target specific problem domains, integration still necessary • It will take some time until the legal dust has settled • Idea: Avro (logical types) + Vault (or similar) + Ranger + Atlas?
  • 25. © 2017 Teradata
  • 26. Hadoop Security Primer: In just one slide • Authentication - Kerberos • Authorization - Ranger, Sentry, ACLs • Auditing / Monitoring - Ranger, Navigator, ... • Encryption of data in motion - KMS, Navigator, ... • Encryption of data at rest - Encryption zones, SEDs, ... Further reading: • Hadoop Security (Ben Spivey, Joey Echeverria) • Hadoop and Kerberos: The Madness beyond the Gate
  • 27. Personal data: According to GDPR “any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.” - Article 4, GDPR