Automating Data Pipeline Security
Carta’s Data Team is Hiring 🎉
Automating Data Pipeline Security
Privacy
3 Big Ideas
1. Privacy has a strange history.
2. Privacy-first systems are designed by people with a professional ethic.
3. Privacy can be automated away.
Automating security in your data pipeline
privacy
1. Strange History of Privacy
“The actio iniuriarum was, in Roman law, a delict which served to protect the non-patrimonial aspects of a person's existence – who a person is rather than what a person has.”
©1979 "The Invention of the Right to Privacy" by Dorothy J. Glancy
2. Privacy-first Ethic
Software is eating the world.
“Audit defensibility is too low a bar when it comes to our customer’s privacy.”
Privacy Regulation
GDPR
EU General Data Protection Regulation
● Right of access
● Pseudonymisation
● Right of erasure
● Records of processing activities
● Privacy by design
CCPA
California Consumer Privacy Act
● Know what personal information is being collected
● Right to erasure
● Know whether their personal information is being shared, and if so, with whom
● Opt out of the sale of their personal information
3. Automate Privacy
“The security posture of your weakest vendor is the security posture of your entire organization.”
Apache Airflow
Workflow manager from Airbnb
● Airflow DAGs to move data into S3 and Redshift
● DAG: Directed Acyclic Graph
● Operator/Task: A node in the graph
● Airflow runs dbt
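To make the DAG vocabulary concrete, here is a minimal sketch in plain Python. It is not Airflow's API: the standard-library graphlib module stands in for the scheduler, and the task names (extract_postgres, load_s3, load_redshift, run_dbt) are hypothetical, chosen only to mirror the flow described on this slide.

```python
from graphlib import TopologicalSorter

# Each key is a task (a node in the graph); its value is the set of
# tasks that must finish first. Task names are hypothetical.
pipeline = {
    "extract_postgres": set(),
    "load_s3": {"extract_postgres"},
    "load_redshift": {"load_s3"},
    "run_dbt": {"load_redshift"},
}


def execution_order(dag):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

Because the graph is acyclic, a valid run order always exists; a dependency loop would be rejected before any task runs.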
Apache Airflow
● Open source boilerplate for running Airflow in Docker
● Used at Carta
Dockerized Airflow
Stale Blacklist
Automating the blacklist updates
How do we keep up with the sensitive columns being added in source data?
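One way to catch a stale blacklist is to scan source column names for patterns that suggest personal data and report any match that is not yet listed. The sketch below assumes exactly that approach; the patterns, column names, and function name are hypothetical, and the slides do not say how Carta actually detects new sensitive columns.

```python
import re

# Hypothetical name patterns that suggest a column holds personal data.
SENSITIVE_PATTERNS = [r"ssn", r"birth", r"email", r"phone", r"tax_id"]


def find_unlisted_sensitive_columns(source_columns, blacklist):
    """Return columns that look sensitive but are missing from the blacklist."""
    pattern = re.compile("|".join(SENSITIVE_PATTERNS), re.IGNORECASE)
    return sorted(
        col for col in source_columns
        if pattern.search(col) and col not in blacklist
    )


# Hypothetical source schema and current blacklist.
source_columns = ["users.id", "users.email", "users.birth_date", "users.plan"]
blacklist = {"users.email"}

stale = find_unlisted_sensitive_columns(source_columns, blacklist)
print(stale)  # ['users.birth_date']
```

Run on a schedule, a non-empty result could fail the pipeline or open a review request, so a new column is not silently copied downstream.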
dbt test
Automated data tests
● dbt tests fail when the result set is not empty.
● The records returned by dbt test are the offending records.
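The pass/fail rule above can be imitated in a few lines of Python. This is only a sketch of the rule, not dbt itself: SQLite stands in for the warehouse, and the users table, its columns, and the run_data_test helper are hypothetical.

```python
import sqlite3


def run_data_test(conn, test_sql):
    """Run a test query; the test passes only if the query returns no rows."""
    failing_rows = conn.execute(test_sql).fetchall()
    return len(failing_rows) == 0, failing_rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com")],
)

# Hypothetical test: every user must have an email address.
passed, offenders = run_data_test(
    conn, "SELECT id, email FROM users WHERE email IS NULL"
)
print(passed, offenders)  # False [(2, None)]
```

Writing the test as "select the bad rows" means the failure output already shows which records to fix.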
Automating Access
Tools for requesting and granting access
We have a custom access management system called Gatekeeper.
Terraform Modules
Automate Data Lake access
This example uses our custom IAM Service Account Terraform module to create a new Revenue Service account user with access to a single S3 data lake bucket.
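The Terraform module itself is not reproduced in the slide text. As a rough illustration of what "access to a single S3 bucket" means at the IAM level, the sketch below builds a policy document in Python. The bucket name and function name are hypothetical, and read-only access is an assumption; the real module may grant different actions.

```python
import json


def single_bucket_policy(bucket_name):
    """Build an IAM policy document granting read access to one S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object reads apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


# Hypothetical bucket name for the Revenue service account.
policy = single_bucket_policy("example-data-lake-revenue")
print(json.dumps(policy, indent=2))
```

Generating the policy from one parameter keeps each service account scoped to its own bucket, which is the same idea a reusable module encodes.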
Data Warehouse Migrations
● sql-migrate: Excellent CLI and migrations library written in Go.
● Extended to support Jinja templating.
We can rebuild the Warehouse from code.
Pseudonymity
Disguised identity or “false name”
©2019 Alex Ewerlöf "GDPR pseudonymization techniques"
Pseudonymity: Obfuscation
Scrambling or mixing up data
👍 Easy to do in any language.
👍 No impact to downstream systems.
👎 Can be unscrambled.
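A small sketch shows why obfuscation is weak on its own. ROT13 is used here purely as a stand-in scrambling scheme (the slides do not name one), and the sample value is made up: applying the same function twice returns the original.

```python
import codecs


def obfuscate(value):
    """Scramble letters with ROT13; digits and punctuation pass through unchanged."""
    return codecs.encode(value, "rot_13")


scrambled = obfuscate("Jane Doe")
print(scrambled)             # Wnar Qbr
print(obfuscate(scrambled))  # Jane Doe
```

Anyone who recognizes the scheme can reverse it without a key, which is the "can be unscrambled" drawback listed above.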
Pseudonymity: Masking
Obscure part of the data
👍 Simple.
👍 Owner can verify the last 4 digits.
👎 Some pieces of the real data are stored.
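Masking can be sketched as a short function that hides every digit except the last four. The function name, the mask character, and the sample number are illustrative assumptions.

```python
def mask_all_but_last4(value, mask_char="*"):
    """Replace every digit except the final four with mask_char; keep separators."""
    total_digits = sum(ch.isdigit() for ch in value)
    digits_to_mask = max(total_digits - 4, 0)
    out = []
    for ch in value:
        if ch.isdigit() and digits_to_mask > 0:
            out.append(mask_char)
            digits_to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


print(mask_all_but_last4("123-45-6789"))  # ***-**-6789
```

The visible last four digits let the owner confirm the record, but they are still real data sitting in the store, which is the trade-off noted above.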
Pseudonymity: Tokenization
Replace real data with fake data
👍 Popular libraries like Faker.
👍 All original data is replaced.
👎 No way to recover the original data.
Pseudonymity: Blurring
Blur a subset of the data
👍 95% of this image is left unblurred.
👎 Possible to reverse blurring.
Pseudonymity: Encryption
Two-way transformation of the data
👍 The original data can be recovered.
👍 Manage fewer permissions downstream.
👎 Asymmetric vs Symmetric trade-offs.
AWS Key Management Service
Automate key creation and rotation
● Generate a new data key for encrypting and decrypting data protected by a master key.
● Or manually rotate the master key and re-encrypt the data.
Encrypted Columns
Postgres pgcrypto
● pgcrypto allows us to encrypt sensitive columns before the data lands in our S3 data lake.
● This example is encrypting the birth_date column in Postgres.
“Last Mile” Decryption
Decrypt sensitive data at query time
● Access to encrypted columns is limited to analysts with the encryption key.
● This example is decrypting the birth_date column in Redshift.
Encrypted Column Problems
Some things to consider...
1. Symmetric or Asymmetric encryption scheme?
2. Should we manually rotate our master key?
3. How many keys should we use and how should they be organized?
4. Should our analysts and data scientists need to think about keys?
5. When and how do we re-encrypt data? When an employee with access to keys leaves the company?
3 Big Ideas
1. Privacy has a strange history.
2. Privacy-first systems are designed by people with a professional ethic.
3. Privacy can be automated away.
carta.com/jobs
@troyharvey
troy.harvey@carta.com
OSDC 2019 | Automating Security in Your Data Pipeline by Troy Harvey
