Protect customer privacy with AWS - GRC351 - AWS re:Inforce 2019
1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Protect customer privacy with AWS
Rohit Pujari
Solutions Architecture
Amazon Web Services
GRC351
Anhad Preet Singh
Enterprise Architect
Dataguise
2.
Let’s look at some metrics
3.
Dangers of holding PII
• Rogue agents
• External hacking
• Second-party misuse
• Breach
• Spyware
• Unsecured devices
• Espionage
• Botnets
• Consumer consent violation
4.
Compliance and regulations
• GDPR (EU): General Data Protection Regulation
• CCPA (California): California Consumer Privacy Act
• PIPEDA (Canada): Personal Information Protection and Electronic Documents Act
• PCI-DSS (payment cards)
• FSA, ICO, DPA, Payment Schemes, EU Member State laws, and US and other foreign regulators (e.g., SEC)
Compliance rules and regulations are constantly evolving, and the trend is toward true data privacy laws and regulations.
5.
Then and now
Then | Now
Directives | Laws
Best practices/good ethics | Regulatory requirements
No consequences | Heavy fines
Overhead | In design and necessity
6.
What problems are customers trying to solve?
• What type of data am I collecting?
• Where do I collect it?
• Where do I store it?
• Do I have the appropriate legal collection statements?
• How and when do I delete data?
• How do I secure the data?
• What responsibility do I have?
• Why do I collect the data?
• What is my legal basis for processing and using the data?
• Where is a list of all my data?
• Do I communicate with the subject I am collecting from?
• Who do I share it with?
• Who has access to my data? How do I control it?
• What are the use cases for the data? Are they permitted? Who provided permission?
• How do I find my data?
8.
Data lakes architecture
• Relational and non-relational data
• Schema defined during analysis
• Unmatched durability and availability
• Security, compliance, and audit capabilities
• Run any analytics on the same data without movement
• Scale storage and compute independently
• Pay for what is used
Diagram services: AWS Snowball, AWS Snowmobile, Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon Kinesis Video Streams, Amazon S3, Amazon Redshift, Amazon EMR, Amazon Athena, Amazon Kinesis, Amazon Elasticsearch Service, AI Services
9.
Pay only for the resources you use as you scale
• Pay as you go for the resources you consume
• As low as $0.05/GB scanned with Amazon Athena
• Amazon EMR and Amazon Athena can automatically scale down resources after a job completes, reducing cost
• Commit to a set term and save up to 75% with Reserved Instances
• Run on spare compute capacity with Amazon EMR and save up to 90% with Amazon EC2 Spot
Traditional approach (rigid): Fixed server capacity leads to unmet demand (upset players, missed revenue) or excess capacity (wasted $$$)
AWS approach (elastic): Capacity follows demand, so you pay for the capacity you use
10.
Benefits of AWS for secure data storage
Security and compliance: Three different forms of encryption; encrypts data in transit when replicating across Regions; log and monitor with AWS CloudTrail; use ML to discover and protect sensitive data with Amazon Macie
Flexible management: Classify, report, and visualize data usage trends; tag objects to see storage consumption, cost, and security; build lifecycle policies to automate tiering and retention
Durability, availability, & scalability: Built for eleven nines of durability; data distributed across 3 physical facilities in an AWS Region; can be automatically replicated to any other AWS Region
Query in place: Run analytics & ML on the data lake without data movement; Amazon S3 Select can retrieve a subset of data, improving analytics performance by up to 400%
11.
Storing is not enough; data needs to be discoverable
“Dark data are the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”
Gartner IT Glossary, 2018
https://www.gartner.com/it-glossary/dark-data
Data sources: CRM, ERP, data warehouse, mainframe data, web, social, log files, machine data, semi-structured, unstructured
12.
AWS Glue Data Catalog
15.
Grant permissions to securely share data
16.
AWS Lake Formation security workflow
Diagram labels: User; IAM users, roles; Active Directory; Amazon S3
17.
Compliance: Virtually every regulatory agency
Global
• CSA: Cloud Security Alliance Controls
• ISO 9001: Global Quality Standard
• ISO 27001: Security Management Controls
• ISO 27017: Cloud Specific Controls
• ISO 27018: Personal Data Protection
• PCI DSS Level 1: Payment Card Standards
• SOC 1: Audit Controls Report
• SOC 2: Security, Availability, & Confidentiality Report
• SOC 3: General Controls Report
United States
• CJIS: Criminal Justice Information Services
• DoD SRG: DoD Data Processing
• FedRAMP: Government Data Standards
• FERPA: Educational Privacy Act
• FIPS: Government Security Standards
• FISMA: Federal Information Security Management
• GxP: Quality Guidelines and Regulations
• FFIEC: Financial Institutions Regulation
• HIPAA: Protected Health Information
• ITAR: International Arms Regulations
• MPAA: Protected Media Content
• NIST: National Institute of Standards and Technology
• SEC Rule 17a-4(f): Financial Data Standards
• VPAT/Section 508: Accountability Standards
Asia Pacific
• FISC [Japan]: Financial Industry Information Systems
• IRAP [Australia]: Australian Security Standards
• K-ISMS [Korea]: Korean Information Security
• MTCS Tier 3 [Singapore]: Multi-Tier Cloud Security Standard
• My Number Act [Japan]: Personal Information Protection
Europe
• C5 [Germany]: Operational Security Attestation
• Cyber Essentials Plus [UK]: Cyber Threat Protection
• G-Cloud [UK]: UK Government Standards
• IT-Grundschutz [Germany]: Baseline Protection Methodology
18.
How data normally flows…
Source data (database or API) → Extraction process → Amazon S3 data lake → Load process → Amazon Redshift staging table → Transformation process → Amazon Redshift destination table → Reporting process → Reports and extracts
19.
Transforming sensitive data
The key to building a de-identified system is adding a sensitive data transformation step to the data extraction process
Source data (database or API) → Extraction and transformation process → Amazon S3 data lake → Load process → Amazon Redshift staging table → Post-load transformation → Amazon Redshift destination table → Reporting process → Reports and extracts
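A minimal sketch of a transformation step applied during extraction. The deck does not say which technique or tool performs this step, so the keyed hash, the field names (`email`, `ssn`), and the hard-coded key below are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret; a real pipeline would fetch this from a key management service.
SECRET_KEY = b"example-key-not-for-production"

# Hypothetical set of sensitive field names for this illustration.
SENSITIVE_FIELDS = {"email", "ssn"}


def pseudonymize(value: str) -> str:
    """Return a keyed, deterministic token for a sensitive value."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def extract_and_transform(source_rows):
    """Yield rows with sensitive fields replaced before they leave extraction."""
    for row in source_rows:
        yield {
            key: pseudonymize(str(val)) if key in SENSITIVE_FIELDS else val
            for key, val in row.items()
        }


rows = [{"customer_id": 19664, "email": "vcorleon@gf.com", "ssn": "716905534"}]
clean_rows = list(extract_and_transform(rows))
```

Because the token is deterministic, the same input always maps to the same output, so joins on the tokenized column still work downstream without exposing the original value.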
21.
Dataguise combines privacy and security
Detect: Find sensitive data in structured, unstructured, and semi-structured content
• Discover and classify sensitive data
• Inventory identities and requirements
• Process data subject access requests
Protect: Remediate your sensitive data exposure for risk and compliance obligations
• De-identify personal data
• Encrypt at the element level
• Track cross-border transfers
• Track third-party disclosures
Monitor: Track how and where sensitive data is being accessed
• Notify on retention limits
• Alert on compliance violations
• Alert on inappropriate user access
22.
The problem: Regulatory resistance
Diagram: On-premises repositories (Hadoop, database, file shares, data warehouse) and AWS
23.
The solution
On-premises (start):
• Scan on-premises data repositories for personal and sensitive data
• Sensitive data? Yes: Remediate (notify, mask, encrypt, tokenize, access control, DLP). No: Migrate data
AWS (end):
• Scan the migrated data in AWS for personal and sensitive data
• Sensitive data? Yes: Remediate (notify, mask, encrypt, tokenize, access control, DLP). No: End
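The flow above can be sketched as a scan, remediate, migrate, and rescan loop. This is an illustration only: the nine-digit pattern stands in for a real sensitive-data detector, masking stands in for the full list of remediation options, and the sketch assumes that remediated records are migrated afterward, which the flowchart does not state explicitly.

```python
import re

# Hypothetical detector: a nine-digit run stands in for "personal and sensitive data".
SSN_PATTERN = re.compile(r"\b\d{9}\b")


def scan(record: str) -> bool:
    """Report whether the record contains a value the detector flags."""
    return bool(SSN_PATTERN.search(record))


def remediate(record: str) -> str:
    """Mask detected values (masking is one of the remediation options listed)."""
    return SSN_PATTERN.sub("XXXXXXXXX", record)


def migrate_with_checks(on_prem_records):
    """Scan and remediate on-premises, migrate, then scan again at the target."""
    migrated = []
    for record in on_prem_records:
        if scan(record):
            record = remediate(record)
        migrated.append(record)  # stands in for the actual data transfer
    # Second scan on the migrated copy catches anything missed before migration.
    return [remediate(r) if scan(r) else r for r in migrated]


result = migrate_with_checks(
    ["Fredo's SSN is 716905534", "It is expected to rain tomorrow"]
)
```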
25.
Solution architecture
On-premises (Data Center 1 and Data Center 2): Hadoop cluster with Hadoop IDP on the edge node; DgSecure controller; LDAP/AD; target databases; file shares with files IDP; DBMS IDP
AWS Cloud: Target databases/Redshift/RDS; EC2 instance with DBMS IDP and Cloud IDP; Amazon S3 buckets; Amazon EMR cluster with S3 compute IDP
26.
Validating the knowns & finding the unknowns: Structured and semi-structured data
Email | Customer ID | Transcript
vcorleon@gf.com | 19664 | Just talked to Vito Corleone
fred@gf.com | 23423 | Fredo’s SSN is 716905534
sonny@gf.com | 99644 | Sonny is moving to Nevada
NA | 02945 | It is expected to rain tomorrow
Detected types by column: Email (Email), ID (Customer ID), Name, SSN, State (Transcript)
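One way to picture "validating the knowns and finding the unknowns" is a per-column profile: confirm that a column holds the type it is expected to hold, and flag sensitive values that show up in free text. The sketch below is an assumption about how such a check could look, not Dataguise's implementation; the two regular expressions are deliberately simple, and detecting names or states would need additional techniques.

```python
import re

# Hypothetical detectors; a real scanner would use far richer rules.
DETECTORS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[a-z]+", re.IGNORECASE),
    "ssn": re.compile(r"\b\d{9}\b"),
}

columns = ("Email", "Customer ID", "Transcript")
rows = [
    ("vcorleon@gf.com", "19664", "Just talked to Vito Corleone"),
    ("fred@gf.com", "23423", "Fredo's SSN is 716905534"),
    ("sonny@gf.com", "99644", "Sonny is moving to Nevada"),
    ("NA", "02945", "It is expected to rain tomorrow"),
]


def profile(rows, columns):
    """Map each column name to the set of sensitive types found in its values."""
    found = {name: set() for name in columns}
    for row in rows:
        for name, value in zip(columns, row):
            for label, pattern in DETECTORS.items():
                if pattern.search(value):
                    found[name].add(label)
    return found


report = profile(rows, columns)
```

Here the Email column is a validated "known", while the SSN found in the Transcript column is an "unknown" that a schema-only review would miss.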
27.
Validating the knowns & finding the unknowns: Structured and semi-structured data
Original:
Email | Customer ID | Transcript
vcorleon@gf.com | 19664 | Just talked to Vito Corleone
fred@gf.com | 23423 | Fredo’s SSN is 716905534
sonny@gf.com | 99644 | Sonny is moving to Nevada
NA | 02945 | It is expected to rain tomorrow
Protected:
Email | Customer ID | Transcript
4t23gttt | 7462391 | Just talked to Lebron James
44e5325 | 1239474 | Melo’s SSN is 983441298
0we&yrw | 9983487 | Manu is moving to Texas
NA | 3344325 | It is expected to rain tomorrow
28.
Finding the needles in the haystack: Unstructured data
Customers → Call center: "Your call will be recorded for quality assurance"
……………..this is Jonathan Franklin and my social is six one two one four five three zero nine is there any more information you need for my app...........
Detected: Full name, Social Security Number
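In a call transcript like the one above, the Social Security number is spoken as words, so a digit-only pattern would miss it. The following sketch shows one possible approach, assuming English digit words and exactly nine consecutive spoken digits; the function name is hypothetical, and finding the full name would require a separate technique such as named-entity recognition.

```python
import re

WORD_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
_DIGIT_WORD = "(?:" + "|".join(WORD_TO_DIGIT) + ")"
# Nine consecutive spoken digits separated by whitespace.
SPOKEN_SSN = re.compile(
    rf"\b{_DIGIT_WORD}(?:\s+{_DIGIT_WORD}){{8}}\b", re.IGNORECASE
)


def find_spoken_ssns(transcript: str) -> list[str]:
    """Return each nine-digit spoken sequence converted to numerals."""
    return [
        "".join(WORD_TO_DIGIT[word.lower()] for word in match.group().split())
        for match in SPOKEN_SSN.finditer(transcript)
    ]


text = (
    "this is Jonathan Franklin and my social is six one two one four "
    "five three zero nine is there any more information you need"
)
ssns = find_spoken_ssns(text)
```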
29.
Protecting the needles in the haystack: Unstructured data
……………..this is Aaron Rodgers and my social is two three one six four zero nine one two is there any more information you need for my app...........
30.
Scale and accuracy
• Scanning strategies
  • Sampling
    • Location: Top, bottom, random, etc.
    • Amount: By percentage, by size, etc.
• Machine learning
• Low to no false positives
• Intelligent detection
• Parallel execution
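The sampling options listed above (by location and by amount) can be sketched as a small helper. The function name, its parameters, and the percentage-only sizing are assumptions for illustration; the deck does not describe how the product exposes these settings.

```python
import random


def sample_lines(lines, location="top", percent=10, seed=None):
    """Pick a subset of lines to scan by location and percentage."""
    count = max(1, len(lines) * percent // 100) if lines else 0
    if location == "top":
        return lines[:count]
    if location == "bottom":
        return lines[-count:] if count else []
    if location == "random":
        return random.Random(seed).sample(lines, count)
    raise ValueError(f"unknown location: {location}")


lines = [f"row {i}" for i in range(100)]
top = sample_lines(lines, "top", percent=5)
bottom = sample_lines(lines, "bottom", percent=5)
rand = sample_lines(lines, "random", percent=5, seed=1)
```

Scanning a sample trades completeness for speed: a 5% sample touches far less data, but sensitive values outside the sample go undetected, which is why the choice of location matters.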
31.
What Dataguise scans and protects on-premises
• Supported databases (including SQL Server)
• Supported Hadoop distributions
• Other supported repositories
32.
What Dataguise scans and protects in AWS
Amazon S3, Amazon Aurora, Amazon RDS, Amazon EMR, Amazon Redshift
34.
Sleep disorder device manufacturer
• Continuous ingestion of data from sleep devices used by their patients, in CSV and Parquet files
• Customer created a data lake on Amazon S3
• PHI data needs to be masked before landing in the data lake
• Customer uses microbatches: each microbatch covers 10 min. and ingests almost 1.5 GB of data
• Dataguise needs to identify and mask data in < 5 min. (+1 min. tolerance)
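The figures above imply a minimum sustained processing rate. A quick worked calculation, assuming 1 GB = 1000 MB and treating "almost 1.5 GB" as 1.5 GB:

```python
BATCH_SIZE_GB = 1.5    # data ingested per microbatch (from the slide)
TARGET_MINUTES = 5     # masking deadline
TOLERANCE_MINUTES = 1  # allowed overrun


def required_mb_per_second(size_gb: float, minutes: float) -> float:
    """Sustained rate needed to process size_gb within the given minutes."""
    return size_gb * 1000 / (minutes * 60)


target_rate = required_mb_per_second(BATCH_SIZE_GB, TARGET_MINUTES)
floor_rate = required_mb_per_second(
    BATCH_SIZE_GB, TARGET_MINUTES + TOLERANCE_MINUTES
)
```

Under these assumptions, detection and masking together need to sustain about 5 MB/s to meet the 5-minute target, and no less than roughly 4.2 MB/s to stay within the tolerance.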
35.
The solution
Diagram: Devices → Landing zone → Safe zone (Amazon S3, AWS Cloud)
36.
Large pharmaceutical company
• Continuous ingestion of data from various healthcare companies, as JSON and CSV files in Amazon S3
• Customer created a data lake on Amazon S3
• AWS Lambda functions detect when a new file lands in the staging area and kick off all the APIs
• PHI data needs to be detected and masked before landing in the data lake
• Customer uses a staging area where Dataguise and other tools identify sensitive data, profile the data, run antivirus scans, etc. before the data is moved into the data lake
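A minimal sketch of the Lambda trigger described above, assuming the function is subscribed to S3 object-created notifications on the staging bucket. `start_scan` is a hypothetical placeholder for whatever APIs the customer actually calls, and the bucket and key in the sample event are invented for illustration.

```python
import urllib.parse


def start_scan(bucket: str, key: str) -> dict:
    """Hypothetical stand-in for the API calls that scan and mask a staged file."""
    return {"bucket": bucket, "key": key, "status": "scan_requested"}


def lambda_handler(event, context):
    """Handle an S3 object-created notification and request a scan per new object."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        results.append(start_scan(bucket, key))
    return results


sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "staging-area"},
                "object": {"key": "incoming/claims+2019.json"},
            }
        }
    ]
}
responses = lambda_handler(sample_event, None)
```

Decoding the key matters because spaces and special characters in object names are encoded in the notification, and passing the raw value to a later S3 call would fail to find the object.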
37.
The solution
Diagram: Sources → Staging area (data profile, antivirus) → Safe zone (Amazon S3, AWS Cloud)