As data grows in data lakes, so do security and compliance risks, which stem from storing and processing sensitive data. In this webinar, we will go through a four-step process to proactively discover and manage sensitive data within big data environments.
https://hortonworks.com/webinar/4-essential-steps-managing-sensitive-data-data-lake/
19. 4 STEPS FOR MANAGING SENSITIVE DATA
DATA DISCOVERY > ACCESS CONTROL > ANONYMIZATION > MONITORING
20. REPRESENTATIVE SCENARIO – FINANCIAL SERVICES
DATA LAKE
Multiple systems
Multiple formats
INGESTION > STORAGE AND PROCESSING > DOWNSTREAM SYSTEMS
Sensitive data cannot be shared with users
21. SOLUTION - PRIVACERA AUTOMATED DATA DISCOVERY
Discover and classify data during ingest or at rest
Standard rules combined with machine learning
Classification/tags pushed to Atlas
STEP 1 > STEP 2 > STEP 3 > STEP 4
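The rule-based half of the discovery step can be sketched as follows. This is a minimal illustration, not Privacera's implementation: the rule names, the regular expressions, the `threshold` parameter, and the function names are all assumptions, and the machine learning component and the push of tags to Atlas are left out (tags are simply returned as a dictionary).

```python
import re

# Hypothetical rule set: tag name -> pattern that a single value must match.
# Illustrative only; real products ship far richer rules.
RULES = {
    "SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def classify_column(values, threshold=0.8):
    """Return the tags whose pattern matches at least `threshold`
    of the non-empty sampled values in a column."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return set()
    tags = set()
    for tag, pattern in RULES.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            tags.add(tag)
    return tags


def classify_table(columns):
    """Map each column name to a sorted list of discovered tags."""
    return {name: sorted(classify_column(vals)) for name, vals in columns.items()}


# Sampled values per column (made-up data for the example).
sample = {
    "customer_ssn": ["123-45-6789", "987-65-4321", ""],
    "contact": ["a@example.com", "b@example.org", "c@example.net"],
    "balance": ["100.50", "2300.00", "15.75"],
}
tags = classify_table(sample)
```

Classifying on a sample with a match threshold, rather than requiring every value to match, tolerates a few dirty values in a column; the right threshold is a tuning choice.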
22. REPRESENTATIVE SCENARIO – HEDGE FUND
DATA LAKE
Stock Info
Proprietary
Confidential data
INGESTION > STORAGE AND PROCESSING > DOWNSTREAM SYSTEMS
Access to sensitive data is restricted
Data Scientist
23. SOLUTION - TAG BASED ACCESS CONTROL
Simplify policies by managing at tag level
Tag attributes such as expiration date
Metadata updated by Privacera
STEP 1 > STEP 2 > STEP 3 > STEP 4
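A tag-level policy check with an expiration attribute might look like the sketch below. All names here (`Tag`, `TagPolicy`, `is_allowed`, the group names) are hypothetical, and the sketch assumes that a tag past its expiration date denies all access; the actual semantics depend on the policy engine in use.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Tag:
    name: str
    expires: Optional[date] = None  # tag attribute; None means no expiration


@dataclass(frozen=True)
class TagPolicy:
    tag: str
    allowed_groups: frozenset


def is_allowed(user_groups, resource_tags, policies, today):
    """Allow access unless a tag on the resource has expired (assumed to
    deny everyone) or a tag policy exists that excludes all of the user's groups."""
    by_tag = {p.tag: p for p in policies}
    for tag in resource_tags:
        if tag.expires is not None and today > tag.expires:
            return False
        policy = by_tag.get(tag.name)
        if policy is not None and not (set(user_groups) & policy.allowed_groups):
            return False
    return True


# One policy covers every resource carrying the tag, however many there are.
policies = [TagPolicy("CONFIDENTIAL", frozenset({"quant_research"}))]
positions_tags = [Tag("CONFIDENTIAL", expires=date(2030, 1, 1))]

can_read = is_allowed({"quant_research"}, positions_tags, policies, date(2025, 1, 1))
```

Because the policy is attached to the tag rather than to each table or file, newly classified resources are covered as soon as they are tagged.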
24. REPRESENTATIVE SCENARIO - HEALTHCARE
INGESTION > STORAGE AND PROCESSING > DOWNSTREAM SYSTEMS
HDFS
HIVE
ETL
Tokenized sensitive data
Select users with raw data access
Most users see only tokenized data
25. SOLUTION - PRIVACERA ANONYMIZATION
Format-preserving encryption and masking
Integrated with Ranger infrastructure
Policy-driven access
STEP 1 > STEP 2 > STEP 3 > STEP 4
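The idea of masking and tokenizing while keeping the original format can be illustrated with the standard library alone. Note that this is not format-preserving encryption (for example NIST FF1): the tokenization below is a one-way keyed pseudonymization based on HMAC, swapped in because it needs no third-party package. The function names, the `raw_data_access` group, and the key are assumptions made for the example.

```python
import hashlib
import hmac


def mask_keep_last(value, visible=4, mask_char="X"):
    """Mask all letters and digits except the last `visible` ones,
    leaving separators such as '-' or ' ' in place."""
    total = sum(c.isalnum() for c in value)
    to_mask = max(total - visible, 0)
    out = []
    for c in value:
        if c.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(c)
    return "".join(out)


def tokenize_digits(value, key):
    """Replace each digit with a digit derived from HMAC-SHA256(key, value).
    Deterministic and format-keeping, but NOT reversible and slightly biased
    (byte % 10); a sketch only, not a substitute for real FPE."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    i = 0
    for c in value:
        if c.isdigit():
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(c)
    return "".join(out)


def view_for(user_groups, value, key):
    """Policy-driven view: only a select group sees the raw value."""
    if "raw_data_access" in user_groups:
        return value
    return tokenize_digits(value, key)


key = b"demo-key-not-for-production"
masked = mask_keep_last("123-45-6789")
token = tokenize_digits("123-45-6789", key)
```

Deterministic tokens let most users still join and group on the tokenized column, while the raw value stays limited to the select group.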
26. REPRESENTATIVE SCENARIO – FINANCIAL SERVICES
INGESTION > STORAGE AND PROCESSING > DOWNSTREAM SYSTEMS
DATA LAKE
HIVE
ETL
Compliance team manually analyzing audit logs
FTP SERVER
Where is sensitive data, and where is it moving?
27. SOLUTION - PRIVACERA MONITORING
Automated monitoring of user actions
Alerts if sensitive data is moved or on unusual access
Alerts if sensitive data is discovered in restricted zones
STEP 1 > STEP 2 > STEP 3 > STEP 4
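Two of the alert types above (sensitive data being moved, and unusual access) can be sketched as a check over audit events. The event fields, tag names, approved destination prefix, and the per-resource set of usual users are all hypothetical; a real deployment would read events from the platform's audit log and would likely derive "usual" behavior from history rather than a fixed table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    user: str
    action: str             # e.g. "read", "copy", "move"
    resource: str
    tags: frozenset         # classification tags on the resource
    destination: str = ""   # target location for copy/move actions


# Assumed configuration for the example.
SENSITIVE_TAGS = frozenset({"SSN", "CONFIDENTIAL"})
APPROVED_DESTINATIONS = ("hdfs://datalake/secure/",)


def check_event(event, usual_users):
    """Return a list of alert messages for one audit event."""
    alerts = []
    if not (event.tags & SENSITIVE_TAGS):
        return alerts  # untagged data is not monitored in this sketch
    moved = event.action in ("copy", "move")
    if moved and not event.destination.startswith(APPROVED_DESTINATIONS):
        alerts.append(f"sensitive data moved: {event.resource} -> {event.destination}")
    if event.user not in usual_users.get(event.resource, set()):
        alerts.append(f"unusual access: {event.user} on {event.resource}")
    return alerts


usual_users = {"/lake/customers": {"etl_svc"}}
export = AuditEvent("etl_svc", "copy", "/lake/customers",
                    frozenset({"SSN"}), "ftp://ftpserver/out/")
export_alerts = check_event(export, usual_users)
```

Running such a check automatically on every event replaces the manual review of audit logs described in the scenario.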
29. SUMMARY
▸ Understand your data before expanding your data lake
▸ Invest in automated classification and centralized metadata
▸ Manage user access by data classification
▸ Anonymize data to reduce exposure
▸ Monitor the use of data: “trust but verify”
▸ Data plane provides next-generation tools for hybrid data infrastructure