SlideShare una empresa de Scribd logo
1 de 55
1© Cloudera, Inc. All rights reserved.
Charles Lamb clamb@cloudera.com
Andrew Wang andrew.wang@cloudera.com
Transparent Encryption in HDFS
2© Cloudera, Inc. All rights reserved.
Why do we need
encryption?
3© Cloudera, Inc. All rights reserved.
4© Cloudera, Inc. All rights reserved.
Importance of Encryption
• Information leaks affect 10s to 100s of millions of people
• Personally identifiable information (PII)
• Credit cards, SSNs, account logins
• Encryption would have prevented some of these leaks
• Encryption is a regulatory requirement for many business sectors
• Finance (PCI DSS)
• Government (Data Protection Directive)
• Healthcare (DPD, HIPAA)
5© Cloudera, Inc. All rights reserved.
Access to Sensitive Information
• Only as secure as the weakest link
• Need to carefully control access to:
• Encryption keys
• Encrypted ciphertext
• Keys and ciphertext together (and thus plaintext!)
• Many potential attack vectors
7© Cloudera, Inc. All rights reserved.
9© Cloudera, Inc. All rights reserved.
Attack Vectors
• Physical access
• Data at-rest on hard drives
• Network sniffing
• Data in-transit on the network
• Rogue user or admin
• Insider access to encrypted data, encryption keys
• Particularly difficult for admins
10© Cloudera, Inc. All rights reserved.
Why Encrypt in HDFS?
Local Disk
HDFS
Database
Application
Simplicity
Flexibility
11© Cloudera, Inc. All rights reserved.
Why Encrypt in HDFS?
Local Disk
HDFS
Database
Application HDFS is a sweet spot
• Excellent performance
• Safe from OS attacks
• App transparency
• Easy to deploy
12© Cloudera, Inc. All rights reserved.
Architecture
Client
KMS
NN
KS
DN
NameNode
Backing
Key Server
DataNode
Key Management
Server
13© Cloudera, Inc. All rights reserved.
Architecture
Client
KMS
NN
KT
DN
NameNode
Cloudera
Navigator
KeyTrustee
DataNode
Key Management
Server
14© Cloudera, Inc. All rights reserved.
Architecture
Client
KMS
NN
KT
DN
Read/write
encrypted data
Per-file key
operations
15© Cloudera, Inc. All rights reserved.
Architecture
Client
KMS
NN
KT
DN
Separate
administrative
domains
Keys
Data
16© Cloudera, Inc. All rights reserved.
Design
• KMS and key server manage encryption keys
• HDFS never touches encryption keys or plaintext
• End-to-end client-side encryption
• Data is protected both at-rest and in-transit
• Granular per-file encryption keys and per-key ACLs
• Rogue user can only leak their own data
• Separate administrative domains for key management and HDFS
• Rogue admins do not have access to both keys and data
• API transparency for existing applications
• High-performance
17© Cloudera, Inc. All rights reserved.
Architectural Concepts
• Encryption zone (EZ)
• Key types: EZ keys, (E)DEKs
• Key Management Server (KMS)
• Key handling during writes and reads
18© Cloudera, Inc. All rights reserved.
Encryption Zones (EZs)
• Directory whose contents are encrypted on write and decrypted on read
• Encryption is transparent to applications (simple!)
• No code changes required
• Invariant: files in an EZ are encrypted, files not in an EZ are not encrypted
• Existing files are not encrypted in place
• An EZ begins life as an empty directory
• Renames in to and out of an EZ are prohibited
• Encryption zones cannot be nested
19© Cloudera, Inc. All rights reserved.
Recommended EZ Setup
• Single root encryption zone
• Pros: conceptually simple, easy to configure
• Cons: less secure, hard to migrate to multiple EZs
• Separate per-user or per-org encryption zones
• Pros: compartmentalizes security
• Cons: more keys to manage
20© Cloudera, Inc. All rights reserved.
Encryption Zone Key (EZ key)
• Unique key per encryption zone
• EZ key metadata is stored in an xattr on the EZ root directory
• EZ key material is stored in the Key Management Server
• NameNode never touches the key
21© Cloudera, Inc. All rights reserved.
(Encrypted) Data Encryption Keys
• Data Encryption Key (DEK)
• Unique DEK per file
• Used by client to encrypt and decrypt data in HDFS
• Only handled by client and KMS, never HDFS
• DataNodes never handle plaintext, only ciphertext
• Encrypted Data Encryption Key (EDEK)
• DEK encrypted with corresponding EZ key
• Generated by KMS for NameNode
• Stored in HDFS metadata as a file xattr
22© Cloudera, Inc. All rights reserved.
EZ Keys, DEKs, and EDEKs
23© Cloudera, Inc. All rights reserved.
EZ Keys, DEKs, and EDEKs
24© Cloudera, Inc. All rights reserved.
Key Management Server
• Caching proxy for an enterprise key server
• Presents a unified key API to cluster services
• Scalable, fault-tolerant, high-performance
• Typically backed by Cloudera Navigator KeyTrustee
• Handles key operations with fine-grained ACLs
• Create and retrieve EZ keys
• Generate and decrypt EDEKs
• Separate secure process
• Separate key administrator
25© Cloudera, Inc. All rights reserved.
Writing an Encrypted File
Client
KMS
NN
DN
create
/ez/myFile
Fetch EDEK for /ez/
myFile
26© Cloudera, Inc. All rights reserved.
Writing an Encrypted File
Client
KMS
NN
DN
create
/ez/myFile
Fetch EDEK for /ez/
myFile
Decrypt EDEK
Write encrypted
data
File data
27© Cloudera, Inc. All rights reserved.
Reading an Encrypted File
Client
KMS
NN
DN
open
/ez/myFile
myFile
Decrypt EDEK
Read encrypted
data
File data
Decrypt with DEK
28© Cloudera, Inc. All rights reserved.
Key Handling
• Writing a file
• Client asks NameNode to create a file in an EZ
• NameNode gets a new EDEK from KMS
• NameNode stores EDEK in file’s xattr and returns EDEK to client
• Client asks KMS to decrypt EDEK
• KMS decrypts EDEK and returns DEK to client
• Client uses DEK to encrypt file data
29© Cloudera, Inc. All rights reserved.
Key Handling
• Reading a file
• Client asks NameNode for file’s EDEK
• Client asks KMS to decrypt EDEK
• KMS decrypts EDEK and returns DEK to client
• Client uses DEK to decrypt data
30© Cloudera, Inc. All rights reserved.
Granular Access Controls
• Granular HDFS-level access controls
• Controls access to encrypted ciphertext
• HDFS ACLs and Unix-like permissions
• Granular KMS-level access controls
• Per-EZ key ACLs
• Per-operation ACLs (generate and decrypt EDEKs)
• Whitelist and blacklist
• See our docs
31© Cloudera, Inc. All rights reserved.
Attack Vectors
• Compromised EZ key
• Also requires EDEK and ciphertext
• Only leaks a single zone
• Compromised EDEK
• Also requires EZ key and ciphertext
• Only leaks a single file
• Compromised ciphertext
• Also requires corresponding EZ key and EDEK
• Only leaks a single file
32© Cloudera, Inc. All rights reserved.
Quickstart: Create a key and an EZ
$ # as the key administrator user
$ hadoop key create mykey
$ # as the HDFS administrator user
$ hadoop fs –mkdir /users/cwl/my-ez
$ hadoop fs –chown cwl /users/cwl/my-ez
$ hadoop fs –chmod 700 /users/cwl/my-ez
$ hadoop crypto –createZone –keyName mykey 
–path /users/cwl/my-ez
33© Cloudera, Inc. All rights reserved.
Quickstart: put and get a file into the EZ
$ # as the encryption user
$ echo “hello world” > hello-world.txt
$ hadoop fs –put hello-world.txt /users/cwl/my-ez/
$ hadoop fs –cat /users/cwl/my-ez/hello-world.txt
hello world
$
34© Cloudera, Inc. All rights reserved.
Quickstart: verify the file is encrypted
$ # as the HDFS admin
$ hadoop fs –cat 
/.reserved/raw/users/cwl/my-ez/hello-world.txt
��嶗]‫ܙ‬N^���+���
$
35© Cloudera, Inc. All rights reserved.
Encryption Ciphers
• Currently, only AES-CTR is implemented
• 128 or 256 bit keys
• Cipher is designed to be pluggable
• Other ciphers are future work
• AES-GCM would provide authenticated encryption
• Great place for contributions!
36© Cloudera, Inc. All rights reserved.
AES-NI Hardware Acceleration
• Joint work with Intel engineers
• Two implementations of AES-CTR
• JCE software implementation
• OpenSSL hardware-accelerated AES-NI implementation
• AES-NI is available in Westmere (2010) and newer Intel CPUs
• AES-NI further optimized in Haswell (2013)
37© Cloudera, Inc. All rights reserved.
Microbenchmark: Encrypt/Decrypt 1GB Byte Array
• How much faster is AES-NI accelerated AES-CTR?
• Encrypt and decrypt a 1GB byte array
• Single-threaded
• Run locally on a single Haswell machine
• Excludes HDFS overheads (checksumming, network, copies)
38© Cloudera, Inc. All rights reserved.
0
500
1000
1500
2000
2500
Encryption Decryption
MB/sec
With NI Without NI
Microbenchmark: Encrypt/Decrypt 1GB Byte Array
39© Cloudera, Inc. All rights reserved.
Macrobenchmark: TestDFSIO
• What is the overhead of using encryption?
• TestDFSIO
• IO-heavy benchmark
• Reading and writing 300GB files
• Encrypted vs. unencrypted
• Cluster setup
• 3 datanodes, so all reads are local
• 64GB memory, so datasize larger than memory
• OpenSSL enabled, with Haswell AES-NI optimizations
40© Cloudera, Inc. All rights reserved.
Macrobenchmark: TestDFSIO
0
20
40
60
80
100
120
140
160
Read Write
MB/sec
Non-Encrypted Encrypted
41© Cloudera, Inc. All rights reserved.
Performance Takeaways
• OpenSSL and AES-NI acceleration is highly recommended
• 16x microbenchmark speedup
• Minor performance impact on HDFS reads and writes
• 7.5% for reads, ~0% for writes
• Most realistic workloads will see little to no impact
• TestDFSIO is extremely IO bound
• Terasort comparisons had no difference in runtime
42© Cloudera, Inc. All rights reserved.
Conclusion
• End-to-end, client-side encryption
• Data protected at-rest and in-transit
• Key Management Server and Cloudera Navigator KeyTrustee
• Secure key management, storage, access
• Separate administrative domains for key management and HDFS
• API transparent for existing applications
• Performance transparent
• Marginal overhead with AES-NI hardware acceleration
43© Cloudera, Inc. All rights reserved.
Acknowledgements
• Cloudera
• Alejandro Abdelnur
• Aaron T. Myers
• Arun Suresh
• Colin McCabe
• Intel
• Yi Liu
• Haifeng Chen
44© Cloudera, Inc. All rights reserved.
Questions
45© Cloudera, Inc. All rights reserved.
Key Rotation
• Encryption zone keys can be rolled
• Newly generated EDEKs use the new key version
• Currently no support for reencrypting existing EDEKs
• Still encrypted with the old key version
• Doable, but requires a lot of KMS RPCs from NN
• Rolling DEKs is also unsupported
• Requires reencrypting the data, expensive
• One method is to distcp and then delete the old copy
46© Cloudera, Inc. All rights reserved.
DistCp
• Encryption Zone to Encryption Zone
• use –update –skipcrccheck
• Admins use special /.reserved/raw path prefix
• /.reserved/raw is only available to root and provides the encrypted
contents
47© Cloudera, Inc. All rights reserved.
KMS Per-User ACL Configuration
• White lists (check for inclusion) and black lists (check for exclusion)
• etc/hadoop/kms-acls.xml
• hadoop.kms.acl.CREATE
• hadoop.kms.blacklist.CREATE
• … DELETE, ROLLOVER, GET, GET_KEYS, GET_METADATA,
GENERATE_EEK, DECRYPT_EEK
48© Cloudera, Inc. All rights reserved.
KMS Per-Key ACL Configuration
• etc/hadoop/kms-acls.xml
• hadoop.kms.acl.<keyname>.<operation>
• MANAGEMENT – createKey, deleteKey, rolloverNewVersion
• GENERATE_EEK – generateEncryptedKey,
warmUpEncryptedKeys
• DECRYPT_EEK – decryptEncryptedKey
• READ – getKeyVersion, getKeyVersions, getMetadata,
getKeysMetadata, getCurrentKey
• ALL – all of the above
49© Cloudera, Inc. All rights reserved.
Exceptions
• Hive: may not be able to do a query that combines data from more than one
encryption zone
50© Cloudera, Inc. All rights reserved.
Sample Icons
51© Cloudera, Inc. All rights reserved.
Key Management Server (KMS)
• KMS sits between client and key server
• E.g. Cloudera Navigator Key Trustee
• Provides a unified API and scalability
• REST API
• Does not actually store keys (backend does that), but does cache them
• ACLs on per-key basis
52© Cloudera, Inc. All rights reserved.
High-profile leaks
• Many personal information leaks over the years
• Feb 2015, PII of 80 million Anthem customers
• Nov 2014, Sony Pictures Entertainment hack
• Mar 2014, PII of 40 million Target customers
• Apr 2011, 77 million PlayStation Network accounts
• Jan 2009, 100 million HPS credit cards
• Jan 2007, 45 million TJX SSNs and other personal information
53© Cloudera, Inc. All rights reserved.
54© Cloudera, Inc. All rights reserved.
55© Cloudera, Inc. All rights reserved.

Más contenido relacionado

La actualidad más candente

Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with Kubernetes
Jonathan Katz
 

La actualidad más candente (20)

ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
 
Oracle RAC 19c and Later - Best Practices #OOWLON
Oracle RAC 19c and Later - Best Practices #OOWLONOracle RAC 19c and Later - Best Practices #OOWLON
Oracle RAC 19c and Later - Best Practices #OOWLON
 
Apache Kafka Security
Apache Kafka Security Apache Kafka Security
Apache Kafka Security
 
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra... Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 
ODA Backup Restore Utility & ODA Rescue Live Disk
ODA Backup Restore Utility & ODA Rescue Live DiskODA Backup Restore Utility & ODA Rescue Live Disk
ODA Backup Restore Utility & ODA Rescue Live Disk
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
 
Oracle GoldenGate アーキテクチャと基本機能
Oracle GoldenGate アーキテクチャと基本機能Oracle GoldenGate アーキテクチャと基本機能
Oracle GoldenGate アーキテクチャと基本機能
 
GoldenGateテクニカルセミナー3「Oracle GoldenGate Technical Deep Dive」(2016/5/11)
GoldenGateテクニカルセミナー3「Oracle GoldenGate Technical Deep Dive」(2016/5/11)GoldenGateテクニカルセミナー3「Oracle GoldenGate Technical Deep Dive」(2016/5/11)
GoldenGateテクニカルセミナー3「Oracle GoldenGate Technical Deep Dive」(2016/5/11)
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with Kubernetes
 
Oracle Cloud Infrastructure:2022年8月度サービス・アップデート
Oracle Cloud Infrastructure:2022年8月度サービス・アップデートOracle Cloud Infrastructure:2022年8月度サービス・アップデート
Oracle Cloud Infrastructure:2022年8月度サービス・アップデート
 
Tanel Poder - Scripts and Tools short
Tanel Poder - Scripts and Tools shortTanel Poder - Scripts and Tools short
Tanel Poder - Scripts and Tools short
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
New Generation Oracle RAC Performance
New Generation Oracle RAC PerformanceNew Generation Oracle RAC Performance
New Generation Oracle RAC Performance
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020
 
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka
 
Row/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache SparkRow/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache Spark
 
GoldenGateテクニカルセミナー4「テクニカルコンサルタントが語るOracle GoldenGate現場で使える極意」(2016/5/11)
GoldenGateテクニカルセミナー4「テクニカルコンサルタントが語るOracle GoldenGate現場で使える極意」(2016/5/11)GoldenGateテクニカルセミナー4「テクニカルコンサルタントが語るOracle GoldenGate現場で使える極意」(2016/5/11)
GoldenGateテクニカルセミナー4「テクニカルコンサルタントが語るOracle GoldenGate現場で使える極意」(2016/5/11)
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 

Destacado

Apache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopApache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
DataWorks Summit
 

Destacado (15)

Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
 
Cloud at massive scale and incredible speed, Ekkard Schnedermann berichtet vo...
Cloud at massive scale and incredible speed, Ekkard Schnedermann berichtet vo...Cloud at massive scale and incredible speed, Ekkard Schnedermann berichtet vo...
Cloud at massive scale and incredible speed, Ekkard Schnedermann berichtet vo...
 
How PCI And PA DSS will change enterprise applications
How PCI And PA DSS will change enterprise applicationsHow PCI And PA DSS will change enterprise applications
How PCI And PA DSS will change enterprise applications
 
AWS re:Invent Recap from AWS User Group UK meetup #8
AWS re:Invent Recap from AWS User Group UK meetup #8AWS re:Invent Recap from AWS User Group UK meetup #8
AWS re:Invent Recap from AWS User Group UK meetup #8
 
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on HadoopApache Falcon - Simplifying Managing Data Jobs on Hadoop
Apache Falcon - Simplifying Managing Data Jobs on Hadoop
 
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
 
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices Workshop
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
 
Cómo sacar rendimiento al PCI DSS. SafeNet.
Cómo sacar rendimiento al PCI DSS. SafeNet.Cómo sacar rendimiento al PCI DSS. SafeNet.
Cómo sacar rendimiento al PCI DSS. SafeNet.
 
Project Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for HadoopProject Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for Hadoop
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
 

Similar a Transparent Encryption in HDFS

Extracting Credentials From Windows
Extracting Credentials From WindowsExtracting Credentials From Windows
Extracting Credentials From Windows
NetSPI
 
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR complianceOzone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
Dinesh Chitlangia
 
Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Masahiko Sawada
 

Similar a Transparent Encryption in HDFS (20)

Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and Governed
 
Transparent Data Encryption in PostgreSQL
Transparent Data Encryption in PostgreSQLTransparent Data Encryption in PostgreSQL
Transparent Data Encryption in PostgreSQL
 
Protect your private data with ORC column encryption
Protect your private data with ORC column encryptionProtect your private data with ORC column encryption
Protect your private data with ORC column encryption
 
Data Security Essentials for Cloud Computing - JavaOne 2013
Data Security Essentials for Cloud Computing - JavaOne 2013Data Security Essentials for Cloud Computing - JavaOne 2013
Data Security Essentials for Cloud Computing - JavaOne 2013
 
Az 104 session 4: azure storage
Az 104 session 4: azure storageAz 104 session 4: azure storage
Az 104 session 4: azure storage
 
Secure360 - Extracting Password from Windows
Secure360 - Extracting Password from WindowsSecure360 - Extracting Password from Windows
Secure360 - Extracting Password from Windows
 
Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
 
Hadoop Meetup Jan 2019 - Hadoop Encryption
Hadoop Meetup Jan 2019 - Hadoop EncryptionHadoop Meetup Jan 2019 - Hadoop Encryption
Hadoop Meetup Jan 2019 - Hadoop Encryption
 
Extracting Credentials From Windows
Extracting Credentials From WindowsExtracting Credentials From Windows
Extracting Credentials From Windows
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
Managing your secrets in a cloud environment
Managing your secrets in a cloud environmentManaging your secrets in a cloud environment
Managing your secrets in a cloud environment
 
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR complianceOzone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
 
Ozone: Evolution of HDFS
Ozone: Evolution of HDFSOzone: Evolution of HDFS
Ozone: Evolution of HDFS
 
Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...Transparent Data Encryption in PostgreSQL and Integration with Key Management...
Transparent Data Encryption in PostgreSQL and Integration with Key Management...
 
"Mobile security: iOS", Yaroslav Vorontsov, DataArt
"Mobile security: iOS", Yaroslav Vorontsov, DataArt"Mobile security: iOS", Yaroslav Vorontsov, DataArt
"Mobile security: iOS", Yaroslav Vorontsov, DataArt
 
Hadoop security implementationon 20171003
Hadoop security implementationon 20171003Hadoop security implementationon 20171003
Hadoop security implementationon 20171003
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoop
 
Protecting Your Data with Encryption
Protecting Your Data with EncryptionProtecting Your Data with Encryption
Protecting Your Data with Encryption
 
(SEC301) Strategies for Protecting Data Using Encryption in AWS
(SEC301) Strategies for Protecting Data Using Encryption in AWS(SEC301) Strategies for Protecting Data Using Encryption in AWS
(SEC301) Strategies for Protecting Data Using Encryption in AWS
 
Protecting Your Data in AWS
Protecting Your Data in AWS Protecting Your Data in AWS
Protecting Your Data in AWS
 

Más de DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Transparent Encryption in HDFS

  • 1. 1© Cloudera, Inc. All rights reserved. Charles Lamb clamb@cloudera.com Andrew Wang andrew.wang@cloudera.com Transparent Encryption in HDFS
  • 2. 2© Cloudera, Inc. All rights reserved. Why do we need encryption?
  • 3. 3© Cloudera, Inc. All rights reserved.
  • 4. 4© Cloudera, Inc. All rights reserved. Importance of Encryption • Information leaks affect 10s to 100s of millions of people • Personally identifiable information (PII) • Credit cards, SSNs, account logins • Encryption would have prevented some of these leaks • Encryption is a regulatory requirement for many business sectors • Finance (PCI DSS) • Government (Data Protection Directive) • Healthcare (DPD, HIPAA)
  • 5. 5© Cloudera, Inc. All rights reserved. Access to Sensitive Information • Only as secure as the weakest link • Need to carefully control access to: • Encryption keys • Encrypted ciphertext • Keys and ciphertext together (and thus plaintext!) • Many potential attack vectors
  • 6.
  • 7. 7© Cloudera, Inc. All rights reserved.
  • 8.
  • 9. 9© Cloudera, Inc. All rights reserved. Attack Vectors • Physical access • Data at-rest on hard drives • Network sniffing • Data in-transit on the network • Rogue user or admin • Insider access to encrypted data, encryption keys • Particularly difficult for admins
  • 10. 10© Cloudera, Inc. All rights reserved. Why Encrypt in HDFS? Local Disk HDFS Database Application Simplicity Flexibility
  • 11. 11© Cloudera, Inc. All rights reserved. Why Encrypt in HDFS? Local Disk HDFS Database Application HDFS is a sweet spot • Excellent performance • Safe from OS attacks • App transparency • Easy to deploy
  • 12. 12© Cloudera, Inc. All rights reserved. Architecture Client KMS NN KS DN NameNode Backing Key Server DataNode Key Management Server
  • 13. 13© Cloudera, Inc. All rights reserved. Architecture Client KMS NN KT DN NameNode Cloudera Navigator KeyTrustee DataNode Key Management Server
  • 14. 14© Cloudera, Inc. All rights reserved. Architecture Client KMS NN KT DN Read/write encrypted data Per-file key operations
  • 15. 15© Cloudera, Inc. All rights reserved. Architecture Client KMS NN KT DN Separate administrative domains Keys Data
  • 16. 16© Cloudera, Inc. All rights reserved. Design • KMS and key server manage encryption keys • HDFS never touches encryption keys or plaintext • End-to-end client-side encryption • Data is protected both at-rest and in-transit • Granular per-file encryption keys and per-key ACLs • Rogue user can only leak their own data • Separate administrative domains for key management and HDFS • Rogue admins do not have access to both keys and data • API transparency for existing applications • High-performance
  • 17. 17© Cloudera, Inc. All rights reserved. Architectural Concepts • Encryption zone (EZ) • Key types: EZ keys, (E)DEKs • Key Management Server (KMS) • Key handling during writes and reads
  • 18. 18© Cloudera, Inc. All rights reserved. Encryption Zones (EZs) • Directory whose contents are encrypted on write and decrypted on read • Encryption is transparent to applications (simple!) • No code changes required • Invariant: files in an EZ are encrypted, files not in an EZ are not encrypted • Existing files are not encrypted in place • An EZ begins life as an empty directory • Renames in to and out of an EZ are prohibited • Encryption zones cannot be nested
  • 19. 19© Cloudera, Inc. All rights reserved. Recommended EZ Setup • Single root encryption zone • Pros: conceptually simple, easy to configure • Cons: less secure, hard to migrate to multiple EZs • Separate per-user or per-org encryption zones • Pros: compartmentalizes security • Cons: more keys to manage
  • 20. 20© Cloudera, Inc. All rights reserved. Encryption Zone Key (EZ key) • Unique key per encryption zone • EZ key metadata is stored in an xattr on the EZ root directory • EZ key material is stored in the Key Management Server • NameNode never touches the key
  • 21. 21© Cloudera, Inc. All rights reserved. (Encrypted) Data Encryption Keys • Data Encryption Key (DEK) • Unique DEK per file • Used by client to encrypt and decrypt data in HDFS • Only handled by client and KMS, never HDFS • DataNodes never handle plaintext, only ciphertext • Encrypted Data Encryption Key (EDEK) • DEK encrypted with corresponding EZ key • Generated by KMS for NameNode • Stored in HDFS metadata as a file xattr
  • 22. 22© Cloudera, Inc. All rights reserved. EZ Keys, DEKs, and EDEKs
  • 23. 23© Cloudera, Inc. All rights reserved. EZ Keys, DEKs, and EDEKs
  • 24. 24© Cloudera, Inc. All rights reserved. Key Management Server • Caching proxy for an enterprise key server • Presents a unified key API to cluster services • Scalable, fault-tolerant, high-performance • Typically backed by Cloudera Navigator KeyTrustee • Handles key operations with fine-grained ACLs • Create and retrieve EZ keys • Generate and decrypt EDEKs • Separate secure process • Separate key administrator
  • 25. 25© Cloudera, Inc. All rights reserved. Writing an Encrypted File Client KMS NN DN create /ez/myFile Fetch EDEK for /ez/ myFile
  • 26. 26© Cloudera, Inc. All rights reserved. Writing an Encrypted File Client KMS NN DN create /ez/myFile Fetch EDEK for /ez/ myFile Decrypt EDEK Write encrypted data File data
  • 27. 27© Cloudera, Inc. All rights reserved. Reading an Encrypted File Client KMS NN DN open /ez/myFile myFile Decrypt EDEK Read encrypted data File data Decrypt with DEK
  • 28. 28© Cloudera, Inc. All rights reserved. Key Handling • Writing a file • Client asks NameNode to create a file in an EZ • NameNode gets a new EDEK from KMS • NameNode stores EDEK in file’s xattr and returns EDEK to client • Client asks KMS to decrypt EDEK • KMS decrypts EDEK and returns DEK to client • Client uses DEK to encrypt file data
  • 29. 29© Cloudera, Inc. All rights reserved. Key Handling • Reading a file • Client asks NameNode for file’s EDEK • Client asks KMS to decrypt EDEK • KMS decrypts EDEK and returns DEK to client • Client uses DEK to decrypt data
  • 30. 30© Cloudera, Inc. All rights reserved. Granular Access Controls • Granular HDFS-level access controls • Controls access to encrypted ciphertext • HDFS ACLs and Unix-like permissions • Granular KMS-level access controls • Per-EZ key ACLs • Per-operation ACLs (generate and decrypt EDEKs) • Whitelist and blacklist • See our docs
  • 31. 31© Cloudera, Inc. All rights reserved. Attack Vectors • Compromised EZ key • Also requires EDEK and ciphertext • Only leaks a single zone • Compromised EDEK • Also requires EZ key and ciphertext • Only leaks a single file • Compromised ciphertext • Also requires corresponding EZ key and EDEK • Only leaks a single file
  • 32. 32© Cloudera, Inc. All rights reserved. Quickstart: Create a key and an EZ $ # as the key administrator user $ hadoop key create mykey $ # as the HDFS administrator user $ hadoop fs –mkdir /users/cwl/my-ez $ hadoop fs –chown cwl /users/cwl/my-ez $ hadoop fs –chmod 700 /users/cwl/my-ez $ hadoop crypto –createZone –keyName mykey –path /users/cwl/my-ez
  • 33. 33© Cloudera, Inc. All rights reserved. Quickstart: put and get a file into the EZ $ # as the encryption user $ echo “hello world” > hello-world.txt $ hadoop fs –put hello-world.txt /users/cwl/my-ez/ $ hadoop fs –cat /users/cwl/my-ez/hello-world.txt hello world $
  • 34. 34© Cloudera, Inc. All rights reserved. Quickstart: verify the file is encrypted $ # as the HDFS admin $ hadoop fs –cat /.reserved/raw/users/cwl/my-ez/hello-world.txt ��嶗]‫ܙ‬N^���+��� $
  • 35. 35© Cloudera, Inc. All rights reserved. Encryption Ciphers • Currently, only AES-CTR is implemented • 128 or 256 bit keys • Cipher is designed to be pluggable • Other ciphers are future work • AES-GCM would provide authenticated encryption • Great place for contributions!
  • 36. 36© Cloudera, Inc. All rights reserved. AES-NI Hardware Acceleration • Joint work with Intel engineers • Two implementations of AES-CTR • JCE software implementation • OpenSSL hardware-accelerated AES-NI implementation • AES-NI is available in Westmere (2010) and newer Intel CPUs • AES-NI further optimized in Haswell (2013)
  • 37. 37© Cloudera, Inc. All rights reserved. Microbenchmark: Encrypt/Decrypt 1GB Byte Array • How much faster is AES-NI accelerated AES-CTR? • Encrypt and decrypt a 1GB byte array • Single-threaded • Run locally on a single Haswell machine • Excludes HDFS overheads (checksumming, network, copies)
  • 38. 38© Cloudera, Inc. All rights reserved. 0 500 1000 1500 2000 2500 Encryption Decryption MB/sec With NI Without NI Microbenchmark: Encrypt/Decrypt 1GB Byte Array
  • 39. 39© Cloudera, Inc. All rights reserved. Macrobenchmark: TestDFSIO • What is the overhead of using encryption? • TestDFSIO • IO-heavy benchmark • Reading and writing 300GB files • Encrypted vs. unencrypted • Cluster setup • 3 datanodes, so all reads are local • 64GB memory, so datasize larger than memory • OpenSSL enabled, with Haswell AES-NI optimizations
  • 40. 40© Cloudera, Inc. All rights reserved. Macrobenchmark: TestDFSIO 0 20 40 60 80 100 120 140 160 Read Write MB/sec Non-Encrypted Encrypted
  • 41. 41© Cloudera, Inc. All rights reserved. Performance Takeaways • OpenSSL and AES-NI acceleration is highly recommended • 16x microbenchmark speedup • Minor performance impact on HDFS reads and writes • 7.5% for reads, ~0% for writes • Most realistic workloads will see little to no impact • TestDFSIO is extremely IO bound • Terasort comparisons had no difference in runtime
  • 42. 42© Cloudera, Inc. All rights reserved. Conclusion • End-to-end, client-side encryption • Data protected at-rest and in-transit • Key Management Server and Cloudera Navigator KeyTrustee • Secure key management, storage, access • Separate administrative domains for key management and HDFS • API transparent for existing applications • Performance transparent • Marginal overhead with AES-NI hardware acceleration
  • 43. 43© Cloudera, Inc. All rights reserved. Acknowledgements • Cloudera • Alejandro Abdelnur • Aaron T. Myers • Arun Suresh • Colin McCabe • Intel • Yi Liu • Haifeng Chen
  • 44. 44© Cloudera, Inc. All rights reserved. Questions
  • 45. 45© Cloudera, Inc. All rights reserved. Key Rotation • Encryption zone keys can be rolled • Newly generated EDEKs use the new key version • Currently no support for reencrypting existing EDEKs • Still encrypted with the old key version • Doable, but requires a lot of KMS RPCs from NN • Rolling DEKs is also unsupported • Requires reencrypting the data, expensive • One method is to distcp and then delete the old copy
  • 46. 46© Cloudera, Inc. All rights reserved. DistCp • Encryption Zone to Encryption Zone • use –update –skipcrccheck • Admins use special /.reserved/raw path prefix • /.reserved/raw is only available to root and provides the encrypted contents
  • 47. 47© Cloudera, Inc. All rights reserved. KMS Per-User ACL Configuration • White lists (check for inclusion) and black lists (check for exclusion) • etc/hadoop/kms-acls.xml • hadoop.kms.acl.CREATE • hadoop.kms.blacklist.CREATE • … DELETE, ROLLOVER, GET, GET_KEYS, GET_METADATA, GENERATE_EEK, DECRYPT_EEK
  • 48. 48© Cloudera, Inc. All rights reserved. KMS Per-Key ACL Configuration • etc/hadoop/kms-acls.xml • hadoop.kms.acl.<keyname>.<operation> • MANAGEMENT – createKey, deleteKey, rolloverNewVersion • GENERATE_EEK – generateEncryptedKey, warmUpEncryptedKeys • DECRYPT_EEK – decryptEncryptedKey • READ – getKeyVersion, getKeyVersions, getMetadata, getKeysMetadata, getCurrentKey • ALL – all of the above
  • 49. 49© Cloudera, Inc. All rights reserved. Exceptions • Hive: may not be able to do a query that combines data from more than one encryption zone
  • 50. 50© Cloudera, Inc. All rights reserved. Sample Icons
  • 51. 51© Cloudera, Inc. All rights reserved. Key Management Server (KMS) • KMS sits between client and key server • E.g. Cloudera Navigator Key Trustee • Provides a unified API and scalability • REST API • Does not actually store keys (backend does that), but does cache them • ACLs on per-key basis
  • 52. 52© Cloudera, Inc. All rights reserved. High-profile leaks • Many personal information leaks over the years • Feb 2015, PII of 80 million Anthem customers • Nov 2014, Sony Pictures Entertainment hack • Mar 2014, PII of 40 million Target customers • Apr 2011, 77 million PlayStation Network accounts • Jan 2009, 100 million HPS credit cards • Jan 2007, 45 million TJX SSNs and other personal information
  • 53. 53© Cloudera, Inc. All rights reserved.
  • 54. 54© Cloudera, Inc. All rights reserved.
  • 55. 55© Cloudera, Inc. All rights reserved.

Notas del editor

  1. First, a basic question: why do we need encryption? Well, there have been a number of high-profile leaks of confidential information in recent years. For example, last year, Sony suffered “The Interview” hack, where hackers leaked unreleased movies and scripts, and the year before that, Target was hacked, exposing the credit card information of 77 million customers. Since the postmortems for these hacks are rarely released, it’s difficult to pinpoint if they would have been prevented by encryption, but during my Google search, I did find one example.
  2. Anthem Blue Cross Blue Shield, a health insurance provider, was hacked just a few months ago. Attackers gained access to the personal information of approximately 80 million people, including credit card and social security numbers. As you can see, the Wall Street Journal kindly put the core issue right in the headline: “Anthem didn’t encrypt data in theft.” Anthem is now offering free credit monitoring to its customers, but the potential economic impact of the resulting credit card fraud is enormous.
  3. Andrew owner
  4. Physical access is the most straightforward threat. This affects data stored at-rest on a hard drive. An attacker might walk into a datacenter and then walk out with datanode hard drives containing sensitive information. If this seems farfetched, another method is to buy used hard drives on Ebay and try to extract deleted data.
  5. Network sniffing is another common attack vector. If a hacker penetrates through the firewall, they can use tcpdump or snort to record network traffic. If your sensitive data (meaning encryption keys or plaintext) is not encrypted while in-transit over the wire, it’s vulnerable to network sniffing.
  6. The last attack vector is a rogue user or admin. This is one of the hardest to deal with. If a hacker compromises a user account, they also gain access to all the information that user would normally have access to. If this user account is also a system administrator, then this threat could be even more serious because of the broad access typically granted to sysadmins. The goal here is to limit the access of any single user, including sysadmins, such that no single user has access to both all the encryption keys stored in the key server, as well as all the encrypted ciphertext stored in HDFS.
  7. In case you were distracted, this is the most important slide in the presentation. Go very slowly and repeat yourself
  8. Charlie starts here
  9. Just define each acronym / key type
  10. Just define each acronym / key type
  11. Just define each acronym / key type
  12. Just define each acronym / key type
  13. Just define each acronym / key type
  14. The KMS ACLs are very sophisticated. I recommend going to the Cloudera documentation to see our recommended configuration for KMS ACLs….
  15. So let’s walk through what happens if an attacker gains access to sensitive keys or data. If the KMS ACLs are not set up correctly, an attacker might gain access to an EZ key. However, to gain access to plaintext, the attacker also needs EDEKs and ciphertext of files stored in that EZ key’s encryption zone. So, as long as the HDFS permissions are set up correctly, the attacker cannot gain access. Even if they do gain access HDFS, the worst case is only that single zone is compromised… Lather rinse
  16. Be sure to mention transparency
  17. Microbenchmark CPU: Intel(R) Xeon(R) CPU E3-1285 v3 @ 3.60GHz (Haswell) Algorithm: AES CTR mode Java Version: jdk 1.6.0_37 L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 8192K 1.       Methodology 1 - 1 GB data as single byte array performance This method  created a 1GB byte array for input data.  And a separate 1GB byte array for output data. For each iteration, it will encrypt or decrypt the 1GB byte array. For each read and write call, we use 1M buffer.   In this method, 1GB input data is not possible to be cached in any level of the cached. But even it is 1GB byte data, it still shows good CPU cache behavior. Because when reading through the input data, the data following in the array can be pre-fetched to  CPU cache and the next reading operation will be faster. Usually, this is also achievable for real application.
  18. tested using TestDFSIO benchmark   (300GB for each file, total 3 files) ------------------------------------------- Cluster env: 4nodes, 1 for NN, 3 for DN CPU: 2 * Xeon CPU E5-2690v2 3.00GHz Mem: 64GB   2HDD for each DN Network: need to confirm.
  19. tested using TestDFSIO benchmark   (300GB for each file, total 3 files) ------------------------------------------- Cluster env: 4nodes, 1 for NN, 3 for DN CPU: 2 * Xeon CPU E5-2690v2 3.00GHz Mem: 64GB   2HDD for each DN Network: need to confirm.
  20. Redo this once we are done with the rest of presentation Should be basically the outline
  21. Add our emails here
  22. Andrew owner
  23. Andrew owner
  24. In 2013, Target suffered one of the biggest security breaches in history. Hackers tapped into the Target’s credit card processing system to capture credit card information of holiday shoppers. This exposed at least 40 million people to possible credit card fraud, and Target ended up offering free credit monitoring to all of its customers.
  25. In 2014, Sony was in the news for what was colloquially called “The Interview hack”, where hackers leaked footage and scripts of unreleased movies, as well as internal documents containing passwords, salaries, and personal information of actors and employees. Journalists had a field day poring through the released documents, and Sony was forced to subsequently cancel the theatrical release of The Interview.