Open Source Security
Tools For Big Data
Rommel Garcia
@rommelgarcia
Hortonworks
# whoami
• Global Security SME Lead @hortonworks
• Senior Solutions Engineer @hortonworks
• Book Author – Virtualizing Hadoop
• Co-organizer of Atlanta Hadoop User Group
• Regular Speaker at Big Data Conferences
Big Data Landscape
DATA – More Volume and More Types
[Figure: data volume (gigabytes, terabytes, petabytes, exabytes) plotted against increasing data variety and complexity, with regions labeled ERP, CRM, Web, and Big Data. Data types shown: purchase detail, purchase record, payment record, offer details, offer history, segmentation, customer touches, support contacts, web logs, A/B testing, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels, user click stream, user generated content, mobile web, SMS/MMS, sentiment, external demographics, HD video, speech to text, product/service logs, social network, business data feeds.]
Big Data Ecosystem
[Figure: architecture diagram of a Big Data Platform, with these labels:
• Platform: HDFS (Hadoop Distributed File System) across nodes 1 to N; YARN: Data Operating System; engines for Script, SQL, NoSQL, Stream, Search, In-Mem, and others; Security, Operations, and Governance & Integration
• Analysis & visualization: risk modeling, fraud detection, compliance (AML, KYC), Bank 3.0, information security, single view of customer, trading applications, market data management
• Data repositories: EDW, OLAP, datamarts, column databases, CRM, RDBMS
• Traditional sources: lending, markets, trades, compliance data, credit card, cash & equity, finance & GL, risk data
• Emerging & non-traditional sources: server logs, call center, emails, Word documents, location data, sensor data, customer sentiment, research reports]
Compliance Adherences
• HIPAA - Health Insurance Portability and Accountability Act of 1996
• HITECH - Health Information Technology for Economic and Clinical Health Act
• PCI DSS - Payment Card Industry Data Security Standard
• SOX - Sarbanes-Oxley Act of 2002
• ISO - International Organization for Standardization
• COBIT - Control Objectives for Information and Related Technology
• Corporate Security Policies
Big Data Security
5 Pillars of Security
• Authentication
• Authorization
• Audit
• Data at rest/in-motion Encryption
• Centralized Administration
Big Data Ecosystem
[Figure: the Big Data Platform architecture diagram from the earlier "Big Data Ecosystem" slide, repeated with numbered callouts for the security components: 1 Knox, 2 Kerberos, 3 Ranger, 4 HDFS Enc., 5 LDAP.]
5 Pillars of Security
• Authentication ->
• Authorization ->
• Audit ->
• Data Protection ->
• Centralized Administration ->
Knox
Why Knox?
Simplified Access
• Kerberos encapsulation
• Extends API reach
• Single access point
• Multi-cluster support
• Single SSL certificate
Centralized Control
• Central REST API auditing
• Service-level authorization
• Alternative to SSH “edge node”
Enterprise Integration
• LDAP integration
• Active Directory integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility
Enhanced Security
• Protect network details
• Partial SSL for non-SSL services
• WebApp vulnerability filter
Knox Deployment with Hadoop Cluster
[Figure: network diagram. Labels: Web Tier, LB, Knox, DMZ, Application Tier, Hadoop CLIs, and switches; Rack 1 with master nodes (NN, SNN); Rack 2 through Rack N with slave nodes (DN).]
What does Perimeter Security really mean?
[Figure: a user reaches Hadoop services (WebHDFS, Hive host, HBase hosts) over the REST API through firewalls and the Knox Gateway.]
• Firewall required at perimeter (today)
• Knox Gateway controls all Hadoop REST API access through the firewall
• Firewall only allows connections through specific ports from the Knox host
• Hadoop cluster mostly unaffected
Kerberos
Why Kerberos?
Provides Strong Authentication
Establishes identity for users, services and hosts
Prevents unauthorized users from impersonating other accounts
Supports token delegation model
Works with existing directory services
Basis for Authorization
Don’t be afraid of Kerberos...
Security Implications
$ whoami
baduser
$ hadoop fs -ls /tmp
Found 2 items
drwx-wx-wx - ambari-qa hdfs 0 2015-07-14 18:38 /tmp/hive
drwx------ - hdfs hdfs 0 2015-07-14 20:33 /tmp/secure
$ hadoop fs -ls /tmp/secure
ls: Permission denied: user=baduser, access=READ_EXECUTE,
inode="/tmp/secure":hdfs:hdfs:drwx------
Good, right? Look again!
$ HADOOP_USER_NAME=hdfs hadoop fs -ls /tmp/secure
Found 1 items
drwxr-xr-x - hdfs hdfs 0 2015-07-14 20:35 /tmp/secure/blah
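The listing above succeeds because, without Kerberos, Hadoop's default "simple" authentication accepts whatever user name the client asserts. The following Python sketch is a toy model of that failure mode, not Hadoop code; `INODES`, `effective_user`, and `can_list` are hypothetical names, and group permissions are left out for brevity.

```python
# Toy model of an HDFS-style permission check under "simple" authentication.
# Each path maps to (owner, group, mode bits), mirroring the listing above.
INODES = {
    "/tmp/secure": ("hdfs", "hdfs", 0o700),
}

def effective_user(env, os_user):
    """Mimic simple auth: a client-supplied HADOOP_USER_NAME overrides the OS user."""
    return env.get("HADOOP_USER_NAME", os_user)

def can_list(path, user):
    """Listing a directory needs read + execute for the asserted user."""
    owner, _group, mode = INODES[path]
    if user == owner:
        return (mode & 0o500) == 0o500   # owner r-x bits
    return (mode & 0o005) == 0o005       # "other" r-x bits; group check omitted

# baduser with no override is denied, as in the first listing.
print(can_list("/tmp/secure", effective_user({}, "baduser")))
# The same OS user claiming to be hdfs is allowed: nothing verifies the claim.
print(can_list("/tmp/secure", effective_user({"HADOOP_USER_NAME": "hdfs"}, "baduser")))
```

The permission check itself is sound; the weakness is that the identity fed into it is unauthenticated, which is the gap Kerberos closes.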
Kerberos Primer
[Figure: Client (with its Kerberos ticket cache), KDC, NameNode (NN), and DataNode (DN)]
1. kinit - Login and get Ticket Granting Ticket (TGT)
2. Client stores TGT in ticket cache
3. Get NameNode Service Ticket (NN-ST)
4. Client stores NN-ST in ticket cache
5. Read/write file given NN-ST and file name; returns block locations, block IDs and Block Access Tokens if access permitted
6. Read/write block given Block Access Token and block ID
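The six steps of the primer can be sketched as a toy simulation. This is a minimal illustration under strong assumptions: an HMAC tag stands in for Kerberos ticket encryption, and the keys, the principal `alice`, and the helpers `seal` and `unseal` are all hypothetical. It shows only who has to trust whom at each step.

```python
import hashlib
import hmac

def seal(key: bytes, payload: str) -> str:
    """Attach an HMAC tag to the payload; a stand-in for ticket encryption."""
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{tag}"

def unseal(key: bytes, ticket: str) -> str:
    """Verify the tag and return the payload, or raise if the ticket is forged."""
    payload, tag = ticket.rsplit("|", 1)
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        raise PermissionError("invalid ticket")
    return payload

# Hypothetical long-term secrets; in a real realm these live in the KDC and keytabs.
KDC_KEY = b"kdc-secret"
NN_KEY = b"namenode-secret"         # known to the KDC and the NameNode
BLOCK_KEY = b"block-token-secret"   # shared by the NameNode and the DataNodes

# Steps 1-2: kinit returns a TGT, which the client stores in its ticket cache.
ticket_cache = {"tgt": seal(KDC_KEY, "user=alice")}

# Steps 3-4: the KDC checks the TGT and issues a NameNode service ticket.
principal = unseal(KDC_KEY, ticket_cache["tgt"])
ticket_cache["nn_st"] = seal(NN_KEY, principal)

# Step 5: the NameNode checks the service ticket and returns a block access token.
user = unseal(NN_KEY, ticket_cache["nn_st"])
block_token = seal(BLOCK_KEY, f"{user};block=blk_1;mode=READ")

# Step 6: the DataNode verifies the token on its own, without contacting the KDC.
granted = unseal(BLOCK_KEY, block_token)
print(granted)
```

A ticket sealed with any other key fails verification in `unseal`, which is why setting a user name in an environment variable no longer works once Kerberos is enabled.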
Ranger
Apache Ranger authZ Architecture
[Figure: Ranger plugins embedded in HDFS, Hive, HBase, YARN, Knox, Storm, Solr, and Kafka. Central components: Administration Portal, Policy Server, and Audit Server, with REST APIs. Other labels: DB, Solr, HDFS, Log4j, KMS, and LDAP/AD user/group sync.]
Sample Simplified Workflow - HDFS
[Figure: Policy Manager, Plugin, NameNode, Audit Database, users, and applications]
• Admin sets policies for HDFS files/folders in the Policy Manager
• Users reach HDFS in one of three ways: a data scientist runs a MapReduce job, users access HDFS data through an application, or IT users access HDFS through the CLI
• NameNode uses the plugin for authorization
• Audit logs are pushed to the audit database
• NameNode provides resource access to the user/client
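The workflow above can be sketched in a few lines of Python. This is a toy model, not the Ranger API: `Policy` and `Plugin` are hypothetical classes, and matching a path by prefix is a simplification of Ranger's resource matching.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    """Hypothetical, simplified stand-in for an HDFS policy set by the admin."""
    path_prefix: str
    users: set
    accesses: set

@dataclass
class Plugin:
    """Toy plugin inside the NameNode: evaluates policies and records audits."""
    policies: list
    audit_log: list = field(default_factory=list)

    def is_access_allowed(self, user, path, access):
        allowed = any(
            path.startswith(p.path_prefix) and user in p.users and access in p.accesses
            for p in self.policies
        )
        # Every decision, allowed or denied, produces an audit record.
        self.audit_log.append((user, path, access, "ALLOWED" if allowed else "DENIED"))
        return allowed

# The admin grants alice read access under /data/finance.
plugin = Plugin([Policy("/data/finance", {"alice"}, {"read"})])

print(plugin.is_access_allowed("alice", "/data/finance/q1.csv", "read"))
print(plugin.is_access_allowed("bob", "/data/finance/q1.csv", "read"))
print(plugin.audit_log)
```

Keeping the decision and the audit record in the same call mirrors the point of the diagram: the component that authorizes a request is also the one that reports it.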
Ranger Stacks
• Apache Ranger v0.5 supports a stack model to enable easier onboarding of new components, without requiring code changes in Apache Ranger.
Ranger Side Changes: Define Service-type
• Create a JSON file with the following details:
- Resources
- Access types
- Config to connect
• Load the JSON into Ranger.
Secured Components Side Changes: Develop Ranger Authorization Plugin
• Include the plugin library in the secured component.
• During initialization of the service: initialize the RangerBasePlugin and RangerDefaultAuditHandler classes.
• To authorize access to a resource: use RangerBasePlugin.isAccessAllowed() with a RangerAccessRequest.
• To support resource lookup: implement RangerBaseService.lookupResource() and RangerBaseService.validateConfig().
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741207
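As a rough illustration of the service-type JSON described above, the Python sketch below builds a definition for a hypothetical component called `mystore` and prints it. The top-level keys follow the general shape of a Ranger service definition, but every value, including the `implClass` name, is made up; check the field names against the Ranger version in use before loading anything.

```python
import json

# Hypothetical service type "mystore"; all values are illustrative assumptions.
service_def = {
    "name": "mystore",
    "implClass": "com.example.ranger.RangerServiceMyStore",  # hypothetical class
    # Resources: what a policy can be written against.
    "resources": [
        {"itemId": 1, "name": "bucket", "type": "string", "level": 10, "mandatory": True},
        {"itemId": 2, "name": "object", "type": "string", "level": 20, "parent": "bucket"},
    ],
    # Access types: the actions a policy can grant.
    "accessTypes": [
        {"itemId": 1, "name": "read", "label": "Read"},
        {"itemId": 2, "name": "write", "label": "Write"},
    ],
    # Config to connect: what Ranger needs for resource lookup.
    "configs": [
        {"itemId": 1, "name": "endpoint.url", "type": "string", "mandatory": True},
    ],
}

print(json.dumps(service_def, indent=2))
```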
HDFS Encryption
Data Protection
Hadoop allows you to apply data protection policy at two different layers across the Hadoop stack.
Layer | What? | How?
Storage | Encrypt data on disk | Volume level: LUKS (Linux), BitLocker (Windows); OS-level encryption; native in Hadoop: HDFS Encryption; partners: Voltage, Protegrity, DataGuise, Vormetric
Transmission | Encrypt data as it moves | Native in Hadoop: SSL & SASL; AES 256 for SSL & DTP with SASL
Data at Rest Encryption
Volume Level Encryption (Open Source - LUKS, DMCrypt)
OS File Level Encryption (Open Source - eCryptfs)
Hadoop Level Encryption (HDFS TDE*, Hive CLE**, HBase** )
HDFS Encryption – How it works
[Figure: an HDFS client on YARN, shown with the Data Access, Data Management, and Security layers, reads and writes HDFS (Hadoop Distributed File System) through a crypto stream (r/w with DEK). The client and the NameNode each reach the Key Management System (KMS) through the KeyProvider API. Other labels: DEKs, EZKs, EDEK, DEK.]
• Encryption Zone (attributes: EZ key ID, version): HDFS-6134
• Encrypted File (attributes: EDEK, IV)
• Key Management System (KMS): HADOOP-10433
• KeyProvider API: HADOOP-10141
Acronym | Description
EZ | Encryption Zone (an HDFS directory)
EZK | Encryption Zone Key; master key associated with all files in an EZ
DEK | Data Encryption Key; unique key associated with each file. The EZK is used to encrypt the DEK, producing the EDEK.
EDEK | Encrypted DEK; the NameNode only has access to the encrypted DEK.
IV | Initialization Vector
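The relationship between EZK, DEK, and EDEK is a form of envelope encryption, which the Python sketch below illustrates. It is a toy under loud assumptions: `keystream_xor` is a hypothetical stand-in built from SHA-256 in counter mode, not the AES cipher HDFS uses, and the KMS, NameNode, and client are collapsed into one script.

```python
import hashlib
import os

def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Illustration only, not AES."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    # XOR is its own inverse, so the same call encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, stream))

# KMS side: the encryption zone key (EZK) stays inside the KMS.
ezk = os.urandom(16)

# File creation: a fresh per-file DEK is generated and wrapped with the EZK.
dek = os.urandom(16)
edek_iv = os.urandom(16)
edek = keystream_xor(ezk, edek_iv, dek)

# NameNode side: only the wrapped key and IVs are stored with the file.
file_iv = os.urandom(16)
file_attrs = {"edek": edek, "edek_iv": edek_iv, "iv": file_iv}

# Client write: the KMS unwraps the EDEK, then data is encrypted with the DEK.
client_dek = keystream_xor(ezk, file_attrs["edek_iv"], file_attrs["edek"])
ciphertext = keystream_xor(client_dek, file_attrs["iv"], b"salary,100000")

# Client read: the same DEK and IV recover the plaintext.
plaintext = keystream_xor(client_dek, file_attrs["iv"], ciphertext)
print(plaintext)
```

Because the NameNode holds only the EDEK, a compromise of HDFS metadata alone does not expose file contents; an attacker would also need the KMS to unwrap the key.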
HDFS Encryption – Common Commands
As HDFS admin:
• Run KMS server
  $ ./kms.sh run
• Create encryption key
  $ hadoop key create key1 -size 128
  Key size can be 128, 192 or 256. 256 requires the unlimited strength JCE policy files.
• List all encryption keys
  $ hadoop key list -metadata
• Create an encryption zone (as the hdfs user); the path must point to an existing, empty directory
  $ hdfs crypto -createZone -keyName key1 -path /secure1
• List all encryption zones
  $ hdfs crypto -listZones
As HDFS end-user (run these as a user not in the HDFS admin role):
• Read/write to HDFS is unchanged
  $ hdfs dfs -copyFromLocal /tmp/vinay.txt /secure1
  $ hdfs dfs -cat /securehive/sal.txt
Encrypting Data In-Motion
Protocol | Communication Point | Encryption Mechanism
REST | WebHDFS (client to cluster); client to Knox | REST over SSL; Knox Gateway SSL; SPNEGO, which provides a mechanism for extending Kerberos to web applications through the standard HTTP protocol
HTTP | NameNode/JobTracker UI; MapReduce shuffle | HTTPS; encrypted MapReduce shuffle (MAPREDUCE-4117)
RPC | Hadoop client (client to cluster, intra-cluster) | SASL: the Hadoop RPC system implements SASL, which provides different QoP levels including encryption
JDBC/ODBC | HiveServer2 | SSL
TCP/IP | Data transfer (client to cluster, intra-cluster) | Encrypted DataTransferProtocol available in Hadoop; adding SASL support to the DataTransferProtocol
Real-world Implementation
[Figure: implementation diagram; the only surviving label is "Data Sources".]
Thank You !

Open Source Security Tools for Big Data

  • 1. 1 Open Source Security Tools For Big Data Rommel Garcia @rommelgarcia Hortonworks
  • 2. 2 # whoami  Global Security SME Lead @hortonworks  Senior Solutions Engineer @hortonworks  Book Author – Virtualizing Hadoop  Co-organizer of Atlanta Hadoop User Group  Regular Speaker at Big Data Conferences
  • 4. 4 DATA – More Volume and More Types I N C R E A S I N G D ATA V A R I E T Y A N D C O M P L E X I T Y USER GENERATED CONTENT MOBILE WEB SMS/MMS SENTIMENT EXTERNAL DEMOGRAPHICS HD VIDEO SPEECH TO TEXT PRODUCT/ SERVICE LOGS SOCIAL NETWORK BUSINESS DATA FEEDS USER CLICK STREAM WEB LOGS OFFER HISTORY DYNAMIC PRICING A/B TESTING AFFILIATE NETWORKS SEARCH MARKETING BEHAVIORAL TARGETING DYNAMIC FUNNELSPAYMENT RECORD SUPPORT CONTACTS CUSTOMER TOUCHESPURCHASE DETAIL PURCHASE RECORD SEGMENTATIONOFFER DETAILS P E TA BY T E S T E R A BY T E S G I G A BY T E S E X A BY T E S E R P BIG DATA WEB CR M
  • 5. 5 Big Data Ecosystem Big Data Platform DATA REPOSITORIES Risk modeling Fraud detection Compliance (AML, KYC) Bank 3.0 Information security Single view of customer Trading applications Market data management ANALYSIS & VISUALIZATION Security Operations Governance &Integration °1 ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° N YARN : Data Operating System Script SQL NoSQL Stream Search Others HDFS (Hadoop Distributed File System) In-Mem TRADITIONAL SOURCES EDW OLAP Datamarts Column Databases CRM RDBMS LENDING MARKETS TRADES COMPLIANCE DATA CREDIT CARD CASH & EQUITY FINANCE & GL RISK DATA EMERGING & NON-TRADITIONAL SOURCES SERVER LOGS CALL CENTER EMAILS WORD DOCUMENTS LOCATION DATA SENSOR DATA CUSTOMER SENTIMENT RESEARCH REPORTS
  • 6. Compliance Adherences
      • HIPAA - Health Insurance Portability and Accountability Act of 1996
      • HITECH - The Health Information Technology for Economic and Clinical Health Act
      • PCI DSS - Payment Card Industry Data Security Standard
      • SOX - The Sarbanes-Oxley Act of 2002
      • ISO - International Organization for Standardization
      • COBIT - Control Objectives for Information and Related Technology
      • Corporate Security Policies
  • 8. 5 Pillars of Security
      • Authentication
      • Authorization
      • Audit
      • Data at rest/in-motion Encryption
      • Centralized Administration
  • 9. Big Data Ecosystem (the platform diagram from slide 5, annotated with security components): 1 Knox, 2 Kerberos, 3 Ranger, 4 HDFS Encryption, 5 LDAP
  • 10. 5 Pillars of Security
      • Authentication ->
      • Authorization ->
      • Audit ->
      • Data Protection ->
      • Centralized Administration ->
  • 12. Why Knox?
      • Simplified Access
          – Kerberos encapsulation
          – Extends API reach
          – Single access point
          – Multi-cluster support
          – Single SSL certificate
      • Centralized Control
          – Central REST API auditing
          – Service-level authorization
          – Alternative to SSH “edge node”
      • Enterprise Integration
          – LDAP integration
          – Active Directory integration
          – SSO integration
          – Apache Shiro extensibility
          – Custom extensibility
      • Enhanced Security
          – Protect network details
          – Partial SSL for non-SSL services
          – WebApp vulnerability filter
  • 13. Knox Deployment with Hadoop Cluster (diagram). Labels: Hadoop CLIs; Application Tier; Web Tier; LB; Knox; DMZ; switches; Rack 1 master nodes (NN, SNN); Rack 2 … Rack N slave nodes (DN)
  • 14. What does Perimeter Security really mean?
      • Firewall required at perimeter (today)
      • Knox Gateway controls all Hadoop REST API access through firewall
      • Hadoop cluster mostly unaffected
      • Firewall only allows connections through specific ports from Knox host
      • Diagram labels: User, Firewall, Gateway, REST API, Hadoop Services (Hive host, HBase hosts, WebHDFS)
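
As a rough sketch of what "all REST access through the gateway" looks like from a client, the function below builds a Knox-proxied WebHDFS URL. The function name `knox_webhdfs_url`, the host name, port 8443, the `gateway` path and the `default` topology are assumptions (common Knox defaults, not values taken from the slides). The commented `curl` line shows how such a URL would typically be used.

```shell
# Build the URL a client calls when WebHDFS is reached through Knox
# rather than directly on the NameNode.
# Assumed layout: https://<knox-host>:<port>/gateway/<topology>/webhdfs/v1<path>?op=<OP>
knox_webhdfs_url() {
  local host="$1" port="$2" topology="$3" path="$4" op="$5"
  printf 'https://%s:%s/gateway/%s/webhdfs/v1%s?op=%s\n' \
    "$host" "$port" "$topology" "$path" "$op"
}

# Example (hypothetical host and user; needs a reachable Knox instance):
#   curl -u alice "$(knox_webhdfs_url knox.example.com 8443 default /tmp LISTSTATUS)"
```

Only the Knox host and port need to be opened on the firewall in this arrangement; the client never learns the NameNode address.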
  • 16. Why Kerberos?
      • Provides strong authentication
      • Establishes identity for users, services and hosts
      • Prevents unauthorized impersonation of accounts
      • Supports token delegation model
      • Works with existing directory services
      • Basis for authorization
  • 17. Don’t be afraid of Kerberos…
  • 18. Security Implications
      $ whoami
      baduser
      $ hadoop fs -ls /tmp
      Found 2 items
      drwx-wx-wx   - ambari-qa hdfs          0 2015-07-14 18:38 /tmp/hive
      drwx------   - hdfs      hdfs          0 2015-07-14 20:33 /tmp/secure
      $ hadoop fs -ls /tmp/secure
      ls: Permission denied: user=baduser, access=READ_EXECUTE, inode="/tmp/secure":hdfs:hdfs:drwx------
      Good right?
  • 19. Security Implications
      $ whoami
      baduser
      $ hadoop fs -ls /tmp
      Found 2 items
      drwx-wx-wx   - ambari-qa hdfs          0 2015-07-14 18:38 /tmp/hive
      drwx------   - hdfs      hdfs          0 2015-07-14 20:33 /tmp/secure
      $ hadoop fs -ls /tmp/secure
      ls: Permission denied: user=baduser, access=READ_EXECUTE, inode="/tmp/secure":hdfs:hdfs:drwx------
      Good right? – Look Again!!!
      $ HADOOP_USER_NAME=hdfs hadoop fs -ls /tmp/secure
      Found 1 items
      drwxr-xr-x   - hdfs hdfs          0 2015-07-14 20:35 /tmp/secure/blah
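
The `HADOOP_USER_NAME` trick on this slide works when the cluster runs with `hadoop.security.authentication` set to `simple`, because the client-supplied user name is then trusted as-is. Below is a minimal sketch of a check for that mode. The function name `check_auth_mode` and its messages are illustrative and not part of Hadoop; the `hdfs getconf` call in the comment is the stock configuration lookup.

```shell
# Classify a cluster's authentication mode.
# "simple" trusts whatever user name the client sends (including HADOOP_USER_NAME);
# "kerberos" requires a valid ticket.
check_auth_mode() {
  case "$1" in
    kerberos) echo "kerberos: identity is verified" ;;
    simple)   echo "simple: identity can be spoofed"; return 1 ;;
    *)        echo "unknown mode: $1"; return 2 ;;
  esac
}

# On a cluster node (requires the hdfs CLI):
#   check_auth_mode "$(hdfs getconf -confKey hadoop.security.authentication)"
```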
  • 20. Kerberos Primer (diagram: Client, Client’s Kerberos Ticket Cache, KDC, NN, DN)
      1. kinit - Login and get Ticket Granting Ticket (TGT)
      2. Client stores TGT in Ticket Cache
      3. Get NameNode Service Ticket (NN-ST)
      4. Client stores NN-ST in Ticket Cache
      5. Read/write file given NN-ST and file name; returns block locations, block IDs and Block Access Tokens if access permitted
      6. Read/write block given Block Access Token and block ID
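
From the user's shell, the six steps above usually collapse into a `kinit` followed by an ordinary Hadoop command. The sketch below assumes a hypothetical principal and path; the `run` helper and the `DRY_RUN` variable are conveniences of this example so the sequence can be printed without a KDC or cluster.

```shell
# run CMD...: execute the command, or only print it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi; }

# Log in, inspect the ticket cache, then access HDFS.
kerberized_ls() {
  local principal="$1" path="$2"
  run kinit "$principal" || return 1  # steps 1-2: obtain a TGT into the ticket cache
  run klist                           # show the cached TGT
  run hadoop fs -ls "$path"           # steps 3-6: the Hadoop client handles the service ticket and block tokens
}
```

Calling `DRY_RUN=1 kerberized_ls alice@EXAMPLE.COM /tmp/secure` prints the three commands instead of running them.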
  • 22. Apache Ranger authZ Architecture (diagram)
      • Plugins embedded in: HDFS, Hive, HBase, YARN, Knox, Storm, Solr, Kafka
      • Ranger components: Administration Portal, Policy Server, Audit Server, REST APIs, KMS, LDAP/AD user/group sync
      • Other labels: DB, SOLR, HDFS, Log4j
  • 23. Sample Simplified Workflow - HDFS (diagram: Policy Manager, Plugin, Name Node, User Application, Audit Database)
      1. Admin sets policies for HDFS files/folder
      2. Users access HDFS: a data scientist runs a MapReduce job; users access HDFS data through an application; IT users access HDFS through the CLI
      3. NameNode uses the plugin for authorization
      4. Audit logs pushed to DB
      5. NameNode provides resource access to user/client
  • 24. Ranger Stacks
      • Apache Ranger v0.5 supports a stack model that makes it easier to onboard new components without code changes in Apache Ranger.
      • Ranger side changes: define the service type
          – Create a JSON file with the following details: resources, access types, config to connect
          – Load the JSON into Ranger
      • Secured component side changes: develop a Ranger authorization plugin
          – Include the plugin library in the secured component
          – During initialization of the service: init RangerBasePlugin and the RangerDefaultAuditHandler class
          – To authorize access to a resource: use RangerBasePlugin.isAccessAllowed() with a RangerAccessRequest
          – To support resource lookup: implement RangerBaseService.lookupResource() and RangerBaseService.validateConfig()
      • https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741207
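
To make the "create a JSON file" step more concrete, here is a skeleton service definition for a hypothetical component called `myservice`. The field names follow the service-definition files that ship with Ranger; every value, including the `implClass` name, is made up for illustration. The REST endpoint and port in the closing comment are assumptions to verify against your Ranger version.

```shell
# Print a skeleton Ranger service definition covering the three parts the slide
# lists: resources, access types, and connection config.
myservice_servicedef() {
  cat <<'EOF'
{
  "name": "myservice",
  "implClass": "com.example.ranger.RangerServiceMyService",
  "label": "My Service",
  "resources": [
    { "itemId": 1, "name": "path", "type": "path", "level": 10,
      "mandatory": true, "lookupSupported": true, "recursiveSupported": true,
      "matcher": "org.apache.ranger.plugin.resourcematcher.RangerPathResourceMatcher",
      "label": "Resource Path" }
  ],
  "accessTypes": [
    { "itemId": 1, "name": "read",  "label": "Read" },
    { "itemId": 2, "name": "write", "label": "Write" }
  ],
  "configs": [
    { "itemId": 1, "name": "username", "type": "string",   "mandatory": true, "label": "Username" },
    { "itemId": 2, "name": "password", "type": "password", "mandatory": true, "label": "Password" }
  ]
}
EOF
}

# Loading it into Ranger (hypothetical host; endpoint and port assumed):
#   myservice_servicedef | curl -u admin -H 'Content-Type: application/json' \
#     -d @- http://ranger.example.com:6080/service/public/v2/api/servicedef
```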
  • 26. Data Protection: Hadoop allows you to apply data protection policy at two different layers across the Hadoop stack
      • Storage: encrypt data on disk
          – Volume level: LUKS (Linux), BitLocker (Windows)
          – Native in Hadoop: HDFS Encryption
          – Partners: Voltage, Protegrity, DataGuise, Vormetric
          – OS level encrypt
      • Transmission: encrypt data as it moves
          – Native in Hadoop: SSL & SASL
          – AES 256 for SSL & DTP with SASL
  • 27. Data at rest Encryption Protection
      • Volume Level Encryption (Open Source - LUKS, dm-crypt)
      • OS File Level Encryption (Open Source - eCryptfs)
      • Hadoop Level Encryption (HDFS TDE*, Hive CLE**, HBase**)
  • 28. HDFS Encryption – How it works (diagram)
      • Components: HDFS Client, Name Node, Key Management System (KMS, HADOOP-10433), KeyProvider API (HADOOP-10141), Crypto Stream (r/w with DEK)
      • Encryption Zone (attributes: EZKey ID, version), HDFS-6134
      • Encrypted File (attributes: EDEK, IV)
      • Other labels: DATA ACCESS, DATA MANAGEMENT, SECURITY, YARN, HDFS, EDEK, DEK, DEKs, EZKs
      • Acronyms
          – EZ: Encryption Zone (an HDFS directory)
          – EZK: Encryption Zone Key; master key associated with all files in an EZ
          – DEK: Data Encryption Key; unique key associated with each file. The EZK is used to encrypt the DEK into the EDEK
          – EDEK: Encrypted DEK; the Name Node only has access to the encrypted DEK
          – IV: Initialization Vector
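
The EZK/DEK/EDEK relationship is a form of envelope encryption. The toy below imitates it with the `openssl` command line (assumed to be installed); it is not how HDFS or the KMS is implemented. The function name `envelope_demo` is hypothetical, and for simplicity the DEK is wrapped as hex text rather than raw bytes.

```shell
# envelope_demo PLAINTEXT: encrypt with a per-file DEK, wrap the DEK with the
# zone key (EZK), then recover the plaintext the way an authorized reader would.
envelope_demo() {
  local work ezk dek iv_key iv_file
  work="$(mktemp -d)" || return 1
  ezk="$(openssl rand -hex 16)"      # EZK: would live only in the KMS
  dek="$(openssl rand -hex 16)"      # DEK: fresh key for this one file
  iv_key="$(openssl rand -hex 16)"   # IV used when wrapping the DEK
  iv_file="$(openssl rand -hex 16)"  # IV used for the file contents

  # Write path: encrypt the data with the DEK, then wrap the DEK with the EZK.
  printf '%s' "$1"   | openssl enc -aes-128-ctr -K "$dek" -iv "$iv_file" -out "$work/file.enc"
  printf '%s' "$dek" | openssl enc -aes-128-ctr -K "$ezk" -iv "$iv_key"  -out "$work/edek"
  dek=""                             # only the EDEK is kept alongside the file

  # Read path: unwrap the EDEK with the EZK, then decrypt the data with the DEK.
  dek="$(openssl enc -d -aes-128-ctr -K "$ezk" -iv "$iv_key" -in "$work/edek")"
  openssl enc -d -aes-128-ctr -K "$dek" -iv "$iv_file" -in "$work/file.enc"
  rm -rf "$work"
}
```

The point of the split is that whoever stores the file and the EDEK (the Name Node and Data Nodes in HDFS) cannot read the data without also being allowed to use the EZK.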
  • 29. HDFS Encryption – Common Commands
      • As HDFS Admin
          – Run KMS Server: ./kms.sh run
          – Create Encryption Key: hadoop key create key1 -size 128
            # Key size can be 128, 192 or 256. 256 requires unlimited strength JCE file.
          – List all Encryption Keys: hadoop key list -metadata
          – As an Admin (hdfs user) create an Encryption Zone: hdfs crypto -createZone -keyName key1 -path /secure1
            Point to an existing & empty directory
          – List all Encryption Zones: hdfs crypto -listZones
      • As HDFS End-user (run this as a user not in the HDFS admin role)
          – Read/Write to HDFS unchanged:
            hdfs dfs -copyFromLocal /tmp/vinay.txt /secure1
            hdfs dfs -cat /securehive/sal.txt
  • 30. Encrypting Data In-Motion (protocol: communication point; encryption mechanism)
      • REST: WebHDFS (client to cluster), client to Knox
          – REST over SSL
          – Knox Gateway SSL
          – SPNEGO: provides a mechanism for extending Kerberos to Web applications through the standard HTTP protocol
      • HTTP: NameNode/JobTracker UI, MapReduce Shuffle
          – HTTPS
          – Encrypted MapReduce Shuffle (MAPREDUCE-4117)
      • RPC: Hadoop Client (client to cluster, intra-cluster)
          – SASL: the Hadoop RPC system implements SASL, which provides different QoP including encryption
      • JDBC/ODBC: HiveServer2
          – SSL
      • TCP/IP: Data Transfer (client to cluster, intra-cluster)
          – Encrypted DataTransferProtocol available in Hadoop
          – Adding SASL support to the DataTransferProtocol
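
Several rows of this table correspond to ordinary Hadoop configuration properties. The sketch below lists three of them with the values that turn encryption on, plus a small comparison helper. The property names are standard Hadoop settings, but pairing them with the RPC, data transfer and HTTP rows is this example's reading of the slide, and the function names are hypothetical.

```shell
# Property=value pairs that enable wire encryption for RPC, block data transfer
# and the web UIs, respectively.
expected_wire_settings() {
  cat <<'EOF'
hadoop.rpc.protection=privacy
dfs.encrypt.data.transfer=true
dfs.http.policy=HTTPS_ONLY
EOF
}

# compare_setting KEY EXPECTED ACTUAL: report whether a setting matches.
compare_setting() {
  if [ "$2" = "$3" ]; then
    echo "OK   $1=$3"
  else
    echo "WARN $1=$3 (expected $2)"
    return 1
  fi
}

# On a cluster node (requires the hdfs CLI):
#   expected_wire_settings | while IFS='=' read -r key want; do
#     compare_setting "$key" "$want" "$(hdfs getconf -confKey "$key")"
#   done
```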