SlideShare una empresa de Scribd logo
1 de 19
1
Hadoop Operations: How to Secure
and Control Cluster Access
Eric Sammer
Engineering Manager, Cloudera – Author, Hadoop Operations
2
We’re here to talk about…
•How common security constructs map onto services
•How these constructs work in Hadoop
•Security model and options for a few critical
components
•A few DOs and DON’Ts
3
Warning
•Security in distributed systems is complicated
•This is just a whirlwind tour – Do your homework
•Assumptions
• You’re familiar with Hadoop’s architecture and functionality
• You have a basic understanding of Kerberos
4
The Three Questions
•Identity: Who are you?
•Authentication: Can you prove it?
•Authorization: Are you allowed to do that?
5
Hadoop’s “Simple” Mode
•Identity: Usually the OS user of the client application
•Authentication: Trust
•Easy to impersonate other users
•Stop good users from doing silly things
•The default
6
Hadoop’s “Simple” Mode
•Use simple mode when:
• No regulatory or compliance concerns
• All users are trusted
• Single purpose cluster (single-tenancy)
7
Hadoop’s “Secure” Mode
•Identity: Local part of the Kerberos principal
•Authentication: Kerberos
•User impersonation not possible except in specific
(admin-configured) situations
8
Hadoop’s “Secure” Mode
•Use secure mode when:
• Real regulatory concerns
• Untrusted users
• Running on untrusted infrastructure or in an untrusted
environment
• Multi-purpose cluster (multi-tenancy)
9
Identity Management
•Always
• Use a central user database/directory service for OS users
• Wire up the Kerberos KDC to use the central directory
•Never
• Use service users (e.g. hdfs, mapred) for anything other than
running services
• Share accounts, even for admin purposes
10
Authentication
•Simple mode: Trust what the client provides
•Secure mode: Kerberos
• Keytabs for services
• Many options: Passphrase, M/TFA, X.509 for users
• Depends on Kerberos implementation
11
Authorization
•Inherently service specific
•Granularity of control varies by platform component
•Examples
• Filesystem object-level, POSIX-style
• Role-based access control (RBAC)
• Access control lists (ACLs)
• Deferral to underlying components
12
HDFS Security Model
•POSIX-style users and groups
•Traditional Unix-style octal permissions
• Files: no execute, sticky, setuid, setgid
• Directories: no setuid, always behave as if setgid is set
•Authorization checks performed by NameNode
13
HDFS User Levels
User Level Privileges Description and Notes
Cluster super user All User who started the daemons. Default: hdfs
Administrators All
Configuration property dfs.permissions.supergroup
specifies the name of the group of admins. Default:
supergroup
Normal user Object-level
All other users are beholden to the file and directory
permissions, as specified.
14
MapReduce Security Model
•Configurable job queues
•Queues have associated ACLs
•ACLs control job submission and administrative ops
•Authorization checks performed by JobTracker
15
MapReduce User Levels
User Level Privileges Queue Description and Notes
Cluster super
user
All All
User who started the daemons. Default:
mapred
Cluster admins All All
Configuration property
mapred.cluster.administrators specifies the
admin ACL.
Queue admins All Single
Configuration property
mapred.queue.queue-name.acl-administer-
jobs specifies the admin ACL.
Job owner
Submit,
Admin on
own jobs
Queue
containing
job
Configuration property
mapred.queue.queue-name.acl-submit-job
specifies the submission ACL.
16
Systems on top of MapReduce
•Hive/Impala are the most featureful today
• Without Sentry: Defers to HDFS object permissions
• With Sentry, fine-grained RBAC on logical constructs (New!)
• Scope: Server, database, table, view
• Privileges: ALL, SELECT, INSERT, TRANSFORM
• Removes direct access to files
• Supports traditional techniques for controlling column-level access
(i.e. views without sensitive columns)
•Everything else: HDFS object permissions
17
A note on auditing...
•Winds up being service-specific
•Cloudera Navigator handles this (and more)
18
What we didn’t talk about
•Configuration and deployment
• Lots of options, lots of moving parts
• Integration with existing infrastructure
• Cloudera Manager turns days or weeks of work into minutes
or hours; built to handle exactly these challenges
•The other 80%: YARN applications, ZooKeeper, Flume,
Sqoop, Oozie, Hue, Cloudera Search (Solr), multi-tenant
gateway services, all of the administrative web
interfaces, encryption of data at rest and on the wire,
network footprint and exposure, ...
19
Further reading and references
•Hadoop Operations
Chapter 6: Identity, Authentication, and
Authorization (E. Sammer, O’Reilly)
•Kerberos: The Definitive Guide
(J. Garman, O’Reilly)
•CDH4 Security Guide
•CDH4 Sentry Guide
•Cloudera Manager
•Cloudera Navigator
Submit questions in the Q&A panel
Watch on-demand video of this webinar and
many more at http://cloudera.com
Follow Eric @esammer
Follow Cloudera @ClouderaU
Learn more at Strata + Hadoop World:
http://tinyurl.com/hadoopworld
Thank you for attending!

Más contenido relacionado

La actualidad más candente

Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowDataWorks Summit
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureUwe Printz
 
Deploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for HadoopDeploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for HadoopCloudera, Inc.
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayDataWorks Summit
 
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Abhiraj Butala
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesBolke de Bruin
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop securitybigdatagurus_meetup
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014Cloudera, Inc.
 
Hadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revHadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revJason Shih
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
Hadoop operations
Hadoop operationsHadoop operations
Hadoop operationsMarc Cluet
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataRommel Garcia
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010Cloudera, Inc.
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview Hortonworks
 
Ranger admin dev overview
Ranger admin dev overviewRanger admin dev overview
Ranger admin dev overviewTushar Dudhatra
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - RangerIsheeta Sanghi
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoopWei-Chiu Chuang
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Rangertrihug
 

La actualidad más candente (20)

Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
Deploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for HadoopDeploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for Hadoop
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenches
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop security
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014
 
Hadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revHadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117rev
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Hadoop operations
Hadoop operationsHadoop operations
Hadoop operations
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
 
Ranger admin dev overview
Ranger admin dev overviewRanger admin dev overview
Ranger admin dev overview
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoop
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Ranger
 

Destacado

Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Tsuyoshi OZAWA
 
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?DataWorks Summit
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopCloudera, Inc.
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateSteve Loughran
 
12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools12 SQL On-Hadoop Tools
12 SQL On-Hadoop ToolsXplenty
 
Final version sql over hadoop ver1
Final version sql over hadoop ver1Final version sql over hadoop ver1
Final version sql over hadoop ver1Sudheesh Narayanan
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionSteve Loughran
 
Introduction to Apache HBase Training
Introduction to Apache HBase TrainingIntroduction to Apache HBase Training
Introduction to Apache HBase TrainingCloudera, Inc.
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop ClusterEdureka!
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityChris Nauroth
 
Design for a Distributed Name Node
Design for a Distributed Name NodeDesign for a Distributed Name Node
Design for a Distributed Name NodeAaron Cordova
 
Introduction to sentry
Introduction to sentryIntroduction to sentry
Introduction to sentrymozillazg
 
Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)Cloudera, Inc.
 
DCAT-AP exchanging metadata
DCAT-AP exchanging metadataDCAT-AP exchanging metadata
DCAT-AP exchanging metadataBart Hanssens
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosEdureka!
 
ckan 2.0: Harvesting from other sources
ckan 2.0: Harvesting from other sourcesckan 2.0: Harvesting from other sources
ckan 2.0: Harvesting from other sourcesChengjen Lee
 

Destacado (20)

Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014
 
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache Hadoop
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
 
12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools
 
Final version sql over hadoop ver1
Final version sql over hadoop ver1Final version sql over hadoop ver1
Final version sql over hadoop ver1
 
Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
Introduction to Apache HBase Training
Introduction to Apache HBase TrainingIntroduction to Apache HBase Training
Introduction to Apache HBase Training
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop Cluster
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Design for a Distributed Name Node
Design for a Distributed Name NodeDesign for a Distributed Name Node
Design for a Distributed Name Node
 
Introduction to sentry
Introduction to sentryIntroduction to sentry
Introduction to sentry
 
Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
DCAT-AP exchanging metadata
DCAT-AP exchanging metadataDCAT-AP exchanging metadata
DCAT-AP exchanging metadata
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With Kerberos
 
DCAT: a tale of exchanging metadata
DCAT: a tale of exchanging metadataDCAT: a tale of exchanging metadata
DCAT: a tale of exchanging metadata
 
ckan 2.0: Harvesting from other sources
ckan 2.0: Harvesting from other sourcesckan 2.0: Harvesting from other sources
ckan 2.0: Harvesting from other sources
 
Hadoop admin
Hadoop adminHadoop admin
Hadoop admin
 

Similar a Hadoop Operations: How to Secure and Control Cluster Access

Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access SecurityCloudera, Inc.
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextHellmar Becker
 
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Cloudera, Inc.
 
Project Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for HadoopProject Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for HadoopCloudera, Inc.
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCloudIDSummit
 
UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...
UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...
UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...Vincent Giersch
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at UberDataWorks Summit
 
Creating a fortress in your active directory environment
Creating a fortress in your active directory environmentCreating a fortress in your active directory environment
Creating a fortress in your active directory environmentDavid Rowe
 
Productionizing Hadoop - New Lessons Learned
Productionizing Hadoop - New Lessons LearnedProductionizing Hadoop - New Lessons Learned
Productionizing Hadoop - New Lessons LearnedCloudera, Inc.
 
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaBig Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaCaserta
 
BSides SG Practical Red Teaming Workshop
BSides SG Practical Red Teaming WorkshopBSides SG Practical Red Teaming Workshop
BSides SG Practical Red Teaming WorkshopAjay Choudhary
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedCloudera, Inc.
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetricconfluent
 
Secret Management with Hashicorp Vault and Consul on Kubernetes
Secret Management with Hashicorp Vault and Consul on KubernetesSecret Management with Hashicorp Vault and Consul on Kubernetes
Secret Management with Hashicorp Vault and Consul on KubernetesAn Nguyen
 
Overview of HDFS Transparent Encryption
Overview of HDFS Transparent Encryption Overview of HDFS Transparent Encryption
Overview of HDFS Transparent Encryption Cloudera, Inc.
 
Intel boubker el mouttahid
Intel boubker el mouttahidIntel boubker el mouttahid
Intel boubker el mouttahidBigDataExpo
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全Jianwei Li
 
Multi Layer Monitoring V1
Multi Layer Monitoring V1Multi Layer Monitoring V1
Multi Layer Monitoring V1Lahav Savir
 

Similar a Hadoop Operations: How to Secure and Control Cluster Access (20)

Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
 
Project Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for HadoopProject Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for Hadoop
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
 
UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...
UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...
UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at Uber
 
Creating a fortress in your active directory environment
Creating a fortress in your active directory environmentCreating a fortress in your active directory environment
Creating a fortress in your active directory environment
 
Productionizing Hadoop - New Lessons Learned
Productionizing Hadoop - New Lessons LearnedProductionizing Hadoop - New Lessons Learned
Productionizing Hadoop - New Lessons Learned
 
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaBig Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
 
BSides SG Practical Red Teaming Workshop
BSides SG Practical Red Teaming WorkshopBSides SG Practical Red Teaming Workshop
BSides SG Practical Red Teaming Workshop
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and Governed
 
Red Hart Linux
Red Hart LinuxRed Hart Linux
Red Hart Linux
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
 
Secret Management with Hashicorp Vault and Consul on Kubernetes
Secret Management with Hashicorp Vault and Consul on KubernetesSecret Management with Hashicorp Vault and Consul on Kubernetes
Secret Management with Hashicorp Vault and Consul on Kubernetes
 
Overview of HDFS Transparent Encryption
Overview of HDFS Transparent Encryption Overview of HDFS Transparent Encryption
Overview of HDFS Transparent Encryption
 
Big data security
Big data securityBig data security
Big data security
 
Intel boubker el mouttahid
Intel boubker el mouttahidIntel boubker el mouttahid
Intel boubker el mouttahid
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全
 
Multi Layer Monitoring V1
Multi Layer Monitoring V1Multi Layer Monitoring V1
Multi Layer Monitoring V1
 

Más de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Más de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Último (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Hadoop Operations: How to Secure and Control Cluster Access

  • 1. 1 Hadoop Operations: How to Secure and Control Cluster Access Eric Sammer Engineering Manager, Cloudera – Author, Hadoop Operations
  • 2. 2 We’re here to talk about… •How common security constructs map onto services •How these constructs work in Hadoop •Security model and options for a few critical components •A few DOs and DON’Ts
  • 3. 3 Warning •Security in distributed systems is complicated •This is just a whirlwind tour – Do your homework •Assumptions • You’re familiar with Hadoop’s architecture and functionality • You have a basic understanding of Kerberos
  • 4. 4 The Three Questions •Identity: Who are you? •Authentication: Can you prove it? •Authorization: Are you allowed to do that?
  • 5. 5 Hadoop’s “Simple” Mode •Identity: Usually the OS user of the client application •Authentication: Trust •Easy to impersonate other users •Stop good users from doing silly things •The default
  • 6. 6 Hadoop’s “Simple” Mode •Use simple mode when: • No regulatory or compliance concerns • All users are trusted • Single purpose cluster (single-tenancy)
  • 7. 7 Hadoop’s “Secure” Mode •Identity: Local part of the Kerberos principal •Authentication: Kerberos •User impersonation not possible except in specific (admin-configured) situations
  • 8. 8 Hadoop’s “Secure” Mode •Use secure mode when: • Real regulatory concerns • Untrusted users • Running on untrusted infrastructure or in an untrusted environment • Multi-purpose cluster (multi-tenancy)
  • 9. 9 Identity Management •Always • Use a central user database/directory service for OS users • Wire up the Kerberos KDC to use the central directory •Never • Use service users (e.g. hdfs, mapred) for anything other than running services • Share accounts, even for admin purposes
  • 10. 10 Authentication •Simple mode: Trust what the client provides •Secure mode: Kerberos • Keytabs for services • Many options: Passphrase, M/TFA, X.509 for users • Depends on Kerberos implementation
  • 11. 11 Authorization •Inherently service specific •Granularity of control varies by platform component •Examples • Filesystem object-level, POSIX-style • Role-based access control (RBAC) • Access control lists (ACLs) • Deferral to underlying components
  • 12. 12 HDFS Security Model •POSIX-style users and groups •Traditional Unix-style octal permissions • Files: no execute, sticky, setuid, setgid • Directories: no setuid, always behave as if setgid is set •Authorization checks performed by NameNode
  • 13. 13 HDFS User Levels User Level Privileges Description and Notes Cluster super user All User who started the daemons. Default: hdfs Administrators All Configuration property dfs.permissions.supergroup specifies the name of the group of admins. Default: supergroup Normal user Object-level All other users are beholden to the file and directory permissions, as specified.
  • 14. 14 MapReduce Security Model •Configurable job queues •Queues have associated ACLs •ACLs control job submission and administrative ops •Authorization checks performed by JobTracker
  • 15. 15 MapReduce User Levels User Level Privileges Queue Description and Notes Cluster super user All All User who started the daemons. Default: mapred Cluster admins All All Configuration property mapred.cluster.administrators specifies the admin ACL. Queue admins All Single Configuration property mapred.queue.queue-name.acl-administer- jobs specifies the admin ACL. Job owner Submit, Admin on own jobs Queue containing job Configuration property mapred.queue.queue-name.acl-submit-job specifies the submission ACL.
  • 16. 16 Systems on top of MapReduce •Hive/Impala are the most featureful today • Without Sentry: Defers to HDFS object permissions • With Sentry, fine-grained RBAC on logical constructs (New!) • Scope: Server, database, table, view • Privileges: ALL, SELECT, INSERT, TRANSFORM • Removes direct access to files • Supports traditional techniques for controlling column-level access (i.e. views without sensitive columns) •Everything else: HDFS object permissions
  • 17. 17 A note on auditing... •Winds up being service-specific •Cloudera Navigator handles this (and more)
  • 18. 18 What we didn’t talk about •Configuration and deployment • Lots of options, lots of moving parts • Integration with existing infrastructure • Cloudera Manager turns days or weeks of work into minutes or hours; built to handle exactly these challenges •The other 80%: YARN applications, ZooKeeper, Flume, Sqoop, Oozie, Hue, Cloudera Search (Solr), multi-tenant gateway services, all of the administrative web interfaces, encryption of data at rest and on the wire, network footprint and exposure, ...
  • 19. 19 Further reading and references •Hadoop Operations Chapter 6: Identity, Authentication, and Authorization (E. Sammer, O’Reilly) •Kerberos: The Definitive Guide (J. Garman, O’Reilly) •CDH4 Security Guide •CDH4 Sentry Guide •Cloudera Manager •Cloudera Navigator Submit questions in the Q&A panel Watch on-demand video of this webinar and many more at http://cloudera.com Follow Eric @esammer Follow Cloudera @ClouderaU Learn more at Strata + Hadoop World: http://tinyurl.com/hadoopworld Thank you for attending!