Sentry: Open Source Authorization for Hive & Impala
Alexander Alten-Lorenz | Senior Field Engineer, Cloudera

Wednesday, 7th November 2013
Defining Security Functions

• Perimeter: guarding access to the cluster itself. Technical concepts: authentication, network isolation.
• Data: protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, data masking.
• Access: defining what users and applications can do with data. Technical concepts: permissions, authorization.
• Visibility: reporting on where data came from and how it's being used. Technical concepts: auditing, lineage.

Enabling Enterprise Security

• Perimeter: guarding access to the cluster itself. Technical concepts: authentication, network isolation. Provided by: Kerberos | Oozie | Knox.
• Data: protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, data masking. Provided by: Certified Partners.
• Access: defining what users and applications can do with data. Technical concepts: permissions, authorization. Provided by: Sentry (available 7/23).
• Visibility: reporting on where data came from and how it's being used. Technical concepts: auditing, lineage. Provided by: Cloudera Navigator.

Hive Overview

SQL Access to Hadoop
• MapReduce: great massively scalable batch processing framework; required development for each new job
• Hive opened up Hadoop for more users with standard SQL

Key Challenges
• Batch MapReduce too slow for interactive BI/analytics
• No concurrency, no security

Options Today
• Impala designed for low-latency queries
• HiveServer2 delivers concurrency, authentication

Our Open Source Activity

CDH 4.1 (HiveServer2)
• Concurrency and Kerberos authentication for Hive
• JDBC and Beeline clients

CDH 4.2
• HDFS impersonation authorization as stop-gap
• Pluggable authentication API
• JDBC LDAP username/password

ODBC
• Supports Kerberos authentication and LDAP
• Extended partner certification

Current State of Authorization

Two Sub-Optimal Choices for SQL on Hadoop

Insecure Advisory Authorization
• Users can grant themselves permissions
• Intended to prevent accidental deletion of data
• Problem: Doesn't guard against malicious users

HDFS Impersonation
• Data is protected at the file level by HDFS permissions
• Problem: File-level not granular enough
• Problem: Not role-based

Authorization Requirements

Secure Authorization
• Ability to control access to data and/or privileges on data for authenticated users

Fine-Grained Authorization
• Ability to give users access to a subset of data (e.g. column) in a database

Role-Based Authorization
• Ability to create/apply templatized privileges based on functional roles

Multi-Tenant Administration
• Ability for central admin group to empower lower-level admins to manage security for each database/schema

The Next Step: Introducing Sentry

Authorization module for Hive & Impala

Unlocks Key RBAC Requirements
• Secure, fine-grained, role-based authorization
• Multi-tenant administration

Open Source
• Intent to donate to ASF

Available and Fully Supported
• HiveServer2 & Impala 1.1 initially

Key Benefits of Sentry

• Store Sensitive Data in Hadoop
• Extend Hadoop to More Users
• Enable New Use Cases
• Enable Multi-User Applications
• Comply with Regulations

Key Capabilities of Sentry

Fine-Grained Authorization
• Specify security for SERVERS, DATABASES, TABLES & VIEWS

Role-Based Authorization
• SELECT privilege on views & tables
• INSERT privilege on tables
• TRANSFORM privilege on servers
• ALL privilege on the server, databases, tables & views
• ALL privilege is needed to create/modify schema

Multi-Tenant Administration
• Separate policies for each database/schema
• Can be maintained by separate admins

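The capability list implies a hierarchy: a privilege granted at a broader scope (server, database) covers the objects beneath it, and ALL covers the individual actions. A minimal Python sketch of that idea follows, assuming privileges are written in the `server=...->db=...->table=...->action=...` form used in the example policy later in this deck. The helper names `parse_privilege` and `implies` are hypothetical and are not part of Sentry; URI privileges are not handled.

```python
# Illustrative model only; this is not Sentry's implementation.

def parse_privilege(text):
    """Split 'server=s1->db=d1->action=select' into (scope_parts, action)."""
    parts = [tuple(p.split("=", 1)) for p in text.split("->")]
    action = "all"  # assumption: no explicit action means ALL at that scope
    if parts and parts[-1][0] == "action":
        action = parts.pop()[1].lower()
    return parts, action


def implies(granted, requested):
    """Return True if the granted privilege covers the requested one."""
    g_scope, g_action = parse_privilege(granted)
    r_scope, r_action = parse_privilege(requested)
    if len(g_scope) > len(r_scope):
        return False  # the grant is narrower than the request
    for (g_key, g_val), (r_key, r_val) in zip(g_scope, r_scope):
        if g_key != r_key or (g_val != "*" and g_val != r_val):
            return False
    return g_action in ("all", "*") or g_action == r_action


if __name__ == "__main__":
    print(implies("server=server1->db=customers",
                  "server=server1->db=customers->table=orders->action=select"))
```

In this sketch a database-level grant answers requests for any table in that database, while a `table=*->action=select` grant answers only SELECT requests.
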
Apache Ecosystem and Sentry

• Shared Hive Metastore (with HCatalog)
• Extensibility plug-in for HiveServer2
• Inline support in Impala 1.1
• Potential extension to Pig, MapReduce, REST

[Diagram: Sentry alongside the Hive Metastore and HCatalog; further integrations are marked "Possible future development".]

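Sentry Architecture

[Diagram: layered architecture. Impala and HiveServer2 sit on a Binding Layer (Impala, Hive, Future). Below it, the Authorization Provider contains a Policy Engine and a Policy Provider. Policies are stored in a File (Local FS/HDFS) or a Database. Layer annotations, top to bottom: Interface; Evaluation, Validation; Parsing; Interface.]
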
Query Execution Flow

SQL → Parse → Build → Check → Plan → MR

• Parse: validate SQL grammar
• Build: construct statement tree
• Check: validate statement objects
  • First check: authorization (Sentry)
  • Forward to execution planner
• Plan: query handed to MR

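The ordering in this flow matters: authorization runs during the check stage, before any plan is produced. The following Python sketch illustrates that ordering under heavy simplification. Every function name is hypothetical, the "parser" only splits on whitespace, and privileges are modelled as plain `(table, action)` pairs rather than Sentry's policy format.

```python
class AuthorizationError(Exception):
    """Raised when the check stage rejects a statement."""


def parse(sql):
    # Stand-in for grammar validation: tokenize and reject empty input.
    tokens = sql.strip().rstrip(";").split()
    if not tokens:
        raise ValueError("empty statement")
    return tokens


def build(tokens):
    # Stand-in for the statement tree: record the action and target table.
    lowered = [t.lower() for t in tokens]
    if lowered[0] == "select" and "from" in lowered:
        idx = lowered.index("from") + 1
        if idx < len(tokens):
            return {"action": "select", "table": tokens[idx]}
    if lowered[:2] == ["insert", "into"] and len(tokens) > 2:
        return {"action": "insert", "table": tokens[2]}
    raise ValueError("unsupported statement")


def check(statement, user_privileges):
    # Authorization is the first check; nothing is planned if it fails.
    needed = (statement["table"], statement["action"])
    if needed not in user_privileges:
        raise AuthorizationError(f"missing privilege: {needed}")
    return statement


def plan(statement):
    # Stand-in for the execution planner.
    return f"PLAN {statement['action']} on {statement['table']}"


def run(sql, user_privileges):
    return plan(check(build(parse(sql)), user_privileges))


if __name__ == "__main__":
    print(run("SELECT * FROM orders", {("orders", "select")}))
```

Because `check` raises before `plan` is called, an unauthorized statement never reaches the planner in this sketch.
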
Example Security Policy

[databases]
# Defines the location of the per-DB policy file for the
# 'customers' DB (schema)
customers = hdfs://ha-nn-uri/etc/access/customers.ini

[groups]
# Assigns Hadoop groups to their respective set of roles
manager = analyst_role, junior_analyst_role
analyst = analyst_role
jranalyst = junior_analyst_role
customers_admin = customers_admin_role
admin = admin_role

[roles]
# Roles that can import or export data to the URIs defined,
# i.e. a landing zone. Since the server runs as the user "hive",
# files in this directory must either have the "hive" group set
# with read/write or be set world read/write.
analyst_role = server=server1->db=analyst1, \
    server=server1->db=jranalyst1->table=*->action=select, \
    server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
junior_analyst_role = server=server1->db=jranalyst1, \
    server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1

# Role controls everything for the 'customers' DB on server1.
# Privileges for 'customers' can be defined in the global policy
# file even though 'customers' has its own policy file.
# Note that the privileges from both the global policy file and
# the per-DB policy file are merged. There is no overriding.
customers_admin_role = server=server1->db=customers

# Role controls everything on server1.
admin_role = server=server1

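The policy file maps Hadoop groups to roles in `[groups]` and roles to privileges in `[roles]`, so a group's effective privileges are the union of the privileges of its roles. Here is a rough Python sketch of that resolution. It uses Python's `configparser` as a stand-in for the real parser and joins backslash-continued lines by hand; `load_policy` and `privileges_for` are hypothetical names, and the embedded policy is a shortened excerpt of the example above.

```python
import configparser

POLICY = r"""
[groups]
manager = analyst_role, junior_analyst_role
jranalyst = junior_analyst_role

[roles]
analyst_role = server=server1->db=analyst1, \
    server=server1->db=jranalyst1->table=*->action=select
junior_analyst_role = server=server1->db=jranalyst1
"""


def _split(value):
    """Split a comma-separated value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_policy(text):
    """Return (groups, roles) dictionaries from policy text."""
    joined = text.replace("\\\n", "")  # join backslash-continued lines
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep names case-sensitive
    parser.read_string(joined)
    groups = {g: _split(v) for g, v in parser["groups"].items()}
    roles = {r: _split(v) for r, v in parser["roles"].items()}
    return groups, roles


def privileges_for(group, groups, roles):
    """Collect the privileges of every role assigned to a group."""
    return [p for role in groups.get(group, []) for p in roles.get(role, [])]


if __name__ == "__main__":
    groups, roles = load_policy(POLICY)
    for privilege in privileges_for("manager", groups, roles):
        print(privilege)
```

The sketch only merges privileges. It does not load the per-DB policy files referenced in `[databases]`, which the comments in the policy say are merged with the global file.
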
Live Demo & Giveaways

• Closes gap between HDFS and Metastore
• Easy to implement
• RFC 2307 compliant (Kerberos)
• Enables Multi-User Applications in one Hive WH
• Enables Multi-Tenancy per Row and Column

About

dev@sentry.incubator.apache.org
alexander@cloudera.com
@mapredit
mapredit.blogspot.com
Web: http://wiki.apache.org/incubator/SentryProposal