SlideShare una empresa de Scribd logo
1 de 30
Hadoop Security
Architecture
Owen O’Malley
oom@yahoo-inc.com

Hadoop
Outline
•
•
•
•
•
•

Problem Statement
Security Threats
Solutions to Threats
HDFS
MapReduce
Oozie

• Interfaces
• Performance
• Reliability and Availability
• Operations and Monitoring
Hadoop

2
Problem Statement
• The fundamental goal of adding Hadoop security is that Yahoo's data
stored in HDFS must be secure from unauthorized access.
Furthermore, it must do so without adding significant effort to operating
or using the Grid. Based on that goal, there are a few implications.
– All HDFS clients must be authenticated to ensure that the user is
who they claim to be. That implies that all map/reduce users,
including services such as Oozie, must be also authenticated and
that tasks must run with the privileges and identity of the submitting
user.
– Since Data Nodes and TaskTrackers are entrusted with user data
and credentials, they must authenticate themselves to ensure they
are running as part of the Grid and are not trojan horses.
– Kerberos will be the underlying authentication service so that users
can be authenticated using their system credentials.
Hadoop
Communication and Threats

User
Process

Job
Tracker

Name
Node

Task
Tracker

Oozie

Data
Node

Task

NFS

MObStor

ZooKeeper

Hadoop

4
Security Threats in Hadoop
•

User to Service Authentication
– No User Authentication on NameNode or JobTracker
•

Client code supplies user and group names

– No User Authorization on DataNode – Fixed in 0.21
•

Users can read/write any block

– No User Authorization on JobTracker
•
•

•

Users can modify or kill other user’s jobs
Users can modify the persistent state of JobTracker

Service to Service Authentication
– No Authentication of DataNodes and TaskTrackers
•

•

Users can start fake DataNodes and TaskTrackers

No Encryption on Wire or Disk

Hadoop
Solutions to Threats
• Add Kerberos-based authentication to NameNode
and JobTracker.
• Add delegation tokens to HDFS and support for
them in MapReduce.
• Determine user’s group membership on the
NameNode and JobTracker.
• Protect MapReduce system directory from users.
• Add authorization model to MapReduce so that
only submitting user can modify or kill job.
• Add Backyard authentication to Web UI’s.

Hadoop

6
Out of Scope for 0.20.100
• Protecting against root on slave nodes:
– Encryption of RPC messages
– Encryption of block transfer protocol
– Encryption of MapReduce transient files
– Encryption of HDFS block files
• Passing Kerberos tickets to MapReduce
tasks for third party Kerborized services.
Hadoop

7
HDFS Security
• Users authenticate via Kerberos
• MapReduce jobs can obtain delegation
tokens for later use.
• When clients are reading or writing an HDFS
file, the NameNode generates a block
access token that will be verified by the
DataNode.
Hadoop

8
HDFS Authentication
ke r

)
b(joe

Application

Name
Node

delg(jo
e)

MapReduce
Task

kerb(hdfs)
bloc

k to

ken

Data
Node

o
ck t
blo

k en

• Clients authenticate to NameNode via:
– Kerberos
– Delegation tokens
• Client demonstrates authorization to DataNode via block
access token
• DataNode authenticates to NameNode via Kerberos
Hadoop
What does this *really* look like?
• Need a Kerberos ticket to work
– kinit –l 7d oom@DS.CORP.YAHOO.COM
– hadoop fs –ls
– hadoop jar my.jar in-dir out-dir
• Works using ticket cache!

– Can display ticket cache with klist.
Hadoop

10
Kerberos Dataflows
Key
Distribution
Center

Get TGT

Request Service Ticket

Name Node
hdfs/host@YGRID

User
joe@DS.CORP

Connect with Service Ticket

Hadoop

11
Delegation Token
• Advantages over using Kerberos directly:
– Don’t trust JobTracker with credentials
– Avoid MapReduce task authorization flood
– Renewable by third party (ie. JobTracker)
– Revocable when job finishes
• tokenId = {owner prin, renewer prin, issueDate, maxDate}
• tokenAuthenticator = HMAC(masterKey, tokenId)
• Token = {tokenId, tokenAuthenticator}
Hadoop

12
Block Access Token
• Only NameNode knows the set of users allowed to
access a specific block, so the NameNodes gives
an authorized clients a block access token.
• Capabilities include read, write, copy, or replace.
• The NameNode and DataNodes share a
dynamically rolled secret key to secure the tokens.
• tokenId={expiration, keygen, owner, block, access}
• tokenAuthenticator = HMAC(blockKey, tokenId)
• token = {tokenId, tokenAuthenticator}
Hadoop

13
MapReduce Security
• Require Kerberos authentication from client.
• Secure the information about pending and running jobs
– Store the job configuration and input splits in HDFS
under ~user/.staging/$jobid
– Store the job’s location and secrets in private directory
• JobTracker creates a random job token. It it used for:
– Connecting to TaskTracker’s RPC
– Authorizing http get for shuffle
• HMAC(job token, URL) sent from reduce tasks to TaskTracker
Hadoop

14
MapReduce Authentication
Application

kerb(joe)

Job
Tracker

kerb(mapreduce)
Task
Tracker
job token

Task

HDFS
HDFS
HDFS
)
(joe
elg
d
other
credential

trus
t

Other
Service

NFS

•

Client authenticates to JobTracker via Kerberos

•

TaskTracker authenticates to JobTracker via Kerberos

•

Task authenticates to the TaskTrackers using the job token

•

Task authenticates to HDFS using a delegation token

•

NFS is not Kerberized.

Hadoop

15
MapReduce Task Security
• Users have separate task directories with
permissions set to 700.
• Distributed cache is now divided based on
the source’s visibility
– Global – shared with other users
– Private - protected from other users

Hadoop

16
Web UI
• MapReduce makes heavy use of Web UI for
displaying state of cluster and running jobs.
• HDFS also has a web browsing interface.
• Use Backyard to authenticate Web UI users
• Only allow submitting user of job to view
stdout and stderr of job’s tasks.
• HDFS web browser checks user’s
authorization.
Hadoop

17
Oozie

• Client authenticates to Oozie
– Custom auth for Yahoo!
• Oozie authenticates to HDFS and MapReduce as
“oozie” principal
• “oozie” is configured as a super-user for HDFS and
MapReduce and may act as other users.
Hadoop

18
Proxy Services Trust Model
• Requires trust that service (eg. Oozie) principal is
secure.
• Explored and rejected
– Having user headless principals stored on
Oozie machine “x/oozie” for user “x”
– Passing user headless principal keytab to Oozie
– Generalizing delegation token to have token
granting tokens.
Hadoop

19
Protocols
• RPC
– Change RPC to use SASL and either:
• Kerberos authentication (GSSAPI)
• Tokens (DIGEST-MD5)

– User’s Kerberos tickets obtained at login used
automatically.
– Changes RPC format
– Can easily add encryption later
• Block transfer protocol
– Block access tokens in data stream
Hadoop

20
Protocols
• HTTP
– User/Browser facing
• Yahoo – Custom Authentication
• External – SPNEGO or Kerberos login module

– Web Services
• HFTP – Hadoop File Transfer Protocol
• Others later
• SPNEGO or Delegation Token via RPC

– Shuffle
• Use HMAC of URL hashed with Job Token
Hadoop

21
Summary
• RPC
– Kerberos
• Application to NameNode, JobTracker
• DataNode to NameNode
• TaskTracker to JobTracker

– Digest-MD5
• MapReduce task to NameNode, TaskTracker

• Block Access Token
• Backyard
– User to Web UI
Hadoop

22
Task
Accessing as user
Accessing as user

Backyard

Browser, 2ndNN, fsck
Browser, 2ndNN, fsck

HFTP

HTTP

NN

DIGEST-MD5

RPC

Kerberos

User (initial access),
User (initial access),
2ndNN, Balancer
2ndNN, Balancer

HTTP-DIGEST w/ delegation token
HTTP-DIGEST w/ delegation token

Task
distcp accessing as user
distcp accessing as user

HFTP

User, DN, Balancer, Task
User, DN, Balancer, Task

DN

HTTP

access token

Task

Socket

Backyard

Kerberos

Browser

Forward delegation token
Forward delegation token

Task
distcp accessing as user
distcp accessing as user

Hadoop
Backyard

JT

HTTP

RPC

User
User

Kerberos

TT

Browser
Browser

TT

Browser
Browser

HTTP

RPC

Task

DIGEST-MD5

Backyard

Kerberos

-HMAC

Local Task
Local Task

Hadoop

Task
Reduce Task getting
Reduce Task getting
Map output
Map output
Authentication Paths
Job
Tracker

Oozie

Browser

Name
Node

Data
Node

Task
Tracker

NFS
User
Process

Task

MObStor

ZooKeeper

Hadoop

HTTP backyard
HTTP HMAC
RPC Kerberos
RPC DIGEST
Block Access
Third Party

25
Interfaces (and their scope and stability)
•

Imported Interfaces
–
–

SASL – Standard for supporting token and Kerberos authentication

–

GSSAPI – Kerberos part of SASL authentication

–

HMAC-SHA1 – Shared secret authentication

–

•

JAAS – Java API for supporting authentication

SPNEGO – Use Kerberos tickets over HTTP

Exported Interfaces – Both Limited Private
–
–

•

HDFS adds a method to get delegation tokens
RPC adds a doAs method

Major Internal (Inter-system Interfaces)
–

MapReduce Shuffle uses HMAC-SHA1

–

RPC uses Kerberos and DIGEST-MD5

Hadoop

26
Pluggability
• Pluggability in Hadoop supports different environments
• HTTP browser user authentication
– Yahoo – Backyard
– External – SPNEGO or Kerberos login module
• RPC transport
– SASL supports DIGEST-MD5, Kerberos, and others
• Acquiring credentials
– JAAS supports Kerberos, and others

Hadoop

27
Performance
• The authentication should not introduce
substantial performance penalties.
• Delegation token design to avoid
authentication flood by MapReduce tasks
• Required to be less than 3% on GridMix.

Hadoop

28
Reliability and Availability
• The Kerberos KDC can not be a single point of failure.
– Kerberos clients automatically fail over to secondary KDC’s
– Secondary KDC’s can be sync’ed automatically from the primary
since the data rarely changes.
• The cluster must remain stable when Kerberos fails.
– The slaves (TaskTrackers and DataNodes) will lose their ability to
reconnect to the master, when their RPC socket closes, their service
ticket has expired, and both the primary and secondary KDC’s have
failed.
– Decided not to use special tokens to handle this case.
• Once the MapReduce job is submitted, the KDC is not required for the
job to continue running.
Hadoop

29
Operations and Monitoring
• The number of Kerberos authorizations will be
logged on the NameNode and JobTracker.
• Authorization failures will be logged.
• Authentication failures will be logged.
• The authorization logs will be a separate log4j
logger, so they can be directed to a separate file.

Hadoop

30

Más contenido relacionado

La actualidad más candente

Ambari: Agent Registration Flow
Ambari: Agent Registration FlowAmbari: Agent Registration Flow
Ambari: Agent Registration FlowHortonworks
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop securitybigdatagurus_meetup
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionCloudera, Inc.
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive TutorialSandeep Patil
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Simplilearn
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's EvolutionDataWorks Summit
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to sparkDuyhai Doan
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersCloudera, Inc.
 

La actualidad más candente (20)

Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
Ambari: Agent Registration Flow
Ambari: Agent Registration FlowAmbari: Agent Registration Flow
Ambari: Agent Registration Flow
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop security
 
Apache hive
Apache hiveApache hive
Apache hive
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive Tutorial
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wan
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 

Destacado

Kerberos, Token and Hadoop
Kerberos, Token and HadoopKerberos, Token and Hadoop
Kerberos, Token and HadoopKai Zheng
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: OverviewCloudera, Inc.
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureUwe Printz
 
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersDataWorks Summit
 
Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Peter Wood
 
Apache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXApache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXAbhishek Mallick
 
Information security in big data -privacy and data mining
Information security in big data -privacy and data miningInformation security in big data -privacy and data mining
Information security in big data -privacy and data miningharithavijay94
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the CloudDataWorks Summit
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Kevin Minder
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview Hortonworks
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with HadoopCloudera, Inc.
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...DataWorks Summit
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access SecurityCloudera, Inc.
 
OAuth - Open API Authentication
OAuth - Open API AuthenticationOAuth - Open API Authentication
OAuth - Open API Authenticationleahculver
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Emilio Coppa
 

Destacado (20)

Kerberos, Token and Hadoop
Kerberos, Token and HadoopKerberos, Token and Hadoop
Kerberos, Token and Hadoop
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: Overview
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
 
Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)
 
An Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache KnoxAn Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache Knox
 
Apache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXApache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOX
 
Hadoop
HadoopHadoop
Hadoop
 
Information security in big data -privacy and data mining
Information security in big data -privacy and data miningInformation security in big data -privacy and data mining
Information security in big data -privacy and data mining
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
OAuth - Open API Authentication
OAuth - Open API AuthenticationOAuth - Open API Authentication
OAuth - Open API Authentication
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)
 

Similar a Hadoop Security Architecture

Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop EcosystemDataWorks Summit
 
Securing Hadoop - MapR Technologies
Securing Hadoop - MapR TechnologiesSecuring Hadoop - MapR Technologies
Securing Hadoop - MapR TechnologiesMapR Technologies
 
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010Yahoo Developer Network
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010Hadoop User Group
 
Hw09 Security And Api Compatibility
Hw09   Security And Api CompatibilityHw09   Security And Api Compatibility
Hw09 Security And Api CompatibilityCloudera, Inc.
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopOwen O'Malley
 
Hadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBay
Hadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBayHadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBay
Hadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBayCloudera, Inc.
 
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaBig Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaCaserta
 
Apache Hadoop India Summit 2011 talk "Making Apache Hadoop Secure" by Devaraj...
Apache Hadoop India Summit 2011 talk "Making Apache Hadoop Secure" by Devaraj...Apache Hadoop India Summit 2011 talk "Making Apache Hadoop Secure" by Devaraj...
Apache Hadoop India Summit 2011 talk "Making Apache Hadoop Secure" by Devaraj...Yahoo Developer Network
 
Map r hadoop-security-mar2014 (2)
Map r hadoop-security-mar2014 (2)Map r hadoop-security-mar2014 (2)
Map r hadoop-security-mar2014 (2)MapR Technologies
 
Securing Hadoop by MapR's Senior Principal Technologist Keys Botzum
Securing Hadoop by MapR's Senior Principal Technologist Keys BotzumSecuring Hadoop by MapR's Senior Principal Technologist Keys Botzum
Securing Hadoop by MapR's Senior Principal Technologist Keys BotzumMapR Technologies
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_securityAdam Muise
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityChris Nauroth
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Shravan (Sean) Pabba
 
Securing Hadoop by Sr. Principal Technologist Keys Botzum
Securing Hadoop by Sr. Principal Technologist Keys BotzumSecuring Hadoop by Sr. Principal Technologist Keys Botzum
Securing Hadoop by Sr. Principal Technologist Keys BotzumMapR Technologies
 

Similar a Hadoop Security Architecture (20)

Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop Ecosystem
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Securing Hadoop - MapR Technologies
Securing Hadoop - MapR TechnologiesSecuring Hadoop - MapR Technologies
Securing Hadoop - MapR Technologies
 
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010
 
Hw09 Security And Api Compatibility
Hw09   Security And Api CompatibilityHw09   Security And Api Compatibility
Hw09 Security And Api Compatibility
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in Hadoop
 
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheConTechnical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
 
Hadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBay
Hadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBayHadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBay
Hadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBay
 
Big data security
Big data securityBig data security
Big data security
 
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaBig Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
 
Apache Hadoop India Summit 2011 talk "Making Apache Hadoop Secure" by Devaraj...
Apache Hadoop India Summit 2011 talk "Making Apache Hadoop Secure" by Devaraj...Apache Hadoop India Summit 2011 talk "Making Apache Hadoop Secure" by Devaraj...
Apache Hadoop India Summit 2011 talk "Making Apache Hadoop Secure" by Devaraj...
 
Map r hadoop-security-mar2014 (2)
Map r hadoop-security-mar2014 (2)Map r hadoop-security-mar2014 (2)
Map r hadoop-security-mar2014 (2)
 
Securing Hadoop by MapR's Senior Principal Technologist Keys Botzum
Securing Hadoop by MapR's Senior Principal Technologist Keys BotzumSecuring Hadoop by MapR's Senior Principal Technologist Keys Botzum
Securing Hadoop by MapR's Senior Principal Technologist Keys Botzum
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_security
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Securing Hadoop by Sr. Principal Technologist Keys Botzum
Securing Hadoop by Sr. Principal Technologist Keys BotzumSecuring Hadoop by Sr. Principal Technologist Keys Botzum
Securing Hadoop by Sr. Principal Technologist Keys Botzum
 

Más de Owen O'Malley

Running An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemRunning An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemOwen O'Malley
 
Big Data's Journey to ACID
Big Data's Journey to ACIDBig Data's Journey to ACID
Big Data's Journey to ACIDOwen O'Malley
 
Protect your private data with ORC column encryption
Protect your private data with ORC column encryptionProtect your private data with ORC column encryption
Protect your private data with ORC column encryptionOwen O'Malley
 
Fine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionFine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionOwen O'Malley
 
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetFast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetOwen O'Malley
 
Strata NYC 2018 Iceberg
Strata NYC 2018  IcebergStrata NYC 2018  Iceberg
Strata NYC 2018 IcebergOwen O'Malley
 
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetFast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetOwen O'Malley
 
ORC Column Encryption
ORC Column EncryptionORC Column Encryption
ORC Column EncryptionOwen O'Malley
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetOwen O'Malley
 
Protecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopProtecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopOwen O'Malley
 
Structor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop ClustersStructor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop ClustersOwen O'Malley
 
Adding ACID Updates to Hive
Adding ACID Updates to HiveAdding ACID Updates to Hive
Adding ACID Updates to HiveOwen O'Malley
 
ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013Owen O'Malley
 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File IntroductionOwen O'Malley
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesOwen O'Malley
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop OperationsOwen O'Malley
 
Next Generation MapReduce
Next Generation MapReduceNext Generation MapReduce
Next Generation MapReduceOwen O'Malley
 

Más de Owen O'Malley (20)

Running An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemRunning An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid Them
 
Big Data's Journey to ACID
Big Data's Journey to ACIDBig Data's Journey to ACID
Big Data's Journey to ACID
 
ORC Deep Dive 2020
ORC Deep Dive 2020ORC Deep Dive 2020
ORC Deep Dive 2020
 
Protect your private data with ORC column encryption
Protect your private data with ORC column encryptionProtect your private data with ORC column encryption
Protect your private data with ORC column encryption
 
Fine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionFine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column Encryption
 
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetFast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
 
Strata NYC 2018 Iceberg
Strata NYC 2018  IcebergStrata NYC 2018  Iceberg
Strata NYC 2018 Iceberg
 
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetFast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
 
ORC Column Encryption
ORC Column EncryptionORC Column Encryption
ORC Column Encryption
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
 
Protecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopProtecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache Hadoop
 
Data protection2015
Data protection2015Data protection2015
Data protection2015
 
Structor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop ClustersStructor - Automated Building of Virtual Hadoop Clusters
Structor - Automated Building of Virtual Hadoop Clusters
 
Adding ACID Updates to Hive
Adding ACID Updates to HiveAdding ACID Updates to Hive
Adding ACID Updates to Hive
 
ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013
 
ORC Files
ORC FilesORC Files
ORC Files
 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File Introduction
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
 
Next Generation MapReduce
Next Generation MapReduceNext Generation MapReduce
Next Generation MapReduce
 

Último

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 

Último (20)

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 

Hadoop Security Architecture

  • 2. Outline • • • • • • Problem Statement Security Threats Solutions to Threats HDFS MapReduce Oozie • Interfaces • Performance • Reliability and Availability • Operations and Monitoring Hadoop 2
  • 3. Problem Statement • The fundamental goal of adding Hadoop security is that Yahoo's data stored in HDFS must be secure from unauthorized access. Furthermore, it must do so without adding significant effort to operating or using the Grid. Based on that goal, there are a few implications. – All HDFS clients must be authenticated to ensure that the user is who they claim to be. That implies that all map/reduce users, including services such as Oozie, must be also authenticated and that tasks must run with the privileges and identity of the submitting user. – Since Data Nodes and TaskTrackers are entrusted with user data and credentials, they must authenticate themselves to ensure they are running as part of the Grid and are not trojan horses. – Kerberos will be the underlying authentication service so that users can be authenticated using their system credentials. Hadoop
  • 5. Security Threats in Hadoop • User to Service Authentication – No User Authentication on NameNode or JobTracker • Client code supplies user and group names – No User Authorization on DataNode – Fixed in 0.21 • Users can read/write any block – No User Authorization on JobTracker • • • Users can modify or kill other user’s jobs Users can modify the persistent state of JobTracker Service to Service Authentication – No Authentication of DataNodes and TaskTrackers • • Users can start fake DataNodes and TaskTrackers No Encryption on Wire or Disk Hadoop
  • 6. Solutions to Threats • Add Kerberos-based authentication to NameNode and JobTracker. • Add delegation tokens to HDFS and support for them in MapReduce. • Determine user’s group membership on the NameNode and JobTracker. • Protect MapReduce system directory from users. • Add authorization model to MapReduce so that only submitting user can modify or kill job. • Add Backyard authentication to Web UI’s. Hadoop 6
  • 7. Out of Scope for 0.20.100 • Protecting against root on slave nodes: – Encryption of RPC messages – Encryption of block transfer protocol – Encryption of MapReduce transient files – Encryption of HDFS block files • Passing Kerberos tickets to MapReduce tasks for third party Kerborized services. Hadoop 7
  • 8. HDFS Security • Users authenticate via Kerberos • MapReduce jobs can obtain delegation tokens for later use. • When clients are reading or writing an HDFS file, the NameNode generates a block access token that will be verified by the DataNode. Hadoop 8
  • 9. HDFS Authentication ke r ) b(joe Application Name Node delg(jo e) MapReduce Task kerb(hdfs) bloc k to ken Data Node o ck t blo k en • Clients authenticate to NameNode via: – Kerberos – Delegation tokens • Client demonstrates authorization to DataNode via block access token • DataNode authenticates to NameNode via Kerberos Hadoop
  • 10. What does this *really* look like? • Need a Kerberos ticket to work – kinit –l 7d oom@DS.CORP.YAHOO.COM – hadoop fs –ls – hadoop jar my.jar in-dir out-dir • Works using ticket cache! – Can display ticket cache with klist. Hadoop 10
  • 11. Kerberos Dataflows Key Distribution Center Get TGT Request Service Ticket Name Node hdfs/host@YGRID User joe@DS.CORP Connect with Service Ticket Hadoop 11
  • 12. Delegation Token • Advantages over using Kerberos directly: – Don’t trust JobTracker with credentials – Avoid MapReduce task authorization flood – Renewable by third party (ie. JobTracker) – Revocable when job finishes • tokenId = {owner prin, renewer prin, issueDate, maxDate} • tokenAuthenticator = HMAC(masterKey, tokenId) • Token = {tokenId, tokenAuthenticator} Hadoop 12
  • 13. Block Access Token • Only NameNode knows the set of users allowed to access a specific block, so the NameNodes gives an authorized clients a block access token. • Capabilities include read, write, copy, or replace. • The NameNode and DataNodes share a dynamically rolled secret key to secure the tokens. • tokenId={expiration, keygen, owner, block, access} • tokenAuthenticator = HMAC(blockKey, tokenId) • token = {tokenId, tokenAuthenticator} Hadoop 13
  • 14. MapReduce Security • Require Kerberos authentication from client. • Secure the information about pending and running jobs – Store the job configuration and input splits in HDFS under ~user/.staging/$jobid – Store the job’s location and secrets in private directory • JobTracker creates a random job token. It it used for: – Connecting to TaskTracker’s RPC – Authorizing http get for shuffle • HMAC(job token, URL) sent from reduce tasks to TaskTracker Hadoop 14
  • 15. MapReduce Authentication Application kerb(joe) Job Tracker kerb(mapreduce) Task Tracker job token Task HDFS HDFS HDFS ) (joe elg d other credential trus t Other Service NFS • Client authenticates to JobTracker via Kerberos • TaskTracker authenticates to JobTracker via Kerberos • Task authenticates to the TaskTrackers using the job token • Task authenticates to HDFS using a delegation token • NFS is not Kerberized. Hadoop 15
  • 16. MapReduce Task Security • Users have separate task directories with permissions set to 700. • Distributed cache is now divided based on the source’s visibility – Global – shared with other users – Private - protected from other users Hadoop 16
  • 17. Web UI • MapReduce makes heavy use of Web UI for displaying state of cluster and running jobs. • HDFS also has a web browsing interface. • Use Backyard to authenticate Web UI users • Only allow submitting user of job to view stdout and stderr of job’s tasks. • HDFS web browser checks user’s authorization. Hadoop 17
  • 18. Oozie • Client authenticates to Oozie – Custom auth for Yahoo! • Oozie authenticates to HDFS and MapReduce as “oozie” principal • “oozie” is configured as a super-user for HDFS and MapReduce and may act as other users. Hadoop 18
  • 19. Proxy Services Trust Model • Requires trust that service (eg. Oozie) principal is secure. • Explored and rejected – Having user headless principals stored on Oozie machine “x/oozie” for user “x” – Passing user headless principal keytab to Oozie – Generalizing delegation token to have token granting tokens. Hadoop 19
  • 20. Protocols • RPC – Change RPC to use SASL and either: • Kerberos authentication (GSSAPI) • Tokens (DIGEST-MD5) – User’s Kerberos tickets obtained at login used automatically. – Changes RPC format – Can easily add encryption later • Block transfer protocol – Block access tokens in data stream Hadoop 20
  • 21. Protocols • HTTP – User/Browser facing • Yahoo – Custom Authentication • External – SPNEGO or Kerberos login module – Web Services • HFTP – Hadoop File Transfer Protocol • Others later • SPNEGO or Delegation Token via RPC – Shuffle • Use HMAC of URL hashed with Job Token Hadoop 21
  • 22. Summary • RPC – Kerberos • Application to NameNode, JobTracker • DataNode to NameNode • TaskTracker to JobTracker – Digest-MD5 • MapReduce task to NameNode, TaskTracker • Block Access Token • Backyard – User to Web UI Hadoop 22
  • 23. Task Accessing as user Accessing as user Backyard Browser, 2ndNN, fsck Browser, 2ndNN, fsck HFTP HTTP NN DIGEST-MD5 RPC Kerberos User (initial access), User (initial access), 2ndNN, Balancer 2ndNN, Balancer HTTP-DIGEST w/ delegation token HTTP-DIGEST w/ delegation token Task distcp accessing as user distcp accessing as user HFTP User, DN, Balancer, Task User, DN, Balancer, Task DN HTTP access token Task Socket Backyard Kerberos Browser Forward delegation token Forward delegation token Task distcp accessing as user distcp accessing as user Hadoop
  • 26. Interfaces (and their scope and stability) • Imported Interfaces – – SASL – Standard for supporting token and Kerberos authentication – GSSAPI – Kerberos part of SASL authentication – HMAC-SHA1 – Shared secret authentication – • JAAS – Java API for supporting authentication SPNEGO – Use Kerberos tickets over HTTP Exported Interfaces – Both Limited Private – – • HDFS adds a method to get delegation tokens RPC adds a doAs method Major Internal (Inter-system Interfaces) – MapReduce Shuffle uses HMAC-SHA1 – RPC uses Kerberos and DIGEST-MD5 Hadoop 26
  • 27. Pluggability • Pluggability in Hadoop supports different environments • HTTP browser user authentication – Yahoo – Backyard – External – SPNEGO or Kerberos login module • RPC transport – SASL supports DIGEST-MD5, Kerberos, and others • Acquiring credentials – JAAS supports Kerberos, and others Hadoop 27
  • 28. Performance • The authentication should not introduce substantial performance penalties. • Delegation token design to avoid authentication flood by MapReduce tasks • Required to be less than 3% on GridMix. Hadoop 28
  • 29. Reliability and Availability • The Kerberos KDC can not be a single point of failure. – Kerberos clients automatically fail over to secondary KDC’s – Secondary KDC’s can be sync’ed automatically from the primary since the data rarely changes. • The cluster must remain stable when Kerberos fails. – The slaves (TaskTrackers and DataNodes) will lose their ability to reconnect to the master, when their RPC socket closes, their service ticket has expired, and both the primary and secondary KDC’s have failed. – Decided not to use special tokens to handle this case. • Once the MapReduce job is submitted, the KDC is not required for the job to continue running. Hadoop 29
  • 30. Operations and Monitoring • The number of Kerberos authorizations will be logged on the NameNode and JobTracker. • Authorization failures will be logged. • Authentication failures will be logged. • The authorization logs will be a separate log4j logger, so they can be directed to a separate file. Hadoop 30