SECURITY IN A HADOOP CLUSTER
Overview of Approaches
Security: Why You Should Care
Hadoop is a powerful technology that lets us do
amazing things…but “with great power comes
great responsibility” - Voltaire / Uncle Ben


 • Putting all your data in one place makes it a target for
   bad guys
 • Plenty of examples in the news of major data breaches
   – no one wants to be responsible for that
 • Best to consider Security up front, at design time
 • It’s the right thing to do


                     © 2012 Cloudera. All Rights Reserved.
Types of Security
Type                      Example

Access                    Physical (lock and key), Virtual (Firewalls, VLANs)

Authentication            Logins – verify users are who they say they are

Authorization             Permissions – verify what a user can do

Encryption at Rest        Data protection for files on disk

Encryption in Transport   Data protection on the wire

Auditing                  Keep track of who accessed what

Policy / Procedure        Protect against Human Error & Social Engineering
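Virtual access control (firewalls, VLANs) comes down to letting only approved network ranges reach cluster services. The sketch below models that allowlist check in a few lines; it assumes the hypothetical subnets shown stand in for an organization's authorized networks, and in practice this enforcement belongs in firewalls or switch configuration rather than application code.

```python
import ipaddress

# Hypothetical allowlist: networks permitted to reach cluster services.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.20.0.0/16"),    # e.g. the cluster's own VLAN
    ipaddress.ip_network("192.168.5.0/24"),  # e.g. an analyst subnet
]

def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowed network.

    An IPv6 address never matches an IPv4 network (membership is simply False).
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)

if __name__ == "__main__":
    for ip in ["10.20.3.4", "192.168.5.77", "203.0.113.9"]:
        print(ip, "allowed" if is_allowed(ip) else "denied")
```

A default-deny list like this is the design choice that matters: anything not explicitly listed is refused.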



Hadoop Ecosystem Security:
    What is supported today?
Approach                                             Benefit

Network-based isolation of the cluster               Restrict network access to authorized users only

HDFS file ownership & permissions for users/groups   Configure access permissions (ACLs) on files in HDFS

Kerberos authentication & authorization              Strong authentication of both clients and servers, so
                                                     that tasks run under the job submitter's own account

Kerberos & HDFS ACLs combined                        Lock down read/write/execute of files and jobs per user
                                                     and group; prevents user impersonation

HBase & Accumulo security                            Table & column security (HBase) and row & cell security
                                                     (Accumulo) for users and groups

Some encryption                                      • At rest: via 3rd-party tools at the OS layer
                                                     • In transport:
                                                        • Internal for HDFS and MapReduce (new in CDH 4.1)
                                                        • External for HttpFS via SSL, and for Sqoop via
                                                          native DB driver encryption (not yet for Flume)

TLS between Cloudera Manager and Agents              Encrypts and authenticates their communications to
                                                     prevent snooping
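HDFS file permissions follow a POSIX-like owner/group/other model, and Kerberos is what makes the `user` in that check trustworthy. The sketch below models the permission decision only; it assumes the superuser is named `hdfs` (commonly the account that runs the NameNode), ignores the execute checks HDFS also applies to parent directories, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Permission bits, as in POSIX rwx.
READ, WRITE, EXECUTE = 4, 2, 1

@dataclass(frozen=True)
class FileStatus:
    owner: str
    group: str
    mode: int  # octal permission bits, e.g. 0o750

def check_access(user: str, user_groups: set, status: FileStatus,
                 requested: int, superuser: str = "hdfs") -> bool:
    """Decide whether `user` may perform `requested` (a mix of READ/WRITE/EXECUTE)."""
    if user == superuser:
        return True  # the superuser bypasses permission checks
    if user == status.owner:
        bits = (status.mode >> 6) & 0o7   # owner bits
    elif status.group in user_groups:
        bits = (status.mode >> 3) & 0o7   # group bits
    else:
        bits = status.mode & 0o7          # "other" bits
    # Every requested bit must be granted.
    return (bits & requested) == requested

if __name__ == "__main__":
    report = FileStatus(owner="alice", group="finance", mode=0o750)
    print(check_access("alice", {"alice"}, report, READ | WRITE))
    print(check_access("bob", {"finance"}, report, READ))
    print(check_access("bob", {"finance"}, report, WRITE))
    print(check_access("eve", {"users"}, report, READ))
```

Note the ordering: only one class (owner, group, or other) is consulted, so an owner with fewer bits than the group does not inherit the group's rights.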
Kerberos Overview
•   Kerberos: a network authentication protocol that uses tickets to let nodes
    prove their identity to each other securely; it relies extensively on
    encryption



•   Messages are exchanged between:
     • Client
     • Server
     • Kerberos Key Distribution Center (KDC)
          • Note: the KDC is not part of Hadoop, but most Linux distributions
             package the MIT Kerberos KDC.
•   Passwords are never sent across the network; instead, they are used to
    compute encryption keys
•   Authentication status is cached as tickets (credentials need not be sent
    with each request)
•   Timestamps are essential to Kerberos (make sure system clocks are
    synchronized, e.g. with NTP!)
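Two of the points above, passwords turned into keys and timestamps that must agree, can be illustrated with a toy model loosely based on Kerberos pre-authentication, where a client proves it knows its password-derived key by protecting the current time with it. This is a sketch under stated assumptions, not Kerberos itself: an HMAC stands in for encryption, PBKDF2-HMAC-SHA256 stands in for Kerberos's enctype-specific string-to-key functions, the principal `alice@EXAMPLE.COM` is hypothetical, and the 300-second window mirrors MIT Kerberos's default clock-skew tolerance. Tickets and the credential cache are not modeled.

```python
import hashlib
import hmac
import time

MAX_CLOCK_SKEW = 300  # seconds; MIT Kerberos's default tolerance is 5 minutes

def derive_key(password: str, salt: str) -> bytes:
    """Derive a key from the password locally; the password itself is never transmitted."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), 100_000)

def make_authenticator(key: bytes, principal: str, now: float) -> tuple:
    """Client side: bind the principal and the current time together under the key.

    Toy model: an HMAC stands in for real Kerberos encryption.
    """
    ts = int(now)
    mac = hmac.new(key, f"{principal}|{ts}".encode(), hashlib.sha256).hexdigest()
    return principal, ts, mac

def verify_authenticator(key: bytes, auth: tuple, now: float) -> bool:
    """KDC side: it already holds the same derived key, so it can recompute the MAC."""
    principal, ts, mac = auth
    expected = hmac.new(key, f"{principal}|{ts}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, mac):
        return False  # wrong password (or a tampered message)
    # Reject stale or future-dated messages: this is why clocks must be synchronized.
    return abs(now - ts) <= MAX_CLOCK_SKEW

if __name__ == "__main__":
    principal = "alice@EXAMPLE.COM"                      # hypothetical principal
    kdc_key = derive_key("correct horse", principal)     # held by the KDC
    client_key = derive_key("correct horse", principal)  # computed on the client
    t = time.time()
    auth = make_authenticator(client_key, principal, t)
    print(verify_authenticator(kdc_key, auth, t + 10))   # small drift: accepted
    print(verify_authenticator(kdc_key, auth, t + 900))  # 15-minute skew: rejected
    wrong = make_authenticator(derive_key("wrong", principal), principal, t)
    print(verify_authenticator(kdc_key, wrong, t))       # wrong password: rejected
```

The skew check is also what turns an unsynchronized clock into a vague authentication failure, even when the password is correct.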
HBase and Accumulo Overview
Both systems:
•   Open Source, Distributed NoSQL, Key/Value stores that run on Hadoop,
    based on Google's BigTable design
•   Provide real-time, random read/write access to data stored in HDFS
•   Scale to 1000s of nodes and Petabytes of data
•   Provide real-time and bulk APIs for loading of data
•   Support application-level extensions to the core (in HBase they're called
    coprocessors; in Accumulo they're called iterators and aggregators)
•   Run at scale in Production Environments
•   Can run on CDH!!!

Primary Differences (in terms of Security)
•   HBase supports Kerberos authentication and ACLs on tables and column
    families
•   Accumulo supports username/password authentication (Kerberos-based
    authentication is under development), table-level permissions, and
    cell-level access control via visibility labels
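Accumulo's cell-level security attaches a visibility expression such as `admin|(finance&audit)` to each cell and returns the cell only if the reading user's authorizations satisfy it. The following is a simplified Python model of that check, not Accumulo's API (the real logic is in Java, e.g. the `ColumnVisibility` class): it accepts only letters, digits, and underscores in labels, ignores quoted labels, treats an empty expression as visible to everyone, and mirrors Accumulo's rule that `&` and `|` cannot be mixed without parentheses. The function names are hypothetical.

```python
import re

# A label (letters, digits, underscore) or one of the operators & | ( )
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")

def tokenize(expr: str) -> list:
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"bad character at {pos}: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def is_visible(expr: str, auths: set) -> bool:
    """True if a user holding `auths` may see a cell labeled with `expr`."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # empty expression: visible to everyone
    pos = 0

    def parse_expr() -> bool:
        nonlocal pos
        values, op = [parse_term()], None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            values.append(parse_term())
        return all(values) if op == "&" else any(values)

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths  # a bare label: does the user hold it?

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result

if __name__ == "__main__":
    label = "admin|(finance&audit)"
    print(is_visible(label, {"finance", "audit"}))
    print(is_visible(label, {"finance"}))
```

HBase, by contrast, as described above, grants access at the table and column-family level, so a user either sees a whole column family or none of it.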



Configuring Security in Hadoop
• Hadoop Security configuration is a specialized
  topic
• Many specifics depend on:
   • Version of Hadoop
   • Kerberos implementation in use (Active Directory or MIT)
   • Operating System and Distribution
• Little room for misconfiguration
   • Must follow instructions exactly
   • Mistakes often result in vague “access denied” errors
   • May need to work around version-specific bugs
• The                                       can help
  configure a secure system
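Because misconfiguration tends to surface only as vague "access denied" errors, a small preflight check can catch the most basic mistakes before deployment. This is a minimal sketch that checks only two well-known properties in a Hadoop `core-site.xml`: `hadoop.security.authentication` (whose default, `simple`, performs no real authentication) and `hadoop.security.authorization` (service-level authorization). It is no substitute for the version-specific instructions above, and the function names and sample file are hypothetical.

```python
import xml.etree.ElementTree as ET

# Settings a Kerberized cluster is expected to have in core-site.xml.
REQUIRED = {
    "hadoop.security.authentication": "kerberos",  # default "simple" = no real auth
    "hadoop.security.authorization": "true",       # service-level authorization
}

def load_properties(xml_text: str) -> dict:
    """Parse Hadoop's <configuration><property><name/><value/></property> format."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props

def check_security(xml_text: str) -> list:
    """Return human-readable problems; an empty list means the checks passed."""
    props = load_properties(xml_text)
    problems = []
    for key, wanted in REQUIRED.items():
        actual = props.get(key)
        if actual is None:
            problems.append(f"{key} is not set (expected {wanted!r})")
        elif actual.lower() != wanted:
            problems.append(f"{key} = {actual!r} (expected {wanted!r})")
    return problems

# Hypothetical sample: authentication left at the default, authorization missing.
SAMPLE = """<configuration>
  <property><name>hadoop.security.authentication</name><value>simple</value></property>
</configuration>"""

if __name__ == "__main__":
    for problem in check_security(SAMPLE):
        print("WARN:", problem)
```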
Hadoop Security Landscape:
Future Requirements and Features
•   Forward-Deployed Systems
    •   Not just in the comfy confines of the Datacenter anymore
    •   Encryption at Rest becoming a major requirement

•   Mixed level Security
    • Analytics in environments with multiple levels of trusted users and
      multiple levels of sensitive data
    • Joining low-sensitivity data with medium-sensitivity data can produce
      high-sensitivity data.
    • Which users can see, use, join/merge, analyze, and derive insight from
      which data?

•   Wireless
    •   PDA access
    •   Wireless Clusters
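The aggregation concern under mixed-level security can be made concrete with a small label-propagation sketch: a join is treated as at least as sensitive as its most sensitive input, and a rule table raises specific combinations further. The levels, dataset names, and escalation rule below are all hypothetical; deciding which combinations actually escalate is a policy question this sketch does not answer.

```python
from enum import IntEnum

class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

# Hypothetical escalation rules: combinations judged more sensitive than any input.
ESCALATIONS = {
    frozenset({"customer_names", "purchase_history"}): Level.HIGH,
}

def joined_level(datasets: dict) -> Level:
    """Sensitivity of a join: at least the max of its inputs, raised by matching rules."""
    level = max(datasets.values())
    names = frozenset(datasets)
    for combo, escalated in ESCALATIONS.items():
        if combo <= names:  # every dataset in the rule takes part in this join
            level = max(level, escalated)
    return level

def can_query(user_clearance: Level, datasets: dict) -> bool:
    """A user may run the join only if cleared for the combined result."""
    return user_clearance >= joined_level(datasets)

if __name__ == "__main__":
    inputs = {"customer_names": Level.LOW, "purchase_history": Level.MEDIUM}
    print(joined_level(inputs).name)
    print(can_query(Level.MEDIUM, inputs))
```

Here a MEDIUM-cleared user can read each input separately but is refused the join, which is exactly the "low + medium can equal high" case.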



CONCLUSION
• “…with great power comes great responsibility”
                         - Voltaire / Uncle Ben

•   Security = Policy + Implementation
    • Implementation is both Technical and Human
    • Weaknesses and breakdowns are inevitable (ask any Hacker)
    • Must do everything to limit the severity of a breakdown by implementing
      multiple layers of security

•   I’m not the expert; your local Hacker is
    • Pay attention to their community
    • Keep current on Hacking techniques, news of breaches, etc.
    • Maintain good OpSec

•   The                                             can configure a secure system

Questions?
