HADOOP SECURITY FEATURES
That make your risk officer happy
By Anurag Shrivastava, ING Commercial Bank, Amsterdam
@shri2201
Security for Hadoop
Source: http://blogs.gartner.com/merv-adrian/2014/01/21/security-for-hadoop-dont-look-now/
Hadoop Security Features 2
Hadoop in the Enterprise
Data Lake: an important information asset for the enterprise
• Data from systems of record and logs is stored in Hadoop
• Significant cost savings for the enterprise
• Diverse types of users
Picture Source: http://arunkottolli.blogspot.nl/2014/03/understanding-data-in-big-data.html
Operational Security in the Enterprise
• User Access Management
• Security Event Monitoring
• Application State Monitoring
• Security Testing
• Patch Management
• Data Protection
• Backup and restore
User Access Management
Requirements
• Privileged, group and generic accounts
• Separation of technical and business users
• Separation of environments (DTAP)
• Separation of admins and other users
• Separation of users in different business roles
• Application of the four-eyes principle when entering or changing data
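The four-eyes requirement can be enforced by whatever tool writes the data. A minimal Python sketch, assuming a hypothetical `ChangeRequest` record and a hypothetical `hadoop_admin` role name (neither comes from Hadoop itself): a change may only be applied after someone other than the requester, who is also not an administrator, has approved it.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical admin role name, for illustration only.
ADMIN_ROLES = {"hadoop_admin"}


@dataclass
class ChangeRequest:
    """A pending data change that a second person must approve (four eyes)."""
    request_id: int
    requested_by: str
    description: str
    approved_by: Optional[str] = None

    def rejection_reason(self, approver: str, approver_roles: Set[str]) -> Optional[str]:
        """Return why `approver` may not approve this change, or None if allowed."""
        if approver == self.requested_by:
            # Four-eyes principle: nobody signs off their own change.
            return "requester cannot approve their own change"
        if approver_roles & ADMIN_ROLES:
            # Separation of admins and business users.
            return "admin accounts cannot approve business data changes"
        return None

    def approve(self, approver: str, approver_roles: Set[str]) -> None:
        reason = self.rejection_reason(approver, approver_roles)
        if reason is not None:
            raise PermissionError(reason)
        self.approved_by = approver

    @property
    def can_apply(self) -> bool:
        # The change may only be written once a valid second person has approved it.
        return self.approved_by is not None
```

Usage: `change = ChangeRequest(1, "alice", "correct customer address")` followed by `change.approve("bob", {"analyst"})` makes `change.can_apply` true, while `change.approve("alice", ...)` raises `PermissionError`.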
Security Event Monitoring
• Definition of application-specific events
• All login attempts, failed or successful
• Unauthorized attempts to access a table or file
• Operational performance of the application
• NameNode performance
• CPU and disk
• Integration with the Master Control Room
• Alerting the asset manager
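Unauthorized attempts to access a file can be picked up from the HDFS NameNode audit log. The sketch below assumes the tab-separated `key=value` layout that the audit logger typically writes (`allowed=`, `ugi=`, `ip=`, `cmd=`, `src=`); check the exact format for your Hadoop version. The threshold and the function names are hypothetical, and forwarding the result to a monitoring room is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, List


def parse_audit_line(line: str) -> Dict[str, str]:
    """Split an HDFS audit line into its key=value fields.

    Assumed layout: '<timestamp> INFO FSNamesystem.audit: allowed=false<TAB>ugi=...'
    """
    _, _, payload = line.partition("audit: ")
    fields: Dict[str, str] = {}
    for part in payload.strip().split("\t"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
    return fields


def denied_access_alerts(lines: Iterable[str], threshold: int = 3) -> List[str]:
    """Return users whose denied file accesses reach `threshold` (hypothetical rule)."""
    denied: Counter = Counter()
    for line in lines:
        fields = parse_audit_line(line)
        if fields.get("allowed") == "false":
            # ugi looks like "mallory (auth:KERBEROS)"; keep only the user name.
            user = fields.get("ugi", "unknown").split(" ")[0]
            denied[user] += 1
    return sorted(user for user, count in denied.items() if count >= threshold)
```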
Data Protection (1/2)
• Confidentiality: protect information from unauthorized disclosure
• Integrity: ensure the accuracy, completeness and timeliness of information and prevent data tampering
• Availability: ensure that information and services are available when required
Picture Source: http://www.attix5.co.uk/thought-leadership/why-data-protection-software-essential-good-nights-sleep
Data Protection (2/2)
• Confidentiality: logon, access control, malicious code protection, security event monitoring, encryption
• Integrity: message authentication codes, data lineage
Picture Source: http://www.attix5.co.uk/thought-leadership/why-data-protection-software-essential-good-nights-sleep
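A message authentication code covers the integrity bullet: a keyed tag is stored with each data file at ingestion and recomputed on read, so any modification without the key is detected. A minimal sketch with Python's standard `hmac` module (HMAC-SHA256); the key constant and function names are illustrative assumptions, and a real deployment would fetch the key from a key management system rather than source code.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; never hard-code real keys.
INTEGRITY_KEY = b"example-key-from-a-key-store"


def compute_mac(data: bytes, key: bytes = INTEGRITY_KEY) -> str:
    """HMAC-SHA256 tag stored alongside a data file when it is ingested."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_mac(data: bytes, expected_tag: str, key: bytes = INTEGRITY_KEY) -> bool:
    """Recompute the tag on read; a mismatch indicates tampering or corruption."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(compute_mac(data, key), expected_tag)
```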
Security Under the Spotlight in the Data Lake
• All kinds of enterprise data: structured, semi-structured and unstructured
• Many groups of users: data scientists, analysts, engineers, marketers, managers
• Long-term retention of data
• Different types of workloads
• The value of data grows as data from different sources is combined in the Data Lake
Picture source: http://beyondplm.com/2014/05/05/plm-downstream-usage-and-future-information-rivers/
Data Lake Risks
• The Data Lake is an attractive target for inside and outside attackers
• A security compromise in the Data Lake can have a major or catastrophic business impact
IT risk assessment gives the Hadoop implementation the highest risk rating for the Data Lake use case.
Lab-like Security Is Not Enough
Stages: Play Area, Big Data Predictive Analytics Lab, Production System
Predictive Analytics Lab
Diagram components:
• Stepping stone (Citrix)
• 18 x Hadoop nodes
• Git, libraries, build tools
• Monitoring services
• Data files in batches
• Dedicated VLAN and shared services
• SMTP relay
• Internet via corporate infrastructure
Firewall rules guard the perimeter security of the Hadoop cluster.
Lab-like security works for a small group of people.
Limitations of Hadoop
• No “Data at Rest” Encryption
• A Kerberos-Centric Approach
• Limited Authorization Capabilities
• Complexity of the Security Model and Configuration
Unfortunately, this is not sufficient for a Data Lake that ingests all the data and serves thousands of users.
Hadoop Security
Hadoop Security Solutions from Major Vendors
Hortonworks (acquired XASecure to bring ACLs to Hadoop):
• Apache Ranger
• Apache Knox
• Apache Falcon
Cloudera (working on Project Rhino):
• Apache Sentry
HDP-Apache Ranger
Apache Ranger
Apache Ranger currently supports authorization, auditing and security administration for a limited number of HDP components:
• Hive
• HBase
• Storm
• Knox
• HDFS
Apache Ranger Goals
1. Centralized security administration to manage all security-related tasks in a central UI or through REST APIs.
2. Fine-grained authorization for specific actions and operations on each Hadoop component or tool, managed through a central administration tool.
3. Standardized authorization across all Hadoop components.
4. Enhanced support for different authorization methods, such as role-based access control and attribute-based access control.
5. Centralized auditing of user access and security-related administrative actions across all Hadoop components.
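To make goals 2, 4 and 5 concrete, here is a toy central authorizer: role-based, per-component policies evaluated in one place, with every decision written to a single audit trail. This is not Ranger's policy model or API; `Policy`, `CentralAuthorizer` and the role names are hypothetical, and the resource match is a deliberately simple prefix comparison (so `/data/finance` would also match `/data/finance2`).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Policy:
    """One fine-grained rule: which roles may perform which actions on a resource prefix."""
    component: str          # e.g. "hdfs" or "hive"
    resource_prefix: str    # e.g. "/data/finance" or "sales_db.orders"
    roles: Set[str]
    actions: Set[str]       # e.g. {"read"} or {"select", "update"}


@dataclass
class CentralAuthorizer:
    """Toy central policy store with one audit trail shared by all components."""
    policies: List[Policy] = field(default_factory=list)
    user_roles: Dict[str, Set[str]] = field(default_factory=dict)
    audit_log: List[Tuple[str, str, str, str, bool]] = field(default_factory=list)

    def is_allowed(self, user: str, component: str, resource: str, action: str) -> bool:
        roles = self.user_roles.get(user, set())
        # Default deny: access is granted only if some policy matches.
        allowed = any(
            p.component == component
            and resource.startswith(p.resource_prefix)
            and action in p.actions
            and bool(roles & p.roles)
            for p in self.policies
        )
        # Every decision, allowed or denied, lands in the same central audit log.
        self.audit_log.append((user, component, resource, action, allowed))
        return allowed
```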
Apache Knox and Hadoop Services
Hadoop services covered:
• WebHDFS (HDFS)
• Templeton (HCatalog)
• Stargate (HBase)
• Oozie
• Hive/JDBC
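Clients reach these services through the Knox gateway instead of calling cluster hosts directly. The URL shape below (`/gateway/<topology>/webhdfs/v1/...`) follows the usual Knox layout, but the host name, the port 8443 and the topology name `default` are assumptions for illustration, as is the use of HTTP Basic credentials (Knox is often configured to check them against LDAP). The sketch only builds the request; it does not send it.

```python
import base64
import urllib.parse
import urllib.request

# Assumed gateway host, port and topology name; adjust to your deployment.
KNOX_BASE = "https://knox.example.com:8443/gateway/default"


def knox_webhdfs_request(path: str, op: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a WebHDFS call routed through the Knox gateway."""
    query = urllib.parse.urlencode({"op": op})
    url = f"{KNOX_BASE}/webhdfs/v1{urllib.parse.quote(path)}?{query}"
    # Knox authenticates the caller, so clients never talk to the NameNode directly.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
```

Passing the result to `urllib.request.urlopen` would send it, provided the client trusts the gateway's TLS certificate.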
Apache Falcon
• Visualize Data Pipeline Lineage
• Track Data Pipeline audit logs
• End-to-End Monitoring of Data Pipelines
• Policies for Data Replication and Retention
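Falcon expresses feed retention as a limit such as `days(7)` on a feed definition. The toy below mimics that notation to show what a retention policy decides, namely which feed instances fall outside the window. It is not Falcon code, it supports only minutes, hours and days, and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import List

# Limits written in a Falcon-like notation, e.g. "days(7)"; only three units here.
_LIMIT = re.compile(r"(minutes|hours|days)\((\d+)\)")


def parse_limit(limit: str) -> timedelta:
    match = _LIMIT.fullmatch(limit.strip())
    if match is None:
        raise ValueError(f"unsupported retention limit: {limit!r}")
    unit, amount = match.group(1), int(match.group(2))
    return timedelta(**{unit: amount})


def instances_to_evict(instance_times: List[datetime], limit: str, now: datetime) -> List[datetime]:
    """Return the feed instances that fall outside the retention window, oldest first."""
    cutoff = now - parse_limit(limit)
    return sorted(t for t in instance_times if t < cutoff)
```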
Apache Sentry and Project Rhino
Goals of Project Rhino
• Provide encryption with hardware-enhanced performance
• Support enterprise-grade authentication and single sign-on for Hadoop services
• Provide role-based access control in Hadoop with cell-level granularity in HBase
• Ensure consistent auditing across essential Apache Hadoop components
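Cell-level access control in HBase is expressed with visibility labels combined by `&`, `|`, `!` and parentheses, for example `secret&(finance|audit)`. The sketch evaluates an expression in that style against a user's labels with a small recursive-descent parser; it illustrates the idea only, is not HBase's own evaluator, and `can_see` is a hypothetical name.

```python
import re
from typing import List, Optional, Set

# A label is a run of letters, digits or underscores; operators are & | ! ( ).
_TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|!()])")


def _tokenize(expression: str) -> List[str]:
    tokens: List[str] = []
    text = expression.rstrip()
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos} in {expression!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def can_see(expression: str, user_labels: Set[str]) -> bool:
    """Evaluate a visibility expression such as "secret&(finance|audit)"."""
    tokens = _tokenize(expression)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError(f"unexpected end of {expression!r}")
        pos += 1
        return tokens[pos - 1]

    # Precedence from lowest to highest: | then & then ! and parentheses.
    def parse_or() -> bool:
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # parse first so all tokens are consumed
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:
        token = take()
        if token == "!":
            return not parse_not()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError(f"missing ')' in {expression!r}")
            return value
        if token in {"&", "|", ")"}:
            raise ValueError(f"unexpected {token!r} in {expression!r}")
        # A plain label grants visibility only if the user holds it.
        return token in user_labels

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens in {expression!r}")
    return result
```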
Apache Sentry and Project Rhino
Making the Risk Officer Happy
• Hadoop security has more to offer:
• Role-based access
• Audit logging
• Data encryption
• User Access Management
• Security Event Monitoring
• Application State Monitoring
• Security Testing
• Patch Management
• Data Protection
• Backup and restore
Overlapping efforts by vendors, incomplete coverage of all products, and varying commitment to open source would slow down the adoption of Hadoop.
THANK YOU
Anurag Shrivastava
@shri2201

Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 

Recently uploaded (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Hadoop Security Features That make your risk officer happy

  • 1. HADOOP SECURITY FEATURES: That Make Your Risk Officer Happy. By Anurag Shrivastava, ING Commercial Bank, Amsterdam (@shri2201)
  • 2. Security for Hadoop. Source: http://blogs.gartner.com/merv-adrian/2014/01/21/security-for-hadoop-dont-look-now/
  • 3. Hadoop in the Enterprise: the Data Lake, an important information asset for the enterprise • Data from systems of record and logs is stored in Hadoop • Significant cost savings for the enterprise • Diverse types of users. Picture source: http://arunkottolli.blogspot.nl/2014/03/understanding-data-in-big-data.html
  • 4. Operational Security in the Enterprise • User Access Management • Security Event Monitoring • Application State Monitoring • Security Testing • Patch Management • Data Protection • Backup and restore
  • 5. User Access Management Requirements • Privileged, group and generic accounts • Separation of technical and business users • Separation of environments (DTAP) • Separation of admins and other users • Separation of users in different business roles • Application of the four-eyes principle when entering or changing data
  • 6. Security Event Monitoring • Definition of application-specific events • All login attempts, failed or successful • Unauthorized attempts to access a table or file • Operational performance of the application • NameNode performance • CPU, disk • Integration with the Master Control Room • Alerting the asset manager
  • 7. Data Protection (1/2) • Confidentiality • Protect information from unauthorized disclosure • Integrity • Ensure the accuracy, completeness and timeliness of information and prevent data tampering • Availability • Ensure that information and services are available when required. Picture source: http://www.attix5.co.uk/thought-leadership/why-data-protection-software-essential-good-nights-sleep
  • 8. Data Protection (2/2) • Confidentiality • Logon • Access control • Malicious code protection • Security event monitoring • Encryption • Integrity • Message authentication code • Data lineage. Picture source: http://www.attix5.co.uk/thought-leadership/why-data-protection-software-essential-good-nights-sleep
  • 9. Security under the Spotlight in the Data Lake • All kinds of enterprise data: structured, semi-structured and unstructured • Many groups of users: data scientists, analysts, engineers, marketers, managers • Long-term retention of data • Different types of workloads • The value of data grows as data from different sources is combined in the Data Lake. Picture source: http://beyondplm.com/2014/05/05/plm-downstream-usage-and-future-information-rivers/
  • 10. Data Lake Risks • The Data Lake is an attractive target for both insider and outsider attackers • A security compromise in the Data Lake can have a major or catastrophic business impact. IT risk assessment gives the Hadoop implementation the highest risk rating for the Data Lake use case.
  • 11. Lab-Like Security Is Not Enough. Diagram: Play Area; Big Data Predictive Analytics Lab; Production System
  • 12. Predictive Analytics Lab. Diagram: stepping stone (Citrix); 18 x Hadoop nodes; Git, libraries, build tools; monitoring services; data files in batches; dedicated VLAN; shared services: SMTP relay, Internet via corporate infrastructure; firewall rules guard the perimeter security of the Hadoop cluster. Lab-like security works for a small group of people.
  • 13. Limitations of Hadoop • No “Data at Rest” Encryption • A Kerberos-Centric Approach • Limited Authorization Capabilities • Complexity of the Security Model and Configuration. Unfortunately, this is not sufficient for a Data Lake that ingests all the data and caters to thousands of users.
  • 14. Hadoop Security Solutions from Major Vendors • Hortonworks (acquired XASecure to bring ACLs to Hadoop): Apache Ranger, Apache Knox, Apache Falcon • Cloudera (working on Project Rhino): Project Rhino, Apache Sentry
  • 16. Apache Ranger: Apache Ranger currently supports authorization, auditing and security administration for a limited number of HDP components: Hive, HBase, Storm, Knox, HDFS
  • 17. Apache Ranger Goals 1. Centralized security administration to manage all security-related tasks in a central UI or through REST APIs. 2. Fine-grained authorization for specific actions and operations on a Hadoop component or tool, managed through a central administration tool. 3. Standardized authorization across all Hadoop components. 4. Enhanced support for different authorization methods: role-based access control, attribute-based access control, etc. 5. Centralized auditing of user access and (security-related) administrative actions within all Hadoop components.
  • 18. Apache Knox and Hadoop Services. Hadoop services covered: • WebHDFS (HDFS) • Templeton (HCatalog) • Stargate (HBase) • Oozie • Hive/JDBC
  • 19. Apache Falcon • Visualize data pipeline lineage • Track data pipeline audit logs • End-to-end monitoring of data pipelines • Policies for data replication and retention
  • 20. Apache Sentry and Project Rhino
  • 21. Goals of Project Rhino • Provide encryption with hardware-enhanced performance • Support enterprise-grade authentication and single sign-on for Hadoop services • Provide role-based access control in Hadoop with cell-level granularity in HBase • Ensure consistent auditing across essential Apache Hadoop components
  • 22. Apache Sentry and Project Rhino
  • 23. Making the Risk Officer Happy • Hadoop security has more to offer • Role-based access • Audit logging • Data encryption • User Access Management • Security Event Monitoring • Application State Monitoring • Security Testing • Patch Management • Data Protection • Backup and restore. Overlapping vendor efforts, a lack of complete coverage for all products, and varying commitment to open source would slow down the adoption of Hadoop.
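Slide 5 asks for the four-eyes principle when data is entered or changed. The deck does not say how this was enforced; the following is a minimal sketch, assuming a hypothetical `ChangeRequest` object that sits in front of the actual write to Hadoop. It only illustrates the rule itself: the submitter can never approve their own change, and the change is applied only after enough distinct approvers.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """Hypothetical pending change to business data that needs a second person's approval."""
    change_id: str
    submitted_by: str
    description: str
    approved_by: set[str] = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # Four-eyes rule: the person who entered the change may not approve it.
        if approver == self.submitted_by:
            raise PermissionError(f"{approver} cannot approve their own change")
        self.approved_by.add(approver)

    def can_apply(self, required_approvals: int = 1) -> bool:
        # Apply only once enough distinct other users have approved.
        return len(self.approved_by) >= required_approvals


cr = ChangeRequest("CR-1", "alice", "Correct customer segment codes")
cr.approve("bob")
print(cr.can_apply())
```

Keeping approvers in a set means the same second person approving twice still counts once, which matters if a policy requires more than one approval.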
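Slide 6 lists unauthorized attempts to access a table or file as a security event to monitor. At the file level, the HDFS NameNode audit log records an `allowed=true|false` field for each operation. The sketch below flags users with repeated denied accesses. It assumes the common line shape (tab-separated `key=value` pairs after `FSNamesystem.audit:`), which varies between Hadoop versions, and its threshold is an arbitrary illustrative choice. Login attempts and table-level (Hive/HBase) events would come from other logs.

```python
from collections import Counter

# Assumed audit-line marker; the fields after it are tab-separated key=value pairs.
AUDIT_MARKER = "FSNamesystem.audit:"


def parse_audit_line(line: str) -> dict:
    """Return the key=value fields of one audit line, or {} if it is not an audit line."""
    if AUDIT_MARKER not in line:
        return {}
    payload = line.split(AUDIT_MARKER, 1)[1].strip()
    fields = {}
    for part in payload.split("\t"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def denied_access_alerts(lines, threshold: int = 3) -> dict:
    """Count denied file accesses per user; return users at or above the threshold."""
    denied = Counter()
    for line in lines:
        fields = parse_audit_line(line)
        if fields.get("allowed") == "false":
            # ugi looks like "mallory (auth:KERBEROS)"; keep only the user name.
            user = fields.get("ugi", "unknown").split(" ", 1)[0]
            denied[user] += 1
    return {user: n for user, n in denied.items() if n >= threshold}
```

In practice the returned dictionary would feed whatever alerting channel reaches the asset manager or the Master Control Room; that integration is outside this sketch.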
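Slide 8 lists a message authentication code under integrity. As a minimal illustration using Python's standard `hmac` module, a file or record can be tagged with HMAC-SHA256 when it lands in the Data Lake and re-verified later; any modification changes the tag. How the secret key is generated, stored and rotated (for example in an enterprise key management system) is assumed here and not shown.

```python
import hashlib
import hmac


def sign(data: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag (hex) for a data file or record."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Compare tags in constant time; False means the data or the tag was altered."""
    return hmac.compare_digest(sign(data, key), expected_tag)
```

Unlike a plain checksum, an attacker who can modify the data but does not hold the key cannot produce a matching tag.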
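Slide 17 combines three Ranger goals: fine-grained authorization, support for both role-based and attribute-based access control, and centralized auditing. The sketch below shows how these ideas fit together in one decision point. It is not Apache Ranger's API or policy model. `Policy`, `PolicyEngine` and the `"hive:sales."` resource naming are hypothetical. Access is denied unless some policy matches on resource, action, role and an optional attribute condition, and every decision is written to one audit trail.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Policy:
    """Hypothetical allow rule: roles that may perform actions on a resource prefix."""
    resource_prefix: str          # e.g. "hive:sales." (illustrative naming scheme)
    actions: frozenset
    roles: frozenset
    # Optional attribute-based condition evaluated on the request context.
    condition: Callable[[dict], bool] = lambda ctx: True


@dataclass
class PolicyEngine:
    policies: list
    audit_log: list = field(default_factory=list)

    def is_allowed(self, user: str, roles: set, action: str, resource: str, ctx: dict) -> bool:
        # Deny by default: access is granted only if at least one policy matches fully.
        allowed = any(
            resource.startswith(p.resource_prefix)
            and action in p.actions
            and bool(roles & p.roles)      # role-based check
            and p.condition(ctx)           # attribute-based check
            for p in self.policies
        )
        # Every decision, allowed or denied, goes to one central audit trail.
        self.audit_log.append((user, action, resource, "ALLOW" if allowed else "DENY"))
        return allowed


engine = PolicyEngine([
    Policy("hive:sales.", frozenset({"select"}), frozenset({"analyst"}),
           condition=lambda ctx: ctx.get("country") == "NL"),
])
print(engine.is_allowed("ann", {"analyst"}, "select", "hive:sales.orders", {"country": "NL"}))
```

Logging denials as well as grants is what makes the audit trail useful for the security event monitoring on slide 6.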
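Slide 18 lists WebHDFS among the services that Apache Knox fronts. With Knox, clients call the gateway over HTTPS instead of addressing the NameNode directly, and the gateway forwards the REST call into the cluster. The helper below only builds such a URL. The port 8443, the `gateway` path segment and the `sandbox` topology name are values seen in Knox examples and are assumptions here; the host name is hypothetical. Check your own deployment.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host: str, topology: str, hdfs_path: str, op: str,
                     port: int = 8443, gateway_path: str = "gateway", **params) -> str:
    """Build a WebHDFS REST URL routed through a Knox gateway (assumed URL layout)."""
    # Normalize to a single leading slash and percent-encode the HDFS path.
    path = quote("/" + hdfs_path.lstrip("/"))
    query = urlencode({"op": op, **params})
    return f"https://{gateway_host}:{port}/{gateway_path}/{topology}/webhdfs/v1{path}?{query}"


print(knox_webhdfs_url("knox.example.com", "sandbox", "/data/sales", "LISTSTATUS"))
```

The point of the design is that only the gateway host needs to be reachable from user networks, so perimeter rules and authentication can be enforced in one place.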

Editor's Notes

  1. Ask a question about the biggest data security breaches: Target (40 million debit/credit card numbers stolen), Sony Online (102 million records), Home Depot (56 million payment cards). Hadoop security was completely ineffective. APT is real.
  2. We are a bunch of people who get very excited about the technology when we hear about Hadoop. However, when it comes to security, it seems that nobody is bothered about it except the risk officer. This creates some tension between IT, business and risk. Technology has not kept up with marketing.
  3. Marketing and enterprise sales people give a sweet pitch for Hadoop as the right system for the enterprise. Hadoop becomes an important information system asset in the enterprise. Enterprises find Hadoop attractive because of its lower cost. Hadoop analytics is not limited to web logs alone; it also covers data stored in systems of record. Hadoop caters to a diverse group of business and technical users. I see a paradox here: a system for the enterprise whose security the CIO does not bother about.
  4. State monitoring is about monitoring the application settings. Security testing involves static and dynamic code scans. Patch management requires that a patch history is maintained, that systems are tested after patching, and that someone decides which patches are appropriate for the system. Backup covers backup frequency, logging of restore activity, detection of incomplete backups, and safe storage of backups according to the CIA rating.
  5. Typical requirements of user access management are explained. Role-based access.
  6. You can use several techniques to convince your risk officer about data protection. However, as you bring all the data into the data lake, you have to take all of these measures.
  7. A very important Hadoop use case (the Data Lake) puts the Hadoop security story to a hard test. Multitenancy. A beautiful house without door locks.
  8. Multitenancy, workload segregation, user separation. A sanitized Hadoop cluster does not work.
  9. Perimeter security with a stepping stone has its limitations. We had to implement two-factor authentication and put the Hadoop team in a sanitized area. Hadoop provides an all-or-nothing model for security and relied heavily on file system security.
  10. 1. No “Data at Rest” Encryption. Currently, data is not encrypted at rest on HDFS. Organizations with strict requirements for encrypting data in Hadoop clusters are forced to use third-party tools for HDFS disk-level encryption, or security-enhanced Hadoop distributions (like Intel’s distribution from earlier this year). 2. A Kerberos-Centric Approach. Hadoop security relies on Kerberos for authentication. For organizations that use other authentication approaches, this means setting up a separate authentication system in the enterprise. 3. Limited Authorization Capabilities. Although Hadoop can be configured to perform authorization based on user and group permissions and Access Control Lists (ACLs), this may not be enough for every organization. Many organizations use flexible and dynamic access control policies based on XACML and Attribute-Based Access Control. Although it is certainly possible to perform this level of authorization filtering using Accumulo, Hadoop’s authorization credentials are limited. 4. Complexity of the Security Model and Configuration. Several data flows are involved in Hadoop authentication: Kerberos RPC authentication for applications and Hadoop services, HTTP SPNEGO authentication for web consoles, and the use of delegation tokens, block tokens, and job tokens. For network encryption, three mechanisms must also be configured: Quality of Protection (QoP) for SASL mechanisms, SSL for web consoles, and HDFS data transfer encryption. All of these settings need to be configured separately, and it is easy to make mistakes. As the Wall Street Journal reported, Bank of New York Mellon Corp.’s Hadoop system bogged down after too many employees accessed it. Ms. Crisp is hedging her bets by maintaining the bank’s commercial database and data warehouse software.
  11. How Hadoop leaders have responded to these challenges, in addition to several proprietary initiatives that are not covered here.
  12. HDP 2.2 brings a major change in Hadoop security. The acquisition of XASecure has been significant for user access management: role-based access for several components, logging, a single console, and not a single point of failure.
  13. Apache Ranger is very promising from the user access management and security event monitoring perspectives, but not all Hadoop components are covered. Most security is geared toward the consumers of data.
  14. Goals 4 and 5 are very promising features.
  15. The following Hadoop services have integrations with the Knox Gateway: WebHDFS (HDFS), Templeton (HCatalog), Stargate (HBase), Oozie, Hive/JDBC.
  16. Sentry: unified authorization and RBAC; overlaps with Ranger; secure authorization; limited coverage (Hive and Impala); pluggable interfaces, binding with Pig; Cloudera CDH 4.3.
  17. Is Cloudera’s open source commitment a big question mark? DG Secure is an alternative for HDP. Key distribution and management are included. Snapshots, logs, etc. can be encrypted. Crypto codecs. Integration with the PKI infrastructure of a large enterprise is a challenge.
  18. Compared to the previous year, Hadoop security has a lot more to offer, but it is still far from being a complete system suited for Data Lake use cases. You have to mix and match the components, which is hard. Ranger is strong in user access management and security monitoring. Rhino is strong in data protection. Hadoop is ready for the enterprise, but we are still working on readiness.
  19. You can’t make the risk officer very happy. There are all kinds of reasons for not building in security: performance, architecture, “you did not need it before”. Time to improve it.
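Note 9 says the team had to implement two-factor authentication for access through the stepping stone, without naming the mechanism. As one common option, and only as an assumption, here is a minimal standard-library sketch of time-based one-time passwords (TOTP, RFC 6238, built on HOTP from RFC 4226) as a second factor. The secret would be provisioned per user; secret storage and rate limiting are not shown.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226): HMAC-SHA1 plus dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, code: str, at: Optional[float] = None,
                step: int = 30, window: int = 1, digits: int = 6) -> bool:
    """Accept codes from the current step and +/- `window` steps to tolerate clock drift."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + k * step, step, digits), code)
        for k in range(-window, window + 1)
    )
```

A window of one step on either side is a typical compromise between usability and the time an intercepted code stays valid.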
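Note 10 points out that Kerberos authentication, SASL Quality of Protection, SSL for web consoles and HDFS data transfer encryption are all configured separately, which makes mistakes easy. A small automated check against a baseline can catch gaps. The property names below exist in Apache Hadoop (`core-site.xml` and `hdfs-site.xml`). The required values are an assumed strict policy for a Data Lake, not something the presentation prescribes. Missing properties are simply reported as unset rather than resolved to Hadoop's defaults.

```python
import xml.etree.ElementTree as ET

# Assumed strict baseline; adjust to your own security policy.
REQUIRED = {
    "hadoop.security.authentication": "kerberos",  # Kerberos RPC authentication
    "hadoop.security.authorization": "true",       # service-level authorization
    "hadoop.rpc.protection": "privacy",            # SASL QoP: encrypt RPC traffic
    "dfs.encrypt.data.transfer": "true",           # HDFS data transfer encryption
    "dfs.http.policy": "HTTPS_ONLY",               # SSL for the web consoles
}


def load_hadoop_conf(*xml_texts: str) -> dict:
    """Merge <property><name/><value/></property> entries from *-site.xml contents."""
    conf = {}
    for text in xml_texts:
        root = ET.fromstring(text)
        for prop in root.iter("property"):
            name = prop.findtext("name")
            value = prop.findtext("value")
            if name is not None:
                conf[name.strip()] = (value or "").strip()
    return conf


def audit_security(conf: dict) -> list:
    """Return (property, actual, expected) for every setting that misses the baseline."""
    findings = []
    for name, expected in REQUIRED.items():
        actual = conf.get(name, "<unset>")
        if actual.lower() != expected.lower():
            findings.append((name, actual, expected))
    return findings
```

This is deliberately strict: `hadoop.rpc.protection` can hold a comma-separated list in some Hadoop versions, which a production check would need to parse rather than compare as a single string.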