© Cloudera, Inc. All rights reserved.
Cloudera training: secure your Cloudera cluster
The demand for skills is high and Hadoop is the future. Customers
cannot afford to move slowly in staffing their Big Data projects.
Customers are building plans to ensure projects are staffed with
skilled employees, and supported by a qualified services provider.
Job Trends from Indeed.com
What are you most concerned about when it comes to your readiness for big data and Hadoop?
Cloudera MDP webinar poll results, July 2016
Why Cloudera training?
Aligned to best practices and the pace of change
1 Broadest range of courses
Learning paths for Developer, Admin, Analyst
2 Most experienced instructors
More than 40,000 trained since 2009
3 Leader in certification
Over 12,000 accredited Cloudera professionals
4 Trusted source for training
100,000+ people have attended online courses
5 State of the art curriculum
Courses updated as Hadoop evolves
6 Widest geographic coverage
Most classes offered: 50 cities worldwide plus online
7 Most relevant platform & community
CDH deployed more than all other distributions combined
8 Depth of training material
Hands-on labs and VMs support live instruction
9 Ongoing learning
Video tutorials and e-learning complement training
10 Commitment to big data education
University partnerships to teach Hadoop in colleges
Creating leaders in the field
Training enables Big Data solutions and innovation
94% Would recommend or highly recommend Cloudera training to friends or colleagues
66% Draw on lessons from Cloudera training on at least a monthly basis
40% Develop new apps or perform business-critical analyses as a result of training alone
88% Indicate Cloudera training provided the Hadoop expertise their roles require
Sources: Cloudera Past Public Training Participant Study, December 2012; Cloudera Customer Satisfaction Study, January 2013.
What is available from Cloudera University?
• Private training: Course delivered at location of customer choice to internal audience
• Public training: Courses regularly scheduled around the globe. Schedule available on web
• Virtual training: Live training accessed via the internet; available for public and private courses
• OnDemand training: Pre-recorded lecture with identical content/exercises as live training options
• Certification: Rigorously developed and meaningful bodies of knowledge
OnDemand | Virtual live classroom | Private onsite | Public live classroom
Suggested Cloudera University curricula
Developers
• Python/Scala Training
• Developer for Spark and Hadoop
• CCA: Spark and Hadoop Developer
• Spark ML & Kafka modules
• Topic specific training (Search, HBase)
• Hands on practice
• CCP: Data Engineer
Administrators
• Cloudera Administration training
• CCA: Administrator
• Cloudera Security OnDemand
Data Analysts/Data Scientists
• Data Analyst: Using Hive, Pig & Impala
• CCA: Data Analyst
• Cloudera Data Science
Security for Hadoop
Carlo Lazzaris | Technical Instructor
Security Webinar Agenda
1. The need for Hadoop Security
Hacker news and legal regulations
2. Cloudera Security Implementation
Five levels of security
3. How to secure your Cloudera cluster
Cloudera Documentation
Cloudera professional services
Cloudera OnDemand security course
The need for Hadoop security
Unguarded data stores are the victims
Regulatory Compliance
Organizations can be fined up to 4% of annual global turnover or €20 million, whichever is greater, for breaching GDPR
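As a rough worked example of the fine ceiling: GDPR Article 83(5) applies the higher of the two amounts, so the 4% figure only matters once turnover exceeds €500 million. The sketch below assumes whole-euro turnover figures; the function name `gdpr_max_fine` is illustrative and does not come from the slides.

```shell
#!/bin/sh
# Upper bound of a GDPR fine for a given annual global turnover (whole EUR).
# The ceiling is the greater of 4% of turnover and a fixed EUR 20 million.
gdpr_max_fine() {
    turnover_eur=$1
    fixed_cap_eur=20000000
    pct_cap_eur=$(( turnover_eur * 4 / 100 ))
    if [ "$pct_cap_eur" -gt "$fixed_cap_eur" ]; then
        echo "$pct_cap_eur"
    else
        echo "$fixed_cap_eur"
    fi
}

gdpr_max_fine 1000000000   # EUR 1 billion turnover: prints 40000000
gdpr_max_fine 100000000    # EUR 100 million turnover: prints 20000000
```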
Cloudera security implementation
Cloudera Enterprise CDH
The modern platform for machine learning and analytics optimized for the cloud
[Architecture diagram. Labels: Extensible Services, Core Services, Data Engineering, Operational Database, Analytic Database, Data Science, Data Catalog, Ingest & Replication, Security, Governance, Workload Management; Storage Services: S3, ADLS, HDFS, Kudu.]
Open platform services
Built for multi-function analytics | Optimized for cloud
• Unified security – protects sensitive data with consistent controls, even for transient and recurring workloads
• Consistent governance – enables secure self-service access to all relevant data and increases compliance
• Easy workload management – increases user productivity and boosts job predictability
• Flexible ingest and replication – aggregates a single copy of all data, provides disaster recovery, and eases migration
• Shared catalog – defines and preserves structure and business context of data for new applications and partner solutions
Cloudera Enterprise-Grade Security and Governance
Identity (Cloudera Manager)
Validate users by membership in enterprise directory
Technical Concepts: Authentication, User/group mapping
Access (Apache Sentry)
Defining what users and applications can do with data
Technical Concepts: Permissions, Authorization
Visibility (Cloudera Navigator)
Reporting on where data came from and how it’s being used
Technical Concepts: Auditing, Lineage
Data Protection (Navigator Encrypt & Key Trustee)
Shielding data in the cluster from unauthorized visibility
Technical Concepts: Encryption at rest & in motion
Cloudera Certified Technology Partners
[Partner logo diagram, grouped by stage: Data Sources (connected machines/data sources, other data sources), Data Ingest, Process/Refine & Prep, Data Discovery, Advanced Analytics.]
Certified products integrate securely
• Authentication: authenticate via Kerberos or LDAP
• Authorization: work with Apache Sentry for Hive, Impala, Search, HDFS
• Encryption: support HDFS transport encryption, at-rest encryption, and SSL/TLS connection encryption
Vulnerability Response and Process
[Process diagram: vulnerability reports (upstream, internal, external), then fix, then publish.]
Cluster Security Levels
Cloudera Enterprise
The modern platform for machine learning and analytics optimized for the cloud
Enterprise Encryption Performance
Disclaimer
This talk serves as a general guideline for security implementation on Hadoop.
The actual implementation procedures and scope of implementation vary on a case-by-case basis, and should be assessed by Cloudera’s Professional Services team or certified Cloudera SI Partners.
Non-secure #0
Data Free for All
[Diagram: firewall, Active Directory/KDC, datacenter applications, and a Hadoop cluster made up of Cloudera Manager, a gateway node, and Cloudera worker nodes.]
4 modes of Identity Management
1. Simple Authentication
2. Kerberos
3. LDAP
4. SAML
Consideration in large enterprises: file group ownership
• AD integration
• SSSD or Centrify
Simple authentication: detecting the user
[Diagram: firewall, Active Directory, Cloudera Manager, two master nodes (SSSD/Centrify), three worker nodes.]
Simple authentication = no authentication
Minimal Security #1
Reduce Risk Exposure
How it works: Authentication
Web UIs
• LDAP and SAML authentication options
SQL Access
• LDAP/AD and Kerberos authentication options
Command Lines
• Kerberos authentication
• Automation provided by Cloudera Manager to leverage Active Directory (AD)
Authentication flow
1. User authenticates to AD or KDC
2. Authenticated user gets Kerberos ticket
3. Ticket grants access to services, e.g. Impala
[Login form illustration: User [ssmith], Password [*****]]
Kerberos
Strong Authentication
Principal: user@EXAMPLE.COM (primary: user, realm: EXAMPLE.COM)
KDC: Key Distribution Center
• MIT
• ActiveDirectory (more common)
[Diagram: a user authenticates as user@EXAMPLE.COM to the KDC of realm EXAMPLE.COM, then accesses Hadoop.]
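To make the principal anatomy concrete, here is a minimal sketch that splits a Kerberos principal into its parts using plain shell parameter expansion. It assumes the standard `primary[/instance]@REALM` layout; the function name `parse_principal` and the host name in the second call are hypothetical and not taken from the slides.

```shell
#!/bin/sh
# Split a Kerberos principal of the form primary[/instance]@REALM.
# Prints one line: primary=... instance=... realm=...
parse_principal() {
    principal=$1
    case $principal in
        *@*) ;;
        *) echo "missing realm: $principal" >&2; return 1 ;;
    esac
    realm=${principal##*@}      # text after the last '@'
    name=${principal%@*}        # text before the last '@'
    primary=${name%%/*}         # text before the first '/'
    case $name in
        */*) instance=${name#*/} ;;   # text after the first '/'
        *)   instance= ;;             # user principals usually have none
    esac
    printf 'primary=%s instance=%s realm=%s\n' "$primary" "$instance" "$realm"
}

parse_principal user@EXAMPLE.COM
# prints: primary=user instance= realm=EXAMPLE.COM
parse_principal hdfs/node1.example.com@EXAMPLE.COM
# prints: primary=hdfs instance=node1.example.com realm=EXAMPLE.COM
```

Service principals typically carry the host name as the instance, which is why each service on each cluster host ends up with its own principal.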
Kerberos
Considerations in large enterprises
Time synchronization
CM Kerberos Wizard
• Configure AD to create a Kerberos principal for the CM server, and to delegate to CM the ability to create and manage Kerberos principals
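Time synchronization matters because Kerberos tickets carry timestamps, and a KDC or service rejects requests whose clock differs too much from its own. The sketch below checks two epoch timestamps against a tolerance; the 300-second default mirrors the MIT Kerberos default clock skew, and the function name `within_clock_skew` is an illustrative assumption.

```shell
#!/bin/sh
# Succeeds (exit status 0) when two clocks are close enough for Kerberos.
# Arguments: local epoch seconds, remote epoch seconds, optional max skew.
within_clock_skew() {
    skew_seconds=$(( $1 - $2 ))
    if [ "$skew_seconds" -lt 0 ]; then
        skew_seconds=$(( 0 - skew_seconds ))   # absolute difference
    fi
    [ "$skew_seconds" -le "${3:-300}" ]
}

# Example: compare this host's clock with a timestamp obtained elsewhere
# (here the same value is reused, so the check trivially passes).
now=$(date +%s)
if within_clock_skew "$now" "$now"; then
    echo "clock skew OK"
fi
```

In practice the remote timestamp would come from the KDC host, and keeping all cluster hosts on the same NTP source is the usual way to stay inside the tolerance.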
Kerberos Authentication
* LDAP over SSL
Authorization/Access Control
Access Control Lists (ACLs)
• HDFS file ACLs
• YARN job submission
• HBase ACLs
• Oozie ACLs
Sentry-managed role-based access control (RBAC)
• Hive
• Impala
Auditing
Backup/Disaster Recovery
Cloudera Backup/Disaster Recovery (BDR)
• A high performance data replicator
• Copies incremental changes from the source cluster on a specified schedule
Supports
• Kerberos
• Data encryption
• HDFS replication to cloud
Kerberized BDR Best Practice
[Diagram: Cloudera BDR replicates from the production cluster (realm PROD.EXAMPLE.COM) to the DR cluster (realm DR.EXAMPLE.COM); each realm has its own KDC, linked by a cross-realm trust.]
More Security #2
Managed, Secure, Protected
Data In-Motion Encryption
RPC encryption
Data transport encryption
• Supports AES CTR, up to 256-bit key length
HTTP TLS/SSL encryption
• No self-signed certificates in production
[Diagram: an application, two masters, and three workers, with links labeled RPC encryption, transport encryption, and TLS/SSL.]
Data At-Rest Encryption
Transparent encryption
Supports any Hadoop application
Encryption Zone
$ hadoop key create mykey
$ hadoop fs -mkdir /zone
$ hdfs crypto -createZone -keyName mykey -path /zone
[Diagram: directory tree with / containing /tmp and /zone, and files foo and bar; /zone is marked as the encryption zone.]
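The three commands on this slide can be parameterized. The sketch below only prints the command sequence for a given key name and path, so it can be reviewed before anything runs against a cluster; the function name `zone_commands` is hypothetical. The final `hdfs crypto -listZones` step is an added verification suggestion, not part of the slide, and `-mkdir -p` is used so the sequence also works for nested paths.

```shell
#!/bin/sh
# Print (do not execute) the commands that set up an HDFS encryption zone.
# Arguments: key name, absolute HDFS path of the (empty) zone directory.
zone_commands() {
    zone_key=$1
    zone_path=$2
    case $zone_path in
        /*) ;;
        *) echo "zone path must be absolute: $zone_path" >&2; return 1 ;;
    esac
    printf 'hadoop key create %s\n' "$zone_key"
    printf 'hadoop fs -mkdir -p %s\n' "$zone_path"
    printf 'hdfs crypto -createZone -keyName %s -path %s\n' "$zone_key" "$zone_path"
    printf 'hdfs crypto -listZones\n'
}

zone_commands mykey /zone
```

Printing rather than executing keeps the sketch safe to run anywhere; on a real cluster the zone is normally created by an HDFS administrator and the key by a separate key administrator.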
Key Management Server Deployment (non-prod)
[Diagram: HDFS NameNode and client connect to a KMS backed by a Java keystore file.]
Separation of duties
• Encryption Zone Key (EZK) is stored in the KMS server
• HDFS superuser cannot decrypt files
Key Management Server/Key Trustee Server Deployment
[Diagram: HDFS NameNode and client connect to two (or more) Key Trustee KMS instances; behind a firewall, an active Key Trustee Server synchronizes with a passive one.]
KMS+KTS+HSM Deployment
[Diagram: HDFS NameNode and client connect to two (or more) HSM KMS instances; behind a firewall, active and passive Key Trustee Servers synchronize, each with a Key HSM connected to an HSM.]
Troubleshooting: Encryption Performance Anomaly
• Configuration
• AES-NI Hardware acceleration
• OpenSSL library
• Entropy
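Two of the checks above can be sketched on a Linux host: whether the CPU advertises AES-NI, and whether the kernel entropy pool looks healthy. Both functions accept an optional file argument so they can be exercised against sample files. The function names and the 500-bit entropy threshold are illustrative assumptions, not values from the slides.

```shell
#!/bin/sh
# Succeeds when the CPU flags list includes the "aes" (AES-NI) flag.
# Optional argument: cpuinfo file to inspect (defaults to /proc/cpuinfo).
has_aesni() {
    grep -Eq '^flags[[:space:]]*:.*[[:space:]]aes([[:space:]]|$)' \
        "${1:-/proc/cpuinfo}"
}

# Succeeds when available entropy is at least the given number of bits.
# Optional arguments: entropy file, minimum bits (assumed default: 500).
has_enough_entropy() {
    entropy_bits=$(cat "${1:-/proc/sys/kernel/random/entropy_avail}")
    [ "$entropy_bits" -ge "${2:-500}" ]
}

if has_aesni 2>/dev/null; then
    echo "AES-NI: available"
else
    echo "AES-NI: not detected"
fi
if has_enough_entropy 2>/dev/null; then
    echo "entropy: OK"
else
    echo "entropy: low or unreadable"
fi
```

For the OpenSSL item, `openssl version` shows the installed library, and `hadoop checknative` reports whether Hadoop could load its native OpenSSL bindings.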
Fine Grained Access Control with Apache Sentry
Most Security #3
Secure Data Vault
Level 3 Secure Data Vault
• All data, both data-at-rest and data-in-transit, is encrypted
• Key management system is fault-tolerant
• Auditing mechanisms comply with industry, government, and regulatory standards (PCI, HIPAA, NIST, for example)
• Auditing extends from EDH to the other systems that integrate with it.
• Cluster administrators are well-trained
• Security procedures have been certified by an expert
• Cluster can pass technical review
Data Redaction
Personally Identifiable Information (PII)
• PCI-DSS, HIPAA
Best practices followed
Passwords
• stored in credential files, not in configuration
Logs, queries
• Cloudera Manager
Full Encryption
Encrypt Data Spills
• MapReduce
• Impala
• Hive
• Flume
OS-level encryption
• Navigator Encrypt
How to secure your Cloudera cluster
Cloudera Documentation
Cloudera Professional Services security engagement
• Review security requirements and provide an overview of data security policies
• Audit architecture and current systems for security policies and best practices
• Custom tailor a security reference architecture
• Optimize OS and Java to take advantage of hardware-based crypto-acceleration
• Install and configure Kerberos with MIT Kerberos KDC or Active Directory
• Install and configure Sentry and Cloudera Navigator (license required)
• Install and configure Navigator Encrypt and Key Trustee with an HSM root of trust
• Review fine-grained permissions on sample data using Sentry
• Review audit and lineage on sample data using Navigator
• Use Cloudera Manager and Hue to review security integration for users
• Enable and configure HDFS encryption
https://www.cloudera.com/more/services-and-support/professional-services/security-integration-pilot.html
Cloudera online OnDemand security course
• Online self-paced training course: https://ondemand.cloudera.com
• Launch planned for mid-February 2018
• An estimated 3 days' worth of content, covering Cloudera security levels 1 and 2
• Currently about 375 slides, with 9 detailed chapters and 16 instructor demonstrations:
1. Security overview
2. Security Architecture
3. Host Security
4. Encrypting Data in motion
5. Authentication
6. Authorization
7. Encrypting Data at Rest
8. Auditing
9. Additional Considerations: Data Governance
OnDemand security course: instructor-guided demos
1. Potential Attack vectors
2. Securing the cluster hosts
3. Generating and managing keys for TLS
4. Configuring Cloudera Manager for TLS
5. Encrypting Data in Motion
6. Hadoop default authentication
7. Kerberizing Cluster with MIT Kerberos
8. Kerberizing Cluster with Active Directory
9. Configuring authorization with Cloudera Manager
10. Controlling access to YARN
11. Controlling access to HDFS
12. Controlling access to Tables
13. Enabling HDFS Encryption
14. Protecting local data with NavEncrypt
15. Using Navigator for auditing
16. Reassessing cluster security
OnDemand security course disclaimer
THIS IS REALLY IMPORTANT:
The examples in this course are based on CM/CDH 5.12, running in a cloud-based deployment on a
cluster using the CentOS 7.2 operating system.
Given the almost limitless permutations of possible configurations, including different versions of CDH,
Cloudera Manager, operating systems, directory servers, Kerberos servers, web browsers, and other
tools, as well as variations in policies, laws, and practices that affect each organization differently, it's
impossible for a training course to cover all aspects of security.
This course is meant to provide a background that will help you to understand many important concepts
and techniques, but is not intended as a replacement for the relevant documentation or a consulting
engagement with an expert who can provide advice based on your specific requirements.
• Disclaimer: security configurations vary too widely for one course to cover every permutation
• Versions used: CDH 5.12 and CentOS 7.2
OnDemand security course scenario
• Many of our demonstrations are based on a hypothetical scenario
• However, the concepts should apply to nearly any organization
• Loudacre Mobile is a fast-growing wireless carrier
• Employees serving in a variety of roles
• Data ingested from many sources, in many formats
• Data processed by many tools
OnDemand security course environment
Comprehensive demonstration cluster
Sample chapter structure: Encrypting Data in Motion
• Encryption Fundamentals
• Certificates
• Key Management
  • Instructor-Led Demonstration: Generating and Managing Keys for TLS
• Configuring Cloudera Manager for TLS
  • Instructor-Led Demonstration: Configuring Cloudera Manager for TLS
• Encrypting Hadoop’s Data in Motion
  • Instructor-Led Demonstration: Encrypting Hadoop’s Data in Motion
• Essential Points
Register your interest for
OnDemand security course:
peter.rizvi@cloudera.com
Thank you

Más contenido relacionado

La actualidad más candente

Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview Hortonworks
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveDataWorks Summit
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - RangerIsheeta Sanghi
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayDataWorks Summit
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
PostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldPostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldJignesh Shah
 
How to Use EXAchk Effectively to Manage Exadata Environments
How to Use EXAchk Effectively to Manage Exadata EnvironmentsHow to Use EXAchk Effectively to Manage Exadata Environments
How to Use EXAchk Effectively to Manage Exadata EnvironmentsSandesh Rao
 
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayMigrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayDatabricks
 
How to configure a hive high availability connection with zeppelin
How to configure a hive high availability connection with zeppelinHow to configure a hive high availability connection with zeppelin
How to configure a hive high availability connection with zeppelinTiago Simões
 

La actualidad más candente (20)

Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Kudu Deep-Dive
Kudu Deep-DiveKudu Deep-Dive
Kudu Deep-Dive
 
Cloudera Hadoop Distribution
Cloudera Hadoop DistributionCloudera Hadoop Distribution
Cloudera Hadoop Distribution
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
PostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldPostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized World
 
How to Use EXAchk Effectively to Manage Exadata Environments
How to Use EXAchk Effectively to Manage Exadata EnvironmentsHow to Use EXAchk Effectively to Manage Exadata Environments
How to Use EXAchk Effectively to Manage Exadata Environments
 
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayMigrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
 
How to configure a hive high availability connection with zeppelin
How to configure a hive high availability connection with zeppelinHow to configure a hive high availability connection with zeppelin
How to configure a hive high availability connection with zeppelin
 

Similar a Cloudera training: secure your Cloudera cluster

Hadoop security implementationon 20171003
Hadoop security implementationon 20171003Hadoop security implementationon 20171003
Hadoop security implementationon 20171003lee tracie
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoopWei-Chiu Chuang
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera, Inc.
 
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadCloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadDataWorks Summit
 
Seeking Cybersecurity--Strategies to Protect the Data
Seeking Cybersecurity--Strategies to Protect the DataSeeking Cybersecurity--Strategies to Protect the Data
Seeking Cybersecurity--Strategies to Protect the DataCloudera, Inc.
 
The 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: ExposedThe 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: ExposedCloudera, Inc.
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Shravan (Sean) Pabba
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Cloudera, Inc.
 
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Road to Cloudera certification
Road to Cloudera certificationRoad to Cloudera certification
Road to Cloudera certificationCloudera, Inc.
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Stefan Lipp
 
Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?Cloudera, Inc.
 
Optimize your cloud strategy for machine learning and analytics
Optimize your cloud strategy for machine learning and analyticsOptimize your cloud strategy for machine learning and analytics
Optimize your cloud strategy for machine learning and analyticsCloudera, Inc.
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSCloudera, Inc.
 
Machine Learning in the Enterprise 2019
Machine Learning in the Enterprise 2019   Machine Learning in the Enterprise 2019
Machine Learning in the Enterprise 2019 Timothy Spann
 
Turning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data PlatformTurning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data PlatformCloudera, Inc.
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupAndrei Savu
 
One Hadoop, Multiple Clouds
One Hadoop, Multiple CloudsOne Hadoop, Multiple Clouds
One Hadoop, Multiple CloudsCloudera, Inc.
 
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)Cloudera, Inc.
 

Similar a Cloudera training: secure your Cloudera cluster (20)

Hadoop security implementationon 20171003
Hadoop security implementationon 20171003Hadoop security implementationon 20171003
Hadoop security implementationon 20171003
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoop
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
 
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadCloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
 
Seeking Cybersecurity--Strategies to Protect the Data
Seeking Cybersecurity--Strategies to Protect the DataSeeking Cybersecurity--Strategies to Protect the Data
Seeking Cybersecurity--Strategies to Protect the Data
 
The 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: ExposedThe 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: Exposed
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont...
Cloudera training: secure your Cloudera cluster

  • 1. © Cloudera, Inc. All rights reserved. Cloudera training: secure your Cloudera cluster
  • 2. The demand for skills is high and Hadoop is the future. Customers cannot afford to move slowly in staffing their Big Data projects. Customers are building plans to ensure projects are staffed with skilled employees and supported by a qualified services provider. Chart: Job Trends from Indeed.com. Poll: "What are you most concerned about when it comes to your readiness for big data and Hadoop?" (Cloudera MDP webinar poll results, July 2016)
  • 3. Why Cloudera training? Aligned to best practices and the pace of change. 1. Broadest range of courses: learning paths for Developer, Admin, Analyst. 2. Most experienced instructors: more than 40,000 trained since 2009. 3. Leader in certification: over 12,000 accredited Cloudera professionals. 4. Trusted source for training: 100,000+ people have attended online courses. 5. State of the art curriculum: courses updated as Hadoop evolves. 6. Widest geographic coverage: most classes offered; 50 cities worldwide plus online. 7. Most relevant platform & community: CDH deployed more than all other distributions combined. 8. Depth of training material: hands-on labs and VMs support live instruction. 9. Ongoing learning: video tutorials and e-learning complement training. 10. Commitment to big data education: university partnerships to teach Hadoop in colleges.
  • 4. Creating leaders in the field: training enables Big Data solutions and innovation. 94% would recommend or highly recommend Cloudera training to friends or colleagues. 66% draw on lessons from Cloudera training on at least a monthly basis. 40% develop new apps or perform business-critical analyses as a result of training alone. 88% indicate Cloudera training provided the Hadoop expertise their roles require. Sources: Cloudera Past Public Training Participant Study, December 2012; Cloudera Customer Satisfaction Study, January 2013.
  • 5. What is available from Cloudera University? • Private training: course delivered at a location of the customer's choice to an internal audience • Public training: courses regularly scheduled around the globe; schedule available on the web • Virtual training: live training accessed via the internet; available for public and private courses • OnDemand training: pre-recorded lectures with content and exercises identical to the live training options • Certification: rigorously developed and meaningful bodies of knowledge. Delivery formats shown: OnDemand, Virtual live classroom, Private onsite, Public live classroom
  • 6. Suggested Cloudera University curricula. Developers: • Python/Scala Training • Developer for Spark and Hadoop • CCA: Spark and Hadoop Developer • Spark ML & Kafka modules • Topic-specific training (Search, HBase) • Hands-on practice • CCP: Data Engineer. Administrators: • Cloudera Administration training • CCA: Administrator • Cloudera Security OnDemand. Data Analysts/Data Scientists: • Data Analyst: Using Hive, Pig & Impala • CCA: Data Analyst • Cloudera Data Science
  • 7. Security for Hadoop. Carlo Lazzaris | Technical Instructor
  • 8. Security Webinar Agenda. 1. The need for Hadoop security: hacker news and legal regulations. 2. Cloudera security implementation: five levels of security. 3. How to secure your Cloudera cluster: Cloudera documentation, Cloudera Professional Services, Cloudera OnDemand security course
  • 9. The need for Hadoop security
  • 10. Unguarded data stores are the victims
  • 11. Regulatory Compliance: organizations can be fined up to €20 million or 4% of annual global turnover, whichever is higher, for breaching GDPR
  • 12. Cloudera security implementation
  • 13. Cloudera Enterprise CDH: the modern platform for machine learning and analytics optimized for the cloud. Diagram labels: EXTENSIBLE SERVICES, CORE SERVICES, DATA ENGINEERING, OPERATIONAL DATABASE, ANALYTIC DATABASE, DATA CATALOG, INGEST & REPLICATION, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT, DATA SCIENCE, STORAGE SERVICES (S3, ADLS, HDFS, KUDU)
  • 14. Open platform services: built for multi-function analytics, optimized for cloud • Unified security: protects sensitive data with consistent controls, even for transient and recurring workloads • Consistent governance: enables secure self-service access to all relevant data and increases compliance • Easy workload management: increases user productivity and boosts job predictability • Flexible ingest and replication: aggregates a single copy of all data, provides disaster recovery, and eases migration • Shared catalog: defines and preserves structure and business context of data for new applications and partner solutions
  • 15. Cloudera Enterprise-Grade Security and Governance • Identity: validate users by membership in the enterprise directory. Technical concepts: authentication, user/group mapping • Access: defining what users and applications can do with data. Technical concepts: permissions, authorization • Visibility: reporting on where data came from and how it's being used. Technical concepts: auditing, lineage • Data Protection: shielding data in the cluster from unauthorized visibility. Technical concepts: encryption at rest & in motion. Components shown: Cloudera Manager, Apache Sentry, Cloudera Navigator, Navigator Encrypt & Key Trustee
  • 16. Cloudera Certified Technology Partners. Diagram labels: Data Sources; Data Ingest; Process, Refine & Prep; Data Discovery; Advanced Analytics; Connected Machines/Data sources; Other Data Sources
  • 17. A certified product ensures it integrates securely • Authentication: authenticate via Kerberos or LDAP • Authorization: handle Apache Sentry with Hive, Impala, Search, HDFS • Encryption: support HDFS transport encryption and at-rest encryption; support SSL/TLS connection encryption
  • 18. Vulnerability Response and Process. Diagram labels: Vulnerability reports (Upstream, Internal, External), Fix, Publish
  • 19. Cluster Security Levels
  • 20. Cloudera Enterprise: the modern platform for machine learning and analytics optimized for the cloud
  • 21. Enterprise Encryption Performance
  • 22. Disclaimer: This talk serves as a general guideline for security implementation on Hadoop. The actual implementation procedures and scope of implementation vary on a case-by-case basis, and should be assessed by Cloudera's Professional Services team or certified Cloudera SI Partners.
  • 23. Non-secure #0: Data Free for All
  • 24. Diagram labels: Firewall, ActiveDirectory/KDC, Hadoop cluster, Cloudera Manager, Gateway node, Cloudera Worker nodes, Datacenter, Applications
  • 25. 4 modes of Identity Management: 1. Simple Authentication 2. Kerberos 3. LDAP 4. SAML. File group ownership • AD integration • SSSD or Centrify. Consideration in large enterprises. via SSSD via
  • 26. Simple Authentication: detect the user. Diagram labels: Firewall, ActiveDirectory, Cloudera Manager, Master, Master, Worker, Worker, Worker (SSSD/Centrify)
  • 27. Simple authentication = no authentication
  • 28. Minimal Security #1: Reduce Risk Exposure
  • 29. How it works: Authentication • Web UIs: LDAP and SAML authentication options • SQL access: LDAP/AD and Kerberos authentication options • Command lines: Kerberos authentication; automation provided by Cloudera Manager to leverage Active Directory (AD). Flow: user authenticates to AD or KDC (User [ssmith], Password [*****]); authenticated user gets a Kerberos ticket; ticket grants access to services, e.g. Impala
  • 30. Kerberos: strong authentication. Principal user@EXAMPLE.COM (primary: user; realm: EXAMPLE.COM). Hadoop: user@EXAMPLE.COM → user. KDC: Key Distribution Center • MIT • ActiveDirectory (more common)
  • 31. Kerberos: considerations in large corporations. Time synchronization. CM Kerberos Wizard • Configure AD to create a Kerberos principal for the CM server, and to delegate to CM the ability to create and manage Kerberos principals
  • 32. Kerberos: considerations in large corporations. Time synchronization. CM Kerberos Wizard • Configure AD to create a Kerberos principal for the CM server, and to delegate to CM the ability to create and manage Kerberos principals
  • 33. Kerberos Authentication * LDAP over SSL
  • 34. Authorization/Access Control. Access Control Lists (ACLs): HDFS file ACL, YARN job submission, HBase ACLs, Oozie ACL. Sentry managed (RBAC): Hive, Impala
  • 35. Auditing
  • 36. Backup/Disaster Recovery: Cloudera Backup/Disaster Recovery (BDR) • A high-performance data replicator • Copies incremental data on the source cluster at specified schedules. Supports: • Kerberos • Data encryption • HDFS replication to cloud
  • 37. Kerberized BDR Best Practice. Diagram labels: Production (KDC, PROD.EXAMPLE.COM), DR (KDC, DR.EXAMPLE.COM), Cloudera BDR, Cross-realm trust
  • 38. More Security #2: Managed, Secure, Protected
  • 39. Data In-Motion Encryption. RPC encryption. Data transport encryption • Supports AES CTR, up to 256-bit key length. HTTP TLS/SSL encryption • No self-signed certificates in production. Diagram labels: Application, Master, Master, Worker, Worker, Worker; RPC encryption, Transport encryption, TLS/SSL
  • 40. Data At-Rest Encryption. Transparent encryption; supports any Hadoop application. Encryption zone: $ hadoop key create mykey; $ hadoop fs -mkdir /zone; $ hdfs crypto -createZone -keyName mykey -path /zone. Diagram: directory tree with /, /tmp, /zone, foo, bar; /zone is marked as the encryption zone
  • 41. Key Management Server Deployment (non-prod). Diagram labels: HDFS NameNode, Client, KMS, Java Keystore, Keystore file. Separation of duties • Encryption Zone Key (EZK) is stored in the KMS server • HDFS superuser cannot decrypt files
  • 42. Key Management Server/Key Trustee Server Deployment. Diagram labels: HDFS NameNode, Client, Key Trustee KMS, Key Trustee KMS (or more), Firewall, Key Trustee Server (Active), Key Trustee Server (Passive), synchronization
  • 43. KMS+KTS+HSM Deployment. Diagram labels: HDFS NameNode, Client, HSM KMS, HSM KMS (or more), Firewall, Key Trustee Server (Active), Key Trustee Server (Passive), synchronization, Key HSM, Key HSM, HSM, HSM
  • 44. Troubleshooting: Encryption Performance Anomaly • Configuration • AES-NI hardware acceleration • OpenSSL library • Entropy
  • 45. Fine-Grained Access Control with Apache Sentry
  • 46. Most Security #3: Secure Data Vault
  • 47. Level 3: Secure Data Vault • All data, both at rest and in transit, is encrypted • Key management system is fault-tolerant • Auditing mechanisms comply with industry, government, and regulatory standards (PCI, HIPAA, NIST, for example) • Auditing extends from EDH to the other systems that integrate with it • Cluster administrators are well-trained • Security procedures have been certified by an expert • Cluster can pass technical review
  • 48. Data Redaction. Personally Identifiable Information • PCI-DSS, HIPAA best practices followed. Passwords • stored in credential files, not in configuration. Logs, queries • Cloudera Manager
  • 49. Full Encryption. Encrypt data spills • MapReduce • Impala • Hive • Flume. OS-level encryption • Navigator Encrypt
  • 50. How to secure your Cloudera cluster
  • 51. Cloudera Documentation
  • 52. Cloudera Professional Services security engagement • Review security requirements and provide an overview of data security policies • Audit architecture and current systems for security policies and best practices • Custom tailor a security reference architecture • Optimize OS and Java to take advantage of hardware-based crypto-acceleration • Install and configure Kerberos with MIT Kerberos KDC or Active Directory • Install and configure Sentry and Cloudera Navigator (license required) • Install and configure Navigator Encrypt and Key Trustee with an HSM root of trust • Review fine-grained permissions on sample data using Sentry • Review audit and lineage on sample data using Navigator • Use Cloudera Manager and Hue to review security integration for users • Enable and configure HDFS encryption. https://www.cloudera.com/more/services-and-support/professional-services/security-integration-pilot.html
  • 53. Cloudera online OnDemand security course • Online self-paced training course: https://ondemand.cloudera.com • Launch planned for mid-February 2018 • An estimated 3 days' worth of content at Cloudera security levels 1 and 2 • Currently about 375 slides, with 9 detailed chapters and 16 instructor demonstrations: 1. Security Overview 2. Security Architecture 3. Host Security 4. Encrypting Data in Motion 5. Authentication 6. Authorization 7. Encrypting Data at Rest 8. Auditing 9. Additional Considerations: Data Governance
  • 54. OnDemand security course instructor-guided demos: 1. Potential attack vectors 2. Securing the cluster hosts 3. Generating and managing keys for TLS 4. Configuring Cloudera Manager for TLS 5. Encrypting data in motion 6. Hadoop default authentication 7. Kerberizing a cluster with MIT Kerberos 8. Kerberizing a cluster with Active Directory 9. Configuring authorization with Cloudera Manager 10. Controlling access to YARN 11. Controlling access to HDFS 12. Controlling access to tables 13. Enabling HDFS encryption 14. Protecting local data with NavEncrypt 15. Using Navigator for auditing 16. Reassessing cluster security
  • 55. OnDemand security course disclaimer. THIS IS REALLY IMPORTANT: The examples in this course are based on CM/CDH 5.12, running in a cloud-based deployment on a cluster using the CentOS 7.2 operating system. Given the almost limitless permutations of possible configurations, including different versions of CDH, Cloudera Manager, operating systems, directory servers, Kerberos servers, web browsers, and other tools, as well as variations in policies, laws, and practices that affect each organization differently, it's impossible for a training course to cover all aspects of security. This course is meant to provide a background that will help you to understand many important concepts and techniques, but is not intended as a replacement for the relevant documentation or a consulting engagement with an expert who can provide advice based on your specific requirements. • Disclaimers: due to the variety and permutations of security setups • Versions used: CDH 5.12 and CentOS 7.2
  • 56. 57© Cloudera, Inc. All rights reserved. Ondemand security course scenario • Many of our demonstrations are based on a hypothetical scenario • However, the concepts should apply to nearly any organization • Loudacre Mobile is a fast-growing wireless carrier • Employees serving in a variety of roles • Data ingested from many sources, in many formats • Data processed by many tools
  • 57. OnDemand security course environment
  • 58. Comprehensive demonstration cluster
  • 59. Sample chapter structure: Encrypting Data in Motion • Encryption Fundamentals • Certificates • Key Management • Instructor-Led Demonstration: Generating and Managing Keys for TLS • Configuring Cloudera Manager for TLS • Instructor-Led Demonstration: Configuring Cloudera Manager for TLS • Encrypting Hadoop’s Data in Motion • Instructor-Led Demonstration: Encrypting Hadoop’s Data in Motion • Essential Points
  • 60. Register your interest for OnDemand security course: peter.rizvi@cloudera.com
  • 61. Thank you

Editor's notes

  1. Markets, and customers, can only expand as quickly as the human element is able to support them. Right now, demand is very much outpacing the supply of qualified big data professionals. Maintaining a training function is critical for Cloudera because we need to maintain a capable delivery ecosystem that allows our customers to thrive within the Hadoop environment. Recruitment is one option for organizations to overcome this barrier, but that path comes with an additional challenge: finding the right candidates. When it comes to emerging technology skills, it’s a seller’s market. There is significant competition for a finite pool of skilled technologists, and this competition will only increase as the use of this technology increases. Faced with an ever-tightening supply of qualified job applicants, organizations are finding that the costs to recruit new employees far exceed the cost to train existing ones, and also that current employees are more than willing to be trained. The need for IT talent is only going to increase in an ever-expanding range of industries. Consider that by 2020, GE, known primarily as a manufacturer, expects to generate $15 billion from software, which would make it one of the top 10 software companies in the world. Or consider that 70 percent of Monsanto’s total jobs are already in science, technology, engineering, or math. Certainly many of those are in chemical and crop engineering, but increasingly, many are in IT, analytics, the Internet of Things, and digital operations. Monsanto is competing for skills not just with other agribusinesses but with companies in all industries. Organizations need to consider the cost of recruitment and attrition. A majority of analyses on the topic of training confirm that employees who receive training are more likely to remain with their current employers. It allows them to learn new skills, and shows that their employers are investing in them. 
For technologists, Hadoop, Spark, and the other projects that compose our platform open up a world of possibilities and curiosity. The work is challenging and rewarding. We have several customers that build out robust Hadoop training plans as a benefit to their employees, and the returns they see in innovation on the platform and in employee retention make the cost of training a major value when viewed across the spectrum of both short- and long-term returns. The evolution of the data center in the past few decades has meant that IT decisions are now critical not just for back-office operations, but for nearly every aspect of a business. With regard to “big data”, the technologies leveraged are closely linked to an organization’s customers and markets. As such, business leaders are tasked with transforming their business to accommodate the realities of the “data-driven” market. This means in some cases updating hardware and implementing new software, but also upgrading the skills of their internal staff. If the talent of your staff is a concern, you are not alone. Cloudera, and analyst firms such as IDC, have polled organizations about enterprise software deployments. Not surprisingly, one of the primary areas of concern for Cloudera prospects and customers is the skills of their staff. This is a new way of computing, and harnessing the benefits of a Cloudera subscription requires employees who are familiar with the tools included in the platform and understand how best to leverage them for their use case. IDC looked at projects more generally, and solicited input from over 500 managers implementing IT projects on the critical factors in the success of a project. Since we are discussing training and building out a team of experts on this call, I’m guessing you assume it was not the software, clearly defined business objectives, or a solid project plan that predicated success. 
Overwhelmingly, managers ranked the skill and dedication of the project team as the factor that played the largest part in the success of their project. We want to make sure that customers include the human element needed to roll out a successful project as they consider a Cloudera subscription.
  2. I’ve alluded to some of these options earlier in the presentation, but to ensure there is clarity on our delivery options: we offer both public and private training. Public training courses are scheduled around the globe by Cloudera and by our Authorized Training Partners. Authorized Training Partner instructors go through the same procedures as Cloudera instructors, regularly also provide field services in their regions, and allow for local-language delivery in areas where we do not have direct coverage. Public training schedules can be found on Cloudera’s website, where you can search by course title and/or location of interest. Public training is a nice option if you have just a few team members who need training, or you need to get someone ramped up in a short timeframe. Students are able to interact with a live instructor and with peers from other organizations implementing Cloudera solutions. Private training is reserved for a customer who wants their entire team to be trained. Normally we say that if you have seven or more students who need the same training class, it’s worth your while to explore our private training option. We’ll send an instructor to a location of your choice to deliver training specific to your needs. Usually the training is one of the courses that I’ve described earlier in this presentation, but if needed, we can also customize the content to align it with your business objectives. To be clear, “customization” is not new content creation; it is creating an agenda from our portfolio of content that makes sense for the customer. Some examples would be adding Spark ML or JEP to Spark and Hadoop training to make it a five-day course, or cutting Pig from Data Analyst training to make it a three-day course. We generally recommend not trying to customize a course by looking at disparate topics across many classes; it usually ends up having no flow or connection, and the students leave with more questions than answers. 
Our courses build on concepts throughout the duration of the class. Customization is encouraged, but shouldn’t be abused. Private training courses are available for “up to 10” or “up to 16” students. Virtual training is live training that is delivered over the internet. Both public and private classes can be delivered in this manner. From a public perspective, it’s a popular option for individuals who are not local to one of our training locations. Private customers with geographically dispersed teams also find this a way to save on the travel costs it would take to bring the team to a central location. OnDemand training is a library of pre-recorded training classes, which allows for 24x7, self-paced training in a searchable environment. Our entire portfolio of content is available in this format, and students leverage a cloud-based lab environment to complete the same hands-on exercises we deliver in the live classrooms. Courses can be bought as a library, or by individual title. Certification, I’ve touched on earlier. Certifications may be bought in bulk via PO, or purchased directly via our website. Certification candidates are remotely monitored, and are not required to go into a testing center to complete the exam. All you need is an internet connection. Prices range from $295 for CCA level exams to $400 for CCP: Data Engineer, or $600 per CCP: Data Science exam.
  3. … and here is what I talked about in the past three slides, in summary. Over time, we will be adding courses to the Administrator training path focused on Security, Cloud, and Architecture – look for those in the next calendar year. We also have plans to iterate and/or augment our Developer, Data Analyst, and Data Scientist content to reflect the evolution of the technology.
  4. This talk is mainly about security implementation from both an engineering and a support perspective.
  5. Data breach incidents are increasing year by year. This year alone there have been a number of high-profile breaches. Security is built deep into Hadoop, but it does not work out of the box. Rome was not built in a day. As you will learn during your security implementation process, it takes a lot of configuration and best practices to make a secure Hadoop cluster. Good news: Cloudera Manager and Navigator are there to the rescue! Cloudera’s platform is built on top of Apache Hadoop technology. It is the first Hadoop platform to achieve PCI compliance.
  6. New York State Department of Financial Services. Breach Notification; Right to Access; Right to be Forgotten; Data Portability; Privacy by Design; Data Protection Officers.
  7. But obviously it takes more than good people and processes. You need the right technology. Let’s get down to brass tacks on what the software is about. We’re based on an open source core: a complete, integrated enterprise platform leveraging open source. HOSS business model: there is a core set of platform capabilities, and we contribute actively to that community. We layer value-added software on top; that’s how we run our business. But what’s truly differentiating about our platform is the enterprise experience you get. It’s why we’re able to claim 7 of the top ten banks and 9 of the top ten telcos as our customers. For regulated industries, the enterprise experience is critical. Multi-cloud: no vendor lock-in; work in the environment of your choice; better pricing leverage. Managed TCO: multiple pricing and deployment options. Integrated: integrated components with shared metadata, security and operations. Secure: protect sensitive data from unauthorized access with encryption and key management. Compliance: full auditing and visibility. Governance: ensure data veracity.
  8. Apps share data, rather than data being replicated for each app. Lower costs, because there is less data to replicate. More secure, because data is in one central location. Easier to build apps, because data is easily accessible. Open architecture to share data with other teams and workloads, including data science.
  10. As a customer, you will most likely not interact with Cloudera’s platform directly. Typically customers access Cloudera’s platform indirectly through partner products. To ensure that partner products do not weaken the platform’s security, we certify them with security in mind. For the purpose of this talk, I am going to briefly mention Cloudera’s certification process from a security perspective. Customers should also hire Cloudera-certified administrators, or engage professional services from Cloudera SI partners.
  11. A little bit on partner product certification https://docs.google.com/a/cloudera.com/document/d/1XwRV_bVZrM90JsPhHxLYAgd6vCdvT7qQ-k8eIQ2QYsk/edit?usp=sharing
  12. Upstream = reports coming from an Apache project. Each Apache project has a private security@ mailing alias, and we obey Apache’s security policy. Internal = reports coming from within Cloudera. Cloudera Engineering runs several security weakness detection tools looking for security issues in the software. External = reports coming from a third party or a customer.
  13. Cloudera works hard to provide security on top of the big data platform. In this talk, I will present the best practices and common pitfalls of security implementation on Hadoop, based on my experience working with customers. Source: https://www.cloudera.com/documentation/enterprise/latest/topics/sg_edh_overview.html#topic_ads_t2q_1r Achieving data security is costly. Depending on the use cases and the sensitivity of the data, an enterprise may decide which level of security it needs. Typically, enterprises choose to implement security on Hadoop step by step, or hire Cloudera PS to make a custom security implementation plan and complete these steps in one shot.
  14. https://cloudera.app.box.com/files/0/f/6321638305/1/f_56252438130 TPC-DS: the performance impact is very small. This was tested with Key Trustee; HSM is currently very slow. AES-NI. As the results show, the overhead of using encryption was 2% in query execution time and 3.1% in CPU time.
  15. A secure system takes more than just a good product. It also requires experienced people to integrate it and operate it. These people must receive the proper training. Technology: Cloudera’s platform and certified partners’ products, post-sales support. People: Cloudera PS team or SI partners, consulting firms, customer’s admins, users. Process: SOPs, documentation, regular audits, compliance plan (not covered in this talk).
  16. Depend on existing firewalls.
  17. Leverage existing firewall mechanisms in the enterprise to set up the perimeter. This is the first line of defense. The firewall exposes only the gateway nodes for submitting jobs, and the CM and CN interface. System chart: CM, master node (HA), worker nodes, firewalls.
  18. Cloudera’s platform does not manage user authentication. Instead, it relies on an external authentication mechanism for that purpose, such as Kerberos, LDAP or AD. With simple authentication, the user name is taken from the local operating system user name, but it is too much effort to keep accounts consistent across hosts, so use AD + SSSD/Centrify. CDH is composed of many open source projects, and as a result, not all of them support the same set of authentication mechanisms. The supported mechanisms are simple, Kerberos, LDAP and SAML. AD integration: it is likely your enterprise is already using Active Directory for user identity control. Use SSSD instead of LdapGroupsMapping. Create a dedicated OU for the cluster. Use LDAP over SSL. Select a good search base so that AD returns quickly; a slow lookup can stop all operations. LDAP authentication can be used for CM, Hue, Hive and Impala. The latency of LDAP requests and responses is critical for cluster performance.
  19. User identity can be forged easily. It is okay to have unsecured dev cluster, or PoC cluster.
  20. This should be the _minimal_ security requirement for any production cluster. Kerberos is a cryptographic authentication mechanism. Key Distribution Center (KDC). Kerberos principal to user name mapping. Simple authentication = no authentication. Time synchronization: NTP. Keytab handling: a keytab stores the long-term keys derived from a principal’s password and is required for Hadoop services. https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_s3_cm_principal.html CM makes it extremely easy.
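The principal-to-user-name mapping and the time synchronization requirement mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `short_name` and `within_clock_skew` are hypothetical helpers, the mapping mimics the behaviour of Hadoop's default rule (first component of a principal in the default realm), and the 300-second tolerance is MIT Kerberos' default clock skew. A real cluster configures both through its own Kerberos and Hadoop settings.

```python
from datetime import datetime, timedelta

# Assumption: 300 seconds, the default clock-skew tolerance in MIT Kerberos.
MAX_CLOCK_SKEW = timedelta(seconds=300)


def short_name(principal: str, default_realm: str) -> str:
    """Map user@REALM or service/host@REALM to its first component."""
    name, sep, realm = principal.rpartition("@")
    if not sep:
        # No realm given: treat the principal as belonging to the default realm.
        name, realm = principal, default_realm
    if realm != default_realm:
        # A real deployment would need an explicit rule for foreign realms.
        raise ValueError(f"no mapping rule for realm {realm!r}")
    return name.split("/", 1)[0]


def within_clock_skew(client: datetime, kdc: datetime) -> bool:
    """Return True if the two clocks are close enough for Kerberos to work."""
    return abs(client - kdc) <= MAX_CLOCK_SKEW
```

For example, a service principal such as `hdfs/nn01.example.com@EXAMPLE.COM` (a made-up host) maps to the user name `hdfs`, and a host whose clock drifts beyond the tolerance fails authentication, which is why NTP matters.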
  23. Authentication is a prerequisite of authorization. Access control lists (ACLs) restrict who can submit work to dynamic resource pools and administer them.
  24. Cloudera Navigator: enable audit collection; audit log retention. Provenance use case: a number of business decisions and transactions rely on the verifiability of the data used in those decisions and transactions. Data-verification questions might include: How was this mortgage credit score computed? How can I prove that this number on a sales report is correct? What data sources were used in this calculation? Auditing use case: What was a specific user doing on a specific day? Who deleted a particular directory? What happened to data in a production database, and why is it no longer available?
  25. A backup/DR cluster that is purely for DR purpose (replicates between multiple untrusted Kerberos realms) https://blog.cloudera.com/blog/2016/08/considerations-for-production-environments-running-cloudera-backup-and-disaster-recovery-for-apache-hive-and-hdfs/
  26. One Kerberos realm per cluster. BDR runs from the destination, so the destination realm must be configured to trust the source realm. The DR cluster should not be used for any purpose other than DR.
  27. AES/CTR/NoPadding is a cipher suite: AES in counter (CTR) mode with no padding.
  28. At-rest encryption is required by PCI-DSS, FISMA and HIPAA. Separation of duties: NameNode vs KMS. The HDFS superuser cannot decrypt keys. At-rest encryption is more complex than in-transit encryption: because a key is typically not updated for a long time, a more complex mechanism is needed to protect keys. An encryption zone can only be created on an empty directory. The workaround is to run hadoop distcp to copy existing files into the EZ. Supports at most 256-bit encryption. “Always-on encryption zone”/“nested encryption zone” support is in CDH 5.7, but there is no CM support, i.e. it doesn’t work end-to-end.
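The empty-directory constraint and the distcp workaround above can be laid out as an ordered list of commands. The following Python sketch only builds the command lines and does not run them; `encryption_zone_commands` is a hypothetical helper, the key name and paths are placeholders, and the exact flags should be checked against the documentation for the CDH version in use.

```python
from typing import List


def encryption_zone_commands(key_name: str, zone_path: str, source_path: str) -> List[List[str]]:
    """Build, in order, the commands to create a zone and load data into it."""
    return [
        # 1. Create the zone key in the KMS (run as a key administrator).
        ["hadoop", "key", "create", key_name],
        # 2. Create the target directory; it must be empty to become a zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Turn the empty directory into an encryption zone (HDFS superuser).
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # 4. Copy existing data in. Checksums of encrypted and unencrypted
        #    blocks differ, so checksum comparison is skipped.
        ["hadoop", "distcp", "-skipcrccheck", "-update", source_path, zone_path],
    ]


if __name__ == "__main__":
    for cmd in encryption_zone_commands("demo_key", "/data/secure", "/data/plain"):
        print(" ".join(cmd))
```

Splitting key creation (step 1) from zone creation (step 3) mirrors the separation of duties between the key administrator and the HDFS superuser.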
  29. https://www.cloudera.com/documentation/enterprise/latest/topics/encryption_ref_arch.html Deployment considerations: at least 2 KMS proxies and at least 2 Key Trustee Servers (KTS). KTS should be a separate cluster, with a firewall between the two clusters. Key Trustee Servers are active-passive, and each should run on its own host. KTS HA: if either server fails, only reads are allowed; if the active is down, the passive can serve reads but not writes. This does not affect reading or writing encrypted files, but encryption zones cannot be created. There may be more than 2 KMS proxies for load-balancing purposes. KMS is CPU-intensive, so use hardware equivalent to the NameNode’s. Hardware security module (HSM).
  30. Resource planning and requirements. Deployment considerations: at least 2 KMS proxies and at least 2 Key Trustee Servers (KTS), for a total of 4 hosts. KTS should be a separate cluster, with a firewall between the two clusters. Key Trustee Servers are active-passive, and each should run on its own host. KTS HA: if either server fails, only reads are allowed; if the active is down, the passive can serve reads but not writes. This does not affect reading or writing encrypted files, but encryption zones cannot be created. There may be more than 2 KMS proxies for load-balancing purposes. KMS is CPU-intensive, so use hardware equivalent to the NameNode’s. Hardware security module (HSM).
  32. https://cloudera.app.box.com/files/0/f/6321638305/1/f_56252438130 TPC-DS. Misconfiguration: use AES/CTR/NoPadding as the Data Transfer Encryption Algorithm; the default key length is 128-bit/256-bit (managed by CM). Low entropy: check /proc/sys/kernel/random/entropy_avail. Hardware acceleration. OpenSSL library. Entropy configuration.
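A low-entropy host can be spotted by reading the kernel file named above. The Python sketch below is a minimal check, assuming a Linux host; `read_entropy` and `is_entropy_low` are hypothetical helper names, and the 1000-bit threshold is an illustrative value rather than a documented limit.

```python
from pathlib import Path

ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")
# Assumption: an illustrative threshold; pick one that suits your environment.
LOW_ENTROPY_THRESHOLD = 1000


def read_entropy(path: Path = ENTROPY_PATH) -> int:
    """Return the kernel's estimate of available entropy, in bits."""
    return int(path.read_text().strip())


def is_entropy_low(available: int, threshold: int = LOW_ENTROPY_THRESHOLD) -> bool:
    """Return True when the available entropy is below the threshold."""
    return available < threshold


if __name__ == "__main__":
    bits = read_entropy()
    status = "LOW" if is_entropy_low(bits) else "ok"
    print(f"entropy_avail = {bits} bits ({status})")
```

Running a check like this on every cluster host before enabling encryption helps rule out entropy starvation as a cause of slow TLS handshakes or key generation.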
  33. One of the characteristics of the Hadoop platform is that a variety of tools are capable of accessing the same set of data. For example, MapReduce, Hive, Impala, Pig and third-party software can all access HDFS, so unified access control is crucial. Pig, Sqoop and Kafka are also supported by Sentry. If Impala is used, Sentry is a must. By default, Impala can be accessed by the user impala. Third-party BI tools may not support Sentry, in which case access must be enforced through HiveServer2. Migrating from no Sentry to Sentry is a tremendous amount of work, and is hard to roll back.
  34. In regulated industries, regulations such as PCI or HIPAA require redaction of PII (such as SSNs). https://www.cloudera.com/documentation/enterprise/latest/topics/sg_redaction.html https://blog.cloudera.com/blog/2015/06/new-in-cdh-5-4-sensitive-data-redaction/
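The idea behind redaction can be shown with a regular expression that masks SSNs before text reaches a log. This Python sketch illustrates the technique only and is not Cloudera's redaction implementation; `redact_ssns` is a hypothetical helper, and the pattern covers only the dashed 123-45-6789 layout.

```python
import re

# Matches SSNs written as 123-45-6789; other layouts are not covered.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssns(text: str, replacement: str = "XXX-XX-XXXX") -> str:
    """Replace every dashed SSN in the text with a fixed mask."""
    return SSN_PATTERN.sub(replacement, text)


if __name__ == "__main__":
    query = "SELECT name FROM customers WHERE ssn = '123-45-6789'"
    print(redact_ssns(query))
```

A production rule set would need patterns for each kind of sensitive value (card numbers, account IDs) and should be tested against real log samples.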
  36. Intermediate files: certain services may write spilled data outside HDFS, on local disk, so additional configuration is required to ensure that data is encrypted as well. Navigator Encrypt is a kernel module that intercepts I/O requests to encrypted datastores, including log files, config files, temp files and databases.
  37. Other references: https://cloudera.app.box.com/files/0/s/firewall/1/f_202846938208 Ben and Joey were both long time Cloudera Solution Architects