Learn how organizations are deriving unique customer insights, improving product and service efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on AWS. In this webinar, you’ll see how fast and easy it is to deploy a modern data management platform—in your cloud, on your terms.
22. The most sensitive workloads run on AWS
“We can be even more secure in the AWS cloud than in our own datacenters.”
—Tom Soderstrom, CTO, NASA JPL
“We knew the cloud was the only way to get the scalability, speed, and security our customers expect from 3M.”
—Rick Austin, 3M Health Information Systems
“We determined that security in AWS is superior to our on-premises data center across several dimensions, including patching, encryption, auditing and logging, entitlements, and compliance.”
—John Brady, CISO, FINRA (Financial Industry Regulatory Authority)
23. Benefits of a Data Lake - All Data is in One Place
Analyze all of your data, from all of your sources, in a single storage location
“Why is the data distributed in many locations? Where is the single source of truth?”
24. Why Amazon S3 for a Data Lake?
Durable
▪ Designed for 11 9s of durability
Available
▪ Designed for 99.99% availability
High performance
▪ Multipart upload
▪ Range GET
▪ Scalable throughput
Scalable
▪ Store as much as you need
▪ Scale storage and compute independently
▪ No minimum usage commitments
Integrated Partner Tools
▪ Cloudera EDH
▪ Cloudera Altus
▪ Cloudera Impala
Easy to use
▪ Simple REST API
▪ AWS SDKs
▪ Simple management tools
▪ Event notification
▪ Lifecycle policies
25. Data Ingestion into Amazon S3
▪ AWS Direct Connect
▪ AWS Snowball
▪ ISV Connectors (Kafka/Flume)
▪ Amazon Kinesis Firehose
▪ Amazon S3 Transfer Acceleration
▪ AWS Storage Gateway
26. Strong Security Controls
Security
▪ Identity and Access Management (IAM) policies
▪ Bucket policies
▪ Access Control Lists (ACLs)
▪ Private VPC endpoints to Amazon S3
▪ Amazon S3 object tagging to manage access policies
Encryption
▪ SSL endpoints
▪ Server-side encryption (SSE-S3)
▪ Server-side encryption with provided keys (SSE-C, SSE-KMS)
▪ Client-side encryption
Compliance
▪ Bucket access logs
▪ Lifecycle management policies
▪ Access Control Lists (ACLs)
▪ Versioning and MFA deletes
▪ Certifications—HIPAA, PCI, SOC 1/2/3, etc.
27. Move to AWS - Strengthen your security posture
▪ Automate with deeply integrated security tools and services
▪ Inherit global security and compliance controls
▪ Highest standards for privacy and data security
▪ Largest network of security partners and solutions
▪ Scale with superior visibility and control that satisfies the most risk-sensitive orgs
28. Highest standards for privacy
▪ Encrypt data in transit and at rest, with keys managed by AWS Key Management Service (KMS), or manage your own encryption keys with AWS CloudHSM using FIPS 140-2 Level 3 validated HSMs
▪ Meet data residency requirements: choose an AWS Region and AWS will not replicate your data elsewhere unless you choose to do so
▪ Access services and tools that enable you to build GDPR-compliant infrastructure on top of AWS
▪ Comply with local data privacy laws by controlling who can access content, its lifecycle, and its disposal
Let’s keep this interactive. Please do ask questions as we go along.
We’ll start with an overview of our strategy, which has three pillars.
First is a multi-function platform with both machine learning and analytics. For the work our customers are doing, siloed products won’t get it done.
Next is the flexibility to choose the deployment that best meets the needs of their applications, data, and security/governance requirements.
Last is a framework to ensure consistency across applications and deployments.
Let’s go deeper into these.
Our customers are the Global 5000, and for these companies, the complex workloads they are running require more than a point product. So, we provide a platform that covers data engineering, data warehousing, data science, and operational analytics.
The platform also includes data ingestion, such as with Kafka, and other components such as Apache Solr, which provides capabilities to analyze text and logs.
Companies have the option of pay-as-you-go usage-based pricing, a node-based license subscription, pre-paid cloud credits, as well as a free version that can be deployed in the cloud.
Hadoop and Spark are the starting point, but they’re not everything customers need.
So, those are some of the kinds of applied machine learning Research & Advising capabilities that Cloudera focuses on to help our clients be successful with enterprise machine learning.
We also couple this with Professional Services & Training, and with our modern, unified Data Platform and enterprise Data Science tooling.
I’ll spend the rest of this talk focusing on the latter capabilities.
*** Old notes / reference ***
With our modern, open platform and enterprise tools, we enable clients to build and deploy AI solutions at scale, efficiently and securely, anywhere they want. And we couple that with Cloudera Fast Forward Labs expert guidance to help clients realize their AI future, faster.
Ideal Foundation: Agile platform to build, train, and deploy scalable ML applications
Cloudera's modern platform with SDX enables secure, shared data access with consistent context, breaking down data & workflow silos
Combines data warehousing and ML on a single platform that runs anywhere, at scale
Built on open tech for future-proof innovation
Enterprise ML Made Easy: Enterprise data science tools to accelerate team productivity
CDSW eases the machine learning workflow
Supports modern, open data science and ML tooling and team collaboration for innovation & agility
With enterprise-grade data management, security, and governance
Fast track to value & scale: Expert guidance, services & training to fast track value & scale
Cloudera Fast Forward Labs helps you design & execute your ML strategy
Enables rapid, practical application of emerging ML technologies to your business
Cloudera PS for proven delivery of scalable, production-grade ML systems
So we introduced Cloudera SDX, or Shared Data Experience, the foundation of Cloudera Enterprise.
SDX makes it possible for companies to run dozens, even hundreds, of analytic applications against a common pool of data. One logical cluster provides a shared data experience to multiple workloads and tenants.
SDX applies a centralized, consistent framework for catalog, security, governance, management, data ingest and more.
It makes it faster, easier, and safer for organizations, teams, people to develop and deploy high-value, multi-function use cases like customer next best offer, clinical prediction, and risk modeling.
SDX cuts through silos to unify data, analytics, management, security, and governance, and empowers self-service
It combines the strengths of on-premises and cloud-only deployments:
* multi-function support
* shared data experience
* information security model
* cost management
* tenant isolation
* workload elasticity
* self service
* speed of deployment
- Cloudera Infosec wanted to use Apache Spot to analyze security events in our network
- Our IT didn't want them to run their workload on the production cluster due to typical isolation and uptime concerns for business-critical workloads.
- They were running on their own cluster, but that was underutilized and a waste of money
- So, they migrated the workload to Altus Services
- After using Altus Services, the costs dropped by 50% due to better utilization.
Since we’re discussing how to migrate Hadoop workloads to AWS, we’re aware of how important it is to break down data silos and build a well-governed data lake that different business units can subscribe to in order to fulfill their analytics needs. AWS adds a global dimension to the concept of a data lake: you can build a policy-driven data lake that respects geographic boundaries, not just from a data storage perspective but also from a data processing standpoint.
Amazon S3 is a global service that allows you to store data in 18 regions around the world. S3 is a highly available, web-scale object store designed for 11 9s of durability. It offers infinitely scalable data storage infrastructure at a very low cost compared to HDFS. S3 is designed to be highly flexible: you can store any data in any format you want, including Hadoop-compatible formats like Parquet, ORC, Avro, JSON, and CSV. And you can access it in a variety of ways, such as over the REST API, command-line tools, or the Hadoop S3A client.
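To make this concrete, here is a minimal sketch of writing and reading an object with the AWS SDK for Python (boto3); the bucket and key names are placeholders, not resources from this deck.

import boto3

s3 = boto3.client("s3")

# Upload a local Parquet file into the data lake bucket (placeholder names)
s3.upload_file("events.parquet", "example-datalake-bucket", "raw/events/events.parquet")

# Read the same object back over the S3 API
obj = s3.get_object(Bucket="example-datalake-bucket", Key="raw/events/events.parquet")
data = obj["Body"].read()

# The same object is also reachable from Hadoop/Spark through the S3A client, e.g.
#   spark.read.parquet("s3a://example-datalake-bucket/raw/events/")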
Almost all AWS partner products that work with data are integrated with S3, including Cloudera EDH, Cloudera Altus, and Cloudera Impala.
And there are a host of options for bringing data into S3.
If the majority of your data is on-premises, you can use AWS Direct Connect to establish a high-throughput, dedicated connection from your premises to AWS. Once you have Direct Connect in place, you can use the tools of your choice to send the data to S3.
If you have data in the range of terabytes to petabytes and sending it over the network is not time-efficient, you can use AWS Snowball devices for secure physical transport.
For streaming data, you can use Flume, Kafka, or Amazon Kinesis Firehose to land that data in S3.
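As a rough sketch of that streaming path, the snippet below pushes JSON records to a Kinesis Data Firehose delivery stream that is assumed to already exist with an S3 destination; the stream name is a placeholder.

import boto3, json

firehose = boto3.client("firehose")

# "example-to-s3" is a hypothetical delivery stream configured to deliver into S3
firehose.put_record(
    DeliveryStreamName="example-to-s3",
    Record={"Data": json.dumps({"event": "click", "user": 42}).encode() + b"\n"},
)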
S3 Transfer Acceleration enables fast data transfer over long distances between your client and an S3 bucket. For example, if a user in Australia is trying to upload data to an S3 bucket in the US, they can take advantage of S3 Transfer Acceleration, which makes use of globally distributed edge locations; once the data arrives at an edge location, it is routed to S3 over an optimized network path.
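Here is a hedged sketch of what enabling and using Transfer Acceleration could look like with boto3; the bucket name is a placeholder.

import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Turn on Transfer Acceleration for the (placeholder) bucket
s3.put_bucket_accelerate_configuration(
    Bucket="example-datalake-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Uploads from this client now go through the accelerated edge endpoint
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big-file.bin", "example-datalake-bucket", "uploads/big-file.bin")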
You also have the option of using AWS Storage Gateway, which can expose an S3 bucket as an NFS mount that you can use to store and retrieve data. You can also use cloud-backed storage volumes to asynchronously back up point-in-time snapshots of your data to S3.
As you can see, S3 allows you to build a truly global, policy-driven data lake.
Also, you get strong security controls with S3.
You can securely send your data to S3 via SSL endpoints.
You can encrypt data at rest. With S3 server-side encryption, you can configure your S3 buckets to automatically encrypt data before storing it. You can use AWS Key Management Service (KMS) if you wish to control the encryption keys.
In addition to that, you can use your own encryption libraries to encrypt the data before storing it in S3.
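As a minimal sketch (placeholder bucket name and key alias), default server-side encryption with a KMS key can be configured on the bucket like this:

import boto3

s3 = boto3.client("s3")

# Default-encrypt every new object in the bucket with a customer-managed KMS key
s3.put_bucket_encryption(
    Bucket="example-datalake-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/example-datalake-key",  # placeholder key alias
                }
            }
        ]
    },
)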
There are a number of ways through which you can control access to your data.
You can use IAM policies and bucket policies, which define which user, group, or role can access what resources and data.
You can use VPC endpoints to further lock down your S3 buckets so that they can only be accessed from your logically isolated section of the AWS cloud.
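To illustrate, here is a sketch of a bucket policy that denies any access that does not come through a specific VPC endpoint; the bucket name and endpoint ID are placeholders.

import boto3, json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::example-datalake-bucket",
                "arn:aws:s3:::example-datalake-bucket/*",
            ],
            # Placeholder VPC endpoint ID
            "Condition": {"StringNotEquals": {"aws:sourceVpce": "vpce-0123456789abcdef0"}},
        }
    ],
}

boto3.client("s3").put_bucket_policy(
    Bucket="example-datalake-bucket", Policy=json.dumps(policy)
)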
You can use tags to classify your data and define fine-grained access control based on them.
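For example, here is a sketch of tagging an object with a classification so that an IAM policy condition on s3:ExistingObjectTag/classification could restrict who may read it; names and values are placeholders.

import boto3

s3 = boto3.client("s3")

# Tag an existing object; access policies can then match on this tag
s3.put_object_tagging(
    Bucket="example-datalake-bucket",
    Key="raw/events/events.parquet",
    Tagging={"TagSet": [{"Key": "classification", "Value": "pii"}]},
)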
From a compliance perspective:
S3 captures access logs, giving you a full audit trail of who has accessed what data, when, and from where.
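A minimal sketch of turning on server access logging, delivering logs into a separate (placeholder) logging bucket:

import boto3

s3 = boto3.client("s3")

# Deliver access logs for the data lake bucket into a dedicated logging bucket
# (the target bucket must grant the S3 log delivery group permission to write)
s3.put_bucket_logging(
    Bucket="example-datalake-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "example-logging-bucket",
            "TargetPrefix": "s3-access-logs/",
        }
    },
)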
You can version your objects and set up MFA delete as an extra layer of protection.
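And a sketch of enabling versioning on the (placeholder) bucket; turning on MFA delete additionally requires the root user's MFA device, passed via the MFA argument.

import boto3

s3 = boto3.client("s3")

# Keep every version of every object written to the bucket
s3.put_bucket_versioning(
    Bucket="example-datalake-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# MFA delete would use VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"}
# together with MFA="<device-serial> <code>" supplied by the root user.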
S3 is compliant with HIPAA, PCI, and SOC 1, 2, and 3, giving you even more confidence that you can safely store and process sensitive data.