Modern Data Warehouse Fundamentals Part 1
3. © Cloudera, Inc. All rights reserved.
SPEAKERS
Eva Nahari
Director, Product Management
eva.nahari@cloudera.com
David Dichmann
Director, Product Marketing
ddichmann@cloudera.com
5.
LARGE NORTH AMERICAN BANK: FRAUD PREVENTION
• LoB Data Analysts access all data
• Saved $4M+ in deposit fraud
Metrics charted (figures not captured in the transcript): Terabytes, Users, Databases, Queries / Month
6.
GLOBAL PHARMACEUTICAL: NEW PRODUCT DEVELOPMENT
• Curated Use and Agile Discovery with HIPAA compliance
• Accelerated new Drug Development
Metrics charted (figures not captured in the transcript): Use Cases, Users, Fewer Silos, Diverse Data
7.
MAJOR TELCO MANUFACTURER: BUSINESS OPTIMIZATION
• $10M new revenue from optimized marketing
• $30M+ from Price Optimization
• $100K+ from weather correlation
Metrics charted (figures not captured in the transcript): Query Responses, New Sources, Min. Data Sets, Users
8.
NEW TRENDS IN DATA WAREHOUSING
Deeper Business Insights at Extreme Speed and Scale While Managing Cost
• DEEPER business insights
• EXTREME speed & scale
• CONTROLLED resources & costs
9.
NEW TRENDS IN DATA WAREHOUSING
Deeper Business Insights
Protect
● Proactive Fraud Prevention
● Keep up with Regulatory Compliance
● Preempt Cyberthreats
Real-time response on massive data volume and variety
Optimize
● Improve Operational Efficiency
● Support Internet of Things (IoT)
New analytics techniques democratized to all users
Grow
● Customer Sentiment
● Fault Prevention
● Improve Product Quality
● New Revenue Streams
Experimentation and collaboration at scale
10.
NEW TRENDS IN DATA WAREHOUSING
Extreme Speed and Scale
More Data
● Massive amounts handled faster at scale
● More variety from new sources (social media, IoT)
● Insight within minutes of new data arrival
Performance and flexibility at scale
More Workloads
● 100s of production-grade deployments
● Enterprise-grade dependability
● Strict security and governance
On-demand scale out, discovery, collaboration
More People
● 1,000s of new users and new user types
● 1,000s of new use cases
● All skill levels: Analytics, Data Science, and Machine Learning
All workloads with a shared data experience
11.
NEW TRENDS IN DATA WAREHOUSING
Managing Resources and Costs
Optimize Core Processes
● Automation to reduce pressure on organizational bottlenecks
● Consistent user experience
Broaden data reach without increasing IT burden or costs
Self-Service Everything
● Resource provisioning
● Workload development
● Optimizing and troubleshooting
Deliver on increased SLA pressures without runaway cost
Dynamic Consumption
● Transient Workloads
● Short-lived Workloads
● Permanent Workloads
● Public, Private, Hybrid Cloud
Environmental flexibility and adaptive compute, storage
12.
Quickly enable business analytics by sharing petabytes of verified data across thousands of users while exceeding SLA demands and containing costs
13.
TRADITIONAL DATA WAREHOUSE
[Diagram: Structured Data Sources (ERP, CRM, SCM); Transformations, ETL, Staging, ODS; EDW with Master Schema; Data Marts; outputs: Advanced Analytics, Dashboards, Ad Hoc and Canned Reports. Annotations: "Many Months"; "Struggle to handle volume and variety"; "Limited access".]
14.
WHAT CONCEPTS SURVIVE?
• Data Modeling
• Security & Governance
• Reports & Dashboards
15.
WHAT HAS CHANGED?
Traditional DW | Modern DW
Supporting Role | Foundational Role
Primarily Internal | Internal & External
Constrained, Structured | Freeform, Multi-Structured
Planned ETLs | On-Demand Pipelines
Aspects compared (pairing with the rows above is not recoverable from the transcript): Users, Data Exploration, Data Curation, Data & Analytics
16.
WHAT IS NEW?
• Experimentation & Collaboration
• Dynamic Consumption
• Self-Service Everything
17.
MODERN DATA WAREHOUSE
[Diagram: Variety of data sources/types; Data Store; Data Marts; outputs: Advanced Analytics, Dashboards, Ad Hoc and Canned Reports. Annotations: "Within Days"; "Ingest & Store all data at scale"; "Self-serve / On-demand".]
18.
CLOUDERA MODERN DATA WAREHOUSE
The modern platform for machine learning and analytics optimized for the cloud
ANALYTICS, DATA SCIENCE, DATA ENGINEERING, OPERATIONAL DATABASE, EXTENSIBLE SERVICES
Core Services: SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT, INGEST & REPLICATION, DATA CATALOG
Storage Services: Amazon S3, Microsoft ADLS, HDFS, KUDU
19.
A MODERN DATA WAREHOUSE SOLUTION
[Diagram components:]
• Preferred BI & ELT Tools
• SQL Workbench
• Workload XM
• Navigator & Sentry
• Cloudera Manager
• Altus
• Impala: MPP Query Engine
• Hive-on-Spark / Spark: MPP ELT Processing
• Solr: MPP Search Analytics
• Shared Data Experience (SDX)
• Optimized File Formats (Parquet, Avro)
• KUDU | HDFS: Local Storage
• AWS S3 | ADLS: Object Storage
Layers: HYBRID Controls, HYBRID Compute, HYBRID Storage
20.
Proactively Optimize Workloads
WORKLOAD XM
Self Serve Diagnostics and Optimizations
Self Serve Analytics Workbench
Move faster
Serve more users
Reduce IT pressure
21.
EXTREME SPEED & SCALE
Fastest ELT at Scale for Data Engineers
Fastest Self-Service BI at Scale for Analysts & Developers
Impala
• Flexibility at scale
• 1000s of users
• On-demand scale out
• Speed to insight
22.
ON-DEMAND SCALING & MULTI-TENANCY
[Diagram: tenant workloads on shared infrastructure]
• EXPLORE: Discovery (raw)
• EXPERIMENT: Exploration (curated)
• EMERGING LOB: Prep - New Report
• SALES: BI/New Reporting
• EXPERIMENT: Model Build/Test
• DEV & TEST: Prep – Known
• FINANCE: Regular Reporting
Shared Storage (HDFS, KUDU, S3, ADLS)
Shared Metadata, Security, Governance
Zones: Landing Zone, Experimental Zone, Archived Zone, Refined Zone
23.
Stateful Context, Shared Experience
ENABLES FULL FLEXIBILITY AND DYNAMIC CONSUMPTION
24. Confidential-Restricted – For Discussion Purposes Only
CLOUD NATIVE OPTION - ALTUS DW
● Quick time to value - no software or clusters to manage
● Bring warehouse to the data with zero copy simplicity
● Use your security policies with your data - no proprietary stacks
● Apply enterprise governance to transient workloads
● Shared data experience with SDX
● Optimized for Azure & AWS
[Diagram: MULTI-CLOUD PAAS SOLUTION. DATA WAREHOUSE; GOVERNANCE; SECURITY; ALTUS CONTROL PLANE; LIFECYCLE MANAGEMENT; MULTI-CLOUD storage: Amazon S3, Microsoft ADLS.]
25.
FROM ANALYTICS TO MACHINE LEARNING
Moving from Known Questions on Known Data to Unknown Questions on Unknown Data
DATA ENGINEERING + DATA WAREHOUSE
● Run ETL with Spark or partner tools to ingest and process data at any scale
● Assign permissions and classifications once
● Data, along with all data context, is immediately available in the data warehouse for analytical processing and BI use cases
DATA WAREHOUSE + DATA SCIENCE
● Run data science and machine learning analysis to blend, augment, and score data
● Blended and augmented data, along with all data context, is immediately available to business teams and analysts with unified security and governance
BETTER TOGETHER
Cloudera SDX makes it easy for administrators, BI users, and data scientists to work together on a common data set, with consistent data context
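The "assign permissions and classifications once" idea on this slide can be illustrated with a small sketch. This is not the Cloudera SDX API: `SharedCatalog`, `run_workload`, and the dataset, role, and tag names are all hypothetical, chosen only to show several workloads consulting one shared data context, with a derived dataset inheriting the context of its source.

```python
from dataclasses import dataclass, field


@dataclass
class SharedCatalog:
    """Hypothetical shared data context: classifications and grants live in one place."""

    tags: dict = field(default_factory=dict)    # dataset name -> set of classification tags
    grants: dict = field(default_factory=dict)  # role name -> set of tags the role may read

    def classify(self, dataset, *labels):
        self.tags.setdefault(dataset, set()).update(labels)

    def grant(self, role, *labels):
        self.grants.setdefault(role, set()).update(labels)

    def derive(self, child, parent):
        # A derived (blended or scored) dataset inherits its parent's classifications.
        self.tags[child] = set(self.tags[parent])

    def can_read(self, role, dataset):
        if dataset not in self.tags:
            return False  # unknown datasets are denied by default
        # The role must be cleared for every tag carried by the dataset.
        return self.tags[dataset] <= self.grants.get(role, set())


def run_workload(catalog, workload, role, dataset):
    """Every workload consults the same catalog before touching data."""
    if not catalog.can_read(role, dataset):
        raise PermissionError(f"{role} may not read {dataset}")
    return f"{workload} read {dataset} as {role}"


catalog = SharedCatalog()
catalog.classify("customers", "pii")             # classified once, at ingest
catalog.grant("analyst", "pii")                  # permission assigned once
catalog.grant("contractor")                      # role with no cleared tags
catalog.derive("customers_scored", "customers")  # output of a data science job

print(run_workload(catalog, "warehouse", "analyst", "customers"))
print(run_workload(catalog, "data_science", "analyst", "customers_scored"))
```

Because both workloads go through the same `can_read` check, neither the warehouse nor the data science job needs its own copy of the policy, which is the property the slide attributes to a shared data experience.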
26.
TOOLS & FRAMEWORKS FOR SUCCESS
Phases: Evaluate, Plan, Offload (Optional), Optimize
Activities listed on the slide:
• Estimate Effort
• Risk Analysis
• Schema Design
• Test & Validate
• Identify Use Cases
• Impact Analysis
• Set Objectives
• Prioritized Plan
• Initial POC
• Identify Suitable Workloads
• Offload Actions
• Capacity Planning
• Fine Tuning Data Model on Hadoop
• Optimize Queries for Performance
• Validate ROI, Cost
27.
TD BANK: Delivering “Legendary Customer Experience”
CHALLENGES
Significantly improve customer experience with sentiment analysis, behavioral patterns, and predictive modeling
Current system couldn’t handle:
• Centralizing data from thousands of sources
• Demands from increased users and use cases
• Data cost and manageability at scale
RESULTS
• 30% reduction in repeat customer complaints
• 90% productivity improvement for analytics projects
• 60% decrease in data management costs
• 98% decrease in per TB storage costs
SOLUTION
Modern Data Warehouse for customer marketing, fraud analytics and cybersecurity
• Ingest data from 100+ corporate systems
• Centralized data into “the hands of those that need it much more quickly”
• Significantly reduce storage and management costs
https://www.cloudera.com/more/customers/td-bank.html
28.
DEUTSCHE TELEKOM: Fraud reduction and customer retention
CHALLENGES
Improve fraud detection speed to near-real time and respond to network service quality issues before customers notice
Current system couldn’t handle:
• Massive volumes of network data - at higher granularity
• Enterprise view of data - machine learning at scale
• Near-real time fraud detection on incoming data
RESULTS
• 10-20% reduction in revenue loss by increased fraud detection
• 5-10% decrease in customer churn with increased network quality
• 50% increase in overall operational efficiencies with faster analytics
SOLUTION
Modern Data Warehouse to detect fraud patterns and network problems in real-time before business impact
• Quickly analyze massive streaming data sets
• Enterprise grade reliability and stability with shared data experience (no silos)
• Machine learning and fast analytics - real-time
https://www.cloudera.com/more/customers/deutsche-telekom.html
29.
KOMATSU MINING: Optimize Machine Performance
CHALLENGES
Create an Industrial IoT (IIoT) solution for optimizing mining equipment utility and build better next-generation products
Current system couldn’t handle:
• Scale of IoT data
• Demand for new users and use cases
• 30TB/month data growth
RESULTS
• 2X Increase in production hours on key equipment
• Design next-generation equipment: environmentally smarter, more productive, at lower cost
• Meet or exceed all KPIs: “Deliver all of the data with less complexity and significant cost savings”
SOLUTION
Cloud-based IIoT analytics for a full view of mining operations
• Quickly and easily analyze huge volume and variety (time-series, sensor, event, and more) of data
• More use cases and users: “democratizing analytics for different user groups”
• Scale quickly and easily in the cloud
https://www.cloudera.com/more/news-and-blogs/press-releases/2017-11-15-komatsu-helps-improve-mining-performance.html
30.
CLOUDERA DW - PARTING THOUGHTS
• Hybrid Optimized
• Shared Data Experience
• Performance @Scale
Exponential Use Cases, Successful Outcomes