Introducing the Data Science Sandbox as a Service 8.30.18
© Cloudera, Inc. All rights reserved.
Enterprise-Ready Data Science:
Scaling, Governance, and Operationalization
Mark Chisam
Senior Solution Engineer
Introducing Cloudera Data Science Workbench
Dr. Daniel Parton
Lead Data Scientist
Operationalizing Data Science for Enterprises
Bardess® is a consulting company focused on
designing and implementing data analytics solutions.
We are a team of data and business professionals,
who ask insightful questions, extend boundaries and
take action.
We transform data into
insights and action, every day.
Bardess Data Practices
MANAGEMENT CONSULTING | DATA OPS | DATA SCIENCE | DATA ANALYTICS
• Requirements Discovery
• Strategy + Planning
• Solution Design
• Ingestion + Shaping
• Data Architecture
• Storage + Processing
• Predictive Analytics
• Machine Learning
• Artificial Intelligence
• Visualization
• Data Discovery
• Dev / Ops
AI
MACHINE LEARNING
DATA SCIENCE
ANALYTICS
"BIG DATA"
WHAT IS A DATA SCIENTIST?
THE GOOD NEWS
Data Engineering | Data Science (Exploratory) | Production (Operational)
• Data has never been more plentiful.
• Open source data science and machine learning libraries are rapidly evolving.
• Commodity (and on-demand) compute makes scalable production machine learning affordable.
Production outputs: reports, dashboards, production data pipelines, batch scoring, …
THE BAD NEWS
Data Engineering | Data Science (Exploratory) | Production (Operational)
• Data needs to move across multiple systems.
• Teams have different, conflicting requests for languages and libraries.
• Most data science is done at small scale, by individuals, and is difficult to reproduce.
• Very few models reach production.
THE CHALLENGE
Balance these needs
DATA SCIENCE
• Access to granular data
• Flexibility
• Preferred open source tools
• Elastic provisioning (compute and storage)
• Reproducible research
• Path to production
DATA MANAGEMENT
• Security
• Governance
• Standards
• Low maintenance
• Low cost
• Self-service access
THE TYPICAL SOLUTION
“If I can’t use my favorite tools, I’ll…”
• Copy data to my laptop
• Copy data to a data science appliance
• Copy data to a cloud service
Why this is a problem:
• Complicates security
• Breaks data governance
• Adds latency to process
• Makes collaboration more difficult
• Complicates model management and deployment
• Creates infrastructure silos
CLOUDERA DATA SCIENCE WORKBENCH
Accelerate Machine Learning from Research to Production
For data scientists
• Experiment faster
Use R, Python, or Scala with
on-demand compute and
secure CDH data access
• Work together
Share reproducible research
with your whole team
• Deploy with confidence
Get to production repeatably
and without recoding
For IT professionals
• Bring data science to the data
Give your data science team
more freedom while reducing
the risk and cost of silos
• Secure by default
Leverage common security and
governance across workloads
• Run anywhere
On-premises or in the cloud
CASE STUDY
Transforming Business Decision-Making with Machine Learning at Scale
Background:
• Retail client aimed to use clustering to understand its most common types of transactions, and to find which groups of products tend to be purchased together
• Cloudera cluster storing 2 billion rows of historical transaction data
• Used CDSW to build a custom clustering workflow in Spark and Python
Representative image of clustering
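The deck does not say which clustering algorithm the workflow used. As an illustration only, the sketch below assumes k-means over hypothetical per-transaction feature vectors (here, the share of a basket's spend in two product categories). It is plain Python so it runs anywhere; at the 2-billion-row scale described above, the same assign-and-update loop would instead run distributed, for example with Spark MLlib's `pyspark.ml.clustering.KMeans`. The function `kmeans`, the feature choice, and the sample data are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical feature vector per transaction, e.g. (share of spend on
# grocery, share of spend on electronics). Not taken from the case study.
Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 20,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Minimal Lloyd's k-means; returns (centroids, cluster label per point)."""
    rng = random.Random(seed)  # fixed seed: reruns give identical clusters
    centroids: List[Vector] = [tuple(p) for p in rng.sample(points, k)]
    labels: List[int] = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Vector] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels


# Toy data: three grocery-heavy baskets and three electronics-heavy baskets.
transactions: List[Vector] = [
    (0.9, 0.1), (0.8, 0.2), (0.85, 0.15),
    (0.1, 0.9), (0.2, 0.8), (0.15, 0.85),
]
centroids, labels = kmeans(transactions, k=2, seed=0)
print(labels)
```

Fixing the random seed is one simple way to get the "reproducible" property the case study highlights: rerunning the workflow on the same data yields the same cluster assignments.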
CASE STUDY
Transforming Business Decision-Making with Machine Learning at Scale
Result:
• Clusters describe transactions with far more nuance than the simple category-level aggregations previously in use
• Identified major trends in certain types of transactions, worth multiples of $100M
• Clusters are transforming how the company thinks about its business, from shop floor to board level
• Clustering workflow is easily maintainable, reproducible, and scalable
Representative image of clustering
CASE STUDY
Transforming Business Decision-Making with Machine Learning at Scale
Benefits of CDSW:
• Easy access to big datasets from
Cloudera HDFS
• Access to Spark to apply clustering to the entire 2-billion-row dataset
• Notebook environment lets data scientists innovate while staying within the secure Cloudera environment
• Collaborative environment enabling an organized project structure and collaboration within the team of data scientists
Representative image of clustering
LIVE DEMO
Introducing the Data Science Sandbox
Lovan Chetty
VP, Product
SOLUTION
Fully-Managed Data Science Sandbox as a Service
Fully-Managed, Complete Cloud Platform for Analytics and Data Science
DevOps Built-In, Cloudera & Cloud IaaS Included. Fast Setup, Ready in Hours.
• Data Science Workbench
• EDH (Enterprise Data Hub) Stack
  + Option for Altus PaaS & More…
• Cloud IaaS (Fully-Managed)
  + BYOL (bring your own license) options
• End-to-End Management (Cloud > Cluster > Workload)
• 24x7 Production DevOps
• Security, Governance & Compliance
• Workload Optimization
WHY CLOUD?
The Fastest, Most Cost-Effective Way to Expand or Deploy a Modern Platform for Data Science in the Cloud.
Benefits: Fully-Managed Data Science Sandbox
• Ready Now, with No New Resources: 24x7 Production DevOps & Monitoring
• Secure, Enterprise-Ready: Hybrid Gateways, Governance, Compliance
• Simple: All-in-one solutions for agility and flexibility in analytics & tools
• Cost-Effective: ½ TCO, Best price-performance, SLA Optimization
www.cazena.com/cloudera
Q&A
Q&A - TECHNICAL PANELISTS
Lovan Chetty
VP, Products
lovan@cazena.com
Dr. Daniel Parton
Lead Data Scientist
dparton@bardess.com
Mark Chisam
Senior Solution Engineer
mchisam@cloudera.com
The Data Science Sandbox as a Service
Try it Now with the FastStart Business Value Pilot:
4 Weeks to a Guaranteed Business Outcome.

Bardess: Data Science & Management Consulting
Philip Duplisey, Senior Director of Consulting
pduplisey@bardess.com
Bardess.com

Cazena: Fully-Managed Cloudera Solutions for Azure & AWS
Sam Berg, VP Sales
sberg@cazena.com
Cazena.com

Cloudera: The Modern Platform for Data Science and Analytics
Tia Watson, Partner Manager
twatson@cloudera.com
Cloudera.com