© Cloudera, Inc. All rights reserved.
Enterprise-Ready Data Science:
Scaling, Governance, and Operationalization
Mark Chisam
Senior Solution Engineer
Introducing Cloudera Data Science Workbench
Dr. Daniel Parton
Lead Data Scientist
Operationalizing Data Science for Enterprises
Bardess® is a consulting company focused on
designing and implementing data analytics solutions.
We are a team of data and business professionals,
who ask insightful questions, extend boundaries and
take action.
We transform data into
insights and action, every day.
Bardess Data Practices
MANAGEMENT CONSULTING
• Requirements Discovery
• Strategy + Planning
• Solution Design
DATA OPS
• Ingestion + Shaping
• Data Architecture
• Storage + Processing
DATA SCIENCE
• Predictive Analytics
• Machine Learning
• Artificial Intelligence
DATA ANALYTICS
• Visualization
• Data Discovery
• Dev / Ops
AI
MACHINE LEARNING
DATA SCIENCE
ANALYTICS
"BIG DATA"
WHAT IS A DATA SCIENTIST?
THE GOOD NEWS
Stages: Data Engineering, Data Science (Exploratory), Production (Operational)
• Data has never been more plentiful.
• Open source data science and machine learning libraries are rapidly evolving.
• Commodity (and on-demand) compute makes scalable production machine learning affordable.
Outputs: Reports, Dashboards; Production Data Pipelines; Batch scoring; …
THE BAD NEWS
Stages: Data Engineering, Data Science (Exploratory), Production (Operational)
• Data needs to move across multiple different systems.
• Teams have different, conflicting requests for languages and libraries.
• Most data science is done at small scale, individually, and is difficult to replace.
• Very few models reach production.
THE CHALLENGE
Balance these needs
DATA SCIENCE
• Access to granular data
• Flexibility
• Preferred open source tools
• Elastic provisioning
  • Compute
  • Storage
• Reproducible research
• Path to production
DATA MANAGEMENT
• Security
• Governance
• Standards
• Low maintenance
• Low cost
• Self-service access
THE TYPICAL SOLUTION
“If I can’t use my favorite tools, I’ll…”
• Copy data to my laptop
• Copy data to a data science appliance
• Copy data to a cloud service
Why this is a problem:
• Complicates security
• Breaks data governance
• Adds latency to process
• Makes collaboration more difficult
• Complicates model management and deployment
• Creates infrastructure silos
CLOUDERA DATA SCIENCE WORKBENCH
Accelerate Machine Learning from Research to Production
For data scientists
• Experiment faster: Use R, Python, or Scala with on-demand compute and secure CDH data access
• Work together: Share reproducible research with your whole team
• Deploy with confidence: Get to production repeatably and without recoding
For IT professionals
• Bring data science to the data: Give your data science team more freedom while reducing the risk and cost of silos
• Secure by default: Leverage common security and governance across workloads
• Run anywhere: On-premises or in the cloud
CASE STUDY
Transforming Business Decision-Making with Machine Learning at Scale
Background:
• Retail client aimed to use clustering to understand their most common types of transactions
• And to find which groups of products tend to be purchased together
• Cloudera cluster, storing 2 billion rows of historical transaction data
• Used CDSW to build custom clustering workflow in Spark and Python
Representative image of clustering
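To make the second goal concrete (finding which products tend to be purchased together), here is a minimal sketch in plain Python. It is not the client's workflow, which the slides only describe as a custom clustering workflow in Spark and Python. The function name `co_purchase_counts` and the toy baskets are hypothetical, and simple pair counting stands in for whatever method was actually used.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(transactions):
    """Count how often each unordered pair of products shares a transaction."""
    pair_counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so ("bread", "milk") and ("milk", "bread") match.
        items = sorted(set(basket))
        pair_counts.update(combinations(items, 2))
    return pair_counts


# Hypothetical toy data; the dataset in the case study had 2 billion rows.
transactions = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk", "eggs"],
]

counts = co_purchase_counts(transactions)
top_pairs = counts.most_common(2)  # the two most frequent product pairs
```

At real scale the same counting would have to be distributed across a cluster rather than held in one process; this sketch only shows the shape of the computation.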
Result:
• Clusters describe transactions with far more nuance than the simple category-level aggregations that were previously in use
• Identified major trends in certain types of transaction, worth multiples of $100M
• Clusters are transforming how the company thinks about its business, from shop floor to board level
• Clustering workflow is easily maintainable, reproducible, and scalable
Representative image of clustering
Benefits of CDSW:
• Easy access to big datasets from Cloudera HDFS
• Access to Spark to apply clustering on entire 2 billion row dataset
• Notebook environment allows data scientists to innovate while staying within secure Cloudera environment
• Collaborative environment enabling organized project structure and collaboration within team of data scientists
Representative image of clustering
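Why a distributed engine helps with clustering a dataset this large can be sketched with one k-means update written as a map step per data partition followed by a reduce step. This is an illustration only: the slides do not say which clustering algorithm was used, k-means is assumed here, the code does not use Spark's API, and the function names and toy points are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def partial_sums(partition, centroids):
    """Map step: per-cluster coordinate sums and counts for one partition."""
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in centroids]
    counts = [0] * len(centroids)
    for point in partition:
        k = nearest(point, centroids)
        counts[k] += 1
        for d in range(dims):
            sums[k][d] += point[d]
    return sums, counts


def kmeans_step(partitions, centroids):
    """Reduce step: merge partial results and recompute each centroid."""
    dims = len(centroids[0])
    total_sums = [[0.0] * dims for _ in centroids]
    total_counts = [0] * len(centroids)
    for sums, counts in (partial_sums(p, centroids) for p in partitions):
        for k in range(len(centroids)):
            total_counts[k] += counts[k]
            for d in range(dims):
                total_sums[k][d] += sums[k][d]
    # Keep the old centroid if a cluster received no points.
    return [
        [s / total_counts[k] for s in total_sums[k]]
        if total_counts[k]
        else list(centroids[k])
        for k in range(len(centroids))
    ]


# Hypothetical toy data split into two "partitions".
partitions = [
    [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0)],
    [(2.0, 1.0), (8.0, 9.0), (9.0, 8.0)],
]
centroids = [[0.0, 0.0], [10.0, 10.0]]
for _ in range(5):
    centroids = kmeans_step(partitions, centroids)
```

Each partition only returns small per-cluster sums and counts, so the map step can run in parallel on separate workers and only those summaries need to be combined. That property is what makes this kind of algorithm practical on billions of rows.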
LIVE DEMO
Introducing the Data Science Sandbox
Lovan Chetty
VP, Product
SOLUTION
Data Science Workbench
EDH Stack
+ Option for Altus PaaS & More…
Cloud IaaS (Fully-Managed)
+ BYOL options
End to End Management (Cloud>Cluster>Workload)
24x7 Production DevOps
Security, Governance & Compliance
Workload Optimization
Fully-Managed, Complete Cloud Platform for Analytics and Data Science
DevOps Built-In, Cloudera & Cloud IaaS Included. Fast Setup, Ready in Hours.
Fully-Managed Data Science Sandbox as a Service
WHY CLOUD?
Benefits: Fully-Managed Data Science Sandbox
The Fastest, Most Cost-Effective Way to Expand or Deploy a Modern Platform for Data Science in the Cloud.
• Ready Now, with No New Resources: 24x7 Production DevOps & Monitoring
• Secure, Enterprise-Ready: Hybrid Gateways, Governance, Compliance
• Simple: All-in-one solutions for agility, flexibility in analytics & tools
• Cost-Effective: ½ TCO, Best price-performance, SLA Optimization
www.cazena.com/cloudera
Q&A
Q&A - TECHNICAL PANELISTS
Lovan Chetty
VP, Products
lovan@cazena.com
Dr. Daniel Parton
Lead Data Scientist
dparton@bardess.com
Mark Chisam
Senior Solution Engineer
mchisam@cloudera.com
The Data Science Sandbox as a Service
Try it Now with the FastStart Business Value Pilot: 4 Weeks to a Guaranteed Business Outcome.
Bardess: Data Science & Management Consulting
Philip Duplisey, Senior Director of Consulting
pduplisey@bardess.com
Bardess.com
Cazena: Fully-Managed Cloudera Solutions for Azure & AWS
Sam Berg, VP Sales
sberg@cazena.com
Cazena.com
Cloudera: The Modern Platform for Data Science and Analytics.
Tia Watson, Partner Manager
twatson@cloudera.com
Cloudera.com
THANK YOU

Introducing the data science sandbox as a service 8.30.18
