Se ha denunciado esta presentación.
Utilizamos tu perfil de LinkedIn y tus datos de actividad para personalizar los anuncios y mostrarte publicidad más relevante. Puedes cambiar tus preferencias de publicidad en cualquier momento.
INTRODUCING CLOUDERA MACHINE LEARNING
Cloud-Native Platform for Enterprise Scale Machine Learning
Tristan Zajonc, CTO for ...
2 © Cloudera, Inc. All rights reserved.
DISCLAIMER
The information in this document is proprietary to Cloudera. No part of...
Confidential-Restricted – For Discussion Purposes
Only
3 © Cloudera, Inc. All rights reserved.
Let’s get started. Have you...
4 © Cloudera, Inc. All rights reserved.
THE OPPORTUNITY: THE INDUSTRIALIZATION OF AI
PROTECT
business
CONNECT
products &
s...
5 © Cloudera, Inc. All rights reserved.
A TIPPING POINT OF DATA, COMPUTE, AND ALGORITHMS
Three major trends enable the pot...
6 © Cloudera, Inc. All rights reserved.
7 © Cloudera, Inc. All rights reserved.
MACHINE LEARNING AT UBER, NETFLIX, AND FACEBOOK
Industrialized AI requires require...
8 © Cloudera, Inc. All rights reserved.
UBER
Michelangelo: Uber’s Machine Learning Platform
Source: https://eng.uber.com/m...
9 © Cloudera, Inc. All rights reserved.
NETFLIX
Netflix recommendation platform
Source: https://medium.com/netflix-techblo...
10 © Cloudera, Inc. All rights reserved.
FACEBOOK
Facebook’s AI infrastructure
Source: https://www.slideshare.net/KarthikM...
11 © Cloudera, Inc. All rights reserved.
LESSON 1: ML AT SCALE REQUIRES A UNIFIED DATA STRATEGY
Streaming
Ingest
Batch Ing...
12 © Cloudera, Inc. All rights reserved.
LESSON 2: EMERGING CONSENSUS ON MODERN ML STACK
Big Data
Platform
ML/AI
Framework...
Confidential-Restricted – For Discussion Purposes
Only
13 © Cloudera, Inc. All rights reserved.
LESSON 3: AGILITY REQUIRES...
Confidential-Restricted – For Discussion Purposes
Only
14 © Cloudera, Inc. All rights reserved.
LESSON 4: LEVERAGE THE BES...
15 © Cloudera, Inc. All rights reserved.
TODAY: CLOUDERA DATA SCIENCE WORKBENCH
Accelerate Machine Learning from Research ...
16 © Cloudera, Inc. All rights reserved.
TODAY: CURRENT CDSW ARCHITECTURE
Containerized gateway to CDH/HDP cluster for dat...
17 © Cloudera, Inc. All rights reserved.
INTRODUCING
18 © Cloudera, Inc. All rights reserved. 18
enable the end-to-end machine learning lifecycle for teams
building thousands ...
19 © Cloudera, Inc. All rights reserved.
CLOUDERA MACHINE LEARNING (PRIVATE PREVIEW)
Cloud-native enterprise machine learn...
20 © Cloudera, Inc. All rights reserved.
Requires and extends CDH/HDP, pushing
distributed compute to the cluster
CDH / HD...
21 © Cloudera, Inc. All rights reserved.
• Lightweight form-factor optimized for cloud and private cloud
• Elastic compute...
22 © Cloudera, Inc. All rights reserved.
cldr ml run --cmd “python train.py”
23 © Cloudera, Inc. All rights reserved.
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
r...
24 © Cloudera, Inc. All rights reserved.
DEMO
Confidential-Restricted – For Discussion Purposes Only25 © Cloudera, Inc. All rights reserved.
OPERATIONAL EXPERIENCE
✓ Si...
Confidential-Restricted – For Discussion Purposes
Only
26 © Cloudera, Inc. All rights reserved.
What capability of Clouder...
© Cloudera, Inc. All rights reserved.
Request access to the private preview
tiny.cloudera.com/CML
QUESTIONS?
THANK YOU
29 © Cloudera, Inc. All rights reserved.
Kubernetes
CMLCML
Engine
Spark
Driver
Spark
Executor
Logging Logging
Project
Volu...
Próxima SlideShare
Cargando en…5
×

Introducing Cloudera Machine Learning: Cloudera’s New Cloud-Native Platform for Enterprise Scale Machine Learning 1.15.19

875 visualizaciones

Publicado el

In this webinar, we introduce Cloudera Machine Learning, Cloudera’s new cloud-native machine learning platform powered by Kubernetes that makes it easy to quickly provision environments and scale resources for end-to-end data engineering and machine learning workflows, with secure data access and a unified experience across on-premises, public and hybrid cloud environments.

Publicado en: Tecnología
  • The #1 Woodworking Resource With Over 16,000 Plans, Download 50 FREE Plans... ♣♣♣ http://tinyurl.com/y3hc8gpw
       Responder 
    ¿Estás seguro?    No
    Tu mensaje aparecerá aquí

Introducing Cloudera Machine Learning: Cloudera’s New Cloud-Native Platform for Enterprise Scale Machine Learning 1.15.19

  1. 1. INTRODUCING CLOUDERA MACHINE LEARNING Cloud-Native Platform for Enterprise Scale Machine Learning Tristan Zajonc, CTO for Machine Learning
  2. 2. 2 © Cloudera, Inc. All rights reserved. DISCLAIMER The information in this document is proprietary to Cloudera. No part of this document may be reproduced or disclosed without the express prior written permission of Cloudera. This document is not binding upon Cloudera in any way (including with respect to Cloudera’s course of business, product strategy, future releases, or development commitments), and is not a part of or subject to any license agreement or other agreement you may have with Cloudera. Cloudera makes no commitments about any future developments or functionality in any Cloudera product. The development, release, and timing of release of any software features or functionality described in this document remains at the discretion of Cloudera and you should not rely on any statements about development plans or anticipated functionality in this document when making any purchasing decisions. Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not warrant the accuracy or completeness of the information, text, graphics, links or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose or non-infringement. Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect or consequential damages that may result from the use of these materials.”
  3. 3. Confidential-Restricted – For Discussion Purposes Only 3 © Cloudera, Inc. All rights reserved. Let’s get started. Have you • Used Spark? • Used Cloudera Data Science Workbench? • Used Kubernetes? • Used GPUs for deep learning? POLL QUESTION 1
  4. 4. 4 © Cloudera, Inc. All rights reserved. THE OPPORTUNITY: THE INDUSTRIALIZATION OF AI PROTECT business CONNECT products & services (IoT) DRIVE customer insights Thousands of use cases for ML and AI to unlock value within the enterprise ● Predictive maintenance ● Logistics optimization ● Self-driving cars ● Medical diagnostics ● Marketing effectiveness ● Next best action ● Insider threat prevention ● Fraud prevention ● Payment integrity
  5. 5. 5 © Cloudera, Inc. All rights reserved. A TIPPING POINT OF DATA, COMPUTE, AND ALGORITHMS Three major trends enable the potential of machine learning COMPUTEDATA! ALGORITHMS
  6. 6. 6 © Cloudera, Inc. All rights reserved.
  7. 7. 7 © Cloudera, Inc. All rights reserved. MACHINE LEARNING AT UBER, NETFLIX, AND FACEBOOK Industrialized AI requires requires new supporting tools and platforms Facebook FBLearner Uber Michelangelo Netflix Recommendation Platform
  8. 8. 8 © Cloudera, Inc. All rights reserved. UBER Michelangelo: Uber’s Machine Learning Platform Source: https://eng.uber.com/michelangelo/ DATA CENTER PUBLIC CLOUD
  9. 9. 9 © Cloudera, Inc. All rights reserved. NETFLIX Netflix recommendation platform Source: https://medium.com/netflix-techblog/netflix-at-spark-ai-summit-2018-5304749ed7fa PUBLIC CLOUD
  10. 10. 10 © Cloudera, Inc. All rights reserved. FACEBOOK Facebook’s AI infrastructure Source: https://www.slideshare.net/KarthikMurugesan2/facebook-machine-learning-infrastructure-2018-slides DATA CENTER
  11. 11. 11 © Cloudera, Inc. All rights reserved. LESSON 1: ML AT SCALE REQUIRES A UNIFIED DATA STRATEGY Streaming Ingest Batch Ingest Machine Learning Tools BI Tools and SQL Editors Data Products DATA, METADATA, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT MACHINE LEARNING DATA ENGINEERING DATA WAREHOUSE OPERATIONAL DATABASE
  12. 12. 12 © Cloudera, Inc. All rights reserved. LESSON 2: EMERGING CONSENSUS ON MODERN ML STACK Big Data Platform ML/AI Frameworks Container Infrastructure
  13. 13. Confidential-Restricted – For Discussion Purposes Only 13 © Cloudera, Inc. All rights reserved. LESSON 3: AGILITY REQUIRES SELF-SERVICE PLATFORM Manage pipelines + models Deploy models Automate pipelines Monitor performance DEPLOYDEVELOP Make teams more productive Explore data Develop reports, pipelines, models Collaborate with peers TRAIN Scale resources efficiently Train models Tune parameters Track performance End-to-end machine learning infrastructure for teams building at scale MANAGE Run anywhere with a common architecture Manage access and resources Scale cost with usage
  14. 14. Confidential-Restricted – For Discussion Purposes Only 14 © Cloudera, Inc. All rights reserved. LESSON 4: LEVERAGE THE BEST INFRASTRUCTURE FOR TASK Non-Cloud Private Cloud Public Cloud PaaS I want to maximize • Cost-efficiency • Control, elasticity, and convenience • Control, elasticity, and convenience • Agility I want to minimize • Dependence on unproven technology • Resource contention between departments • Dependence on data center floor space • Dependence on IT and therefore need as simple as possible I want to standardize • On whatever provides the best ROI • On a single environment for the entire data center • On a single cloud provider for all infrastructure needs • On whatever is easiest to use I want to store my data • On premises because cheaper and/or more secure • On premises due to company / government mandate • In the cloud because easier • In the cloud because easier
  15. 15. 15 © Cloudera, Inc. All rights reserved. TODAY: CLOUDERA DATA SCIENCE WORKBENCH Accelerate Machine Learning from Research to Production on CDH/HDP For data scientists • Experiment faster Use R, Python, or Scala with on- demand compute and secure CDH data access • Work together Share reproducible research with your whole team • Deploy with confidence Get to production repeatably and without recoding For IT professionals • Bring data science to the data Give your data science team more freedom while reducing the risk and cost of silos • Secure by default Leverage common security and governance across workloads • Run anywhere On-premises or in the cloud
  16. 16. 16 © Cloudera, Inc. All rights reserved. TODAY: CURRENT CDSW ARCHITECTURE Containerized gateway to CDH/HDP cluster for data scientists • Benefits • Easy to attach for self-service data science • Seamless connectivity to existing clusters • Isolated, reproducible user environments • Tradeoffs • Requires sizing, managing two clusters • Requires dependency coordination • Not optimized for distributed training • Complex, inelastic setup in public cloud CDH/HDP CDH/HDP Cloudera Manager / Ambari gateway node(s) CDH/HDP nodes Hive, HDFS, ... CDSW CDSW ... Master ... Engine EngineEngine EngineEngine
  17. 17. 17 © Cloudera, Inc. All rights reserved. INTRODUCING
  18. 18. 18 © Cloudera, Inc. All rights reserved. 18 enable the end-to-end machine learning lifecycle for teams building thousands of applications across petabytes of data with an enterprise-ready platform built for the cloud Rapid provisioning and experimentation DevOps mindset From development to production Team collaboration and shared context Any framework, your data, your models Secure by default CPUs, GPUS, TPUs, and beyond Massive, diverse data Automation Runs anywhere
  19. 19. 19 © Cloudera, Inc. All rights reserved. CLOUDERA MACHINE LEARNING (PRIVATE PREVIEW) Cloud-native enterprise machine learning platform DATA SCIENCE DATA ENGINEERING MODEL OPERATIONS CLOUDERA ML RUNTIME Python/R, Spark, TensorFlow, CPU/GPU-Optimized Interactive Development Batch Pipelines Predictive APIs Full capability of CDSW Rapid cloud provisioning and elastic autoscaling Unified data engineering and ML with seamless dependency management Multi-cloud portability powered by Kubernetes Connects to HDFS or cloud object storage and shared metadata Accelerated deep learning with distributed GPU training * Initially targeted for cloud managed K8s services, then OpenShift KUBERNETES EKS, AKS, GKE, OpenShift DATA CATALOG GOVERNANCESECURITY LIFECYCLEWORKLOAD XM STORAGE Amazon S3 Microsoft ADLS HDFS KUDU
  20. 20. 20 © Cloudera, Inc. All rights reserved. Requires and extends CDH/HDP, pushing distributed compute to the cluster CDH / HDP CDH / HDP Cloudera Manager / Ambari gateway node(s) CDH/HDP nodes Hive, HDFS, ... CDSW CDSW ... Master ... Engine EngineEngine EngineEngine Cloud optimized, elastic distributed compute; can optionally use CDH / HDP Kubernetes (EKS, GKE, AKS, OpenShift) CDH / HDP CDH / HDP Hive, HDFS, HMS, Sentry, Navigator CML CML ... Master ... Engine EngineEngine EngineEngine Object Store Altus Director / CloudBreak OPTIONAL DATA SCIENCE WORKBENCH vs. CLOUDERA MACHINE LEARNING Cloud ML APIs
  21. 21. 21 © Cloudera, Inc. All rights reserved. • Lightweight form-factor optimized for cloud and private cloud • Elastic compute (CPU / GPU) in the cloud • Unified workflow for distributed data engineering and machine learning • Simple dependency management, particularly for ML workloads • python: pip install, R: install.packages, etc WHY SPARK ON KUBERNETES?
  22. 22. 22 © Cloudera, Inc. All rights reserved. cldr ml run --cmd “python train.py”
  23. 23. 23 © Cloudera, Inc. All rights reserved. from pyspark.sql import SparkSession spark = SparkSession.builder.getOrCreate() results = spark.sparkContext .parallelize(xrange(0, 1000)) .map(f) .collect()
  24. 24. 24 © Cloudera, Inc. All rights reserved. DEMO
  25. 25. Confidential-Restricted – For Discussion Purposes Only25 © Cloudera, Inc. All rights reserved. OPERATIONAL EXPERIENCE ✓ Simple installation and management ✓ Lower costs with elastic compute ✓ Easier to support through self-service ✓ Cloud agility everywhere RESEARCH EXPERIENCE ✓ Instant access to resources ✓ Faster model training and iteration ✓ Unified data engineering and ML ✓ Easier to onboard team members DEPLOYMENT EXPERIENCE ✓ Easier to scale out compute ✓ Faster deployment of models ✓ Better reproducibility and isolation ✓ Flexible deployment targets BENEFITS OF A CLOUD-NATIVE ML PLATFORM
  26. 26. Confidential-Restricted – For Discussion Purposes Only 26 © Cloudera, Inc. All rights reserved. What capability of Cloudera ML is most interesting for your workloads? • Simple installation and management • Autoscaling in the cloud • Dependency management for Spark • Distributed GPU training • Cloud experience on-premise • Potential multi-tenancy with other Kubernetes workloads POLL QUESTION 2
  27. 27. © Cloudera, Inc. All rights reserved. Request access to the private preview tiny.cloudera.com/CML QUESTIONS?
  28. 28. THANK YOU
  29. 29. 29 © Cloudera, Inc. All rights reserved. Kubernetes CMLCML Engine Spark Driver Spark Executor Logging Logging Project Volume Project Volume Spark Executor Project Volume Hive, HDFS, HMS, Sentry, Navigator - Create user service accounts and namespace for quota management & security - Configure Spark driver / executor configuration to connect to K8S API - Populate kerberos related information for accessing HDFS and other CDH services - Populate dependency management related configurations using pod template: image, volumes, CPU/GPU/memory resources, etc - Configure Spark logging and metric information to push to external storage HOW DOES SPARK IN CML WORK?

×