February 2019
Cutting Time, Complexity and Costs from Data Science to Production
Agenda
• Data science challenges
• Iguazio data science PaaS over Kubernetes
• NVIDIA solutions to accelerate data science with Kubernetes
  o GPU integration, TensorRT, RAPIDS
• Hands-on tutorial
  o End-to-end application: real-time predictive infrastructure monitoring (ingest, explore, hyperparameter training, deploy to production)
  o Serverless and scale-out data science
  o NVIDIA RAPIDS
• Summary
• Q&A
Today: ML Lifecycle is Complex and Siloed
[Diagram: three siloed stages]
• Data Prep & Analytics (Data Engineers): ETL, data lakes/warehouses, CSVs
• Model Building (Data Scientists): active data (CSV/in-mem), GPU, model; feedback loops "need more fresh data" and "tune model"
• Model Deployment (Data Engineers and App Developers): ML model serving, app deployment, interactive app, stream processing, database, triggers and interactions
ML Challenges in Real Life
• Re-coding & instrumenting
• AI model "depth" & accuracy vs. performance & costs
• Observability & reproducibility
• Infrastructure and software complexity
• Can we gather (and prep) model features in production?
Solution: Fast & Continuous Data Science Pipeline
[Diagram: continuous pipeline]
• Collect: constantly ingest, clean & tag data via "collectors"
• Develop: "serverless" functions & notebooks
• Build & Test: CI/CD for code & models; ML model training (CPU, GPU)
• Deploy to Production: intelligent serverless run-time, in cloud, on-prem or edge; triggers and interactions
• Monitor & Reiterate
Benefits: develop and iterate faster; deliver accurate results in real time; deploy in any cloud or edge
Iguazio: Open & High-Performance Data-Science PaaS
• Self-service experience from A to Z
• Managed & hardened open-source plus 3rd party services and apps
• Secure real-time data sharing enabling collaboration & parallelism
• Built on a cloud-native architecture
[Diagram labels: External Data; Real-time Structured & Unstructured Data Fabric; Compute (CPU, GPU)]
Develop Faster, Run Faster, Use Less Resources
Managed Jupyter: data science notebooks and online IDE
• Serverless notebooks: self-service, scale to zero on idle
• Simplify, secure and accelerate data access and processing
• Accelerate applications and training using shared GPUs and ML services
• One-click deployment to production (as jobs, real-time functions and dashboards)
[Diagram labels: historical and real-time data from a variety of sources (time series, stream, table, object); GPU; integrated, 3rd party or cloud ML services on demand]
Deploy Faster to Production with Serverless
Nuclio: the leading open-source serverless for real-time intelligence
• Minimize software development and maintenance overhead
• Extreme performance (up to 370K events/sec per process, 0.1 ms latency, fast data access)
• Open, supports many event/data sources: HTTP, streaming, messaging, jobs
• One-click deployment from many sources (code, containers, notebooks, git, templates)
• Runs in cloud, on-prem or edge
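As a rough illustration of what a real-time function looks like, here is a minimal sketch of a Python handler in the style Nuclio uses (an entry point that receives a context and an event). The payload fields `host` and `cpu`, the 0.9 threshold, and the fake context and event objects used for the local run are assumptions made for this example, not part of the deck. In a real deployment the event body may arrive already parsed, depending on the content type.

```python
import json
from types import SimpleNamespace


def handler(context, event):
    """Nuclio-style entry point: score one monitoring sample per event."""
    sample = json.loads(event.body)
    # Hypothetical rule standing in for a real model: flag high CPU load.
    alert = sample["cpu"] > 0.9
    context.logger.info("scored sample from " + sample["host"])
    return json.dumps({"host": sample["host"], "alert": alert})


if __name__ == "__main__":
    # Local stand-ins for the objects the serverless runtime would pass in.
    fake_context = SimpleNamespace(logger=SimpleNamespace(info=print))
    fake_event = SimpleNamespace(body=b'{"host": "web-1", "cpu": 0.97}')
    print(handler(fake_context, fake_event))
```

Keeping the handler free of server or trigger code is the point of the model: the same function can be wired to HTTP, a stream or a scheduled job without changes.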
Kubernetes Helps Simplify the Use of Clusters and GPUs
Think of Kubernetes as an operating system for a cluster.
Kubernetes manages nodes, administers access, and launches containers, jobs and more.
[Diagram: a master server (API server, replication controller, scheduler) and four workers, each with a daemon and containers]
Infrastructure as code, e.g. a PyTorch training job:
pytorch-job.yml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: pytorch-example
spec:
  backoffLimit: 5
  template:
    spec:
      imagePullSecrets:
        - name: nvcr.dgxkey
      containers:
        - name: pytorch-container
          image: nvcr.io/nvidia/pytorch:18.06-py3
          command: ["/bin/sh"]
          args: ["-c", "python /examples/mnist/main.py"]
          resources:
            limits:
              nvidia.com/gpu: 1   # request one GPU for this container
      restartPolicy: Never        # Jobs require Never or OnFailure
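The same "infrastructure as code" idea can be sketched in Python by generating the Job manifest as a plain dictionary. This is only an illustration: the helper name `build_gpu_job` and its parameters are hypothetical, the image pull secret from the YAML is left out for brevity, and `restartPolicy: Never` is included because Kubernetes Jobs only accept `Never` or `OnFailure`.

```python
import json


def build_gpu_job(name, image, shell_command, gpus=1, backoff_limit=5):
    """Return a batch/v1 Job manifest as a plain dict."""
    container = {
        "name": name + "-container",
        "image": image,
        "command": ["/bin/sh"],
        "args": ["-c", shell_command],
        # GPUs are requested through the nvidia.com/gpu resource limit.
        "resources": {"limits": {"nvidia.com/gpu": gpus}},
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    "containers": [container],
                    # Jobs only accept Never or OnFailure here.
                    "restartPolicy": "Never",
                }
            },
        },
    }


job = build_gpu_job(
    "pytorch-example",
    "nvcr.io/nvidia/pytorch:18.06-py3",
    "python /examples/mnist/main.py",
)
print(json.dumps(job, indent=2))
```

Because kubectl accepts JSON manifests as well as YAML, the printed output could be saved to a file and applied with `kubectl apply -f`, which makes it easy to generate many jobs (for example one per hyperparameter set) from one template.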
Re-Imagining Data Science Workflow
Open Source, End-to-end GPU-accelerated Workflow Built On CUDA
• Data preparation / wrangling: cuDF
• Optimized ML model training: cuML
• Visualization: data visualization libraries, data insights
RAPIDS – GPU Accelerated Data Science
Software stack (Python):
• Pipeline stages: data preparation, model training, visualization
• RAPIDS libraries: cuDF, cuML, cuGraph
• Deep learning frameworks: cuDNN
• Foundation: CUDA, Python, Dask, Apache Arrow on GPU memory
• Read/write RAPIDS dataframes directly into the Iguazio database & file system
Faster Speeds, Real World Benefits
Time in seconds (shorter is better)

Configuration    cuIO/cuDF (Load and Data Preparation)    cuML (XGBoost)
20 CPU nodes     2,741                                    2,290
30 CPU nodes     1,675                                    1,956
50 CPU nodes     715                                      1,999
100 CPU nodes    379                                      1,948
DGX-2            42                                       169
5x DGX-1         19                                       157

[Chart: End-to-End (cuIO/cuDF load and data preparation, data conversion, XGBoost), axis 0 to 10,000 seconds, same six configurations]

Benchmark: 200GB CSV dataset; data preparation includes joins and variable transformations.
CPU cluster configuration: CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark
DGX cluster configuration: 5x DGX-1 on InfiniBand network
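To put the benchmark numbers in perspective, the speedups can be worked out with a few lines of Python. This sketch assumes the bar values map to the two series as read from the charts (load and data preparation: 2,741 / 379 / 42 / 19 seconds; XGBoost: 2,290 / 1,948 / 169 / 157 seconds); only four of the six configurations are included, and the helper name `speedup` is made up for the example.

```python
# Timings in seconds, as read from the benchmark charts (assumed mapping).
LOAD_AND_PREP = {
    "20 CPU nodes": 2741,
    "100 CPU nodes": 379,
    "DGX-2": 42,
    "5x DGX-1": 19,
}
XGBOOST = {
    "20 CPU nodes": 2290,
    "100 CPU nodes": 1948,
    "DGX-2": 169,
    "5x DGX-1": 157,
}


def speedup(timings, baseline, accelerated):
    """How many times faster `accelerated` is than `baseline`."""
    return timings[baseline] / timings[accelerated]


for label, timings in (("load and prep", LOAD_AND_PREP), ("XGBoost", XGBOOST)):
    for baseline in ("20 CPU nodes", "100 CPU nodes"):
        ratio = speedup(timings, baseline, "5x DGX-1")
        print(f"{label}: 5x DGX-1 vs {baseline}: {ratio:.1f}x")
```

Note that XGBoost training time barely improves when going from 20 to 100 CPU nodes, so the gap to the GPU systems stays at roughly an order of magnitude regardless of CPU cluster size.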
TensorRT – GPU Powered Inference Server
Available with Monthly Updates
Models supported
● TensorFlow GraphDef/SavedModel
● TensorFlow and TensorRT GraphDef
● TensorRT Plans
● Caffe2 NetDef (ONNX import)
Multi-GPU support
Concurrent model execution
Server HTTP REST API/gRPC
Python/C++ client libraries
Details: https://developer.nvidia.com/tensorrt
NVIDIA TensorRT Over Kubernetes & Iguazio
[Diagram labels: Nuclio Function (Serverless); Time Series DB]
Demo Time!
Summary
Build and deploy intelligent apps faster:
• Eliminate complexity through pre-integrated managed services
• Leverage parallelism and hardware acceleration to improve ROI
• Consolidate data engineering, science and app dev platforms
• Focus on the end goal: production deployment of intelligent applications
Q&A
Thank You
info@iguazio.com | www.iguazio.com
Iguazio Smart Unified Real-time DB & File-System
• Many APIs and models on the same data
  o SQL, NoSQL, time series, stream, files
  o Custom APIs, streaming, sync and ETLs
• Minimize CPU, memory, and ops overhead
• In-memory performance, at 1/30 of the cost and 30x the density (on Flash)
• Real-time time series & data analytics
• Fine-grained security
[Diagram labels: apps & users; real-time firewall (granular security); smart real-time DB (many standard & open APIs on a unified DB engine); high-speed fabric; 100TB NVMe Flash, direct attached (use NVMe Flash as an extension of memory); S3, ETL, streams, backup]
Real-time Intelligent Infrastructure Management
Auto-Healing Network Operations
Singtel uses Iguazio to predict network outages and avoid them in real time.
• Replaced a complex Hadoop-based data pipeline that was never productized
• Cross-correlates real-time data from multiple sources with historical data
• AI-based predictions trigger pre-programmed actions that fix evolving problems in the network
• Implemented within weeks of initial deployment
Singtel's self-healing network is the perfect example of a client shifting from reactive to proactive with Iguazio.
Real-time Intelligent Infrastructure Management
Maintaining Continuous Fast Response for 2nd Tier Cloud Services
Analyzing and predicting cloud service response time for optimal results
• Real-time Data Ingestion: from multiple monitoring tools, including Jennifer and Zabbix
• Anomaly Detection: accurate anomaly detection with an order of magnitude fewer false positives than the previous Elasticsearch-based platform
• Root Cause Analysis: real-time root cause analysis from multiple factors, for example correlating simultaneous changes in servers' CPU usage and applications' response times
• Predictive Analytics: predicting response times and sending real-time alerts indicating which factors need to be adjusted to avoid malfunctions
From deployment to completion in less than two weeks!
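The anomaly detection step can be illustrated with a deliberately simple technique: a rolling z-score over recent response times. This is a generic sketch for intuition only, not the method used in the deployment described above; the function name, window size, threshold and sample data are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indexes of values far from the mean of the trailing window.

    A value is flagged when it deviates from the mean of the previous
    `window` values by more than `threshold` standard deviations.
    """
    recent = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(values):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Skip flat windows, where the z-score is undefined.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(index)
        recent.append(value)
    return anomalies


# Hypothetical response times in milliseconds with one obvious spike.
response_ms = [102, 98, 101, 99, 100, 103, 97, 450, 101, 99]
print(detect_anomalies(response_ms))
```

A production system would typically combine several signals (for example CPU usage and response time together) and a learned model to keep false positives low, but the structure is the same: compare each new sample with a baseline built from recent history.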
Evolve Into an Agile Cloud-Native Architecture
From a legacy & resource-intensive architecture: YARN, HBase, HDFS, MapReduce, Pig, Hive, ..
To a simpler & modern approach: Serverless, Data-Science, BigData, DBaaS, S3 (object)
[Diagram labels: Your Business Logic; Middleware; Orchestration; Data; Innovate; Consume]

Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 

Último (20)

Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 

Webinar: Cutting Time, Complexity and Cost from Data Science to Production

  • 1. February 2019 Cutting Time, Complexity and Costs from Data Science to Production
  • 2. Agenda
      - Data science challenges
      - Iguazio data science PaaS over Kubernetes
      - NVIDIA solutions to accelerate data science with Kubernetes
          o GPU integration, TensorRT, RAPIDS
      - Hands-on tutorial
          o End-to-end application: real-time predictive infrastructure monitoring (ingest, explore, hyperparameter training, deploy to production)
          o Serverless and scale-out data science
          o NVIDIA RAPIDS
      - Summary
      - Q&A
  • 3. Today: ML Lifecycle is Complex and Siloed
      - Data Prep & Analytics (Data Engineers): ETL, Data Lakes/Warehouses, CSVs
      - Model Building (Data Scientists): Active Data (CSV/in-mem), GPU, Model; feedback loops: need more fresh data, tune model
      - Model Deployment (Data Engineers and App Developers): ML Model Serving, App Deployment, Interactive App, Stream Processing, Database, Triggers and Interactions
  • 4. ML Challenges in Real Life
      - Re-coding & instrumenting
      - AI model “depth” & accuracy vs performance & costs
      - Observability & reproducibility
      - Infrastructure and software complexity
      - Can we gather (and prep) model features in production?
  • 5. Solution: Fast & Continuous Data Science Pipeline
      - Collect: constantly ingest, clean & tag data via “Collectors”
      - Develop: “serverless” functions & notebooks
      - Build & Test: CI/CD for code & models; ML model training (CPU, GPU)
      - Deploy to Production: intelligent serverless run-time in cloud, on-prem or edge; triggers and interactions
      - Monitor & Reiterate
      Benefits: develop and iterate faster; deliver accurate results in real time; deploy in any cloud or edge
  • 6. Iguazio: Open & High-Performance Data-Science PaaS
      - Managed & hardened open-source plus 3rd party services and apps
      - Secure real-time data sharing enabling collaboration & parallelism
      - Self-service experience from A to Z
      - Built on a cloud-native architecture
      Diagram labels: Real-time Structured & Unstructured Data Fabric; External Data; Compute (CPU, GPU)
  • 7. Develop Faster, Run Faster, Use Less Resources
      Managed Jupyter: data science notebooks and online IDE
      - Serverless notebooks: self-service, scale to zero on idle
      - Simplify, secure and accelerate data access and processing
      - Accelerate applications and training using shared GPUs and ML services
      - One-click deployment to production (as jobs, real-time functions and dashboards)
      Diagram labels: Time Series, Stream, Table, Object, GPU; historical and real-time data from a variety of sources; integrated, 3rd party or cloud ML services on demand
  • 8. Deploy Faster to Production with Serverless
      Nuclio: the leading open-source serverless for real-time intelligence
      - Minimize software development and maintenance overhead
      - Extreme performance (up to 370K events/sec per process, 0.1 ms latency, fast data access)
      - Open, supports many event/data sources: HTTP, streaming, messaging, jobs
      - One-click deployment from many sources (code, containers, notebooks, git, templates)
      - Runs in cloud, on-prem or edge
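To make the serverless model on this slide concrete, here is a minimal sketch of a Python function in the shape Nuclio expects, an entry point taking `context` and `event`. The payload layout (`host`, `latency_ms`), the `THRESHOLD_MS` constant and the `make_local_context` helper are assumptions made for illustration and do not come from the webinar.

```python
import json
import logging
from types import SimpleNamespace

# Hypothetical alerting threshold, chosen only for this example.
THRESHOLD_MS = 200.0


def handler(context, event):
    """Nuclio-style entry point: read one metric sample and flag high latency."""
    body = event.body
    # The body may arrive as raw bytes/str or as an already-decoded dict.
    sample = json.loads(body) if isinstance(body, (bytes, str)) else body
    latency_ms = float(sample["latency_ms"])
    alert = latency_ms > THRESHOLD_MS
    context.logger.info("processed sample")
    return json.dumps({"host": sample["host"], "alert": alert})


def make_local_context():
    """Stand-in for the runtime-provided context, for local smoke tests only."""
    return SimpleNamespace(logger=logging.getLogger("local"))
```

Calling `handler(make_local_context(), SimpleNamespace(body=b'{"host": "web-1", "latency_ms": 350}'))` locally exercises the same code path the serverless runtime would invoke per event.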
  • 9. Kubernetes Helps Simplify the Use of Clusters and GPUs
      Think of Kubernetes as an operating system for a cluster. Kubernetes manages nodes, administers access, launches containers and jobs, and more.
      Diagram: a master server (API server, replication controller, scheduler) and four workers, each running a daemon and containers
      Infrastructure as code, e.g. a PyTorch training job (pytorch-job.yml):
        ---
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: pytorch-example
        spec:
          backoffLimit: 5
          template:
            spec:
              imagePullSecrets:
                - name: nvcr.dgxkey
              containers:
                - name: pytorch-container
                  image: nvcr.io/nvidia/pytorch:18.06-py3
                  command: ["/bin/sh"]
                  args: ["-c", "python /examples/mnist/main.py"]
                  resources:
                    limits:
                      nvidia.com/gpu: 1
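The "infrastructure as code" idea can also be sketched programmatically. The example below builds the same kind of GPU Job manifest as a plain Python dict and prints it as JSON, which `kubectl` accepts alongside YAML. The helper name `gpu_job_manifest` and its parameters are hypothetical, the container name is derived from the job name rather than copied from the slide, and the `restartPolicy` field is an addition not shown on the slide (a Job's pod template must set it to `Never` or `OnFailure`).

```python
import json


def gpu_job_manifest(name, image, args, gpus=1, backoff_limit=5, pull_secret=None):
    """Build a batch/v1 Job manifest as a plain dict (hypothetical helper)."""
    pod_spec = {
        # Not on the slide: Jobs only allow Never or OnFailure here.
        "restartPolicy": "Never",
        "containers": [{
            "name": f"{name}-container",
            "image": image,
            "command": ["/bin/sh"],
            "args": ["-c", args],
            # Request GPUs through the NVIDIA extended resource name.
            "resources": {"limits": {"nvidia.com/gpu": gpus}},
        }],
    }
    if pull_secret:
        pod_spec["imagePullSecrets"] = [{"name": pull_secret}]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {"backoffLimit": backoff_limit, "template": {"spec": pod_spec}},
    }


manifest = gpu_job_manifest(
    name="pytorch-example",
    image="nvcr.io/nvidia/pytorch:18.06-py3",
    args="python /examples/mnist/main.py",
    pull_secret="nvcr.dgxkey",
)
print(json.dumps(manifest, indent=2))
```

Generating manifests from a function like this makes it easy to vary the GPU count or image per experiment while keeping one reviewed template.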
  • 10. Re-Imagining Data Science Workflow
      Open source, end-to-end GPU-accelerated workflow built on CUDA
      - Data preparation / wrangling: cuDF
      - Optimized ML model training: cuML
      - Visualization: data visualization libraries, data insights
  • 11. RAPIDS: GPU Accelerated Data Science
      Software stack:
      - Python
      - Data Preparation: cuDF; Model Training: cuML; Visualization: cuGraph
      - RAPIDS (cuDF, cuML, cuGraph), Dask, deep learning frameworks, cuDNN
      - Apache Arrow on GPU memory
      - CUDA
      Read/write RAPIDS dataframes directly into the Iguazio database & file system
  • 12. Faster Speeds, Real World Benefits
      [Figure: bar charts of time in seconds (shorter is better) for 20, 30, 50 and 100 CPU nodes, DGX-2 and 5x DGX-1. Panels: cuIO/cuDF (Load and Data Preparation), Data Conversion, cuML XGBoost, End-to-End. Extracted bar values, in that configuration order: 2,290, 1,956, 1,999, 1,948, 169, 157 and 2,741, 1,675, 715, 379, 42, 19.]
      Benchmark: 200GB CSV dataset; data preparation includes joins and variable transformations
      CPU cluster configuration: CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark
      DGX cluster configuration: 5x DGX-1 on InfiniBand network
  • 13. TensorRT: GPU Powered Inference Server
      - Available with monthly updates
      - Models supported: TensorFlow GraphDef/SavedModel; TensorFlow and TensorRT GraphDef; TensorRT Plans; Caffe2 NetDef (ONNX import)
      - Multi-GPU support
      - Concurrent model execution
      - Server HTTP REST API/gRPC
      - Python/C++ client libraries
  • 14. NVIDIA TensorRT Over Kubernetes & Iguazio
      Diagram labels: Nuclio Function (Serverless), Time Series DB
      Details: https://developer.nvidia.com/tensorrt
  • 16. Summary: Production Deployment of Intelligent Applications
      - Eliminate complexity through pre-integrated managed services
      - Leverage parallelism and hardware acceleration to improve ROI
      - Consolidate data engineering, science and app dev platforms
      - Focus on the end goal: build and deploy intelligent apps faster
  • 19. Iguazio Smart Unified Real-time DB & File-System
      - Many APIs and models on the same data
          o SQL, NoSQL, time series, stream, files
          o Custom APIs, streaming, sync and ETLs
      - Minimize CPU, memory and ops overhead
      - In-memory performance at 1/30 of the cost and 30x the density (on Flash)
      - Real-time time series & data analytics
      - Fine-grained security
      Diagram labels: Apps & Users; Real-time Firewall (granular security); Smart Real-time DB (many standard & open APIs on a unified DB engine); High-Speed Fabric; 100TB NVMe Flash, direct attached (use NVMe Flash as an extension of memory); S3, ETL, Streams, Backup
  • 20. Real-time Intelligent Infrastructure Management: Auto-Healing Network Operations
      Singtel uses Iguazio to predict network outages and avoid them in real time
      - Replaced a complex Hadoop-based data pipeline that was never productized
      - Cross-correlating real-time data from multiple sources with historical data
      - AI-based predictions trigger pre-programmed actions that fix evolving problems in the network
      - Implemented within weeks of initial deployment
      Singtel’s self-healing network is the perfect example of a client shifting from reactive to proactive with Iguazio
  • 21. Real-time Intelligent Infrastructure Management: Maintaining Continuous Fast Response for 2nd Tier Cloud Services
      Analyzing and predicting cloud service response time for optimal results
      - Real-time Data Ingestion: from multiple monitoring tools, including Jennifer and Zabbix
      - Anomaly Detection: accurate anomaly detection with an order of magnitude fewer false positives than the previous Elasticsearch-based platform
      - Root Cause Analysis: real-time root cause analysis from multiple factors, for example correlating simultaneous changes in server CPU usage and application response time
      - Predictive Analytics: predicting response times and sending real-time alerts indicating which factors need to be adjusted to avoid malfunctions
      From deployment to completion in less than two weeks!
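As a rough illustration of the anomaly detection and root cause analysis steps on this slide, the sketch below flags response-time samples that deviate strongly from a trailing window and then measures how closely CPU usage tracks response time. This is a generic z-score and Pearson correlation example with made-up sample data, not the method used in the deployment described; the function names, window size and threshold are assumptions.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (var_x * var_y) ** 0.5


# Illustrative data only: response times (ms) with one spike at index 6.
response_ms = [100, 102, 98, 101, 99, 100, 250, 101]
spikes = zscore_anomalies(response_ms)

# Illustrative data only: CPU load rising in step with response time.
cpu_pct = [10, 20, 30, 40]
resp_ms = [100, 200, 300, 400]
correlation = pearson(cpu_pct, resp_ms)
```

A correlation near 1 between a server's CPU series and an application's response-time series is the kind of signal a root cause step could surface alongside the flagged indices.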
  • 22. Evolve Into an Agile Cloud-Native Architecture
      - From a legacy & resource-intensive architecture: YARN, HBase, HDFS, MapReduce, Pig, Hive, ..; data orchestration middleware
      - To a simpler & modern approach: DBaaS, S3 (object), Serverless, Data-Science, BigData
      - Your business logic: consume, innovate