AI & Machine Learning Pipelines with Knative
@AnimeshSingh @Tomipli
Center for Open Source Data and AI Technologies (CODAIT)
Code: Build and improve practical frameworks to enable more developers to realize immediate value.
Content: Showcase solutions for complex and real-world AI problems.
Community: Bring developers and data scientists to engage with IBM.
Improving Enterprise AI lifecycle in Open Source
[Figure: lifecycle stages Gather Data, Analyze Data, Machine Learning, Deep Learning, Deploy Model and Maintain Model, mapped to open source projects: Python Data Science Stack, Pandas, Scikit-Learn, Apache Spark, Jupyter, Keras + TensorFlow, Fabric for Deep Learning (FfDL), Model Asset eXchange, MLeap + PFA]
•  Team contributes to over 10 open source projects
•  17 committers and many contributors in Apache projects
•  Over 1100 JIRAs and 66,000 lines of code committed to Apache Spark itself; over 65,000 LoC into SystemML
•  Over 25 product lines within IBM leveraging Apache Spark
•  Speakers at over 100 conferences, meetups, unconferences and more
CODAIT
codait.org
Agenda
•  Progress in ML and DL
•  AI and Cloud: Complementary
•  Why we need Knative to build an AI Platform
•  How to build a transparent and trusted AI platform leveraging Knative
•  Demo
Progress in Deep Learning
•  1997: IBM Deep Blue (chess)
•  2011: IBM Watson (Jeopardy); Apple releases Siri
•  2012: AlexNet introduced deep learning with GPUs
•  2015-2016: Facebook's face recognition; Siri gets deep learning
•  2017: AlphaGo
IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation 8
Deep Learning = Training Artificial Neural Networks
A human brain has:
•  200 billion neurons
•  32 trillion connections between them
An artificial neural network has:
•  25 million "neurons"
•  100 million connections (parameters)
Neural Network Design Workflow
[Figure: domain data → design neural network → HPO (neural network structure, hyperparameters) → optimal hyperparameters → trained model → "Performance meets needs?" If NO, start another experiment. If yes, deploy to Cloud and keep evaluating: while the model is still good it stays deployed; when it turns BAD, start another experiment.]
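The loop in this workflow (try hyperparameters, train, check whether performance meets needs, otherwise start another experiment) can be sketched in a few lines. This is only an illustration: `train_and_evaluate` is a hypothetical stand-in that returns a mock accuracy instead of training a network, and the search space and target value are assumptions, not figures from the talk.

```python
import random


def train_and_evaluate(params, rng):
    """Hypothetical stand-in for a training run; returns a mock accuracy."""
    # Toy objective that peaks near learning_rate=0.01 and 3 layers (assumption).
    lr_penalty = abs(params["learning_rate"] - 0.01) * 10
    depth_penalty = abs(params["num_layers"] - 3) * 0.05
    noise = rng.uniform(-0.01, 0.01)
    return max(0.0, 0.95 - lr_penalty - depth_penalty + noise)


def run_experiments(target_accuracy, max_experiments, seed=0):
    """Random-search HPO: keep starting experiments until performance meets needs."""
    rng = random.Random(seed)
    best_params, best_score = None, -1.0
    for experiment in range(1, max_experiments + 1):
        params = {
            "learning_rate": rng.choice([0.1, 0.03, 0.01, 0.003, 0.001]),
            "num_layers": rng.randint(1, 6),
        }
        score = train_and_evaluate(params, rng)
        if score > best_score:
            best_params, best_score = params, score
        if best_score >= target_accuracy:
            break  # performance meets needs: stop and hand the model to deployment
    return best_params, best_score, experiment


if __name__ == "__main__":
    params, score, runs = run_experiments(target_accuracy=0.9, max_experiments=200)
    print(f"best={params} accuracy={score:.3f} after {runs} experiments")
```

Every iteration is an independent experiment, which is why this kind of workload benefits from elastic, on-demand compute.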
So AI in general and Deep Learning in particular are very iterative and repetitive. And they need Cloud. Why?
•  AI requires the strength of HPC & GPUs
•  Ability to scale AI workloads on demand
Ability to utilize various technologies and achieve high performance computing:
1.  Model/Data Parallelism
2.  MPI
3.  NCCL
4.  …
To scale we need to go Cloud native for AI:
•  Microservices
•  Containers
•  DevOps automation
[Figure: Kubernetes cluster. API, UI and CLI clients talk to the Master (Etcd, API Server, Controller Manager Server, Scheduler Server), which manages Worker Node 1 to Worker Node n; images come from a Registry.]
And Cloud native means we use Kubernetes!
But is Kubernetes enough for a Cloud Native platform?
Kubernetes is not the end game. Says who?
What else do we need? Let's understand it from the context of an AI Lifecycle.
[Figure: AI lifecycle on the AI Platform, overseen by AIOps: Prepared and Analyzed Data → Initial Model → Trained Model → Deployed Model]
We need a Cloud native AI Platform to build, train, deploy and monitor Models!
Also, we need a transparent and trusted AI Platform!
Is model training showing increasing loss?
Are model weights biased?
Is the model vulnerable to adversarial attacks?
Has the training data changed?
Are the model predictions less accurate?
Are hyperparameters suboptimal?
Are predictions biased?
Is the dataset biased?
In fact, we need a transparent, trusted and automated AI Pipeline!
Transparent, trusted, automated, event driven, auditable AI Pipeline as a Service!
[Figure: the AI lifecycle (Prepared and Analyzed Data → Initial Model → Trained Model → Deployed Model) with AIOps triggers and the actions they start, operated by the Data Scientist and the AIOps Engineer]
Triggers:
•  Trained Model is showing increasing loss
•  Model performance is suboptimal
•  Model performance is biased
•  Model is vulnerable to attack
•  Data changed
•  Bias Detected
Actions:
•  Prepare Data → Train → Deploy
•  Harden → Train → Deploy
•  Debias → Train → Deploy
•  Implement Defense → Train → Deploy
•  Optimize Hyperparameters → Train
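One way to picture an event driven, auditable pipeline is a playbook that maps each trigger to a chain of actions and records every step. The sketch below is illustrative only: the pairing of triggers with action chains is an assumption (the slide does not state it unambiguously), and every action is a hypothetical placeholder that just records its name.

```python
from typing import Callable, Dict, List, Tuple

# Assumed pairing of triggers with action chains; names are hypothetical.
PLAYBOOK: Dict[str, List[str]] = {
    "training_loss_increasing": ["optimize_hyperparameters", "train"],
    "data_changed": ["prepare_data", "train", "deploy"],
    "model_vulnerable_to_attack": ["implement_defense", "train", "deploy"],
    "bias_detected": ["debias", "train", "deploy"],
}


def make_action(name: str) -> Callable[[dict], str]:
    """Build a placeholder action that only records that it ran."""
    def action(context: dict) -> str:
        context.setdefault("history", []).append(name)
        return f"{name}: ok"
    return action


ACTIONS: Dict[str, Callable[[dict], str]] = {
    name: make_action(name) for chain in PLAYBOOK.values() for name in chain
}


def handle_trigger(trigger: str, context: dict,
                   audit_log: List[Tuple[str, str, str]]) -> dict:
    """Run the action chain for a trigger and append each step to the audit log."""
    if trigger not in PLAYBOOK:
        raise KeyError(f"no playbook for trigger {trigger!r}")
    for step in PLAYBOOK[trigger]:
        result = ACTIONS[step](context)
        audit_log.append((trigger, step, result))
    return context


if __name__ == "__main__":
    log: List[Tuple[str, str, str]] = []
    handle_trigger("bias_detected", {}, log)
    for entry in log:
        print(entry)
```

Keeping the audit log separate from the actions means the record of what ran, and why, survives even if an individual action is replaced.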


If we translate it to logical architecture, it looks like this…
[Figure: an AI Pipeline (Python definition: orchestrate and track) used by the AI Developer and Data Scientist and by the AI Operator. It consists of five pipes, each backed by a Python Function: Data Pipe, Training Pipe, Model Validation Pipe, Model Deployment Pipe and Deployment Analysis Pipe.]
So we need more than Kubernetes. We need to be able to…
•  build Data Scientists' code
•  orchestrate the ML code
•  automate the ML workflow
•  send event notifications
Roles: Data Scientist, AIOps Engineer
That means we need the concepts of Build, Serving, Pipeline and Eventing.
So… we need Knative!
Well… what is Knative?
Knative provides a set of building blocks that enable modern, source-centric and container-based serverless workloads on Kubernetes. It uses K8S CRDs to define a new set of APIs around:
● Build
● Eventing
● Serving
● Pipeline
Knative Build
Build: Source-to-container build orchestration
Knative Build components
•  Build
•  Builder
•  BuildTemplate
For example, you can write a build that uses Kubernetes-native resources to obtain your source code from a repository, build a container image, then run that image.
•  A Build can include multiple steps where each step specifies a Builder.
•  A Builder is a type of container image that you create to accomplish any task, whether that's a single step in a process, or the whole process itself.
•  The steps in a Build can push to a repository.
•  A BuildTemplate can be used to define reusable parameterized templates.
Knative Eventing
Eventing: Delivery and management of events, universal subscription, binding services to event ecosystems
Knative Eventing components
●  Sources: Sources that are firing events
●  Channels: A single event forwarding and persistence layer
●  Subscriptions: Deliver and forward events to channels/services
Knative Eventing is designed around the following goals:
●  Knative Eventing services are loosely coupled.
●  Event producers and event consumers are independent.
●  Other services can be connected to the Eventing system.
●  Ensure cross-service interoperability. Knative Eventing is consistent with the CloudEvents specification that is developed by the CNCF Serverless WG.
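Because Knative Eventing follows the CloudEvents specification, an AIOps trigger can be carried as a CloudEvent. The sketch below builds such an envelope with the Python standard library only. It assumes CloudEvents 1.0 attribute names (the specification was at an earlier draft when this talk was given), and the event type, source path and payload are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone

# Attributes that CloudEvents 1.0 requires on every event.
REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")


def make_cloud_event(event_type, source, data):
    """Build a CloudEvents 1.0 envelope as a plain dict (structured JSON mode)."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    }


def missing_attributes(event):
    """Return the required attributes that are absent or empty."""
    return [name for name in REQUIRED_ATTRIBUTES if not event.get(name)]


if __name__ == "__main__":
    event = make_cloud_event(
        event_type="com.example.model.bias.detected",          # hypothetical type
        source="/pipelines/gender-classifier/fairness-check",  # hypothetical source
        data={"metric": "disparate_impact", "value": 0.62},    # made-up payload
    )
    print(json.dumps(event, indent=2))
```

A Subscription would then deliver this event to whichever service implements the matching action, without the producer knowing about it.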
Knative Pipeline
Pipeline: Configure and run CI/CD style pipelines for your Kubernetes application
Knative Pipeline components
● Task: A collection of sequential steps you would want to run as part of your CI flow, including the inputs/outputs and steps
● Pipeline: A graph of Tasks to execute
● Runs: To invoke a Pipeline or a Task
High level details of this design:
•  Pipelines do not know what will trigger them; they can be triggered by events or by manually creating PipelineRuns.
•  Tasks can exist and be invoked completely independently of Pipelines.
•  Test results are a first class concept; being able to navigate test results easily is powerful.
•  Tasks can depend on artifacts, output and parameters created by other tasks.
•  Resources are the artifacts used as inputs and outputs of TaskRuns.
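The idea of a Pipeline as a graph of Tasks, where a Task may consume outputs of other Tasks, can be illustrated without Knative at all. The sketch below uses Python's standard `graphlib` (Python 3.9+) to order tasks by their dependencies. Task names and the graph itself are hypothetical and only loosely mirror an ML flow; this is not the Knative Pipeline API.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks whose outputs it depends on (hypothetical graph).
PIPELINE = {
    "build-image": set(),
    "train-model": {"build-image"},
    "robustness-check": {"train-model"},
    "fairness-check": {"train-model"},
    "deploy-model": {"robustness-check", "fairness-check"},
}


def make_task(name):
    """Build a placeholder task that reports which inputs it received."""
    def task(inputs):
        return f"{name}({', '.join(sorted(inputs))})"
    return task


TASKS = {name: make_task(name) for name in PIPELINE}


def run_pipeline(pipeline, tasks):
    """Run every task after its dependencies and pass their outputs along."""
    order = list(TopologicalSorter(pipeline).static_order())
    outputs = {}
    for name in order:
        inputs = {dep: outputs[dep] for dep in pipeline[name]}
        outputs[name] = tasks[name](inputs)
    return order, outputs


if __name__ == "__main__":
    order, outputs = run_pipeline(PIPELINE, TASKS)
    print(" -> ".join(order))
    print(outputs["deploy-model"])
```

Because each task is an ordinary callable, it can also be invoked on its own, which matches the point that Tasks exist independently of Pipelines.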
●  Configuration
○  Desired current state of deployment (#HEAD)
○  Records both code and configuration (separated, ala 12 factor)
○  Stamps out builds / revisions as it is updated
●  Revision
○  Code and configuration snapshot
○  k8s infra: Deployment, ReplicaSet, Pods, etc
●  Route
○  Traffic assignment to Revisions (fractional scaling or by name)
○  Built using Istio
●  Service
○  Provides a simple entry point for UI and CLI tooling to achieve
common behavior
○  Acts as a top-level controller to orchestrate Route and
Configuration.
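As a rough illustration of how a Service ties a Configuration template to a Route with fractional traffic, the sketch below assembles a Service manifest as a Python dict and checks that the traffic percentages add up. It assumes the later `serving.knative.dev/v1` schema rather than the alpha API that was current at the time of this talk, and the service name, image and revision names are hypothetical.

```python
import json


def knative_service(name, image, traffic):
    """Build a Knative Service manifest with an explicit traffic split.

    `traffic` is a list of (revision_name, percent) pairs that must sum to 100.
    """
    total = sum(percent for _, percent in traffic)
    if total != 100:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # The template plays the role of the Configuration:
            # every change to it stamps out a new Revision.
            "template": {"spec": {"containers": [{"image": image}]}},
            # The traffic block plays the role of the Route.
            "traffic": [
                {"revisionName": revision, "percent": percent}
                for revision, percent in traffic
            ],
        },
    }


if __name__ == "__main__":
    manifest = knative_service(
        name="gender-classifier",                          # hypothetical name
        image="registry.example.com/gender-classifier:2",  # hypothetical image
        traffic=[("gender-classifier-00001", 95), ("gender-classifier-00002", 5)],
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl`, since Kubernetes accepts JSON manifests as well as YAML.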
Knative Serving
Serving: Request-driven compute model, scale to zero, autoscaling, routing and managing traffic
[Figure: Knative Serving building blocks. A Service creates a Route and a Configuration. The Configuration creates Revisions (Revision 1 to Revision 5 in the example), and each Revision is backed by a Deployment, a Replica Set and Pods on the Worker Nodes. The Route references the Configuration (latest) or explicit Revisions and splits traffic between them, for example 5% and 95%.]
● Build — Source-to-container build orchestration
● Eventing — Universal subscription, binding services to event ecosystems, delivery and management of events
● Serving — Request-driven compute model, scale to zero, autoscaling, routing and managing traffic
● Pipeline — Configure and run CI/CD style pipelines for your kubernetes application.
So Knative uses K8S CRDs to define a new set of APIs: Build, Serving, Pipeline and Eventing.
But do we really need something like Knative to build an ML Platform?!
Let's go through the different phases of the AI Lifecycle through our use cases…
Many tools available to build initial models!
Many tools to train machine learning and deep learning models!
We need a multi-framework ML/DL cloud native platform!
E.g. TensorFlow is awesome, but has static graphs, so PyTorch's dynamic graphs are becoming more popular.
How do we handle multiple Open Source Deep Learning frameworks in a consistent way, and leverage the power of cloud for distributed computing?
Enter: Fabric for Deep Learning!
FfDL
Fabric for Deep Learning
https://github.com/IBM/FfDL
•  Fabric for Deep Learning or FfDL (pronounced as 'fiddle') aims at making Deep Learning easily accessible to Data Scientists and AI developers.
•  FfDL provides a consistent way to train and visualize Deep Learning jobs across multiple frameworks like TensorFlow, Caffe, PyTorch, Keras etc.
FfDL Github Page
https://github.com/IBM/FfDL
FfDL dwOpen Page
https://developer.ibm.com/code/open/projects/fabric-for-deep-learning-ffdl/
FfDL Announcement Blog
http://developer.ibm.com/code/2018/03/20/fabric-for-deep-learning
FfDL Technical Architecture Blog
http://developer.ibm.com/code/2018/03/20/democratize-ai-with-fabric-for-deep-learning
Deep Learning as a Service within Watson Studio
https://www.ibm.com/cloud/deep-learning
Research paper: "Scalable Multi-Framework Management of Deep Learning Training Jobs"
http://learningsys.org/nips17/assets/papers/paper_29.pdf
Community Partners
FfDL is one of InfoWorld's 2018 Best of Open Source Software Award winners for machine learning and deep learning!
FfDL is built using a microservices architecture on Kubernetes
•  The FfDL platform uses a microservices architecture to offer resilience, scalability, multi-tenancy, and security without modifying the deep learning frameworks, and with no or minimal changes to model code.
•  FfDL control plane microservices are deployed as pods on Kubernetes to manage this cluster of GPU- and CPU-enabled machines effectively.
•  Tested platforms: Kube DIND, IBM Cloud Public, IBM Cloud Private, GPUs using both the Kubernetes feature gate Accelerators and NVIDIA device plugins
Access to elastic compute leveraging Kubernetes
Auto-allocation means infrastructure is used only when needed. Training assets are managed and tracked.
[Figure: source code and a training definition are submitted to a Kubernetes container on a compute cluster (NVIDIA Tesla K80, P100, V100); training artifacts are kept in Cloud Object Storage]
Model training distributed across containers
[Figure: training runs in containers on a server cluster with NVIDIA GPUs, under Kubernetes container orchestration; the dataset is kept in Cloud Object Storage]
FfDL: Architecture - Current Release
[Figure: FfDL components. Clients: CLIs, Browser, Web UI. Control plane: REST API, Trainer Service (Job Info in MongoDB), Lifecycle Manager (Launch Job), Job Monitor (Job Status). Training Job: Learner Pod with a Learner (e.g. TensorFlow, Caffe, PyTorch, Keras etc.), Controller and Log Collector, plus Parameter Server and Open MPI. Storage: a mounted Object Storage bucket holding the Model Definition, Training Data and Trained Models. Monitoring and logging: Prometheus, Push Gateway, Alert Manager, Elasticsearch.]
Great, but we also need a batch scheduler for AI Workloads. Enter Kube-Batch!
Advanced Batch Scheduling with Kube-Batch
[Figure: the FfDL architecture extended with a Scheduling Engine that hands a QueueJob to Kube-Batch]
Kube-Batch: Filling gaps for FfDL and AI Workloads
•  Job queuing for enabling job priority and preemption
•  Holistic scheduling
•  Support for dynamic job prioritization/budgeting
•  User-requested time-based scheduling
•  Preemption/resumption in a supported checkpoint/restart environment
•  Topology-aware placement (data-intensive workloads requiring network-topology-aware placement)
•  Partial preemption of elastic jobs
•  Performance estimation
•  Kube-Batch
–  Kubernetes incubator project
–  Introduces batch scheduling in Kubernetes
–  Support for simple job definition and lifecycle management
–  Support for job queuing
–  Introduces queue-based quota management
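Job queuing with priority and preemption can be pictured with a small in-memory queue. The sketch below is a toy model, not kube-batch's API or algorithm: the `BatchQueue` class, the GPU accounting and the job names are all hypothetical, and a preempted job is simply re-queued on the assumption that it can resume from a checkpoint.

```python
import heapq
import itertools


class BatchQueue:
    """Toy batch scheduler: strict priority order with preemption."""

    def __init__(self, total_gpus):
        self.free_gpus = total_gpus
        self._pending = []             # heap of (-priority, seq, name, gpus)
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per priority
        self.running = {}              # name -> (priority, gpus)

    def submit(self, name, priority, gpus):
        heapq.heappush(self._pending, (-priority, next(self._seq), name, gpus))

    def schedule(self):
        """Start pending jobs by priority, preempting lower-priority jobs if needed."""
        started = []
        while self._pending:
            neg_priority, _, name, gpus = self._pending[0]
            priority = -neg_priority
            if gpus > self.free_gpus:
                self._preempt_for(priority, gpus)
            if gpus > self.free_gpus:
                break  # head of the queue cannot run yet; keep strict ordering
            heapq.heappop(self._pending)
            self.free_gpus -= gpus
            self.running[name] = (priority, gpus)
            started.append(name)
        return started

    def _preempt_for(self, priority, gpus):
        # Evict the lowest-priority running jobs first, and only if that frees enough.
        victims = sorted((p, n, g) for n, (p, g) in self.running.items() if p < priority)
        if self.free_gpus + sum(g for _, _, g in victims) < gpus:
            return
        for victim_priority, victim, victim_gpus in victims:
            if self.free_gpus >= gpus:
                break
            del self.running[victim]
            self.free_gpus += victim_gpus
            # Re-queue the victim; resuming assumes checkpoint/restart support.
            self.submit(victim, victim_priority, victim_gpus)


if __name__ == "__main__":
    queue = BatchQueue(total_gpus=4)
    queue.submit("nightly-retrain", priority=1, gpus=4)
    print(queue.schedule())
    queue.submit("urgent-debias", priority=10, gpus=2)
    print(queue.schedule(), queue.running)
```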
IBM Watson workloads: Proven AI workload on IBM Cloud Kubernetes Service
•  12 Watson services/apps represented as 800+ Kubernetes services
•  One deployment example: 3000+ pods on 500+ nodes
"We no longer worry about managing the infrastructure because IBM Cloud Kubernetes Service takes care of that for us." (Watson Project Team)
Jason McGee / © 2018 IBM Corporation
Training is accomplished. Model is ready – Can we trust it?!
What does it take to trust a decision made by a machine (other than that it is 99% accurate)?
Can the model be trusted?
•  Is it fair?
•  Is it easy to understand?
•  Did anyone tamper with it?
•  Is it accountable?
So let's start with vulnerability detection of Models.
Is the model vulnerable to adversarial attacks?
Enter: Adversarial Robustness Toolbox!
ART
IBM Adversarial Robustness Toolbox
https://github.com/IBM/adversarial-robustness-toolbox
ART is a library dedicated to adversarial machine learning. Its purpose is to allow rapid crafting and analysis of attack and defense methods for machine learning models. The Adversarial Robustness Toolbox provides an implementation for many state-of-the-art methods for attacking and defending classifiers.
The Adversarial Robustness Toolbox contains implementations of the following attacks:
Deep Fool (Moosavi-Dezfooli et al., 2015)
Fast Gradient Method (Goodfellow et al., 2014)
Jacobian Saliency Map (Papernot et al., 2016)
Universal Perturbation (Moosavi-Dezfooli et al., 2016)
Virtual Adversarial Method (Miyato et al., 2015)
C&W Attack (Carlini and Wagner, 2016)
NewtonFool (Jang et al., 2017)
The following defense methods are also supported:
Feature squeezing (Xu et al., 2017)
Spatial smoothing (Xu et al., 2017)
Label smoothing (Warde-Farley and Goodfellow, 2016)
Adversarial training (Szegedy et al., 2013)
Virtual adversarial training (Miyato et al., 2017)
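To make the idea of an attack concrete, here is a minimal sketch of the Fast Gradient Method on a tiny logistic-regression classifier, written in plain Python. It does not use ART's API; the weights, the input and the perturbation budget `eps` are made-up values chosen so that the effect is easy to see.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of the positive class under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fast_gradient_method(weights, bias, x, y, eps):
    """Shift x by eps in the direction that increases the cross-entropy loss.

    For logistic regression the gradient of the loss with respect to the
    input is (p - y) * weights, so only its sign is needed.
    """
    p = predict_proba(weights, bias, x)
    gradient = [(p - y) * w for w in weights]
    signs = [(g > 0) - (g < 0) for g in gradient]
    return [xi + eps * s for xi, s in zip(x, signs)]


if __name__ == "__main__":
    weights, bias = [2.0, -1.0], 0.0      # made-up model parameters
    x, y = [0.5, 0.2], 1                  # an input the model classifies correctly
    x_adv = fast_gradient_method(weights, bias, x, y, eps=0.5)
    print("clean prediction:      ", round(predict_proba(weights, bias, x), 3))
    print("adversarial prediction:", round(predict_proba(weights, bias, x_adv), 3))
```

A robustness check in a pipeline would run attacks like this against the trained model and raise a trigger when accuracy on the perturbed inputs drops too far.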
Robustness check accomplished. How do we check for bias throughout lifecycle?!
Are model weights and classifiers biased?
Are predictions biased?
Is the dataset biased?
Enter: AI Fairness 360!
AIF360
AI Fairness 360
https://github.com/IBM/AIF360
The AIF360 toolkit is an open-source library to help detect and remove bias in machine learning models.
The AI Fairness 360 Python package includes a comprehensive set of metrics for datasets and models to test for biases, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
Toolbox
Fairness metrics (30+)
Fairness metric explanations
Bias mitigation algorithms (9+)
Supported bias mitigation algorithms
Optimized Preprocessing (Calmon et al., 2017)
Disparate Impact Remover (Feldman et al., 2015)
Equalized Odds Postprocessing (Hardt et al., 2016)
Reweighing (Kamiran and Calders, 2012)
Reject Option Classification (Kamiran et al., 2012)
Prejudice Remover Regularizer (Kamishima et al., 2012)
Calibrated Equalized Odds Postprocessing (Pleiss et al., 2017)
Learning Fair Representations (Zemel et al., 2013)
Adversarial Debiasing (Zhang et al., 2018)
Supported fairness metrics
Comprehensive set of group fairness metrics derived from selection rates and error rates
Comprehensive set of sample distortion metrics
Generalized Entropy Index (Speicher et al., 2018)
(d’Alessandro et al., 2017)
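Two of the simplest group fairness metrics derived from selection rates are statistical parity difference and disparate impact. The sketch below computes both in plain Python; it is not AIF360's API, and the predictions and group labels are made-up data for illustration.

```python
def selection_rate(predictions, groups, group):
    """Fraction of members of `group` that received the favorable outcome (1)."""
    outcomes = [p for p, g in zip(predictions, groups) if g == group]
    if not outcomes:
        raise ValueError(f"no samples for group {group!r}")
    return sum(outcomes) / len(outcomes)


def statistical_parity_difference(predictions, groups, unprivileged, privileged):
    """Selection rate of the unprivileged group minus that of the privileged group."""
    return (selection_rate(predictions, groups, unprivileged)
            - selection_rate(predictions, groups, privileged))


def disparate_impact(predictions, groups, unprivileged, privileged):
    """Ratio of the unprivileged selection rate to the privileged selection rate."""
    return (selection_rate(predictions, groups, unprivileged)
            / selection_rate(predictions, groups, privileged))


if __name__ == "__main__":
    # Made-up predictions (1 = favorable) for two groups of four people each.
    predictions = [1, 0, 0, 0, 1, 1, 1, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(statistical_parity_difference(predictions, groups, "a", "b"))
    print(disparate_impact(predictions, groups, "a", "b"))
```

A value of 0 for the difference, or 1 for the ratio, means both groups are selected at the same rate; a pipeline could raise a "bias detected" trigger when either metric drifts past a chosen threshold.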
AIF360 checks for fairness in building and deploying models throughout the AI Lifecycle, with Metrics, Algorithms, and Explainers
[Figure: dataset metric and explainer; pre-processing algorithm; in-processing algorithm; post-processing algorithm; classifier metric and explainer]
Model is trained, tested and validated. Just deploy as microservices on
Kubernetes? Do we need anything else?!
•  As I develop multiple versions of my models, how can I easily dark launch and shift traffic?
•  How can I add and enforce policies on my model microservices?
•  The network among microservices is not reliable.
•  How can I monitor and trace my model microservices?
•  How can I ensure the communication among microservices is secure?
Enter Istio!
Istio
An open service mesh platform to connect, observe, secure, and control microservices.
•  Connect: Traffic Control, Discovery, Load Balancing, Resiliency
•  Observe: Metrics, Logging, Tracing
•  Secure: Encryption (TLS), Authentication, and Authorization of service-to-service communication
•  Control: Policy Enforcement
How does it work?
[Figure: service A calls service B; each step below adds one component to the mesh]
1. Deploy a proxy (Envoy) beside your application ("sidecar deployment")
2. Deploy Pilot to configure the sidecars
3. Deploy Telemetry to get telemetry
4. Deploy Citadel to assign identities and enable secure communication
5. Deploy Policy to enforce policies
[Figure: what Istio enables for Pods: Traffic Routing, Traffic Splitting (50% / 50%), Traffic Steering (user = jason), Access Policy]
And… Istio is part of Knative!
Model is deployed. We need to ensure triggers and actions can be defined based on any vulnerability detection, dataset changes, bias detection etc.
Enter Apache OpenWhisk!
Apache OpenWhisk
FaaS platform to execute code in response to events
Delivered as open source via Apache: openwhisk.org
[Figure: Source (events) → Triggers → Rules → Actions (code) → Results (response)]
[Figure: event providers for OpenWhisk: Periodic, IBM Cloudant, Message Hub, Mobile Push, Github, IBM App Connect]
And… OpenWhisk is being developed to work with Knative!
And last but not least, what do we use for Pipeline orchestration?
AI Pipeline Orchestration
•  Currently at IBM we use the IBM DevOps toolchain pipeline, built on top of Jenkins.
•  In open source, we are investigating Pachyderm, Argo and Airflow.
•  Pachyderm and Argo are dataflow and workflow orchestrators on Kubernetes.
•  Both have strong merits, and provide a DevOps operator view around pipeline execution times, logs, metrics, exit handlers, notification systems etc.
•  As discussed, Knative is now evolving its own pipelining capabilities, though they are at an early stage.
AIOps platform is ready. How do Data Scientists interact with it using Notebooks?!
Jupyter Enterprise Gateway
A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for Enterprise/Cloud use cases
[Figure: the gateway managing multiple Kernels across the cluster]
Jupyter Enterprise Gateway at IBM Code
https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/
Jupyter Enterprise Gateway source code at GitHub
https://github.com/jupyter-incubator/enterprise_gateway
Jupyter Enterprise Gateway Documentation
http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
March 30 2018 / © 2018 IBM Corporation
Jupyter Notebooks: Support for FfDL
User Experience
•  User selects where to run the experiment
•  Job is packaged and submitted on behalf of the user
•  User has access to the Job Console to monitor the experiment
Together: A Transparent, and trusted Open Source AI Pipeline using Knative
[Figure: the AI lifecycle annotated with the open source components used at each stage: Jupyter Enterprise Gateway, MAX, FfDL, kube-batch, ART, AIF360, Istio, OpenWhisk]
DEMO
Demo Model: Gender Classification
•  Data: UTKFace
•  Model: Simple convolutional neural network (CNN) with 3 convolutional layers and 2 fully connected layers
•  Output: SoftMax confidence score for Male / Female
Demo Flow
[Figure: Gender Classification Model. ML Code → Source to Container (Knative Build) → Training on Fabric for Deep Learning with the Data → Robustness Check (Adversarial Robustness Toolbox) → Fairness Check (AI Fairness 360) → Model Deployment, with traffic split 10% / 90% between Model Revision 1 and Model Revision 2]
To fully automate the workflow, Knative Pipeline can be used with Knative Eventing:
•  Knative Eventing Sources, Channel and Subscription can be defined.
•  Knative Pipeline Tasks and Flow Definition can be defined.
For this demonstration, we used Argo Workflows for Pipelining.
What we learnt
•  Knative brings serverless actions, events, workflows, apps, containers and service mesh together in one single stack.
•  Knative eliminates the need for stitching various individual components together (events, actions, service mesh, pipeline etc.).
•  Knative Build and Serving are simple to use and integrate with our existing containers.
•  Traffic Routing and Canary testing on Knative can be done without any Istio knowledge.
•  Built-in monitoring tools are very helpful to visualize the benefits of building services using Knative.
•  Knative Eventing shows promise, but needs to evolve more for production.
•  Knative Build-Pipeline is still at a very early stage, and changing rapidly.
•  Knative Build-Pipeline could get very complicated with many tasks, and it needs a better debugging tool/GUI to monitor errors and visualize the pipeline flow.
Bringing it together: A Secure, transparent, and trusted Open Source AI Pipeline
Links:
•  Knative: https://github.com/knative
•  Fabric for Deep Learning: https://github.com/IBM/FfDL
•  Adversarial Robustness Toolbox: https://github.com/IBM/adversarial-robustness-toolbox
•  AI Fairness 360: http://aif360.mybluemix.net/
•  Jupyter Enterprise Gateway: https://github.com/jupyter/enterprise_gateway
•  Kube-Batch: https://github.com/kubernetes-sigs/kube-batch
•  Istio: https://github.com/istio/istio
•  OpenWhisk: https://github.com/apache/incubator-openwhisk
•  Kiali: https://github.com/kiali/kiali
•  Argo: https://github.com/argoproj/argo
•  PyTorch: https://pytorch.org/
@AnimeshSingh @Tomipli
Thank You!
AI & Machine Learning Pipelines with Knative

AI & Machine Learning Pipelines with Knative

  • 2. Center for Open Source Data and AI Technologies (CODAIT) Code – Build and improve practical frameworks to enable more developers to realize immediate value. Content – Showcase solutions for complex and real-world AI problems. Community – Bring developers and data scientists to engage with IBM. Improving Enterprise AI lifecycle in Open Source: Gather Data, Analyze Data, Machine Learning, Deep Learning, Deploy Model, Maintain Model. Tools: Python Data Science Stack, Pandas, Scikit-Learn, Apache Spark, Jupyter, Keras + Tensorflow, Fabric for Deep Learning (FfDL), Model Asset eXchange, Mleap + PFA. •  Team contributes to over 10 open source projects •  17 committers and many contributors in Apache projects •  Over 1100 JIRAs and 66,000 lines of code committed to Apache Spark itself; over 65,000 LoC into SystemML •  Over 25 product lines within IBM leveraging Apache Spark •  Speakers at over 100 conferences, meetups, unconferences and more CODAIT codait.org
  • 3. Agenda •  Progress in ML and DL •  AI and Cloud: Complementary •  Why we need Knative to build an AI Platform •  How to build a transparent and trusted AI platform leveraging Knative •  Demo CODAIT codait.org
  • 4. Progress in Deep Learning: 1997 IBM Deep Blue (chess) … 2011 IBM Watson (Jeopardy); 2011 Apple releases Siri; 2012 AlexNet introduced deep learning with GPUs; 2015 Facebook's face recognition; 2016 Siri gets deep learning; 2017 AlphaGo
  • 9. Deep Learning = Training Artificial Neural Networks. A human brain has: •  200 billion neurons •  32 trillion connections between them. An artificial neural network, by comparison, has: •  25 million “neurons” •  100 million connections (parameters). IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation
  • 10. Neural Network Design Workflow: domain data → design neural network → HPO (neural network structure, hyperparameters) → optimal hyperparameters → Performance meets needs? NO → Start another experiment
  • 11. Neural Network Design Workflow: domain data → design neural network → HPO (neural network structure, hyperparameters) → optimal hyperparameters → Performance meets needs? NO → Start another experiment; yes → trained model → deploy (Cloud) → evaluate (Still good! / BAD)
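The experiment loop in the workflow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenters' code: `train_and_evaluate`, the scoring rule, the search space and the 0.95 target are all hypothetical stand-ins for a real training job, and a plain grid search stands in for whatever HPO strategy a real platform would use.

```python
from itertools import product


def train_and_evaluate(hyperparameters):
    """Stand-in for a real training run (hypothetical scoring rule).

    Returns a validation score in [0, 1] that peaks at learning_rate == 0.01.
    """
    return max(0.0, 1.0 - abs(hyperparameters["learning_rate"] - 0.01) * 10)


def run_experiments(search_space, target_score):
    """Try candidates in grid order until one meets the target score."""
    best = None
    names = list(search_space)
    for experiment, values in enumerate(product(*search_space.values()), start=1):
        candidate = dict(zip(names, values))
        score = train_and_evaluate(candidate)
        if best is None or score > best["score"]:
            best = {"hyperparameters": candidate, "score": score,
                    "experiment": experiment}
        if best["score"] >= target_score:
            # "yes" branch: performance meets needs, hand over for deployment.
            best["meets_needs"] = True
            return best
        # "NO" branch: fall through and start another experiment.
    if best is not None:
        best["meets_needs"] = False  # search space exhausted without success
    return best


# Hypothetical search space; a real one would be far larger.
search_space = {"learning_rate": [0.1, 0.05, 0.01], "hidden_layers": [1, 2]}
result = run_experiments(search_space, target_score=0.95)
print(result)
```

Each pass through the loop is one "experiment"; the repetition is what makes elastic cloud capacity attractive for this kind of work.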
  • 12. So AI in general and Deep Learning in particular are very iterative and repetitive. And they need Cloud. Why?
  • 15. Ability to utilize various technologies and achieve high performance computing: 1. Model/Data Parallelism 2. MPI 3. NCCL 4. …..
  • 17. And Cloud native means we use Kubernetes! Clients: API, UI, CLI. Kubernetes Master: •  Etcd •  API Server •  Controller Manager Server •  Scheduler Server. Workers: Worker Node 1, Worker Node 2, Worker Node 3, … Worker Node n. Registry.
  • 18. But is Kubernetes enough for a Cloud Native platform?
  • 19. Kubernetes is not the end game – Says who?
  • 20. What else do we need? Let's understand it from the context of an AI Lifecycle. AI PLATFORM
  • 21. AIOps Prepared and Analyzed Data AI PLATFORM Initial Model Trained Model Deployed Model We need a Cloud native AI Platform to build, train, deploy and monitor Models!
  • 22. Also, we need a transparent and trusted AI Platform! AIOps AI PLATFORM: Prepared and Analyzed Data, Initial Model, Trained Model, Deployed Model. Is model training showing increasing loss? Are model weights biased? Is the model vulnerable to adversarial attacks? Has the training data changed? Are the model predictions less accurate? Are hyperparameters suboptimal? Are predictions biased? Is the dataset biased?
  • 23. In fact, we need a transparent, trusted and automated AI Pipeline! AIOps: Prepared and Analyzed Data, Initial Model, Trained Model, Deployed Model. Is model training showing increasing loss? Are model weights biased? Is the model vulnerable to adversarial attacks? Has the training data changed? Are the model predictions less accurate? Are hyperparameters suboptimal? Are predictions biased? Is the dataset biased?
  • 24. Transparent, trusted, automated and event driven AI Pipeline! AIOps: Prepared and Analyzed Data, Initial Model, Trained Model, Deployed Model. Triggers: Trained Model is showing increasing loss; Model performance is suboptimal; Model is vulnerable to attack; Data changed; Bias Detected. Actions: Prepare Data, Train, Deploy; Harden, Train, Deploy; Implement Defense, Train, Deploy; Optimize Hyperparameters, Train.
  • 25. Transparent, trusted, automated, event driven and auditable AI Pipeline! AIOps: Prepared and Analyzed Data, Initial Model, Trained Model, Deployed Model. Triggers: Trained Model is showing increasing loss; Model performance is suboptimal; Model is vulnerable to attack; Data changed; Bias Detected. Actions: Prepare Data, Train, Deploy; Harden, Train, Deploy; Implement Defense, Train, Deploy; Optimize Hyperparameters, Train.
  • 26. Transparent, trusted, automated, event driven, auditable AI Pipeline as a Service! Users: Data Scientist, AIOps Engineer. AIOps: Prepared and Analyzed Data, Initial Model, Trained Model, Deployed Model. Triggers: Trained Model is showing increasing loss; Model performance is biased; Model is vulnerable to attack; Data changed; Bias Detected. Actions: Prepare Data, Train, Deploy; Debias, Train, Deploy; Implement Defense, Train, Deploy; Optimize Hyperparameters, Train.
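One way to picture the event-driven, auditable pipeline on these slides is a dispatch table from triggers to action sequences, with every executed step written to an audit log. The sketch below is illustrative only: the slides do not spell out which trigger starts which action sequence, so the pairing in `ACTIONS` is an assumption, and `Event`, `handle` and `audit_log` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical event record; a real platform would carry richer metadata.
Event = namedtuple("Event", ["trigger", "detail"])

# Assumed pairing of the slide's triggers with its action sequences.
ACTIONS = {
    "data_changed": ["prepare_data", "train", "deploy"],
    "increasing_loss": ["optimize_hyperparameters", "train"],
    "bias_detected": ["debias", "train", "deploy"],
    "vulnerable_to_attack": ["implement_defense", "train", "deploy"],
}

# Every executed step is appended here, which is what makes the run auditable.
audit_log = []


def handle(event):
    """Look up the action sequence for a trigger and record each step."""
    steps = ACTIONS.get(event.trigger, [])
    for step in steps:
        audit_log.append((event.trigger, step))
    return steps


print(handle(Event("bias_detected", "disparate impact above threshold")))
print(audit_log)
```

Unknown triggers deliberately map to an empty action list rather than raising, so that new event types can be introduced before their handlers exist.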
  • 27. If we translate it to logical architecture, it looks like this… AI Pipeline (Python Definition – Orchestrate and Track): Data Pipe, Training Pipe, Model Validation Pipe, Model Deployment Pipe, Deployment Analysis Pipe, each implemented as a Python Function. Users: AI Developer and Data Scientist, AI Operator.
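The "Python Definition – Orchestrate and Track" idea can be sketched as a list of plain Python functions, one per pipe, run in order while recording what was executed. This is a toy illustration, not the platform's actual SDK: the function names mirror the pipe names on the slide, but their bodies, the `state` dictionary and `run_pipeline` are assumptions made for the example.

```python
def data_pipe(state):
    # Placeholder for data preparation and analysis.
    return {**state, "dataset": "prepared"}


def training_pipe(state):
    # Placeholder for model training.
    return {**state, "model": "trained"}


def model_validation_pipe(state):
    # Placeholder for validation checks (accuracy, bias, robustness).
    return {**state, "validated": True}


def model_deployment_pipe(state):
    # Placeholder for rolling the model out behind an endpoint.
    return {**state, "endpoint": "deployed"}


def deployment_analysis_pipe(state):
    # Placeholder for monitoring the deployed model.
    return {**state, "analysis": "ok"}


PIPELINE = [
    data_pipe,
    training_pipe,
    model_validation_pipe,
    model_deployment_pipe,
    deployment_analysis_pipe,
]


def run_pipeline(stages, state=None):
    """Orchestrate the stages in order and track which ones ran."""
    state = dict(state or {})
    history = []
    for stage in stages:
        state = stage(state)
        history.append(stage.__name__)
    return state, history


final_state, history = run_pipeline(PIPELINE)
print(history)
print(final_state)
```

Keeping each pipe a plain function means the same definition can be run locally for debugging or handed to an orchestrator that runs each stage in its own container.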
  • 28. So we need more than Kubernetes. We need to be able to: build the Data Scientist's code, orchestrate the ML code, automate the ML workflow, send event notifications. Users: Data Scientist, AIOps Engineer
  • 29. That means we need the concepts of: Build, Serving, Pipeline, Eventing
  • 30. So… we need Knative! Build, Serving, Pipeline, Eventing
  • 31. Well… what is Knative? Knative provides a set of building blocks that enable modern, source-centric and container-based serverless workloads on Kubernetes. It uses K8S CRDs to define a new set of APIs around: ● Build ● Eventing ● Serving ● Pipeline
  • 32. Knative Build: Source-to-container build orchestration. Knative Build components: •  Build •  Builder •  BuildTemplate. For example, you can write a build that uses Kubernetes-native resources to obtain your source code from a repository, build a container image, then run that image. •  A Build can include multiple steps where each step specifies a Builder. •  A Builder is a type of container image that you create to accomplish any task, whether that's a single step in a process, or the whole process itself. •  The steps in a Build can push to a repository. •  A BuildTemplate can be used to define reusable parameterized templates.
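To make the Build, Builder and step vocabulary concrete, the sketch below assembles a Build manifest as a Python dictionary and prints it as JSON. It follows the shape of the since-retired `build.knative.dev/v1alpha1` API as far as I recall it, so treat the field layout as an assumption to check against the Knative version in use. The repository URL, revision and image destination are placeholders, and `knative_build` is a hypothetical helper.

```python
import json


def knative_build(name, repo_url, revision, image):
    """Return a Build manifest with one step that builds and pushes an image."""
    return {
        "apiVersion": "build.knative.dev/v1alpha1",
        "kind": "Build",
        "metadata": {"name": name},
        "spec": {
            # Where the source code comes from.
            "source": {"git": {"url": repo_url, "revision": revision}},
            # Each step names a Builder, i.e. a container image that does the work.
            "steps": [
                {
                    "name": "build-and-push",
                    "image": "gcr.io/kaniko-project/executor",
                    "args": [
                        "--dockerfile=/workspace/Dockerfile",
                        f"--destination={image}",
                    ],
                }
            ],
        },
    }


# Placeholder values, not taken from the talk.
manifest = knative_build(
    name="train-image-build",
    repo_url="https://example.com/acme/model-training.git",
    revision="master",
    image="registry.example.com/acme/model-training:latest",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f` on a cluster that has the Build CRDs installed.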
  • 33. Knative Eventing: Delivery and management of events, universal subscription, binding services to event ecosystems. Knative Eventing components: ●  Sources: Sources that are firing events ●  Channels: A single event forwarding and persistence layer ●  Subscriptions: Deliver and forward events to channels/services. Knative Eventing is designed around the following goals: ●  Knative Eventing services are loosely coupled. ●  Event producers and event sources are independent. ●  Other services can be connected to the Eventing system. ●  Ensure cross-service interoperability. Knative Eventing is consistent with the CloudEvents specification that is developed by the CNCF Serverless WG.
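The source, channel and subscription roles can be illustrated with a small in-memory model. This is a conceptual sketch only and does not use any Knative API: `Channel`, `make_event` and the two subscriber functions are hypothetical, and the event envelope simply carries the four attributes the CloudEvents specification requires (`specversion`, `id`, `type`, `source`) plus a `data` payload.

```python
import uuid


def make_event(event_type, source, data):
    """Build a CloudEvents-style envelope (required attributes plus data)."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "type": event_type,
        "source": source,
        "data": data,
    }


class Channel:
    """Forwards every event it receives to all of its subscriptions."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def send(self, event):
        # The producer does not know who consumes the event (loose coupling).
        return [handler(event) for handler in self._subscribers]


def retrain_service(event):
    return f"retrain triggered by {event['type']}"


def audit_service(event):
    return f"audited event from {event['source']}"


channel = Channel()
channel.subscribe(retrain_service)
channel.subscribe(audit_service)

# Hypothetical event type and source names.
event = make_event("com.example.data.changed", "/storage/training-data", {"rows": 1200})
results = channel.send(event)
print(results)
```

Adding another consumer is a matter of one more `subscribe` call; the source that emits the event is left untouched.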
  • 34. Knative Pipeline: Configure and run CI/CD style pipelines for your Kubernetes application. Knative Pipeline components: ● Task: A collection of sequential steps you would want to run as part of your CI flow. It includes the inputs, outputs and steps. ● Pipeline: A graph of Tasks to execute. ● Runs: To invoke a Pipeline or a Task. High level details of this design: •  Pipelines do not know what will trigger them; they can be triggered by events or by manually creating PipelineRuns. •  Tasks can exist and be invoked completely independently of Pipelines. •  Test results are a first class concept; being able to navigate test results easily is powerful. •  Tasks can depend on artifacts, output and parameters created by other tasks. •  Resources are the artifacts used as inputs and outputs of TaskRuns.
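"A graph of Tasks" whose members consume each other's outputs can be sketched with the standard library's `graphlib`. This is a conceptual model, not the Pipeline CRD: the task names, the `DEPENDS_ON` mapping and `run_pipeline` are hypothetical, and each task is reduced to a function from its inputs to a string.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each task receives the outputs of the tasks it depends on.
TASKS = {
    "prepare-data": lambda inputs: "dataset",
    "train": lambda inputs: f"model({inputs['prepare-data']})",
    "deploy": lambda inputs: f"endpoint({inputs['train']})",
}

# Maps each task to the set of tasks whose outputs it needs.
DEPENDS_ON = {
    "prepare-data": set(),
    "train": {"prepare-data"},
    "deploy": {"train"},
}


def run_pipeline(tasks, depends_on):
    """Run tasks in dependency order, passing outputs downstream."""
    outputs = {}
    order = []
    for name in TopologicalSorter(depends_on).static_order():
        inputs = {dep: outputs[dep] for dep in depends_on[name]}
        outputs[name] = tasks[name](inputs)
        order.append(name)
    return order, outputs


order, outputs = run_pipeline(TASKS, DEPENDS_ON)
print(order)
print(outputs["deploy"])
```

Because the runner only needs the task table and the dependency map, the same pipeline can be started by hand or by an incoming event, which matches the point that pipelines do not know what triggers them.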
  • 35. Knative Serving: Request-driven compute model, scale to zero, autoscaling, routing and managing traffic. Knative Serving components: ●  Configuration ○  Desired current state of deployment (#HEAD) ○  Records both code and configuration (separated, à la 12-factor) ○  Stamps out builds / revisions as it is updated ●  Revision ○  Code and configuration snapshot ○  k8s infra: Deployment, ReplicaSet, Pods, etc. ●  Route ○  Traffic assignment to Revisions (fractional scaling or by name) ○  Built using Istio ●  Service ○  Provides a simple entry point for UI and CLI tooling to achieve common behavior ○  Acts as a top-level controller to orchestrate Route and Configuration. Diagram: the Service creates the Route and the Configuration; the Configuration creates Revisions; the Route targets the latest Revision or explicitly named Revisions.
  • 38. Worker Node Worker Node Deployment Replica Set Revision Serving Knative Serving Building Block!
  • 39. Worker Node Pod Worker Node Deployment Replica Set Revision Serving Knative Serving Building Block!
  • 40. Worker Node Pod Worker Node Deployment Replica Set Revision Pod Serving Knative Serving Building Block!
  • 41. Worker Node Worker Node Deployment Revision Replica Set Serving Knative Serving Building Block!
  • 42. Revision 1 Revision 2 Revision 3 Revision 4 Serving Knative Serving Building Block!
  • 43. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Serving Knative Serving Building Block!
  • 45. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Revision 5 Serving Knative Serving Building Block!
  • 46. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Revision 5 Route Serving Knative Serving Building Block!
  • 48. Configuration Revision 1 Revision 2 Revision 3 Revision 4 Revision 5 Route 5% 95% Serving Knative Serving Building Block!
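The build-up on these slides (a Configuration stamping out Revisions, then a Route splitting traffic 5% / 95%) can be mimicked with a toy model. Nothing here calls Knative: `Configuration`, `Route`, the revision naming scheme and the image tags are hypothetical, and giving the newest Revision the 5% share is an assumption about which side of the split is the canary.

```python
import random


class Configuration:
    """Stamps out a new immutable Revision every time it is updated."""

    def __init__(self, name):
        self.name = name
        self.revisions = []

    def update(self, image, env=None):
        revision = {
            "name": f"{self.name}-{len(self.revisions) + 1:05d}",
            "image": image,
            "env": dict(env or {}),
        }
        self.revisions.append(revision)
        return revision


class Route:
    """Assigns a percentage of requests to each named Revision."""

    def __init__(self, traffic):
        if sum(traffic.values()) != 100:
            raise ValueError("traffic percentages must add up to 100")
        self.traffic = dict(traffic)

    def pick(self, rng):
        names = list(self.traffic)
        weights = [self.traffic[name] for name in names]
        return rng.choices(names, weights=weights, k=1)[0]


config = Configuration("model")
for version in range(1, 6):
    config.update(image=f"registry.example.com/model:v{version}")

stable, canary = config.revisions[3]["name"], config.revisions[4]["name"]
route = Route({stable: 95, canary: 5})

rng = random.Random(42)  # seeded so repeated runs give the same split
counts = {stable: 0, canary: 0}
for _ in range(10_000):
    counts[route.pick(rng)] += 1
print(counts)
```

Shifting more traffic to the new Revision, or rolling back, only changes the Route's percentages; the Revisions themselves stay untouched.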
  • 52. So Knative uses K8S CRDs to define a new set of APIs: ● Build: Source-to-container build orchestration ● Eventing: Universal subscription, binding services to event ecosystems, delivery and management of events ● Serving: Request-driven compute model, scale to zero, autoscaling, routing and managing traffic ● Pipeline: Configure and run CI/CD style pipelines for your Kubernetes application.
  • 53. AIOps lifecycle: Prepared and Analyzed Data, Initial Model, Trained Model, Deployed Model. But do we really need something like Knative to build an ML platform?
  • 54. Let's go through the different phases of the AI lifecycle through our use cases. Many tools are available to build initial models.
  • 55. Many tools are available to train machine learning and deep learning models.
  • 56. We need a multi-framework, cloud-native ML/DL platform. For example, TensorFlow is awesome but has static graphs, so PyTorch's dynamic graphs are becoming more popular. How do we handle multiple open source deep learning frameworks in a consistent way and leverage the power of the cloud for distributed computing?
  • 57. Enter: Fabric for Deep Learning (FfDL)
  • 58. Fabric for Deep Learning (FfDL)
    • Fabric for Deep Learning, or FfDL (pronounced "fiddle"), aims to make deep learning easily accessible to data scientists and AI developers.
    • FfDL provides a consistent way to train and visualize deep learning jobs across multiple frameworks such as TensorFlow, Caffe, PyTorch and Keras.
    • FfDL is one of InfoWorld's 2018 Best of Open Source Software Award winners for machine learning and deep learning.
    • FfDL GitHub page: https://github.com/IBM/FfDL
    • FfDL dwOpen page: https://developer.ibm.com/code/open/projects/fabric-for-deep-learning-ffdl/
    • FfDL announcement blog: http://developer.ibm.com/code/2018/03/20/fabric-for-deep-learning
    • FfDL technical architecture blog: http://developer.ibm.com/code/2018/03/20/democratize-ai-with-fabric-for-deep-learning
    • Deep Learning as a Service within Watson Studio: https://www.ibm.com/cloud/deep-learning
    • Research paper: "Scalable Multi-Framework Management of Deep Learning Training Jobs", http://learningsys.org/nips17/assets/papers/paper_29.pdf
  • 59. FfDL is built using a microservices architecture on Kubernetes (https://github.com/IBM/FfDL)
    • The FfDL platform uses a microservices architecture to offer resilience, scalability, multi-tenancy, and security without modifying the deep learning frameworks, and with no or minimal changes to model code.
    • FfDL control plane microservices are deployed as pods on Kubernetes to manage this cluster of GPU- and CPU-enabled machines effectively.
    • Tested platforms: Kube DIND, IBM Cloud Public, IBM Cloud Private, GPUs using both the Kubernetes feature gate Accelerators and NVIDIA device plugins.
  • 60. Access to elastic compute leveraging Kubernetes. Auto-allocation means infrastructure is used only when needed. Training assets are managed and tracked. [Diagram labels: source code, training definition, Kubernetes container, training artifacts, compute cluster (NVIDIA Tesla K80, P100, V100), Cloud Object Storage.] IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation
  • 61. Model training distributed across containers. [Diagram labels: dataset, Cloud Object Storage, training runs, containers, Kubernetes container orchestration, server cluster, NVIDIA GPUs.]
  • 62. FfDL: Architecture, Current Release. [Diagram labels: CLIs, Browser, Web UI, REST API, Trainer Service, Lifecycle Manager, Job Monitor, MongoDB, Training Job, Learner Pod, Learner (e.g. TensorFlow, Caffe, PyTorch, Keras), Controller, Log Collector, Parameter Server, Open MPI, Object Storage (Model Definition, Training Data, Trained Models), Mount Object Storage Bucket, Prometheus, Push Gateway, Alert Manager, Elasticsearch; flows: Launch Job, Job Status, Job Info.]
  • 63. Great, but we also need a batch scheduler for AI workloads. Enter: Kube-Batch
  • 64. Advanced Batch Scheduling with Kube-Batch. [Diagram labels: CLIs, Browser, Web UI, REST API, Trainer Service, Job Monitor, MongoDB, Scheduling Engine, Kube-Batch, QueueJob, Training Job, Learner Pod, Learner (e.g. TensorFlow, Caffe, PyTorch, Keras), Controller, Log Collector, Parameter Server, Open MPI, Object Storage (Model Definition, Training Data, Trained Models), Mount Object Storage Bucket, Prometheus, Push Gateway, Alert Manager, Elasticsearch; flows: Launch Job, Job Info.]
  • 65. Kube-Batch: Filling gaps for FfDL and AI workloads
    • Job queuing for enabling job priority and preemption
    • Holistic scheduling
    • Support for dynamic job prioritization/budgeting
    • User-requested time-based scheduling
    • Preemption/resumption in a supported checkpoint/restart environment
    • Topology-aware placement (data-intensive workloads requiring network-topology-aware placement)
    • Partial preemption of elastic jobs
    • Performance estimation
    • Kube-Batch
      – Kubernetes incubator project
      – Introduces batch scheduling in Kubernetes
      – Support for simple job definition and lifecycle management
      – Support for job queuing
      – Introduces queue-based quota management
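Job queuing with priorities can be sketched in a few lines of Python. This is an illustration of the idea only, not the kube-batch API or its scheduling algorithm; the `JobQueue` class, its methods, and the GPU-count capacity model are all hypothetical.

```python
import heapq
import itertools


class JobQueue:
    """Toy priority queue: higher priority first, FIFO within a priority."""

    def __init__(self, free_gpus):
        self.free_gpus = free_gpus          # assumed capacity model: a GPU count
        self._heap = []
        self._counter = itertools.count()   # tie-breaker that preserves submit order

    def submit(self, name, priority, gpus):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), name, gpus))

    def schedule(self):
        """Start queued jobs in priority order until the head job no longer fits."""
        started = []
        while self._heap and self._heap[0][3] <= self.free_gpus:
            _, _, name, gpus = heapq.heappop(self._heap)
            self.free_gpus -= gpus
            started.append(name)
        return started


if __name__ == "__main__":
    queue = JobQueue(free_gpus=4)
    queue.submit("low", priority=1, gpus=2)
    queue.submit("high", priority=10, gpus=2)
    queue.submit("mid", priority=5, gpus=2)
    print(queue.schedule())  # ['high', 'mid']
```

The sketch stops at the first job that does not fit instead of skipping ahead, so a large high-priority job is never starved by smaller ones; preemption and quota management are left out.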
  • 66. IBM Watson workloads: proven AI workload on IBM Cloud Kubernetes Service. 12 Watson services/apps represented as 800+ Kubernetes services. One deployment example: 3000+ pods on 500+ nodes. "We no longer worry about managing the infrastructure because IBM Cloud Kubernetes Service takes care of that for us." (Watson Project Team) Jason McGee / © 2018 IBM Corporation
  • 67. Training is accomplished and the model is ready. Can the model be trusted?
  • 68. What does it take to trust a decision made by a machine (other than that it is 99% accurate)? Is it fair? Is it easy to understand? Did anyone tamper with it? Is it accountable?
  • 69. So let's start with vulnerability detection of models. Is the model vulnerable to adversarial attacks?
  • 70. Enter: Adversarial Robustness Toolbox (ART)
  • 71. IBM Adversarial Robustness Toolbox (ART): https://github.com/IBM/adversarial-robustness-toolbox
    ART is a library dedicated to adversarial machine learning. Its purpose is to allow rapid crafting and analysis of attack and defense methods for machine learning models. The Adversarial Robustness Toolbox provides implementations of many state-of-the-art methods for attacking and defending classifiers.
    The Adversarial Robustness Toolbox contains implementations of the following attacks:
    • Deep Fool (Moosavi-Dezfooli et al., 2015)
    • Fast Gradient Method (Goodfellow et al., 2014)
    • Jacobian Saliency Map (Papernot et al., 2016)
    • Universal Perturbation (Moosavi-Dezfooli et al., 2016)
    • Virtual Adversarial Method (Miyato et al., 2015)
    • C&W Attack (Carlini and Wagner, 2016)
    • NewtonFool (Jang et al., 2017)
    The following defense methods are also supported:
    • Feature squeezing (Xu et al., 2017)
    • Spatial smoothing (Xu et al., 2017)
    • Label smoothing (Warde-Farley and Goodfellow, 2016)
    • Adversarial training (Szegedy et al., 2013)
    • Virtual adversarial training (Miyato et al., 2017)
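To make the Fast Gradient Method concrete, here is a minimal pure-Python sketch on a tiny logistic-regression classifier. It does not use ART and is not ART's API; the weights, input, label and `eps` value are made up for illustration, and the helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of the positive class for a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(weights, bias, x, y, eps):
    """One fast-gradient-sign step: move x by eps in the direction that raises the loss.

    For cross-entropy loss on logistic regression, d(loss)/d(x_i) = (p - y) * w_i.
    """
    p = predict(weights, bias, x)
    grad = [(p - y) * w for w in weights]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the talk).
    weights, bias = [2.0, -1.0], 0.0
    x, y = [0.5, 0.2], 1
    x_adv = fgsm(weights, bias, x, y, eps=0.5)
    print("clean prediction:      ", round(predict(weights, bias, x), 3))
    print("adversarial prediction:", round(predict(weights, bias, x_adv), 3))
```

Each input feature moves by at most `eps`, yet the predicted class flips from positive to negative, which is the kind of vulnerability a robustness check is meant to surface.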
  • 72. Robustness check accomplished. How do we check for bias throughout the lifecycle? Is the dataset biased? Are model weights and classifiers biased? Are predictions biased?
  • 73. Enter: AI Fairness 360 (AIF360)
  • 74. AI Fairness 360 (AIF360): https://github.com/IBM/AIF360
    The AIF360 toolkit is an open-source library to help detect and remove bias in machine learning models. The AI Fairness 360 Python package includes a comprehensive set of metrics for datasets and models to test for biases, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
    Toolbox:
    • Fairness metrics (30+)
    • Fairness metric explanations
    • Bias mitigation algorithms (9+)
    Supported bias mitigation algorithms:
    • Optimized Preprocessing (Calmon et al., 2017)
    • Disparate Impact Remover (Feldman et al., 2015)
    • Equalized Odds Postprocessing (Hardt et al., 2016)
    • Reweighing (Kamiran and Calders, 2012)
    • Reject Option Classification (Kamiran et al., 2012)
    • Prejudice Remover Regularizer (Kamishima et al., 2012)
    • Calibrated Equalized Odds Postprocessing (Pleiss et al., 2017)
    • Learning Fair Representations (Zemel et al., 2013)
    • Adversarial Debiasing (Zhang et al., 2018)
    Supported fairness metrics:
    • Comprehensive set of group fairness metrics derived from selection rates and error rates
    • Comprehensive set of sample distortion metrics
    • Generalized Entropy Index (Speicher et al., 2018)
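Two of the group fairness metrics derived from selection rates can be computed by hand. The sketch below is plain Python rather than the AIF360 API, and the outcome lists are made-up data; it follows the common definitions in which both metrics compare the unprivileged group against the privileged group.

```python
def selection_rate(outcomes):
    """Fraction of favorable (1) outcomes in a group."""
    return sum(outcomes) / len(outcomes)


def statistical_parity_difference(unprivileged, privileged):
    """Selection-rate gap; 0 means parity, negative disadvantages the unprivileged group."""
    return selection_rate(unprivileged) - selection_rate(privileged)


def disparate_impact(unprivileged, privileged):
    """Selection-rate ratio; 1 means parity, below 1 disadvantages the unprivileged group."""
    return selection_rate(unprivileged) / selection_rate(privileged)


if __name__ == "__main__":
    # Hypothetical model predictions (1 = favorable outcome) for two groups.
    privileged = [1, 1, 1, 0, 1, 1, 0, 1]     # selection rate 0.75
    unprivileged = [1, 0, 0, 1, 0, 0, 1, 0]   # selection rate 0.375
    print(statistical_parity_difference(unprivileged, privileged))  # -0.375
    print(disparate_impact(unprivileged, privileged))               # 0.5
```

The same two numbers can be computed on the training labels (is the dataset biased?) and on model predictions (are predictions biased?), which is how one metric covers several stages of the lifecycle.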
  • 75. AIF360 checks for fairness when building and deploying models throughout the AI lifecycle (d'Alessandro et al., 2017).
  • 77. Model is trained, tested and validated. Just deploy as microservices on Kubernetes? Do we need anything else?
  • 78. Model is trained, tested and validated. Just deploy as microservices on Kubernetes? Do we need anything else?
    • As I develop multiple versions of my models, how can I easily dark launch and shift traffic?
    • How can I add and enforce policies on my model microservices?
    • The network among microservices is not reliable.
    • How can I monitor and trace my model microservices?
    • How can I ensure the communication among microservices is secure?
  • 80. Istio: an open service mesh platform to connect, observe, secure, and control microservices.
  • 81. Istio
    • Connect: traffic control, discovery, load balancing, resiliency
    • Observe: metrics, logging, tracing
    • Secure: encryption (TLS), authentication, and authorization of service-to-service communication
    • Control: policy enforcement
  • 82. How does it work? [Diagram: services A and B connected by a call.]
  • 83. How does it work? 1. Deploy a proxy (Envoy) beside your application ("sidecar deployment").
  • 84. How does it work? 2. Deploy Pilot to configure the sidecars.
  • 85. How does it work? 3. Deploy Telemetry to get telemetry.
  • 86. How does it work? 4. Deploy Citadel to assign identities and enable secure communication.
  • 87. How does it work? 5. Deploy Policy to enforce policies (policy decisions).
  • 92. And….Istio is part of Knative!!!!
  • 93. Model is deployed. We need to ensure triggers and actions can be defined based on vulnerability detection, dataset changes, bias detection, etc.
  • 94. Enter: Apache OpenWhisk
  • 95. Apache OpenWhisk: FaaS platform to execute code in response to events. Delivered as open source via Apache (openwhisk.org). [Diagram labels: Triggers (response), Rules, Actions (code), Source (events), Results.]
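The trigger, rule and action model can be sketched as a tiny in-process dispatcher. This is not the OpenWhisk API or CLI; the `EventBus` class, the `bias-detected` trigger name and the `request_retraining` action are hypothetical names used only to show how a rule binds a trigger to an action.

```python
from collections import defaultdict


class EventBus:
    """Toy model of the FaaS pattern: a rule binds a trigger name to an action."""

    def __init__(self):
        self._rules = defaultdict(list)  # trigger name -> list of action callables

    def add_rule(self, trigger, action):
        self._rules[trigger].append(action)

    def fire(self, trigger, payload):
        """Run every action bound to the trigger and collect the results."""
        return [action(payload) for action in self._rules.get(trigger, [])]


def request_retraining(event):
    # Hypothetical action: in a real platform this would start a training job.
    return {"action": "retrain", "model": event["model"], "reason": event["metric"]}


if __name__ == "__main__":
    bus = EventBus()
    bus.add_rule("bias-detected", request_retraining)
    results = bus.fire("bias-detected", {"model": "gender-classifier",
                                         "metric": "disparate_impact"})
    print(results)
```

Firing a trigger with no rules attached simply returns an empty list, which mirrors the decoupling between event sources and the code that reacts to them.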
  • 97. OpenWhisk event providers: Periodic, IBM Cloudant, Message Hub, Mobile Push, GitHub, IBM App Connect.
  • 98. And… OpenWhisk is being developed to work with Knative!
  • 99. And last but not least, what do we use for pipeline orchestration?
  • 100. AI Pipeline Orchestration
    • Currently at IBM we use the IBM DevOps toolchain pipeline, built on top of Jenkins.
    • In open source, we are investigating Pachyderm, Argo and Airflow.
    • Pachyderm and Argo are dataflow and workflow orchestrators on Kubernetes.
    • Both have strong merits and provide a DevOps operator view of pipeline execution times, logs, metrics, exit handlers, notification systems, etc.
    • As discussed, Knative is now evolving its own pipelining capabilities, though they are at an early stage.
  • 101. AIOps platform is ready. How do data scientists interact with it using notebooks? Enter: Jupyter Enterprise Gateway
  • 102. Jupyter Enterprise Gateway: a lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for enterprise/cloud use cases.
    • Jupyter Enterprise Gateway at IBM Code: https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/
    • Jupyter Enterprise Gateway source code at GitHub: https://github.com/jupyter-incubator/enterprise_gateway
    • Jupyter Enterprise Gateway documentation: http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
    March 30 2018 / © 2018 IBM Corporation
  • 103. Jupyter Notebooks: Support for FfDL User Experience
    • User selects where to run the experiment
    • Job is packaged and submitted on behalf of the user
    • User has access to the Job Console to monitor the experiment
  • 104. Together: a transparent and trusted open source AI pipeline using Knative. [Diagram labels: Jupyter Enterprise Gateway, MAX, FfDL, kube-batch, ART, AIF360, Istio, OpenWhisk.]
  • 106. Demo Model: Gender Classification. Data: UTKFace. Model: simple convolutional neural network (CNN) with 3 convolutional layers and 2 fully connected layers, followed by SoftMax. Output: confidence score for Male / Female.
  • 107. Demo Flow. [Diagram labels: Data; ML Code; Training (Fabric for Deep Learning); Robustness Check (Adversarial Robustness Toolbox); Fairness Check (AI Fairness 360); Source to Container (Knative Build); Model Deployment; Gender Classification Model with Model Revision 1 and Model Revision 2, traffic split 10% / 90%.]
  • 108. To fully automate the workflow, Knative Pipeline can be used with Knative Eventing. [Diagram: the Demo Flow from the previous slide, with Eventing added.]
  • 109. Knative Eventing Sources, Channel and Subscription can be defined!
  • 110. Knative Pipeline Tasks and Flow Definition can be defined!
  • 111. For this demonstration, we used Argo Workflows for Pipelining!
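A workflow engine such as Argo runs the steps of a pipeline as a dependency graph. The sketch below shows that idea with Python's standard `graphlib`; it is not an Argo Workflows definition, and the step names and their ordering are assumptions loosely based on the demo flow, since the slides do not spell out the exact dependencies.

```python
from graphlib import TopologicalSorter

# step -> set of steps that must finish first (assumed ordering, for illustration)
PIPELINE = {
    "train_model": set(),
    "robustness_check": {"train_model"},
    "fairness_check": {"train_model"},
    "build_image": {"robustness_check", "fairness_check"},
    "deploy_revision": {"build_image"},
}


def run(pipeline, steps):
    """Execute each step's callable in an order that respects the dependencies."""
    executed = []
    for name in TopologicalSorter(pipeline).static_order():
        steps[name]()
        executed.append(name)
    return executed


if __name__ == "__main__":
    # Placeholder callables; a real pipeline would launch containers here.
    steps = {name: (lambda n=name: print("running", n)) for name in PIPELINE}
    run(PIPELINE, steps)
```

Because the two checks depend only on training, an engine is free to run them in parallel; this sequential sketch just guarantees that both finish before the image is built.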
  • 112. What we learnt
    • Knative brings serverless actions, events, workflows, apps, containers and service mesh together in one single stack.
    • Knative eliminates the need for stitching various individual components together (events, actions, service mesh, pipeline, etc.).
    • Knative Build and Serving are simple to use and integrate with our existing containers.
    • Traffic routing and canary testing on Knative can be done without any Istio knowledge.
    • Built-in monitoring tools are very helpful to visualize the benefits of building services using Knative.
    • Knative Eventing shows promise but needs to evolve more for production.
    • Knative Build-Pipeline is still at a very early stage and changing rapidly.
    • Knative Build-Pipeline could get very complicated with many tasks, and it needs a better debugging tool/GUI to monitor errors and visualize the pipeline flow.
  • 113. Bringing it together: a secure, transparent, and trusted open source AI pipeline. [Diagram labels: Jupyter Enterprise Gateway, MAX, FfDL, kube-batch, ART, AIF360, Istio, OpenWhisk.]
  • 114. Links:
    • Knative: https://github.com/knative
    • Fabric for Deep Learning: https://github.com/IBM/FfDL
    • Adversarial Robustness Toolbox: https://github.com/IBM/adversarial-robustness-toolbox
    • AI Fairness 360: http://aif360.mybluemix.net/
    • Jupyter Enterprise Gateway: https://github.com/jupyter/enterprise_gateway
    • Kube-Batch: https://github.com/kubernetes-sigs/kube-batch
    • Istio: https://github.com/istio/istio
    • OpenWhisk: https://github.com/apache/incubator-openwhisk
    • Kiali: https://github.com/kiali/kiali
    • Argo: https://github.com/argoproj/argo
    • PyTorch: https://pytorch.org/