SlideShare una empresa de Scribd logo
1 de 51
Descargar para leer sin conexión
End to End ML
With Kubeflow
& friends
@holdenkarau
Signal
2018
Legit-enough
Some links (slides & recordings will be at):
http://bit.ly/2QgsqF9
^ Slides & code-lab links
(after)
CatLoversShow
Holden:
● Prefered pronouns are she/her
● Developer Advocate at Google
● Apache Spark PMC/Committer, contribute to many other projects
● previously IBM, Alpine, Databricks, Google, Foursquare & Amazon
● co-author of Learning Spark & High Performance Spark
● Twitter: @holdenkarau
● Slide share http://www.slideshare.net/hkarau
● Code review livestreams: https://www.twitch.tv/holdenkarau /
https://www.youtube.com/user/holdenkarau
● Spark Talk Videos http://bit.ly/holdenSparkVideos
Who do I think you all are?
● Nice people*
● Interested in Machine Learning
● Possibly Familiar with one of Java, Scala, or Python
Amanda
What is in store for our adventure?
● We have 30 minutes :)
● Brief intros to what Kubernetes & Spark, and Kubeflow are
● How to train a model (ish)
● How to serve a model (ish)
● Scaling (ish)
● Updating models and other scary thoughts
Ada Doglace
What is Kubernetes?
● General purpose distributed system
○ With a really nice API including Python :)
● Apache project
● Faster than Hadoop Map/Reduce
● Good when too big for a single
machine
● Built on top of two abstractions for
distributed data: RDDs & Datasets
● Has ML Libraries
● WIP Kubeflow integration PR 1467
What is Spark?
The different pieces of Spark
Apache Spark
SQL, DataFrames & Datasets
Structured
Streaming
Scala,
Java,
Python, &
R
Spark ML
bagel &
Graph X
MLLib
Scala,
Java,
PythonStreaming
Graph
Frames
Paul Hudson
So what does ML look like?
Code on Laptop
Train Model on ML-Rig
Photo by Tomomi
Deploy to Production
Problem:
Models are Cool,
Feature prep is Hard
Training is Tedious,
Everyone Forgot Deployment
What is Kubeflow?
What is Kubeflow?
What is Kubeflow?
“Data Scientists”
Model Serving On Kube
Model Training
*
What is Kubeflow?
“Kubeflow is a Cloud Native platform for machine learning based on
Google’s internal machine learning pipelines.”
or:
● The recognition that just a bunch of model weights isn’t enough
● Designed to support the ecosystem of tools needed (from data
prep to serving)
● Open source project :)
Ada Doglace
Really just want to replace this:
Photo by: Milestoned
So you want to use this?
What’s Next?!
Step away from keyboard
Think about type(s) of model
Look at components directory and see what’s a fit tool wise
Don’t know? Choose jupyter deal with the details live
Can’t find it?
Containers Buffet
argo
automation
chainer-job
core
credentials-pod-preset
katib
mpi-job
mxnet-job
openmpi
pachyderm
pytorch-job
seldon
tf-serving
weaveflux
What about just the basics?*
./scripts/kfctl.sh init ${KFAPP} --platform gcp --project ${PROJECT}
cd ${KFAPP}
../scripts/kfctl.sh generate platform
../scripts/kfctl.sh apply platform
../scripts/kfctl.sh generate k8s
../scripts/kfctl.sh apply k8s
What about just tensorflow?*
ks registry add kubeflow
github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow
ks pkg install kubeflow/core@${VERSION}
ks pkg install kubeflow/tf-serving@${VERSION}
ks pkg install kubeflow/tf-job@${VERSION}
Ok well I need to be able to access Jupyter
too...
kubectl port-forward -n ${NAMESPACE} `kubectl get pods -n
${NAMESPACE} --selector=service=ambassador -o
jsonpath='{.items[0].metadata.name}'` 8080:80
Your Special ML Training Goes here
Don’t have any pressing projects but still want to have fun? Check
out Michelle’s notebook for Github Issue summarization.
Or want to see mnist again? here :)
Your Special ML Training Goes here
...
from keras.callbacks import CSVLogger, ModelCheckpoint
script_name_base = 'tutorial_seq2seq'
csv_logger =
CSVLogger('{:}.log'.format(script_name_base))
model_checkpoint =
ModelCheckpoint('{:}.epoch{{epoch:02d}}-val{{val_loss:
.5f}}.hdf5'.format(script_name_base),
save_best_only=True)
Your Special ML Training Goes here
history = seq2seq_Model.fit([encoder_input_data,
decoder_input_data],
np.expand_dims(decoder_target_data, -1),
batch_size=batch_size,
epochs=epochs,
validation_split=0.12,
callbacks=[csv_logger, model_checkpoint])
Really just check out Michelle’s notebook for Github Issue
summarization.
But what about [special foo-baz-inator] or
[special-yak-shaving-tool]?
Write a Dockerfile and build an image, use FROM so you’re not
starting from scratch.
FROM gcr.io/kubeflow-images-public/tensorflow-1.6.0-notebook-cpu
RUN pip install py-special-yak-shaving-tool
Then tell set it as a param for your training/serving job as needed:
ks param set tfjob-v1alpha2 image "my-special-image-goes-here”
What about that magical feature prep?
For now it’s a mostly write-by-hand situation
However TFX has some cool tools we can use today (like
TF.Transform) if we’re ok with DirectRunner or Dataflow (with Flink
support in the works indirectly)
Enter: TF.Transform
● For pre-processing of your data
● e.g. where you spend 90% of your dev time anyways
● Integrates into serving time :D
● OSS
● Runs on top of Apache Beam, but current release not yet
scalable outside of GCP
● On Apache Beam master this can run-ish on Flink, but rough
● Please don’t use this in production today unless your on
GCP/Dataflow
PROKathryn Yengel
Defining a Transform processing function
def preprocessing_fn(inputs):
x = inputs['x']
y = inputs['y']
s = inputs['s']
x_centered = x - tft.mean(x)
y_normalized = tft.scale_to_0_1(y)
s_int = tft.string_to_int(s)
return { 'x_centered': x_centered,
'y_normalized': y_normalized, 's_int': s_int}
mean stddev
normalize
multiply
quantiles
bucketize
Analyzers
Reduce (full pass)
Implemented as a distributed
data pipeline
Transforms
Instance-to-instance (don’t
change batch dimension)
Pure TensorFlow
Analyze
normalize
multiply
bucketize
constant
tensors
data
mean stddev
normalize
multiply
quantiles
bucketize
Scale to ... Bag of Words / N-Grams
Bucketization Feature Crosses
tft.ngrams
tft.string_to_int
tf.string_split
tft.scale_to_z_score
tft.apply_buckets
tft.quantiles
tft.string_to_int
tf.string_join
...
Some common use-cases...
BEAM Beyond the JVM: Current release
● Non JVM BEAM doesn’t work outside of Google’s environment yet
● tl;dr : uses grpc / protobuf
○ Similar to the common design but with more efficient representations (often)
● But exciting new plans to unify the runners and ease the support of different
languages (called SDKS)
○ See https://beam.apache.org/contribute/portability/
● If this is exciting, you can come join me on making BEAM work in Python3
○ Yes we still don’t have that :(
○ But we're getting closer & you can come join us on BEAM-2874 :D
Emma
Serving: TF is probably easiest for now...
MODEL_COMPONENT=my-model-server
MODEL_NAME=cat-finder-3k
ks generate tf-serving ${MODEL_COMPONENT}
--name=${MODEL_NAME}
ks param set ${MODEL_COMPONENT} deployHttpProxy true
ks param set ${MODEL_COMPONENT} modelPath
${MODEL_PATH}
ks apply ${KF_ENV} -c ${MODEL_COMPONENT}
Or use Seldon Core & friends*
Seldon Core is an OSS platform for deploying ML models on
Kubernetes supported by Kubeflow.
Supports Many Model types/formats:
● Tensorflow
● Sklearn
● Spark ML**
● R
● H20
Set up seldon core for serving
# Gives cluster-admin role to the default service account
kubectl create clusterrolebinding seldon-admin
--clusterrole=cluster-admin
--serviceaccount=${NAMESPACE}:default
# Install the kubeflow/seldon package
ks pkg install kubeflow/seldon
# Generate the seldon component and deploy it
ks generate seldon seldon --name=seldon
Build an image with your model*
docker run -v $(pwd):/my_model
seldonio/core-python-wrapper:0.7 /my_model
IssueSummarization 0.1 gcr.io --base-image=python:3.6
--image-name=gcr-repository-name/my-image-name
And kick off the new model:
ks generate seldon-serve-simple new-serving-magic
--name=model-name 
--image=gcr.io/gcr-repository-name/model:version 
--namespace=${NAMESPACE} 
--replicas=2
ks apply ${KF_ENV} -c new-serving-magic
Wait so how do I use this?
Your favourite rest library goes here*
Timeouts matter!
Doing recommendations? Have fall-backs
Have multiple models? fall-backs
*Need to use in batch? Maybe skip seldon, tf-serving &
friends and integrate the library into your code. Or
not.
Trish Hamme
Scaling - or ruh roh people are using this!
replicas: 1
Becomes
replicas: 10
Factor of 10 =~ “science”
Wait really?
● Early: switch from mini-kube to ${cloud provider} with GPUs
○ “Vertical” scaling
● Next: increase # of workers for training
○ “Horizontal” scaling
○ Auto-scaling also WIP per-backend for the most part
● Serving, # of replicas
○ Auto-scaling is a WIP -
https://github.com/kubeflow/kubeflow/issues/1219
PROJennifer C.
What about validation?
TensorFlow Data Validation (TFDV)
Or Roll your own?
● Counters & execution time most common
● Please also check % of data change
Spark-validator (proof of concept)
Please validate your pipelines, and not just for data code changes too.
Demo!
Recorded Demos
Previously live demos recorded
● Kubeflow intro
https://codelabs.developers.google.com/codelabs/kubeflow-intr
oduction/index.html & streamed http://bit.ly/kfIntroStream
● Kubeflow E2E with Github issue
summurizationhttps://codelabs.developers.google.com/codelab
s/cloud-kubeflow-e2e-gis/ & streamed http://bit.ly/kfGHStream
● You can tell they were live streamed by how poorly went, I
promise no video editing has occurred.
● You can do these yourself too (including one of them at our
booth)!
Join me & Boo @ Google’s booth @ 5PM
And join my-coworker Casey West @ 6talking about:
Building Captain Obvious:
Understand Faster with Machine Learning APIs
Want to watch working on a Kubeflow PR?
● Join Holden Friday @ 2pm pacific for live coding continuing
working on her Apache Spark to Kubeflow (using the existing
Spark operator as a base)
https://www.youtube.com/watch?v=zHnTdqbjPik
● Or just https://youtube.com/user/holdenkarau & like +
subscribe + click the bell :p
k thnx bye :)
Give feedback on this presentation
http://bit.ly/holdenTalkFeedback

Más contenido relacionado

La actualidad más candente

Building Recoverable (and optionally async) Pipelines with Apache Spark (+ s...
Building Recoverable (and optionally async) Pipelines with Apache Spark  (+ s...Building Recoverable (and optionally async) Pipelines with Apache Spark  (+ s...
Building Recoverable (and optionally async) Pipelines with Apache Spark (+ s...Holden Karau
 
Validating big data pipelines - FOSDEM 2019
Validating big data pipelines -  FOSDEM 2019Validating big data pipelines -  FOSDEM 2019
Validating big data pipelines - FOSDEM 2019Holden Karau
 
Spark Autotuning Talk - Strata New York
Spark Autotuning Talk - Strata New YorkSpark Autotuning Talk - Strata New York
Spark Autotuning Talk - Strata New YorkHolden Karau
 
Validating big data jobs - Spark AI Summit EU
Validating big data jobs  - Spark AI Summit EUValidating big data jobs  - Spark AI Summit EU
Validating big data jobs - Spark AI Summit EUHolden Karau
 
Validating big data pipelines - Scala eXchange 2018
Validating big data pipelines -  Scala eXchange 2018Validating big data pipelines -  Scala eXchange 2018
Validating big data pipelines - Scala eXchange 2018Holden Karau
 
Debugging Spark: Scala and Python - Super Happy Fun Times @ Data Day Texas 2018
Debugging Spark:  Scala and Python - Super Happy Fun Times @ Data Day Texas 2018Debugging Spark:  Scala and Python - Super Happy Fun Times @ Data Day Texas 2018
Debugging Spark: Scala and Python - Super Happy Fun Times @ Data Day Texas 2018Holden Karau
 
Big data beyond the JVM - DDTX 2018
Big data beyond the JVM -  DDTX 2018Big data beyond the JVM -  DDTX 2018
Big data beyond the JVM - DDTX 2018Holden Karau
 
Apache spark as a gateway drug to FP concepts taught and broken - Curry On 2018
Apache spark as a gateway drug to FP concepts taught and broken - Curry On 2018Apache spark as a gateway drug to FP concepts taught and broken - Curry On 2018
Apache spark as a gateway drug to FP concepts taught and broken - Curry On 2018Holden Karau
 
Sharing (or stealing) the jewels of python with big data & the jvm (1)
Sharing (or stealing) the jewels of python with big data & the jvm (1)Sharing (or stealing) the jewels of python with big data & the jvm (1)
Sharing (or stealing) the jewels of python with big data & the jvm (1)Holden Karau
 
Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
 Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark... Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...Databricks
 
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...Holden Karau
 
Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...Holden Karau
 
Keeping the fun in functional w/ Apache Spark @ Scala Days NYC
Keeping the fun in functional   w/ Apache Spark @ Scala Days NYCKeeping the fun in functional   w/ Apache Spark @ Scala Days NYC
Keeping the fun in functional w/ Apache Spark @ Scala Days NYCHolden Karau
 
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017Codemotion
 
Puppet Camp Dallas 2014: How Puppet Ops Rolls
Puppet Camp Dallas 2014: How Puppet Ops RollsPuppet Camp Dallas 2014: How Puppet Ops Rolls
Puppet Camp Dallas 2014: How Puppet Ops RollsPuppet
 
Parallel Programming in Python: Speeding up your analysis
Parallel Programming in Python: Speeding up your analysisParallel Programming in Python: Speeding up your analysis
Parallel Programming in Python: Speeding up your analysisManojit Nandi
 
Atlanta Hadoop Users Meetup 09 21 2016
Atlanta Hadoop Users Meetup 09 21 2016Atlanta Hadoop Users Meetup 09 21 2016
Atlanta Hadoop Users Meetup 09 21 2016Chris Fregly
 
Getting Git Right
Getting Git RightGetting Git Right
Getting Git RightSven Peters
 
Our Puppet Story (Linuxtag 2014)
Our Puppet Story (Linuxtag 2014)Our Puppet Story (Linuxtag 2014)
Our Puppet Story (Linuxtag 2014)DECK36
 

La actualidad más candente (20)

Building Recoverable (and optionally async) Pipelines with Apache Spark (+ s...
Building Recoverable (and optionally async) Pipelines with Apache Spark  (+ s...Building Recoverable (and optionally async) Pipelines with Apache Spark  (+ s...
Building Recoverable (and optionally async) Pipelines with Apache Spark (+ s...
 
Validating big data pipelines - FOSDEM 2019
Validating big data pipelines -  FOSDEM 2019Validating big data pipelines -  FOSDEM 2019
Validating big data pipelines - FOSDEM 2019
 
Spark Autotuning Talk - Strata New York
Spark Autotuning Talk - Strata New YorkSpark Autotuning Talk - Strata New York
Spark Autotuning Talk - Strata New York
 
Validating big data jobs - Spark AI Summit EU
Validating big data jobs  - Spark AI Summit EUValidating big data jobs  - Spark AI Summit EU
Validating big data jobs - Spark AI Summit EU
 
Validating big data pipelines - Scala eXchange 2018
Validating big data pipelines -  Scala eXchange 2018Validating big data pipelines -  Scala eXchange 2018
Validating big data pipelines - Scala eXchange 2018
 
Debugging Spark: Scala and Python - Super Happy Fun Times @ Data Day Texas 2018
Debugging Spark:  Scala and Python - Super Happy Fun Times @ Data Day Texas 2018Debugging Spark:  Scala and Python - Super Happy Fun Times @ Data Day Texas 2018
Debugging Spark: Scala and Python - Super Happy Fun Times @ Data Day Texas 2018
 
Big data beyond the JVM - DDTX 2018
Big data beyond the JVM -  DDTX 2018Big data beyond the JVM -  DDTX 2018
Big data beyond the JVM - DDTX 2018
 
Apache spark as a gateway drug to FP concepts taught and broken - Curry On 2018
Apache spark as a gateway drug to FP concepts taught and broken - Curry On 2018Apache spark as a gateway drug to FP concepts taught and broken - Curry On 2018
Apache spark as a gateway drug to FP concepts taught and broken - Curry On 2018
 
Sharing (or stealing) the jewels of python with big data & the jvm (1)
Sharing (or stealing) the jewels of python with big data & the jvm (1)Sharing (or stealing) the jewels of python with big data & the jvm (1)
Sharing (or stealing) the jewels of python with big data & the jvm (1)
 
Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
 Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark... Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark...
 
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
 
Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...
 
Keeping the fun in functional w/ Apache Spark @ Scala Days NYC
Keeping the fun in functional   w/ Apache Spark @ Scala Days NYCKeeping the fun in functional   w/ Apache Spark @ Scala Days NYC
Keeping the fun in functional w/ Apache Spark @ Scala Days NYC
 
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017
Data Science Apps: Beyond Notebooks - Natalino Busa - Codemotion Amsterdam 2017
 
Puppet Camp Dallas 2014: How Puppet Ops Rolls
Puppet Camp Dallas 2014: How Puppet Ops RollsPuppet Camp Dallas 2014: How Puppet Ops Rolls
Puppet Camp Dallas 2014: How Puppet Ops Rolls
 
Parallel Programming in Python: Speeding up your analysis
Parallel Programming in Python: Speeding up your analysisParallel Programming in Python: Speeding up your analysis
Parallel Programming in Python: Speeding up your analysis
 
Atlanta Hadoop Users Meetup 09 21 2016
Atlanta Hadoop Users Meetup 09 21 2016Atlanta Hadoop Users Meetup 09 21 2016
Atlanta Hadoop Users Meetup 09 21 2016
 
Web::Scraper
Web::ScraperWeb::Scraper
Web::Scraper
 
Getting Git Right
Getting Git RightGetting Git Right
Getting Git Right
 
Our Puppet Story (Linuxtag 2014)
Our Puppet Story (Linuxtag 2014)Our Puppet Story (Linuxtag 2014)
Our Puppet Story (Linuxtag 2014)
 

Similar a Intro - End to end ML with Kubeflow @ SignalConf 2018

Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)DataWorks Summit
 
Powering tensorflow with big data (apache spark, flink, and beam) dataworks...
Powering tensorflow with big data (apache spark, flink, and beam)   dataworks...Powering tensorflow with big data (apache spark, flink, and beam)   dataworks...
Powering tensorflow with big data (apache spark, flink, and beam) dataworks...Holden Karau
 
Migrating Apache Spark ML Jobs to Spark + Tensorflow on Kubeflow
Migrating Apache Spark ML Jobs to Spark + Tensorflow on KubeflowMigrating Apache Spark ML Jobs to Spark + Tensorflow on Kubeflow
Migrating Apache Spark ML Jobs to Spark + Tensorflow on KubeflowDatabricks
 
High Performance TensorFlow in Production -- Sydney ML / AI Train Workshop @ ...
High Performance TensorFlow in Production -- Sydney ML / AI Train Workshop @ ...High Performance TensorFlow in Production -- Sydney ML / AI Train Workshop @ ...
High Performance TensorFlow in Production -- Sydney ML / AI Train Workshop @ ...Chris Fregly
 
Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018Holden Karau
 
Big Data Beyond the JVM - Strata San Jose 2018
Big Data Beyond the JVM - Strata San Jose 2018Big Data Beyond the JVM - Strata San Jose 2018
Big Data Beyond the JVM - Strata San Jose 2018Holden Karau
 
High Performance Distributed TensorFlow with GPUs - NYC Workshop - July 9 2017
High Performance Distributed TensorFlow with GPUs - NYC Workshop - July 9 2017High Performance Distributed TensorFlow with GPUs - NYC Workshop - July 9 2017
High Performance Distributed TensorFlow with GPUs - NYC Workshop - July 9 2017Chris Fregly
 
SCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingSCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingStanislav Osipov
 
CoffeeScript: A beginner's presentation for beginners copy
CoffeeScript: A beginner's presentation for beginners copyCoffeeScript: A beginner's presentation for beginners copy
CoffeeScript: A beginner's presentation for beginners copyPatrick Devins
 
Kubernetes: The Next Research Platform
Kubernetes: The Next Research PlatformKubernetes: The Next Research Platform
Kubernetes: The Next Research PlatformBob Killen
 
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...Holden Karau
 
Leonid Kuligin "Training ML models with Cloud"
 Leonid Kuligin   "Training ML models with Cloud" Leonid Kuligin   "Training ML models with Cloud"
Leonid Kuligin "Training ML models with Cloud"Lviv Startup Club
 
An introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuckAn introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuckData Con LA
 
Docker + Tenserflow + GOlang - Golang singapore Meetup
Docker + Tenserflow + GOlang - Golang singapore MeetupDocker + Tenserflow + GOlang - Golang singapore Meetup
Docker + Tenserflow + GOlang - Golang singapore Meetupsangam biradar
 
20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo Omura
20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo Omura20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo Omura
20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo OmuraPreferred Networks
 
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...Chris Fregly
 
Serverless Data Architecture at scale on Google Cloud Platform
Serverless Data Architecture at scale on Google Cloud PlatformServerless Data Architecture at scale on Google Cloud Platform
Serverless Data Architecture at scale on Google Cloud PlatformMeetupDataScienceRoma
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...DataWorks Summit
 

Similar a Intro - End to end ML with Kubeflow @ SignalConf 2018 (20)

Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
 
Powering tensorflow with big data (apache spark, flink, and beam) dataworks...
Powering tensorflow with big data (apache spark, flink, and beam)   dataworks...Powering tensorflow with big data (apache spark, flink, and beam)   dataworks...
Powering tensorflow with big data (apache spark, flink, and beam) dataworks...
 
Migrating Apache Spark ML Jobs to Spark + Tensorflow on Kubeflow
Migrating Apache Spark ML Jobs to Spark + Tensorflow on KubeflowMigrating Apache Spark ML Jobs to Spark + Tensorflow on Kubeflow
Migrating Apache Spark ML Jobs to Spark + Tensorflow on Kubeflow
 
High Performance TensorFlow in Production -- Sydney ML / AI Train Workshop @ ...
High Performance TensorFlow in Production -- Sydney ML / AI Train Workshop @ ...High Performance TensorFlow in Production -- Sydney ML / AI Train Workshop @ ...
High Performance TensorFlow in Production -- Sydney ML / AI Train Workshop @ ...
 
Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018
 
Big Data Beyond the JVM - Strata San Jose 2018
Big Data Beyond the JVM - Strata San Jose 2018Big Data Beyond the JVM - Strata San Jose 2018
Big Data Beyond the JVM - Strata San Jose 2018
 
High Performance Distributed TensorFlow with GPUs - NYC Workshop - July 9 2017
High Performance Distributed TensorFlow with GPUs - NYC Workshop - July 9 2017High Performance Distributed TensorFlow with GPUs - NYC Workshop - July 9 2017
High Performance Distributed TensorFlow with GPUs - NYC Workshop - July 9 2017
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
 
SCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingSCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scaling
 
CoffeeScript: A beginner's presentation for beginners copy
CoffeeScript: A beginner's presentation for beginners copyCoffeeScript: A beginner's presentation for beginners copy
CoffeeScript: A beginner's presentation for beginners copy
 
Kubernetes: The Next Research Platform
Kubernetes: The Next Research PlatformKubernetes: The Next Research Platform
Kubernetes: The Next Research Platform
 
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
 
Introduction to Google Colaboratory.pdf
Introduction to Google Colaboratory.pdfIntroduction to Google Colaboratory.pdf
Introduction to Google Colaboratory.pdf
 
Leonid Kuligin "Training ML models with Cloud"
 Leonid Kuligin   "Training ML models with Cloud" Leonid Kuligin   "Training ML models with Cloud"
Leonid Kuligin "Training ML models with Cloud"
 
An introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuckAn introduction into Spark ML plus how to go beyond when you get stuck
An introduction into Spark ML plus how to go beyond when you get stuck
 
Docker + Tenserflow + GOlang - Golang singapore Meetup
Docker + Tenserflow + GOlang - Golang singapore MeetupDocker + Tenserflow + GOlang - Golang singapore Meetup
Docker + Tenserflow + GOlang - Golang singapore Meetup
 
20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo Omura
20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo Omura20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo Omura
20180926 kubeflow-meetup-1-kubeflow-operators-Preferred Networks-Shingo Omura
 
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
High Performance Distributed TensorFlow with GPUs - Nvidia GPU Tech Conferenc...
 
Serverless Data Architecture at scale on Google Cloud Platform
Serverless Data Architecture at scale on Google Cloud PlatformServerless Data Architecture at scale on Google Cloud Platform
Serverless Data Architecture at scale on Google Cloud Platform
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
 

Último

GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024APNIC
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceDelhi Call girls
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge GraphsEleniIlkou
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...Neha Pandey
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtrahman018755
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableSeo
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.soniya singh
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxellan12
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...SUHANI PANDEY
 
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls DubaiDubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubaikojalkojal131
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...SUHANI PANDEY
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirtrahman018755
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 

Último (20)

GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
Russian Call Girls in %(+971524965298  )#  Call Girls in DubaiRussian Call Girls in %(+971524965298  )#  Call Girls in Dubai
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024
 
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
 
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
 
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls DubaiDubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
 
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
 

Intro - End to end ML with Kubeflow @ SignalConf 2018

  • 1. End to End ML With Kubeflow & friends @holdenkarau Signal 2018 Legit-enough
  • 2. Some links (slides & recordings will be at): http://bit.ly/2QgsqF9 ^ Slides & code-lab links (after) CatLoversShow
  • 3. Holden: ● Prefered pronouns are she/her ● Developer Advocate at Google ● Apache Spark PMC/Committer, contribute to many other projects ● previously IBM, Alpine, Databricks, Google, Foursquare & Amazon ● co-author of Learning Spark & High Performance Spark ● Twitter: @holdenkarau ● Slide share http://www.slideshare.net/hkarau ● Code review livestreams: https://www.twitch.tv/holdenkarau / https://www.youtube.com/user/holdenkarau ● Spark Talk Videos http://bit.ly/holdenSparkVideos
  • 4.
  • 5. Who do I think you all are? ● Nice people* ● Interested in Machine Learning ● Possibly Familiar with one of Java, Scala, or Python Amanda
  • 6. What is in store for our adventure? ● We have 30 minutes :) ● Brief intros to what Kubernetes & Spark, and Kubeflow are ● How to train a model (ish) ● How to serve a model (ish) ● Scaling (ish) ● Updating models and other scary thoughts Ada Doglace
  • 8. ● General purpose distributed system ○ With a really nice API including Python :) ● Apache project ● Faster than Hadoop Map/Reduce ● Good when too big for a single machine ● Built on top of two abstractions for distributed data: RDDs & Datasets ● Has ML Libraries ● WIP Kubeflow integration PR 1467 What is Spark?
  • 9. The different pieces of Spark Apache Spark SQL, DataFrames & Datasets Structured Streaming Scala, Java, Python, & R Spark ML bagel & Graph X MLLib Scala, Java, PythonStreaming Graph Frames Paul Hudson
  • 10. So what does ML look like?
  • 12. Train Model on ML-Rig Photo by Tomomi
  • 14. Problem: Models are Cool, Feature prep is Hard Training is Tedious, Everyone Forgot Deployment
  • 17. What is Kubeflow? “Data Scientists” Model Serving On Kube Model Training *
  • 18. What is Kubeflow? “Kubeflow is a Cloud Native platform for machine learning based on Google’s internal machine learning pipelines.” or: ● The recognition that just a bunch of model weights isn’t enough ● Designed to support the ecosystem of tools needed (from data prep to serving) ● Open source project :) Ada Doglace
  • 19. Really just want to replace this: Photo by: Milestoned
  • 20. So you want to use this?
  • 21. What’s Next?! Step away from keyboard Think about type(s) of model Look at components directory and see what’s a fit tool wise Don’t know? Choose jupyter deal with the details live Can’t find it?
  • 23. What about just the basics?* ./scripts/kfctl.sh init ${KFAPP} --platform gcp --project ${PROJECT} cd ${KFAPP} ../scripts/kfctl.sh generate platform ../scripts/kfctl.sh apply platform ../scripts/kfctl.sh generate k8s ../scripts/kfctl.sh apply k8s
  • 24. What about just tensorflow?* ks registry add kubeflow github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow ks pkg install kubeflow/core@${VERSION} ks pkg install kubeflow/tf-serving@${VERSION} ks pkg install kubeflow/tf-job@${VERSION}
  • 25. Ok well I need to be able to access Jupyter too... kubectl port-forward -n ${NAMESPACE} `kubectl get pods -n ${NAMESPACE} --selector=service=ambassador -o jsonpath='{.items[0].metadata.name}'` 8080:80
  • 26. Your Special ML Training Goes here Don’t have any pressing projects but still want to have fun? Check out Michelle’s notebook for Github Issue summarization. Or want to see mnist again? here :)
  • 27. Your Special ML Training Goes here ... from keras.callbacks import CSVLogger, ModelCheckpoint script_name_base = 'tutorial_seq2seq' csv_logger = CSVLogger('{:}.log'.format(script_name_base)) model_checkpoint = ModelCheckpoint('{:}.epoch{{epoch:02d}}-val{{val_loss: .5f}}.hdf5'.format(script_name_base), save_best_only=True)
  • 28. Your Special ML Training Goes here history = seq2seq_Model.fit([encoder_input_data, decoder_input_data], np.expand_dims(decoder_target_data, -1), batch_size=batch_size, epochs=epochs, validation_split=0.12, callbacks=[csv_logger, model_checkpoint]) Really just check out Michelle’s notebook for Github Issue summarization.
  • 29. But what about [special foo-baz-inator] or [special-yak-shaving-tool]? Write a Dockerfile and build an image, use FROM so you’re not starting from scratch. FROM gcr.io/kubeflow-images-public/tensorflow-1.6.0-notebook-cpu RUN pip install py-special-yak-shaving-tool Then tell set it as a param for your training/serving job as needed: ks param set tfjob-v1alpha2 image "my-special-image-goes-here”
  • 30. What about that magical feature prep? For now it’s a mostly write-by-hand situation However TFX has some cool tools we can use today (like TF.Transform) if we’re ok with DirectRunner or Dataflow (with Flink support in the works indirectly)
  • 31. Enter: TF.Transform ● For pre-processing of your data ● e.g. where you spend 90% of your dev time anyways ● Integrates into serving time :D ● OSS ● Runs on top of Apache Beam, but current release not yet scalable outside of GCP ● On Apache Beam master this can run-ish on Flink, but rough ● Please don’t use this in production today unless your on GCP/Dataflow PROKathryn Yengel
  • 32. Defining a Transform processing function def preprocessing_fn(inputs): x = inputs['x'] y = inputs['y'] s = inputs['s'] x_centered = x - tft.mean(x) y_normalized = tft.scale_to_0_1(y) s_int = tft.string_to_int(s) return { 'x_centered': x_centered, 'y_normalized': y_normalized, 's_int': s_int}
  • 33. mean stddev normalize multiply quantiles bucketize Analyzers Reduce (full pass) Implemented as a distributed data pipeline Transforms Instance-to-instance (don’t change batch dimension) Pure TensorFlow
  • 35. Scale to ... Bag of Words / N-Grams Bucketization Feature Crosses tft.ngrams tft.string_to_int tf.string_split tft.scale_to_z_score tft.apply_buckets tft.quantiles tft.string_to_int tf.string_join ... Some common use-cases...
  • 36. BEAM Beyond the JVM: Current release ● Non JVM BEAM doesn’t work outside of Google’s environment yet ● tl;dr : uses grpc / protobuf ○ Similar to the common design but with more efficient representations (often) ● But exciting new plans to unify the runners and ease the support of different languages (called SDKS) ○ See https://beam.apache.org/contribute/portability/ ● If this is exciting, you can come join me on making BEAM work in Python3 ○ Yes we still don’t have that :( ○ But we're getting closer & you can come join us on BEAM-2874 :D Emma
  • 37. Serving: TF is probably easiest for now... MODEL_COMPONENT=my-model-server MODEL_NAME=cat-finder-3k ks generate tf-serving ${MODEL_COMPONENT} --name=${MODEL_NAME} ks param set ${MODEL_COMPONENT} deployHttpProxy true ks param set ${MODEL_COMPONENT} modelPath ${MODEL_PATH} ks apply ${KF_ENV} -c ${MODEL_COMPONENT}
  • 38. Or use Seldon Core & friends* Seldon Core is an OSS platform for deploying ML models on Kubernetes supported by Kubeflow. Supports Many Model types/formats: ● Tensorflow ● Sklearn ● Spark ML** ● R ● H20
  • 39. Set up seldon core for serving # Gives cluster-admin role to the default service account kubectl create clusterrolebinding seldon-admin --clusterrole=cluster-admin --serviceaccount=${NAMESPACE}:default # Install the kubeflow/seldon package ks pkg install kubeflow/seldon # Generate the seldon component and deploy it ks generate seldon seldon --name=seldon
  • 40. Build an image with your model* docker run -v $(pwd):/my_model seldonio/core-python-wrapper:0.7 /my_model IssueSummarization 0.1 gcr.io --base-image=python:3.6 --image-name=gcr-repository-name/my-image-name
  • 41. And kick off the new model: ks generate seldon-serve-simple new-serving-magic --name=model-name --image=gcr.io/gcr-repository-name/model:version --namespace=${NAMESPACE} --replicas=2 ks apply ${KF_ENV} -c new-serving-magic
  • 42. Wait so how do I use this? Your favourite rest library goes here* Timeouts matter! Doing recommendations? Have fall-backs Have multiple models? fall-backs *Need to use in batch? Maybe skip seldon, tf-serving & friends and integrate the library into your code. Or not. Trish Hamme
  • 43. Scaling - or ruh roh people are using this! replicas: 1 Becomes replicas: 10 Factor of 10 =~ “science”
  • 44. Wait really? ● Early: switch from mini-kube to ${cloud provider} with GPUs ○ “Vertical” scaling ● Next: increase # of workers for training ○ “Horizontal” scaling ○ Auto-scaling also WIP per-backend for the most part ● Serving, # of replicas ○ Auto-scaling is a WIP - https://github.com/kubeflow/kubeflow/issues/1219 PROJennifer C.
  • 45. What about validation? TensorFlow Data Validation (TFDV) Or Roll your own? ● Counters & execution time most common ● Please also check % of data change Spark-validator (proof of concept) Please validate your pipelines, and not just for data code changes too.
  • 46. Demo!
  • 48. Previously live demos recorded ● Kubeflow intro https://codelabs.developers.google.com/codelabs/kubeflow-intr oduction/index.html & streamed http://bit.ly/kfIntroStream ● Kubeflow E2E with Github issue summurizationhttps://codelabs.developers.google.com/codelab s/cloud-kubeflow-e2e-gis/ & streamed http://bit.ly/kfGHStream ● You can tell they were live streamed by how poorly went, I promise no video editing has occurred. ● You can do these yourself too (including one of them at our booth)!
  • 49. Join me & Boo @ Google’s booth @ 5PM And join my-coworker Casey West @ 6talking about: Building Captain Obvious: Understand Faster with Machine Learning APIs
  • 50. Want to watch working on a Kubeflow PR? ● Join Holden Friday @ 2pm pacific for live coding continuing working on her Apache Spark to Kubeflow (using the existing Spark operator as a base) https://www.youtube.com/watch?v=zHnTdqbjPik ● Or just https://youtube.com/user/holdenkarau & like + subscribe + click the bell :p
  • 51. k thnx bye :) Give feedback on this presentation http://bit.ly/holdenTalkFeedback