Bighead
Airbnb’s End-to-End Machine Learning Infrastructure
Andrew Hoh and Krishna Puttaswamy
ML Infra @ Airbnb
In 2016
● Only a few major models in production
● Models took on average 8 to 12 weeks to build
● Everything built in Aerosolve, Spark and Scala
● No support for TensorFlow, PyTorch, scikit-learn, or other popular ML packages
● Significant discrepancies between offline and online data
ML Infra was formed with the charter to:
● Enable more users to build ML products
● Reduce time and effort
● Enable easier model evaluation
Q4 2016: Formation of our ML Infra team
Before ML Infrastructure
ML has had a massive impact on Airbnb’s
product
● Search Ranking
● Smart Pricing
● Fraud Detection
After ML Infrastructure
But there were many other areas with high potential for ML where that potential had
yet to be realized.
● Paid Growth - Hosts
● Classifying listings
● Room Type Categorizations
● Experience Ranking + Personalization
● Host Availability
● Business Travel Classifier
● Make Listing a Space Easier
● Customer Service Ticket Routing
● … And many more
Vision
Airbnb routinely ships ML-powered features throughout the
product.
Mission
Equip Airbnb with shared technology to build
production-ready ML applications with no incidental
complexity.
(Technology = tools, platforms, knowledge, shared feature data, etc.)
Value of ML Infrastructure
Machine Learning Infrastructure can:
● Remove incidental complexities by providing generic, reusable solutions
● Simplify the workflow by providing tooling,
libraries, and environments that make ML
development more efficient
And at the same time:
● Establish a standardized platform that
enables cross-company sharing of feature
data and model components
● “Make it easy to do the right thing” (ex:
consistent training/streaming/scoring logic)
Bighead: Motivations
Learnings:
● No consistency between ML Workflows
● New teams struggle to begin using ML
● Airbnb has a wide variety of ML use cases
● Existing ML workflows are slow, fragmented, and brittle
● Incidental complexity vs. intrinsic complexity
● Build and forget - ML as a linear process
Q1 2017: Figuring out what to build
Architecture
● Consistent environment across the stack with Docker
● Consistent data transformation
○ Multi-row aggregation happens in the warehouse; single-row transformation is part of the model
○ Model transformation code is the same online and offline (see the sketch below)
● Common workflow across different ML frameworks
○ Supports Scikit-learn, TF, PyTorch, etc.
● Modular components
○ Easy to customize parts
○ Easy to share data/pipelines
Key Design Decisions
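To make the consistent-transformation decision above concrete, here is a minimal sketch in Python, assuming a hypothetical scikit-learn-style object model; the class and method names are illustrative assumptions, not Bighead's actual API. The idea is that the single-row transform is serialized together with the learned parameters, so online scoring reuses exactly the code used in training.

# Minimal sketch of "single-row transformation is part of the model".
# Class and method names are illustrative assumptions, not Bighead's real API.
import math
import pickle


class LogAmountTransform:
    """Single-row transform; the same code runs offline (training) and online (serving)."""

    def transform_row(self, row: dict) -> dict:
        out = dict(row)
        out["log_amount"] = math.log1p(out.pop("amount"))
        return out


class TrainedModel:
    """Bundles the row transform with the fitted estimator so they ship as one artifact."""

    def __init__(self, transform, estimator):
        self.transform = transform
        self.estimator = estimator

    def score(self, row: dict) -> float:
        features = self.transform.transform_row(row)
        return self.estimator.predict([list(features.values())])[0]


# Offline: fit the estimator, then serialize transform + parameters together.
# Online: deserialize the same object and call .score() on a single row.
# artifact = pickle.dumps(TrainedModel(LogAmountTransform(), fitted_estimator))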
Bighead Architecture
[Architecture diagram showing offline and online clients and the platform components: DeepThought Service, Deployment App for ML Models, ML Models git repo, Redspot, ML Automator (w/ Airflow), ML DAG Generator, Zipline Data (K/V Store), Bighead Service, Docker Image Service, Zipline Feature Repository, User ML Model, Bighead library, BigQueue Service, Worker, Bighead UI, and the Zipline features git repo.]
Components
● Data Management: Zipline
● Training: Redspot / BigQueue
● Core ML Library: ML Pipeline
● Productionisation: Deep Thought (online) / ML Automator (offline)
● Model Management: Bighead service
Zipline (ML Data Management Framework)
Zipline - Why
● Defining features (especially windowed ones) with Hive was complicated and error-prone
● Backfilling training sets (via inefficient Hive queries) was a major bottleneck
● No feature sharing
● Inconsistent offline and online datasets
● The warehouse is built as of end-of-day, so it lacked point-in-time features
● ML data pipelines lacked data quality checks or monitoring
● Ownership of pipelines was in disarray
For information on Zipline, please watch the recording of our other
Spark Summit session:
Zipline: Airbnb’s Machine Learning Data
Management Platform
Redspot (Hosted Jupyter Notebook Service)
Bighead Architecture
[Architecture diagram repeated from the earlier Bighead Architecture slide.]
● Started with Jupyterhub (an open-source project), which manages multiple Jupyter Notebook servers (prototyping environment)
● But users were installing packages locally and then creating virtualenvs for other parts of our infra
○ The environment was very fragile
● Users wanted to be able to use Jupyterhub on larger instances or instances with GPUs
● Sharing notebooks with teammates was also a common need
● Files/content needed to be resilient to node failures
Redspot - Why
Containerized environments
● Every user’s environment is containerized via Docker
○ Allows customizing the notebook environment without affecting other users
■ e.g. installing system/Python packages
○ Easier to restore state, which helps with reproducibility
● Supports custom Docker images
○ Base images tailored to users’ needs
■ e.g. GPU access, pre-installed ML packages
Redspot
Redspot
Remote Instance Spawner
● For bigger jobs and total isolation,
Redspot allows launching a dedicated
instance
● Hardware resources not shared with
other users
● Automatically terminates idle instances
periodically
Redspot
[Deployment diagram: a redspot-standalone host running Jupyterhub and local Docker containers (x91) via a Docker daemon, backed by MySQL and EFS; remote instances (x32) running redspot-singleuser and redspot-singleuser-gpu containers on their own Docker daemons, up to X1e.32xlarge machines (128 vCPUs, 3.9 TB); data backends include S3, Hive, Presto, and Spark.]
● Native Dockerfiles enforce strict single inheritance.
○ Prevents composition of base images
○ Might lead to copy/pasting Dockerfile snippets around.
● A git repo of Dockerfiles for each stage, plus a YAML file expressing:
○ Pre-build/post-build commands.
○ Build-time/runtime dependencies (mounted directories, Docker runtime).
● Image builder (see the sketch after the example config below):
○ Build flow tool for chaining stages to
produce a single image.
○ Build independent images in parallel.
Docker Image Repo/Service
ubuntu16.04-py3.6-cuda9-cudnn7:
base: ubuntu14.04-py3.6
description: "A base Ubuntu 16.04 image with
python 3.6, CUDA 9, CUDNN 7"
stages:
- cuda/9.0
- cudnn/7
args:
python_version: '3.6'
cuda_version: '9.0'
cudnn_version: '7.0'
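As a rough illustration of the image builder described above, here is a hedged Python sketch of a build flow that chains stages from a config like the YAML example; the function names, config keys, and the BASE_IMAGE build-arg convention are assumptions for illustration, not the actual tool.

# Sketch of a stage-chaining image builder driven by a YAML spec like the one
# above. Function names, config keys, and the BASE_IMAGE build-arg convention
# are assumptions, not Airbnb's actual builder.
import subprocess
import yaml


def build_image(name: str, spec: dict) -> str:
    """Build `name` by layering each stage's Dockerfile on top of the base image."""
    tag = spec["base"]
    for stage in spec.get("stages", []):
        next_tag = f"{name}-{stage.replace('/', '-')}"
        # Each stage is a directory of Dockerfiles in the git repo; build it
        # FROM the previous tag so stages compose despite single inheritance.
        subprocess.run(
            ["docker", "build", "-t", next_tag,
             "--build-arg", f"BASE_IMAGE={tag}", f"stages/{stage}"],
            check=True,
        )
        tag = next_tag
    subprocess.run(["docker", "tag", tag, name], check=True)
    return name


if __name__ == "__main__":
    with open("images.yml") as f:
        images = yaml.safe_load(f)
    # Independent images could be built in parallel, e.g. with a process pool.
    for image_name, image_spec in images.items():
        build_image(image_name, image_spec)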
● A multi-tenant notebook environment
● Makes it easy to iterate and prototype ML models, share work
○ Integrated with the rest of our infra - so one can deploy a notebook to prod
● Improved upon open source Jupyterhub
○ Containerized; can bring custom Docker env
○ Remote notebook spawner for dedicated instances (P3 and X1 machines on
AWS)
○ Persist notebooks in EFS and share with teams
○ Revert to a prior checkpoint
● Supports 200+ weekly active users
Redspot Summary
Bighead Library
Bighead Library - Why
● Transformations (NLP, images) are often re-written by different users
● No clear abstraction for data transformation in the model
○ Every user can do data processing in a different way, leading to confusion
○ Users can easily write inefficient code
○ Loss of feature metadata during transformation (can’t plot feature importance)
○ Need special handling of CPU/GPU
○ No visualization of transformations
● Visualizing and understanding input data is key
○ But few good libraries to do so
● Library of transformations; holds more than 100 different transformations, including automated preprocessing for common input formats (NLP, images, etc.)
● Pipeline abstraction to build a DAG of transformations on input data (sketched below)
○ Propagate feature metadata so we can plot feature importance at the end and
connect it to feature names
○ Pipelines for data processing are reusable in other pipelines
○ Feature parallel and data parallel transformations
○ CPU and GPU support
○ Supports Scikit APIs
● Wrappers for model frameworks (XGB, TF, etc.) so they can be easily serialized/deserialized
(robust to minor version changes)
● Provides training data visualization to help identify data issues
Bighead Library
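A hedged sketch of what the pipeline abstraction above might feel like to a user, using scikit-learn-compatible fit/transform steps; the custom transformer and the way feature names are tracked are illustrative assumptions, not the real Bighead Library API.

# Illustrative DAG/pipeline sketch with scikit-learn-style APIs; the custom
# transformer and feature-name tracking are assumptions, not the real library.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier  # wrapped frameworks (XGB, TF, ...) plug in here


class TitleLength(BaseEstimator, TransformerMixin):
    """Toy text transform that records the feature name it produces, so feature
    importance can later be reported against named features."""

    feature_names_ = ["title_length"]

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.array([[len(t)] for t in X])


pipeline = Pipeline([
    ("title_length", TitleLength()),
    ("model", XGBClassifier(n_estimators=50)),
])

# pipeline.fit(titles, labels); the metadata carried by the transformer lets a
# UI connect the model's feature_importances_ back to "title_length".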
Bighead Library: ML Pipeline
ML Pipeline Serial Transformation Visualization
ML Parallel Visualization
Feature Importance: Metadata Preserved Through
Transformations
Easy to Serialize/Deserialize
Training Data - Visualization
Productionisation:
Deep Thought (Online Inference Service)
Bighead Architecture
[Architecture diagram repeated from the earlier Bighead Architecture slide.]
● Performant, scalable execution of model inference in production is hard
○ Engineers shouldn’t build one-off solutions for every model.
○ Data scientists should be able to launch new models in production with minimal
eng involvement.
● Debugging differences between online inference and training is difficult
○ We should support the exact serialized version of the model the data scientist
built
○ We should be able to run the same python transformations data scientists write
for training.
○ We should be able to load data computed in the warehouse or streaming easily
into online scoring.
Deep Thought - Why
● Deep Thought is a shared service for online inference
○ Supports all frameworks integrated in ML Pipeline
○ Deployment is completely config-driven, so data scientists don’t have to involve engineers to launch new models.
○ Engineers can then connect to a REST API from other services to get scores (example call below).
○ Support for loading data from K/V stores
○ Standardized logging, alerting and dashboarding for monitoring and offline
analysis of model performance
○ Isolation to enable multi-tenancy
○ Scalable and Reliable: 100+ models. Highest QPS service at Airbnb. Median
response time: 4ms. p95: 13ms.
Deep Thought - How
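Since deployment is config-driven and scoring is exposed over REST, calling a deployed model from another service is essentially one HTTP request. The sketch below is hypothetical: the hostname, route, and payload shape are assumptions for illustration, not Deep Thought's documented contract.

# Hypothetical client call to the online inference service; URL, route, and
# JSON shape are illustrative assumptions, not Deep Thought's actual API.
import requests

payload = {
    "model": "listing_quality",   # which deployed model to score (hypothetical name)
    "features": {"listing_id": 123, "num_photos": 14, "description_len": 740},
}

resp = requests.post("https://deepthought.example.com/v1/score",
                     json=payload, timeout=0.1)  # millisecond-level latency budget
resp.raise_for_status()
print(resp.json())  # e.g. {"score": 0.87}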
Productionisation: ML Automator
(Offline Training and Inference Service)
● Tools and services to automate common (mundane) tasks
○ Periodic training, evaluation, and offline scoring of models
■ Get specified amount of resources (GPU/memory) for these tasks
○ Uploading scores to K/V stores
○ Dashboards on scores, alert on score changes
○ Orchestration of these tasks (via Airflow in our case)
○ Scoring on large tables is tricky to scale
ML Automator - Why
● Automate such tasks via configuration
○ We generate Airflow DAGs automatically to train, score, evaluate, upload scores, etc.
with appropriate resources
● Built custom tools to train/score on Spark for large datasets
○ Tools to get training data to the training machine quickly
○ Tool to generate a virtualenv (equivalent to a specified Docker image) and run executors in it, since (our version of) Yarn doesn’t run executors in a Docker image
ML Automator
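To illustrate what generating Airflow DAGs from configuration can look like, here is a hedged sketch; the config keys, task list, and helper functions are assumptions rather than ML Automator's real format (Airflow's DAG and PythonOperator are the only real APIs used).

# Sketch of config-driven Airflow DAG generation; config keys and task bodies
# are illustrative assumptions, not ML Automator's actual format.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

MODEL_CONFIG = {
    "name": "listing_quality",
    "schedule": "@daily",
    "tasks": ["train", "evaluate", "score", "upload_scores"],
}


def run_step(step: str, model: str) -> None:
    print(f"running {step} for {model}")  # placeholder for the real task logic


def build_dag(cfg: dict) -> DAG:
    dag = DAG(
        dag_id=f"ml_automator_{cfg['name']}",
        schedule_interval=cfg["schedule"],
        start_date=datetime(2018, 1, 1),
        catchup=False,
    )
    prev = None
    for step in cfg["tasks"]:
        task = PythonOperator(
            task_id=step,
            python_callable=run_step,
            op_kwargs={"step": step, "model": cfg["name"]},
            dag=dag,
        )
        if prev is not None:
            prev >> task  # chain train -> evaluate -> score -> upload_scores
        prev = task
    return dag


globals()[f"dag_{MODEL_CONFIG['name']}"] = build_dag(MODEL_CONFIG)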
Bighead Service
● Model management is hard
○ Need a single source of truth to track the model history
● Model reproducibility is hard
○ Models are trained on developer laptops and put into production
○ Model artifacts aren’t tagged with the model code’s git SHA
● Model monitoring is done in many places
○ Evaluation metrics are stored in IPython notebooks
○ Online scores and model features are monitored separately
○ Online and offline scoring may not use a consistent model version
Bighead Service - Why
Bighead Service Overview
Bighead’s model management service
● Contains prototype and production models
● Can serve models “raw” or trained
● The source of truth for which trained models are in production
● Stores model health data
Bighead Service Internals
We decompose Models into two components:
● Model Version - raw model code + docker image
● Model Artifact - parameters learned via training
A trained model consists of a Model Version (code + Docker image) plus a Model Artifact; this pairing is what gets deployed to production.
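One way to picture this decomposition is as a small data model; the dataclasses below are illustrative assumptions, not the Bighead service's actual schema.

# Illustrative data model for the Model Version / Model Artifact split; field
# names are assumptions, not the Bighead service's actual schema.
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    """Raw model code plus the Docker image it runs in."""
    model_name: str
    git_sha: str        # ties the deployment back to the exact code version
    docker_image: str


@dataclass(frozen=True)
class ModelArtifact:
    """Parameters learned via training, stored as a reference to a serialized blob."""
    artifact_uri: str   # e.g. a path to the serialized parameters (assumption)
    trained_at: str


@dataclass(frozen=True)
class TrainedModel:
    """A deployable model = Model Version + Model Artifact."""
    version: ModelVersion
    artifact: ModelArtifact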
ML models have diverse dependency sets (TensorFlow, XGBoost, etc.), so we allow users to provide a Docker image within which model code always runs.
ML models don’t run in isolation, however, so we’ve built a lightweight API to interact with the “dockerized model”.
[Diagram: the model (user code) runs inside a Docker container and talks to other ML Infra services through the Model API.]
Dockerized Models
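The lightweight API for interacting with the dockerized model could look roughly like the sketch below: a small HTTP shim that runs inside the container next to the user's model code. The route, payload shape, and load_model() helper are illustrative assumptions, not the actual protocol.

# Rough sketch of a model-side HTTP shim inside the Docker container; the
# route, payload shape, and load_model() helper are illustrative assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)
model = None  # loaded once at container start-up from the model artifact


def load_model():
    """Placeholder: deserialize the trained model artifact shipped with the image."""
    class _Stub:
        def score(self, features):
            return 0.5
    return _Stub()


@app.route("/score", methods=["POST"])
def score():
    features = request.get_json()["features"]
    return jsonify({"score": model.score(features)})


if __name__ == "__main__":
    model = load_model()
    app.run(host="0.0.0.0", port=8080)  # other ML Infra services call this port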
Our built-in UI provides:
● Deployment - review changes, deploy, and rollback trained models
● Model Health - metrics, visualizations, alerting, central dashboard
● Experimentation - ability to set up model experiments, e.g. split traffic between two or more models
Bighead: UI
Bighead Architecture
[Architecture diagram focused on the Bighead Service and UI: Deployment App for ML Models, ML Models git repo, Redspot, Bighead Service, User ML Model, Bighead library, Bighead UI.]
Bighead Service/UI
● Bighead’s central model
management service
● The “Deployboard” for
trained ML models - i.e.
the sole source of truth
about what model is
deployed
● End-to-End platform to build and deploy ML models to production
● Built on open source technology
○ Spark, Jupyter, Kubernetes, Scikit-learn, TF, XGBoost, etc.
● But had to fix various gaps in the path to productionisation
○ Generic online and offline inference service (that supports different
frameworks)
○ Feature generation and management framework
○ Model data transformation library
○ Model and data visualization libraries
○ Docker image customization service
○ Multi-tenant training environment
Bighead Summary
Plan to Open Source Soon
If you want to collaborate, come talk to us: andrew.hoh@airbnb.com