Unified MLOps: Feature Stores & Model Deployment


If you've brought two or more ML models into production, you know the struggle of managing multiple data sets, feature engineering pipelines, and models. This talk proposes a new approach to MLOps that lets you scale your models without increasing latency by merging a database, a feature store, and machine learning. Splice Machine is a hybrid transactional/analytical processing (HTAP) database built on HBase and Spark. The database powers a one-of-a-kind, single-engine feature store, and it deploys ML models as tables inside the database. Because it is accessed through a simple JDBC connection, Splice Machine can be used with any model-ops environment, such as Databricks. The HBase side allows us to serve features to deployed ML models and generate ML predictions in milliseconds. Our Spark engine allows us to generate complex training sets, as well as ML predictions on petabytes of data. In this talk, Monte will discuss how his experience running the AI lab at NASA, and as CEO of Red Pepper, Blue Martini Software and Rocket Fuel, led him to create Splice Machine. Jack will give a quick demonstration of how it all works.


Unified MLOps:
Feature Stores and
Model Deployment
Monte Zweben, CEO @ Splice Machine
Jack Ploshnick, Data Scientist @ Splice Machine

Agenda
● Goals of production machine learning
● Why are these goals hard to achieve?
● What is a Feature Store?
● Feature Store Landscape
● Database Deployment & Feature Stores

Real-Time Machine Learning Components
● Scale-Out Operational Data Platform
● Feature Store: re-usability, governance, serving
● Model Deployment
● Modeling Experimentation
● Scale-Out Analytical Data Platform

Typical Machine Learning Infrastructure
[Diagram: bespoke pipelines between the data sources (Data Warehouse, Database, Real-Time Data) and the consumers (Model 1, Model 2, Dashboard)]

Pipeline Duplication is Not Enough
● Higher compute costs
● Recreating features
● Lost signal
● Data lineage nightmare

What is a Feature Store?
Inputs: Real-Time Data, Batch Data
● Feature Search
● Training Sets
● Feature Serving
● Governance
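The capabilities on this slide can be made concrete with a toy sketch. The Python below is a hypothetical, in-memory illustration, not Splice Machine's API: `FeatureMeta` and `InMemoryFeatureStore` are invented names. It covers three things: metadata-based feature search, a single write path shared by batch and real-time pipelines, and serving a feature vector by primary key. Training-set generation and governance are omitted.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FeatureMeta:
    """Metadata that makes a feature discoverable and reusable."""
    name: str
    description: str
    tags: set = field(default_factory=set)


class InMemoryFeatureStore:
    """Toy, single-process feature store keyed by entity id (illustration only)."""

    def __init__(self) -> None:
        self.metadata: dict = {}  # feature name -> FeatureMeta
        self.values: dict = {}    # entity id -> {feature name: latest value}

    def register(self, meta: FeatureMeta) -> None:
        self.metadata[meta.name] = meta

    def upsert(self, entity_id: Any, features: dict) -> None:
        """Write path shared by batch and real-time pipelines; rejects unregistered features."""
        unknown = set(features) - set(self.metadata)
        if unknown:
            raise KeyError(f"unregistered features: {sorted(unknown)}")
        self.values.setdefault(entity_id, {}).update(features)

    def search(self, tag: str) -> list:
        """Feature search: sorted names of all features carrying a tag."""
        return sorted(m.name for m in self.metadata.values() if tag in m.tags)

    def serve(self, entity_id: Any, names: list) -> list:
        """Feature serving: latest feature vector for one entity (None where missing)."""
        row = self.values.get(entity_id, {})
        return [row.get(n) for n in names]


fs = InMemoryFeatureStore()
fs.register(FeatureMeta("avg_txn_amount_30d", "Mean transaction amount, last 30 days", {"fraud"}))
fs.register(FeatureMeta("num_logins_7d", "Logins in the last 7 days", {"engagement", "fraud"}))
fs.upsert(42, {"avg_txn_amount_30d": 57.3, "num_logins_7d": 4})
print(fs.search("fraud"))                                     # ['avg_txn_amount_30d', 'num_logins_7d']
print(fs.serve(42, ["num_logins_7d", "avg_txn_amount_30d"]))  # [4, 57.3]
```

Because every model reads from the same registered features, two models that need `num_logins_7d` share one definition instead of two bespoke pipelines.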

Machine Learning with a Feature Store
[Diagram: Data Warehouse, Database, and Real-Time Data feed a single Feature Store, which serves Model 1, Model 2, and the Dashboard]

Feature Store Requirements
● Scales to more than 1B records
● Scales to more than 20K features
● Feature vector retrieval by primary key for inference in under 5-10 ms
● Point-in-time consistency on training data
● Event-driven feature updates
● Batch feature updates
● Feature lineage tracking
● Discoverability and reuse with feature metadata
● Backfill of new features
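Point-in-time consistency means each training example may only see feature values that existed when its label event happened. Otherwise, future information leaks into training. Here is a minimal sketch of such an "as-of" join in plain Python. It assumes feature history is available as (entity, valid-from timestamp, value) tuples. `point_in_time_join` is a hypothetical helper; a database would typically express the same idea as a join against a feature history table.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(labels, feature_history):
    """As-of join for building training sets.

    labels:          iterable of (entity_id, event_ts, label)
    feature_history: iterable of (entity_id, valid_from_ts, value)
    Returns (entity_id, event_ts, value, label) rows, where value is the latest
    feature value with valid_from_ts <= event_ts, or None if none existed yet.
    """
    by_entity = defaultdict(list)
    for entity_id, ts, value in feature_history:
        by_entity[entity_id].append((ts, value))
    for versions in by_entity.values():
        versions.sort(key=lambda v: v[0])  # stable sort: later inputs win timestamp ties
    ts_index = {e: [ts for ts, _ in v] for e, v in by_entity.items()}

    rows = []
    for entity_id, event_ts, label in labels:
        versions = by_entity.get(entity_id, [])
        # Position of the last version whose timestamp is <= event_ts (-1 if none).
        idx = bisect_right(ts_index.get(entity_id, []), event_ts) - 1
        value = versions[idx][1] if idx >= 0 else None
        rows.append((entity_id, event_ts, value, label))
    return rows


history = [(1, 10, 0.2), (1, 20, 0.5), (2, 15, 0.9)]
labels = [(1, 12, 0), (1, 25, 1), (2, 5, 0)]
print(point_in_time_join(labels, history))
# [(1, 12, 0.2, 0), (1, 25, 0.5, 1), (2, 5, None, 0)]
```

The `<=` comparison treats a value written at exactly the event time as visible. If a feature is computed from the labeled event itself, a strict `<` is the safer choice.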

Existing Architectures
[Diagram: Raw Data flows through a Streaming path (KV store) and a Batch path (Analytics Engine) into the Feature Store, which serves the Consumer]

Alternative Approach: HTAP Database
[Diagram: Feature Serving from a single HTAP database]

Challenges of HTAP Databases
● In Memory
● Custom Hardware
● No support for secondary indexes or triggers
● Not ACID compliant

Splice Machine
● Scale-out
● Any Cloud/On-Prem
● Indexes and Triggers
● Full ACID Compliance

Feature Set Implementation
[Diagram: a Feature Set Pipeline writes to the feature set with INSERT / UPDATE; an Initial Backfill loads historical values]
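A feature set can be written with ordinary INSERT / UPDATE statements and seeded with an initial backfill. One way to get point-in-time history from such writes is a trigger that copies every new version of a row into a history table. The sketch below uses Python's built-in sqlite3 purely as a stand-in SQL engine, and `INSERT OR REPLACE` is SQLite syntax. The table names, the `avg_spend` feature, and the helper functions are hypothetical. This is not necessarily how Splice Machine implements feature sets.

```python
import sqlite3

SCHEMA = """
CREATE TABLE transactions (customer_id INTEGER, ts INTEGER, amount REAL);  -- raw data
CREATE TABLE customer_features (                                           -- latest values (serving)
    customer_id INTEGER PRIMARY KEY, last_update_ts INTEGER NOT NULL, avg_spend REAL);
CREATE TABLE customer_features_history (                                   -- every version (training)
    customer_id INTEGER NOT NULL, asof_ts INTEGER NOT NULL, avg_spend REAL,
    PRIMARY KEY (customer_id, asof_ts));
-- Triggers copy each new version of a serving row into the history table.
CREATE TRIGGER cf_after_insert AFTER INSERT ON customer_features BEGIN
    INSERT OR REPLACE INTO customer_features_history
    VALUES (NEW.customer_id, NEW.last_update_ts, NEW.avg_spend);
END;
CREATE TRIGGER cf_after_update AFTER UPDATE ON customer_features BEGIN
    INSERT OR REPLACE INTO customer_features_history
    VALUES (NEW.customer_id, NEW.last_update_ts, NEW.avg_spend);
END;
"""


def backfill(conn):
    """Initial backfill: recompute the feature as of every raw event, then seed the serving table."""
    conn.execute("""
        INSERT OR REPLACE INTO customer_features_history (customer_id, asof_ts, avg_spend)
        SELECT t.customer_id, t.ts,
               (SELECT AVG(t2.amount) FROM transactions t2
                 WHERE t2.customer_id = t.customer_id AND t2.ts <= t.ts)
          FROM transactions t""")
    latest = conn.execute("""
        SELECT h.customer_id, h.asof_ts, h.avg_spend FROM customer_features_history h
         WHERE h.asof_ts = (SELECT MAX(asof_ts) FROM customer_features_history
                             WHERE customer_id = h.customer_id)""").fetchall()
    conn.executemany("INSERT INTO customer_features VALUES (?, ?, ?)", latest)


def pipeline_update(conn, customer_id, ts, avg_spend):
    """Ongoing pipeline write: a plain UPDATE; the trigger records the new version."""
    conn.execute("UPDATE customer_features SET last_update_ts = ?, avg_spend = ? "
                 "WHERE customer_id = ?", (ts, avg_spend, customer_id))


def value_as_of(conn, customer_id, ts):
    """Point-in-time lookup against the history table (None if no version existed yet)."""
    row = conn.execute("""SELECT avg_spend FROM customer_features_history
                           WHERE customer_id = ? AND asof_ts <= ?
                           ORDER BY asof_ts DESC LIMIT 1""", (customer_id, ts)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                 [(1, 100, 10.0), (1, 200, 30.0), (2, 150, 50.0)])
backfill(conn)
pipeline_update(conn, 1, 300, 25.0)
print(value_as_of(conn, 1, 250))  # 20.0
```

The serving table stays one row per key for fast primary-key reads. The history table accumulates versions for training without the pipeline code having to know it exists.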

Scalable & Persistent Storage of Predictions
● Easily track data drift
● Easily track concept drift
● Compare new models to history
● Fully audit-proof history
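Once predictions and the features behind them are persisted, drift can be checked with ordinary computations over that history. The talk does not say which drift metric to use. As one common choice, the sketch below computes the population stability index (PSI), PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i). Here e_i and a_i are the bin proportions of a reference sample (for example, training-time feature values) and a recent sample. The bins are equal-width over the reference range. The function name and the `n_bins`/`eps` defaults are assumptions.

```python
import math
from bisect import bisect_right


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a recent sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0                   # guard against a constant reference sample
    edges = [lo + width * i for i in range(1, n_bins)]  # n_bins - 1 interior bin edges

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range land in the first or last bin.
            counts[bisect_right(edges, v)] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


training_values = [float(x) for x in range(100)]
recent_values = [float(x + 50) for x in range(100)]
print(population_stability_index(training_values, training_values))        # 0.0
print(population_stability_index(training_values, recent_values) > 0.25)  # True
```

A frequently quoted rule of thumb reads PSI below 0.1 as stable and above 0.25 as significant drift, but thresholds are best tuned per feature.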

Database Deployment: Evaluation Store
Each prediction is made and written to the database at millisecond speed
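Deploying a model "as a table" means that inserting a row of features is enough to get a prediction. The prediction then stays in the table as an evaluation-store record. The sketch below shows that pattern under two assumptions: a hypothetical logistic-regression model with made-up weights, and SQLite (via Python's sqlite3) as a stand-in database. A Python function is registered as a SQL function, and an AFTER INSERT trigger fills in the prediction column. Table, column, and function names are invented for illustration; this is not Splice Machine's deployment API.

```python
import math
import sqlite3

# Hypothetical trained model: logistic regression with made-up coefficients.
WEIGHTS = {"avg_spend": 0.04, "num_logins": -0.3}
BIAS = -1.0


def predict_churn(avg_spend, num_logins):
    """Return the model's probability for one feature vector."""
    z = BIAS + WEIGHTS["avg_spend"] * avg_spend + WEIGHTS["num_logins"] * num_logins
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
# Builds with SQLITE_TRUSTED_SCHEMA=0 block app-defined functions inside triggers;
# this pragma re-enables them (SQLite versions without it silently ignore it).
conn.execute("PRAGMA trusted_schema = ON")
conn.create_function("predict_churn", 2, predict_churn)
conn.executescript("""
CREATE TABLE churn_model_v1 (            -- one row per scoring request
    request_id    INTEGER PRIMARY KEY,
    avg_spend     REAL,
    num_logins    INTEGER,
    prediction    REAL,                  -- filled in by the trigger
    model_version TEXT DEFAULT 'churn_model_v1',
    created_at    TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER churn_model_v1_score AFTER INSERT ON churn_model_v1
BEGIN
    UPDATE churn_model_v1
       SET prediction = predict_churn(NEW.avg_spend, NEW.num_logins)
     WHERE request_id = NEW.request_id;
END;
""")

# Inserting features is the whole "inference call"; the prediction is stored with them.
conn.execute("INSERT INTO churn_model_v1 (request_id, avg_spend, num_logins) VALUES (1, 50.0, 2)")
print(conn.execute("SELECT prediction, model_version FROM churn_model_v1").fetchone())
```

Keeping inputs, output, model version, and timestamp in one row is what makes later drift analysis and audits a matter of querying the table.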

HTAP Database: Feature Store + Deployment

Predictions, Models, Features, Data
● Which model made that prediction?
● Which algorithm, parameters, and features were used to train the model?
● How were the features computed?
● What was the raw data at the time of training?
Splice Machine: Database Deployment + Feature Store
Guaranteed Lineage and Governance
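The lineage questions above become a pair of joins once predictions, models, and feature definitions live in the same database. The chain runs from prediction to model to features to raw data. Here is a minimal sketch with SQLite via Python's sqlite3. The schema, the `trace` helper, and all sample rows are hypothetical. The last question (raw data at training time) is represented only by an as-of timestamp, which a versioned history table could then be queried against.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature_defs   (name TEXT PRIMARY KEY, definition TEXT, source_table TEXT);
CREATE TABLE models         (model_id TEXT PRIMARY KEY, algorithm TEXT, params TEXT,
                             training_asof_ts INTEGER);
CREATE TABLE model_features (model_id TEXT REFERENCES models, feature TEXT REFERENCES feature_defs);
CREATE TABLE predictions    (prediction_id INTEGER PRIMARY KEY,
                             model_id TEXT REFERENCES models, prediction REAL);

INSERT INTO feature_defs VALUES
    ('avg_spend',  'AVG(amount) over prior transactions',   'transactions'),
    ('num_logins', 'COUNT(*) of logins in the last 7 days', 'logins');
INSERT INTO models VALUES ('churn_v1', 'logistic_regression', '{"C": 1.0}', 200);
INSERT INTO model_features VALUES ('churn_v1', 'avg_spend'), ('churn_v1', 'num_logins');
INSERT INTO predictions VALUES (1, 'churn_v1', 0.6);
""")


def trace(conn, prediction_id):
    """Walk prediction -> model -> features; return None for an unknown prediction."""
    row = conn.execute("""
        SELECT m.model_id, m.algorithm, m.params, m.training_asof_ts
          FROM predictions p JOIN models m ON m.model_id = p.model_id
         WHERE p.prediction_id = ?""", (prediction_id,)).fetchone()
    if row is None:
        return None
    model_id, algorithm, params, asof_ts = row
    features = conn.execute("""
        SELECT f.name, f.definition, f.source_table
          FROM model_features mf JOIN feature_defs f ON f.name = mf.feature
         WHERE mf.model_id = ? ORDER BY f.name""", (model_id,)).fetchall()
    return {
        "model": model_id,                 # which model made the prediction
        "algorithm": algorithm,            # how that model was trained
        "params": json.loads(params),
        "features": features,              # how each feature is computed, and from which table
        "training_data_asof_ts": asof_ts,  # point in time of the raw data used for training
    }


print(trace(conn, 1)["features"])
```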

More Related Content

Trending

MLOps with Azure DevOps (Marco Parenzan)
MLOps with Kubeflow (Saurabh Kaushik)
Using MLOps to Bring ML to Production/The Promise of MLOps (Weaveworks)
Feature Store as a Data Foundation for Machine Learning (Provectus)
MLOps - The Assembly Line of ML (Jordan Birdsell)
What is MLOps (Henrik Skogström)
Learn to Use Databricks for the Full ML Lifecycle (Databricks)
Achieving Lakehouse Models with Spark 3.0 (Databricks)
Data pipelines from zero to solid (Lars Albertsson)
Ml ops on AWS (PhilipBasford)
Databricks Fundamentals (Dalibor Wijas)
Introducing Databricks Delta (Databricks)
Introduction to Azure Databricks (James Serra)
Apache Kafka Streams + Machine Learning / Deep Learning (Kai Wähner)
Machine Learning & Amazon SageMaker (Amazon Web Services)
Hadoop and Manufacturing (Cloudera, Inc.)
Data Streaming with Apache Kafka & MongoDB (confluent)
Data Lakehouse, Data Mesh, and Data Fabric (r2) (James Serra)
Data Quality Patterns in the Cloud with Azure Data Factory (Mark Kromer)
Building Lakehouses on Delta Lake with SQL Analytics Primer (Databricks)


Similar to Unified MLOps: Feature Stores & Model Deployment

PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf (Jim Dowling)
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2.... (Databricks)
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ... (PAPIs.io)
Machine learning at scale - Webinar By zekeLabs (zekeLabs Technologies)
DevOps in the Cloud with Microsoft Azure (gjuljo)
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic... (Databricks)
Continuous delivery for machine learning (Rajesh Muppalla)
New usage model for real-time analytics by Dr. WILLIAM L. BAIN at Big Data S... (Big Data Spain)
Data Con LA 2019 - MetaConfig driven FeatureStore with Feature compute & Serv... (Data Con LA)
MetaConfig driven FeatureStore : MakeMyTrip | Presented at Data Con LA 2019 b... (Piyush Kumar)
Machine Learning Models in Production (DataWorks Summit)
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf (vitm11)
Splice Machine's use of Apache Spark and MLflow (Databricks)
SnappyData @ Seattle Spark Meetup (SnappyData)
Next Gen Big Data Analytics with Apache Apex (DataWorks Summit/Hadoop Summit)
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ... (Dataconomy Media)
Big Data Berlin v8.0 Stream Processing with Apache Apex (Apache Apex)
Productionizing Machine Learning with a Microservices Architecture (Databricks)
Low Latency Polyglot Model Scoring using Apache Apex (Apache Apex)
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra... (SQUADEX)


More from Databricks

DW Migration Webinar-March 2022.pptx
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 4
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Democratizing Data Quality Through a Centralized Platform
Learn to Use Databricks for Data Science
Why APM Is Not the Same As ML Monitoring
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Stage Level Scheduling Improving Big Data and AI Integration
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Sawtooth Windows for Feature Aggregations
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Re-imagine Data Monitoring with whylogs and Spark
Raven: End-to-end Optimization of ML Prediction Queries
Processing Large Datasets for ADAS Applications using Apache Spark
Massive Data Processing in Adobe Using Delta Lake


