SlideShare una empresa de Scribd logo
1 de 25
Descargar para leer sin conexión
KFServing, model monitoring
with Spark and a Feature Store
Jim Dowling and Javier de la Rúa Martínez
Logical Clocks AB
Machine Learning (ML) with Information Dense Input
Input Image is Information Dense
JellyFish AI
Complex behaviour from information
dense signals. But, no brain to enrich
with history or context. Behaviour is
autonomic based only on the input.
NLP Input Text is Information Dense
Encoder1
Encoder24
[ Image from https://dl.acm.org/doi/fullHtml/10.1145/3329784 ]
“Python can do
everything that PySpark
can do and more”
99%
Spam
Bert Large
ML with Information Light Input - Enrich with History/Context
Web Search
Input signal: characters
Enrich with: your history,
profile, context, location.
Fraud: Transfer Money
Input signal: customer/bank ID, $$$
Enrich with: your credit, historical
transfers, location, bank’s ranking.
5G Edge Security*
Input signal: IP packets
Enrich with: your device history,
traffic flow characteristics.
*Image from https://www.ericsson.com/en/blog/2021/3/5g-edge-computing-gaming
The Feature Store enables AI-Enabled Products
Model
AI-enabled
Product
uses
Feature Store
Add
Context /
History
Enterprise Data
0. Pipelines Continually
Update Features
1. Predict
2. Enrich
Hopsworks Online Feature Store - RonDB
https://www.logicalclocks.com/blog/ai-ml-needs-a-key-value-store-and-redis-is-not-up-to-it
https://github.com/logicalclocks/rondb
RonDB is an open-source LATS
Database
RonDB out-performs Redis on a 32-core Server
Kafka
Input
RTFeatureGroup
ClickFeatureGroup
TableFeatureGroup
UserFeatureGroup
LogsFeatureGroup
User Clicks
DB Updates
User Profile Updates
Weblogs
Real-time features
Event Data
SQL
SQL DW
S3, HDFS
Feature Store
Train,
Batch App
Model
Serving
DataFrame
API
Low
Latency
Features
High
Latency
Features
Feature Pipelines, Online/Offline Feature Store
Online
Offline
User
▪ Join (reuse) features and materialize as training datasets
▪ File formats: TFRecord, NPY, CSV, PETASTORM, etc
Representing Models in the Feature Store with Training Datasets
transaction_type
transaction_amount
user_id
user_nationality
user_gender
transactions_fg
users_fg
Feature Groups Training Datasets
pk join
transactions_2020_td
Descriptive Statistics
Feature Correlations
Histograms
...
Baseline
Statistics
used for Data
Drift
Detection
fraud_classifier
Models
KubeFlow Model Serving (KFServing)
with a Feature Store
Local Remote
AI-Enabled
Product
Online
Feature Store
1.
3.
4.
2.
KFServing with an
Online Feature Store
1. Request Features
2. Return Enriched Feature Vector
3. Prediction Request
4. Make Prediction & Return Result
td = fs.get_training_dataset("card_fraud_model", 1)
input_keys = { “cc_num” : ... }
fv = td.get_serving_vector(input_keys)
1. Request Features
KFServing
KFServing with an
Online Feature Store
Local Remote
AI-Enabled
Product
KFServing Online
Feature Store
1. 2.
3.
4.
1. Prediction Request
2. Request Features
3. Return Enriched Feature Vector
4. Make Prediction & Return Result
class Transformer:
def _init_(self):
self.fs = #connect to feature store
self.td = self.fs.get_training_dataset("card_fraud_model")
def preprocess(inputs):
return td.get_serving_vector(inputs["cc_num"])
2. Request Features from inside the KFServing Transformer
KFServing Internals
KFServing
Internals
● KFServing Supports Complex inference pipelines
○ Transformer, Explainer, Multi-model serving
Online
Feature Store
[Image from https://www.kubeflow.org/docs/components/kfserving/kfserving/ ]
[Image from https://www.kubeflow.org/docs/components/kfserving/kfserving/ ]
Model Monitoring with KFServing and Hopsworks
AI Data Lifecycle - Model Serving to Feature Store
Feature
Data
Model
Registry
Training Artifacts
(Logs, Experiments)
Model
Serving
Training
Data
Feature Vectors
Inference Data, Stats,
New Training Data
Feed the AI Data Flywheel
AI Data Lifecycle - KFServing to Hopsworks
Model
Registry
Training Artifacts
(Logs, Experiments)
KFServing
Training
Data
Feature Vectors
Inference
Data
Kafka
Hopsworks
Feature
Store
Add support for Kafka/Spark Logging to KFServing
KFServing
● Enable automated ingestion to
Feature Store
○ Hopsworks can automatically
create an Avro schema for the
target Training Dataset
● Enable live monitoring of
inference data with Spark
Streaming
Kafka
Transformer Predictor
cc_num
long
num_trans_12h
avg_trans_1h
std_trans_10m
long, double,
float
fraud
bool
request response
Inference Data
AI-Enabled
Product
Kafka
Feature Vector
Context
Online
Feature Store
Offline
Feature Store
Baseline Statistics
Inference data
Evaluation
Data drift, outliers
Online Model Monitoring with Spark Streaming
Request
Response
KFServing
Hopsworks
Feature Store
Live inference data is an unbounded data stream
Stateful, global window-based monitoring on inference data.
Use Feature Store APIs to access descriptive statistics of the training
set to help identify data drift and outliers compared to the live
inference data.
Challenges in Online Model Monitoring with Spark
Streaming
Usage example
Windowed Outliers Pipe
Windowed Drift Pipe
Stats Outliers Pipe
Stats Drift Pipe
Outliers Pipe
Drift Pipe
Monitor pipe Window pipe
Stats pipe
Sink Pipe
Alerts
Reports
Insights
Inference
data
Spark Streaming For Online Model Monitoring
Scalable Architecture for Automating Machine Learning Model Monitoring
http://kth.diva-portal.org/smash/get/diva2:1464577/FULLTEXT01.pdf ]
Inference
Data
● Interactive Queries to debug the Model
● Interactive Queries to debug Inference Data
● Inspect Model KPIs Charts
● Inspect Model Serving Performance Charts
● Identify Model/Data Drift
● Interactive Queries to Audit Logs
Model Monitoring with Evaluation/Feature Store
Evaluation
Store
Feature
Store
ML Engineer
Data Scientist
● Understand Live Model Performance
● Use new Training Data
Kafka
Unified Feature and Data Drift Detection
Hopsworks
Feature
Store
Model
Registry
Training Artifacts
(Logs, Experiments)
KFServing
Training
Data
Feature Vectors
Deequ
Data
Validation
Feature
Pipelines
Feature Drift
Data Drift
Inference Data
Outcomes
Reuse Deequ Data Validation Rules in Hopsworks*
# Insert and validate feature data using the following expectation
expect = fs.create_expectation(...,
rules=[ Rule(name="HAS_MIN", level="WARNING", min=0),
Rule(name="HAS_MAX", level="ERROR", max=1000000) ])
pipeline_fg = fs.create_feature_group(..., expectations=[expect] )
pipeline_df = # dataframe from feature pipeline
pipeline_fg.insert(pipeline_df) # Expectations are validated on ingestion
# Insert inference data and validate using the following expectation
td = fs.get_training_dataset("model", version=1)
log_expect = fs.create_expectation(...,
rules=[ Rule(name="HAS_MIN", level="WARNING", min=td.stats[‘feature’].min),
Rule(name="HAS_MAX", level="ERROR", max=td.stats[‘feature’].max)])
logging_fg = fs.create_feature_group(..., expectations=[log_expect])
logging_df = # dataframe from prediction logging
logging_fg.insert(logging_df) # Rule evaluated on ingestion
*https://examples.hopsworks.ai/featurestore/hsfs/data_validation/feature_validation_python/
DEMO
github.com/logicalclocks
www.hopsworks.ai
@logicalclocks
This work was part-funded by the Aniara
Project (led by Ericsson), EU Celtic-Next.
Feedback
Your feedback is important to us.
Don’t forget to rate
and review the sessions.

Más contenido relacionado

La actualidad más candente

Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOpsDatabricks
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOpsCarl W. Handlin
 
The A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsThe A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsDataPhoenix
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPconfluent
 
Databricks Partner Enablement Guide.pdf
Databricks Partner Enablement Guide.pdfDatabricks Partner Enablement Guide.pdf
Databricks Partner Enablement Guide.pdfssuserb74636
 
Machine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesMachine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesArun Gupta
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsWeaveworks
 
Productionalizing Machine Learning Solutions with Effective Tracking, Monitor...
Productionalizing Machine Learning Solutions with Effective Tracking, Monitor...Productionalizing Machine Learning Solutions with Effective Tracking, Monitor...
Productionalizing Machine Learning Solutions with Effective Tracking, Monitor...Databricks
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflowDatabricks
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaKai Wähner
 
Modern Data Platforms
Modern Data Platforms Modern Data Platforms
Modern Data Platforms Arne Roßmann
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsDatabricks
 
Apply MLOps at Scale by H&M
Apply MLOps at Scale by H&MApply MLOps at Scale by H&M
Apply MLOps at Scale by H&MDatabricks
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsMárton Kodok
 
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML EngineersIntro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML EngineersDaniel Zivkovic
 
Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Animesh Singh
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaKai Wähner
 
Introducing Change Data Capture with Debezium
Introducing Change Data Capture with DebeziumIntroducing Change Data Capture with Debezium
Introducing Change Data Capture with DebeziumChengKuan Gan
 

La actualidad más candente (20)

Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOps
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOps
 
The A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsThe A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOps
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
Databricks Partner Enablement Guide.pdf
Databricks Partner Enablement Guide.pdfDatabricks Partner Enablement Guide.pdf
Databricks Partner Enablement Guide.pdf
 
Machine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesMachine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and Kubernetes
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
 
Productionalizing Machine Learning Solutions with Effective Tracking, Monitor...
Productionalizing Machine Learning Solutions with Effective Tracking, Monitor...Productionalizing Machine Learning Solutions with Effective Tracking, Monitor...
Productionalizing Machine Learning Solutions with Effective Tracking, Monitor...
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 
Modern Data Platforms
Modern Data Platforms Modern Data Platforms
Modern Data Platforms
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Apply MLOps at Scale by H&M
Apply MLOps at Scale by H&MApply MLOps at Scale by H&M
Apply MLOps at Scale by H&M
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML EngineersIntro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
Intro to Vertex AI, unified MLOps platform for Data Scientists & ML Engineers
 
Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)
 
Data Product Architectures
Data Product ArchitecturesData Product Architectures
Data Product Architectures
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
 
Introducing Change Data Capture with Debezium
Introducing Change Data Capture with DebeziumIntroducing Change Data Capture with Debezium
Introducing Change Data Capture with Debezium
 

Similar a KFServing, Model Monitoring with Apache Spark and a Feature Store

Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupJim Dowling
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfJim Dowling
 
Hamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature StoreHamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature StoreMoritz Meister
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfJim Dowling
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdfJim Dowling
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...ScyllaDB
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...HostedbyConfluent
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Databricks
 
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...HostedbyConfluent
 
Simplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data WarehouseSimplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data WarehouseFeatureByte
 
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaDeep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaGoDataDriven
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Provectus
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingJim Dowling
 
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m...
Flink Forward San Francisco 2018:  David Reniz & Dahyr Vergara - "Real-time m...Flink Forward San Francisco 2018:  David Reniz & Dahyr Vergara - "Real-time m...
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m...Flink Forward
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
 
Data science for infrastructure dev week 2022
Data science for infrastructure   dev week 2022Data science for infrastructure   dev week 2022
Data science for infrastructure dev week 2022ZainAsgar1
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine LearningLogical Clocks
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkChester Chen
 
Models in Minutes using AutoML
Models in Minutes using AutoMLModels in Minutes using AutoML
Models in Minutes using AutoMLBill Liu
 

Similar a KFServing, Model Monitoring with Apache Spark and a Feature Store (20)

Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
 
Hamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature StoreHamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature Store
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdf
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
 
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
 
Simplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data WarehouseSimplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data Warehouse
 
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaDeep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowling
 
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m...
Flink Forward San Francisco 2018:  David Reniz & Dahyr Vergara - "Real-time m...Flink Forward San Francisco 2018:  David Reniz & Dahyr Vergara - "Real-time m...
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m...
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Data science for infrastructure dev week 2022
Data science for infrastructure   dev week 2022Data science for infrastructure   dev week 2022
Data science for infrastructure dev week 2022
 
Managed Feature Store for Machine Learning
Managed Feature Store for Machine LearningManaged Feature Store for Machine Learning
Managed Feature Store for Machine Learning
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With Spark
 
Models in Minutes using AutoML
Models in Minutes using AutoMLModels in Minutes using AutoML
Models in Minutes using AutoML
 

Más de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Más de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Último

ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 

Último (20)

ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 

KFServing, Model Monitoring with Apache Spark and a Feature Store

  • 1. KFServing, model monitoring with Spark and a Feature Store Jim Dowling and Javier de la Rúa Martínez Logical Clocks AB
  • 2. Machine Learning (ML) with Information Dense Input Input Image is Information Dense JellyFish AI Complex behaviour from information dense signals. But, no brain to enrich with history or context. Behaviour is autonomic based only on the input. NLP Input Text is Information Dense Encoder1 Encoder24 [ Image from https://dl.acm.org/doi/fullHtml/10.1145/3329784 ] “Python can do everything that PySpark can do and more” 99% Spam Bert Large
  • 3. ML with Information Light Input - Enrich with History/Context Web Search Input signal: characters Enrich with: your history, profile, context, location. Fraud: Transfer Money Input signal: customer/bank ID, $$$ Enrich with: your credit, historical transfers, location, bank’s ranking. 5G Edge Security* Input signal: IP packets Enrich with: your device history, traffic flow characteristics. *Image from https://www.ericsson.com/en/blog/2021/3/5g-edge-computing-gaming
  • 4. The Feature Store enables AI-Enabled Products Model AI-enabled Product uses Feature Store Add Context / History Enterprise Data 0. Pipelines Continually Update Features 1. Predict 2. Enrich
  • 5. Hopsworks Online Feature Store - RonDB https://www.logicalclocks.com/blog/ai-ml-needs-a-key-value-store-and-redis-is-not-up-to-it https://github.com/logicalclocks/rondb RonDB is an open-source LATS Database RonDB out-performs Redis on a 32-core Server
  • 6. Kafka Input RTFeatureGroup ClickFeatureGroup TableFeatureGroup UserFeatureGroup LogsFeatureGroup User Clicks DB Updates User Profile Updates Weblogs Real-time features Event Data SQL SQL DW S3, HDFS Feature Store Train, Batch App Model Serving DataFrame API Low Latency Features High Latency Features Feature Pipelines, Online/Offline Feature Store Online Offline User
  • 7. ▪ Join (reuse) features and materialize as training datasets ▪ File formats: TFRecord, NPY, CSV, PETASTORM, etc Representing Models in the Feature Store with Training Datasets transaction_type transaction_amount user_id user_nationality user_gender transactions_fg users_fg Feature Groups Training Datasets pk join transactions_2020_td Descriptive Statistics Feature Correlations Histograms ... Baseline Statistics used for Data Drift Detection fraud_classifier Models
  • 8. KubeFlow Model Serving (KFServing) with a Feature Store
  • 9. Local Remote AI-Enabled Product Online Feature Store 1. 3. 4. 2. KFServing with an Online Feature Store 1. Request Features 2. Return Enriched Feature Vector 3. Prediction Request 4. Make Prediction & Return Result td = fs.get_training_dataset("card_fraud_model", 1) input_keys = { “cc_num” : ... } fv = td.get_serving_vector(input_keys) 1. Request Features KFServing
  • 10. KFServing with an Online Feature Store Local Remote AI-Enabled Product KFServing Online Feature Store 1. 2. 3. 4. 1. Prediction Request 2. Request Features 3. Return Enriched Feature Vector 4. Make Prediction & Return Result class Transformer: def _init_(self): self.fs = #connect to feature store self.td = self.fs.get_training_dataset("card_fraud_model") def preprocess(inputs): return td.get_serving_vector(inputs["cc_num"]) 2. Request Features from inside the KFServing Transformer
  • 11. KFServing Internals KFServing Internals ● KFServing Supports Complex inference pipelines ○ Transformer, Explainer, Multi-model serving Online Feature Store [Image from https://www.kubeflow.org/docs/components/kfserving/kfserving/ ]
  • 13. Model Monitoring with KFServing and Hopsworks
  • 14. AI Data Lifecycle - Model Serving to Feature Store Feature Data Model Registry Training Artifacts (Logs, Experiments) Model Serving Training Data Feature Vectors Inference Data, Stats, New Training Data Feed the AI Data Flywheel
  • 15. AI Data Lifecycle - KFServing to Hopsworks Model Registry Training Artifacts (Logs, Experiments) KFServing Training Data Feature Vectors Inference Data Kafka Hopsworks Feature Store
  • 16. Add support for Kafka/Spark Logging to KFServing KFServing ● Enable automated ingestion to Feature Store ○ Hopsworks can automatically create an Avro schema for the target Training Dataset ● Enable live monitoring of inference data with Spark Streaming Kafka Transformer Predictor cc_num long num_trans_12h avg_trans_1h std_trans_10m long, double, float fraud bool request response
  • 17. Inference Data AI-Enabled Product Kafka Feature Vector Context Online Feature Store Offline Feature Store Baseline Statistics Inference data Evaluation Data drift, outliers Online Model Monitoring with Spark Streaming Request Response KFServing Hopsworks Feature Store
  • 18. Live inference data is an unbounded data stream Stateful, global window-based monitoring on inference data. Use Feature Store APIs to access descriptive statistics of the training set to help identify data drift and outliers compared to the live inference data. Challenges in Online Model Monitoring with Spark Streaming
  • 19. Usage example Windowed Outliers Pipe Windowed Drift Pipe Stats Outliers Pipe Stats Drift Pipe Outliers Pipe Drift Pipe Monitor pipe Window pipe Stats pipe Sink Pipe Alerts Reports Insights Inference data Spark Streaming For Online Model Monitoring Scalable Architecture for Automating Machine Learning Model Monitoring http://kth.diva-portal.org/smash/get/diva2:1464577/FULLTEXT01.pdf ]
  • 20. Inference Data ● Interactive Queries to debug the Model ● Interactive Queries to debug Inference Data ● Inspect Model KPIs Charts ● Inspect Model Serving Performance Charts ● Identify Model/Data Drift ● Interactive Queries to Audit Logs Model Monitoring with Evaluation/Feature Store Evaluation Store Feature Store ML Engineer Data Scientist ● Understand Live Model Performance ● Use new Training Data Kafka
  • 21. Unified Feature and Data Drift Detection Hopsworks Feature Store Model Registry Training Artifacts (Logs, Experiments) KFServing Training Data Feature Vectors Deequ Data Validation Feature Pipelines Feature Drift Data Drift Inference Data Outcomes
  • 22. Reuse Deequ Data Validation Rules in Hopsworks* # Insert and validate feature data using the following expectation expect = fs.create_expectation(..., rules=[ Rule(name="HAS_MIN", level="WARNING", min=0), Rule(name="HAS_MAX", level="ERROR", max=1000000) ]) pipeline_fg = fs.create_feature_group(..., expectations=[expect] ) pipeline_df = # dataframe from feature pipeline pipeline_fg.insert(pipeline_df) # Expectations are validated on ingestion # Insert inference data and validate using the following expectation td = fs.get_training_dataset("model", version=1) log_expect = fs.create_expectation(..., rules=[ Rule(name="HAS_MIN", level="WARNING", min=td.stats[‘feature’].min), Rule(name="HAS_MAX", level="ERROR", max=td.stats[‘feature’].max)]) logging_fg = fs.create_feature_group(..., expectations=[log_expect]) logging_df = # dataframe from prediction logging logging_fg.insert(logging_df) # Rule evaluated on ingestion *https://examples.hopsworks.ai/featurestore/hsfs/data_validation/feature_validation_python/
  • 23. DEMO
  • 24. github.com/logicalclocks www.hopsworks.ai @logicalclocks This work was part-funded by the Aniara Project (led by Ericsson), EU Celtic-Next.
  • 25. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.