SlideShare una empresa de Scribd logo
1 de 39
Descargar para leer sin conexión
Uddeholm Data Architecture/ML Workshop
Feature Store: the missing data layer in ML pipelines?1
Kim Hammar
kim@logicalclocks.com
May 8, 2019
1
Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines?
https://www.logicalclocks.com/feature-store/. 2018.
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 1 / 17
HopsFS
HopsYARN
(GPU/CPU as a resource)
Frameworks
(ML/Data)
Distributed Metadata
(Available from REST API)
Feature Store Pipelines Experiments Models
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
ML/AI Assets
from hops import featurestore
from hops import experiment
featurestore.get_features([
"average_attendance",
"average_player_age"])
experiment.collective_all_reduce(features , model)
APIs
Outline
1 What is a Feature Store
2 Why You Need a Feature Store
3 How to Build a Feature Store (Hopsworks Feature Store)
4 Demo
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 3 / 17
Solution: Disentangle ML Pipelines
with a Feature Store
Raw/Structured Data
Feature Store
Feature Engineering Training
Models
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
A feature store is a central vault for storing documented, curated, and
access-controlled features.
The feature store is the interface between data engineering and data
model development
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 4 / 17
Make ML-Features A First-Class Citizen in Your Data Lakes
Traditional Feature Engineering
N spark tasks Derived data Model Training
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Raw data lake
Feature Engineering w Feature Store
N spark tasks Derived data Model Training
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Raw data lake
Feature Store
Metadata
...
Make your features first-class citizens:
Document features
Version features
Invest in a data layer specifically for features (feature store)
Make features access-controlled and searchable
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 5 / 17
What is a Feature?
A feature is a measurable property of some data-sample
A feature could be..
An aggregate value (min, max, mean, sum)
A raw value (a pixel, a word from a piece of text)
A value from a database table (the age of a customer)
A derived representation: e.g an embedding or a cluster
Features are the fuel for AI systems:




x1
...
xn




Features
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Model θ
ˆy
Prediction
L(y, ˆy)
Loss
Gradient
θL(y, ˆy)
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 6 / 17
Feature Engineering is Crucial for Model Performance
x1• • • • • •◦ ◦ ◦
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 7 / 17
Feature Engineering is Crucial for Model Performance
x1
x2
•
•
• •
•
•
◦
◦
◦
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 8 / 17
Feature Engineering is Crucial for Model Performance
x1
x2
•
•
• •
•
•
◦
◦
◦
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 9 / 17
Feature Engineering is Complex
Input Data
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
[0.141, 0.51, . . .]
Feature Matrix
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
[0.141, 0.51, . . .]
Feature Matrix
Train Val Test
Train/Val/Test Split
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
[0.141, 0.51, . . .]
Feature Matrix
Train Val Test
Train/Val/Test Split
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Train Model
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
[0.141, 0.51, . . .]
Feature Matrix
Train Val Test
Train/Val/Test Split
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Train Model
How do you make this scale?
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
[0.141, 0.51, . . .]
Feature Matrix
Train Val Test
Train/Val/Test Split
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Train Model
How do you make this scale? How to manage the feature pipelines?
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
[0.141, 0.51, . . .]
Feature Matrix
Train Val Test
Train/Val/Test Split
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Train Model
How do you make this scale? How to manage the feature pipelines?
How to make features reusable and robust?
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
Feature Engineering is Complex
Input Data
lorem
ipsum
dolor
sit
amet
ut adjective
verb
noun
stopword
noun
verb IP
NP
Det
lorem
N
N
ipsum
I
I
dolor
VP
V
V
sit
AP
Deg
amet
A
A
ut
lorem
ipsum
dolor
sit
amet
ut
[0.141, −0.715, 0.51, . . .]
[0.303, −0.56, 0.32, . . .]
[−0.071, 0.9, 0.77, . . .]
[0.877, −0.05, −0.81, . . .]
[0.642, 0.115, 0.19, . . .]
[0.141, −0.295, 0.51, . . .]
Embeddings
Linguistic Features
Feature Engineering
[0.141, 0.51, . . .]
Feature Matrix
Train Val Test
Train/Val/Test Split
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Train Model
How do you make this scale? How to manage the feature pipelines?
How to make features reusable and robust?
Feature Engineering is Complex Yet Crucial for Model Performance
Treat your features accordingly!
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
Feature Pipeline Jungles
Data Lake (Raw/Structured Data)
Feature Data (Derived Data)
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 11 / 17
Disentangle Your ML Pipelines with a Feature Store
Data Sources Dataset 1 Dataset 2 . . . Dataset n
Feature Store
Feature Store
A data management platform for machine learning.
The interface between data engineering and data science.
Models
Models are trained using sets of features.
The features are fetched from the feature store
and can overlap between models.
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy≥ 0.9 < 0.9
≥ 0.2
< 0.2
≥ 11.2
< 11.2
B B
A
(−1, −1) (−8, −8)(−10, 0) (0, −10)
40 60 80 100
160
180
200
X
Y
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 12 / 17
Disentangle Your ML Pipelines with a Feature Store
Dataset 1 Dataset 2 . . . Dataset n
Feature Store
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy≥ 0.9 < 0.9
≥ 0.2
< 0.2
≥ 11.2
< 11.2
B B
A
(−1, −1) (−8, −8)(−10, 0) (0, −10)
40 60 80 100
160
180
200
X
Y
Backfilling
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 12 / 17
Disentangle Your ML Pipelines with a Feature Store
Dataset 1 Dataset 2 . . . Dataset n
Feature Store
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy≥ 0.9 < 0.9
≥ 0.2
< 0.2
≥ 11.2
< 11.2
B B
A
(−1, −1) (−8, −8)(−10, 0) (0, −10)
40 60 80 100
160
180
200
X
Y
Backfilling
Analysis
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 12 / 17
Disentangle Your ML Pipelines with a Feature Store
Dataset 1 Dataset 2 . . . Dataset n
Feature Store
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy≥ 0.9 < 0.9
≥ 0.2
< 0.2
≥ 11.2
< 11.2
B B
A
(−1, −1) (−8, −8)(−10, 0) (0, −10)
40 60 80 100
160
180
200
X
Y
Backfilling
Analysis
Versioning
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 12 / 17
Disentangle Your ML Pipelines with a Feature Store
Dataset 1 Dataset 2 . . . Dataset n
Feature Store
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy≥ 0.9 < 0.9
≥ 0.2
< 0.2
≥ 11.2
< 11.2
B B
A
(−1, −1) (−8, −8)(−10, 0) (0, −10)
40 60 80 100
160
180
200
X
Y
Backfilling
Analysis
Versioning
Documentation
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 12 / 17
High-Level APIs and Abstractions
from hops import featurestore
features_df = featurestore.get_features(
[
"average_attendance",
"average_player_age"
])
featurestore.create_featuregroup(
f_df, "t_features",
description="...", version=2)
d_dir = featurestore.get_training_dataset_path(td_name)
tf_schema = featurestore.get_tf_record_schema(td_name)
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 13 / 17
High-Level APIs and Abstractions
Read from the feature store
from hops import featurestore
features_df = featurestore.get_features(
[
"average_attendance",
"average_player_age"
])
featurestore.create_featuregroup(
f_df, "t_features",
description="...", version=2)
d_dir = featurestore.get_training_dataset_path(td_name)
tf_schema = featurestore.get_tf_record_schema(td_name)
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 13 / 17
High-Level APIs and Abstractions
Read from the feature store
Write to the feature store
from hops import featurestore
features_df = featurestore.get_features(
[
"average_attendance",
"average_player_age"
])
featurestore.create_featuregroup(
f_df, "t_features",
description="...", version=2)
d_dir = featurestore.get_training_dataset_path(td_name)
tf_schema = featurestore.get_tf_record_schema(td_name)
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 13 / 17
High-Level APIs and Abstractions
Read from the feature store
Write to the feature store
Metadata operations
from hops import featurestore
features_df = featurestore.get_features(
[
"average_attendance",
"average_player_age"
])
featurestore.create_featuregroup(
f_df, "t_features",
description="...", version=2)
d_dir = featurestore.get_training_dataset_path(td_name)
tf_schema = featurestore.get_tf_record_schema(td_name)
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 13 / 17
Existing Feature Stores
Uber’s feature store2
Airbnb’s feature store3
Comcast’s feature store4
Facebook’s feature store5
GO-JEK’s feature store6
Twitter’s feature store7
Branch International’s feature store8
Hopsworks’ feature store9 (the only open-source one!)
2
Li Erran Li et al. “Scaling Machine Learning as a Service”. In: Proceedings of The 3rd International
Conference on Predictive Applications and APIs. Ed. by Claire Hardgrove et al. Vol. 67. Proceedings of Machine
Learning Research. Microsoft NERD, Boston, USA: PMLR, 2017, pp. 14–29. URL:
http://proceedings.mlr.press/v67/li17a.html.
3
Nikhil Simha and Varant Zanoyan. Zipline: Airbnb’s Machine Learning Data Management Platform.
https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform. 2018.
4
Nabeel Sarwar. Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions.
https://databricks.com/session/operationalizing-machine-learning-managing-provenance-from-raw-data-to-
predictions. 2018.
5
Kim Hazelwood et al. “Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective”. In:
Feb. 2018, pp. 620–629. DOI: 10.1109/HPCA.2018.00059.
6
Willem Pienaar. Building a Feature Platform to Scale Machine Learning | DataEngConf BCN ’18.
https://www.youtube.com/watch?v=0iCXY6VnpCc. 2018.
7Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 14 / 17
The Components of a Feature Store
Feature Storage
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 15 / 17
The Components of a Feature Store
Feature Metadata
Schema Statistics Documentation Jobs
Feature Storage
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 15 / 17
The Components of a Feature Store
Client Interface
API
from hops import featurestore
features_df = featurestore.get_features(
[
"average_attendance",
"average_player_age"
])
Feature Registry
Feature Metadata
Schema Statistics Documentation Jobs
Feature Storage
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 15 / 17
Hopsworks Feature Store
Feature Storage
HopsFS
Feature groups Training Datasets
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 16 / 17
Hopsworks Feature Store
Feature Metadata
Schema Statistics Documentation Jobs
Feature Storage
HopsFS
Feature groups Training Datasets
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 16 / 17
Hopsworks Feature Store
REST API
Feature Metadata
Schema Statistics Documentation Jobs
Feature Storage
HopsFS
Feature groups Training Datasets
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 16 / 17
Hopsworks Feature Store
Client Interface
API
from hops import featurestore
features_df = featurestore.get_features(
[
"average_attendance",
"average_player_age"
])
Hopsworks
Feature Registry
REST API
Feature Metadata
Schema Statistics Documentation Jobs
Feature Storage
HopsFS
Feature groups Training Datasets
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 16 / 17
Hopsworks Feature Store
Client Interface
API
from hops import featurestore
features_df = featurestore.get_features(
[
"average_attendance",
"average_player_age"
])
Hopsworks
Feature Registry
REST API
Feature Metadata
Schema Statistics Documentation Jobs
Feature Storage
HopsFS
Feature groups Training Datasets
e1
e2
e3
e4
Gradient
Gradient
Gradient
Gradient
Model Training/Serving
Feature Engineering
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 16 / 17
Summary
Machine learning comes with a high technical cost
Machine learning pipelines needs proper data management
A feature store is a place to store curated and documented features
The feature store serves as an interface between feature engineering
and model development, it can help disentangle complex ML pipelines
Hopsworks10 provides the world’s first open-source feature store
@hopshadoop
www.hops.io
@logicalclocks
www.logicalclocks.com
We are open source:
https://github.com/logicalclocks/hopsworks
https://github.com/hopshadoop/hops
11
10
Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018.
11
Thanks to Logical Clocks Team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso, Gautier Berthou,
Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson, Alex Ormenisan,
Rasmus Toivonen, and Steffen Grohsschmiedt
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 17 / 17
References
Hopsworks’ feature store12
HopsML13
Hopsworks14
12
Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines?
https://www.logicalclocks.com/feature-store/. 2018.
13
Logical Clocks AB. HopsML: Python-First ML Pipelines.
https://hops.readthedocs.io/en/latest/hopsml/hopsML.html. 2018.
14
Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018.
Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 18 / 17

Más contenido relacionado

Similar a Uddeholm ml workshop_hagfors_kim_hammar

Deep Dive on Deep Learning with MXNet - DevDay Austin 2017
Deep Dive on Deep Learning with MXNet - DevDay Austin 2017Deep Dive on Deep Learning with MXNet - DevDay Austin 2017
Deep Dive on Deep Learning with MXNet - DevDay Austin 2017
Amazon Web Services
 
KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreKFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature Store
Databricks
 

Similar a Uddeholm ml workshop_hagfors_kim_hammar (20)

Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
 
The Feature Store in Hopsworks
The Feature Store in HopsworksThe Feature Store in Hopsworks
The Feature Store in Hopsworks
 
Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)
 
Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering
 
Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015
 
Begin with Machine Learning
Begin with Machine LearningBegin with Machine Learning
Begin with Machine Learning
 
The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Ser...
The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Ser...The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Ser...
The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Ser...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Real-time Aggregations, Ap...
 
VSSML18. Feature Engineering
VSSML18. Feature EngineeringVSSML18. Feature Engineering
VSSML18. Feature Engineering
 
Open Source SOA@OpenFest.org
Open Source SOA@OpenFest.orgOpen Source SOA@OpenFest.org
Open Source SOA@OpenFest.org
 
Deep Dive on Deep Learning with MXNet - DevDay Austin 2017
Deep Dive on Deep Learning with MXNet - DevDay Austin 2017Deep Dive on Deep Learning with MXNet - DevDay Austin 2017
Deep Dive on Deep Learning with MXNet - DevDay Austin 2017
 
Ember
EmberEmber
Ember
 
Implementing the Genetic Algorithm in XSLT: PoC
Implementing the Genetic Algorithm in XSLT: PoCImplementing the Genetic Algorithm in XSLT: PoC
Implementing the Genetic Algorithm in XSLT: PoC
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
 
Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17Hivemall meets Digdag @Hackertackle 2018-02-17
Hivemall meets Digdag @Hackertackle 2018-02-17
 
KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreKFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature Store
 
Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocks
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020
 
아마존의 딥러닝 기술 활용 사례 - 윤석찬 (AWS 테크니컬 에반젤리스트)
아마존의 딥러닝 기술 활용 사례 - 윤석찬 (AWS 테크니컬 에반젤리스트)아마존의 딥러닝 기술 활용 사례 - 윤석찬 (AWS 테크니컬 에반젤리스트)
아마존의 딥러닝 기술 활용 사례 - 윤석찬 (AWS 테크니컬 에반젤리스트)
 
Distributed ML in Apache Spark
Distributed ML in Apache SparkDistributed ML in Apache Spark
Distributed ML in Apache Spark
 

Más de Kim Hammar

Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Kim Hammar
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via Decomposition
Kim Hammar
 

Más de Kim Hammar (20)

Automated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jecturesAutomated Security Response through Online Learning with Adaptive Con jectures
Automated Security Response through Online Learning with Adaptive Con jectures
 
Självlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTHSjälvlärande System för Cybersäkerhet. KTH
Självlärande System för Cybersäkerhet. KTH
 
Learning Automated Intrusion Response
Learning Automated Intrusion ResponseLearning Automated Intrusion Response
Learning Automated Intrusion Response
 
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlIntrusion Tolerance for Networked Systems through Two-level Feedback Control
Intrusion Tolerance for Networked Systems through Two-level Feedback Control
 
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via Decomposition
 
Digital Twins for Security Automation
Digital Twins for Security AutomationDigital Twins for Security Automation
Digital Twins for Security Automation
 
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
 
Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.Självlärande system för cyberförsvar.
Självlärande system för cyberförsvar.
 
Intrusion Response through Optimal Stopping
Intrusion Response through Optimal StoppingIntrusion Response through Optimal Stopping
Intrusion Response through Optimal Stopping
 
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber Defense
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.
 
Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal Stopping
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
 
Intrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-PlayIntrusion Prevention through Optimal Stopping and Self-Play
Intrusion Prevention through Optimal Stopping and Self-Play
 
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
Introduktion till försvar mot nätverksintrång. 22 Feb 2022. EP1200 KTH.
 
Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.Intrusion Prevention through Optimal Stopping.
Intrusion Prevention through Optimal Stopping.
 

Último

Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Último (20)

Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 

Uddeholm ml workshop_hagfors_kim_hammar

  • 1. Uddeholm Data Architecture/ML Workshop Feature Store: the missing data layer in ML pipelines?1 Kim Hammar kim@logicalclocks.com May 8, 2019 1 Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018. Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 1 / 17
  • 2. HopsFS HopsYARN (GPU/CPU as a resource) Frameworks (ML/Data) Distributed Metadata (Available from REST API) Feature Store Pipelines Experiments Models b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy ML/AI Assets from hops import featurestore from hops import experiment featurestore.get_features([ "average_attendance", "average_player_age"]) experiment.collective_all_reduce(features , model) APIs
  • 3. Outline 1 What is a Feature Store 2 Why You Need a Feature Store 3 How to Build a Feature Store (Hopsworks Feature Store) 4 Demo Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 3 / 17
  • 4. Solution: Disentangle ML Pipelines with a Feature Store Raw/Structured Data Feature Store Feature Engineering Training Models b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy A feature store is a central vault for storing documented, curated, and access-controlled features. The feature store is the interface between data engineering and data model development Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 4 / 17
  • 5. Make ML-Features A First-Class Citizen in Your Data Lakes Traditional Feature Engineering N spark tasks Derived data Model Training b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Raw data lake Feature Engineering w Feature Store N spark tasks Derived data Model Training b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Raw data lake Feature Store Metadata ... Make your features first-class citizens: Document features Version features Invest in a data layer specifically for features (feature store) Make features access-controlled and searchable Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 5 / 17
  • 6. What is a Feature? A feature is a measurable property of some data-sample A feature could be.. An aggregate value (min, max, mean, sum) A raw value (a pixel, a word from a piece of text) A value from a database table (the age of a customer) A derived representation: e.g an embedding or a cluster Features are the fuel for AI systems:     x1 ... xn     Features b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Model θ ˆy Prediction L(y, ˆy) Loss Gradient θL(y, ˆy) Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 6 / 17
  • 7. Feature Engineering is Crucial for Model Performance x1• • • • • •◦ ◦ ◦ Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 7 / 17
  • 8. Feature Engineering is Crucial for Model Performance x1 x2 • • • • • • ◦ ◦ ◦ Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 8 / 17
  • 9. Feature Engineering is Crucial for Model Performance x1 x2 • • • • • • ◦ ◦ ◦ Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 9 / 17
  • 10. Feature Engineering is Complex Input Data Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
  • 11. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
  • 12. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering [0.141, 0.51, . . .] Feature Matrix Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
  • 13. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering [0.141, 0.51, . . .] Feature Matrix Train Val Test Train/Val/Test Split Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
  • 14. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering [0.141, 0.51, . . .] Feature Matrix Train Val Test Train/Val/Test Split b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Train Model Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
  • 15. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering [0.141, 0.51, . . .] Feature Matrix Train Val Test Train/Val/Test Split b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Train Model How do you make this scale? Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
  • 16. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering [0.141, 0.51, . . .] Feature Matrix Train Val Test Train/Val/Test Split b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Train Model How do you make this scale? How to manage the feature pipelines? Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
  • 17. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering [0.141, 0.51, . . .] Feature Matrix Train Val Test Train/Val/Test Split b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Train Model How do you make this scale? How to manage the feature pipelines? How to make features reusable and robust? Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
  • 18. Feature Engineering is Complex Input Data lorem ipsum dolor sit amet ut adjective verb noun stopword noun verb IP NP Det lorem N N ipsum I I dolor VP V V sit AP Deg amet A A ut lorem ipsum dolor sit amet ut [0.141, −0.715, 0.51, . . .] [0.303, −0.56, 0.32, . . .] [−0.071, 0.9, 0.77, . . .] [0.877, −0.05, −0.81, . . .] [0.642, 0.115, 0.19, . . .] [0.141, −0.295, 0.51, . . .] Embeddings Linguistic Features Feature Engineering [0.141, 0.51, . . .] Feature Matrix Train Val Test Train/Val/Test Split b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Train Model How do you make this scale? How to manage the feature pipelines? How to make features reusable and robust? Feature Engineering is Complex Yet Crucial for Model Performance Treat your features accordingly! Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 10 / 17
  • 19. Feature Pipeline Jungles Data Lake (Raw/Structured Data) Feature Data (Derived Data) Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 11 / 17
  • 20. Disentangle Your ML Pipelines with a Feature Store Data Sources Dataset 1 Dataset 2 . . . Dataset n Feature Store Feature Store A data management platform for machine learning. The interface between data engineering and data science. Models Models are trained using sets of features. The features are fetched from the feature store and can overlap between models. b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy≥ 0.9 < 0.9 ≥ 0.2 < 0.2 ≥ 11.2 < 11.2 B B A (−1, −1) (−8, −8)(−10, 0) (0, −10) 40 60 80 100 160 180 200 X Y Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 12 / 17
  • 21. Disentangle Your ML Pipelines with a Feature Store Dataset 1 Dataset 2 . . . Dataset n Feature Store b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy≥ 0.9 < 0.9 ≥ 0.2 < 0.2 ≥ 11.2 < 11.2 B B A (−1, −1) (−8, −8)(−10, 0) (0, −10) 40 60 80 100 160 180 200 X Y Backfilling Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 12 / 17
  • 22. Disentangle Your ML Pipelines with a Feature Store Dataset 1 Dataset 2 . . . Dataset n Feature Store b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy≥ 0.9 < 0.9 ≥ 0.2 < 0.2 ≥ 11.2 < 11.2 B B A (−1, −1) (−8, −8)(−10, 0) (0, −10) 40 60 80 100 160 180 200 X Y Backfilling Analysis Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 12 / 17
  • 23. Disentangle Your ML Pipelines with a Feature Store Dataset 1 Dataset 2 . . . Dataset n Feature Store b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy≥ 0.9 < 0.9 ≥ 0.2 < 0.2 ≥ 11.2 < 11.2 B B A (−1, −1) (−8, −8)(−10, 0) (0, −10) 40 60 80 100 160 180 200 X Y Backfilling Analysis Versioning Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 12 / 17
  • 24. Disentangle Your ML Pipelines with a Feature Store Dataset 1 Dataset 2 . . . Dataset n Feature Store b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy≥ 0.9 < 0.9 ≥ 0.2 < 0.2 ≥ 11.2 < 11.2 B B A (−1, −1) (−8, −8)(−10, 0) (0, −10) 40 60 80 100 160 180 200 X Y Backfilling Analysis Versioning Documentation Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 12 / 17
  • 25. High-Level APIs and Abstractions from hops import featurestore features_df = featurestore.get_features( [ "average_attendance", "average_player_age" ]) featurestore.create_featuregroup( f_df, "t_features", description="...", version=2) d_dir = featurestore.get_training_dataset_path(td_name) tf_schema = featurestore.get_tf_record_schema(td_name) Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 13 / 17
  • 26. High-Level APIs and Abstractions Read from the feature store from hops import featurestore features_df = featurestore.get_features( [ "average_attendance", "average_player_age" ]) featurestore.create_featuregroup( f_df, "t_features", description="...", version=2) d_dir = featurestore.get_training_dataset_path(td_name) tf_schema = featurestore.get_tf_record_schema(td_name) Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 13 / 17
  • 27. High-Level APIs and Abstractions Read from the feature store Write to the feature store from hops import featurestore features_df = featurestore.get_features( [ "average_attendance", "average_player_age" ]) featurestore.create_featuregroup( f_df, "t_features", description="...", version=2) d_dir = featurestore.get_training_dataset_path(td_name) tf_schema = featurestore.get_tf_record_schema(td_name) Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 13 / 17
  • 28. High-Level APIs and Abstractions Read from the feature store Write to the feature store Metadata operations from hops import featurestore features_df = featurestore.get_features( [ "average_attendance", "average_player_age" ]) featurestore.create_featuregroup( f_df, "t_features", description="...", version=2) d_dir = featurestore.get_training_dataset_path(td_name) tf_schema = featurestore.get_tf_record_schema(td_name) Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 13 / 17
  • 29. Existing Feature Stores Uber’s feature store2 Airbnb’s feature store3 Comcast’s feature store4 Facebook’s feature store5 GO-JEK’s feature store6 Twitter’s feature store7 Branch International’s feature store8 Hopsworks’ feature store9 (the only open-source one!) 2 Li Erran Li et al. “Scaling Machine Learning as a Service”. In: Proceedings of The 3rd International Conference on Predictive Applications and APIs. Ed. by Claire Hardgrove et al. Vol. 67. Proceedings of Machine Learning Research. Microsoft NERD, Boston, USA: PMLR, 2017, pp. 14–29. URL: http://proceedings.mlr.press/v67/li17a.html. 3 Nikhil Simha and Varant Zanoyan. Zipline: Airbnb’s Machine Learning Data Management Platform. https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform. 2018. 4 Nabeel Sarwar. Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions. https://databricks.com/session/operationalizing-machine-learning-managing-provenance-from-raw-data-to- predictions. 2018. 5 Kim Hazelwood et al. “Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective”. In: Feb. 2018, pp. 620–629. DOI: 10.1109/HPCA.2018.00059. 6 Willem Pienaar. Building a Feature Platform to Scale Machine Learning | DataEngConf BCN ’18. https://www.youtube.com/watch?v=0iCXY6VnpCc. 2018. 7Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 14 / 17
  • 30. The Components of a Feature Store Feature Storage Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 15 / 17
  • 31. The Components of a Feature Store Feature Metadata Schema Statistics Documentation Jobs Feature Storage Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 15 / 17
  • 32. The Components of a Feature Store Client Interface API from hops import featurestore features_df = featurestore.get_features( [ "average_attendance", "average_player_age" ]) Feature Registry Feature Metadata Schema Statistics Documentation Jobs Feature Storage Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 15 / 17
  • 33. Hopsworks Feature Store Feature Storage HopsFS Feature groups Training Datasets Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 16 / 17
  • 34. Hopsworks Feature Store Feature Metadata Schema Statistics Documentation Jobs Feature Storage HopsFS Feature groups Training Datasets Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 16 / 17
  • 35. Hopsworks Feature Store REST API Feature Metadata Schema Statistics Documentation Jobs Feature Storage HopsFS Feature groups Training Datasets Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 16 / 17
  • 36. Hopsworks Feature Store Client Interface API from hops import featurestore features_df = featurestore.get_features( [ "average_attendance", "average_player_age" ]) Hopsworks Feature Registry REST API Feature Metadata Schema Statistics Documentation Jobs Feature Storage HopsFS Feature groups Training Datasets Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 16 / 17
  • 37. Hopsworks Feature Store Client Interface API from hops import featurestore features_df = featurestore.get_features( [ "average_attendance", "average_player_age" ]) Hopsworks Feature Registry REST API Feature Metadata Schema Statistics Documentation Jobs Feature Storage HopsFS Feature groups Training Datasets e1 e2 e3 e4 Gradient Gradient Gradient Gradient Model Training/Serving Feature Engineering Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 16 / 17
  • 38. Summary Machine learning comes with a high technical cost Machine learning pipelines needs proper data management A feature store is a place to store curated and documented features The feature store serves as an interface between feature engineering and model development, it can help disentangle complex ML pipelines Hopsworks10 provides the world’s first open-source feature store @hopshadoop www.hops.io @logicalclocks www.logicalclocks.com We are open source: https://github.com/logicalclocks/hopsworks https://github.com/hopshadoop/hops 11 10 Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018. 11 Thanks to Logical Clocks Team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso, Gautier Berthou, Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson, Alex Ormenisan, Rasmus Toivonen, and Steffen Grohsschmiedt Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 17 / 17
  • 39. References Hopsworks’ feature store12 HopsML13 Hopsworks14 12 Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018. 13 Logical Clocks AB. HopsML: Python-First ML Pipelines. https://hops.readthedocs.io/en/latest/hopsml/hopsML.html. 2018. 14 Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018. Kim Hammar (Logical Clocks) Hopsworks Feature Store May 8, 2019 18 / 17