Sometimes just creating a good model is not enough: we need to enable people to use it, and that often means making it part of a bigger system or deploying it somehow. This talk takes an engineering point of view on how we work with a data scientist, or a team of them, to make sure a model is production ready. Here is a short checklist of things we do for each model: 1. understand what the model is trying to do/predict; 2. define all of the model inputs and outputs; 3. define the point (as a point in time and as an integration point) in the wider system where the model is called; 4. define how we want to host the model. The engineering team usually helps make sure we can gather all of the model inputs and process all of the model outputs. We also make sure models are fast and reliable to call in a production environment and help optimize them for that, and we help enforce good engineering practices that rub off on data scientists and make them more efficient. In this talk we will see a few examples of how we do things and what to look for.
[DSC Europe 22] Engineer's guide for shepherding models into production - Marko Dimitrijevic
1. Engineer's guide for shepherding models into production
Marko Dimitrijević
Staff software engineer at Vroom
marko.dimitrijevic@vast.com
m_a_r_e.91@hotmail.com
2. What is an example of a
data science model?
● Takes in data
● Learns from data
● Gets updated
● Produces results
● Provides value
3. What does it mean to run a model in a production environment
And what does it mean to run a model in real time?
● Returns a result every time (or a
meaningful error)
● Inputs and outputs are well
defined and explained
● Performant and reasonably
optimized for the task
● Results can be depended on and
are verified to be correct
● System is able to handle errors
and edge cases
4. Two main topics we will cover
Building model pipeline
● Defining inputs and outputs
● Defining a call place
● Collecting inputs and
delivering outputs
● Handling edge cases
Hosting the model
● Defining resource needs
● Optimizing model code
● Integrating with platform
code
● Testing and iterating
5. What does the
engineering team do
We also rename stuff to follow standards!
● We take care of monitoring and
scaling to match the load
● We take care of the model
hosting and plumbing for
providing inputs and outputs
● We also advise on what is and
isn't doable and help find the
optimal solution
● We help rework models to run
faster, work better, or follow good
practices
● We know to ask the right
questions and prevent future
problems
6. How a model pipeline is built
Assuming the research part is done and an algorithm is picked
● Produce the model
● Identify where we can get the
data for training
● Find a way to get all the inputs
the model needs
● Find the right location to integrate
the model and call it
● Deliver model outputs to the
right place
8. Common issues in building model pipelines
Most common issues are data related
● The model uses data from upstream of the place where it will be called
This can lead to inputs that do not take raw data but some processed/aggregated version of it that is not
available at the model call place.
● Data gets renamed many times on its way from the source to model training
Sometimes people call the same thing by different names, and vice versa; this can lead to confusion and the
wrong inputs being used. If I want a location, is that the location of the customer, the location of the vehicle, or
the address of the reconditioning center handling the vehicle?
● Incomplete data is filtered out in training, but it can't be in the real world
It is easy to ignore incomplete data when there is plenty of leftover data to work with, but in the real world
pieces of information are often missing, and sometimes a prediction is better than no prediction. Coverage can
be very low for models that are strict about their inputs.
9. Common issues in building model pipelines
Potential timing issues
● If the model is called in real time and a user is waiting for a response, it should respond in under one second
Sometimes models are not built to operate fast on a single input but are optimized for batch processing. We
also often need to spend time querying different services to prepare model inputs, and that time adds up to the
total response time for the user.
● Once the model is done and needs to be integrated, the process takes a long time
Inserting a model into a production flow can take time, and if we need to build a pipeline that gathers multiple
different and "exotic" inputs, that can involve many teams and be time consuming.
● It is hard to identify the right moment in time to trigger the model
We often want to predict something as early in the process as possible, but we also want to have as many
inputs as possible. Sometimes inputs become available over time, and we have to decide how long to wait.
10. Some tips for building models
● Build for data available at model call time/place
Figure out what data is actually available at the place in the system that is going to use the model
Figure out at what point in time the model is going to be called and what data we have at that point
Expect to have missing data at runtime and prepare for it
● Describe your inputs and outputs
When defining inputs, give each one a description that helps people figure out what it is
● Communicate early and optimize
Start a conversation with people early in the process about what the model needs to run, so it can be planned for, and be ready to
iterate on the inputs to adjust for system limitations
Optimize for the way data is being processed (batch or a single piece at a time) and make sure the model is fast enough for the
intended use case
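The "describe your inputs and outputs" and "expect missing data" tips can be sketched with standard-library dataclasses. The field names and descriptions below are invented for illustration; they are not from a real Vroom model:

```python
from dataclasses import dataclass, field, fields
from typing import Optional

@dataclass
class PriceModelInput:
    # Required inputs first; each field carries a description so consumers
    # can figure out exactly what it means.
    vehicle_year: int = field(metadata={"description": "Model year of the vehicle"})
    vehicle_make: str = field(metadata={"description": "Manufacturer, as spelled in the source catalog"})
    # Optional input: expect it to be missing at runtime and prepare for it.
    mileage: Optional[int] = field(
        default=None,
        metadata={"description": "Odometer reading; may be missing at call time"},
    )

def describe(cls) -> dict:
    """Return a name -> description mapping for every model input."""
    return {f.name: f.metadata.get("description", "") for f in fields(cls)}
```

A schema like this doubles as documentation: `describe(PriceModelInput)` gives downstream teams a ready-made list of inputs to gather at the call place.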
11. Some more tips for building models
● Build lookup tables for “static” data
Not everything has to be provided to the model; consider making lookup tables that contain some of the "static", slower-changing
inputs, for example average user activity per day per state, or popularity per year, make, model, and trim of a vehicle. Also include keys
for missing values in the lookup tables.
Bundle lookup tables with the model and make them an output of the training process, so each time the model is updated the tables
can be updated as well.
● Retrain often and include previous model tracking
Automate the model training process and retrain the model often if the data is changing.
Set up tracking of the model once it is in production and record inputs and outputs. Compare those values with what is expected and
include them in future model training.
● Be careful with string inputs
Data normalization might not happen at the source of the data, and some edge-case values might appear rarely enough to be
missed. Define a set of acceptable inputs or do normalization at runtime.
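The lookup-table and string-normalization tips above can be combined in a small sketch. The table keys and values are made up; the `"__missing__"` fallback key mirrors the advice to include keys for missing values:

```python
# Assumed lookup bundled with the model data: popularity per (year, make, model).
# The "__missing__" entry covers rare or unseen combinations.
POPULARITY = {
    ("2019", "lamborghini", "huracan spyder"): 0.92,
    ("2018", "toyota", "camry"): 0.75,
    "__missing__": 0.50,
}

def normalize(value: str) -> str:
    """Normalize string inputs at runtime; the data source may not do it for us."""
    return value.strip().lower()

def lookup_popularity(year, make, model):
    # Normalize every key component so " Lamborghini " and "lamborghini" match.
    key = (normalize(str(year)), normalize(make), normalize(model))
    return POPULARITY.get(key, POPULARITY["__missing__"])
```

Because the table ships with the model data, retraining can regenerate it without touching the serving code.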
12. How we host our
models
We support thousands of requests per second and
a few tens of milliseconds response times in some
cases!
This is an example of how we could host a model
and does not have to represent a real Vroom
approach.
● Iterate on model code with DS
owner many times
● Provide a template on how to
deliver and build a model
● Provide a stable, fast and
reliable platform that makes
calling models easy
● Provide an easy way to run many
versions of the model and to add
new versions
● Be ready to evolve to satisfy
requirements and add new
features
13. Example of a hosted model
● Each time the model is
trained, a new model data
version is created
● Model calling code is
shared for all model data
versions
● Service can load multiple
versions at the same time
● Model data version is
picked dynamically
● Alias can be defined and
pointed to specific model
data version
Simplified example system
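The version-and-alias scheme from this slide can be sketched as a small registry. This is a hypothetical illustration, not the real hosting service:

```python
class ModelDataRegistry:
    """Tracks model data versions and aliases that point at specific versions."""

    def __init__(self):
        self.versions = {}  # version name -> loaded model data
        self.aliases = {}   # alias -> version name

    def register(self, version, model_data):
        self.versions[version] = model_data

    def set_alias(self, alias, version):
        # An alias may only point at a version that actually exists.
        if version not in self.versions:
            raise KeyError(f"unknown model data version: {version}")
        self.aliases[alias] = version

    def resolve(self, name):
        """Accept either a concrete version or an alias; return the version."""
        return self.aliases.get(name, name)

    def get(self, name):
        return self.versions[self.resolve(name)]
```

Callers can then request a stable alias such as `"last_one"` while the alias target is repointed with every retraining, without any caller changes.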
14. How to define model code and model data
Model data
● Produced by training, owned by DS, stored remotely
● Consists of a model artifact and lookup files: the artifact is packed-up code (like a .pkl file) that has a simple
API to call a predict function; lookup files hold values needed for input processing, can be used by the model
code, and their values are passed into the model artifact
● Should be updated often: new model data can be produced with each training cycle, while the API and
format should remain the same so it stays compatible with the model code using it
Model code
● Owned by the ENG team, located in the hosting service
● Should not be changed often (ever): it is intended to do pre-processing and post-processing of
inputs/outputs using the lookup files
● Should not contain model logic: all the logic should be in the model artifact that is called by the model code
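The code/data split can be sketched as follows. `StubArtifact` stands in for an unpickled artifact; the lookup keys and the pre/post-processing steps are illustrative, not the real implementation:

```python
class StubArtifact:
    """Stand-in for a packed-up artifact (e.g. from a .pkl file) with a simple predict API."""
    def predict(self, features):
        return features["popularity"]  # placeholder "model logic"

class ModelCode:
    """Stable wrapper owned by engineering; contains no model logic itself."""

    def __init__(self, artifact, lookups):
        self.artifact = artifact  # all the actual logic lives here
        self.lookups = lookups    # lookup files shipped with the model data

    def _preprocess(self, raw):
        # Enrich raw inputs with lookup values before calling the artifact.
        features = dict(raw)
        pop = self.lookups["popularity"]
        features["popularity"] = pop.get(raw.get("make", ""), pop["__missing__"])
        return features

    def _postprocess(self, prediction):
        return {"prediction": prediction}

    def predict(self, raw):
        return self._postprocess(self.artifact.predict(self._preprocess(raw)))
```

Because `ModelCode` only wires lookups into the artifact, a retrained model ships as new data (artifact + lookups) while the serving code stays untouched.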
15. How to integrate a model
● Write some code to invoke the trained model
Define generalized calling code that is parametrized by lookup files and knows how to pass in data and interpret the result. Make that
into the model code.
● Add all dependencies
Identify all the dependencies the model needs to run (all inputs, all files), extract some inputs into lookups, and make that into the
model data.
● Deliver model data to a remote location
Define a location where the model data is delivered and the way the data is structured.
● Iterate, iterate, iterate
Analyze performance and speed and iterate on improving them: convert from batch processing to single-input processing, change the
data types used, or rework code blocks that are suboptimal.
Promote the model through environments and compare results with expected results.
● Integrate with other services and start using it in production
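The "compare results with expected results" step during promotion can be sketched as a simple parity check. The tolerance and case format are assumptions for illustration:

```python
def parity_check(predict, cases, rel_tol=0.05):
    """Run predict over (inputs, expected) pairs; return the cases whose
    prediction deviates from the expected value by more than rel_tol."""
    failures = []
    for inputs, expected in cases:
        got = predict(inputs)
        if expected == 0:
            ok = abs(got) <= rel_tol          # avoid dividing by zero
        else:
            ok = abs(got - expected) / abs(expected) <= rel_tol
        if not ok:
            failures.append((inputs, expected, got))
    return failures
```

A gate like this can block promotion to the next environment whenever `failures` is non-empty, catching regressions before the model reaches production.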
16. What does a platform like this offer
● Easy and fast integration
Simple calling-code logic to invoke the model, written in Python
● Easy and fast model updates
Model data is delivered to a remote location at will, at any cadence.
The service automatically scans for and detects new model data versions periodically at runtime.
● Easy and fast testing
Detected model data versions can be loaded (or unloaded) and hosted dynamically.
The service hosts multiple model data versions of a model at the same time, allowing for easy comparison, A/B testing, and
independent updates of different use cases.
When calling the service, the specific model data version to use can be defined.
If the requested version is loaded, the response is sub-second; if the version is discovered but not loaded, the service downloads the
required model data, loads it, and starts hosting it, and the result is still returned within the same request, with a delay. After the first
load, subsequent calls are fast.
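The load-on-demand behavior described above can be sketched like this; `download_model_data` is a hypothetical stand-in for fetching model data from remote storage:

```python
class LazyModelHost:
    """Loads a discovered model data version on first request, then caches it."""

    def __init__(self, discovered, download_model_data):
        self.discovered = set(discovered)  # versions found in remote storage
        self.loaded = {}                   # version -> model data held in memory
        self.download = download_model_data

    def get(self, version):
        if version in self.loaded:             # fast path: already hosted
            return self.loaded[version]
        if version not in self.discovered:     # unknown version: meaningful error
            raise KeyError(f"model data version not found: {version}")
        # Slow path: download and load within the same request, exactly once.
        self.loaded[version] = self.download(version)
        return self.loaded[version]
```

The first `get` for a version pays the download cost; every later call for the same version is served from memory.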
17. What more does a platform like this offer
● Easy data manipulation
You can define calling code to transform your input before passing it to the model and also do the same thing with your outputs.
● Monitoring and constant uptime are built in
We monitor and report various metrics and have alerting for potential issues
● Designed for scaling and cost optimization
We can scale horizontally to cover huge loads, with automatic load-based scaling and automatic replacement of unhealthy hosts, all
with continuous uptime.
We can host models with high resource requirements in an optimized way.
We run efficiently with multiple model data versions sharing the same server, and also with multiple different models, each with
multiple model data versions, sharing the same server.
This cuts down on cost dramatically. It might not seem important with only one or two models in use, but when we get into tens of
models, each with many different model data versions, the cost becomes huge if each model runs on a separate server.
18. A peek under the
hood
● We use a Python-based service for model hosting
to make DS integration of calling code easy
● We expose a set of REST API routes for each
supported model
● We run on AWS EC2 instances directly and
support GPU- and CPU-based models
● We use S3 to store model data
● We run batch predictions and one-by-one real-
time predictions, producing millions of
predictions per day
● Models are called by services that specialize in
orchestration, data gathering and caching
host/some-model/1/predict?modelDataVersion=last_one
Request:
[{
"vehicle": {
"year": 2019,
"make": "Lamborghini",
"model": "Huracan Spyder"
....
}
other inputs....
}]
Response:
{
"modelDataVersion": "2022-11-11",
"prediction": [
{
"reconditioningCost": "a lot of money :)"
}
]
}
*fictional api example
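The route shape from the fictional example above can be parsed with the standard library. The `"latest"` default and the field names are assumptions for illustration, not the real service's behavior:

```python
from urllib.parse import urlparse, parse_qs

def parse_predict_route(url):
    """Parse routes shaped like host/some-model/1/predict?modelDataVersion=last_one."""
    parsed = urlparse("//" + url)  # prefix "//" so the host is split off the path
    model, api_version, action = parsed.path.strip("/").split("/")
    query = parse_qs(parsed.query)
    return {
        "model": model,
        "apiVersion": api_version,
        "action": action,
        # Assumed default when no version (or alias) is requested explicitly.
        "modelDataVersion": query.get("modelDataVersion", ["latest"])[0],
    }
```

Resolving `modelDataVersion` per request is what lets one running service answer for many model data versions and aliases at once.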
19. Let’s summarize
We also do a lot of other interesting stuff!
● Start building the pipeline early
in the process
● Don’t build a model in a silo
● Ability to run many models in
parallel and many model
versions for each of them
● Automated support for updating
models
● Ability to run efficiently and
scale to match high loads
20. Vroom is hiring!
Reach out to our recruiters via LinkedIn to find out
more or send us your CV at vroomcareer@vast.com