At Schiphol Airport we run a lot of mission-critical machine learning models in production, ranging from models that predict passenger flow to computer vision models that analyze what is happening around the aircraft. Especially now, in times of Covid, it is paramount for us to be able to quickly iterate on these models: implementing new features, retraining them to match the new dynamics and, above all, monitoring them actively to see whether they still fit the current state of affairs.
To achieve this we rely on MLFlow, but we have also integrated it with many of our other systems: we have written Airflow operators for MLFlow to ease the retraining of our models, we have integrated MLFlow deeply with our CI pipelines, and we have connected it to our model monitoring tooling.
In this talk we will take you through the way we rely on MLFlow and how that enables us to release (sometimes) multiple versions of a model per week in a controlled fashion. With this set-up we achieve the same benefits and speed as with a traditional software CI pipeline.
Consolidating MLOps at One of Europe’s Biggest Airports
1. Consolidating MLOps at
Schiphol Airport
Floris Hoogenboom – Lead Data Scientist (Floris.Hoogenboom@schiphol.nl)
Sebastiaan Grasdijk - Senior Data Scientist (Sebastiaan.Grasdijk@Schiphol.nl)
2. Introduction
• Amsterdam Airport
• Before Covid: Europe’s third-largest airport
• Approx. 500,000 air traffic movements (ATMs)
• 72 million passengers (PAX) in 2019
• Royal Schiphol Group
• Schiphol Digital
7. “Show how we implemented MLOps and how that enables us to keep applying ML in a constantly changing environment.”
• Motivation
• Our MLFlow training set-up
• Bringing a trained model to production
• Monitoring
8. • Schiphol is a very dynamic place to apply AI in
Every day some physical aspect of the airport changes, meaning that the dynamics of e.g. PAX flow will be different.
• Most of these changes we capture in our models, but some we are not able to.
Sometimes we don’t know in advance that works will occur, and sometimes long-term incidents happen that we hadn’t foreseen and that we quickly want to adapt our models to.
• Keeping track of and monitoring our models in production was already a big task
• We often released updates to our models, e.g. including new data sources, deprecating temporarily unavailable feeds etc., to make sure we always had the best quality.
9.
10. • Quite standard
• We have a very strict format for all of our models:
• A Python package containing (1) library code, (2) a training application and (3) an inference application
• Training just entails installing the package and referencing a fixed entrypoint that is the same everywhere (a sketch follows below)
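To illustrate the set-up, a minimal sketch of what such a fixed training entrypoint could look like. This is an assumption for illustration (the package layout, names and toy sklearn model are not Schiphol's actual code); the point is that every model package exposes the same entrypoint so tooling can start training generically.

# train.py -- hypothetical fixed training entrypoint, identical across model packages
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression

def main() -> None:
    # The training application only needs this one entrypoint; the
    # model-specific logic lives in the package's library code.
    with mlflow.start_run():
        X = [[0.0], [1.0], [2.0]]          # placeholder for the real feature pipeline
        y = [0.0, 1.0, 2.0]
        model = LinearRegression().fit(X, y)
        mlflow.log_metric("train_score", model.score(X, y))
        mlflow.sklearn.log_model(model, artifact_path="model")

if __name__ == "__main__":
    main()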
11.
12. • Machine learning deals with data
• There is only one type of data that matters for modelling: PRD data
• Lots of organizations use the engineering DTAP flow where scientists work on "DEV" to train their models
• This works if it's their DEV and not also some engineer's DEV
13. • Three types of models we deploy:
• Batch (e.g. Block Time Prediction) -> Databricks Job
• Streaming (e.g. Baggage time on belt) -> Databricks Job
• Request/Reply (e.g. The forecasted disturbance at a given location) -> API in kubernetes
• Our way of integrating models in each of those deployments is more or less the same
• Focus on Batch for the rest of the talk
14.
15. • Cross-environment dependencies
• Runtime dependencies (mlflow.load_model is only executed when running the job; see the sketch below)
• Stability assumptions on your inference & model codebase
• "Non-atomic" deployments: it is hard to keep track of exactly what is running where
Let's dive into these points before showing how we resolved them.
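To make the runtime-dependency point concrete, a hedged sketch of the pattern this refers to: the inference job only resolves the model from the registry when it runs, so which version actually executes depends on the registry state at that moment. The model name and feature columns are hypothetical.

import pandas as pd
import mlflow.pyfunc

# The model is resolved at run time, not at deploy time: whatever happens to be
# in the "Production" stage when the job starts is what gets used.
model = mlflow.pyfunc.load_model("models:/block-time-prediction/Production")  # hypothetical model name
features = pd.DataFrame({"feature_a": [1.0], "feature_b": [2.0]})             # placeholder features
predictions = model.predict(features)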
16. • There is a discrepancy between a "model" in the deployment sense and a "model" in the data science sense
• Models come with an interface that specifies:
• The features that should go in
• (Implicitly) the data distribution of those features
• Deploying a model means deploying:
• The trained artifact
• Any code that is needed to do preprocessing, run the queries that fetch data from a datasource, etc.
• These cannot be decoupled! (!!)
Is this always a big problem? No.
Some models have a very stable API (e.g. computer vision models). A sketch of making the feature interface explicit follows below.
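One way to make that interface explicit, shown purely as an illustration (not necessarily how we do it), is MLflow's model signature support: a signature inferred from the training features records which columns the model expects. The feature names here are hypothetical.

import mlflow
import mlflow.sklearn
import pandas as pd
from mlflow.models.signature import infer_signature
from sklearn.linear_model import LinearRegression

# Toy training data standing in for the real feature pipeline.
X = pd.DataFrame({"scheduled_block_time": [35.0, 40.0], "pax_count": [150, 180]})
y = [37.0, 42.0]
model = LinearRegression().fit(X, y)

with mlflow.start_run():
    # The signature makes the implicit interface between the inference code
    # and the model artifact explicit in the logged model.
    signature = infer_signature(X, model.predict(X))
    mlflow.sklearn.log_model(model, artifact_path="model", signature=signature)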
17. • Not every model can be deployed with every version of your inference code
• You need to ensure that they are "feature compatible"
• This makes the Model registry UI a bit dangerous
(Model Registry UI screenshot: a new release that dropped a few features next to an old release that still used those features.)
What if we want to revert?
18. • There are two version identifiers (the inference code version and the model version) that determine the actual prediction job that will run
• This is hard to reason about, debug, log and manage
• Having a single source of truth makes it possible to know what is running where and how to revert
19.
20. • Data Scientist adapts the codebase to train a new model
• Stores changes in Git
• Uses mlflow run to kick off a new MLFlow run on Databricks that logs the new run to some experiment (see the sketch below)
• Data Scientist judges the quality of the experiment and decides whether it is good enough for review
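For reference, a hedged sketch of kicking off such a run on Databricks through the MLflow Projects API (the CLI equivalent is mlflow run). The project URI, entry point, experiment name and cluster spec file are illustrative assumptions.

import mlflow

# Submit the project's training entrypoint as a run on Databricks; the run is
# logged to the given experiment so the data scientist can judge its quality.
submitted = mlflow.projects.run(
    uri=".",                                   # the model's project root / Git repo
    entry_point="train",                       # the fixed training entrypoint
    backend="databricks",
    backend_config="databricks-cluster.json",  # hypothetical cluster spec
    experiment_name="/Shared/block-time-prediction",
)
print(submitted.run_id)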
21. • Data Scientist creates an MR on the repo to merge:
• The code for training the new model
• The adapted inference code so that it matches the model
• The configuration files for the deployed model (!)
• Unit tests, linting etc. run
• Then the interesting part starts...
22. • CI fetches the model from the MLFlow experiment based on the specified Run ID (see the sketch below)
• CI "builds" the deployment artifact, which contains:
• The model we wish to deploy
• The inference code you need to run it
• This creates a single artifact that can be deployed without any runtime dependencies!
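A hedged sketch of what that CI step could look like with the MLflow client; the Run ID and artifact path are placeholders, and the actual build step that bundles the inference code is omitted.

from mlflow.tracking import MlflowClient

# Fetch the model artifact logged under the approved Run ID, so it can be
# packaged together with the inference code into a single deployment artifact.
run_id = "0123456789abcdef0123456789abcdef"  # placeholder: the Run ID pinned in the repo's configuration
client = MlflowClient()
local_path = client.download_artifacts(run_id, "model", "build")
print(f"Model artifact downloaded to {local_path}")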
23. • Deploy the created deployment artifact
• As a Databricks job
• As a Docker container
• Etc.
• Environments are just based on Git tags
• Keep track of your environments like you would do traditionally
24. • We do still use the model registry!
• The model registry is managed from the CI pipeline
• We use the following stages (a sketch of the CI calls follows below):
• On feature branch deployments: register a new version in the registry if it does not exist yet
• On master: promote model to staging
• On tags: promote model to production
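A hedged sketch of how a CI job could drive those registry transitions with the MLflow APIs; the model name, Run ID and branch logic are simplified placeholders.

import mlflow
from mlflow.tracking import MlflowClient

model_name = "block-time-prediction"             # hypothetical registered model name
run_id = "0123456789abcdef0123456789abcdef"      # the run pinned in the repo's configuration

# Feature branch: register a new version for this run (creates the registered model if needed).
model_version = mlflow.register_model(model_uri=f"runs:/{run_id}/model", name=model_name)

# On master the pipeline would promote to Staging, on a Git tag to Production.
client = MlflowClient()
client.transition_model_version_stage(name=model_name, version=model_version.version, stage="Staging")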
25. • We use Airflow for scheduling automated retraining (see the sketch below)
• We don't automatically "update" models in production based on retraining
• Rather, we take away the manual process of starting a run etc., but the decision to go live is always up to a data scientist.
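To give an idea, a hedged sketch of what such a retraining DAG could look like. The talk mentions custom Airflow operators for MLFlow; those are not shown here, and the DAG id, schedule and plain PythonOperator are assumptions.

from datetime import datetime

import mlflow
from airflow import DAG
from airflow.operators.python import PythonOperator

def retrain_model():
    # Kick off a training run on Databricks; a data scientist still decides
    # whether the resulting model actually goes live.
    mlflow.projects.run(
        uri="https://github.com/example/block-time-prediction.git",  # hypothetical repo
        entry_point="train",
        backend="databricks",
        backend_config="databricks-cluster.json",
    )

with DAG(
    dag_id="retrain_block_time_prediction",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    PythonOperator(task_id="retrain", python_callable=retrain_model)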
26.
27. • Metrics get logged to Datadog (see the sketch below)
• Anomaly monitoring and warnings are sent to a slack channel
• We use notebooks to dive into any anomalies we see
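As an illustration of the kind of metric shipping this involves (the exact set-up is not in the slides), a sketch using the Datadog Python client's DogStatsD interface; metric names and tags are hypothetical.

from datadog import initialize, statsd

initialize(statsd_host="localhost", statsd_port=8125)  # assumes a local Datadog agent

# Ship prediction-quality and data-health style metrics so anomaly monitors
# can raise warnings in the Slack channel when something looks off.
statsd.gauge("block_time_prediction.mae", 3.2, tags=["env:prd", "model_version:42"])
statsd.gauge("block_time_prediction.feature.pax_count.null_fraction", 0.01, tags=["env:prd"])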
28. • Data Scientist can deploy models without any support
• We release new versions of many of our models every week
• Not only by training on new data, but also by adding features, changing data fetching etc.
• Fully versioned with a single source of truth
• If it works on DEV, it will work on ACP and PRD because of the single deployment package
• Easy to revert if something breaks
29. • MLFlow is a great tool, but it is not always a click & go solution
• Feature compatibility is an important issue to keep in mind; your model is much more than just your algorithm
• Having a single source of truth makes managing models much more like managing traditional software
• Having a proper MLOps flow enables speed in getting ML to production