Data science teams differ in maturity, and they need the right tools and infrastructure to stay agile. Here, I discuss a combination of open source tools and managed cloud services that work hand in hand and can grow with your data science team's needs as it matures.
5. Luis Vaquero
- PhD in Cardiology, MSc Pharmacology
- PhD in Distributed Systems, MSc EE, BSc Electronics
- Reader in Data Systems at University of Bristol, UK
- Data at Telefonica/O2, Hewlett-Packard Labs, HPE, Dyson
- Director of Data at GoCo Group
6. In the beginning
● Small data
● Simple algorithms (regression)
● No tests, no CI, no infrastructure, no monitoring
23. Team b4 Infrastructure: The Notebooks Manifesto
1. Follow established software development best practices
2. Version control and Continuous Integration/Deployment (CI/CD)
3. Parameterised Notebooks
4. Log All Experiments Automatically -- Monitor
5. Sharing is Caring
24. Software Best Practices: Single Responsibility
Use an #export annotation in notebook cells to export them into a single module that can be reused later
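The slide does not say which tool does the export, so the following is only a minimal sketch of the idea using the standard library. It assumes the marker sits alone on the first line of a code cell and that the notebook is the usual .ipynb JSON; the names `export_cells`, `export_notebook` and the demo notebook are hypothetical.

```python
import json
from pathlib import Path

EXPORT_MARKER = "#export"  # marker named on the slide


def cell_source(cell):
    """Return a cell's source as one string (.ipynb stores a str or a list of str)."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def export_cells(notebook, marker=EXPORT_MARKER):
    """Collect the bodies of code cells whose first line is the marker."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        lines = cell_source(cell).splitlines()
        if lines and lines[0].strip() == marker:
            # Drop the marker line, keep the rest of the cell.
            chunks.append("\n".join(lines[1:]).strip())
    return "\n\n\n".join(chunks) + "\n"


def export_notebook(nb_path, module_path):
    """Write all exported cells of one .ipynb file into a single .py module."""
    notebook = json.loads(Path(nb_path).read_text(encoding="utf-8"))
    Path(module_path).write_text(export_cells(notebook), encoding="utf-8")


# Tiny in-memory notebook used as a demonstration (hypothetical content).
demo_notebook = {
    "cells": [
        {"cell_type": "code",
         "source": ["#export\n", "def add(a, b):\n", "    return a + b\n"]},
        {"cell_type": "code",
         "source": ["add(1, 2)  # scratch cell, not exported\n"]},
        {"cell_type": "markdown",
         "source": ["#export is ignored in markdown cells\n"]},
    ]
}
module_source = export_cells(demo_notebook)
```

Only the first cell ends up in `module_source`; scratch cells and markdown stay in the notebook, which keeps the reusable module focused on a single responsibility.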
25. Software Best Practices: Test
Use a #test annotation in notebook cells to run them and assess their output
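As a rough illustration of the #test idea, the sketch below runs every code cell in order and reports the #test cells that raise. Treating "an exception was raised" as the failure signal is an assumption made here, as are the function name `run_test_cells` and the demo notebook; the slide does not describe the actual mechanism.

```python
TEST_MARKER = "#test"  # marker named on the slide


def run_test_cells(notebook, marker=TEST_MARKER):
    """Execute code cells in order; return (index, error) for failing #test cells."""
    namespace = {}
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        source = "".join(src) if isinstance(src, list) else src
        stripped = source.strip()
        first_line = stripped.splitlines()[0].strip() if stripped else ""
        is_test = first_line == marker
        try:
            exec(source, namespace)
        except Exception as error:
            if not is_test:
                raise  # a broken non-test cell is a setup error, not a test failure
            failures.append((index, repr(error)))
    return failures


# Hypothetical notebook: one helper cell, one passing test, one failing test.
demo_notebook = {
    "cells": [
        {"cell_type": "code", "source": "def add(a, b):\n    return a + b\n"},
        {"cell_type": "code", "source": "#test\nassert add(1, 2) == 3\n"},
        {"cell_type": "code",
         "source": "#test\nassert add(1, 2) == 4, 'deliberately failing'\n"},
    ]
}
failures = run_test_cells(demo_notebook)
```

A CI job could call such a function on every notebook in the repository and fail the build when `failures` is not empty.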
28. Version Control and CI
Diagram: Kubeflow on the Dev Kubernetes Cluster saves the trained model to GCS.
29. Version Control and CI
Diagram (build-up): adds a Git repository with push, pull, review and merge steps.
31. Version Control and CI
Diagram (build-up): adds a Build System triggered from the Git repository.
32. Version Control and CI
Diagram (build-up): adds a Container Registry to which the Build System publishes.
33. Version Control and CI
https://medium.com/kubeflow/automating-jupyter-notebook-deployments-to-kubeflow-pipelines-with-kale-a4ede38bea1f
35. Version Control and CI
Diagram (build-up): adds a Local Notebook that loads the trained model.
36. Version Control and CI
Diagram (build-up): adds a push step from the Local Notebook.
37. Version Control and CI
Diagram (complete):
● Dev Kubernetes Cluster running Kubeflow
● GCS: save trained model
● Git repository: push, pull, review, merge
● Build System: trigger
● Container Registry: publish
● Local Notebook: load trained model, push
● Prod Kubernetes Cluster: deploy
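The "save trained model" and "load trained model" steps in the diagram amount to a small contract between the training cluster and the local notebook. The sketch below shows one possible shape for that contract. It uses a local temporary directory as a stand-in for the GCS bucket, and the versioned path layout, the function names and the toy model are all assumptions rather than anything stated in the slides.

```python
import pickle
import tempfile
from pathlib import Path

ARTIFACT_NAME = "model.pkl"  # hypothetical file name


def save_model(model, store, name, version):
    """Pickle the model under <store>/<name>/<version>/ and return the path."""
    target = Path(store) / name / version / ARTIFACT_NAME
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("wb") as handle:
        pickle.dump(model, handle)
    return target


def load_model(store, name, version):
    """Load a model saved by save_model (only unpickle artifacts you trust)."""
    with (Path(store) / name / version / ARTIFACT_NAME).open("rb") as handle:
        return pickle.load(handle)


# Toy "model": coefficients of a fitted line (hypothetical values).
model = {"slope": 2.0, "intercept": 1.0}

with tempfile.TemporaryDirectory() as store:  # stand-in for the bucket
    saved_path = save_model(model, store, name="demo-regression", version="v1")
    saved_name = saved_path.name
    restored = load_model(store, name="demo-regression", version="v1")
```

Keeping an explicit version in the path is one way to let a local notebook load exactly the artifact that a given training run produced.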
40. Parameterised Notebooks
Tweak and Run
● Analysts who can tweak code
● Use a dedicated JupyterHub server
● Notebook templates
Visual results
● Analysts who need to see results
● Use Voila to run notebooks automatically
Auto: Test and Exec
● Tag cells as #test and let Papermill evaluate the results of the test output cells
● Run notebooks “as a remote script”
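Papermill parameterises a notebook by finding the cell tagged `parameters` and inserting a new cell with the overriding values right after it; in practice this happens through `papermill.execute_notebook(input_path, output_path, parameters=...)`. The sketch below imitates only that injection step with the standard library so that it runs without a Jupyter kernel. The function `inject_parameters`, the template and its variable names are hypothetical.

```python
import copy


def inject_parameters(notebook, overrides):
    """Return a copy of the notebook with an override cell after the 'parameters' cell."""
    result = copy.deepcopy(notebook)
    cells = result["cells"]
    # Locate the cell tagged "parameters" that holds the defaults.
    position = next(
        (i for i, cell in enumerate(cells)
         if "parameters" in cell.get("metadata", {}).get("tags", [])),
        -1,
    )
    source = "".join(
        f"{name} = {value!r}\n" for name, value in sorted(overrides.items())
    )
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    # With no tagged cell, position is -1 and the overrides go first in this sketch.
    cells.insert(position + 1, injected)
    return result


# Hypothetical notebook template with default parameters.
template = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": "alpha = 0.1\nregion = 'uk'\n"},
        {"cell_type": "code", "metadata": {},
         "source": "result = f'{region}:{alpha}'\n"},
    ]
}
run = inject_parameters(template, {"alpha": 0.6})

# Execute the cells in order to show that the override wins over the default.
namespace = {}
for cell in run["cells"]:
    exec(cell["source"], namespace)
```

Because the defaults stay in the template and the overrides live in their own cell, the same notebook can serve analysts who tweak values by hand and scheduled runs that pass them in.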