SlideShare una empresa de Scribd logo
1 de 16
Descargar para leer sin conexión
CI/CD with
Azure DevOps, Pre-Commit, and Azure Databricks
30-10-2019
A typical pipeline
Automate everything
• Deploy to production Efficiently & Reliably
• Allow everyone in the team to do so
• Smaller increments
• Roll-forward don’t Roll-back
2
Trigger
Version control
Test
Code
Build
Artifact
Deploy
Dev
Integration tests
Deploy
Prod
User facing
Measure
Capture performance
3
Today’s pipeline
Overall project structure
• src, containing the library
• input, data used while testing
• notebook, containing the application
• tests, for tests
4
Testing
Our approach
• Use Pre-Commit
• Apply Black, Flake8
• Run PySpark tests in a Docker container
5
• Checkout code
• Install requirements
• Apply linters
• Run unit-tests
• Publish test/coverage
Pre-Commit
Eg, solving the “Fixing lint issues” commit
• Framework for creating Git Hooks
• Eg, scripts that run on each commit
• Compare it to a local-CI
Pre-Commit
.pre-commit-config.yaml
• In our case
• run black/Flake8 on each commit
• run pytest on each push
repos:
- repo: https://github.com/psf/black
rev: 19.3b0
hooks:
- id: black
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.3.0
hooks:
- id: flake8
- id: check-merge-conflict
- repo: https://github.com/godatadriven/pre-commit-docker-pyspark
rev: master
hooks:
- id: pyspark-docker
name: Run tests
entry: /entrypoint.sh python setup.py test
language: docker
pass_filenames: false
stages: [push]
8
python setup.py test
setup.py setup.cfg conftest.py test_etl.py
docker
9
Testing PySpark
conftest.py create pytest fixture called spark
def test_load_df(spark):
df = load_df(spark, "input/data.csv")
assert df.count() == 891
assert df.filter(df.Name == "Sandstrom, Miss. Marguerite Rut").count() == 1
def test_fill_na(spark):
input_df = spark.createDataFrame(
[(None, None, None)], "Age: double, Cabin: string, Fare: double"
)
output_df = fill_na(input_df)
output = df_to_list_dict(output_df)
expected_output = [{"Age": -0.5, "Cabin": "N", "Fare": -0.5}]
assert output == expected_output
Test output
Integrates with Azure Devops
• Which test frequently fail
• Full stack traces of a failed test
• Code coverage
10
Building
Our approach
• Python wheel of library
• Modify notebook/version.py
• Create a build artifact of notebook
11
• Checkout code
• Build wheel
• Authenticate with Azure Devops Artifacts
• Push wheel
• Publish notebook folder as Build Artifact
Deployment
Our approach
• Copy version.py to the DEV workspace
• After a manual step
• Copy notebook/* to the Prod workspace
12
• Authenticate with Databricks cli
• Copy notebook/version.py to the DEV workspace
• Authenticate with Databricks cli
• Copy notebook/* to the PROD workspace
13
version.py
• A successful change to master results in a new version of the library
• Deploy that version to DEV
• and maybe at a later time to PROD
Azure DevOps
Pipeline
Azure DevOps
Artifacts
Dev
Notebook
Prod
Notebook
Azure Databricks
Version: 1.0.100
Version: 1.0.200
14
• On Dev only version.py is deployed by our CI/CD
• On Prod the whole notebook folder
• e.g. our application
• Using dbutils and version.py
• We can install a specific version of our library
dbutils.library
15
The complete pipeline
Run black, flake8 and
pytest using pre-commit
Upload wheel to DevOps
artifacts, export Notebook
folder with modified
version.py
Copy version.py to the
DEV workspace
Copy the Notebook folder
to the PROD workspace
Your Data Career
16
Check out the job opportunities
WE ARE HIRING
GoDataDriven.com/Careers

Más contenido relacionado

La actualidad más candente

Kubernetes Deployment Strategies
Kubernetes Deployment StrategiesKubernetes Deployment Strategies
Kubernetes Deployment StrategiesAbdennour TM
 
Autoscaling Kubernetes
Autoscaling KubernetesAutoscaling Kubernetes
Autoscaling Kubernetescraigbox
 
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptxGrafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptxRomanKhavronenko
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOpsRui Quintino
 
Yale Jenkins Show and Tell
Yale Jenkins Show and TellYale Jenkins Show and Tell
Yale Jenkins Show and TellE. Camden Fisher
 
What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...
What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...
What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...Edureka!
 
Continuous Integration & Continuous Delivery with GCP
Continuous Integration & Continuous Delivery with GCPContinuous Integration & Continuous Delivery with GCP
Continuous Integration & Continuous Delivery with GCPKAI CHU CHUNG
 
Getting Started with Kubernetes
Getting Started with Kubernetes Getting Started with Kubernetes
Getting Started with Kubernetes VMware Tanzu
 
Cloud Native Application
Cloud Native ApplicationCloud Native Application
Cloud Native ApplicationVMUG IT
 
K8s in 3h - Kubernetes Fundamentals Training
K8s in 3h - Kubernetes Fundamentals TrainingK8s in 3h - Kubernetes Fundamentals Training
K8s in 3h - Kubernetes Fundamentals TrainingPiotr Perzyna
 
Jenkins Pipeline Tutorial | Continuous Delivery Pipeline Using Jenkins | DevO...
Jenkins Pipeline Tutorial | Continuous Delivery Pipeline Using Jenkins | DevO...Jenkins Pipeline Tutorial | Continuous Delivery Pipeline Using Jenkins | DevO...
Jenkins Pipeline Tutorial | Continuous Delivery Pipeline Using Jenkins | DevO...Edureka!
 
Azure DevOps CI/CD For Beginners
Azure DevOps CI/CD  For BeginnersAzure DevOps CI/CD  For Beginners
Azure DevOps CI/CD For BeginnersRahul Nath
 
Continuous Integration/Deployment with Gitlab CI
Continuous Integration/Deployment with Gitlab CIContinuous Integration/Deployment with Gitlab CI
Continuous Integration/Deployment with Gitlab CIDavid Hahn
 

La actualidad más candente (20)

DevOps seminar ppt
DevOps seminar ppt DevOps seminar ppt
DevOps seminar ppt
 
Kubernetes Deployment Strategies
Kubernetes Deployment StrategiesKubernetes Deployment Strategies
Kubernetes Deployment Strategies
 
Autoscaling Kubernetes
Autoscaling KubernetesAutoscaling Kubernetes
Autoscaling Kubernetes
 
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptxGrafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps
 
Yale Jenkins Show and Tell
Yale Jenkins Show and TellYale Jenkins Show and Tell
Yale Jenkins Show and Tell
 
Why to Cloud Native
Why to Cloud NativeWhy to Cloud Native
Why to Cloud Native
 
What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...
What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...
What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...
 
Continuous Integration & Continuous Delivery with GCP
Continuous Integration & Continuous Delivery with GCPContinuous Integration & Continuous Delivery with GCP
Continuous Integration & Continuous Delivery with GCP
 
Kubernetes CI/CD with Helm
Kubernetes CI/CD with HelmKubernetes CI/CD with Helm
Kubernetes CI/CD with Helm
 
Jenkins CI
Jenkins CIJenkins CI
Jenkins CI
 
Introduction to CI/CD
Introduction to CI/CDIntroduction to CI/CD
Introduction to CI/CD
 
Azure DevOps in Action
Azure DevOps in ActionAzure DevOps in Action
Azure DevOps in Action
 
Getting Started with Kubernetes
Getting Started with Kubernetes Getting Started with Kubernetes
Getting Started with Kubernetes
 
Cloud Native Application
Cloud Native ApplicationCloud Native Application
Cloud Native Application
 
K8s in 3h - Kubernetes Fundamentals Training
K8s in 3h - Kubernetes Fundamentals TrainingK8s in 3h - Kubernetes Fundamentals Training
K8s in 3h - Kubernetes Fundamentals Training
 
Docker Kubernetes Istio
Docker Kubernetes IstioDocker Kubernetes Istio
Docker Kubernetes Istio
 
Jenkins Pipeline Tutorial | Continuous Delivery Pipeline Using Jenkins | DevO...
Jenkins Pipeline Tutorial | Continuous Delivery Pipeline Using Jenkins | DevO...Jenkins Pipeline Tutorial | Continuous Delivery Pipeline Using Jenkins | DevO...
Jenkins Pipeline Tutorial | Continuous Delivery Pipeline Using Jenkins | DevO...
 
Azure DevOps CI/CD For Beginners
Azure DevOps CI/CD  For BeginnersAzure DevOps CI/CD  For Beginners
Azure DevOps CI/CD For Beginners
 
Continuous Integration/Deployment with Gitlab CI
Continuous Integration/Deployment with Gitlab CIContinuous Integration/Deployment with Gitlab CI
Continuous Integration/Deployment with Gitlab CI
 

Similar a CI/CD with Azure DevOps and Azure Databricks

Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWERContinuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWERIndrajit Poddar
 
Continuous Deployment of your Application @jSession#5
Continuous Deployment of your Application @jSession#5Continuous Deployment of your Application @jSession#5
Continuous Deployment of your Application @jSession#5Marcin Grzejszczak
 
Modern Web-site Development Pipeline
Modern Web-site Development PipelineModern Web-site Development Pipeline
Modern Web-site Development PipelineGlobalLogic Ukraine
 
habitat at docker bud
habitat at docker budhabitat at docker bud
habitat at docker budMandi Walls
 
Continuous Deployment to the cloud
Continuous Deployment to the cloudContinuous Deployment to the cloud
Continuous Deployment to the cloudVMware Tanzu
 
CT Software Developers Meetup: Using Docker and Vagrant Within A GitHub Pull ...
CT Software Developers Meetup: Using Docker and Vagrant Within A GitHub Pull ...CT Software Developers Meetup: Using Docker and Vagrant Within A GitHub Pull ...
CT Software Developers Meetup: Using Docker and Vagrant Within A GitHub Pull ...E. Camden Fisher
 
Automated Deployment and Configuration Engines. Ansible
Automated Deployment and Configuration Engines. AnsibleAutomated Deployment and Configuration Engines. Ansible
Automated Deployment and Configuration Engines. AnsibleAlberto Molina Coballes
 
Continuous Deployment of your Application @SpringOne
Continuous Deployment of your Application @SpringOneContinuous Deployment of your Application @SpringOne
Continuous Deployment of your Application @SpringOneciberkleid
 
Docker at Djangocon 2013 | Talk by Ken Cochrane
Docker at Djangocon 2013 | Talk by Ken CochraneDocker at Djangocon 2013 | Talk by Ken Cochrane
Docker at Djangocon 2013 | Talk by Ken CochranedotCloud
 
Continuous Deployment To The Cloud @DevoxxPL 2017
Continuous Deployment To The Cloud @DevoxxPL 2017 Continuous Deployment To The Cloud @DevoxxPL 2017
Continuous Deployment To The Cloud @DevoxxPL 2017 Marcin Grzejszczak
 
Containers and Microservices for Realists
Containers and Microservices for RealistsContainers and Microservices for Realists
Containers and Microservices for RealistsOracle Developers
 
Containers and microservices for realists
Containers and microservices for realistsContainers and microservices for realists
Containers and microservices for realistsKarthik Gaekwad
 
Fluo CICD OpenStack Summit
Fluo CICD OpenStack SummitFluo CICD OpenStack Summit
Fluo CICD OpenStack SummitMiguel Zuniga
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Dockernklmish
 
Docker based-Pipelines with Codefresh
Docker based-Pipelines with CodefreshDocker based-Pipelines with Codefresh
Docker based-Pipelines with CodefreshCodefresh
 
Using Grunt with Drupal
Using Grunt with DrupalUsing Grunt with Drupal
Using Grunt with Drupalarithmetric
 
DockerCon 15 Keynote - Day 2
DockerCon 15 Keynote - Day 2DockerCon 15 Keynote - Day 2
DockerCon 15 Keynote - Day 2Docker, Inc.
 
DCEU 18: Building Your Development Pipeline
DCEU 18: Building Your Development PipelineDCEU 18: Building Your Development Pipeline
DCEU 18: Building Your Development PipelineDocker, Inc.
 

Similar a CI/CD with Azure DevOps and Azure Databricks (20)

Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWERContinuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER
 
Continuous Deployment of your Application @jSession#5
Continuous Deployment of your Application @jSession#5Continuous Deployment of your Application @jSession#5
Continuous Deployment of your Application @jSession#5
 
Modern Web-site Development Pipeline
Modern Web-site Development PipelineModern Web-site Development Pipeline
Modern Web-site Development Pipeline
 
habitat at docker bud
habitat at docker budhabitat at docker bud
habitat at docker bud
 
Continuous Deployment to the cloud
Continuous Deployment to the cloudContinuous Deployment to the cloud
Continuous Deployment to the cloud
 
CT Software Developers Meetup: Using Docker and Vagrant Within A GitHub Pull ...
CT Software Developers Meetup: Using Docker and Vagrant Within A GitHub Pull ...CT Software Developers Meetup: Using Docker and Vagrant Within A GitHub Pull ...
CT Software Developers Meetup: Using Docker and Vagrant Within A GitHub Pull ...
 
Automated Deployment and Configuration Engines. Ansible
Automated Deployment and Configuration Engines. AnsibleAutomated Deployment and Configuration Engines. Ansible
Automated Deployment and Configuration Engines. Ansible
 
Continuous Deployment of your Application @SpringOne
Continuous Deployment of your Application @SpringOneContinuous Deployment of your Application @SpringOne
Continuous Deployment of your Application @SpringOne
 
Django and Docker
Django and DockerDjango and Docker
Django and Docker
 
Docker at Djangocon 2013 | Talk by Ken Cochrane
Docker at Djangocon 2013 | Talk by Ken CochraneDocker at Djangocon 2013 | Talk by Ken Cochrane
Docker at Djangocon 2013 | Talk by Ken Cochrane
 
Continuous Deployment To The Cloud @DevoxxPL 2017
Continuous Deployment To The Cloud @DevoxxPL 2017 Continuous Deployment To The Cloud @DevoxxPL 2017
Continuous Deployment To The Cloud @DevoxxPL 2017
 
Containers and Microservices for Realists
Containers and Microservices for RealistsContainers and Microservices for Realists
Containers and Microservices for Realists
 
Containers and microservices for realists
Containers and microservices for realistsContainers and microservices for realists
Containers and microservices for realists
 
CI/CD on AWS
CI/CD on AWSCI/CD on AWS
CI/CD on AWS
 
Fluo CICD OpenStack Summit
Fluo CICD OpenStack SummitFluo CICD OpenStack Summit
Fluo CICD OpenStack Summit
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Docker
 
Docker based-Pipelines with Codefresh
Docker based-Pipelines with CodefreshDocker based-Pipelines with Codefresh
Docker based-Pipelines with Codefresh
 
Using Grunt with Drupal
Using Grunt with DrupalUsing Grunt with Drupal
Using Grunt with Drupal
 
DockerCon 15 Keynote - Day 2
DockerCon 15 Keynote - Day 2DockerCon 15 Keynote - Day 2
DockerCon 15 Keynote - Day 2
 
DCEU 18: Building Your Development Pipeline
DCEU 18: Building Your Development PipelineDCEU 18: Building Your Development Pipeline
DCEU 18: Building Your Development Pipeline
 

Más de GoDataDriven

Streamlining Data Science Workflows with a Feature Catalog
Streamlining Data Science Workflows with a Feature CatalogStreamlining Data Science Workflows with a Feature Catalog
Streamlining Data Science Workflows with a Feature CatalogGoDataDriven
 
Visualizing Big Data in a Small Screen
Visualizing Big Data in a Small ScreenVisualizing Big Data in a Small Screen
Visualizing Big Data in a Small ScreenGoDataDriven
 
Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowGoDataDriven
 
Training Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organizationTraining Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organizationGoDataDriven
 
My Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics EngineerMy Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics EngineerGoDataDriven
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo SanchezGoDataDriven
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformGoDataDriven
 
How to create a Devcontainer for your Python project
How to create a Devcontainer for your Python projectHow to create a Devcontainer for your Python project
How to create a Devcontainer for your Python projectGoDataDriven
 
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...GoDataDriven
 
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022GoDataDriven
 
MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022GoDataDriven
 
MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022GoDataDriven
 
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022GoDataDriven
 
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022GoDataDriven
 
AWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de HaanAWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de HaanGoDataDriven
 
The 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven CompaniesThe 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven CompaniesGoDataDriven
 
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...GoDataDriven
 
Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...GoDataDriven
 
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't HofSmart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't HofGoDataDriven
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019GoDataDriven
 

Más de GoDataDriven (20)

Streamlining Data Science Workflows with a Feature Catalog
Streamlining Data Science Workflows with a Feature CatalogStreamlining Data Science Workflows with a Feature Catalog
Streamlining Data Science Workflows with a Feature Catalog
 
Visualizing Big Data in a Small Screen
Visualizing Big Data in a Small ScreenVisualizing Big Data in a Small Screen
Visualizing Big Data in a Small Screen
 
Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlow
 
Training Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organizationTraining Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organization
 
My Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics EngineerMy Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics Engineer
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchez
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
How to create a Devcontainer for your Python project
How to create a Devcontainer for your Python projectHow to create a Devcontainer for your Python project
How to create a Devcontainer for your Python project
 
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
 
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
 
MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022
 
MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022
 
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
 
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
 
AWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de HaanAWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de Haan
 
The 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven CompaniesThe 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven Companies
 
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
 
Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...
 
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't HofSmart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
 

Último

Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdfAldoGarca30
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilVinayVitekari
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesRAJNEESHKUMAR341697
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxSCMS School of Architecture
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxmaisarahman1
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 
Wadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptxWadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptxNadaHaitham1
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARKOUSTAV SARKAR
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"mphochane1998
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 

Último (20)

Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Wadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptxWadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptx
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 

CI/CD with Azure DevOps and Azure Databricks

  • 1. CI/CD with Azure DevOps, Pre-Commit, and Azure Databricks 30-10-2019
  • 2. A typical pipeline Automate everything • Deploy to production Efficiently & Reliably • Allow everyone in the team to do so • Smaller increments • Roll-forward don’t Roll-back 2 Trigger Version control Test Code Build Artifact Deploy Dev Integration tests Deploy Prod User facing Measure Capture performance
  • 4. Overall project structure • src, containing the library • input, data used while testing • notebook, containing the application • tests, for tests 4
  • 5. Testing Our approach • Use Pre-Commit • Apply Black, Flake8 • Run PySpark tests in a Docker container 5 • Checkout code • Install requirements • Apply linters • Run unit-tests • Publish test/coverage
  • 6. Pre-Commit Eg, solving the “Fixing lint issues” commit • Framework for creating Git Hooks • Eg, scripts that run on each commit • Compare it to a local-CI
  • 7. Pre-Commit .pre-commit-config.yaml • In our case • run black/Flake8 on each commit • run pytest on each push repos: - repo: https://github.com/psf/black rev: 19.3b0 hooks: - id: black - repo: https://github.com/pre-commit/pre-commit-hooks rev: v2.3.0 hooks: - id: flake8 - id: check-merge-conflict - repo: https://github.com/godatadriven/pre-commit-docker-pyspark rev: master hooks: - id: pyspark-docker name: Run tests entry: /entrypoint.sh python setup.py test language: docker pass_filenames: false stages: [push]
  • 8. 8 python setup.py test setup.py setup.cfg conftest.py test_etl.py docker
  • 9. 9 Testing PySpark conftest.py create pytest fixture called spark def test_load_df(spark): df = load_df(spark, "input/data.csv") assert df.count() == 891 assert df.filter(df.Name == "Sandstrom, Miss. Marguerite Rut").count() == 1 def test_fill_na(spark): input_df = spark.createDataFrame( [(None, None, None)], "Age: double, Cabin: string, Fare: double" ) output_df = fill_na(input_df) output = df_to_list_dict(output_df) expected_output = [{"Age": -0.5, "Cabin": "N", "Fare": -0.5}] assert output == expected_output
  • 10. Test output Integrates with Azure Devops • Which test frequently fail • Full stack traces of a failed test • Code coverage 10
  • 11. Building Our approach • Python wheel of library • Modify notebook/version.py • Create a build artifact of notebook 11 • Checkout code • Build wheel • Authenticate with Azure Devops Artifacts • Push wheel • Publish notebook folder as Build Artifact
  • 12. Deployment Our approach • Copy version.py to the DEV workspace • After a manual step • Copy notebook/* to the Prod workspace 12 • Authenticate with Databricks cli • Copy notebook/version.py to the DEV workspace • Authenticate with Databricks cli • Copy notebook/* to the PROD workspace
  • 13. 13 version.py • A successful change to master results in a new version of the library • Deploy that version to DEV • and maybe at a later time to PROD Azure DevOps Pipeline Azure DevOps Artifacts Dev Notebook Prod Notebook Azure Databricks Version: 1.0.100 Version: 1.0.200
  • 14. 14 • On Dev only version.py is deployed by our CI/CD • On Prod the whole notebook folder • e.g. our application • Using dbutils and version.py • We can install a specific version of our library dbutils.library
  • 15. 15 The complete pipeline Run black, flake8 and pytest using pre-commit Upload wheel to DevOps artifacts, export Notebook folder with modified version.py Copy version.py to the DEV workspace Copy the Notebook folder to the PROD workspace
  • 16. Your Data Career 16 Check out the job opportunities WE ARE HIRING GoDataDriven.com/Careers