Build optimization mechanisms in GitLab and Docker
by Dmytro Patkovskyi
About me
Dmytro Patkovskyi
Software Engineer at Grammarly (Core Services team, Java / backend)
Past experience: Amazon, Ciklum, Grammatica.eu
Our CI/CD infrastructure (2019)
● GitLab Enterprise (in-house, AWS): 300+ repositories in 19 groups; the number of runners is set per group
● JFrog Artifactory (in-house, AWS): Docker registry and artifact storage
● AWS ECS (on EC2 instances): deployments
Build goals
● Reproducibility
● Simplicity
● Speed: clean build speed and incremental build speed (incrementality)
Docker (version 18.09)
Build optimizations available:
● Layer cache
… and that's it!
But is it used in your CI builds?
If you have to ask, probably not.
How to enable layer cache on CI (GitLab)
Change this:
script:
  - docker build
      --build-arg VERSION=$VERSION
      --tag $IMAGE:$VERSION .
  - docker push $IMAGE:$VERSION
To this:
script:
  - docker pull $IMAGE:cache || true
  - docker build
      --build-arg VERSION=$VERSION
      --tag $IMAGE:$VERSION
      --tag $IMAGE:cache
      --cache-from $IMAGE:cache .
  - docker push $IMAGE:$VERSION
  - docker push $IMAGE:cache
Layer cache in a few words
Key: current image + current instruction (for ADD/COPY, also a checksum of the copied files).
Value: next layer.
Invalidation: a cache miss on one instruction invalidates the cache for all instructions below it.
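The rule above can be sketched as a toy model in Python. This is a simplification for illustration, not Docker's actual implementation: each key is a hash of the parent layer's key, the instruction text and (for ADD/COPY) a digest of the copied files. The instructions and digests used in the demo are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# One Dockerfile step: (instruction text, digest of the files it copies, or None).
Step = Tuple[str, Optional[str]]


def layer_key(parent_key: str, instruction: str, content_digest: Optional[str]) -> str:
    """Toy cache key: hash of the parent layer's key, the instruction text and,
    for ADD/COPY, the digest of the copied files."""
    h = hashlib.sha256()
    h.update(parent_key.encode())
    h.update(instruction.encode())
    if content_digest is not None:
        h.update(content_digest.encode())
    return h.hexdigest()


def build(steps: List[Step], cache: Dict[str, str], base: str = "base-image") -> List[bool]:
    """Simulate a build against `cache`; return one cache-hit flag per step."""
    hits: List[bool] = []
    parent = base
    for instruction, digest in steps:
        key = layer_key(parent, instruction, digest)
        hits.append(key in cache)
        cache.setdefault(key, "layer-" + key[:12])  # store the layer this step produced
        parent = key  # every later key depends on this one, so a miss cascades down
    return hits


if __name__ == "__main__":
    cache: Dict[str, str] = {}
    v1 = [("COPY app /app", "app-v1"), ("RUN make", None), ("COPY conf /conf", "conf-v1")]
    print(build(v1, cache))  # [False, False, False]: empty cache
    print(build(v1, cache))  # [True, True, True]: nothing changed
    v2 = [("COPY app /app", "app-v2"), ("RUN make", None), ("COPY conf /conf", "conf-v1")]
    print(build(v2, cache))  # [False, False, False]: the first miss invalidates all steps below
```

Because every key includes its parent's key, a single miss near the top forces every step below it to run again, even steps whose own instruction did not change.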
Why layer cache matters
● Pull speed
● Build speed
● Push speed
● Storage space
Spend less money and time!
Optimal layer structure
First layer: changes rarely
...
...
Last layer: changes most frequently
From the first layer to the last, change frequency increases while size & build time decrease.
Proper instruction order
Inefficient order:
# value changes on every commit
ARG VERSION
# rarely changes
COPY nodeserver /opt/nodeserver
# changes on every commit
ADD /distributions/project-$VERSION.tar /opt
# takes 60s and rarely changes, but re-runs on every build because a line above it changed
RUN cd /opt/nodeserver && ./install.sh
Efficient order:
COPY nodeserver /opt/nodeserver
RUN cd /opt/nodeserver && ./install.sh
ARG VERSION
ADD /distributions/project-$VERSION.tar /opt
Result: the 60s install step is now reused from the cache, saving 60s on each build.
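A back-of-the-envelope version of this comparison, assuming the cache rule from the earlier slide (the first invalidated step and everything after it is rebuilt). Only the 60 s for install.sh comes from the slide; the other durations are hypothetical, and "does this step's cache key change on every commit" is modeled as a simple flag.

```python
from typing import List, Tuple

# One step: (instruction, seconds it takes to execute, does its cache key change on every commit?)
Step = Tuple[str, float, bool]


def rebuild_seconds(steps: List[Step]) -> float:
    """Time of an incremental build: the first step whose cache key changed, and
    every step after it, is executed again; all earlier steps come from the cache."""
    for i, (_, _, changes_every_commit) in enumerate(steps):
        if changes_every_commit:
            return sum(seconds for _, seconds, _ in steps[i:])
    return 0.0  # nothing changed: the whole image comes from the cache


# Durations other than install.sh's 60 s are hypothetical.
# The ARG line itself is not a cache miss; the first instruction that *uses* a changed ARG is.
INEFFICIENT: List[Step] = [
    ("ARG VERSION", 0.0, False),
    ("COPY nodeserver /opt/nodeserver", 0.5, False),
    ("ADD /distributions/project-$VERSION.tar /opt", 1.0, True),
    ("RUN cd /opt/nodeserver && ./install.sh", 60.0, False),
]
EFFICIENT: List[Step] = [
    ("COPY nodeserver /opt/nodeserver", 0.5, False),
    ("RUN cd /opt/nodeserver && ./install.sh", 60.0, False),
    ("ARG VERSION", 0.0, False),
    ("ADD /distributions/project-$VERSION.tar /opt", 1.0, True),
]

if __name__ == "__main__":
    print(rebuild_seconds(INEFFICIENT))  # 61.0
    print(rebuild_seconds(EFFICIENT))    # 1.0
```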
Chain install & cleanup commands
Why?
Files created in one layer still count toward the image size, even if a later layer deletes them.
How?
Instead of:
RUN apt-get update && apt-get install ...
RUN rm -rf /var/lib/apt/lists/*
write:
RUN apt-get update && apt-get install … && rm -rf /var/lib/apt/lists/*
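A toy model of why the chained form is smaller: a layer stores every file it adds, and a deletion in a later layer only hides those files in the merged view. The file paths and sizes below are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A layer: (files it adds or overwrites, path -> size in bytes; paths it deletes).
Layer = Tuple[Dict[str, int], Set[str]]


def stored_size(layers: List[Layer]) -> int:
    """Bytes stored in the registry and pulled: each layer keeps every file it added,
    even if a later layer deletes it (the deletion is only a marker in the later layer)."""
    return sum(sum(added.values()) for added, _ in layers)


def visible_size(layers: List[Layer]) -> int:
    """Bytes visible in the container's merged filesystem."""
    fs: Dict[str, int] = {}
    for added, deleted in layers:
        fs.update(added)
        for path in deleted:
            fs.pop(path, None)
    return sum(fs.values())


# Hypothetical sizes: an installed package and the apt package lists.
PKG = {"/usr/bin/tool": 5_000_000}
APT_LISTS = {"/var/lib/apt/lists/archive_index": 40_000_000}

# Two RUN instructions: install (adds package + lists), then rm -rf in a new layer.
SEPARATE: List[Layer] = [({**PKG, **APT_LISTS}, set()), ({}, set(APT_LISTS))]
# One chained RUN: the lists are created and deleted before the layer is committed.
CHAINED: List[Layer] = [(dict(PKG), set())]
```

Both images expose the same 5 MB inside the container (`visible_size`), but `SEPARATE` stores and pulls 45 MB while `CHAINED` stores 5 MB.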
Use .dockerignore
Why?
To avoid cache invalidation of ADD/COPY instructions caused by changes to irrelevant files. Also, a smaller Docker build context means the build starts faster.
How?
Add patterns matching all irrelevant files (e.g., README, IDE files) to .dockerignore.
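A small sketch of the effect on COPY: only files that survive .dockerignore contribute to the digest that decides whether `COPY . /app` hits the cache. The patterns and file names are hypothetical, and Python's `fnmatch` only approximates the real .dockerignore matching rules.

```python
import fnmatch
import hashlib
from typing import Dict, List


def build_context(files: Dict[str, bytes], ignore_patterns: List[str]) -> Dict[str, bytes]:
    """Files sent to the Docker daemon. fnmatch only approximates .dockerignore
    matching (for example, fnmatch's `*` also matches `/`)."""
    return {
        path: data
        for path, data in files.items()
        if not any(fnmatch.fnmatchcase(path, pattern) for pattern in ignore_patterns)
    }


def copy_digest(context: Dict[str, bytes]) -> str:
    """Digest of everything `COPY . /app` would copy; if it changes, that COPY misses the cache."""
    h = hashlib.sha256()
    for path in sorted(context):
        h.update(path.encode())
        h.update(hashlib.sha256(context[path]).digest())
    return h.hexdigest()


IGNORE = ["README.md", ".idea/*", "*.iml"]  # hypothetical .dockerignore entries
```

With README.md ignored, editing it leaves the digest unchanged; without the ignore file, the same edit would invalidate the COPY and every instruction below it.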
Separate the code layer from the dependency layer
Why?
Your compiled code is smaller than your dependencies and changes more often.
How?
For Java: don't put fat jars in your image. Use Google's Jib plugin or manually extract dependencies into a separate layer.
Our experience: a 70 MB fat-jar layer that changed on every commit (2 s ECS pull) became a 200 KB code layer that changes on every commit (100 ms ECS pull).
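The arithmetic behind this can be sketched as follows: a host that already runs the previous version pulls only the layers whose content changed. The 70 MB and 200 KB sizes are from the slide; the base image size, the layer names and the exact split are hypothetical.

```python
from typing import List, Tuple

MB = 1_000_000
KB = 1_000

# A layer: (description, size in bytes, changed since the previous deployment?)
Layer = Tuple[str, int, bool]


def bytes_to_pull(layers: List[Layer]) -> int:
    """A host that already has the previous image version pulls only the changed layers."""
    return sum(size for _, size, changed in layers if changed)


# The base image size and the layer split are hypothetical.
FAT_JAR: List[Layer] = [
    ("JRE base image", 100 * MB, False),
    ("app.jar: code + dependencies", 70 * MB, True),
]
SPLIT: List[Layer] = [
    ("JRE base image", 100 * MB, False),
    ("dependency jars", 70 * MB - 200 * KB, False),
    ("compiled classes and resources", 200 * KB, True),
]
```

The split only pays off if the dependency layer is byte-for-byte identical between builds, either because it is reused from the layer cache or because it is built reproducibly (one of Jib's stated goals).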
Multi-stage Docker build
Why?
To avoid any build-time clutter in the final image
(reduce size, optimize layer structure).
How?
Next slides.
Multi-stage Docker build: Dockerfile
FROM <build-time base image> AS builder
# build & run tests
# …
FROM <run-time base image>
COPY --from=builder <dependencies> ...
COPY --from=builder <compiled code> ...
ENTRYPOINT ...
How to enable layer cache for a multi-stage Docker build on CI
Run this for the builder image:
- docker pull $IMAGE:builder || true
- docker build
    --build-arg VERSION=$VERSION
    --tag $IMAGE:builder
    --cache-from $IMAGE:builder
    --target builder .
- docker push $IMAGE:builder
… and then this for the final image:
- docker pull $IMAGE:cache || true
- docker build
    --build-arg VERSION=$VERSION
    --tag $IMAGE:$VERSION
    --tag $IMAGE:cache
    --cache-from $IMAGE:builder
    --cache-from $IMAGE:cache .
- docker push $IMAGE:$VERSION
- docker push $IMAGE:cache
Alternative: multi-stage CI build
FROM <run-time image>
ADD <dependencies> ...
ADD <compiled code> ...
ENTRYPOINT ...
Here <dependencies> and <compiled code> are artifacts from previous CI stage(s), built & tested separately.
Multi-stage Docker build vs. multi-stage CI build
Multi-stage Docker build:
+ fits into any CI system
+ easy migration between CIs
+ easy to reproduce locally
- poor integration with CI
- hard to modularize
- long Dockerfiles
Multi-stage CI build: the exact opposite.
Docker checklist
● Use lightweight base images
● Check your instruction order
● Chain install & cleanup commands
● Use .dockerignore
● Split code layer from dependency layer
● Use multi-stage build (in Docker or CI)
Recommended tools: Dive, Jib (for Java projects)
https://github.com/wagoodman/dive
https://github.com/GoogleContainerTools/jib
GitLab (version 12.4.0-ee)
Optimization features:
● Artifacts
● Cache (local or shared)
● Persistent volumes
GitLab job executors
● SSH
● Shell
● VirtualBox
● Parallels
● Docker-machine
● Docker (used in Grammarly; covered in this talk)
● Kubernetes (covered in this talk)
● Custom
Syntax examples (.gitlab-ci.yml)
Artifacts
artifacts:
  paths:
    - test-report
    - distributive
  expire_in: 1 week
  when: always
Cache
cache:
  key: somekey
  paths:
    - .gradle
Persistent volume (/cache in this case)
script:
  - mv myfile /cache
GitLab concepts
Git commits: c06c4c91, 31137606
GitLab runs one pipeline per commit:
● Pipeline for c06c4c91: Stage 1, Stage 2, Stage 3, with jobs A, B, C, D, E, F
● Pipeline for 31137606: …
Unlike artifacts, cache and persistent volumes can pass files between pipelines.
Use artifacts to avoid duplicating work across jobs of a single pipeline.
Anti-pattern #1: unintentional artifact downloads
By default, a job downloads all artifacts from all previous stages of a pipeline.
If your job doesn't need any artifacts:
  dependencies: []
If your job needs artifacts only from jobs A and C:
  dependencies:
    - A
    - C
Our experience: 70 MB jar × 3 jobs ≈ 15 seconds saved.
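A minimal model of which artifacts a job fetches under each setting. The stage layout and artifact sizes are hypothetical, and GitLab's own validation of `dependencies` (for example, that the listed jobs belong to earlier stages) is not modeled.

```python
from typing import Dict, List, Optional


def downloaded_bytes(
    job: str,
    stages: List[List[str]],
    artifact_sizes: Dict[str, int],
    dependencies: Optional[List[str]] = None,
) -> int:
    """Bytes of artifacts `job` downloads before its script runs.

    dependencies=None models the default (artifacts of every job in earlier stages),
    [] models `dependencies: []`, and a list models `dependencies: [A, C]`.
    """
    stage_of = {name: index for index, stage in enumerate(stages) for name in stage}
    if dependencies is None:
        sources = [name for name, index in stage_of.items() if index < stage_of[job]]
    else:
        sources = dependencies
    return sum(artifact_sizes.get(name, 0) for name in sources)


MB = 1_000_000
# Hypothetical pipeline layout and artifact sizes (jobs without artifacts are omitted).
STAGES = [["A", "B"], ["C", "D"], ["E", "F"]]
SIZES = {"A": 70 * MB, "B": 5 * MB, "C": 1 * MB}
```

For job E, the default fetches 76 MB, `dependencies: [A, C]` fetches 71 MB, and `dependencies: []` fetches nothing.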
Cache type choice
1. Shared
2. Local
The choice is made when setting up runners; the cache: syntax in .gitlab-ci.yml remains the same.
Shared cache + local persistent volume = best of both worlds.
How shared cache works
It’s very simple:
1. Download & extract zip from S3 / GCS based on cache:key.
2. Execute job scripts.
3. Pack files under cache:paths into a new zip & upload.
…maybe too simple?
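The three steps can be sketched in Python, with a local directory standing in for the S3 / GCS bucket. This is a simplified model of the behavior described on the slide, not GitLab Runner's code; the job names, cache key and file names are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Callable, List


def run_job_with_shared_cache(
    bucket: Path,
    key: str,
    workdir: Path,
    cache_paths: List[str],
    script: Callable[[Path], None],
) -> None:
    """Toy model of the three steps; a local directory stands in for S3 / GCS."""
    archive = bucket / f"{key}.zip"
    # 1. Download & extract the whole archive for this cache key, if one exists.
    if archive.exists():
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(workdir)
    # 2. Execute the job script.
    script(workdir)
    # 3. Pack everything under cache_paths into a new zip and upload it, even if
    #    nothing changed: there is no checksum comparison or rsync-style diff.
    tmp = bucket / f"{key}.zip.tmp"
    with zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as zf:
        for rel in cache_paths:
            root = workdir / rel
            if not root.exists():
                continue
            files = [root] if root.is_file() else sorted(root.rglob("*"))
            for path in files:
                if path.is_file():
                    zf.write(path, path.relative_to(workdir).as_posix())
    tmp.replace(archive)


def demo() -> List[bool]:
    """Two jobs with fresh checkouts; returns whether each one found .gradle/dep.jar."""
    seen: List[bool] = []

    def fetch_deps(workdir: Path) -> None:
        jar = workdir / ".gradle" / "dep.jar"
        seen.append(jar.exists())
        jar.parent.mkdir(exist_ok=True)
        jar.write_bytes(b"\0" * 1024)  # pretend we downloaded a dependency

    with tempfile.TemporaryDirectory() as tmp:
        bucket = Path(tmp) / "bucket"
        bucket.mkdir()
        for job in ("job-1", "job-2"):
            workdir = Path(tmp) / job  # every job starts from a fresh checkout
            workdir.mkdir()
            run_job_with_shared_cache(bucket, "somekey", workdir, [".gradle"], fetch_deps)
    return seen
```

`demo()` returns `[False, True]`: the second job starts from a fresh checkout but finds the dependency restored from the archive. Step 3 re-creates and re-uploads the whole archive even when nothing under `cache_paths` changed.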
Shared cache gotchas
The whole cache.zip is downloaded and uploaded every time a job runs:
● Not an rsync
● Transfer never skipped
Minor:
● No automatic cleanup of unused files
● Absolute paths to cached files differ across runners unless you set $GIT_CLONE_PATH to a runner-independent value
Persistent volume options
Docker executor:
● Dynamic storage
● Host-bound persistent storage
Kubernetes executor:
● Persistent volume claims (PVC)
● Host path volume
Shared cache trade-off: build time vs. transfer time
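The trade-off can be estimated with simple arithmetic: the cache only helps if the build time it saves exceeds the time spent moving and re-packing the whole archive. All throughput figures below are hypothetical, and zipping and unzipping are assumed to run at the same speed.

```python
def shared_cache_net_saving(
    cache_mb: float,
    build_seconds_saved: float,
    network_mb_per_s: float,
    zip_mb_per_s: float,
) -> float:
    """Seconds a shared cache saves per job; a negative result means it costs time.

    Every run downloads and uploads the whole archive (2 transfers) and
    extracts and re-creates it (2 archive passes), regardless of what changed.
    """
    transfer_seconds = 2 * cache_mb / network_mb_per_s
    archive_seconds = 2 * cache_mb / zip_mb_per_s
    return build_seconds_saved - transfer_seconds - archive_seconds
```

With a hypothetical 50 MB/s network and 25 MB/s archiving, a 500 MB cache that saves 60 s of build time only breaks even (`shared_cache_net_saving(500, 60, 50, 25)` is `0.0`), while a 50 MB cache saving the same 60 s nets 54 s.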
Local persistent volume trade-off: number of runners vs. cache freshness
Define “cache freshness”
Fresh: for build #5, only the cache produced by build #4 is fresh.
Stale: for build #5, caches produced by builds #1, #2 and #3 are stale.
Freshness: for build #5, the cache from build #3 is fresher than the one from build #1.
Shared cache vs. local persistent volume
Shared cache: fresh on all runners; a bigger cache means a longer transfer.
Local persistent volume: fresh only on one runner, so more runners => less freshness; no time penalty for size.
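The freshness side of the trade-off can be sketched by tracking which earlier build produced the cache each build sees. Following the definition above, an age of 1 means the cache came from the previous build (fresh) and larger ages are staler. Round-robin assignment of builds to runners and strictly sequential builds are simplifying assumptions; real runner scheduling differs.

```python
from typing import Dict, List, Optional


def local_cache_ages(num_builds: int, num_runners: int) -> List[Optional[int]]:
    """Age of the local cache each build sees (1 = fresh, larger = staler,
    None = this runner has no cache yet), assuming round-robin scheduling."""
    last_build_on: Dict[int, int] = {}
    ages: List[Optional[int]] = []
    for build in range(1, num_builds + 1):
        runner = (build - 1) % num_runners  # round-robin is an assumption
        previous = last_build_on.get(runner)
        ages.append(None if previous is None else build - previous)
        last_build_on[runner] = build
    return ages


def shared_cache_ages(num_builds: int) -> List[Optional[int]]:
    """With a shared cache, every build after the first sees the previous build's cache."""
    if num_builds == 0:
        return []
    return [None] + [1] * (num_builds - 1)
```

With 3 runners, every build after the first three sees a cache that is 3 builds old, while the shared cache always has age 1 (at the price of the transfer time).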
Anti-pattern #2: using shared cache for dependencies
cache:
  key: $CI_PROJECT_NAME
  paths:
    - .gradle/caches
Download + unzip + zip + upload time ≈ no benefit from caching dependencies.
Use a local persistent volume to cache library dependencies.
Our experience: 500 MB cache ≈ 50 seconds saved.
So, when do you use each option?
● Artifacts: to pass files between jobs of a single pipeline and avoid repeating work.
● Shared cache: when a fresh cache is required for a speed-up; when the cache is small.
● Persistent volume: when a stale cache also provides a speed-up; when the cache is big.
Q&A
Special thanks!
Maryna Veremenko: help & support with GitLab
Dima Shevchuk: help & support with GitLab
Sasha Marynych: motivation :-)
Thank you!


Editor's Notes

  1. How the talk came about and who it is for.
  2. And right away, a common problem I've seen in many of our projects. It has to do with how GitLab behaves by default.
  3. Reproducibility (correctness), simplicity (clarity), speed. Do these goals conflict?
  18. GitLab by itself can't speed up builds, but it can preserve files between them. What are these features, do they conflict with each other, and which one should be used for what?
  20. This is not a comparison of brevity. The goal of the GitLab part of the talk is to figure out when to use which mechanism.
  21. Jobs are the heart of this mechanism: the work is done inside bash scripts that you write yourself. Artifacts are passed strictly within a single pipeline. Cache and volumes can pass files between pipelines.
  22. So we come to the most interesting part: which cache option to use for what.
  24. The choice is made by the infrastructure team when setting up runners. Why is the local cache crossed out? Let's drop the third option and stay with two.
  25. The default is per project. Don't use per-pipeline caching; artifacts exist for that instead. Good practice: split the cache per job, so that each job has less to download.
  26. So how does the shared cache work? Very simply: it downloads an archive, runs the job, and uploads a new version of the archive. Notice what's missing here?
  27. Incrementality is missing: no checksums, no rsync. There is also no automatic cleanup. Several feature requests exist to add this functionality.
  32. So, what do I mean here by a fresh and a stale cache? Imagine your job is running for the 10th time. The fresh cache is the one generated during run #9. Stale caches are those generated during runs 1-8. In some cases, caches generated in run 8 are of no use at all for run 10. In others, they are: for example, when the cache consists of dependencies (third-party libraries).
  34. I'll use text on a black background like this for all syntax from the .gitlab-ci.yml file. These are not all the available directives, only the main ones.