1. Resilient Architectures on OCI: DevOps Is Not Just Automation
Carlos Zela Bueno
Data Innovation in the Cloud
2. Carlos Zela Bueno
Senior Cloud Engineer – Oracle LAD
@c_zela
czelabueno
c-zela-bueno-digital-architect
“Build everything, run it anywhere, and spend as little time as possible doing it”
4. Concepts
SLA (Service Level Agreement)
SLO (Service Level Objective)
RPO (Recovery Point Objective) & RTO (Recovery Time Objective)
https://www.oracle.com/assets/paas-iaas-pub-cld-srvs-pillar-4021422.pdf
5. How do we design for resilience?
Three strategies, in order of increasing complexity, cost ($$) and reliability:
Backup/Restore (RTO >> 0)
- Prevention
- Legal compliance
- Data corruption
- Tolerance for being offline
Active-Passive (RTO > 0)
- Unforeseen failures
- HA not required
Active-Active (RTO = 0)
- Mission-critical apps
- Core systems
- Global apps
- More than three 9's
6. Oracle Cloud Infrastructure Global Footprint
Santiago
San Jose
Toronto
Phoenix
Chicago
Montreal
Ashburn
Sao Paulo
London
Milan
Saudi 2
Jeddah
Amsterdam
Stockholm
Zurich
Johannesburg
Israel 2
Abu Dhabi
Dubai
Mumbai
Hyderabad
Singapore
Seoul
Chuncheon
Osaka
Tokyo
Melbourne
Sydney
Vinhedo
Newport
Frankfurt
November 2021
34 regions; 10 planned by end of 2022
8 Azure Interconnect regions
Commercial
Commercial Planned
Government
Marseille
Jerusalem
France 2
Spain
Chile 2
Colombia
Mexico
7. Active-Passive (RTO > 0)
DNS in front of a web server at Site 1 and a web server at Site 2
Availability Domains > 99.99%
Regions > 99.99%
- Deploy the secondary site with IaC
- Persistent configuration
- Manual or automatic
- Redundancy for data only
Services: Alarms, Healthcheck, Resource Manager, Monitoring
Active-Active (RTO = 0)
DNS Traffic Manager in front of a web server at Site 1 and a web server at Site 2, with geo-replication
Availability Domains > 99.99%
Regions > 99.999%
- Traffic rules (priority, affinity, size)
- Health check frequency
- Always automatic
- Redundancy for the whole application
Services: Alarms, Healthcheck, Resource Manager, Monitoring
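The difference between the two modes on this slide can be sketched in a few lines. This is only an illustration: the `Site` record, the priority convention and the `healthy` flag are hypothetical and do not correspond to any OCI API. In active-passive mode traffic goes to the single preferred healthy site; in active-active mode it goes to every healthy site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    priority: int   # lower number = preferred site (hypothetical convention)
    healthy: bool   # result of the latest health check


def active_passive(sites):
    """Return the healthy site with the best priority, or None if all are down."""
    candidates = [s for s in sites if s.healthy]
    return min(candidates, key=lambda s: s.priority, default=None)


def active_active(sites):
    """Return every healthy site; traffic is spread across all of them."""
    return [s for s in sites if s.healthy]


sites = [
    Site("site-1", priority=1, healthy=False),  # primary is failing its health check
    Site("site-2", priority=2, healthy=True),
]
print(active_passive(sites).name)              # site-2
print([s.name for s in active_active(sites)])  # ['site-2']
```

In a real deployment the health flag would come from a periodic health check, and the switch in active-passive mode may still need a manual or automated step to bring the secondary site up.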
8. SLAs
Web Server
Web Server Oracle DB
Web Server
Container Engine
For Kubernetes Oracle DB
Web Server
Container Engine
For Kubernetes Oracle DB
API Gateway
99.95%
Compute
AD-redundant: > 99.99%
FD-redundant: > 99.95%
Single Instance: > 99.9%
99.9%
99.9%
99.9%
99.9%
99.95%
99.95%
99.95%
99.99%
99.99%
Individual SLA: 99.9%
Combined SLA: 99.85%, 99.84%, 99.75%
Linear architecture
9. SLAs
Web Server
Container Engine
For Kubernetes Oracle DB
API Gateway
99.95%
Compute
AD-redundant: > 99.99%
FD-redundant: > 99.95%
Single Instance: > 99.9%
99.9%
99.95%
99.99%
Composite SLA:
100% - [(100% - SLA1) * (100% - SLA2)]
Resilient architecture
Cache
99.95%
Functions
XOR XOR
99.5%
99.95%
99.9% 99.99995%
Backend Data
99.999975% 99.85%
11. Why is it important to work on resilience and reliability when we develop applications?
• Because it gives us the full picture of how our application behaves.
• Happy user… happy customer.
• It helps us put a measurement system in place so we can improve.
• It lets us deliver more reliable software faster by standardizing the common processes that support it.
• It lets us keep the balance between functionality and stability.
• It lets us build a collaborative environment around a goal shared by operations and development.
12. DevOps or DevOpsss?
In simple terms, it is the culture a company adopts to simplify the developer experience by automating software delivery, increasing agility and security while ensuring operational continuity.
15. OCI DevOps Platform – Benefits
Automation: Implement DevOps processes and workflows
Increase reliability and feature velocity by automating your approval workflows, tests and deployment procedures. Roll back automatically when errors are encountered.
Secure: Take advantage of cloud security
Leverage security features by default like IAM, Cloud Guard, Security Zones.
Governance: End-to-end visibility
Fully integrated with OCI observability and governance. Track issues in production deployments back to commits.
18. Deployment Pipelines – Capabilities
Integrated: Connect your CI platform
Jenkins plugin to run a deployment from your Jenkins job. Integrate with GitLab, GitHub.
Release Strategies: Reduce downtime, faster recovery
Blue/Green deployments, Canary stage: minimize downtime and increase confidence in your deployment.
Rollback: Recover from errors
Automatic or manual rollback of a deployment stage.
RPO: The most recent recovery point available, that is, the point in time to which data can be restored.
RTO: The total time it takes to recover the system.
SLO: The measurable objectives the product team sets out to meet. An SLO uses one or more SLIs with compliance levels bounded by upper and lower limits. SLOs are worked out between the SRE and the PO to ensure that services run at an appropriate level of reliability for users and customers.
SLA: A commercial agreement between customer and provider, which should be based on the SLOs. It is drafted by the lawyers, who assess the level of financial risk and determine what level of service must be offered. Nothing technical.
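To make an availability target concrete, it helps to convert it into the downtime it allows. The sketch below does that arithmetic; the function name and the 30-day period are assumptions chosen for illustration, not part of any SLA text.

```python
def allowed_downtime_minutes(availability_pct, period_days=30):
    """Minutes of downtime permitted by an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - availability_pct) / 100.0


for target in (99.9, 99.95, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.2f} min per 30 days")
# 99.9% -> 43.20 min per 30 days
# 99.95% -> 21.60 min per 30 days
# 99.99% -> 4.32 min per 30 days
```

Each additional nine cuts the budget by a factor of ten, which is why the recovery time a design can tolerate shrinks so quickly as the target rises.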
Always design cloud solutions on the assumption that something is going to fail.
Use warranties or car insurance as the example.
Resilience: The ability of a software product to recover from adverse or chaotic situations. Resilience has to be considered in the architecture at the design stage, and in the implementation through fault-tolerance patterns such as circuit breaker, retry, rate limit, etc. In infrastructure terms, it is achieved by distributing redundant, automated data centers.
Reliability
Software reliability is defined as the probability that the software runs without failure for a specified period of time in a specified environment. It is commonly confused with availability. However, reliability is the broader and more appropriate term, because a product can be available (up and running) while the site has no recovery or self-healing mechanisms in case of failure.
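Two of the fault-tolerance patterns named in the resilience note, retry and circuit breaker, can be sketched as follows. This is a minimal illustration under stated assumptions, not a production library: the class and function names, the thresholds, and the injectable `clock` and `sleep` parameters are all hypothetical choices made to keep the example small and testable.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def retry(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func until it succeeds, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # retrying against an open circuit is pointless
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A caller would typically combine the two, for example `retry(lambda: breaker.call(fetch_order))`, where `breaker` is a `CircuitBreaker` instance and `fetch_order` stands in for any remote call: transient errors are retried with backoff, while a dependency that keeps failing is cut off until the reset timeout expires.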
The combined SLA is the product of the individual SLAs of every component in the architecture; it follows from the probability that at least one of them fails.
The composite SLA applies to redundant components: it is 100% minus the product of the individual failure probabilities, that is, the probability that one or the other component stays up.
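Both calculations can be checked with a few lines of Python. The sketch assumes component failures are independent, which the slides imply but do not state; the function names are illustrative. `combined_sla` covers the linear case, and `composite_sla` applies the slide formula 100% - [(100% - SLA1) * (100% - SLA2)] to redundant components.

```python
from functools import reduce


def combined_sla(*slas_pct):
    """Serial chain: every component must be up, so availabilities multiply."""
    return 100.0 * reduce(lambda acc, s: acc * (s / 100.0), slas_pct, 1.0)


def composite_sla(*slas_pct):
    """Redundant set: the service is down only if every component is down."""
    all_down = reduce(lambda acc, s: acc * (1.0 - s / 100.0), slas_pct, 1.0)
    return 100.0 * (1.0 - all_down)


print(round(combined_sla(99.9, 99.95), 2))         # 99.85
print(round(combined_sla(99.9, 99.99, 99.95), 2))  # 99.84
print(round(composite_sla(99.9, 99.95), 5))        # 99.99995
print(round(composite_sla(99.95, 99.95), 6))       # 99.999975
```

Chaining components always lowers the result below the weakest individual SLA, while pairing a component with a redundant alternative raises it above the strongest one.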
There are several trends pushing business—across all industries—toward the cloud. For most organizations, the current way of doing business might not deliver the agility to grow, or may not provide the platform or flexibility to compete. The explosion of data created by an increasing number of digital businesses is pushing the cost and complexity of data center storage to new levels—demanding new skills and analytics tools from IT.
Modern cloud solutions help companies meet the challenges of the digital age. Instead of managing their IT, organizations have the ability to respond quickly to a more fast-paced and complex business landscape.
DevOps
The practice of continually improving how teams develop software
CI/CD^2
Continuous Integration – build and test the latest changes in a commit to your code repo
Continuous Delivery – deliver built artifacts to a repository and an environment
Continuous Deployment – release the artifacts to an environment and send production traffic to the new artifacts
When we talk about DevOps as a practice and culture, we can look at where teams are in their journey, how mature they are in their process of delivering software to their customers
These core concepts come from research that the DevOps Research and Assessment group (Nicole Forsgren, Jez Humble, Gene Kim) started in 2017, identifying the practices of high-performing teams, and I’m using their categorization. They wrote a book, Accelerate, that I recommend! https://www.amazon.com/Accelerate-Software-Performing-Technology-Organizations/dp/1942788339
They surveyed organizations and measured
Release frequency (how often are features released?)
Rate of change driven failures (does releasing a change lead to a failure in production?)
Time to recovery (how quickly can a team recover from an error)
Lead time for change (how long does it take to run the delivery)
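The four measures above can be computed from a simple deployment log. The sketch below is one possible reading of them, not the survey's own method: the `Deployment` record, its field names and the use of plain averages are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def delivery_metrics(deployments, period_days):
    """Summarize release frequency, failure rate, recovery time and lead time."""
    if not deployments:
        raise ValueError("at least one deployment is required")
    failures = [d for d in deployments if d.failed]
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    mean_recovery = sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
    return {
        "release_frequency_per_day": len(deployments) / period_days,
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery": mean_recovery,
        "mean_lead_time": sum(lead_times, timedelta()) / len(lead_times),
    }


start = datetime(2021, 11, 1, 9, 0)
history = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start, start + timedelta(hours=4), failed=True,
               restored_at=start + timedelta(hours=5)),
]
print(delivery_metrics(history, period_days=1))
```

With this sample history the team releases twice a day, half of the changes fail, recovery takes one hour on average, and the mean lead time is three hours.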
They found no tradeoff between the pace of delivery and stability. They also found a correlation between high-performing teams and organizational success (revenue growth, market share). Software delivery performance enables organizations to be more experimental: releasing changes more rapidly lets teams iterate quickly, test features (A/B, etc.), and rapidly meet their customers' needs.
Automating the software delivery process is a key capability in order to achieve high performance – speed and stability
Future security growth:
To set up OCI DevOps – secure by default
Today the problems are:
Systems too complex
Tools that don’t work together
Automation w/ native integrations w/ OCI
Governance to ensure controls
CI/CD Security – built in OCI security integrations for developers to secure their software supply chain
Identify vulnerabilities with Artifact Scanning for Git repos and Containers
Ensure chain-of-trust with Artifact Verification
Enforce compliance and security policies before a deployment with CI/CD Pipeline security
OK, so what is a Deployment Pipeline for automating delivery to OCI platforms? Let me show you an example CD pipeline that delivers and tests in environments before the new release gets to production.
The deployment will go through these stages, orchestrated by the OCI DevOps service:
This team has a dev environment to stage their most recently built changes
Once their new artifacts are on DEV then they run a test suite with their Dev environment, Dev database
If the test suite passes, then they promote the artifacts to their staging environment
After Staging, they run a Canary test, send a small amount of production traffic to the new artifacts, and then monitor their metrics
Then a pause, since for this customer reviewing their metrics is a manual step, they have an approval gate
If the metrics review passes, then they release to Production!
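The stages just described can be modeled as an ordered list with an approval gate and rollback on failure. The sketch below is a toy runner written for illustration only; it does not use the OCI DevOps API, and `run_pipeline`, the stage names and the `approve` callback are all hypothetical.

```python
def run_pipeline(stages, approve):
    """Run stages in order; on failure, roll back completed stages in reverse.

    stages:  list of (name, action, rollback) tuples. action returns True on
             success; action None marks a manual approval gate; rollback may be
             None for stages that change nothing (for example, a test run).
    approve: callable taking a stage name, consulted at approval gates.
    """
    completed = []
    for name, action, rollback in stages:
        ok = approve(name) if action is None else action()
        if not ok:
            for _, undo in reversed(completed):
                if undo is not None:
                    undo()
            return f"failed at {name}"
        completed.append((name, rollback))
    return "released"


log = []


def step(name, ok=True):
    def action():
        log.append(name)
        return ok
    return action


def undo(name):
    return lambda: log.append(f"rollback {name}")


stages = [
    ("deploy-dev", step("deploy-dev"), undo("deploy-dev")),
    ("test-dev", step("test-dev"), None),
    ("deploy-staging", step("deploy-staging"), undo("deploy-staging")),
    ("canary", step("canary"), undo("canary")),
    ("approval", None, None),  # manual metrics review
    ("deploy-prod", step("deploy-prod"), undo("deploy-prod")),
]
print(run_pipeline(stages, approve=lambda name: True))  # released
```

If the reviewer rejects the canary metrics (`approve` returns False), the runner stops at the approval gate and undoes the canary, staging and dev deployments in that order.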
Using the OCI DevOps service you can automate as much as you want to, with a Deployment Pipeline
Automated: runs without intervention
Semi-automated: has an approval stage for offline checks
Continuous Deployment means releasing on each commit to the main branch; whether to do so is the customer's choice, depending on their capabilities. It could be Continuous Delivery instead if each commit isn’t sent automatically to production
Integrate with your existing CI platform – Jenkins, GitLab, GitHub
Release strategies: Canary, Blue/Green
Deployment history – audit of what’s released
Rollback – automatic or manual
Also integrated, for multi-cloud deployments we support the OCI driver for Spinnaker.