SlideShare una empresa de Scribd logo
1 de 69
Descargar para leer sin conexión
Learn, measure, and build using architectural best practices
Rick Hwang
Nov 2017
AWS Well-Architected
1
https://aws.amazon.com/blogs/aws/are-you-well-architected/
2
2015/10/02
Jimi Hendrix - Are you experienced
3
https://aws.amazon.com/blogs/aws/well-architected-working-backward-to-play-it-forward/2016/11/23
https://aws.amazon.com/architecture/well-architected/
4
5
General Design Principles
6
● Stop guessing your capacity needs
○ Eliminate guessing about your infrastructure capacity needs.
○ You can use as much or as little capacity as you need, and scale up and down automatically.
○ With cloud computing, these problems can go away.
● Test systems at production scale
○ In the cloud, you can create a production-scale test environment on demand, complete your
testing
○ Pay for the test environment when it’s running.
● Automate to make architectural experimentation eastier
○ Automation allows you to create and replicate your systems at low cost and avoid the expense
of manual effort.
○ You can track changes to your automation, audit the impact, and revert to previous
parameters when necessary
General Design Principles
7
● Allow for evolutionary architectures:
○ In a traditional environment, architectural decisions are often implemented as static, one-time
events, with a few major versions of a system during its lifetime.
○ As a business and its context continue to change, these initial decisions might hinder the
system’s ability to deliver changing business requirements.
○ In the cloud, the capability to automate and test on demand lowers the risk of impact from
design changes. This allows systems to evolve over time so that businesses can take
advantage of innovations as a standard practice.
General Design Principles
8
● Drive architectures using data:
○ In the cloud you can collect data on how your architectural choices affect the behavior of your
workload.
○ This lets you make fact-based decisions on how to improve your workload.
○ Your cloud infrastructure is code, so you can use that data to inform your architecture
choices and improvements over time.
General Design Principles
9
● Improve through game days:
○ Test how your architecture and processes perform by regularly scheduling game days to
simulate events in production.
○ This will help you understand where improvements can be made and can help develop
organizational experience in dealing with events.
General Design Principles
補充:
1. 維運單位新人教育訓練
2. CI / CD / DevOps
10
11
The Five Pillars of
the Well-Architected Framework
12
13
14
15
Operational Excellence
includes the ability to run and monitor systems to deliver
business value and to continually improve supporting
processes and procedures.
16
Design Principles
17
Perform operations as code:
● In the cloud, you can apply the same engineering discipline that you use for application code to
your entire environment.
● You can define your entire workload (applications, infrastructure, etc.) as code and update it with
code.
● You can script your operations procedures and automate their execution by triggering them in
response to events.
● By performing operations as code, you limit human error and enable consistent responses to
events.
補充:
1. `as Code` => 用 工程 方法
Annotate documentation:
● In an on-premises environment, documentation is created by hand (手作的), used by people, and
hard to keep in sync with the pace of change.
● In the cloud, you can automate the creation of documentation after every build (or automatically
annotate hand-crafted documentation).
● Annotated documentation can be used by people and systems.
● Use annotations as an input to your operations code.
Design Principles
18
Make frequent, small, reversible changes:
● Design workloads to allow components to be updated regularly.
● Make changes in small increments that can be reversed if they fail (without affecting customers
when possible).
Design Principles
19
Introduction to DevOps on AWS
部署前的準備工作?
如果跟十個服務有關係,那部署前準備工作是?
請問你的部屬可以 rollback?
一天可以部署一萬次代表什麼?
20
Refine operations procedures frequently:
● As you use operations procedures, look for opportunities to improve them.
● As you evolve (推演) your workload, evolve your procedures appropriately.
● Set up regular game days to review and validate that all procedures are effective and that teams are
familiar with them.
Design Principles
21
Anticipate failure
● Perform “pre-mortem” exercises to identify potential sources of failure so that they can be removed
or mitigated (緩解).
● Test your failure scenarios and validate your understanding of their impact.
● Test your response procedures to ensure that they are effective and that teams are familiar with
their execution.
● Set up regular game days to test workloads and team responses to simulated events.
Design Principles
22
補充:
1. Design for failure
2. SRE CH13: Things break; that’s life.
Chaos Engineering
● Chaos: 混屯工程
● Netflix 提出的概念,Chaos Monkey
● 任意破壞基礎設施,系統能夠自動恢復
● Resilience as a Service (恢復能力)
23
推薦閱讀:Site Reliability Engineering
1. SRE CH13 - Emergency Response
2. SRE CH14 - Managing Incidents
3. SRE CH15 - Learning from Failure
4. 警急事件 by Rick
Learn from all operational failures:
● Drive improvement through lessons learned from all operational events and failures.
● Share what is learned across teams and through the entire organization
Design Principles
24
Best Practices
● OPS 1: What factors drive your operational priorities?
● OPS 2: How do you design your workload to enable operability?
● OPS 3: How do you know that you are ready to support a workload?
● OPS 4: What factors drive your understanding of operational health?
● OPS 5: How do you manage operational events?
● OPS 6: How do you evolve (推演) operations?
25
26
Security(是動詞、是攻擊)
includes the ability to protect information, systems, and assets while delivering
business value through risk assessments and mitigation strategies.
27
● Implement a strong identity foundation
● Enable traceability
● Apply security at all layers
● Automate security best practices
● Protect data in transit and at rest
● Prepare for security events
Design Principles
28
補充:AWS IAM
Implement a strong identity foundation:
● Implement the principle of least privilege and enforce separation of duties with appropriate
authorization for each interaction with your AWS resources.
● Centralize privilege management and reduce or even eliminate reliance on long term credentials.
Design Principles
29
Enable traceability:
● Monitor, alert, and audit actions and changes to your environment in real time.
● Integrate logs and metrics with systems to automatically respond and take action.
Design Principles
30
補充:AWS CloudWatch, CloudTrial, VPC Flow
Log
Apply security at all layers:
● Rather than just focusing on protecting a single outer layer, apply a defense-in-depth approach with
other security controls.
● Apply to all layers, for example, edge network, virtual private cloud (VPC), subnet, load balancer,
every instance, operating system, and application.
Design Principles
31
Automate security best practices:
● Automated software-based security mechanisms improve your ability to securely scale more rapidly
and cost effectively.
● Create secure architectures, including the implementation of controls that are defined and managed
as code in version-controlled templates.
Design Principles
32
Protect data in transit and at rest:
● Classify your data into sensitivity levels and use mechanisms, such as encryption and tokenization
where appropriate.
● Reduce or eliminate direct human access to data to reduce risk of loss or modification.
Design Principles
33
Prepare for security events:
● Prepare for an incident by having an incident management process that aligns to your organizational
requirements.
● Run incident response simulations and use tools with automation to increase your speed for
detection, investigation, and recovery.
Design Principles
34
● Identity and Access Management
● Detective Controls
● Infrastructure Protection
● Data Protection
● Incident Response
Definition
35
Best Practice
SEC 1: How are you protecting access to and use of the AWS account root user credentials?
SEC 2: How are you defining roles and responsibilities of system users to control human access to the AWS Management
Console and API?
SEC 3: How are you limiting automated access to AWS resources (for example, applications, scripts, and/or third-party
tools or services)?
SEC 4: How are you capturing and analyzing logs?
SEC 5: How are you enforcing network and host-level boundary protection?
SEC 6: How are you leveraging AWS service-level security features?
SEC 7: How are you protecting the integrity of the operating system?
SEC 8: How are you classifying your data?
SEC 9: How are you encrypting and protecting your data at rest?
SEC 10: How are you managing keys?
SEC 11: How are you encrypting and protecting your data in transit?
SEC 12: How do you ensure that you have the appropriate incident response?
36
37
Reliability
includes the ability of a system to recover from infrastructure or service
disruptions (中斷), dynamically acquire computing resources to meet
demand, and mitigate disruptions (減輕中斷) such as misconfigurations or
transient network issues.
38
Design Principles
1. Test recovery procedures
2. Automatically recover from failure
3. Use horizontal scalability to increase system availability
4. Automatically add/remove resources as needed to avoid capacity saturation
5. Manage change in automation
39
Test Recovery Procedures
1. In an on-premises environment, testing is often conducted to prove the system works in a particular scenario.
2. Testing is not typically used to validate recovery strategies.
3. In the cloud, you can test how your system fails, and you can validate your recovery procedures.
4. You can use automation to simulate different failures or to recreate scenarios that led to failures before.
5. This exposes failure pathways that you can test and rectify before a real failure scenario, reducing the risk of
components failing that have not been tested before.
40
Automatically recover from failure
● By monitoring a system for key performance indicators (KPIs), you can trigger automation when a
threshold is breached.
● This allows for automatic notification and tracking of failures, and for automated recovery processes
that work around or repair the failure.
● With more sophisticated automation, it’s possible to anticipate and remediate failures before they
occur
41
Scale horizontally to increase aggregate system availability
Replace one large resource with multiple small resources to reduce the impact of a single failure on the
overall system.
Distribute requests across multiple, smaller resources to ensure that they don’t share a common point of
failure.
42
Stop guessing capacity
A common cause of failure in on-premises systems is resource saturation, when the demands placed on a
system exceed the capacity of that system (this is often the objective of denial of service attacks).
In the cloud, you can monitor demand and system utilization, and automate the addition or removal of
resources to maintain the optimal level to satisfy demand without over- or under-provisioning.
43
Manage change in automation
Changes to your infrastructure should be done using automation.
The changes that need to be managed are changes to the automation.
44
Definition
● Foundations
● Change Management
● Failure Management
45
Best Practice
1. REL 1: How are you managing AWS service limits for your accounts?
2. REL 2: How are you planning your network topology on AWS?
3. REL 3: How does your system adapt to changes in demand?
4. REL 4: How are you monitoring AWS resources?
5. REL 5: How are you executing change?
6. REL 6: How are you backing up your data?
7. REL 7: How does your system withstand component failures?
8. REL 8: How are you testing your resiliency?
9. REL 9: How are you planning for disaster recovery?
46
47
Performance Efficiency
includes the ability to use computing resources efficiently to meet system
requirements and to maintain that efficiency as demand changes and
technologies evolve.
48
Design Principles
1. Democratize advanced technologies
2. Go global in minutes
3. Use serverless architectures
4. Experiment more often
5. Try various comparative testing and configurations to find out what performs
better
49
Democratize advanced technologies
Technologies that are difficult to implement can become easier to consume by pushing that knowledge
and complexity into the cloud vendor’s domain.
Rather than having your IT team learn how to host and run a new technology, they can simply consume it
as a service.
For example, NoSQL databases, media transcoding, and machine learning are all technologies that
require expertise that is not evenly dispersed across the technical community.
In the cloud, these technologies become services that your team can consume while focusing on product
development rather than resource provisioning and management.
50
Go global in minutes
Easily deploy your system in multiple Regions around the world with just a few clicks.
This allows you to provide lower latency and a better experience for your customers at minimal cost.
51
Use serverless architectures
In the cloud, serverless architectures remove the need for you to run and maintain servers to carry out
traditional compute activities.
For example, storage services can act as static websites, removing the need for web servers, and event
services can host your code for you.
This not only removes the operational burden of managing these servers, but also can lower transactional
costs because these managed services operate at cloud scale.
52
Experiment more often
With virtual and automatable resources, you can quickly carry out comparative testing using different
types of instances, storage, or configurations.
53
Best Practices
● PERF 1: How do you select the best performing architecture?
● PERF 2: How did you select your compute solution?
● PERF 3: How do you select your storage solution?
● PERF 4: How do you select your database solution?
● PERF 5: How do you configure your networking solution?
● PERF 6: How do you ensure that you continue to have the most appropriate
resource type as new resource types and features are introduced?
● PERF 7: How do you monitor your resources post-launch to ensure they are
performing as expected?
● PERF 8: How do you use tradeoffs to improve performance?
54
55
Cost Optimization
includes the ability to avoid or eliminate unneeded cost or suboptimal
resources.
56
Design Principles
1. Adopt a consumption model
2. Measure overall efficiency
3. Stop spending money on data center operations
4. Analyze and attribute expenditure
5. Use managed services to reduce cost of ownership
57
Adopt a consumption model
Pay only for the computing resources that you consume and increase or decrease usage depending on
business requirements, not by using elaborate forecasting.
For example, development and test environments are typically only used for eight hours a day during the
work week. You can stop these resources when they are not in use for a potential cost savings of 75% (40
hours versus 168 hours).
58
Measure overall efficiency
Measure the business output of the system and the costs associated with delivering it.
Use this measure to understand the gains you make from increasing output and reducing costs.
59
Stop spending money on data center operations
AWS does the heavy lifting of racking, stacking, and powering servers, so you can focus on your
customers and business projects rather than on IT infrastructure.
60
Analyze and attribute expenditure
The cloud makes it easier to accurately identify the usage and cost of systems, which then allows
transparent attribution of IT costs to individual business owners.
This helps measure return on investment (ROI) and gives system owners an opportunity to optimize their
resources and reduce costs
61
Use managed services to reduce cost of ownership
In the cloud, managed services remove the operational burden of maintaining servers for tasks like
sending email or managing databases.
And because managed services operate at cloud scale, they can offer a lower cost per transaction or
service.
62
Definition
● Cost-Effective Resources
● Matching Supply and Demand
● Expenditure Awareness
● Optimizing Over Time
63
Best Practices
1. COST 1: Are you considering cost when you select AWS services for your solution?
2. COST 2: Have you sized your resources to meet your cost targets?
3. COST 3: Have you selected the appropriate pricing model to meet your cost targets?
4. COST 4: How do you make sure your capacity matches but does not substantially exceed what you need?
5. COST 5: Do you consider data-transfer charges when designing your architecture?
6. COST 6: How are you monitoring usage and spending?
7. COST 7: Do you decommission resources that you no longer need or stop resources that are temporarily not
needed?
8. COST 8: What access controls and procedures do you have in place to govern AWS usage?
9. COST 9: How do you manage and/or consider the adoption of new services?
64
65
The Review Process
The review of architectures needs to be done
● consistent manner (一致的方法)
● a light-weight process (hours not days)
● a conversation and not an audit
● identify any critical issues
● building an architecture using the Well-Architected Framework continually review their architecture,
rather than holding a formal review meeting.
66
Items for Review Meeting
● A meeting room with whiteboards
● Print outs of any diagrams or design notes
● Action list of questions that require out-of-band research to answer (for example, did we enable
encryption or not?)
67
SRE: Site Reliability Engineering
CH27 - Reliable Product Launches at Scale
Launch Coordination Engineers
68
69

Más contenido relacionado

La actualidad más candente

Csa summit 2017 - Plataforma de Seguridad para entornos Cloud
Csa summit 2017 - Plataforma de Seguridad para entornos CloudCsa summit 2017 - Plataforma de Seguridad para entornos Cloud
Csa summit 2017 - Plataforma de Seguridad para entornos Cloud
CSA Argentina
 
Enforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automationEnforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automation
Puppet
 

La actualidad más candente (20)

Red Hat Insights
Red Hat InsightsRed Hat Insights
Red Hat Insights
 
E4: Building Your First Predix App (Predix Transform 2016)
E4: Building Your First Predix App (Predix Transform 2016)E4: Building Your First Predix App (Predix Transform 2016)
E4: Building Your First Predix App (Predix Transform 2016)
 
Containers: Give Me The Facts, Not The Hype - AppD Summit Europe
Containers: Give Me The Facts, Not The Hype - AppD Summit EuropeContainers: Give Me The Facts, Not The Hype - AppD Summit Europe
Containers: Give Me The Facts, Not The Hype - AppD Summit Europe
 
DevOps Spain 2019. Olivier Perard-Oracle
DevOps Spain 2019. Olivier Perard-OracleDevOps Spain 2019. Olivier Perard-Oracle
DevOps Spain 2019. Olivier Perard-Oracle
 
The good, the bad, and the ugly of migrating hundreds of legacy applications ...
The good, the bad, and the ugly of migrating hundreds of legacy applications ...The good, the bad, and the ugly of migrating hundreds of legacy applications ...
The good, the bad, and the ugly of migrating hundreds of legacy applications ...
 
2020 05-tech saturday-devsecops-#2-v03
2020 05-tech saturday-devsecops-#2-v032020 05-tech saturday-devsecops-#2-v03
2020 05-tech saturday-devsecops-#2-v03
 
Csa summit 2017 - Plataforma de Seguridad para entornos Cloud
Csa summit 2017 - Plataforma de Seguridad para entornos CloudCsa summit 2017 - Plataforma de Seguridad para entornos Cloud
Csa summit 2017 - Plataforma de Seguridad para entornos Cloud
 
Dynatrace: The untouchables - the Dynatrace offering here and now
Dynatrace: The untouchables - the Dynatrace offering here and nowDynatrace: The untouchables - the Dynatrace offering here and now
Dynatrace: The untouchables - the Dynatrace offering here and now
 
Concourse, Spinnaker, Cloud Foundry, Oh My! Creating Sophisticated Deployment...
Concourse, Spinnaker, Cloud Foundry, Oh My! Creating Sophisticated Deployment...Concourse, Spinnaker, Cloud Foundry, Oh My! Creating Sophisticated Deployment...
Concourse, Spinnaker, Cloud Foundry, Oh My! Creating Sophisticated Deployment...
 
Soluciones Dynatrace
Soluciones DynatraceSoluciones Dynatrace
Soluciones Dynatrace
 
GE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoTGE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoT
 
Enforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automationEnforce compliance policy with model-driven automation
Enforce compliance policy with model-driven automation
 
Migrating from Self-Managed Kubernetes on EC2 to a GitOps Enabled EKS
Migrating from Self-Managed Kubernetes on EC2 to a GitOps Enabled EKSMigrating from Self-Managed Kubernetes on EC2 to a GitOps Enabled EKS
Migrating from Self-Managed Kubernetes on EC2 to a GitOps Enabled EKS
 
D6: Cloud Directions ( Predix Transform 2016)
D6: Cloud Directions ( Predix Transform 2016)D6: Cloud Directions ( Predix Transform 2016)
D6: Cloud Directions ( Predix Transform 2016)
 
Pivotal Cloud Foundry 2.3: A First Look
Pivotal Cloud Foundry 2.3: A First LookPivotal Cloud Foundry 2.3: A First Look
Pivotal Cloud Foundry 2.3: A First Look
 
Cloud-Native Operations with Kubernetes and CI/CD
Cloud-Native Operations with Kubernetes and CI/CDCloud-Native Operations with Kubernetes and CI/CD
Cloud-Native Operations with Kubernetes and CI/CD
 
Next Generation Vulnerability Assessment Using Datadog and Snyk
Next Generation Vulnerability Assessment Using Datadog and SnykNext Generation Vulnerability Assessment Using Datadog and Snyk
Next Generation Vulnerability Assessment Using Datadog and Snyk
 
OSMC 2017 | Icinga2 in a 24/7 Broadcast Environment by Dave Kempe
OSMC 2017 | Icinga2 in a 24/7 Broadcast Environment by Dave KempeOSMC 2017 | Icinga2 in a 24/7 Broadcast Environment by Dave Kempe
OSMC 2017 | Icinga2 in a 24/7 Broadcast Environment by Dave Kempe
 
Delivering Quality at Speed with GitOps
Delivering Quality at Speed with GitOpsDelivering Quality at Speed with GitOps
Delivering Quality at Speed with GitOps
 
GitOps & the deployment branching models - DevOps D-day Marseille 2021
GitOps & the deployment branching models - DevOps D-day Marseille 2021GitOps & the deployment branching models - DevOps D-day Marseille 2021
GitOps & the deployment branching models - DevOps D-day Marseille 2021
 

Similar a AWS Well-Architected Framework (nov 2017)

Similar a AWS Well-Architected Framework (nov 2017) (20)

AWS Well Architected Framework
AWS Well Architected FrameworkAWS Well Architected Framework
AWS Well Architected Framework
 
AWS Well Architected Framework
AWS Well Architected FrameworkAWS Well Architected Framework
AWS Well Architected Framework
 
Introduction to DevSecOps
Introduction to DevSecOpsIntroduction to DevSecOps
Introduction to DevSecOps
 
Modern Security Operations aka Secure DevOps @ All Day DevOps 2017
Modern Security Operations aka Secure DevOps @ All Day DevOps 2017Modern Security Operations aka Secure DevOps @ All Day DevOps 2017
Modern Security Operations aka Secure DevOps @ All Day DevOps 2017
 
Infrastructure as Code Maturity Model v1
Infrastructure as Code Maturity Model v1Infrastructure as Code Maturity Model v1
Infrastructure as Code Maturity Model v1
 
Using AWS Well Architectured Framework for Software Architecture Evaluations ...
Using AWS Well Architectured Framework for Software Architecture Evaluations ...Using AWS Well Architectured Framework for Software Architecture Evaluations ...
Using AWS Well Architectured Framework for Software Architecture Evaluations ...
 
Unleash Team Productivity with Real-Time Operations (DEV203-S) - AWS re:Inven...
Unleash Team Productivity with Real-Time Operations (DEV203-S) - AWS re:Inven...Unleash Team Productivity with Real-Time Operations (DEV203-S) - AWS re:Inven...
Unleash Team Productivity with Real-Time Operations (DEV203-S) - AWS re:Inven...
 
Past, Present and Future of DevOps Infrastructure
Past, Present and Future of DevOps InfrastructurePast, Present and Future of DevOps Infrastructure
Past, Present and Future of DevOps Infrastructure
 
Continuous Testing in containerized environment
Continuous Testing in containerized environmentContinuous Testing in containerized environment
Continuous Testing in containerized environment
 
AWS Well Architected Framework in Summary
AWS Well Architected Framework in SummaryAWS Well Architected Framework in Summary
AWS Well Architected Framework in Summary
 
Microservices operational management | Walkingtree Technologies
Microservices operational management  | Walkingtree TechnologiesMicroservices operational management  | Walkingtree Technologies
Microservices operational management | Walkingtree Technologies
 
SRE & Kubernetes
SRE & KubernetesSRE & Kubernetes
SRE & Kubernetes
 
Controlled Evolution with Puppet and AWS
Controlled Evolution with Puppet and AWSControlled Evolution with Puppet and AWS
Controlled Evolution with Puppet and AWS
 
ADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdf
ADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdfADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdf
ADDO_2020-Driving-Digital-Transformation-through-CloudOps-and-SRE.pdf
 
Technology insights: Decision Science Platform
Technology insights: Decision Science PlatformTechnology insights: Decision Science Platform
Technology insights: Decision Science Platform
 
Partnership to Capture Indonesia ERP Cloud Trend Opportunities
Partnership to Capture Indonesia ERP Cloud Trend OpportunitiesPartnership to Capture Indonesia ERP Cloud Trend Opportunities
Partnership to Capture Indonesia ERP Cloud Trend Opportunities
 
AWS Well-Architected Framework
AWS Well-Architected FrameworkAWS Well-Architected Framework
AWS Well-Architected Framework
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
 
Dev ops essentials v2
Dev ops essentials v2Dev ops essentials v2
Dev ops essentials v2
 
Moving from Monolith to Microservices
Moving from Monolith to MicroservicesMoving from Monolith to Microservices
Moving from Monolith to Microservices
 

Más de Rick Hwang

導讀持續交付 2.0 - CH02 價值探索環
導讀持續交付 2.0 - CH02 價值探索環 導讀持續交付 2.0 - CH02 價值探索環
導讀持續交付 2.0 - CH02 價值探索環
Rick Hwang
 

Más de Rick Hwang (20)

在生命轉彎的地方 - 從軟體開發職涯,探索人生
在生命轉彎的地方 - 從軟體開發職涯,探索人生在生命轉彎的地方 - 從軟體開發職涯,探索人生
在生命轉彎的地方 - 從軟體開發職涯,探索人生
 
20230829 - 探索職涯,複利人生
20230829 - 探索職涯,複利人生20230829 - 探索職涯,複利人生
20230829 - 探索職涯,複利人生
 
2023 08 - SRE 實踐與開發平台指南 - 書友見面會
2023 08 - SRE 實踐與開發平台指南 - 書友見面會2023 08 - SRE 實踐與開發平台指南 - 書友見面會
2023 08 - SRE 實踐與開發平台指南 - 書友見面會
 
20230215 - 凝聚團隊共識的溝通方法 (Effective Team Communication)
20230215 - 凝聚團隊共識的溝通方法 (Effective Team Communication)20230215 - 凝聚團隊共識的溝通方法 (Effective Team Communication)
20230215 - 凝聚團隊共識的溝通方法 (Effective Team Communication)
 
軟體測試實務新書發表會 - 從品質與測試,讓軟體再次偉大
軟體測試實務新書發表會 - 從品質與測試,讓軟體再次偉大軟體測試實務新書發表會 - 從品質與測試,讓軟體再次偉大
軟體測試實務新書發表會 - 從品質與測試,讓軟體再次偉大
 
CH02 API Governance
CH02 API Governance CH02 API Governance
CH02 API Governance
 
Chapter 8. Partial updates and retrievals.pdf
Chapter 8. Partial updates and retrievals.pdfChapter 8. Partial updates and retrievals.pdf
Chapter 8. Partial updates and retrievals.pdf
 
Ch09 Custom Methods
Ch09 Custom MethodsCh09 Custom Methods
Ch09 Custom Methods
 
AWS Career Exploration Day
AWS Career Exploration DayAWS Career Exploration Day
AWS Career Exploration Day
 
從理想、到現實的距離,開啟品味軟體測試之路 - 台灣軟體工程協會 (20220813)
從理想、到現實的距離,開啟品味軟體測試之路 - 台灣軟體工程協會 (20220813)從理想、到現實的距離,開啟品味軟體測試之路 - 台灣軟體工程協會 (20220813)
從理想、到現實的距離,開啟品味軟體測試之路 - 台灣軟體工程協會 (20220813)
 
SRE Conf 2022 - 91APP 在 AWS 上的 SRE 實踐之路
SRE Conf 2022 - 91APP 在 AWS 上的 SRE 實踐之路SRE Conf 2022 - 91APP 在 AWS 上的 SRE 實踐之路
SRE Conf 2022 - 91APP 在 AWS 上的 SRE 實踐之路
 
導讀持續交付 2.0 - CH02 價值探索環
導讀持續交付 2.0 - CH02 價值探索環 導讀持續交付 2.0 - CH02 價值探索環
導讀持續交付 2.0 - CH02 價值探索環
 
2020 AWS Summit - 如何有效管理 AWS 的成本結構與系統架構
2020 AWS Summit - 如何有效管理 AWS 的成本結構與系統架構2020 AWS Summit - 如何有效管理 AWS 的成本結構與系統架構
2020 AWS Summit - 如何有效管理 AWS 的成本結構與系統架構
 
災難演練 @ AWS 實戰分享 (Using AWS for Disaster Recovery)
災難演練 @ AWS 實戰分享 (Using AWS for Disaster Recovery)災難演練 @ AWS 實戰分享 (Using AWS for Disaster Recovery)
災難演練 @ AWS 實戰分享 (Using AWS for Disaster Recovery)
 
Software Development Process v1.5 - 20121214
Software Development Process v1.5 - 20121214Software Development Process v1.5 - 20121214
Software Development Process v1.5 - 20121214
 
第三章 建立良好的人際關係網路
第三章 建立良好的人際關係網路第三章 建立良好的人際關係網路
第三章 建立良好的人際關係網路
 
Wiki in Teamroom - Connected Mind
Wiki in Teamroom - Connected MindWiki in Teamroom - Connected Mind
Wiki in Teamroom - Connected Mind
 
導讀持續交付 2.0 - 談當代軟體交付之虛實融合
導讀持續交付 2.0 - 談當代軟體交付之虛實融合導讀持續交付 2.0 - 談當代軟體交付之虛實融合
導讀持續交付 2.0 - 談當代軟體交付之虛實融合
 
Study Notes - Event-Driven Data Management for Microservices
Study Notes - Event-Driven Data Management for MicroservicesStudy Notes - Event-Driven Data Management for Microservices
Study Notes - Event-Driven Data Management for Microservices
 
從緊急事件 談 SRE 應變能力的培養 - DevOpsDays Taipei 2018
從緊急事件  談 SRE 應變能力的培養 - DevOpsDays Taipei 2018從緊急事件  談 SRE 應變能力的培養 - DevOpsDays Taipei 2018
從緊急事件 談 SRE 應變能力的培養 - DevOpsDays Taipei 2018
 

Último

Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
HenryBriggs2
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
mphochane1998
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 

Último (20)

Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 

AWS Well-Architected Framework (nov 2017)

  • 1. Learn, measure, and build using architectural best practices Rick Hwang Nov 2017 AWS Well-Architected 1
  • 5. 5
  • 7. ● Stop guessing your capacity needs ○ Eliminate guessing about your infrastructure capacity needs. ○ You can use as much or as little capacity as you need, and scale up and down automatically. ○ With cloud computing, these problems can go away. ● Test systems at production scale ○ In the cloud, you can create a production-scale test environment on demand, complete your testing ○ Pay for the test environment when it’s running. ● Automate to make architectural experimentation eastier ○ Automation allows you to create and replicate your systems at low cost and avoid the expense of manual effort. ○ You can track changes to your automation, audit the impact, and revert to previous parameters when necessary General Design Principles 7
  • 8. ● Allow for evolutionary architectures: ○ In a traditional environment, architectural decisions are often implemented as static, one-time events, with a few major versions of a system during its lifetime. ○ As a business and its context continue to change, these initial decisions might hinder the system’s ability to deliver changing business requirements. ○ In the cloud, the capability to automate and test on demand lowers the risk of impact from design changes. This allows systems to evolve over time so that businesses can take advantage of innovations as a standard practice. General Design Principles 8
  • 9. ● Drive architectures using data: ○ In the cloud you can collect data on how your architectural choices affect the behavior of your workload. ○ This lets you make fact-based decisions on how to improve your workload. ○ Your cloud infrastructure is code, so you can use that data to inform your architecture choices and improvements over time. General Design Principles 9
  • 10. ● Improve through game days: ○ Test how your architecture and processes perform by regularly scheduling game days to simulate events in production. ○ This will help you understand where improvements can be made and can help develop organizational experience in dealing with events. General Design Principles 補充: 1. 維運單位新人教育訓練 2. CI / CD / DevOps 10
  • 11. 11
  • 12. The Five Pillars of the Well-Architected Framework 12
  • 13. 13
  • 14. 14
  • 15. 15
  • 16. Operational Excellence includes the ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures. 16
  • 17. Design Principles 17 Perform operations as code: ● In the cloud, you can apply the same engineering discipline that you use for application code to your entire environment. ● You can define your entire workload (applications, infrastructure, etc.) as code and update it with code. ● You can script your operations procedures and automate their execution by triggering them in response to events. ● By performing operations as code, you limit human error and enable consistent responses to events. 補充: 1. `as Code` => 用 工程 方法
  • 18. Annotate documentation: ● In an on-premises environment, documentation is created by hand (手作的), used by people, and hard to keep in sync with the pace of change. ● In the cloud, you can automate the creation of documentation after every build (or automatically annotate hand-crafted documentation). ● Annotated documentation can be used by people and systems. ● Use annotations as an input to your operations code. Design Principles 18
  • 19. Make frequent, small, reversible changes: ● Design workloads to allow components to be updated regularly. ● Make changes in small increments that can be reversed if they fail (without affecting customers when possible). Design Principles 19 Introduction to DevOps on AWS
  • 21. Refine operations procedures frequently: ● As you use operations procedures, look for opportunities to improve them. ● As you evolve (推演) your workload, evolve your procedures appropriately. ● Set up regular game days to review and validate that all procedures are effective and that teams are familiar with them. Design Principles 21
  • 22. Anticipate failure ● Perform “pre-mortem” exercises to identify potential sources of failure so that they can be removed or mitigated (緩解). ● Test your failure scenarios and validate your understanding of their impact. ● Test your response procedures to ensure that they are effective and that teams are familiar with their execution. ● Set up regular game days to test workloads and team responses to simulated events. Design Principles 22 補充: 1. Design for failure 2. SRE CH13: Things break; that’s life.
  • 23. Chaos Engineering ● Chaos: 混屯工程 ● Netflix 提出的概念,Chaos Monkey ● 任意破壞基礎設施,系統能夠自動恢復 ● Resilience as a Service (恢復能力) 23
  • 24. 推薦閱讀:Site Reliability Engineering 1. SRE CH13 - Emergency Response 2. SRE CH14 - Managing Incidents 3. SRE CH15 - Learning from Failure 4. 警急事件 by Rick Learn from all operational failures: ● Drive improvement through lessons learned from all operational events and failures. ● Share what is learned across teams and through the entire organization Design Principles 24
  • 25. Best Practices ● OPS 1: What factors drive your operational priorities? ● OPS 2: How do you design your workload to enable operability? ● OPS 3: How do you know that you are ready to support a workload? ● OPS 4: What factors drive your understanding of operational health? ● OPS 5: How do you manage operational events? ● OPS 6: How do you evolve (推演) operations? 25
  • 26. 26
  • 27. Security(是動詞、是攻擊) includes the ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies. 27
  • 28. ● Implement a strong identity foundation ● Enable traceability ● Apply security at all layers ● Automate security best practices ● Protect data in transit and at rest ● Prepare for security events Design Principles 28
  • 29. 補充:AWS IAM Implement a strong identity foundation: ● Implement the principle of least privilege and enforce separation of duties with appropriate authorization for each interaction with your AWS resources. ● Centralize privilege management and reduce or even eliminate reliance on long term credentials. Design Principles 29
  • 30. Enable traceability: ● Monitor, alert, and audit actions and changes to your environment in real time. ● Integrate logs and metrics with systems to automatically respond and take action. Design Principles 30 補充:AWS CloudWatch, CloudTrial, VPC Flow Log
  • 31. Apply security at all layers: ● Rather than just focusing on protecting a single outer layer, apply a defense-in-depth approach with other security controls. ● Apply to all layers, for example, edge network, virtual private cloud (VPC), subnet, load balancer, every instance, operating system, and application. Design Principles 31
  • 32. Automate security best practices: ● Automated software-based security mechanisms improve your ability to securely scale more rapidly and cost effectively. ● Create secure architectures, including the implementation of controls that are defined and managed as code in version-controlled templates. Design Principles 32
  • 33. Protect data in transit and at rest: ● Classify your data into sensitivity levels and use mechanisms, such as encryption and tokenization where appropriate. ● Reduce or eliminate direct human access to data to reduce risk of loss or modification. Design Principles 33
  • 34. Prepare for security events: ● Prepare for an incident by having an incident management process that aligns to your organizational requirements. ● Run incident response simulations and use tools with automation to increase your speed for detection, investigation, and recovery. Design Principles 34
  • 35. ● Identity and Access Management ● Detective Controls ● Infrastructure Protection ● Data Protection ● Incident Response Definition 35
  • 36. Best Practice SEC 1: How are you protecting access to and use of the AWS account root user credentials? SEC 2: How are you defining roles and responsibilities of system users to control human access to the AWS Management Console and API? SEC 3: How are you limiting automated access to AWS resources (for example, applications, scripts, and/or third-party tools or services)? SEC 4: How are you capturing and analyzing logs? SEC 5: How are you enforcing network and host-level boundary protection? SEC 6: How are you leveraging AWS service-level security features? SEC 7: How are you protecting the integrity of the operating system? SEC 8: How are you classifying your data? SEC 9: How are you encrypting and protecting your data at rest? SEC 10: How are you managing keys? SEC 11: How are you encrypting and protecting your data in transit? SEC 12: How do you ensure that you have the appropriate incident response? 36
  • 37. 37
  • 38. Reliability includes the ability of a system to recover from infrastructure or service disruptions (中斷), dynamically acquire computing resources to meet demand, and mitigate disruptions (減輕中斷) such as misconfigurations or transient network issues. 38
  • 39. Design Principles 1. Test recovery procedures 2. Automatically recover from failure 3. Use horizontal scalability to increase system availability 4. Automatically add/remove resources as needed to avoid capacity saturation 5. Manage change in automation 39
  • 40. Test Recovery Procedures 1. In an on-premises environment, testing is often conducted to prove the system works in a particular scenario. 2. Testing is not typically used to validate recovery strategies. 3. In the cloud, you can test how your system fails, and you can validate your recovery procedures. 4. You can use automation to simulate different failures or to recreate scenarios that led to failures before. 5. This exposes failure pathways that you can test and rectify before a real failure scenario, reducing the risk of components failing that have not been tested before. 40
  • 41. Automatically recover from failure ● By monitoring a system for key performance indicators (KPIs), you can trigger automation when a threshold is breached. ● This allows for automatic notification and tracking of failures, and for automated recovery processes that work around or repair the failure. ● With more sophisticated automation, it’s possible to anticipate and remediate failures before they occur 41
  • 42. Scale horizontally to increase aggregate system availability Replace one large resource with multiple small resources to reduce the impact of a single failure on the overall system. Distribute requests across multiple, smaller resources to ensure that they don’t share a common point of failure. 42
  • 43. Stop guessing capacity A common cause of failure in on-premises systems is resource saturation, when the demands placed on a system exceed the capacity of that system (this is often the objective of denial of service attacks). In the cloud, you can monitor demand and system utilization, and automate the addition or removal of resources to maintain the optimal level to satisfy demand without over- or under-provisioning. 43
  • 44. Manage change in automation Changes to your infrastructure should be done using automation. The changes that need to be managed are changes to the automation. 44
  • 45. Definition ● Foundations ● Change Management ● Failure Management 45
  • 46. Best Practice 1. REL 1: How are you managing AWS service limits for your accounts? 2. REL 2: How are you planning your network topology on AWS? 3. REL 3: How does your system adapt to changes in demand? 4. REL 4: How are you monitoring AWS resources? 5. REL 5: How are you executing change? 6. REL 6: How are you backing up your data? 7. REL 7: How does your system withstand component failures? 8. REL 8: How are you testing your resiliency? 9. REL 9: How are you planning for disaster recovery? 46
  • 47. 47
  • 48. Performance Efficiency includes the ability to use computing resources efficiently to meet system requirements and to maintain that efficiency as demand changes and technologies evolve. 48
  • 49. Design Principles 1. Democratize advanced technologies 2. Go global in minutes 3. Use serverless architectures 4. Experiment more often 5. Try various comparative testing and configurations to find out what performs better 49
  • 50. Democratize advanced technologies Technologies that are difficult to implement can become easier to consume by pushing that knowledge and complexity into the cloud vendor’s domain. Rather than having your IT team learn how to host and run a new technology, they can simply consume it as a service. For example, NoSQL databases, media transcoding, and machine learning are all technologies that require expertise that is not evenly dispersed across the technical community. In the cloud, these technologies become services that your team can consume while focusing on product development rather than resource provisioning and management. 50
  • 51. Go global in minutes Easily deploy your system in multiple Regions around the world with just a few clicks. This allows you to provide lower latency and a better experience for your customers at minimal cost. 51
  • 52. Use serverless architectures In the cloud, serverless architectures remove the need for you to run and maintain servers to carry out traditional compute activities. For example, storage services can act as static websites, removing the need for web servers, and event services can host your code for you. This not only removes the operational burden of managing these servers, but also can lower transactional costs because these managed services operate at cloud scale. 52
  • 53. Experiment more often With virtual and automatable resources, you can quickly carry out comparative testing using different types of instances, storage, or configurations. 53
  • 54. Best Practices ● PERF 1: How do you select the best performing architecture? ● PERF 2: How did you select your compute solution? ● PERF 3: How do you select your storage solution? ● PERF 4: How do you select your database solution? ● PERF 5: How do you configure your networking solution? ● PERF 6: How do you ensure that you continue to have the most appropriate resource type as new resource types and features are introduced? ● PERF 7: How do you monitor your resources post-launch to ensure they are performing as expected? ● PERF 8: How do you use tradeoffs to improve performance? 54
  • 55. 55
  • 56. Cost Optimization includes the ability to avoid or eliminate unneeded cost or suboptimal resources. 56
  • 57. Design Principles 1. Adopt a consumption model 2. Measure overall efficiency 3. Stop spending money on data center operations 4. Analyze and attribute expenditure 5. Use managed services to reduce cost of ownership 57
  • 58. Adopt a consumption model Pay only for the computing resources that you consume and increase or decrease usage depending on business requirements, not by using elaborate forecasting. For example, development and test environments are typically only used for eight hours a day during the work week. You can stop these resources when they are not in use for a potential cost savings of 75% (40 hours versus 168 hours). 58
  • 59. Measure overall efficiency Measure the business output of the system and the costs associated with delivering it. Use this measure to understand the gains you make from increasing output and reducing costs. 59
  • 60. Stop spending money on data center operations AWS does the heavy lifting of racking, stacking, and powering servers, so you can focus on your customers and business projects rather than on IT infrastructure. 60
  • 61. Analyze and attribute expenditure The cloud makes it easier to accurately identify the usage and cost of systems, which then allows transparent attribution of IT costs to individual business owners. This helps measure return on investment (ROI) and gives system owners an opportunity to optimize their resources and reduce costs 61
  • 62. Use managed services to reduce cost of ownership In the cloud, managed services remove the operational burden of maintaining servers for tasks like sending email or managing databases. And because managed services operate at cloud scale, they can offer a lower cost per transaction or service. 62
  • 63. Definition ● Cost-Effective Resources ● Matching Supply and Demand ● Expenditure Awareness ● Optimizing Over Time 63
  • 64. Best Practices 1. COST 1: Are you considering cost when you select AWS services for your solution? 2. COST 2: Have you sized your resources to meet your cost targets? 3. COST 3: Have you selected the appropriate pricing model to meet your cost targets? 4. COST 4: How do you make sure your capacity matches but does not substantially exceed what you need? 5. COST 5: Do you consider data-transfer charges when designing your architecture? 6. COST 6: How are you monitoring usage and spending? 7. COST 7: Do you decommission resources that you no longer need or stop resources that are temporarily not needed? 8. COST 8: What access controls and procedures do you have in place to govern AWS usage? 9. COST 9: How do you manage and/or consider the adoption of new services? 64
  • 65. 65
  • 66. The Review Process The review of architectures needs to be done ● consistent manner (一致的方法) ● a light-weight process (hours not days) ● a conversation and not an audit ● identify any critical issues ● building an architecture using the Well-Architected Framework continually review their architecture, rather than holding a formal review meeting. 66
  • 67. Items for Review Meeting ● A meeting room with whiteboards ● Print outs of any diagrams or design notes ● Action list of questions that require out-of-band research to answer (for example, did we enable encryption or not?) 67
  • 68. SRE: Site Reliability Engineering CH27 - Reliable Product Launches at Scale Launch Coordination Engineers 68
  • 69. 69