I will share how we develop and host a popular publishing platform in the cloud with a limited budget and technology team.
We'll cover architecture, including a variety of Amazon Web Services such as Elastic Load Balancing, S3, Elastic Beanstalk, and RDS, in the context of a real site.
We'll cover how we control costs with Spot and burstable instances and scale up with distributed caching.
Finally, we'll discuss continuous deployment strategies for Windows and Linux-based cloud applications in the context of a distributed team using an agile process.
Nuts and bolts of running a popular site in the AWS cloud
1. Host a hit site in the cloud without downtime or going broke
David Veksler
3. Contents
1. Cloud Architecture
2. Key AWS Services
3. Keeping costs under control
4. Configuration management
5. Key tools for distributed agile development
5. Architecture Diagram
[Diagram: FEE.org in the Amazon Web Services Cloud, Northern Virginia AZ]
• Users reach the site through Cloudflare (DNS, CDN, and firewall services), which forwards to Elastic Load Balancing (lb.fee.org)
• Spot Instance Fleet: Web1.fee.org and Web2.fee.org (EC2 VMs, C4.2xlarge); web#.FEE.org instances use Spot pricing to bid for the best price
• FEE.org Admin Node: Admin.fee.org (EC2 VM, C4.2xlarge) hosts both live and dev (Fee-dev.org), acts as staging for deployments, and runs TeamCity CI (Fee-dev.org:8080)
• FEE-DB security group (RDS): LIVE DB feedb2, DEV DB fee-dev2
• Redis Cache cluster: fee-cache-001, fee-cache-002
• S3 (US-Standard Region): fee-media (media storage), fee-misc (backups)
• SES: internal email
• Other services: Domain: Google Domains; Performance: New Relic Pro; Analytics & content recommendations: Parse.ly, Clicky, Google Analytics; Uptime: Pingdom; Marketing email: MailChimp; Code: BitBucket
6. High-level objectives (by priority)
1. Front end uptime should be 99.8%
2. Back Office (admin) uptime should be 95%
3. Keep personal information (payments, admin access) secure
4. Stay up during traffic surges up to 6X weekly peak
5. Keep budget under $1,600/month
6. Ongoing development should not impact uptime.
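To make the uptime targets concrete, here is a small sketch that converts each percentage into a monthly downtime budget. It assumes a 30-day month, which the slide does not specify; `downtime_budget_minutes` is a hypothetical helper name.

```python
# Convert an uptime target into an allowed-downtime budget.
# Assumption: a 30-day month; the slide does not say how uptime is measured.

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def downtime_budget_minutes(uptime_pct: float,
                            period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Return the minutes of downtime allowed per period for a given uptime %."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    return period_minutes * (100 - uptime_pct) / 100


if __name__ == "__main__":
    for name, target in [("Front end", 99.8), ("Back office", 95.0)]:
        minutes = downtime_budget_minutes(target)
        print(f"{name}: {target}% -> {minutes:.1f} min/month ({minutes / 60:.1f} h)")
```

Under that assumption, 99.8% leaves about 86 minutes of front-end downtime per month, while 95% leaves 36 hours for the back office, which is why the two tiers can be engineered so differently.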
7. Design strategy
1. All components should be redundant and self-healing
2. Pay for normal load while supporting surges
3. Outsource infrastructure: let AWS cloud be responsible for as much
infrastructure as feasible
4. Automate all backup processes
5. Semi-automated disaster recovery: site should recover from most
outages automatically, when cost of doing so is reasonable
6. Change management integrated into architecture via imaging and
cache keys
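One way to read "change management via cache keys" is to embed the release version in every cache key, so each deployment starts with a fresh keyspace instead of serving stale entries. The sketch below assumes that interpretation; `RELEASE_VERSION`, `cache_key`, `DictCache`, and `get_rendered_page` are hypothetical names, and `DictCache` is an in-memory stand-in for a Redis client.

```python
# Sketch: version-prefixed cache keys, so a release "invalidates" old entries
# without flushing the cache. RELEASE_VERSION and the key layout are hypothetical.

RELEASE_VERSION = "2017.03.1"  # hypothetical; e.g. stamped in by the build


def cache_key(namespace: str, item_id: str, version: str = RELEASE_VERSION) -> str:
    """Build a cache key that changes whenever the release version changes."""
    return f"v{version}:{namespace}:{item_id}"


class DictCache:
    """In-memory stand-in for a Redis-like get/set cache."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def get_rendered_page(cache, page_id, render, version=RELEASE_VERSION):
    """Return the cached render for this version, or render and cache it."""
    key = cache_key("page", page_id, version)
    value = cache.get(key)
    if value is None:  # miss: first request since this release
        value = render(page_id)
        cache.set(key, value)
    return value
```

With redis-py, the same `get`/`set` calls work against a `redis.Redis()` client; passing a TTL (for example `set(key, value, ex=86400)`) lets entries from previous versions expire on their own.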
8. Architecture Summary
• Front-end is load balanced, scalable, and self-healing
• Backend is isolated from front-end
• Automatic snapshots for servers, transaction logging for DB
• Rely on AWS services for all infrastructure services
• Combine functionality within servers to save costs
• Massively over-allocate capacity using market-based pricing
• Development process integrated with production architecture
13. Why CloudFlare is awesome
• Flat-rate CDN service (supports CDN daisy-chaining)
• Free, powerful SSL
• Active, crowd-sourced firewall
• Powerful DNS (CNAME flattening, much more)
• HTML and image minification
• Much more!
• Saves FEE.org thousands of dollars per year in bandwidth costs
• Starts at $20/month
16. Elastic Load Balancer
• Point DNS at CNAME of load balancer
• Point destination to specific VMs or use auto-scaling rules
• Set destination by path pattern with Application Load Balancer
• Use TCP, HTTP, or SSL for health checks
• We use a custom health check endpoint which verifies application
uptime & DB connectivity
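A custom health check endpoint like the one described above can be sketched as follows. FEE.org's real endpoint runs in its .NET application; this Python version only illustrates the idea, and the `/health` path, the `SELECT 1` probe, and the use of `sqlite3` as a stand-in database are all assumptions.

```python
# Sketch of a load-balancer health check: 200 only if the app can reach its
# database, otherwise 503 so the ELB takes the node out of rotation.
# /health, SELECT 1, and sqlite3 (standing in for SQL Server) are assumptions.
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_health(connect):
    """Return (status_code, body). `connect` opens a DB connection."""
    try:
        conn = connect()
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
    except Exception as exc:  # any DB failure marks the node unhealthy
        return 503, f"DB check failed: {exc}"
    return 200, "OK"


def make_handler(connect):
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            status, body = check_health(connect)
            payload = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return HealthHandler


if __name__ == "__main__":
    connect = lambda: sqlite3.connect("app.db")  # hypothetical DB file
    HTTPServer(("0.0.0.0", 8080), make_handler(connect)).serve_forever()
```

The load balancer's HTTP health check would then target `/health`; because the check touches the database, a node that has lost DB connectivity drops out of rotation even though its web server is still answering.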
17. RDS: Relational Database Service
• FEE.org uses SQL Server Web
• Other sites use Amazon Aurora, which can be 10X faster than MySQL (with proper tuning, in specific scenarios)
• Use snapshots to create dev instances of DB
• Schedule configuration changes for off-hours
• Be aware that RDS SQL Server restricts most admin actions. There are special stored procedures for some actions, such as renaming a DB or bringing a DB online (but not taking it offline!)
• Backup restore not allowed: use SQL Database Migration Wizard to restore DB
• Use burstable SQL Server instances, especially for dev DB
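Creating a dev instance from a snapshot can be scripted. Here is a minimal sketch assuming a boto3 RDS client (`boto3.client("rds")`), using its `create_db_snapshot`, `get_waiter("db_snapshot_available")`, and `restore_db_instance_from_db_snapshot` calls. The identifiers `feedb2` and `fee-dev2` come from the architecture diagram; the snapshot name and the `db.t2.medium` burstable class are hypothetical.

```python
# Sketch: refresh the dev DB from a snapshot of the live DB via boto3's RDS
# client. Snapshot names and the db.t2.medium class are hypothetical.


def refresh_dev_from_live(rds, live_id, dev_id, snapshot_id,
                          instance_class="db.t2.medium"):
    """Snapshot the live instance and restore it as a new dev instance.

    `rds` is a boto3 RDS client. The `dev_id` instance must not already
    exist; delete or rename the previous dev instance first.
    """
    rds.create_db_snapshot(DBSnapshotIdentifier=snapshot_id,
                           DBInstanceIdentifier=live_id)
    # Block until the snapshot can be restored from.
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snapshot_id)
    response = rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=dev_id,
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceClass=instance_class,  # burstable class keeps dev cheap
    )
    return response["DBInstance"]["DBInstanceIdentifier"]
```

Passing the client in, rather than creating it inside the function, keeps the sketch testable with a fake and lets the same code run against different accounts or regions.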
18. S3: Media storage + backup
• FEE.org uses S3 as a media (Image/PDF/EPUB/MP4/MP3) store
• Only originals are stored in S3, thumbnails are stored on server
• Amazon Web Services S3 IFileSystem provider for Umbraco + a
custom caching layer
• XSLT transforms to specify production/dev buckets
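The split between originals in S3 and thumbnails on the server implies a local cache that is filled on first request. The real site uses an Umbraco file-system provider plus a custom layer; this Python sketch only illustrates the pattern, and `fetch_original`, `make_thumbnail`, and the cache file naming are hypothetical hooks.

```python
# Sketch of a local thumbnail cache in front of S3 originals.
# fetch_original / make_thumbnail are hypothetical hooks (e.g. an S3 read and
# an image resizer); the flat file-naming scheme is an assumption.
import os
import tempfile


def thumbnail_path(cache_dir, key, width):
    """Map an S3 key and width to a flat file name in the local cache."""
    return os.path.join(cache_dir, f"{width}w_" + key.replace("/", "_"))


def get_thumbnail(key, width, cache_dir, fetch_original, make_thumbnail):
    """Return thumbnail bytes, generating and caching them on first request."""
    path = thumbnail_path(cache_dir, key, width)
    if os.path.exists(path):  # cache hit: no S3 round trip
        with open(path, "rb") as f:
            return f.read()
    original = fetch_original(key)           # only originals live in S3
    thumb = make_thumbnail(original, width)  # derived files stay on the server
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temp file, then move it into place so concurrent readers
    # never see a half-written thumbnail.
    with tempfile.NamedTemporaryFile(dir=cache_dir, delete=False) as tmp:
        tmp.write(thumb)
    os.replace(tmp.name, path)
    return thumb
```

With boto3, `fetch_original` could be `lambda k: s3.get_object(Bucket="fee-media", Key=k)["Body"].read()`. Because thumbnails are derived data, losing a server's cache costs only regeneration time, not content.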
19. Spot Instances
• Instances run only while the market price is below the bid price
• In practical terms, Spot means roughly 80% savings over hourly (on-demand) instances
• Supports auto scaling. Use it!
• Set the bid price equal to the hourly instance price and get 100% availability (so far)
• Specify a range of qualified instance types (including previous generations) to maximize the chance of availability.
• FEE.org runs the master server as an xlarge hourly instance and read-only nodes as 2xlarge Spot instances. This guarantees at least one cheap(er) instance even if Spot prices spike or the Spot instances refresh at the same time.
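The bidding advice above (bid the hourly price, accept several instance types) maps onto a Spot Fleet request. Here is a minimal sketch that builds the `SpotFleetRequestConfig` dict accepted by boto3's `ec2.request_spot_fleet`. The AMI ID, role ARN, instance types, and price are placeholders, not FEE.org's actual settings.

```python
# Sketch: a Spot Fleet request that bids the on-demand hourly price and accepts
# several instance types. All IDs, ARNs, types, and prices are placeholders.


def spot_fleet_config(ami_id, fleet_role_arn, instance_types,
                      on_demand_price, target_capacity=2):
    """Return a SpotFleetRequestConfig dict bidding at the on-demand price."""
    return {
        "IamFleetRole": fleet_role_arn,
        # Maximum price per instance-hour, as a string; you pay the market price.
        "SpotPrice": f"{on_demand_price:.4f}",
        "TargetCapacity": target_capacity,
        # Launch from whichever qualifying pool is currently cheapest.
        "AllocationStrategy": "lowestPrice",
        "LaunchSpecifications": [
            {"ImageId": ami_id, "InstanceType": itype} for itype in instance_types
        ],
    }
```

The result would be passed as `boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=cfg)`. Without `WeightedCapacity` on each launch specification, every instance counts as one unit of capacity regardless of size, so mixing, say, xlarge and 2xlarge types changes how much compute "2" really means.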
25. When to auto-scale?
• Instances that don’t take very long to spin up
• Individual instances don’t use too many resources
• Version release process is automated (such as with Elastic Beanstalk)
• Releases are infrequent, or the cost of snapshot management is minimal
• Large difference between minimum and peak traffic
• Unpredictable traffic trends
27. Why doesn’t FEE.org auto-scale?
• Minimum instance count for high availability is 3
• Peak traffic (> 600 concurrent users) can be handled by 2 instances
• Each instance requires 16GB ram and 8 CPUs for optimal performance
• Release process not fully automated & no full-time developers (we do not use Elastic Beanstalk and have to make manual snapshots post-release)
• Can spin up new instances within minutes with Spot + New Relic
Alerts
• Will probably consider auto-scaling when we have more process
maturity (fully automated release process)
30. Elastic Beanstalk
• Upload DLLs to an AWS git repository; AWS does the rest
• AWS handles code deployment, load balancing, auto-scaling, health monitoring, etc.
• Environment configuration with web.config XSLT transforms and an ACL permissions file (wpp.targets)
• FREE service: only pay for resources used
• If using .NET, works with most 100% managed-code projects
• GUI integrated with Visual Studio
41. Thinking about IaaS/SaaS Pricing Strategy
• Cloud services almost always cost much more per compute resource than colocation or dedicated hardware
• Cost savings come from matching infrastructure to demand and outsourcing management services
• Amazon & Azure are some of the most costly cloud services per resource, but are recommended for most scenarios because of the productivity benefits from their breadth and depth of managed services.
42. Cloud Services Pricing Summary
• Each cloud service provider has a unique bundle of services and pricing
model. Different providers have unique price advantages for different
products. Provider selection should be based on a typical application mix
for our business.
• Azure may have a price advantage over Amazon when using cloud-optimized architecture based on Microsoft products.
• Softlayer, Digital Ocean, and Google Compute all have better prices than both Amazon and Azure for various scenarios, especially Windows VMs, but offer fewer services.
• Cost is just one of many criteria for choosing a provider! No provider has a
decisive advantage for all scenarios.
43. Pricing Recommendations
1. Use the pricing calculator offered by each provider to estimate total application cost for specific applications. Keep in mind that cloud-optimized architectures may have a much lower cost (for example, compute functions instantiated on demand, auto-scaling, etc.).
2. Do not make pricing the primary consideration in provider selection unless the cost difference is critical to business requirements. In general, major service and quality differences between providers are more important than pricing considerations.
3. Developing deep expertise and service integration with a cloud
provider is usually more important than cost differences for
individual projects.
44. Saving Money with AWS
• Reserved Instances
• Spot Instances
• Burstable Instances
• Scheduled Instances (using AWS or third party tool)
• This can be used with any AWS VM service: EC2, RDS, ElastiCache, etc.
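Off-hours scheduling boils down to "start during working hours, stop otherwise". FEE.org uses AutomatiCloud for this; the Python sketch below is only an illustration using boto3's EC2 `start_instances` and `stop_instances` calls. The weekday 07:00 to 20:00 window and the instance IDs are hypothetical.

```python
# Sketch of off-hours scheduling with boto3's EC2 client. The working-hours
# window and instance IDs are hypothetical, not FEE.org's real schedule.
from datetime import datetime


def in_working_hours(now: datetime, start_hour: int = 7, end_hour: int = 20) -> bool:
    """True on weekdays between start_hour (inclusive) and end_hour (exclusive)."""
    return now.weekday() < 5 and start_hour <= now.hour < end_hour


def apply_schedule(ec2, instance_ids, now: datetime) -> str:
    """Start the instances during working hours and stop them otherwise."""
    if in_working_hours(now):
        ec2.start_instances(InstanceIds=instance_ids)
        return "started"
    ec2.stop_instances(InstanceIds=instance_ids)
    return "stopped"
```

Run hourly from a scheduler with `boto3.client("ec2")`, this suits dev and admin machines. Only EBS-backed instances can be stopped and started this way; instance-store volumes lose their data.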
45. AWS Instance type selection criteria
• Use the latest generation of instance types (e.g., c4, m4, t2)
• Use burstable instances for applications with high daily variability
• Evaluate whether applications are CPU, memory, or IO intensive and
select the appropriate type – scale up your particular bottleneck
• For applications with consistent and predictable load, prefer larger instances; for applications with unpredictable load, auto-scale horizontally with more burstable instances
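Whether a burstable instance fits depends on its CPU-credit budget: one credit is one vCPU at 100% for one minute. The worked example below uses AWS's published T2 earn rates (6, 12, and 24 credits per hour for t2.micro, small, and medium) as assumptions to check against current documentation; the usage levels are illustrative.

```python
# Worked example of the T2 CPU-credit model (1 credit = 1 vCPU at 100% for
# 1 minute). Earn rates are AWS's published T2 figures; verify current docs.

CREDITS_PER_HOUR = {"t2.micro": 6, "t2.small": 12, "t2.medium": 24}
VCPUS = {"t2.micro": 1, "t2.small": 1, "t2.medium": 2}


def baseline_pct_of_one_vcpu(itype: str) -> float:
    """Sustained CPU (as % of a single vCPU) that the earn rate pays for."""
    return CREDITS_PER_HOUR[itype] / 60 * 100


def burst_minutes(itype: str, balance: float, utilization: float = 1.0) -> float:
    """Minutes a credit balance lasts at `utilization` across all vCPUs."""
    spend_per_min = VCPUS[itype] * utilization   # credits burned per minute
    earn_per_min = CREDITS_PER_HOUR[itype] / 60  # credits earned per minute
    net = spend_per_min - earn_per_min
    if net <= 0:
        return float("inf")  # at or below baseline: sustainable indefinitely
    return balance / net
```

For example, a t2.micro (10% baseline) with a full balance of 144 credits can run flat out for 144 / 0.9 = 160 minutes before dropping back to baseline, which is why these instances suit workloads with short daily peaks rather than sustained load.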
46. Buying a reserved instance
• Unsure about your needs? Get a convertible instance! You can move up or across.
• You can sell them! (I haven’t tried this)
• Best savings/risk is usually with the partial payment option.
47. S3 Reduced Redundancy Store & Glacier
• “Only” duplicated across 2 facilities
• 0.01% expected annual object loss, i.e. 99.99% durability (“400 times the durability of typical HDD”)
• About 25% cheaper
48. JPEGmini
• Background service triggered by an event handler on the media-upload-completed method
• 412 GB × $0.0314 per GB-month × 12 months ≈ $155/year saved on storage alone
• Runs as an AWS Marketplace service ($39/month) or as a desktop app
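The "background service via an upload event handler" pattern keeps the upload request fast: the handler only enqueues the file, and a worker does the slow optimization. The Python sketch below illustrates that pattern; `BackgroundOptimizer`, `on_media_upload_completed`, and the `optimize_image` hook are hypothetical names, not JPEGmini's API.

```python
# Sketch: the upload-completed handler only enqueues work; a background thread
# runs the slow image optimization. optimize_image is a hypothetical hook.
import queue
import threading


class BackgroundOptimizer:
    def __init__(self, optimize_image):
        self._jobs = queue.Queue()
        self._optimize = optimize_image
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_media_upload_completed(self, path):
        """Event handler: return immediately so the upload isn't slowed down."""
        if path.lower().endswith((".jpg", ".jpeg")):
            self._jobs.put(path)

    def _run(self):
        while True:
            path = self._jobs.get()
            try:
                self._optimize(path)
            except Exception as exc:  # keep the worker alive on bad files
                print(f"optimization failed for {path}: {exc}")
            finally:
                self._jobs.task_done()

    def wait_idle(self):
        """Block until every queued file has been processed (useful in tests)."""
        self._jobs.join()
```

A single worker thread serializes the optimizer calls, which caps CPU use on the web server; an in-process queue is lost on restart, so a production version might persist pending jobs.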
49. Summary: FEE.org $ saving strategy:
• 2 reserved burstable RDS databases
• 1 reserved admin EC2 VM
• 2 Spot EC2 front-end server instances
• AutomatiCloud EC2 scheduling for off-hours (and backup automation)
• S3 Reduced redundancy store for non-critical backups and dev data
• CloudFlare CDN
• JPEGmini image optimization background service
51. FEE Development Process
1. Post job on UpWork.com
2. Hire freelancer
3. Developer commits work to git
4. Deploy to dev environment
5. Test work
6. Create pull request for release
7. Release build
8. Staged deployment to production servers
54. Orientation
• Google Doc with:
• Architectural overview
• FEE.org development process
• Instructions to setup localhost environment
• Review of tools used
• Relevant people involved & their contact info
• Address of FEE-Dev Skype group
• Code Quality Expectations
55. Development Environment Setup
1. Checkout git repository
2. “Just hit F5”
• NuGet for all dependencies
• XSLT for non-local environments
• Dev DB hosted in cloud
• Optional: Install Redis on localhost for better performance
58. Staged, Staggered Deployment
• xcopy to each production server
• ELB takes server out of production within 30 seconds
• Stagger release by ~5 minutes to let each application pool warm up
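The staged, staggered release can be sketched as a loop that copies the build to one server at a time and pauses between servers, so the others keep serving traffic while each app pool warms up. FEE.org uses xcopy; here `copy_build` is a hypothetical hook, and the server names and 5-minute delay follow the slides.

```python
# Sketch of a staggered deploy: copy to one server, wait, then the next.
# copy_build is a hypothetical hook (FEE.org uses xcopy for the copy step).
import time


def staggered_deploy(servers, copy_build, stagger_seconds=5 * 60, sleep=time.sleep):
    """Deploy to each server in turn, waiting `stagger_seconds` between them."""
    deployed = []
    for i, server in enumerate(servers):
        copy_build(server)
        deployed.append(server)
        if i < len(servers) - 1:  # no need to wait after the last server
            sleep(stagger_seconds)
    return deployed
```

Injecting `sleep` makes the loop testable without real delays. A natural extension would be to poll each server's health check before moving on instead of relying on a fixed delay.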
62. Aside: LAMP deployment strategy (highly available WordPress)
• Commit hooks on master branch in Bitbucket git repository
• Hooks call a deploy.php script, which runs git pull in the dev environment
• Release PHP code with git pull on production
• Image the staging server (AMI) and deploy the Spot fleet from that AMI
• Use S3 media storage provider and Redis cache, so no persistent data lives on Spot instances
• EasyEngine for easy nginx configuration; etckeeper to back up/sync configuration files
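The commit-hook flow (push to master, then `git pull` on the server) is done with a deploy.php script in the original setup. Here is the same logic sketched in Python. The payload shape (`push.changes[].new.type` / `name`) follows Bitbucket Cloud's push webhook as I understand it, so verify it against a real delivery; `pushed_branches` and `handle_push` are hypothetical names.

```python
# Sketch of the commit-hook deploy: on a Bitbucket push webhook, run
# `git pull` only if master changed. Payload shape is an assumption to verify.
import json
import subprocess


def pushed_branches(payload):
    """Return names of branches updated by a Bitbucket push event payload."""
    names = []
    for change in payload.get("push", {}).get("changes", []):
        new = change.get("new") or {}  # "new" is null when a branch is deleted
        if new.get("type") == "branch":
            names.append(new.get("name"))
    return names


def handle_push(body, repo_dir, branch="master"):
    """Pull the latest code into repo_dir if `branch` was pushed."""
    payload = json.loads(body)
    if branch not in pushed_branches(payload):
        return False
    subprocess.run(["git", "-C", repo_dir, "pull", "--ff-only"], check=True)
    return True
```

Using `--ff-only` makes the pull fail loudly instead of creating a merge commit on the server if someone edited files there. The endpoint that calls `handle_push` should only accept requests from Bitbucket.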