NASA imaging satellites deliver gigabytes of images to Earth every day. Mapbox uses AWS to process that data in real time and build the most complete, seamless satellite map of the world. Learn how Mapbox uses Amazon S3 and Amazon SQS to stream data from NASA into clusters of EC2 instances running a clever algorithm that stitches images together in parallel. This session includes an in-depth discussion of high-volume storage with Amazon S3, cost-efficient data processing with Amazon EC2 Spot Instances, reliable job orchestration with Amazon SQS, and demand resilience with Auto Scaling.
3. Amazon EC2
Offers low-cost, scalable computing
Amazon S3
Data storage for input data and processed output
Auto Scaling
Controls the number of worker EC2 instances
Amazon SQS
Manages the units of work
46. Processing requirements
•Massive storage for raw and processed data
•Massive computing that we can spin up and down in minutes
•Everything must be fully automated
•Low cost
47. Amazon EC2
Low-cost, scalable computing
Amazon S3
Data storage for input data and processed output
Auto Scaling
Controls the number of worker EC2 instances
Amazon SQS
Manages the queue of work
48. NASA Server
Source S3 Bucket
Watcher Instance
Auto Scaling group
SQS Queue
Worker Instances
Destination S3 Bucket
Processed Outputs
49. Watcher EC2 instance
•Copies raw data files from the NASA server to our S3 bucket
•Splits each file into smaller parts and sends them to Amazon SQS as messages
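The split-and-enqueue step could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the talk: the message fields (`bucket`, `key`, `part`, `parts`) and both function names are hypothetical, and `sqs` is assumed to be a boto3 SQS client or any object with a compatible `send_message` method.

```python
import json


def build_messages(bucket, key, parts):
    """Describe one raw file as `parts` independent units of work."""
    return [
        json.dumps({"bucket": bucket, "key": key, "part": i, "parts": parts})
        for i in range(parts)
    ]


def enqueue(sqs, queue_url, messages):
    """Send each message body to the queue and return how many were sent."""
    for body in messages:
        sqs.send_message(QueueUrl=queue_url, MessageBody=body)
    return len(messages)
```

Each message only points at the raw file in S3 and names a part, so the messages stay small and the workers fetch the actual data themselves.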
50. Why stash raw data on Amazon S3?
•Extremely low latency between Amazon S3 and Amazon EC2 in the same AWS region
•Don’t want to hammer NASA servers with requests from our hundreds of workers
•Easy to reprocess data later
51. Messages for Amazon SQS
•Take a big job and split it up into smaller parts
•Shorter is better: a few minutes per message is ideal
•Messages need to be repeatable in case of failure
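One way to keep a message repeatable is to derive the output location from the message alone, so that processing it twice writes to the same place. The sketch below assumes a hypothetical JSON message with `key`, `part`, and `parts` fields and a hypothetical `processed/` prefix; none of these names come from the talk.

```python
import json


def output_key(body):
    """Derive the destination key from the message body alone.

    The same message always maps to the same key, so a retry after a
    failure overwrites the earlier (possibly partial) result instead of
    creating a duplicate.
    """
    msg = json.loads(body)
    return "processed/{key}/part-{part:04d}-of-{parts:04d}".format(**msg)
```

This matters because a standard SQS queue delivers each message at least once, so a worker has to tolerate seeing the same message again.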
54. NASA Server
Source S3 Bucket
Watcher Instance
Auto Scaling group
SQS Queue
Worker Instances
Destination S3 Bucket
Processed Outputs
55. Worker EC2 instance
Source S3 Bucket
SQS Queue
Destination S3 Bucket
•Grab message from the queue
•Download raw data from S3
•Run software to process the data
•Deliver processed data to S3
•Delete message from the queue to mark it complete
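The worker steps above can be sketched as a single function. It is a minimal illustration, not the production worker: `sqs` and `s3` are assumed to be boto3 clients (or compatible objects), the JSON message fields `bucket` and `key` are hypothetical, and `process` stands in for whatever software does the real work.

```python
import json
import os
import tempfile


def work_once(sqs, s3, queue_url, dest_bucket, process):
    """Handle at most one message; return True if one was completed."""
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20
    )
    messages = resp.get("Messages", [])
    if not messages:
        return False  # nothing to do right now

    message = messages[0]
    job = json.loads(message["Body"])  # hypothetical fields: bucket, key

    with tempfile.TemporaryDirectory() as tmp:
        raw = os.path.join(tmp, "raw")
        out = os.path.join(tmp, "out")
        s3.download_file(job["bucket"], job["key"], raw)
        process(raw, out)  # hypothetical processing step
        s3.upload_file(out, dest_bucket, "processed/" + job["key"])

    # Delete only after the upload succeeded. If anything above raised,
    # the message becomes visible again later and another worker retries it.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return True
```

Deleting the message last is the design choice that makes failures safe: a worker that dies mid-job never acknowledges the message, so the queue hands it out again.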
56. NASA Server
Source S3 Bucket
Watcher Instance
Auto Scaling group
SQS Queue
Worker Instances
Destination S3 Bucket
Processed Outputs
57. Worker Auto Scaling Group
•Capacity is controlled by the number of messages in the queue
•Spikes are no problem: more instances come online automatically
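The relationship between queue depth and capacity can be illustrated with a small function. The slides do not say how the scaling rule was configured, so the ratio and limits below are hypothetical; in a real deployment this logic would more likely live in CloudWatch alarms on the queue's `ApproximateNumberOfMessagesVisible` metric that trigger Auto Scaling policies.

```python
import math


def desired_workers(queue_depth, messages_per_worker=10, min_workers=0, max_workers=200):
    """Map the number of waiting messages to a worker count, clamped to limits."""
    wanted = math.ceil(queue_depth / messages_per_worker)
    return max(min_workers, min(max_workers, wanted))
```

An empty queue scales the group down to the minimum, and a sudden backlog scales it up until the maximum is reached.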
67. Spot market
•Bid on unused Amazon EC2 capacity and get a discount
•Instance runs as long as your bid price is higher than the market price
•If the market price spikes, your instances are terminated immediately
•Perfect for big data processing jobs that aren’t on a critical schedule
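The bid rule above can be made concrete with a toy simulation. The function name and the price series are hypothetical, and the model is simplified to one market price per hour.

```python
def hours_before_termination(bid, hourly_market_prices):
    """Count the hours an instance runs before the market price reaches the bid."""
    hours = 0
    for price in hourly_market_prices:
        if price >= bid:
            break  # bid is no longer higher than the market price: terminated
        hours += 1
    return hours
```

For example, with a bid of 0.10 and hourly prices of 0.03, 0.04, 0.25, 0.03, the instance runs for two hours and is lost when the price spikes in the third. Work that is split into short, repeatable messages loses only the job in flight when that happens.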
88. Distribution requirements
•Massive storage for processed data
•HTTP server capacity that we can spin up and down in minutes
•Global distribution for speed and redundancy
•Everything must be fully automated
•Low cost
89. Amazon EC2
Offers low-cost, scalable computing
Amazon S3
Data storage for input data and processed output
Auto Scaling
Controls the number of worker EC2 instances
Amazon SQS
Manages the units of work
91. Amazon EC2
Offers low-cost, scalable computing
Amazon S3
Data storage for input data and processed output
Auto Scaling
Controls the number of worker EC2 instances
Elastic Load Balancing
Distributes web traffic between multiple EC2 instances
92. NASA Server
Source S3 Bucket
Watcher Instance
Auto Scaling group
SQS Queue
Worker Instances
Destination S3 Bucket
Processed Outputs
93. Processed Outputs
S3 Bucket (Virginia)
S3 Bucket (São Paulo)
S3 Bucket (Ireland)
S3 Bucket (Tokyo)
S3 Bucket (California)
S3 Bucket (Singapore)
S3 Bucket (Sydney)
S3 Bucket (Oregon)
S3 Bucket (Frankfurt)
95. region
S3 Bucket
Auto Scaling group
Server Instances
Elastic Load Balancing
104. Instance reservations
•Buy computing up front for long-running instances
•Large upfront charge in exchange for low hourly usage cost
•Can save 60% or more over the course of a year
•Perfect for critical instances that need to stay online
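The trade-off between the upfront charge and the lower hourly rate comes down to simple arithmetic, sketched below. All prices in the example are made up for illustration and are not actual AWS rates; the function names are hypothetical.

```python
def yearly_cost(hourly_rate, upfront=0.0, hours=24 * 365):
    """Total cost of running one instance for `hours` hours."""
    return upfront + hourly_rate * hours


def savings_fraction(on_demand_hourly, reserved_upfront, reserved_hourly, hours=24 * 365):
    """Fraction saved by reserving instead of paying the on-demand rate."""
    on_demand = yearly_cost(on_demand_hourly, hours=hours)
    reserved = yearly_cost(reserved_hourly, reserved_upfront, hours)
    return 1 - reserved / on_demand
```

With a made-up on-demand rate of 0.10 per hour against 200 upfront plus 0.02 per hour, an instance that runs all year saves roughly 57%. The same reservation on an instance that runs only 1,000 hours costs more than on-demand, which is why reservations suit instances that stay online.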
105. Reservations about reservations
•Took us over a year to commit
•Changing infrastructure: splitting applications, new instance types
106. What made us eventually buy
•Easily swap reservations for instances within the same family
•Sell unused reservations on the secondary market
•Cloudability: Great reservation recommendation tool