This session drills deep into the Amazon S3 technical best practices that help you maximize storage performance for your use case. We provide real-world examples and discuss the impact of object naming conventions and parallelism on Amazon S3 performance, and describe the best practices for multipart uploads and byte-range downloads.
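The abstract ties performance to parallelism, multipart uploads, and byte-range downloads. Both techniques begin by cutting an object into byte ranges. A minimal sketch of only that planning step, assuming an 8 MiB part size (a tuning choice, not an S3 requirement). `plan_ranges` and `range_header` are hypothetical helpers; the 10,000-part cap and the 5 MiB minimum for every part except the last are S3 multipart limits.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000  # S3 limit on parts per multipart upload


def plan_ranges(total_size: int, part_size: int = 8 * MIB) -> list[tuple[int, int]]:
    """Return inclusive (start, end) byte offsets covering total_size bytes.

    For a multipart upload every part except the last must be at least 5 MiB;
    ranged GETs have no such minimum. The 8 MiB default is an assumption.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    if total_size <= 0:
        return []
    # Double the part size until the object fits in MAX_PARTS parts.
    while (total_size + part_size - 1) // part_size > MAX_PARTS:
        part_size *= 2
    return [
        (start, min(start + part_size, total_size) - 1)
        for start in range(0, total_size, part_size)
    ]


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    size = 20 * MIB + 123  # hypothetical object size
    for start, end in plan_ranges(size):
        print(range_header(start, end))
```

Each `range_header` value could be passed as the `Range` argument of boto3's `get_object`, with the ranges fetched concurrently; for an upload, the same offsets would slice the file into bodies for `upload_part`.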
3. •Architecture
–Choosing a region
–Building a naming scheme
–Considering LISTs
•Optimizing PUTs
–Multipart upload
–Demo
•Optimizing GETs
–Using CloudFront
–Range-based GETs
–Demo
•Customer Case
–BigDataCorp
6. Request Rate and Performance Considerations
http://amzn.to/18oF5LC
TIP
11. [Diagram: keys 1, 2, ..., N distributed across several partitions]
12. •Store objects as a hash of their name
–add the original name as metadata
–“deadmau5_mix.mp3” → 0aa316fb000eae52921aab1b4697424958a53ad9
•Prepend a short hash to the key name
–0aa3-deadmau5_mix.mp3
•Prefix the key name with the epoch time, digits reversed
–5321354831-deadmau5_mix.mp3
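The naming schemes above can be sketched as plain functions. Assumptions: SHA-1 stands in for the unnamed hash (the slide's 40-hex-digit example has SHA-1's length, but the slide does not say which function produced it), the short prefix is 4 characters as in `0aa3-`, and the metadata key `original-name` is hypothetical. The reversed-epoch case reproduces the slide's example: 1384531235 written backwards is 5321354831.

```python
import hashlib


def hashed_key(name: str) -> tuple[str, dict[str, str]]:
    """Use the full hex digest of the name as the key (SHA-1 is an assumption).

    The original name travels as user metadata so it can be recovered later.
    """
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return digest, {"original-name": name}  # metadata key is hypothetical


def short_hash_prefixed_key(name: str, prefix_len: int = 4) -> str:
    """Prepend the first prefix_len hex characters of the hash to the name."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{name}"


def reversed_epoch_key(epoch_seconds: int, name: str) -> str:
    """Prefix the name with the epoch timestamp written backwards.

    The fastest-changing digit moves to the front, so consecutive uploads
    no longer share a long common prefix.
    """
    return f"{str(epoch_seconds)[::-1]}-{name}"


def distinct_prefixes(keys: list[str], width: int = 2) -> int:
    """Count distinct leading substrings, a rough proxy for key spread."""
    return len({k[:width] for k in keys})


if __name__ == "__main__":
    print(reversed_epoch_key(1384531235, "deadmau5_mix.mp3"))
    # -> 5321354831-deadmau5_mix.mp3 (the slide's example)

    sequential = [f"{1384531235 + i}-log.gz" for i in range(100)]
    reversed_keys = [reversed_epoch_key(1384531235 + i, "log.gz") for i in range(100)]
    print(distinct_prefixes(sequential), distinct_prefixes(reversed_keys))
    # -> 1 100
```

The `distinct_prefixes` check only illustrates why these schemes were recommended: 100 consecutive timestamps all begin with `13`, while the reversed timestamps spread over 100 different two-character prefixes. With boto3, the metadata dict returned by `hashed_key` could be passed as `Metadata=` to `put_object`.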
36. •Maestro (Reserved Instance)
•List of crawl URLs
•Main workers (Spot Instances): execute crawling and process data
•Secondary workers, i.e. queue listeners (Spot Instances): reprocess data, query additional services, store data on MongoDB
•Secondary work queues: processed data
•MongoDB cluster
•Command and Control Queue
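The diagram appears to describe a queue-driven pipeline: the Maestro feeds a list of crawl URLs to main workers, which crawl and process, while secondary workers listen on work queues and persist results to MongoDB. A minimal in-process sketch of that flow, assuming `queue.Queue` as a stand-in for the work queues, a dict in place of the MongoDB cluster, and a hypothetical `crawl` stub. Instance types (Reserved vs. Spot) are not modeled, and the Command and Control Queue is reduced to a shutdown sentinel.

```python
import queue
import threading

# Hypothetical stand-ins: queue.Queue for the work queues, a dict for the
# MongoDB cluster, and a stub instead of real crawling.
STOP = object()  # sentinel that tells a worker to exit


def crawl(url: str) -> str:
    """Stub for 'execute crawling and process data'."""
    return f"processed:{url}"


def main_worker(urls: queue.Queue, processed: queue.Queue) -> None:
    # Pull URLs until the sentinel arrives; push results to the secondary queue.
    while (url := urls.get()) is not STOP:
        processed.put((url, crawl(url)))


def secondary_worker(processed: queue.Queue, store: dict, lock: threading.Lock) -> None:
    # Queue listener: reprocess each result (hypothetical enrichment) and store it.
    while (item := processed.get()) is not STOP:
        url, data = item
        with lock:
            store[url] = f"{data}|enriched"


def run(urls: list[str], n_main: int = 2, n_secondary: int = 2) -> dict:
    url_q: queue.Queue = queue.Queue()
    processed_q: queue.Queue = queue.Queue()
    store: dict = {}
    lock = threading.Lock()

    mains = [threading.Thread(target=main_worker, args=(url_q, processed_q))
             for _ in range(n_main)]
    secondaries = [threading.Thread(target=secondary_worker, args=(processed_q, store, lock))
                   for _ in range(n_secondary)]
    for t in mains + secondaries:
        t.start()

    # Maestro role: enqueue the crawl list, then one sentinel per main worker.
    for url in urls:
        url_q.put(url)
    for _ in mains:
        url_q.put(STOP)
    for t in mains:
        t.join()

    # Every result is queued now, so the secondary workers can be stopped.
    for _ in secondaries:
        processed_q.put(STOP)
    for t in secondaries:
        t.join()
    return store


if __name__ == "__main__":
    print(run(["http://example.com/a", "http://example.com/b"]))
```

Shutting down in two phases (main workers first, then listeners) ensures that no processed result is still waiting in a queue when the store is returned.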