© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Tom Johnston, S3 Product Management, AWS
Tom Fuller, Senior Solutions Architect, AWS
John Elliott, Infrastructure Engineering, Pinterest
April 19, 2017
Deep Dive on Object Storage
Amazon S3 and Amazon Glacier
Cloud Data Migration
• Direct Connect
• Snow* data transport family
• 3rd Party Connectors
• Transfer Acceleration
• Storage Gateway
• Amazon Kinesis Firehose
The AWS Storage Portfolio
• Object: Amazon S3, Amazon Glacier
• Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
• File: Amazon EFS
What to Expect from the Session
• Pick the right storage class for your use cases
• Automate management tasks
• Best practices to optimize S3 performance
• Tools to help you manage storage
Data transfer into Amazon S3
• AWS Direct Connect
• AWS Snowball
• AWS Snowball Edge
• AWS Snowmobile
• ISV Connectors
• Amazon Kinesis Firehose
• S3 Transfer Acceleration
• AWS Storage Gateway
Amazon Storage Partner Solutions
aws.amazon.com/backup-recovery/partner-solutions/
Note: Represents a sample of storage partners
• Primary Storage: Solutions that leverage file, block, object, and streamed data formats as an extension to on-premises storage
• Backup and Recovery: Solutions that leverage Amazon S3 for durable data backup
• Archive: Solutions that leverage Amazon Glacier for durable and cost-effective long-term data backup
Choice of storage classes on S3
• Standard: active data
• Standard - Infrequent Access: infrequently accessed data
• Amazon Glacier: archive data
Storage classes designed for your use case
S3 Standard
• Big data analysis
• Content distribution
• Static website hosting
Standard - IA
• Backup & archive
• Disaster recovery
• File sync & share
• Long-retained data
Amazon Glacier
• Long term archives
• Digital preservation
• Magnetic tape replacement
When should you move to Standard-IA?
S3 Analytics - storage class analysis
• Visualize the access pattern on your data over time
• Measure the object age where data is infrequently accessed
• Dive deep by bucket, prefixes, or specific object tag
• Easily create a lifecycle policy based on the analysis
Visualize access pattern on your data
Export S3 Analytics to the tools of your choice
✓ Pick the right storage class for your use cases
✓ Automate management tasks
• Best practices to optimize S3 performance
• Tools to help you manage storage
Automate data management
Lifecycle policies
• Automatic tiering and cost controls
• Includes two possible actions:
  • Transition: archives to Standard - IA or Amazon Glacier based on the object age you specify
  • Expiration: deletes objects after a specified time
• Actions can be combined
• Set policies by bucket, prefix, or tags
• Set policies for current version or noncurrent versions
Lifecycle policies
Set up a lifecycle policy on the AWS Management Console
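Outside the console, a policy that combines both actions can be written as a configuration document. The sketch below is illustrative only: the rule name, the `logs/` prefix, and the day counts are assumptions, and the dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. Nothing here calls AWS.

```python
def build_lifecycle_rule(rule_id, prefix, ia_after_days, glacier_after_days, expire_after_days):
    """Return one lifecycle rule that combines transition and expiration actions."""
    # Objects should move to colder tiers before they expire.
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("expected ia_after_days < glacier_after_days < expire_after_days")
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},  # policy scoped to a prefix
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical values: tier log objects at 30 and 90 days, delete after a year.
lifecycle_configuration = {
    "Rules": [build_lifecycle_rule("archive-logs", "logs/", 30, 90, 365)]
}
```

The resulting dictionary could then be passed to an S3 client along with the bucket name; that call is left out so the sketch runs without credentials.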
Protect your data from accidental deletes
Versioning (Best Practice)
• Protects from unintended user deletes or application logic failures
• New version with every upload
• Easy retrieval of deleted objects and rollback to previous versions
Easily recover from unintended delete
Tip: Create a recycle bin for your storage
Best Practice
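One plausible reading of the recycle-bin tip is a versioned bucket plus a lifecycle rule that permanently removes noncurrent versions after a grace period. The sketch below assumes that reading; the rule name and the 30-day window are hypothetical, and the dictionary uses the same rule shape that boto3's `put_bucket_lifecycle_configuration` accepts.

```python
def build_recycle_bin_rule(keep_deleted_days):
    """Lifecycle rule that keeps overwritten or deleted versions for a grace period."""
    if keep_deleted_days < 1:
        raise ValueError("keep_deleted_days must be at least 1")
    return {
        "ID": "recycle-bin",            # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},       # empty prefix: applies to the whole bucket
        # Noncurrent versions stay recoverable for this many days, then are removed.
        "NoncurrentVersionExpiration": {"NoncurrentDays": keep_deleted_days},
        # Clean up delete markers once no older versions remain behind them.
        "Expiration": {"ExpiredObjectDeleteMarker": True},
    }


recycle_bin = {"Rules": [build_recycle_bin_rule(30)]}
```

With such a rule, a delete only adds a delete marker, and the previous version can be restored until the grace period runs out.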
Automate with trigger-based workflow
Amazon S3 event notifications
[Diagram: S3 events delivered to an SNS topic, an SQS queue, or a Lambda function]
• Notification when objects are created via PUT, POST, Copy, or Multipart Upload, or removed via DELETE
• Filter on prefixes and suffixes
• Trigger workflow with Amazon SNS, Amazon SQS, and AWS Lambda functions
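As a rough illustration of the Lambda end of such a workflow, the handler below pulls the bucket, key, and event name out of an S3 notification. It relies on the documented S3 event message layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`, `Records[].eventName`); the function names and the returned summary are assumptions made for this sketch.

```python
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (event name, bucket, key) details for each record in an S3 notification."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        results.append({
            "event": record["eventName"],
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in the notification payload.
            "key": unquote_plus(s3["object"]["key"]),
        })
    return results


def handler(event, context):
    """Hypothetical Lambda entry point: count created objects in the batch."""
    objects = extract_objects(event)
    created = [o for o in objects if o["event"].startswith("ObjectCreated:")]
    return {"created": len(created), "total": len(objects)}
```

A real handler would replace the counting step with the actual workflow, for example generating a thumbnail or updating an index.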
Cross-region replication
Automated, fast, and reliable asynchronous replication of data across AWS Regions
Use cases:
• Compliance - store data hundreds of miles apart
• Lower latency - distribute data to regional customers
• Security - create remote replicas managed by separate AWS accounts
How it works:
• Only replicates new PUTs. Once configured, all new uploads into the source bucket will be replicated
• Entire bucket or prefix based
• 1:1 replication between any 2 regions
• Versioning required
• Deletes and lifecycle actions are not replicated
Summary – automate management tasks
• Automate transition and expiration with lifecycle policies
• Trigger-based workflow with event notification
• Easily recover from accidental delete with versioning
• Cross-region replication
Topics
✓ Pick the right storage class for your use cases
✓ Automate management tasks
✓ Best practices to optimize S3 performance
• Tools to help you manage storage
Faster upload over long distances
S3 Transfer Acceleration
[Diagram: Uploader → AWS Edge Location → S3 Bucket, with optimized throughput]
Change your endpoint, not your code
No firewall changes or client software
Longer distance, larger files, more benefit
Faster or free
68 global edge locations
Try it at S3speedtest.com
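"Change your endpoint, not your code" can be pictured as swapping the host name a client talks to. The helper below is a sketch under that reading: the function name is hypothetical, and it only builds the virtual-hosted-style URL, using the `s3-accelerate.amazonaws.com` host that Transfer Acceleration exposes for enabled buckets.

```python
def s3_endpoint(bucket, accelerate=False):
    """Build a virtual-hosted-style endpoint URL for a bucket."""
    if accelerate and "." in bucket:
        # Transfer Acceleration does not support bucket names containing periods.
        raise ValueError("accelerated buckets must not contain '.' in the name")
    host = "s3-accelerate.amazonaws.com" if accelerate else "s3.amazonaws.com"
    return f"https://{bucket}.{host}"


standard_url = s3_endpoint("example-bucket")
accelerated_url = s3_endpoint("example-bucket", accelerate=True)
```

The rest of the upload code stays the same; only the endpoint differs once acceleration is enabled on the bucket.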
Faster upload of large objects
Parallelize PUTs with multipart uploads
• Increase aggregate throughput by parallelizing PUTs on high-bandwidth networks
• Move the bottleneck to the network, where it belongs
• Increase resiliency to network errors; fewer large restarts on error-prone networks
Best Practice
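To make the parallel-PUT idea concrete, the sketch below splits an object into numbered parts that independent workers could upload. The function name and the 8 MiB part size are assumptions; the 5 MiB minimum part size (for every part except the last) and the 10,000-part ceiling are S3 multipart upload limits.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def plan_parts(object_size, part_size):
    """Return (part_number, offset, length) tuples covering the whole object."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = []
    offset = 0
    number = 1  # part numbers start at 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts


MIB = 1024 * 1024
plan = plan_parts(20 * MIB + 1, 8 * MIB)  # hypothetical 20 MiB + 1 byte object
```

Each tuple can be handed to a separate thread or process. A failed part is retried on its own, which is where the resiliency benefit comes from.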
Faster download
You can parallelize GETs as well as PUTs
GET /example-object HTTP/1.1
Host: example-bucket.s3.amazonaws.com
x-amz-date: Thu, 28 Jan 2016 21:32:02 GMT
Range: bytes=0-9
Authorization: AWS AKIAIOSFODNN7EXAMPLE:Yxg83MZaEgh3OZ3l0rLo5RTX11o=
For large objects, use range-based GETs and align your GET ranges with your multipart upload parts
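The `Range` header in the request above generalizes to a set of ranges that can be fetched in parallel. This is a minimal sketch; the function name is hypothetical, and `part_size` is assumed to be the size used when the object was uploaded in parts.

```python
def range_headers(object_size, part_size):
    """Return HTTP Range header values that cover the object in part-sized chunks."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    headers = []
    start = 0
    while start < object_size:
        # Range end offsets are inclusive, hence the -1.
        end = min(start + part_size, object_size) - 1
        headers.append(f"bytes={start}-{end}")
        start = end + 1
    return headers


ranges = range_headers(25, 10)  # tiny sizes chosen only to keep the output readable
```

Each value goes into the `Range` header of its own GET request, and the responses are reassembled in order.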
For content distribution, enable Amazon CloudFront
• Caches objects at the edge
• Low latency data transfer to end user
SQL Query on S3
Amazon Athena
• No loading of data
• Serverless
• Supports text, CSV, TSV, JSON, Avro, and columnar formats such as Apache ORC and Apache Parquet
• Access via console or JDBC driver
• $5 per TB scanned from S3
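Since pricing is by data scanned, a quick estimate shows why columnar formats matter. The sketch below applies the $5 per TB figure from the slide; treating 1 TB as 1024^4 bytes and ignoring any rounding or minimum charges are assumptions, and the 10% column fraction is a made-up example.

```python
PRICE_PER_TB_USD = 5.0     # from the slide
BYTES_PER_TB = 1024 ** 4   # assumption: binary terabyte


def query_cost_usd(bytes_scanned):
    """Estimate the cost of a query from the bytes it scans."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must not be negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


full_scan = query_cost_usd(2 * BYTES_PER_TB)           # scan a 2 TB text data set
column_scan = query_cost_usd(0.1 * 2 * BYTES_PER_TB)   # read only 10% of it from a columnar layout
```

The same query that touches a tenth of the bytes costs roughly a tenth as much, which is the practical argument for ORC or Parquet.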
Getting Started – Athena with console
Query your S3 data using SQL
[Screenshot: Athena console query results, with run time and data scanned highlighted]
Higher TPS by distributing key names
Use a key-naming scheme with randomness at the beginning for high TPS
• Most important if you regularly exceed 100 TPS on a bucket
• Avoid starting with a date or monotonically increasing numbers
Don’t do this…
<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-164533126.jpg
<my_bucket>/2013_11_13-164533127.jpg
<my_bucket>/2013_11_13-164533128.jpg
<my_bucket>/2013_11_12-164533129.jpg
<my_bucket>/2013_11_12-164533130.jpg
<my_bucket>/2013_11_12-164533131.jpg
<my_bucket>/2013_11_12-164533132.jpg
<my_bucket>/2013_11_11-164533133.jpg
Distributing key names
Add randomness to the beginning of the key name
with a hash or reversed timestamp (ssmmhhddmmyy)
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
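Both prefixing strategies can be sketched in a few lines. The function names, the 4-character prefix length, and the choice of SHA-256 as the hash are assumptions for illustration; the slide only calls for "a hash or reversed timestamp".

```python
import hashlib
from datetime import datetime


def hashed_key(original_key, prefix_len=4):
    """Prefix the key with a short hash of itself so names spread evenly."""
    digest = hashlib.sha256(original_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{original_key}"


def reversed_timestamp_key(timestamp, name):
    """Prefix the key with the timestamp in ssmmhhddmmyy order."""
    # Seconds come first, so consecutive uploads differ in the leading characters.
    return f"{timestamp.strftime('%S%M%H%d%m%y')}-{name}"


example = reversed_timestamp_key(datetime(2013, 11, 13, 16, 45, 33), "photo.jpg")
```

The hash variant is deterministic, so the full key can be recomputed from the original name whenever the object has to be read back.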
Best Practices - performance
✓ Faster upload over long distances with S3 Transfer Acceleration
✓ Faster upload for large objects with S3 multipart upload
✓ Optimize GET performance with Range GET and CloudFront
✓ SQL Query on S3 with Athena
✓ Distribute key name for high TPS workload
Topics
✓ Pick the right storage class for your use cases
✓ Automate management tasks
✓ Best practices to optimize S3 performance
✓ Tools to help you manage storage
Organize your data with object tags
Manage data based on what it is as opposed to where it's located
• Classify your data, up to 10 tags per object
• Tag your objects with key-value pairs
• Write policies once based on the type of data
• Put object with tag or add tag to existing objects
[Diagram: object tags feed access control, lifecycle policy, and storage metrics & analytics]
Manage access with object tags
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::EXAMPLE-BUCKET-NAME/*",
      "Condition": {"StringEquals": {"s3:ExistingObjectTag/Project": "X"}}
    }
  ]
}
User permission by tags
Audit and monitor access
AWS CloudTrail data events
Use cases:
• Perform security analysis
• Meet your IT auditing and compliance needs
• Take immediate action on activity
How it works:
• Capture S3 object-level requests
• Enable at the bucket level
• Logs delivered to your S3 bucket
• $0.10 per 100,000 data events
Monitor performance and operation
Amazon CloudWatch metrics for S3
• Generate metrics for data of your choice
• Entire bucket, prefixes, and tags
• Up to 1,000 groups per bucket
• 1-minute CloudWatch metrics
• Alert and alarm on metrics
• $0.30 per metric per month
CloudWatch Metrics for S3
Metric Name          Value
AllRequests          Count
PutRequests          Count
GetRequests          Count
ListRequests         Count
DeleteRequests       Count
HeadRequests         Count
PostRequests         Count
BytesDownloaded      MB
BytesUploaded        MB
4xxErrors            Count
5xxErrors            Count
FirstByteLatency     ms
TotalRequestLatency  ms
Example
S3 Inventory
Save time
• Daily or weekly delivery
• Delivery to S3 bucket
• CSV file output
Use case: trigger business workflows and applications such as secondary index garbage collection, data auditing, and offline analytics
• More information about your objects than provided by the LIST API, such as replication status, multipart upload flag, and delete marker
• Simple pricing: $0.0025 per million objects listed
S3 Inventory
Eventually consistent rolling snapshot
• New objects may not be listed
• Removed objects may still be included
Name                Value Type  Description
Bucket              String      Bucket name. UTF-8 encoded.
Key                 String      Object key name. UTF-8 encoded.
Version Id          String      Version ID of the object
Is Latest           boolean     true if object is the latest version (current version) of a versioned object, otherwise false
Delete Marker       boolean     true if object is a delete marker of a versioned object, otherwise false
Size                long        Object size in bytes
Last Modified       String      Last modified timestamp. Format in ISO: YYYY-MM-DDTHH:mm:ss.SSSZ
ETag                String      eTag in HEX encoded format
StorageClass        String      Valid values: STANDARD, REDUCED_REDUNDANCY, GLACIER, STANDARD_IA. UTF-8 encoded.
Multipart Uploaded  boolean     true if object is uploaded by using multipart, otherwise false
Replication Status  String      Valid values: REPLICA, COMPLETED, PENDING, FAILED. UTF-8 encoded.
Validate before you act!
• Use HEAD OBJECT
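A daily inventory file can be processed with an ordinary CSV reader. The sketch below assumes a header-less CSV whose columns appear in the order of the table above, with booleans written as `true`/`false`; the field names, the sample rows, and the `tiering_candidates` filter are hypothetical.

```python
import csv
import io

# Assumed column order, mirroring the field table above.
FIELDS = [
    "bucket", "key", "version_id", "is_latest", "is_delete_marker", "size",
    "last_modified", "etag", "storage_class", "is_multipart_uploaded",
    "replication_status",
]


def read_inventory(csv_text):
    """Yield one dict per inventory row with booleans and sizes converted."""
    for row in csv.DictReader(io.StringIO(csv_text), fieldnames=FIELDS):
        for flag in ("is_latest", "is_delete_marker", "is_multipart_uploaded"):
            row[flag] = row[flag] == "true"
        row["size"] = int(row["size"]) if row["size"] else 0  # delete markers have no size
        yield row


def tiering_candidates(rows, min_size):
    """Keys of current, live STANDARD objects that are at least min_size bytes."""
    return [
        r["key"] for r in rows
        if r["is_latest"] and not r["is_delete_marker"]
        and r["storage_class"] == "STANDARD" and r["size"] >= min_size
    ]


SAMPLE = (
    '"example-bucket","logs/a.gz","v1","true","false","2048","2017-04-13T00:00:00.000Z","abc","STANDARD","false",""\n'
    '"example-bucket","logs/b.gz","v2","false","false","4096","2017-04-12T00:00:00.000Z","def","STANDARD","false",""\n'
    '"example-bucket","logs/c.gz","v3","true","true","","2017-04-11T00:00:00.000Z","","","false",""\n'
    '"example-bucket","logs/d.gz","v4","true","false","100","2017-04-10T00:00:00.000Z","ghi","STANDARD_IA","false",""\n'
)
candidates = tiering_candidates(read_inventory(SAMPLE), min_size=1024)
```

Because the report is an eventually consistent snapshot, each candidate should still be confirmed with a HEAD request before it is changed or deleted.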
John Elliott
Pinterest Infrastructure
100+ billion pins categorized by people into more than 2.6 billion boards
80+ terabytes of new data...every day
Almost entirely log data...
Over 150 petabytes of data
S3 Growth
Storage Growth
YTD: 60%
12 Months: 86%
Since Jan ‘14: 1,467%
S3 Data Structure
Level 1: Bucket/
Level 2: Application/
Level 3: Table Name/
Level 4: dt=2017-04-13/
Old Data Flow: 6 hr runtime
[Diagram components: S3 bucket listing, S3 API logs, Inventory Job, Operations Job, Efficiency Job, Rollup Job, Efficiency Report]
● Count object sizes and read API log
● Join data sets to determine object access activity in order to make tiering decisions
New Data Flow: 20 min runtime
[Diagram components: S3 Inventory Report, S3 Analytics, Rollup Job]
● S3 Inventory report allows full bucket inventory and operations data
● S3 Analytics provides much needed data on object age and access patterns
Setting up Inventory Analysis for S3
DEMO
Enable Inventory → Process Daily Files → Discover Interesting Prefixes → Storage Analytics → Lifecycle Policy
Summary – manage your storage
✓ Classify storage and manage access with S3 object tags
✓ Audit and monitor access with CloudTrail
✓ Monitor operational performance and set alarm with S3 CloudWatch metrics
✓ Use Inventory and discover interesting prefixes to dive deeper on
Recap
✓ Pick the right storage class for your use cases
✓ Automate management tasks
✓ Best practices to optimize S3 performance
✓ Tools to help you manage storage
Thank you!
Enable
Inventory
Enable
Inventory
Enable
Inventory
Process
Daily
Files
Process
Daily
Files
Process
Daily
Files
Process
Daily
Files
Process
Daily
Files
Process
Daily
Files
Process
Daily
Files
Process
Daily
Files
Process
Daily
Files
Discover
Interesting
Prefixes
Discover
Interesting
Prefixes
Discover
Interesting
Prefixes
Discover
Interesting
Prefixes

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon GlacierSRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
 
ENT302 Deep Dive on AWS Management Tools
ENT302 Deep Dive on AWS Management Tools ENT302 Deep Dive on AWS Management Tools
ENT302 Deep Dive on AWS Management Tools
 
Policy Ninja
Policy NinjaPolicy Ninja
Policy Ninja
 
Sec301 Security @ (Cloud) Scale
Sec301 Security @ (Cloud) ScaleSec301 Security @ (Cloud) Scale
Sec301 Security @ (Cloud) Scale
 
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)
AWS re:Invent 2016: Born in the Cloud; Built Like a Startup (ARC205)
 
Stream Processing in SmartNews #jawsdays
Stream Processing in SmartNews #jawsdaysStream Processing in SmartNews #jawsdays
Stream Processing in SmartNews #jawsdays
 
SEC301 Security @ (Cloud) Scale
SEC301 Security @ (Cloud) ScaleSEC301 Security @ (Cloud) Scale
SEC301 Security @ (Cloud) Scale
 
SRV408 Deep Dive on AWS IoT
SRV408 Deep Dive on AWS IoTSRV408 Deep Dive on AWS IoT
SRV408 Deep Dive on AWS IoT
 
ENT308 Best Practices for Microsoft Architectures on AWS
ENT308 Best Practices for Microsoft Architectures on AWSENT308 Best Practices for Microsoft Architectures on AWS
ENT308 Best Practices for Microsoft Architectures on AWS
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
Deep Dive on Object Storage: Amazon S3 and Amazon GlacierDeep Dive on Object Storage: Amazon S3 and Amazon Glacier
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
 
Data Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveData Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and Archive
 
Running Relational Databases on AWS
Running Relational Databases on AWS  Running Relational Databases on AWS
Running Relational Databases on AWS
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
 
Migrate from Oracle to Amazon Aurora using AWS Schema Conversion Tool & AWS D...
Migrate from Oracle to Amazon Aurora using AWS Schema Conversion Tool & AWS D...Migrate from Oracle to Amazon Aurora using AWS Schema Conversion Tool & AWS D...
Migrate from Oracle to Amazon Aurora using AWS Schema Conversion Tool & AWS D...
 
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
ENT305 Migrating Your Databases to AWS: Deep Dive on Amazon Relational Databa...
 
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisBDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
 
ENT401 Deep Dive with Amazon EC2 Systems Manager
ENT401 Deep Dive with Amazon EC2 Systems ManagerENT401 Deep Dive with Amazon EC2 Systems Manager
ENT401 Deep Dive with Amazon EC2 Systems Manager
 
Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum Efficiency
Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum EfficiencyDeploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum Efficiency
Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum Efficiency
 
Deep Dive on Amazon S3
Deep Dive on Amazon S3Deep Dive on Amazon S3
Deep Dive on Amazon S3
 

Similar a SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier

Similar a SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier (20)

Deep Dive on Amazon S3 - AWS Online Tech Talks
Deep Dive on Amazon S3 - AWS Online Tech TalksDeep Dive on Amazon S3 - AWS Online Tech Talks
Deep Dive on Amazon S3 - AWS Online Tech Talks
 
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...
 
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...
 
Builders' Day - Best Practises for S3 - BL
Builders' Day - Best Practises for S3 - BLBuilders' Day - Best Practises for S3 - BL
Builders' Day - Best Practises for S3 - BL
 
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon GlacierSRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
 
Deep Dive on Amazon S3
Deep Dive on Amazon S3Deep Dive on Amazon S3
Deep Dive on Amazon S3
 
AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...
AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...
AWS re:Invent 2016: Workshop: AWS S3 Deep-Dive Hands-On Workshop: Deploying a...
 
Supercharging the Value of Your Data with Amazon S3
Supercharging the Value of Your Data with Amazon S3Supercharging the Value of Your Data with Amazon S3
Supercharging the Value of Your Data with Amazon S3
 
Deep Dive on Amazon S3
Deep Dive on Amazon S3Deep Dive on Amazon S3
Deep Dive on Amazon S3
 
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
AWS April 2016 Webinar Series - S3 Best Practices - A Decade of Field Experience
AWS April 2016 Webinar Series - S3 Best Practices - A Decade of Field ExperienceAWS April 2016 Webinar Series - S3 Best Practices - A Decade of Field Experience
AWS April 2016 Webinar Series - S3 Best Practices - A Decade of Field Experience
 
Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3
 
Storage with Amazon S3 and Amazon Glacier
Storage with Amazon S3 and Amazon GlacierStorage with Amazon S3 and Amazon Glacier
Storage with Amazon S3 and Amazon Glacier
 
Deep Dive on Amazon S3
Deep Dive on Amazon S3Deep Dive on Amazon S3
Deep Dive on Amazon S3
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS Cloud
 
Object Storage: Amazon S3 and Amazon Glacier
Object Storage: Amazon S3 and Amazon GlacierObject Storage: Amazon S3 and Amazon Glacier
Object Storage: Amazon S3 and Amazon Glacier
 
Visualizing Amazon S3 Storage Management with QuickSight - AWS Online Tech Talks
Visualizing Amazon S3 Storage Management with QuickSight - AWS Online Tech TalksVisualizing Amazon S3 Storage Management with QuickSight - AWS Online Tech Talks
Visualizing Amazon S3 Storage Management with QuickSight - AWS Online Tech Talks
 
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
Deep Dive on Object Storage: Amazon S3 and Amazon GlacierDeep Dive on Object Storage: Amazon S3 and Amazon Glacier
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
 
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
 

Más de Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Tom Johnston, S3 Product Management, AWS Tom Fuller, Senior Solutions Architect, AWS John Elliott, Infrastructure Engineering, Pinterest April 19, 2017 Deep Dive on Object Storage Amazon S3 and Amazon Glacier
  • 2. The AWS Storage Portfolio • Object: Amazon S3, Amazon Glacier • Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral) • File: Amazon EFS • Cloud Data Migration: Direct Connect, Snow* data transport family, 3rd Party Connectors, Transfer Acceleration, Storage Gateway, Amazon Kinesis Firehose
  • 3. What to Expect from the Session • Pick the right storage class for your use cases • Automate management tasks • Best practices to optimize S3 performance • Tools to help you manage storage
  • 4. Data transfer into Amazon S3 • AWS Direct Connect • AWS Snowball • AWS Snowball Edge • AWS Snowmobile • ISV Connectors • Amazon Kinesis Firehose • S3 Transfer Acceleration • AWS Storage Gateway
  • 5. Amazon Storage Partner Solutions (aws.amazon.com/backup-recovery/partner-solutions/) • Primary Storage: solutions that leverage file, block, object, and streamed data formats as an extension to on-premises storage • Backup and Recovery: solutions that leverage Amazon S3 for durable data backup • Archive: solutions that leverage Amazon Glacier for durable and cost-effective long-term data backup • Note: Represents a sample of storage partners
  • 6. Choice of storage classes on S3 • Standard: active data • Standard - Infrequent Access: infrequently accessed data • Amazon Glacier: archive data
  • 7. Storage classes designed for your use case S3 Standard • Big data analysis • Content distribution • Static website hosting Standard - IA • Backup & archive • Disaster recovery • File sync & share • Long-retained data Amazon Glacier • Long term archives • Digital preservation • Magnetic tape replacement
  • 8. When should you move to Standard-IA? S3 Analytics - storage class analysis • Visualize the access pattern on your data over time • Measure the object age where data is infrequently accessed • Dive deep by bucket, prefixes, or specific object tag • Easily create a lifecycle policy based on the analysis
  • 10.
  • 11. Export S3 Analytics to the tools of your choice
  • 12.
  • 13. ✓ Pick the right storage class for your use cases ✓ Automate management tasks • Best practices to optimize S3 performance • Tools to help you manage storage
  • 14. Automate data management: lifecycle policies • Automatic tiering and cost controls • Includes two possible actions: • Transition: moves objects to Standard - IA or Amazon Glacier based on the object age you specify • Expiration: deletes objects after a specified time • Actions can be combined • Set policies by bucket, prefix, or tags • Set policies for current versions or non-current versions
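The transition and expiration actions on this slide can be sketched as a single lifecycle rule. The snippet below is a minimal sketch, not the presenters' configuration: it only builds a dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The function name, the `logs/` prefix, and the day counts are assumptions chosen for illustration.

```python
import json


def build_lifecycle_configuration(prefix, ia_after_days, glacier_after_days,
                                  expire_after_days, noncurrent_expire_days):
    """Return a one-rule lifecycle configuration (hypothetical helper)."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError(
            "expected ia_after_days < glacier_after_days < expire_after_days")
    return {
        "Rules": [
            {
                "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},          # policy scoped by prefix
                "Status": "Enabled",
                "Transitions": [                       # Transition actions
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},  # Expiration action
                # Applies to non-current versions in a versioned bucket
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": noncurrent_expire_days
                },
            }
        ]
    }


# Illustrative values only: IA after 30 days, Glacier after 90, delete after 365.
config = build_lifecycle_configuration("logs/", 30, 90, 365, 30)
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the dict could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Building the rule as plain data keeps it easy to review and version-control before it is applied to a bucket.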
  • 15. Set up a lifecycle policy on the AWS Management Console
  • 16.
  • 17.
  • 18. Best Practice: Versioning. Protect your data from accidental deletes • Protects from unintended user deletes or application logic failures • New version with every upload • Easy retrieval of deleted objects and roll back to previous versions
  • 19. Best Practice: Easily recover from unintended delete • Tip: Create a recycle bin for your storage
  • 20. Automate with trigger-based workflow: Amazon S3 event notifications (events are delivered to an SNS topic, SQS queue, or Lambda function) • Notification when objects are created via PUT, POST, Copy, or Multipart Upload, or removed via DELETE • Filter on prefixes and suffixes • Trigger workflow with Amazon SNS, Amazon SQS, and AWS Lambda functions
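A trigger-based workflow usually starts by reading the bucket and key out of the notification. The following is a minimal sketch of a Lambda-style handler, assuming the standard S3 notification layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`); the bucket and key in the sample event are made up for illustration.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Return (event name, bucket, key) for each record in an S3 notification."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in notifications (spaces become "+").
        key = unquote_plus(s3["object"]["key"])
        results.append((record["eventName"], bucket, key))
    return results


# Hypothetical, trimmed-down event for local experimentation.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "photos/my+cat.jpg", "size": 1024},
            },
        }
    ]
}

print(handler(sample_event))
```

In a real function the loop body would do the work (resize an image, index a log file) instead of collecting tuples.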
  • 21. Cross-region replication Automated, fast, and reliable asynchronous replication of data across AWS Regions Use cases: • Compliance - store data hundreds of miles apart • Lower latency - distribute data to regional customers • Security - create remote replicas managed by separate AWS accounts How it works: • Only replicates new PUTs. Once configured, all new uploads into source bucket will be replicated • Entire bucket or prefix based • 1:1 replication between any 2 regions • Versioning required • Deletes and lifecycle actions are not replicated
  • 22. Summary – automate management tasks • Automate transition and expiration with lifecycle policies • Easily recover from accidental delete with versioning • Trigger-based workflow with event notification • Cross-region replication
  • 23. Topics ✓ Pick the right storage class for your use cases ✓ Automate management tasks ✓ Best practices to optimize S3 performance • Tools to help you manage storage
  • 24. Faster upload over long distances: S3 Transfer Acceleration (Uploader → AWS Edge Location → S3 Bucket, optimized throughput) • Change your endpoint, not your code • No firewall changes or client software • Longer distance, larger files, more benefit • Faster or free • 68 global edge locations • Try it at S3speedtest.com
  • 25. Best Practice: Faster upload of large objects. Parallelize PUTs with multipart uploads • Increase aggregate throughput by parallelizing PUTs on high-bandwidth networks • Move the bottleneck to the network, where it belongs • Increase resiliency to network errors; fewer large restarts on error-prone networks
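Before PUTs can be parallelized, the object has to be split into parts. The sketch below plans part boundaries under two S3 limits: at least 5 MiB per part (except the last) and at most 10,000 parts per upload. The function name and the 8 MiB default part size are assumptions for illustration, and the actual upload calls are left out.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum for every part except the last
MAX_PARTS = 10_000                # S3 maximum number of parts per upload


def plan_parts(object_size, preferred_part_size=8 * 1024 * 1024):
    """Return (part_number, offset, length) tuples covering the whole object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size when the preferred size would need more than 10,000 parts.
    part_size = max(preferred_part_size, MIN_PART_SIZE,
                    math.ceil(object_size / MAX_PARTS))
    parts = []
    offset = 0
    part_number = 1               # S3 part numbers start at 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts


for part in plan_parts(20 * 1024 * 1024):
    print(part)
```

Each tuple can be handed to a separate worker, so a failed part is retried on its own instead of restarting the whole upload.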
  • 26. Faster download: you can parallelize GETs as well as PUTs • For large objects, use range-based GETs and align your GET ranges with your parts • For content distribution, enable Amazon CloudFront • Caches objects at the edge • Low latency data transfer to end user • Example range request:
        GET /example-object HTTP/1.1
        Host: example-bucket.s3.amazonaws.com
        x-amz-date: Fri, 28 Jan 2016 21:32:02 GMT
        Range: bytes=0-9
        Authorization: AWS AKIAIOSFODNN7EXAMPLE:Yxg83MZaEgh3OZ3l0rLo5RTX11o=
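Range-based GETs can be issued in parallel and stitched back together. The sketch below shows the idea with a thread pool; `fake_fetch` is a stand-in that slices an in-memory byte string, where a real client would send a GET with a `Range: bytes=start-end` header. All names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(object_size, chunk_size):
    """Return inclusive (start, end) pairs, as used in 'Range: bytes=start-end'."""
    return [(start, min(start + chunk_size, object_size) - 1)
            for start in range(0, object_size, chunk_size)]


def parallel_get(fetch_range, object_size, chunk_size, max_workers=4):
    """Fetch all ranges concurrently and reassemble them in order."""
    ranges = byte_ranges(object_size, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so the chunks join correctly.
        chunks = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(chunks)


# Stand-in for the remote object and for a ranged GET against it.
data = bytes(range(256)) * 100


def fake_fetch(start, end):
    return data[start:end + 1]    # HTTP range ends are inclusive


print(byte_ranges(25, 10))        # [(0, 9), (10, 19), (20, 24)]
print(parallel_get(fake_fetch, len(data), 1000) == data)
```

Choosing `chunk_size` equal to the part size used at upload time follows the slide's advice to align GET ranges with parts.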
  • 27. SQL Query on S3 Amazon Athena • No loading of data • Serverless • Supports text, CSV, TSV, JSON, AVRO, and columnar formats such as Apache ORC and Apache Parquet • Access via console or JDBC driver • $5 per TB scanned from S3
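Because pricing is per terabyte scanned, the amount of data a query reads drives its cost. The arithmetic below is a rough sketch using the $5 per TB figure from the slide; it assumes a decimal terabyte (10^12 bytes) and ignores any minimum charge or rounding, and the 2 TB and 10x reduction figures are invented purely to show the calculation.

```python
PRICE_PER_TB_USD = 5.00
BYTES_PER_TB = 10 ** 12           # assumption: decimal terabyte


def scan_cost_usd(bytes_scanned):
    """Estimated query cost for the given number of bytes scanned."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# Hypothetical comparison: scanning 2 TB of text files versus a columnar
# layout where the same query only needs to read a tenth of the bytes.
full_scan = scan_cost_usd(2 * 10 ** 12)
columnar_scan = scan_cost_usd(2 * 10 ** 11)
print(f"full scan: ${full_scan:.2f}, columnar: ${columnar_scan:.2f}")
```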
  • 28. Getting Started – Athena with console
  • 29. Query your S3 data using SQL Run time and data scanned
  • 30. Higher TPS by distributing key names • Use a key-naming scheme with randomness at the beginning for high TPS • Most important if you regularly exceed 100 TPS on a bucket • Avoid starting with a date or monotonically increasing numbers • Don’t do this…
        <my_bucket>/2013_11_13-164533125.jpg
        <my_bucket>/2013_11_13-164533126.jpg
        <my_bucket>/2013_11_13-164533127.jpg
        <my_bucket>/2013_11_13-164533128.jpg
        <my_bucket>/2013_11_12-164533129.jpg
        <my_bucket>/2013_11_12-164533130.jpg
        <my_bucket>/2013_11_12-164533131.jpg
        <my_bucket>/2013_11_12-164533132.jpg
        <my_bucket>/2013_11_11-164533133.jpg
  • 31. Distributing key names • Add randomness to the beginning of the key name with a hash or reversed timestamp (ssmmhhddmmyy)
        <my_bucket>/521335461-2013_11_13.jpg
        <my_bucket>/465330151-2013_11_13.jpg
        <my_bucket>/987331160-2013_11_13.jpg
        <my_bucket>/465765461-2013_11_13.jpg
        <my_bucket>/125631151-2013_11_13.jpg
        <my_bucket>/934563160-2013_11_13.jpg
        <my_bucket>/532132341-2013_11_13.jpg
        <my_bucket>/565437681-2013_11_13.jpg
        <my_bucket>/234567460-2013_11_13.jpg
        <my_bucket>/456767561-2013_11_13.jpg
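Both techniques named on this slide, a hash prefix and a reversed timestamp, are easy to sketch. The helpers below are illustrative only: the slide does not say which hash or prefix length to use, so SHA-256 and four hex characters are assumptions, as are the function names.

```python
import hashlib
from datetime import datetime


def hashed_key(original_key, prefix_len=4):
    """Prepend a short, deterministic hash so keys spread across the key space."""
    digest = hashlib.sha256(original_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{original_key}"


def reversed_timestamp_key(timestamp, suffix):
    """Prefix with ssmmhhddmmyy so the fastest-changing digits come first."""
    return f"{timestamp.strftime('%S%M%H%d%m%y')}-{suffix}"


print(hashed_key("2013_11_13-164533125.jpg"))
print(reversed_timestamp_key(datetime(2013, 11, 13, 16, 45, 33), "2013_11_13.jpg"))
```

A deterministic hash has the advantage that the full key can be recomputed from the original name, so no lookup table is needed to find an object again.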
  • 32. Best Practices - performance ✓ Faster upload over long distances with S3 Transfer Acceleration ✓ Faster upload for large objects with S3 multipart upload ✓ Optimize GET performance with Range GET and CloudFront ✓ SQL Query on S3 with Athena ✓ Distribute key name for high TPS workload
  • 33. Topics ✓ Pick the right storage class for your use cases ✓ Automate management tasks ✓ Best practices to optimize S3 performance ✓ Tools to help you manage storage
  • 34. Organize your data with object tags: manage data based on what it is as opposed to where it's located • Classify your data, up to 10 tags per object • Tag your objects with key-value pairs • Write policies once based on the type of data • Put object with tag or add tag to existing objects • Use tags for: access control, lifecycle policy, storage metrics & analytics
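Tags supplied at upload time are passed as a URL-query-encoded string (for example `Project=X`). The sketch below encodes a tag dictionary that way and enforces the 10-tags-per-object limit from the slide; the helper name and the tag values are assumptions for illustration.

```python
from urllib.parse import urlencode

MAX_TAGS_PER_OBJECT = 10          # limit stated on the slide


def encode_tagging(tags):
    """Encode a dict of tags as a URL query string (hypothetical helper)."""
    if len(tags) > MAX_TAGS_PER_OBJECT:
        raise ValueError(
            f"at most {MAX_TAGS_PER_OBJECT} tags per object, got {len(tags)}")
    return urlencode(tags)


tagging = encode_tagging({"Project": "X", "Classification": "internal"})
print(tagging)  # Project=X&Classification=internal

# With boto3 installed and credentials configured, the string could be passed as:
#   boto3.client("s3").put_object(
#       Bucket="example-bucket", Key="report.csv", Body=b"...", Tagging=tagging)
```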
  • 35. Manage access with object tags: user permission by tags
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": [ "s3:GetObject" ],
              "Resource": "arn:aws:s3:::EXAMPLE-BUCKET-NAME/*",
              "Condition": {"StringEquals": {"s3:ExistingObjectTag/Project": "X"}}
            }
          ]
        }
  • 36. Audit and monitor access: AWS CloudTrail data events. Use cases: • Perform security analysis • Meet your IT auditing and compliance needs • Take immediate action on activity How it works: • Capture S3 object-level requests • Enable at the bucket level • Logs delivered to your S3 bucket • $0.10 per 100,000 data events
  • 37. Monitor performance and operation Amazon CloudWatch metrics for S3 • Generate metrics for data of your choice • Entire bucket, prefixes, and tags • Up to 1,000 groups per bucket • 1-minute CloudWatch metrics • Alert and alarm on metrics • $0.30 per metric per month
  • 38.
  • 39. CloudWatch Metrics for S3 (metric name: value) • AllRequests: Count • PutRequests: Count • GetRequests: Count • ListRequests: Count • DeleteRequests: Count • HeadRequests: Count • PostRequests: Count • BytesDownloaded: MB • BytesUploaded: MB • 4xxErrors: Count • 5xxErrors: Count • FirstByteLatency: ms • TotalRequestLatency: ms
  • 41. S3 Inventory: save time • Daily or weekly delivery • Delivery to S3 bucket • CSV file output • Use case: trigger business workflows and applications such as secondary index garbage collection, data auditing, and offline analytics • More information about your objects than provided by LIST API, such as replication status, multipart upload flag, and delete marker • Simple pricing: $0.0025 per million objects listed
  • 42. S3 Inventory: eventually consistent rolling snapshot • New objects may not be listed • Removed objects may still be included • Validate before you act! Use HEAD OBJECT • Fields (name, value type, description):
        Bucket (String): Bucket name. UTF-8 encoded.
        Key (String): Object key name. UTF-8 encoded.
        Version Id (String): Version ID of the object.
        Is Latest (boolean): true if object is the latest version (current version) of a versioned object, otherwise false.
        Delete Marker (boolean): true if object is a delete marker of a versioned object, otherwise false.
        Size (long): Object size in bytes.
        Last Modified (String): Last modified timestamp. Format in ISO: YYYY-MM-DDTHH:mm:ss.SSSZ
        ETag (String): eTag in HEX encoded format.
        StorageClass (String): Valid values: STANDARD, REDUCED_REDUNDANCY, GLACIER, STANDARD_IA. UTF-8 encoded.
        Multipart Uploaded (boolean): true if object is uploaded by using multipart, otherwise false.
        Replication Status (String): Valid values: REPLICA, COMPLETED, PENDING, FAILED. UTF-8 encoded.
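An inventory file can be read with a few lines of standard-library code. The sketch below assumes a headerless CSV whose columns appear in the same order as the field list on this slide and whose keys may be URL-encoded; the real column set depends on how the inventory was configured, and the sample rows are invented for illustration.

```python
import csv
import io
from urllib.parse import unquote

# Assumed column order, mirroring the field list on the slide.
COLUMNS = ["bucket", "key", "version_id", "is_latest", "is_delete_marker",
           "size", "last_modified", "etag", "storage_class",
           "multipart_uploaded", "replication_status"]


def read_inventory(csv_text):
    """Parse inventory CSV text into a list of dicts with typed fields."""
    rows = []
    for values in csv.reader(io.StringIO(csv_text)):
        row = dict(zip(COLUMNS, values))
        row["key"] = unquote(row["key"])        # assumption: keys are URL-encoded
        row["is_latest"] = row["is_latest"] == "true"
        row["is_delete_marker"] = row["is_delete_marker"] == "true"
        row["size"] = int(row["size"]) if row["size"] else 0
        rows.append(row)
    return rows


def live_objects(rows):
    """Keep only current versions that are not delete markers."""
    return [r for r in rows if r["is_latest"] and not r["is_delete_marker"]]


sample = (
    '"example-bucket","logs/app/part-0000.gz","v1","true","false","1048576",'
    '"2017-04-13T01:02:03.000Z","abc123","STANDARD","false",""\n'
    '"example-bucket","logs/app/part-0001.gz","v2","false","false","2048",'
    '"2017-04-12T01:02:03.000Z","def456","STANDARD_IA","false",""\n'
    '"example-bucket","tmp/old.txt","v3","true","true","",'
    '"2017-04-11T01:02:03.000Z","","","",""\n'
)

for obj in live_objects(read_inventory(sample)):
    print(obj["key"], obj["size"], obj["storage_class"])
```

Since the snapshot is eventually consistent, anything selected this way is only a candidate; a HEAD request on the object confirms its current state before acting on it.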
  • 44. 100+ billion pins categorized by people into more than 2.6 billion boards
  • 45. 80+ terabytes of new data...every day • Almost entirely log data... • Over 150 petabytes of data
  • 46.
  • 47. S3 Growth (storage growth) • YTD: 60% • 12 months: 86% • Since Jan ‘14: 1,467%
  • 48. S3 Data Structure • Level 1: Bucket/ • Level 2: Application/ • Level 3: Table Name/ • Level 4: dt=2017-04-13/
  • 49. Old Data Flow (6 hr runtime) • Count object sizes and read API log • Join data sets to determine object access activity in order to make tiering decisions • Diagram labels: S3 bucket listing, S3 API logs, Inventory Job, Operations Job, Efficiency Job, Rollup Job, Efficiency Report
  • 50. New Data Flow (20 min runtime) • S3 Inventory report allows full bucket inventory and operations data • S3 Analytics provides much needed data on object age and access patterns • Diagram labels: S3 Inventory Report, S3 Analytics, Rollup Job
  • 51. Setting up Inventory Analysis for S3 (DEMO) • Enable Inventory • Process Daily Files • Discover Interesting Prefixes • Storage Analytics • Lifecycle Policy
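The "discover interesting prefixes" step amounts to rolling object sizes up by key prefix. The sketch below is not Pinterest's rollup job; it is a minimal illustration that aggregates (key, size) pairs by the first few path levels, using invented keys that follow the Application/Table Name/dt=... layout shown earlier.

```python
from collections import defaultdict


def rollup_by_prefix(objects, depth):
    """Sum object count and bytes per prefix built from the first `depth` levels."""
    totals = defaultdict(lambda: {"objects": 0, "bytes": 0})
    for key, size in objects:
        directories = key.split("/")[:-1]        # drop the file name
        prefix = "/".join(directories[:depth]) + "/" if directories else ""
        totals[prefix]["objects"] += 1
        totals[prefix]["bytes"] += size
    return dict(totals)


# Hypothetical (key, size) pairs, e.g. taken from an inventory report.
objects = [
    ("application-a/table_x/dt=2017-04-13/part-0.gz", 100),
    ("application-a/table_x/dt=2017-04-12/part-0.gz", 300),
    ("application-a/table_y/dt=2017-04-13/part-0.gz", 50),
    ("application-b/table_z/dt=2017-04-13/part-0.gz", 7),
]

totals = rollup_by_prefix(objects, depth=2)
largest = max(totals, key=lambda p: totals[p]["bytes"])
print(largest, totals[largest])
```

The largest prefixes are natural candidates for a storage class analysis filter and, once access patterns are known, a lifecycle policy.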
  • 52. Summary – manage your storage ✓ Classify storage and manage access with S3 object tags ✓ Audit and monitor access with CloudTrail ✓ Monitor operational performance and set alarms with S3 CloudWatch metrics ✓ Use Inventory to discover interesting prefixes to dive deeper on
  • 53. Recap ✓ Pick the right storage class for your use cases ✓ Automate management tasks ✓ Best practices to optimize S3 performance ✓ Tools to help you manage storage