SlideShare una empresa de Scribd logo
1 de 17
Amazon Simple
Storage Service (S3)
0
Outline
• Concepts
• How to Access S3
• Scalability
• Availability
• Versioning
• Consistency
1
Concepts of S3 – Bucket
• Container for objects stored in S3
• Unlimited size
• Organize the namespace at the highest level
• Internet accessible storage via HTTP/HTTPS
• Global unique name
From: http://techblogsearch.com/a/amazon-announced-new-s3-usability-enhancements.html
http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl
2
Concepts of S3 – Object
• Similar to files.
• No hierarchy.
• Objects are immutable.
• Size up to 5 TB
• Uniquely identified within a bucket by a
key(name) and a version ID
http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl
3
Concepts of S3 – Key
• Name of an object
• Unique identifier for an object within a bucket
• Use the object key to retrieve the object
http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl
4
Concepts of S3 – Namespace
Globally Unique
Bucket Name + Object Name (key)
5
How to Access S3 – CLI & REST
• AWS Command
 Put Object
aws s3api put-object bucket key
 Get Object
aws s3api get-object bucket key outfile
• REST API
 Put Object
 Get Object
From: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectOps.html
6
How to Access S3 – AWS SDK
SKD for Java
• Get Object
AmazonS3 s3Client = new AmazonS3Client(new
ProfileCredentialsProvider());
S3Object object = s3Client.getObject(new
GetObjectRequest(bucketName, key));
• Put Object
AmazonS3 s3client = new AmazonS3Client(new
ProfileCredentialsProvider());
s3client.putObject(new
PutObjectRequest(bucketName, keyName, file));
From: http://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectUsingJava.html
7
CAP Theorem
In the presence of a network partition,
strong consistency and high availability
can not be achieved at the same time
Amazon S3 (AP)
• High availability
• Eventual consistency
• Read-after-write consistency
From: http://berb.github.io/diploma-thesis/original/061_challenge.html
8
Scalability – Consistent Hashing
Avoid re-hashing everything
• Object key mapped to a point on the edge of
ring
• Map machine to many random points on the
edge of ring
• Machine responsible for arc before its nodes
• If a machine joins, it adds new nodes
• If a machine leaves, its range moves to after
machine
9
Availability – Replication
Pick n next nodes (different machines!)
n -- Replication factor
If one node fails, the other nodes still
have the key information
10
• keep multiple versions of an object in one bucket
• protect objects from unintended overwrites and deletions
• retrieve previous versions of them
• Write faster
– DELETE doesn’t wipe
– GET will return not found
- PUT doesn’t overwrite: pushes version
- GET returns most recent version
From: http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#overview
Versioning – Basics
11
Versioning
Version evolution of an object over time
--D: object
--Sx,Sy,Sz: nodes
--[Sx, 1]: vector clock [node, count]
• One vector clock is associate with every version of every
object
• Determine the versions of an object by examining their
vector clocks
• If the counters on D1’s clock is less than or equal to all of
the nodes in the second clock, D1 is ancestor of the
second, can be forgotten
• Otherwise, the two changes require reconciliation
Using vector clock to capture causality between different versions of the same object
D1([Sx, 1])
D2([Sx, 2])
D3([Sx, 2], [Sy, 1]) D4([Sx, 2], [Sz, 1])
D3 ? D4
12
Consistency
Read-after-write consistency
• PUTS of new objects in S3 bucket
• Guarantee visibility of new data to all applications
• Allow you to retrieve objects immediately after creation
Eventual consistency
• Overwrite PUTS and DELETES of objects in S3 bucket
• Weak consistency, to achieve high availability
• If no additional updates are made to a given data item, all
reads to that item will eventually return the same value.
Updates to a single key are atomic
13
Eventual Consistency
From: http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
Both W1 (write 1) and W2 (write 2) complete
before the start of R1 (read 1) and R2 (read 2)
o Consistent read
R1 and R2 both return color = ruby.
o Eventually consistent read
R1 and R2 might return color = red, color =
ruby, or no results, depending on the amount
of time that has elapsed.
14
Consistency Protocol
A consistency protocol similar to those used in quorum system is used
to maintain consistency among its replicas.
•This protocol has two key configurable values: R and W.
 R is the minimum number of nodes that must participate in a successful read
operation.
 W is the minimum number of nodes that must participate in a successful write
operation.
•N is the total number of nodes. Setting R and W.
•R+W ? N
15
Reference
[1] Giuseppe DeCandia, Deniz Hastorun, etc. Dynamo: Amazon’s Highly Available Key-
value Store
[2] SHLOMO SWIDLER, Read-After-Write Consistency in Amazon S3
[3] Amazon Simple Storage Service Documentation
[4] Peter Bailis, Ali Ghodsi UC Berkeley Eventual consistency today: limitations, extensions,
and beyond
[5] https://en.wikipedia.org/wiki/CAP_theorem
[6] Mayur Palankar, Ayodele Onibokun, etc. Amazon S3 for Science Grids: a Viable
Solution?
[7] David Bermbach and Stefan Tai Eventual Consistency: How soon is eventual?
[8] Jinesh Varia Cloud Architectures
[9] Garfinkel, Simson L. An Evaluation of Amazon’s Grid Computing Services: EC2, S3, and
SQS
[10] Amazon S3 Masterclass
[11] https://en.wikipedia.org/wiki/Consistent_hashing
16

Más contenido relacionado

Similar a Dynamo best practice - s3 security and t

Cassandra
CassandraCassandra
Cassandra
exsuns
 

Similar a Dynamo best practice - s3 security and t (20)

Freckle
FreckleFreckle
Freckle
 
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...
 
A Fast and Efficient Time Series Storage Based on Apache Solr
A Fast and Efficient Time Series Storage Based on Apache SolrA Fast and Efficient Time Series Storage Based on Apache Solr
A Fast and Efficient Time Series Storage Based on Apache Solr
 
Chronix: A fast and efficient time series storage based on Apache Solr
Chronix: A fast and efficient time series storage based on Apache SolrChronix: A fast and efficient time series storage based on Apache Solr
Chronix: A fast and efficient time series storage based on Apache Solr
 
Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28
 
Amazon Redshift Deep Dive
Amazon Redshift Deep Dive Amazon Redshift Deep Dive
Amazon Redshift Deep Dive
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation in
 
ELK stack introduction
ELK stack introduction ELK stack introduction
ELK stack introduction
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache Cassandra
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014
 
Serverless Web Apps using API Gateway, Lambda and DynamoDB
Serverless Web Apps using API Gateway, Lambda and DynamoDBServerless Web Apps using API Gateway, Lambda and DynamoDB
Serverless Web Apps using API Gateway, Lambda and DynamoDB
 
Cassandra
CassandraCassandra
Cassandra
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
 
Scality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup PresentationScality S3 Server: Node js Meetup Presentation
Scality S3 Server: Node js Meetup Presentation
 
Azure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAzure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep Dive
 
SFScon14: Schrödinger’s elephant: why PostgreSQL can solve all your database ...
SFScon14: Schrödinger’s elephant: why PostgreSQL can solve all your database ...SFScon14: Schrödinger’s elephant: why PostgreSQL can solve all your database ...
SFScon14: Schrödinger’s elephant: why PostgreSQL can solve all your database ...
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container Ecosystem
 
Running & Monitoring Docker at Scale
Running & Monitoring Docker at ScaleRunning & Monitoring Docker at Scale
Running & Monitoring Docker at Scale
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
Redis Modules - Redis India Tour - 2017
Redis Modules - Redis India Tour - 2017Redis Modules - Redis India Tour - 2017
Redis Modules - Redis India Tour - 2017
 

Último

一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
pyhepag
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
pyhepag
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
pyhepag
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
cyebo
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
pyhepag
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
DilipVasan
 

Último (20)

一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp online
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfGenerative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
 
社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeral
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 

Dynamo best practice - s3 security and t

  • 2. Outline • Concepts • How to Access S3 • Scalability • Availability • Versioning • Consistency 1
  • 3. Concepts of S3 – Bucket • Container for objects stored in S3 • Unlimited size • Organize the namespace at the highest level • Internet accessible storage via HTTP/HTTPS • Global unique name From: http://techblogsearch.com/a/amazon-announced-new-s3-usability-enhancements.html http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl 2
  • 4. Concepts of S3 – Object • Similar to files. • No hierarchy. • Objects are immutable. • Size up to 5 TB • Uniquely identified within a bucket by a key(name) and a version ID http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl 3
  • 5. Concepts of S3 – Key • Name of an object • Unique identifier for an object within a bucket • Use the object key to retrieve the object http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl 4
  • 6. Concepts of S3 – Namespace Globally Unique Bucket Name + Object Name (key) 5
  • 7. How to Access S3 – CLI & REST • AWS Command  Put Object aws s3api put-object bucket key  Get Object aws s3api get-object bucket key outfile • REST API  Put Object  Get Object From: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectOps.html 6
  • 8. How to Access S3 – AWS SDK SKD for Java • Get Object AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider()); S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, key)); • Put Object AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider()); s3client.putObject(new PutObjectRequest(bucketName, keyName, file)); From: http://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectUsingJava.html 7
  • 9. CAP Theorem In the presence of a network partition, strong consistency and high availability can not be achieved at the same time Amazon S3 (AP) • High availability • Eventual consistency • Read-after-write consistency From: http://berb.github.io/diploma-thesis/original/061_challenge.html 8
  • 10. Scalability – Consistent Hashing Avoid re-hashing everything • Object key mapped to a point on the edge of ring • Map machine to many random points on the edge of ring • Machine responsible for arc before its nodes • If a machine joins, it adds new nodes • If a machine leaves, its range moves to after machine 9
  • 11. Availability – Replication Pick n next nodes (different machines!) n -- Replication factor If one node fails, the other nodes still have the key information 10
  • 12. • keep multiple versions of an object in one bucket • protect objects from unintended overwrites and deletions • retrieve previous versions of them • Write faster – DELETE doesn’t wipe – GET will return not found - PUT doesn’t overwrite: pushes version - GET returns most recent version From: http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#overview Versioning – Basics 11
  • 13. Versioning Version evolution of an object over time --D: object --Sx,Sy,Sz: nodes --[Sx, 1]: vector clock [node, count] • One vector clock is associate with every version of every object • Determine the versions of an object by examining their vector clocks • If the counters on D1’s clock is less than or equal to all of the nodes in the second clock, D1 is ancestor of the second, can be forgotten • Otherwise, the two changes require reconciliation Using vector clock to capture causality between different versions of the same object D1([Sx, 1]) D2([Sx, 2]) D3([Sx, 2], [Sy, 1]) D4([Sx, 2], [Sz, 1]) D3 ? D4 12
  • 14. Consistency Read-after-write consistency • PUTS of new objects in S3 bucket • Guarantee visibility of new data to all applications • Allow you to retrieve objects immediately after creation Eventual consistency • Overwrite PUTS and DELETES of objects in S3 bucket • Weak consistency, to achieve high availability • If no additional updates are made to a given data item, all reads to that item will eventually return the same value. Updates to a single key are atomic 13
  • 15. Eventual Consistency From: http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html Both W1 (write 1) and W2 (write 2) complete before the start of R1 (read 1) and R2 (read 2) o Consistent read R1 and R2 both return color = ruby. o Eventually consistent read R1 and R2 might return color = red, color = ruby, or no results, depending on the amount of time that has elapsed. 14
  • 16. Consistency Protocol A consistency protocol similar to those used in quorum system is used to maintain consistency among its replicas. •This protocol has two key configurable values: R and W.  R is the minimum number of nodes that must participate in a successful read operation.  W is the minimum number of nodes that must participate in a successful write operation. •N is the total number of nodes. Setting R and W. •R+W ? N 15
  • 17. Reference [1] Giuseppe DeCandia, Deniz Hastorun, etc. Dynamo: Amazon’s Highly Available Key- value Store [2] SHLOMO SWIDLER, Read-After-Write Consistency in Amazon S3 [3] Amazon Simple Storage Service Documentation [4] Peter Bailis, Ali Ghodsi UC Berkeley Eventual consistency today: limitations, extensions, and beyond [5] https://en.wikipedia.org/wiki/CAP_theorem [6] Mayur Palankar, Ayodele Onibokun, etc. Amazon S3 for Science Grids: a Viable Solution? [7] David Bermbach and Stefan Tai Eventual Consistency: How soon is eventual? [8] Jinesh Varia Cloud Architectures [9] Garfinkel, Simson L. An Evaluation of Amazon’s Grid Computing Services: EC2, S3, and SQS [10] Amazon S3 Masterclass [11] https://en.wikipedia.org/wiki/Consistent_hashing 16