In this session, storage experts walk you through Amazon S3, an object storage service and bulk data repository designed for 99.999999999% durability that scales past trillions of objects worldwide. Learn about the different ways you can accelerate data transfer to S3, and get a close look at new tools for securing and managing your data more efficiently. See how Amazon Athena, announced at re:Invent 2016, lets you run serverless analytics on your data in S3, and as a bonus, walk away with code snippets to use with S3. Hear AWS customers talk about the solutions they have built with S3 to turn their data into a strategic asset instead of just a cost center. Bring your toughest questions for the experts on hand, and walk away that much smarter about using object storage from AWS.
3. Amazon S3 Transfer Acceleration
[Diagram: uploader → AWS edge location → S3 bucket, with optimized throughput]
• Typically 50%–300% faster
• Change your endpoint, not your code
• 54 global edge locations
• No firewall exceptions
• No client software required
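"Change your endpoint, not your code" can be made concrete: the same bucket is reachable through an `s3-accelerate` hostname instead of the regional one, and boto3 exposes the switch as a single config option. A minimal Python sketch (the bucket name is a made-up example):

```python
# Transfer Acceleration: same bucket, different endpoint.
# "my-bucket" is a made-up example name.
BUCKET = "my-bucket"

regional_endpoint = "https://{0}.s3.amazonaws.com".format(BUCKET)
accelerated_endpoint = "https://{0}.s3-accelerate.amazonaws.com".format(BUCKET)

# With boto3/botocore the switch is one config flag rather than a code change:
# from botocore.config import Config
# s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
```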
5. How fast is S3 Transfer Acceleration?
[Chart: time in hours for a 500 GB upload to a bucket in Singapore, public internet vs. S3 Transfer Acceleration, from edge locations including Rio de Janeiro, Warsaw, New York, Atlanta, Madrid, Virginia, Melbourne, Paris, Los Angeles, Seattle, Tokyo, and Singapore]
On average, we have seen a 171% improvement over regular S3 when uploading over long distances.
9. S3 Inventory
• Save time: daily or weekly delivery of a CSV file to an S3 bucket of your choice
• Use case: trigger business workflows and applications such as secondary index garbage collection, data auditing, and offline analytics
• More information about your objects than the LIST API provides, such as replication status, multipart upload flag, and delete marker
• Simple pricing: $0.0025 per million objects listed
10. S3 Inventory
• Eventually consistent rolling snapshot
• New objects may not be listed
• Removed objects may still be included
• Validate before you act: use HEAD Object to confirm an object's current state

Name | Value type | Description
Bucket | String | Bucket name. UTF-8 encoded.
Key | String | Object key name. UTF-8 encoded.
Version Id | String | Version ID of the object.
Is Latest | boolean | true if the object is the latest (current) version of a versioned object; otherwise false.
Delete Marker | boolean | true if the object is a delete marker of a versioned object; otherwise false.
Size | long | Object size in bytes.
Last Modified | String | Last-modified timestamp, ISO format: YYYY-MM-DDTHH:mm:ss.SSSZ.
ETag | String | ETag in hex-encoded format.
StorageClass | String | Valid values: STANDARD, REDUCED_REDUNDANCY, GLACIER, STANDARD_IA. UTF-8 encoded.
Multipart Uploaded | boolean | true if the object was uploaded as a multipart upload; otherwise false.
Replication Status | String | Valid values: REPLICA, COMPLETED, PENDING, FAILED. UTF-8 encoded.
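Because the inventory is an eventually consistent rolling snapshot, the "validate before you act" advice matters for any consumer of the report. A hedged Python sketch (the CSV line, bucket, and key are invented; `ObjectGone` stands in for the 404 `ClientError` a real boto3 `head_object` call would raise):

```python
import csv
import io

# One line of an S3 Inventory CSV report (the real files have no header row);
# the bucket and key below are made-up examples.
inventory_row = ('"my-bucket","logs/2017/01/01/events.gz","","true","false",'
                 '"1048576","2017-01-01T00:00:00.000Z"')

def parse_inventory_row(line):
    """Parse one CSV inventory line into a dict of the first seven fields."""
    fields = next(csv.reader(io.StringIO(line)))
    names = ["Bucket", "Key", "VersionId", "IsLatest",
             "DeleteMarker", "Size", "LastModified"]
    return dict(zip(names, fields))

class ObjectGone(Exception):
    """Stand-in for botocore's ClientError with a 404 status."""

def should_process(entry, head_object):
    """Validate before you act: re-check the object with HEAD Object,
    since the snapshot may include removed objects or miss new ones."""
    if entry["DeleteMarker"] == "true":
        return False                      # listed entry is a delete marker
    try:
        head_object(Bucket=entry["Bucket"], Key=entry["Key"])
    except ObjectGone:
        return False                      # object removed after the snapshot
    return True

entry = parse_inventory_row(inventory_row)
still_there = should_process(entry, lambda **kw: {})  # HEAD succeeded

def _gone(**kw):
    raise ObjectGone()
already_deleted = should_process(entry, _gone)        # HEAD returned 404

# In real code: s3 = boto3.client("s3"); should_process(entry, s3.head_object)
```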
11. S3 Analytics – Storage Class Analysis
Data-driven storage management for S3
• Analyze entire buckets, prefixes, or tags
• Storage class analysis and lifecycle recommendations
• Export the analysis to your S3 bucket
• $0.10 per million objects analyzed
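Configuring Storage Class Analysis on one prefix with an export destination can be sketched as a boto3 payload. All bucket, prefix, and ID names below are invented; the dict shape follows the `put_bucket_analytics_configuration` API as I understand it, so treat it as a sketch rather than a definitive template:

```python
# Storage Class Analysis scoped to one prefix, exporting daily CSVs to
# another bucket (all names are made-up examples).
analytics_configuration = {
    "Id": "logs-analysis",
    "Filter": {"Prefix": "logs/"},          # analyze only this prefix
    "StorageClassAnalysis": {
        "DataExport": {
            "OutputSchemaVersion": "V_1",
            "Destination": {
                "S3BucketDestination": {
                    "Format": "CSV",
                    "Bucket": "arn:aws:s3:::my-analytics-bucket",
                    "Prefix": "storage-class-analysis/",
                }
            },
        }
    },
}

# In real code:
# boto3.client("s3").put_bucket_analytics_configuration(
#     Bucket="my-bucket", Id="logs-analysis",
#     AnalyticsConfiguration=analytics_configuration)
```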
15. CloudWatch metrics for S3
• Operational and performance monitoring
• Generate metrics for the data of your choice: entire bucket, prefixes, and tags
• Up to 1,000 object groups
• 1-minute CloudWatch metrics
• Alert and alarm on metrics

Metric name | Metric value
AllRequests | Count
PutRequests | Count
GetRequests | Count
ListRequests | Count
DeleteRequests | Count
HeadRequests | Count
PostRequests | Count
BytesDownloaded | MB
BytesUploaded | MB
4xxErrors | Count
5xxErrors | Count
FirstByteLatency | ms
TotalRequestLatency | ms
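Scoping the 1-minute request metrics to a prefix plus a tag (one of the "object groups" above) can be sketched as a boto3 metrics configuration. The filter ID, prefix, and tag are made-up examples; the dict shape follows the `put_bucket_metrics_configuration` API:

```python
# A metrics configuration that scopes 1-minute request metrics to one
# prefix combined with one tag (all names are made-up examples).
metrics_configuration = {
    "Id": "photos-processed",                  # hypothetical configuration id
    "Filter": {
        "And": {
            "Prefix": "photos/",               # only objects under this prefix
            "Tags": [{"Key": "stage", "Value": "processed"}],
        }
    },
}

# In real code:
# s3 = boto3.client("s3")
# s3.put_bucket_metrics_configuration(
#     Bucket="my-bucket", Id=metrics_configuration["Id"],
#     MetricsConfiguration=metrics_configuration)
```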
16. CloudTrail Data Events for S3
Use case: perform security analysis, meet IT auditing and compliance needs, and take immediate action on object-level activity
API logs for bucket-level requests:
• Creation/deletion of buckets
• Changes to bucket configuration (bucket policy, lifecycle policies, replication policies, etc.)
• SNS notification on log file delivery (optional)
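Turning on object-level (data event) logging for a bucket can be sketched as a CloudTrail event selector. The trail and bucket names are invented; the payload shape follows the `put_event_selectors` API:

```python
# An event selector that enables S3 object-level (data event) logging for
# one bucket, keeping bucket-level management events as well.
# Trail and bucket names are made-up examples.
event_selectors = [{
    "ReadWriteType": "All",              # log both read and write activity
    "IncludeManagementEvents": True,     # keep bucket-level API logs too
    "DataResources": [{
        "Type": "AWS::S3::Object",
        "Values": ["arn:aws:s3:::my-bucket/"],   # all objects in the bucket
    }],
}]

# In real code:
# boto3.client("cloudtrail").put_event_selectors(
#     TrailName="my-trail", EventSelectors=event_selectors)
```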
17. Manage your data
Cross-region replication • Lifecycle policies • Event notifications • Object-level tags
18. Manage your data
Data classification and management: manage data based on what it is, as opposed to where it's located
• Easy data management
• Classify your data
• Tag your objects with key-value pairs
• Write policies once, based on the type of data
Classification • Lifecycle policy • Access control
Pricing: $0.01 per 10,000 tags per month
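"Write policies once based on the type of data" can be sketched as a lifecycle rule keyed to a tag rather than a prefix. The rule ID, tag, bucket, and transition schedule below are all invented examples; the dict shape follows the `put_bucket_lifecycle_configuration` API:

```python
# A lifecycle rule that matches objects by tag instead of by prefix, so
# one policy covers that class of data wherever it lives in the bucket.
# All names and day counts are made-up examples.
lifecycle_configuration = {
    "Rules": [{
        "ID": "archive-by-classification",
        "Filter": {"Tag": {"Key": "classification", "Value": "archive"}},
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 365, "StorageClass": "GLACIER"},
        ],
    }]
}

# In real code:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle_configuration)
```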
19. Deep dive on Tags
• Tags are key-value pairs
• Maximum 10 tags per object
• Maximum key length: 127 Unicode characters
• Maximum value length: 255 Unicode characters
• Tag keys and values are case sensitive
• The LIST operation on tags is eventually consistent
• Two ways to put tags via the API:
  • put objects with the tag parameter, or
  • use the add-tag API after the object is created
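The two tagging paths above can be sketched side by side: PUT Object takes tags as a URL-encoded string, while PUT Object tagging takes a structured tag set. The bucket, key, and tag values are made-up examples:

```python
from urllib.parse import urlencode

tags = {"classification": "confidential", "project": "atlas"}  # made-up tags

# Way 1: tag at upload time. PUT Object carries tags as a URL-encoded string.
put_object_kwargs = {
    "Bucket": "my-bucket",          # hypothetical bucket
    "Key": "reports/q1.csv",        # hypothetical key
    "Body": b"col1,col2\n1,2\n",
    "Tagging": urlencode(tags),     # "classification=confidential&project=atlas"
}

# Way 2: tag after the object exists, via the PUT Object tagging API,
# which takes a structured TagSet instead of a query string.
put_object_tagging_kwargs = {
    "Bucket": "my-bucket",
    "Key": "reports/q1.csv",
    "Tagging": {"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]},
}

# In real code:
# s3 = boto3.client("s3")
# s3.put_object(**put_object_kwargs)
# s3.put_object_tagging(**put_object_tagging_kwargs)
```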
20. Summary: Storage Management for S3
Cross-region replication • Lifecycle policy • Data classification & management • Event notifications • Monitor and alert with CloudWatch • Daily inventory lists • Audit with object-level CloudTrail logs • Storage analytics
22. NEW: Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL
Amazon EMR (Hadoop, Spark & Presto) • Amazon Redshift (data warehouse) • Amazon QuickSight (visualization) • Amazon Athena (ad hoc S3 queries)
23. Athena is Serverless
• No infrastructure or administration
• Zero spin-up time
• Transparent upgrades
24. Amazon Athena is Easy To Use
• Log into the console
• Create a table
  • Type in a Hive DDL statement, or
  • use the console's Add Table wizard
• Start querying
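The "type in a Hive DDL statement" step can be sketched end to end: a CREATE EXTERNAL TABLE statement pointed at data already in S3, submitted through Athena's StartQueryExecution API. The table name, columns, and bucket locations are all invented examples:

```python
# A Hive DDL statement defining a table over data already in S3
# (table, columns, and bucket names are made-up examples).
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS access_logs (
  request_time string,
  client_ip    string,
  status       int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
LOCATION 's3://my-log-bucket/access/'
"""

# Athena runs DDL through the same query API; results land in S3.
start_query_execution_kwargs = {
    "QueryString": ddl,
    "ResultConfiguration": {"OutputLocation": "s3://my-results-bucket/athena/"},
}

# In real code:
# boto3.client("athena").start_query_execution(**start_query_execution_kwargs)
```

Note that DDL queries are free (see the pricing slide below), so defining tables costs nothing.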
25. Query Data Directly from Amazon S3
• No loading of data: query data in its raw format
• Athena supports multiple data formats
  • Text, CSV, TSV, JSON, weblogs, AWS service logs
  • Or convert to an optimized format such as ORC or Parquet for the best performance and lowest cost
• No ETL required
• Stream data directly from Amazon S3
26. Use ANSI SQL
• Start writing ANSI SQL
• Support for complex joins, nested queries & window functions
• Support for complex data types (arrays, structs)
• Support for partitioning of data by any key (date, time, custom keys)
  • e.g., year, month, day, hour, or customer key and date
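A short example of the SQL features above: a window function plus a partition filter, which prunes the data Athena has to scan. The table and column names are hypothetical examples, shown here as a query string one would submit to Athena:

```python
# An ad hoc ANSI SQL query with a window function and a partition filter
# (table, columns, and partition keys are made-up examples).
sql = """
SELECT client_ip,
       COUNT(*) AS requests,
       RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
FROM access_logs
WHERE year = '2017' AND month = '01'   -- partition columns prune the scan
GROUP BY client_ip
ORDER BY requests DESC
LIMIT 10
"""
# Submit it the same way as any Athena query, e.g. via start_query_execution.
```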
27. Familiar Technologies Under the Covers
Presto, used for SQL queries:
• In-memory distributed query engine
• ANSI-SQL compatible with extensions
Hive, used for DDL functionality:
• Complex data types
• Multitude of formats
• Supports data partitioning
28. Amazon Athena is Fast
• Tuned for performance
• Automatically parallelizes queries
• Results are streamed to the console and also stored in S3
• Improve query performance: compress your data, use columnar formats
29. Amazon Athena is Cost Effective
• Pay per query
• $5 per TB scanned from S3
• DDL Queries and failed queries are free
• Save by using compression, columnar formats, partitions
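The pay-per-query arithmetic is worth making explicit: at $5 per TB scanned, compression, columnar formats, and partitions save money because they shrink the bytes each query touches. A simplified sketch (it ignores the per-query minimum scan size the real service applies):

```python
# Athena pricing arithmetic from the slide: $5 per TB scanned.
# Simplified: the real service also applies a small per-query minimum.
PRICE_PER_TB_USD = 5.00

def query_cost_usd(bytes_scanned):
    """Cost of one query at $5 per TB (1 TB taken as 10**12 bytes here)."""
    return PRICE_PER_TB_USD * bytes_scanned / 10**12

# Columnar formats and partitions cut cost by cutting bytes scanned:
full_scan = query_cost_usd(2 * 10**12)    # 2 TB of raw CSV -> $10.00
pruned_scan = query_cost_usd(10**11)      # 0.1 TB after Parquet + partition pruning
```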
30. Who is Athena for?
• Anyone looking to process data stored in Amazon S3
• Data coming from IoT devices, Apache logs, Omniture logs, CloudFront logs, application logs
• Anyone who knows SQL, both developers and analysts
• Ad hoc exploration of data and data discovery
• Customers looking to build a data lake on Amazon S3