In this session, storage experts walk you through Amazon S3, an object storage service and bulk data repository designed for 99.999999999% durability that scales past trillions of objects worldwide. Learn about the different ways you can accelerate data transfer to S3, and get a close look at new tools for securing and managing your data more efficiently. See how Amazon Athena, announced at re:Invent 2016, lets you run serverless analytics on your data in S3, and as a bonus, walk away with code snippets to use with S3. Hear AWS customers talk about the solutions they have built with S3 to turn their data into a strategic asset instead of just a cost center. Bring your toughest questions for the experts on hand, and walk away that much smarter about using object storage from AWS.
3. Amazon S3 Transfer Acceleration
[Diagram: uploader → AWS edge location → S3 bucket, with optimized throughput]
• Typically 50%–300% faster
• Change your endpoint, not your code
• 54 global edge locations
• No firewall exceptions
• No client software required
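"Change your endpoint, not your code" can be made concrete: the same bucket is reachable through an `s3-accelerate` hostname instead of the regional one, and boto3 exposes the switch as a single config option. A minimal Python sketch (the bucket name is a made-up example):

```python
# Transfer Acceleration: same bucket, different endpoint.
# "my-bucket" is a made-up example name.
BUCKET = "my-bucket"

regional_endpoint = "https://{0}.s3.amazonaws.com".format(BUCKET)
accelerated_endpoint = "https://{0}.s3-accelerate.amazonaws.com".format(BUCKET)

# With boto3/botocore the switch is one config flag rather than a code change:
# from botocore.config import Config
# s3 = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
```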
5. How fast is S3 Transfer Acceleration?
[Chart: time in hours for a 500 GB upload to a bucket in Singapore, public internet vs. S3 Transfer Acceleration, from edge locations including Rio de Janeiro, Warsaw, New York, Atlanta, Madrid, Virginia, Melbourne, Paris, Los Angeles, Seattle, Tokyo, and Singapore]
On average, we have seen a 171% improvement over regular S3 when uploading over long distances.
9. S3 Inventory
• Save time: daily or weekly delivery of a CSV file to an S3 bucket of your choice
• Use case: trigger business workflows and applications such as secondary index garbage collection, data auditing, and offline analytics
• More information about your objects than the LIST API provides, such as replication status, multipart upload flag, and delete marker
• Simple pricing: $0.0025 per million objects listed
10. S3 Inventory
• Eventually consistent rolling snapshot
• New objects may not be listed
• Removed objects may still be included
• Validate before you act: use HEAD Object to confirm an object's current state

Name | Value type | Description
Bucket | String | Bucket name. UTF-8 encoded.
Key | String | Object key name. UTF-8 encoded.
Version Id | String | Version ID of the object.
Is Latest | boolean | true if the object is the latest (current) version of a versioned object; otherwise false.
Delete Marker | boolean | true if the object is a delete marker of a versioned object; otherwise false.
Size | long | Object size in bytes.
Last Modified | String | Last-modified timestamp, ISO format: YYYY-MM-DDTHH:mm:ss.SSSZ.
ETag | String | ETag in hex-encoded format.
StorageClass | String | Valid values: STANDARD, REDUCED_REDUNDANCY, GLACIER, STANDARD_IA. UTF-8 encoded.
Multipart Uploaded | boolean | true if the object was uploaded as a multipart upload; otherwise false.
Replication Status | String | Valid values: REPLICA, COMPLETED, PENDING, FAILED. UTF-8 encoded.
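Because the inventory is an eventually consistent rolling snapshot, the "validate before you act" advice matters for any consumer of the report. A hedged Python sketch (the CSV line, bucket, and key are invented; `ObjectGone` stands in for the 404 `ClientError` a real boto3 `head_object` call would raise):

```python
import csv
import io

# One line of an S3 Inventory CSV report (the real files have no header row);
# the bucket and key below are made-up examples.
inventory_row = ('"my-bucket","logs/2017/01/01/events.gz","","true","false",'
                 '"1048576","2017-01-01T00:00:00.000Z"')

def parse_inventory_row(line):
    """Parse one CSV inventory line into a dict of the first seven fields."""
    fields = next(csv.reader(io.StringIO(line)))
    names = ["Bucket", "Key", "VersionId", "IsLatest",
             "DeleteMarker", "Size", "LastModified"]
    return dict(zip(names, fields))

class ObjectGone(Exception):
    """Stand-in for botocore's ClientError with a 404 status."""

def should_process(entry, head_object):
    """Validate before you act: re-check the object with HEAD Object,
    since the snapshot may include removed objects or miss new ones."""
    if entry["DeleteMarker"] == "true":
        return False                      # listed entry is a delete marker
    try:
        head_object(Bucket=entry["Bucket"], Key=entry["Key"])
    except ObjectGone:
        return False                      # object removed after the snapshot
    return True

entry = parse_inventory_row(inventory_row)
still_there = should_process(entry, lambda **kw: {})  # HEAD succeeded

def _gone(**kw):
    raise ObjectGone()
already_deleted = should_process(entry, _gone)        # HEAD returned 404

# In real code: s3 = boto3.client("s3"); should_process(entry, s3.head_object)
```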
11. S3 Analytics – Storage Class Analysis
Data-driven storage management for S3
• Analyze entire buckets, prefixes, or tags
• Storage class analysis and lifecycle recommendations
• Export the analysis to your S3 bucket
• $0.10 per million objects analyzed
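Configuring Storage Class Analysis on one prefix with an export destination can be sketched as a boto3 payload. All bucket, prefix, and ID names below are invented; the dict shape follows the `put_bucket_analytics_configuration` API as I understand it, so treat it as a sketch rather than a definitive template:

```python
# Storage Class Analysis scoped to one prefix, exporting daily CSVs to
# another bucket (all names are made-up examples).
analytics_configuration = {
    "Id": "logs-analysis",
    "Filter": {"Prefix": "logs/"},          # analyze only this prefix
    "StorageClassAnalysis": {
        "DataExport": {
            "OutputSchemaVersion": "V_1",
            "Destination": {
                "S3BucketDestination": {
                    "Format": "CSV",
                    "Bucket": "arn:aws:s3:::my-analytics-bucket",
                    "Prefix": "storage-class-analysis/",
                }
            },
        }
    },
}

# In real code:
# boto3.client("s3").put_bucket_analytics_configuration(
#     Bucket="my-bucket", Id="logs-analysis",
#     AnalyticsConfiguration=analytics_configuration)
```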
15. CloudWatch metrics for S3
• Operational and performance monitoring
• Generate metrics for the data of your choice: entire bucket, prefixes, and tags
• Up to 1,000 object groups
• 1-minute CloudWatch metrics
• Alert and alarm on metrics

Metric name | Metric value
AllRequests | Count
PutRequests | Count
GetRequests | Count
ListRequests | Count
DeleteRequests | Count
HeadRequests | Count
PostRequests | Count
BytesDownloaded | MB
BytesUploaded | MB
4xxErrors | Count
5xxErrors | Count
FirstByteLatency | ms
TotalRequestLatency | ms
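Scoping the 1-minute request metrics to a prefix plus a tag (one of the "object groups" above) can be sketched as a boto3 metrics configuration. The filter ID, prefix, and tag are made-up examples; the dict shape follows the `put_bucket_metrics_configuration` API:

```python
# A metrics configuration that scopes 1-minute request metrics to one
# prefix combined with one tag (all names are made-up examples).
metrics_configuration = {
    "Id": "photos-processed",                  # hypothetical configuration id
    "Filter": {
        "And": {
            "Prefix": "photos/",               # only objects under this prefix
            "Tags": [{"Key": "stage", "Value": "processed"}],
        }
    },
}

# In real code:
# s3 = boto3.client("s3")
# s3.put_bucket_metrics_configuration(
#     Bucket="my-bucket", Id=metrics_configuration["Id"],
#     MetricsConfiguration=metrics_configuration)
```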
16. CloudTrail Data Events for S3
Use case: perform security analysis, meet IT auditing and compliance needs, and take immediate action on object-level activity
API logs for bucket-level requests:
• Creation/deletion of buckets
• Changes to bucket configuration (bucket policy, lifecycle policies, replication policies, etc.)
• SNS notification on log file delivery (optional)
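Turning on object-level (data event) logging for a bucket can be sketched as a CloudTrail event selector. The trail and bucket names are invented; the payload shape follows the `put_event_selectors` API:

```python
# An event selector that enables S3 object-level (data event) logging for
# one bucket, keeping bucket-level management events as well.
# Trail and bucket names are made-up examples.
event_selectors = [{
    "ReadWriteType": "All",              # log both read and write activity
    "IncludeManagementEvents": True,     # keep bucket-level API logs too
    "DataResources": [{
        "Type": "AWS::S3::Object",
        "Values": ["arn:aws:s3:::my-bucket/"],   # all objects in the bucket
    }],
}]

# In real code:
# boto3.client("cloudtrail").put_event_selectors(
#     TrailName="my-trail", EventSelectors=event_selectors)
```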
17. Manage your data
Cross-region replication • Lifecycle policies • Event notifications • Object-level tags
18. Manage your data
Data classification and management: manage data based on what it is, as opposed to where it's located
• Easy data management
• Classify your data
• Tag your objects with key-value pairs
• Write policies once, based on the type of data
Classification • Lifecycle policy • Access control
Pricing: $0.01 per 10,000 tags per month
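"Write policies once based on the type of data" can be sketched as a lifecycle rule keyed to a tag rather than a prefix. The rule ID, tag, bucket, and transition schedule below are all invented examples; the dict shape follows the `put_bucket_lifecycle_configuration` API:

```python
# A lifecycle rule that matches objects by tag instead of by prefix, so
# one policy covers that class of data wherever it lives in the bucket.
# All names and day counts are made-up examples.
lifecycle_configuration = {
    "Rules": [{
        "ID": "archive-by-classification",
        "Filter": {"Tag": {"Key": "classification", "Value": "archive"}},
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 365, "StorageClass": "GLACIER"},
        ],
    }]
}

# In real code:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle_configuration)
```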
19. Deep dive on Tags
• Tags are key-value pairs
• Maximum 10 tags per object
• Maximum key length: 127 Unicode characters
• Maximum value length: 255 Unicode characters
• Tag keys and values are case sensitive
• The LIST operation on tags is eventually consistent
• Two ways to put tags via the API:
  • put objects with the tag parameter, or
  • use the add-tag API after the object is created
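The two tagging paths above can be sketched side by side: PUT Object takes tags as a URL-encoded string, while PUT Object tagging takes a structured tag set. The bucket, key, and tag values are made-up examples:

```python
from urllib.parse import urlencode

tags = {"classification": "confidential", "project": "atlas"}  # made-up tags

# Way 1: tag at upload time. PUT Object carries tags as a URL-encoded string.
put_object_kwargs = {
    "Bucket": "my-bucket",          # hypothetical bucket
    "Key": "reports/q1.csv",        # hypothetical key
    "Body": b"col1,col2\n1,2\n",
    "Tagging": urlencode(tags),     # "classification=confidential&project=atlas"
}

# Way 2: tag after the object exists, via the PUT Object tagging API,
# which takes a structured TagSet instead of a query string.
put_object_tagging_kwargs = {
    "Bucket": "my-bucket",
    "Key": "reports/q1.csv",
    "Tagging": {"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]},
}

# In real code:
# s3 = boto3.client("s3")
# s3.put_object(**put_object_kwargs)
# s3.put_object_tagging(**put_object_tagging_kwargs)
```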
20. Summary: Storage Management for S3
Cross-region replication • Lifecycle policy • Data classification & management • Event notifications • Monitor and alert with CloudWatch • Daily inventory lists • Audit with object-level CloudTrail logs • Storage analytics
22. NEW: Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL
Amazon EMR (Hadoop, Spark & Presto) • Amazon Redshift (data warehouse) • Amazon QuickSight (visualization) • Amazon Athena (ad hoc S3 queries)
23. Athena is Serverless
• No infrastructure or administration
• Zero spin-up time
• Transparent upgrades
24. Amazon Athena is Easy To Use
• Log into the console
• Create a table
  • Type in a Hive DDL statement, or
  • use the console's Add Table wizard
• Start querying
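The "type in a Hive DDL statement" step can be sketched end to end: a CREATE EXTERNAL TABLE statement pointed at data already in S3, submitted through Athena's StartQueryExecution API. The table name, columns, and bucket locations are all invented examples:

```python
# A Hive DDL statement defining a table over data already in S3
# (table, columns, and bucket names are made-up examples).
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS access_logs (
  request_time string,
  client_ip    string,
  status       int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
LOCATION 's3://my-log-bucket/access/'
"""

# Athena runs DDL through the same query API; results land in S3.
start_query_execution_kwargs = {
    "QueryString": ddl,
    "ResultConfiguration": {"OutputLocation": "s3://my-results-bucket/athena/"},
}

# In real code:
# boto3.client("athena").start_query_execution(**start_query_execution_kwargs)
```

Note that DDL queries are free (see the pricing slide below), so defining tables costs nothing.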
25. Query Data Directly from Amazon S3
• No loading of data: query data in its raw format
• Athena supports multiple data formats
  • Text, CSV, TSV, JSON, weblogs, AWS service logs
  • Or convert to an optimized format such as ORC or Parquet for the best performance and lowest cost
• No ETL required
• Stream data directly from Amazon S3
26. Use ANSI SQL
• Start writing ANSI SQL
• Support for complex joins, nested queries & window functions
• Support for complex data types (arrays, structs)
• Support for partitioning of data by any key (date, time, custom keys)
  • e.g., year, month, day, hour, or customer key and date
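A short example of the SQL features above: a window function plus a partition filter, which prunes the data Athena has to scan. The table and column names are hypothetical examples, shown here as a query string one would submit to Athena:

```python
# An ad hoc ANSI SQL query with a window function and a partition filter
# (table, columns, and partition keys are made-up examples).
sql = """
SELECT client_ip,
       COUNT(*) AS requests,
       RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
FROM access_logs
WHERE year = '2017' AND month = '01'   -- partition columns prune the scan
GROUP BY client_ip
ORDER BY requests DESC
LIMIT 10
"""
# Submit it the same way as any Athena query, e.g. via start_query_execution.
```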
27. Familiar Technologies Under the Covers
Presto, used for SQL queries:
• In-memory distributed query engine
• ANSI-SQL compatible with extensions
Hive, used for DDL functionality:
• Complex data types
• Multitude of formats
• Supports data partitioning
28. Amazon Athena is Fast
• Tuned for performance
• Automatically parallelizes queries
• Results are streamed to the console and also stored in S3
• Improve query performance: compress your data, use columnar formats
29. Amazon Athena is Cost Effective
• Pay per query
• $5 per TB scanned from S3
• DDL Queries and failed queries are free
• Save by using compression, columnar formats, partitions
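The pay-per-query arithmetic is worth making explicit: at $5 per TB scanned, compression, columnar formats, and partitions save money because they shrink the bytes each query touches. A simplified sketch (it ignores the per-query minimum scan size the real service applies):

```python
# Athena pricing arithmetic from the slide: $5 per TB scanned.
# Simplified: the real service also applies a small per-query minimum.
PRICE_PER_TB_USD = 5.00

def query_cost_usd(bytes_scanned):
    """Cost of one query at $5 per TB (1 TB taken as 10**12 bytes here)."""
    return PRICE_PER_TB_USD * bytes_scanned / 10**12

# Columnar formats and partitions cut cost by cutting bytes scanned:
full_scan = query_cost_usd(2 * 10**12)    # 2 TB of raw CSV -> $10.00
pruned_scan = query_cost_usd(10**11)      # 0.1 TB after Parquet + partition pruning
```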
30. Who is Athena for?
• Anyone looking to process data stored in Amazon S3
• Data coming from IoT devices, Apache logs, Omniture logs, CloudFront logs, application logs
• Anyone who knows SQL, both developers and analysts
• Ad hoc exploration of data and data discovery
• Customers looking to build a data lake on Amazon S3