Querying and analyzing
data in Amazon S3
• April, 2017
• Dario Rivera, Solutions Architect, AWS
You can find this presentation here: http://tinyurl.com/sfloft-bigdataday-2017-ws1
Your Big Data Application Architecture
Amazon
EMR
Amazon
Redshift
Amazon
QuickSight
Raw web logs from
Firehose
Run SQL queries on
processed web logs
Visualize web logs to
discover insights
Amazon S3
Bucket
Ad-hoc analysis of
web logs
Amazon
Athena
Interactive querying of
web logs
What is qwikLABS?
• Provides access to AWS services for this bootcamp
• No need to provide a credit card
• Lab resources are automatically deleted when you’re finished
http://events-aws.qwiklab.com
• Create an account with the same email that you used to register for this
bootcamp
Sign in and start the lab
Once the lab is started, you will see a “Create in Progress” message in the upper
right-hand corner.
Navigating qwikLABS
Connect tab: Access and login information
Addl Info tab: Links to Interfaces
Lab Instruction tab:
Scripts for your labs
Everything you need for the lab
• Open AWS Console, login and verify the following AWS resources are
created:
• One Amazon EMR Cluster
• One Amazon Redshift Cluster
• Sign up (later) for
• Amazon QuickSight
Activity 1
Deliver Log Files to
Redshift
Relational data warehouse
Massively parallel; Petabyte scale
Fully managed
HDD and SSD Platforms
$1,000/TB/Year; starts at $0.25/hour
Amazon
Redshift
a lot faster
a lot simpler
a lot cheaper
Amazon Redshift architecture
• Leader Node
Simple SQL end point
Stores metadata
Optimizes query plan
Coordinates query execution
• Compute Nodes
Local columnar storage
Parallel/distributed execution of all queries, loads, backups,
restores, resizes
• Start at just $0.25/hour, grow to 2 PB (compressed)
DC1: SSD; scale from 160 GB to 326 TB
DS2: HDD; scale from 2 TB to 2 PB
Ingestion/Backup
Backup
Restore
JDBC/ODBC
10 GigE
(HPC)
Benefit #1: Amazon Redshift is fast
• Parallel and Distributed
Query
Load
Export
Backup
Restore
Resize
Benefit #2: Amazon Redshift is fully managed
Continuous/incremental backups
Multiple copies within cluster
Continuous and incremental backups
to S3
Continuous and incremental backups
across regions
Streaming restore
Amazon S3
Amazon S3
Region 1
Region 2
Benefit #3: Security is built-in
• Load encrypted from S3
• SSL to secure data in transit
• ECDHE perfect forward secrecy
• Amazon VPC for network isolation
• Encryption to secure data at rest
• All blocks on disks & in Amazon S3 encrypted
• Block key, Cluster key, Master key (AES-256)
• On-premises HSM & AWS CloudHSM support
• Audit logging and AWS CloudTrail integration
• SOC 1/2/3, PCI-DSS, FedRAMP, BAA
10 GigE
(HPC)
Ingestion
Backup
Restore
Customer VPC
Internal
VPC
JDBC/ODBC
Benefit #4: Amazon Redshift is powerful
• Approximate functions
• User defined functions
• Machine Learning
• Data Science
Amazon ML
Benefit #5: Amazon Redshift has a large ecosystem
Data Integration, Systems Integrators, Business Intelligence
Activity 1: Deliver data to Redshift using the S3 COPY command
• Time: 5 minutes
• We are going to:
A. Connect to the Redshift cluster and create a table to hold the web log data
B. COPY data from S3 into Redshift
C. Run queries against the data just copied from S3
Activity 1A: Connect to Amazon Redshift
• You can connect with pgweb
• Installed and configured for the Redshift Cluster
• Just navigate to pgweb and start interacting
Note: Click on the Addl. Info tab in qwikLABS and then open the pgWeb link in a
new window.
• Or, Use any JDBC/ODBC/libpq client
• Aginity Workbench for Amazon Redshift
• SQL Workbench/J
• DBeaver
• Datagrip
Activity 1B: Create table in Redshift
• Create table weblogs to capture the incoming data from a Firehose delivery stream
Note: You can download Redshift SQL code from qwikLabs. Click on the lab
instructions tab in qwikLABS and then download the Redshift SQL file.
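The exact DDL is in that downloadable file; purely as a hedged sketch of the shape of such a table (the column names and types here are assumptions, not the lab's actual schema), a Redshift web-log table could look like:

-- Hypothetical weblogs table; the real column list comes from the lab's Redshift SQL file.
CREATE TABLE IF NOT EXISTS weblogs (
  request_timestamp timestamp,
  remote_ip         varchar(45),
  request_type      varchar(10),
  request_path      varchar(2048),
  response_code     int,
  bytes_sent        bigint
);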
Activity 1C: Deliver Data to Redshift from S3
• Run the COPY command on Redshift to load data into the weblogs table from S3
1. Remove the last query from pgWeb
2. Run the COPY command below (get the access/secret key from the qwikLABS Connect
tab) in the query window
COPY weblogs
FROM 's3://bigdataworkshop-sfloft/processed/processed-logs-1.gz'
CREDENTIALS
'aws_access_key_id=<account_access_key>;aws_secret_access_key=<account_secret_key>'
DELIMITER ','
REMOVEQUOTES
MAXERROR 0
GZIP;
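The following is not a lab step, just an optional sanity check you might run after the COPY completes; STL_LOAD_ERRORS is Redshift's standard system table for load failures:

-- Confirm rows arrived and inspect any load errors.
SELECT COUNT(*) FROM weblogs;
SELECT * FROM stl_load_errors ORDER BY starttime DESC LIMIT 10;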
Review: Amazon Redshift Test Queries
• Find distribution of response codes over days
• Count the number of 404 response codes
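The lab's actual queries ship in the downloadable SQL file; the sketches below only illustrate what they might look like against the weblogs table, assuming the hypothetical column names used earlier:

-- Distribution of response codes over days (assumed columns).
SELECT TRUNC(request_timestamp) AS day, response_code, COUNT(*) AS requests
FROM weblogs
GROUP BY 1, 2
ORDER BY 1, 2;

-- Count the number of 404 response codes.
SELECT COUNT(*) AS not_found
FROM weblogs
WHERE response_code = 404;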
Review: Amazon Redshift Test Queries
• Show all request paths with status “PAGE NOT FOUND”
• Change ‘request_path’ to ‘request_uri’ in the query below
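Again a sketch only, under the same assumed column names:

-- Request paths that returned 404 (PAGE NOT FOUND); use request_uri instead of request_path if that is your column name.
SELECT request_path, COUNT(*) AS hits
FROM weblogs
WHERE response_code = 404
GROUP BY request_path
ORDER BY hits DESC;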
Interactive Querying with
Amazon Athena
Amazon
Athena
Interactive Query Service
• Query directly from
Amazon S3
• Use ANSI SQL
• Serverless
• Multiple Data Formats
• Cost Effective
Familiar Technologies Under the Covers
• Presto (used for SQL queries): in-memory distributed query engine; ANSI-SQL compatible with extensions
• Apache Hive (used for DDL functionality): complex data types; multitude of formats; supports data partitioning
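To make the partitioning point concrete, here is a minimal hypothetical Athena DDL sketch; the table, columns, and S3 locations are illustrative only and are not part of the lab:

-- Hypothetical partitioned table: queries that filter on the partition columns scan less data.
CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.partitioned_logs (
  request_timestamp string,
  request_path      string,
  response_code     int
)
PARTITIONED BY (year string, month string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://example-bucket/logs/';

-- Register one partition, then restrict queries to it.
ALTER TABLE sampledb.partitioned_logs
  ADD PARTITION (year = '2017', month = '04')
  LOCATION 's3://example-bucket/logs/2017/04/';

SELECT response_code, COUNT(*) AS requests
FROM sampledb.partitioned_logs
WHERE year = '2017' AND month = '04'
GROUP BY response_code;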
Comparing performance and cost savings for
compression and columnar format
Dataset | Size on Amazon S3 | Query run time | Data scanned | Cost
Data stored as text files | 1 TB | 236 seconds | 1.15 TB | $5.75
Data stored in Apache Parquet format* | 130 GB | 6.78 seconds | 2.51 GB | $0.013
Savings / Speedup | 87% less with Parquet | 34x faster | 99% less data scanned | 99.7% savings
(*compressed using Snappy compression)
https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon-athena/
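The linked post covers the conversion in detail; as one hedged illustration only (not the post's exact steps), a Hive CTAS on an EMR cluster can rewrite a text-format table as Snappy-compressed Parquet. The table names here are placeholders:

-- Hypothetical HiveQL: copy a text-backed table into Parquet with Snappy compression.
CREATE TABLE logs_parquet
STORED AS PARQUET
TBLPROPERTIES ('parquet.compression' = 'SNAPPY')
AS
SELECT * FROM logs_text;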
Activity 2
Ad-hoc Querying with
Amazon Athena
Activity 2A: Interactive Querying with Athena
• From the AWS Management Console, click on All Services
Activity 2A: Interactive Querying with Athena
• Select Athena from the Analytics section and click on Get Started on the next
page
Activity 2A: Interactive Querying with Athena
• Dismiss the window for running the Athena tutorial.
• Dismiss any other tutorial window
Activity 2A: Interactive Querying with Athena
• Enter the SQL command to create a table as follows. The SQL DDL for this
exercise can be found on the Lab Instructions tab in the file Athena.sql.
Make sure to replace <YOUR-KINESIS-FIREHOSE-DESTINATION-BUCKET> with
‘s3://bigdataworkshop-sfloft/raw/’
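For orientation only, the DDL in Athena.sql will be along these lines; the column list and table name below are placeholders, so use the file's actual statement in the lab:

-- Hypothetical sketch of a raw web-log table over the Firehose output in S3.
CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.weblogs_raw (
  request_timestamp string,
  remote_ip         string,
  request_type      string,
  request_path      string,
  response_code     int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://bigdataworkshop-sfloft/raw/';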
Activity 2A: Interactive Querying with Athena
• Notice that the table will be created in the sample database (sampledb). Click
on Run Query to create the table
Activity 2B: Interactive Querying with Athena
• The SQL DDL in the previous step creates a table in Athena based on the data
streamed from Kinesis Firehose to S3
• Select sampledb from the database section and click on the eye icon to sample a
few rows of the S3 data
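The eye icon simply runs a preview; typed out, it is equivalent to something like the following (table name as in the placeholder sketch above; your table will use the name from Athena.sql):

SELECT * FROM sampledb.weblogs_raw LIMIT 10;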
Activity 2C: Interactive Querying with Athena
• Run interactive queries (copy SQL queries from Athena.sql under Lab
instructions) and see the results on the console
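The queries to copy are in Athena.sql; as hedged examples of the style of ad-hoc question you can ask (placeholder table and column names again):

-- Most requested paths.
SELECT request_path, COUNT(*) AS hits
FROM sampledb.weblogs_raw
GROUP BY request_path
ORDER BY hits DESC
LIMIT 10;

-- Response code distribution.
SELECT response_code, COUNT(*) AS requests
FROM sampledb.weblogs_raw
GROUP BY response_code
ORDER BY requests DESC;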
Activity 2D: Interactive Querying with Athena
• Optionally, you can save the results of a query to CSV by choosing the
file icon on the Results pane.
• You can also view the results of previous queries or queries that may take some
time to complete. Choose History then either search for your query or
choose View or Download to view or download the results of previous
completed queries. This also displays the status of queries that are currently
running.
Activity 2D: Interactive Querying with Athena
• Exercise: Query results are also stored in Amazon S3 in a bucket called
aws-athena-query-results-ACCOUNTID-REGION. Where can you change the
default location in the console?
Data processing with
Amazon EMR
The Amazon EMR stack:
• Storage: Amazon S3 (EMRFS), HDFS
• Cluster resource management: YARN
• Execution engines: MapReduce (batch), Tez (interactive), Spark (in memory), Flink (streaming)
• Applications: Hive, Pig, Spark SQL/Streaming/ML, Mahout, Sqoop, HBase/Phoenix, Presto
• Interfaces: Hue (SQL interface/metastore management), Zeppelin (interactive notebook), Ganglia (monitoring), HiveServer2/Spark Thrift Server (JDBC/ODBC)
• Packaged as an Amazon EMR release and managed by the Amazon EMR service
On-cluster UIs
• Notebooks: manage applications
• SQL editor, workflow designer, metastore browser: design and execute queries and workloads
• And more using bootstrap actions!
The Hadoop ecosystem can run in Amazon EMR
Easy to use Spot Instances
• On-demand for core nodes: standard Amazon EC2 pricing for on-demand capacity; meet SLA at predictable cost
• Spot Instances for task nodes: up to 90% off Amazon EC2 on-demand pricing; exceed SLA at lower cost
Amazon S3 as your persistent data store
• Separate compute and storage
• Resize and shut down Amazon EMR
clusters with no data loss
• Point multiple Amazon EMR clusters at same
data in Amazon S3
EMRFS makes it easier to leverage S3
• Better performance and error handling options
• Transparent to applications – Use “s3://”
• Consistent view
• For consistent list and read-after-write for new puts
• Support for Amazon S3 server-side and client-side encryption
• Faster listing using EMRFS metadata
Apache Spark
• Fast, general-purpose engine for large-scale data processing
• Write applications quickly in Java, Scala, or Python
• Combine SQL, streaming, and complex analytics
Apache Zeppelin
• Web-based notebook for interactive
analytics
• Multiple language back end
• Apache Spark integration
• Data visualization
• Collaboration
https://zeppelin.incubator.apache.org/
Activity 3
Ad-hoc analysis using
Amazon EMR
Activity 3: Process and Query data using Amazon EMR
• Time: 20 minutes
• We are going to:
A. Use a Zeppelin Notebook to interact with Amazon EMR Cluster
B. Process the data delivered to Amazon S3 by Firehose using Apache Spark
C. Query the data processed in the earlier stage and create simple charts
Activity 3A: Open the Zeppelin interface
1. Click on the Lab Instructions tab in
qwikLABS and then download the
Zeppelin Notebook
2. Click on the Addl. Info tab in
qwikLABS and then open the
Zeppelin link in a new window.
3. Import the Notebook using the
Import Note link on the Zeppelin
interface
Using Zeppelin interface
Activity 3B: Run the notebook
• Enter the S3 bucket name where the logs are delivered by Kinesis Firehose. The
bucket name begins with bigdataworkshop-sfloft
• Execute Step 1
• Enter bucket name (bigdataworkshop-sfloft)
• Execute Step 2
• Change the ‘/*/*/*/*/*.gz’ suffix to ‘/raw/*.gz’
• Create a Dataframe from the dataset delivered by Firehose
• Execute Step 3
• Sample a few rows
Activity 3B: Run the notebook
• Execute Step 4 to process the data
• Notice how the ‘REQUEST’ field consists of both the ’REQUEST
PROTOCOL’ and ‘REQUEST PATH’. Let’s fix that.
• Create a UDF that will split the column and add it to the Dataframe
• Print the new Dataframe
Activity 3B: Run the notebook
• Execute Step 6
• Register the data frame as a temporary table
• Now you can run SQL queries on the temporary tables.
• Execute the next 3 steps and observe the charts created
• What did you learn about the dataset?
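In Zeppelin these run as %sql paragraphs against the temporary table registered in Step 6; the queries below are hedged examples in that spirit (table and column names follow the placeholders used earlier, not necessarily the notebook's exact names):

-- Requests per response code; Zeppelin can render this as a chart.
SELECT response_code, COUNT(*) AS requests
FROM weblogs
GROUP BY response_code
ORDER BY requests DESC;

-- Most requested paths once the REQUEST field has been split.
SELECT request_path, COUNT(*) AS hits
FROM weblogs
GROUP BY request_path
ORDER BY hits DESC
LIMIT 10;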
Review: Ad-hoc analysis using Amazon EMR
• You just learned how to process and query data using Amazon EMR with
Apache Spark
• Amazon EMR has many other frameworks available for you to use
• Hive, Presto, Flink, Pig, MapReduce
• Hue, Oozie, HBase
(Optional Exercise): Data
Visualization with Amazon
QuickSight
Fast, Easy Ad-Hoc Analytics for
Anyone, Everywhere
• Ease of use targeted at business users.
• Blazing fast performance powered by SPICE.
• Broad connectivity with AWS data services,
on-premises data, files and business
applications.
• Cloud-native solution that scales
automatically.
• 1/10th the cost of traditional BI solutions.
• Create, share and collaborate with anyone
in your organization, on the web or on
mobile.
Connect, SPICE, Analyze
QuickSight allows you to connect to data from a wide variety of AWS, third-party,
and on-premises sources and import it into SPICE or query it directly. Users can then
easily explore, analyze, and share their insights with anyone.
Amazon RDS
Amazon S3
Amazon Redshift
Activity 4
Visualize results in
QuickSight
Activity 4: Visualization with QuickSight
• We are going to:
A. Register for a QuickSight account
B. Connect to the Redshift Cluster
C. Create visualizations for analysis to answer questions like:
A. What are the most common HTTP requests, and how successful (response
code of 200) are they?
B. Which are the most requested URIs?
Activity 4A: QuickSight Registration
• Go to AWS Console, click on
QuickSight from the Analytics
section.
• Click on Signup in the next window
• Make sure the subscription type is
Standard and click Continue on the
next screen
Activity 4A: QuickSight Registration
• On the Subscription Type page, enter
the account name (see note below)
• Enter your email address
• Select US West region
• Check the S3 (all buckets) box
Note: QuickSight Account name is the
AWS account number from qwikLABS in
the Connect tab
Activity 4A: QuickSight Registration
• If a pop-up box to choose S3 buckets
appears, click Select buckets
• Click on Go To Amazon Quicksight
• Dismiss the next screen
Activity 4B: Connect to data source
• Click on Manage Data to
create a new data set in
QuickSight
• Choose Redshift (Auto-
discovered) as the data
source. QuickSight
autodiscovers databases
associated with your AWS
account (Redshift
database in this case)
Activity 4B: Connect to Amazon Redshift
Note: You can get the Redshift database
password from qwikLABS by navigating to
the “Custom Connection Details” section in
the Connect tab
Activity 4C: Choose your weblogs Redshift
table
Activity 4D: Ingest data into SPICE
• SPICE is Amazon QuickSight's in-
memory optimized calculation
engine, designed specifically for fast,
ad-hoc data visualization
• You can improve the performance of
database data sets by importing the
data into SPICE instead of using a
direct query to the database
Activity 4E: Creating your first analysis
• What are the most requested HTTP request types and their corresponding
response codes for this site?
• Simply select request_type,
response_code and let
AUTOGRAPH create the
optimal visualization
Review – Creating your Analysis
• Exercise: Add a visual to show which URIs are the most requested.
Your Big Data Application Architecture
Amazon
EMR
Amazon
Redshift
Amazon
QuickSight
Raw web logs from
Firehose
Run SQL queries on
processed web logs
Visualize web logs to
discover insights
Amazon S3
Bucket
Ad-hoc analysis of
web logs
Amazon
Athena
Interactive querying of
web logs
Congratulations on building
your big data application on
AWS !!!
More Related Content

What's hot

Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Web Services
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptxWasm1953
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSightAmazon Web Services
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Databricks
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...Databricks
 
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemLeveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemSemantic Web Company
 
Data Vault Vs Data Lake
Data Vault Vs Data LakeData Vault Vs Data Lake
Data Vault Vs Data LakeCalum Miller
 
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...Amazon Web Services
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Building a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkBuilding a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkDatabricks
 
Democratizing Data
Democratizing DataDemocratizing Data
Democratizing DataDatabricks
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsDATAVERSITY
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationDenodo
 
Introduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftIntroduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftAmazon Web Services
 

What's hot (20)

Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSight
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Data streaming fundamentals
Data streaming fundamentalsData streaming fundamentals
Data streaming fundamentals
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
 
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemLeveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
 
Data Vault Vs Data Lake
Data Vault Vs Data LakeData Vault Vs Data Lake
Data Vault Vs Data Lake
 
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac...
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Building a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache SparkBuilding a Feature Store around Dataframes and Apache Spark
Building a Feature Store around Dataframes and Apache Spark
 
Democratizing Data
Democratizing DataDemocratizing Data
Democratizing Data
 
Data engineering
Data engineeringData engineering
Data engineering
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Introduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF LoftIntroduction to AWS Glue: Data Analytics Week at the SF Loft
Introduction to AWS Glue: Data Analytics Week at the SF Loft
 

Similar to Querying and Analyzing Data in Amazon S3

使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料Amazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLNEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLAmazon Web Services
 
What's New with Big Data Analytics
What's New with Big Data AnalyticsWhat's New with Big Data Analytics
What's New with Big Data AnalyticsAmazon Web Services
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudAmazon Web Services
 
Los Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep DiveLos Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep DiveKevin Epstein
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
Big data and serverless - AWS UG The Netherlands
Big data and serverless - AWS UG The NetherlandsBig data and serverless - AWS UG The Netherlands
Big data and serverless - AWS UG The NetherlandsMarek Kuczynski
 
2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개
2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개 2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개
2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개 Amazon Web Services Korea
 
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAnnouncing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAmazon Web Services
 
Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Deep Dive on Amazon S3 - March 2017 AWS Online Tech TalksDeep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Deep Dive on Amazon S3 - March 2017 AWS Online Tech TalksAmazon Web Services
 
Deep Dive on Amazon S3 - AWS Online Tech Talks
Deep Dive on Amazon S3 - AWS Online Tech TalksDeep Dive on Amazon S3 - AWS Online Tech Talks
Deep Dive on Amazon S3 - AWS Online Tech TalksAmazon Web Services
 
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...Amazon Web Services
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceAmazon Web Services
 
AWS Certified Solutions Architect Professional Course S15-S18
AWS Certified Solutions Architect Professional Course S15-S18AWS Certified Solutions Architect Professional Course S15-S18
AWS Certified Solutions Architect Professional Course S15-S18Neal Davis
 
Optimizing the Data Tier for Serverless Web Applications - March 2017 Online ...
Optimizing the Data Tier for Serverless Web Applications - March 2017 Online ...Optimizing the Data Tier for Serverless Web Applications - March 2017 Online ...
Optimizing the Data Tier for Serverless Web Applications - March 2017 Online ...Amazon Web Services
 
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...Amazon Web Services
 
AWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAmazon Web Services
 

Similar to Querying and Analyzing Data in Amazon S3 (20)

使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
 
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLNEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
 
What's New with Big Data Analytics
What's New with Big Data AnalyticsWhat's New with Big Data Analytics
What's New with Big Data Analytics
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS Cloud
 
Los Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep DiveLos Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep Dive
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Big data and serverless - AWS UG The Netherlands
Big data and serverless - AWS UG The NetherlandsBig data and serverless - AWS UG The Netherlands
Big data and serverless - AWS UG The Netherlands
 
2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개
2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개 2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개
2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개
 
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAnnouncing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Deep Dive on Amazon S3 - March 2017 AWS Online Tech TalksDeep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
Deep Dive on Amazon S3 - March 2017 AWS Online Tech Talks
 
Deep Dive on Amazon S3 - AWS Online Tech Talks
Deep Dive on Amazon S3 - AWS Online Tech TalksDeep Dive on Amazon S3 - AWS Online Tech Talks
Deep Dive on Amazon S3 - AWS Online Tech Talks
 
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
 
AWS Certified Solutions Architect Professional Course S15-S18
AWS Certified Solutions Architect Professional Course S15-S18AWS Certified Solutions Architect Professional Course S15-S18
AWS Certified Solutions Architect Professional Course S15-S18
 
Optimizing the Data Tier for Serverless Web Applications - March 2017 Online ...
Optimizing the Data Tier for Serverless Web Applications - March 2017 Online ...Optimizing the Data Tier for Serverless Web Applications - March 2017 Online ...
Optimizing the Data Tier for Serverless Web Applications - March 2017 Online ...
 
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
 
AWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWS
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaKayode Fayemi
 
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...Pooja Nehwal
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatmentnswingard
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar TrainingKylaCullinane
 
Causes of poverty in France presentation.pptx
Causes of poverty in France presentation.pptxCauses of poverty in France presentation.pptx
Causes of poverty in France presentation.pptxCamilleBoulbin1
 
Air breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsAir breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsaqsarehman5055
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceDelhi Call girls
 
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Delhi Call girls
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubssamaasim06
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfSenaatti-kiinteistöt
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoKayode Fayemi
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardsticksaastr
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxraffaeleoman
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfSkillCertProExams
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...Sheetaleventcompany
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Vipesco
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lodhisaajjda
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Baileyhlharris
 
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verifiedDelhi Call girls
 

Recently uploaded (20)

If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
Causes of poverty in France presentation.pptx
Causes of poverty in France presentation.pptxCauses of poverty in France presentation.pptx
Causes of poverty in France presentation.pptx
 
Air breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsAir breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animals
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubs
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Bailey
 
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
 

Querying and Analyzing Data in Amazon S3

  • 1. Querying and analyzing data in Amazon S3 • April, 2017 • Dario Rivera, Solutions Architect, AWS You can find this presentation here: http://tinyurl.com/sfloft-bigdataday-2017-ws1
  • 2. Your Big Data Application Architecture Amazon EMR Amazon Redshift Amazon QuickSight Raw web logs from Firehose Run SQL queries on processed web logs Visualize web logs to discover insights Amazon S3 Bucket Ad-hoc analysis of web logs Amazon Athena Interactive querying of web logs
  • 3. What is qwikLABS? • Provides access to AWS services for this bootcamp • No need to provide a credit card • Automatically deleted when you’re finished http://events-aws.qwiklab.com • Create an account with the same email that you used to register for this bootcamp
  • 4. Sign in and start the lab Once the lab is started you will see a “Create in Progress” message in the upper right hand corner.
  • 5. Navigating qwikLABS Connect tab: Access and login information Addl Info tab: Links to Interfaces Lab Instruction tab: Scripts for your labs
  • 6. Everything you need for the lab • Open AWS Console, login and verify the following AWS resources are created: • One Amazon EMR Cluster • One Amazon Redshift Cluster • Sign up (later) for • Amazon QuickSight
  • 7. Activity 1 Deliver Log Files to Redshift
  • 8. Relational data warehouse Massively parallel; Petabyte scale Fully managed HDD and SSD Platforms $1,000/TB/Year; starts at $0.25/hour Amazon Redshift a lot faster a lot simpler a lot cheaper
  • 9. Amazon Redshift architecture • Leader Node Simple SQL end point Stores metadata Optimizes query plan Coordinates query execution • Compute Nodes Local columnar storage Parallel/distributed execution of all queries, loads, backups, restores, resizes • Start at just $0.25/hour, grow to 2 PB (compressed) DC1: SSD; scale from 160 GB to 326 TB DS2: HDD; scale from 2 TB to 2 PB Ingestion/Backup Backup Restore JDBC/ODBC 10 GigE (HPC)
  • 10. Benefit #1: Amazon Redshift is fast • Parallel and Distributed Query Load Export Backup Restore Resize
  • 11. Benefit #2: Amazon Redshift is fully managed Continuous/incremental backups Multiple copies within cluster Continuous and incremental backups to S3 Continuous and incremental backups across regions Streaming restore Amazon S3 Amazon S3 Region 1 Region 2
  • 12. Benefit #3: Security is built-in • Load encrypted from S3 • SSL to secure data in transit • ECDHE perfect forward security • Amazon VPC for network isolation • Encryption to secure data at rest • All blocks on disks & in Amazon S3 encrypted • Block key, Cluster key, Master key (AES-256) • On-premises HSM & AWS CloudHSM support • Audit logging and AWS CloudTrail integration • SOC 1/2/3, PCI-DSS, FedRAMP, BAA 10 GigE (HPC) Ingestion Backup Restore Customer VPC Internal VPC JDBC/ODBC
  • 13. Benefit #4: Amazon Redshift is powerful • Approximate functions • User defined functions • Machine Learning • Data Science Amazon ML
  • 14. Benefit #5: Amazon Redshift has a large ecosystem Data Integration Systems IntegratorsBusiness Intelligence
  • 15. Activity 1: Deliver data to Redshift using s3 Copy Cmd • Time: 5 minutes • We are going to: A. Connect to Redshift cluster and create a table to hold web logs data B. COPY Data from S3 into Redshift C. Run Queries against Recently S3 copied Data
  • 16. Activity 1A: Connect to Amazon Redshift • You can connect with pgweb • Installed and configured for the Redshift Cluster • Just navigate to pgweb and start interacting Note: Click on the Addl. Info tab in qwikLABS and then open the pgWeb link in a new window. • Or, Use any JDBC/ODBC/libpq client • Aginity Workbench for Amazon Redshift • SQL Workbench/J • DBeaver • Datagrip
  • 17. Activity 1B: Create table in Redshift • Create table weblogs to capture the in-coming data from a Firehose delivery stream Note: You can download Redshift SQL code from qwikLabs. Click on the lab instructions tab in qwikLABS and then download the Redshift SQL file.
  • 18. Activity 1C: Deliver Data to Redshift from S3 • Run the Copy Command on Redshift to Load Data into wbelogs Table from S3 1. Remove last query from pgWeb 2. Run the below copy command (get access/secret key from qwiklabs connect tab) in the query window COPY weblogs FROM 's3://bigdataworkshop-sfloft/processed/processed-logs-1.gz' CREDENTIALS 'aws_access_key_id=<account_access_key>;aws_secret_access_key=<account_secret_key' DELIMITER ',' REMOVEQUOTES MAXERROR 0 GZIP;
  • 19. Review: Amazon Redshift Test Queries • Find distribution of response codes over days • Count the number of 404 response codes weblogs weblogs
  • 20. Review: Amazon Redshift Test Queries • Show all requests paths with status “PAGE NOT FOUND • Change ‘request_path’ to ‘request_uri’ in below query weblogs
  • 22. Amazon Athena Interactive Query Service • Query directly from Amazon S3 • Use ANSI SQL • Serverless • Multiple Data Formats • Cost Effective
  • 23. Familiar Technologies Under the Covers • Used for SQL Queries • In-memory distributed query engine • ANSI-SQL compatible with extensions • Used for DDL functionality • Complex data types • Multitude of formats • Supports data partitioning
  • 24. Comparing performance and cost savings for compression and columnar format Dataset Size on Amazon S3 Query Run time Data Scanned Cost Data stored as text files 1 TB 236 seconds 1.15 TB $5.75 Data stored in Apache Parquet format* 130 GB 6.78 seconds 2.51 GB $0.013 Savings / Speedup 87% less with Parquet 34x faster 99% less data scanned 99.7% savings (*compressed using Snappy compression) https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon- athena/
  • 25. Activity 2 Ad-hoc Querying with Amazon Athena
  • 26. Activity 2A: Interactive Querying with Athena • From the AWS Management Console, click on All Services
  • 27. Activity 2A: Interactive Querying with Athena • Select Athena from the Analytics section and click on Get Started on the next page
  • 28. Activity 2A: Interactive Querying with Athena • Dismiss the window for running the Athena tutorial. • Dismiss any other tutorial window
  • 29. Activity 2A: Interactive Querying with Athena • Enter the SQL command to create a table as follows. The SQL ddl for this exercise can be found on the Lab instructions tab in the file Athena.sql. Please make sure to replace the <YOUR-KINESIS-FIREHOSE-DESTINATION- BUCKET> with the bucket name ‘s3://bigdataworkshop-sfloft/raw/’
  • 30. Activity 2A: Interactive Querying with Athena • Notice that the table will be created in the sample database (sampledb). Click on Run Query to create the table
  • 31. Activity 2B: Interactive Querying with Athena • The SQL ddl in the previous step creates a table in Athena based on the data streamed from Kinesis Firehose to S3 • Select sampledb from the database section and click on the eye icon to sample a few rows of the S3 data
  • 32. Activity 2C: Interactive Querying with Athena • Run interactive queries (copy SQL queries from Athena.sql under Lab instructions) and see the results on the console
  • 33. Activity 4D: Interactive Querying with Athena • Optionally, you can save the results of a query to CSV by choosing the file icon on the Results pane. • You can also view the results of previous queries or queries that may take some time to complete. Choose History then either search for your query or choose View or Download to view or download the results of previous completed queries. This also displays the status of queries that are currently running.
  • 34. Activity 2D: Interactive Querying with Athena • Exercise: Query results are also stored in Amazon S3 in a bucket called aws-athena-query-results-ACCOUNTID-REGION. Where can you change the default location in the console?
  • 36. The Amazon EMR application stack (Amazon EMR service and Amazon EMR release) • Storage: Amazon S3 (EMRFS), HDFS • Cluster resource management: YARN • Processing engines: MapReduce (batch), Tez (interactive), Spark (in-memory), Flink (streaming) • Applications: Hive, Pig, Spark SQL/Streaming/ML, Mahout, Sqoop, HBase/Phoenix, Presto • Interfaces: Hue (SQL interface/metastore management), Zeppelin (interactive notebook), Ganglia (monitoring), HiveServer2/Spark Thrift Server (JDBC/ODBC)
  • 37. On-cluster UIs • Notebooks • SQL editor, workflow designer, metastore browser • Manage applications • Design and execute queries and workloads • And more using bootstrap actions!
  • 38. The Hadoop ecosystem can run in Amazon EMR
  • 39. Easy-to-use Spot Instances • On-Demand for core nodes: standard Amazon EC2 pricing for On-Demand capacity; meet the SLA at predictable cost • Spot Instances for task nodes: up to 90% off Amazon EC2 On-Demand pricing; exceed the SLA at lower cost
  • 40. Amazon S3 as your persistent data store • Separate compute and storage • Resize and shut down Amazon EMR clusters with no data loss • Point multiple Amazon EMR clusters at same data in Amazon S3 EMR EMR Amazon S3
  • 41. EMRFS makes it easier to leverage S3 • Better performance and error-handling options • Transparent to applications: just use "s3://" paths • Consistent view: consistent listing and read-after-write consistency for new PUTs • Support for Amazon S3 server-side and client-side encryption • Faster listing using EMRFS metadata
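To illustrate the "transparent to applications" point, the sketch below shows Hive on EMR reading straight from an s3:// location through EMRFS; the table name and single-column schema are assumptions chosen to keep the example minimal:

  -- Hive on EMR: point an external table at S3 and query it; EMRFS handles the reads
  CREATE EXTERNAL TABLE IF NOT EXISTS weblogs_s3 (
    log_line string
  )
  LOCATION 's3://bigdataworkshop-sfloft/raw/';

  SELECT COUNT(*) FROM weblogs_s3;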
  • 42. Apache Spark • Fast, general-purpose engine for large-scale data processing • Write applications quickly in Java, Scala, or Python • Combine SQL, streaming, and complex analytics
  • 43. Apache Zeppelin • Web-based notebook for interactive analytics • Multiple language back end • Apache Spark integration • Data visualization • Collaboration https://zeppelin.incubator.apache.org/
  • 44. Activity 3 Ad-hoc analysis using Amazon EMR
  • 45. Activity 3: Process and Query data using Amazon EMR • Time: 20 minutes • We are going to: A. Use a Zeppelin Notebook to interact with Amazon EMR Cluster B. Process the data delivered to Amazon S3 by Firehose using Apache Spark C. Query the data processed in the earlier stage and create simple charts
  • 46. Activity 3A: Open the Zeppelin interface 1. Click on the Lab Instructions tab in qwikLABS and then download the Zeppelin Notebook 2. Click on the Addl. Info tab in qwikLABS and then open the Zeppelin link in a new window. 3. Import the notebook using the Import Note link on the Zeppelin interface
  • 48. Activity 3B: Run the notebook • Enter the S3 bucket name where the logs are delivered by Kinesis Firehose. The bucket name begins with bigdataworkshop-sfloft • Execute Step 1 • Enter the bucket name (bigdataworkshop-sfloft) • Execute Step 2 • Change the '/*/*/*/*/*.gz' suffix to '/raw/*.gz' • Create a DataFrame from the dataset delivered by Firehose • Execute Step 3 • Sample a few rows
  • 49. Activity 3B: Run the notebook • Execute Step 4 to process the data • Notice how the 'REQUEST' field consists of both the 'REQUEST PROTOCOL' and the 'REQUEST PATH'. Let's fix that. • Create a UDF that will split the column and add the results to the DataFrame (a Spark SQL sketch of the same split follows below) • Print the new DataFrame
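The notebook does this with a UDF. If the field looks like a typical 'GET /some/path HTTP/1.1' string (an assumption; check the sampled rows), the same split can also be expressed directly in Spark SQL against a temporary view, where the view and column names below are likewise assumptions:

  -- Spark SQL equivalent of the notebook's UDF split (view, column, and field layout assumed)
  SELECT
    split(request, ' ')[0] AS request_type,       -- e.g. GET, POST
    split(request, ' ')[1] AS request_path,
    split(request, ' ')[2] AS request_protocol    -- e.g. HTTP/1.1
  FROM raw_weblogs;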
  • 50. Activity 3B: Run the notebook • Execute Step 6 • Register the DataFrame as a temporary table • Now you can run SQL queries on the temporary table (a sample query follows below) • Execute the next 3 steps and observe the charts created • What did you learn about the dataset?
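For example, assuming the temporary table is registered as weblogs (an assumption; use the name shown in the notebook), the chart paragraphs run Spark SQL along these lines, typically via Zeppelin's %sql interpreter:

  -- Requests by request type and response code (temporary table name assumed)
  SELECT request_type, response_code, COUNT(*) AS requests
  FROM weblogs
  GROUP BY request_type, response_code
  ORDER BY requests DESC;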
  • 51. Review: Ad-hoc analysis using Amazon EMR • You just learned how to process and query data using Amazon EMR with Apache Spark • Amazon EMR has many other frameworks available for you to use • Hive, Presto, Flink, Pig, MapReduce • Hue, Oozie, HBase
  • 52. (Optional Exercise): Data Visualization with Amazon QuickSight
  • 53. Fast, Easy Ad-Hoc Analytics for Anyone, Everywhere • Ease of use targeted at business users. • Blazing fast performance powered by SPICE. • Broad connectivity with AWS data services, on-premises data, files and business applications. • Cloud-native solution that scales automatically. • 1/10th the cost of traditional BI solutions. • Create, share and collaborate with anyone in your organization, on the web or on mobile.
  • 54. Connect, SPICE, Analyze QuickSight allows you to connect to data from a wide variety of AWS, third-party, and on-premises sources (e.g., Amazon RDS, Amazon S3, Amazon Redshift) and either import it into SPICE or query it directly. Users can then easily explore, analyze, and share their insights with anyone.
  • 56. Activity 4: Visualization with QuickSight • We are going to: A. Register for a QuickSight account B. Connect to the Redshift cluster C. Create visualizations for analysis to answer questions like: What are the most common HTTP request types, and how successful (response code of 200) are they? Which URIs are the most requested?
  • 57. Activity 4A: QuickSight Registration • Go to the AWS Console and click on QuickSight in the Analytics section. • Click on Sign up in the next window • Make sure the subscription type is Standard and click Continue on the next screen
  • 58. Activity 4A: QuickSight Registration • On the Subscription Type page, enter the account name (see note below) • Enter your email address • Select US West region • Check the S3 (all buckets) box Note: QuickSight Account name is the AWS account number from qwikLABS in the Connect tab
  • 59. Activity 4A: QuickSight Registration • If a pop-up box to choose S3 buckets appears, click Select buckets • Click on Go to Amazon QuickSight • Dismiss the next screen
  • 60. Activity 4B: Connect to data source • Click on Manage Data to create a new data set in QuickSight • Choose Redshift (Auto-discovered) as the data source. QuickSight auto-discovers databases associated with your AWS account (the Redshift database in this case)
  • 61. Activity 4B: Connect to Amazon Redshift Note: You can get the Redshift database password from qwikLABS by navigating to the “Custom Connection Details” section in the Connect tab
  • 62. Activity 4C: Choose your weblogs Redshift table
  • 63. Activity 4D: Ingest data into SPICE • SPICE is Amazon QuickSight's in-memory optimized calculation engine, designed specifically for fast, ad-hoc data visualization • You can improve the performance of database data sets by importing the data into SPICE instead of using a direct query to the database
  • 64. Activity 4E: Creating your first analysis • What are the most common HTTP request types and their corresponding response codes for this site? • Simply select request_type and response_code and let AUTOGRAPH create the optimal visualization
  • 65. Review – Creating your Analysis • Exercise: Add a visual to show which URIs are the most requested.
  • 66. Your Big Data Application Architecture • Amazon S3 bucket: raw web logs from Firehose • Amazon EMR: ad-hoc analysis of web logs • Amazon Redshift: run SQL queries on processed web logs • Amazon Athena: interactive querying of web logs • Amazon QuickSight: visualize web logs to discover insights
  • 67. Congratulations on building your big data application on AWS!!!

Editor's Notes

  1. Talking points: regions, data governance, security
  2. HyperLogLog algorithm: significantly faster performance, with roughly 2% error
  3. https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon-athena/
  4. Amazon EMR is more than just MapReduce. Bootstrap actions available on GitHub
  5. In the next few slides, we'll talk about data persistence models with Amazon EMR. The first pattern is Amazon S3 as HDFS. With this data persistence model, data gets stored on Amazon S3. HDFS does not play any role in storing data; as a matter of fact, HDFS is only there for temporary storage. Another common thing I hear is that storing data on Amazon S3 instead of HDFS slows jobs down a lot because data has to get copied to HDFS/disk first before processing starts. That's incorrect. If you tell Hadoop that your data is on Amazon S3, Hadoop reads directly from Amazon S3 and streams data to mappers without touching the disk. To be completely correct, data does touch HDFS when data has to shuffle from mappers to reducers, but as I mentioned, HDFS acts as the temp space and nothing more. EMRFS is an implementation of HDFS used for reading and writing regular files from Amazon EMR directly to Amazon S3. EMRFS provides the convenience of storing persistent data in Amazon S3 for use with Hadoop while also providing features like Amazon S3 server-side encryption, read-after-write consistency, and list consistency.
  6. And every other feature that comes with Amazon S3, such as SSE, lifecycle policies, etc. And again, keep in mind that Amazon S3 as the storage layer is the main reason why we can build elastic clusters where nodes get added and removed dynamically without any data loss.
  7. Write programs in terms of transformations on distributed data sets.
  8. The SSH command below enables “port forwarding” on TCP 9026 so you can use http://localhost:9026 from a web browser on your local machine to view cluster details and job progress
  9. QuickSight is a fast, easy-to-use, cloud-powered business analytics service that lets business users quickly and easily visualize, explore, and share insights from their data with anyone in their organization, on the web or on mobile. QuickSight combines an elegant, easy-to-use interface with blazing-fast performance powered by SPICE to provide a fast, easy-to-use business analytics service at 1/10 the cost of traditional BI solutions.