SlideShare a Scribd company logo
1 of 52
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Rahul Bhartia, Ecosystem Solution Architect
15th
September 2015
Building your first Big Data
application on AWS
Amazon S3
Amazon Kinesis
Amazon DynamoDB
Amazon RDS (Aurora)
AWS Lambda
KCL Apps
Amazon
EMR
Amazon
Redshift
Amazon Machine
Learning
CollectCollect ProcessProcess AnalyzeAnalyze
StoreStore
Data Collection
and Storage
Data
Processing
Event
Processing
Data
Analysis
Data Answers
Big Data ecosystem on AWS
Your first Big Data application on AWS
?
Big Data ecosystem on AWS - Collect
CollectCollect ProcessProcess AnalyzeAnalyze
StoreStore
Data Answers
Big Data ecosystem on AWS - Process
CollectCollect ProcessProcess AnalyzeAnalyze
StoreStore
Data Answers
Big Data ecosystem on AWS - Analyze
CollectCollect ProcessProcess AnalyzeAnalyze
StoreStore
Data Answers
SQL
Setup
Resources
1. AWS Command Line Interface (aws-cli) configured
2. Amazon Kinesis stream with a single shard
3. Amazon S3 bucket to hold the files
4. Amazon EMR cluster (two nodes) with Spark and Hive
5. Amazon Redshift data warehouse cluster (single node)
Amazon Kinesis
Create an Amazon Kinesis stream to hold incoming data:
aws kinesis create-stream 
--stream-name AccessLogStream 
--shard-count 1
Amazon S3
Amazon EMR
Amazon Redshift
Your first Big Data application on AWS
1. COLLECT: Stream data into
Kinesis with Log4J
2. PROCESS: Process data
with EMR using Spark & Hive
3. ANALYZE: Analyze data in
Redshift using SQL
STORE
SQL
1. Collect
Amazon Kinesis Log4J Appender
Log file format
Spark
•Fast and general engine for large-
scale data processing
•Write applications quickly in Java,
Scala or Python
•Combine SQL, streaming, and
complex analytics.
Using Spark on EMR
Amazon Kinesis and Spark Streaming
Producer Amazon
Kinesis
Amazon
S3
DynamoD
B
KCL
Spark-Streaming uses
KCL for Kinesis
Amazon
EMR
Spark-Streaming application to read from Kinesis and write to S3
Spark-streaming - Reading from Kinesis
Spark-streaming – Writing to S3
View the output files in Amazon S3
2. Process
Amazon EMR’s Hive
Adapts a SQL-like (HiveQL) query to run on Hadoop
Schema on read: map table to the input data
Access data in Amazon S3, Amazon DymamoDB, and Amazon Kinesis
Query complex input formats using SerDe
Transform data with User Defined Functions (UDF)
Using Hive on Amazon EMR
Create a table that points to your Amazon S3 bucket
CREATE EXTERNAL TABLE access_log_raw(
host STRING, identity STRING,
user STRING, request_time STRING,
request STRING, status STRING,
size STRING, referrer STRING,
agent STRING
)
PARTITIONED BY (year INT, month INT, day INT, hour INT, min INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|[[^]]*])
([^ "]*|"[^"]*") (-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|"[^"]*")
([^ "]*|"[^"]*"))?"
)
LOCATION 's3://YOUR-S3-BUCKET/access-log-raw';
msck repair table access_log_raw;
Process data using Hive
We will transform the data that is returned by the query before writing it
to our Amazon S3-stored external Hive table
Hive User Defined Functions (UDF) in use for the text transformations:
from_unixtime, unix_timestamp and hour
The “hour” value is important: this is what’s used to split and organize
the output files before writing to Amazon S3. These splits will allow us
to more efficiently load the data into Amazon Redshift later in the lab
using the parallel “COPY” command
Create an external Hive table in Amazon S3
Configure partition and compression
Query Hive and write output to Amazon S3
-- convert the Apache log timestamp to a UNIX timestamp
-- split files in Amazon S3 by the hour in the log lines
INSERT OVERWRITE TABLE access_log_processed PARTITION (hour)
SELECT
from_unixtime(unix_timestamp(request_time,
'[dd/MMM/yyyy:HH:mm:ss Z]')),
host,
request,
status,
referrer,
agent,
hour(from_unixtime(unix_timestamp(request_time,
'[dd/MMM/yyyy:HH:mm:ss Z]'))) as hour
FROM access_log_raw;
Viewing Job status
http://127.0.0.1/9026
View the output files in Amazon S3
Spark SQL
Spark's module for working with structured data using SQL
Run unmodified Hive queries on existing data.
Using Spark-SQL on Amazon EMR
Query the data with Spark
3. Analyze
Connect to Amazon Redshift
Create an Amazon Redshift table to hold your data
Loading data into Amazon Redshift
“COPY” command loads files in parallel
COPY accesslogs
FROM 's3://YOUR-S3-BUCKET/access-log-processed'
CREDENTIALS
'aws_access_key_id=YOUR-IAM-ACCESS_KEY;
aws_secret_access_key=YOUR-IAM-SECRET-KEY'
DELIMITER 't' IGNOREHEADER 0
MAXERROR 0
GZIP;
Amazon Redshift test queries
Your first Big Data application on AWS
A favicon would fix 398 of the total 977 PAGE NOT FOUND (404) errors
…around the same cost as a cup of coffee
Try it yourself on the AWS Cloud…
Service Est. Cost*
Amazon Kinesis $1.00
Amazon S3 (free tier) $0
Amazon EMR $0.44
Amazon Redshift $1.00
Est. Total $2.44
*Estimated costs assumes: use of free tier where available, lower cost instances, dataset no bigger than 10MB and instances running
for less than 4 hours. Costs may vary depending on options selected, size of dataset, and usage.
$3.50
Thank you
AWS Big Data blog
blogs.aws.amazon.com/bigdata

More Related Content

What's hot

Introduction to Amazon Lightsail
Introduction to Amazon LightsailIntroduction to Amazon Lightsail
Introduction to Amazon LightsailAmazon Web Services
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Amazon Web Services
 
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料Amazon Web Services
 
SQL Strikes Back! Options for Large Scale SQL Analytics
SQL Strikes Back! Options for Large Scale SQL AnalyticsSQL Strikes Back! Options for Large Scale SQL Analytics
SQL Strikes Back! Options for Large Scale SQL AnalyticsAmazon Web Services
 
Serverlesss Big Data Analytics with Amazon Athena and Quicksight
Serverlesss Big Data Analytics with Amazon Athena and QuicksightServerlesss Big Data Analytics with Amazon Athena and Quicksight
Serverlesss Big Data Analytics with Amazon Athena and QuicksightAmazon Web Services
 
AWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapAWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapIan Massingham
 
利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料Amazon Web Services
 
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...
Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...Julien SIMON
 
Deep Dive on Amazon RDS (May 2016)
Deep Dive on Amazon RDS (May 2016)Deep Dive on Amazon RDS (May 2016)
Deep Dive on Amazon RDS (May 2016)Julien SIMON
 
AWS re:Invent 2016: Workshop: Building Your First Big Data Application with A...
AWS re:Invent 2016: Workshop: Building Your First Big Data Application with A...AWS re:Invent 2016: Workshop: Building Your First Big Data Application with A...
AWS re:Invent 2016: Workshop: Building Your First Big Data Application with A...Amazon Web Services
 
BDT201 AWS Data Pipeline - AWS re: Invent 2012
BDT201 AWS Data Pipeline - AWS re: Invent 2012BDT201 AWS Data Pipeline - AWS re: Invent 2012
BDT201 AWS Data Pipeline - AWS re: Invent 2012Amazon Web Services
 
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch ServiceAmazon Web Services
 
Deep Dive on Amazon S3 (May 2016)
Deep Dive on Amazon S3 (May 2016)Deep Dive on Amazon S3 (May 2016)
Deep Dive on Amazon S3 (May 2016)Julien SIMON
 
SRV407 Deep Dive on Amazon Aurora
SRV407 Deep Dive on Amazon AuroraSRV407 Deep Dive on Amazon Aurora
SRV407 Deep Dive on Amazon AuroraAmazon Web Services
 
使用 AWS Step Functions 開發 Serverless 服務
使用 AWS Step Functions 開發 Serverless 服務使用 AWS Step Functions 開發 Serverless 服務
使用 AWS Step Functions 開發 Serverless 服務Amazon Web Services
 
What's New & What's Next from AWS?
What's New & What's Next from AWS?What's New & What's Next from AWS?
What's New & What's Next from AWS?Ian Massingham
 
ENT314 Automate Best Practices and Operational Health for Your AWS Resources
ENT314 Automate Best Practices and Operational Health for Your AWS ResourcesENT314 Automate Best Practices and Operational Health for Your AWS Resources
ENT314 Automate Best Practices and Operational Health for Your AWS ResourcesAmazon Web Services
 

What's hot (20)

Introduction to Amazon Lightsail
Introduction to Amazon LightsailIntroduction to Amazon Lightsail
Introduction to Amazon Lightsail
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...
 
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
 
SQL Strikes Back! Options for Large Scale SQL Analytics
SQL Strikes Back! Options for Large Scale SQL AnalyticsSQL Strikes Back! Options for Large Scale SQL Analytics
SQL Strikes Back! Options for Large Scale SQL Analytics
 
Serverlesss Big Data Analytics with Amazon Athena and Quicksight
Serverlesss Big Data Analytics with Amazon Athena and QuicksightServerlesss Big Data Analytics with Amazon Athena and Quicksight
Serverlesss Big Data Analytics with Amazon Athena and Quicksight
 
AWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:CapAWS re:Invent 2016 Day 1 Keynote re:Cap
AWS re:Invent 2016 Day 1 Keynote re:Cap
 
利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料利用 Amazon QuickSight 視覺化分析服務剖析資料
利用 Amazon QuickSight 視覺化分析服務剖析資料
 
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...
Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...
 
SEC301 Security @ (Cloud) Scale
SEC301 Security @ (Cloud) ScaleSEC301 Security @ (Cloud) Scale
SEC301 Security @ (Cloud) Scale
 
Deep Dive on Amazon RDS (May 2016)
Deep Dive on Amazon RDS (May 2016)Deep Dive on Amazon RDS (May 2016)
Deep Dive on Amazon RDS (May 2016)
 
AWS re:Invent 2016: Workshop: Building Your First Big Data Application with A...
AWS re:Invent 2016: Workshop: Building Your First Big Data Application with A...AWS re:Invent 2016: Workshop: Building Your First Big Data Application with A...
AWS re:Invent 2016: Workshop: Building Your First Big Data Application with A...
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
BDT201 AWS Data Pipeline - AWS re: Invent 2012
BDT201 AWS Data Pipeline - AWS re: Invent 2012BDT201 AWS Data Pipeline - AWS re: Invent 2012
BDT201 AWS Data Pipeline - AWS re: Invent 2012
 
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log analytics with Amazon Elasticsearch Service
 
Deep Dive on Amazon S3 (May 2016)
Deep Dive on Amazon S3 (May 2016)Deep Dive on Amazon S3 (May 2016)
Deep Dive on Amazon S3 (May 2016)
 
Almacenamiento en la nube con AWS
Almacenamiento en la nube con AWSAlmacenamiento en la nube con AWS
Almacenamiento en la nube con AWS
 
SRV407 Deep Dive on Amazon Aurora
SRV407 Deep Dive on Amazon AuroraSRV407 Deep Dive on Amazon Aurora
SRV407 Deep Dive on Amazon Aurora
 
使用 AWS Step Functions 開發 Serverless 服務
使用 AWS Step Functions 開發 Serverless 服務使用 AWS Step Functions 開發 Serverless 服務
使用 AWS Step Functions 開發 Serverless 服務
 
What's New & What's Next from AWS?
What's New & What's Next from AWS?What's New & What's Next from AWS?
What's New & What's Next from AWS?
 
ENT314 Automate Best Practices and Operational Health for Your AWS Resources
ENT314 Automate Best Practices and Operational Health for Your AWS ResourcesENT314 Automate Best Practices and Operational Health for Your AWS Resources
ENT314 Automate Best Practices and Operational Health for Your AWS Resources
 

Viewers also liked

AWS Finland March meetup 2017 - selecting enterprise IoT platform
AWS Finland March meetup 2017 - selecting enterprise IoT platformAWS Finland March meetup 2017 - selecting enterprise IoT platform
AWS Finland March meetup 2017 - selecting enterprise IoT platformRolf Koski
 
Deep Learning with AWS (November 2016)
Deep Learning with AWS (November 2016)Deep Learning with AWS (November 2016)
Deep Learning with AWS (November 2016)Julien SIMON
 
The Cascading (big) data application framework - André Keple, Sr. Engineer, C...
The Cascading (big) data application framework - André Keple, Sr. Engineer, C...The Cascading (big) data application framework - André Keple, Sr. Engineer, C...
The Cascading (big) data application framework - André Keple, Sr. Engineer, C...Cascading
 
Extending Application Data In The Cloud
Extending Application Data In The CloudExtending Application Data In The Cloud
Extending Application Data In The CloudRonald Bradford
 
AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)
AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)
AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)Amazon Web Services
 
Big Data application - OSS / BSS
Big Data application - OSS / BSSBig Data application - OSS / BSS
Big Data application - OSS / BSSKeyur Thakore
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
 
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...Hortonworks
 
Application of data mining
Application of data miningApplication of data mining
Application of data miningSHIVANI SONI
 
AWS re:Invent Recap 2016 Taiwan part 1
AWS re:Invent Recap 2016 Taiwan part 1AWS re:Invent Recap 2016 Taiwan part 1
AWS re:Invent Recap 2016 Taiwan part 1Amazon Web Services
 
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...Amazon Web Services
 
Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...
Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...
Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...Big Data Spain
 
Converging Big Data and Application Infrastructure by Steven Poutsy
Converging Big Data and Application Infrastructure by Steven PoutsyConverging Big Data and Application Infrastructure by Steven Poutsy
Converging Big Data and Application Infrastructure by Steven PoutsyBig Data Spain
 
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)Amazon Web Services
 
A Brief History of Big Data
A Brief History of Big DataA Brief History of Big Data
A Brief History of Big DataBernard Marr
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

Viewers also liked (20)

AWS Finland March meetup 2017 - selecting enterprise IoT platform
AWS Finland March meetup 2017 - selecting enterprise IoT platformAWS Finland March meetup 2017 - selecting enterprise IoT platform
AWS Finland March meetup 2017 - selecting enterprise IoT platform
 
Deep Learning with AWS (November 2016)
Deep Learning with AWS (November 2016)Deep Learning with AWS (November 2016)
Deep Learning with AWS (November 2016)
 
The Cascading (big) data application framework - André Keple, Sr. Engineer, C...
The Cascading (big) data application framework - André Keple, Sr. Engineer, C...The Cascading (big) data application framework - André Keple, Sr. Engineer, C...
The Cascading (big) data application framework - André Keple, Sr. Engineer, C...
 
Extending Application Data In The Cloud
Extending Application Data In The CloudExtending Application Data In The Cloud
Extending Application Data In The Cloud
 
AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)
AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)
AWS re:Invent 2016: Driving Innovation with Big Data and IoT (GPSST304)
 
Big Data application - OSS / BSS
Big Data application - OSS / BSSBig Data application - OSS / BSS
Big Data application - OSS / BSS
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
 
Application of data mining
Application of data miningApplication of data mining
Application of data mining
 
AWS re:Invent Recap 2016 Taiwan part 1
AWS re:Invent Recap 2016 Taiwan part 1AWS re:Invent Recap 2016 Taiwan part 1
AWS re:Invent Recap 2016 Taiwan part 1
 
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
 
Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...
Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...
Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...
 
Converging Big Data and Application Infrastructure by Steven Poutsy
Converging Big Data and Application Infrastructure by Steven PoutsyConverging Big Data and Application Infrastructure by Steven Poutsy
Converging Big Data and Application Infrastructure by Steven Poutsy
 
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
 
A Brief History of Big Data
A Brief History of Big DataA Brief History of Big Data
A Brief History of Big Data
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 

Similar to AWS September Webinar Series - Building Your First Big Data Application on AWS

AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWSAWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWSAmazon Web Services
 
Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Web Services
 
Workshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSWorkshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSAmazon Web Services
 
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Amazon Web Services
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Amazon Web Services
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWSAmazon Web Services
 
Building your first Data lake platform
Building your first Data lake platform Building your first Data lake platform
Building your first Data lake platform Amazon Web Services
 
Your First Data Lake on AWS_Simon Elisha
Your First Data Lake on AWS_Simon ElishaYour First Data Lake on AWS_Simon Elisha
Your First Data Lake on AWS_Simon ElishaHelen Rogers
 
ABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
ABD330_Combining Batch and Stream Processing to Get the Best of Both WorldsABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
ABD330_Combining Batch and Stream Processing to Get the Best of Both WorldsAmazon Web Services
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduceAmazon Web Services
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWSAmazon Web Services
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWSAmazon Web Services
 
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...Amazon Web Services
 
(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014
(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014
(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014Amazon Web Services
 
Adding Search to Relational Databases
Adding Search to Relational DatabasesAdding Search to Relational Databases
Adding Search to Relational DatabasesAmazon Web Services
 
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAmazon Web Services Korea
 
Adding Search to Relational Databases
Adding Search to Relational DatabasesAdding Search to Relational Databases
Adding Search to Relational DatabasesAmazon Web Services
 

Similar to AWS September Webinar Series - Building Your First Big Data Application on AWS (20)

AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWSAWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
 
Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview
 
Workshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSWorkshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWS
 
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
 
Amazed by AWS Series #4
Amazed by AWS Series #4Amazed by AWS Series #4
Amazed by AWS Series #4
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
 
Building your first Data lake platform
Building your first Data lake platform Building your first Data lake platform
Building your first Data lake platform
 
Your First Data Lake on AWS_Simon Elisha
Your First Data Lake on AWS_Simon ElishaYour First Data Lake on AWS_Simon Elisha
Your First Data Lake on AWS_Simon Elisha
 
ABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
ABD330_Combining Batch and Stream Processing to Get the Best of Both WorldsABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
ABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
 
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
 
(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014
(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014
(BDT205) Your First Big Data Application on AWS | AWS re:Invent 2014
 
Log Analysis At Scale
Log Analysis At ScaleLog Analysis At Scale
Log Analysis At Scale
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Adding Search to Relational Databases
Adding Search to Relational DatabasesAdding Search to Relational Databases
Adding Search to Relational Databases
 
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon MeichtryAWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
AWS Innovate: Build a Data Lake on AWS- Johnathon Meichtry
 
Adding Search to Relational Databases
Adding Search to Relational DatabasesAdding Search to Relational Databases
Adding Search to Relational Databases
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

AWS September Webinar Series - Building Your First Big Data Application on AWS

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Rahul Bhartia, Ecosystem Solution Architect 15th September 2015 Building your first Big Data application on AWS
  • 2. Amazon S3 Amazon Kinesis Amazon DynamoDB Amazon RDS (Aurora) AWS Lambda KCL Apps Amazon EMR Amazon Redshift Amazon Machine Learning CollectCollect ProcessProcess AnalyzeAnalyze StoreStore Data Collection and Storage Data Processing Event Processing Data Analysis Data Answers Big Data ecosystem on AWS
  • 3. Your first Big Data application on AWS ?
  • 4. Big Data ecosystem on AWS - Collect CollectCollect ProcessProcess AnalyzeAnalyze StoreStore Data Answers
  • 5. Big Data ecosystem on AWS - Process CollectCollect ProcessProcess AnalyzeAnalyze StoreStore Data Answers
  • 6. Big Data ecosystem on AWS - Analyze CollectCollect ProcessProcess AnalyzeAnalyze StoreStore Data Answers SQL
  • 8. Resources 1. AWS Command Line Interface (aws-cli) configured 2. Amazon Kinesis stream with a single shard 3. Amazon S3 bucket to hold the files 4. Amazon EMR cluster (two nodes) with Spark and Hive 5. Amazon Redshift data warehouse cluster (single node)
  • 9. Amazon Kinesis Create an Amazon Kinesis stream to hold incoming data: aws kinesis create-stream --stream-name AccessLogStream --shard-count 1
  • 13. Your first Big Data application on AWS 1. COLLECT: Stream data into Kinesis with Log4J 2. PROCESS: Process data with EMR using Spark & Hive 3. ANALYZE: Analyze data in Redshift using SQL STORE SQL
  • 16.
  • 18. Spark •Fast and general engine for large- scale data processing •Write applications quickly in Java, Scala or Python •Combine SQL, streaming, and complex analytics.
  • 20. Amazon Kinesis and Spark Streaming Producer Amazon Kinesis Amazon S3 DynamoD B KCL Spark-Streaming uses KCL for Kinesis Amazon EMR Spark-Streaming application to read from Kinesis and write to S3
  • 23.
  • 24. View the output files in Amazon S3
  • 25.
  • 27. Amazon EMR’s Hive Adapts a SQL-like (HiveQL) query to run on Hadoop Schema on read: map table to the input data Access data in Amazon S3, Amazon DymamoDB, and Amazon Kinesis Query complex input formats using SerDe Transform data with User Defined Functions (UDF)
  • 28. Using Hive on Amazon EMR
  • 29. Create a table that points to your Amazon S3 bucket CREATE EXTERNAL TABLE access_log_raw( host STRING, identity STRING, user STRING, request_time STRING, request STRING, status STRING, size STRING, referrer STRING, agent STRING ) PARTITIONED BY (year INT, month INT, day INT, hour INT, min INT) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|[[^]]*]) ([^ "]*|"[^"]*") (-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|"[^"]*") ([^ "]*|"[^"]*"))?" ) LOCATION 's3://YOUR-S3-BUCKET/access-log-raw'; msck repair table access_log_raw;
  • 30.
  • 31. Process data using Hive We will transform the data that is returned by the query before writing it to our Amazon S3-stored external Hive table Hive User Defined Functions (UDF) in use for the text transformations: from_unixtime, unix_timestamp and hour The “hour” value is important: this is what’s used to split and organize the output files before writing to Amazon S3. These splits will allow us to more efficiently load the data into Amazon Redshift later in the lab using the parallel “COPY” command
  • 32. Create an external Hive table in Amazon S3
  • 33. Configure partition and compression
  • 34. Query Hive and write output to Amazon S3 -- convert the Apache log timestamp to a UNIX timestamp -- split files in Amazon S3 by the hour in the log lines INSERT OVERWRITE TABLE access_log_processed PARTITION (hour) SELECT from_unixtime(unix_timestamp(request_time, '[dd/MMM/yyyy:HH:mm:ss Z]')), host, request, status, referrer, agent, hour(from_unixtime(unix_timestamp(request_time, '[dd/MMM/yyyy:HH:mm:ss Z]'))) as hour FROM access_log_raw;
  • 35.
  • 37. View the output files in Amazon S3
  • 38.
  • 39. Spark SQL Spark's module for working with structured data using SQL Run unmodified Hive queries on existing data.
  • 40. Using Spark-SQL on Amazon EMR
  • 41. Query the data with Spark
  • 42.
  • 44. Connect to Amazon Redshift
  • 45. Create an Amazon Redshift table to hold your data
  • 46. Loading data into Amazon Redshift “COPY” command loads files in parallel COPY accesslogs FROM 's3://YOUR-S3-BUCKET/access-log-processed' CREDENTIALS 'aws_access_key_id=YOUR-IAM-ACCESS_KEY; aws_secret_access_key=YOUR-IAM-SECRET-KEY' DELIMITER 't' IGNOREHEADER 0 MAXERROR 0 GZIP;
  • 47.
  • 48.
  • 50. Your first Big Data application on AWS A favicon would fix 398 of the total 977 PAGE NOT FOUND (404) errors
  • 51. …around the same cost as a cup of coffee Try it yourself on the AWS Cloud… Service Est. Cost* Amazon Kinesis $1.00 Amazon S3 (free tier) $0 Amazon EMR $0.44 Amazon Redshift $1.00 Est. Total $2.44 *Estimated costs assumes: use of free tier where available, lower cost instances, dataset no bigger than 10MB and instances running for less than 4 hours. Costs may vary depending on options selected, size of dataset, and usage. $3.50
  • 52. Thank you AWS Big Data blog blogs.aws.amazon.com/bigdata