SlideShare una empresa de Scribd logo
1 de 39
Descargar para leer sin conexión
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Matt Yanchyshyn, Sr. Manager, Solutions Architecture
April 2016
Building Your First Big Data Application on AWS
Your First Big Data Application on AWS
PROCESS
STORE
ANALYZE & VISUALIZE
COLLECT
Your First Big Data Application on AWS
PROCESS:
Amazon EMR with Spark & Hive
STORE
ANALYZE & VISUALIZE:
Amazon Redshift and Amazon QuickSight
COLLECT:
Amazon Kinesis Firehose
http://aws.amazon.com/big-data/use-cases/
A Modern Take on the Classic Data Warehouse
Setting up the environment
Data Storage with Amazon S3
Download all the CLI steps: http://bit.ly/aws-big-data-steps
Create an Amazon S3 bucket to store the data collected
with Amazon Kinesis Firehose
aws s3 mb s3://YOUR-S3-BUCKET-NAME
Access Control with IAM
Create an IAM role to allow Firehose to write to the S3 bucket
firehose-policy.json:
{
"Version": "2012-10-17",
"Statement": {
"Effect": "Allow",
"Principal": {"Service": "firehose.amazonaws.com"},
"Action": "sts:AssumeRole"
}
}
Access Control with IAM
Create an IAM role to allow Firehose to write to the S3 bucket
s3-rw-policy.json:
{ "Version": "2012-10-17",
"Statement": {
"Effect": "Allow",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::YOUR-S3-BUCKET-NAME",
"arn:aws:s3:::YOUR-S3-BUCKET-NAME/*"
]
} }
Access Control with IAM
Create an IAM role to allow Firehose to write to the S3 bucket
aws iam create-role --role-name firehose-demo 
--assume-role-policy-document file://firehose-policy.json
Copy the value in “Arn” in the output,
e.g., arn:aws:iam::123456789:role/firehose-demo
aws iam put-role-policy --role-name firehose-demo 
--policy-name firehose-s3-rw 
--policy-document file://s3-rw-policy.json
Data Collection with Amazon Kinesis Firehose
Create a Firehose stream for incoming log data
aws firehose create-delivery-stream 
--delivery-stream-name demo-firehose-stream 
--s3-destination-configuration 
RoleARN=YOUR-FIREHOSE-ARN,
BucketARN="arn:aws:s3:::YOUR-S3-BUCKET-NAME",
Prefix=firehose/,
BufferingHints={IntervalInSeconds=60},
CompressionFormat=GZIP
Data Processing with Amazon EMR
Launch an Amazon EMR cluster with Spark and Hive
aws emr create-cluster 
--name "demo" 
--release-label emr-4.5.0 
--instance-type m3.xlarge 
--instance-count 2 
--ec2-attributes KeyName=YOUR-AWS-SSH-KEY 
--use-default-roles 
--applications Name=Hive Name=Spark Name=Zeppelin-Sandbox
Record your ClusterId from the output.
Data Analysis with Amazon Redshift
Create a single-node Amazon Redshift data warehouse:
aws redshift create-cluster 
--cluster-identifier demo 
--db-name demo 
--node-type dc1.large 
--cluster-type single-node 
--master-username master 
--master-user-password YOUR-REDSHIFT-PASSWORD 
--publicly-accessible 
--port 8192
Collect
Weblogs – Common Log Format (CLF)
75.35.230.210 - - [20/Jul/2009:22:22:42 -0700]
"GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236
"http://www.swivel.com/graphs/show/1163466"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11)
Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"
Writing into Amazon Kinesis Firehose
Download the demo weblog: http://bit.ly/aws-big-data
Open Python and run the following code to import the log into the stream:
import boto3
iam = boto3.client('iam')
firehose = boto3.client('firehose')
with open('weblog', 'r') as f:
for line in f:
firehose.put_record(
DeliveryStreamName='demo-firehose-stream',
Record={'Data': line}
)
print 'Record added'
Process
Apache Spark
• Fast, general purpose engine
for large-scale data processing
• Write applications quickly in
Java, Scala, or Python
• Combine SQL, streaming, and
complex analytics
Spark SQL
Spark's module for working with structured data using SQL
Run unmodified Hive queries on existing data
Apache Zeppelin
• Web-based notebook for interactive analytics
• Multiple language backend
• Apache Spark integration
• Data visualization
• Collaboration
https://zeppelin.incubator.apache.org/
View the Output Files in Amazon S3
After about 1 minute, you should see files in your S3
bucket:
aws s3 ls s3://YOUR-S3-BUCKET-NAME/firehose/ --recursive
Connect to Your EMR Cluster and Zeppelin
aws emr describe-cluster --cluster-id YOUR-EMR-CLUSTER-ID
Copy the MasterPublicDnsName. Use port forwarding so you can
access Zeppelin at http://localhost:18890 on your local machine.
ssh -i PATH-TO-YOUR-SSH-KEY -L 18890:localhost:8890 
hadoop@YOUR-EMR-DNS-NAME
Open Zeppelin with your local web browser and create a new “Note”:
http://localhost:18890
Exploring the Data in Amazon S3 using Spark
Download the Zeppelin notebook: http://bit.ly/aws-big-data-zeppelin
// Load all the files from S3 into a RDD
val accessLogLines = sc.textFile("s3://YOUR-S3-BUCKET-NAME/firehose/*/*/*/*")
// Count the lines
accessLogLines.count
// Print one line as a string
accessLogLines.first
// delimited by space so split them into fields
var accessLogFields = accessLogLines.map(_.split(" ").map(_.trim))
// Print the fields of a line
accessLogFields.first
Combine Fields: “A, B, C”  “A B C”
var accessLogColumns = accessLogFields
.map( arrayOfFields => { var temp1 =""; for (field <- arrayOfFields) yield {
var temp2 = ""
if (temp1.replaceAll("[",""").startsWith(""") && !temp1.endsWith("""))
temp1 = temp1 + " " + field.replaceAll("[|]",""")
else temp1 = field.replaceAll("[|]",""")
temp2 = temp1
if (temp1.endsWith(""")) temp1 = ""
temp2
}})
.map( fields => fields.filter(field => (field.startsWith(""") &&
field.endsWith(""")) || !field.startsWith(""") ))
.map(fields => fields.map(_.replaceAll(""","")))
Create a Data Frame and Transform the Data
import java.sql.Timestamp
import java.net.URL
case class accessLogs(
ipAddress: String,
requestTime: Timestamp,
requestMethod: String,
requestPath: String,
requestProtocol: String,
responseCode: String,
responseSize: String,
referrerHost: String,
userAgent: String
)
Create a Data Frame and Transform the Data
val accessLogsDF = accessLogColumns.map(line => {
var ipAddress = line(0)
var requestTime = new Timestamp(new
java.text.SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z").parse(line(3)).getTime())
var requestString = line(4).split(" ").map(_.trim())
var requestMethod = if (line(4).toString() != "-") requestString(0) else ""
var requestPath = if (line(4).toString() != "-") requestString(1) else ""
var requestProtocol = if (line(4).toString() != "-") requestString(2) else ""
var responseCode = line(5).replaceAll("-","")
var responseSize = line(6).replaceAll("-","")
var referrerHost = line(7)
var userAgent = line(8)
accessLogs(ipAddress, requestTime, requestMethod, requestPath,
requestProtocol,responseCode, responseSize, referrerHost, userAgent)
}).toDF()
Create an External Table Backed by Amazon S3
%sql
CREATE EXTERNAL TABLE access_logs
(
ip_address String,
request_time Timestamp,
request_method String,
request_path String,
request_protocol String,
response_code String,
response_size String,
referrer_host String,
user_agent String
)
PARTITIONED BY (year STRING,month STRING, day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY 't'
STORED AS TEXTFILE
LOCATION 's3://YOUR-S3-BUCKET-NAME/access-log-processed'
Configure Hive Partitioning and Compression
// set up Hive's "dynamic partitioning”
%sql
SET hive.exec.dynamic.partition=true
// compress output files on Amazon S3 using Gzip
%sql
SET hive.exec.compress.output=true
%sql
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
%sql
SET io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec
%sql
SET hive.exec.dynamic.partition.mode=nonstrict;
Write Output to Amazon S3
import org.apache.spark.sql.SaveMode
accessLogsDF
.withColumn("year", year(accessLogsDF("requestTime")))
.withColumn("month", month(accessLogsDF("requestTime")))
.withColumn("day", dayofmonth(accessLogsDF("requestTime")))
.write
.partitionBy("year","month","day")
.mode(SaveMode.Overwrite)
.insertInto("access_logs")
Query the Data Using Spark SQL
// Check the count of records
%sql
select count(*) from access_log_processed
// Fetch the first 10 records
%sql
select * from access_log_processed limit 10
View the Output Files in Amazon S3
Leave Zeppelin and go back to the console…
List the partition prefixes and output files:
aws s3 ls s3://YOUR-S3-BUCKET-NAME/access-log-processed/ 
--recursive
Analyze
Connect to Amazon Redshift
Using the PostgreSQL CLI
psql -h YOUR-REDSHIFT-ENDPOINT 
-p 8192 -U master demo
Or use any JDBC or ODBC SQL client with the PostgreSQL 8.x drivers
or native Amazon Redshift support
• Aginity Workbench for Amazon Redshift
• SQL Workbench/J
Create an Amazon Redshift Table to Hold Your Data
CREATE TABLE accesslogs
(
host_address varchar(512),
request_time timestamp,
request_method varchar(5),
request_path varchar(1024),
request_protocol varchar(10),
response_code Int,
response_size Int,
referrer_host varchar(1024),
user_agent varchar(512)
)
DISTKEY(host_address)
SORTKEY(request_time);
Loading Data into Amazon Redshift
“COPY” command loads files in parallel from
Amazon S3:
COPY accesslogs
FROM 's3://YOUR-S3-BUCKET-NAME/access-log-processed'
CREDENTIALS
'aws_access_key_id=YOUR-IAM-
ACCESS_KEY;aws_secret_access_key=YOUR-IAM-SECRET-KEY'
DELIMITER 't'
MAXERROR 0
GZIP;
Amazon Redshift Test Queries
-- find distribution of response codes over days
SELECT TRUNC(request_time), response_code, COUNT(1) FROM
accesslogs GROUP BY 1,2 ORDER BY 1,3 DESC;
-- find the 404 status codes
SELECT COUNT(1) FROM accessLogs WHERE response_code = 404;
-- show all requests for status as PAGE NOT FOUND
SELECT TOP 1 request_path,COUNT(1) FROM accesslogs WHERE
response_code = 404 GROUP BY 1 ORDER BY 2 DESC;
Visualize the Results
DEMO
Amazon QuickSight
Automating the Big Data Application
Amazon
Kinesis
Firehose
Amazon
EMR
Amazon
S3
Amazon
Redshift
Amazon
QuickSight
Amazon
S3
Event
Notification
Spark job
List of objects from Lambda
Write to Amazon Redshift using spark-redshift
Learn from big data experts
blogs.aws.amazon.com/bigdata
Building Your First Big Data Application on AWS

Más contenido relacionado

La actualidad más candente

AWS CloudFormation Best Practices
AWS CloudFormation Best PracticesAWS CloudFormation Best Practices
AWS CloudFormation Best PracticesAmazon Web Services
 
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹Amazon Web Services
 
Getting Started with ElastiCache for Redis
Getting Started with ElastiCache for RedisGetting Started with ElastiCache for Redis
Getting Started with ElastiCache for RedisAmazon Web Services
 
AWS CloudFormation Intrinsic Functions and Mappings
AWS CloudFormation Intrinsic Functions and Mappings AWS CloudFormation Intrinsic Functions and Mappings
AWS CloudFormation Intrinsic Functions and Mappings Adam Book
 
Aws meetup managed_nat
Aws meetup managed_natAws meetup managed_nat
Aws meetup managed_natAdam Book
 
Build A Website on AWS for Your First 10 Million Users
Build A Website on AWS for Your First 10 Million UsersBuild A Website on AWS for Your First 10 Million Users
Build A Website on AWS for Your First 10 Million UsersAmazon Web Services
 
Scale, baby, scale!
Scale, baby, scale!Scale, baby, scale!
Scale, baby, scale!Julien SIMON
 
How to create aws s3 bucket using terraform
How to create aws s3 bucket using terraformHow to create aws s3 bucket using terraform
How to create aws s3 bucket using terraformKaty Slemon
 
Deep Dive on Amazon S3 (May 2016)
Deep Dive on Amazon S3 (May 2016)Deep Dive on Amazon S3 (May 2016)
Deep Dive on Amazon S3 (May 2016)Julien SIMON
 
使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排Amazon Web Services
 
Introduction to Amazon Web Services - How to Scale your Next Idea on AWS : A ...
Introduction to Amazon Web Services - How to Scale your Next Idea on AWS : A ...Introduction to Amazon Web Services - How to Scale your Next Idea on AWS : A ...
Introduction to Amazon Web Services - How to Scale your Next Idea on AWS : A ...Amazon Web Services
 
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料Amazon Web Services
 
AWS March 2016 Webinar Series - Amazon EC2 Masterclass
AWS March 2016 Webinar Series - Amazon EC2 MasterclassAWS March 2016 Webinar Series - Amazon EC2 Masterclass
AWS March 2016 Webinar Series - Amazon EC2 MasterclassAmazon Web Services
 
Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算Amazon Web Services
 
Big Data answers in seconds with Amazon Athena
Big Data answers in seconds with Amazon AthenaBig Data answers in seconds with Amazon Athena
Big Data answers in seconds with Amazon AthenaJulien SIMON
 
(BAC404) Deploying High Availability and Disaster Recovery Architectures with...
(BAC404) Deploying High Availability and Disaster Recovery Architectures with...(BAC404) Deploying High Availability and Disaster Recovery Architectures with...
(BAC404) Deploying High Availability and Disaster Recovery Architectures with...Amazon Web Services
 

La actualidad más candente (20)

AWS CloudFormation Best Practices
AWS CloudFormation Best PracticesAWS CloudFormation Best Practices
AWS CloudFormation Best Practices
 
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
 
Getting Started with ElastiCache for Redis
Getting Started with ElastiCache for RedisGetting Started with ElastiCache for Redis
Getting Started with ElastiCache for Redis
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
 
AWS CloudFormation Intrinsic Functions and Mappings
AWS CloudFormation Intrinsic Functions and Mappings AWS CloudFormation Intrinsic Functions and Mappings
AWS CloudFormation Intrinsic Functions and Mappings
 
Aws meetup managed_nat
Aws meetup managed_natAws meetup managed_nat
Aws meetup managed_nat
 
Build A Website on AWS for Your First 10 Million Users
Build A Website on AWS for Your First 10 Million UsersBuild A Website on AWS for Your First 10 Million Users
Build A Website on AWS for Your First 10 Million Users
 
Scale, baby, scale!
Scale, baby, scale!Scale, baby, scale!
Scale, baby, scale!
 
How to create aws s3 bucket using terraform
How to create aws s3 bucket using terraformHow to create aws s3 bucket using terraform
How to create aws s3 bucket using terraform
 
Deep Dive on Amazon S3 (May 2016)
Deep Dive on Amazon S3 (May 2016)Deep Dive on Amazon S3 (May 2016)
Deep Dive on Amazon S3 (May 2016)
 
Amazon EC2:Masterclass
Amazon EC2:MasterclassAmazon EC2:Masterclass
Amazon EC2:Masterclass
 
使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排
 
Introduction to Amazon Web Services - How to Scale your Next Idea on AWS : A ...
Introduction to Amazon Web Services - How to Scale your Next Idea on AWS : A ...Introduction to Amazon Web Services - How to Scale your Next Idea on AWS : A ...
Introduction to Amazon Web Services - How to Scale your Next Idea on AWS : A ...
 
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
 
Amazon EC2 Masterclass
Amazon EC2 MasterclassAmazon EC2 Masterclass
Amazon EC2 Masterclass
 
AWS March 2016 Webinar Series - Amazon EC2 Masterclass
AWS March 2016 Webinar Series - Amazon EC2 MasterclassAWS March 2016 Webinar Series - Amazon EC2 Masterclass
AWS March 2016 Webinar Series - Amazon EC2 Masterclass
 
Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算
 
Big Data answers in seconds with Amazon Athena
Big Data answers in seconds with Amazon AthenaBig Data answers in seconds with Amazon Athena
Big Data answers in seconds with Amazon Athena
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
(BAC404) Deploying High Availability and Disaster Recovery Architectures with...
(BAC404) Deploying High Availability and Disaster Recovery Architectures with...(BAC404) Deploying High Availability and Disaster Recovery Architectures with...
(BAC404) Deploying High Availability and Disaster Recovery Architectures with...
 

Destacado

IronSource Atom - Redshift - Lessons Learned
IronSource Atom -  Redshift - Lessons LearnedIronSource Atom -  Redshift - Lessons Learned
IronSource Atom - Redshift - Lessons LearnedIdan Tohami
 
Building big data applications on AWS by Ran Tessler
Building big data applications on AWS by Ran TesslerBuilding big data applications on AWS by Ran Tessler
Building big data applications on AWS by Ran TesslerIdan Tohami
 
Hybrid Cloud – Enabling a Borderless Data Center for Your Business
Hybrid Cloud – Enabling a Borderless Data Center for Your BusinessHybrid Cloud – Enabling a Borderless Data Center for Your Business
Hybrid Cloud – Enabling a Borderless Data Center for Your BusinessAmazon Web Services
 
How to Secure your Hybrid Enviroment - Pop-up Loft Tel Aviv
How to Secure your Hybrid Enviroment - Pop-up Loft Tel AvivHow to Secure your Hybrid Enviroment - Pop-up Loft Tel Aviv
How to Secure your Hybrid Enviroment - Pop-up Loft Tel AvivAmazon Web Services
 
Adding to the bottom line - the Key Cloud plays for the Mid-Market - Adam Beavis
Adding to the bottom line - the Key Cloud plays for the Mid-Market - Adam BeavisAdding to the bottom line - the Key Cloud plays for the Mid-Market - Adam Beavis
Adding to the bottom line - the Key Cloud plays for the Mid-Market - Adam BeavisAmazon Web Services
 
AWS Webcast - Explore the AWS Cloud for Government
AWS Webcast - Explore the AWS Cloud for GovernmentAWS Webcast - Explore the AWS Cloud for Government
AWS Webcast - Explore the AWS Cloud for GovernmentAmazon Web Services
 
Creating a Culture of Cost Management in Your Organization
Creating a Culture of Cost Management in Your OrganizationCreating a Culture of Cost Management in Your Organization
Creating a Culture of Cost Management in Your OrganizationAmazon Web Services
 
Running Business Critical Workloads on AWS
Running Business Critical Workloads on AWS Running Business Critical Workloads on AWS
Running Business Critical Workloads on AWS Amazon Web Services
 
Launching Your First Big Data Project on AWS
Launching Your First Big Data Project on AWSLaunching Your First Big Data Project on AWS
Launching Your First Big Data Project on AWSAmazon Web Services
 
AWS Webcast - Cost and Performance Optimization in Amazon RDS
AWS Webcast - Cost and Performance Optimization in Amazon RDSAWS Webcast - Cost and Performance Optimization in Amazon RDS
AWS Webcast - Cost and Performance Optimization in Amazon RDSAmazon Web Services
 
Sales Tax Bootcamp for Amazon FBA Sellers
Sales Tax Bootcamp for Amazon FBA SellersSales Tax Bootcamp for Amazon FBA Sellers
Sales Tax Bootcamp for Amazon FBA SellersTaxJar
 
Getting Started with AWS IoT and the Dragon IoT Starter Kit - AWS May 2016 We...
Getting Started with AWS IoT and the Dragon IoT Starter Kit - AWS May 2016 We...Getting Started with AWS IoT and the Dragon IoT Starter Kit - AWS May 2016 We...
Getting Started with AWS IoT and the Dragon IoT Starter Kit - AWS May 2016 We...Amazon Web Services
 
(MBL309) Analyze Mobile App Data and Build Predictive Applications
(MBL309) Analyze Mobile App Data and Build Predictive Applications(MBL309) Analyze Mobile App Data and Build Predictive Applications
(MBL309) Analyze Mobile App Data and Build Predictive ApplicationsAmazon Web Services
 
Orchestrating Network with Web Services Session Sponsored by Megaport – Camer...
Orchestrating Network with Web Services Session Sponsored by Megaport – Camer...Orchestrating Network with Web Services Session Sponsored by Megaport – Camer...
Orchestrating Network with Web Services Session Sponsored by Megaport – Camer...Amazon Web Services
 
(BDT205) Your First Big Data Application On AWS
(BDT205) Your First Big Data Application On AWS(BDT205) Your First Big Data Application On AWS
(BDT205) Your First Big Data Application On AWSAmazon Web Services
 

Destacado (20)

IronSource Atom - Redshift - Lessons Learned
IronSource Atom -  Redshift - Lessons LearnedIronSource Atom -  Redshift - Lessons Learned
IronSource Atom - Redshift - Lessons Learned
 
Building big data applications on AWS by Ran Tessler
Building big data applications on AWS by Ran TesslerBuilding big data applications on AWS by Ran Tessler
Building big data applications on AWS by Ran Tessler
 
Traitement d'événements
Traitement d'événementsTraitement d'événements
Traitement d'événements
 
Hybrid Cloud – Enabling a Borderless Data Center for Your Business
Hybrid Cloud – Enabling a Borderless Data Center for Your BusinessHybrid Cloud – Enabling a Borderless Data Center for Your Business
Hybrid Cloud – Enabling a Borderless Data Center for Your Business
 
How to Secure your Hybrid Enviroment - Pop-up Loft Tel Aviv
How to Secure your Hybrid Enviroment - Pop-up Loft Tel AvivHow to Secure your Hybrid Enviroment - Pop-up Loft Tel Aviv
How to Secure your Hybrid Enviroment - Pop-up Loft Tel Aviv
 
Adding to the bottom line - the Key Cloud plays for the Mid-Market - Adam Beavis
Adding to the bottom line - the Key Cloud plays for the Mid-Market - Adam BeavisAdding to the bottom line - the Key Cloud plays for the Mid-Market - Adam Beavis
Adding to the bottom line - the Key Cloud plays for the Mid-Market - Adam Beavis
 
AWS Webcast - Explore the AWS Cloud for Government
AWS Webcast - Explore the AWS Cloud for GovernmentAWS Webcast - Explore the AWS Cloud for Government
AWS Webcast - Explore the AWS Cloud for Government
 
Moving Viadeo to AWS
Moving Viadeo to AWSMoving Viadeo to AWS
Moving Viadeo to AWS
 
Creating a Culture of Cost Management in Your Organization
Creating a Culture of Cost Management in Your OrganizationCreating a Culture of Cost Management in Your Organization
Creating a Culture of Cost Management in Your Organization
 
Running Business Critical Workloads on AWS
Running Business Critical Workloads on AWS Running Business Critical Workloads on AWS
Running Business Critical Workloads on AWS
 
Launching Your First Big Data Project on AWS
Launching Your First Big Data Project on AWSLaunching Your First Big Data Project on AWS
Launching Your First Big Data Project on AWS
 
AWS Webcast - Cost and Performance Optimization in Amazon RDS
AWS Webcast - Cost and Performance Optimization in Amazon RDSAWS Webcast - Cost and Performance Optimization in Amazon RDS
AWS Webcast - Cost and Performance Optimization in Amazon RDS
 
Media Streaming on AWS
Media Streaming on AWSMedia Streaming on AWS
Media Streaming on AWS
 
AURA: Aerial Unpaved Roads Assessment System Demonstration - Data Collection...
AURA: Aerial Unpaved Roads Assessment System Demonstration  - Data Collection...AURA: Aerial Unpaved Roads Assessment System Demonstration  - Data Collection...
AURA: Aerial Unpaved Roads Assessment System Demonstration - Data Collection...
 
Sales Tax Bootcamp for Amazon FBA Sellers
Sales Tax Bootcamp for Amazon FBA SellersSales Tax Bootcamp for Amazon FBA Sellers
Sales Tax Bootcamp for Amazon FBA Sellers
 
Getting Started with AWS IoT and the Dragon IoT Starter Kit - AWS May 2016 We...
Getting Started with AWS IoT and the Dragon IoT Starter Kit - AWS May 2016 We...Getting Started with AWS IoT and the Dragon IoT Starter Kit - AWS May 2016 We...
Getting Started with AWS IoT and the Dragon IoT Starter Kit - AWS May 2016 We...
 
(MBL309) Analyze Mobile App Data and Build Predictive Applications
(MBL309) Analyze Mobile App Data and Build Predictive Applications(MBL309) Analyze Mobile App Data and Build Predictive Applications
(MBL309) Analyze Mobile App Data and Build Predictive Applications
 
Orchestrating Network with Web Services Session Sponsored by Megaport – Camer...
Orchestrating Network with Web Services Session Sponsored by Megaport – Camer...Orchestrating Network with Web Services Session Sponsored by Megaport – Camer...
Orchestrating Network with Web Services Session Sponsored by Megaport – Camer...
 
(BDT205) Your First Big Data Application On AWS
(BDT205) Your First Big Data Application On AWS(BDT205) Your First Big Data Application On AWS
(BDT205) Your First Big Data Application On AWS
 
AWS Innovation at Scale
AWS Innovation at ScaleAWS Innovation at Scale
AWS Innovation at Scale
 

Similar a Building Your First Big Data Application on AWS

Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWSAmazon Web Services
 
Workshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSWorkshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSAmazon Web Services
 
AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWSAWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWSAmazon Web Services
 
5 things you don't know about Amazon Web Services
5 things you don't know about Amazon Web Services5 things you don't know about Amazon Web Services
5 things you don't know about Amazon Web ServicesSimone Brunozzi
 
5 Things You Don't Know About AWS Cloud
5 Things You Don't Know About AWS Cloud5 Things You Don't Know About AWS Cloud
5 Things You Don't Know About AWS CloudAmazon Web Services
 
AWS September Webinar Series - Building Your First Big Data Application on AWS
AWS September Webinar Series - Building Your First Big Data Application on AWS AWS September Webinar Series - Building Your First Big Data Application on AWS
AWS September Webinar Series - Building Your First Big Data Application on AWS Amazon Web Services
 
Taking advantage of the Amazon Web Services (AWS) Family
Taking advantage of the Amazon Web Services (AWS) FamilyTaking advantage of the Amazon Web Services (AWS) Family
Taking advantage of the Amazon Web Services (AWS) FamilyBen Hall
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR MasterclassIan Massingham
 
DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...
DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...
DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...DevOps_Fest
 
The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...
The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...
The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...Amazon Web Services
 
Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013
Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013
Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013Amazon Web Services
 
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016Amazon Web Services Korea
 
Serverless archtiectures
Serverless archtiecturesServerless archtiectures
Serverless archtiecturesIegor Fadieiev
 
Hands-on Lab: Comparing Redis with Relational
Hands-on Lab: Comparing Redis with RelationalHands-on Lab: Comparing Redis with Relational
Hands-on Lab: Comparing Redis with RelationalAmazon Web Services
 
Hands-on Lab: Amazon ElastiCache
Hands-on Lab: Amazon ElastiCacheHands-on Lab: Amazon ElastiCache
Hands-on Lab: Amazon ElastiCacheAmazon Web Services
 
Disaster Recovery Site on AWS - Minimal Cost Maximum Efficiency (STG305) | AW...
Disaster Recovery Site on AWS - Minimal Cost Maximum Efficiency (STG305) | AW...Disaster Recovery Site on AWS - Minimal Cost Maximum Efficiency (STG305) | AW...
Disaster Recovery Site on AWS - Minimal Cost Maximum Efficiency (STG305) | AW...Amazon Web Services
 
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Amazon Web Services
 

Similar a Building Your First Big Data Application on AWS (20)

Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
 
Workshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSWorkshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWS
 
AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWSAWS APAC Webinar Week - Launching Your First Big Data Project on AWS
AWS APAC Webinar Week - Launching Your First Big Data Project on AWS
 
5 things you don't know about Amazon Web Services
5 things you don't know about Amazon Web Services5 things you don't know about Amazon Web Services
5 things you don't know about Amazon Web Services
 
5 Things You Don't Know About AWS Cloud
5 Things You Don't Know About AWS Cloud5 Things You Don't Know About AWS Cloud
5 Things You Don't Know About AWS Cloud
 
AWS September Webinar Series - Building Your First Big Data Application on AWS
AWS September Webinar Series - Building Your First Big Data Application on AWS AWS September Webinar Series - Building Your First Big Data Application on AWS
AWS September Webinar Series - Building Your First Big Data Application on AWS
 
Amazed by AWS Series #4
Amazed by AWS Series #4Amazed by AWS Series #4
Amazed by AWS Series #4
 
Taking advantage of the Amazon Web Services (AWS) Family
Taking advantage of the Amazon Web Services (AWS) FamilyTaking advantage of the Amazon Web Services (AWS) Family
Taking advantage of the Amazon Web Services (AWS) Family
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
AWS Serverless Workshop
AWS Serverless WorkshopAWS Serverless Workshop
AWS Serverless Workshop
 
DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...
DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...
DevOps Fest 2019. Alex Casalboni. Configuration management and service discov...
 
The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...
The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...
The Best of Both Worlds: Implementing Hybrid IT with AWS (ENT218) | AWS re:In...
 
Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013
Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013
Becoming a Command Line Expert with the AWS CLI (TLS304) | AWS re:Invent 2013
 
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
대용량 데이타 쉽고 빠르게 분석하기 :: 김일호 솔루션즈 아키텍트 :: Gaming on AWS 2016
 
Serverless archtiectures
Serverless archtiecturesServerless archtiectures
Serverless archtiectures
 
Hands-on Lab: Comparing Redis with Relational
Hands-on Lab: Comparing Redis with RelationalHands-on Lab: Comparing Redis with Relational
Hands-on Lab: Comparing Redis with Relational
 
Hands-on Lab: Amazon ElastiCache
Hands-on Lab: Amazon ElastiCacheHands-on Lab: Amazon ElastiCache
Hands-on Lab: Amazon ElastiCache
 
Disaster Recovery Site on AWS - Minimal Cost Maximum Efficiency (STG305) | AW...
Disaster Recovery Site on AWS - Minimal Cost Maximum Efficiency (STG305) | AW...Disaster Recovery Site on AWS - Minimal Cost Maximum Efficiency (STG305) | AW...
Disaster Recovery Site on AWS - Minimal Cost Maximum Efficiency (STG305) | AW...
 
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
 

Más de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Último

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Último (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Building Your First Big Data Application on AWS

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Matt Yanchyshyn, Sr. Manager, Solutions Architecture April 2016 Building Your First Big Data Application on AWS
  • 2. Your First Big Data Application on AWS PROCESS STORE ANALYZE & VISUALIZE COLLECT
  • 3. Your First Big Data Application on AWS PROCESS: Amazon EMR with Spark & Hive STORE ANALYZE & VISUALIZE: Amazon Redshift and Amazon QuickSight COLLECT: Amazon Kinesis Firehose
  • 5. Setting up the environment
  • 6. Data Storage with Amazon S3 Download all the CLI steps: http://bit.ly/aws-big-data-steps Create an Amazon S3 bucket to store the data collected with Amazon Kinesis Firehose aws s3 mb s3://YOUR-S3-BUCKET-NAME
  • 7. Access Control with IAM Create an IAM role to allow Firehose to write to the S3 bucket firehose-policy.json: { "Version": "2012-10-17", "Statement": { "Effect": "Allow", "Principal": {"Service": "firehose.amazonaws.com"}, "Action": "sts:AssumeRole" } }
  • 8. Access Control with IAM Create an IAM role to allow Firehose to write to the S3 bucket s3-rw-policy.json: { "Version": "2012-10-17", "Statement": { "Effect": "Allow", "Action": "s3:*", "Resource": [ "arn:aws:s3:::YOUR-S3-BUCKET-NAME", "arn:aws:s3:::YOUR-S3-BUCKET-NAME/*" ] } }
  • 9. Access Control with IAM Create an IAM role to allow Firehose to write to the S3 bucket aws iam create-role --role-name firehose-demo --assume-role-policy-document file://firehose-policy.json Copy the value in “Arn” in the output, e.g., arn:aws:iam::123456789:role/firehose-demo aws iam put-role-policy --role-name firehose-demo --policy-name firehose-s3-rw --policy-document file://s3-rw-policy.json
  • 10. Data Collection with Amazon Kinesis Firehose Create a Firehose stream for incoming log data aws firehose create-delivery-stream --delivery-stream-name demo-firehose-stream --s3-destination-configuration RoleARN=YOUR-FIREHOSE-ARN, BucketARN="arn:aws:s3:::YOUR-S3-BUCKET-NAME", Prefix=firehose/, BufferingHints={IntervalInSeconds=60}, CompressionFormat=GZIP
  • 11. Data Processing with Amazon EMR Launch an Amazon EMR cluster with Spark and Hive aws emr create-cluster --name "demo" --release-label emr-4.5.0 --instance-type m3.xlarge --instance-count 2 --ec2-attributes KeyName=YOUR-AWS-SSH-KEY --use-default-roles --applications Name=Hive Name=Spark Name=Zeppelin-Sandbox Record your ClusterId from the output.
  • 12. Data Analysis with Amazon Redshift Create a single-node Amazon Redshift data warehouse: aws redshift create-cluster --cluster-identifier demo --db-name demo --node-type dc1.large --cluster-type single-node --master-username master --master-user-password YOUR-REDSHIFT-PASSWORD --publicly-accessible --port 8192
  • 14. Weblogs – Common Log Format (CLF) 75.35.230.210 - - [20/Jul/2009:22:22:42 -0700] "GET /images/pigtrihawk.jpg HTTP/1.1" 200 29236 "http://www.swivel.com/graphs/show/1163466" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"
  • 15. Writing into Amazon Kinesis Firehose Download the demo weblog: http://bit.ly/aws-big-data Open Python and run the following code to import the log into the stream: import boto3 iam = boto3.client('iam') firehose = boto3.client('firehose') with open('weblog', 'r') as f: for line in f: firehose.put_record( DeliveryStreamName='demo-firehose-stream', Record={'Data': line} ) print 'Record added'
  • 17. Apache Spark • Fast, general purpose engine for large-scale data processing • Write applications quickly in Java, Scala, or Python • Combine SQL, streaming, and complex analytics
  • 18. Spark SQL Spark's module for working with structured data using SQL Run unmodified Hive queries on existing data
  • 19. Apache Zeppelin • Web-based notebook for interactive analytics • Multiple language backend • Apache Spark integration • Data visualization • Collaboration https://zeppelin.incubator.apache.org/
  • 20. View the Output Files in Amazon S3 After about 1 minute, you should see files in your S3 bucket: aws s3 ls s3://YOUR-S3-BUCKET-NAME/firehose/ --recursive
  • 21. Connect to Your EMR Cluster and Zeppelin aws emr describe-cluster --cluster-id YOUR-EMR-CLUSTER-ID Copy the MasterPublicDnsName. Use port forwarding so you can access Zeppelin at http://localhost:18890 on your local machine. ssh -i PATH-TO-YOUR-SSH-KEY -L 18890:localhost:8890 hadoop@YOUR-EMR-DNS-NAME Open Zeppelin with your local web browser and create a new “Note”: http://localhost:18890
  • 22. Exploring the Data in Amazon S3 using Spark Download the Zeppelin notebook: http://bit.ly/aws-big-data-zeppelin // Load all the files from S3 into a RDD val accessLogLines = sc.textFile("s3://YOUR-S3-BUCKET-NAME/firehose/*/*/*/*") // Count the lines accessLogLines.count // Print one line as a string accessLogLines.first // delimited by space so split them into fields var accessLogFields = accessLogLines.map(_.split(" ").map(_.trim)) // Print the fields of a line accessLogFields.first
  • 23. Combine Fields: “A, B, C”  “A B C” var accessLogColumns = accessLogFields .map( arrayOfFields => { var temp1 =""; for (field <- arrayOfFields) yield { var temp2 = "" if (temp1.replaceAll("[",""").startsWith(""") && !temp1.endsWith(""")) temp1 = temp1 + " " + field.replaceAll("[|]",""") else temp1 = field.replaceAll("[|]",""") temp2 = temp1 if (temp1.endsWith(""")) temp1 = "" temp2 }}) .map( fields => fields.filter(field => (field.startsWith(""") && field.endsWith(""")) || !field.startsWith(""") )) .map(fields => fields.map(_.replaceAll(""","")))
  • 24. Create a Data Frame and Transform the Data import java.sql.Timestamp import java.net.URL case class accessLogs( ipAddress: String, requestTime: Timestamp, requestMethod: String, requestPath: String, requestProtocol: String, responseCode: String, responseSize: String, referrerHost: String, userAgent: String )
  • 25. Create a Data Frame and Transform the Data val accessLogsDF = accessLogColumns.map(line => { var ipAddress = line(0) var requestTime = new Timestamp(new java.text.SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z").parse(line(3)).getTime()) var requestString = line(4).split(" ").map(_.trim()) var requestMethod = if (line(4).toString() != "-") requestString(0) else "" var requestPath = if (line(4).toString() != "-") requestString(1) else "" var requestProtocol = if (line(4).toString() != "-") requestString(2) else "" var responseCode = line(5).replaceAll("-","") var responseSize = line(6).replaceAll("-","") var referrerHost = line(7) var userAgent = line(8) accessLogs(ipAddress, requestTime, requestMethod, requestPath, requestProtocol,responseCode, responseSize, referrerHost, userAgent) }).toDF()
  • 26. Create an External Table Backed by Amazon S3 %sql CREATE EXTERNAL TABLE access_logs ( ip_address String, request_time Timestamp, request_method String, request_path String, request_protocol String, response_code String, response_size String, referrer_host String, user_agent String ) PARTITIONED BY (year STRING,month STRING, day STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY 't' STORED AS TEXTFILE LOCATION 's3://YOUR-S3-BUCKET-NAME/access-log-processed'
  • 27. Configure Hive Partitioning and Compression // set up Hive's "dynamic partitioning” %sql SET hive.exec.dynamic.partition=true // compress output files on Amazon S3 using Gzip %sql SET hive.exec.compress.output=true %sql SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec %sql SET io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec %sql SET hive.exec.dynamic.partition.mode=nonstrict;
  • 28. Write Output to Amazon S3 import org.apache.spark.sql.SaveMode accessLogsDF .withColumn("year", year(accessLogsDF("requestTime"))) .withColumn("month", month(accessLogsDF("requestTime"))) .withColumn("day", dayofmonth(accessLogsDF("requestTime"))) .write .partitionBy("year","month","day") .mode(SaveMode.Overwrite) .insertInto("access_logs")
  • 29. Query the Data Using Spark SQL // Check the count of records %sql select count(*) from access_log_processed // Fetch the first 10 records %sql select * from access_log_processed limit 10
  • 30. View the Output Files in Amazon S3 Leave Zeppelin and go back to the console… List the partition prefixes and output files: aws s3 ls s3://YOUR-S3-BUCKET-NAME/access-log-processed/ --recursive
  • 32. Connect to Amazon Redshift Using the PostgreSQL CLI psql -h YOUR-REDSHIFT-ENDPOINT -p 8192 -U master demo Or use any JDBC or ODBC SQL client with the PostgreSQL 8.x drivers or native Amazon Redshift support • Aginity Workbench for Amazon Redshift • SQL Workbench/J
  • 33. Create an Amazon Redshift Table to Hold Your Data CREATE TABLE accesslogs ( host_address varchar(512), request_time timestamp, request_method varchar(5), request_path varchar(1024), request_protocol varchar(10), response_code Int, response_size Int, referrer_host varchar(1024), user_agent varchar(512) ) DISTKEY(host_address) SORTKEY(request_time);
  • 34. Loading Data into Amazon Redshift “COPY” command loads files in parallel from Amazon S3: COPY accesslogs FROM 's3://YOUR-S3-BUCKET-NAME/access-log-processed' CREDENTIALS 'aws_access_key_id=YOUR-IAM- ACCESS_KEY;aws_secret_access_key=YOUR-IAM-SECRET-KEY' DELIMITER 't' MAXERROR 0 GZIP;
  • 35. Amazon Redshift Test Queries -- find distribution of response codes over days SELECT TRUNC(request_time), response_code, COUNT(1) FROM accesslogs GROUP BY 1,2 ORDER BY 1,3 DESC; -- find the 404 status codes SELECT COUNT(1) FROM accessLogs WHERE response_code = 404; -- show all requests for status as PAGE NOT FOUND SELECT TOP 1 request_path,COUNT(1) FROM accesslogs WHERE response_code = 404 GROUP BY 1 ORDER BY 2 DESC;
  • 37. Automating the Big Data Application Amazon Kinesis Firehose Amazon EMR Amazon S3 Amazon Redshift Amazon QuickSight Amazon S3 Event Notification Spark job List of objects from Lambda Write to Amazon Redshift using spark-redshift
  • 38. Learn from big data experts blogs.aws.amazon.com/bigdata