SlideShare una empresa de Scribd logo
1 de 17
Analytics in the Cloud
       Peter Sirota, GM Elastic MapReduce
Data-Driven Decision Making

Data is the new raw
material for any
business on par with
capital, people, and
labor.
What is Big Data?
  Terabytes of semi-structured log data
  in which businesses want to:
   find correlations/perform pattern matching
   generate recommendations
   calculate advanced statistics (i.e., TP99)

  Twitter “Firehose”
   50 million tweets per day
   1,400% growth per year
   How can advertisers drink from it?

  Social graphs
   Value increases with exponential
    growth in data connections

Big Data is full of valuable, unanswered questions!
Why is Big Data Hard (and Getting Harder)?
  Today’s Data Warehouses
     Need to consolidate from multiple data sources in multiple formats across
      multiple businesses
     Unconstrained growth of this business-critical information

  Today’s Users
     Expect faster response time of fresher data
     Sampling is not good enough and history is important
     Demand inexpensive experimentation with new data
     Become increasingly sophisticated Data Scientists

  Current systems don’t scale (and weren’t meant to)
     Long time to provision more infrastructure
     Specialized DB expertise required
     Expensive and inelastic solutions


We need tools built specifically for Big Data!
What is this thing called Hadoop?
Dealing with Big Data requires two things:
  Distributed, scalable storage
  Inexpensive, flexible analytics

Apache Hadoop is an open source software
platform that addresses both of these needs
  Includes a fault‐tolerant, distributed storage system
   (HDFS) developed for commodity servers
  Uses a technique called MapReduce to carry out
   exhaustive analysis over huge distributed data sets

Key benefits
 Affordable – Cost / TB is a fraction of traditional options
 Proven at scale – Numerous petabyte implementations in production;
  linear scalability
 Flexible – Data can be stored with or without schema
RDBMS vs. MapReduce/Hadoop

RDBMS                                   MapReduce/Hadoop
 Predefined schema                      No schema is required
 Strategic data placement for query     Random data placement
  tuning                                 Fast scan of the entire dataset
 Exploit indexes for fast retrieving    Uniform query performance
 SQL only                               Linearly scales for reads and
 Doesn’t scale linearly                  writes
                                         Support many languages
                                          including SQL



            Complementary technologies
Why Amazon Elastic MapReduce?
Managed Apache Hadoop Web Service
  Monitor thousands of clusters per day
  Use cases span from University students to Fortune 50

Reduces complexity of Hadoop management
  Handles node provisioning, customization, and shutdown
  Tunes Hadoop to your hardware and network
  Provides tools to debug and monitor your Hadoop clusters

Provides tight integration with AWS services
    Improved performance working with S3
    Automatic re-provisioning on node failure
    Dynamic expanding/shrinking of cluster size
    Spot integration
Elastic MapReduce Key Features
Simplified Cluster Configuration/Management
    Resize running job flows
    Support for EIP/IAM/Tagging
    Workload-specific configurations
    Bootstrap Actions

Enhanced Monitoring/Debugging
  Free CloudWatch Metrics / Alarms
  Hadoop Metrics in Console
  Ganglia Support

Improved Performance
  S3 Multipart Upload
  Cluster Compute Instances
Analytics Use Cases
Targeted advertising / Clickstream analysis
Data warehousing applications
Bio-informatics (Genome analysis)
Financial simulation (Monte Carlo simulation)
File processing (resize jpegs)
Web indexing
Data mining and BI
APACHE H IVE
DATA WAREHOUSE FOR H ADOOP
 Open source project started at Facebook
 Turns data on Hadoop into a virtually limitless
 data warehouse
 Provides data summarization, ad hoc querying
 and analysis
 Enables SQL-like queries on structured and
 unstructured data
   E.g. arbitrary field separators possible such as “,” in
    CSV file formats
 Inherits linear scalability of Hadoop
AWS Data Warehousing Architecture
Elastic Data Warehouse
 Customize cluster size to support varying resource needs
 (e.g. query support during the day versus batch processing
 overnight)
 Reduce costs by increasing server utilization
 Improve performance during high usage periods

                                   Data Warehouse
                                  (Batch Processing)
 Data Warehouse                                                      Data Warehouse
  (Steady State)                                                      (Steady State)

                                                        Shrink to
                    Expand to                          9 instances
                   25 instances
Reducing Costs with Spot Instances
Mix Spot and On-Demand instances to reduce cost and
accelerate computation while protecting against interruption

   Scenario #1           Scenario #2    #1: Cost without Spot
                           Job Flow     4 instances *14 hrs * $0.50 = $28
     Job Flow
                                        #2: Cost with Spot
                                        4 instances *7 hrs * $0.50 = $13 +
                                        5 instances * 7 hrs * $0.25 = $8.75
                                        Total = $21.75

      Duration:             Duration:
     14 Hours
                                        Time Savings: 50%
                            7 Hours     Cost Savings: ~22%


Other EMR + Spot Use Cases
Run entire cluster on Spot for biggest cost savings
Reduce the cost of application testing
Monitoring Clusters with CloudWatch
 Free CloudWatch Metrics and Alarms
   Track Hadoop job progress
   Alarm on degradations in cluster health
   Monitor aggregate Elastic MapReduce usage
Big Data Ecosystem And Tools
We have a rapidly growing ecosystem and will continue
to integrate with a wide range of partners. Some
examples:

  Business Intelligence
    MicroStrategy, Pentaho
  Analytics
    Datameer, Karmasphere, Quest
  Open source
    Ganglia, SQuirrel SQL
Resources
Amazon Elastic MapReduce
  aws.amazon.com/elasticmapreduce
  aws.amazon.com/articles/Elastic-MapReduce
  forums.aws.amazon.com/forum.jspa?forumID=52

Más contenido relacionado

La actualidad más candente

(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...Amazon Web Services
 
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannAzure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannDatabricks
 
High Performance Computing Implementation on AWS
High Performance Computing Implementation on AWSHigh Performance Computing Implementation on AWS
High Performance Computing Implementation on AWSAmazon Web Services
 
Is Cloud a right Companion for Hadoop
Is Cloud a right Companion for HadoopIs Cloud a right Companion for Hadoop
Is Cloud a right Companion for HadoopDataWorks Summit
 
AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...
AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...
AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...Amazon Web Services
 
Big Data Analytics & Architecture
Big Data Analytics & ArchitectureBig Data Analytics & Architecture
Big Data Analytics & ArchitectureAnjani Phuyal
 
Aaum Analytics event - Big data in the cloud
Aaum Analytics event - Big data in the cloudAaum Analytics event - Big data in the cloud
Aaum Analytics event - Big data in the cloudGanesh Raja
 
Time Series Analytics Azure ADX
Time Series Analytics Azure ADXTime Series Analytics Azure ADX
Time Series Analytics Azure ADXRiccardo Zamana
 
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu GantaAzure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu GantaDatabricks
 
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...Amazon Web Services
 
Azure Con Cortana Analytics Suite
Azure Con Cortana Analytics Suite Azure Con Cortana Analytics Suite
Azure Con Cortana Analytics Suite Andy Wright
 
Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Riccardo Zamana
 
CloudMonitor - Automated cost optimization and governance platform - Free BET...
CloudMonitor - Automated cost optimization and governance platform - Free BET...CloudMonitor - Automated cost optimization and governance platform - Free BET...
CloudMonitor - Automated cost optimization and governance platform - Free BET...Rodney Joyce
 
Extending your Hadoop Implementation to the Cloud
Extending your Hadoop Implementation to the CloudExtending your Hadoop Implementation to the Cloud
Extending your Hadoop Implementation to the CloudDataWorks Summit
 
Cloud computing and Hadoop introduction
Cloud computing and Hadoop introductionCloud computing and Hadoop introduction
Cloud computing and Hadoop introductionchristian.perez
 
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabasePowering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabaseKinetica
 
AWS Enterprise Day | Big Data Analytics
AWS Enterprise Day | Big Data AnalyticsAWS Enterprise Day | Big Data Analytics
AWS Enterprise Day | Big Data AnalyticsAmazon Web Services
 
The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop MapR Technologies
 
Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015Amazon Web Services
 

La actualidad más candente (20)

(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
 
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannAzure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
 
High Performance Computing Implementation on AWS
High Performance Computing Implementation on AWSHigh Performance Computing Implementation on AWS
High Performance Computing Implementation on AWS
 
Is Cloud a right Companion for Hadoop
Is Cloud a right Companion for HadoopIs Cloud a right Companion for Hadoop
Is Cloud a right Companion for Hadoop
 
AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...
AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...
AWS Summit 2013 | India - Understanding the Total Cost of (Non) Ownership, Ki...
 
Big Data Analytics & Architecture
Big Data Analytics & ArchitectureBig Data Analytics & Architecture
Big Data Analytics & Architecture
 
Aaum Analytics event - Big data in the cloud
Aaum Analytics event - Big data in the cloudAaum Analytics event - Big data in the cloud
Aaum Analytics event - Big data in the cloud
 
Time Series Analytics Azure ADX
Time Series Analytics Azure ADXTime Series Analytics Azure ADX
Time Series Analytics Azure ADX
 
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu GantaAzure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
 
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
 
Azure Con Cortana Analytics Suite
Azure Con Cortana Analytics Suite Azure Con Cortana Analytics Suite
Azure Con Cortana Analytics Suite
 
Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020
 
CloudMonitor - Automated cost optimization and governance platform - Free BET...
CloudMonitor - Automated cost optimization and governance platform - Free BET...CloudMonitor - Automated cost optimization and governance platform - Free BET...
CloudMonitor - Automated cost optimization and governance platform - Free BET...
 
Extending your Hadoop Implementation to the Cloud
Extending your Hadoop Implementation to the CloudExtending your Hadoop Implementation to the Cloud
Extending your Hadoop Implementation to the Cloud
 
WTIA Cloud Computing Series - Part I: The Fundamentals
WTIA Cloud Computing Series - Part I: The FundamentalsWTIA Cloud Computing Series - Part I: The Fundamentals
WTIA Cloud Computing Series - Part I: The Fundamentals
 
Cloud computing and Hadoop introduction
Cloud computing and Hadoop introductionCloud computing and Hadoop introduction
Cloud computing and Hadoop introduction
 
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabasePowering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
 
AWS Enterprise Day | Big Data Analytics
AWS Enterprise Day | Big Data AnalyticsAWS Enterprise Day | Big Data Analytics
AWS Enterprise Day | Big Data Analytics
 
The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop
 
Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015
 

Destacado

Play’n’Learn: A Continuous KM Improvement Approach using FSM methods
Play’n’Learn: A Continuous KM Improvement Approach using FSM methodsPlay’n’Learn: A Continuous KM Improvement Approach using FSM methods
Play’n’Learn: A Continuous KM Improvement Approach using FSM methodsLuigi Buglione
 
AWS Cloud Kata 2014 | Jakarta - 2-2 Mobile
AWS Cloud Kata 2014 | Jakarta - 2-2 MobileAWS Cloud Kata 2014 | Jakarta - 2-2 Mobile
AWS Cloud Kata 2014 | Jakarta - 2-2 MobileAmazon Web Services
 
Cloud Connections: Integrating Enterprise IT with the Cloud
Cloud Connections: Integrating Enterprise IT with the CloudCloud Connections: Integrating Enterprise IT with the Cloud
Cloud Connections: Integrating Enterprise IT with the CloudAmazon Web Services
 
AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
 AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
AWS Cloud Kata 2014 | Jakarta - 2-3 Big DataAmazon Web Services
 
AWS Partner Presentation - Riverbed
AWS Partner Presentation - RiverbedAWS Partner Presentation - Riverbed
AWS Partner Presentation - RiverbedAmazon Web Services
 
Amazon EC2 and AWS Elastic Beanstalk Introduction
Amazon EC2 and AWS Elastic Beanstalk IntroductionAmazon EC2 and AWS Elastic Beanstalk Introduction
Amazon EC2 and AWS Elastic Beanstalk IntroductionAmazon Web Services
 
Using raspberry pi to capture environmental factors that affect sleep
Using raspberry pi to capture environmental factors that affect sleepUsing raspberry pi to capture environmental factors that affect sleep
Using raspberry pi to capture environmental factors that affect sleepTao Tang-Little
 
AWS Partner Webcast - Reporting and Analytics in the Cloud
AWS Partner Webcast - Reporting and Analytics in the CloudAWS Partner Webcast - Reporting and Analytics in the Cloud
AWS Partner Webcast - Reporting and Analytics in the CloudAmazon Web Services
 
Data Donderdag - Making your own smart ‘machine learning’ thermostat
Data Donderdag - Making your own smart ‘machine learning’ thermostatData Donderdag - Making your own smart ‘machine learning’ thermostat
Data Donderdag - Making your own smart ‘machine learning’ thermostatNiek Temme
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseSnowflake Computing
 
使用 Raspberry pi + fluentd + gcp cloud logging, big query 做iot 資料搜集與分析
使用 Raspberry pi + fluentd + gcp cloud logging, big query 做iot 資料搜集與分析使用 Raspberry pi + fluentd + gcp cloud logging, big query 做iot 資料搜集與分析
使用 Raspberry pi + fluentd + gcp cloud logging, big query 做iot 資料搜集與分析Simon Su
 
Mobile + cloud + internet of things (iot) = nuove opportunità di business
Mobile + cloud + internet of things (iot) = nuove opportunità di businessMobile + cloud + internet of things (iot) = nuove opportunità di business
Mobile + cloud + internet of things (iot) = nuove opportunità di businessMarco Brambilla
 
Cloud Computing and your Data Warehouse
Cloud Computing and your Data WarehouseCloud Computing and your Data Warehouse
Cloud Computing and your Data Warehousedrluckyspin
 
AWS re:Invent 2016: Workshop: Build an Alexa-Enabled Product with Raspberry P...
AWS re:Invent 2016: Workshop: Build an Alexa-Enabled Product with Raspberry P...AWS re:Invent 2016: Workshop: Build an Alexa-Enabled Product with Raspberry P...
AWS re:Invent 2016: Workshop: Build an Alexa-Enabled Product with Raspberry P...Amazon Web Services
 
What is Cloud Computing with Amazon Web Services?
What is Cloud Computing with Amazon Web Services?What is Cloud Computing with Amazon Web Services?
What is Cloud Computing with Amazon Web Services?Amazon Web Services
 
Docker Use Cases on Raspberry Pi
Docker Use Cases on Raspberry PiDocker Use Cases on Raspberry Pi
Docker Use Cases on Raspberry PiPhilip Zheng
 
AWS 101: Cloud Computing Seminar (2012)
AWS 101: Cloud Computing Seminar (2012)AWS 101: Cloud Computing Seminar (2012)
AWS 101: Cloud Computing Seminar (2012)Amazon Web Services
 

Destacado (19)

Play’n’Learn: A Continuous KM Improvement Approach using FSM methods
Play’n’Learn: A Continuous KM Improvement Approach using FSM methodsPlay’n’Learn: A Continuous KM Improvement Approach using FSM methods
Play’n’Learn: A Continuous KM Improvement Approach using FSM methods
 
AWS Cloud Kata 2014 | Jakarta - 2-2 Mobile
AWS Cloud Kata 2014 | Jakarta - 2-2 MobileAWS Cloud Kata 2014 | Jakarta - 2-2 Mobile
AWS Cloud Kata 2014 | Jakarta - 2-2 Mobile
 
Cloud Connections: Integrating Enterprise IT with the Cloud
Cloud Connections: Integrating Enterprise IT with the CloudCloud Connections: Integrating Enterprise IT with the Cloud
Cloud Connections: Integrating Enterprise IT with the Cloud
 
Cloud-Powered Social Gaming
Cloud-Powered Social GamingCloud-Powered Social Gaming
Cloud-Powered Social Gaming
 
AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
 AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
AWS Cloud Kata 2014 | Jakarta - 2-3 Big Data
 
AWS Partner Presentation - Riverbed
AWS Partner Presentation - RiverbedAWS Partner Presentation - Riverbed
AWS Partner Presentation - Riverbed
 
Amazon EC2 and AWS Elastic Beanstalk Introduction
Amazon EC2 and AWS Elastic Beanstalk IntroductionAmazon EC2 and AWS Elastic Beanstalk Introduction
Amazon EC2 and AWS Elastic Beanstalk Introduction
 
Using raspberry pi to capture environmental factors that affect sleep
Using raspberry pi to capture environmental factors that affect sleepUsing raspberry pi to capture environmental factors that affect sleep
Using raspberry pi to capture environmental factors that affect sleep
 
AWS Partner Webcast - Reporting and Analytics in the Cloud
AWS Partner Webcast - Reporting and Analytics in the CloudAWS Partner Webcast - Reporting and Analytics in the Cloud
AWS Partner Webcast - Reporting and Analytics in the Cloud
 
Introduction to AWS tools
Introduction to AWS toolsIntroduction to AWS tools
Introduction to AWS tools
 
Data Donderdag - Making your own smart ‘machine learning’ thermostat
Data Donderdag - Making your own smart ‘machine learning’ thermostatData Donderdag - Making your own smart ‘machine learning’ thermostat
Data Donderdag - Making your own smart ‘machine learning’ thermostat
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 
使用 Raspberry pi + fluentd + gcp cloud logging, big query 做iot 資料搜集與分析
使用 Raspberry pi + fluentd + gcp cloud logging, big query 做iot 資料搜集與分析使用 Raspberry pi + fluentd + gcp cloud logging, big query 做iot 資料搜集與分析
使用 Raspberry pi + fluentd + gcp cloud logging, big query 做iot 資料搜集與分析
 
Mobile + cloud + internet of things (iot) = nuove opportunità di business
Mobile + cloud + internet of things (iot) = nuove opportunità di businessMobile + cloud + internet of things (iot) = nuove opportunità di business
Mobile + cloud + internet of things (iot) = nuove opportunità di business
 
Cloud Computing and your Data Warehouse
Cloud Computing and your Data WarehouseCloud Computing and your Data Warehouse
Cloud Computing and your Data Warehouse
 
AWS re:Invent 2016: Workshop: Build an Alexa-Enabled Product with Raspberry P...
AWS re:Invent 2016: Workshop: Build an Alexa-Enabled Product with Raspberry P...AWS re:Invent 2016: Workshop: Build an Alexa-Enabled Product with Raspberry P...
AWS re:Invent 2016: Workshop: Build an Alexa-Enabled Product with Raspberry P...
 
What is Cloud Computing with Amazon Web Services?
What is Cloud Computing with Amazon Web Services?What is Cloud Computing with Amazon Web Services?
What is Cloud Computing with Amazon Web Services?
 
Docker Use Cases on Raspberry Pi
Docker Use Cases on Raspberry PiDocker Use Cases on Raspberry Pi
Docker Use Cases on Raspberry Pi
 
AWS 101: Cloud Computing Seminar (2012)
AWS 101: Cloud Computing Seminar (2012)AWS 101: Cloud Computing Seminar (2012)
AWS 101: Cloud Computing Seminar (2012)
 

Similar a AWS Summit 2011: Big Data Analytics in the AWS cloud

Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloudHive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloudJaipaul Agonus
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Amazon Web Services
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionJames Serra
 
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Rio Info
 
Apache Hadoop India Summit 2011 talk "Making Hadoop Enterprise Ready with Am...
Apache Hadoop India Summit 2011 talk  "Making Hadoop Enterprise Ready with Am...Apache Hadoop India Summit 2011 talk  "Making Hadoop Enterprise Ready with Am...
Apache Hadoop India Summit 2011 talk "Making Hadoop Enterprise Ready with Am...Yahoo Developer Network
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Cloudera, Inc.
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseJames Serra
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarioskcmallu
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Carole Gunst
 
1.demystifying big data & hadoop
1.demystifying big data & hadoop1.demystifying big data & hadoop
1.demystifying big data & hadoopdatabloginfo
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data AnalyticsAttunity
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overviewvhrocca
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantageAmazon Web Services
 
Big data on AWS
Big data on AWSBig data on AWS
Big data on AWSStylight
 

Similar a AWS Summit 2011: Big Data Analytics in the AWS cloud (20)

Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloudHive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
 
Final deck
Final deckFinal deck
Final deck
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
 
Apache Hadoop India Summit 2011 talk "Making Hadoop Enterprise Ready with Am...
Apache Hadoop India Summit 2011 talk  "Making Hadoop Enterprise Ready with Am...Apache Hadoop India Summit 2011 talk  "Making Hadoop Enterprise Ready with Am...
Apache Hadoop India Summit 2011 talk "Making Hadoop Enterprise Ready with Am...
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
Azure and cloud design patterns
Azure and cloud design patternsAzure and cloud design patterns
Azure and cloud design patterns
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2
 
1.demystifying big data & hadoop
1.demystifying big data & hadoop1.demystifying big data & hadoop
1.demystifying big data & hadoop
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overview
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Big Data on AWS
Big Data on AWSBig Data on AWS
Big Data on AWS
 
Big data on AWS
Big data on AWSBig data on AWS
Big data on AWS
 

Más de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Último

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Último (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

AWS Summit 2011: Big Data Analytics in the AWS cloud

  • 1. Analytics in the Cloud Peter Sirota, GM Elastic MapReduce
  • 2. Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor.
  • 3. What is Big Data? Terabytes of semi-structured log data in which businesses want to:  find correlations/perform pattern matching  generate recommendations  calculate advanced statistics (i.e., TP99) Twitter “Firehose”  50 million tweets per day  1,400% growth per year  How can advertisers drink from it? Social graphs  Value increases with exponential growth in data connections Big Data is full of valuable, unanswered questions!
  • 4. Why is Big Data Hard (and Getting Harder)? Today’s Data Warehouses  Need to consolidate from multiple data sources in multiple formats across multiple businesses  Unconstrained growth of this business-critical information Today’s Users  Expect faster response time of fresher data  Sampling is not good enough and history is important  Demand inexpensive experimentation with new data  Become increasingly sophisticated Data Scientists Current systems don’t scale (and weren’t meant to)  Long time to provision more infrastructure  Specialized DB expertise required  Expensive and inelastic solutions We need tools built specifically for Big Data!
  • 5. What is this thing called Hadoop? Dealing with Big Data requires two things:  Distributed, scalable storage  Inexpensive, flexible analytics Apache Hadoop is an open source software platform that addresses both of these needs  Includes a fault‐tolerant, distributed storage system (HDFS) developed for commodity servers  Uses a technique called MapReduce to carry out exhaustive analysis over huge distributed data sets Key benefits  Affordable – Cost / TB is a fraction of traditional options  Proven at scale – Numerous petabyte implementations in production; linear scalability  Flexible – Data can be stored with or without schema
  • 6. RDBMS vs. MapReduce/Hadoop RDBMS MapReduce/Hadoop  Predefined schema  No schema is required  Strategic data placement for query  Random data placement tuning  Fast scan of the entire dataset  Exploit indexes for fast retrieving  Uniform query performance  SQL only  Linearly scales for reads and  Doesn’t scale linearly writes  Support many languages including SQL Complementary technologies
  • 7.
  • 8. Why Amazon Elastic MapReduce? Managed Apache Hadoop Web Service  Monitor thousands of clusters per day  Use cases span from University students to Fortune 50 Reduces complexity of Hadoop management  Handles node provisioning, customization, and shutdown  Tunes Hadoop to your hardware and network  Provides tools to debug and monitor your Hadoop clusters Provides tight integration with AWS services  Improved performance working with S3  Automatic re-provisioning on node failure  Dynamic expanding/shrinking of cluster size  Spot integration
  • 9. Elastic MapReduce Key Features Simplified Cluster Configuration/Management  Resize running job flows  Support for EIP/IAM/Tagging  Workload-specific configurations  Bootstrap Actions Enhanced Monitoring/Debugging  Free CloudWatch Metrics / Alarms  Hadoop Metrics in Console  Ganglia Support Improved Performance  S3 Multipart Upload  Cluster Compute Instances
  • 10. Analytics Use Cases Targeted advertising / Clickstream analysis Data warehousing applications Bio-informatics (Genome analysis) Financial simulation (Monte Carlo simulation) File processing (resize jpegs) Web indexing Data mining and BI
  • 11. APACHE H IVE DATA WAREHOUSE FOR H ADOOP Open source project started at Facebook Turns data on Hadoop into a virtually limitless data warehouse Provides data summarization, ad hoc querying and analysis Enables SQL-like queries on structured and unstructured data  E.g. arbitrary field separators possible such as “,” in CSV file formats Inherits linear scalability of Hadoop
  • 12. AWS Data Warehousing Architecture
  • 13. Elastic Data Warehouse Customize cluster size to support varying resource needs (e.g. query support during the day versus batch processing overnight) Reduce costs by increasing server utilization Improve performance during high usage periods Data Warehouse (Batch Processing) Data Warehouse Data Warehouse (Steady State) (Steady State) Shrink to Expand to 9 instances 25 instances
  • 14. Reducing Costs with Spot Instances Mix Spot and On-Demand instances to reduce cost and accelerate computation while protecting against interruption Scenario #1 Scenario #2 #1: Cost without Spot Job Flow 4 instances *14 hrs * $0.50 = $28 Job Flow #2: Cost with Spot 4 instances *7 hrs * $0.50 = $13 + 5 instances * 7 hrs * $0.25 = $8.75 Total = $21.75 Duration: Duration: 14 Hours Time Savings: 50% 7 Hours Cost Savings: ~22% Other EMR + Spot Use Cases Run entire cluster on Spot for biggest cost savings Reduce the cost of application testing
  • 15. Monitoring Clusters with CloudWatch Free CloudWatch Metrics and Alarms  Track Hadoop job progress  Alarm on degradations in cluster health  Monitor aggregate Elastic MapReduce usage
  • 16. Big Data Ecosystem And Tools We have a rapidly growing ecosystem and will continue to integrate with a wide range of partners. Some examples: Business Intelligence  MicroStrategy, Pentaho Analytics  Datameer, Karmasphere, Quest Open source  Ganglia, SQuirrel SQL
  • 17. Resources Amazon Elastic MapReduce aws.amazon.com/elasticmapreduce aws.amazon.com/articles/Elastic-MapReduce forums.aws.amazon.com/forum.jspa?forumID=52