Big Data on AWS
Paul Duffy
Characteristics of Big Data
How the Cloud Is Big Data’s Best Friend
Big Data on the Cloud In the Real World
Characteristics of Big Data
The cost of data generation is falling rapidly



Dramatic increase in volume, velocity and variety of data
BIG DATA
A collection of tools, techniques and technologies that
allow you to work productively with data at any scale.
Big Data is Getting Bigger

 • 2.7 zettabytes in 2012
 • Over 90% will be unstructured
 • Data spread across a wide array of silos
Features driven by MapReduce
Variable data structures and sources
Computer Generated
 • Application server logs (web sites, games)
 • Sensor data (weather, water, smart grids)
 • Images/videos (traffic, security cameras)
Human Generated
 • Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year
 • Blogs/Reviews/Emails/Pictures
 • Social Graphs: Facebook, LinkedIn, Contacts
The Role of Data is Changing
Traditional analytics required a fixed data model, based on pre-known questions

Big Data promotes data exploration and experimentation, which leads to innovation
Generation -> Collection & storage -> Computation & analytics -> Collaboration & sharing
Generation: lower costs, faster throughput


                                   Increased pressure on traditional IT and too
Require tools designed for data collection and computation at any volume, velocity or format.
Software
 •   Designed for distribution
 •   Easy programming models
 •   Flexible language choice
 •   Platform for abstraction and ecosystem


 • Good example: Hadoop
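The "easy programming models" point can be illustrated with the map/shuffle/reduce pattern that Hadoop popularized. What follows is a minimal single-process sketch in plain Python, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample log lines are hypothetical, chosen only to show the shape of the model.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every whitespace-separated token in one record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce over an iterable of text records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical application-server log lines.
logs = ["GET /home", "GET /cart", "POST /cart"]
counts = word_count(logs)
```

Because each call to `map_phase` looks at one record and each call to `reduce_phase` looks at one key, a distributed framework is free to spread both phases across many machines; the sketch only runs them in sequence.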
Infrastructure
  •   Designed for distribution
  •   Easy programming models
  •   Flexible language choice
  •   Platform for abstraction and ecosystem


  • Good example: cloud computing
Software




           Infrastructure
How the Cloud Is
Big Data’s Best Friend
How do we define the cloud?
       By Benefits!
 • No Cap Ex
 • Elasticity
 • Pay Per Use
 • Fast Time to Market
 • Focus on core competency
Why is the Cloud
Big Data’s Best Friend?
We know we want to collect, store, organize, analyze and share it.

But we have limited resources.
The Cloud Optimizes
Precious IT Resources
i.e. Skilled People
“Over the next decade, the number of files or containers that encapsulate the information in the digital universe will grow by 75x, while the pool of IT staff available to manage them will grow only slightly, at 1.5x.”
 - 2011 IDC Digital Universe Study
Deploying a Hadoop cluster is hard
Cloud computing
 • The Old IT World: 30% using Big Data, 70% managing all of the “Undifferentiated Heavy Lifting”
 • Cloud-Based Infrastructure: 70% analyzing and using Big Data, 30% configuring cloud assets
 • Managed Services
 • Reusability
 • Scale
 • Innovation
The Cloud Optimizes
Capacity Resources
Elastic Compute Capacity
[Figure: four usage patterns (On and Off, Fast Growth, Variable peaks, Predictable peaks), annotated with WASTE and CUSTOMER DISSATISFACTION]
Elastic Compute Capacity
[Figure: capacity over time, comparing traditional IT capacity and elastic cloud capacity against your IT needs]
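The contrast in the capacity chart can be put into numbers. The sketch below is illustrative only: the demand series, the fixed capacity of 50 units and the function name `fixed_capacity_report` are assumptions, not figures from the presentation. It adds up how much fixed capacity sits idle (waste) and how much demand goes unserved (the source of customer dissatisfaction).

```python
def fixed_capacity_report(demand, capacity):
    """Return (waste, shortfall) for a fixed capacity against a demand series.

    waste     = capacity provisioned but not used, summed over all periods
    shortfall = demand that exceeded capacity, summed over all periods
    """
    waste = sum(max(capacity - d, 0) for d in demand)
    shortfall = sum(max(d - capacity, 0) for d in demand)
    return waste, shortfall


# Hypothetical demand per period, in arbitrary capacity units.
demand = [10, 40, 15, 80, 20, 60]

waste, shortfall = fixed_capacity_report(demand, capacity=50)

# Fixed provisioning pays for capacity in every period; an idealized
# elastic setup that tracks demand exactly pays only for what is used.
fixed_provisioned = 50 * len(demand)
elastic_provisioned = sum(demand)
```

With these assumed numbers the fixed setup provisions 300 units, wastes 115 of them and still misses 40 units of demand, while the idealized elastic setup provisions 225. Real elastic capacity does not track demand perfectly, so this is an upper bound on the benefit.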
The Cloud Empowers Users
to Balance Cost and Time
1 instance for 500 hours = 500 instances for 1 hour
                           I like this!
                             I scale
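The equivalence between 1 instance for 500 hours and 500 instances for 1 hour is one of cost, not of elapsed time. A small sketch makes this explicit, assuming a perfectly parallelizable job, simple per-instance-hour billing, and a hypothetical rate of 0.10 per instance-hour (the rate and the function name `run_plan` are not from the presentation):

```python
def run_plan(total_instance_hours, instances, hourly_rate):
    """Return (wall_clock_hours, cost) for spreading a job over `instances` machines.

    Assumes the work divides evenly and billing is per instance-hour.
    """
    hours = total_instance_hours / instances
    cost = instances * hours * hourly_rate
    return hours, cost


serial_hours, serial_cost = run_plan(500, instances=1, hourly_rate=0.10)
parallel_hours, parallel_cost = run_plan(500, instances=500, hourly_rate=0.10)
```

Both plans consume 500 instance-hours and so cost the same under these assumptions, but the parallel plan finishes in 1 hour instead of 500. Workloads with serial steps or coordination overhead will not scale this cleanly.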
The Cloud
Reduces Cost
For Experimentation
The Cloud
Enables Collection and Storage
of Big Data
Simple Storage Service
[Figure: growth chart, vertical axis 0 to 1000, reaching 1 Trillion]
750k+ peak transactions per second
Global Accessibility
Regions:
 • US-WEST (N. California)
 • US-WEST (Oregon)
 • US-EAST (Virginia)
 • GOV CLOUD
 • SOUTH AMERICA (Sao Paulo)
 • EU-WEST (Ireland)
 • ASIA PAC (Tokyo)
 • ASIA PAC (Singapore)
Storage Costs are Declining
Big Data on the Cloud
In the Real World
Big Data Verticals
 • Media/Advertising: Targeted Advertising; Image and Video Processing
 • Oil & Gas: Seismic Analysis
 • Retail: Recommend; Transactions Analysis
 • Life Sciences: Genome Analysis
 • Financial Services: Monte Carlo Simulations; Risk Analysis
 • Security: Anti-virus; Fraud Detection; Image Recognition
 • Social Network/Gaming: User Demographics; Usage analysis; In-game metrics
Visualizations
Bank – Monte Carlo Simulations
23 Hours to 20 Minutes
“The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter
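Monte Carlo risk simulation suits elastic capacity because simulated paths are independent of one another, so batches can run on separate machines and be merged afterwards. The following is a minimal sketch of that idea, not Bankinter's model: the normal return distribution, the 2% volatility, the batch sizes and the function names `simulate_batch` and `value_at_risk` are all illustrative assumptions.

```python
import random


def simulate_batch(seed, n_paths, mean=0.0, stdev=0.02):
    """Simulate n_paths one-period returns drawn from a normal distribution.

    Each batch uses its own seeded generator, so it depends on nothing
    outside its arguments and could run on any machine.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, stdev) for _ in range(n_paths)]


def value_at_risk(returns, confidence=0.99):
    """Return the loss exceeded in roughly (1 - confidence) of simulated paths."""
    ordered = sorted(returns)
    index = int((1.0 - confidence) * len(ordered))
    return -ordered[index]


# Eight independent batches; here they run in a loop, but nothing prevents
# handing each seed to a different instance and collecting the results.
batches = [simulate_batch(seed, 10_000) for seed in range(8)]
all_returns = [r for batch in batches for r in batch]
var_99 = value_at_risk(all_returns)
```

Doubling the number of machines roughly halves the elapsed time for the simulation step in a setup like this, which is the kind of trade-off the quote describes as deciding "how fast we want to obtain simulation results".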
Recommendations




The Taste Test http://www.etsy.com/tastetest
Recommendations
Gift Ideas for Facebook Friends




etsy.com/gifts
Click Stream Analysis
User recently purchased a sports movie and is searching for video games -> Targeted Ad (1.7 Million per day)
Thank you

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

16h30 p duff-big-data-final

  • 1. BIG Data on AWS Paul Duffy
  • 2. Characteristics of Big Data; How the Cloud Is Big Data’s Best Friend; Big Data on the Cloud In the Real World
  • 4. The cost of data generation is falling rapidly Dramatic increase in volume, velocity and variety of data
  • 5. BIG DATA A collection of tools, techniques and technologies that allow you to work productively with data at any scale.
  • 6. Big Data is Getting Bigger: 2.7 Zettabytes in 2012; Over 90% will be unstructured; Data spread across a wide array of silos
  • 7. Features driven by MapReduce
  • 8. Variable data structures and sources. Computer Generated: Application server logs (web sites, games); Sensor data (weather, water, smart grids); Images/videos (traffic, security cameras). Human Generated: Twitter “Fire Hose” 50m tweets/day, 1,400% growth per year; Blogs/Reviews/Emails/Pictures; Social Graphs: Facebook, Linked-in, Contacts
  • 9. The Role of Data is Changing
  • 10. Traditional analytics required a fixed data model, based on pre-known questions. Big Data promotes data exploration and experimentation which leads to innovation.
  • 11. Generation; Collection & storage; Computation & analytics; Collaboration & sharing
  • 12. Lower costs, faster throughput. Generation; Collection & storage; Computation & analytics; Collaboration & sharing. Increased pressure on traditional IT and too
  • 13. Require tools designed for data collection and computation at any volume, velocity or format.
  • 14. Software • Designed for distribution • Easy programming models • Flexible language choice • Platform for abstraction and ecosystem • Good example: Hadoop
  • 15. Infrastructure • Designed for distribution • Easy programming models • Flexible language choice • Platform for abstraction and ecosystem • Good example: cloud computing
  • 16. Software Infrastructure
  • 17. How the Cloud Is Big Data’s Best Friend
  • 18. How do we define the cloud? By Benefits!
  • 19. Cloud: No Cap Ex; Pay Per Use; Elasticity; Fast Time to Market; Focus on core competency
  • 20. Why is the Cloud Big Data’s Best Friend?
  • 21. We know we want to collect, store, organize, analyze and share it. But we have limited resources.
  • 22. The Cloud Optimizes Precious IT Resources i.e. Skilled People
  • 23. “Over the next decade, the number of files or containers that encapsulate the information in the digital universe will grow by 75x. While the pool of IT staff available to manage them will grow only slightly. At 1.5x” - 2011 IDC Digital Universe Study
  • 24. Deploying a Hadoop cluster is hard
  • 25. Cloud computing. The Old IT World: 30% Using Big Data; 70% Managing All of the “Undifferentiated Heavy Lifting”
  • 26. Cloud computing. The Old IT World: 30% Using Big Data; 70% Managing All of the “Undifferentiated Heavy Lifting”. Cloud-Based Infrastructure: 70% Analyzing and Using Big Data; 30% Configuring Cloud Assets
  • 27. Managed Services; Reusability; Scale; Innovation
  • 28. Managed Services; Reusability; Scale; Innovation
  • 29. Managed Services; Reusability; Scale; Innovation
  • 30. Managed Services; Reusability; Scale; Innovation
  • 31. Managed Services; Reusability; Scale; Innovation
  • 33. Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks
  • 34. Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks. WASTE; CUSTOMER DISSATISFACTION
  • 35. Elastic Compute Capacity (chart of capacity over time: traditional IT capacity and elastic cloud capacity against your IT needs)
  • 36. Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks
  • 37. The Cloud Empowers Users to Balance Cost and Time
  • 38. 1 instance for 500 hours = 500 instances for 1 hour I like this! I scale
  • 39. The Cloud Reduces Cost For Experimentation
  • 40. The Cloud Enables Collection and Storage of Big Data
  • 41. Simple Storage Service: 1 Trillion objects stored (growth chart); 750k+ peak transactions per second
  • 42. Global Accessibility. Regions: US-WEST (N. California); EU-WEST (Ireland); GOV CLOUD; ASIA PAC (Tokyo); US-EAST (Virginia); US-WEST (Oregon); ASIA PAC (Singapore); SOUTH AMERICA (Sao Paulo)
  • 43. Storage Costs are Declining
  • 44. Big Data on the Cloud In the Real World
  • 45. Big Data Verticals. Social Network/Gaming: User Demographics, Usage analysis, In-game metrics. Media/Advertising: Targeted Advertising, Image and Video Processing. Financial Services: Monte Carlo Simulations, Risk Analysis. Oil & Gas: Seismic Analysis. Retail: Recommendations, Transactions Analysis. Life Sciences: Genome Analysis. Security: Anti-virus, Fraud Detection, Image Recognition
  • 47. Bank – Monte Carlo Simulations: 23 Hours to 20 Minutes. “The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter
  • 48. Recommendations The Taste Test http://www.etsy.com/tastetest
  • 49. Recommendations Gift Ideas for Facebook Friends etsy.com/gifts
  • 50.
  • 51. Click Stream Analysis: User recently purchased a sports movie and is searching for video games. Targeted Ad (1.7 Million per day)
  • 52. Characteristics of Big Data; How the Cloud Is Big Data’s Best Friend; Big Data on the Cloud In the Real World
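
The equivalence on slide 38 (1 instance for 500 hours = 500 instances for 1 hour) can be sketched as simple arithmetic. This is a minimal illustration, assuming a perfectly parallel workload and a flat price per instance-hour; `PRICE_PER_INSTANCE_HOUR`, `RunPlan` and `plan_run` are hypothetical names and values chosen for the example, not AWS figures or APIs.

```python
from dataclasses import dataclass

# Hypothetical flat price per instance-hour (illustrative only, not an AWS rate).
PRICE_PER_INSTANCE_HOUR = 0.10


@dataclass(frozen=True)
class RunPlan:
    instances: int
    wall_clock_hours: float
    cost: float


def plan_run(total_instance_hours: float, instances: int) -> RunPlan:
    """Split a fixed amount of work evenly across `instances`.

    Assumes the job parallelizes perfectly, so wall-clock time shrinks
    linearly while the billed instance-hours stay constant.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    wall_clock_hours = total_instance_hours / instances
    cost = total_instance_hours * PRICE_PER_INSTANCE_HOUR
    return RunPlan(instances, wall_clock_hours, cost)


if __name__ == "__main__":
    # Same 500 instance-hours of work, spread over more and more machines.
    for n in (1, 50, 500):
        print(plan_run(500, n))
```

Under these assumptions the cost column is identical for every row and only the wall-clock time changes, which is the trade-off slide 37 describes. Real jobs rarely parallelize perfectly, so treat the linear speed-up as an upper bound.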

Editor's notes

  1. According to IDC, 95% of the 1.2 zettabytes of data in the digital universe is unstructured, and 70% of this is user-generated content. Unstructured data is also projected for explosive growth, with estimates of compound annual growth (CAGR) at 62% from 2008 to 2012. Challenges: unconstrained growth.
  2. The more misspelled words you collect, the better your spellcheck application becomes.
  3. Computers typically generate data as a byproduct of interacting with people or with other devices. The more interactions there are, the more data there typically is. This data comes in a variety of formats, from semi-structured logs to unstructured binaries. This data can be extremely valuable. It can be used to understand and track application or service behavior so that we can find errors or suboptimal user experience. We can mine it for patterns and correlations to generate recommendations. For example, ecommerce sites can analyze user access logs to provide product recommendations, social networking sites provide new friend recommendations, dating sites find qualified soul mates, and so forth.
  4. Big data is important.
  5. Now the philosophy around data has changed. The philosophy is to collect as much data as possible before you know what questions you are going to ask and, most importantly, before you know which algorithms you are going to use, because you don't know what type of questions you might need to ask in the future. The ultimate mantra is to collect and measure everything. How you are going to refine those algorithms, how much data, how much processing power: you really don't know how many resources you need. Big data is what clouds are for. Big data analysis and cloud computing are the perfect marriage. Free of constraints: collect and store without limits; compute and analyze without limits; visualize without limits.
  6. These resources are even more precious because of the rarity of skills.
  7. Our goal, and what our customers tell us they see, is that this ratio is inverted after moving to AWS. When you move your infrastructure to the cloud, this changes things drastically. Only 30% of your time should be spent architecting for the cloud and configuring your assets. This gives you 70% of your time to focus on your business. Project teams are free to add value to the business and its customers, to innovate more quickly, and to deliver products to market quickly as well.
  8. Our goal, and what our customers tell us they see, is that this ratio is inverted after moving to AWS. When you move your infrastructure to the cloud, this changes things drastically. Only 30% of your time should be spent architecting for the cloud and configuring your assets. This gives you 70% of your time to focus on your business. Project teams are free to add value to the business and its customers, to innovate more quickly, and to deliver products to market quickly as well.
  9. There are many patterns of usage that make capacity planning a complex science: on and off usage patterns, where capacity is only needed at fixed times and not at others; fast growth, where an online service becomes so successful that step changes in traditional capacity need to be added; variable peaks, where you just don't know what demand will be when and best guess applies; and predictable peaks, such as during commute times as customers use mobile devices to access your service.
  10. Each of these examples is typified by wasted IT resources. Where you planned correctly, the IT resources will be over provisioned so that services are not impacted and customers lost during high demand. In the worst cases, that capacity will not be enough, and customer dissatisfaction will result. Most businesses have a mix of differing patterns at play, and much time and resource is dedicated to planning and management to ensure services are always available. And when a new online service is really successful, you often can't ship in new capacity fast enough. Some say that's a nice problem to have, but those that have lived through it will tell you otherwise!
  11. Elasticity with AWS enables your provisioned capacity to follow demand. To scale up when needed and down when not. And as you only pay for what is used, the savings can be significant.
  12. You control how and when your service scales, so you can closely match increasing load in small increments, scale up fast when needed, and cool off and reduce the resources being used at any time of day. Even the most variable and complex demand patterns can be matched with the right amount of capacity - all automatically handled by AWS.
  13. Horizontal scaling on commodity hardware. Perfect for Hadoop.
  14. New model is to collect as much data as possible: the “Data-First Philosophy”. Allows us to collect data and ask questions later. Ask many different kinds of questions.
  15. And scale is something AWS is used to dealing with. The Amazon Simple Storage Service, S3, recently passed 1 trillion objects in storage, with a peak transaction rate of 750 thousand per second. That's a lot of objects, all stored with 11 9's of durability.
  16. And just like an electricity grid, where you would not wire every factory to the same power station, the AWS infrastructure is global, with multiple regions around the globe from which services are available. This means you have control over things like where your applications run, where your data is stored, and where best to serve your customers from.
  17. Global reach (North Pole, Space). Native app for every smartphone; SMS; web; mobile-web. 10M+ users, 15M+ venues, ~1B check-ins. Terabytes of log data.
  18. Bank: at least 400,000 simulations to get realistic results. 23 hours to 20 minutes and dramatically reduced processing, with the ability to reduce even further when required. Bankinter uses Amazon Web Services (AWS) as an integral part of their credit-risk simulation application, developing complex algorithms to simulate diverse scenarios in order to evaluate the financial health of their clients. “This requires high computational power,” says Bankinter Director of New Technologies Pedro Castillo. “We need to execute at least 400,000 simulations to get realistic results.”
  19. One result of such experimentation is Taste Test, which is a recommendations product that helps Etsy figure out your tastes and offer you relevant products. It works like this: you see 6 images at a time and you pick the image you like the most. You iterate through these sets of images a few times (you can also skip a set if you don’t like any images) and after a few iterations, Etsy displays the products that are most relevant to you. I encourage you to try – it’s a lot of fun. Today, Etsy uses Amazon Elastic MapReduce for web log analysis and recommendation algorithms. Because AWS easily and economically processes enormous amounts of data, it’s ideal for the type of processing that Etsy performs. Etsy copies its HTTP server logs every hour to Amazon S3, and syncs snapshots of the production database on a nightly basis. The combination of Amazon’s products and Etsy’s syncing/storage operation provides substantial benefits for Etsy. As Dr. Jason Davis, lead scientist at Etsy, explains, “the computing power available with [Amazon Elastic MapReduce] allows us to run these operations over dozens or even hundreds of machines without the need for owning the hardware.” Dr. Davis goes on to say, “Amazon Elastic MapReduce enables us to focus on developing our Hadoop-based analysis stack without worrying about the underlying infrastructure. As our cycles shift between development and research, our software and analysis requirements change and expand constantly, and [Amazon Elastic MapReduce] effectively eliminates half of our scaling issues, allowing us to focus on what is most important.” Etsy has realized improved results and performance by architecting their application for the cloud, with robustness and fault tolerance in mind, while providing a market for users to buy and sell handmade items online.
  20. Another example of such innovation is gift ideas. A lot of us struggle to pick the right present for our friends, so Etsy has a product that makes it easier. Etsy looks at your Facebook social graph and learns about your interests and those of your friends. It uses this information to give you ideas for presents. For example, if your friend is an REM fan, Etsy may suggest a t-shirt with an REM print on it. These innovative data products are just a few examples of the innovation that is possible if we lower the cost barriers for data experimentation.
  21. Yelp is also doing product recommendations based on location, people's reviews, or people's searches. For example, the “people who viewed this, viewed that” feature can help customers discover other relevant options in the area. People can discover interesting facts about places with the “People viewed this after searching for that” feature. In this example, the Westin hotel probably has glass elevators and likely offers the best location to stay in San Francisco, at least by some definition of best. There is also a “review highlights” feature. Yelp analyses written reviews and provides highlights about the places, so that their customers don’t have to read through all the reviews to get a basic idea about the place. All these differentiating features were possible because of Hadoop and flexible infrastructure for data processing.
  22. 500% increase in returns for advertising. Petabytes of storage. There is a lot of data the retail business has about its users; it has just never used it in advertising. For example, the retailer knows that the customer has purchased a sports movie and is currently searching for video games, so it may make sense to advertise a sports video game to the customer. Efficient: Elastic infrastructure from AWS allows capacity to be provisioned as needed based on load, reducing cost and the risk of processing delays. Amazon Elastic MapReduce and Cascading let Razorfish focus on application development without having to worry about time-consuming set-up, management, or tuning of Hadoop clusters or the compute capacity upon which they sit. Ease of integration: Amazon Elastic MapReduce with Cascading allows data processing in the cloud without any changes to the underlying algorithms. Flexible: Hadoop with Cascading is flexible enough to allow “agile” implementation and unit testing of sophisticated algorithms. Adaptable: Cascading simplifies the integration of Hadoop with external ad systems. Scalable: AWS infrastructure helps Razorfish reliably store and process huge (Petabytes) data sets. The AWS elastic infrastructure platform allows Razorfish to manage wide variability in load by provisioning and removing capacity as needed. Mark Taylor, Program Director at Razorfish, said, “With our implementation of Amazon Elastic MapReduce and Cascading, there was no upfront investment in hardware, no hardware procurement delay, and no additional operations staff was hired. We completed development and testing of our first client project in six weeks. Our process is completely automated. Total cost of the infrastructure averages around $13,000 per month. Because of the richness of the algorithm and the flexibility of the platform to support it at scale, our first client campaign experienced a 500% increase in their return on ad spend from a similar campaign a year before.”
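
The elasticity argument in notes 9 to 12 (fixed capacity is either wasted or insufficient, while elastic capacity follows demand) can be illustrated with a small calculation. This is a rough sketch under stated assumptions: demand is measured in abstract capacity units per hour, elastic capacity is assumed to track demand exactly, and the demand series and function names are hypothetical, not taken from the talk.

```python
from typing import Sequence, Tuple


def fixed_capacity_usage(demand: Sequence[int], provisioned: int) -> Tuple[int, int]:
    """Return (wasted, unmet) unit-hours for a fixed provisioned capacity.

    wasted: capacity paid for but not used (the "WASTE" on slide 34).
    unmet:  demand that could not be served ("CUSTOMER DISSATISFACTION").
    """
    wasted = sum(max(provisioned - d, 0) for d in demand)
    unmet = sum(max(d - provisioned, 0) for d in demand)
    return wasted, unmet


def elastic_capacity_usage(demand: Sequence[int]) -> int:
    """Unit-hours paid for when capacity follows demand exactly (an idealization)."""
    return sum(demand)


if __name__ == "__main__":
    # Hypothetical hourly demand with a predictable peak (capacity units).
    demand = [2, 2, 3, 8, 10, 8, 3, 2]

    for provisioned in (10, 5):
        wasted, unmet = fixed_capacity_usage(demand, provisioned)
        print(f"fixed at {provisioned}: wasted={wasted}, unmet={unmet}")

    print(f"elastic: paid for {elastic_capacity_usage(demand)} unit-hours")
```

Provisioning for the peak leaves no unmet demand but pays for idle capacity most of the day; provisioning below the peak trims the waste and drops requests instead. In practice scaling reacts with some delay and in discrete steps, so real elastic usage sits somewhat above the idealized figure.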