Amazon Web Services
Big Data and the Cloud : A Best Friend Story
Joe Ziegler
Technical Evangelist
zieglerj@amazon.com    @jiyosub
Characteristics of
    Big Data


              How the Cloud Is
            Big Data’s Best Friend


                        Big Data on the Cloud
                          In the Real World
Characteristics of
    Big Data
BIG DATA
When your data sets become so large that you have to start
innovating in how you collect, store, organize, analyze and share them
Bigger Data
     is
Better Data
Features driven by MapReduce
Bigger Data
    is
Harder Data
Big Data is Getting Bigger

• 2.7 zettabytes in 2012
• Over 90% will be unstructured
• Data spread across a wide array of silos
Why is Big Data Hard (and Getting Harder)?

         Changing Data Requirements
      Faster response times on fresher data
Sampling is not good enough & history is important
       Increasing complexity of analytics
   Users demand inexpensive experimentation
Where is it Coming From?

Computer Generated
• Application server logs (web sites, games)
• Sensor data (weather, water, smart grids)
• Images/videos (traffic, security cameras)

Human Generated
• Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year
• Blogs/Reviews/Emails/Pictures
• Social Graphs: Facebook, LinkedIn, Contacts
The Role of Data
  is Changing
Until now, the questions you asked drove the data model.

The new model is to collect as much data as possible:
the “Data-First Philosophy”
Data is the new raw material for any business,
on par with capital, people, labor
We Need Tools Built Specifically
         for Big Data
Hadoop




• Scale out Easily
• Parallel Computing
• Commodity Hardware

• Solves some Problems
• Complex to Run
• Special Skills to Maintain
How the Cloud Is
Big Data’s Best Friend
How do we define the cloud?
       By Benefits!
Cloud:
• No Cap Ex
• Pay Per Use
• Elasticity
• Fast Time to Market
• Focus on core competency
Why is the Cloud
Big Data’s Best Friend
We know we want to collect, store,
organize, analyze and share it.
But we have limited resources.
The Cloud Optimizes
Precious IT Resources
  i.e. Skilled People
“Over the next decade, the number of files or containers
that encapsulate the information in the digital universe
will grow by 75x.
While the pool of IT staff available to manage them will
grow only slightly. At 1.5x”
                                - 2011 IDC Digital Universe Study
Deploying a Hadoop cluster is hard
Cloud computing

The Old IT World:
  30% Using Big Data
  70% Managing All of the “Undifferentiated Heavy Lifting”

Cloud-Based Infrastructure:
  70% Analyzing and Using Big Data
  30% Configuring Cloud Assets
• Reusability
• Managed Services
• Scale
• Innovation
The Cloud Optimizes
Capacity Resources
Elastic Compute Capacity
[Figure: four usage patterns: On and Off, Fast Growth, Variable peaks, Predictable peaks]
Elastic Compute Capacity
[Figure: On and Off, Fast Growth, Variable peaks and Predictable peaks, annotated with WASTE and CUSTOMER DISSATISFACTION]
Elastic Compute Capacity
[Chart: Capacity over Time, comparing Traditional IT capacity and Elastic cloud capacity against Your IT needs]
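
The WASTE and CUSTOMER DISSATISFACTION labels on the elasticity slides can be made concrete with a small calculation. The sketch below is illustrative only: the demand figures are invented, "waste" is assumed to mean capacity provisioned above need, and "shortfall" is assumed to mean need above capacity. The elastic case is idealized as tracking demand exactly.

```python
def capacity_gap(demand, capacity):
    """Return (waste, shortfall) in capacity-hours.

    waste: capacity provisioned but not needed (assumed meaning of WASTE).
    shortfall: demand that could not be served (assumed meaning of
    CUSTOMER DISSATISFACTION).
    """
    waste = sum(max(c - d, 0) for d, c in zip(demand, capacity))
    shortfall = sum(max(d - c, 0) for d, c in zip(demand, capacity))
    return waste, shortfall


# Hypothetical hourly demand, in server units (not from the slides).
demand = [2, 3, 8, 12, 6, 3]

# Traditional IT: one fixed capacity chosen up front.
fixed = [8] * len(demand)

# Idealized elastic cloud: capacity follows demand hour by hour.
elastic = list(demand)

fixed_waste, fixed_shortfall = capacity_gap(demand, fixed)
elastic_waste, elastic_shortfall = capacity_gap(demand, elastic)

print("fixed:  ", fixed_waste, "wasted,", fixed_shortfall, "unserved")
print("elastic:", elastic_waste, "wasted,", elastic_shortfall, "unserved")
```

With a fixed capacity the same provisioning decision produces both idle hours and unserved peaks; real elastic capacity would sit somewhere between the two cases, since scaling is never instantaneous.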
The Cloud
Empowers Users to Balance
     Cost and Time
1 instance for 500 hours
            =
500 instances for 1 hour
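
The equality above is a statement about instance-hours: the bill depends on instances multiplied by hours, so the same work can be bought slowly or quickly for the same price. A minimal sketch, assuming the job parallelizes perfectly and is billed at a flat hourly rate (the rate of 10 cents is hypothetical, and `job_plan` is an illustrative helper, not an AWS API):

```python
def job_plan(total_instance_hours, instances, rate_cents_per_hour):
    """Return (wall_clock_hours, cost_in_cents) for a perfectly parallel job."""
    wall_clock_hours = total_instance_hours / instances
    cost_cents = instances * wall_clock_hours * rate_cents_per_hour
    return wall_clock_hours, cost_cents


RATE_CENTS = 10  # hypothetical price per instance-hour

slow_hours, slow_cost = job_plan(500, instances=1, rate_cents_per_hour=RATE_CENTS)
fast_hours, fast_cost = job_plan(500, instances=500, rate_cents_per_hour=RATE_CENTS)

print(slow_hours, "hours for", slow_cost, "cents")
print(fast_hours, "hours for", fast_cost, "cents")
```

Under these assumptions the cost is identical and only the waiting time changes. Jobs with serial steps or coordination overhead will not scale this cleanly.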
The Cloud
   Reduces Cost
For Experimentation
The Cloud
Enables Collection and Storage
         of Big Data
Simple Storage Service
[Chart: growth to 1 Trillion, vertical axis from 0 to 1,000]
750k+ peak transactions per second
Global Accessibility

Regions:
• US-WEST (N. California)
• US-WEST (Oregon)
• US-EAST (Virginia)
• GOV CLOUD
• SOUTH AMERICA (Sao Paulo)
• EU-WEST (Ireland)
• ASIA PAC (Tokyo)
• ASIA PAC (Singapore)
Storage Costs are Declining
Big Data on the Cloud
  In the Real World
Big Data Verticals

• Media/Advertising: Targeted Advertising; Image and Video Processing
• Oil & Gas: Seismic Analysis
• Retail: Recommend; Transactions Analysis
• Life Sciences: Genome Analysis
• Financial Services: Monte Carlo Simulations; Risk Analysis
• Security: Anti-virus; Fraud Detection; Image Recognition
• Social Network/Gaming: User Demographics; Usage analysis; In-game metrics
Visualizations
Bank – Monte Carlo Simulations

23 Hours to 20 Minutes

“The AWS platform was a good fit for its unlimited and flexible
computational power to our risk-simulation process requirements.
With AWS, we now have the power to decide how fast we want to obtain
simulation results, and, more importantly, we have the ability to run
simulations not possible before due to the large amount of
infrastructure required.” – Castillo, Director, Bankinter
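
Monte Carlo simulations scale out well because each batch of trials is independent and the results are simply added together. The sketch below is not Bankinter's model: the Gaussian return, its parameters, the loss threshold and the function names are all assumptions chosen for illustration. Each call to `simulate_chunk` could in principle run on a separate instance; here the chunks run one after another to keep the example self-contained.

```python
import random


def simulate_chunk(seed, trials, threshold=-0.05):
    """Count trials whose simulated return falls below the threshold.

    Uses a hypothetical model: returns drawn from a Gaussian with
    mean 0 and standard deviation 0.02.
    """
    rng = random.Random(seed)  # independent, reproducible stream per chunk
    hits = 0
    for _ in range(trials):
        if rng.gauss(0.0, 0.02) < threshold:
            hits += 1
    return hits


def estimate_tail_probability(chunks, trials_per_chunk):
    """Combine independent chunks into one probability estimate."""
    total_hits = sum(simulate_chunk(seed, trials_per_chunk)
                     for seed in range(chunks))
    return total_hits / (chunks * trials_per_chunk)


estimate = estimate_tail_probability(chunks=20, trials_per_chunk=5_000)
print("estimated probability of a loss worse than 5%:", estimate)
```

Because the chunks share nothing but the final sum, doubling the number of workers roughly halves the wall-clock time, which is the kind of trade the slide's headline describes.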
Recommendations




        The Taste Test
http://www.etsy.com/tastetest
Recommendations

Gift Ideas for Facebook Friends




         etsy.com/gifts
Click Stream Analysis

User recently purchased a sports movie and is searching for video games
→ Targeted Ad (1.7 Million per day)
Characteristics of
    Big Data


              How the Cloud Is
            Big Data’s Best Friend


                        Big Data on the Cloud
                          In the Real World
Questions?
Joe Ziegler
Technical Evangelist
zieglerj@amazon.com    @jiyosub

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Recently uploaded (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Big Data & The Cloud

  • 1. Amazon Web Services. Big Data and the Cloud: A Best Friend Story
  • 3. Characteristics of Big Data; How the Cloud Is Big Data’s Best Friend; Big Data on the Cloud in the Real World
  • 5. BIG DATA: When your data sets become so large that you have to start innovating how to collect, store, organize, analyze and share it
  • 6. Bigger Data is Better Data
  • 7. Features driven by MapReduce
  • 8. Bigger Data is Harder Data
  • 9. Big Data is Getting Bigger: 2.7 zettabytes in 2012; over 90% will be unstructured; data spread across a wide array of silos
  • 10. Why is Big Data Hard (and Getting Harder)? Changing data requirements; faster response time on fresher data; sampling is not good enough and history is important; increasing complexity of analytics; users demand inexpensive experimentation
  • 11. Where is it Coming From? Computer Generated: application server logs (web sites, games); sensor data (weather, water, smart grids); images/videos (traffic, security cameras). Human Generated: Twitter “Fire Hose” (50m tweets/day, 1,400% growth per year); blogs/reviews/emails/pictures; social graphs (Facebook, LinkedIn, contacts)
  • 12. The Role of Data is Changing
  • 13. Until now, the questions you asked drove the data model. The new model is to collect as much data as possible: the “Data-First Philosophy”
  • 14. Data is the new raw material for any business, on par with capital, people, labor
  • 15. We Need Tools Built Specifically for Big Data
  • 16. Hadoop: Scale out Easily; Parallel Computing; Commodity Hardware. Solves some Problems; Complex to Run; Special Skills to Maintain
  • 17. How the Cloud Is Big Data’s Best Friend
  • 18. How do we define the cloud? By Benefits!
  • 19. Cloud: No Cap Ex; Pay Per Use; Elasticity; Fast Time to Market; Focus on core competency
  • 20. Why is the Cloud Big Data’s Best Friend
  • 21. We know we want to collect, store, organize, analyze and share it. But we have limited resources.
  • 22. The Cloud Optimizes Precious IT Resources i.e. Skilled People
  • 23. “Over the next decade, the number of files or containers that encapsulate the information in the digital universe will grow by 75x, while the pool of IT staff available to manage them will grow only slightly, at 1.5x.” - 2011 IDC Digital Universe Study
  • 24. Deploying a Hadoop cluster is hard
  • 25. Cloud computing. The Old IT World: 30% Using Big Data; 70% Managing All of the “Undifferentiated Heavy Lifting”
  • 26. Cloud computing. The Old IT World: 30% Using Big Data; 70% Managing All of the “Undifferentiated Heavy Lifting”. Cloud-Based Infrastructure: 70% Analyzing and Using Big Data; 30% Configuring Cloud Assets
  • 27-31. Managed Services; Reusability; Scale; Innovation
  • 33. Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks
  • 34. Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks (chart labels: WASTE; CUSTOMER DISSATISFACTION)
  • 35. Elastic Compute Capacity (chart of capacity over time): Traditional IT capacity; Elastic cloud capacity; Your IT needs
  • 36. Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks
  • 37. The Cloud Empowers Users to Balance Cost and Time
  • 38. 1 instance for 500 hours = 500 instances for 1 hour
  • 39. The Cloud Reduces Cost For Experimentation
  • 40. The Cloud Enables Collection and Storage of Big Data
  • 41. Simple Storage Service: 1 trillion objects; 750k+ peak transactions per second
  • 42. Global Accessibility. Regions: US-WEST (N. California); EU-WEST (Ireland); GOV CLOUD; ASIA PAC (Tokyo); US-EAST (Virginia); US-WEST (Oregon); ASIA PAC (Singapore); SOUTH AMERICA (Sao Paulo)
  • 43. Storage Costs are Declining
  • 44. Big Data on the Cloud In the Real World
  • 45. Big Data Verticals. Social Network/Gaming: User Demographics, Usage Analysis, In-game Metrics. Media/Advertising: Targeted Advertising, Image and Video Processing. Financial Services: Monte Carlo Simulations, Risk Analysis. Oil & Gas: Seismic Analysis. Retail: Recommendations, Transactions Analysis. Life Sciences: Genome Analysis. Security: Anti-virus, Fraud Detection, Image Recognition
  • 47. Bank – Monte Carlo Simulations: 23 Hours to 20 Minutes. “The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter
  • 48. Recommendations The Taste Test http://www.etsy.com/tastetest
  • 49. Recommendations Gift Ideas for Facebook Friends etsy.com/gifts
  • 50.
  • 51. Click Stream Analysis: User recently purchased a sports movie and is searching for video games; Targeted Ad (1.7 Million per day)
  • 52. Characteristics of Big Data; How the Cloud Is Big Data’s Best Friend; Big Data on the Cloud in the Real World
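
The equivalence on slide 38 (1 instance for 500 hours = 500 instances for 1 hour) can be sketched as simple arithmetic. The snippet below is a minimal illustration and is not taken from the deck: the hourly rate is a hypothetical placeholder, and it assumes per-instance-hour billing and a job that splits evenly across instances with no coordination overhead.

```python
# Hypothetical price per instance-hour; the slides do not quote a rate.
HOURLY_RATE_USD = 0.10


def job_cost(instances, hours_each, rate=HOURLY_RATE_USD):
    """Cost of a job when billing is per instance-hour."""
    return instances * hours_each * rate


def wall_clock_hours(total_instance_hours, instances):
    """Elapsed time, assuming the work divides evenly across instances."""
    return total_instance_hours / instances


serial_cost = job_cost(instances=1, hours_each=500)
parallel_cost = job_cost(instances=500, hours_each=1)

print("cost, 1 instance x 500 hours:", serial_cost)
print("cost, 500 instances x 1 hour:", parallel_cost)
print("elapsed hours, 1 instance:", wall_clock_hours(500, 1))
print("elapsed hours, 500 instances:", wall_clock_hours(500, 500))
```

Under these assumptions the cost is the same in both cases and only the elapsed time changes, which is the trade-off between cost and time that slide 37 refers to. Real workloads rarely parallelize perfectly, so the elapsed-time figure is an upper bound on the speed-up.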

Editor's Notes

  1. The more misspelled words you collect, the better your spellcheck application becomes.
  2. Data volume. As the data volume increases, it becomes increasingly difficult to process the data. It is easy on one box and harder on many boxes, once the data exceeds the capacity of one place. Data structure. Data comes in a variety of formats, from log files to database schemas to images. The diversity in data structures and formats grows as well. To analyze this data holistically, it must be consolidated across multiple data sources and multiple formats. Since valuable data comes from various companies such as Facebook and LinkedIn, it must also be consolidated across businesses.
  3. According to IDC, 95% of the 1.2 zettabytes of data in the digital universe is unstructured, and 70% of this is user-generated content. Unstructured data is also projected for explosive growth, with estimates of compound annual growth (CAGR) at 62% from 2008 to 2012. Challenges: unconstrained growth.
  4. Finally, complexity increases because demands on data are changing. Business requires faster response time on fresher data. Sampling is not good enough; history is important. Did the customer purchase something in February because a friend has a birthday or because it was Valentine's Day? This information can help figure out how to help this customer next February. SQL is simply not enough to drive some of the answers. Data scientists require access to other statistical tools or other programming languages. Finally, and most importantly, users demand inexpensive experimentation. Often we don’t know what products or facts will come out of our analytics, so we cannot justify a large upfront investment.
  5. Computers typically generate data as a byproduct of interacting with people or with other devices. The more interactions, the more data there typically is. This data comes in a variety of formats, from semi-structured logs to unstructured binaries. This data can be extremely valuable. It can be used to understand and track application or service behavior so that we can find errors or suboptimal user experience. We can mine it for patterns and correlations to generate recommendations. For example, ecommerce sites can analyze user access logs to provide product recommendations, social networking sites provide new friend recommendations, dating sites find qualified soul mates, and so forth.
  6. Big data is important.
  7. Now the philosophy around data has changed: collect as much data as possible before you know what questions you are going to ask. Most importantly, you don't know which algorithms you are going to use, because you don't know what types of questions you might need to answer in the future. The ultimate mantra is to collect and measure everything. How you are going to refine those algorithms, how much data, how much processing power: you really don't know how many resources you will need. Big data is what clouds are for. Big data analysis and cloud computing are the perfect marriage. Free of constraints: collect and store without limits; compute and analyze without limits; visualize without limits.
  8. Data is the next industrial revolution. Today, the core of any successful company is the data it manages and its ability to effectively model, analyze and process that data quickly, almost in real time, so that it can make the right decision faster and rise to the top.
  9. These resources are even more precious because of the rarity of skills.
  10. Our goal, and what our customers tell us they see, is that this ratio is inverted after moving to AWS. When you move your infrastructure to the cloud, this changes things drastically. Only 30% of your time should be spent architecting for the cloud and configuring your assets. This gives you 70% of your time to focus on your business. Project teams are free to add value to the business and its customers, to innovate more quickly, and to deliver products to market quickly as well.
  11. Our goal, and what our customers tell us they see, is that this ratio is inverted after moving to AWS. When you move your infrastructure to the cloud, this changes things drastically. Only 30% of your time should be spent architecting for the cloud and configuring your assets. This gives you 70% of your time to focus on your business. Project teams are free to add value to the business and its customers, to innovate more quickly, and to deliver products to market quickly as well.
  12. There are many patterns of usage that make capacity planning a complex science. From on and off usage patterns, where capacity is only needed at fixed times and not at others, fast growth where an online service becomes so successful that step changes in traditional capacity need to be added, variable peaks - where you just don't know what demand will be when and best guess applies, to predictable peaks such as during commute times as customers use mobile devices to access your service.
  13. Each of these examples is typified by wasted IT resources. Where you planned correctly, the IT resources will be over-provisioned so that services are not impacted and customers are not lost during high demand. In the worst cases, that capacity will not be enough, and customer dissatisfaction will result. Most businesses have a mix of differing patterns at play, and much time and resource is dedicated to planning and management to ensure services are always available. And when a new online service is really successful, you often can't ship in new capacity fast enough. Some say that's a nice problem to have, but those who have lived through it will tell you otherwise!
  14. Elasticity with AWS enables your provisioned capacity to follow demand. To scale up when needed and down when not. And as you only pay for what is used, the savings can be significant.
  15. You control how and when your service scales, so you can closely match increasing load in small increments, scale up fast when needed, and cool off and reduce the resources being used at any time of day. Even the most variable and complex demand patterns can be matched with the right amount of capacity - all automatically handled by AWS.
  16. Horizontal scaling on commodity hardware. Perfect for Hadoop.
  17. The new model is to collect as much data as possible: the “Data-First Philosophy”. It allows us to collect data and ask questions later, and to ask many different kinds of questions.
  18. And scale is something AWS is used to dealing with. The Amazon Simple Storage Service, S3, recently passed 1 trillion objects in storage, with a peak transaction rate of 750 thousand per second. That's a lot of objects, all stored with 11 9's of durability.
  19. And just like an electricity grid, where you would not wire every factory to the same power station, the AWS infrastructure is global, with multiple regions around the globe from which services are available. This means you have control over things like where your applications run, where your data is stored, and where best to serve your customers from.
  20. Global reach (North Pole, Space). Native app on every smartphone; SMS; web; mobile web. 10M+ users, 15M+ venues, ~1B check-ins. Terabytes of log data.
  21. The bank needs at least 400,000 simulations to get realistic results. 23 hours to 20 minutes: dramatically reduced processing time, with the ability to reduce it even further when required. Bankinter uses Amazon Web Services (AWS) as an integral part of their credit-risk simulation application, developing complex algorithms to simulate diverse scenarios in order to evaluate the financial health of their clients. “This requires high computational power,” says Bankinter Director of New Technologies Pedro Castillo. “We need to execute at least 400,000 simulations to get realistic results.”
  22. One result of such experimentation is Taste Test, a recommendations product that helps Etsy figure out your tastes and offer you relevant products. It works like this: you see 6 images at a time and pick the image you like the most. You iterate through these sets of images a few times (you can also skip a set if you don’t like any of the images), and after a few iterations Etsy displays the products that are most relevant to you. I encourage you to try it; it’s a lot of fun. Today, Etsy uses Amazon Elastic MapReduce for web log analysis and recommendation algorithms. Because AWS easily and economically processes enormous amounts of data, it’s ideal for the type of processing that Etsy performs. Etsy copies its HTTP server logs every hour to Amazon S3, and syncs snapshots of the production database on a nightly basis. The combination of Amazon’s products and Etsy’s syncing/storage operation provides substantial benefits for Etsy. As Dr. Jason Davis, lead scientist at Etsy, explains, “the computing power available with [Amazon Elastic MapReduce] allows us to run these operations over dozens or even hundreds of machines without the need for owning the hardware.” Dr. Davis goes on to say, “Amazon Elastic MapReduce enables us to focus on developing our Hadoop-based analysis stack without worrying about the underlying infrastructure. As our cycles shift between development and research, our software and analysis requirements change and expand constantly, and [Amazon Elastic MapReduce] effectively eliminates half of our scaling issues, allowing us to focus on what is most important.” Etsy has realized improved results and performance by architecting their application for the cloud, with robustness and fault tolerance in mind, while providing a market for users to buy and sell handmade items online.
  23. Another example of such innovation is gift ideas. A lot of us struggle to pick the right present for our friends, so Etsy has a product that makes it easier. Etsy looks at your Facebook social graph and learns about your interests and those of your friends. It uses this information to give you ideas for presents. For example, if your friend is an REM fan, Etsy may suggest a t-shirt with an REM print on it. These innovative data products are just a few examples of the innovation that is possible if we lower the cost barriers for data experimentation.
  24. Yelp is also doing product recommendations based on location, people's reviews, or people's searches. For example, the “people who viewed this, viewed that” feature can help customers discover other relevant options in the area. People can discover interesting facts about places with the “People viewed this after searching for that” feature. In this example, the Westin hotel probably has glass elevators and likely offers the best location to stay in San Francisco, at least by some definition of best. There is also a “review highlights” feature. Yelp analyzes written reviews and provides highlights about the places, so that their customers don’t have to read through all the reviews to get basic ideas about the place. All these differentiating features were possible because of Hadoop and flexible infrastructure for data processing.
  25. 500% increase in returns for advertising. Petabytes of storage. The retail business has a lot of data about its users; it has just never used it in advertising. For example, the retailer knows that the customer has purchased a sports movie and is currently searching for video games, so it may make sense to advertise a sports video game to that customer. Efficient: Elastic infrastructure from AWS allows capacity to be provisioned as needed based on load, reducing cost and the risk of processing delays. Amazon Elastic MapReduce and Cascading let Razorfish focus on application development without having to worry about time-consuming set-up, management, or tuning of Hadoop clusters or the compute capacity upon which they sit. Ease of integration: Amazon Elastic MapReduce with Cascading allows data processing in the cloud without any changes to the underlying algorithms. Flexible: Hadoop with Cascading is flexible enough to allow “agile” implementation and unit testing of sophisticated algorithms. Adaptable: Cascading simplifies the integration of Hadoop with external ad systems. Scalable: AWS infrastructure helps Razorfish reliably store and process huge (petabyte-scale) data sets. The AWS elastic infrastructure platform allows Razorfish to manage wide variability in load by provisioning and removing capacity as needed. Mark Taylor, Program Director at Razorfish, said, “With our implementation of Amazon Elastic MapReduce and Cascading, there was no upfront investment in hardware, no hardware procurement delay, and no additional operations staff was hired. We completed development and testing of our first client project in six weeks. Our process is completely automated. Total cost of the infrastructure averages around $13,000 per month. Because of the richness of the algorithm and the flexibility of the platform to support it at scale, our first client campaign experienced a 500% increase in their return on ad spend from a similar campaign a year before.”
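
The capacity-planning notes above (over-provisioning for peaks versus capacity that follows demand) can be illustrated with a small calculation. This is only a sketch under stated assumptions: the hourly demand figures are invented for illustration, and the elastic case is idealized as paying for exactly the capacity used in each hour.

```python
# Hypothetical demand per hour, in capacity units (e.g. servers needed).
demand = [2, 3, 10, 4, 2, 12, 3, 1]


def fixed_capacity(demand, capacity):
    """Return (paid, wasted, unmet) capacity-hours for a fixed provisioned level."""
    paid = capacity * len(demand)
    wasted = sum(max(capacity - d, 0) for d in demand)
    unmet = sum(max(d - capacity, 0) for d in demand)
    return paid, wasted, unmet


def elastic_capacity(demand):
    """Idealized elastic case: capacity tracks demand, so nothing is wasted or unmet."""
    return sum(demand), 0, 0


print("provisioned for peak:  ", fixed_capacity(demand, max(demand)))
print("provisioned below peak:", fixed_capacity(demand, 5))
print("elastic:               ", elastic_capacity(demand))
```

Provisioning for the peak avoids unmet demand but pays for idle capacity (the "waste" label on the slides), while provisioning below the peak reduces waste at the price of unmet demand (the "customer dissatisfaction" label). In practice elastic capacity scales in discrete steps with some lag, so the idealized figures are a best case.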