Big Data Analytics
                                Peter Sirota
General Manager, Amazon Elastic MapReduce
Overview
1. Introducing Big Data

2. From data to actionable information

3. Analytics and Cloud Computing

4. The Big Data ecosystem
1. Introducing Big Data
Generation



 Collection & storage



Analytics & computation



Collaboration & sharing
The cost of data generation is falling
Generation: lower cost, higher throughput
Collection & storage: highly constrained
Analytics & computation
Collaboration & sharing
Data volume (chart): generated data vs. data available for analysis
Sources:
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Elastic and highly scalable
+ No upfront capital expense
+ Only pay for what you use
+ Available on-demand
= Remove constraints
Generation



               Collection & storage

Accelerated

              Analytics & computation



              Collaboration & sharing
Close the gap.
Big Data
Technologies and techniques for working productively with data, at any scale.
2. From data to actionable information
“Who buys video games?”
Per day:
    3.5 billion records
13 TB of click stream logs
71 million unique cookies
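To get a feel for the scale of these per-day figures, they can be converted into average per-second rates. This is only a back-of-the-envelope sketch: it assumes 1 TB = 10**12 bytes and an evenly spread load, neither of which the slide states.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(daily_total: float) -> float:
    """Convert a per-day total into an average per-second rate."""
    return daily_total / SECONDS_PER_DAY


# Figures from the slide; the byte conversion is an assumption (decimal terabytes).
records_per_second = per_second(3.5e9)   # 3.5 billion records per day
bytes_per_second = per_second(13e12)     # 13 TB of click stream logs per day

print(f"{records_per_second:,.0f} records/s")    # prints "40,509 records/s"
print(f"{bytes_per_second / 1e6:,.1f} MB/s")     # prints "150.5 MB/s"
```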
Results:
      500% return on ad spend
17,000% reduction in procurement time
“Who is using our service?”
Finding signal in the noise of logs

      Identified early mobile usage
 Invested heavily in mobile development
In January 2013
 9,432,061 unique mobile devices
    used the Yelp mobile app.

4 million+ calls. 5 million+ directions.
Open web index.
3.4 billion records.
  Available to all.
Full parse for impact of social networks
300 lines of Ruby code.
14 hours.
$100.
Tweeting about Flu




      You Are What You Tweet: Analyzing Twitter for Public Health. M. J. Paul and M. Dredze, 2011
Tweeting about Food
Chart: tweets about the price of rice vs. official food price inflation
3. Analytics and Cloud Computing
Generation



 Collection & storage



Analytics & computation



Collaboration & sharing
Generation
Collection & storage: S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase
Analytics & computation: EC2 & Elastic MapReduce
Collaboration & sharing: EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift
AWS Data Pipeline
Elastic MapReduce
Managed Hadoop analytics
Diagram (built up across slides):
Input data: S3, DynamoDB, Redshift
Code, Elastic MapReduce
Name node
Elastic cluster, S3/HDFS
Queries + BI, via JDBC, Pig, Hive
Output
S3, DynamoDB, Redshift: input data and output
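For readers new to the programming model behind Hadoop, the following is a minimal single-process sketch of the map, shuffle, and reduce phases using a word count. It is an illustration only: it does not use Hadoop or Elastic MapReduce, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample input are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence on in-memory input."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input data standing in for files read from storage.
input_data = ["big data big clusters", "elastic clusters"]
output = run_job(input_data)
print(output)
```

On a real cluster the map and reduce functions run in parallel across many nodes, and the shuffle moves data between them over the network; the sketch keeps only the data flow.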
1. Elastic clusters
10 hours
6 hours
Peak capacity
2. Rapid, tuned provisioning
Tedious.
Remove undifferentiated heavy lifting.
3. Hadoop all the way down
Robust ecosystem.
Databases, machine learning, segmentation, clustering, analytics, metadata stores, exchange formats, and so on...
4. Agility for experimentation
Instance choice.
Stay flexible on instance type & number.
5. Cost optimizations
Built for Spot.
Name-your-price supercomputing.
1. Elastic clusters

2. Rapid, tuned provisioning
3. Hadoop all the way down

4. Agility for experimentation

5. Cost optimizations
Vin Sharma vin.sharma@intel.com
Director, Product Strategy & Marketing
Big Data Software, Intel Corporation
Analysis of Data Can Transform Society
Enhance scientific understanding, drive innovation, and accelerate medical cures.
Create new business models and improve organizational processes.
Increase public safety and improve energy efficiency with smart grids.
Intel’s Vision to Democratize Big Data
Unlock Value in Silicon
Support Open Platforms
Deliver Software Value
Intel at the Intersection of Big Data
HPC: Enabling exascale computing on massive data sets
Cloud: Helping enterprises build open interoperable clouds
Open Source: Contributing code and fostering ecosystem
Intel® Technology at the Heart of the Cloud
Server
Storage
Network
Scale-Out Big Data
Compute Platform Optimization


Cost-effective performance
• Intel® Advanced Vector Extensions Technology
• Intel® Turbo Boost Technology 2.0
• Intel® Advanced Encryption Standard New Instructions Technology
Intel® Advanced Vector Extensions Technology

• Newest in a long line of processor instruction innovations
• Increases floating point operations per clock up to 2X¹ performance

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
¹ Performance comparison using Linpack benchmark. See backup for configuration details.
For more legal information on performance forecasts go to http://www.intel.com/performance
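One way to see where an "up to 2X" floating point figure can come from is register width: AVX registers are 256 bits wide, twice the 128 bits of the earlier SSE registers, so each vector instruction can operate on twice as many values. The sketch below only does that arithmetic; it is not a benchmark, and the slide itself attributes its number to a Linpack comparison.

```python
SSE_REGISTER_BITS = 128   # width of an SSE (XMM) register
AVX_REGISTER_BITS = 256   # width of an AVX (YMM) register


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of a given size that fit in one vector register."""
    return register_bits // element_bits


for name, element_bits in (("float32", 32), ("float64", 64)):
    sse = lanes(SSE_REGISTER_BITS, element_bits)
    avx = lanes(AVX_REGISTER_BITS, element_bits)
    print(f"{name}: SSE {sse} lanes, AVX {avx} lanes, ratio {avx / sse:.0f}x")
```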
Intel® Turbo Boost Technology 2.0



More Performance
Higher turbo speeds maximize performance for single and multi-threaded applications
Intel® Advanced Encryption Standard New Instructions
• Processor assistance for performing AES encryption (7 new instructions)
• Makes enabled encryption software faster and stronger
The Power of Intel® Platform Solutions: richer user experiences
TeraSort for 1 TB sort, from 4 HRS to 10 MIN:
Previous Intel® Xeon® Processor: 4 HRS
Intel® Xeon® Processor E5 2600: 50% reduction
Solid-State Drive: 80% reduction
10G Ethernet: 50% reduction
Intel® Apache Hadoop: 40% reduction, 10 MIN
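The successive reductions on this slide compound rather than add. The sketch below works through that arithmetic under two assumptions that the flattened slide text does not confirm: that each percentage applies to the time remaining after the previous step, and that the percentages pair with the upgrades in left-to-right order.

```python
baseline_minutes = 4 * 60  # "4 HRS" on the slide

# (upgrade, fractional reduction); the pairing and order are assumptions.
steps = [
    ("Intel Xeon Processor E5 2600", 0.50),
    ("Solid-State Drive", 0.80),
    ("10G Ethernet", 0.50),
    ("Intel Apache Hadoop", 0.40),
]

remaining = float(baseline_minutes)
for upgrade, reduction in steps:
    remaining *= 1 - reduction  # each step shrinks what is left, not the baseline
    print(f"after {upgrade}: {remaining:.1f} min")
```

Under these assumptions the run time ends at about 7 minutes, in the same range as the 10 MIN shown on the slide.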
The Virtuous Cycle of User Experience
Clients
Cloud
Intelligent Systems
4. The Big Data Ecosystem
Data, data, everywhere...
     Data is stored in silos.
S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, on-premises
“How do I get my data to the cloud?”
Data mobility
    Generated and stored in AWS
    Inbound data transfer is free
    Multipart upload to S3
    Physical media
    AWS Direct Connect
    Regional replication of AMIs and snapshots
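The "Multipart upload to S3" item can be pictured as splitting one large object into numbered parts that are uploaded independently. The sketch below only plans the parts and calls no AWS API. It assumes the S3 limits of a 5 MiB minimum part size (except for the last part) and at most 10,000 parts per upload; the function name `plan_parts` and the 64 MiB default are illustrative choices, and the upper limit on part size is not enforced here.

```python
import math

MIN_PART_SIZE = 5 * 1024**2   # 5 MiB minimum for every part except the last
MAX_PARTS = 10_000            # maximum number of parts in one multipart upload


def plan_parts(object_size: int, part_size: int = 64 * 1024**2) -> list[tuple[int, int, int]]:
    """Return (part_number, start_offset, length) tuples covering the object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Respect the minimum part size, then grow parts if the count would exceed the limit.
    part_size = max(part_size, MIN_PART_SIZE)
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))

    parts = []
    offset = 0
    number = 1  # part numbers start at 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


plan = plan_parts(200 * 1024**2)  # a hypothetical 200 MiB object
print(len(plan), plan[-1])
```

Because each part has its own offset and length, parts can be sent in parallel and a failed part can be retried without resending the whole object.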
“How do I integrate my data for maximum impact?”
S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, on-premises
AWS Data Pipeline
Orchestration for data-intensive workloads.
 Announced in November, available now.
AWS Data Pipeline
   Data-intensive orchestration and automation
   Reliable and scheduled
   Easy to use, drag and drop
   Execution and retry logic
   Map data dependencies
   Create and manage temporary compute resources
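The ideas of mapping data dependencies and applying retry logic can be sketched in a few lines of plain Python. This is a toy illustration, not the AWS Data Pipeline API: `run_pipeline`, the task names, and the simulated failure are all hypothetical, and a real service would also handle scheduling, notifications, and compute resources.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(tasks: dict[str, Callable[[], None]],
                 depends_on: dict[str, set[str]],
                 max_attempts: int = 3) -> list[str]:
    """Run tasks in dependency order, retrying a failing task up to max_attempts times."""
    order = list(TopologicalSorter(depends_on).static_order())
    completed = []
    for name in order:
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up after the final attempt
    return completed


attempts = {"copy_logs": 0}


def copy_logs() -> None:
    """Hypothetical task that fails once to exercise the retry path."""
    attempts["copy_logs"] += 1
    if attempts["copy_logs"] < 2:
        raise RuntimeError("transient failure")


def run_report() -> None:
    pass  # placeholder for an analysis step


def load_warehouse() -> None:
    pass  # placeholder for a load step


tasks = {"copy_logs": copy_logs, "run_report": run_report, "load_warehouse": load_warehouse}
# Each task maps to the set of tasks that must finish before it starts.
depends_on = {"copy_logs": set(), "run_report": {"copy_logs"}, "load_warehouse": {"run_report"}}

completed = run_pipeline(tasks, depends_on)
print(completed)
```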
Anatomy of a pipeline
Additional checks and notifications
Arbitrarily complex pipelines
aws.amazon.com/datapipeline
aws.amazon.com/big-data
Summary
1. Introducing Big Data

2. From data to actionable information

3. Analytics and Cloud Computing

4. The Big Data ecosystem
Get 600 Hours of free supercomputing time!
www.powerof60.com
Thank you!
sirota@amazon.com

Más contenido relacionado

La actualidad más candente

VMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUG IT
 
Hadoop and Beyond
Hadoop and BeyondHadoop and Beyond
Hadoop and BeyondPaco Nathan
 
Cloud Computing for Data Professionals
Cloud Computing for Data ProfessionalsCloud Computing for Data Professionals
Cloud Computing for Data ProfessionalsAnkit Rathi
 
Intro to Cascading (SpringOne2GX)
Intro to Cascading (SpringOne2GX)Intro to Cascading (SpringOne2GX)
Intro to Cascading (SpringOne2GX)Paco Nathan
 
Multi-thematic spatial databases
Multi-thematic spatial databasesMulti-thematic spatial databases
Multi-thematic spatial databasesConor Mc Elhinney
 
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightPaco Nathan
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Industry experts webinar slides (final v1.0)
Industry experts webinar slides (final   v1.0)Industry experts webinar slides (final   v1.0)
Industry experts webinar slides (final v1.0)NuoDB
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...Gezim Sejdiu
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storagehybrid cloud
 
Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD VivaEfficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD VivaGezim Sejdiu
 
Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23Martin Bém
 
Four Problems You Run into When DIY-ing a “Big Data” Analytics System
Four Problems You Run into When DIY-ing a “Big Data” Analytics SystemFour Problems You Run into When DIY-ing a “Big Data” Analytics System
Four Problems You Run into When DIY-ing a “Big Data” Analytics SystemTreasure Data, Inc.
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 

La actualidad más candente (20)

VMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware Hadoop
 
Hadoop and Beyond
Hadoop and BeyondHadoop and Beyond
Hadoop and Beyond
 
Cloud Computing for Data Professionals
Cloud Computing for Data ProfessionalsCloud Computing for Data Professionals
Cloud Computing for Data Professionals
 
Intro to Cascading (SpringOne2GX)
Intro to Cascading (SpringOne2GX)Intro to Cascading (SpringOne2GX)
Intro to Cascading (SpringOne2GX)
 
Multi-thematic spatial databases
Multi-thematic spatial databasesMulti-thematic spatial databases
Multi-thematic spatial databases
 
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsightEnterprise Data Workflows with Cascading and Windows Azure HDInsight
Enterprise Data Workflows with Cascading and Windows Azure HDInsight
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Industry experts webinar slides (final v1.0)
Industry experts webinar slides (final   v1.0)Industry experts webinar slides (final   v1.0)
Industry experts webinar slides (final v1.0)
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storage
 
Sandish3Certs
Sandish3CertsSandish3Certs
Sandish3Certs
 
Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD VivaEfficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
 
Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23Prague data management meetup 2017-01-23
Prague data management meetup 2017-01-23
 
Four Problems You Run into When DIY-ing a “Big Data” Analytics System
Four Problems You Run into When DIY-ing a “Big Data” Analytics SystemFour Problems You Run into When DIY-ing a “Big Data” Analytics System
Four Problems You Run into When DIY-ing a “Big Data” Analytics System
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
Big data landscape
Big data landscapeBig data landscape
Big data landscape
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 

Destacado

Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop InnoTech
 
High performance computing принципы проектирования сети
High performance computing принципы проектирования сетиHigh performance computing принципы проектирования сети
High performance computing принципы проектирования сетиMUK Extreme
 
How HPC Transforms the Corporate Information Technology Ecosystem
How HPC Transforms the Corporate Information Technology EcosystemHow HPC Transforms the Corporate Information Technology Ecosystem
How HPC Transforms the Corporate Information Technology Ecosysteminside-BigData.com
 
High Performance Computing: State of the Industry
High Performance Computing: State of the IndustryHigh Performance Computing: State of the Industry
High Performance Computing: State of the IndustryIMEX Research
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for researchEsteban Hernandez
 
Integrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseIntegrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseDataWorks Summit
 
Analytics 3.0 Measurable business impact from analytics & big data
Analytics 3.0 Measurable business impact from analytics & big dataAnalytics 3.0 Measurable business impact from analytics & big data
Analytics 3.0 Measurable business impact from analytics & big dataMicrosoft
 
Big Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of AnalyticsBig Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of AnalyticsBigDataExpo
 
Benefiting from Big Data - A New Approach for the Telecom Industry
Benefiting from Big Data - A New Approach for the Telecom Industry  Benefiting from Big Data - A New Approach for the Telecom Industry
Benefiting from Big Data - A New Approach for the Telecom Industry Persontyle
 
Impact of big data on analytics
Impact of big data on analyticsImpact of big data on analytics
Impact of big data on analyticsCapgemini
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

Destacado (16)

Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop
 
High performance computing принципы проектирования сети
High performance computing принципы проектирования сетиHigh performance computing принципы проектирования сети
High performance computing принципы проектирования сети
 
How HPC Transforms the Corporate Information Technology Ecosystem
How HPC Transforms the Corporate Information Technology EcosystemHow HPC Transforms the Corporate Information Technology Ecosystem
How HPC Transforms the Corporate Information Technology Ecosystem
 
High Performance Computing: State of the Industry
High Performance Computing: State of the IndustryHigh Performance Computing: State of the Industry
High Performance Computing: State of the Industry
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for research
 
Integrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseIntegrating Hadoop Into the Enterprise
Integrating Hadoop Into the Enterprise
 
Analytics 3.0 Measurable business impact from analytics & big data
Analytics 3.0 Measurable business impact from analytics & big dataAnalytics 3.0 Measurable business impact from analytics & big data
Analytics 3.0 Measurable business impact from analytics & big data
 
Big Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of AnalyticsBig Data Expo 2015 - Pentaho The Future of Analytics
Big Data Expo 2015 - Pentaho The Future of Analytics
 
IDC HPC Market Update
IDC HPC Market UpdateIDC HPC Market Update
IDC HPC Market Update
 
2016 IDC HPC Market Update
2016 IDC HPC Market Update2016 IDC HPC Market Update
2016 IDC HPC Market Update
 
EPA Horizon 2020 SC5 Roadshow presentation - UCD 02.06.15
EPA Horizon 2020 SC5 Roadshow presentation - UCD 02.06.15EPA Horizon 2020 SC5 Roadshow presentation - UCD 02.06.15
EPA Horizon 2020 SC5 Roadshow presentation - UCD 02.06.15
 
HPC Market Update from IDC
HPC Market Update from IDCHPC Market Update from IDC
HPC Market Update from IDC
 
Benefiting from Big Data - A New Approach for the Telecom Industry
Benefiting from Big Data - A New Approach for the Telecom Industry  Benefiting from Big Data - A New Approach for the Telecom Industry
Benefiting from Big Data - A New Approach for the Telecom Industry
 
Impact of big data on analytics
Impact of big data on analyticsImpact of big data on analytics
Impact of big data on analytics
 
Big Data and Advanced Analytics
Big Data and Advanced AnalyticsBig Data and Advanced Analytics
Big Data and Advanced Analytics
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Similar a Big Data Analytics

Hadoop and HBase on Amazon Web Services
Hadoop and HBase on Amazon Web Services Hadoop and HBase on Amazon Web Services
Hadoop and HBase on Amazon Web Services Amazon Web Services
 
Big Data Analytics with AWS and AWS Marketplace Webinar
Big Data Analytics with AWS and AWS Marketplace WebinarBig Data Analytics with AWS and AWS Marketplace Webinar
Big Data Analytics with AWS and AWS Marketplace WebinarAmazon Web Services
 
Big Data Analytics with Amazon Web Services
Big Data Analytics with Amazon Web ServicesBig Data Analytics with Amazon Web Services
Big Data Analytics with Amazon Web ServicesAmazon Web Services
 
Introduction to Elastic MapReduce
Introduction to Elastic MapReduceIntroduction to Elastic MapReduce
Introduction to Elastic MapReduceAmazon Web Services
 
Data Driven Innovation with Amazon Web Services
Data Driven Innovation with Amazon Web ServicesData Driven Innovation with Amazon Web Services
Data Driven Innovation with Amazon Web ServicesAmazon Web Services
 
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...Amazon Web Services
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud ComputingAmazon Web Services
 
DAT103 Introducing Amazon RedShift - AWS re: Invent 2012
DAT103 Introducing Amazon RedShift - AWS re: Invent 2012DAT103 Introducing Amazon RedShift - AWS re: Invent 2012
DAT103 Introducing Amazon RedShift - AWS re: Invent 2012Amazon Web Services
 
Big data hadoop ecosystem and nosql
Big data hadoop ecosystem and nosqlBig data hadoop ecosystem and nosql
Big data hadoop ecosystem and nosqlKhanderao Kand
 
NoSQL for the SQL Server Pro
NoSQL for the SQL Server ProNoSQL for the SQL Server Pro
NoSQL for the SQL Server ProLynn Langit
 
AWS Big Data Analytics IP Expo 2013
AWS Big Data Analytics IP Expo 2013AWS Big Data Analytics IP Expo 2013
AWS Big Data Analytics IP Expo 2013Amazon Web Services
 
Next Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasThoughtworks
 
Information processing architectures
Information processing architecturesInformation processing architectures
Information processing architecturesRaji Gogulapati
 
AWS를 활용한 Big Data 실전 배치 사례 :: 이한주 :: AWS Summit Seoul 2016
AWS를 활용한 Big Data 실전 배치 사례 :: 이한주 :: AWS Summit Seoul 2016AWS를 활용한 Big Data 실전 배치 사례 :: 이한주 :: AWS Summit Seoul 2016
AWS를 활용한 Big Data 실전 배치 사례 :: 이한주 :: AWS Summit Seoul 2016Amazon Web Services Korea
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 

Similar a Big Data Analytics (20)

Hadoop and HBase on Amazon Web Services
Big Data Analytics

  • 1. Big Data Analytics Peter Sirota General Manager, Amazon Elastic MapReduce
  • 2. Overview 1. Introducing Big Data 2. From data to actionable information 3. Analytics and Cloud Computing 4. The Big Data ecosystem
  • 4. Generation Collection & storage Analytics & computation Collaboration & sharing
  • 5. The cost of data generation is falling
  • 6. Generation (lower cost, higher throughput); Collection & storage; Analytics & computation; Collaboration & sharing
  • 7. Generation (lower cost, higher throughput); Collection & storage (highly constrained); Analytics & computation; Collaboration & sharing
  • 8. Data volume Generated data Available for analysis Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011 IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  • 9. Elastic and highly scalable + No upfront capital expense + Only pay for what you use + Available on-demand = Remove constraints
  • 10. Generation (lower cost, higher throughput); Collection & storage (highly constrained); Analytics & computation; Collaboration & sharing
  • 11. Generation Collection & storage Accelerated Analytics & computation Collaboration & sharing
  • 13. Big Data Technologies and techniques for working productively with data, at any scale.
  • 14. 2 From data to actionable information
  • 15. “Who buys video games?”
  • 16. Per day: 3.5 billion records 13 TB of click stream logs 71 million unique cookies
  • 19. Results: 500% return on ad spend 17,000% reduction in procurement time
  • 20. “Who is using our service?”
  • 21. Finding signal in the noise of logs Identified early mobile usage Invested heavily in mobile development
  • 22. In January 2013 9,432,061 unique mobile devices used the Yelp mobile app. 4 million+ calls. 5 million+ directions.
  • 23. Open web index. 3.4 billion records. Available to all.
  • 24. Full parse for impact of social networks 300 lines of Ruby code. 14 hours. $100.
  • 25. Tweeting about Flu You Are What You Tweet: Analyzing Twitter for Public Health. M. J. Paul and M. Dredze, 2011
  • 26. Tweeting about Food Tweets about the price of rice Official food price inflation
  • 27. 3 Analytics and Cloud Computing
  • 28. Generation; Collection & storage; Analytics & computation; Collaboration & sharing
  • 29. Generation; Collection & storage (S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase); Analytics & computation; Collaboration & sharing
  • 30. Generation; Collection & storage; Analytics & computation (EC2 & Elastic MapReduce); Collaboration & sharing
  • 31. Generation; Collection & storage; Analytics & computation; Collaboration & sharing (EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift)
  • 32. Generation; Collection & storage (S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase); AWS Data Pipeline; Analytics & computation (EC2 & Elastic MapReduce); Collaboration & sharing (EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift)
  • 33. Generation; Collection & storage (S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase); AWS Data Pipeline; Analytics & computation (EC2 & Elastic MapReduce); Collaboration & sharing (EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift)
  • 37. Input data (S3, DynamoDB, Redshift); Code; Elastic MapReduce
  • 38. Input data (S3, DynamoDB, Redshift); Code; Elastic MapReduce; Name node
  • 39. Input data (S3, DynamoDB, Redshift); Code; Elastic MapReduce; Name node; Elastic cluster (S3/HDFS)
  • 40. Input data (S3, DynamoDB, Redshift); Code; Elastic MapReduce; Name node; Elastic cluster (S3/HDFS); Queries + BI via JDBC, Pig, Hive
  • 41. Input data (S3, DynamoDB, Redshift); Code; Elastic MapReduce; Name node; Elastic cluster (S3/HDFS); Queries + BI via JDBC, Pig, Hive; Output
  • 57. 2. Rapid, tuned provisioning
  • 59. Remove undifferentiated heavy lifting.
  • 60. 3. Hadoop all the way down
  • 61. Robust ecosystem. Databases, machine learning, segmentation, clustering, analytics, metadata stores, exchange formats, and so on...
  • 62. 4. Agility for experimentation
  • 63. Instance choice. Stay flexible on instance type & number.
  • 66. 1. Elastic clusters 2. Rapid, tuned provisioning 3. Hadoop all the way down 4. Agility for experimentation 5. Cost optimizations
  • 67. Vin Sharma vin.sharma@intel.com Director, Product Strategy & Marketing Big Data Software, Intel Corporation
  • 68. Analysis of Data Can Transform Society: Enhance scientific understanding, drive innovation, and accelerate medical cures. Create new business models and improve organizational processes. Increase public safety and improve energy efficiency with smart grids.
  • 69. Intel’s Vision to Democratize Big Data: Unlock Value in Silicon; Support Open Platforms; Deliver Software Value
  • 70. Intel at the Intersection of Big Data. HPC: Enabling exascale computing on massive data sets. Cloud: Helping enterprises build open interoperable clouds. Open Source: Contributing code and fostering ecosystem.
  • 71. Intel® Technology at the Heart of the Cloud: Server, Storage, Network
  • 72. Scale-Out Big Data Compute Platform Optimization. Cost-effective performance: • Intel® Advanced Vector Extension Technology • Intel® Turbo Boost Technology 2.0 • Intel® Advanced Encryption Standard New Instructions Technology
  • 73. Intel® Advanced Vector Extensions Technology • Newest in a long line of processor instruction innovations • Increases floating point operations per clock: up to 2X performance [1]. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more legal information on performance forecasts go to http://www.intel.com/performance. See backup for configuration details. [1] Performance comparison using Linpack benchmark.
  • 74. Intel® Turbo Boost Technology 2.0 More Performance Higher turbo speeds maximize performance for single and multi-threaded applications
  • 75. Intel® Advanced Encryption Standard New Instructions • Processor assistance for performing AES encryption 7 new instructions • Makes enabled encryption software faster and stronger
  • 76. The Power of Intel® Platform Solutions: Richer user experiences. [Chart: TeraSort for 1 TB sort, from 4 HRS to 10 MIN. Configurations shown: Previous Intel® Xeon® Processor; Intel® Xeon® Processor E5 2600; Solid-State Drive; 10G Ethernet; Intel® Apache Hadoop. Reductions shown: 50%, 80%, 50%, 40%.]
  • 77. The Virtuous Cycle of User Experience Clients Cloud Intelligent Systems
  • 78. 4 The Big Data Ecosystem
  • 79. Data, data, everywhere... Data is stored in silos.
  • 80. S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, On-premises
  • 81. “How do I get my data to the cloud?”
  • 82. Data mobility: Generated and stored in AWS; Inbound data transfer is free; Multipart upload to S3; Physical media; AWS Direct Connect; Regional replication of AMIs and snapshots
  • 83. “How do I integrate my data for maximum impact?”
  • 84. S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, On-premises
  • 85. S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, On-premises
  • 86. S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, On-premises
  • 87. S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, On-premises
  • 88. S3, HBase on EMR, RDS, DynamoDB, EMR, Redshift, On-premises
  • 89. AWS Data Pipeline Orchestration for data-intensive workloads. Announced in November, available now.
  • 90. AWS Data Pipeline: Data-intensive orchestration and automation; Reliable and scheduled; Easy to use, drag and drop; Execution and retry logic; Map data dependencies; Create and manage temporary compute resources
  • 91. Anatomy of a pipeline
  • 92. Additional checks and notifications
  • 96. Summary 1. Introducing Big Data 2. From data to actionable information 3. Analytics and Cloud Computing 4. The Big Data ecosystem
  • 97. Get 600 Hours of free supercomputing time! www.powerof60.com
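
Slides 37 to 41 describe input data and code being handed to Elastic MapReduce, processed on an elastic cluster, and returned as output. As a rough local illustration of the map, shuffle, and reduce steps that such a cluster distributes across nodes, the sketch below counts unique cookies per site section in a click-stream log, in the spirit of the "Who buys video games?" example. It is plain Python and does not call any Elastic MapReduce API; the tab-separated log format, the sample records, and all function names are assumptions made for illustration, not taken from the slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical click-stream format: "<cookie_id>\t<url>" per record.
SAMPLE_LOG = [
    "c1\t/games/halo",
    "c2\t/games/halo",
    "c1\t/games/fifa",
    "c3\t/books/dune",
]


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Map: emit one (section, cookie_id) pair per log record."""
    for line in lines:
        cookie_id, url = line.split("\t")
        section = url.split("/")[1]  # "/games/halo" -> "games"
        yield section, cookie_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle: group all values that share a key."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[str]]) -> Dict[str, int]:
    """Reduce: count distinct cookies for each section."""
    return {key: len(set(values)) for key, values in grouped.items()}


def unique_visitors(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on a single machine."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(unique_visitors(SAMPLE_LOG))
```

On a real cluster the map and reduce functions would run in parallel over partitions of the input, with the framework handling the shuffle; here everything runs in one process to keep the example self-contained.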