    Beyond Batch
    Doug Cutting
    October 2012
    DO NOT USE PUBLICLY PRIOR TO 10/23/12




1
Hadoop Started As Batch
    • Simple, powerful                   MapReduce
           •   Kills a lot of birds
    • Efficient, scalable
           •   Compute at storage
    • Shared platform
           •   Used by Pig, Hive, etc.
    • Incredibly useful!
           •   But not sufficient


2
Big Data Is Not (Just) Batch
    Its true themes are:
    • Scalability
        •   Affordability
              •   Commodity hardware
              •   Open-source software
        •   Distributed & reliable
    • Schema on read
    • Data beats algorithms
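
To make "schema on read" concrete, here is a minimal Python sketch. It is an illustration only, not code from the talk: the sample records and the names `RAW_LINES`, `read_with_schema`, and `bytes_per_user` are hypothetical. Raw records are stored untouched, and each reader applies whatever schema it needs at read time.

```python
import json

# Raw records are stored as-is; nothing is validated at write time.
# (Hypothetical sample data for illustration.)
RAW_LINES = [
    '{"user": "ann", "bytes": "120", "path": "/a"}',
    '{"user": "bob", "bytes": "80"}',
    '{"user": "ann", "bytes": "30", "extra": true}',
]


def read_with_schema(lines, schema):
    """Apply a schema (field name -> converter) while reading.

    Fields missing from a record come back as None; fields the
    schema does not mention are ignored.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            name: convert(record[name]) if name in record else None
            for name, convert in schema.items()
        }


def bytes_per_user(lines):
    """One reader's view of the data: total bytes per user."""
    totals = {}
    for row in read_with_schema(lines, {"user": str, "bytes": int}):
        totals[row["user"]] = totals.get(row["user"], 0) + row["bytes"]
    return totals
```

A different reader could pass a different schema, for example `{"path": str}`, over the same stored lines without any rewrite of the data.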


3
HBase: First Non-Batch Component
    Online key/value store
    • Complement to batch
          • Online put/get
          • Batch load & analyze
          • Best of both
          • Popular combination
    •   A step towards the future…
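
The pairing of online put/get with batch analysis can be sketched with a toy in-memory store. This is an assumption-laden illustration in Python: `ToyKeyValueStore` and `batch_total` are hypothetical names and do not reflect the HBase client API.

```python
class ToyKeyValueStore:
    """In-memory stand-in for an online key/value store.

    Hypothetical illustration only; this is not the HBase API.
    """

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        """Online write: store or overwrite a single row."""
        self._rows[key] = value

    def get(self, key, default=None):
        """Online read: fetch a single row by key."""
        return self._rows.get(key, default)

    def scan(self):
        """Yield every row in key order, as a batch job would read them."""
        for key in sorted(self._rows):
            yield key, self._rows[key]


def batch_total(store):
    """Batch-style analysis: aggregate over the whole table."""
    return sum(value for _, value in store.scan())


store = ToyKeyValueStore()
store.put("page:/home", 3)
store.put("page:/docs", 5)
store.put("page:/home", 4)  # overwrites the earlier value for this key
```

Single-key `put` and `get` calls serve online traffic, while `scan` gives a batch job the full table to analyze.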


4
Holy Grail Of Big Data
    • Open source, commodity HW, etc.
    • Linear scaling
          •   To scale, just buy more hardware
    •   On many axes
          • Storage capacity
          • Throughput & latency
                •   of batch & query
    •   Transactions, Joins, Indexes
          •   and batch!

5
Google Gives Us A Map
    Google publication         Open source / Apache project
    2004   GFS & MapReduce     2006   Hadoop        batch programs
    2005   Sawzall             2008   Pig & Hive    batch queries
    2006   BigTable            2008   HBase         online key/value
    ...    ...                 ...    ...           ...
    2012   Spanner             ?      ?             transactions, etc. (holy grail?)

    5 years – 26 authors!
6
Impala Is Latest Step
    Google publication         Open source / Apache project
    2004   GFS & MapReduce     2006   Hadoop        batch programs
    2005   Sawzall             2008   Pig & Hive    batch queries
    2006   BigTable            2008   HBase         online key/value
    2010   Dremel/F1           2012   Impala        online queries
    ...    ...                 ...    ...           ...
    2012   Spanner             ?      ?             transactions, etc. (holy grail?)




7
@cutting   #bigquestions

8

Strata + Hadoop World 2012 Keynote: Beyond Batch - Doug Cutting


Editor's notes

  1. You've heard a lot. Worried it might be a hype bubble. You might be hesitating. Belief: Hadoop has a great future. Over the next few minutes, tell you where Hadoop is today and where Hadoop's going, so you can be comfortable adopting it for long-term profit from all of your data.
  2. Proven incredibly useful. Enables folks to benefit from vastly more data. Not something we're ashamed of; rather, proud of.
  3. … Need to look forward
  4. … Back to today …
  5. Major new capability in Impala. Not a niche; another step towards a grander future. We know where we're headed. We shouldn't resist adoption: use Impala today and expect more tomorrow.
  6. For more, attend the 1:40 presentation on Impala