Saturday, June 12, 2010
Evolving a New Analytical Platform
          What Works and What’s Missing


          Jeff Hammerbacher
          Chief Scientist, Cloudera
          June 8, 2010



My Background
         Thanks for Asking
         ▪   hammer@cloudera.com
         ▪   Studied Mathematics at Harvard
         ▪   Worked as a Quant on Wall Street
         ▪   Conceived, built, and led Data team at Facebook
             ▪   Nearly 30 amazing engineers and data scientists
             ▪   Several open source projects and research papers
         ▪   Founder of Cloudera
             ▪   Chief Scientist
             ▪   Also, check out the book “Beautiful Data”

Presentation Outline
         ▪   BI: Science for Profit
             ▪   Need tools for whole research cycle
             ▪   SQL Server 2008 R2: defining the platform
         ▪   State of the Platform Ecosystem
         ▪   New Foundations: Hadoop
             ▪   Boiling the Frog
             ▪   Future developments
         ▪   Questions and Discussion




BI is looking more like science (for profit)




Jim Gray: Science entering Fourth Paradigm
         “We have to do better at producing tools to support the whole research cycle”




RDBMS only a small part of this tool set




Example: SQL Server 2008 R2




         Collaboration: SharePoint
         MDM: Master Data Services
         CEP: StreamInsight
         ETL: SQL Server Integration Services
         RDBMS: SQL Server
         Reporting: SQL Server Reporting Services
         Analysis: SQL Server Analysis Services
         Search: Full-Text Search
         OLAP: PowerPivot


What do we call this unified suite?




For today: Analytical Data Platform




Who makes up the platform ecosystem?




         Content Providers
         Infrastructure Providers
         Platform Providers
         Application Developers
         End Users




What is new about the ecosystem today?




Content Providers
         1. > 95% of enterprise data is unstructured
         2. Data volumes growing rapidly




Infrastructure Providers
         1. Cloud
         2. Warehouse-Scale Computers




Platform Providers
         1. Open source
         2. Driven by consumer web properties




Application Developers
         1. Data Scientists
         2. Diversity of languages




End Users
         1. Move beyond reporting to analytics
         2. Make use of all enterprise data




New foundations: HDFS and MapReduce




(This is what boiling a frog feels like)




2005: Doug Cutting and Mike Cafarella start the project inside Nutch




2006: Doug Cutting joins Yahoo!




2007: Make Hadoop scale
         ▪   Randy Bryant’s “DISC” lecture
         ▪   Jim Gray’s “Fourth Paradigm” lecture
         ▪   Yahoo! makes Pig open source
         ▪   Powerset makes HBase open source




2008: Make Hadoop fast
         ▪   “MapReduce: A Major Step Backwards”
         ▪   Facebook makes Hive open source
         ▪   First Hadoop Summit
         ▪   Yahoo! wins Daytona terabyte sort benchmark
         ▪   Yahoo! builds production webmap with Hadoop




2009: Insert Hadoop into the enterprise
         ▪   “The Unreasonable Effectiveness of Data”
         ▪   Yahoo! sorts a petabyte with Hadoop
         ▪   First Hadoop World NYC
         ▪   Cloudera releases CDH
         ▪   Cloudera adds training, support, services




2010: Integrate Hadoop into the enterprise
         ▪   Hive adds JDBC and ODBC
         ▪   Teradata, Pentaho, and others integrate
         ▪   Yahoo! completes enterprise-class security
         ▪   IBM announces InfoSphere BigInsights
         ▪   Datameer and Karmasphere funded




Hadoop will be an Analytical Data Platform




What’s Next?




Capture: Log collection and CEP




Curate: Workflow and Scheduling




Curate: Secondary and Full-Text Indexing




Curate: Learn Structure from Data




Analyze: Mesos-enabled frameworks




Analyze: Link local and global data




All behind a single pane of glass




Cloudera Desktop
         Making Many Computers Feel Like One




(c) 2010 Cloudera, Inc. or its licensors.  "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved. 1.0




Generative AI for Technical Writer or Information Developers
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 

Experiences Evolving a New Analytical Platform: What Works and What's Missing

  • 2. Evolving a New Analytical Platform: What Works and What’s Missing. Jeff Hammerbacher, Chief Scientist, Cloudera. June 8, 2010. Saturday, June 12, 2010
  • 3. My Background: Thanks for Asking
      ▪ hammer@cloudera.com
      ▪ Studied Mathematics at Harvard
      ▪ Worked as a Quant on Wall Street
      ▪ Conceived, built, and led Data team at Facebook
          ▪ Nearly 30 amazing engineers and data scientists
          ▪ Several open source projects and research papers
      ▪ Founder of Cloudera
          ▪ Chief Scientist
          ▪ Also, check out the book “Beautiful Data”
  • 4. Presentation Outline
      ▪ BI: Science for Profit
          ▪ Need tools for whole research cycle
          ▪ SQL Server 2008 R2: defining the platform
      ▪ State of the Platform Ecosystem
      ▪ New Foundations: Hadoop
          ▪ Boiling the Frog
          ▪ Future developments
      ▪ Questions and Discussion
  • 5. BI is looking more like science (for profit)
  • 6. Jim Gray: Science entering Fourth Paradigm. “We have to do better at producing tools to support the whole research cycle”
  • 7. RDBMS only a small part of this tool set
  • 8. Example: SQL Server 2008 R2
  • 10. ETL: SQL Server Integration Services; RDBMS: SQL Server
  • 11. ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services
  • 12. ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services
  • 13. ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search
  • 14. CEP: StreamInsight; ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search
  • 15. CEP: StreamInsight; ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search; OLAP: PowerPivot
  • 16. MDM: Master Data Services; CEP: StreamInsight; ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search; OLAP: PowerPivot
  • 17. Collaboration: SharePoint; MDM: Master Data Services; CEP: StreamInsight; ETL: SQL Server Integration Services; RDBMS: SQL Server; Reporting: SQL Server Reporting Services; Analysis: SQL Server Analysis Services; Search: Full-Text Search; OLAP: PowerPivot
  • 18. What do we call this unified suite?
  • 19. For today: Analytical Data Platform
  • 20. Who makes up the platform ecosystem?
  • 22. Infrastructure Providers; Platform Providers
  • 23. Infrastructure Providers; Platform Providers; Application Developers
  • 24. Content Providers; Infrastructure Providers; Platform Providers; Application Developers
  • 25. Content Providers; Infrastructure Providers; Platform Providers; Application Developers; End Users
  • 26. What is new about the ecosystem today?
  • 27. Content Providers
      1. > 95% of enterprise data is unstructured
      2. Data volumes growing rapidly
  • 28. Infrastructure Providers
      1. Cloud
      2. Warehouse-Scale Computers
  • 29. Platform Providers
      1. Open source
      2. Driven by consumer web properties
  • 30. Application Developers
      1. Data Scientists
      2. Diversity of languages
  • 31. End Users
      1. Move beyond reporting to analytics
      2. Make use of all enterprise data
  • 32. New foundations: HDFS and MapReduce
  • 33. (This is what boiling a frog feels like)
  • 34. 2005: Doug/Mike start project inside Nutch
  • 35. 2006: Doug joins Yahoo!
  • 36. 2007: Make Hadoop scale
  • 37. 2007: Make Hadoop scale; Yahoo! makes Pig open source
  • 38. Jim Gray’s “Fourth Paradigm” lecture; 2007: Make Hadoop scale; Yahoo! makes Pig open source
  • 39. Randy Bryant’s “DISC” lecture; Jim Gray’s “Fourth Paradigm” lecture; 2007: Make Hadoop scale; Yahoo! makes Pig open source
  • 40. Randy Bryant’s “DISC” lecture; Jim Gray’s “Fourth Paradigm” lecture; 2007: Make Hadoop scale; Yahoo! makes Pig open source; Powerset makes HBase open source
  • 41. 2008: Make Hadoop fast
  • 42. 2008: Make Hadoop fast; Yahoo! wins Daytona terabyte sort benchmark
  • 43. First Hadoop Summit; 2008: Make Hadoop fast; Yahoo! wins Daytona terabyte sort benchmark
  • 44. First Hadoop Summit; 2008: Make Hadoop fast; Yahoo! wins Daytona terabyte sort benchmark; Yahoo! builds production webmap with Hadoop
  • 45. Facebook makes Hive open source; First Hadoop Summit; 2008: Make Hadoop fast; Yahoo! wins Daytona terabyte sort benchmark; Yahoo! builds production webmap with Hadoop
  • 46. “MapReduce: A Major Step Backwards”; Facebook makes Hive open source; First Hadoop Summit; 2008: Make Hadoop fast; Yahoo! wins Daytona terabyte sort benchmark; Yahoo! builds production webmap with Hadoop
  • 47. 2009: Insert Hadoop into the enterprise
  • 48. 2009: Insert Hadoop into the enterprise; Cloudera releases CDH
  • 49. First Hadoop World NYC; 2009: Insert Hadoop into the enterprise; Cloudera releases CDH
  • 50. Yahoo! sorts a petabyte with Hadoop; First Hadoop World NYC; 2009: Insert Hadoop into the enterprise; Cloudera releases CDH
  • 51. Yahoo! sorts a petabyte with Hadoop; First Hadoop World NYC; 2009: Insert Hadoop into the enterprise; Cloudera releases CDH; Cloudera adds training, support, services
  • 52. “The Unreasonable Effectiveness of Data”; Yahoo! sorts a petabyte with Hadoop; First Hadoop World NYC; 2009: Insert Hadoop into the enterprise; Cloudera releases CDH; Cloudera adds training, support, services
  • 53. 2010: Integrate Hadoop into the enterprise
  • 54. 2010: Integrate Hadoop into the enterprise; IBM announces InfoSphere BigInsights
  • 55. Yahoo! completes enterprise-class security; 2010: Integrate Hadoop into the enterprise; IBM announces InfoSphere BigInsights
  • 56. Yahoo! completes enterprise-class security; 2010: Integrate Hadoop into the enterprise; IBM announces InfoSphere BigInsights; Datameer and Karmasphere funded
  • 57. Teradata, Pentaho, and others integrate; Yahoo! completes enterprise-class security; 2010: Integrate Hadoop into the enterprise; IBM announces InfoSphere BigInsights; Datameer and Karmasphere funded
  • 58. Hive adds JDBC and ODBC; Teradata, Pentaho, and others integrate; Yahoo! completes enterprise-class security; 2010: Integrate Hadoop into the enterprise; IBM announces InfoSphere BigInsights; Datameer and Karmasphere funded
  • 59. Hadoop will be an Analytical Data Platform
  • 61. Capture: Log collection and CEP
  • 62. Curate: Workflow and Scheduling
  • 63. Curate: Secondary and Full-Text Indexing
  • 64. Curate: Learn Structure from Data
  • 66. Analyze: Link local and global data
  • 67. All behind a single pane of glass
  • 68. Cloudera Desktop: Making Many Computers Feel Like One
  • 69. (c) 2010 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved. 1.0