How Apache Hadoop is Revolutionizing
Business Intelligence and Data Analytics

Strata Conference, Sept 22nd 2011, New York, NY

Dr. Amr Awadallah, Founder, CTO, VP of Engineering
aaa@cloudera.com, twitter: @awadallah
Business Intelligence Before Adopting Apache Hadoop

Stack, top to bottom:
  • BI Reports + Interactive Apps
  • RDBMS (processed data)
  • ETL Compute Grid
  • Storage Only Grid (original raw data), mostly append
  • Collection
  • Instrumentation

Problems with this architecture:
  • Can’t explore the original high-fidelity raw data
  • Moving data to compute doesn’t scale
  • Archiving = premature data death

                    Copyright © 2011, Cloudera, Inc. All Rights Reserved.             2
Business Intelligence After Adopting Apache Hadoop
Stack, top to bottom:
  • BI Reports + Interactive Apps  |  Data Exploration & Advanced Analytics
  • RDBMS
  • ETL and Aggregations  |  Complex Data Processing
  • Hadoop: Storage + Compute Grid (mostly append; keep data alive forever)
  • Collection
  • Instrumentation

So What is Apache Hadoop?
• A scalable fault-tolerant distributed system for data storage and
  processing (open source under the Apache license)

• Core Hadoop has two main components:
    • Hadoop Distributed File System: self-healing high-bandwidth clustered storage
    • MapReduce: fault-tolerant distributed processing


• Key business values:
    •   Flexible – Store any data, Run any analysis (Mine First, Govern Later)
    •   Scalable – Start at 1TB/3-nodes then grow to petabytes/thousands of nodes
    •   Affordable – Cost per TB at a fraction of traditional options
    •   Open Source – No Lock-In, Rich Ecosystem, Large developer community
    •   Broadly adopted – A large and active ecosystem, Proven to run at scale

The Main Benefit: Agility/Flexibility

Schema-on-Write (RDBMS):
•   Schema must be created before data is loaded
•   Explicit load operation has to take place which transforms data to database internal structure
•   New columns must be added explicitly before data for such columns can be loaded into the database
•   Read is Fast
•   Benefit: Standards/Governance

Schema-on-Read (Hadoop):
•   Data is simply copied to the file store, no special transformation is needed
•   A SerDe (Serializer/Deserializer) is applied during read time to extract the required columns
•   New data can start flowing anytime and will appear retroactively once the SerDe is updated to parse them
•   Load is Fast
•   Benefit: Flexibility/Agility
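
A minimal sketch of the schema-on-read idea, assuming raw events arrive as JSON lines. The `read_columns` function is a hypothetical stand-in for a SerDe and is not Hive's actual SerDe interface; the sample records and column names are made up for illustration.

```python
import json

# Raw events are kept exactly as they arrived: no load-time transformation.
# (Sample data is invented for this sketch.)
RAW_LINES = [
    '{"user": "u1", "url": "/home"}',
    '{"user": "u2", "url": "/cart", "referrer": "ad-42"}',
]


def read_columns(lines, columns):
    """Hypothetical SerDe-like reader: parse each raw line at read time and
    extract only the requested columns, using None where a column is absent."""
    for line in lines:
        record = json.loads(line)
        yield tuple(record.get(col) for col in columns)


# The first version of the reader knows about two columns only.
v1 = list(read_columns(RAW_LINES, ["user", "url"]))

# Later the reader is updated to parse "referrer". Data stored before the
# update becomes visible retroactively, without reloading anything.
v2 = list(read_columns(RAW_LINES, ["user", "url", "referrer"]))
```

The stored lines never change between `v1` and `v2`; only the read-time parser does, which is the trade the slide describes: fast load and flexibility in exchange for doing the parsing work on every read.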

What is Complex Data Processing?
1. Java MapReduce: Gives the most flexibility and performance,
   but potentially long development cycle (the “assembly
   language” of Hadoop).
2. Streaming MapReduce (also Pipes): Allows you to develop in
   any programming language of your choice, but slightly lower
   performance and less flexibility.
3. Pig: A high-level language out of Yahoo, suitable for batch data
   flow workloads.
4. Hive: A SQL interpreter out of Facebook, also includes a meta-
   store mapping files to their schemas and associated SerDe.
5. Oozie: A PDL XML workflow server engine that enables creating
   a workflow of jobs composed of any of the above.
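
As an illustration of the map and reduce steps behind the options above, here is a word count written in the style of a Streaming job. It is a sketch only: in Hadoop Streaming the mapper and reducer are separate programs that read standard input and write tab-separated key/value lines, whereas this version uses in-process functions, and `run_local` is a hypothetical stand-in for the framework's shuffle and sort.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map step: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(sorted_pairs):
    """Reduce step: sum the counts of each run of identical keys.
    Expects pairs already sorted by key, as the shuffle would deliver them."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Local stand-in for the shuffle/sort between map and reduce."""
    return dict(reducer(sorted(mapper(lines))))


counts = run_local(["Hadoop stores data", "Hadoop processes data"])
```

Because the reducer only ever sees one key's values at a time, the same two functions could run on many machines in parallel, which is what the framework provides.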

What This Means For You: Agility

Up Front Design                                                Just in Time




What This Means For You: Innovation

   Data Committee                                              Data Scientist




What This Means For You: Consolidation

        Silos                                                           Sharing




What This Means For You: Extract Value from Latent Data

  Archive to Tape                                         Keep Data Alive




What This Means For You: Ability to Grow Fluidly
Benefit #2: Scalability




What This Means For You: Data Beats Algorithm

  Smarter Algos                                            More Data




Where Does Hadoop Fit in the Enterprise Data Stack?
[Diagram: Hadoop at the center of the enterprise data stack]
  • Users: System Operators, Data Architects, Data Scientists, Analysts, Business Users, Customers
  • Tools around Hadoop: Cloudera Mgmt Suite, ETL Tools, Development Tools (IDEs), Business Intelligence Tools (BI, Analytics, Enterprise Reporting)
  • Connected systems: Enterprise Data Warehouse, Low-Latency Serving Systems, Web Application
  • Data sources: Logs, Files, Web Data, Relational Databases

Use The Right Tool For The Right Job

Relational Databases, use when:
•   Interactive OLAP Analytics (<1sec)
•   Multistep ACID Transactions
•   100% SQL Compliance

Hadoop, use when:
•   Structured or Not (Agility)
•   Scalability of Storage/Compute
•   Complex Data Processing
Two Core Use Cases Common Across Many Industries

Use case: ADVANCED ANALYTICS     Industry          Use case: DATA PROCESSING
Social Network Analysis          Web               Clickstream Sessionization
Content Optimization             Media             Clickstream Sessionization
Network Analytics                Telco             Mediation
Loyalty & Promotions             Retail            Data Factory
Fraud Analysis                   Financial         Trade Reconciliation
Entity Analysis                  Federal           SIGINT
Sequencing Analysis              Bioinformatics    Genome Mapping
Product Quality                  Manufacturing     Mfg Process Tracking
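
To make one of the data processing entries concrete, the following sketch shows what clickstream sessionization involves: grouping each user's clicks into sessions separated by periods of inactivity. The 30-minute gap, the `(user, timestamp)` event shape and the `sessionize` name are assumptions for illustration; the slides do not specify them. On a cluster, the per-user grouping would be done by the shuffle and the splitting by the reduce step.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold, not from the slides


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (user, timestamp) click events into per-user sessions.
    A new session starts when the time since the user's previous click
    exceeds `gap` seconds."""
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - user_sessions[-1][-1] > gap:
                user_sessions.append([ts])    # gap exceeded: open a new session
            else:
                user_sessions[-1].append(ts)  # still within the current session
        sessions[user] = user_sessions
    return sessions


# Invented sample clicks: timestamps are seconds since an arbitrary start.
clicks = [("u1", 0), ("u1", 600), ("u1", 4000), ("u2", 100)]
result = sessionize(clicks)
```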



CDH: Cloudera’s Distribution Including Apache Hadoop
Components by function:
•   UI Framework: HUE
•   SDK: HUE SDK
•   Workflow: OOZIE
•   Scheduling: OOZIE
•   Metadata: HIVE
•   Languages / Compilers: PIG, HIVE
•   Data Integration: FLUME, SQOOP, ODBC
•   Fast Read/Write Access: HBASE
•   Coordination: ZOOKEEPER




•   Open Source – 100% Apache licensed, 100% Open Source, 100% Free.
•   Enterprise Ready – Predictable releases, Documentation, Hotfix Patches, Intensive QA
•   Integrated – All required component versions & dependencies are managed for you
•   Industry Standard – Existing RDBMS, ETL and BI systems work best with it
•   Many Form Factors – Public Cloud, Private Cloud, Ubuntu, RHEL, 32/64bit, etc

SCM Express: Simplifies Installation and Configuration

Service & Configuration Manager (SCM) Express takes the complexity out of
deploying and configuring CDH.

•   Provision a complete Hadoop stack in minutes
•   Centrally manage system services through a user-friendly interface
•   Manages services for up to 50 nodes
•   FREE to download

KEY FEATURES
1. Automated, wizard-based installation of the complete Hadoop stack
2. Central, real-time dashboard for configuration management
3. Ability to configure the cluster while it’s running
4. Incorporates comprehensive validation and error checking
5. Automates the expansion of services to new nodes when they come online
What is Cloudera Enterprise?

Cloudera Enterprise makes open source Apache Hadoop enterprise-easy:
•   Simplify and Accelerate Hadoop Deployment
•   Reduce Adoption Costs and Risks
•   Lower the Cost of Administration
•   Increase the Transparency & Control of Hadoop
•   Leverage the Experience of Our Experts

CLOUDERA ENTERPRISE COMPONENTS
•   Cloudera Management Suite: Comprehensive Toolset for Hadoop Administration
•   Production-Level Support: Our Team of Experts On-Call to Help You Meet Your SLAs

3 of the top 5 telecommunications, mobile services, defense & intelligence,
banking, media and retail organizations depend on Cloudera Enterprise

•   EFFECTIVENESS: Ensuring Repeatable Value from Apache Hadoop Deployments
•   EFFICIENCY: Enabling Apache Hadoop to be Affordably Run in Production



Hadoop World 2011

The largest gathering of Hadoop practitioners, developers,
business executives, industry luminaries and innovative
companies in the Hadoop ecosystem.

November 8-9, Sheraton New York Hotel & Towers, NYC
Learn more and register at www.hadoopworld.com
$50 discount for Strata attendees

•    1400 attendees, 25+ sponsors
•    60 sessions across 5 tracks for:
      – Business Decision Makers
      – Enterprise Architects
      – IT Operators
      – Data Scientists
      – Developers
•    Cloudera Training and Certification (November 7, 10, 11)



What I Would Like You To Remember:
• The Key Benefits of the Apache Hadoop Data Platform:
   • Agility/Flexibility (Enables Innovation/Exploration).
   • Complex Data Processing (Any Language, Any Problem).
   • Scalability of Storage/Compute (Freedom to Grow).
   • Economical Active Archive (Keep All Your Data Alive).

• Cloudera Enterprise enables you to:
   •   Lower the Cost of Management and Administration.
   •   Simplify and Accelerate Hadoop Deployment.
   •   Increase the Transparency & Control of Hadoop.
   •   Firm SLAs on Issue Resolution.
Contact Information:



          Amr Awadallah
        aaa@cloudera.com
           650-644-3921
   http://twitter.com/awadallah




Appendix



Hadoop Timeline

Timeline axis: 2002 to 2009. Events, in the order they appear along the axis:
•   Doug Cutting & Mike Cafarella started working on Nutch
•   Google publishes GFS & MapReduce papers
•   Doug Cutting adds DFS & MapReduce support to Nutch
•   Yahoo! hires Cutting, Hadoop spins out of Nutch
•   NY Times converts 4TB of image archives over 100 EC2s
•   Fastest sort of a TB, 3.5mins over 910 nodes
•   Facebook launches Hive: SQL Support for Hadoop
•   Cloudera Founded
•   Fastest sort of a TB, 62secs over 1,460 nodes
•   Sorted a PB in 16.25hours over 3,658 nodes
•   Doug Cutting joins Cloudera
•   Hadoop Summit 2009, 750 attendees
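
The sort records in the timeline can be turned into throughput figures with simple arithmetic. The sketch below does that, assuming decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), which the slides do not state; the sizes, durations and node counts are the ones quoted in the timeline.

```python
TB = 10**12  # assumes decimal terabytes
PB = 10**15  # assumes decimal petabytes

# (label, bytes sorted, elapsed seconds, node count), as quoted in the timeline
RUNS = [
    ("1 TB in 3.5 min", TB, 3.5 * 60, 910),
    ("1 TB in 62 s", TB, 62, 1460),
    ("1 PB in 16.25 h", PB, 16.25 * 3600, 3658),
]


def throughput(nbytes, seconds, nodes):
    """Return (aggregate GB/s, per-node MB/s) for one sort run."""
    aggregate = nbytes / seconds  # bytes per second across the whole cluster
    return aggregate / 1e9, aggregate / nodes / 1e6


results = {label: throughput(b, s, n) for label, b, s, n in RUNS}

for label, (gb_s, mb_s_node) in results.items():
    print(f"{label}: {gb_s:.1f} GB/s aggregate, {mb_s_node:.1f} MB/s per node")
```

Under these assumptions the 62-second run works out to roughly 16 GB/s across the cluster, and the petabyte sort sustains a similar aggregate rate for more than 16 hours.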


Cloudera’s Track Record
• Customers: Multiple customers with >1,000 Hadoop nodes under management
• Supporting dozens of diverse production use cases including ones that are revenue critical
  with tight SLAs

• Community: years of demonstrated leadership in the Apache Hadoop ecosystem.
  Cloudera employees are:
    • The largest contributor to the Hadoop ecosystem in patches
    • Founders of 70% of the projects in the Apache Hadoop ecosystem including Apache
      Hadoop itself
    • The first to build & integrate what is now the reference Hadoop stack

• Industry: Multiple years of experience providing Hadoop solutions across industries:
    • 2 of the top 5 payments companies run Cloudera
    • 3 of the top 5 commercial banks run Cloudera
    • 2 of the top 4 online travel companies run Cloudera


Cloudera Enterprise Management Suite

Activity Monitor
•   It helps you: consolidate all user activities into a real-time view; diagnose user performance; track activity metrics
•   So you can: improve performance; improve conformance to SLAs; improve QOS
•   It’s like: MySQL Enterprise Monitor; Quest Foglight for Oracle / SQL Server

Service & Configuration Manager
•   It helps you: manage system services; automate changes; validate settings; 1-click security
•   So you can: lower cost of administration; improve uptime
•   It’s like: Red Hat Satellite Server; Microsoft System Center; Oracle Enterprise Manager

Resource Manager
•   It helps you: report on the usage of scarce resources; plan for capacity expansion
•   So you can: improve quality of service; extend the life of the cluster
•   It’s like: VMware vCenter

Authorization Manager
•   It helps you: centralize management of all users, groups and privileges; manage permissions via delegated administration
•   So you can: lower the costs of administration; improve compliance
•   It’s like: Teradata security administration




CDH Integrates with Existing IT Infrastructure

   BI/Analytics   ETL                   Databases                 Cloud/OS      Hardware




                        Copyright © 2011, Cloudera, Inc. All Rights Reserved.              27
Copyright © 2011, Cloudera, Inc. All Rights Reserved.   28

Más contenido relacionado

La actualidad más candente

Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
 
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Accelerating Business Intelligence Solutions with Microsoft Azure   passAccelerating Business Intelligence Solutions with Microsoft Azure   pass
Accelerating Business Intelligence Solutions with Microsoft Azure passJason Strate
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Jonathan Seidman
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudJames Serra
 
Overview of Microsoft Appliances: Scaling SQL Server to Hundreds of Terabytes
Overview of Microsoft Appliances: Scaling SQL Server to Hundreds of TerabytesOverview of Microsoft Appliances: Scaling SQL Server to Hundreds of Terabytes
Overview of Microsoft Appliances: Scaling SQL Server to Hundreds of TerabytesJames Serra
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016James Serra
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 DataWorks Summit
 
Db2 analytics accelerator on ibm integrated analytics system technical over...
Db2 analytics accelerator on ibm integrated analytics system   technical over...Db2 analytics accelerator on ibm integrated analytics system   technical over...
Db2 analytics accelerator on ibm integrated analytics system technical over...Daniel Martin
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...Mark Rittman
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3xKinAnx
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsDataWorks Summit
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerDataWorks Summit
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Rittman Analytics
 
Dipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsDipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsBob Pusateri
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!DataWorks Summit
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetDataWorks Summit
 
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data avanttic Consultoría Tecnológica
 

La actualidad más candente (20)

Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Accelerating Business Intelligence Solutions with Microsoft Azure   passAccelerating Business Intelligence Solutions with Microsoft Azure   pass
Accelerating Business Intelligence Solutions with Microsoft Azure pass
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
Overview of Microsoft Appliances: Scaling SQL Server to Hundreds of Terabytes
Overview of Microsoft Appliances: Scaling SQL Server to Hundreds of TerabytesOverview of Microsoft Appliances: Scaling SQL Server to Hundreds of Terabytes
Overview of Microsoft Appliances: Scaling SQL Server to Hundreds of Terabytes
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Db2 analytics accelerator on ibm integrated analytics system technical over...
Db2 analytics accelerator on ibm integrated analytics system   technical over...Db2 analytics accelerator on ibm integrated analytics system   technical over...
Db2 analytics accelerator on ibm integrated analytics system technical over...
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Presentation big dataappliance-overview_oow_v3
Presentation   big dataappliance-overview_oow_v3Presentation   big dataappliance-overview_oow_v3
Presentation big dataappliance-overview_oow_v3
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streams
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
 
Dipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsDipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAs
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and Parquet
 
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
Meetup Oracle Database MAD: 2.1 Data Management Trends: SQL, NoSQL y Big Data
 







Business Intelligence and Data Analytics Revolutionized with Apache Hadoop

  • 1. How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics
      Strata Conference, Sept 22nd 2011, New York, NY
      Dr. Amr Awadallah, Founder, CTO, VP of Engineering
      aaa@cloudera.com, twitter: @awadallah
  • 2. Business Intelligence Before Adopting Apache Hadoop
      Stack (top to bottom):
        • BI Reports + Interactive Apps
        • RDBMS (processed data)
        • ETL Compute Grid
        • Storage Only Grid (original raw data)
        • Mostly Append
        • Collection
        • Instrumentation
      Problems called out:
        • Can’t Explore Original High Fidelity Raw Data
        • Moving Data To Compute Doesn’t Scale
        • Archiving = Premature Data Death
  • 3. Business Intelligence After Adopting Apache Hadoop
      Stack (top to bottom):
        • BI Reports + Interactive Apps
        • RDBMS
        • ETL and Aggregations
        • Hadoop: Storage + Compute Grid
        • Mostly Append
        • Collection
        • Instrumentation
      Capabilities called out:
        • Data Exploration & Advanced Analytics
        • Complex Data Processing
        • Keep Data Alive For Ever
  • 4. So What is Apache Hadoop?
      • A scalable fault-tolerant distributed system for data storage and processing (open source under the Apache license)
      • Core Hadoop has two main components:
          • Hadoop Distributed File System: self-healing high-bandwidth clustered storage
          • MapReduce: fault-tolerant distributed processing
      • Key business values:
          • Flexible – Store any data, Run any analysis (Mine First, Govern Later)
          • Scalable – Start at 1TB/3-nodes then grow to petabytes/thousands of nodes
          • Affordable – Cost per TB at a fraction of traditional options
          • Open Source – No Lock-In, Rich Ecosystem, Large developer community
          • Broadly adopted – A large and active ecosystem, Proven to run at scale
  • 5. The Main Benefit: Agility/Flexibility
      Schema-on-Write (RDBMS):
        • Schema must be created before data is loaded
        • Explicit load operation has to take place which transforms data to database internal structure
        • New columns must be added explicitly before data for such columns can be loaded into the database
        • Benefits: Read is Fast, Standards/Governance
      Schema-on-Read (Hadoop):
        • Data is simply copied to the file store, no special transformation is needed
        • A SerDe (Serializer/Deserializer) is applied during read time to extract the required columns
        • New data can start flowing anytime and will appear retroactively once the SerDe is updated to parse them
        • Benefits: Load is Fast, Flexibility/Agility
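The schema-on-read idea from slide 5 can be illustrated with a small Python sketch. This is not Hive's SerDe API; the sample log lines, the column names, and the `read_with_schema` helper are all hypothetical and only show the principle: raw records are stored untouched, and a column list is applied when the data is read.

```python
import csv
import io

# Raw, append-only log lines stored as-is (hypothetical sample data).
RAW_LINES = [
    "2011-09-22,alice,/home",
    "2011-09-22,bob,/search,mobile",  # a newer record carrying an extra field
]

def read_with_schema(lines, columns):
    """Apply a column list at read time; fields a record lacks become None."""
    for fields in csv.reader(io.StringIO("\n".join(lines))):
        padded = fields + [None] * (len(columns) - len(fields))
        # zip() stops at the shorter input, so extra fields are ignored.
        yield dict(zip(columns, padded))

# Version 1 of the reader only knows three columns.
v1 = list(read_with_schema(RAW_LINES, ["date", "user", "page"]))

# Version 2 adds "device"; the stored data is re-read without any reload.
v2 = list(read_with_schema(RAW_LINES, ["date", "user", "page", "device"]))
```

Updating the column list is the only change needed for the extra field to appear, including for records that were stored before the reader knew about it. That is the retroactive behaviour the slide attributes to updating the SerDe.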
  • 6. What is Complex Data Processing?
      1. Java MapReduce: Gives the most flexibility and performance, but potentially long development cycle (the “assembly language” of Hadoop).
      2. Streaming MapReduce (also Pipes): Allows you to develop in any programming language of your choice, but slightly lower performance and less flexibility.
      3. Pig: A high-level language out of Yahoo, suitable for batch data flow workloads.
      4. Hive: A SQL interpreter out of Facebook, also includes a metastore mapping files to their schemas and associated SerDe.
      5. Oozie: A PDL XML workflow server engine that enables creating a workflow of jobs composed of any of the above.
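As a rough illustration of the Streaming MapReduce option on slide 6, here is a word count written in Python in the style of a streaming job, where mapper and reducer exchange tab-separated key/value text records. The function names and the `run_locally` helper are hypothetical; the helper only imitates the framework's sort-by-key step in a single process so the sketch can run without a cluster.

```python
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_records):
    """Sum counts per key; assumes records arrive sorted by key."""
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def run_locally(lines):
    """Stand-in for the framework: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))

result = run_locally(["Hadoop stores data", "Hadoop processes data"])
```

In a real streaming job the mapper and reducer would be separate scripts reading standard input and writing standard output, with the framework handling the sort and the distribution across nodes.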
  • 7. What This Means For You: Agility
      Up Front Design vs. Just in Time
  • 8. What This Means For You: Innovation
      Data Committee vs. Data Scientist
  • 9. What This Means For You: Consolidation
      Silos vs. Sharing
  • 10. What This Means For You: Extract Value from Latent Data
      Archive to Tape vs. Keep Data Alive
  • 11. What This Means For You: Ability to Grow Fluidly
      Benefit #2: Scalability
  • 12. What This Means For You: Data Beats Algorithm
      Smarter Algos vs. More Data
  • 13. Where Does Hadoop Fit in the Enterprise Data Stack?
      People: Data Scientists, Analysts, Business Users, System Operators, Data Architects, Customers
      Tools: IDEs (Development Tools), BI, Analytics (Business Intelligence Tools), Enterprise Reporting, Cloudera Mgmt Suite, ETL Tools
      Systems: Enterprise Data Warehouse, Low-Latency Serving Systems, Web Application
      Data sources: Logs, Files, Web Data, Relational Databases
  • 14. Use The Right Tool For The Right Job
      Relational Databases: Use when:
        • Interactive OLAP Analytics (<1sec)
        • Multistep ACID Transactions
        • 100% SQL Compliance
      Hadoop: Use when:
        • Structured or Not (Agility)
        • Scalability of Storage/Compute
        • Complex Data Processing
  • 15. Two Core Use Cases Common Across Many Industries
      (Industry: ADVANCED ANALYTICS application / DATA PROCESSING application)
        • Web: Social Network Analysis / Clickstream Sessionization
        • Media: Content Optimization / Clickstream Sessionization
        • Telco: Network Analytics / Mediation
        • Retail: Loyalty & Promotions / Data Factory
        • Financial: Fraud Analysis / Trade Reconciliation
        • Federal: Entity Analysis / SIGINT
        • Bioinformatics: Sequencing Analysis / Genome Mapping
        • Manufacturing: Product Quality / Mfg Process Tracking
  • 16. CDH: Cloudera’s Distribution Including Apache Hadoop
      [Stack diagram:]
        • UI Framework: HUE
        • SDK: HUE SDK
        • Workflow: OOZIE
        • Scheduling: OOZIE
        • Metadata: HIVE
        • Languages / Compilers: PIG, HIVE
        • Data Integration: FLUME, SQOOP, ODBC
        • Fast Read/Write Access: HBASE
        • Coordination: ZOOKEEPER
      • Open Source: 100% Apache licensed, 100% Open Source, 100% Free.
      • Enterprise Ready: Predictable releases, Documentation, Hotfix Patches, Intensive QA.
      • Integrated: All required component versions & dependencies are managed for you.
      • Industry Standard: Existing RDBMS, ETL and BI systems work best with it.
      • Many Form Factors: Public Cloud, Private Cloud, Ubuntu, RHEL, 32/64bit, etc.
  • 17. SCM Express: Simplifies Installation and Configuration
      Service & Configuration Manager (SCM) Express takes the complexity out of deploying and configuring CDH.
        • Provision a complete Hadoop stack in minutes
        • Centrally manage system services through a user-friendly interface
        • Manages services for up to 50 nodes
        • FREE to download
      KEY FEATURES
        1. Automated, wizard-based installation of the complete Hadoop stack
        2. Central, real-time dashboard for configuration management
        3. Ability to configure the cluster while it’s running
        4. Incorporates comprehensive validation and error checking
        5. Automates the expansion of services to new nodes when they come online
  • 18. What is Cloudera Enterprise?
      Cloudera Enterprise makes open source Apache Hadoop enterprise-easy
        • Simplify and Accelerate Hadoop Deployment
        • Reduce Adoption Costs and Risks
        • Lower the Cost of Administration
        • Increase the Transparency & Control of Hadoop
        • Leverage the Experience of Our Experts
      CLOUDERA ENTERPRISE COMPONENTS
        • Cloudera Management Suite: Comprehensive Toolset for Hadoop Administration
        • Production-Level Support: Our Team of Experts On-Call to Help You Meet Your SLAs
      3 of the top 5 telecommunications, mobile services, defense & intelligence, banking, media and retail organizations depend on Cloudera Enterprise
        • EFFECTIVENESS: Ensuring Repeatable Value from Apache Hadoop Deployments
        • EFFICIENCY: Enabling Apache Hadoop to be Affordably Run in Production
  • 19. Hadoop World 2011
      The largest gathering of Hadoop practitioners, developers, business executives, industry luminaries and innovative companies in the Hadoop ecosystem.
        • 1400 attendees, 25+ sponsors
        • 60 sessions across 5 tracks for:
            • Business Decision Makers
            • Enterprise Architects
            • IT Operators
            • Data Scientists
            • Developers
        • Cloudera Training and Certification (November 7, 10, 11)
      November 8-9, Sheraton New York Hotel & Towers, NYC
      Learn more and register at www.hadoopworld.com
      $50 discount for Strata attendees
  • 20. What I Would Like You To Remember:
      • The Key Benefits of the Apache Hadoop Data Platform:
          • Agility/Flexibility (Enables Innovation/Exploration).
          • Complex Data Processing (Any Language, Any Problem).
          • Scalability of Storage/Compute (Freedom to Grow).
          • Economical Active Archive (Keep All Your Data Alive).
      • Cloudera Enterprise enables:
          • Lower Cost of Management and Administration.
          • Simpler and Faster Hadoop Deployment.
          • Increased Transparency & Control of Hadoop.
          • Firm SLAs on Issue Resolution.
  • 21. Contact Information: Amr Awadallah, aaa@cloudera.com, 650-644-3921, http://twitter.com/awadallah
  • 23. Appendix
  • 24. Hadoop Timeline
      [Timeline spanning 2002 to 2009. Events recoverable from the slide, in approximate chronological order:]
        • Doug Cutting & Mike Cafarella started working on Nutch
        • Google publishes GFS & MapReduce papers
        • Doug Cutting adds DFS & MapReduce support to Nutch
        • Yahoo! hires Cutting, Hadoop spins out of Nutch
        • NY Times converts 4TB of image archives over 100 EC2s
        • Fastest sort of a TB, 3.5mins over 910 nodes
        • Facebook launches Hive: SQL Support for Hadoop
        • Cloudera Founded
        • Fastest sort of a TB, 62secs over 1,460 nodes
        • Sorted a PB in 16.25hours over 3,658 nodes
        • Doug Cutting joins Cloudera
        • Hadoop Summit 2009, 750 attendees
  • 25. Cloudera’s Track Record
      • Customers: Multiple customers with >1,000 Hadoop nodes under management
          • Supporting dozens of diverse production use cases, including ones that are revenue critical with tight SLAs
      • Community: Years of demonstrated leadership in the Apache Hadoop ecosystem. Cloudera employees are:
          • The largest contributor to the Hadoop ecosystem in patches
          • Founders of 70% of the projects in the Apache Hadoop ecosystem, including Apache Hadoop itself
          • The first to build & integrate what is now the reference Hadoop stack
      • Industry: Multiple years of experience providing Hadoop solutions across industries:
          • 2 of the top 5 payments companies run Cloudera
          • 3 of the top 5 commercial banks run Cloudera
          • 2 of the top 4 online travel companies run Cloudera
  • 26. Cloudera Enterprise Management Suite
      • Activity Monitor
          • It Helps You…: Consolidate all user activities into a real-time view; Diagnose user performance; Track activity metrics
          • So You Can…: Improve performance; Improve conformance to SLAs; Improve QOS
          • It’s Like…: MySQL Enterprise Monitor; Quest Foglight for Oracle / SQL Server
      • Service & Configuration Manager
          • It Helps You…: Manage system services; Automate changes; Validate settings; 1-click security
          • So You Can…: Lower cost of administration; Improve uptime
          • It’s Like…: Red Hat Satellite Server; Microsoft System Center; Oracle Enterprise Manager
      • Resource Manager
          • It Helps You…: Report on the usage of scarce resources; Plan for capacity expansion
          • So You Can…: Improve quality of service; Extend the life of the cluster
          • It’s Like…: VMware vCenter
      • Authorization Manager
          • It Helps You…: Centralize management of all users, groups and privileges; Manage permissions via delegated administration
          • So You Can…: Lower the costs of administration; Improve compliance
          • It’s Like…: Teradata security administration
  • 27. CDH Integrates with Existing IT Infrastructure
      [Partner logos grouped by category: BI/Analytics, ETL, Databases, Cloud/OS, Hardware]