Integrating Hadoop into the Enterprise
Jonathan Seidman
Hadoop Summit 2012
June 14th, 2012
Who I Am

    • Solutions Architect, Partner Engineering
      Team.
    • Co-founder of Chicago Hadoop User
      Group and co-founder/organizer of
      Chicago Big Data.
    • jseidman@cloudera.com
    • @jseidman
    • cloudera.com/careers

                     ©2012 Cloudera, Inc. All Rights Reserved.
What I’ll Be Talking About
    • Some Background.
    • Common uses of Hadoop in an enterprise data
      infrastructure.
    • Hadoop Integration – the big picture.
    • Deeper dive:
      – Data import/export: Moving data between Hadoop
        and existing data stores.
      – ETL tools.
      – Business intelligence (BI) and analytic tools.
    • Example architectures and data flows.
    • Conclusions


My Life Before Cloudera…




Hadoop at Orbitz
[Chart: two series, Queries and Searches; y-axis 0% to 100%, x-axis 1 to 20. Labeled points: 71.67% and 31.87% at the left of the chart, 34.30% and 2.78% further right.]




But Hadoop Was An Isolated System



[Diagram labels: Developers; Business Users; Analysts; Normal Humans]




Hadoop + the Data Warehouse…




…Enabled New Analyses




In our opinion, integration with existing IT systems
and software is critical, as we know enterprises will
not be replacing these technologies anytime soon.

    For Hadoop platforms this means integration with
    existing databases, data warehouses, and
    business-analytics and business-visualization
    tools. *




    * A near-term outlook for big data, Jo Maitland, GigaOM Pro, March 2012


What Can We Do?
 • ETL
     – Scalable ETL – allows companies to meet SLAs
       (inexpensively).
     – Agile – facilitates rapid modifications.
 • Moving analysis off of existing systems.
 • Sandbox for exploratory analytics.
 • Using Hadoop as an active archive.
 • Joining transactional data from a DB with
   interaction data.
 • Common theme: freeing up existing systems for
   tasks they’re better suited for.


[Diagram: integration points. BI/Analytics Tools; Enterprise Data Warehouse; Relational Databases; Flume; Data Import/Export; ETL Tools; Appliances; NoSQL]
Data Import/Export



[Diagram: Enterprise Data Warehouse; Relational Databases]




Sqoop Overview

 • Apache project designed to ease import
   and export of data between Hadoop and
   relational databases.
 • Provides functionality to do bulk imports
   and exports of data with HDFS, Hive and
   HBase.
 • Java based. Leverages MapReduce to
   transfer data in parallel.
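
To make the parallel transfer concrete, here is a minimal Python sketch of how a numeric key range might be divided among map tasks so that each task pulls its own slice of a table. It illustrates the general idea only and is not Sqoop's actual code; the function names `split_ranges` and `slice_queries`, and the `orders`/`id` table and column, are hypothetical.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive integer key range [lo, hi] into contiguous chunks.

    Returns a list of (start, end) pairs, one per mapper, that together cover
    the range exactly once.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        return []
    total = hi - lo + 1
    # Never create more chunks than there are keys.
    chunks = min(num_mappers, total)
    base, extra = divmod(total, chunks)
    ranges = []
    start = lo
    for i in range(chunks):
        # The first `extra` chunks take one additional key.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def slice_queries(table, column, ranges):
    """Build one bounded SELECT per chunk (hypothetical table/column names)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {a} AND {column} <= {b}"
        for a, b in ranges
    ]


if __name__ == "__main__":
    for query in slice_queries("orders", "id", split_ranges(1, 1000, 4)):
        print(query)
```

Each map task would run one of these bounded queries, so the slices can be pulled concurrently. An evenly distributed key is assumed; a skewed key would leave some tasks with far more rows than others.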


Sqoop Overview

 • Uses a “connector” abstraction.
 • Two types of connectors
     – Standard connectors are JDBC based.
     – Direct connectors use native database
       interfaces to improve performance.
 • Direct connectors are available for many
   open-source and commercial databases –
   MySQL, PostgreSQL, Oracle, SQL
   Server, Teradata, etc.

Sqoop Import Flow

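 1. The client runs the import through Sqoop.
 2. Sqoop collects metadata from the database.
 3. Sqoop generates code and executes a MapReduce job.
 4. The map tasks pull the data in parallel.
 5. The map tasks write the data to Hadoop.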




Sqoop Limitations

 Sqoop has some limitations, including:
 • Poor support for security.
       $ sqoop import --username scott --password tiger…
     – Sqoop can read command line options from
       an option file, but this still has holes.
 • Error-prone syntax.
 • Tight coupling to JDBC model – not a
   good fit for non-RDBMS systems.
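
As a hedged illustration of the option-file workaround, the Python sketch below writes a Sqoop options file readable only by its owner and builds a command line that leaves the password off entirely. It assumes the Sqoop 1 `--options-file` option and the `-P` flag (prompt for the password on the console); check both against your version's documentation. The connection URL, user name, table name, and helper names are hypothetical.

```python
import os
import stat
import tempfile


def write_options_file(path, connect_url, username):
    """Write a Sqoop options file (one option or value per line), owner-only."""
    lines = ["import", "--connect", connect_url, "--username", username]
    # Create the file with mode 0600 so other users cannot read it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")


def build_command(options_path, table):
    """Return the argv list; -P is assumed to make Sqoop prompt for the password."""
    return ["sqoop", "--options-file", options_path, "--table", table, "-P"]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    options_path = os.path.join(workdir, "import.txt")
    write_options_file(options_path, "jdbc:mysql://db.example.com/sales", "scott")
    # The command is only printed here, not executed.
    print(" ".join(build_command(options_path, "orders")))
    print(oct(stat.S_IMODE(os.stat(options_path).st_mode)))
```

This keeps the password out of the process list and shell history, but it only narrows the exposure: anyone who can read the file or the account still sees the connection details.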


Fortunately…

 Sqoop 2 (incubating) will address many of
 these limitations:
 •   Adds a web-based GUI.
 •   Centralized configuration.
 •   More flexible model.
 •   Improved security model.



Informatica PowerExchange

 • Not just RDBMS integration – provides
   consistent, native integration between
   Hadoop and a range of data
   sources, databases, legacy
   systems, standard file formats, CRM…
 • Integrated with PowerCenter for pre/post-
   processing of data, administration, and
   metadata management.


PowerExchange – Data Import

 Access Data → Pre-Process → Ingest Data

 • Sources: Web server; Databases, Data Warehouse; Message Queues, Email, Social Media; ERP, CRM; Mainframe.
 • Access Data (PowerExchange): Batch, CDC, Real-time.
 • Pre-Process (PowerCenter): e.g. Filter, Join, Cleanse.
 • Ingest Data: HDFS, Hive.




PowerExchange – Data Export

 Extract Data → Post-Process → Deliver Data

 • Extract Data: HDFS.
 • Post-Process (PowerCenter): e.g. Transform to target schema.
 • Deliver Data (PowerExchange): Batch, Real-time.
 • Targets: Web server; Databases, Data Warehouse; ERP, CRM; Mainframe.




Informatica PowerExchange
 1. Create Ingest or Extract Mapping
 2. Create Hadoop Connection
 3. Configure Workflow
 4. Configure Hive Properties




There’s Always the Low-Tech Way…

[Diagram: Hadoop Processing → Hive → Local Disk → GPLoad → GreenPlum (three nodes)]



ETL Tools




ETL – The Wikipedia Definition

 • Extract, transform and load (ETL) is a
   process in database usage and especially
   in data warehousing that involves:
     – Extracting data from outside sources
     – Transforming it to fit operational needs
     – Loading it into the end target (DB or data
       warehouse)

           http://en.wikipedia.org/wiki/Extract,_transform,_load
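
The three steps in this definition can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a recommended pipeline: the CSV input, the `payments` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the end target.

```python
import csv
import io
import sqlite3

RAW_CSV = """user_id,amount,currency
1, 10.50 ,usd
2,3.00,USD
3,not-a-number,usd
"""


def extract(text):
    """Extract: read rows from an outside source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize types and casing, skipping rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue
        clean.append((int(row["user_id"]), amount, row["currency"].strip().upper()))
    return clean


def load(conn, records):
    """Load: insert the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(user_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(RAW_CSV)))
    print(conn.execute("SELECT COUNT(*), SUM(amount) FROM payments").fetchone())
```

Keeping the three stages as separate functions mirrors the definition and makes it easy to swap the source or the target without touching the transformation logic.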



ETL Tools

 • Very common use case for Hadoop.
 • Most ETL in Hadoop is still done through
   plain old MapReduce.
 • Companies want to leverage their existing
   developer skills – many enterprises have
   armies of SQL and ETL developers.




Informatica HParser

 • Not exactly ETL – provides data
   transformation and parsing optimized for
   parallel processing on Hadoop.
 • Supports deeply hierarchical data and
   complex data formats.
 • Transformations are defined in a Windows
   UI and then deployed to a Hadoop Cluster
   for execution.


HParser – How does it work?
                                         hadoop … dt-hadoop.jar
                                         … My_Parser /input/*/input*.txt

                                                                              HDFS




1. Develop a DT transformation
2. Deploy the transformation to Hadoop
3. Run DT on Hadoop to produce
   tabular data
4. Analyze the data with HIVE / PIG /
   MapReduce / Other…
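
To show what "producing tabular data" from hierarchical input can mean in step 3, here is a small Python sketch that flattens nested JSON records into tab-separated rows of the kind Hive can read. It does not use or imitate HParser's interface; the record layout, column names, and function names are hypothetical.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into a single dict with dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def to_rows(json_lines, columns):
    """Turn JSON-lines text into tab-separated rows with a fixed column order."""
    rows = []
    for line in json_lines.splitlines():
        if not line.strip():
            continue
        flat = flatten(json.loads(line))
        # Missing columns become empty strings so every row has the same width.
        rows.append("\t".join(str(flat.get(col, "")) for col in columns))
    return rows


if __name__ == "__main__":
    sample = '{"user": {"id": 7, "geo": {"city": "Chicago"}}, "event": "search"}\n'
    for row in to_rows(sample, ["user.id", "user.geo.city", "event"]):
        print(row)
```

Fixing the column order up front is the design choice that matters here: downstream tools expect every row to have the same shape, whatever the nesting of the input.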



Pentaho

 • Existing BI tools extended to support
   Hadoop.
 • Not just ETL – also provides data
   import/export, job
   orchestration, reporting, and analysis
   functionality.
 • Supports integration with HDFS, Hive and
   HBase.
 • Community and Enterprise Editions
   offered.
Pentaho
 • Primary component is Pentaho Data Integration (PDI), also
   known as Kettle.
 • PDI provides a graphical drag-and-drop environment for
   defining ETL jobs, which interface with Java MapReduce
   to execute in-cluster transformations.

Other ETL Solutions

 • Talend
     – Also following an open-source model.
     – Extending their existing data integration tools
       to support Hadoop.
 • Pervasive RushAnalyzer
     – Software to build and run big data ETL, data
       transformation, mining and visualization on
       Hadoop.


Business Intelligence/Analytics Tools




BI – The Forrester Research Definition

 "Business Intelligence is a set of
 methodologies, processes, architectures, and
 technologies that transform raw data into
 meaningful and useful information used to
 enable more effective strategic, tactical, and
 operational insights and decision-making.” *


 * http://en.wikipedia.org/wiki/Business_intelligence


Business Intelligence/Analytics Tools




[Diagram: Relational Databases; Data Warehouses; …]




Cloudera ODBC Driver
 • Most of these tools use the ODBC standard.
 • Since Hive is an SQL-like system, it’s a good fit for ODBC.
 • An ODBC driver for Hive is available, but has licensing
   issues.
 • Because of this, Cloudera developed its own
   drivers, available for free download.

 [Diagram: ODBC Driver → HiveQL → Hive Server → Hive]
Hive ODBC Limitations

 • Hive does not have full SQL support.
 • Multi-user access is not currently supported by
   Hive Server.
 • Poor support for security.
 • Dependent on Hive – data must be loaded
   in Hive to be available.
 • The Thrift API in the Hive Server doesn’t
   support common ODBC calls.

Hive ODBC Limitations

The Hive community is working on Hive Server 2 to
address some of these limitations:
 • Improved support for multiple users.
 • Improved support for ODBC and JDBC
   drivers.
 • And better support for security is coming.




MicroStrategy




Tableau




Other BI Connectors

 • Microsoft ODBC Driver
     – Part of the Hadoop on Windows solution.
     – Provides connectivity for MS BI tools such as
       Excel, PowerPivot, etc.
 • MapR ODBC driver
     – Support for standard ODBC-based tools.




Analytic Tools


     – RHadoop project.

     – Integration of SAS analytics with Hadoop.

     – Integration of SAP HANA with Hadoop

     – Toad for Cloud


Hadoop Specific Tools – Karmasphere




Hadoop Specific Tools – Datameer




Example Integration




[Flow: Event Logs → HParser → Hive → PowerCenter/PowerExchange → Data Warehouse]




 https://community.informatica.com/mpresources/Communities/IW2012/Docs/bos_65.pdf



Example – Migration of ETL


 • Before: Logs → Raw Tables → ETL (SQL) → Target Tables, all inside the Data Warehouse.
 • After: Logs → Flume → HDFS → ETL (MapReduce) → Sqoop → Target Tables in the Data Warehouse.
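
The "ETL (MapReduce)" stage in the migrated flow can be pictured with a small single-process simulation in Python. It is a sketch of the map, shuffle, and reduce pattern only, not a Hadoop job; the log line layout (date first, at least three whitespace-separated fields) and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_line(line):
    """Map: emit (date, 1) for each well-formed log line; skip the rest."""
    parts = line.split()
    if len(parts) >= 3:
        yield parts[0], 1


def reduce_counts(key, values):
    """Reduce: total the counts for one key."""
    return key, sum(values)


def run_job(lines):
    """Simulate map, shuffle (group by key), and reduce in a single process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            groups[key].append(value)
    return [reduce_counts(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    logs = [
        "2012-06-14 GET /search",
        "2012-06-14 GET /book",
        "2012-06-15 GET /search",
        "garbage",
    ]
    for day, hits in run_job(logs):
        print(day, hits)
```

The aggregated rows are the kind of output that would then be exported to the warehouse's target tables, which is the step Sqoop fills in the flow above.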



What’s Missing?

 • Better tools for ETL without coding.
 • Better tools for data governance, data
   quality, etc.
     – Ensuring that data in Hadoop complies with
       policies, rules, etc.
 • Integration with commercial enterprise
   schedulers/workflow engines.
     – Although open-source workflow schedulers
       exist (e.g. Oozie).


Conclusions
 • Hadoop integration is still in the early stages.
     – Expect to see new/better tools coming from both vendors
       and the open-source community.
 • Despite the relative immaturity of this space, there’s
   already a dizzying array of solutions available.
     – Choose solutions based on existing skills and tools already
       in use by your organization.
 • If using current BI tools integrated with Hive, keep in
   mind that enhancements for multi-user, security, etc.
   are on the way.
 • And it bears repeating: always use the right tool for the
   job.
     – Hadoop won’t replace your data warehouses and
       databases, but will complement them.


Thank You! Questions?

 http://www.cloudera.com/partners/spotlight/

 +1 (888) 789-1488 · sales@cloudera.com
 cloudera.com · twitter.com/cloudera · facebook.com/cloudera




50
             ©2011 Cloudera, Inc. All Rights Reserved.

Más contenido relacionado

La actualidad más candente

Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondTeradata Aster
 
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo SlidesWebinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo SlidesCloudera, Inc.
 
Talend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformTalend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformHortonworks
 
Best Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopBest Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopHortonworks
 
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsIntroduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsHortonworks
 
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Hortonworks
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks
 
Boston HUG - Cloudera presentation
Boston HUG - Cloudera presentationBoston HUG - Cloudera presentation
Boston HUG - Cloudera presentationreedshea
 
Oracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleOracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleHarald Erb
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoopHortonworks
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesOReillyStrata
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANASAP Technology
 
Data Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache HadoopData Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache HadoopHortonworks
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageHortonworks
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureVinod Kumar Vavilapalli
 
Enterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the UnionEnterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the UnionHortonworks
 
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookA Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookBigDataCloud
 
Hortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks
 

La actualidad más candente (20)

Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and Beyond
 
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo SlidesWebinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides
 
Talend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformTalend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data Platform
 
Oracle in Database Hadoop
Oracle in Database HadoopOracle in Database Hadoop
Oracle in Database Hadoop
 
Best Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache HadoopBest Practices for Virtualizing Apache Hadoop
Best Practices for Virtualizing Apache Hadoop
 
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsIntroduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for Windows
 
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
 
Boston HUG - Cloudera presentation
Boston HUG - Cloudera presentationBoston HUG - Cloudera presentation
Boston HUG - Cloudera presentation
 
Oracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by ExampleOracle Unified Information Architeture + Analytics by Example
Oracle Unified Information Architeture + Analytics by Example
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoop
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANA
 
Data Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache HadoopData Discovery, Visualization, and Apache Hadoop
Data Discovery, Visualization, and Apache Hadoop
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble Storage
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
 
Enterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the UnionEnterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the Union
 
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookA Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
 
Hortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinar
 

Similar a Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise

Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Integrating Hadoop Into the Enterprise – Hadoop Summit 2012
Integrating Hadoop Into the Enterprise – Hadoop Summit 2012Jonathan Seidman
 
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopBusiness Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopCloudera, Inc.
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...Amr Awadallah
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprisesmarkgrover
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Jonathan Seidman
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Hortonworks
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Data Lakes on Public Cloud: Breaking Data Management Monoliths
Data Lakes on Public Cloud: Breaking Data Management MonolithsData Lakes on Public Cloud: Breaking Data Management Monoliths
Data Lakes on Public Cloud: Breaking Data Management MonolithsItai Yaffe
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopCloudera, Inc.
 
What it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready stateWhat it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready stateClouderaUserGroups
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaJeffrey T. Pollock
 
Hortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics ApplicationsHortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics Applicationsrussell_jurney
 
A new platform for a new era emc
A new platform for a new era   emcA new platform for a new era   emc
A new platform for a new era emcTaldor Group
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Managementrightsize
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Stefan Lipp
 
Seattle Scalability - Sept Meetup
Seattle Scalability - Sept MeetupSeattle Scalability - Sept Meetup
Seattle Scalability - Sept Meetupclive boulton
 

Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise

  • 1. Integrating Hadoop into the Enterprise Jonathan Seidman Hadoop Summit 2012 June 14th, 2012
  • 2. Who I Am • Solutions Architect, Partner Engineering Team. • Co-founder of Chicago Hadoop User Group and co-founder/organizer of Chicago Big Data. • jseidman@cloudera.com • @jseidman • cloudera.com/careers 2 ©2012 Cloudera, Inc. All Rights Reserved.
  • 3. What I’ll Be Talking About • Some Background. • Common uses of Hadoop in an enterprise data infrastructure. • Hadoop Integration – the big picture. • Deeper dive: – Data import/export: Moving data between Hadoop and existing data stores. – ETL tools. – Business intelligence (BI) and analytic tools. • Example architectures and data flows. • Conclusions
  • 4. My Life Before Cloudera…
  • 5. Hadoop at Orbitz [Chart: Queries vs. Searches, y-axis 0% to 100%, x-axis 1 to 20; labeled values 71.67%, 34.30%, 31.87%, 2.78%]
  • 6. But Hadoop Was An Isolated System [Diagram labels: Developers; Business Analysts; Normal Users; Humans]
  • 7. Hadoop + the Data Warehouse…
  • 8. …Enabled New Analyses
  • 9. In our opinion, integration with existing IT systems and software is critical, as we know enterprises will not be replacing these technologies anytime soon. For Hadoop platforms this means integration with existing databases, data warehouses, and business-analytics and business-visualization tools. * * A near-term outlook for big data, Jo Maitland, GigaOM Pro, March 2012
  • 10. What Can We Do? • ETL – Scalable ETL – allows companies to meet SLAs (inexpensively). – Agile – facilitates rapid modifications. • Moving analysis off of existing systems. • Sandbox for exploratory analytics. • Using Hadoop as an active archive. • Joining transactional data from a DB with interaction data. • Common theme: freeing up existing systems for tasks they’re better suited for.
  • 11. [Diagram, Hadoop integration big picture: BI/Analytics Tools; Enterprise Data Warehouse; Relational Databases; Flume; Data Import/Export; ETL Tools; Appliances; NoSQL]
  • 12. Data Import/Export [Diagram labels: Enterprise Data Warehouse; Relational Databases]
  • 13. Sqoop Overview • Apache project designed to ease import and export of data between Hadoop and relational databases. • Provides functionality to do bulk imports and exports of data with HDFS, Hive and HBase. • Java based. Leverages MapReduce to transfer data in parallel.
  • 14. Sqoop Overview • Uses a “connector” abstraction. • Two types of connectors – Standard connectors are JDBC based. – Direct connectors use native database interfaces to improve performance. • Direct connectors are available for many open-source and commercial databases – MySQL, PostgreSQL, Oracle, SQL Server, Teradata, etc.
  • 15. Sqoop Import Flow [Diagram: Client runs import → Sqoop collects metadata, generates code, and executes an MR job → MapReduce map tasks pull data → write to Hadoop]
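The parallel transfer in the import flow above can be pictured as each map task pulling its own slice of the source table. The sketch below is not Sqoop's code; it assumes an integer split column whose minimum and maximum are already known, and the names `split_ranges` and `split_queries`, the table `bookings`, and the column `booking_id` are all hypothetical.

```python
from typing import List, Tuple


def split_ranges(low: int, high: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive key range [low, high] into contiguous, near-equal chunks."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if high < low:
        raise ValueError("high must be >= low")
    total = high - low + 1
    # Never create more chunks than there are keys.
    chunks = min(num_mappers, total)
    base, extra = divmod(total, chunks)
    ranges = []
    start = low
    for i in range(chunks):
        # The first `extra` chunks take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def split_queries(table: str, column: str, low: int, high: int, num_mappers: int) -> List[str]:
    """Build one bounded SELECT per map task (illustrative SQL only)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} <= {hi}"
        for lo, hi in split_ranges(low, high, num_mappers)
    ]


if __name__ == "__main__":
    for query in split_queries("bookings", "booking_id", 1, 100, 4):
        print(query)
```

Each query covers a disjoint range, so the map tasks can run independently and write their output to Hadoop side by side.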
  • 16. Sqoop Limitations Sqoop has some limitations, including: • Poor support for security. $ sqoop import --username scott --password tiger… – Sqoop can read command line options from an options file, but this still has holes. • Error-prone syntax. • Tight coupling to JDBC model – not a good fit for non-RDBMS systems.
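As a rough illustration of the options-file workaround mentioned above, the sketch below writes the connection settings to a file only the owner can read and passes just the file path on the command line. It assumes Sqoop's documented options-file layout (one option or value per line); the JDBC URL, the table name, and the helper names are hypothetical. As the slide notes, this still has holes: the password sits in plaintext on disk.

```python
import os
import tempfile
from typing import List


def write_options_file(connect: str, username: str, password: str, directory: str) -> str:
    """Write Sqoop import options to a new file and return its path.

    tempfile.mkstemp creates the file readable and writable only by the
    creating user, which keeps the password out of the process listing
    but not off the disk.
    """
    fd, path = tempfile.mkstemp(prefix="sqoop-", suffix=".opts", dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write("import\n")  # the tool name goes first in the options file
        for option, value in (
            ("--connect", connect),
            ("--username", username),
            ("--password", password),
        ):
            handle.write(f"{option}\n{value}\n")
    return path


def import_command(options_path: str, table: str) -> List[str]:
    """Build the argument list; the password never appears in it."""
    return ["sqoop", "--options-file", options_path, "--table", table]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        opts = write_options_file(
            "jdbc:mysql://db.example.com/sales",  # hypothetical JDBC URL
            "scott",
            "tiger",
            workdir,
        )
        print(" ".join(import_command(opts, "orders")))
```

The argument list could then be handed to `subprocess.run` on a machine where Sqoop is installed.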
  • 17. Fortunately… Sqoop 2 (incubating) will address many of these limitations: • Adds a web-based GUI. • Centralized configuration. • More flexible model. • Improved security model.
  • 18. Informatica PowerExchange • Not just RDBMS integration – provides consistent, native integration between Hadoop and a range of data sources, databases, legacy systems, standard file formats, CRM… • Integrated with PowerCenter for pre/post-processing of data, administration, and metadata management.
  • 19. PowerExchange – Data Import [Diagram: Access Data (web server; databases, data warehouse; message queues, email, social media; ERP, CRM; mainframe) → PowerExchange (batch, CDC, real-time) → Pre-Process in PowerCenter (e.g. filter, join, cleanse) → Ingest Data into HDFS and Hive]
  • 20. PowerExchange – Data Export [Diagram: Extract Data from HDFS → Post-Process in PowerCenter (e.g. transform to target schema) → PowerExchange (batch, real-time) → Deliver Data to web server; databases, data warehouse; ERP, CRM; mainframe]
  • 21. Informatica PowerExchange 1. Create Ingest or Extract Mapping 2. Create Hadoop Connection 3. Configure Workflow 4. Configure Hive Properties
  • 22. There’s Always the Low-Tech Way… [Diagram labels: Hadoop; Processing; Hive; Local Disk; GPLoad; GreenPlum]
  • 23. [Diagram, Hadoop integration big picture: BI/Analytics Tools; Enterprise Data Warehouse; Relational Databases; Flume; Data Import/Export; ETL Tools; Appliances; NoSQL]
  • 24. ETL Tools
  • 25. ETL Tools
  • 26. ETL – The Wikipedia Definition • Extract, transform and load (ETL) is a process in database usage and especially in data warehousing that involves: – Extracting data from outside sources – Transforming it to fit operational needs – Loading it into the end target (DB or data warehouse) http://en.wikipedia.org/wiki/Extract,_transform,_load
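The three steps in the definition above can be sketched in a few lines. This is a toy, single-machine example that uses only the Python standard library; the CSV sample, the `bookings` table, and the cleaning rules are invented for illustration and stand in for a real source system and warehouse.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical raw extract: inconsistent casing, stray spaces, a missing price.
RAW_CSV = """booking_id,hotel,price_usd
1, Hilton Chicago ,199.00
2,hyatt regency,
3,Palmer House,149.50
"""


def extract(text: str) -> List[Dict[str, str]]:
    """Extract: read rows from an outside source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: List[Dict[str, str]]) -> List[Tuple[int, str, float]]:
    """Transform: drop rows without a price and normalize the rest."""
    cleaned = []
    for row in rows:
        price = row["price_usd"].strip()
        if not price:
            continue  # incomplete record, skip it
        cleaned.append((int(row["booking_id"]), row["hotel"].strip().title(), float(price)))
    return cleaned


def load(records: List[Tuple[int, str, float]], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bookings "
        "(booking_id INTEGER PRIMARY KEY, hotel TEXT, price_usd REAL)"
    )
    conn.executemany("INSERT INTO bookings VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text: str = RAW_CSV) -> sqlite3.Connection:
    """Run extract, transform, and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


if __name__ == "__main__":
    for row in run_etl().execute("SELECT * FROM bookings ORDER BY booking_id"):
        print(row)
```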
  • 27. ETL Tools • Very common use case for Hadoop. • Most ETL in Hadoop is still done through plain old MapReduce. • Companies want to leverage their existing developer skills – many enterprises have armies of SQL and ETL developers.
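To give a feel for what "plain old MapReduce" ETL involves, here is a minimal in-process simulation of the map, shuffle, and reduce phases. It runs on one machine and does not use Hadoop; the log format and the function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]

# Hypothetical log lines: "<date> <event> <product>".
LOG_LINES = [
    "2012-06-14 search hotel",
    "2012-06-14 booking hotel",
    "2012-06-14 search flight",
    "malformed",
    "2012-06-15 search hotel",
]


def map_phase(line: str) -> Iterator[Tuple[Key, int]]:
    """Map: parse one log line and emit ((date, event), 1); skip bad lines."""
    parts = line.split()
    if len(parts) != 3:
        return
    date, event, _product = parts
    yield (date, event), 1


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: Key, values: List[int]) -> Tuple[Key, int]:
    """Reduce: sum the counts for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[Key, int]:
    """Run map, shuffle, and reduce in sequence and return the totals."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    for key, count in run_job(LOG_LINES).items():
        print(key, count)
```

Even this small aggregation needs parsing, grouping, and summing written by hand, which is part of why teams with SQL and ETL-tool skills look for higher-level options.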
  • 28. Informatica HParser • Not exactly ETL – provides data transformation and parsing optimized for parallel processing on Hadoop. • Supports deeply hierarchical data and complex data formats. • Transformations are defined in a Windows UI and then deployed to a Hadoop Cluster for execution.
  • 29. HParser – How does it work? hadoop … dt-hadoop.jar … My_Parser /input/*/input*.txt HDFS 1. Develop a DT transformation 2. Deploy the transformation to Hadoop 3. Run DT on Hadoop to produce tabular data 4. Analyze the data with HIVE / PIG / MapReduce / Other…
  • 30. Pentaho • Existing BI tools extended to support Hadoop. • Not just ETL – also provides data import/export, job orchestration, reporting, and analysis functionality. • Supports integration with HDFS, Hive and HBase. • Community and Enterprise Editions offered.
  • 31. Pentaho • Primary component is Pentaho Data Integration (PDI), also known as Kettle. • PDI provides a graphical drag-and-drop environment for defining ETL jobs, which interface with Java MapReduce to execute in-cluster transformations.
  • 32. Other ETL Solutions • Talend – Also following an open-source model. – Extending their existing data integration tools to support Hadoop. • Pervasive RushAnalyzer – Software to build and run big data ETL, data transformation, mining and visualization on Hadoop.
  • 33. [Diagram, Hadoop integration big picture: BI/Analytics Tools; Enterprise Data Warehouse; Relational Databases; Flume; Data Import/Export; ETL Tools; Appliances; NoSQL]
  • 34. Business Intelligence/Analytics Tools
  • 35. BI – The Forrester Research Definition “Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making.” * * http://en.wikipedia.org/wiki/Business_intelligence
  • 36. Business Intelligence/Analytics Tools [Diagram labels: Relational Databases; Data Warehouses; …]
  • 37. Cloudera ODBC Driver • Most of these tools use the ODBC standard. • Since Hive is a SQL-like system, it’s a good fit for ODBC. • An ODBC driver for Hive is available, but has licensing issues. • Because of this, Cloudera developed its own drivers, available for free download. [Diagram: ODBC Driver → HiveQL → Hive Server → Hive]
  • 38. Hive ODBC Limitations • Hive does not have full SQL support. • Multi-user is currently not supported by Hive Server. • Poor support for security. • Dependent on Hive – data must be loaded in Hive to be available. • The Thrift API in the Hive Server doesn’t support common ODBC calls.
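For readers who have not used ODBC from code, the sketch below shows roughly what a BI tool does under the hood when it queries Hive through an ODBC driver. It assumes the third-party `pyodbc` package is installed and that a Hive ODBC data source has already been configured; the DSN name, the table name, and the `event_date` column are hypothetical. Because data must already be loaded in Hive, the query can only see Hive tables.

```python
from typing import List, Tuple


def connection_string(dsn: str) -> str:
    """Build a minimal ODBC connection string from a configured DSN name."""
    return f"DSN={dsn}"


def daily_event_counts(dsn: str, table: str) -> List[Tuple]:
    """Run a simple HiveQL aggregation over ODBC and return the rows.

    The table name is interpolated into the query, so it is restricted
    to a plain identifier first.
    """
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")

    import pyodbc  # third-party; imported here so the helpers above work without it

    # autocommit=True avoids transaction calls the driver may not support.
    conn = pyodbc.connect(connection_string(dsn), autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(f"SELECT event_date, COUNT(*) FROM {table} GROUP BY event_date")
        return [tuple(row) for row in cursor.fetchall()]
    finally:
        conn.close()


if __name__ == "__main__":
    # Requires a working Hive ODBC data source named "hive_dsn" (hypothetical).
    for row in daily_event_counts("hive_dsn", "web_events"):
        print(row)
```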
  • 39. Hive ODBC Limitations The Hive community is working on Hive Server 2 to address some of these limitations: • Improved support for multiple users. • Improved support for ODBC and JDBC drivers. • And better support for security is coming. 39 ©2012 Cloudera, Inc. All Rights Reserved.
  • 40. MicroStrategy 40 ©2012 Cloudera, Inc. All Rights Reserved.
  • 41. Tableau 41 ©2012 Cloudera, Inc. All Rights Reserved.
  • 42. Other BI Connectors • Microsoft ODBC Driver – Part of the Hadoop on Windows solution. – Provides connectivity for MS BI tools such as Excel, PowerPivot, etc. • MapR ODBC driver – Support for standard ODBC based tools. 42 ©2012 Cloudera, Inc. All Rights Reserved.
  • 43. Analytic Tools – RHadoop project. – Integration of SAS analytics with Hadoop. – Integration of SAP HANA with Hadoop. – Toad for Cloud.
  • 44. Hadoop Specific Tools – Karmasphere
  • 45. Hadoop Specific Tools – Datameer
  • 46. Example Integration [Diagram components: Event Logs, HParser, Hive, PowerCenter/PowerExchange, Data Warehouse] https://community.informatica.com/mpresources/Communities/IW2012/Docs/bos_65.pdf
  • 47. Example – Migration of ETL [Diagram, first flow: Logs → Raw Tables → ETL (SQL) → Target Tables, all in the Data Warehouse. Second flow: Logs → Flume → HDFS → ETL (MapReduce) → Sqoop → Target Tables in the Data Warehouse.]
  • 48. What’s Missing? • Better tools for ETL without coding. • Better tools for data governance, data quality, etc. – Ensuring that data in Hadoop complies with policies, rules, etc. • Integration with commercial enterprise schedulers/workflow engines. – Although open-source workflow schedulers exist (e.g. Oozie).
  • 49. Conclusions • Hadoop integration is still in the early stages. – Expect to see new/better tools coming from both vendors and the open-source community. • Despite the relative immaturity of this space, there’s already a dizzying array of solutions available. – Choose solutions based on existing skills and tools already in use by your organization. • If using current BI tools integrated with Hive, keep in mind that enhancements for multi-user, security, etc. are on the way. • And it bears repeating: always use the right tool for the job. – Hadoop won’t replace your data warehouses and databases, but will complement them.
  • 50. Thank You! Questions? http://www.cloudera.com/partners/spotlight/ +1 (888) 789-1488 cloudera.com twitter.com/cloudera sales@cloudera.com facebook.com/cloudera
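
As a rough illustration of the ODBC path described on slide 37 (BI tool → ODBC driver → HiveQL → Hive Server), the sketch below runs a query through a generic Python DB-API connection. Everything named here is an assumption for illustration: the `web_logs` table, its `event_date` column, and the helper names are hypothetical and do not come from the slides. So that the sketch runs without a cluster, an in-memory SQLite database stands in for Hive; with a configured ODBC data source, the connection would instead come from an ODBC library such as `pyodbc`.

```python
import sqlite3  # stand-in only, so the sketch runs without a Hive cluster

# Hypothetical query: the table and column names are illustrative assumptions.
DAILY_HITS_SQL = (
    "SELECT event_date, COUNT(*) AS hits "
    "FROM web_logs GROUP BY event_date ORDER BY event_date"
)


def run_query(conn, sql):
    """Run one SQL/HiveQL statement over a DB-API connection; return all rows."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


def demo_connection():
    """Build an in-memory SQLite database that plays the role of Hive here.

    With a real ODBC data source this would be replaced by something like
    pyodbc.connect("DSN=<your Hive DSN>", autocommit=True), where the DSN
    name is whatever was configured for the driver (an assumption).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE web_logs (event_date TEXT, url TEXT)")
    conn.executemany(
        "INSERT INTO web_logs VALUES (?, ?)",
        [
            ("2012-06-13", "/search"),
            ("2012-06-13", "/book"),
            ("2012-06-14", "/search"),
        ],
    )
    return conn


if __name__ == "__main__":
    for event_date, hits in run_query(demo_connection(), DAILY_HITS_SQL):
        print(event_date, hits)
```

Because `run_query` only relies on the standard cursor interface, the same function works against any DB-API connection. The limitations listed on slide 38 (partial SQL support, no multi-user support in Hive Server) would still apply to whatever query is sent.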

Editor's Notes

  1. Common theme: moving time-, space-, or processor-intensive processing to Hadoop.
  2. Flume provides ingestion of streaming data (e.g. logs) into Hadoop.
  3. Client executes Sqoop job. Sqoop interrogates DB for column names, types, etc. Based on extracted metadata, Sqoop creates source code for table class, and then kicks off MR job. This table class can be used for processing on extracted records. Sqoop by default will guess at a column for splitting data for distribution across the cluster. This can also be specified by client.
  4. Pentaho also has integration with NoSQL DBs (Mongo, Cassandra, etc.)
  5. Most of these tools integrate with existing data stores using the ODBC standard.
  6. MicroStrategy (MSTR) and Tableau are tested and certified now with the Cloudera driver, but other standard ODBC based tools should also work, and more integrations will be supported soon.
  7. Also, Cloudera has implemented a solution for multi-user, which will also soon support authentication.
  8. In-memory model supports low-latency queries.
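
The splitting behavior mentioned in note 3 can be sketched as follows. This is a simplified illustration of the general idea (take the minimum and maximum of a numeric split column and divide that range among the map tasks), not Sqoop's actual implementation. The function names, the `orders` table, and the `id` column are hypothetical.

```python
def split_ranges(min_val, max_val, num_mappers):
    """Divide the inclusive integer range [min_val, max_val] into at most
    num_mappers contiguous, non-overlapping ranges of near-equal size."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_val < min_val:
        raise ValueError("max_val must be >= min_val")

    total = max_val - min_val + 1
    n = min(num_mappers, total)      # never create empty ranges
    base, extra = divmod(total, n)   # the first `extra` ranges get one more value

    ranges = []
    lo = min_val
    for i in range(n):
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def split_queries(table, column, ranges):
    """Build one bounded SELECT per range; each map task would run one of them."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} <= {hi}"
        for lo, hi in ranges
    ]


if __name__ == "__main__":
    # Hypothetical table `orders` with ids 1..10, imported by 4 map tasks.
    for query in split_queries("orders", "id", split_ranges(1, 10, 4)):
        print(query)
```

An even split like this only balances the work when the split column's values are spread fairly uniformly, which is one reason the client may want to specify the column rather than rely on the default guess.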