Integrating Hadoop into the Enterprise
Jonathan Seidman
Hadoop Summit 2012
June 14th, 2012
Who I Am

    •  Solutions Architect, Partner Engineering
       Team.
    •  Co-founder of Chicago Hadoop User
       Group and co-founder/organizer of
       Chicago Big Data.
    •  jseidman@cloudera.com
    •  @jseidman
    •  cloudera.com/careers

                     ©2012 Cloudera, Inc. All Rights Reserved.
What I’ll Be Talking About
    •  Some Background.
    •  Common uses of Hadoop in an enterprise data
       infrastructure.
    •  Hadoop Integration – the big picture.
    •  Deeper dive:
      –  Data import/export: Moving data between Hadoop
         and existing data stores.
      –  ETL tools.
      –  Business intelligence (BI) and analytic tools.
    •  Example architectures and data flows.
    •  Conclusions


My Life Before Cloudera…




Hadoop at Orbitz
    (Chart: percentage of Queries and Searches on a 0% to 100% y-axis,
    x-axis 1 to 20. Labeled values: 71.67%, 31.87%, 34.30%, 2.78%.)




But Hadoop Was An Isolated System



           Developers                                               Business Analysts Normal
                                                                     Users            Humans




Hadoop + the Data Warehouse…




…Enabled New Analyses




In our opinion, integration with existing IT systems
and software is critical, as we know enterprises will
not be replacing these technologies anytime soon.

    For Hadoop platforms this means integration with
    existing databases, data warehouses, and
    business-analytics and business-visualization
    tools. *




    * A near-term outlook for big data, Jo Maitland, GigaOM Pro, March 2012


What Can We Do?
 •  ETL
      –  Scalable ETL – allows companies to meet SLAs
         (inexpensively).
      –  Agile – facilitates rapid modifications.
 •  Moving analysis off of existing systems.
 •  Sandbox for exploratory analytics.
 •  Using Hadoop as an active archive.
 •  Joining transactional data from a DB with
    interaction data.
 •  Common theme: freeing up existing systems for
    tasks they’re better suited for.


    (Diagram labels: BI/Analytics Tools; Enterprise Data Warehouse;
    Relational Databases; Flume; Data Import/Export; ETL Tools;
    Appliances; NoSQL.)


Data Import/Export



    (Diagram labels: Enterprise Data Warehouse; Relational Databases.)




Sqoop Overview

 •  Apache project designed to ease import
    and export of data between Hadoop and
    relational databases.
 •  Provides functionality to do bulk imports
    and exports of data with HDFS, Hive and
    HBase.
 •  Java based. Leverages MapReduce to
    transfer data in parallel.
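
 The parallel transfer can be pictured as range partitioning: each map
 task pulls its own slice of the source table. The sketch below shows
 that general idea only; it is not Sqoop's code, and the function names,
 the `id` column, and the key range are assumptions made for
 illustration.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive key range [lo, hi] into contiguous slices,
    one per mapper (fewer slices if the range is smaller than the
    number of mappers)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")
    total = hi - lo + 1
    count = min(num_mappers, total)
    base, extra = divmod(total, count)
    ranges = []
    start = lo
    for i in range(count):
        # The first `extra` slices take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clauses(column, ranges):
    """Turn each slice into the predicate one mapper would query with."""
    return [f"{column} >= {a} AND {column} <= {b}" for a, b in ranges]


if __name__ == "__main__":
    # Hypothetical table with integer keys 1..100 and four map tasks.
    for clause in where_clauses("id", split_ranges(1, 100, 4)):
        print(clause)
```

 Each predicate can be run by a separate task, so the slices are pulled
 at the same time rather than one after another.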


Sqoop Overview

 •  Uses a “connector” abstraction.
 •  Two types of connectors
     –  Standard connectors are JDBC based.
     –  Direct connectors use native database
        interfaces to improve performance.
 •  Direct connectors are available for many
    open-source and commercial databases –
    MySQL, PostgreSQL, Oracle, SQL Server,
    Teradata, etc.

Sqoop Import Flow

 •  Client runs the import through Sqoop.
 •  Sqoop collects metadata.
 •  Sqoop generates code and executes an MR job.
 •  MapReduce map tasks pull the data.
 •  Map tasks write to Hadoop.




Sqoop Limitations

 Sqoop has some limitations, including:
 •  Poor support for security.
       $ sqoop import --username scott --password tiger…
     –  Sqoop can read command line options from
        an option file, but this still has holes.
 •  Error-prone syntax.
 •  Tight coupling to JDBC model – not a
    good fit for non-RDBMS systems.
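
 A minimal sketch of the option-file approach, assuming a POSIX
 filesystem. The helper names, the connection string, the file name,
 and the table name are hypothetical; only the `scott`/`tiger`
 credentials come from the slide. The file is restricted to its owner,
 but the password is still stored in clear text on disk, which is one
 of the holes mentioned above.

```python
import os
import stat
import tempfile


def write_options_file(path, tool, options):
    """Write a Sqoop-style options file: the tool name first, then one
    option or value per line. The file is created with mode 0600."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(tool + "\n")
        for flag, value in options:
            handle.write(flag + "\n")
            if value is not None:
                handle.write(value + "\n")
    # Tighten permissions even if the file existed before with a looser mode.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def build_command(options_path, table):
    """Build the argv list; the credentials never appear on the command line."""
    return ["sqoop", "--options-file", options_path, "--table", table]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    opts = os.path.join(workdir, "import.opts")  # hypothetical file name
    write_options_file(opts, "import", [
        ("--connect", "jdbc:mysql://db.example.com/sales"),  # hypothetical
        ("--username", "scott"),
        ("--password", "tiger"),
    ])
    print(" ".join(build_command(opts, "orders")))  # hypothetical table
```

 Keeping the credentials out of the argument list hides them from the
 process list and shell history, but anyone who can read the file can
 still read the password.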

Fortunately…

 Sqoop 2 (incubating) will address many of
 these limitations:

 •    Adds a web-based GUI.
 •    Centralized configuration.
 •    More flexible model.
 •    Improved security model.



Informatica PowerExchange

 •  Not just RDBMS integration – provides
    consistent, native integration between
    Hadoop and a range of data sources,
    databases, legacy systems, standard file
    formats, CRM…
 •  Integrated with PowerCenter for pre/post-
    processing of data, administration, and
    metadata management.


PowerExchange – Data Import

    •  Sources: Web server; Databases, Data Warehouse;
       Message Queues, Email, Social Media; ERP, CRM;
       Mainframe.
    •  Access Data – PowerExchange (Batch, CDC,
       Real-time).
    •  Pre-Process – PowerCenter (e.g. Filter, Join,
       Cleanse).
    •  Ingest Data – HDFS, HIVE.




PowerExchange – Data Export

    •  Extract Data – HDFS.
    •  Post-Process – PowerCenter (e.g. Transform to
       target schema).
    •  Deliver Data – PowerExchange (Batch, Real-time).
    •  Targets: Web server; Databases, Data Warehouse;
       ERP, CRM; Mainframe.




Informatica PowerExchange
 1. Create Ingest or Extract Mapping
 2. Create Hadoop Connection
 3. Configure Workflow
 4. Configure Hive Properties




There’s Always the Low-Tech Way…

    (Diagram: Hadoop Processing / Hive -> Local Disk -> GPLoad ->
    GreenPlum.)



ETL Tools




ETL Tools




ETL – The Wikipedia Definition

 •  Extract, transform and load (ETL) is a
    process in database usage and especially
    in data warehousing that involves:
     –  Extracting data from outside sources
     –  Transforming it to fit operational needs
     –  Loading it into the end target (DB or data
        warehouse)

           http://en.wikipedia.org/wiki/Extract,_transform,_load
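
 The three steps in the definition can be sketched end to end in a few
 lines. This is a toy illustration, not a production pipeline: the CSV
 sample, the `purchases` table, and the function names are all made up
 for the example, and an in-memory SQLite database stands in for the
 end target.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; the second data row is deliberately malformed.
RAW = "user_id,amount\n1, 10.50\n2,abc\n3,7\n"


def extract(text):
    """Extract: read rows from an outside source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: coerce types and drop rows that do not parse."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["user_id"]), float(row["amount"])))
        except ValueError:
            continue  # skip malformed records
    return clean


def load(conn, records):
    """Load: write the cleaned records into the end target."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases (user_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    load(target, transform(extract(RAW)))
    print(target.execute("SELECT * FROM purchases").fetchall())
```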



ETL Tools

 •  Very common use case for Hadoop.
 •  Most ETL in Hadoop is still done through
    plain old MapReduce.
 •  Companies want to leverage their existing
    developer skills – many enterprises have
    armies of SQL and ETL developers.




Informatica HParser

 •  Not exactly ETL – provides data
    transformation and parsing optimized for
    parallel processing on Hadoop.
 •  Supports deeply hierarchical data and
    complex data formats.
 •  Transformations are defined in a Windows
    UI and then deployed to a Hadoop Cluster
    for execution.


HParser – How does it work?
1.  Develop a DT transformation
2.  Deploy the transformation to Hadoop
3.  Run DT on Hadoop (against files in HDFS) to produce
    tabular data:
        hadoop … dt-hadoop.jar
        … My_Parser /input/*/input*.txt
4.  Analyze the data with HIVE / PIG /
    MapReduce / Other…



Pentaho

 •  Existing BI tools extended to support
    Hadoop.
 •  Not just ETL – also provides data import/
    export, job orchestration, reporting, and
    analysis functionality.
 •  Supports integration with HDFS, Hive and
    HBase.
 •  Community and Enterprise Editions
    offered.

Pentaho

 •  Primary component is Pentaho Data Integration
    (PDI), also known as Kettle.
 •  PDI provides a graphical drag-and-drop
    environment for defining ETL jobs, which
    interface with Java MapReduce to execute
    in-cluster transformations.

Other ETL Solutions

 •  Talend
     –  Also following an open-source model.
     –  Extending their existing data integration tools
        to Hadoop.
 •  Pervasive RushAnalyzer
     –  Software to build and run big data ETL, data
        transformation, mining and visualization on
        Hadoop.


Business Intelligence/Analytics Tools




BI – The Forrester Research Definition

 “Business Intelligence is a set of
 methodologies, processes, architectures,
 and technologies that transform raw data
 into meaningful and useful information used
 to enable more effective strategic, tactical,
 and operational insights and
 decision-making.” *

 * http://en.wikipedia.org/wiki/Business_intelligence


Business Intelligence/Analytics Tools




    (Diagram labels: Relational Databases; Data Warehouses; …)
  




Cloudera ODBC Driver

 •  Most of these tools use the ODBC standard.
 •  Since Hive is an SQL-like system it’s a good
    fit for ODBC.
 •  ODBC driver for Hive is available, but has
    licensing issues.
 •  Because of this, Cloudera developed its own
    drivers, available for free download.

    (Diagram labels: ODBC DRIVER; HIVEQL; HIVE SERVER; HIVE.)

Hive ODBC Limitations

 •  Hive does not have full SQL support.
 •  Multi-user is currently not supported by
    Hive Server.
 •  Poor support for security.
 •  Dependent on Hive – data must be loaded
    in Hive to be available.
 •  The Thrift API in the Hive Server doesn’t
    support common ODBC calls.

Hive ODBC Limitations

The Hive community is working on Hive Server 2 to
address some of these limitations:
 •  Improved support for multiple users.
 •  Improved support for ODBC and JDBC
    drivers.
 •  And better support for security is coming.




MicroStrategy




Tableau




Other BI Connectors

 •  Microsoft ODBC Driver
     –  Part of the Hadoop on Windows solution.
     –  Provides connectivity for MS BI tools such as
        Excel, PowerPivot, etc.
 •  MapR ODBC driver
     –  Support for standard ODBC based tools.




Analytic Tools


     –  RHadoop project.

     –  Integration of SAS analytics with Hadoop.

     –  Integration of SAP HANA with Hadoop

     –  Toad for Cloud


Hadoop Specific Tools – Karmasphere




Hadoop Specific Tools – Datameer




Example Integration




     (Diagram: Event Logs -> HParser -> Hive ->
     PowerCenter/PowerExchange -> Data Warehouse.)




 https://community.informatica.com/mpresources/Communities/IW2012/Docs/bos_65.pdf



Example – Migration of ETL


     Before: Logs -> Raw Tables -> ETL (SQL) -> Target Tables,
             all inside the Data Warehouse.

     After:  Logs -> Flume -> HDFS -> ETL (MapReduce) -> Sqoop ->
             Target Tables in the Data Warehouse.
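
 The ETL (MapReduce) step in the migrated flow can be pictured as a map
 phase that parses each log line and a reduce phase that aggregates per
 key. The sketch below simulates both phases in a single process. The
 log format (date, user, action), the sample lines, and the function
 names are assumptions for illustration; in the flow above Flume would
 deliver the logs and Sqoop would export the result.

```python
from collections import defaultdict

# Hypothetical log lines: "<date> <user> <action>".
LOGS = [
    "2012-06-14 u1 search",
    "2012-06-14 u2 search",
    "2012-06-14 u1 book",
    "bad line",
]


def map_line(line):
    """Map: parse one raw line and emit ((date, action), 1).
    Lines that do not match the expected shape emit nothing."""
    parts = line.split()
    if len(parts) != 3:
        return
    date, _user, action = parts
    yield (date, action), 1


def reduce_counts(key, values):
    """Reduce: collapse all values for a key into one target-table row."""
    date, action = key
    return (date, action, sum(values))


def run_etl(lines):
    """Group mapper output by key (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            groups[key].append(value)
    return sorted(reduce_counts(key, values) for key, values in groups.items())


if __name__ == "__main__":
    for row in run_etl(LOGS):
        print(row)
```

 Each returned tuple corresponds to one row that would be exported to
 the target tables.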



What’s Missing?

 •  Better tools for ETL without coding.
 •  Better tools for data governance, data
    quality, etc.
     –  Ensuring that data in Hadoop complies with
        policies, rules, etc.
 •  Integration with commercial enterprise
    schedulers/workflow engines.
     –  Although open-source workflow schedulers
        exist (e.g. Oozie).


Conclusions
 •  Hadoop integration is still in the early stages.
     –  Expect to see new/better tools coming from both vendors
        and the open-source community.
 •  Despite the relative immaturity of this space, there’s
    already a dizzying array of solutions available.
     –  Choose solutions based on existing skills and tools already
        in use by your organization.
 •  If using current BI tools integrated with Hive, keep in
    mind that enhancements for multi-user, security, etc.
    are on the way.
 •  And it bears repeating: always use the right tool for the
    job.
     –  Hadoop won’t replace your data warehouses and
        databases, but will complement them.


Thank You!
Questions?

     http://www.cloudera.com/partners/spotlight/

     +1 (888) 789-1488
     sales@cloudera.com
     cloudera.com
     twitter.com/cloudera
     facebook.com/cloudera




Carnet de témoignages #2 : les community managers dans les entreprises franca...
 
infographie : les Français et Facebook
infographie : les Français et Facebookinfographie : les Français et Facebook
infographie : les Français et Facebook
 

Similar a Integrating Hadoop Into the Enterprise – Hadoop Summit 2012

Integrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseIntegrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseDataWorks Summit
 
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the EnterpriseHadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the EnterpriseCloudera, Inc.
 
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Cloudera, Inc.
 
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopBusiness Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopCloudera, Inc.
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computingJoey Echeverria
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Jonathan Seidman
 
Data Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataData Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataCloudera, Inc.
 
Data Lakes on Public Cloud: Breaking Data Management Monoliths
Data Lakes on Public Cloud: Breaking Data Management MonolithsData Lakes on Public Cloud: Breaking Data Management Monoliths
Data Lakes on Public Cloud: Breaking Data Management MonolithsItai Yaffe
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoopHortonworks
 
Hortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics ApplicationsHortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics Applicationsrussell_jurney
 
What it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready stateWhat it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready stateClouderaUserGroups
 
Hadoop Overview
Hadoop Overview Hadoop Overview
Hadoop Overview EMC
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Hortonworks
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopCloudera, Inc.
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaJeffrey T. Pollock
 

Similar a Integrating Hadoop Into the Enterprise – Hadoop Summit 2012 (20)

Integrating Hadoop Into the Enterprise
Integrating Hadoop Into the EnterpriseIntegrating Hadoop Into the Enterprise
Integrating Hadoop Into the Enterprise
 
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the EnterpriseHadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
 
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
 
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopBusiness Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computing
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013
 
Data Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataData Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big Data
 
Data Lakes on Public Cloud: Breaking Data Management Monoliths
Data Lakes on Public Cloud: Breaking Data Management MonolithsData Lakes on Public Cloud: Breaking Data Management Monoliths
Data Lakes on Public Cloud: Breaking Data Management Monoliths
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoop
 
Hortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics ApplicationsHortonworks: Agile Analytics Applications
Hortonworks: Agile Analytics Applications
 
What it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready stateWhat it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready state
 
Hadoop Overview
Hadoop Overview Hadoop Overview
Hadoop Overview
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in Hadoop
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
 
Cloud computing era
Cloud computing eraCloud computing era
Cloud computing era
 

Más de Jonathan Seidman

Foundations for Successful Data Projects – Strata London 2019
Foundations for Successful Data Projects – Strata London 2019Foundations for Successful Data Projects – Strata London 2019
Foundations for Successful Data Projects – Strata London 2019Jonathan Seidman
 
Foundations strata sf-2019_final
Foundations strata sf-2019_finalFoundations strata sf-2019_final
Foundations strata sf-2019_finalJonathan Seidman
 
Architecting a Next Gen Data Platform – Strata New York 2018
Architecting a Next Gen Data Platform – Strata New York 2018Architecting a Next Gen Data Platform – Strata New York 2018
Architecting a Next Gen Data Platform – Strata New York 2018Jonathan Seidman
 
Architecting a Next Gen Data Platform – Strata London 2018
Architecting a Next Gen Data Platform – Strata London 2018Architecting a Next Gen Data Platform – Strata London 2018
Architecting a Next Gen Data Platform – Strata London 2018Jonathan Seidman
 
Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017Jonathan Seidman
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Jonathan Seidman
 
Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011
Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011
Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011Jonathan Seidman
 
Real World Machine Learning at Orbitz, Strata 2011
Real World Machine Learning at Orbitz, Strata 2011Real World Machine Learning at Orbitz, Strata 2011
Real World Machine Learning at Orbitz, Strata 2011Jonathan Seidman
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010Jonathan Seidman
 

Más de Jonathan Seidman (9)

Foundations for Successful Data Projects – Strata London 2019
Foundations for Successful Data Projects – Strata London 2019Foundations for Successful Data Projects – Strata London 2019
Foundations for Successful Data Projects – Strata London 2019
 
Foundations strata sf-2019_final
Foundations strata sf-2019_finalFoundations strata sf-2019_final
Foundations strata sf-2019_final
 
Architecting a Next Gen Data Platform – Strata New York 2018
Architecting a Next Gen Data Platform – Strata New York 2018Architecting a Next Gen Data Platform – Strata New York 2018
Architecting a Next Gen Data Platform – Strata New York 2018
 
Architecting a Next Gen Data Platform – Strata London 2018
Architecting a Next Gen Data Platform – Strata London 2018Architecting a Next Gen Data Platform – Strata London 2018
Architecting a Next Gen Data Platform – Strata London 2018
 
Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
 
Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011
Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011
Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011
 
Real World Machine Learning at Orbitz, Strata 2011
Real World Machine Learning at Orbitz, Strata 2011Real World Machine Learning at Orbitz, Strata 2011
Real World Machine Learning at Orbitz, Strata 2011
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
 

Último

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Último (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

Integrating Hadoop Into the Enterprise – Hadoop Summit 2012

  • 1. Integrating Hadoop into the Enterprise
      Jonathan Seidman
      Hadoop Summit 2012
      June 14th, 2012
  • 2. Who I Am
      •  Solutions Architect, Partner Engineering Team.
      •  Co-founder of Chicago Hadoop User Group and co-founder/organizer of Chicago Big Data.
      •  jseidman@cloudera.com
      •  @jseidman
      •  cloudera.com/careers
  • 3. What I’ll Be Talking About
      •  Some Background.
      •  Common uses of Hadoop in an enterprise data infrastructure.
      •  Hadoop Integration – the big picture.
      •  Deeper dive:
        –  Data import/export: Moving data between Hadoop and existing data stores.
        –  ETL tools.
        –  Business intelligence (BI) and analytic tools.
      •  Example architectures and data flows.
      •  Conclusions
  • 4. My Life Before Cloudera…
  • 5. Hadoop at Orbitz
      [Chart: Queries vs. Searches; y-axis 0.00% to 100.00%, x-axis 1 to 20; labeled values 71.67%, 34.30%, 31.87%, 2.78%]
  • 6. But Hadoop Was An Isolated System
      [Diagram labels: Developers; Business Analysts; Normal Users; Humans]
  • 7. Hadoop + the Data Warehouse…
  • 8. …Enabled New Analyses
  • 9. In our opinion, integration with existing IT systems and software is critical, as we know enterprises will not be replacing these technologies anytime soon. For Hadoop platforms this means integration with existing databases, data warehouses, and business-analytics and business-visualization tools. *
      * A near-term outlook for big data, Jo Maitland, GigaOM Pro, March 2012
  • 10. What Can We Do?
      •  ETL
        –  Scalable ETL – allows companies to meet SLAs (inexpensively).
        –  Agile – facilitates rapid modifications.
      •  Moving analysis off of existing systems.
      •  Sandbox for exploratory analytics.
      •  Using Hadoop as an active archive.
      •  Joining transactional data from a DB with interaction data.
      •  Common theme: freeing up existing systems for tasks they’re better suited for.
  • 11. [Diagram labels: BI/Analytics Tools; Enterprise Data Warehouse; Relational Databases; Flume; Data Import/Export; ETL Tools; Appliances; NoSQL]
  • 12. Data Import/Export
      [Diagram labels: Enterprise Data Warehouse; Relational Databases]
  • 13. Sqoop Overview
      •  Apache project designed to ease import and export of data between Hadoop and relational databases.
      •  Provides functionality to do bulk imports and exports of data with HDFS, Hive and HBase.
      •  Java based. Leverages MapReduce to transfer data in parallel.
  • 14. Sqoop Overview
      •  Uses a “connector” abstraction.
      •  Two types of connectors
        –  Standard connectors are JDBC based.
        –  Direct connectors use native database interfaces to improve performance.
      •  Direct connectors are available for many open-source and commercial databases – MySQL, PostgreSQL, Oracle, SQL Server, Teradata, etc.
  • 15. Sqoop Import Flow
      [Diagram: the Client runs an import; Sqoop collects metadata, generates code, and executes an MR job; the MapReduce Map tasks pull data and write to Hadoop]
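One way to picture the parallel transfer in the import flow above is to divide the source table's key range into one contiguous slice per map task. The Python sketch below is only an illustration of that idea, not Sqoop's code: the function names `split_key_range` and `where_clauses`, the integer key, and the choice of four tasks are all assumptions made for the example.

```python
def split_key_range(min_id, max_id, num_tasks=4):
    """Divide the inclusive key range [min_id, max_id] into contiguous slices.

    Hypothetical helper: each slice stands for the rows one map task would pull.
    """
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    if max_id < min_id:
        return []  # empty table: nothing to import
    total = max_id - min_id + 1
    num_tasks = min(num_tasks, total)  # never create empty slices
    base, extra = divmod(total, num_tasks)
    ranges = []
    start = min_id
    for i in range(num_tasks):
        # The first `extra` slices take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clauses(column, ranges):
    """Turn each slice into the filter a single task could use in its query."""
    return [f"{column} >= {lo} AND {column} <= {hi}" for lo, hi in ranges]


if __name__ == "__main__":
    for clause in where_clauses("id", split_key_range(1, 10, num_tasks=4)):
        print(clause)
```

Each printed clause corresponds to the slice of rows one task would read, so the tasks can run side by side without overlapping.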
  • 16. Sqoop Limitations
      Sqoop has some limitations, including:
      •  Poor support for security.
         $ sqoop import --username scott --password tiger…
        –  Sqoop can read command line options from an option file, but this still has holes.
      •  Error-prone syntax.
      •  Tight coupling to JDBC model – not a good fit for non-RDBMS systems.
  • 17. Fortunately…
      Sqoop 2 (incubating) will address many of these limitations:
      •  Adds a web-based GUI.
      •  Centralized configuration.
      •  More flexible model.
      •  Improved security model.
  • 18. Informatica PowerExchange
      •  Not just RDBMS integration – provides consistent, native integration between Hadoop and a range of data sources, databases, legacy systems, standard file formats, CRM…
      •  Integrated with PowerCenter for pre/post-processing of data, administration, and metadata management.
  • 19. Power Exchange – Data Import
      [Diagram: Access Data (PowerExchange: Batch, CDC, Real-time) → Pre-Process (PowerCenter: e.g. Filter, Join, Cleanse) → Ingest Data (HDFS, HIVE). Sources: Web server; Databases, Data Warehouse; Message Queues, Email, Social Media; ERP, CRM; Mainframe]
  • 20. Power Exchange – Data Export
      [Diagram: Extract Data (HDFS) → Post-Process (PowerCenter: e.g. Transform to target schema) → Deliver Data (PowerExchange: Batch, Real-time). Targets: Web server; Databases, Data Warehouse; ERP, CRM; Mainframe]
  • 21. Informatica PowerExchange
      1.  Create Ingest or Extract Mapping
      2.  Create Hadoop Connection
      3.  Configure Workflow
      4.  Configure Hive Properties
  • 22. There’s Always the Low-Tech Way…
      [Diagram labels: GreenPlum; GPLoad; Hadoop; Processing; Hive; Local Disk]
  • 23. [Diagram labels: BI/Analytics Tools; Enterprise Data Warehouse; Relational Databases; Flume; Data Import/Export; ETL Tools; Appliances; NoSQL]
  • 24. ETL Tools
  • 25. ETL Tools
  • 26. ETL – The Wikipedia Definition
      •  Extract, transform and load (ETL) is a process in database usage and especially in data warehousing that involves:
        –  Extracting data from outside sources
        –  Transforming it to fit operational needs
        –  Loading it into the end target (DB or data warehouse)
      http://en.wikipedia.org/wiki/Extract,_transform,_load
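The three steps in the definition above can be sketched in a few lines of Python using only the standard library. This is a toy illustration under stated assumptions, not something from the talk: the sample CSV, the `purchases` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all invented for the example, and an in-memory SQLite database stands in for the end target.

```python
import csv
import io
import sqlite3

# Hypothetical "outside source": a small CSV export with one bad row.
RAW_CSV = """user_id,country,amount
1,us,10.50
2,FR,3.00
3,us,not_a_number
"""


def extract(text):
    """Extract: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize country codes and drop rows with bad amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip records that cannot be parsed
        clean.append((int(row["user_id"]), row["country"].upper(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(user_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    load(transform(extract(text)), conn)


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    run_etl(RAW_CSV, target)
    query = "SELECT country, SUM(amount) FROM purchases GROUP BY country"
    print(target.execute(query).fetchall())
```

Keeping the three steps as separate functions mirrors the definition and makes it easy to swap any one of them, for example pointing `load` at a different target.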
  • 27. ETL Tools
      •  Very common use case for Hadoop.
      •  Most ETL in Hadoop is still done through plain old MapReduce.
      •  Companies want to leverage their existing developer skills – many enterprises have armies of SQL and ETL developers.
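For readers unfamiliar with what "plain old MapReduce" ETL looks like, the sketch below imitates the pattern in a single Python process. It is not the Hadoop API and does not run on a cluster; the log format (`page,milliseconds`) and the function names `map_record`, `shuffle`, `reduce_group`, and `run_job` are assumptions made purely for illustration.

```python
from collections import defaultdict


def map_record(line):
    """Map step: parse one raw log line and emit (page, millis) pairs."""
    parts = line.strip().split(",")
    if len(parts) != 2:
        return []  # malformed line: emit nothing
    page, millis = parts
    try:
        return [(page, int(millis))]
    except ValueError:
        return []  # non-numeric timing: emit nothing


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_group(key, values):
    """Reduce step: compute the average response time for one page."""
    return key, sum(values) / len(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_record(line)]
    return dict(reduce_group(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["/home,100", "/home,300", "bad line", "/search,50"]))
```

Even this small job needs custom parsing, grouping, and aggregation code, which hints at why teams with SQL and ETL-tool skills look for higher-level tooling.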
  • 28. Informatica HParser •  Not exactly ETL – provides data transformation and parsing optimized for parallel processing on Hadoop. •  Supports deeply hierarchical data and complex data formats. •  Transformations are defined in a Windows UI and then deployed to a Hadoop Cluster for execution. 28 ©2012 Cloudera, Inc. All Rights Reserved.
  • 29. HParser – How does it work? hadoop … dt-hadoop.jar … My_Parser /input/*/input*.txt HDFS 1.  Develop a DT transformation 2.  Deploy the transformation to Hadoop 3.  Run DT on Hadoop to produce tabular data 4.  Analyze the data with HIVE / PIG / MapReduce / Other… 29 ©2012 Cloudera, Inc. All Rights Reserved.
  • 30. Pentaho •  Existing BI tools extended to support Hadoop. •  Not just ETL – also provides data import/ export, job orchestration, reporting, and analysis functionality. •  Supports integration with HDFS, Hive and Hbase. •  Community and Enterprise Editions offered. 30 ©2012 Cloudera, Inc. All Rights Reserved.
  • 31. Pentaho •  Primary component is Pentaho Data Integration (PDI), also known as Kettle. •  PDI Provides a graphical drag-and- drop environment for defining ETL jobs, which interface with Java MapReduce to execute in-cluster transformations. 31 ©2012 Cloudera, Inc. All Rights Reserved.
  • 32. Other ETL Solutions
      •  Talend
        –  Also following an open-source model.
        –  Extending their existing data integration tools to support Hadoop.
      •  Pervasive RushAnalyzer
        –  Software to build and run big data ETL, data transformation, mining and visualization on Hadoop.
  • 33. BI/Analytics Tools
      [Diagram: integration overview. Labels: Enterprise Data Warehouse, Relational Databases, Flume, Data Import/Export, ETL Tools, Appliances, NoSQL]
  • 34. Business Intelligence/Analytics Tools
  • 35. BI – The Forrester Research Definition
      "Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making." *
      * http://en.wikipedia.org/wiki/Business_intelligence
  • 36. Business Intelligence/Analytics Tools
      [Diagram: BI/analytics tools connecting to data sources. Labels: Relational Databases, Data Warehouses, …]
  • 37. Cloudera ODBC Driver
      •  Most of these tools use the ODBC standard.
      •  Since Hive is an SQL-like system, it's a good fit for ODBC.
      •  An ODBC driver for Hive is available, but has licensing issues.
      •  Because of this, Cloudera developed its own drivers, available for free download.
      [Diagram labels: ODBC Driver, HiveQL, Hive Server, Hive]
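The appeal of a standard interface such as ODBC is that client code stays the same whatever sits behind the connection. The Python sketch below illustrates the idea with the generic DB-API cursor pattern. It is an assumption-laden illustration, not Cloudera's driver: an in-memory SQLite database stands in for Hive so the code runs anywhere, and the `searches` table, its data, and the `run_query` helper are hypothetical.

```python
import sqlite3


def run_query(conn, sql):
    """Run a query through a DB-API 2.0 style connection and return all rows."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


# Stand-in connection so the sketch is self-contained. With an ODBC driver
# installed and a data source configured, an ODBC bridge package such as
# pyodbc would supply the connection instead, for example
# pyodbc.connect("DSN=HiveDSN") where the DSN name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searches (city TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO searches VALUES (?, ?)",
    [("Chicago", 3), ("Boston", 1), ("Chicago", 2)],
)

rows = run_query(
    conn, "SELECT city, SUM(hits) FROM searches GROUP BY city ORDER BY city"
)
print(rows)
```

A BI tool does essentially the same thing: it issues SQL through the driver and reads back tabular results, which is why an SQL-like system is a natural fit.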
  • 38. Hive ODBC Limitations
      •  Hive does not have full SQL support.
      •  Multi-user access is not currently supported by Hive Server.
      •  Poor support for security.
      •  Dependent on Hive: data must be loaded into Hive to be available.
      •  The Thrift API in the Hive Server doesn't support common ODBC calls.
  • 39. Hive ODBC Limitations
      The Hive community is working on Hive Server 2 to address some of these limitations:
      •  Improved support for multiple users.
      •  Improved support for ODBC and JDBC drivers.
      •  Better support for security is also coming.
  • 40. MicroStrategy
  • 41. Tableau
  • 42. Other BI Connectors
      •  Microsoft ODBC Driver
        –  Part of the Hadoop on Windows solution.
        –  Provides connectivity for MS BI tools such as Excel, PowerPivot, etc.
      •  MapR ODBC driver
        –  Support for standard ODBC-based tools.
  • 43. Analytic Tools
        –  RHadoop project.
        –  Integration of SAS analytics with Hadoop.
        –  Integration of SAP HANA with Hadoop.
        –  Toad for Cloud.
  • 44. Hadoop Specific Tools – Karmasphere
  • 45. Hadoop Specific Tools – Datameer
  • 46. Example Integration
      [Diagram labels: Event Logs, HParser, Hive, PowerCenter/PowerExchange, Data Warehouse]
      https://community.informatica.com/mpresources/Communities/IW2012/Docs/bos_65.pdf
  • 47. Example – Migration of ETL
      [Diagram, before: Logs → Raw Tables → ETL (SQL) → Target Tables, all inside the Data Warehouse]
      [Diagram, after: Logs → Flume → HDFS → ETL (MapReduce) → Sqoop → Target Tables in the Data Warehouse]
  • 48. What’s Missing?
      •  Better tools for ETL without coding.
      •  Better tools for data governance, data quality, etc.
        –  Ensuring that data in Hadoop complies with policies, rules, etc.
      •  Integration with commercial enterprise schedulers/workflow engines.
        –  Although open-source workflow schedulers exist (e.g. Oozie).
  • 49. Conclusions
      •  Hadoop integration is still in the early stages.
        –  Expect to see new and better tools coming from both vendors and the open-source community.
      •  Despite the relative immaturity of this space, there’s already a dizzying array of solutions available.
        –  Choose solutions based on existing skills and tools already in use by your organization.
      •  If you are using current BI tools integrated with Hive, keep in mind that enhancements for multi-user access, security, etc. are on the way.
      •  And it bears repeating: always use the right tool for the job.
        –  Hadoop won’t replace your data warehouses and databases, but will complement them.
  • 50. Thank You! Questions?
      http://www.cloudera.com/partners/spotlight/
      +1 (888) 789-1488
      sales@cloudera.com
      cloudera.com
      twitter.com/cloudera
      facebook.com/cloudera