A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Bilung Lee (blee at cloudera dot com)
Kathleen Ting (kathleen at cloudera dot com)



Hadoop Summit 2012, 6/13/12, Apache Sqoop
Copyright 2012 The Apache Software Foundation
Who Are We?
• Bilung Lee
  – Apache Sqoop Committer
  – Software Engineer, Cloudera


• Kathleen Ting
  – Apache Sqoop Committer
  – Support Manager, Cloudera


What is Sqoop?
• Bulk data transfer tool
    – Import/Export from/to relational databases,
      enterprise data warehouses, and NoSQL systems
    – Populate tables in HDFS, Hive, and HBase
    – Integrate with Oozie as an action
    – Support plugins via a connector-based architecture
  Timeline:
    – May ’09: First version (HADOOP-5815)
    – March ’10: Moved to GitHub
    – August ’11: Moved to Apache
    – April ’12: Apache Top Level Project
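As a minimal sketch of the bulk-transfer usage above (host, database, and table names are hypothetical), a Sqoop 1 import of one relational table into HDFS looks like:

```shell
# Pull the `orders` table into HDFS using 4 parallel map tasks
sqoop import \
    --connect jdbc:mysql://db.example.com/sales \
    --username reporting \
    --table orders \
    --target-dir /data/orders \
    --num-mappers 4
```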
Sqoop 1 Architecture

[Diagram: a client-side Sqoop command launches Map tasks in Hadoop that move data between HDFS/HBase/Hive and external systems: relational databases, enterprise data warehouses, and document-based systems.]
Sqoop 1 Challenges
• Cryptic, contextual command line arguments
• Tight coupling between data transfer and
  output format
• Security concerns with openly shared
  credentials
• Not easy to manage installation/configuration
• Connectors are forced to follow JDBC model

Sqoop 2 Architecture

[Architecture diagram omitted]
Sqoop 2 Themes
• Ease of Use

• Ease of Extension

• Security




Ease of Use

Sqoop 1                         | Sqoop 2
Client-only architecture        | Client/server architecture
CLI based                       | CLI + web based
Client access to Hive, HBase    | Server access to Hive, HBase
Oozie and Sqoop tightly coupled | Oozie integrates via the REST API
Sqoop 1: Client-side Tool
• Client-side installation + configuration
  – Connectors are installed/configured locally
  – Local installation requires root privileges
  – JDBC drivers are needed locally
  – Database connectivity is needed locally
Sqoop 2: Sqoop as a Service
• Server-side installation + configuration
  – Connectors are installed/configured in one place
  – Managed by administrator and run by operator
  – JDBC drivers are needed in one place
  – Database connectivity is needed on the server




Client Interface
• Sqoop 1 client interface:
  – Command line interface (CLI) based
  – Can be automated via scripting


• Sqoop 2 client interface:
  – CLI based (in either interactive or script mode)
  – Web based (remotely accessible)
  – REST API is exposed for external tool integration

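As a sketch of the scripted Sqoop 1 usage above (file name and connection details are hypothetical), frequently used arguments can be kept in an options file and reused across invocations:

```shell
# Save frequently used arguments, one option or value per line
cat > import.txt <<'EOF'
import
--connect
jdbc:mysql://db.example.com/sales
--username
reporting
EOF

# Reuse them; remaining options can still be passed inline
sqoop --options-file import.txt --table orders
```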
Sqoop 1: Service Level Integration
• Hive, HBase
  – Require local installation
• Oozie
  – von Neumann(esque) integration:
     • Package Sqoop as an action
     • Then run Sqoop from node machines, causing one MR
       job to be dependent on another MR job
     • Error-prone, difficult to debug


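The packaging described above looks roughly like the following Oozie workflow action (paths, connection details, and action names are hypothetical); Sqoop runs inside a launcher MapReduce job, which in turn spawns the actual transfer job:

```xml
<action name="sqoop-import">
  <sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <command>import --connect jdbc:mysql://db.example.com/sales
             --username reporting --table orders --target-dir /data/orders</command>
  </sqoop>
  <ok to="next-action"/>
  <error to="fail"/>
</action>
```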
Sqoop 2: Service Level Integration
• Hive, HBase
  – Server-side integration
• Oozie
  – REST API integration




Ease of Extension

Sqoop 1                                     | Sqoop 2
Connector forced to follow JDBC model       | Connector given free rein
Connectors must implement all functionality | Connectors benefit from a common framework of functionality
Connector selection is implicit             | Connector selection is explicit
Sqoop 1: Implementing Connectors
• Connectors are forced to follow the JDBC model
  – Connectors are limited/required to use common JDBC vocabulary (URL, database, table, etc.)
• Connectors must implement all Sqoop functionality they want to support
  – New functionality may not be available for previously implemented connectors
Sqoop 2: Implementing Connectors
• Connectors are not restricted to the JDBC model
  – Connectors can define their own domain
• Common functionality is abstracted out of connectors
  – Connectors are only responsible for data transfer
  – A common Reduce phase implements data transformation and system integration
  – Connectors can benefit from future development of common functionality
Different Options, Different Results
Which is running MySQL?

$ sqoop import --connect jdbc:mysql://localhost/db \
    --username foo --table TEST

$ sqoop import --connect jdbc:mysql://localhost/db \
    --driver com.mysql.jdbc.Driver --username foo --table TEST

• Different options may lead to unpredictable results
   – The first command selects the MySQL connector based on the JDBC URL; the second, because --driver is specified explicitly, falls back to the generic JDBC connector
   – Sqoop 2 requires explicit selection of a connector, thus disambiguating the process
Sqoop 1: Using Connectors
• Choice of connector is implicit
  – In a simple case, based on the URL in --connect
    string to access the database
  – Specification of different options can lead to
    different connector selection
  – Error-prone but good for power users




Sqoop 1: Using Connectors
• Require knowledge of database idiosyncrasies
  – e.g. Couchbase does not use tables, yet --table is required, so it is overloaded to specify a backfill or dump operation
  – e.g. the --null-string representation is not supported by all connectors

• Functionality is limited to what the implicitly chosen connector supports
Sqoop 2: Using Connectors
• Users make explicit connector choice
  – Less error-prone, more predictable
• Users need not be aware of the functionality
  of all connectors
  – Couchbase users need not care that other
    connectors use tables




Sqoop 2: Using Connectors
• Common functionality is available to all
  connectors
  – Connectors need not worry about common
    downstream functionality, such as transformation
    into various formats and integration with other
    systems




Security

Sqoop 1                                         | Sqoop 2
Support only for Hadoop security                | Support for Hadoop security and role-based access control to external systems
High risk of abusing access to external systems | Reduced risk of abusing access to external systems
No resource management policy                   | Resource management policy
Sqoop 1: Security
• Inherit/Propagate Kerberos principal for the
  jobs it launches
• Access to files on HDFS can be controlled via
  HDFS security
• Limited support (user/password) for secure
  access to external systems




Sqoop 2: Security
• Inherit/Propagate Kerberos principal for the
  jobs it launches
• Access to files on HDFS can be controlled via
  HDFS security
• Support for secure access to external systems
  via role-based access to connection objects
  – Administrators create/edit/delete connections
  – Operators use connections



Sqoop 1: External System Access
• Every invocation requires necessary
  credentials to access external systems (e.g.
  relational database)
  – Workaround: create a user with limited access in
    lieu of giving out password
     • Does not scale
     • Permission granularity is hard to obtain
• Hard to prevent misuse once credentials are
  given
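The credential problem above shows up directly on the Sqoop 1 command line (connection details are hypothetical): a password passed with --password is exposed to anyone who can list processes, and the best Sqoop 1 offers is prompting for it instead:

```shell
# Insecure: the password appears in shell history and in `ps` output
sqoop import --connect jdbc:mysql://db.example.com/sales \
    --username reporting --password s3cret --table orders

# Slightly better: -P reads the password from the console at runtime,
# but every operator still has to know the shared credential
sqoop import --connect jdbc:mysql://db.example.com/sales \
    --username reporting -P --table orders
```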
Sqoop 2: External System Access
• Connections are enabled as first-class objects
  – Connections encompass credentials
  – Connections are created once and then used
    many times for various import/export jobs
  – Connections are created by administrator and
    used by operator
     • Safeguard credential access from end users
• Connections can be restricted in scope based
  on operation (import/export)
  – Operators cannot abuse credentials
Sqoop 1: Resource Management
• No explicit resource management policy
  – Users specify the number of map tasks to run
  – Cannot throttle load on external systems
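The only knob in Sqoop 1 is per-invocation parallelism (connection details are hypothetical); nothing stops several users from running such jobs against the same database at once:

```shell
# Each of the 8 map tasks opens its own connection to the database
sqoop import --connect jdbc:mysql://db.example.com/sales \
    --username reporting --table orders --num-mappers 8
```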
Sqoop 2: Resource Management
• Connections allow specification of resource
  management policy
  – Administrators can limit the total number of
    physical connections open at one time
  – Connections can also be disabled




Demo Screenshots

[Demo screenshots omitted]
Takeaway
Sqoop 2 Highlights:
  – Ease of Use: Sqoop as a Service
  – Ease of Extension: Connectors benefit from shared functionality
  – Security: Connections as first-class objects and role-based security
Current Status: work-in-progress
• Sqoop2 Development:
 http://issues.apache.org/jira/browse/SQOOP-365

• Sqoop2 Blog Post:
 http://blogs.apache.org/sqoop/entry/apache_sqoop_highlights_of_sqoop

• Sqoop2 Design:
 http://cwiki.apache.org/confluence/display/SQOOP/Sqoop+2




Current Status: work-in-progress
• Sqoop2 Quickstart:
 http://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Quickstart

• Sqoop2 Resource Layout:
 http://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+-+Resource+Layout

• Sqoop2 Feature Requests:
 http://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Feature+Requests





Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Más de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Último (20)

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2

  • 1. A New Generation of Data Transfer Tools for Hadoop: Sqoop 2 Bilung Lee (blee at cloudera dot com) Kathleen Ting (kathleen at cloudera dot com) Hadoop Summit 2012. 6/13/12 Apache Sqoop Copyright 2012 The Apache Software Foundation
  • 2. Who Are We? • Bilung Lee – Apache Sqoop Committer – Software Engineer, Cloudera • Kathleen Ting – Apache Sqoop Committer – Support Manager, Cloudera Hadoop Summit 2012. 6/13/12 Apache Sqoop 2 Copyright 2012 The Apache Software Foundation
  • 3. What is Sqoop? • Bulk data transfer tool – Import/Export from/to relational databases, enterprise data warehouses, and NoSQL systems – Populate tables in HDFS, Hive, and HBase – Integrate with Oozie as an action – Support plugins via connector based architecture. Timeline: May ‘09: first version (HADOOP-5815); March ‘10: moved to GitHub; August ‘11: moved to Apache; April ‘12: Apache Top Level Project. Hadoop Summit 2012. 6/13/12 Apache Sqoop 3 Copyright 2012 The Apache Software Foundation
  • 4. Sqoop 1 Architecture [Diagram: a Sqoop command on the client launches Hadoop map tasks that transfer data between external stores (enterprise data warehouse, document-based systems, relational database) and HDFS/HBase/Hive.] Hadoop Summit 2012. 6/13/12 Apache Sqoop 4 Copyright 2012 The Apache Software Foundation
  • 5. Sqoop 1 Challenges • Cryptic, contextual command line arguments • Tight coupling between data transfer and output format • Security concerns with openly shared credentials • Not easy to manage installation/configuration • Connectors are forced to follow JDBC model Hadoop Summit 2012. 6/13/12 Apache Sqoop 5 Copyright 2012 The Apache Software Foundation
  • 6. Sqoop 2 Architecture Hadoop Summit 2012. 6/13/12 Apache Sqoop 6 Copyright 2012 The Apache Software Foundation
  • 7. Sqoop 2 Themes • Ease of Use • Ease of Extension • Security Hadoop Summit 2012. 6/13/12 Apache Sqoop 7 Copyright 2012 The Apache Software Foundation
  • 8. Sqoop 2 Themes • Ease of Use • Ease of Extension • Security Hadoop Summit 2012. 6/13/12 Apache Sqoop 8 Copyright 2012 The Apache Software Foundation
  • 9. Ease of Use. Sqoop 1 vs Sqoop 2: client-only architecture vs client/server architecture; CLI based vs CLI + Web based; client access to Hive, HBase vs server access to Hive, HBase; Oozie and Sqoop tightly coupled vs Oozie finds REST API. Hadoop Summit 2012. 6/13/12 Apache Sqoop 9 Copyright 2012 The Apache Software Foundation
  • 10. Sqoop 1: Client-side Tool • Client-side installation + configuration – Connectors are installed/configured locally – Local installation requires root privileges – JDBC drivers are needed locally – Database connectivity is needed locally Hadoop Summit 2012. 6/13/12 Apache Sqoop 10 Copyright 2012 The Apache Software Foundation
  • 11. Sqoop 2: Sqoop as a Service • Server-side installation + configuration – Connectors are installed/configured in one place – Managed by administrator and run by operator – JDBC drivers are needed in one place – Database connectivity is needed on the server Hadoop Summit 2012. 6/13/12 Apache Sqoop 11 Copyright 2012 The Apache Software Foundation
  • 12. Client Interface • Sqoop 1 client interface: – Command line interface (CLI) based – Can be automated via scripting • Sqoop 2 client interface: – CLI based (in either interactive or script mode) – Web based (remotely accessible) – REST API is exposed for external tool integration Hadoop Summit 2012. 6/13/12 Apache Sqoop 12 Copyright 2012 The Apache Software Foundation
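  As a minimal sketch of the Sqoop 2 client interface described above: the interactive shell is launched locally and then pointed at the remote server. The syntax below follows the early Sqoop 2 (1.99.x) client, and the host name is a placeholder, not part of the original deck:

  ```
  # Launch the Sqoop 2 client shell
  bin/sqoop.sh client

  # Inside the shell, point the client at the Sqoop 2 server
  # (sqoop2.example.com is an illustrative placeholder host)
  sqoop:000> set server --host sqoop2.example.com --port 12000 --webapp sqoop

  # Verify client/server connectivity
  sqoop:000> show version --all
  ```

  The same server endpoint backs the REST API, so external tools such as Oozie can integrate without a local Sqoop installation.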
  • 13. Sqoop 1: Service Level Integration • Hive, HBase – Require local installation • Oozie – von Neumann(esque) integration: • Package Sqoop as an action • Then run Sqoop from node machines, causing one MR job to be dependent on another MR job • Error-prone, difficult to debug Hadoop Summit 2012. 6/13/12 Apache Sqoop 13 Copyright 2012 The Apache Software Foundation
  • 14. Sqoop 2: Service Level Integration • Hive, HBase – Server-side integration • Oozie – REST API integration Hadoop Summit 2012. 6/13/12 Apache Sqoop 14 Copyright 2012 The Apache Software Foundation
  • 15. Ease of Use (recap). Sqoop 1 vs Sqoop 2: client-only architecture vs client/server architecture; CLI based vs CLI + Web based; client access to Hive, HBase vs server access to Hive, HBase; Oozie and Sqoop tightly coupled vs Oozie finds REST API. Hadoop Summit 2012. 6/13/12 Apache Sqoop 15 Copyright 2012 The Apache Software Foundation
  • 16. Sqoop 2 Themes • Ease of Use • Ease of Extension • Security Hadoop Summit 2012. 6/13/12 Apache Sqoop 16 Copyright 2012 The Apache Software Foundation
  • 17. Ease of Extension. Sqoop 1 vs Sqoop 2: connector forced to follow JDBC model vs connector given free rein; connectors must implement functionality vs connectors benefit from common framework of functionality; connector selection is implicit vs connector selection is explicit. Hadoop Summit 2012. 6/13/12 Apache Sqoop 17 Copyright 2012 The Apache Software Foundation
  • 18. Sqoop 1: Implementing Connectors • Connectors are forced to follow JDBC model – Connectors are limited/required to use common JDBC vocabulary (URL, database, table, etc) • Connectors must implement all Sqoop functionality they want to support – New functionality may not be available for previously implemented connectors Hadoop Summit 2012. 6/13/12 Apache Sqoop 18 Copyright 2012 The Apache Software Foundation
  • 19. Sqoop 2: Implementing Connectors • Connectors are not restricted to JDBC model – Connectors can define own domain • Common functionality are abstracted out of connectors – Connectors are only responsible for data transfer – Common Reduce phase implements data transformation and system integration – Connectors can benefit from future development of common functionality Hadoop Summit 2012. 6/13/12 Apache Sqoop 19 Copyright 2012 The Apache Software Foundation
  • 20. Different Options, Different Results. Which is running MySQL?
      $ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST
      $ sqoop import --connect jdbc:mysql://localhost/db --driver com.mysql.jdbc.Driver --username foo --table TEST
    • Different options may lead to unpredictable results – Sqoop 2 requires explicit selection of a connector, thus disambiguating the process. Hadoop Summit 2012. 6/13/12 Apache Sqoop 20 Copyright 2012 The Apache Software Foundation
  • 21. Sqoop 1: Using Connectors • Choice of connector is implicit – In a simple case, based on the URL in --connect string to access the database – Specification of different options can lead to different connector selection – Error-prone but good for power users Hadoop Summit 2012. 6/13/12 Apache Sqoop 21 Copyright 2012 The Apache Software Foundation
  • 22. Sqoop 1: Using Connectors • Require knowledge of database idiosyncrasies – e.g. Couchbase does not use tables, yet --table is required, so the option gets overloaded to specify a backfill or dump operation – e.g. the --null-string representation is not supported by all connectors • Functionality is limited to what the implicitly chosen connector supports Hadoop Summit 2012. 6/13/12 Apache Sqoop 22 Copyright 2012 The Apache Software Foundation
  • 23. Sqoop 2: Using Connectors • Users make explicit connector choice – Less error-prone, more predictable • Users need not be aware of the functionality of all connectors – Couchbase users need not care that other connectors use tables Hadoop Summit 2012. 6/13/12 Apache Sqoop 23 Copyright 2012 The Apache Software Foundation
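  The explicit connector choice described above can be sketched with the Sqoop 2 shell. The command names follow the 1.99.x client syntax, and the connector id is illustrative:

  ```
  # List the connectors registered on the server, with their ids
  sqoop:000> show connector --all

  # Explicitly pick a connector by id when creating a connection;
  # the shell then prompts only for that connector's own options
  sqoop:000> create connection --cid 1
  ```

  Because the connector is named up front, the prompts can use that connector's own vocabulary rather than forcing JDBC terms like table onto every system.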
  • 24. Sqoop 2: Using Connectors • Common functionality is available to all connectors – Connectors need not worry about common downstream functionality, such as transformation into various formats and integration with other systems Hadoop Summit 2012. 6/13/12 Apache Sqoop 24 Copyright 2012 The Apache Software Foundation
  • 25. Ease of Extension (recap). Sqoop 1 vs Sqoop 2: connector forced to follow JDBC model vs connector given free rein; connectors must implement functionality vs connectors benefit from common framework of functionality; connector selection is implicit vs connector selection is explicit. Hadoop Summit 2012. 6/13/12 Apache Sqoop 25 Copyright 2012 The Apache Software Foundation
  • 26. Sqoop 2 Themes • Ease of Use • Ease of Extension • Security Hadoop Summit 2012. 6/13/12 Apache Sqoop 26 Copyright 2012 The Apache Software Foundation
  • 27. Security. Sqoop 1 vs Sqoop 2: support only for Hadoop security vs support for Hadoop security and role-based access control to external systems; high risk of abusing access to external systems vs reduced risk of abusing access to external systems; no resource management policy vs resource management policy. Hadoop Summit 2012. 6/13/12 Apache Sqoop 27 Copyright 2012 The Apache Software Foundation
  • 28. Sqoop 1: Security • Inherit/Propagate Kerberos principal for the jobs it launches • Access to files on HDFS can be controlled via HDFS security • Limited support (user/password) for secure access to external systems Hadoop Summit 2012. 6/13/12 Apache Sqoop 28 Copyright 2012 The Apache Software Foundation
  • 29. Sqoop 2: Security • Inherit/Propagate Kerberos principal for the jobs it launches • Access to files on HDFS can be controlled via HDFS security • Support for secure access to external systems via role-based access to connection objects – Administrators create/edit/delete connections – Operators use connections Hadoop Summit 2012. 6/13/12 Apache Sqoop 29 Copyright 2012 The Apache Software Foundation
  • 30. Sqoop 1: External System Access • Every invocation requires necessary credentials to access external systems (e.g. relational database) – Workaround: create a user with limited access in lieu of giving out password • Does not scale • Permission granularity is hard to obtain • Hard to prevent misuse once credentials are given Hadoop Summit 2012. 6/13/12 Apache Sqoop 30 Copyright 2012 The Apache Software Foundation
  • 31. Sqoop 2: External System Access • Connections are enabled as first-class objects – Connections encompass credentials – Connections are created once and then used many times for various import/export jobs – Connections are created by administrator and used by operator • Safeguard credential access from end users • Connections can be restricted in scope based on operation (import/export) – Operators cannot abuse credentials Hadoop Summit 2012. 6/13/12 Apache Sqoop 31 Copyright 2012 The Apache Software Foundation
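  A sketch of the administrator/operator split behind connections as first-class objects, again assuming the 1.99.x shell syntax; the connector and connection ids are illustrative:

  ```
  # Administrator: create the connection once; the JDBC URL and
  # credentials are stored on the server, not handed to end users
  sqoop:000> create connection --cid 1

  # Operator: define import/export jobs that reference the stored
  # connection by id, without ever seeing its credentials
  sqoop:000> create job --xid 1 --type import
  ```

  Since jobs only hold a connection id, revoking or rescoping access is a server-side change to the connection object, not a credential rotation across every script.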
  • 32. Sqoop 1: Resource Management • No explicit resource management policy – Users specify the number of map jobs to run – Cannot throttle load on external systems Hadoop Summit 2012. 6/13/12 Apache Sqoop 32 Copyright 2012 The Apache Software Foundation
  • 33. Sqoop 2: Resource Management • Connections allow specification of resource management policy – Administrators can limit the total number of physical connections open at one time – Connections can also be disabled Hadoop Summit 2012. 6/13/12 Apache Sqoop 33 Copyright 2012 The Apache Software Foundation
  • 34. Security (recap). Sqoop 1 vs Sqoop 2: support only for Hadoop security vs support for Hadoop security and role-based access control to external systems; high risk of abusing access to external systems vs reduced risk of abusing access to external systems; no resource management policy vs resource management policy. Hadoop Summit 2012. 6/13/12 Apache Sqoop 34 Copyright 2012 The Apache Software Foundation
• 35–39. Demo Screenshots (screenshot images not captured in this transcript)
• 40. Takeaway
• Sqoop 2 Highlights:
  – Ease of Use: Sqoop as a Service
  – Ease of Extension: Connectors benefit from shared functionality
  – Security: Connections as first-class objects and role-based security
• 41. Current Status: work-in-progress
  – Sqoop2 Development: http://issues.apache.org/jira/browse/SQOOP-365
  – Sqoop2 Blog Post: http://blogs.apache.org/sqoop/entry/apache_sqoop_highlights_of_sqoop
  – Sqoop2 Design: http://cwiki.apache.org/confluence/display/SQOOP/Sqoop+2
• 42. Current Status: work-in-progress
  – Sqoop2 Quickstart: http://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Quickstart
  – Sqoop2 Resource Layout: http://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+-+Resource+Layout
  – Sqoop2 Feature Requests: http://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Feature+Requests

Editor's notes

  1. Apache Sqoop was created to efficiently transfer bulk data between Hadoop and external structured datastores because databases are not easily accessible by Hadoop. The popularity of Sqoop in enterprise systems confirms that Sqoop does bulk transfer admirably. That said, to enhance its functionality, Sqoop needs to fulfill data integration use-cases as well as become easier to manage and operate.
2. It’s an Apache Top Level Project (TLP) now.
3. Different connectors interpret these options differently. Some options are not understood for the same operation by different connectors, while some connectors have custom options that do not apply to others. This is confusing for users and detrimental to effective use.
  – Cryptic and contextual command-line arguments can lead to error-prone connector matching, resulting in user errors.
  – Due to tight coupling between data transfer and the serialization format, some connectors may support a certain data format that others don't (e.g. the direct MySQL connector can't support sequence files).
  – There are security concerns with openly shared credentials.
  – By requiring root privileges, local configuration and installation are not easy to manage.
  – Debugging the map job is limited to turning on the verbose flag.
  – Connectors are forced to follow the JDBC model and are required to use common JDBC vocabulary (URL, database, table, etc.), regardless of whether it is applicable.
4. Sqoop 1’s challenges will be addressed by the Sqoop 2 architecture:
  – Connectors only focus on connectivity.
  – Serialization, format conversion, and Hive/HBase integration should be uniformly available via the framework.
  – Repository Manager: creates/edits connectors, connections, and jobs.
  – Connector Manager: registers new connectors, enables/disables connectors.
  – Job Manager: submits new jobs to MapReduce, gets job progress, kills specific jobs.
5. Kathleen to take over. Sqoop 2 is a work in progress, and you are welcome to join our weekly conference calls discussing its design and implementation. Please see me afterwards for more details if interested. During our discussions we’ve identified three pain points for Sqoop 2 to address. Those of you who have used Sqoop will find these points apparent: ease of use, ease of extension, and security.
6. Pause for Q. Kathleen to take over.
  7. There are 4 points we want to address with Sqoop 2’s ease of use.
  8. Like other client-side tools, Sqoop 1 requires everything – connectors, root privileges, drivers, db connectivity – to be installed and configured locally.
9. Sqoop 2 will be a service, so you can install it once and then run it everywhere. This means that connectors will be configured in one place, managed by the Admin role and run by the Operator role, which will be discussed in detail later. Likewise, JDBC drivers will live in one place, and database connectivity will only be needed on the server.
  – Sqoop as a web-based service exposes a REST API.
  – Front-ended by a CLI and a browser.
  – Back-ended by a metadata repository.
An example of a document-based system is Couchbase. Sqoop 1 has something called the Sqoop metastore, which is similar to a repository for metadata, but not quite. That said, the model of operation for Sqoop 1 and Sqoop 2 is very different: Sqoop 1 was a limited-vocabulary tool, while Sqoop 2 is more metadata-driven. The design of Sqoop 2’s metadata repository is such that it can be replaced by other providers.
10. Sqoop 1 was intended for power users, as evidenced by its CLI. Sqoop 2 has two modes: one for the power user and one for the newbie. Those new to Sqoop will appreciate the interactive UI, which walks you through import/export setup, eliminating redundant/incorrect options. Various connectors are added in one place, with connectors exposing necessary options to the Sqoop framework and with the user only required to provide info relevant to their use-case. Not bound by a terminal; well-documented return codes.
  11. In Sqoop 1, Hive and HBase require local installation. Currently Oozie launches Sqoop by bundling it and running it on the cluster, which is error-prone and difficult to debug.
12. With Sqoop 2, Hive and HBase integration happens not from the client but from the backend. Hive does not need to be installed on Sqoop at all; Sqoop will submit requests to HiveServer over the wire. Exposing a REST API for operation and management will help Sqoop integrate better with external systems such as Oozie.
  – Oozie and Sqoop will be decoupled: if you install a new Sqoop connector, you don’t need to install it in Oozie as well.
  – Hive will not invoke anything in Sqoop, while Oozie does invoke Sqoop, so the REST API does not benefit Hive in any way, but it does benefit Oozie.
  – Which Hive/HBase server the data will be put into is the responsibility of the reduce phase, which will have its own configuration. Since both of these systems are on Hadoop, we don’t need any added security besides passing down the Kerberos principal.
13. Four points.
  14. Pause for questions.
15. Because Sqoop is heavily JDBC-centric, it’s not easy to work with non-relational databases.
  – The Couchbase implementation required a different interpretation.
  – Inconsistencies between connectors.
16. Two phases: first, transfer; second, transform/integration with other components.
  – Option to opt out of downstream processing (i.e. revert to Sqoop 1).
  – Trade-off between ease of connector/tooling development and faster performance.
  – Separating data transfer (Map) from data transform (Reduce) allows connectors to specialize.
  – Connectors benefit from a common framework of functionality and don’t have to worry about forward compatibility.
  – Functionally, Sqoop 2 is a superset of Sqoop 1 but does it in a different way.
  – Too early in the design process to tell if the same CLI commands could be used, but most likely not, primarily because it is a fundamentally incompatible change.
  – Reduce phase limited to stream transformations (no aggregation to start with).
17. The former runs the MySQL connector, because specifying the driver option prevents the MySQL connector from working, i.e. Sqoop would end up using the generic JDBC connector.
18. Based on the URL in the connect string used to access the database (which is JDBC-centric and something Sqoop 2 is moving away from), Sqoop attempts to predict which driver it should load. What are connectors?
  – Plugin components based on Sqoop’s extension framework.
  – Efficiently transfer data between Hadoop and an external store.
  – Meant for optimized import/export, or for stores that don’t support native JDBC.
  – Bundled connectors: MySQL, PostgreSQL, Oracle, SQL Server, JDBC.
  – High-performance data transfer: Direct MySQL, Direct PostgreSQL.
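The driver-prediction behavior described in this note can be sketched as a prefix lookup on the JDBC connect string; Sqoop 1 does something like this internally, and explicitly passing a driver class bypasses the specialized connectors in favor of the generic JDBC connector. The mapping and function below are an illustrative sketch, not Sqoop's actual code.

```python
# Sketch of connect-string-based connector selection, roughly as in Sqoop 1
# (illustrative mapping and function; not Sqoop's real implementation).

CONNECTOR_BY_SCHEME = {
    "jdbc:mysql:": "MySQL connector",
    "jdbc:postgresql:": "PostgreSQL connector",
    "jdbc:oracle:": "Oracle connector",
}

def pick_connector(connect_string, driver=None):
    # Explicitly specifying a driver bypasses prediction and forces the
    # generic JDBC connector -- a common source of user surprise.
    if driver is not None:
        return "Generic JDBC connector"
    for prefix, connector in CONNECTOR_BY_SCHEME.items():
        if connect_string.startswith(prefix):
            return connector
    return "Generic JDBC connector"

print(pick_connector("jdbc:mysql://db.example.com/sales"))
# With a driver class supplied, the specialized connector is not used:
print(pick_connector("jdbc:mysql://db.example.com/sales",
                     driver="com.mysql.jdbc.Driver"))
```

This is exactly the "cryptic and contextual" matching the slides criticize: a user who adds the driver option expecting more control silently loses the optimized MySQL path. Sqoop 2's explicit connector choice removes the guessing step.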
  19. Cryptic and contextual command line arguments can lead to error-prone connector matching, resulting in user errors. Due to tight coupling between data transfer and the serialization format, some connectors may support a certain data format that others don't (e.g. direct MySQL connector can't support sequence files).
20. With the user making an explicit connector choice in Sqoop 2, it will be less error-prone and more predictable. Connectors are no longer forced to follow the JDBC model and are no longer required to use common JDBC vocabulary (URL, database, table, etc.), regardless of whether it is applicable.
  21. Common functionality will be abstracted out of connectors, holding them responsible only for data transport. The reduce phase will implement common functionality, ensuring that connectors benefit from future development of functionality.
  22. Pause for questions. Bilung to take over.
23. No code generation and no compilation allow Sqoop to run where there are no compilers, which makes it more secure by preventing bad code from running.
  – Previously required direct access to Hive/HBase.
  – More secure because access is routed through the Sqoop server rather than opening up access to all clients to perform jobs.
  24. Connection is only for external systems
  25. No need to disable user in database