2. Agenda
Integrating with existing enterprise systems - Sqoop
Load the data from RDBMS to Hadoop
Load data from RDBMS to Hive
Load data from Hadoop to RDBMS
3. Apache Sqoop
A tool for efficiently transferring bulk data between Apache Hadoop and
structured datastores such as relational databases.
Import data from DBMS to Hadoop / Hive / HBase
Export data from Hadoop to DBMS
5. Scenarios for Loading Data from an RDBMS to HDFS
Sample Scenario - Suppose you collect unstructured data from various
sites (Twitter / Forums / Facebook / other sites). We need to gain
insight by joining this social data with customer data stored in a DBMS.
Typical Scenario: The need to use data stored in a Relational
Database Management System (Oracle, MySQL, etc.) in a
MapReduce job
– Lookup tables
– Legacy data
6. Loading Data to HDFS
Import database table (Having Primary Key) to HDFS
sqoop import --connect <DBURL> --username <name> --password <pass> --table <tableName> \
  --target-dir <hdfsPath>
Import database table (Having Primary Key) to HDFS based on some condition
sqoop import --connect <DBURL> --username <name> --password <pass> --table <tableName> \
  --target-dir <hdfsPath> --where "sal > 1000"
Import database table (No Primary Key) to HDFS. With no key to split on, run a single mapper (-m 1)
sqoop import --connect <DBURL> --username <name> --password <pass> --table <tableName> \
  --target-dir <hdfsPath> -m 1
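A single mapper is not the only option for a table without a primary key: Sqoop's `--split-by` flag names a column to partition the work on, so the import can still run in parallel. The following is a minimal sketch rather than a definitive recipe. The JDBC URL, user, table, column, and HDFS path are hypothetical placeholders, and `run` is an illustrative helper that only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: parallel import of a table that has no primary key.
# Every value below is a hypothetical placeholder; adjust for your environment.
DB_URL="jdbc:mysql://dbhost:3306/sales"
DB_USER="etl_user"
TABLE="orders_log"
SPLIT_COL="order_id"        # assumed to be numeric and evenly distributed
TARGET_DIR="/data/raw/orders_log"

# run: print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

import_without_pk() {
  # -P prompts for the password instead of putting it on the command line.
  # --split-by tells Sqoop which column to partition across the mappers.
  run sqoop import \
    --connect "$DB_URL" \
    --username "$DB_USER" -P \
    --table "$TABLE" \
    --target-dir "$TARGET_DIR" \
    --split-by "$SPLIT_COL" \
    -m 4
}

import_without_pk
```

Review the printed command first, then rerun with `DRY_RUN=0` to execute it. A skewed split column leaves some mappers with most of the rows, so `-m 1` remains the safer choice for small tables.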
7. Loading Data to Hive
Import database table (Having Primary Key) to Hive
sqoop import --connect <DBURL> --username <name> --password <pass> --table <tableName> \
  --hive-table <tableName> --create-hive-table --hive-import
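`--create-hive-table` makes the job fail if the Hive table already exists, so it suits a first load but not a repeated one. Below is a hedged sketch of a wrapper that passes `--create-hive-table` for the initial import and `--hive-overwrite` for reloads. The connection values, the table name, and the `hive_import` and `run` functions are hypothetical, and the command is only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: first load vs. reload of a Hive table with Sqoop.
# Connection values are hypothetical placeholders.
DB_URL="jdbc:mysql://dbhost:3306/sales"
DB_USER="etl_user"

# run: print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# hive_import MODE TABLE
#   MODE=initial : create the Hive table (the job fails if it already exists)
#   MODE=reload  : overwrite the data in an existing Hive table
hive_import() {
  mode=$1
  table=$2
  case "$mode" in
    initial) extra="--create-hive-table" ;;
    reload)  extra="--hive-overwrite" ;;
    *) echo "usage: hive_import initial|reload <table>" >&2; return 2 ;;
  esac
  run sqoop import \
    --connect "$DB_URL" \
    --username "$DB_USER" -P \
    --table "$table" \
    --hive-import --hive-table "$table" \
    $extra
}

hive_import initial customers
hive_import reload customers
```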
8. Loading Data to HBase
Import database table (Having Primary Key) to HBase
sqoop import --connect <DBURL> --username <name> --password <pass> --table <tableName> \
  --hbase-table <hbase_tableName> --column-family <hbase_table_col1> --hbase-create-table
Import selected columns of a database table (Having Primary Key) to HBase
sqoop import --connect <DBURL> --username <name> --password <pass> --table <tableName> \
  --hbase-table <hbase_tableName> --columns column1,column2 \
  --column-family <hbase_table_col1> --hbase-create-table
Import database table (Having No Primary Key) to HBase. Here we name one column as the
row key and, as with HDFS imports, use a single mapper because there is no key to split on
sqoop import --connect <DBURL> --username <name> --password <pass> --table <tableName> \
  --hbase-table <hbase_tableName> --column-family <hbase_table_col1> \
  --hbase-row-key column1 --hbase-create-table -m 1
9. Exporting Data to DBMS
Exporting data from a Hive table (stored as comma-delimited text) to DBMS
sqoop export --connect <DBURL> --username <name> --password <pass> --table <tableName> \
  --export-dir /user/hive/warehouse/<tableName> --input-fields-terminated-by ','
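The `','` delimiter only matches Hive tables whose files are comma-delimited. Tables created with Hive's defaults separate fields with the Ctrl-A character (octal 001) and write NULL as `\N`. The sketch below shows an export under those assumptions. The connection values, table name, and the `export_hive_table` and `run` helpers are hypothetical, and the command is only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: export a Hive table stored with Hive's default text format.
# Connection values and the warehouse path are assumptions for illustration.
DB_URL="jdbc:mysql://dbhost:3306/sales"
DB_USER="etl_user"
WAREHOUSE="/user/hive/warehouse"

# run: print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# export_hive_table TABLE
export_hive_table() {
  table=$1
  # '\001' is Hive's default field delimiter (Ctrl-A), given in octal.
  # The --input-null-* flags map Hive's \N marker back to SQL NULL.
  run sqoop export \
    --connect "$DB_URL" \
    --username "$DB_USER" -P \
    --table "$table" \
    --export-dir "$WAREHOUSE/$table" \
    --input-fields-terminated-by '\001' \
    --input-null-string '\\N' \
    --input-null-non-string '\\N'
}

export_hive_table customers
```

The target table must already exist in the database before the export runs; Sqoop does not create it.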