Sqoop Import Part 3


Sqoop can be used to import data seamlessly into HDFS from an RDBMS. Sqoop provides many options to handle different scenarios, and many of them are explained across these tutorials with examples. This Sqoop tutorial covers import with compression, import with a boundary query, incremental import, and import from Teradata.

Generic arguments to the import command

Attribute             Description
--target-dir          Specifies the HDFS directory into which data needs to be imported.
--table               Specifies the RDBMS table from which data needs to be imported.
--append              Appends imported data to an existing HDFS directory.
--delete-target-dir   Deletes the target HDFS directory (if it already exists) before importing data.
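As a minimal sketch of these generic arguments in use (the connection string, credentials, and HDFS paths below follow the local setup used in the examples later in this tutorial and should be adjusted for your environment):

```shell
# Sketch only: host, database, credentials and paths are assumptions
# matching the local setup used elsewhere in this tutorial.
sqoop import \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --table orders \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/orders \
 --delete-target-dir
```

Swapping --delete-target-dir for --append would instead add the newly imported files to an existing target directory.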



→ Import with Compression: Many compression codecs are available, such as Gzip, Snappy & Bzip2. Compression is enabled with the "--compress" attribute, and a specific codec is selected with "--compression-codec".
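A hedged sketch of a compressed import using the Snappy codec (connection details and paths are assumptions matching the MySQL setup used in the other examples in this tutorial):

```shell
# Sketch only: credentials and paths are assumed to match the local setup.
# --compress enables compression; --compression-codec picks the codec class.
sqoop import \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table orders \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/orders_compressed \
 --delete-target-dir \
 --compress \
 --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
 --bindir $SQOOP_HOME/lib/
```

Omitting --compression-codec while keeping --compress falls back to the default Gzip codec.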


→ Import Using Boundary Query: There are 2 ways to make a boundary-query import work: either the table should have a primary key defined, or "--split-by" / a single mapper ('-m 1') should be used. The boundary query itself is supplied with the "--boundary-query" attribute.
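A sketch of a boundary-query import, assuming the same MySQL setup used in the other examples here (the WHERE clause and split column are illustrative assumptions):

```shell
# Sketch only: credentials, paths and the boundary condition are assumptions.
# --boundary-query supplies the MIN/MAX values used to compute mapper splits;
# --split-by names the column those splits are applied to.
sqoop import \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table orders \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/boundary_query \
 --boundary-query "SELECT MIN(order_id), MAX(order_id) FROM orders WHERE order_id <= 1000" \
 --split-by order_id \
 --delete-target-dir \
 --bindir $SQOOP_HOME/lib/
```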


→ Import Incremental Data: Incremental import requires the "--check-column", "--incremental" & "--last-value" attributes. In the example below, only rows with order_id greater than 1022 are imported and appended to the target directory.
sqoop import  \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table orders \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/partial_column_where \
 --check-column order_id \
 --incremental append \
 --last-value 1022 \
 --bindir $SQOOP_HOME/lib/
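Besides "append" mode, Sqoop also supports "lastmodified" mode for incremental imports, which picks up rows updated after a given timestamp. A hedged sketch, assuming the orders table has a timestamp column named order_date (column name and timestamp value are illustrative assumptions):

```shell
# Sketch only: order_date is an assumed timestamp column; adjust to your schema.
# In lastmodified mode, rows whose check column is newer than --last-value
# are imported; --append adds them to the existing target directory.
sqoop import \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table orders \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/orders_lastmodified \
 --check-column order_date \
 --incremental lastmodified \
 --last-value "2014-07-24 00:00:00" \
 --append \
 --bindir $SQOOP_HOME/lib/
```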


→ Import with a Specified Number of Mappers: The "-m" attribute specifies the number of mappers, so data is imported in parallel by multiple map tasks. In the example below, 2 mappers are specified.
sqoop import \
 --connect jdbc:teradata://127.0.0.1/database=tutorial_db \
 --driver com.teradata.jdbc.TeraDriver \
 --username tutorial_user \
 --password tutorial_user \
 --table order_item \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/TDImport \
 --bindir /Users/username/other/Hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/lib \
 --delete-target-dir \
 -m 2


→ Import Data from Teradata: Importing data works the same as with MySQL, but the Teradata-specific driver & connection URL need to be used.
sqoop import \
 --connect jdbc:teradata://127.0.0.1/database=tutorial_db \
 --driver com.teradata.jdbc.TeraDriver \
 --username tutorial_user \
 --password tutorial_user \
 --table order_item \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/TDImport \
 --bindir /Users/username/other/Hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/lib \
 --delete-target-dir \
 -m 1