Sqoop Import Part 1

Sqoop Import Part 1

Sqoop can be used to import data seamlessly into HDFS from RDBMS systems. Sqoop provides many options to handle different scenarios. Many of them will be explained in multiple tutorials with examples. This Sqoop tutorial will cover simple import, import with delimiter, delete existing directory before importing data, importing data to existing directory and import without primary key.

Generic Arguments to import command

Attribute

Description

--target-dir

This is used to specify HDFS directory where data need to be imported.

--table

This is used to specify RDBMS table name from where data need to be imported.

--append

This is used to append imported data to the existing HDFS directory.

--delete-target-dir

This is used to delete target HDFS directory(if already exist) before importing data.


Sqoop Import Data File Formats
→ Simple Import: By default data is delimited by comma (,) if not specified
sqoop import  \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table order_items \
 --target-dir hdfs://localhost:9000/user/username/scoop_import


→ Import with Delimter: "--fields-terminated-by" can be used to specify specific delimiter for the imported data. In the below example, pipe(|) is used as delimiter.
 sqoop import \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table products \
 --fields-terminated-by "|" \
 --target-dir product \
 --delete-target-dir \
 --bindir $SQOOP_HOME/lib/ \
 -m 1


→ Import With Delete: "--delete-target-dir" argument can be used to delete target HDFS directory(if exists) before importing data.
 sqoop import \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table products \
 --target-dir product \
 --delete-target-dir \
 --bindir $SQOOP_HOME/lib/ \
 -m 1


→ Import With Append: "--append" argument can be used to append imported data to the existing HDFS directory
sqoop import  \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table orders \
 --where "order_id in (11,1001,1021,1011)" \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/partial_column_where \
 --bindir $SQOOP_HOME/lib/ \
--append


→ Import Without Primary Key: There are 2 ways to import data from tables which don't have primary keys defined.