Sqoop Import Part 2

Sqoop can be used to import data seamlessly into HDFS from RDBMS systems, and it provides many options to handle different scenarios; several of them are explained across these tutorials with examples. This Sqoop tutorial covers importing specific columns, importing with a query, importing data with nulls, importing with a WHERE clause, importing into an Avro file, and importing with append.

Generic Arguments to the import command

 --target-dir          Specifies the HDFS directory into which the data needs to be imported.
 --table               Specifies the RDBMS table from which the data needs to be imported.
 --append              Appends the imported data to the existing HDFS directory.
 --delete-target-dir   Deletes the target HDFS directory (if it already exists) before importing the data.
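
For example, "--delete-target-dir" is handy when re-running an import into the same directory. A minimal sketch, reusing the connection settings from the examples below (the target path here is illustrative):
 sqoop import  \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table orders \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/orders_full \
 --delete-target-dir \
 --bindir $SQOOP_HOME/lib/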



→ Import Specific Columns: the "--columns" argument can be used to import specific columns.
 sqoop import  \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table orders \
 --columns order_id,order_status,order_date \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/partial_column_orders \
 --bindir $SQOOP_HOME/lib/
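
Once the import completes, the output can be verified from HDFS; a quick check, assuming the default part-file naming (actual file names may differ):
 hdfs dfs -ls hdfs://localhost:9000/user/username/scoop_import/partial_column_orders
 hdfs dfs -cat hdfs://localhost:9000/user/username/scoop_import/partial_column_orders/part-m-00000 | head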

Note: If a numerical column is not specified as the first column in the "--columns" list, the import fails with "java.lang.NumberFormatException". The solution is to specify the "--split-by" clause with a numerical column.
sqoop import  \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table orders_new \
 --columns order_status,order_id,order_date \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/partial_column_orders6 \
 --bindir $SQOOP_HOME/lib/ \
 --split-by order_id


→ Import with Query: the "--query" attribute can be used to import with a given query. A Sqoop import with a query fails if $CONDITIONS is not specified in the WHERE clause. The "--query" and "--table" attributes cannot be used together.
 sqoop import  \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --query 'select order_status,order_id from orders WHERE $CONDITIONS' \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/query_orders \
 --bindir $SQOOP_HOME/lib/ \
 --split-by order_id
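
With a free-form query, "--split-by" is required whenever more than one mapper is used; alternatively, the import can be forced onto a single mapper with "-m 1", in which case no split column is needed. A sketch of that variant (the target path is illustrative):
 sqoop import  \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --query 'select order_status,order_id from orders WHERE $CONDITIONS' \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/query_orders_single_mapper \
 --bindir $SQOOP_HOME/lib/ \
 -m 1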


→ Import Data with Nulls: If nulls are not handled properly, columns with null data are fetched and imported as the string 'null'. There are different arguments to handle nulls in string and non-string columns: "--null-string" sets the replacement value for string columns, and "--null-non-string" does the same for non-string columns. Both clauses can be used in a single import, as sketched below.
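
A minimal sketch of such an import, writing \N for nulls in both string and non-string columns (the table and target path are illustrative):
 sqoop import  \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table orders \
 --null-string '\\N' \
 --null-non-string '\\N' \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/orders_with_nulls \
 --bindir $SQOOP_HOME/lib/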



→ Import with WHERE Clause: the "--where" argument can be used to filter data from the table during import. In the example below, the where clause fetches only the rows with order_id values 1, 100, 102, and 101.
sqoop import  \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table orders \
 --where "order_id in (1,100,102,101)" \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/partial_column_where \
 --bindir $SQOOP_HOME/lib/ 


→ Import as Avro File: the "--as-avrodatafile" argument can be used to import data in the Avro data format.
 sqoop import  \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table order_items \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/avro \
 --as-avrodatafile
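
To spot-check the Avro output, a part file can be pulled locally and dumped with the avro-tools jar; a rough sketch, where the part-file name and the avro-tools version/location are assumptions:
 hdfs dfs -get hdfs://localhost:9000/user/username/scoop_import/avro/part-m-00000.avro .
 java -jar avro-tools-1.8.2.jar tojson part-m-00000.avro | head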


→ Import with Append: the "--append" argument appends the imported data to the data set that already exists in the target directory. In the example below, a new set of order_id values is appended to the directory populated by the WHERE-clause example above.
sqoop import  \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table orders \
 --where "order_id in (11,1001,1021,1011)" \
 --target-dir hdfs://localhost:9000/user/username/scoop_import/partial_column_where \
 --bindir $SQOOP_HOME/lib/ \
 --append
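
After the append run, the target directory holds the part files from both imports; a quick listing to confirm (file names may vary):
 hdfs dfs -ls hdfs://localhost:9000/user/username/scoop_import/partial_column_where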