Sqoop can be used to import data seamlessly into HDFS from RDBMS systems. Sqoop provides many options to handle different scenarios. Many of them will be explained in multiple tutorials with examples. This Sqoop tutorial will cover import specific column, import with query, Import data with Nulls, Import with where clause, Import into avro file, Import with Append.
Generic Arguments to import command
Attribute |
Description |
--target-dir |
This is used to specify HDFS directory where data need to be imported. |
--table |
This is used to specify RDBMS table name from where data need to be imported. |
--append |
This is used to append imported data to the existing HDFS directory. |
--delete-target-dir |
This is used to delete target HDFS directory(if already exist) before importing data. |
sqoop import \ --connect jdbc:mysql://localhost:3306/retail_db \ --username root \ --password mysqlrootpassword \ --driver com.mysql.cj.jdbc.Driver \ --table orders \ --columns order_id,order_status,order_date \ --target-dir hdfs://localhost:9000/user/username/scoop_import/partial_column_orders \ --bindir $SQOOP_HOME/lib/
sqoop import \ --connect jdbc:mysql://localhost:3306/retail_db \ --username root \ --password mysqlrootpassword \ --driver com.mysql.cj.jdbc.Driver \ --table orders_new \ --columns order_status,order_id,order_date \ --target-dir hdfs://localhost:9000/user/username/scoop_import/partial_column_orders6 \ --bindir $SQOOP_HOME/lib/ \ --split-by order_id
sqoop import \ --connect jdbc:mysql://localhost:3306/retail_db \ --username root \ --password mysqlrootpassword \ --driver com.mysql.cj.jdbc.Driver \ --query 'select order_status,order_id from orders WHERE $CONDITIONS' \ --target-dir hdfs://localhost:9000/user/username/scoop_import/query_orders \ --bindir $SQOOP_HOME/lib/ \ --split-by order_id
sqoop import \ --connect jdbc:mysql://localhost:3306/retail_db \ --username root \ --password mysqlrootpassword \ --driver com.mysql.cj.jdbc.Driver \ --query 'select order_status,order_id from orders where order_id in (1,100,102,101) AND $CONDITIONS' \ --target-dir hdfs://localhost:9000/user/username/scoop_import/query_orders_null \ --bindir $SQOOP_HOME/lib/ \ --split-by order_id Output: null,1 null,100 CLOSED,101 null,102
sqoop import \ --connect jdbc:mysql://localhost:3306/retail_db \ --username root \ --password mysqlrootpassword \ --driver com.mysql.cj.jdbc.Driver \ --query 'select order_status,order_id from orders where order_id in (1,100,102,101) AND $CONDITIONS' \ --target-dir hdfs://localhost:9000/user/username/scoop_import/query_orders_null \ --bindir $SQOOP_HOME/lib/ \ --split-by order_id \ --null-string ''
sqoop import \ --connect jdbc:mysql://localhost:3306/retail_db \ --username root \ --password mysqlrootpassword \ --driver com.mysql.cj.jdbc.Driver \ --query 'select order_status,order_id from orders where order_id in (1,100,102,101) AND $CONDITIONS' \ --target-dir hdfs://localhost:9000/user/username/scoop_import/query_orders_null \ --bindir $SQOOP_HOME/lib/ \ --split-by order_id \ --null-non-string -1
sqoop import \ --connect jdbc:mysql://localhost:3306/retail_db \ --username root \ --password mysqlrootpassword \ --driver com.mysql.cj.jdbc.Driver \ --table orders \ --where "order_id in (1,100,102,101)" \ --target-dir hdfs://localhost:9000/user/username/scoop_import/partial_column_where \ --bindir $SQOOP_HOME/lib/
sqoop import \ --connect jdbc:mysql://localhost:3306/retail_db \ --username root \ --password mysqlrootpassword \ --driver com.mysql.cj.jdbc.Driver \ --table order_items \ --target-dir hdfs://localhost:9000/user/username/scoop_import/avro \ --as-avrodatafile
sqoop import \ --connect jdbc:mysql://localhost:3306/retail_db \ --username root \ --password mysqlrootpassword \ --driver com.mysql.cj.jdbc.Driver \ --table orders \ --where "order_id in (11,1001,1021,1011)" \ --target-dir hdfs://localhost:9000/user/username/scoop_import/partial_column_where \ --bindir $SQOOP_HOME/lib/ \ --append