Sqoop can be used to import data seamlessly from RDBMS systems into HDFS. Sqoop provides many options to handle different scenarios, and several of them are explained across these tutorials with examples. This Sqoop tutorial covers import with compression, import with a boundary query, incremental import, and import from Teradata.
Generic arguments to the import command
| Attribute | Description |
| --- | --- |
| --target-dir | Specifies the HDFS directory into which data needs to be imported. |
| --table | Specifies the RDBMS table from which data needs to be imported. |
| --append | Appends imported data to the existing HDFS directory. |
| --delete-target-dir | Deletes the target HDFS directory (if it already exists) before importing data. |
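To see these generic arguments together, here is a minimal hedged sketch of a plain import (the connection details match the tutorial's MySQL setup; the target directory name orders_basic is illustrative):

```shell
# Minimal import sketch: pull the orders table into HDFS,
# clearing the target directory first if it already exists.
sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password mysqlrootpassword \
--driver com.mysql.cj.jdbc.Driver \
--table orders \
--target-dir hdfs://localhost:9000/user/username/scoop_import/orders_basic \
--delete-target-dir
```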
Import with compression enabled. With --compress alone, Sqoop defaults to the gzip codec:

sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password mysqlroot \
--driver com.mysql.cj.jdbc.Driver \
--table orders \
--target-dir hdfs://localhost:9000/user/username/scoop_import/avro_zip \
--append \
--split-by order_id \
--bindir /Users/username/others/Hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/lib/ \
--compress

The codec can also be named explicitly, either by its short alias:

sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password mysqlrootpassword \
--driver com.mysql.cj.jdbc.Driver \
--table orders \
--target-dir hdfs://localhost:9000/user/username/scoop_import/avro_zip \
--append \
--compression-codec Gzip

or by the fully qualified codec class name:

sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password mysqlrootpassword \
--driver com.mysql.cj.jdbc.Driver \
--table orders \
--target-dir hdfs://localhost:9000/user/username/scoop_import/avro_zip \
--append \
--compression-codec org.apache.hadoop.io.compress.GzipCodec
Import as Avro data files with Snappy compression, again with the codec named by alias:

sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password mysqlrootpassword \
--driver com.mysql.cj.jdbc.Driver \
--table orders \
--target-dir hdfs://localhost:9000/user/username/scoop_import/avro_snaapy \
--as-avrodatafile \
--append \
--compression-codec snappy

or by the fully qualified codec class name:

sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password mysqlrootpassword \
--driver com.mysql.cj.jdbc.Driver \
--table orders \
--target-dir hdfs://localhost:9000/user/username/scoop_import/avro_snaapy \
--as-avrodatafile \
--append \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec
Import with BZip2 compression:

sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password mysqlrootpassword \
--driver com.mysql.cj.jdbc.Driver \
--table orders \
--target-dir hdfs://localhost:9000/user/username/scoop_import/avro_zip \
--append \
--compression-codec org.apache.hadoop.io.compress.BZip2Codec
Import with a boundary query. By default Sqoop runs SELECT MIN(), MAX() on the --split-by column to determine the split range; --boundary-query overrides that query, here restricting the splits to rows with order_item_id greater than 9999:

sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password mysqlrootpassword \
--driver com.mysql.cj.jdbc.Driver \
--table order_items \
--split-by order_item_id \
--target-dir hdfs://localhost:9000/user/username/scoop_import/boundary_query3 \
--boundary-query 'SELECT MIN(order_item_id),MAX(order_item_id) FROM order_items where order_item_id>9999'
The boundary query need not touch the table at all; constant boundaries can be supplied directly:

sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password mysqlrootpassword \
--driver com.mysql.cj.jdbc.Driver \
--table order_items \
--split-by order_item_id \
--target-dir hdfs://localhost:9000/user/username/scoop_import/boundary_query_other \
--boundary-query 'SELECT 100000,172198'
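To see what the boundary values drive: for an integer split column, Sqoop divides the [min, max] interval roughly evenly among the mappers, and each mapper imports one range. The shell sketch below approximates that arithmetic for the constant boundaries above with four hypothetical mappers (Sqoop's exact endpoint handling may differ slightly):

```shell
# Approximate Sqoop's integer split calculation:
# divide [min, max] into one range per mapper.
min=100000; max=172198; mappers=4
size=$(( (max - min) / mappers ))   # width of each split
lo=$min
for i in $(seq 1 $mappers); do
  if [ "$i" -eq "$mappers" ]; then
    hi=$max                         # last mapper takes the remainder
  else
    hi=$(( lo + size ))
  fi
  echo "mapper $i: order_item_id >= $lo AND order_item_id < $hi"
  lo=$hi
done
```

Each echoed range corresponds to the WHERE clause one mapper would use, which is why a tighter boundary query means fewer rows scanned per mapper.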
Incremental import appends only rows whose --check-column value is greater than --last-value:

sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password mysqlrootpassword \
--driver com.mysql.cj.jdbc.Driver \
--table orders \
--target-dir hdfs://localhost:9000/user/username/scoop_import/partial_column_where \
--check-column order_id \
--incremental append \
--last-value 1022 \
--bindir $SQOOP_HOME/lib/
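When the command above finishes, Sqoop logs the new high-water mark to pass as --last-value on the next run. Tracking that value by hand is error-prone; a saved Sqoop job stores the incremental state and updates it automatically. A hedged sketch (the job name orders_incremental is illustrative):

```shell
# Create a saved job; Sqoop's metastore remembers --last-value
# and advances it after every successful execution.
sqoop job --create orders_incremental -- import \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password mysqlrootpassword \
--table orders \
--target-dir hdfs://localhost:9000/user/username/scoop_import/partial_column_where \
--check-column order_id \
--incremental append \
--last-value 1022

# Run it whenever new rows should be picked up.
sqoop job --exec orders_incremental
```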
Import from Teradata using the Teradata JDBC driver, with two mappers:

sqoop import \
--connect jdbc:teradata://127.0.0.1/database=tutorial_db \
--driver com.teradata.jdbc.TeraDriver \
--username tutorial_user \
--password tutorial_user \
--table order_item \
--target-dir hdfs://localhost:9000/user/username/scoop_import/TDImport \
--bindir /Users/username/other/Hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/lib \
--delete-target-dir \
-m 2
The same import with a single mapper; with -m 1 Sqoop runs one query and needs no split column:

sqoop import \
--connect jdbc:teradata://127.0.0.1/database=tutorial_db \
--driver com.teradata.jdbc.TeraDriver \
--username tutorial_user \
--password tutorial_user \
--table order_item \
--target-dir hdfs://localhost:9000/user/username/scoop_import/TDImport \
--bindir /Users/username/other/Hadoop/sqoop-1.4.7.bin__hadoop-2.6.0/lib \
--delete-target-dir \
-m 1
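After any of these imports, the result can be checked directly in HDFS. A hedged sketch using the tutorial's Teradata target directory (part file names depend on the number of mappers):

```shell
# List the imported files; expect one part-m-* file per mapper.
hdfs dfs -ls hdfs://localhost:9000/user/username/scoop_import/TDImport

# Peek at the first few imported records.
hdfs dfs -cat hdfs://localhost:9000/user/username/scoop_import/TDImport/part-m-00000 | head -5
```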