Sqoop Export Part 1

Sqoop Export Part 1

Sqoop can be used to export data seamlessly from HDFS into RDBMS systems. Sqoop provides many options to handle different scenarios. Many of them will be explained in multiple tutorials with examples. This Sqoop tutorial will cover simple export, export with delimiter, export specific columns, export with update only and export with upsert.

Generic Arguments to export command

Attribute

Description

--export-dir

This is used to specify HDFS directory from where data need to be exported.

--table

This is used to specify RDBMS table name where data need to be exported.


→ Simple Export: By default data is assumed to be delimited by comma (,) if not specifically specified.
sqoop export \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table order_items \
 --export-dir hdfs://localhost:9000/user/username/retail_db/order_items


→ Export from Delimited File: "--input-fields-terminated-by" can be used to specify specific delimiter present in input data file. In the below example, ctrl A is used present as file delimiter.
sqoop export \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table order_items \
 --export-dir hdfs://localhost:9000/user/username/retail_db/order_items
 --input-fields-terminated-by "\001"


→ Export Specific columns: "--columns" argument can be used to export specific columns into RDBMS system.
sqoop export \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table daily_revenue_demo \
 --export-dir hdfs://localhost:9000/user/hive/warehouse/daily_revenue.db/daily_revenue_txt \
 --bindir $SQOOP_HOME/lib/ \
 --columns order_date,revenue


→ Export Delimited Specific columns: "--columns" and "--input-fields-terminated-by" arguments can be used to export specific columns from delimited file into RDBMS system.
In the below example, only 2 columns are loaded into MySql table from 'Ctrl A' delimited file.
sqoop export \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
 --table daily_revenue_demo \
 --export-dir hdfs://localhost:9000/user/hive/warehouse/daily_revenue.db/daily_revenue_ctlr_a \
 --bindir $SQOOP_HOME/lib/ \
 --columns order_date,revenue \
 --input-fields-terminated-by "\001"


→ Export With Only Update: If "--update-key" attribute is used without "--update-mode" then it will only update existing data on specified column but it will not insert any new rows.
sqoop export \
  --connect jdbc:mysql://localhost:3306/retail_db \
  --username root \
  --password mysqlrootpassword \
  --driver com.mysql.cj.jdbc.Driver \
  --table orders \
  --export-dir hdfs://localhost:9000/user/username/retail_db/orders \
  --bindir $SQOOP_HOME/lib/ \
  --update-key order_id


→ Export With Upsert: If "--update-key" attribute is used along with "--update-mode allowinsert" then it will not only update existing data on specified column but also insert new rows.
sqoop export \
 --connect jdbc:mysql://localhost:3306/retail_db \
 --username root \
 --password mysqlrootpassword \
 --table orders \
 --export-dir hdfs://localhost:9000/user/username/retail_db/orders \
 --bindir $SQOOP_HOME/lib/ \
 --update-key order_id \
 --update-mode allowinsert
Note: For upsert to work in mysql, update key should be primary key else it will insert only.