Sqoop Introduction

Sqoop Introduction

Sqoop term is combination of SQL and Hadoop i.e SQL + HADOOP = SQOOP. Sqoop was initially created by Cloudera and now it is maintained by Apache. Sqoop was created to import & export data seamlessly in & out of HDFS from & to RDBMS systems.

Sqoop provide many utilities apart of just import/export data

Sqoop Command

Description

help

List all Sqoop commands

version

Display Sqoop version

list-databases

List available databases on connected database

list-tables

List available tables in a connected database

eval

Execute a SQL statement and return the results

export

Export an HDFS directory/file to a database table

import

Import a table from a database to HDFS

import-all-tables

Import tables from a database to HDFS

codegen

Generate code which will used by Sqoop Import/Export

job

Create Sqoop jobs to perform repeated task



Hadoop Setup: Hadoop setup should be present for Sqoop. If not present then please complete hadoop setup first. Steps available on Hadoop Setup

Initial Sqoop Setup: Click here to download Sqoop binary or download directly from apache website http://sqoop.apache.org/ . Place & extract the Sqoop package in $HOME/hadoop directory.
Also set SQOOP_HOME & PATH in .profile file in Home directory(~/.profile) as shown below.
##.profile
export SQOOP_HOME=$HOME/hadoop/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=$PATH:$SQOOP_HOME/bin

Also set HADOOP_HOME in sqoop-env.sh file in $SQOOP_HOME/conf directory for HADOOP_MAPRED_HOME & HADOOP_COMMON_HOME variables.
##sqoop-env.sh
export HADOOP_MAPRED_HOME=[$HADOOP_HOME Path]
export HADOOP_COMMON_HOME=[$HADOOP_HOME Path]


Generic Arguments to the commands

Argument

Description

connect

RDBMS connection URL used by Sqoop to connect to database server (with or without database name).

username

This is the username of database to be used for connecting database.

password

This is the password of username used for connecting database.

driver

This is jdbc driver class for the underlying database.



→ Version: This will display the current Sqoop version
sqoop version

Output Sqoop 1.4.7

→ Help: This can be used to get help related to Sqoop commands.
Syntax:  sqoop help [command]

Example 1: sqoop help

Example 2: sqoop help import

Example 3: sqoop help export

Example 4: sqoop help eval

→ List Database: This will display all the databases present.
sqoop list-databases \
  --connect jdbc:mysql://localhost:3306 \
  --username root \
  --password mysqlrootpassword

→ List tables: This will display all the tables present in "demo_db" database. demo_db can be changed with required DB name.
sqoop list-tables \
  --connect jdbc:mysql://localhost:3306/demo_db \
  --username root \
  --password mysqlrootpassword

→ EVAL: Eval can be used to run adhoc queries such as Select, Insert, delete & update in Database. demo_db can be changed with required DB name and any database query can be specified after "--query".
sqoop eval \
  --connect jdbc:mysql://localhost:3306/demo_db \
  --username root \
  --password mysqlrootpassword \
 --driver com.mysql.cj.jdbc.Driver \
  --query "SELECT * FROM order_items LIMIT 10"

→ Import-all-tables: This command can be used to import all the tables from a particular database. demo_db can be changed with required DB name. "--warehouse-dir" attribute can be used when importing multiple tables.
sqoop import-all-tables \
--connect jdbc:mysql://localhost:3306/demo_db?zeroDateTimeBehavior=CONVERT_TO_NULL \
--username root \
--password mysqlrootpassword \
--autoreset-to-one-mapper \
--warehouse-dir all_table_import 
Note: "--autoreset-to-one-mapper" attribute is used while importing all tables so that import will not fail when it encounter tables with no primary key.

Sqoop provide a way to exclude some of the tables while importing all the tables. --exclude-tables "comma separated tablenames" can be used to exclude tables from imports.
 sqoop import-all-tables \
--connect jdbc:mysql://localhost:3306/retail_db?zeroDateTimeBehavior=CONVERT_TO_NULL \
--username root \
--password mysqlrootpassword \
--autoreset-to-one-mapper \
--warehouse-dir all_table_import1 \
--exclude-tables "order_items_wp,orders_sqoop,orders_new,orders,orders_wp"


→ CodeGen: This command can be used to generate java code which will be used by Sqoop to import/export data. demo_db can be changed with required DB name.
sqoop codegen \
  --connect jdbc:mysql://localhost:3306/demo_db \
  --username root \
  --password mysqlrootpassword \
  --driver com.mysql.cj.jdbc.Driver \
 --table order_items