Sqoop Complete Tutorial Part-2

This is the continuation of the “Sqoop Complete Tutorial” series. If you want to read the other parts, please use the links below.
Part-1, please click here.
Part-2, please click here.
Part-3, please click here.
Part-4, please click here.
Part-5, please click here.
Part-6, please click here.
Part-7, please click here.

We will discuss the following topics in this and the upcoming parts.

Importing data from MySQL to HDFS

  • Importing a table from MySQL to the default HDFS location
  • Importing a table from MySQL to a user-specified location in HDFS
  • Importing all the tables from MySQL to the default HDFS location
  • Importing all the tables from MySQL to a user-specified location in HDFS
  • Importing all the tables except a few from MySQL to HDFS
  • Importing part of a table’s data from MySQL to HDFS using a WHERE condition
  • Importing a few columns of a table from MySQL to HDFS
  • Importing and compressing a table’s data into HDFS
  • Importing data with a specific column delimiter
  • Importing data by splitting it on a specific column
  • Importing data from MySQL to HDFS based on a SQL query
  • Incrementally importing data from MySQL into HDFS
  • Importing and appending data
  • Importing data from MySQL and storing it in sequence file format in HDFS
  • Reading the contents of the source database without importing it
  • Getting the list of databases
  • Importing specific columns from a table to HDFS
  • Controlling the parallelism of the import

The format of the import process is the same in each case; however, some additional keywords are used when the target is not HDFS.
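For instance, when the target is Hive instead of HDFS, the same import command needs only the extra --hive-import keyword. This is a minimal sketch using the connection details from the examples below; the Hive variants are covered in detail in a later part.

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--table categories \
--username root \
--password cloudera \
--hive-import \
-m 1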

Importing data from MySQL to Hive

  • Importing a table from MySQL into Hive’s default database
  • Importing a table from MySQL into a user-specified Hive database
  • Importing all the tables from MySQL into Hive
  • Importing all the tables except a few from MySQL into Hive
  • Importing a table’s structure (without data) from MySQL into Hive

Exporting data from HDFS to MySQL

  • Exporting CSV/Text data from HDFS to MySQL
  • Exporting into a staging table
  • Sqoop Export – Updating the records in the destination table
  • Sqoop Export – Appending new records and updating existing values in the destination table

Exporting data from Hive to MySQL

  • Export data from Hive Warehouse to MySQL
  • Export data from Hive Catalog to MySQL

Sqoop Automation

  • Sqoop – Creating a job
  • Sqoop – Displaying the list of the jobs
  • Sqoop – Displaying the properties of a job
  • Sqoop – Executing the job
  • Sqoop – Deleting the job

Importing data from MySQL to HBase

  • Importing a table from MySQL to HBase
  • Importing a table with some selected columns from MySQL to HBase


OK, let us begin the exercises, working through the above points one by one.

Importing data from MySQL to HDFS


1) Importing a table from MySQL to the default HDFS location

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--table categories \
--username root \
--password cloudera \
-m 1

Explanation: “sqoop import” is the command used for imports. Sqoop is a JDBC-based utility for moving data between the Hadoop ecosystem and relational databases, so a username and password must be provided to connect to the RDBMS.

Assigning mappers is the user’s choice; the number of mappers can be chosen based on the size of the data being imported. If the password should not be disclosed on the command line, the user can simply pass the -P argument instead of --password; at run time, Sqoop will prompt for the password. Refer to the example below.

Importing a table from MySQL to the default HDFS location – hiding the password

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--table categories \
--username root \
-P \
-m 1

The above command creates a directory named “categories” under the user’s HDFS home directory (“/user/cloudera”), and the table data is stored there as a single file (part-m-00000) because only one mapper was used.
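To verify the result, you can list the imported directory and view the file contents. A quick check, assuming the quickstart user “cloudera” from the examples above:

hdfs dfs -ls /user/cloudera/categories
hdfs dfs -cat /user/cloudera/categories/part-m-00000 | head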

2) Importing a table from MySQL to a user-specified target location in HDFS

sqoop import \
--connect jdbc:mysql://localhost/retail_db \
--table categories \
--username root \
--password cloudera \
--target-dir 'retail_db' \
-m 1
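Note that a relative path such as 'retail_db' is resolved against the user’s HDFS home directory, so the data should land under /user/cloudera/retail_db (assuming the same quickstart user as above). A quick check:

hdfs dfs -ls /user/cloudera/retail_db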

3) Importing all the tables from MySQL to the default HDFS location

sqoop import-all-tables \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password cloudera

Note: Since no target directory is specified, each table is imported into its own sub-directory (containing part files) under the user’s HDFS home directory.
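Listing the home directory should therefore show one sub-directory per table of retail_db. A quick check, assuming the quickstart user “cloudera”:

hdfs dfs -ls /user/cloudera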

4) Importing all the tables from MySQL to a user-specified target location in HDFS

sqoop import-all-tables \
--connect jdbc:mysql://localhost/retail_db \
--username root \
--password cloudera \
--warehouse-dir 'retaildb'

Note: All the tables are placed under the specified directory, one sub-directory per table. The --target-dir option does not work with import-all-tables; use --warehouse-dir instead. No mapper count is assigned here, so Sqoop falls back to its default parallelism; a single mapper would be insufficient because the volume of data in “retail_db” is larger than the HDFS block size.
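As before, the relative path 'retaildb' resolves under the user’s HDFS home directory, and each table’s sub-directory should contain several part-m-0000* files because multiple mappers ran in parallel. A quick check, assuming the setup above:

hdfs dfs -ls retaildb
hdfs dfs -ls retaildb/categories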

Please click here for the next part.
