SQOOP Complete Tutorial Part-10

In this session, we’ll work with staging tables while exporting data from HDFS to MySQL.

Staging tables are the tables that hold the data temporarily.

If any exceptions are raised by the map reduce jobs, the data may only be loaded partly in the target table. Instead of transferring data to the target table, staging tables are utilized in certain instances. If there are no exceptions, data will be moved from the staging table to the target table.

Let’s consider the below data-set to demonstrate why staging table is required:

The below Employees data is stored in a file called emp in /user/cloudera/empdir.

101, ‘Smith’,20000,’IT’
102, ‘Sean’,20000,’IT’
103, ‘Debby’,15000,’Accounts’
104, ‘Juan’,14000,’Accounts’
105, ‘Mona’,15000,’HR’

In MySQL, use the below commands to create the target table in NewTestDB database.

–Creating the Target table
CREATE TABLE Emp(
EmpID INT NOT NULL PRIMARY KEY,
Ename VARCHAR(20),
Salary INT,
Dept VARCHAR(10));

Let’s now use Sqoop to export the data from the HDFS console.

sqoop export \
–connect jdbc:mysql://localhost/NewTestDB \
–username root \
–table Emp\
–export-dir /user/cloudera/empdir

Let’s return to MySQL and query the table “Emp” to verify if the data was successfully exported. This is the same as what was shown in the last blog, so please see the screenshots here.

Now, let’s delete few records from “Emp” from MySQL table as shown below:

DELETE FROM Emp WHERE EmpID <> 104;

Please note that EmpIDs are listed from 101 to 105, with the exception of 104, all other rows being deleted on purpose. Also note that EmpID is a primary key column.

Now re-execute the “Sqoop export” command. The task will be started, mappers will be allocated, and data export will begin, but it will throw exceptions for the reason listed below.

As stated above, the EmpID column in the Emp table has a primary key constraint, and EmpID 104 already exists. MySQL will reject the redundancy during the data insertion into the table and throw a primary key constraint violation. At this point, the Sqoop task will be canceled, resulting in a partial export.

See the below screenshots.

This slideshow requires JavaScript.

Now query the MySQL table and see the result.

MySQL_Staging_PartialExport

EmpID 105 is not imported into the database, as you may have seen. Let’s have a look at how a staging table works in this case.

Except for EmpID 104, I’m deleting the data from the Emp table once more.

DELETE FROM Emp WHERE EmpID <> 104;

and creating a staging table in MySQL as shown below.
CREATE TABLE Emp_Stage(
EmpID INT NOT NULL PRIMARY KEY,
Ename VARCHAR(20),
Salary INT,
Dept VARCHAR(10));

and executing the below Sqoop command.

sqoop export
–connect jdbc:mysql://localhost/NewTestDB
–username root
–password cloudera
–table Emp
–export-dir /user/cloudera/empdir
–staging-table Emp_Stage
–clear-staging-table

The goal of the above command is to put the data into a staging table, and if the export goes off without a hitch, the data will be inserted into the target table. However, because one row was purposely left un-deleted to induce a constraint violation, it would import successfully into the staging table but fail to insert into the target table. Now, let’s delete the rows from both the tables and re-run the Sqoop command.

In MySQL
TRUNCATE TABLE Emp;
TRUNCATE TABLE Emp_Stage;

In the HDFS terminal –
sqoop export
–connect jdbc:mysql://localhost/NewTestDB
–username root
–password cloudera
–table Emp
–export-dir /user/cloudera/empdir
–staging-table Emp_Stage
–clear-staging-table

The following steps will be performed by the Sqoop-Export command.
1) The data will be exported from HDFS to MySQL staging table.
2) Since there are no exceptions, it will start migrating data from staging table to target-table.
3) Staging table’s purpose is solved hence it will clear the data from staging-table.

Hope you find this article helpful.

Please subscribe to get updates on latest posts.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s