Sqoop Complete Tutorial Part-14

This is the final installment of the “Sqoop Complete Tutorial” series. As previously stated, Apache Sqoop is an ETL tool for moving large amounts of data between Apache Hadoop (including Apache Hive) and structured datastores such as relational databases. Sqoop provides various arguments for restricting and filtering the data being transferred, so knowing which arguments to supply makes the job straightforward.

In this post, we will see how to transfer data from MySQL to HBase using Sqoop.
The following will be addressed –
> Importing a table from MySQL to HBase
> Importing a table with some selected columns from MySQL to HBase

Importing a table from MySQL to HBase
An earlier article explained that the target table must be created in MySQL before exporting data from Hive. Similarly, when importing data from MySQL into HBase, the table must first be created in HBase (unless you pass the --hbase-create-table argument, which tells Sqoop to create it for you).

Let’s consider our “EMP” dataset for the practice. The table already exists in the MySQL database. Before the import, let’s create the table in HBase, a column-oriented database that stores data in tables.

If you need “EMP” and “DEPT” datasets for MySQL, please click here.

[cloudera@quickstart ~]$ hbase shell
2021-08-15 03:12:38,741 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.0-cdh5.10.0, rUnknown, Fri Jan 20 12:13:18 PST 2017

hbase(main):001:0> create 'emp','emp_contents'
0 row(s) in 5.2380 seconds

=> Hbase::Table - emp
hbase(main):002:0>

The above command creates a table named “emp” with the column family “emp_contents.” Please keep in mind that the data under an HBase column family is stored as key-value pairs. Please also note that HBase is case-sensitive, so be careful with letter casing in table and column-family names.
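To make the key-value idea concrete, here is a minimal sketch in plain Python (dictionaries only, no HBase client) of how one EMP row is laid out logically after the import. The qualifiers and values shown are illustrative, taken from the classic EMP sample data:

```python
# HBase's logical layout: row key -> {column_family:qualifier -> value}.
# One EMP row (empno 7369) becomes one row key whose cells all live in the
# 'emp_contents' column family created above.
row_key = "7369"
cells = {
    "emp_contents:ename": "SMITH",
    "emp_contents:job": "CLERK",
    "emp_contents:sal": "800",
    "emp_contents:deptno": "20",
}

table = {row_key: cells}

# Every cell is addressed by (row key, column family, qualifier); HBase stores
# values as uninterpreted byte arrays, which is why everything here is a string.
print(table["7369"]["emp_contents:ename"])  # SMITH
```

This is why only the column family had to be declared when the table was created: the qualifiers (ename, sal, …) are created on the fly as cells are written.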

sqoop import \
--connect jdbc:mysql://localhost/NewTestDB \
--username root \
--password cloudera \
--table emp \
--hbase-table emp \
--column-family emp_contents \
--hbase-row-key empno \
-m 1

Execution Outcome:
21/08/15 04:29:11 INFO mapreduce.Job: Running job: job_1629026505943_0001
21/08/15 04:29:25 INFO mapreduce.Job: Job job_1629026505943_0001 running in uber mode : false
21/08/15 04:29:25 INFO mapreduce.Job: map 0% reduce 0%
21/08/15 04:29:33 INFO mapreduce.Job: map 100% reduce 0%
21/08/15 04:29:33 INFO mapreduce.Job: Job job_1629026505943_0001 completed successfully
21/08/15 04:29:33 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=180212
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=0
HDFS: Number of read operations=1
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=5846
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=5846
Total vcore-seconds taken by all map tasks=5846
Total megabyte-seconds taken by all map tasks=5986304
Map-Reduce Framework
Map input records=14
Map output records=14
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=115
CPU time spent (ms)=3500
Physical memory (bytes) snapshot=277659648
Virtual memory (bytes) snapshot=1578172416
Total committed heap usage (bytes)=221249536
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
21/08/15 04:29:33 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 26.5879 seconds (0 bytes/sec)
21/08/15 04:29:33 INFO mapreduce.ImportJobBase: Retrieved 14 records.
[cloudera@quickstart ~]$

Query the HBase table to verify that the data arrived.
hbase(main):002:0> scan 'emp'

[Screenshot: output of scan 'emp' showing the 14 imported rows]
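For reference, the scan prints one line per cell in the form `ROW  COLUMN+CELL`. Assuming the classic 14-row EMP sample, the first few lines should look roughly like this (timestamps will differ on your cluster):

```
ROW       COLUMN+CELL
 7369     column=emp_contents:deptno, timestamp=1629027000000, value=20
 7369     column=emp_contents:ename, timestamp=1629027000000, value=SMITH
 7369     column=emp_contents:job, timestamp=1629027000000, value=CLERK
 7369     column=emp_contents:sal, timestamp=1629027000000, value=800
...
14 row(s) in 0.1380 seconds
```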

Importing a table with some selected columns from MySQL to HBase
The implementation is the same as before, with one additional argument, “--columns”. The command is provided below.

sqoop import \
--connect jdbc:mysql://localhost/NewTestDB \
--username root \
--password cloudera \
--table emp \
--hbase-table emp \
--columns "empno,ename,sal" \
--column-family emp_contents \
--hbase-row-key empno \
-m 1
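As a sanity check on what this second command produces: Sqoop uses the --hbase-row-key column (empno) as the row key and, by default, writes only the remaining selected columns as cells; the row-key column is not duplicated as a cell unless the property sqoop.hbase.add.row.key=true is set. A small Python sketch of that mapping follows; to_hbase_put is a hypothetical helper (not part of Sqoop), and the column values are illustrative:

```python
# Hypothetical helper mirroring Sqoop's default MySQL-row-to-HBase mapping:
# the row-key column becomes the key; every other selected column becomes a
# cell under the given column family.
def to_hbase_put(row, row_key_col, column_family):
    key = str(row[row_key_col])
    cells = {
        f"{column_family}:{col}": str(val)
        for col, val in row.items()
        if col != row_key_col  # row key is not stored as a cell by default
    }
    return key, cells

# Only the columns named in --columns "empno,ename,sal" are fetched.
row = {"empno": 7369, "ename": "SMITH", "sal": 800}
key, cells = to_hbase_put(row, "empno", "emp_contents")
print(key, cells)  # 7369 {'emp_contents:ename': 'SMITH', 'emp_contents:sal': '800'}
```

So after this import, scanning “emp” should show at most two cells per row key: emp_contents:ename and emp_contents:sal.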

I hope you found this post to be informative. Please let me know if you need any more information or if you run into any errors.

 
