SQOOP Complete Tutorial Part-11

This is the continuation part of “Sqoop Complete Tutorial”.

The following will be covered in this article.

  • Sqoop Export – Update the records in the destination table.
  • Sqoop Export – Append the new and modify the existing the values in the destination table.

When exporting data from Hive/HDFS to RDBMS, we frequently need to either update existing records OR update current records and append new entries. The following export control arguments are crucial to understand in order to achieve the above specified implementations.

–update-key <col-name>
Anchor column to use for updates. Use a comma separated list of columns if there are more than one column.

–update-mode <mode>
Specify how updates are performed when new rows are found with non-matching keys in database. Legal values for mode include updateonly (default) and allowinsert.

Simply put, if we specify —update-key, the export will run in update-only mode, but if we set update-mode as ‘allowinsert’, the row will be added if the key does not exist.

Below is the dataset which will be used for the experiment:

Emp dataset in MySQL and HDFS directory
101,’Smith’,20000,’IT’
102,’Sean’,20000,’IT’
103,’Debby’,15000,’Accounts’
104,’Juan’,14000,’Accounts’
105,’Mona’,15000,’HR’

SampleDataInHDFSandMySQL

In the HDFS dataset, I’m now adding one new row and altering one of the old ones. So my new data will be as shown below.

101,’Smith’,40000,’IT’
102,’Sean’,20000,’IT’
103,’Debby’,15000,’Accounts’
104,’Juan’,14000,’Accounts’
105,’Mona’,15000,’HR’
106,’Richards’,12000,’HR’

Now, let’s execute the below Sqoop Command in HDFS terminal.

sqoop export
–connect jdbc:mysql://localhost/NewTestDB
–username root
–password cloudera
–table Emp
–export-dir /user/cloudera/empdir
–update-key EmpID

This slideshow requires JavaScript.

When you look at the final result in MySQL, you’ll notice that the record for EmpID # 101 has been changed. However, no new row was added to the table. Because we just gave “update-key,” this is to be anticipated. Let’s check what happens if we include the “update-mode” argument as well.

sqoop export
–connect jdbc:mysql://localhost/NewTestDB
–username root
–password cloudera
–table Emp
–export-dir /user/cloudera/empdir
–update-key EmpID
–update-mode allowinsert

MySQLBeforeAfter_SqoopExportUpdateInsert

It will check whether any records need to be changed or if new records need to be added with the aforementioned implementation. EmpID # 106 has been added to the table, as you can see in the output.

I hope you found this post to be informative. Please let me know if you have any questions or if further information is required.

Don’t forget to subscribe!!!

Leave a Reply