SQOOP Complete Tutorial Part-1

Sqoop (SQL-to-Hadoop) is a Hadoop ecosystem component and ETL tool that extracts data from structured data stores, such as relational databases, using MapReduce jobs. This command-line tool efficiently transfers large volumes of data into Hadoop (HDFS, Hive, and HBase). It can also export data from Hadoop back to relational databases.

Sqoop offers parallel processing and fault tolerance, and its transfer jobs can be automated, which helps in scheduling them.


Sqoop Import

Imports individual tables or all the tables from an RDBMS into HDFS, Hive, or HBase. When the target is HDFS or Hive, the table data is stored as text files by default.

In HDFS, each row of the table is treated as a record, each table is treated as a sub-directory, and table data is stored as text files by default. Users can instead store the table data in a binary format: Avro, SequenceFile, or a columnar file format (such as Parquet). Users can also choose whether the data is compressed.
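As a rough sketch of the options described above, the snippet below assembles a single-table import into HDFS that writes compressed Avro files. The host, database, user, table, and directory names are hypothetical placeholders. The flags are standard Sqoop 1.x import arguments, but please verify them against the version you have installed.

```shell
# Hypothetical placeholders: replace with your own connection details.
JDBC_URL="jdbc:mysql://dbhost.example.com:3306/sales"
DB_USER="etl_user"
TABLE="customers"
TARGET_DIR="/user/etl/sales/customers"

# Assemble the command as a string so it can be reviewed before it is run.
#   -P                 prompts for the database password
#   --as-avrodatafile  writes Avro files instead of the default text files
#   --compress         enables compression (gzip unless a codec is specified)
#   --num-mappers      sets how many map tasks import the table in parallel
IMPORT_CMD="sqoop import \
  --connect $JDBC_URL \
  --username $DB_USER -P \
  --table $TABLE \
  --target-dir $TARGET_DIR \
  --as-avrodatafile \
  --compress \
  --num-mappers 4"

echo "$IMPORT_CMD"
# On a machine where Sqoop is installed, run it with: eval "$IMPORT_CMD"
```

Dropping `--as-avrodatafile` and `--compress` falls back to the default behaviour: uncompressed text files under the target directory.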

In Hive, the target database must be created before importing the tables from the RDBMS. If no target database is specified, the tables are imported into Hive’s ‘default’ database.

Sqoop Import creates the table if it does not already exist in the Hive metastore.
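The two steps for a Hive import can be sketched as follows: create the target database first, then import all tables into it. The database name `sales_dw` and the connection details are hypothetical. This sketch assumes the `hive` command-line client is available and that your Sqoop version supports `--hive-database`.

```shell
# Hypothetical placeholders: replace with your own values.
JDBC_URL="jdbc:mysql://dbhost.example.com:3306/sales"
DB_USER="etl_user"
HIVE_DB="sales_dw"

# Step 1: create the target Hive database before importing.
CREATE_DB_CMD="hive -e 'CREATE DATABASE IF NOT EXISTS $HIVE_DB'"

# Step 2: import every table of the source database into that Hive database.
# Without --hive-database, the tables would land in Hive's 'default' database.
IMPORT_ALL_CMD="sqoop import-all-tables \
  --connect $JDBC_URL \
  --username $DB_USER -P \
  --hive-import \
  --hive-database $HIVE_DB"

echo "$CREATE_DB_CMD"
echo "$IMPORT_ALL_CMD"
# On a machine with Hive and Sqoop installed, run each with: eval "$CREATE_DB_CMD"
```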

Sqoop Export

Exports files from the Hadoop Distributed File System (HDFS) to an RDBMS. Sqoop parses the contents of the files and converts them into rows of the target table, which must already exist in the database. Each line is split into columns based on the specified delimiter.
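A minimal export sketch, assuming comma-delimited text files in HDFS and an existing target table, might look like this. The table name `customers_export`, the HDFS path, and the connection details are hypothetical placeholders.

```shell
# Hypothetical placeholders: replace with your own values.
JDBC_URL="jdbc:mysql://dbhost.example.com:3306/sales"
DB_USER="etl_user"
TARGET_TABLE="customers_export"       # must already exist in the database
EXPORT_DIR="/user/etl/sales/customers_txt"

# --export-dir points at the HDFS files to read;
# --input-fields-terminated-by tells Sqoop how to split each line into columns.
EXPORT_CMD="sqoop export \
  --connect $JDBC_URL \
  --username $DB_USER -P \
  --table $TARGET_TABLE \
  --export-dir $EXPORT_DIR \
  --input-fields-terminated-by ','"

echo "$EXPORT_CMD"
# On a machine where Sqoop is installed, run it with: eval "$EXPORT_CMD"
```

The delimiter given here has to match the one used when the files were written; otherwise the lines will not split into the expected columns.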

Before we start working with Sqoop, please note the following:

Sqoop is a command-line tool, and its commands are issued from a regular terminal on a machine where the Hadoop client is configured. There is hardly any need for a GUI, since the scope of Sqoop is limited to importing and exporting data. Please also note that Sqoop is case sensitive: be very careful with the names of tables, columns, directories, sub-directories, and source or target locations, as well as the Sqoop reserved keywords.
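Because names are case sensitive, it can help to check the exact spelling of the source tables before importing anything. One way to do that, sketched here with hypothetical connection details, is Sqoop's `list-tables` tool:

```shell
# Hypothetical placeholders: replace with your own connection details.
JDBC_URL="jdbc:mysql://dbhost.example.com:3306/sales"
DB_USER="etl_user"

# Lists the tables in the source database exactly as the database reports them,
# so the names can be copied verbatim into later --table arguments.
LIST_CMD="sqoop list-tables --connect $JDBC_URL --username $DB_USER -P"

echo "$LIST_CMD"
# On a machine where Sqoop is installed, run it with: eval "$LIST_CMD"
```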