Sqoop Import as Parquet File

We often want to import data in Parquet format to take advantage of its benefits: Parquet reduces storage space through highly efficient column-wise compression and configurable per-column encoding schemes, which work well for columns with diverse data types. Earlier versions of CDH did not support importing table data as Parquet; it has been possible since Sqoop 1.4.6 (CDH 5.5).

Here is an example that imports the result of a query into a Hive table as Parquet:

sqoop import \
--connect jdbc:mysql://localhost/dbTest \
--username root \
--password root \
--query 'SELECT EmpNo, EName, DeptNo FROM Emp WHERE $CONDITIONS' \
--split-by EmpNo \
--hive-import \
--hive-database default \
--hive-table Employee \
--target-dir empdir \
--as-parquetfile
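One detail worth noting in the command above: the query is wrapped in single quotes so the shell passes `$CONDITIONS` through literally, letting Sqoop substitute it with the split ranges for each mapper. A quick sketch of the difference (the query string is just the one from the example):

```shell
# Single quotes keep $CONDITIONS literal; inside double quotes the shell
# would expand it (usually to an empty string) unless escaped as \$CONDITIONS.
QUERY='SELECT EmpNo, EName, DeptNo FROM Emp WHERE $CONDITIONS'
echo "$QUERY"
# prints: SELECT EmpNo, EName, DeptNo FROM Emp WHERE $CONDITIONS
```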

If you want the table's data to be imported into HDFS only, without creating a Hive table, use the query below.

sqoop import \
--connect jdbc:mysql://localhost/dbTest \
--username root \
--password root \
--query 'SELECT EmpNo, EName, DeptNo FROM Emp WHERE $CONDITIONS' \
--split-by EmpNo \
--target-dir /home/cloudera/user/hive/warehouse/empdir \
--as-parquetfile
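To sanity-check the result, you can verify that the generated part files really are Parquet: every Parquet file begins (and ends) with the 4-byte magic header `PAR1`. A minimal sketch, assuming you first copy one part file locally with `hdfs dfs -get`; the `is_parquet` helper and the part-file name are illustrative, not part of Sqoop:

```shell
# Hypothetical helper: check a local file's first 4 bytes for the
# Parquet magic header "PAR1".
is_parquet() {
  [ "$(head -c 4 "$1")" = "PAR1" ]
}

# Example usage (the part-file name is illustrative):
# hdfs dfs -get /home/cloudera/user/hive/warehouse/empdir/part-m-00000.parquet sample.parquet
# is_parquet sample.parquet && echo "looks like Parquet"
```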

Hope you find this article helpful.

Please subscribe for more interesting updates.
