Joins in HiveQL

As we discussed earlier, HiveQL handles structured data only, much like SQL. This doesn’t mean that Hive just manages structured data, it also processes and transforms the unstructured data into a readable structured way. 

Unstructured data is usually not dependent on static data or other data files. But in structured data, especially if the data is imported from relational systems, due to normalization, the tables may be connected with other tables to obtain meaningful information for a few columns. Hence functionalities of SQL will also be needed in big-data platforms such as Joins, Sub-queries, casting, and conversion functions.

This article focuses on the joins that are available in Hive.

Below is the type of joins available in Hive.

  • INNER JOIN
  • LEFT OUTER JOIN
  • RIGHT OUTER JOIN
  • FULL OUTER JOIN

I am not specifying “self-join” explicitly because it is also an inner join. 

INNER JOIN is the join type that combines two tables to return records that have matching values from both the tables.

LEFT OUTER JOIN returns all records from the left table and the matched records from the right table.

RIGHT OUTER JOIN is a join that returns all records from the right table, and the matched records from the left table.

FULL OUTER JOIN displays all the records i.e. matched records and unmatched records from both the tables. 

Not just the description but the implementation is also as same as in SQL.

Please do let me know if you want to see how it is to be implemented in Hive.

 

3 comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s