Apache Hive – New Columns in Output

In Apache Hive, in addition to choosing existing columns from a table, it is possible to generate new columns in the query output depending on the current columns, like in SQL. If the source table contains one or more columns that allow us to aggregate or manipulate the data in any manner, we can easily retrieve the aggregated or changed value into a new column. Those extra columns, on the other hand, are only visible in the output.

The following example will help you understand:

SELECT StudentID,
              SUM(Amount) PaidAmount,
              MIN(PaymentDate) FirstPaymentDate,
              MAX(PaymentDate) LastPaymentDate,
              COUNT(*) TotalPayments
FROM StudentPayments
GROUP BY StudentID;

The preceding query will return 5 columns, with “StudentID” being the only one that exists in the table. The remaining columns have been aggregated/manipulated and are logical.

Hope you find this article helpful.

Please subscribe for more interesting updates.

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s