In Apache Hive, in addition to choosing existing columns from a table, it is possible to generate new columns in the query output depending on the current columns, like in SQL. If the source table contains one or more columns that allow us to aggregate or manipulate the data in any manner, we can easily retrieve the aggregated or changed value into a new column. Those extra columns, on the other hand, are only visible in the output.
The following example will help you understand:
SELECT StudentID,
SUM(Amount) PaidAmount,
MIN(PaymentDate) FirstPaymentDate,
MAX(PaymentDate) LastPaymentDate,
COUNT(*) TotalPayments
FROM StudentPayments
GROUP BY StudentID;
The preceding query will return 5 columns, with “StudentID” being the only one that exists in the table. The remaining columns have been aggregated/manipulated and are logical.
Hope you find this article helpful.
Please subscribe for more interesting updates.
One comment