Apache Hive is a distributed, fault-tolerant, open source data warehouse platform built on top of Apache Hadoop. It is used for reading, writing, and managing massive datasets stored in HDFS or in other data stores such as Apache HBase, and it can query those datasets using Apache Tez or MapReduce.
Hive was designed to let non-programmers who already know SQL work with petabytes of data through a SQL-like language called HiveQL.
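As a sketch of what HiveQL looks like (the table and column names here are hypothetical), a table is defined and queried with familiar SQL syntax:

```sql
-- Hypothetical example: define a table over tab-delimited text files.
CREATE TABLE IF NOT EXISTS page_views (
  user_id   STRING,
  url       STRING,
  view_time TIMESTAMP
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Aggregate with ordinary SQL; Hive compiles this into a distributed job.
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```

Nothing in the query itself mentions MapReduce or Tez; the distribution of the work is handled by Hive.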
How it works:
Hive translates HiveQL queries into MapReduce or Tez jobs that run on Yet Another Resource Negotiator (YARN), Apache Hadoop's distributed job-scheduling framework. It queries data stored in a distributed storage system such as the Hadoop Distributed File System (HDFS) or Amazon Simple Storage Service (Amazon S3).
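The execution engine is a per-session setting, and external tables let Hive query data already sitting in HDFS or S3. A minimal sketch (the table name and path are hypothetical, and Tez must be installed on the cluster):

```sql
-- Choose the engine Hive compiles queries to: 'mr' (MapReduce) or 'tez'.
SET hive.execution.engine=tez;

-- An EXTERNAL table points Hive at existing files without copying them;
-- dropping the table later leaves the underlying data in place.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_logs (
  line STRING
)
LOCATION 'hdfs:///data/raw/logs';
```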
Hive stores its database and table metadata in the metastore, a database- or file-backed store that enables straightforward data abstraction and discovery.
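Metadata commands illustrate what the metastore holds: they are answered from the metastore itself rather than by scanning data files. A short sketch (the table name is hypothetical):

```sql
-- List the databases and tables registered in the metastore.
SHOW DATABASES;
SHOW TABLES;

-- Show a table's columns, storage location, file format, and SerDe,
-- all of which live in the metastore.
DESCRIBE FORMATTED page_views;
```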
Traditional RDBMS vs Apache Hive:
Traditional relational databases are designed for interactive queries on small to medium datasets and struggle to process truly massive ones. Hive, by contrast, uses batch processing to operate efficiently across a large distributed cluster.
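One reason Hive scales where a single-node RDBMS cannot is that tables can be partitioned, so a batch job scans only the slice of the dataset it needs. A minimal sketch (table, column, and partition names are hypothetical):

```sql
-- Partitioning splits the data into directories by key, so queries that
-- filter on the key read only the matching files.
CREATE TABLE IF NOT EXISTS events (
  user_id STRING,
  action  STRING
)
PARTITIONED BY (event_date STRING)
STORED AS ORC;

-- The partition filter lets Hive prune everything but one day's data.
SELECT action, COUNT(*) AS cnt
FROM events
WHERE event_date = '2024-01-01'
GROUP BY action;
```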
Apache Hive enables massive-scale analytics. It provides a central source of data that can be accessed quickly, supporting data-driven decisions.