Apache Hive is built on top of Apache Hadoop, which is a distributed, fault-tolerant, and open source data warehouse platform for reading, writing, and handling massive datasets stored directly in HDFS or other data management structures such as Apache HBase. Hive is characterized by the ability to query massive datasets using Apache Tez or MapReduce.
This page combines all Hive-related subjects into a single article to make it more helpful and beneficial to learners. It is presented in a hierarchical manner, beginning with the basics and progressing to more advanced topics.
APACHE HIVE
HIVE DDL
- Complete DDL Commands in Hive
- Usage of ALTER in Hadoop Hive
- Creating Databases
- Showing the database name in Hive Prompt
- Altering Databases
- Dropping Databases
- Error while dropping the database in Hive
- Database Properties
- Database Comments
IMPORTING DATA
- Loading Data From RDBMS Into Hive Using Sqoop
- Loading Data From HDFS into Hive
- Loading Data From Local File System to Hive
TABLE
- DDL Statements in Hive
- DML Statements in Hive
- Internal Tables
- Internal Tables Using CTAS
- External Tables
- Change Location of Managed Table in Hive
- Internal Tables vs External Tables
- Internal Tables with External Data
- Table Properties
- Extended vs Formatted Table Properties
- Table Comments
- Table Comments Using Alter
- Column Comments
DATA TYPES
HIVE QUERIES – OPERATORS – CLAUSES
SYSTEM FUNCTIONS
- Collection Functions
- Type Conversion Functions
- Date Functions
- FROM_UNIXTIME(bigint unixtime[, string format])
- UNIX_TIMESTAMP()
- TO_DATE(string timestamp)
- YEAR(string date)
- QUARTER(date/timestamp/string)
- MONTH(string date)
- DAY(string date)
- DAYOFMONTH(date)
- HOUR(string date)
- MINUTE(string date)
- SECOND(string date)
- WEEKOFYEAR(string date)
- EXTRACT()
- DATEDIFF(string enddate, string startdate)
- DATE_ADD(date/timestamp/string startdate, tinyint/smallint/int days)
- DATE_SUB(date/timestamp/string startdate, tinyint/smallint/int days)
- CURRENT_DATE
- CURRENT_TIMESTAMP
- ADD_MONTHS(string start_date, int num_months, output_date_format)
- LAST_DAY(string date)
- NEXT_DAY(string start_date, string day_of_week)
- TRUNC(string date, string format)
- MONTHS_BETWEEN(date1, date2)
- DATE_FORMAT(date/timestamp/string ts, string fmt)
- Conditional Functions
- String Functions
- CONCAT(string|binary A, string|binary B…)
- CONCAT_WS(string SEP, string A, string B…)
- ELT(N int,str1 string,str2 string,str3 string,…)
- FORMAT_NUMBER(number x, int d)
- GET_JSON_OBJECT(string json_string, string path)
- INSTR(string str, string substr)
- LENGTH(string A)
- LOCATE(string substr, string str[, int pos])
- LOWER(string A) lcase(string A)
- LPAD(string str, int len, string pad)
- LTRIM(string A)
- PARSE_URL(string urlString, string partToExtract [, string keyToExtract])
- PRINTF(String format, Obj… args)
- QUOTE(String text)
- REGEXP_EXTRACT(string subject, string pattern, int index)
- REPEAT(string str, int n)
- REPLACE(string A, string OLD, string NEW)
- REVERSE(string A)
- RPAD(string str, int len, string pad)
- RTRIM(string A)
- SPACE(int n)
- SPLIT(string str, string pat)
- STR_TO_MAP(text[, delimiter1, delimiter2])
- SUBSTR
- SUBSTRING_INDEX(string A, string delim, int count)
- TRANSLATE(string|char|varchar input, string|char|varchar from, string|char|varchar to)
- TRIM(string A)
- UPPER(string A) ucase(string A)
- INITCAP(string A)
- Misc. Functions
- Aggregate Functions
- COUNT(*), COUNT(expr), COUNT(DISTINCT expr[, expr…])
- SUM(col), SUM(DISTINCT col)
- AVG(col), AVG(DISTINCT col)
- MIN(col)
- MAX(col)
- VARIANCE(col)
- VAR_POP(col)
- VAR_SAMP(col)
- STDDEV_POP(col)
- STDDEV_SAMP(col)
- COVAR_POP(col1, col2)
- COVAR_SAMP(col1, col2)
- CORR(col1, col2)
- PERCENTILE
- PERCENTILE_APPROX
- COLLECT_SET(col)
- COLLECT_LIST(col)
- NTILE(INTEGER x)
- Windowing Functions
- Built-in Table Generated Functions
- EXPLODE
- EXPLODE LATERAL VIEW
- INLINE
- INLINE LATERAL VIEW
- Aggregate Functions in Analytic Context
VIRTUAL COLUMNS
FILE FORMATS & COMPRESSION
- File Formats & Compression
- Text File Format
- Parquet
- Sequence File Format
- RC File Format
- ORC File Format
- Avro
SUB-QUERIES
JOINS
VIEWS
- Views
- Creating Table & View With SELECT statement
- Can we create a table based on a view?
- Altering Views
PARTITIONS
- Partitions – Introduction – Types
- SQL Server Partitions vs Hive Partitions
- Static Partition
- Dynamic Partition
- Adding Partition
- Renaming Partition
- Dropping Partition
- Overwriting Partitioned Data
- Truncate data from a partition
SerDe
- Serialization & Deserialization – Part-1
- Serialization & Deserialization – Part-2
- Serialization & Deserialization – Part-3
- Serialization & Deserialization – Part-4
- Serialization & Deserialization – Part-5
- Serialization & Deserialization – Part-6
- Serialization & Deserialization – Part-7
- Serialization & Deserialization – Part-8
- SerDe – RegExp_Extract – Part-9
- Serialization & Deserialization – Part-10
- SerDe – Common Error
COMPLEX DATA TYPES
- Introduction
- Array
- Map
- Struct
- Array, Map and Struct
- Struct in Array
- Struct in Array-2
- Array in Struct
- Array in Map
- Array in Struct-2
- Struct in Map
BUCKETING
SAMPLING
EXPORT DATA
NON-SQL STATEMENTS
METADATA & PROPERTIES
- Versions
- Reserved Keywords As Columns
- Get Hive Username & Password
- Get Connected User
- Connecting Hive with BEELINE
- Hive Metadata Queries
- Extended vs Formatted Table Properties – Hive
- Get The Tables List In an external File – Hive
- Generate Create Table Statement In Hive
- Search Databases and Tables in Apache Hive
- Terminate Query Execution in Apache Hive & Impala
- Apache Hive Table Create Date
- Starting Hosts and Roles in Cloudera VM
- Listing All The Functions In Apache Hive & Impala
- CDH Compatible Versions – Hive, Impala and HUE
- Hive – Extended Properties
- Hive Table Properties
- Clearing Screen – HDFS Terminal, Hive, Impala
- Moving a table from one Hive database to another
- Displaying the column names in Hive
- Showing the database name in Hive Prompt
- Locking in Hive
- Analyzing Table
BATCH PROCESSING
- Executing HiveQL Script from CLI in different modes
- Executing Hive Query From Terminal
- Working with variables in Apache Hive
- Executing SQL Script file in Hive
PERFORMANCE TUNING
IMPALA
- Batch Processing in Impala
- Display Built-in Functions in Impala
- Refreshing Metadata in Impala
- Impala – Reserved Keywords as Columns
CCA 159
ASSIGNMENTS
- Apache Hive – Quiz-1
- Apache Hive – Quiz-2
- Assignment-1
- Assignment-2
- Assignment-3
- Assignment-4
- Assignment-5
Hope you find this page helpful.
Please subscribe for more interesting updates.
2 comments