Big Data: Apache Hive & Impala Data Types Quick Reference

This article gives an overview of the data types available in both Apache Hive and Impala.



TINYINT – 1 byte
Range: -128 to 127
 
SMALLINT – 2 bytes 
Range: -32,768 to 32,767
 
INT – 4 bytes
Range: -2,147,483,648 to 2,147,483,647
 
BIGINT – 8 bytes
Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
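
The integer ranges above follow directly from the byte widths, assuming signed two's-complement storage (an assumption here, though it matches every range listed). A minimal Python sketch; signed_range and INTEGER_WIDTHS are hypothetical names used only for illustration:

```python
from typing import Tuple


def signed_range(num_bytes: int) -> Tuple[int, int]:
    """Return (min, max) for a signed two's-complement integer of num_bytes bytes."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Byte widths as listed in this reference.
INTEGER_WIDTHS = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}

for name, width in INTEGER_WIDTHS.items():
    low, high = signed_range(width)
    print(f"{name}: {low:,} to {high:,}")
```

A check like this can be handy when picking the smallest type that still holds every value in a column.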
 
FLOAT – 4 bytes
Single-precision floating-point number

DOUBLE – 8 bytes
Double-precision floating-point number
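
To see what the difference between 4 and 8 bytes means in practice, the sketch below round-trips a value through both widths using Python's struct module. It assumes FLOAT and DOUBLE behave like IEEE 754 single and double precision, which the descriptions above suggest; the helper names are hypothetical.

```python
import struct


def round_trip_float32(value: float) -> float:
    """Store value in 4 bytes (IEEE 754 single precision) and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def round_trip_float64(value: float) -> float:
    """Store value in 8 bytes (IEEE 754 double precision) and read it back."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


x = 0.1
print(round_trip_float32(x) == x)  # False: precision is lost in 4 bytes
print(round_trip_float64(x) == x)  # True: Python floats are already 8-byte doubles
print(round_trip_float32(0.5) == 0.5)  # True: 0.5 is exactly representable
```

Neither type is exact for most decimal fractions, which is why DECIMAL is usually preferred for monetary values.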

DECIMAL
Hive 0.13.0 introduced user-definable precision and scale, written as DECIMAL(precision, scale).
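
Precision is the total number of digits and scale is the number of digits after the decimal point. The sketch below checks whether a value would fit a given precision and scale without rounding or overflow. It is only an illustration in Python: fits_decimal is a hypothetical helper, and it does not reproduce how either engine actually rounds or rejects values.

```python
from decimal import Decimal


def fits_decimal(value: str, precision: int, scale: int) -> bool:
    """Return True if value fits DECIMAL(precision, scale) exactly."""
    d = Decimal(value)
    if not d.is_finite():
        return False
    if d == 0:
        return True
    # normalize() strips trailing zeros so "1.50" counts as one fractional digit.
    _, digits, exponent = d.normalize().as_tuple()
    fractional_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    return fractional_digits <= scale and integer_digits <= precision - scale


print(fits_decimal("123.45", 5, 2))  # True: 3 integer digits, 2 fractional
print(fits_decimal("1234.5", 5, 2))  # False: 4 integer digits exceed 5 - 2
print(fits_decimal("1.234", 5, 2))   # False: 3 fractional digits exceed scale 2
```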

STRING
In Impala, the hard limit on the size of a STRING and on the total size of a row is 2 GB.
When writing to Parquet files, the limit on a STRING is 1 GB.
 
TIMESTAMP

Timestamps were introduced in Hive 0.8.0. The type supports the traditional UNIX timestamp with optional nanosecond precision.

The supported timestamp format is yyyy-mm-dd hh:mm:ss[.f…].
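
As an illustration of the format, the sketch below parses such a string in Python. Python's datetime only holds microseconds, so the fractional part is returned separately as nanoseconds. The parse_timestamp helper is hypothetical, and it assumes at most nine fractional digits, in line with the nanosecond precision mentioned above.

```python
from datetime import datetime
from typing import Tuple


def parse_timestamp(text: str) -> Tuple[datetime, int]:
    """Parse 'yyyy-mm-dd hh:mm:ss[.f...]' into (whole-second datetime, nanoseconds)."""
    base, _, fraction = text.partition(".")
    moment = datetime.strptime(base, "%Y-%m-%d %H:%M:%S")
    if fraction and (len(fraction) > 9 or not fraction.isdecimal()):
        raise ValueError(f"invalid fractional seconds: {fraction!r}")
    # Right-pad so that ".5" means 500,000,000 ns rather than 5 ns.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return moment, nanos


print(parse_timestamp("2015-06-01 12:30:45.123456789"))
print(parse_timestamp("2015-06-01 12:30:45"))
```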

Complex types:
Complex types (also referred to as nested types) in Hive let you represent multiple data values within a single row/column position. Impala supports the complex types ARRAY, MAP, and STRUCT in Impala 2.3 and higher. 
 
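Arrays: ARRAY<data_type>
     Ordered collection of elements of the same type
Maps: MAP<primitive_type, data_type>
     Collection of key-value pairs
Structs: STRUCT<col_name : data_type, ...>
     Collection of named fields that may have different types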
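
For readers more used to general-purpose languages, the three complex types map loosely onto familiar structures. The Python sketch below is only an analogy, not Hive or Impala syntax, and every name in it (Address, CustomerRow, the field names) is made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Address:
    """STRUCT-like: a fixed set of named fields with different types."""
    street: str
    zip_code: int


@dataclass
class CustomerRow:
    name: str
    phone_numbers: List[str]     # ARRAY-like: many values of one type
    preferences: Dict[str, str]  # MAP-like: key-value pairs
    address: Address             # STRUCT-like: nested record


row = CustomerRow(
    name="Ann",
    phone_numbers=["555-0100", "555-0101"],
    preferences={"language": "en"},
    address=Address(street="1 Main St", zip_code=12345),
)

print(row.phone_numbers[0])         # element access by position
print(row.preferences["language"])  # value lookup by key
print(row.address.zip_code)         # field access by name
```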
 
