Compression Algorithms For ORC File Format

For detailed information about the ORC file format, click here.

ORC provides the best Hive performance overall. In addition to specifying the storage format, you can also specify a compression algorithm such as Zlib or Snappy.

Below are the examples:

CREATE TABLE Customers (
CustomerID INT,
CustomerName STRING,
Street STRING,
City STRING,
State STRING,
Zip INT
) STORED AS ORC TBLPROPERTIES ("orc.compress"="ZLIB");

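Zlib produces smaller files on disk than Snappy, but it is a bit slower to write. Because less data has to be read from disk, Zlib can also be quicker to read when the workload is I/O-bound, although Snappy decompresses faster per byte.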

CREATE TABLE Customers (
CustomerID INT,
CustomerName STRING,
Street STRING,
City STRING,
State STRING,
Zip INT
) STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY");

Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression.
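One way to see the size trade-off on your own data is to write the same rows with each codec and compare the results. The sketch below is only an illustration: the table names customers_zlib and customers_snappy are hypothetical, and it assumes a Customers table already exists and contains data.

```sql
-- Hypothetical table names; assumes Customers already exists and holds data.
-- Copy the same rows into one ORC table per codec.
CREATE TABLE customers_zlib
STORED AS ORC TBLPROPERTIES ("orc.compress"="ZLIB")
AS SELECT * FROM Customers;

CREATE TABLE customers_snappy
STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY")
AS SELECT * FROM Customers;

-- Inspect each copy; when table statistics have been gathered,
-- the Table Parameters section includes a totalSize value to compare.
DESCRIBE FORMATTED customers_zlib;
DESCRIBE FORMATTED customers_snappy;
```

The difference is usually only meaningful on a reasonably large table; with a handful of rows, file overhead dominates and the two sizes will look almost the same.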
