For detailed information about the ORC file format, click here.
ORC generally provides the best overall performance in Hive. In addition to specifying the storage format, you can also specify a compression algorithm such as ZLIB or SNAPPY.
Two examples follow:
CREATE TABLE Customers (
CustomerID INT,
CustomerName STRING,
Street STRING,
City STRING,
State STRING,
Zip INT
) STORED AS ORC TBLPROPERTIES ("orc.compress"="ZLIB");
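ZLIB typically produces smaller files on disk than SNAPPY, but it is a bit slower to write. Reads can be quicker with ZLIB when a query is limited by disk or network I/O, because less data has to be scanned, although SNAPPY itself decompresses faster.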
CREATE TABLE Customers (
CustomerID INT,
CustomerName STRING,
Street STRING,
City STRING,
State STRING,
Zip INT
) STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY");
Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression.
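The compression setting only takes effect when data is written to the table, so it can help to load some rows and then check the table properties. The following is a minimal sketch, not a definitive recipe: the staging table Customers_Staging and its comma-delimited layout are assumptions made up for illustration, and it assumes one of the Customers tables above already exists.

```sql
-- Hypothetical staging table holding raw comma-delimited text (an assumption for this example).
CREATE TABLE Customers_Staging (
  CustomerID INT,
  CustomerName STRING,
  Street STRING,
  City STRING,
  State STRING,
  Zip INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Copy the rows into the ORC table; they are compressed with the table's orc.compress codec as they are written.
INSERT INTO TABLE Customers
SELECT CustomerID, CustomerName, Street, City, State, Zip
FROM Customers_Staging;

-- Check which compression codec the table is configured with.
SHOW TBLPROPERTIES Customers;
```

DESCRIBE FORMATTED Customers; shows the same table parameters along with the storage format, if you prefer a single overview.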