Optimized Row Columnar (ORC) File Format

The Optimized Row Columnar (ORC) file format is one of the most effective Hive file formats for improving query performance and saving storage. It compresses data efficiently, which results in smaller disk reads, and its columnar layout is well suited to vectorization optimizations in Tez.

As specified in the documentation, the ORC file format for data storage is recommended for the following reasons:

• Efficient compression:

Data is stored as columns and compressed, which leads to smaller disk reads. The columnar format is also ideal for vectorization optimizations in Tez.

• Fast reads:

ORC files carry a built-in index with min/max values and other aggregates, which allows entire stripes to be skipped during reads. In addition, predicate pushdown pushes filters into the read so that as few rows as possible are read, and Bloom filters further reduce the number of rows returned.

• Proven in large-scale deployments:

Facebook uses the ORC file format for a 300+ PB deployment.
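
The read-side features in the list above are typically governed by Hive session settings. The snippet below is a minimal sketch, assuming a Hive session running on Tez; the property names are standard Hive configuration keys, but their defaults vary by Hive version and distribution, so treat the values as illustrative rather than required.

```sql
-- Push filter predicates down toward the table scan
SET hive.optimize.ppd=true;

-- Let the ORC reader use its built-in indexes (min/max, Bloom filters)
-- to skip data that cannot match the filter
SET hive.optimize.index.filter=true;

-- Process rows in batches instead of one at a time (vectorized execution)
SET hive.vectorized.execution.enabled=true;
```

With these enabled, a query that filters on an indexed column can avoid reading stripes whose min/max range excludes the filter value.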

The following example specifies the ORC file format when creating a table:

CREATE TABLE Customers (
  CustomerID INT,
  CustomerName STRING,
  Street STRING,
  City STRING,
  State STRING,
  Zip INT
) STORED AS ORC;
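
Compression and Bloom filters can also be set per table through ORC table properties. The sketch below is an illustration only: the table name Customers_Tuned, the choice of CustomerID as the Bloom filter column, and the literal 1001 in the query are assumptions made for this example, not recommendations from the documentation.

```sql
-- Hypothetical variant of the Customers table with explicit ORC properties
CREATE TABLE Customers_Tuned (
  CustomerID INT,
  CustomerName STRING,
  Street STRING,
  City STRING,
  State STRING,
  Zip INT
) STORED AS ORC
TBLPROPERTIES (
  "orc.compress" = "ZLIB",                    -- compression codec: NONE, ZLIB, or SNAPPY
  "orc.create.index" = "true",                -- write the built-in min/max index
  "orc.bloom.filter.columns" = "CustomerID"   -- build Bloom filters for this column
);

-- A point lookup like this one is the kind of query that can benefit
-- from the index and Bloom filter when predicate pushdown is enabled
SELECT CustomerName, City
FROM Customers_Tuned
WHERE CustomerID = 1001;
```

Bloom filters help most on columns used in equality filters, so a lookup key such as a customer ID is a reasonable candidate.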

Hope you like this post.

Stay in touch for more interesting updates.
