Apache Impala – Limitations

Apache Impala is an open source, massively parallel SQL query engine for data stored in a Hadoop-based cluster. Impala brings scalable parallel database technology to Hadoop, allowing users to run low-latency SQL queries on data stored in HDFS and Apache HBase without having to relocate or transform the data.

This blog already has many articles on Apache Impala. In this post, I gather its known limitations in one place. I believe these limitations will most likely be addressed in future releases.

  • No support for indexes. Impala does not use Hive's indexing capabilities because they are limited. Because Impala is not a monolithic DBMS, it is often unaware of new data that appears in HDFS files, so an index would not be able to keep up with the base data.
  • No concurrency control mechanism. When concurrent inserts target the same database, the Hive Metastore (HMS), which is transactional and is updated on every insert, raises an exception.
  • Metadata must be invalidated or refreshed manually. When tables or data files change outside Impala (for example, through Hive or direct writes to HDFS), you have to run INVALIDATE METADATA or REFRESH before Impala sees the change.
  • No support for triggers.
  • Logging is not supported.
  • No support for custom SerDes (serializers/deserializers).
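
To make the metadata point above more concrete, here is a minimal sketch in Python that only builds the statement you would send to Impala after an external change. The function name and the table names are hypothetical, chosen for illustration; REFRESH and INVALIDATE METADATA are the actual Impala statements. The sketch assumes the usual rule of thumb: REFRESH when an existing table received new data files, INVALIDATE METADATA when the table itself was created outside Impala.

```python
def metadata_sync_statement(table: str, new_table: bool) -> str:
    """Return the Impala statement that makes an external change visible.

    Hypothetical helper for illustration only; it builds a string and
    does not connect to a cluster.
    """
    if new_table:
        # Table was created outside Impala (e.g. in Hive):
        # its metadata has to be loaded from scratch.
        return f"INVALIDATE METADATA {table}"
    # Existing table received new data files: reload the file listing.
    return f"REFRESH {table}"


# Hypothetical table names.
print(metadata_sync_statement("sales.orders", new_table=False))
print(metadata_sync_statement("sales.orders_2024", new_table=True))
```

You would then run the returned statement through impala-shell or whichever client you already use.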

Please feel free to leave a comment and subscribe for more interesting updates.
