Apache Impala is an open-source, massively parallel SQL query engine for data stored in a Hadoop-based cluster. It brings scalable parallel database technology to Hadoop, allowing users to run low-latency SQL queries on data stored in HDFS and Apache HBase without having to move or transform the data.
This blog already has many articles on Apache Impala; in this post, I gather its known limitations in one place. I believe most of them will be addressed in future releases.
- No support for indexes. Impala does not use Hive's indexing capabilities because they are limited. Because Impala is not a monolithic DBMS, it is often unaware of new data that appears in HDFS files, so an index could not keep up with the base data.
- No concurrency control. Impala does not support any concurrency control mechanism. When concurrent inserts target the same database, the transactional Hive Metastore (HMS), which is updated on each insert, raises an exception.
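- Metadata must be invalidated or refreshed manually. When tables or data files change outside Impala, for example through Hive or by adding files directly to HDFS, you need to run INVALIDATE METADATA or REFRESH before Impala sees the change.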
- No support for triggers.
- Logging is not supported.
- No support for custom SerDes (Hive serializer/deserializer classes).
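To make the metadata limitation above more concrete, here is a minimal sketch in Python that picks which statement to issue after a change made outside Impala. The helper name `metadata_statement`, the change labels `"new_data_files"` and `"new_table"`, and the `sales.orders` table are all hypothetical names chosen for illustration; only the `REFRESH` and `INVALIDATE METADATA` statements themselves come from Impala. The identifier check is deliberately conservative and is an assumption of this sketch, not Impala's full naming rule.

```python
import re

# Conservative identifier check (an assumption of this sketch, stricter than Impala itself).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def metadata_statement(database, table, change):
    """Return the Impala SQL to run after a change made outside Impala.

    change (hypothetical labels used only in this sketch):
      "new_data_files" - files were added to an existing table's directory
      "new_table"      - the table was created outside Impala, e.g. in Hive
    """
    for name in (database, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    target = f"{database}.{table}"
    if change == "new_data_files":
        # REFRESH reloads file and block metadata for a table Impala already knows.
        return f"REFRESH {target}"
    if change == "new_table":
        # INVALIDATE METADATA discards cached metadata so the table is reloaded.
        return f"INVALIDATE METADATA {target}"
    raise ValueError(f"unknown change type: {change!r}")


if __name__ == "__main__":
    print(metadata_statement("sales", "orders", "new_data_files"))
    print(metadata_statement("sales", "orders", "new_table"))
```

The returned string can then be run through impala-shell or any other Impala client. REFRESH is the lighter of the two operations, so it is usually the better choice when only data files have changed.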
Please feel free to leave a comment and subscribe for more interesting updates.