Apache Impala – Advantages & Limitations

Impala has a number of features that make it simple and trustworthy to use. Let’s have a look at a few of them.

Advantages/Benefits:

Data scientists and analysts are already familiar with its SQL interface.
It can query large volumes of data (“big data”) stored in Apache Hadoop.
Queries are distributed across a cluster, which allows easy scaling and the use of low-cost commodity hardware.
Data files can be shared between components without copying or export/import steps; for example, writing with Pig, transforming with Hive, and querying with Impala.
Impala can read from and write to Hive tables, so data produced in Hive can be analyzed in Impala without a separate transfer step.
Because a single system handles both big data processing and analytics, customers can avoid costly data modelling and ETL performed solely for analytics.
It is significantly faster for SQL queries and data processing than batch-oriented engines such as Hive on MapReduce.
It can use HDFS and HBase as its storage systems.
Impala processes query data in memory wherever possible, which keeps query execution fast.
It supports Kerberos authentication for secure access.
It offers several interfaces, such as JDBC and ODBC drivers, for client connections. Data visualization tools such as Tableau can connect to it easily.
Impala comes with a number of built-in functions that we may use to get the results we want.
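
To make the SQL interface and client connectivity points above more concrete, here is a minimal Python sketch of running a query through a DB-API 2.0 connection and reading the result. It is an illustration under stated assumptions, not a definitive setup: the table name page_views and the helper run_query are hypothetical, and sqlite3 is used only as a local stand-in so the code runs without a cluster. Against a real cluster, the connection would typically come from a DB-API client for Impala such as impyla instead.

```python
import sqlite3


def run_query(conn, sql):
    """Run sql on any DB-API 2.0 connection and return the rows as dicts."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        # cursor.description holds one entry per column; the name comes first.
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


# Local stand-in (assumption): an in-memory SQLite database replaces the
# Impala connection so that the sketch is runnable on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("in", 120), ("us", 300), ("in", 80)],
)

# Plain SQL with two common built-in functions, UPPER and SUM.
rows = run_query(
    conn,
    "SELECT UPPER(country) AS country_code, SUM(views) AS total_views "
    "FROM page_views GROUP BY country ORDER BY total_views DESC",
)
print(rows)
```

The helper only relies on the standard cursor methods, so the same function should work unchanged once conn is a connection to Impala.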

This blog already has many articles on Apache Impala; in this post, I gather its limitations in one place. I believe most of these limitations will be addressed in future releases.

Limitations:

No support for indexes. Impala does not use Hive’s indexing capabilities because they are limited. Because Impala is not a monolithic DBMS, it is often unaware of changes to the data in HDFS files, so an index could not be kept in sync with the base data.
No concurrency control mechanism is supported. When concurrent inserts target the same database, the transactional Hive Metastore (HMS), which is updated on every insert, throws an exception.
Metadata must be invalidated or refreshed manually. When tables or data files are changed outside Impala, for example through Hive, you need to run INVALIDATE METADATA or REFRESH before Impala sees the change.
No support for triggers.
Logging is not supported.
No support for custom SerDes (serializers/deserializers), so Impala cannot read custom binary file formats.
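
The metadata limitation in the list above is the one you will probably meet most often, so here is a small Python sketch of choosing between the two statements. It is a simplified rule of thumb, not a complete policy: the ExternalChange categories, the helper metadata_statement, and the table name sales.orders are hypothetical names made up for this example. REFRESH and INVALIDATE METADATA themselves are real Impala statements.

```python
from enum import Enum


class ExternalChange(Enum):
    """Kinds of change made outside Impala (hypothetical categories)."""
    NEW_DATA_FILES = "new_data_files"    # files added to a table Impala already knows
    TABLE_CHANGED = "table_changed"      # table created or altered through Hive


def metadata_statement(table: str, change: ExternalChange) -> str:
    """Return the Impala statement that makes an external change visible."""
    if change is ExternalChange.NEW_DATA_FILES:
        # REFRESH reloads the file metadata of an existing table.
        return f"REFRESH {table}"
    # INVALIDATE METADATA marks the cached metadata as stale, so it is
    # reloaded the next time the table is used. It is the heavier option.
    return f"INVALIDATE METADATA {table}"


print(metadata_statement("sales.orders", ExternalChange.NEW_DATA_FILES))
print(metadata_statement("sales.orders", ExternalChange.TABLE_CHANGED))
```

The generated statement would then be sent to Impala like any other query, for example from impala-shell or a client connection.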

Hope you find this article helpful.

Please feel free to leave a message and subscribe for more interesting updates.