Apache Impala – Sub Queries

A subquery is a query that is included within another query statement. Subqueries, often known as sub-SELECTs or nested SELECTs, are a type of query. In this article, we’ll look at what we can do with Sub-queries in Apache Impala, as well as their constraints.


  • A scalar subquery generates a result set with a single row and a single column, which is often generated by an aggregation function like MAX() or SUM (). In scalar contexts, such as arguments to comparison operators, this single result value can be substituted.
  • Uncorrelated subqueries don’t refer to any of the query’s tables in the outer block. Each entry from the outer query block is evaluated using the same value or collection of values produced by the subquery.
  • Correlated subqueries are also allowed, in which one or more values from the outer query block are compared to values in the subquery’s WHERE clause. Each row evaluated by the outer WHERE clause can have a distinct set of values applied to it.
  • Subqueries can be in FROM clause or it can be in the WHERE clause.


Impala’s initial support for nested subqueries covers the most typical scenarios. There are still some limitations:

  • In Impala 2.1.0 and higher, you can utilize subqueries in a query including UNION or UNION ALL, but you can’t currently create a union of two subqueries (for example, in the argument of an IN or EXISTS operator).
    The operators ANY and ALL cannot be used with subqueries that return scalar values. (Impala doesn’t have a SOME operator at the moment, but if it had, the same restriction would apply.)
  • Any subquery comparing data from the outer query block to another table must use at least one equality comparison, not primarily other sorts of comparisons such as less than, larger than, BETWEEN, or!=, for the EXISTS and NOT EXISTS clauses.
    The BETWEEN operator currently does not accept a scalar subquery as the first or second argument.
  • Within an OR conjunction, a subquery cannot be utilized. OR conjunctions can be used in expressions within a subquery, such as the WHERE clause; the restriction only applies to parts of the query “above” the subquery.
  • In numeric contexts, scalar subqueries are only permitted. You can’t compare a scalar subquery to a value of a non-numeric type like TIMESTAMP or BOOLEAN with the LIKE, REGEXP, or RLIKE operators, or use it as an input to the LIKE, REGEXP, or RLIKE operators
  • To generate the comparison value, the values to be compared against, or the return value, you cannot utilize subqueries with the CASE function.
  • In the filter condition of the HAVING clause, a subquery is not permitted. (A subquery can’t be used outside of the WITH, FROM, and WHERE clauses, strictly speaking.)
  • You must use a fully qualified name (table_name.column_name or database_name.table_name.column_name) when referring to any column from the outer query block within a subquery.
  • The TABLESAMPLE clause of the SELECT statement does not apply to a table reference derived from a view, a subquery, or anything other than a real base table. This clause only works for tables backed by HDFS or HDFS-like data files, therefore it does not apply to Kudu or HBase tables.

Hope you find this article helpful.

Please subscribe for more interesting updates.

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s