Analyzer
Known incompatibilities
In ClickHouse version 24.3
, the new query analyzer was enabled by default.
Despite fixing a large number of bugs and introducing new optimizations, it also introduces some breaking changes in ClickHouse behaviour. Please read the following changes to determine how to rewrite your queries for the new analyzer.
Invalid queries are no longer optimized
The previous query planning infrastructure applied AST-level optimizations before the query validation step. Optimizations could rewrite the initial query so it becomes valid and can be executed.
In the new analyzer, query validation takes place before the optimization step. This means that invalid queries that were possible to execute before are now unsupported. In such cases, the query must be fixed manually.
Example 1:
The following query uses column number
in the projection list when only toString(number)
is available after the aggregation.
In the old analyzer, GROUP BY toString(number)
was optimized into GROUP BY number,
making the query valid.
Example 2:
The same problem occurs in this query: column number
is used after aggregation with another key.
The previous query analyzer fixed this query by moving the number > 5
filter from the HAVING
clause to the WHERE
clause.
To fix the query, you should move all conditions that apply to non-aggregated columns to the WHERE
section to conform to standard SQL syntax:
CREATE VIEW with invalid query
The new analyzer always performs type-checking.
Previously, it was possible to create a VIEW
with an invalid SELECT
query. It would then fail during the first SELECT
or INSERT
(in the case of MATERIALIZED VIEW
).
Now, it's not possible to create such VIEW
s anymore.
Example:
Known incompatibilities of the JOIN
clause
Join using column from projection
Alias from the SELECT
list can not be used as a JOIN USING
key by default.
A new setting, analyzer_compatibility_join_using_top_level_identifier
, when enabled, alters the behavior of JOIN USING
to prefer to resolve identifiers based on expressions from the projection list of the SELECT
query, rather than using the columns from left table directly.
Example:
With analyzer_compatibility_join_using_top_level_identifier
set to true
, the join condition is interpreted as t1.a + 1 = t2.b
, matching the behavior of earlier versions. So, the result will be 2, 'two'
.
When the setting is false
, the join condition defaults to t1.b = t2.b
, and the query will return 2, 'one'
.
If b
is not present in t1
, the query will fail with an error.
Changes in behavior with JOIN USING
and ALIAS
/MATERIALIZED
columns
In the new analyzer, using *
in a JOIN USING
query that involves ALIAS
or MATERIALIZED
columns will include those columns in the result set by default.
Example:
In the new analyzer, the result of this query will include the payload
column along with id
from both tables. In contrast, the previous analyzer would only include these ALIAS
columns if specific settings (asterisk_include_alias_columns
or asterisk_include_materialized_columns
) were enabled, and the columns might appear in a different order.
To ensure consistent and expected results, especially when migrating old queries to the new analyzer, it is advisable to specify columns explicitly in the SELECT
clause rather than using *
.
Handling of Type Modifiers for columns in USING
Clause
In the new version of the analyzer, the rules for determining the common supertype for columns specified in the USING
clause have been standardized to produce more predictable outcomes, especially when dealing with type modifiers like LowCardinality
and Nullable
.
-
LowCardinality(T)
andT
: When a column of typeLowCardinality(T)
is joined with a column of typeT
, the resulting common supertype will beT
, effectively discarding theLowCardinality
modifier. -
Nullable(T)
andT
: When a column of typeNullable(T)
is joined with a column of typeT
, the resulting common supertype will beNullable(T)
, ensuring that the nullable property is preserved.
Example:
In this query, the common supertype for id
is determined as String
, discarding the LowCardinality
modifier from t1
.
Projection column names changes
During projection names computation, aliases are not substituted.
Incompatible function arguments types
In the new analyzer, type inference happens during initial query analysis.
This change means that type checks are done before short-circuit evaluation; thus, if
function arguments must always have a common supertype.
Example:
The following query fails with There is no supertype for types Array(UInt8), String because some of them are Array and some of them are not
:
Heterogeneous clusters
The new analyzer significantly changed the communication protocol between servers in the cluster. Thus, it's impossible to run distributed queries on servers with different enable_analyzer
setting values.
Mutations are interpreted by previous analyzer
Mutations are still using the old analyzer.
This means some new ClickHouse SQL features can't be used in mutations. For example, the QUALIFY
clause.
Status can be checked here.
Unsupported features
The list of features new analyzer currently doesn't support:
- Annoy index.
- Hypothesis index. Work in progress here.
- Window view is not supported. There are no plans to support it in the future.