[SPARK-31561][SQL] Add QUALIFY Clause #55401

Open
viirya wants to merge 4 commits into apache:master from viirya:qualify-clause

Conversation

@viirya
Member

@viirya viirya commented Apr 17, 2026

What changes were proposed in this pull request?

Add QUALIFY clause to Spark SQL using UnresolvedQualify LogicalPlan node and a self-contained ResolveQualify rule in Analyzer, with structured error conditions.

PR #55019 models QUALIFY as a marker expression (QualifyExpression) wrapped inside a Filter, a design that forces QUALIFY handling to be scattered across four Analyzer rules. This PR instead models QUALIFY as a LogicalPlan node (UnresolvedQualify) resolved by a single self-contained ResolveQualify rule, which completes all of its work in one pass once the child plan is resolved.

Why are the changes needed?

QUALIFY is supported by several popular SQL engines, including Snowflake and Databricks SQL, and users expect it when porting SQL that filters on window-function results. Without it, equivalent Spark queries need an extra subquery or CTE just to filter on a window alias.

This change closes that gap and makes Spark SQL more compatible with existing SQL workloads while preserving clear analyzer rules around window and aggregate semantics.
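To illustrate the gap described above, a QUALIFY query and its pre-QUALIFY equivalent might look like the following sketch (table and column names are illustrative, not taken from this PR):

```sql
-- With QUALIFY: keep only the top-ranked row per department.
SELECT emp, dept, salary,
       ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees
QUALIFY rn = 1;

-- Without QUALIFY: the window result must first be materialized in a
-- subquery (or CTE) before the alias can be filtered.
SELECT emp, dept, salary, rn
FROM (
  SELECT emp, dept, salary,
         ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
  FROM employees
) t
WHERE rn = 1;
```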

Does this PR introduce any user-facing change?

Yes. Spark SQL can now parse and analyze queries that use QUALIFY, for example:

SELECT a, ROW_NUMBER() OVER (ORDER BY b) AS rn
FROM t
QUALIFY rn = 1

This PR also introduces user-visible analysis errors for invalid QUALIFY usage, such as using aggregate functions directly in the QUALIFY predicate.
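As a sketch of the invalid case, a predicate that applies an aggregate function directly in the QUALIFY clause would be rejected at analysis time (the query below is illustrative; the exact error condition names are defined in this PR and not reproduced here):

```sql
-- Invalid: QUALIFY filters on window-function results, not bare aggregates.
SELECT a, ROW_NUMBER() OVER (ORDER BY b) AS rn
FROM t
QUALIFY SUM(b) > 10;  -- analysis error: aggregate function in QUALIFY predicate
```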

How was this patch tested?

Unit tests and end-to-end tests.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

viirya and others added 4 commits April 18, 2026 13:40
Add QUALIFY clause to Spark SQL using UnresolvedQualify LogicalPlan node
and a self-contained ResolveQualify rule in Analyzer, with structured
error conditions.

Co-authored-by: Claude Code
Co-authored-by: Chao Sun <chao@openai.com>
…dation, and strict error handling to ResolveQualify

- Add resolveConditionSubqueries to handle correlated subqueries in QUALIFY
  conditions, using the same fake-Project pattern as HAVING resolution.
- Validate that resolved attributes in the Aggregate case are present in
  grouping expressions or aggregate output, rejecting invalid references.
- Change the catch-all case in resolveQualifyCondition to throw
  SparkException.internalError instead of silently returning.

Co-authored-by: Claude Code
Co-authored-by: Chao Sun <chao@openai.com>
…on and aggregate validation

- Add test for correlated subquery in QUALIFY condition (EXISTS).
- Add test that non-grouping column references with GROUP BY are rejected.

Co-authored-by: Claude Code
Co-authored-by: Chao Sun <chao@openai.com>
…r test for QUALIFY clause

- Generate qualify.sql.out and analyzer-results/qualify.sql.out via
  SPARK_GENERATE_GOLDEN_FILES=1.
- Fix SparkSqlParserSuite QUALIFY test to assert node types instead of
  full plan tree comparison.
- Fix scalastyle issues: non-ASCII em-dash, import line length, unused import.

Co-authored-by: Claude Code
Co-authored-by: Chao Sun <chao@openai.com>
@viirya viirya requested a review from sunchao April 18, 2026 23:20
