[SPARK-31561][SQL] Add QUALIFY Clause #55401
Open
viirya wants to merge 4 commits into apache:master
Add QUALIFY clause to Spark SQL using UnresolvedQualify LogicalPlan node and a self-contained ResolveQualify rule in Analyzer, with structured error conditions.
Co-authored-by: Claude Code
Co-authored-by: Chao Sun <chao@openai.com>

…dation, and strict error handling to ResolveQualify
- Add resolveConditionSubqueries to handle correlated subqueries in QUALIFY conditions, using the same fake-Project pattern as HAVING resolution.
- Validate that resolved attributes in the Aggregate case are present in grouping expressions or aggregate output, rejecting invalid references.
- Change the catch-all case in resolveQualifyCondition to throw SparkException.internalError instead of silently returning.
Co-authored-by: Claude Code
Co-authored-by: Chao Sun <chao@openai.com>

…on and aggregate validation
- Add test for correlated subquery in QUALIFY condition (EXISTS).
- Add test that non-grouping column references with GROUP BY are rejected.
Co-authored-by: Claude Code
Co-authored-by: Chao Sun <chao@openai.com>

…r test for QUALIFY clause
- Generate qualify.sql.out and analyzer-results/qualify.sql.out via SPARK_GENERATE_GOLDEN_FILES=1.
- Fix SparkSqlParserSuite QUALIFY test to assert node types instead of full plan tree comparison.
- Fix scalastyle issues: non-ASCII em-dash, import line length, unused import.
Co-authored-by: Claude Code
Co-authored-by: Chao Sun <chao@openai.com>
What changes were proposed in this pull request?
Add a QUALIFY clause to Spark SQL using an UnresolvedQualify LogicalPlan node and a self-contained ResolveQualify rule in the Analyzer, with structured error conditions.
PR #55019 models QUALIFY as a marker expression (QualifyExpression) wrapped inside a Filter. That design scatters the QUALIFY handling logic across four Analyzer rules. This PR instead models QUALIFY as a LogicalPlan node (UnresolvedQualify), resolved by a single self-contained ResolveQualify rule. ResolveQualify completes all of its work in one pass once the child plan is resolved.
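To illustrate the shape of this design, here is a toy Python model, not Spark's Scala code. The class names mirror the PR's terminology, but the `resolved` flag, the string-valued condition, and the choice of `Filter` as the rewrite target are all assumptions made for the sketch. It only shows the idea of one rule that leaves the node alone until its child is resolved and then rewrites it in a single step.

```python
from dataclasses import dataclass


# Toy plan nodes (hypothetical; real Spark plans are Scala TreeNodes).
@dataclass(frozen=True)
class Relation:
    name: str
    resolved: bool = True


@dataclass(frozen=True)
class Filter:
    condition: str
    child: object
    resolved: bool = True


@dataclass(frozen=True)
class UnresolvedQualify:
    condition: str
    child: object
    resolved: bool = False


def resolve_qualify(plan):
    """Single self-contained rule: rewrite only once the child is resolved."""
    if isinstance(plan, UnresolvedQualify):
        if not plan.child.resolved:
            # Child not ready yet: leave the node untouched for a later pass.
            return plan
        # Assumption for this sketch: the resolved form is a plain Filter.
        return Filter(plan.condition, plan.child)
    return plan


pending = UnresolvedQualify("rn = 1", Relation("t", resolved=False))
ready = UnresolvedQualify("rn = 1", Relation("t"))
still_pending = resolve_qualify(pending)
rewritten = resolve_qualify(ready)
```

In this sketch all QUALIFY-specific logic lives in `resolve_qualify`, which is the property the PR description attributes to ResolveQualify; no other rule needs to know about the node.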
Why are the changes needed?
QUALIFY is supported by several popular SQL engines, including Snowflake and Databricks SQL, and users expect it when porting SQL that filters on window-function results. Without it, equivalent Spark queries need an extra subquery or CTE just to filter on a window alias.
This change closes that gap and makes Spark SQL more compatible with existing SQL workloads while preserving clear analyzer rules around window and aggregate semantics.
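The subquery workaround mentioned above can be sketched as follows. This is an illustration only: the `employees` table, its columns, and both queries are hypothetical, and SQLite (via Python's `sqlite3`, assuming a build with window-function support, SQLite 3.25 or later) stands in for a SQL engine without QUALIFY. The QUALIFY form is kept as a string for comparison and is not executed.

```python
import sqlite3

# Hypothetical QUALIFY form, shown for comparison only (SQLite cannot run it).
QUALIFY_QUERY = """
SELECT dept, name, salary,
       ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees
QUALIFY rn = 1
"""

# Rewrite without QUALIFY: the window alias must be computed in a subquery
# so that the outer WHERE clause can filter on it.
SUBQUERY_QUERY = """
SELECT dept, name, salary, rn
FROM (
    SELECT dept, name, salary,
           ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
    FROM employees
) AS ranked
WHERE rn = 1
ORDER BY dept
"""


def top_paid_per_dept():
    """Return the highest-paid employee per department via the subquery form."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (dept TEXT, name TEXT, salary INTEGER)"
        )
        conn.executemany(
            "INSERT INTO employees VALUES (?, ?, ?)",
            [("eng", "ana", 120), ("eng", "bo", 100),
             ("ops", "cy", 90), ("ops", "di", 95)],
        )
        return conn.execute(SUBQUERY_QUERY).fetchall()
    finally:
        conn.close()


rows = top_paid_per_dept()
```

The two queries are intended to be equivalent; the QUALIFY form simply removes the wrapping subquery.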
Does this PR introduce any user-facing change?
Yes. Spark SQL can now parse and analyze queries that use QUALIFY, for example:
This PR also introduces user-visible analysis errors for invalid QUALIFY usage, such as using aggregate functions directly in the QUALIFY predicate.
How was this patch tested?
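Unit tests (including the QUALIFY test in SparkSqlParserSuite) and end-to-end SQL golden-file tests (qualify.sql.out and analyzer-results/qualify.sql.out).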
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code