
Comparing changes

base repository: sszz01/datafusion-python
base: main
head repository: apache/datafusion-python
compare: main

  • 6 commits
  • 37 files changed
  • 2 contributors

Commits on Apr 23, 2026

  1. Add SKILL.md and enrich package docstring (apache#1497)

    * Add AGENTS.md and enrich __init__.py module docstring
    
    Add python/datafusion/AGENTS.md as a comprehensive DataFrame API guide
    for AI agents and users. It ships with pip automatically (Maturin includes
    everything under python-source = "python"). Covers core abstractions,
    import conventions, data loading, all DataFrame operations, expression
    building, a SQL-to-DataFrame reference table, common pitfalls, idiomatic
    patterns, and a categorized function index.
    
    Enrich the __init__.py module docstring from 2 lines to a full overview
    with core abstractions, a quick-start example, and a pointer to AGENTS.md.
    
    Closes apache#1394 (PR 1a)
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * Clarify audience of root vs package AGENTS.md
    
    The root AGENTS.md (symlinked as CLAUDE.md) is for contributors working
    on the project. Add a pointer to python/datafusion/AGENTS.md which is
    the user-facing DataFrame API guide shipped with the package. Also add
    the Apache license header to the package AGENTS.md.
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * Add PR template and pre-commit check guidance to AGENTS.md
    
    Document that all PRs must follow .github/pull_request_template.md and
    that pre-commit hooks must pass before committing. List all configured
    hooks (actionlint, ruff, ruff-format, cargo fmt, cargo clippy, codespell,
    uv-lock) and the command to run them manually.
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * Remove duplicated hook list from AGENTS.md
    
    Let the hooks be discoverable from .pre-commit-config.yaml rather than
    maintaining a separate list that can drift.
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * Fix AGENTS.md: Arrow C Data Interface, aggregate filter, fluent example
    
    - Clarify that DataFusion works with any Arrow C Data Interface
      implementation, not just PyArrow.
    - Show the filter keyword argument on aggregate functions (the idiomatic
      HAVING equivalent) instead of the post-aggregate .filter() pattern;
      a short sketch follows this list.
    - Update the SQL reference table to show FILTER (WHERE ...) syntax.
    - Remove the now-incorrect "Aggregate then filter for HAVING" pitfall.
    - Add .collect() to the fluent chaining example so the result is clearly
      materialized.
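
    A minimal sketch of the filter-keyword pattern the guide now shows
    (a frame with "team" and "score" columns; names are illustrative):

        from datafusion import SessionContext, col
        from datafusion import functions as F

        ctx = SessionContext()
        df = ctx.from_pydict({"team": ["a", "a", "b"], "score": [1, -2, 5]})
        # SQL: SELECT team, sum(score) FILTER (WHERE score > 0) ... GROUP BY team
        df.aggregate(
            [col("team")],
            [F.sum(col("score"), filter=col("score") > 0).alias("total")],
        )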
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * Update agents file after working through the first TPC-H query using only the text description
    
    * Add feedback from working through each of the TPC-H queries
    
    * Address Copilot review feedback on AGENTS.md
    
    - Wrap CASE/WHEN method-chain examples in parentheses and assign to a
      variable so they are valid Python as shown (Copilot #1, #2).
    - Fix INTERSECT/EXCEPT mapping: the default distinct=False corresponds to
      INTERSECT ALL / EXCEPT ALL, not the distinct forms. Updated both the
      Set Operations section and the SQL reference table to show both the
      ALL and distinct variants (Copilot #4).
    - Change write_parquet / write_csv / write_json examples to file-style
      paths (output.parquet, etc.) to match the convention used in existing
      tests and examples. Note that a directory path is also valid for
      partitioned output (Copilot #5).
    
    Verified INTERSECT/EXCEPT semantics with a script:
      df1.intersect(df2)                -> [1, 1, 2]  (= INTERSECT ALL)
      df1.intersect(df2, distinct=True) -> [1, 2]     (= INTERSECT)
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * Use short-form comparisons in AGENTS.md examples
    
    Drop lit() on the RHS of comparison operators since Expr auto-wraps raw
    Python values, matching the style the guide recommends (Copilot #3, #6).
    
    Updates examples in the Aggregation, CASE/WHEN, SQL reference table,
    Common Pitfalls, Fluent Chaining, and Variables-as-CTEs sections, plus
    the __init__.py quick-start snippet. Prose explanations of the rule
    (which cite the long form as the thing to avoid) are left unchanged.
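
    For illustration, a hedged sketch of the two spellings (df is any
    existing DataFrame with an "o_totalprice" column):

        from datafusion import col, lit

        df.filter(col("o_totalprice") > 100.0)       # short form: value auto-wrapped
        df.filter(col("o_totalprice") > lit(100.0))  # long form: equivalent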
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * Move user guide from python/datafusion/AGENTS.md to SKILL.md
    
    The in-wheel AGENTS.md was not a real distribution channel -- no shipping
    agent walks site-packages for AGENTS.md files. Moving to SKILL.md at the
    repo root, with YAML frontmatter, lets the skill ecosystems (npx skills,
    Claude Code plugin marketplaces, community aggregators) discover it.
    
    Update the pointers in the contributor AGENTS.md and the __init__.py
    module docstring accordingly. The docstring now references the GitHub
    URL since the file no longer ships with the wheel.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * Address review feedback: doctest, streaming, date/timestamp
    
    - Convert the __init__.py quick-start block to doctest format so it is
      picked up by `pytest --doctest-modules` (already the project default),
      preventing silent rot.
    - Extract streaming into its own SKILL.md subsection with guidance on
      when to prefer execute_stream() over collect(), sync and async
      iteration, and execute_stream_partitioned() for per-partition streams
      (a short sketch follows this list).
    - Generalize the date-arithmetic rule from Date32 to both Date32 and
      Date64 (both reject Duration at any precision, both accept
      month_day_nano_interval), and note that Timestamp columns differ and
      do accept Duration.
    - Document the PyArrow-inherited type mapping returned by
      to_pydict()/to_pylist(), including the nanosecond fallback to
      pandas.Timestamp / pandas.Timedelta and the to_pandas() footgun where
      date columns come back as an object dtype.
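
    A minimal streaming sketch under those assumptions (df is an existing
    DataFrame; process() is a hypothetical consumer):

        # Stream batches instead of materializing the whole result at once
        for batch in df.execute_stream():
            process(batch.to_pyarrow())  # each item is a RecordBatch

        # Or asynchronously, inside a coroutine:
        #   async for batch in df.execute_stream():
        #       process(batch.to_pyarrow())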
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * Distinguish user guide from agent reference in module docstring
    
    The docstring pointed readers at SKILL.md as a "comprehensive guide," but
    SKILL.md is written in a dense, skill-oriented format for agents — humans
    are better served by the online user guide. Put the online docs first as
    the primary reference and label the SKILL.md link as the agent reference.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    timsaucer and claude authored Apr 23, 2026 (commit 4030997)
  2. Skills require the header to be the first thing in the file which conflicts with the RAT check. Make an exception for this file. (apache#1501)

    timsaucer authored Apr 23, 2026 (commit 8a5d783)

Commits on Apr 24, 2026

  1. docs: enrich module docstrings and add doctest examples (apache#1498)

    * Enrich module docstrings and add doctest examples
    
    Expands the module docstrings for `functions.py`, `dataframe.py`,
    `expr.py`, and `context.py` so each module opens with a concept summary,
    cross-references to related APIs, and a small executable example.
    
    Adds doctest examples to the high-traffic `DataFrame` methods that
    previously lacked them: `select`, `aggregate`, `sort`, `limit`, `join`,
    and `union`. Optional parameters are demonstrated with keyword syntax,
    and examples reuse the same input data across variants so the effect of
    each option is easy to see.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * Use distinct group sums in aggregate docstring example
    
    Change the score data from [1, 2, 3] to [1, 2, 5] so the grouped
    result produces [3, 5] instead of [3, 3], removing ambiguity about
    which total belongs to which team.
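
    Roughly the shape of the resulting doctest (a sketch; the exact example
    text and output formatting live in dataframe.py):

        >>> from datafusion import SessionContext, col, functions as F
        >>> ctx = SessionContext()
        >>> df = ctx.from_pydict({"team": ["a", "a", "b"], "score": [1, 2, 5]})
        >>> result = df.aggregate([col("team")], [F.sum(col("score")).alias("total")])
        >>> result.sort("team").to_pydict()
        {'team': ['a', 'b'], 'total': [3, 5]}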
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * Align module-docstring examples with SKILL.md idioms
    
    Drop the redundant lit() in the dataframe.py module-docstring filter
    example and use a plain string group key in the aggregate() doctest, so
    both examples model the style SKILL.md recommends. Also document the
    sort("a") string form and sort_by() shortcut in SKILL.md's sorting
    section.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    timsaucer and claude authored Apr 24, 2026 (commit 8741d30)
  2. docs: add README section for AI coding assistants (apache#1503)

    Points users to the repo-root SKILL.md via the npx skills registry or a
    manual AGENTS.md / CLAUDE.md pointer. Implements PR 1c of the plan in apache#1394.
    
    Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    timsaucer and claude authored Apr 24, 2026 (commit c8bb9f7)
  3. tpch examples: rewrite queries idiomatically and embed reference SQL (apache#1504)
    
    * tpch examples: add reference SQL to each query, fix Q20
    
    - Append the canonical TPC-H reference SQL (from benchmarks/tpch/queries/)
      to each q01..q22 module docstring so readers can compare the DataFrame
      translation against the SQL at a glance.
    - Fix Q20: `df = df.filter(col("ps_availqty") > lit(0.5) * col("total_sold"))`
      was missing the assignment so the filter was dropped from the pipeline.
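
    The fix in sketch form (DataFrame methods return new frames, so the
    result must be reassigned):

        df.filter(col("ps_availqty") > lit(0.5) * col("total_sold"))       # result discarded
        df = df.filter(col("ps_availqty") > lit(0.5) * col("total_sold"))  # filter applied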
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * tpch examples: rewrite non-idiomatic queries in idiomatic DataFrame form
    
    Rewrite the seven TPC-H example queries that did not demonstrate the
    idiomatic DataFrame pattern. The remaining queries (Q02/Q11/Q15/Q17/Q22,
    which use window functions in place of correlated subqueries) are already
    idiomatic and are left unchanged.
    
    - Q04: replace `.aggregate([col("l_orderkey")], [])` with
      `.select("l_orderkey").distinct()`, which is the natural way to express
      "reduce to one row per order" on a DataFrame.
    - Q07: remove the CASE-as-filter on `n_name` and use
      `F.in_list(col("n_name"), [nation_1, nation_2])` instead. Drops a
      comment block that admitted the filter form was simpler.
    - Q08: rewrite the switched CASE `F.case(...).when(lit(False), ...)` as a
      searched `F.when(col(...).is_not_null(), ...).otherwise(...)`. That
      mirrors the reference SQL's `case when ... then ... else 0 end` shape.
    - Q12: replace `array_position(make_array(...), col)` with
      `F.in_list(col("l_shipmode"), [...])`. Same semantics, without routing
      through array construction / array search.
    - Q19: remove the pyarrow UDF that re-implemented a disjunctive predicate
      in Python. Build the same predicate in DataFusion by OR-combining one
      `in_list` + range-filter expression per brand. Keeps the per-brand
      constants in the existing `items_of_interest` dict.
    - Q20: use `F.starts_with` instead of an explicit substring slice. Replace
      the inner-join + `select(...).distinct()` tail with a semi join against
      a precomputed set of excess-quantity suppliers so the supplier columns
      are preserved without deduplication after the fact.
    - Q21: replace the `array_agg` / `array_length` / `array_element` pipeline
      with two semi joins. One semi join keeps orders with more than one
      distinct supplier (stand-in for the reference SQL's `exists` subquery),
      the other keeps orders with exactly one late supplier (stand-in for the
      `not exists` subquery).
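
    The general shape of the semi-join idiom, sketched with hypothetical
    frame and column names:

        # Semi join: keep left-side rows that have a match on the right,
        # without duplicating rows when the right side matches repeatedly.
        keys = right_frame.select("id").distinct()
        kept = left_frame.join(keys, on="id", how="semi")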
    
    All 22 answer-file comparisons and 22 plan-comparison diagnostics still
    pass (`pytest examples/tpch/_tests.py`: 44 passed).
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * tpch examples: align reference SQL constants with DataFrame queries
    
    The reference SQL embedded in each q01..q22 module docstring was carried
    over verbatim from ``benchmarks/tpch/queries/`` and uses a different set
    of TPC-H substitution parameters than the DataFrame examples
    (answer-file-validated at scale factor 1). Update each reference SQL to
    use the substitution parameters the DataFrame uses, so both expressions
    describe the same query and would produce the same results against the
    same data.
    
    Constants aligned:
    
    - Q01: ``90 days`` cutoff (DataFrame ``DAYS_BEFORE_FINAL = 90``).
    - Q02: ``p_size = 15``, ``p_type like '%BRASS'``, ``r_name = 'EUROPE'``.
    - Q04: base date ``1993-07-01`` (``3 month`` interval preserved per the
      "quarter of a year" wording).
    - Q05: ``r_name = 'ASIA'``.
    - Q06: ``l_discount between 0.06 - 0.01 and 0.06 + 0.01``.
    - Q07: nations ``'FRANCE'`` / ``'GERMANY'``.
    - Q08: ``r_name = 'AMERICA'``, ``p_type = 'ECONOMY ANODIZED STEEL'``,
      inner-case ``nation = 'BRAZIL'``.
    - Q09: ``p_name like '%green%'``.
    - Q10: base date ``1993-10-01`` (``3 month`` interval preserved).
    - Q11: ``n_name = 'GERMANY'``.
    - Q12: ship modes ``('MAIL', 'SHIP')``, base date ``1994-01-01``.
    - Q13: ``o_comment not like '%special%requests%'``.
    - Q14: base date ``1995-09-01``.
    - Q15: base date ``1996-01-01``.
    - Q16: ``p_brand <> 'Brand#45'``, ``p_type not like 'MEDIUM POLISHED%'``,
      sizes ``(49, 14, 23, 45, 19, 3, 36, 9)``.
    - Q17: ``p_brand = 'Brand#23'``, ``p_container = 'MED BOX'``.
    - Q18: ``sum(l_quantity) > 300``.
    - Q19: brands ``Brand#12`` / ``Brand#23`` / ``Brand#34`` with the matching
      minimum quantities (1, 10, 20).
    - Q20: ``p_name like 'forest%'``, base date ``1994-01-01``,
      ``n_name = 'CANADA'``.
    - Q21: ``n_name = 'SAUDI ARABIA'``.
    - Q22: country codes ``('13', '31', '23', '29', '30', '18', '17')``.
    
    Interval units (month / year) are preserved where the problem-statement
    text reads "given quarter", "given year", "given month". Q01 keeps the
    literal "days" unit because the TPC-H problem statement itself describes
    the cutoff in days.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * tpch examples: apply SKILL.md idioms across all 22 queries
    
    Sweep every q01..q22 example for idiomatic DataFrame style as described in
    the repo-root SKILL.md:
    
    - ``col("x") == "s"`` in place of ``col("x") == lit("s")`` on comparison
      right-hand sides (auto-wrap applies).
    - Plain-name strings in ``select``/``aggregate``/``sort`` group/sort key
      lists when the key is a bare column.
    - Drop redundant ``how="inner"`` and single-element ``left_on``/``right_on``
      list wrapping on equi-joins.
    - Collapse chained ``.filter(a).filter(b)`` runs into ``.filter(a, b)``
      and chained ``.with_column`` runs into ``.with_columns(a=..., b=...)``.
    - ``df.sort_by(...)`` or plain-name ``df.sort(...)`` when no null-placement
      override is needed.
    - ``F.count_star()`` in place of ``F.count(col("x"))`` whenever the SQL
      reads ``count(*)``.
    - ``F.starts_with(col, lit(prefix))`` and ``~F.starts_with(...)`` in place
      of substring-prefix equality/inequality tricks.
    - ``F.in_list(col, [lit(...)])`` in place of
      ``~F.array_position(...).is_null()`` and in place of disjunctions of
      equality comparisons.
    - Searched ``F.when(cond, x).otherwise(y)`` in place of switched
      ``F.case(bool_expr).when(lit(True/False), x).end()`` forms.
    - Semi-joins as the DataFrame form of ``EXISTS`` (Q04); anti-joins as
      ``NOT EXISTS`` (Q22 was already using this idiom).
    - Whole-frame window aggregates as the DataFrame stand-in for a SQL
      scalar subquery (Q11/Q15/Q17/Q22).
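
    A combined sketch of several idioms above (frame and column names are
    illustrative, not taken from a specific query):

        from datafusion import col, lit
        from datafusion import functions as F

        df = (
            lineitem  # hypothetical frame over the TPC-H lineitem table
            # one .filter() call with two predicates instead of two calls
            .filter(
                F.in_list(col("l_shipmode"), [lit("MAIL"), lit("SHIP")]),
                col("l_quantity") < 24,
            )
            # plain-name string as the group key; count_star() for count(*)
            .aggregate(["l_shipmode"], [F.count_star().alias("n")])
            .sort_by("l_shipmode")
        )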
    
    Individual query fixes of note:
    
    - Q16 — add the secondary sort keys (``p_brand``, ``p_type``, ``p_size``)
      that the TPC-H spec requires but the original DataFrame omitted.
    - Q22 — drop a stray ``df.show()`` mid-pipeline; replace the 0-based
      substring slice with ``F.left(col("c_phone"), lit(2))``.
    - Q14 — rewrite the promo/non-promo factor split as a searched CASE inside
      ``F.sum(...)`` so the DataFrame expression matches the reference SQL
      shape exactly.
    
    All 22 answer-file comparisons still pass at scale factor 1.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    * tpch examples: more idiomatic aggregate FILTER, string funcs, date handling
    
    Additional sweep of the TPC-H DataFrame examples informed by comparing
    against a fresh set of SKILL.md-only generations under
    ``examples/tpch/agentic_queries/``:
    
    - Q02: ``F.ends_with(col("p_type"), lit(TYPE_OF_INTEREST))`` in place of
      ``F.strpos(col, lit) > 0``. The reference SQL is ``p_type like '%BRASS'``,
      which is an ends_with check, not contains. ``F.strpos > 0`` returned the
      correct rows on TPC-H data by coincidence but is semantically wrong.
    - Q09: ``F.contains(col("p_name"), lit(part_color))`` in place of
      ``F.strpos(col, lit) > 0``. The SQL is ``p_name like '%green%'``.
    - Q08, Q12, Q14: use the ``filter`` keyword on ``F.sum`` / ``F.count`` —
      the DataFrame form of SQL ``sum(...) FILTER (WHERE ...)`` — instead of
      wrapping the aggregate input in ``F.when(cond, x).otherwise(0)``. Q08
      also reorganises to inner-join the supplier's nation onto the regional
      sales, which removes the previous left-join + ``F.when(is_not_null, ...)``
      dance.
    - Q15: compute the grand maximum revenue as a separate scalar aggregate
      and ``join_on(...)`` on equality, instead of the whole-frame window
      ``F.max`` + filter shape. Simpler plan, same result.
    - Q16: ``F.regexp_like(col, pattern)`` in place of
      ``F.regexp_match(col, pattern).is_not_null()``.
    - Q04, Q05, Q06, Q07, Q08, Q10, Q12, Q14, Q15, Q20: store both the start
      and the end of the date window as plain ``datetime.date`` objects and
      compare with ``lit(end_date)``, instead of carrying the start date +
      ``pa.month_day_nano_interval`` and adding them at query-build time.
      Drops unused ``pyarrow`` imports from the files that no longer need
      Arrow scalars.
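
    The date-window pattern in sketch form (lineitem is an illustrative
    frame; both window bounds are plain datetime.date values):

        from datetime import date
        from datafusion import col, lit

        start = date(1994, 1, 1)
        end = date(1995, 1, 1)  # precomputed end of the window
        df = lineitem.filter(
            col("l_shipdate") >= lit(start),
            col("l_shipdate") < lit(end),
        )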
    
    All 22 answer-file comparisons still pass at scale factor 1.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    timsaucer and claude authored Apr 24, 2026 (commit 0357716)
  4. feat: add AI skill to find and improve the Pythonic interface to functions (apache#1484)
    
    * feat: accept native Python types in function arguments instead of requiring lit()
    
    Update 47 functions in functions.py to accept native Python types (int, float,
    str) for arguments that are contextually literals, eliminating verbose lit()
    wrapping. For example, users can now write split_part(col("a"), ",", 2) instead
    of split_part(col("a"), lit(","), lit(2)). All changes are backward compatible.
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * fix: update alias function signatures to match pythonic primary functions
    
    Update instr and position (aliases of strpos) to accept Expr | str for
    the substring parameter, matching the updated primary function signature.
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * docs: update make-pythonic skill to require alias type hint updates
    
    Alias functions that delegate to a primary function must have their type
    hints updated to match, even though coercion logic is only added to the
    primary. Added a new Step 3 to the implementation workflow for this.
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * fix: address review feedback on pythonic skill and function signatures
    
    Update SKILL.md to prevent three classes of issues: clarify that float
    already accepts int per PEP 484 (avoiding redundant int | float that
    fails ruff PYI041), add backward-compat rule for Category B so existing
    Expr params aren't removed, and add guidance for inline coercion with
    many optional nullable params instead of local helpers.
    
    Replace regexp_instr's _to_raw() helper with inline coercion matching
    the pattern used throughout the rest of the file.
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * refactor: add coerce_to_expr helpers and replace inline coercion patterns
    
    Introduce coerce_to_expr() and coerce_to_expr_or_none() in expr.py as the
    complement to ensure_expr() — where ensure_expr rejects non-Expr values,
    these helpers wrap them via Expr.literal(). Replaces ~60 inline isinstance
    checks in functions.py with single-line helper calls, and updates the
    make-pythonic skill to document the new pattern.
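
    A sketch of the helper pattern described above (the real definitions
    live in expr.py; this version is illustrative):

        from datafusion import Expr

        def coerce_to_expr(value):
            # Pass Expr through unchanged; wrap raw Python values as literals
            return value if isinstance(value, Expr) else Expr.literal(value)

        def coerce_to_expr_or_none(value):
            # Preserve None for optional parameters
            return None if value is None else coerce_to_expr(value)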
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * docs: add aggregate function literal detection to make-pythonic skill
    
    Add Technique 1a to detect literal-only arguments in aggregate functions.
    Unlike scalar UDFs which enforce literals in invoke_with_args(), aggregate
    functions enforce them in accumulator() via get_scalar_value(),
    validate_percentile_expr(), or downcast_ref::<Literal>(). Without this
    technique, the skill would incorrectly classify arguments like
    approx_percentile_cont's percentile as Category A (Expr | float) when they
    should be Category B (float only). Updates the decision flow to branch on
    scalar vs aggregate before checking for literal enforcement.
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * docs: add window function literal detection to make-pythonic skill
    
    Add Technique 1b to detect literal-only arguments in window functions.
    Window functions enforce literals in partition_evaluator() via
    get_scalar_value_from_args() / downcast_ref::<Literal>(), not in
    invoke_with_args() (scalar) or accumulator() (aggregate). Updates the
    decision flow to branch on scalar vs aggregate vs window.
    
    Known window functions with literal-only arguments: ntile (n), lead/lag
    (offset, default_value), nth_value (n).
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * fix: use explicit None checks, widen numeric type hints, and add tests
    
    Replace 7 fragile truthiness checks (x.expr if x else None) with
    explicit is not None checks to prevent silent None when zero-valued
    literals are passed. Widen log/power/pow type hints to Expr | int | float
    with noqa: PYI041 for clarity. Add unit tests for coerce_to_expr helpers
    and integration tests for pythonic calling conventions.
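
    The failure mode in miniature (value is an optional argument that may
    hold a zero-valued literal):

        # fragile: 0, 0.0, and "" are falsy, so they silently become None
        arg = value.expr if value else None

        # explicit: only a real None means "not provided"
        arg = value.expr if value is not None else None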
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * chore: suppress FBT003 in tests and remove redundant noqa comments
    
    Add FBT003 (boolean positional value) to the per-file-ignores for
    python/tests/* in pyproject.toml, and remove the 6 now-redundant
    inline noqa: FBT003 comments across test_expr.py and test_context.py.
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * docs: replace static function lists with discovery instructions in skill
    
    Replace hardcoded "Known aggregate/window functions with literal-only
    arguments" lists with instructions to discover them dynamically by
    searching the upstream crate source. Keeps a few examples as validation
    anchors so the agent knows its search is working correctly.
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    * fix: make interrupt test reliable on Python 3.11
    
    PyThreadState_SetAsyncExc only delivers exceptions when the thread is
    executing Python bytecode, not while in native (Rust/C) code. The
    previous test had two issues causing flakiness on Python 3.11:
    
    1. The interrupt fired before df.collect() entered the UDF, while the
       thread was still in native code where async exceptions are ignored.
    2. time.sleep(2.0) is a single C call where async exceptions are not
       checked — they're only checked between bytecode instructions.
    
    Fix by adding a threading.Event so the interrupt waits until the UDF is
    actually executing Python code, and by sleeping in small increments so
    the eval loop has opportunities to check for pending exceptions.
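
    The synchronization pattern in sketch form (names are illustrative):

        import threading
        import time

        udf_started = threading.Event()

        def slow_udf_body():
            udf_started.set()     # signal: now executing Python bytecode
            for _ in range(200):
                time.sleep(0.01)  # short sleeps return to the eval loop,
                                  # where pending async exceptions fire

        # The interrupting thread blocks here instead of firing early,
        # while the target thread is still inside native code:
        udf_started.wait()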
    
    Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    timsaucer and claude authored Apr 24, 2026 (commit e0284c6)