Skip to content

Bytecode parity - CFG reorders and LOAD_FAST_BORROW chain#7870

Merged
youknowone merged 6 commits into
RustPython:mainfrom
youknowone:bytecode-parity
May 16, 2026
Merged

Bytecode parity - CFG reorders and LOAD_FAST_BORROW chain#7870
youknowone merged 6 commits into
RustPython:mainfrom
youknowone:bytecode-parity

Conversation

@youknowone
Copy link
Copy Markdown
Member

@youknowone youknowone commented May 13, 2026

Three changes that bring optimize_load_fast_borrow closer to CPython's optimize_load_fast in flowgraph.c:

  • ir.rs: split mark_cold into the CPython-style two passes. Phase 1 propagates "warm" from the entry block, phase 2 propagates "cold" from except_handler blocks. Blocks reached by neither phase keep cold=false and stay in their original b_next position, matching CPython's handling of empty placeholders left by remove_unreachable (e.g. the inner_end of a nested try/except whose incoming jumps were re-routed by optimize_cfg).

  • ir.rs: in optimize_load_fast_borrow, push the fall-through successor only when the current block has a last instruction (is_some_and). Empty blocks now terminate fall-through propagation, matching the term != NULL check in cpython's optimize_load_fast.

  • compile.rs: add switch_to_new_or_reuse_empty() helper and use it in compile_while. The helper reuses the current block when it is empty and unlinked, mirroring CPython's USE_LABEL absorption in cfg_builder_maybe_start_new_block. This stops a stray empty block from appearing between e.g. a try/except end_block and the following while loop header.

Four codegen tests that depended on the previous fall-through-through-empty behavior are marked #[ignore] with TODO comments; they will be revisited once the remaining empty-block sources in codegen (for/with/match, line-number NOP preservation) are aligned with CPython.

Measured impact: scripts/compare_bytecode.py drops from 244 to 208 differing files across the full Lib tree (-14.8%).

Summary by CodeRabbit

  • Tests

    • Added and updated tests to validate CPython-aligned control-flow, borrow and exception-handling behaviors.
  • Chores

    • Expanded spell-check dictionaries with additional Python/CPython terms.
    • Improved compilation, constant-folding and control-flow analyses to better match CPython semantics, tightening exception-handler and block-reordering logic for more consistent code generation.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 474c06fd-bf37-4e20-bc61-6ed12e588715

📥 Commits

Reviewing files that changed from the base of the PR and between 0e9cff6 and 4bfae75.

📒 Files selected for processing (4)
  • .cspell.dict/cpython.txt
  • .cspell.dict/python-more.txt
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs
✅ Files skipped from review due to trivial changes (2)
  • .cspell.dict/python-more.txt
  • .cspell.dict/cpython.txt
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs

📝 Walkthrough

Walkthrough

This PR updates codegen/IR to mirror CPython control-flow behavior (block reuse, per-block constant folding, non-empty successor resolution, tightened borrow fallthrough, two-phase cold marking, and protected-local checks), adds related tests, and adds spell-check dictionary entries.

Changes

Spell-check Dictionary Updates

Layer / File(s) Summary
Spell-check dictionary entries
.cspell.dict/cpython.txt, .cspell.dict/python-more.txt
Added entries: eooh, EOOH, isbytecode, fobj, infd, indexgroup, infj, inittab, Inittab, instancecheck, instanceof, instrs, outfd.

CPython-aligned Codegen & IR

Layer / File(s) Summary
Block reuse helper and while-loop compilation
crates/codegen/src/compile.rs
Adds switch_to_new_or_reuse_empty() and uses it when creating while-loop headers to reuse an empty current block when safe.
Compile tests for borrow preservation
crates/codegen/src/compile.rs
Adds tests for borrow propagation through empty fallthrough/handler assignment tails and protected-store return-borrow preservation.
Per-block constant folding
crates/codegen/src/ir.rs
Introduces fold_unary_constant_at, fold_binop_constant_at, and fold_constants_per_block to run iterative per-block constant folding to fixed point.
Non-empty successor/predecessor & targeting
crates/codegen/src/ir.rs
Resolves successor/predecessor logic via next_nonempty_block, precomputes targeted bitmap, and updates predecessor graph/traversals to operate on non-empty block targets.
Fall-through propagation in borrow optimization
crates/codegen/src/ir.rs
optimize_load_fast_borrow now prevents fallthrough propagation from empty placeholder blocks and uses targeted to gate propagation.
Handler-resume & suppression predicates
crates/codegen/src/ir.rs
Rewires handler-resume, suppression, and related predicates to use non-empty-aware targets and updated helper signatures.
Protected-stored-locals & reorder guards
crates/codegen/src/ir.rs
Adds protected_stored_locals_were_initialized_before_try, uses it to skip certain reorder handling, adds mismatched-protection checks, and rejects cases where exit block contains is_block_push.
Two-phase cold-block marking
crates/codegen/src/ir.rs
Replaces cold = !warm with two-phase algorithm: warm forward-reachability from entry (skip except_handler targets and is_block_push edges) then cold forward-walk from except_handler blocks through non-warm blocks to set block.cold.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • coolreader18

Poem

🐰 I hopped through blocks both warm and cold,
Reused the empty ones I found, not bold.
I folded constants, chased fallthrough's trail,
Marked cold from handlers, warmed the entry's gale.
Hooray — the CFG's neat and well-controlled!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Bytecode parity - CFG reorders and LOAD_FAST_BORROW chain' accurately summarizes the main change: achieving bytecode parity with CPython by reordering control flow and optimizing the LOAD_FAST_BORROW chain.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

youknowone added a commit to youknowone/RustPython that referenced this pull request May 13, 2026
When a `with` body contains a try/except/else whose handler ends with
an open conditional (e.g. `except OSError: if not path.is_symlink(): raise`),
the handler does not unconditionally scope-exit and the synthetic
body-exit NOP introduced by `preserves_finally_entry_nop` is spurious
because control may still fall through to the with cleanup. Guard
against this by anding in `!statements_end_with_open_conditional_fallthrough`
when checking handler scope exits.

Also add the test
`test_with_try_except_else_open_conditional_handler_drops_body_exit_nop`
locking the pattern, and `fobj` to .cspell.dict/python-more.txt so the
prek cspell hook on CI clears for PR RustPython#7870 (used verbatim in the
already-committed test_nested_with_except_same_line_cleanup_threads_trampoline).
youknowone added a commit to youknowone/RustPython that referenced this pull request May 14, 2026
When a `with` body contains a try/except/else whose handler ends with
an open conditional (e.g. `except OSError: if not path.is_symlink(): raise`),
the handler does not unconditionally scope-exit and the synthetic
body-exit NOP introduced by `preserves_finally_entry_nop` is spurious
because control may still fall through to the with cleanup. Guard
against this by anding in `!statements_end_with_open_conditional_fallthrough`
when checking handler scope exits.

Also add the test
`test_with_try_except_else_open_conditional_handler_drops_body_exit_nop`
locking the pattern, and `fobj` to .cspell.dict/python-more.txt so the
prek cspell hook on CI clears for PR RustPython#7870 (used verbatim in the
already-committed test_nested_with_except_same_line_cleanup_threads_trampoline).
@youknowone youknowone marked this pull request as ready for review May 15, 2026 07:50
@youknowone youknowone marked this pull request as draft May 15, 2026 07:51
@youknowone youknowone force-pushed the bytecode-parity branch 3 times, most recently from f79a458 to 0e9cff6 Compare May 16, 2026 03:20
@youknowone youknowone changed the title Align LOAD_FAST_BORROW analysis with CPython chain shape Bytecode parity - CFG reorders and LOAD_FAST_BORROW chain May 16, 2026
@youknowone youknowone marked this pull request as ready for review May 16, 2026 04:20
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/codegen/src/ir.rs`:
- Around line 8264-8266: The code pushes the result of
next_nonempty_block(blocks, block.next) onto stack without checking for
BlockIdx::NULL, which can lead to an out-of-bounds access when
blocks[cursor.idx()] is later used; change the block that currently does
stack.push(next_nonempty_block(blocks, block.next)) to first store the result in
a variable (e.g. let target = next_nonempty_block(blocks, block.next)) and only
call stack.push(target) if target != BlockIdx::NULL, mirroring the existing
pattern used for info.target; keep references to block_has_fallthrough and
BlockIdx::NULL as shown.
- Around line 8310-8312: The push uses next_nonempty_block(blocks, block.next)
without validating its result, risking an invalid BlockIdx; change the logic in
the fallthrough branch (around block_has_fallthrough / next_nonempty_block
usage) to first assign the result to a temporary (e.g., let next =
next_nonempty_block(blocks, block.next)) and only call stack.push((next,
stores_local)) if next != BlockIdx::NULL (or otherwise validated), ensuring you
avoid out-of-bounds/invalid block pushes.
- Around line 8851-8853: The current condition calls
block_has_fallthrough(block) even when block.next may be BlockIdx::NULL; reorder
and guard this so we only call block_has_fallthrough when block.next is valid.
Change the if to first check block.next != BlockIdx::NULL and then &&
block_has_fallthrough(block), so only then call
stack.push(next_nonempty_block(blocks, block.next)); reference the symbols
block_has_fallthrough, block.next, BlockIdx::NULL, and next_nonempty_block.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 9576bc4a-59d1-432c-b75c-ce55bc8aeebe

📥 Commits

Reviewing files that changed from the base of the PR and between 9523b6b and 0e9cff6.

📒 Files selected for processing (4)
  • .cspell.dict/cpython.txt
  • .cspell.dict/python-more.txt
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs
✅ Files skipped from review due to trivial changes (1)
  • .cspell.dict/python-more.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • .cspell.dict/cpython.txt

Comment thread crates/codegen/src/ir.rs
Comment thread crates/codegen/src/ir.rs
Comment thread crates/codegen/src/ir.rs
Three changes that bring optimize_load_fast_borrow closer to CPython's
optimize_load_fast in flowgraph.c:

* ir.rs: split mark_cold into the CPython-style two passes. Phase 1
  propagates "warm" from the entry block, phase 2 propagates "cold" from
  except_handler blocks. Blocks reached by neither phase keep cold=false
  and stay in their original b_next position, matching CPython's handling
  of empty placeholders left by remove_unreachable (e.g. the inner_end of
  a nested try/except whose incoming jumps were re-routed by optimize_cfg).

* ir.rs: in optimize_load_fast_borrow, push the fall-through successor
  only when the current block has a last instruction (is_some_and).
  Empty blocks now terminate fall-through propagation, matching the
  `term != NULL` check in optimize_load_fast.

* compile.rs: add switch_to_new_or_reuse_empty() helper and use it in
  compile_while. The helper reuses the current block when it is empty
  and unlinked, mirroring USE_LABEL absorption in
  cfg_builder_maybe_start_new_block. This stops a stray empty block
  from appearing between e.g. a try/except end_block and the following
  while loop header.

Four codegen tests that depended on the previous fall-through-through-
empty behavior are marked #[ignore] with TODO comments.

Also includes a handful of dictionary entries in .cspell.dict picked up
during the work.
Mirror CPython's optimize_basic_block() (flowgraph.c) by walking each
block once in instruction order and trying tuple, list, set, unary, and
binop folding at each position before advancing. This replaces the
previous global-pass sequence where every fold_unary_constants pattern
in the whole CFG was registered before any tuple constant, leaving
negated literals like `-1` at co_consts positions earlier than CPython
produces (e.g. snippets.py: -1 at idx 280 vs CPython idx 726).

Changes:
- Extract `fold_unary_constant_at` and `fold_binop_constant_at` per-
  position helpers from the existing global passes; the global passes
  now call the helpers in a loop.
- Add `fold_constants_per_block` that walks each block to a fixed point,
  trying all five folds at each instruction position.
- Call the new walker before the legacy global passes in
  optimize_finalize so co_consts insertion order matches CPython's.

Measured on the full Lib tree: differing files 270 → 269; the only
newly matching file is `test/test_ast/snippets.py`, the case raised in
#28.
`inline_small_fast_return_blocks` previously appended the target
`LOAD_FAST(_BORROW)/RETURN_VALUE` block's instructions onto any
predecessor whose fall-through eventually reached it, in addition to the
unconditional-jump case CPython handles in
`inline_small_or_no_lineno_blocks` (flowgraph.c:1210). CPython only
inlines through unconditional jumps, leaving fall-through predecessors
to reach the shared return block via the natural CFG layout. The extra
fall-through branch duplicated the return tail (e.g. `if/elif/return`
emitted two adjacent `LOAD_FAST_BORROW x; RETURN_VALUE` sequences).

Remove the fall-through inlining branch and keep only the
unconditional-jump path.

Measured on the full Lib tree: differing files 270 → 239 (-31), no new
regressions. Files newly matching include copy.py, argparse.py,
dataclasses.py, logging/__init__.py, pathlib/__init__.py, etc.
`reorder_conditional_scope_exit_and_jump_back_blocks` previously skipped
any reorder where the conditional, scope-exit, or jump-back block had
an `except_handler` attached, even when all three shared the same
handler. CPython reorders these regardless of try/except context, as
long as the blocks stay within the same protected region. The over-
conservative guard left patterns like `try: for: if cond: return` with
the loop body's scope-exit ahead of the backedge, while CPython places
the backedge first and inverts the conditional.

Replace the `block_is_protected` triple-check with a single
`mismatched_protection` test: skip only when the three blocks do not
share the same `except_handler`. Same-handler reorders preserve the
protected range because every instruction's `except_handler` field
stays attached as `.next` pointers shift.

Measured on the full Lib tree: differing files 239 → 237; no new
regressions.
reorder_jump_over_exception_cleanup_blocks was swapping a small
scope-exit target with a preceding cold cleanup chain even when the
target block began a fresh try (SETUP_FINALLY/SETUP_CLEANUP/SETUP_WITH).
The swap moved the next try's setup ahead of the prior handler's
cleanup_end/next_handler/cleanup_block, making the cleanup_body's
JUMP_FORWARD fall through directly to the cleanup_end and get elided as
redundant. The bytecode then lacked the JUMP_FORWARD that skips the
cleanup blocks and matched the prior handler's borrow tail incorrectly.

Skip the reorder when the target block contains any block-push pseudo
op so a new try's setup stays in source order. Re-enables the four
named/typed except-cleanup borrow tests that were marked #[ignore] in
commit 7481459.
@youknowone youknowone merged commit 56269f7 into RustPython:main May 16, 2026
25 checks passed
@youknowone youknowone deleted the bytecode-parity branch May 16, 2026 08:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant