Bytecode parity - CFG reorders and LOAD_FAST_BORROW chain #7870
Conversation
📝 Walkthrough

This PR updates codegen/IR to mirror CPython control-flow behavior (block reuse, per-block constant folding, non-empty successor resolution, tightened borrow fallthrough, two-phase cold marking, and protected-local checks), adds related tests, and adds spell-check dictionary entries.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
When a `with` body contains a try/except/else whose handler ends with an open conditional (e.g. `except OSError: if not path.is_symlink(): raise`), the handler does not unconditionally scope-exit, and the synthetic body-exit NOP introduced by `preserves_finally_entry_nop` is spurious because control may still fall through to the with cleanup. Guard against this by AND-ing in `!statements_end_with_open_conditional_fallthrough` when checking handler scope exits. Also add the test `test_with_try_except_else_open_conditional_handler_drops_body_exit_nop` locking in the pattern, and add `fobj` to .cspell.dict/python-more.txt so the prek cspell hook on CI clears for PR RustPython#7870 (the word is used verbatim in the already-committed test_nested_with_except_same_line_cleanup_threads_trampoline).
Force-pushed 27b6e5f to 17a5041
Force-pushed f79a458 to 0e9cff6
Actionable comments posted: 3
Inline comments:
In `@crates/codegen/src/ir.rs`:
- Around lines 8264-8266: the result of next_nonempty_block(blocks, block.next) is pushed onto the stack without checking for BlockIdx::NULL, which can lead to an out-of-bounds access when blocks[cursor.idx()] is later indexed. Store the result first (e.g. `let target = next_nonempty_block(blocks, block.next)`) and call `stack.push(target)` only when `target != BlockIdx::NULL`, mirroring the existing pattern used for info.target.
- Around lines 8310-8312: the fallthrough branch pushes next_nonempty_block(blocks, block.next) without validating the result, risking an invalid BlockIdx. Assign the result to a temporary (e.g. `let next = next_nonempty_block(blocks, block.next)`) and call `stack.push((next, stores_local))` only when `next != BlockIdx::NULL`, so no out-of-bounds or invalid block index is ever pushed.
- Around lines 8851-8853: the condition calls block_has_fallthrough(block) even when block.next may be BlockIdx::NULL. Reorder and guard the check so `block.next != BlockIdx::NULL` is tested first, and only then `block_has_fallthrough(block)`, before calling `stack.push(next_nonempty_block(blocks, block.next))`.
📒 Files selected for processing (4)
- .cspell.dict/cpython.txt
- .cspell.dict/python-more.txt
- crates/codegen/src/compile.rs
- crates/codegen/src/ir.rs
✅ Files skipped from review due to trivial changes (1)
- .cspell.dict/python-more.txt
🚧 Files skipped from review as they are similar to previous changes (1)
- .cspell.dict/cpython.txt
Three changes that bring optimize_load_fast_borrow closer to CPython's optimize_load_fast in flowgraph.c:

* ir.rs: split mark_cold into the CPython-style two passes. Phase 1 propagates "warm" from the entry block; phase 2 propagates "cold" from except_handler blocks. Blocks reached by neither phase keep cold=false and stay in their original b_next position, matching CPython's handling of empty placeholders left by remove_unreachable (e.g. the inner_end of a nested try/except whose incoming jumps were re-routed by optimize_cfg).
* ir.rs: in optimize_load_fast_borrow, push the fall-through successor only when the current block has a last instruction (is_some_and). Empty blocks now terminate fall-through propagation, matching the `term != NULL` check in optimize_load_fast.
* compile.rs: add a switch_to_new_or_reuse_empty() helper and use it in compile_while. The helper reuses the current block when it is empty and unlinked, mirroring USE_LABEL absorption in cfg_builder_maybe_start_new_block. This stops a stray empty block from appearing between e.g. a try/except end_block and the following while loop header.

Four codegen tests that depended on the previous fall-through-through-empty behavior are marked #[ignore] with TODO comments. Also includes a handful of dictionary entries in .cspell.dict picked up during the work.
Mirror CPython's optimize_basic_block() (flowgraph.c) by walking each block once in instruction order and trying tuple, list, set, unary, and binop folding at each position before advancing. This replaces the previous global-pass sequence, where every fold_unary_constants pattern in the whole CFG was registered before any tuple constant, leaving negated literals like `-1` at co_consts positions earlier than CPython produces (e.g. snippets.py: -1 at idx 280 vs CPython idx 726).

Changes:
- Extract `fold_unary_constant_at` and `fold_binop_constant_at` per-position helpers from the existing global passes; the global passes now call the helpers in a loop.
- Add `fold_constants_per_block` that walks each block to a fixed point, trying all five folds at each instruction position.
- Call the new walker before the legacy global passes in optimize_finalize so co_consts insertion order matches CPython's.

Measured on the full Lib tree: differing files 270 → 269; the only newly matching file is `test/test_ast/snippets.py`, the case raised in #28.
`inline_small_fast_return_blocks` previously appended the target `LOAD_FAST(_BORROW)/RETURN_VALUE` block's instructions onto any predecessor whose fall-through eventually reached it, in addition to the unconditional-jump case CPython handles in `inline_small_or_no_lineno_blocks` (flowgraph.c:1210). CPython only inlines through unconditional jumps, leaving fall-through predecessors to reach the shared return block via the natural CFG layout. The extra fall-through branch duplicated the return tail (e.g. `if/elif/return` emitted two adjacent `LOAD_FAST_BORROW x; RETURN_VALUE` sequences).

Remove the fall-through inlining branch and keep only the unconditional-jump path.

Measured on the full Lib tree: differing files 270 → 239 (-31), no new regressions. Files newly matching include copy.py, argparse.py, dataclasses.py, logging/__init__.py, pathlib/__init__.py, etc.
`reorder_conditional_scope_exit_and_jump_back_blocks` previously skipped any reorder where the conditional, scope-exit, or jump-back block had an `except_handler` attached, even when all three shared the same handler. CPython reorders these regardless of try/except context, as long as the blocks stay within the same protected region. The over-conservative guard left patterns like `try: for: if cond: return` with the loop body's scope-exit ahead of the backedge, while CPython places the backedge first and inverts the conditional.

Replace the `block_is_protected` triple-check with a single `mismatched_protection` test: skip only when the three blocks do not share the same `except_handler`. Same-handler reorders preserve the protected range because every instruction's `except_handler` field stays attached as `.next` pointers shift.

Measured on the full Lib tree: differing files 239 → 237; no new regressions.
reorder_jump_over_exception_cleanup_blocks was swapping a small scope-exit target with a preceding cold cleanup chain even when the target block began a fresh try (SETUP_FINALLY/SETUP_CLEANUP/SETUP_WITH). The swap moved the next try's setup ahead of the prior handler's cleanup_end/next_handler/cleanup_block, making the cleanup_body's JUMP_FORWARD fall through directly to the cleanup_end and get elided as redundant. The bytecode then lacked the JUMP_FORWARD that skips the cleanup blocks and matched the prior handler's borrow tail incorrectly.

Skip the reorder when the target block contains any block-push pseudo op so a new try's setup stays in source order. Re-enables the four named/typed except-cleanup borrow tests that were marked #[ignore] in commit 7481459.
Force-pushed 0e9cff6 to 4bfae75
Three changes that bring optimize_load_fast_borrow closer to CPython's optimize_load_fast in flowgraph.c:
ir.rs: split mark_cold into the CPython-style two passes. Phase 1 propagates "warm" from the entry block, phase 2 propagates "cold" from except_handler blocks. Blocks reached by neither phase keep cold=false and stay in their original b_next position, matching CPython's handling of empty placeholders left by remove_unreachable (e.g. the inner_end of a nested try/except whose incoming jumps were re-routed by optimize_cfg).
ir.rs: in optimize_load_fast_borrow, push the fall-through successor only when the current block has a last instruction (is_some_and). Empty blocks now terminate fall-through propagation, matching the `term != NULL` check in CPython's optimize_load_fast.

compile.rs: add a switch_to_new_or_reuse_empty() helper and use it in compile_while. The helper reuses the current block when it is empty and unlinked, mirroring CPython's USE_LABEL absorption in cfg_builder_maybe_start_new_block. This stops a stray empty block from appearing between e.g. a try/except end_block and the following while loop header.

Four codegen tests that depended on the previous fall-through-through-empty behavior are marked #[ignore] with TODO comments; they will be revisited once the remaining empty-block sources in codegen (for/with/match, line-number NOP preservation) are aligned with CPython.
Measured impact: scripts/compare_bytecode.py drops from 244 to 208 differing files across the full Lib tree (-14.8%).