Autogen instructions & opcodes#7797
Conversation
📝 WalkthroughWalkthroughThis PR refactors the RustPython bytecode instruction system to generate Rust opcode/instruction enums from a TOML configuration file, removes the ChangesBytecode Opcode System Refactoring
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
crates/compiler-core/generate.py (2)
18-21: 💤 Low valueChain the re-raised exception (B904).
Inside the
except KeyErrorclause, raise the new error with explicit chaining so traceback context is preserved (and to silence the RuffB904warning).♻️ Proposed fix
try: CPYTHON_ROOT = pathlib.Path(os.environ["CPYTHON_ROOT"]).expanduser().resolve() -except KeyError: - raise ValueError("Missing environment variable 'CPYTHON_ROOT'") +except KeyError as e: + raise ValueError("Missing environment variable 'CPYTHON_ROOT'") from eAs per coding guidelines: "Use ruff for linting Python code".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/compiler-core/generate.py` around lines 18 - 21, The except KeyError block re-raises a new ValueError without exception chaining, triggering Ruff B904; update the except to capture the original KeyError (e.g., except KeyError as e) and re-raise the ValueError with explicit chaining using "from e" so the traceback context for CPYTHON_ROOT resolution (pathlib.Path(os.environ["CPYTHON_ROOT"]).expanduser().resolve()) is preserved.
462-470: 💤 Low valueWalrus binds
opargbut the body never reads it (F841).In
fn_as_opcode, the walrus operator assignsopargonly to gate thearms += " { .. }"branch. Since the value isn't used, a plain containment check is clearer and avoids the dead binding flagged by Flake8.♻️ Proposed fix
- for instr in self: - name = instr.name - arms += f"Self::{name}" - if oparg := self.metadata.get(name, {}).get("oparg"): - arms += " { .. }" - - arms += f"=> {self.opcode_enum}::{name},\n" + for instr in self: + name = instr.name + arms += f"Self::{name}" + if "oparg" in self.metadata.get(name, {}): + arms += " { .. }" + + arms += f" => {self.opcode_enum}::{name},\n"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/compiler-core/generate.py` around lines 462 - 470, The walrus in fn_as_opcode assigns oparg but never uses it, triggering Flake8 F841; replace the walrus check in the loop (where arms is built for each instr/name) with a plain containment check against self.metadata for "oparg" (e.g., use if "oparg" in self.metadata.get(name, {}) or if self.metadata.get(name, {}).get("oparg") without assignment) so the branch that appends " { .. }" runs without creating an unused binding.crates/compiler-core/src/bytecode/instruction.rs (1)
480-498: 💤 Low value
AnyOpcode::deoptcan be flattened slightly without losingconst-ness.The
if let Some(_) { Some(_) } else { None }shape is forced becauseOption::mapisn't usable in aconst fn. That's fine, but amatchis a touch shorter and matches the style used by neighbouringreal/pseudoaccessors above:♻️ Proposed sketch
#[must_use] pub const fn deopt(&self) -> Option<Self> { match self { - Self::Real(opcode) => { - if let Some(op) = opcode.deopt() { - Some(Self::Real(op)) - } else { - None - } - } - Self::Pseudo(opcode) => { - if let Some(op) = opcode.deopt() { - Some(Self::Pseudo(op)) - } else { - None - } - } + Self::Real(opcode) => match opcode.deopt() { + Some(op) => Some(Self::Real(op)), + None => None, + }, + Self::Pseudo(opcode) => match opcode.deopt() { + Some(op) => Some(Self::Pseudo(op)), + None => None, + }, } }Same
match-on-Optionpattern is already used byOpcode::deoptimizeat the top of this file, so this is also a consistency nit.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/compiler-core/src/bytecode/instruction.rs` around lines 480 - 498, The AnyOpcode::deopt const fn can be simplified: inside each arm (Self::Real(opcode) and Self::Pseudo(opcode)) replace the if-let Some/else None pattern with a match on opcode.deopt() that returns Some(Self::Real(op)) / Some(Self::Pseudo(op)) or None, preserving const fn and behavior and matching the style used by Opcode::deoptimize and the nearby accessors.crates/stdlib/src/_opcode.rs (1)
70-145: ⚡ Quick winPlease add direct
_opcoderegression coverage.This changes Python-visible behavior for
stack_effect()and thehas_*helpers, but the tests below still only snapshotdis.dis. A few focused assertions for valid, invalid, and specialized opcodes would make future generator changes much safer.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/stdlib/src/_opcode.rs` around lines 70 - 145, Add direct unit tests targeting the _opcode module: call the Python-exposed functions stack_effect (with jump true/false/None), is_valid, has_arg, has_const, has_name, has_jump, has_free, has_local, and has_exc to assert expected behavior for (1) a known valid opcode (e.g., LOAD_CONST or POP_TOP) returns correct stack effects and true for the corresponding has_* helpers, (2) an invalid opcode integer causes is_valid False and stack_effect raises/returns the ValueError behavior observed in the implementation, and (3) a specialized opcode (one whose deopt() is Some) triggers the ValueError path in stack_effect; keep tests focused and small so future opcode generator changes are caught without relying only on dis.dis snapshots.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/compiler-core/opcode.toml`:
- Around line 269-270: The oparg type for
PseudoOpcode.opcodes.StoreFastMaybeNull is incorrect: change the var_num oparg
from oparg::NameIdx to oparg::VarNum so PseudoInstruction::StoreFastMaybeNull is
generated with var_num: Arg<oparg::VarNum> (matching other var_num uses like
StoreFast/LoadFast) to index localsplus and preserve the VarNum newtype's type
safety.
In `@crates/compiler-core/src/bytecode/instruction.rs`:
- Around line 49-58: The doc-comment promises that stack_effect_jump differs for
certain ops like Self::ForIter, but stack_effect_jump currently just delegates
to stack_effect(oparg), so ForIter's branch effect isn't handled; update
stack_effect_jump to match on instruction variants (e.g., match self {
Self::ForIter => -1, /* other special cases */ ... , _ =>
self.stack_effect(oparg) }) so branch (jump=true) effects are correct, or
alternatively remove the misleading ForIter example from the doc-comment if you
do not want to implement overrides now; ensure callers that rely on
stack_effect_jump (the call-site noted in the review) will receive correct
jump-edge counts.
- Around line 121-124: Instruction::stack_effect_jump incorrectly delegates to
Opcode::stack_effect instead of Opcode::stack_effect_jump; update the method to
call self.as_opcode().stack_effect_jump(oparg) so it mirrors
PseudoInstruction::stack_effect_jump and correctly uses the branch-specific
stack effect behavior for opcodes like ForIter (use as_opcode(),
stack_effect_jump, and Instruction::stack_effect_jump to locate the change).
---
Nitpick comments:
In `@crates/compiler-core/generate.py`:
- Around line 18-21: The except KeyError block re-raises a new ValueError
without exception chaining, triggering Ruff B904; update the except to capture
the original KeyError (e.g., except KeyError as e) and re-raise the ValueError
with explicit chaining using "from e" so the traceback context for CPYTHON_ROOT
resolution (pathlib.Path(os.environ["CPYTHON_ROOT"]).expanduser().resolve()) is
preserved.
- Around line 462-470: The walrus in fn_as_opcode assigns oparg but never uses
it, triggering Flake8 F841; replace the walrus check in the loop (where arms is
built for each instr/name) with a plain containment check against self.metadata
for "oparg" (e.g., use if "oparg" in self.metadata.get(name, {}) or if
self.metadata.get(name, {}).get("oparg") without assignment) so the branch that
appends " { .. }" runs without creating an unused binding.
In `@crates/compiler-core/src/bytecode/instruction.rs`:
- Around line 480-498: The AnyOpcode::deopt const fn can be simplified: inside
each arm (Self::Real(opcode) and Self::Pseudo(opcode)) replace the if-let
Some/else None pattern with a match on opcode.deopt() that returns
Some(Self::Real(op)) / Some(Self::Pseudo(op)) or None, preserving const fn and
behavior and matching the style used by Opcode::deoptimize and the nearby
accessors.
In `@crates/stdlib/src/_opcode.rs`:
- Around line 70-145: Add direct unit tests targeting the _opcode module: call
the Python-exposed functions stack_effect (with jump true/false/None), is_valid,
has_arg, has_const, has_name, has_jump, has_free, has_local, and has_exc to
assert expected behavior for (1) a known valid opcode (e.g., LOAD_CONST or
POP_TOP) returns correct stack effects and true for the corresponding has_*
helpers, (2) an invalid opcode integer causes is_valid False and stack_effect
raises/returns the ValueError behavior observed in the implementation, and (3) a
specialized opcode (one whose deopt() is Some) triggers the ValueError path in
stack_effect; keep tests focused and small so future opcode generator changes
are caught without relying only on dis.dis snapshots.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 5f2bd321-730f-4d37-9cf6-fcd6aeaa987a
📒 Files selected for processing (12)
crates/codegen/src/compile.rscrates/codegen/src/ir.rscrates/compiler-core/generate.pycrates/compiler-core/opcode.tomlcrates/compiler-core/src/bytecode.rscrates/compiler-core/src/bytecode/instruction.rscrates/compiler-core/src/bytecode/instructions.rscrates/compiler-core/src/bytecode/oparg.rscrates/stdlib/src/_opcode.rscrates/vm/src/builtins/frame.rscrates/vm/src/frame.rsscripts/generate_opcode_metadata.py
| [PseudoOpcode.opcodes.StoreFastMaybeNull] | ||
| oparg = { name = "var_num", type = "oparg::NameIdx" } |
There was a problem hiding this comment.
Wrong oparg type for StoreFastMaybeNull.
The field is named var_num (consistent with StoreFast, LoadFast, etc.), but the type is set to oparg::NameIdx instead of oparg::VarNum. Every other var_num entry in this file (e.g., lines 64, 121, 124, 127, 133, 205) uses oparg::VarNum. As written, the generated PseudoInstruction::StoreFastMaybeNull { var_num: Arg<oparg::NameIdx> } would index into names, not localsplus, and lose the VarNum newtype's type safety (NameIdx is just pub type NameIdx = u32;).
🐛 Proposed fix
[PseudoOpcode.opcodes.StoreFastMaybeNull]
-oparg = { name = "var_num", type = "oparg::NameIdx" }
+oparg = { name = "var_num", type = "oparg::VarNum" }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/compiler-core/opcode.toml` around lines 269 - 270, The oparg type for
PseudoOpcode.opcodes.StoreFastMaybeNull is incorrect: change the var_num oparg
from oparg::NameIdx to oparg::VarNum so PseudoInstruction::StoreFastMaybeNull is
generated with var_num: Arg<oparg::VarNum> (matching other var_num uses like
StoreFast/LoadFast) to index localsplus and preserve the VarNum newtype's type
safety.
| /// Stack effect when the instruction takes its branch (jump=true). | ||
| /// | ||
| /// CPython equivalent: `stack_effect(opcode, oparg, jump=True)`. | ||
| /// For most instructions this equals the fallthrough effect. | ||
| /// Override for instructions where branch and fallthrough differ | ||
| /// (e.g. [`Self::ForIter`]: fallthrough = +1, branch = −1). | ||
| #[must_use] | ||
| pub fn stack_effect_jump(&self, oparg: u32) -> i32 { | ||
| self.stack_effect(oparg) | ||
| } |
There was a problem hiding this comment.
Doc-comment promises a ForIter override that isn't implemented.
The comment states "Override for instructions where branch and fallthrough differ (e.g. Self::ForIter: fallthrough = +1, branch = −1)", but the body just delegates to self.stack_effect(oparg), so ForIter will report +1 on both edges. Either:
- match
Self::ForIter(and any other affected ops) explicitly here, or - drop the misleading example from the doc-comment until the override lands.
Given the related call-site bug at line 122, callers depending on jump-edge effects will compound the miscount. Worth resolving before this is consumed by the codegen/CFG-flow logic that uses stack_effect_jump.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/compiler-core/src/bytecode/instruction.rs` around lines 49 - 58, The
doc-comment promises that stack_effect_jump differs for certain ops like
Self::ForIter, but stack_effect_jump currently just delegates to
stack_effect(oparg), so ForIter's branch effect isn't handled; update
stack_effect_jump to match on instruction variants (e.g., match self {
Self::ForIter => -1, /* other special cases */ ... , _ =>
self.stack_effect(oparg) }) so branch (jump=true) effects are correct, or
alternatively remove the misleading ForIter example from the doc-comment if you
do not want to implement overrides now; ensure callers that rely on
stack_effect_jump (the call-site noted in the review) will receive correct
jump-edge counts.
| #[must_use] | ||
| pub fn stack_effect_jump(&self, oparg: u32) -> i32 { | ||
| self.as_opcode().stack_effect(oparg) | ||
| } |
There was a problem hiding this comment.
Instruction::stack_effect_jump calls the wrong delegate.
This forwards to Opcode::stack_effect, not Opcode::stack_effect_jump. PseudoInstruction::stack_effect_jump (line 151-153) correctly forwards to stack_effect_jump. Today the two Opcode methods return identical values, but the moment Opcode::stack_effect_jump is overridden for branch-vs-fallthrough cases (e.g. ForIter, as called out in the doc-comment at lines 49-55), the Instruction path will silently produce wrong stack effects on the branch edge — exactly the contract this method is meant to model.
🐛 Proposed fix
#[must_use]
pub fn stack_effect_jump(&self, oparg: u32) -> i32 {
- self.as_opcode().stack_effect(oparg)
+ self.as_opcode().stack_effect_jump(oparg)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #[must_use] | |
| pub fn stack_effect_jump(&self, oparg: u32) -> i32 { | |
| self.as_opcode().stack_effect(oparg) | |
| } | |
| #[must_use] | |
| pub fn stack_effect_jump(&self, oparg: u32) -> i32 { | |
| self.as_opcode().stack_effect_jump(oparg) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/compiler-core/src/bytecode/instruction.rs` around lines 121 - 124,
Instruction::stack_effect_jump incorrectly delegates to Opcode::stack_effect
instead of Opcode::stack_effect_jump; update the method to call
self.as_opcode().stack_effect_jump(oparg) so it mirrors
PseudoInstruction::stack_effect_jump and correctly uses the branch-specific
stack effect behavior for opcodes like ForIter (use as_opcode(),
stack_effect_jump, and Instruction::stack_effect_jump to locate the change).
There was a problem hiding this comment.
feeling like this could be written using macro_rules and embed opcode.toml in the file. not confident though
There was a problem hiding this comment.
We can always revert this PR and do that
I intend to break this PR to multiple PRs.
Posting this as a draft for now so I can get early feedback on my general idea.
Even if this doesn't get merged, we can make use of some auto-generated parts here (maybe)
Summary by CodeRabbit