Reject format-string field index above Py_ssize_t::MAX#7708
Reject format-string field index above Py_ssize_t::MAX#7708youknowone merged 1 commit intoRustPython:mainfrom
Conversation
CPython rejects digit-only format-string field names that overflow Py_ssize_t at parse time with ValueError: Too many decimal digits in format string (Python/string_parser.c::get_integer). RustPython's FieldName::parse accepted any digit string usize::from_str could parse, producing IndexError or KeyError at lookup instead. Cap the parsed index at isize::MAX (Py_ssize_t::MAX on every platform) inside FieldName::parse. Also reject digits-only strings whose value overflows usize itself (caught when parse_usize returns None on an all-digit input). A new FormatParseError::TooManyDecimalDigits maps to the byte-identical CPython wording. Unmasks test_str.StrTest.test_format_huge_item_number.
📝 WalkthroughWalkthroughAdds a new Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [x] test: cpython/Lib/test/test_str.py (TODO: 14) dependencies: dependent tests: (no tests depend on str) Legend:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/common/src/format.rs (1)
1186-1204: Logic correctly mirrors CPython'sget_integerand handles both overflow regimes.The two-stage check is correct:
- If
parse_usizesucceeds, values in(isize::MAX, usize::MAX]are now rejected (this regime is otherwise reachable on 64-bit).- If
parse_usizefails on an all-ASCII-digit segment, it's ausizeoverflow and is also rejected.Non-digit inputs (
-5,+5with current Rust semantics, leading whitespace, non-UTF-8 wtf8) still fall through toFieldType::Keywordexactly as before, so no regression.Optional nit: consider adding unit tests next to
test_parse_field_name(line 1729) for the three boundaries —isize::MAX(Index),isize::MAX + 1(TooManyDecimalDigits), and ausize-overflowing digits-only string (TooManyDecimalDigits) — to lock the behavior in at this layer rather than relying solely on the unmaskedtest_format_huge_item_number.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/common/src/format.rs` around lines 1186 - 1204, The parsing branch correctly enforces two overflow regimes but lacks explicit unit tests locking this behavior; add tests near test_parse_field_name that assert parse_usize/field-name parsing returns FieldType::Index for isize::MAX, and returns FormatParseError::TooManyDecimalDigits for both isize::MAX + 1 and for a digits-only string that would overflow usize (to cover the second overflow branch that triggers the early error), referencing the existing parse logic around parse_usize, FieldType::Index, FormatParseError::TooManyDecimalDigits and FieldType::Keyword so future changes don’t regress this behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/common/src/format.rs`:
- Around line 1186-1204: The parsing branch correctly enforces two overflow
regimes but lacks explicit unit tests locking this behavior; add tests near
test_parse_field_name that assert parse_usize/field-name parsing returns
FieldType::Index for isize::MAX, and returns
FormatParseError::TooManyDecimalDigits for both isize::MAX + 1 and for a
digits-only string that would overflow usize (to cover the second overflow
branch that triggers the early error), referencing the existing parse logic
around parse_usize, FieldType::Index, FormatParseError::TooManyDecimalDigits and
FieldType::Keyword so future changes don’t regress this behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 591ff054-6b9c-455a-9866-62aa6b8606f6
⛔ Files ignored due to path filters (1)
Lib/test/test_str.pyis excluded by!Lib/**
📒 Files selected for processing (2)
crates/common/src/format.rscrates/vm/src/format.rs
🔥🔥🔥 |
Background
CPython rejects digit-only format-string field names that overflow
Py_ssize_tat parse time withValueError: Too many decimal digits in format string(Python/string_parser.c::get_integer). RustPython'sFieldName::parseaccepted any digit stringusize::from_strcould parse:usizebut aboveisize::MAX→ wrong exception at lookup (IndexError: tuple index out of range)usize::MAX→ fell through toKeyword, thenKeyErrorRepro
Fix
In
FieldName::parse, afterparse_usizereturns the digit-only index, reject values aboveisize::MAX(matching CPython'sPy_ssize_t::MAX). Whenparse_usizeitself fails on a digits-only string (overflowingusize), raise the same error. A newFormatParseError::TooManyDecimalDigitsmaps to the byte-identical CPython wording.The cap is
isize::MAX as usize, which equalsPy_ssize_t::MAXon every platform (i64 on 64-bit, i32 on 32-bit). Values up toisize::MAXparse successfully and only fail at lookup, matching CPython.Nested
[index]field-name parts ('{0[huge]}') have the same shape but no upstream regression test; left for a follow-up.Tests unmasked
test_str.StrTest.test_format_huge_item_numberVerification
CPython 3.14.4 byte-identical for boundary cases:
{2147483647}(i32::MAX): IndexError (parse OK){2147483648}(i32::MAX+1): IndexError (parse OK — not band-aid){9223372036854775807}(isize::MAX): IndexError (parse OK){9223372036854775808}(isize::MAX+1): ValueError{99999999999999999999...}(usize overflow): ValueErrorNormal field names unchanged:
{0},{abc},{0[1]},{abc1}(digit in keyword),{007}(leading zeros),{0.attr}, dynamic'{0:{1}}'.format('x', 5).No regressions across
test_str,test_format,test_fstring.Summary by CodeRabbit