Skip to content

Reject format-string field index above Py_ssize_t::MAX#7708

Merged
youknowone merged 1 commit intoRustPython:mainfrom
changjoon-park:fix-format-huge-field-index
Apr 28, 2026
Merged

Reject format-string field index above Py_ssize_t::MAX#7708
youknowone merged 1 commit intoRustPython:mainfrom
changjoon-park:fix-format-huge-field-index

Conversation

@changjoon-park
Copy link
Copy Markdown
Contributor

@changjoon-park changjoon-park commented Apr 27, 2026

Background

CPython rejects digit-only format-string field names that overflow Py_ssize_t at parse time with ValueError: Too many decimal digits in format string (Python/string_parser.c::get_integer). RustPython's FieldName::parse accepted any digit string usize::from_str could parse:

  • Values within usize but above isize::MAX → wrong exception at lookup (IndexError: tuple index out of range)
  • Values above usize::MAX → fell through to Keyword, then KeyError

Repro

import sys
fs = '{{{}:.6f}}'.format(sys.maxsize + 1)  # '{9223372036854775808:.6f}'

fs.format(2.34)
# CPython 3.14:    ValueError: Too many decimal digits in format string
# RustPython:      IndexError: tuple index out of range  (before this PR)

Fix

In FieldName::parse, after parse_usize returns the digit-only index, reject values above isize::MAX (matching CPython's Py_ssize_t::MAX). When parse_usize itself fails on a digits-only string (overflowing usize), raise the same error. A new FormatParseError::TooManyDecimalDigits maps to the byte-identical CPython wording.

The cap is isize::MAX as usize, which equals Py_ssize_t::MAX on every platform (i64 on 64-bit, i32 on 32-bit). Values up to isize::MAX parse successfully and only fail at lookup, matching CPython.

Nested [index] field-name parts ('{0[huge]}') have the same shape but no upstream regression test; left for a follow-up.

Tests unmasked

  • test_str.StrTest.test_format_huge_item_number

Verification

CPython 3.14.4 byte-identical for boundary cases:

  • {2147483647} (i32::MAX): IndexError (parse OK)
  • {2147483648} (i32::MAX+1): IndexError (parse OK — not band-aid)
  • {9223372036854775807} (isize::MAX): IndexError (parse OK)
  • {9223372036854775808} (isize::MAX+1): ValueError
  • {99999999999999999999...} (usize overflow): ValueError

Normal field names unchanged: {0}, {abc}, {0[1]}, {abc1} (digit in keyword), {007} (leading zeros), {0.attr}, dynamic '{0:{1}}'.format('x', 5).

No regressions across test_str, test_format, test_fstring.

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced validation for numeric field names in format strings to detect values exceeding parsing limits and provide more precise error messaging when invalid format specifications are encountered.

CPython rejects digit-only format-string field names that overflow
Py_ssize_t at parse time with ValueError: Too many decimal digits in
format string (Python/string_parser.c::get_integer). RustPython's
FieldName::parse accepted any digit string usize::from_str could parse,
producing IndexError or KeyError at lookup instead.

Cap the parsed index at isize::MAX (Py_ssize_t::MAX on every platform)
inside FieldName::parse. Also reject digits-only strings whose value
overflows usize itself (caught when parse_usize returns None on an
all-digit input). A new FormatParseError::TooManyDecimalDigits maps to
the byte-identical CPython wording.

Unmasks test_str.StrTest.test_format_huge_item_number.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 27, 2026

📝 Walkthrough

Walkthrough

Adds a new TooManyDecimalDigits error variant to FormatParseError enum. Updates FieldName::parse in the common format module to detect when digit-only field segments exceed integer limits and report them with this new error type. The VM layer's exception conversion is extended to map this error variant to a ValueError.

Changes

Cohort / File(s) Summary
Format parsing error handling
crates/common/src/format.rs
Introduces TooManyDecimalDigits enum variant to FormatParseError. Updates FieldName::parse logic to validate that parsed numeric values don't exceed isize::MAX and to handle usize parsing overflow for all-digit segments as TooManyDecimalDigits instead of falling back to Keyword type.
VM exception conversion
crates/vm/src/format.rs
Extends FormatParseError to Python exception conversion to handle the new TooManyDecimalDigits variant, returning a ValueError with an appropriate error message.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • ShaharNaveh
  • youknowone

Poem

🐰 Whiskers twitching with delight,
The digits dance beyond their height,
Too many zeros, now they say,
"You've overflowed in every way!"
A new error hops into place,

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly and specifically describes the main change: rejecting format-string field indices that exceed Py_ssize_t::MAX, which is the core objective of this changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] test: cpython/Lib/test/test_str.py (TODO: 14)
[x] test: cpython/Lib/test/test_fstring.py (TODO: 19)
[x] test: cpython/Lib/test/test_string_literals.py (TODO: 4)

dependencies:

dependent tests: (no tests depend on str)

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/common/src/format.rs (1)

1186-1204: Logic correctly mirrors CPython's get_integer and handles both overflow regimes.

The two-stage check is correct:

  • If parse_usize succeeds, values in (isize::MAX, usize::MAX] are now rejected (this regime is otherwise reachable on 64-bit).
  • If parse_usize fails on an all-ASCII-digit segment, it's a usize overflow and is also rejected.

Non-digit inputs (-5, +5 with current Rust semantics, leading whitespace, non-UTF-8 wtf8) still fall through to FieldType::Keyword exactly as before, so no regression.

Optional nit: consider adding unit tests next to test_parse_field_name (line 1729) for the three boundaries — isize::MAX (Index), isize::MAX + 1 (TooManyDecimalDigits), and a usize-overflowing digits-only string (TooManyDecimalDigits) — to lock the behavior in at this layer rather than relying solely on the unmasked test_format_huge_item_number.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/common/src/format.rs` around lines 1186 - 1204, The parsing branch
correctly enforces two overflow regimes but lacks explicit unit tests locking
this behavior; add tests near test_parse_field_name that assert
parse_usize/field-name parsing returns FieldType::Index for isize::MAX, and
returns FormatParseError::TooManyDecimalDigits for both isize::MAX + 1 and for a
digits-only string that would overflow usize (to cover the second overflow
branch that triggers the early error), referencing the existing parse logic
around parse_usize, FieldType::Index, FormatParseError::TooManyDecimalDigits and
FieldType::Keyword so future changes don’t regress this behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/common/src/format.rs`:
- Around line 1186-1204: The parsing branch correctly enforces two overflow
regimes but lacks explicit unit tests locking this behavior; add tests near
test_parse_field_name that assert parse_usize/field-name parsing returns
FieldType::Index for isize::MAX, and returns
FormatParseError::TooManyDecimalDigits for both isize::MAX + 1 and for a
digits-only string that would overflow usize (to cover the second overflow
branch that triggers the early error), referencing the existing parse logic
around parse_usize, FieldType::Index, FormatParseError::TooManyDecimalDigits and
FieldType::Keyword so future changes don’t regress this behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 591ff054-6b9c-455a-9866-62aa6b8606f6

📥 Commits

Reviewing files that changed from the base of the PR and between 1d42ee5 and 455519f.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_str.py is excluded by !Lib/**
📒 Files selected for processing (2)
  • crates/common/src/format.rs
  • crates/vm/src/format.rs

Copy link
Copy Markdown
Contributor

@ShaharNaveh ShaharNaveh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

@changjoon-park
Copy link
Copy Markdown
Contributor Author

🔥

🔥🔥🔥

@youknowone youknowone merged commit 68aece5 into RustPython:main Apr 28, 2026
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants