Skip to content

gh-151424: Fix impossible stack traces in RemoteUnwinder(..., cache_frames=True) by copying chunks on cache miss#151426

Open
maurycy wants to merge 3 commits into
python:mainfrom
maurycy:gh-151424
Open

gh-151424: Fix impossible stack traces in RemoteUnwinder(..., cache_frames=True) by copying chunks on cache miss#151426
maurycy wants to merge 3 commits into
python:mainfrom
maurycy:gh-151424

Conversation

@maurycy

@maurycy maurycy commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

The issue is caused by memory walks with cache_frames=True when there's a cache miss.

If I understand the whole machniery correctly.

Basically, we don't copy chunks in cache mode:

// In cache mode, copying stack chunks is more expensive than direct memory reads
if (!unwinder->cache_frames) {
if (copy_stack_chunks(unwinder, *current_tstate, &chunks) < 0) {
set_exception_cause(unwinder, PyExc_RuntimeError, "Failed to copy stack chunks");
goto error;
}
}

So there are never any chunks in the cache mode, not a problem with full cache hits:

if (ctx->chunks && ctx->chunks->count > 0) {
if (parse_frame_from_chunks(unwinder, &frame, frame_addr, &next_frame_addr, &stackpointer, ctx->chunks) == 0) {
goto parsed_frame;
}
PyErr_Clear();
}
{
uintptr_t address_of_code_object = 0;
int parse_result;
if (ctx->prefetch.frame && ctx->prefetch.frame_addr == frame_addr) {
parse_result = parse_frame_buffer(
unwinder, &frame, ctx->prefetch.frame,
&address_of_code_object, &next_frame_addr);
}
else {
parse_result = parse_frame_object(
unwinder, &frame, frame_addr,
&address_of_code_object, &next_frame_addr);
}
if (parse_result < 0) {
set_exception_cause(unwinder, PyExc_RuntimeError, "Failed to parse frame object in chain");
return -1;
}
}

But when there's no cache hit:

int full_hit = try_full_cache_hit(unwinder, ctx, thread_id);
if (full_hit != 0) {
return full_hit < 0 ? -1 : 0;
}
Py_ssize_t frames_before = PyList_GET_SIZE(ctx->frame_info);
if (process_frame_chain(unwinder, ctx) < 0) {

it falls back to many memory reads:

while ((void*)frame_addr != NULL) {

char frame[SIZEOF_INTERP_FRAME];
Py_ssize_t bytes_read = _Py_RemoteDebug_ReadRemoteMemory(
&unwinder->handle,
address,
SIZEOF_INTERP_FRAME,
frame
);

which isn't very atomic... the target mutates between reads and we're lost.

That's definitely the cause. Is there a cleaner fix ?

One side-note - let's not tighten the validation yet?

Generally, it's a statistical profiler so we should have some statistical tests for correctness, too.

Proof

Reusing

def leaf_a(): return sum(range(50))
def leaf_b(): return sum(range(50))
def hot_a(): return leaf_a()
def hot_b(): return leaf_b()
while True:
    hot_a(); hot_b();

from #151424 we get

2026-06-13T03:14:37.706845000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (gh-151424 141f623*?) % sudo ./python.exe -m profiling.sampling run --collapsed -o /tmp/stacks.txt -d 10 repro.py
Captured 10,000 samples in 10.00 seconds
Sample rate: 999.99 samples/sec
Error rate: 14.91
Collapsed stack output written to /tmp/stacks.txt
2026-06-13T03:14:51.568490000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (gh-151424 141f623*?) % cat /tmp/stacks.txt 
tid:13199240;<frozen runpy>:_run_module_as_main:201;<frozen runpy>:_run_code:87;repro.py:<module>:6;repro.py:hot_a:3;repro.py:leaf_a:1 3324
tid:13199240;<frozen runpy>:_run_module_as_main:201;<frozen runpy>:_run_code:87;repro.py:<module>:6;repro.py:hot_b:4;repro.py:leaf_b:2 3269
tid:13199240;<frozen runpy>:_run_module_as_main:201;<frozen runpy>:_run_code:87;repro.py:<module>:6 1234
tid:13199240;<frozen runpy>:_run_module_as_main:201;<frozen runpy>:_run_code:87;repro.py:<module>:6;repro.py:hot_a:3 341
tid:13199240;<frozen runpy>:_run_module_as_main:201;<frozen runpy>:_run_code:87;repro.py:<module>:6;repro.py:hot_b:4 324
tid:13199240;<frozen runpy>:_run_module_as_main:201;<frozen runpy>:_run_code:87;repro.py:<module>:6;repro.py:hot_b:4;repro.py:leaf_b:-1 8
tid:13199240;<frozen runpy>:_run_module_as_main:201;<frozen runpy>:_run_code:87;repro.py:<module>:6;repro.py:hot_a:3;repro.py:leaf_a:-1 6
tid:13199240;<frozen runpy>:_run_module_as_main:201;<frozen runpy>:_run_code:87;repro.py:<module>:6;repro.py:hot_a:3;repro.py:leaf_b:2 1
tid:13199240;<frozen runpy>:_run_module_as_main:201;<frozen runpy>:_run_code:87;repro.py:<module>:6;repro.py:leaf_a:1 1
tid:13199240;<frozen runpy>:_run_module_as_main:201;<frozen runpy>:_run_code:87;repro.py:<module>:6;repro.py:leaf_b:2 1
2026-06-13T03:15:14.324823000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (gh-151424 141f623*?) % 

The impossible rate went to just 0,20%.

@maurycy maurycy changed the title gh-151424: Snapshot chunks on frame cache miss in RemoteUnwinder(..., cache_frames=True) gh-151424: Fix impossible stack traces in RemoteUnwinder(..., cache_frames=True) by copying chunks on cache miss Jun 13, 2026
@maurycy maurycy marked this pull request as ready for review June 13, 2026 02:01
@maurycy maurycy requested a review from pablogsal as a code owner June 13, 2026 02:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant