Fix silent/frozen interrupted turns: replay terminal status on reconnect #1619
Merged
🦋 Changeset detected. Latest commit: 1a02284. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
…ns aren't silent

When a chat turn errors or recovery is exhausted, the terminal `MSG_CHAT_RESPONSE` broadcast is transient. A client disconnected at that moment (e.g. during a deploy / WebSocket reconnect storm) misses it, and on reconnect `onConnect` previously replayed only the current messages with no terminal signal, so the turn looked frozen (no completed response, no error).

Think now persists a durable record of the last terminal turn (`_recordTerminalChatStatus`, key `cf:chat:last-terminal`) on error / exhausted recovery, and replays it on connect via `_buildIdleConnectMessages`. The record is cleared when a later turn completes (or aborts). Benign recovery skips (e.g. `conversation_changed`, where a newer turn owns the UI) are intentionally not recorded, so they don't false-banner.

Test: think-session.test.ts "replays a terminal error to a reconnecting client (hydration)": a failed turn is surfaced in the connect payload; a later completed turn clears it. Full think suite: 432.

Also adds a `fail`-triggered failing model to examples/deploy-churn so the frozen-on-reconnect behavior can be confirmed manually in a browser.

This is the A+C half of the silent-interrupted fix (surface terminal failures; keep benign skips silent). The B half (a live "recovering…" progress indicator, which needs a new client-facing signal + client rendering) is deferred.

Co-authored-by: Cursor <cursoragent@cursor.com>
59e412a to
1a02284
agents
@cloudflare/ai-chat
@cloudflare/codemode
hono-agents
@cloudflare/shell
@cloudflare/think
@cloudflare/voice
@cloudflare/worker-bundler
threepointone added a commit that referenced this pull request May 30, 2026
…_disabled skips

Closes two remaining "frozen turn" hydration gaps from the terminal-status work (#1619), folded in alongside the tool-result durability fix:

- The outer catch in `_handleChatRequest` broadcasts an error for failures that happen before the stream starts (e.g. message reconciliation/persist), but never recorded the terminal status. A client disconnected at that moment missed the transient broadcast and stayed frozen on reconnect. It now records terminal "error" before broadcasting, so `_buildIdleConnectMessages` replays it.
- A recovery skip from `onChatRecovery` returning `{ continue: false }` only marked the submission interrupted (which does not touch the WS transcript), so a WS chat client got no durable terminal signal at all. Unlike a benign `conversation_changed` skip (a newer turn already owns the UI), disabling recovery abandons the turn with no superseding turn, so it now records terminal status and broadcasts an error like exhaustion does. `conversation_changed` / `not_recoverable` skips remain silent.

Tests: a pre-stream failure and a `{ continue: false }` skip both surface a terminal error to a reconnecting client via the idle-connect replay.

Co-authored-by: Cursor <cursoragent@cursor.com>
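The rule this commit describes (which recovery outcomes leave a durable terminal error and which stay silent) can be sketched as a small predicate. This is only an illustration under stated assumptions: the `RecoveryOutcome` type, the `shouldRecordTerminalError` helper, and the `"hook_declined"` label (standing in for `onChatRecovery` returning `{ continue: false }`) are hypothetical names, not identifiers taken from the Think codebase.

```typescript
// Hypothetical sketch: all type and function names here are illustrative.
// "hook_declined" is an invented label for onChatRecovery returning
// { continue: false }; the real skip-reason string is not shown in the PR text.
type SkipReason = "conversation_changed" | "not_recoverable" | "hook_declined";

type RecoveryOutcome =
  | { kind: "exhausted" }
  | { kind: "skipped"; reason: SkipReason };

// Returns true when the outcome abandons the turn with nothing superseding it,
// so a reconnecting client needs a durable terminal error to avoid looking frozen.
function shouldRecordTerminalError(outcome: RecoveryOutcome): boolean {
  if (outcome.kind === "exhausted") {
    return true; // recovery gave up: surface the failure
  }
  // conversation_changed: a newer turn already owns the UI, stay silent.
  // not_recoverable: described in the commit as remaining silent.
  // hook_declined: the turn is abandoned with no superseding turn, so record it.
  return outcome.reason === "hook_declined";
}
```

Keeping the decision in one place makes the "surface failures, keep benign skips silent" split easy to test independently of the WebSocket plumbing.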
Summary
An interrupted/failed chat turn could leave the client "frozen" — no completed
response and no error — after a WebSocket reconnect. The terminal
`MSG_CHAT_RESPONSE { error: true }` broadcast (on a turn error or exhausted
recovery) is transient: a client disconnected at that moment (e.g. during a
deploy / reconnect storm) misses it, and on reconnect `onConnect` replayed only
the current messages with no terminal signal.
So the bug was not a missing broadcast — it was a hydration gap: terminal
status was never replayed to a reconnecting client.
Fix
Think now persists a durable record of the last terminal turn and replays it on
connect:
- `_recordTerminalChatStatus(status, requestId, body)`: on `error` (from `_fireResponseHook`) or exhausted recovery (`_exhaustChatRecovery`), persist `{ requestId, body }` under `cf:chat:last-terminal`; on `completed` / `aborted`, clear it (the persisted messages convey the outcome).
- `_buildIdleConnectMessages()`: on connect with no active stream, send the current messages plus a replay of the last terminal error (if any) as a terminal `MSG_CHAT_RESPONSE { done: true, error: true }`.

Benign recovery skips (e.g. `conversation_changed`, where a newer turn owns the
UI) are intentionally not recorded, so they don't false-banner. The terminal
message type is unchanged, so existing clients handle it as before.
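The record-then-replay flow can be sketched roughly as follows. This is a minimal illustration, not the Think implementation: the function names echo the PR, but the in-memory `storage` map (standing in for durable storage), the `ConnectMessage` shape, and the `"messages"` / `"chat-response"` type strings are all assumptions made for the example.

```typescript
// Hypothetical sketch: function names echo the PR, but every body, type,
// and wire-format field below is an assumption made for illustration.
type TerminalStatus = "completed" | "aborted" | "error";

interface TerminalRecord {
  requestId: string;
  body: string;
}

interface ConnectMessage {
  type: string;
  [field: string]: unknown;
}

// Storage key named in the PR description.
const LAST_TERMINAL_KEY = "cf:chat:last-terminal";

// Stand-in for durable storage (assumption: a synchronous key-value store).
const storage = new Map<string, TerminalRecord>();

function recordTerminalChatStatus(
  status: TerminalStatus,
  requestId: string,
  body: string
): void {
  if (status === "error") {
    // Persist the failure so it survives a client disconnect.
    storage.set(LAST_TERMINAL_KEY, { requestId, body });
  } else {
    // completed / aborted: the persisted messages already convey the outcome.
    storage.delete(LAST_TERMINAL_KEY);
  }
}

function buildIdleConnectMessages(messages: string[]): ConnectMessage[] {
  // Always send the current messages first.
  const out: ConnectMessage[] = [{ type: "messages", messages }];
  const last = storage.get(LAST_TERMINAL_KEY);
  if (last !== undefined) {
    // Replay the missed terminal error as a normal terminal response.
    out.push({
      type: "chat-response", // placeholder for the MSG_CHAT_RESPONSE wire value
      id: last.requestId,
      body: last.body,
      done: true,
      error: true
    });
  }
  return out;
}
```

Because the replay reuses the existing terminal message shape, a client that already handles a live error response needs no new code path to handle the replayed one.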
This is the A + C half of the silent-interrupted issue (surface terminal
failures; keep benign skips silent). The B half — a live "recovering…"
progress indicator — is a separate client-facing feature tracked in its own
issue (it needs a new chat-protocol signal + `useAgentChat`/SPA rendering).
Test
- `think-session.test.ts` → "replays a terminal error to a reconnecting client (hydration)": a failed turn is surfaced in the connect payload (`error: true`, matching `requestId`); a later completed turn clears it.
- `examples/deploy-churn` gains a `fail`-triggered failing model so the frozen-on-reconnect behavior can be confirmed manually in a browser (`npm start`, send a message containing "fail", reload).
Test plan
- `npm run check` (sherif + exports + oxfmt + oxlint + typecheck, 91 projects)
- `npm run test -w @cloudflare/think`: 432 pass (incl. new test)
- `npm run test:e2e -w @cloudflare/think`: 9 pass (exercises connect/recovery)
Related
Part of the deploy-churn recovery hardening: #1615, #1617, and #1618
(continuation assistant-prefill). Companion follow-up: the "recovering…"
progress indicator (B), filed separately.
Made with Cursor