
Fix silent/frozen interrupted turns: replay terminal status on reconnect #1619

Merged
threepointone merged 1 commit into main from fix/recovery-status-hydration
May 30, 2026

@threepointone
Contributor

Summary

An interrupted or failed chat turn could leave the client "frozen" after a
WebSocket reconnect, with neither a completed response nor an error. The
terminal MSG_CHAT_RESPONSE { error: true } broadcast (sent on a turn error or
exhausted recovery) is transient: a client disconnected at that moment (e.g.
during a deploy or reconnect storm) misses it, and on reconnect onConnect
replayed only the current messages, with no terminal signal.

So the bug was not a missing broadcast but a hydration gap: terminal status
was never replayed to a reconnecting client.

Fix

Think now persists a durable record of the last terminal turn and replays it on
connect:

  • _recordTerminalChatStatus(status, requestId, body) — on error (from
    _fireResponseHook) or exhausted recovery (_exhaustChatRecovery), persist
    { requestId, body } under cf:chat:last-terminal; on completed/aborted,
    clear it (the persisted messages convey the outcome).
  • _buildIdleConnectMessages() — on connect with no active stream, send the
    current messages plus a replay of the last terminal error (if any) as a
    terminal MSG_CHAT_RESPONSE { done: true, error: true }.
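A minimal sketch of the record/clear/replay flow described above. It assumes an in-memory Map in place of the Durable Object's storage, a plain string body, and a hypothetical frame shape (type, id, body, done, error). Only the roles of the two methods and the cf:chat:last-terminal key come from this PR; the names TerminalStatusStore, MessagesFrame, and ConnectFrame are illustrative, and it is an assumption that exhausted recovery is recorded with status "error".

```typescript
// Hypothetical sketch: a Map stands in for Durable Object storage, and the
// frame shapes are assumptions, not Think's actual wire format.
type TerminalStatus = "completed" | "aborted" | "error";

interface TerminalRecord {
  requestId: string;
  body: string;
}

interface ChatResponseFrame {
  type: "MSG_CHAT_RESPONSE"; // placeholder for the real message-type constant
  id: string;
  body: string;
  done: boolean;
  error: boolean;
}

// Hypothetical frame carrying the current persisted messages.
interface MessagesFrame {
  type: "messages";
  messages: string[];
}

type ConnectFrame = MessagesFrame | ChatResponseFrame;

const LAST_TERMINAL_KEY = "cf:chat:last-terminal";

class TerminalStatusStore {
  private readonly storage = new Map<string, TerminalRecord>();

  // Callers pass "error" for turn errors and (by assumption) exhausted recovery;
  // completed/aborted clear the record because the persisted messages convey them.
  recordTerminalChatStatus(status: TerminalStatus, requestId: string, body: string): void {
    if (status === "error") {
      this.storage.set(LAST_TERMINAL_KEY, { requestId, body });
    } else {
      this.storage.delete(LAST_TERMINAL_KEY);
    }
  }

  // Used on connect when no stream is active: the current messages, then a
  // terminal replay of the last recorded error, if any.
  buildIdleConnectMessages(messages: string[]): ConnectFrame[] {
    const frames: ConnectFrame[] = [{ type: "messages", messages }];
    const last = this.storage.get(LAST_TERMINAL_KEY);
    if (last !== undefined) {
      frames.push({ type: "MSG_CHAT_RESPONSE", id: last.requestId, body: last.body, done: true, error: true });
    }
    return frames;
  }
}

// Usage mirroring the hydration test: a failed turn is replayed on connect,
// and a later completed turn clears it.
const store = new TerminalStatusStore();
store.recordTerminalChatStatus("error", "req-1", "model call failed");
console.log(store.buildIdleConnectMessages(["hello"]).length); // 2
store.recordTerminalChatStatus("completed", "req-2", "");
console.log(store.buildIdleConnectMessages(["hello", "hi there"]).length); // 1
```

Keeping a single "last terminal" record, rather than a history, caps the replay at one extra frame per connect in this sketch.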

Benign recovery skips (e.g. conversation_changed, where a newer turn owns the
UI) are intentionally not recorded, so they don't false-banner. The terminal
message type is unchanged, so existing clients handle it as before.
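Why an unchanged message type is enough can be sketched with a hypothetical client-side handler. The frame and state shapes below are assumptions, not the useAgentChat internals; the point is only that a handler keyed on done and error cannot tell a replayed terminal frame from a live one.

```typescript
// Hypothetical client-side sketch: the frame and state shapes are assumptions.
interface ChatResponseFrame {
  type: "MSG_CHAT_RESPONSE";
  id: string;
  body: string;
  done: boolean;
  error: boolean;
}

interface ClientChatState {
  status: "idle" | "streaming" | "error";
  errorRequestId?: string;
}

// The handler never asks whether a frame is live or replayed on connect,
// so a replayed terminal error takes the same path as a live one.
function onChatResponse(state: ClientChatState, frame: ChatResponseFrame): ClientChatState {
  if (frame.done && frame.error) {
    return { status: "error", errorRequestId: frame.id };
  }
  if (frame.done) {
    return { status: "idle" };
  }
  return { ...state, status: "streaming" };
}

// A client that reloaded after the failure starts idle and still lands in "error".
const replayed = onChatResponse(
  { status: "idle" },
  { type: "MSG_CHAT_RESPONSE", id: "req-1", body: "", done: true, error: true },
);
console.log(replayed.status); // error
```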

This is the A + C half of the silent-interrupted issue (surface terminal
failures; keep benign skips silent). The B half — a live "recovering…"
progress indicator — is a separate client-facing feature tracked in its own
issue (it needs a new chat-protocol signal + useAgentChat/SPA rendering).

Test

think-session.test.ts → "replays a terminal error to a reconnecting client
(hydration)": a failed turn is surfaced in the connect payload (error: true,
matching requestId); a later completed turn clears it.

examples/deploy-churn gains a model that fails whenever a message contains
"fail", so the frozen-on-reconnect behavior can be confirmed manually in a
browser (npm start, send a message containing "fail", then reload).
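A fail-triggered model of this kind might look like the wrapper below. This is a sketch, not the example's code: ChatMessage, the Generate signature, withFailTrigger, and the choice to inspect only the most recent user message are all assumptions.

```typescript
// Hypothetical sketch: ChatMessage and the Generate signature are
// assumptions, not the actual examples/deploy-churn code.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

type Generate = (messages: ChatMessage[]) => Promise<string>;

// True when the most recent user message contains "fail".
function shouldSimulateFailure(messages: ChatMessage[]): boolean {
  const lastUser = [...messages].reverse().find((m) => m.role === "user");
  return lastUser !== undefined && lastUser.content.includes("fail");
}

// Wraps a real model and throws instead of answering when triggered, which
// drives the turn down the error path that records a terminal status.
function withFailTrigger(inner: Generate): Generate {
  return async (messages) => {
    if (shouldSimulateFailure(messages)) {
      throw new Error('Simulated model failure: message contained "fail"');
    }
    return inner(messages);
  };
}
```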

Test plan

  • npm run check (sherif + exports + oxfmt + oxlint + typecheck — 91 projects)
  • npm run test -w @cloudflare/think — 432 pass (incl. new test)
  • npm run test:e2e -w @cloudflare/think — 9 pass (exercises connect/recovery)
  • CI green

Related

Part of the deploy-churn recovery hardening: #1615, #1617, and #1618
(continuation assistant-prefill). Companion follow-up: the "recovering…"
progress indicator (B), filed separately.

Made with Cursor

@changeset-bot

changeset-bot Bot commented May 29, 2026

🦋 Changeset detected

Latest commit: 1a02284

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@cloudflare/think Patch


…ns aren't silent

When a chat turn errors or recovery is exhausted, the terminal
`MSG_CHAT_RESPONSE` broadcast is transient. A client disconnected at that
moment (e.g. during a deploy / WebSocket reconnect storm) misses it, and on
reconnect `onConnect` previously replayed only the current messages with no
terminal signal — so the turn looked frozen (no completed response, no error).

Think now persists a durable record of the last terminal turn
(`_recordTerminalChatStatus`, key `cf:chat:last-terminal`) on error / exhausted
recovery, and replays it on connect via `_buildIdleConnectMessages`. The record
is cleared when a later turn completes (or aborts). Benign recovery skips
(e.g. `conversation_changed`, where a newer turn owns the UI) are intentionally
not recorded, so they don't false-banner.

Test: think-session.test.ts "replays a terminal error to a reconnecting client
(hydration)": a failed turn is surfaced in the connect payload; a later
completed turn clears it. Full think suite: 432 pass.

Also adds a `fail`-triggered failing model to examples/deploy-churn so the
frozen-on-reconnect behavior can be confirmed manually in a browser.

This is the A+C half of the silent-interrupted fix (surface terminal failures;
keep benign skips silent). The B half (a live "recovering…" progress indicator,
which needs a new client-facing signal + client rendering) is deferred.

Co-authored-by: Cursor <cursoragent@cursor.com>
@threepointone threepointone force-pushed the fix/recovery-status-hydration branch from 59e412a to 1a02284 May 29, 2026 23:56
@pkg-pr-new

pkg-pr-new Bot commented May 30, 2026


agents

npm i https://pkg.pr.new/agents@1619

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1619

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1619

hono-agents

npm i https://pkg.pr.new/hono-agents@1619

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1619

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1619

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1619

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1619

commit: 1a02284

@threepointone threepointone merged commit 6d1a8f9 into main May 30, 2026
4 checks passed
@threepointone threepointone deleted the fix/recovery-status-hydration branch May 30, 2026 07:32
@github-actions github-actions Bot mentioned this pull request May 30, 2026
threepointone added a commit that referenced this pull request May 30, 2026
…_disabled skips

Closes two remaining "frozen turn" hydration gaps from the terminal-status work
(#1619), folded in alongside the tool-result durability fix:

- The outer catch in `_handleChatRequest` broadcasts an error for failures that
  happen before the stream starts (e.g. message reconciliation/persist), but
  never recorded the terminal status. A client disconnected at that moment
  missed the transient broadcast and stayed frozen on reconnect. It now records
  terminal "error" before broadcasting, so `_buildIdleConnectMessages` replays it.

- A recovery skip from `onChatRecovery` returning `{ continue: false }` only
  marked the submission interrupted (which does not touch the WS transcript), so
  a WS chat client got no durable terminal signal at all. Unlike a benign
  `conversation_changed` skip (a newer turn already owns the UI), disabling
  recovery abandons the turn with no superseding turn, so it now records terminal
  status and broadcasts an error like exhaustion does. `conversation_changed` /
  `not_recoverable` skips remain silent.
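The two rules above can be sketched as follows. The RecoveryOutcome union is hypothetical (its "disabled" variant stands in for onChatRecovery returning { continue: false }), and handleChatRequest, with injected record and broadcast callbacks, is an illustrative stand-in for _handleChatRequest, _recordTerminalChatStatus, and the real broadcast. What the sketch shows is the policy and the ordering: the durable record is written before the transient broadcast, so a client that misses the broadcast still hydrates the error on connect.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not Think's API.
type TerminalStatus = "completed" | "aborted" | "error";

type RecoveryOutcome =
  | { kind: "exhausted" }
  | { kind: "disabled" } // stands in for onChatRecovery returning { continue: false }
  | { kind: "skipped"; reason: "conversation_changed" | "not_recoverable" };

// Exhaustion and disabled recovery abandon the turn with no superseding turn,
// so they need a durable terminal error. conversation_changed (a newer turn
// owns the UI) and not_recoverable skips stay silent.
function shouldRecordTerminalError(outcome: RecoveryOutcome): boolean {
  return outcome.kind !== "skipped";
}

interface ErrorFrame {
  id: string;
  body: string;
  done: true;
  error: true;
}

// Outer catch for failures before the stream starts (e.g. reconciliation or
// persist): record the terminal status first, then broadcast.
async function handleChatRequest(
  requestId: string,
  run: () => Promise<void>,
  record: (status: TerminalStatus, requestId: string, body: string) => void,
  broadcast: (frame: ErrorFrame) => void,
): Promise<void> {
  try {
    await run();
  } catch (err) {
    const body = err instanceof Error ? err.message : String(err);
    record("error", requestId, body);
    broadcast({ id: requestId, body, done: true, error: true });
  }
}
```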

Tests: a pre-stream failure and a `{ continue: false }` skip both surface a
terminal error to a reconnecting client via the idle-connect replay.

Co-authored-by: Cursor <cursoragent@cursor.com>