fix(baileys): prevent healthy instances from being killed after stream:error 515 #2509
Open
octo-patch wants to merge 2 commits into EvolutionAPI:main
…m:error 515
When WhatsApp sends stream:error code=515 (Connection Replaced), Baileys handles the reconnect correctly and fires connection.update with state='open'. However, WhatsApp then sends a 401 (loggedOut) to clean up the old session slot, which Evolution API incorrectly treated as a real logout, killing the newly-connected healthy instance.
The fix tracks when a stream:error 515 node arrives via the CB:stream:error WebSocket event. If a loggedOut (401) close event fires within 30 seconds of a 515, it is treated as a transient reconnect rather than a real logout.
Fixes EvolutionAPI#2498
Contributor
Reviewer's Guide
Adds tracking for recent WhatsApp stream:error 515 events and adjusts Baileys reconnect logic so that transient 401 loggedOut events following a 515 trigger a reconnect instead of destroying the WhatsApp instance.
Sequence diagram for handling stream:error 515 followed by 401 loggedOut
sequenceDiagram
actor User
participant WhatsApp
participant BaileysClient
participant BaileysStartupService
participant EvolutionAPI
User->>WhatsApp: Connect device B
WhatsApp-->>BaileysClient: stream:error code=515
BaileysClient->>BaileysStartupService: ws event CB:stream:error
BaileysStartupService->>BaileysStartupService: _lastStream515At = Date.now()
BaileysClient->>BaileysStartupService: connection.update state=connecting
BaileysClient->>BaileysStartupService: connection.update state=open
BaileysStartupService->>EvolutionAPI: webhook connection.update state=open
WhatsApp-->>BaileysClient: 401 loggedOut (old session cleanup)
BaileysClient->>BaileysStartupService: connection.update connection=close statusCode=401
BaileysStartupService->>BaileysStartupService: recentStream515 = Date.now() - _lastStream515At < 30000
alt statusCode is loggedOut and recentStream515
BaileysStartupService->>BaileysStartupService: shouldReconnect = true
BaileysStartupService->>BaileysStartupService: connectToWhatsapp(phoneNumber)
else treated as real logout
BaileysStartupService->>EvolutionAPI: logout.instance
end
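The timeline in the sequence diagram can be replayed with a small simulation. This is only a sketch under stated assumptions: `Stream515Tracker`, its method names, and the injected clock are hypothetical stand-ins for illustration, not the actual `BaileysStartupService` code. Only the 30-second window and the `'515'` code come from the PR.

```typescript
// Hypothetical stand-in for the timestamp tracking described in the PR.
const REPLAY_GRACE_MS = 30_000; // 30-second window taken from the PR description

class Stream515Tracker {
  private lastStream515At = 0;
  private readonly now: () => number;

  // The clock is injected so the timeline can be replayed deterministically.
  constructor(now: () => number) {
    this.now = now;
  }

  // Mirrors the CB:stream:error handler: remember when a 515 arrived.
  onStreamError(code: string): void {
    if (code === '515') {
      this.lastStream515At = this.now();
    }
  }

  // Mirrors the close handler's decision for a 401 loggedOut close event.
  onLoggedOutClose(): 'reconnect' | 'logout' {
    const recentStream515 = this.now() - this.lastStream515At < REPLAY_GRACE_MS;
    return recentStream515 ? 'reconnect' : 'logout';
  }
}

// Replay: 515 arrives, then a 401 two seconds later, then another 401 a minute on.
let fakeClock = 1_000_000;
const tracker = new Stream515Tracker(() => fakeClock);

const beforeAny515 = tracker.onLoggedOutClose(); // no 515 seen yet
tracker.onStreamError('515');
fakeClock += 2_000;
const shortlyAfter515 = tracker.onLoggedOutClose(); // inside the window
fakeClock += 60_000;
const longAfter515 = tracker.onLoggedOutClose(); // window has expired
```

In this sketch a 401 with no preceding 515, or one arriving after the window has expired, is still treated as a real logout, which matches the `else` branch of the diagram.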
Class diagram for updated BaileysStartupService reconnect logic
classDiagram
class ChannelStartupService
class BaileysStartupService {
- boolean endSession
- Log logBaileys
- Promise~void~ eventProcessingQueue
- number _lastStream515At
+ connectToWhatsapp(phoneNumber: string) Promise~void~
+ createClient(number: string) Promise~void~
}
ChannelStartupService <|-- BaileysStartupService
class BaileysClient {
+ ws: WebSocket
+ onConnectionUpdate()
}
class WebSocket {
+ on(event: string, handler: function)
}
BaileysStartupService --> BaileysClient : client
BaileysClient --> WebSocket : ws
class DisconnectReason {
<<enumeration>>
loggedOut
forbidden
}
class ConnectionCloseHandler {
+ handleClose(statusCode: number)
- recentStream515: boolean
- shouldReconnect: boolean
}
BaileysStartupService ..> ConnectionCloseHandler : uses
ConnectionCloseHandler ..> DisconnectReason : uses
%% Logical behavior (no notes):
%% - When CB:stream:error code 515 received, BaileysStartupService sets _lastStream515At
%% - In connection close handler, recentStream515 is true if now - _lastStream515At < 30000
%% - shouldReconnect is true if statusCode not in noReconnect list or statusCode is loggedOut and recentStream515
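The boolean rule in the last comment can be written as a pure function, which makes its edge cases easy to check. This is a hedged sketch: `decideReconnect` and the local `Reason` object are hypothetical, and the numeric values 401 and 403 are local stand-ins for the `DisconnectReason.loggedOut` and `DisconnectReason.forbidden` entries named in the diagram.

```typescript
// Local stand-ins for the enum entries named in the class diagram.
const Reason = { loggedOut: 401, forbidden: 403 } as const;
const NO_RECONNECT_CODES: number[] = [Reason.loggedOut, Reason.forbidden, 402, 406];
const DECISION_GRACE_MS = 30_000;

// Hypothetical pure form of the close-handler decision.
function decideReconnect(statusCode: number, lastStream515At: number, now: number): boolean {
  const recentStream515 = now - lastStream515At < DECISION_GRACE_MS;
  return (
    !NO_RECONNECT_CODES.includes(statusCode) ||
    (statusCode === Reason.loggedOut && recentStream515)
  );
}

const t0 = 1_000_000;
const loggedOutAfter515 = decideReconnect(401, t0, t0 + 5_000);
const loggedOutNo515 = decideReconnect(401, 0, t0);
const forbiddenAfter515 = decideReconnect(403, t0, t0 + 5_000);
const atWindowEdge = decideReconnect(401, t0, t0 + 30_000);
```

Two properties of the rule are worth noting. The exception applies only to `loggedOut`, so a 403 that follows a 515 still blocks the reconnect. And because the comparison is strict, a 401 arriving exactly 30,000 ms after the 515 falls outside the window.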
Hey - I've found 3 issues, and left some high level feedback:
- Consider extracting the `30_000` window into a named constant (e.g., `STREAM_515_LOGGED_OUT_GRACE_MS`) to make the intent of the timeframe clearer and easier to tweak in the future.
- In the `CB:stream:error` handler, it may be safer to handle both string and numeric `code` values (e.g., `String(node?.attrs?.code) === '515'`) to avoid relying on a specific type from the underlying library.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider extracting the `30_000` window into a named constant (e.g., `STREAM_515_LOGGED_OUT_GRACE_MS`) to make the intent of the timeframe clearer and easier to tweak in the future.
- In the `CB:stream:error` handler, it may be safer to handle both string and numeric `code` values (e.g., `String(node?.attrs?.code) === '515'`) to avoid relying on a specific type from the underlying library.
## Individual Comments
### Comment 1
<location path="src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts" line_range="430" />
<code_context>
const statusCode = (lastDisconnect?.error as Boom)?.output?.statusCode;
const codesToNotReconnect = [DisconnectReason.loggedOut, DisconnectReason.forbidden, 402, 406];
- const shouldReconnect = !codesToNotReconnect.includes(statusCode);
+ const recentStream515 = Date.now() - this._lastStream515At < 30_000;
+ const shouldReconnect =
+ !codesToNotReconnect.includes(statusCode) ||
</code_context>
<issue_to_address>
**suggestion:** Consider extracting the 30s threshold into a named constant or config for clarity and tuning.
Using the literal `30_000` here makes future tuning harder as we learn more about `515` frequency in production. A named constant or config-backed value (like the cache TTLs) would document the intent and make it easier to adjust without changing code.
Suggested implementation:
```typescript
private logBaileys = this.configService.get<Log>('LOG').BAILEYS;
private eventProcessingQueue: Promise<void> = Promise.resolve();
private _lastStream515At = 0;
// Cache TTL constants (in seconds)
private readonly MESSAGE_CACHE_TTL_SECONDS = 5 * 60; // 5 minutes - avoid duplicate message processing
// Reconnect behavior thresholds (in milliseconds)
private readonly RECENT_STREAM_515_THRESHOLD_MS = 30_000; // 30 seconds - treat 515 as recent
```
```typescript
const statusCode = (lastDisconnect?.error as Boom)?.output?.statusCode;
const codesToNotReconnect = [DisconnectReason.loggedOut, DisconnectReason.forbidden, 402, 406];
const recentStream515 = Date.now() - this._lastStream515At < this.RECENT_STREAM_515_THRESHOLD_MS;
```
</issue_to_address>
### Comment 2
<location path="src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts" line_range="722" />
<code_context>
this.sendDataWebhook(Events.CALL, payload, true, ['websocket']);
});
+ this.client.ws.on('CB:stream:error', (node: any) => {
+ if (node?.attrs?.code === '515') {
+ this._lastStream515At = Date.now();
</code_context>
<issue_to_address>
**suggestion:** Tighten the `node` type instead of using `any` to improve safety and maintainability.
Because this handler is tied to `'CB:stream:error'`, the payload shape should be predictable. Defining a minimal type for `node` (e.g. `{ attrs?: { code?: string } }`) will improve type-checking and make future payload changes safer to handle.
Suggested implementation:
```typescript
type StreamErrorNode = { attrs?: { code?: string } };
this.client.ws.on('CB:stream:error', (node: StreamErrorNode) => {
```
```typescript
if (node.attrs?.code === '515') {
```
</issue_to_address>
### Comment 3
<location path="src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts" line_range="723" />
<code_context>
});
+ this.client.ws.on('CB:stream:error', (node: any) => {
+ if (node?.attrs?.code === '515') {
+ this._lastStream515At = Date.now();
+ }
</code_context>
<issue_to_address>
**suggestion:** Avoid the magic '515' string by introducing a named constant or enum entry.
Using the literal `'515'` here obscures what the code means. Please define a named constant or enum value instead so the intent is clear and future changes to the value are safer.
Suggested implementation:
```typescript
this.sendDataWebhook(Events.CALL, payload, true, ['websocket']);
});
const STREAM_ERROR_RECONNECT_CODE = '515';
this.client.ws.on('CB:stream:error', (node: any) => {
if (node?.attrs?.code === STREAM_ERROR_RECONNECT_CODE) {
this._lastStream515At = Date.now();
}
});
```
If you already have a shared constants file or an enum for WhatsApp/stream error codes in this codebase, it would be better to:
1. Move `STREAM_ERROR_RECONNECT_CODE` into that shared location (e.g., `whatsapp.constants.ts` or similar), and
2. Import and use it here instead of defining it inline within this method.
</issue_to_address>
Addresses sourcery-ai review feedback on the previous commit:
- Extract the 30 000ms reconnect grace window into a named class constant
STREAM_515_RECONNECT_GRACE_MS so future tuning is self-documenting
rather than a literal scattered through the close handler.
- Extract the magic '515' string into STREAM_ERROR_CODE_RECONNECT.
- Replace the loose 'node: any' on the 'CB:stream:error' handler with a
minimal structural type ({ attrs?: { code?: string | number } }) so
the payload shape is documented and type-checked.
- Compare the code via String(...) so a numeric 515 from the underlying
socket library still triggers the grace window — the original literal
'515' check would have silently broken on a type change.
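The type-tolerant comparison described in the last two bullets might look roughly like the following. This is a sketch, not the committed code: `STREAM_ERROR_CODE_RECONNECT` and the structural type are named in the commit message, while the type alias name `StreamErrorNode` (borrowed from the review suggestion) and the helper `isReconnectStreamError` are assumptions made for illustration.

```typescript
// Constant name from the commit message; the helper below is hypothetical.
const STREAM_ERROR_CODE_RECONNECT = '515';

// Minimal structural type: only the field the handler actually reads.
type StreamErrorNode = { attrs?: { code?: string | number } };

// True when the node carries code 515, whether it arrives as a string or a number.
function isReconnectStreamError(node: StreamErrorNode | undefined): boolean {
  return String(node?.attrs?.code) === STREAM_ERROR_CODE_RECONNECT;
}

const fromString = isReconnectStreamError({ attrs: { code: '515' } });
const fromNumber = isReconnectStreamError({ attrs: { code: 515 } });
const otherCode = isReconnectStreamError({ attrs: { code: '516' } });
const missingAttrs = isReconnectStreamError({});
```

A missing node or missing `code` stringifies to `'undefined'`, which never equals `'515'`, so the helper returns `false` rather than throwing.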
Author
Thanks for the review @sourcery-ai! I've pushed a follow-up commit (d803c9f) that addresses all three suggestions:
Fixes #2498
Problem
When WhatsApp sends `stream:error code=515` (Connection Replaced, normal multi-device protocol behavior), Baileys handles the reconnect correctly and fires `connection.update` with `state='open'`. However, WhatsApp then sends a `401 loggedOut` message to clean up the old session slot. Evolution API's close handler incorrectly treated this 401 as a real logout, killing the newly-connected healthy instance.
Sequence of events that triggered the bug:
1. `stream:error code=515`: WhatsApp notifies of connection replacement
2. `CONNECTED TO WHATSAPP ✓`: session is open, webhook fires `state='open'`
3. `401 loggedOut` to clean up old session slot
4. `logout.instance` → REMOVED ← bug
Solution
Track when a `stream:error 515` node arrives via the `CB:stream:error` WebSocket event using a class-level timestamp (`_lastStream515At`). In the connection close handler, if a `loggedOut (401)` event fires within 30 seconds of a 515, treat it as a transient reconnect side-effect rather than a real logout and call `connectToWhatsapp()` instead of destroying the instance.
Changes:
- `private _lastStream515At = 0;` class property to track when 515 last occurred
- `CB:stream:error` WebSocket handler in `createClient()` to record the timestamp when code is `'515'`
- `shouldReconnect` logic in the connection close handler to allow reconnect when a loggedOut follows within 30s of a 515
Testing
The fix can be validated by:
- When `stream:error code=515` appears followed by `CONNECTED TO WHATSAPP ✓`, the instance should remain connected instead of being removed
- No `WAMonitoringService: REMOVED/LOGOUT` should appear after a successful reconnect
Summary by Sourcery
Bug Fixes: