Zombie state

The bot's process is alive and heartbeats are landing, but its Discord gateway connection is dead. The bot looks healthy from the outside but isn't actually working.

The most insidious failure mode. Your bot's process is running. Heartbeats are arriving at CloudLine. RAM and CPU look normal. But in Discord, nothing works — slash commands don't respond, presence shows offline, the bot doesn't see new messages.

The bot is in a zombie state. Detecting it without CloudLine is hard; recovering from it is straightforward once you know it's happening.

What's actually happening

A Discord bot keeps two separate connections to Discord:

The gateway WebSocket — receives events (messages, interactions, presence updates). This is the "is the bot active?" connection.
The REST API — sends responses (replies, message edits). This is HTTP, opened per request.

A zombie state is when the gateway has gone stale (the underlying TCP connection is dead, or the library lost track of it, or the server stopped acking heartbeats) but the bot's process is still happily looping. The REST API still works — anything you POST to it succeeds — but no events arrive, so there's nothing for the bot to react to.

How CloudLine detects it

The SDK reports two fields on every heartbeat:

gateway_ok — boolean, true when the gateway is currently READY and healthy.
gateway_stale_sec — how many seconds the gateway has been continuously unhealthy.

When gateway_ok is false and gateway_stale_sec keeps climbing, your bot is a zombie. The dashboard surfaces this as the Gateway tile flipping from "OK" to "Zombie".
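
The rule above can be sketched as a small check over consecutive heartbeat payloads. This is an illustration only, not CloudLine's actual implementation: the field names gateway_ok and gateway_stale_sec come from the SDK description above, while the helper name looksLikeZombie and the array-of-heartbeats shape are assumptions.

```javascript
// Hypothetical helper, not part of the CloudLine SDK.
// Takes heartbeats ordered oldest to newest, each carrying the two
// gateway fields described above.
function looksLikeZombie(heartbeats) {
  if (heartbeats.length < 2) return false // need at least two samples to see a trend

  for (let i = 0; i < heartbeats.length; i++) {
    const hb = heartbeats[i]
    if (hb.gateway_ok) return false // any healthy sample breaks the streak
    if (i > 0 && hb.gateway_stale_sec <= heartbeats[i - 1].gateway_stale_sec) {
      return false // staleness reset or stopped growing, so the gateway recovered in between
    }
  }
  return true
}
```

A single unhealthy heartbeat is not treated as a zombie here; the check only returns true when every sample is unhealthy and the staleness counter keeps growing.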

NOTE

Raw-fetch (non-SDK) heartbeats don't include gateway health — they can't introspect the gateway state. The SDK is the only way to get zombie detection.

How to set up the alert

On the bot's Alerts tab, the Zombie alert is enabled by default with a 60-second sustained window. That means once the gateway has been bad for 60 continuous seconds, CloudLine fires the alert to your configured channels.

Tune the sustained window if you have noisy gateways (e.g. unstable hosting that drops every few minutes during peak hours):

Sustained window	Trade-off
10 s	Tightest. Will fire on transient reconnects. Noisy.
60 s (default)	Filters reconnect blips, catches real zombies within a minute.
300 s (5 min)	Very quiet. Only fires on persistent zombies — but you miss medium-length stalls.
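
To make the trade-off concrete, here is a minimal sketch of how a sustained window filters short blips. It assumes the alert condition is simply gateway_stale_sec reaching the window length; CloudLine's real evaluation may differ, and shouldFireZombieAlert is a hypothetical name.

```javascript
// Hypothetical model of a sustained-window check (not CloudLine's code).
function shouldFireZombieAlert(heartbeat, windowSec = 60) {
  return heartbeat.gateway_ok === false && heartbeat.gateway_stale_sec >= windowSec
}

// A 25-second reconnect blip and a 90-second stall, checked against each window.
const blip = { gateway_ok: false, gateway_stale_sec: 25 }
const stall = { gateway_ok: false, gateway_stale_sec: 90 }

for (const windowSec of [10, 60, 300]) {
  console.log(
    `${windowSec} s window:`,
    'blip fires =', shouldFireZombieAlert(blip, windowSec),
    'stall fires =', shouldFireZombieAlert(stall, windowSec),
  )
}
```

Under this model the 10 s window fires on both, the 60 s window fires only on the stall, and the 300 s window fires on neither.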

Common causes

In rough order of frequency:

1. Missing or wrong intents

The most common cause. Your bot connects fine, but the intents you declared don't match what your bot needs.

Wrong — bot can't see message events

const client = new Client({ intents: [GatewayIntentBits.Guilds] })
// then... bot listens for messages → no message events ever arrive, and nothing is logged

If your bot needs GuildMessages, MessageContent, or GuildPresences, declare them. Without the right intents, Discord simply never delivers the matching events, and requesting a privileged intent that isn't enabled in the portal gets the connection closed (close code 4014, disallowed intents).

Check your bot's Discord Developer Portal page → Bot → Privileged Gateway Intents — toggles for MESSAGE CONTENT, PRESENCE, and SERVER MEMBERS. Each must be both toggled on in the portal and declared in your code.
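
One way to catch a mismatch early is to compare the intents you declare against the events you register handlers for. The sketch below uses plain intent names as strings so it runs without discord.js. The event-to-intent table is deliberately partial, and the names REQUIRED_INTENTS and missingIntents are hypothetical.

```javascript
// Partial map from discord.js event names to the gateway intents they need.
// Extend it with the events your bot actually handles.
const REQUIRED_INTENTS = {
  messageCreate: ['GuildMessages'], // server messages; add 'MessageContent' if you read the text
  presenceUpdate: ['GuildPresences'],
  guildMemberAdd: ['GuildMembers'],
}

// Returns the intents that the handled events need but the bot did not declare.
function missingIntents(declared, handledEvents) {
  const have = new Set(declared)
  const missing = new Set()
  for (const event of handledEvents) {
    for (const intent of REQUIRED_INTENTS[event] ?? []) {
      if (!have.has(intent)) missing.add(intent)
    }
  }
  return [...missing]
}

// The "wrong" configuration from above: only Guilds declared.
console.log(missingIntents(['Guilds'], ['messageCreate', 'presenceUpdate']))
```

For the configuration above this reports GuildMessages and GuildPresences as missing. In a discord.js bot, the declared list would mirror the GatewayIntentBits members passed to the Client constructor, for example GatewayIntentBits.Guilds, GatewayIntentBits.GuildMessages, and GatewayIntentBits.MessageContent.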

2. Network instability between your host and Discord

The TCP connection is fragile. NAT timeouts (home routers), VPN reconnects, or noisy hosting can drop the WebSocket without your bot noticing. discord.js / discord.py have built-in reconnect logic for this — but sometimes they fail to detect the disconnect (no FIN packet ever arrives) and just sit there.

The fix: a real reconnect-on-stale watchdog. See Recovery below.

3. Long event-loop blocking

If your bot blocks the event loop for ~60+ seconds (huge synchronous calculation, unintended infinite loop in a hot path), Discord considers the gateway dead and closes it. The bot, when it finally unblocks, finds itself disconnected — but if the library doesn't auto-reconnect, you have a zombie.

The signal: CloudLine event-loop lag tile sustained above 1000 ms before the zombie state began.
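
If you want a comparable signal in your own logs, event-loop lag can be approximated by measuring how late a timer fires. This is a minimal sketch and not how the CloudLine SDK measures lag; createLagProbe and startLagWatch are hypothetical names, and the 500 ms sampling interval is an assumption.

```javascript
// Hypothetical helper: measures how late a timer fires relative to its schedule.
// `now` is injectable so the probe can be exercised with a fake clock.
function createLagProbe(intervalMs, now = Date.now) {
  let expected = now() + intervalMs
  return function sample() {
    const current = now()
    const lagMs = Math.max(0, current - expected) // time spent blocked past the schedule
    expected = current + intervalMs
    return lagMs
  }
}

// Samples once per interval and warns when lag passes the threshold.
function startLagWatch(intervalMs = 500, thresholdMs = 1000) {
  const sample = createLagProbe(intervalMs)
  const timer = setInterval(() => {
    const lagMs = sample()
    if (lagMs > thresholdMs) {
      console.warn(`[lag] event loop was blocked for about ${lagMs} ms`)
    }
  }, intervalMs)
  timer.unref() // don't keep the process alive just for the probe
  return timer
}
```

Calling startLagWatch() once at startup is enough. Node's built-in perf_hooks.monitorEventLoopDelay gives a more precise histogram if you need one.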

4. Sleep / suspend on a laptop

If you're running the bot on your laptop and close the lid, the process resumes on wake but the gateway is gone. The SDK self-heals into a fresh heartbeat cadence — the gateway, not so much. Move the bot to a server, not a laptop.

5. Library bug

Rare, but discord.js and discord.py have had reconnection bugs over the years that produce zombies under specific conditions. Always run the latest minor version of your Discord library.

Recovery

The brute-force fix: restart the process. The new process opens a fresh gateway connection, and everything works again.

Automatic restart

In production, set up your host to auto-restart on a CloudLine zombie alert. Options:

PM2: trigger pm2 restart <name> from a webhook receiver.
Docker: a sidecar that listens for the alert and runs docker restart.
systemd: systemctl restart <unit> from a CloudLine webhook handler.
Discord webhook → script: route the alert to a webhook receiver that runs your restart command. Simple but requires you to host the receiver.
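
A webhook receiver for any of these options can be very small. The sketch below rests on assumptions: the payload shape ({ "alert": "zombie" }), the port, and the process name my-bot are all hypothetical, so check the actual alert payload CloudLine sends before relying on it. It also omits authentication, which a real receiver should add before running restart commands.

```javascript
const http = require('node:http')
const { execFile } = require('node:child_process')

// ASSUMPTION: the alert arrives as JSON with an `alert` field set to "zombie".
function isZombieAlert(payload) {
  return payload !== null && typeof payload === 'object' && payload.alert === 'zombie'
}

// Builds an HTTP server that calls `restart` whenever a zombie alert is POSTed.
function createReceiver(restart) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST') {
      res.writeHead(405).end()
      return
    }
    let body = ''
    req.on('data', (chunk) => { body += chunk })
    req.on('end', () => {
      let payload = null
      try { payload = JSON.parse(body) } catch { /* ignore malformed bodies */ }
      if (isZombieAlert(payload)) restart()
      res.writeHead(204).end()
    })
  })
}

// Example wiring for PM2 (process name and port are placeholders):
// createReceiver(() => {
//   execFile('pm2', ['restart', 'my-bot'], (err) => {
//     if (err) console.error('[receiver] restart failed:', err)
//   })
// }).listen(8787)
```

Swapping the execFile arguments for docker restart or systemctl restart covers the other options in the list.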

Reconnect-on-stale watchdog (no restart needed)

A more elegant fix: detect the stale gateway inside your bot and force a reconnect without restarting the process.

discord.js watchdog

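setInterval(async () => {
  // status 0 is READY; ping stays negative until a heartbeat has been acked
  if (client.ws.status !== 0 || client.ws.ping < 0) {
    console.warn('[watchdog] gateway looks stale, destroying + reconnecting')
    try {
      await client.destroy() // returns a promise in discord.js v14; await is harmless otherwise
      await client.login(process.env.DISCORD_TOKEN)
    } catch (err) {
      console.error('[watchdog] reconnect failed:', err)
    }
  }
}, 60_000)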

The watchdog runs every 60 seconds. If the gateway isn't READY (status !== 0) or it never acked the last heartbeat (ping < 0), it tears down and reopens the connection. The CloudLine SDK keeps running through this — heartbeats land, the dashboard shows the zombie briefly, then recovery fires when the gateway comes back.

Why not handle this in the SDK?

The SDK is intentionally measurement-only — it observes and reports, it never modifies your bot's behavior. A library that silently destroys and reconnects your gateway under conditions you didn't sign up for would be hard to debug if it ever misfired. We surface the signal; you decide how to react.

Confirming a zombie vs other states

State	Status ribbon	Gateway tile	What's happening
Healthy	Green	OK	Everything works.
Offline	Grey	(no data)	The bot isn't sending heartbeats. The process is dead, the network is down, or the secret is wrong.
Zombie	Green (!)	Zombie	The process is alive but the gateway is dead. Slash commands fail. This page.
Shard down	Green or Degraded	OK + Shard warning	Some shards are connected, others aren't. Partial outage on the affected shards.
At-risk	Yellow	OK	Your reliability recently dropped below 97%. Heartbeats are landing fine, but you've had recent incidents. See Reliability tiers.
