Alert Thresholds
When to be alerted — for offline, high latency, CPU spike, memory leak, event-loop lag, and error storms. Every knob, every range, every default.
CloudLine ships with safe defaults — you can sign up, attach the SDK, and get offline alerts immediately. This page covers the fine-grained thresholds for every other alert type, what they mean, and what plan they're available on.
All thresholds are set on the bot detail page → Alerts tab. Each alert has the same three knobs:
- Threshold — the value that has to be crossed (e.g. CPU above 80%).
- Sustained for — how long the condition has to hold before the alert fires (e.g. 60 seconds). This filters out one-tick spikes.
- Channels — Discord webhook, email, or both. See Alert channels.
If you set a threshold to empty / 0, that specific alert is off. The other alerts keep running independently.
Offline alert (always on)
The core alert — CloudLine pings you when your bot stops sending heartbeats.
| Setting | Default | What it does |
|---|---|---|
| notifyOnOffline | on | Page you when the bot is marked offline. |
| notifyOnRecovery | on | Page you again when it comes back. |
| notifyOnDegraded | off | Also page on transient "degraded" states (slow heartbeats, gateway blips). Noisier; off by default. |
How fast offline fires depends on your check interval and offline threshold:
| Plan | Default interval | Offline threshold |
|---|---|---|
| Starter | 60 s | Fixed at 2 missed heartbeats (~2 min total). |
| Pro | 30 s | Configurable: 1, 2, or 3 missed heartbeats. |
| Business | 10 s (configurable: 10/20/30/45/60 s) | Configurable: 1, 2, or 3 missed heartbeats. |
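As a rough worked example of the table above, time-to-alert can be estimated as the check interval multiplied by the number of missed heartbeats. This is only a sketch: `offline_detection_seconds` is a hypothetical helper, not part of the CloudLine SDK, and it ignores any delivery delay on the channel.

```python
def offline_detection_seconds(interval_s: int, missed_heartbeats: int) -> int:
    """Estimate seconds until an offline alert fires (interval x missed beats)."""
    if interval_s <= 0 or missed_heartbeats <= 0:
        raise ValueError("interval and missed heartbeats must be positive")
    return interval_s * missed_heartbeats


print(offline_detection_seconds(60, 2))  # Starter, fixed setting: 120
print(offline_detection_seconds(30, 3))  # Pro, most tolerant setting: 90
print(offline_detection_seconds(10, 1))  # Business, fastest setting: 10
```

The Starter result matches the "~2 min total" figure in the table.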
Latency alert
Triggers when the Discord gateway ping stays above your threshold for the sustained window.
| Setting | Range | Default | Plan |
|---|---|---|---|
| latencyThresholdMs | 10 – 60 000 ms | off | All plans |
| latencySustainedSec | Starter 60+ / Pro 30+ / Business 10+, up to 3600 | 30 s | Floor varies by plan |
The sustained-window floor matches your check interval — a Starter user with a 60-second interval can't observe a recovery faster than 60 seconds, so the sustained floor is 60. Pro is 30, Business is 10.
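The per-plan floor can be pictured as a simple clamp. The sketch below is illustrative only: the helper name and the plan keys are hypothetical, and it assumes out-of-range values are clamped rather than rejected, which the dashboard may handle differently.

```python
# Floors and ceiling taken from the latencySustainedSec row above.
SUSTAINED_FLOOR_S = {"starter": 60, "pro": 30, "business": 10}
SUSTAINED_MAX_S = 3600


def clamp_latency_sustained(plan: str, requested_s: int) -> int:
    """Clamp a requested sustained window to the range allowed for the plan."""
    floor = SUSTAINED_FLOOR_S[plan.lower()]
    return max(floor, min(requested_s, SUSTAINED_MAX_S))


print(clamp_latency_sustained("starter", 30))   # raised to the floor: 60
print(clamp_latency_sustained("business", 30))  # already valid: 30
print(clamp_latency_sustained("pro", 5000))     # capped at the ceiling: 3600
```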
Zombie alert (gateway stuck)
The bot's process is alive (sending heartbeats), but its Discord gateway connection is stale. The bot looks online to CloudLine but isn't serving any commands. See Zombie state.
| Setting | Range | Default |
|---|---|---|
| zombieEnabled | on / off | on |
| zombieSustainedSec | 10 – 3600 s | 60 s |
Shard-down alert
For bots using sharding (ShardingManager or the shards: 'auto' client option in discord.js; AutoShardedClient or AutoShardedBot in discord.py). Triggers when fewer shards are connected than expected.
| Setting | Range | Default |
|---|---|---|
| shardDownEnabled | on / off | on |
| shardDownSustainedSec | 10 – 3600 s | 60 s |
Single-process bots don't trigger this even when enabled — the SDK reports null shard counts, and the alarm only fires on a real shard mismatch.
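The null-handling described above can be sketched as follows. The function name and argument names are hypothetical, and the server-side check may differ in detail; the point is only that a missing shard count never counts as a mismatch.

```python
from typing import Optional


def shard_down(connected: Optional[int], expected: Optional[int]) -> bool:
    """Return True only when both counts are known and shards are missing."""
    if connected is None or expected is None:
        # Single-process bots report null shard counts: never a mismatch.
        return False
    return connected < expected


print(shard_down(None, None))  # single-process bot: False
print(shard_down(3, 4))        # one shard missing: True
print(shard_down(4, 4))        # all shards connected: False
```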
CPU alert
| Setting | Range | Default |
|---|---|---|
| cpuThresholdPct | 1 – 100 % | off |
| cpuSustainedSec | 10 – 3600 s | 60 s |
CPU % is relative to a single core — 100% means one core saturated. The value is clamped server-side to 0–100, so a process pegging several cores still reports 100 (not 400); the threshold input is likewise capped at 100. Set your threshold somewhere below 100 to catch a runaway before it fully saturates a core.
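A minimal sketch of the clamping described above. Both helpers are hypothetical, and the strict greater-than comparison is an assumption rather than documented behavior.

```python
from typing import Optional


def clamp_cpu_pct(raw_pct: float) -> float:
    """Clamp a raw CPU reading to the 0-100 range used for alerting."""
    return max(0.0, min(raw_pct, 100.0))


def cpu_above_threshold(raw_pct: float, threshold_pct: Optional[float]) -> bool:
    """Compare the clamped reading to the threshold; empty or 0 means off."""
    if not threshold_pct:
        return False
    # Assumption: the comparison is strictly greater-than.
    return clamp_cpu_pct(raw_pct) > threshold_pct


print(clamp_cpu_pct(400.0))             # four busy cores still report 100.0
print(cpu_above_threshold(400.0, 80))   # True
print(cpu_above_threshold(400.0, 100))  # False
```

Under a strict comparison, a threshold of exactly 100 can never be exceeded, which is consistent with the advice to pick a value below 100.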
Memory alert
| Setting | Range | Default |
|---|---|---|
| memThresholdMb | 16 – 65 536 MB | off |
| memSustainedSec | 10 – 3600 s | 60 s |
Memory is process resident-set size (RSS) in megabytes. Pick the threshold from your bot's normal baseline + headroom — there's no universal "high".
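One way to turn "baseline + headroom" into a number is sketched below. The 50% headroom default is purely an illustrative assumption, not a CloudLine recommendation, and `suggest_mem_threshold_mb` is a hypothetical helper; only the 16 to 65 536 MB range comes from the table above.

```python
MEM_MIN_MB = 16
MEM_MAX_MB = 65_536


def suggest_mem_threshold_mb(baseline_mb: float, headroom_ratio: float = 0.5) -> int:
    """Baseline RSS plus a headroom fraction, kept inside the allowed range."""
    candidate = round(baseline_mb * (1 + headroom_ratio))
    return max(MEM_MIN_MB, min(candidate, MEM_MAX_MB))


print(suggest_mem_threshold_mb(300))        # 300 MB baseline + 50%: 450
print(suggest_mem_threshold_mb(300, 0.25))  # 300 MB baseline + 25%: 375
print(suggest_mem_threshold_mb(8))          # below the minimum, raised to 16
```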
Event-loop lag alert
For Node bots: how long the event loop is blocked between scheduled timers. High values mean the bot's main thread is busy doing synchronous work and can't process new events in time.
| Setting | Range | Default |
|---|---|---|
| lagThresholdMs | 10 – 60 000 ms | off |
| lagSustainedSec | 10 – 3600 s | 30 s |
Normal idle lag is 0–5 ms. Bursts to 20–50 ms during garbage collection are routine. Sustained above 100 ms is worth investigating.
Error-rate alert
Triggers when your bot reports more errors per minute than the threshold. Counts everything that lands in the Error Log — SDK-captured discord.js errors, unhandledRejection, plus anything you push via monitor.captureError().
| Setting | Range | Default |
|---|---|---|
| errThresholdPerMin | 1 – 10 000 / min | off |
| errSustainedSec | 10 – 3600 s | 60 s |
A sustained high error rate without an offline alert usually means your bot is partially broken — running, but failing every interaction.
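To make "errors per minute" concrete, here is a sliding-window counter. It is a sketch under assumptions: CloudLine's actual aggregation is not documented here, the class name is hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class ErrorRateCounter:
    """Count errors whose timestamps fall inside the trailing window."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        self._stamps = deque()

    def record(self, ts: float) -> None:
        """Register one error at time ts (seconds)."""
        self._stamps.append(ts)

    def rate(self, now: float) -> int:
        """Drop errors older than the window, then return the remaining count."""
        while self._stamps and self._stamps[0] <= now - self.window_s:
            self._stamps.popleft()
        return len(self._stamps)


counter = ErrorRateCounter()
for ts in (0, 10, 50):
    counter.record(ts)
print(counter.rate(59))  # all three errors are within the last 60 s: 3
counter.record(70)
print(counter.rate(70))  # errors at t=0 and t=10 have aged out: 2
```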
How the sustained window works
A naive threshold ("CPU > 80%") would fire on a single 1-tick blip, which is useless. The sustained window asks: has the condition held for at least N seconds?
Threshold: CPU > 80%
Sustained: 60 seconds
CPU at t=0: 30% → ok
CPU at t=30: 85% → ABOVE — start streak
CPU at t=60: 90% → streak = 30s — not enough yet
CPU at t=90: 88% → streak = 60s — ALERT fires
CPU at t=120: 50% → streak reset; recovery alert if recoveryEnabled
Changing a threshold on the dashboard resets the streak for that alert, so the next sustained-window measurement starts fresh.
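The trace above can be reproduced with a small streak tracker. This is a simplified sketch, not CloudLine's implementation: the class and its return labels are hypothetical, and recovery is reported on the first normal sample, as in the trace, rather than after a full sustained window.

```python
from typing import Optional


class SustainedAlert:
    """Fire only after the value has stayed above the threshold long enough."""

    def __init__(self, threshold: float, sustained_s: float) -> None:
        self.threshold = threshold
        self.sustained_s = sustained_s
        self.streak_start: Optional[float] = None
        self.firing = False

    def set_threshold(self, threshold: float) -> None:
        """Changing the threshold resets the streak, as on the dashboard."""
        self.threshold = threshold
        self.streak_start = None

    def observe(self, t: float, value: float) -> str:
        """Feed one sample; returns ok, pending, alert, firing, or recovered."""
        if value > self.threshold:
            if self.streak_start is None:
                self.streak_start = t  # first sample above: start the streak
            if not self.firing and t - self.streak_start >= self.sustained_s:
                self.firing = True
                return "alert"
            return "firing" if self.firing else "pending"
        self.streak_start = None  # back below the threshold: reset the streak
        if self.firing:
            self.firing = False
            return "recovered"
        return "ok"


cpu = SustainedAlert(threshold=80, sustained_s=60)
samples = [(0, 30), (30, 85), (60, 90), (90, 88), (120, 50)]
print([cpu.observe(t, v) for t, v in samples])
# ['ok', 'pending', 'pending', 'alert', 'recovered']
```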
Quiet hours
Suppress non-critical alerts during a fixed daily window (e.g. 22:00 → 08:00). Offline alerts can still fire during quiet hours if quietHoursAllowOffline is on (default).
| Setting | Range | Default |
|---|---|---|
| quietHoursStartMin | 0 – 1439 (minute of day) | off (null) |
| quietHoursEndMin | 0 – 1439 (minute of day) | off (null) |
| quietHoursTzOffsetMin | −720 to +720 | off (null) |
| quietHoursAllowOffline | on / off | on |
The dashboard's quiet-hours picker handles the minute conversion for you — pick a time, and it computes the minute-of-day under your local timezone offset.
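For readers setting these values by hand, the minute-of-day arithmetic looks roughly like this. The sketch rests on several assumptions: the offset is taken as minutes added to UTC to get local time, the window is start-inclusive and end-exclusive, and every alert type other than offline is treated as suppressible. All function names are hypothetical.

```python
from typing import Optional


def minute_of_day(hour: int, minute: int = 0) -> int:
    """Convert a clock time to minutes since midnight (0-1439)."""
    return hour * 60 + minute


def in_quiet_hours(utc_minute: int, start_min: Optional[int],
                   end_min: Optional[int], tz_offset_min: Optional[int]) -> bool:
    """True if the local time falls in the window, including past midnight."""
    if start_min is None or end_min is None:
        return False  # quiet hours are off
    local = (utc_minute + (tz_offset_min or 0)) % 1440
    if start_min <= end_min:
        return start_min <= local < end_min
    return local >= start_min or local < end_min  # window wraps midnight


def suppressed(alert_type: str, quiet: bool, allow_offline: bool = True) -> bool:
    """Offline alerts pass through quiet hours when allow_offline is on."""
    if not quiet:
        return False
    return not (alert_type == "offline" and allow_offline)


start, end = minute_of_day(22), minute_of_day(8)  # 1320 and 480
print(in_quiet_hours(minute_of_day(21), start, end, 120))  # 23:00 local: True
print(in_quiet_hours(minute_of_day(7), start, end, 120))   # 09:00 local: False
print(suppressed("cpu", True), suppressed("offline", True))  # True False
```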
Recovery alerts
Every threshold-based alert (latency, CPU, memory, lag, errors, zombie, shard-down) also fires a recovery notification when the metric returns to normal for the sustained window. It goes to the same channels as the firing alert. Recovery alerts cannot be disabled per metric; to reduce noise, raise the threshold or extend the sustained window.
Custom branding (Business)
Business-plan users can customize the alert appearance per channel:
| Setting | What it does |
|---|---|
| customBrandColor | Hex color (#rrggbb) for the Discord embed side-stripe and email accent. |
| customLogoUrl | Uploaded via the Branding tab (R2 storage). Shown in Discord embeds and email headers. |
| customReplyTo | Reply-to email address for alert emails. |
| customFooterText | Up to 200 characters of footer text, shown under every Discord embed and email. |
Each of those can be scoped per channel — e.g. brand color on Discord only, default footer on email. Toggles default to "both channels".
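If you prepare branding values outside the dashboard, a quick local check of the two documented formats might look like the sketch below. The helper names are hypothetical, and accepting uppercase hex digits is an assumption; the dashboard's own validation may be stricter or looser.

```python
import re

# "#rrggbb": a hash followed by exactly six hex digits.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")
FOOTER_MAX_CHARS = 200


def valid_brand_color(value: str) -> bool:
    """Check the #rrggbb shape expected for customBrandColor."""
    return HEX_COLOR.fullmatch(value) is not None


def valid_footer_text(value: str) -> bool:
    """Check the 200-character limit on customFooterText."""
    return len(value) <= FOOTER_MAX_CHARS


print(valid_brand_color("#5865f2"))   # True
print(valid_brand_color("#fff"))      # shorthand is not #rrggbb: False
print(valid_footer_text("x" * 201))   # one character too long: False
```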
Testing your setup
The dashboard's Alerts → Send test button delivers a sample message to every configured channel without actually firing on a real condition. Use it after every change to verify the webhook still works and your email isn't bouncing.