How to Monitor Cron Jobs That Fail Silently

June 18, 2026 · PingGuard Team

Here is the nightmare scenario, and almost every engineer has lived some version of it: your nightly database backup has been failing for three weeks. The cron entry is still there. The server is up. Nothing threw a pager. You only find out the morning you actually need to restore - and there is nothing to restore from.

Cron jobs are dangerous in a specific way: when they fail, they usually fail quietly. This guide is about closing that gap: how to know the moment a scheduled job stops doing its job.

Why cron jobs fail silently

A cron job can stop working for a dozen mundane reasons, and almost none of them announce themselves:

  • The script errors out halfway through, but cron only emails output to a local mailbox nobody reads.
  • The server was rebooted and the cron daemon did not come back up the way you assumed.
  • A dependency moved - a path changed, a credential expired, a disk filled up.
  • The job hangs instead of exiting, so it never finishes; if it holds a lock, every later run is skipped as well.
  • Someone edited the crontab and fat-fingered a line, silently disabling the entry.

In every one of these cases, the absence of a result is the only signal. And absence is exactly the thing humans are worst at noticing.

Why normal monitoring does not catch it

Standard uptime monitoring works by reaching out: a monitoring service sends an HTTP request to your URL every minute and checks the response. That is perfect for a website or an API, because there is something running at a known address to poll.

A cron job is the opposite. There is no endpoint to hit. The job runs for a few seconds at 3am and then there is nothing there to check. You cannot poll a backup script. So uptime monitoring - the kind you would use for your website - simply does not apply.

You need to invert the direction: instead of something checking in on the job, the job has to check in with you.

The fix: a dead man's switch

The pattern that solves this is a dead man's switch, also called heartbeat monitoring. It works like this:

  1. You create a monitor and get a unique ping URL.
  2. At the end of each successful run, your job sends a quick HTTP request to that URL - it "checks in."
  3. You tell the monitor how often it should expect to hear from the job (say, once every 24 hours, with some grace period).
  4. If the expected check-in does not arrive on time, the monitor alerts you.

The elegance is that silence is the signal. You are no longer relying on catching an error - you are relying on the absence of a success, which is something a machine can watch perfectly and forever.

The mental model: a normal monitor asks "is it responding?" A heartbeat monitor asks "did it check in when it was supposed to?" For scheduled work, the second question is the one that matters.
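The check a heartbeat monitor performs can be sketched in a few lines. This is a minimal illustration, not PingGuard's actual implementation: the function name is_overdue and its arguments are assumptions made for the example, and all times are plain Unix timestamps or durations in seconds.

```shell
#!/bin/sh
# Hypothetical monitor-side check: has the job missed its check-in?
# usage: is_overdue LAST_PING INTERVAL GRACE NOW   (all values in seconds)
is_overdue() {
    last_ping=$1
    interval=$2
    grace=$3
    now=$4
    # The job is allowed one full interval plus the grace period.
    deadline=$((last_ping + interval + grace))
    # Exit status 0 (true) means "overdue, raise an alert".
    [ "$now" -gt "$deadline" ]
}

# Example: a daily job (86400 s) with a one-hour grace period (3600 s).
last=1000000
if is_overdue "$last" 86400 3600 $((last + 86400 + 3601)); then
    echo "ALERT: no check-in within interval + grace"
fi
if ! is_overdue "$last" 86400 3600 $((last + 80000)); then
    echo "OK: still inside the expected window"
fi
```

Nothing in this check looks at errors or logs. It only compares the current time with the last successful check-in, which is why it catches crashes, reboots, and mangled crontabs alike.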

Setting it up, step by step

The whole setup is usually one extra command on an existing crontab line. Suppose you have a nightly backup at 3am:

# Before: no monitoring
0 3 * * *  /usr/local/bin/backup.sh

Add a ping to the end of the job so it checks in when it finishes:

# After: pings the monitor when the backup completes
0 3 * * *  /usr/local/bin/backup.sh && curl -fsS https://pingguard.org/heartbeat/your-token

That is it. Now tell your monitor to expect a check-in every 24 hours. If 3am comes and goes and no ping arrives - the script crashed, the box was down, the crontab got mangled - you get an alert instead of a false sense of safety.

This works from anything that can make an HTTP request: a shell script, a Python job, a Kubernetes CronJob, a systemd timer, a Windows scheduled task. The ping is just a URL.
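If you would rather keep the crontab line short, the same idea can live in a small wrapper script. The sketch below is one possible shape, not a PingGuard-provided tool: the names run_with_heartbeat and send_ping are made up for the example, and the URL is the placeholder from above. The curl options add a timeout, a few retries, and discard the response body so cron has nothing to mail.

```shell
#!/bin/sh
# Sketch of a wrapper: run a job, then check in only if it succeeded.
# PING_URL is a placeholder; substitute your monitor's real ping URL.
PING_URL="${PING_URL:-https://pingguard.org/heartbeat/your-token}"

# send_ping is split out so it can be swapped or stubbed when testing.
send_ping() {
    # -f: treat HTTP errors as failures, -sS: quiet but still show errors,
    # -m 10: give up after 10 seconds, --retry 3: retry transient failures,
    # -o /dev/null: discard the response body.
    curl -fsS -m 10 --retry 3 -o /dev/null "$PING_URL"
}

# usage: run_with_heartbeat COMMAND [ARGS...]
run_with_heartbeat() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        # A failed ping is reported but does not change the job's status.
        send_ping || echo "warning: heartbeat ping failed" >&2
    fi
    return "$status"
}

# When invoked with a command, run it under the heartbeat wrapper.
if [ "$#" -gt 0 ]; then
    run_with_heartbeat "$@"
    exit $?
fi
```

Saved under a path of your choosing (for example a hypothetical /usr/local/bin/with-heartbeat.sh), it would be called from cron as with-heartbeat.sh /usr/local/bin/backup.sh. The wrapper returns the job's own exit status, so anything else that inspects the result still sees the truth.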

Ping only on success (this part matters)

Notice the && in the example above. It is doing real work: in shell, A && B runs B only if A succeeds (exit code 0).

So the ping only fires if the backup actually succeeded. If backup.sh exits with an error, the curl never runs, no check-in arrives, and you get alerted. That is exactly what you want - a job that ran but failed should look identical to a job that did not run at all.
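All of this depends on backup.sh actually exiting non-zero when something inside it goes wrong. A plain shell script exits with the status of its last command, so an early failure can be masked by a later step that succeeds. The sketch below shows the difference using two stand-in scripts run as separate shell processes, the way cron would run them; false plays the part of a failed dump, and the variable names are illustrative only.

```shell
#!/bin/sh
# Stand-ins for backup.sh, run as separate shell processes.

# Without 'set -e', the script carries on after the failure and exits
# with the status of its last command (echo), which is 0.
masked='false; echo "cleanup done"'

# With 'set -e', the script stops at the first failing command and
# exits with a non-zero status.
strict='set -e; false; echo "cleanup done"'

masked_status=0
sh -c "$masked" >/dev/null || masked_status=$?

strict_status=0
sh -c "$strict" >/dev/null || strict_status=$?

echo "without set -e: exit status $masked_status"
echo "with set -e:    exit status $strict_status"
```

Only the second variant would stop the && from firing the ping. If your job script does not use set -e, or explicit checks on the steps that matter, the heartbeat can report success for a backup that never completed.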

A common mistake is to put the ping on its own crontab line, or to chain it with ; instead of &&, so that it runs regardless of whether the real work succeeded. That tells you the cron daemon is alive, but not that the job did anything useful. Tie the ping to success.
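The difference is easy to see side by side. In this sketch, ping_monitor and failing_job are stand-ins invented for the example: the first prints a message instead of calling curl, and the second plays a backup.sh that failed.

```shell
#!/bin/sh
ping_monitor() { echo "ping sent"; }
failing_job() { return 1; }

# Tied to success with &&: the ping is skipped because the job failed.
# The list itself exits non-zero, which is expected, hence '|| true'.
tied=$(failing_job && ping_monitor) || true

# Chained with ';' (the same effect as a ping that is not tied to the
# job at all): the ping runs even though the job failed.
untied=$(failing_job; ping_monitor)

echo "with &&: '${tied}'"
echo "with ; : '${untied}'"
```

With &&, the captured output is empty, so the monitor hears nothing and raises the alert. With ;, the monitor is told everything is fine after a failed run.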

Which jobs to monitor first

You do not need to instrument every cron entry on day one. Start with the jobs where silent failure hurts the most:

  • Backups - database dumps, file backups, offsite syncs. The classic, and the most painful when missing.
  • Data pipelines and ETL - scheduled imports, syncs, and transforms that downstream things depend on.
  • Billing and invoicing jobs - anything that touches money on a schedule.
  • Cleanup and retention jobs - log rotation, temp cleanup, expiry. Silent failure here fills disks.
  • Certificate and token renewals - jobs that refresh credentials before they expire.

A good rule of thumb: if a job silently not running would cost you data, money, or a 2am incident a week later, it needs a heartbeat.

A quick checklist

  • Create a heartbeat monitor and grab its ping URL.
  • Add the ping to the end of the job, tied to success with &&.
  • Set the expected interval to match the schedule, with a sensible grace period.
  • Send the alert somewhere you will actually see it (email, Slack, Telegram).
  • Test it: disable the job once and confirm you get the alert.

That last step is the one people skip and later regret. A monitor you have never seen fire is a monitor you do not actually trust. Break it once on purpose so you know it works.

Monitor your cron jobs with PingGuard

PingGuard does heartbeat / cron monitoring on the free plan - alongside website, API, and SSL monitoring in the same dashboard. Add your first heartbeat in under a minute.