DING

Don't store it. Stream it. DING it.

One binary that wraps your CI job, training run, or batch script — and pings you when it matters.

$ brew install ding-labs/tap/ding
$ curl -sf https://start.ding.ing | sh

Also available as a Docker image, a GitHub Actions step, or a direct binary download.

The 60-second example

Drop a ding.yaml next to your job. Two rules: one fires the moment a test fails, one fires once at the end with the totals.

rules:
  # during the run — ping Slack the second a test fails
  - name: test_failed
    match:
      metric: test.failed
    condition: value > 0
    message: "❌ {{ .test_name }} failed on {{ .branch }}"
    alert:
      - notifier: slack

  # at end-of-run — one summary alert with the totals
  - name: run_summary
    mode: end-of-run
    match:
      metric: run.exit
    condition: exit_code != 0
    message: "Job failed in {{ .duration_seconds }}s on {{ .commit }}"
    alert:
      - notifier: slack

Then wrap your job:

ding run -- npm test

DING auto-detects GitHub Actions, GitLab CI, CircleCI, Jenkins, Buildkite, Argo, MLflow, and bare Kubernetes — and attaches run_id, branch, commit, workflow, job, and actor labels automatically. They're available in every alert message.

What

DING is a streaming alert engine that ships with your workload, not next to it. The job emits events; DING evaluates rules in-process; alerts fire during the run and a summary fires when the job exits. Both die together.
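The source doesn't pin down how a wrapped job emits events, but serve mode accepts JSON lines on stdin, so a reasonable sketch is a job that prints one JSON line per event; the `emit` helper below is hypothetical, not part of DING:

```shell
# Hypothetical helper: print one JSON-line event per call.
# Assumption: under `ding run`, events are read from the job's stdout
# in the same JSON-lines shape serve mode accepts.
emit() { printf '{"metric":"%s","value":%s}\n' "$1" "$2"; }

emit test.failed 1
emit request.latency 212
```

Each line is a complete event: a metric name, a value, and optionally labels as extra JSON fields.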

Two ways to run it: ding run -- <cmd> wraps a finite job and exits with it; ding serve runs as a long-lived HTTP server for steady-state apps.

Either way, it's a single static binary. No database, no agents, no cloud account. MIT licensed. No telemetry.

Where DING fits

Ephemeral compute
CI jobs, ML training runs, batch ETL, scheduled tasks. Anything that starts, does work, exits.
Single-server apps
One VM, edge boxes, internal tools. The cases where standing up Prometheus and Alertmanager is more infra than the app itself.
Inside an existing stack
Drop into one job inside a Prometheus shop. DING doesn't replace your fleet metrics — it covers the workloads they miss.
Prometheus assumes a fleet. DING assumes a 4-minute CI job. Pull-based monitoring needs a target that lives long enough to be scraped. Most jobs don't.

Rules

One YAML file. Lives in your repo. Ships with your code.

rules:
  # during-run: fires whenever the condition is true
  - name: cpu_spike
    match:
      metric: cpu_usage
    condition: value > 95
    cooldown: 1m
    message: "CPU spike on {{ .host }}: {{ .value }}%"
    alert:
      - notifier: stdout

  # windowed: avg over 5m, fires while sustained
  - name: cpu_sustained
    match:
      metric: cpu_usage
    condition: avg(value) over 5m > 80
    cooldown: 10m
    message: "Sustained high CPU: {{ .avg }}% on {{ .host }}"
    alert:
      - notifier: stdout

  # end-of-run: fires once at exit, against accumulated state
  - name: slow_run
    mode: end-of-run
    match:
      metric: request.latency
    condition: avg(value) over 1h > 200
    message: "Avg latency this run: {{ .avg }}ms"
    alert:
      - notifier: slack

All condition forms:

value > 95                       # single event
avg(value) over 5m > 80         # average over window
max(value) over 1m >= 100
min(value) over 10s < 10
sum(value) over 30s > 0
count(value) over 2m > 50       # number of events, not sum of values
condition_a AND condition_b     # compound
condition_a OR condition_b
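A compound condition drops into a rule the same way as a simple one; a sketch, with a made-up rule name and thresholds:

```yaml
rules:
  # Hypothetical rule: fire only when CPU is both sustained-high and spiking.
  - name: cpu_pinned
    match:
      metric: cpu_usage
    condition: avg(value) over 5m > 80 AND max(value) over 5m >= 95
    cooldown: 10m
    message: "CPU pinned on {{ .host }}"
    alert:
      - notifier: stdout
```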

Template variables in message:

Variable                                 When            Description
.metric                                  always          metric name
.value                                   always          raw event value
.rule                                    always          rule name
.fired_at                                always          RFC3339 timestamp
.host, .region, …                        always          any label from the event
.run_id, .branch, .commit, .workflow, …  ding run        auto-detected run-context labels
.exit_code, .duration_seconds            run.exit event  synthetic event emitted at end of run
.avg, .max, .min, .sum, .count           windowed only   aggregate result

Notifiers

stdout and github_actions are built in — no config needed. Everything else is declared once and referenced by name.

notifiers:
  slack:
    type: slack
    url: ${SLACK_WEBHOOK_URL}

  on-call:
    type: discord
    url: ${DISCORD_WEBHOOK_URL}

  custom:
    type: webhook
    url: https://example.com/hook
    max_attempts: 3       # retries on 5xx (default: 3)
    initial_backoff: 1s   # doubles each attempt (default: 1s)

Also supported: type: teams, type: pagerduty, type: telegram. Slack, Discord, and Teams auto-surface run-context fields (exit code, duration, branch, commit, workflow, actor), so alerts from a CI job arrive ready to act on. ${VAR} expansion works in any string value; DING fails fast at startup if the variable is unset.

The generic webhook payload:

{"rule":"cpu_spike","message":"CPU spike on web-01: 97%",
 "metric":"cpu_usage","value":97.0,"fired_at":"...","host":"web-01"}

4xx responses are dropped. 5xx responses retry with exponential backoff.
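With the defaults (max_attempts: 3, initial_backoff: 1s), a persistently failing endpoint sees waits of 1s and 2s between the three attempts, so an alert is abandoned after about 3s of cumulative backoff. A quick check of that arithmetic:

```shell
# Total backoff for the default retry policy:
# 3 attempts, first wait 1s, doubling after each failed attempt.
max_attempts=3
backoff=1
total=0
attempt=1
while [ "$attempt" -lt "$max_attempts" ]; do
  total=$((total + backoff))   # wait inserted before the next attempt
  backoff=$((backoff * 2))
  attempt=$((attempt + 1))
done
echo "${total}s"   # prints: 3s
```

Raising max_attempts extends this geometrically: a 5-attempt policy waits 1+2+4+8 = 15s in total before giving up.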

Why

Fires alerts in 4ms. Prometheus default scrape + eval + Alertmanager dispatch: ~62 seconds minimum. That's not a knock on Prometheus — it's a pull-based system built for persistence and fleet-wide aggregation. DING is push-based and stateless. The architecture is the difference.

Performance

4ms     alert latency p50 (p99: 16ms; Prometheus default: ~62s)
116k    requests per second (50 concurrent workers, 30s window)
9ms     cold start p50, fork to first /health (Prometheus: 185ms)
106ns   per-rule evaluation, simple threshold (windowed: 157ns)

Benchmarked 2026-03-23 on Apple M3. Full methodology and raw results →

Recipes

Ready-made configs for the platforms you already use. Drop them in, edit the rules, ship.

GitHub Actions
Marketplace action. One step in your workflow.
GitLab CI
Pipeline job, artifacts upload, MR comments.
CircleCI
Orb-friendly. Wraps any step's command.
Jenkins
Declarative + scripted pipelines.
Buildkite
Step plugin. Works with agent fleet.
Kubernetes Jobs
initContainer pattern. Helm chart available.
Argo Workflows
Sidecar with downward API for run context.
MLflow
Auto-attaches experiment ID and tracking URI.

ding serve mode

For steady-state apps, where the thing being watched outlives any single run. Starts an HTTP server on :8080, accepts events, evaluates rules continuously, and hot-reloads config without restarting.

ding serve --config ./ding.yaml

# or pipe stdin
your-app | ding serve

# or POST events
curl -X POST http://localhost:8080/ingest \
  -H "Content-Type: application/json" \
  -d '{"metric":"cpu_usage","value":97,"host":"web-01"}'

Accepts JSON lines:

{"metric": "cpu_usage", "value": 92.5, "host": "web-01"}

or Prometheus text:

cpu_usage{host="web-01"} 92.5

HTTP API

Method  Path      Description
POST    /ingest   Send events
GET     /health   Liveness probe
GET     /rules    List rules + cooldown state
POST    /reload   Hot-reload config
GET     /metrics  Prometheus-format self-metrics

Operations

Reload config without restarting:

kill -HUP <pid>
# or
curl -X POST http://localhost:8080/reload

Survive restarts — persist cooldown state and windowed buffers to disk:

persistence:
  state_file: /var/lib/ding/state.json
  flush_interval: 30s

SIGTERM / SIGINT — drains in-flight requests, flushes state, exits 0.

Install

Homebrew:

brew install ding-labs/tap/ding

Binary:

curl -sf https://start.ding.ing | sh

Docker:

docker run -v ./ding.yaml:/etc/ding/ding.yaml \
  ghcr.io/ding-labs/ding

GitHub Actions: ding-labs/ding-action on the marketplace — one step in your workflow.
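A minimal workflow step might look like the sketch below; the with: input names (config, run) are assumptions, not confirmed, so check the action's README for the real interface:

```yaml
# Sketch only: the input names below are assumed, not documented here.
steps:
  - uses: actions/checkout@v4
  - uses: ding-labs/ding-action@v1
    with:
      config: ./ding.yaml
      run: npm test
```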

Kubernetes initContainer (self-copy into a shared volume):

ding install /shared/ding
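Put together, the initContainer pattern looks roughly like this sketch: the DING image copies its own binary into a shared emptyDir, and the main container wraps its command with it. The job name and workload image are placeholders, and mounting ding.yaml into the workload is left to you:

```yaml
# Sketch of the initContainer pattern; names and images are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: etl-with-ding
spec:
  template:
    spec:
      restartPolicy: Never
      volumes:
        - name: shared
          emptyDir: {}
      initContainers:
        - name: ding
          image: ghcr.io/ding-labs/ding
          command: ["ding", "install", "/shared/ding"]
          volumeMounts:
            - name: shared
              mountPath: /shared
      containers:
        - name: job
          image: python:3.12        # your workload image
          command: ["/shared/ding", "run", "--", "python", "etl.py"]
          volumeMounts:
            - name: shared
              mountPath: /shared
```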