Don't store it. Stream it. DING it.
One binary that wraps your CI job, training run, or batch script — and pings you when it matters.
```sh
brew install ding-labs/tap/ding
# or
curl -sf https://start.ding.ing | sh
```
Drop a `ding.yaml` next to your job. Two rules: one fires the moment a test fails, one fires once at the end with the totals.
```yaml
rules:
  # during the run — ping Slack the second a test fails
  - name: test_failed
    match:
      metric: test.failed
      condition: value > 0
    message: "❌ {{ .test_name }} failed on {{ .branch }}"
    alert:
      - notifier: slack

  # at end-of-run — one summary alert with the totals
  - name: run_summary
    mode: end-of-run
    match:
      metric: run.exit
      condition: exit_code != 0
    message: "Job failed in {{ .duration_seconds }}s on {{ .commit }}"
    alert:
      - notifier: slack
```
Then wrap your job:

```sh
ding run -- npm test
```
DING auto-detects GitHub Actions, GitLab CI, CircleCI, Jenkins, Buildkite, Argo, MLflow, and bare Kubernetes — and attaches `run_id`, `branch`, `commit`, `workflow`, `job`, and `actor` labels automatically. They're available in every alert message.
DING is a streaming alert engine that ships with your workload, not next to it. The job emits events; DING evaluates rules in-process; alerts fire during the run and a summary fires when the job exits. Both die together.
Two ways to run it:
- `ding run -- <cmd>` — wraps an ephemeral job (CI, training, batch). Captures its event stream, evaluates rules in real time, fires a summary at exit.
- `ding serve` — a long-running HTTP daemon for steady-state apps. Accepts events on `POST /ingest`, hot-reloads config.
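A wrapped job only needs to print events for `ding run` to evaluate. Here is a minimal sketch of an emitting script, assuming the JSON-lines event shape DING accepts and a made-up `train.loss` metric:

```python
import json

# Hypothetical training loop: each step appends one event and prints it
# as a JSON line. `ding run -- python train.py` would read these from
# stdout and evaluate its rules against them in real time.
events = []
for step, loss in enumerate([0.91, 0.44, 0.12]):
    event = {"metric": "train.loss", "value": loss, "step": step}
    events.append(event)
    print(json.dumps(event))
```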
Single static binary. No database, no agents, no cloud account. MIT licensed. No telemetry.
One YAML file. Lives in your repo. Ships with your code.
```yaml
rules:
  # during-run: fires whenever the condition is true
  - name: cpu_spike
    match:
      metric: cpu_usage
      condition: value > 95
    cooldown: 1m
    message: "CPU spike on {{ .host }}: {{ .value }}%"
    alert:
      - notifier: stdout

  # windowed: avg over 5m, fires while sustained
  - name: cpu_sustained
    match:
      metric: cpu_usage
      condition: avg(value) over 5m > 80
    cooldown: 10m
    message: "Sustained high CPU: {{ .avg }}% on {{ .host }}"
    alert:
      - notifier: stdout

  # end-of-run: fires once at exit, against accumulated state
  - name: slow_run
    mode: end-of-run
    match:
      metric: request.latency
      condition: avg(value) over 1h > 200
    message: "Avg latency this run: {{ .avg }}ms"
    alert:
      - notifier: slack
```
All condition forms:

```
value > 95                    # single event
avg(value) over 5m > 80       # average over window
max(value) over 1m >= 100
min(value) over 10s < 10
sum(value) over 30s > 0
count(value) over 2m > 50     # number of events, not sum of values
condition_a AND condition_b   # compound
condition_a OR condition_b
```
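A compound condition can pair a spike check with a sustained-window check in a single rule. An illustrative sketch (the `error_rate` metric and its threshold are assumptions):

```yaml
- name: errors_spiking
  match:
    metric: error_rate
    # fires only when a single event spikes AND the 5m average is elevated
    condition: value > 5 AND avg(value) over 5m > 1
  message: "Error rate {{ .value }}% (5m avg {{ .avg }}%)"
  alert:
    - notifier: slack
```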
Template variables available in `message`:
| Variable | When | Description |
|---|---|---|
| `.metric` | always | metric name |
| `.value` | always | raw event value |
| `.rule` | always | rule name |
| `.fired_at` | always | RFC3339 timestamp |
| `.host`, `.region`, … | always | any label from the event |
| `.run_id`, `.branch`, `.commit`, `.workflow`, … | `ding run` | auto-detected run-context labels |
| `.exit_code`, `.duration_seconds` | `run.exit` event | synthetic event emitted at end of run |
| `.avg`, `.max`, `.min`, `.sum`, `.count` | windowed only | aggregate result |
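A rule's message can mix aggregates with event labels. An illustrative windowed rule (the `disk_usage` metric and its labels are assumptions):

```yaml
- name: disk_filling
  match:
    metric: disk_usage
    condition: avg(value) over 10m > 90
  # {{ .avg }} is the window aggregate; {{ .host }} comes from the event
  message: "Disk on {{ .host }} averaging {{ .avg }}% ({{ .rule }} at {{ .fired_at }})"
  alert:
    - notifier: stdout
```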
`stdout` and `github_actions` are built in — no config needed. Everything else is declared once and referenced by name.

- `stdout` — one JSON line per alert. Pipe it anywhere.
- `github_actions` — writes `::warning::` annotations to the live log and appends markdown to `$GITHUB_STEP_SUMMARY`. Falls back to stdout outside Actions.
```yaml
notifiers:
  slack:
    type: slack
    url: ${SLACK_WEBHOOK_URL}
  on-call:
    type: discord
    url: ${DISCORD_WEBHOOK_URL}
  custom:
    type: webhook
    url: https://example.com/hook
    max_attempts: 3      # retries on 5xx (default: 3)
    initial_backoff: 1s  # doubles each attempt (default: 1s)
```
Also supported: `type: teams`, `type: pagerduty`, `type: telegram`. Slack, Discord, and Teams auto-surface run-context fields (exit code, duration, branch, commit, workflow, actor), so alerts from a CI job arrive ready to act on.

`${VAR}` expansion works in any string value and fails fast if the variable is unset.
The generic webhook payload:

```json
{
  "rule": "cpu_spike",
  "message": "CPU spike on web-01: 97%",
  "metric": "cpu_usage",
  "value": 97.0,
  "fired_at": "...",
  "host": "web-01"
}
```
4xx responses are dropped. 5xx responses retry with exponential backoff.
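On the receiving end, anything that answers 2xx counts as delivered. A minimal receiver sketch using Python's standard library (the port, path, and handler name are arbitrary):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # alerts seen so far

class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        received.append(json.loads(body))
        self.send_response(200)  # 2xx: DING treats the alert as delivered
        self.end_headers()

    def log_message(self, *_):  # keep the demo quiet
        pass

# Bind to an ephemeral port; call serve_forever() (or run it in a
# thread) to start accepting webhook POSTs.
server = HTTPServer(("127.0.0.1", 0), HookHandler)
```

Answering 5xx here instead would trigger the exponential-backoff retries described above; a 4xx drops the alert.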
- `avg(value) over 5m` works with no database, just memory.
- `web-01` being loud doesn't silence `web-02`.
Benchmarked 2026-03-23 on Apple M3.
Ready-made configs for the platforms you already use. Drop them in, edit the rules, ship.
ding serve mode
For steady-state apps — anything that runs longer than the workload it watches. Starts an HTTP server on `:8080`, accepts events, evaluates rules continuously, and hot-reloads config without restarting.
```sh
ding serve --config ./ding.yaml

# or pipe stdin
your-app | ding serve

# or POST events
curl -X POST http://localhost:8080/ingest \
  -H "Content-Type: application/json" \
  -d '{"metric":"cpu_usage","value":97,"host":"web-01"}'
```
Accepts JSON lines:

```json
{"metric": "cpu_usage", "value": 92.5, "host": "web-01"}
```

or Prometheus text:

```
cpu_usage{host="web-01"} 92.5
```
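Both formats carry the same information. A rough sketch of the mapping from the simple `name{labels} value` Prometheus form to the JSON event shape (no label escaping, timestamps, or comments handled):

```python
import re

def prom_to_event(line):
    """Map `cpu_usage{host="web-01"} 92.5` to the JSON event shape.

    Simplified sketch: handles only `name{labels} value`.
    """
    m = re.match(r'(\w+)(?:\{([^}]*)\})?\s+(\S+)$', line.strip())
    name, labels, value = m.groups()
    event = {"metric": name, "value": float(value)}
    # each `key="val"` label pair becomes a top-level event field
    for key, val in re.findall(r'(\w+)="([^"]*)"', labels or ""):
        event[key] = val
    return event
```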
| Method | Path | Description |
|---|---|---|
| `POST` | `/ingest` | Send events |
| `GET` | `/health` | Liveness probe |
| `GET` | `/rules` | List rules + cooldown state |
| `POST` | `/reload` | Hot-reload config |
| `GET` | `/metrics` | Prometheus-format self-metrics |
Reload config without restarting:

```sh
kill -HUP <pid>
# or
curl -X POST http://localhost:8080/reload
```
Survive restarts — persist cooldown state and windowed buffers to disk:

```yaml
persistence:
  state_file: /var/lib/ding/state.json
  flush_interval: 30s
```
On SIGTERM / SIGINT, DING drains in-flight requests, flushes state, and exits 0.
Homebrew:

```sh
brew install ding-labs/tap/ding
```

Binary:

```sh
curl -sf https://start.ding.ing | sh
```

Docker:

```sh
docker run -v ./ding.yaml:/etc/ding/ding.yaml \
  ghcr.io/ding-labs/ding
```
GitHub Actions: ding-labs/ding-action on the marketplace — one step in your workflow.
Kubernetes initContainer (self-copy into a shared volume):

```sh
ding install /shared/ding
```
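A sketch of that pattern in a pod spec; the image tags, volume paths, and job command are assumptions to adapt to your own pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job
spec:
  volumes:
    - name: shared
      emptyDir: {}
  initContainers:
    # copy the ding binary into the shared volume before the job starts
    - name: install-ding
      image: ghcr.io/ding-labs/ding
      command: ["ding", "install", "/shared/ding"]
      volumeMounts:
        - { name: shared, mountPath: /shared }
  containers:
    # the workload then wraps itself with the copied binary
    - name: job
      image: my-batch-image   # assumption: your own job image
      command: ["/shared/ding", "run", "--", "python", "job.py"]
      volumeMounts:
        - { name: shared, mountPath: /shared }
```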