diff --git a/todo.md b/todo.md
index b6c7ded..3d0009f 100644
--- a/todo.md
+++ b/todo.md
@@ -1,84 +1,64 @@
 # arcline-uptime — Lightweight Uptime Monitor
 
-Polls HTTP/TCP endpoints on a schedule, stores results in SQLite, sends alerts
-via webhook (Discord, Slack) and/or email. Single binary, no external services.
+Polls HTTP/TCP/TLS/DNS endpoints on a schedule, stores results in SQLite, sends alerts
+via webhook (Discord, Slack), email, ntfy, or Gotify. Single binary, no external services.
 
 ## Stack
 - Language: Go
 - Storage: SQLite (via modernc.org/sqlite — pure Go, no CGO)
 - Config: YAML
-- Alerts: HTTP webhook (Discord/Slack compatible), SMTP email
-- Optional UI: embedded web dashboard (net/http + Go templates)
+- Alerts: Discord/Slack webhook, SMTP email, ntfy.sh, Gotify
+- UI: embedded web dashboard (net/http + Go templates)
 
-## Config format (uptime.yaml)
-```yaml
-global:
-  check_interval: 60 # seconds
-  timeout: 10 # seconds per check
-  alert_cooldown: 300 # seconds between repeat alerts for same monitor
+## Done
+- [x] Project scaffold
+- [x] YAML config parser
+- [x] HTTP monitor (status code, body contains, response time threshold)
+- [x] HTTP POST/PUT with custom body and headers
+- [x] TCP monitor (dial timeout)
+- [x] TLS certificate expiry monitor
+- [x] DNS resolution monitor (optional expected IP assertion)
+- [x] Per-monitor interval and timeout overrides
+- [x] Scheduler (ticker per monitor, immediate first check)
+- [x] SQLite schema (checks, alerts_sent)
+- [x] Result storage with configurable retention / auto-pruning
+- [x] Discord / Slack webhook alerter
+- [x] SMTP email alerter (multiple recipients)
+- [x] ntfy.sh alerter
+- [x] Gotify alerter
+- [x] Per-monitor alert routing (named alerters)
+- [x] Maintenance windows (suppress alerts on a schedule)
+- [x] Alert cooldown logic (don't spam on sustained outage)
+- [x] Recovery alert ("Main Website is back up, was down 12m 34s")
+- [x] Web dashboard — current status page (24h / 7d / 30d uptime)
+- [x] Web dashboard — history / SVG sparkline graph with down markers
+- [x] Web dashboard — incident log
+- [x] Public status page (no auth)
+- [x] /metrics Prometheus endpoint (up, response_ms, uptime_24h, uptime_7d)
+- [x] Basic auth for dashboard
+- [x] systemd unit file example
+- [x] README with self-hosting guide
+- [x] Cross-compile Makefile
+- [x] Structured logging (slog, text or JSON)
+- [x] `start`, `check`, `list`, `version` CLI subcommands
 
-alerts:
-  - type: discord
-    webhook_url: "https://discord.com/api/webhooks/..."
-  - type: email
-    smtp_host: mail.arclineit.com
-    smtp_port: 587
-    from: alerts@arclineit.com
-    to: blake@arclineit.com
-
-monitors:
-  - name: "Main Website"
-    type: http
-    url: "https://arclineit.com"
-    expected_status: 200
-    contains: "[arcline]" # optional string check in body
-
-  - name: "Control Panel"
-    type: http
-    url: "https://cp.arclineit.com"
-    expected_status: 200
-
-  - name: "SSH"
-    type: tcp
-    host: "server1.arclineit.com"
-    port: 22
-
-  - name: "Mail Server"
-    type: tcp
-    host: "mail.arclineit.com"
-    port: 587
-```
-
-## Web dashboard
-- `/` — current status of all monitors (live, auto-refresh)
-- `/history` — response time graph (ASCII sparklines or simple SVG)
-- `/metrics` — Prometheus-compatible text endpoint (optional)
-- Protected by basic auth (config: dashboard.username / dashboard.password)
-
-## Alert format (Discord example)
-```
-[DOWN] Main Website
-Expected 200, got 503
-Checked at 2026-03-03 14:32:01 UTC
-Response time: 8043ms (timeout)
-```
-
-## Tasks
-- [ ] Project scaffold
-- [ ] YAML config parser
-- [ ] HTTP monitor (status code, body contains, response time)
-- [ ] TCP monitor (dial timeout)
-- [ ] Scheduler (ticker per monitor, respect interval)
-- [ ] SQLite schema (monitors, checks, alerts_sent)
-- [ ] Result storage
-- [ ] Discord webhook alerter
-- [ ] SMTP email alerter
-- [ ] Alert cooldown logic (don't spam on sustained outage)
-- [ ] Recovery alert ("Main Website is back up, was down 12m 34s")
-- [ ] Web dashboard — current status page
-- [ ] Web dashboard — history / sparkline graph
-- [ ] /metrics Prometheus endpoint
-- [ ] Basic auth for dashboard
-- [ ] systemd unit file example
-- [ ] README with self-hosting guide
-- [ ] Cross-compile Makefile
+## Ideas
+- [ ] ICMP/ping monitor
+- [ ] HTTP response header assertion
+- [ ] HTTP JSON path check
+- [ ] SSH command check
+- [ ] Generic webhook alerter (configurable template body)
+- [ ] Telegram alerter
+- [ ] PagerDuty alerter
+- [ ] Escalation policy (alert A immediately, alert B after N minutes still down)
+- [ ] `test-alert` subcommand
+- [ ] `validate` subcommand (parse config, print summary, exit non-zero on errors)
+- [ ] `--dry-run` flag for start (run checks, no alerts)
+- [ ] Per-monitor detail page with full history and time-axis chart
+- [ ] Uptime calendar heatmap (GitHub-style, per day)
+- [ ] CSV / JSON export of check history
+- [ ] JSON API (/api/v1/monitors, /api/v1/monitors/{name}/checks)
+- [ ] Environment variable substitution in config (${VAR})
+- [ ] Config hot-reload on SIGHUP
+- [ ] TLS for the dashboard itself
+- [ ] Database backup command