updated todo

2026-03-22 11:29:41 -05:00
parent 22c36b5e2c
commit d593744ff7
1 changed file with 55 additions and 75 deletions
--- a/todo.md
+++ b/todo.md
@@ -1,84 +1,64 @@
 # arcline-uptime — Lightweight Uptime Monitor
-Polls HTTP/TCP endpoints on a schedule, stores results in SQLite, sends alerts
+Polls HTTP/TCP/TLS/DNS endpoints on a schedule, stores results in SQLite, sends alerts
-via webhook (Discord, Slack) and/or email. Single binary, no external services.
+via webhook (Discord, Slack), email, ntfy, or Gotify. Single binary, no external services.
 ## Stack
 - Language: Go
 - Storage: SQLite (via modernc.org/sqlite — pure Go, no CGO)
 - Config: YAML
-- Alerts: HTTP webhook (Discord/Slack compatible), SMTP email
+- Alerts: Discord/Slack webhook, SMTP email, ntfy.sh, Gotify
-- Optional UI: embedded web dashboard (net/http + Go templates)
+- UI: embedded web dashboard (net/http + Go templates)
-## Config format (uptime.yaml)
+## Done
-```yaml
+- [x] Project scaffold
-global:
+- [x] YAML config parser
-  check_interval: 60      # seconds
+- [x] HTTP monitor (status code, body contains, response time threshold)
-  timeout: 10             # seconds per check
+- [x] HTTP POST/PUT with custom body and headers
-  alert_cooldown: 300     # seconds between repeat alerts for same monitor
+- [x] TCP monitor (dial timeout)
+- [x] TLS certificate expiry monitor
+- [x] DNS resolution monitor (optional expected IP assertion)
+- [x] Per-monitor interval and timeout overrides
+- [x] Scheduler (ticker per monitor, immediate first check)
+- [x] SQLite schema (checks, alerts_sent)
+- [x] Result storage with configurable retention / auto-pruning
+- [x] Discord / Slack webhook alerter
+- [x] SMTP email alerter (multiple recipients)
+- [x] ntfy.sh alerter
+- [x] Gotify alerter
+- [x] Per-monitor alert routing (named alerters)
+- [x] Maintenance windows (suppress alerts on a schedule)
+- [x] Alert cooldown logic (don't spam on sustained outage)
+- [x] Recovery alert ("Main Website is back up, was down 12m 34s")
+- [x] Web dashboard — current status page (24h / 7d / 30d uptime)
+- [x] Web dashboard — history / SVG sparkline graph with down markers
+- [x] Web dashboard — incident log
+- [x] Public status page (no auth)
+- [x] /metrics Prometheus endpoint (up, response_ms, uptime_24h, uptime_7d)
+- [x] Basic auth for dashboard
+- [x] systemd unit file example
+- [x] README with self-hosting guide
+- [x] Cross-compile Makefile
+- [x] Structured logging (slog, text or JSON)
+- [x] `start`, `check`, `list`, `version` CLI subcommands
-alerts:
+## Ideas
-  - type: discord
+- [ ] ICMP/ping monitor
-    webhook_url: "https://discord.com/api/webhooks/..."
+- [ ] HTTP response header assertion
-  - type: email
+- [ ] HTTP JSON path check
-    smtp_host: mail.arclineit.com
+- [ ] SSH command check
-    smtp_port: 587
+- [ ] Generic webhook alerter (configurable template body)
-    from: alerts@arclineit.com
+- [ ] Telegram alerter
-    to: blake@arclineit.com
+- [ ] PagerDuty alerter
-
+- [ ] Escalation policy (alert A immediately, alert B after N minutes still down)
-monitors:
+- [ ] `test-alert` subcommand
-  - name: "Main Website"
+- [ ] `validate` subcommand (parse config, print summary, exit non-zero on errors)
-    type: http
+- [ ] `--dry-run` flag for start (run checks, no alerts)
-    url: "https://arclineit.com"
+- [ ] Per-monitor detail page with full history and time-axis chart
-    expected_status: 200
+- [ ] Uptime calendar heatmap (GitHub-style, per day)
-    contains: "[arcline]"        # optional string check in body
+- [ ] CSV / JSON export of check history
-
+- [ ] JSON API (/api/v1/monitors, /api/v1/monitors/{name}/checks)
-  - name: "Control Panel"
+- [ ] Environment variable substitution in config (${VAR})
-    type: http
+- [ ] Config hot-reload on SIGHUP
-    url: "https://cp.arclineit.com"
+- [ ] TLS for the dashboard itself
-    expected_status: 200
+- [ ] Database backup command
-  - name: "SSH"
-    type: tcp
-    host: "server1.arclineit.com"
-    port: 22
-  - name: "Mail Server"
-    type: tcp
-    host: "mail.arclineit.com"
-    port: 587
-```
-## Web dashboard
-- `/` — current status of all monitors (live, auto-refresh)
-- `/history` — response time graph (ASCII sparklines or simple SVG)
-- `/metrics` — Prometheus-compatible text endpoint (optional)
-- Protected by basic auth (config: dashboard.username / dashboard.password)
-## Alert format (Discord example)
-```
-[DOWN] Main Website
-Expected 200, got 503
-Checked at 2026-03-03 14:32:01 UTC
-Response time: 8043ms (timeout)
-```
-## Tasks
-- [ ] Project scaffold
-- [ ] YAML config parser
-- [ ] HTTP monitor (status code, body contains, response time)
-- [ ] TCP monitor (dial timeout)
-- [ ] Scheduler (ticker per monitor, respect interval)
-- [ ] SQLite schema (monitors, checks, alerts_sent)
-- [ ] Result storage
-- [ ] Discord webhook alerter
-- [ ] SMTP email alerter
-- [ ] Alert cooldown logic (don't spam on sustained outage)
-- [ ] Recovery alert ("Main Website is back up, was down 12m 34s")
-- [ ] Web dashboard — current status page
-- [ ] Web dashboard — history / sparkline graph
-- [ ] /metrics Prometheus endpoint
-- [ ] Basic auth for dashboard
-- [ ] systemd unit file example
-- [ ] README with self-hosting guide
-- [ ] Cross-compile Makefile