updated todo

Blake Ridgway
2026-03-22 11:29:41 -05:00
parent 22c36b5e2c
commit d593744ff7

130
todo.md

@@ -1,84 +1,64 @@
 # arcline-uptime — Lightweight Uptime Monitor
-Polls HTTP/TCP endpoints on a schedule, stores results in SQLite, sends alerts
-via webhook (Discord, Slack) and/or email. Single binary, no external services.
+Polls HTTP/TCP/TLS/DNS endpoints on a schedule, stores results in SQLite, sends alerts
+via webhook (Discord, Slack), email, ntfy, or Gotify. Single binary, no external services.
 ## Stack
 - Language: Go
 - Storage: SQLite (via modernc.org/sqlite — pure Go, no CGO)
 - Config: YAML
-- Alerts: HTTP webhook (Discord/Slack compatible), SMTP email
-- Optional UI: embedded web dashboard (net/http + Go templates)
+- Alerts: Discord/Slack webhook, SMTP email, ntfy.sh, Gotify
+- UI: embedded web dashboard (net/http + Go templates)
-## Config format (uptime.yaml)
-```yaml
-global:
-  check_interval: 60 # seconds
-  timeout: 10 # seconds per check
-  alert_cooldown: 300 # seconds between repeat alerts for same monitor
-alerts:
-  - type: discord
-    webhook_url: "https://discord.com/api/webhooks/..."
-  - type: email
-    smtp_host: mail.arclineit.com
-    smtp_port: 587
-    from: alerts@arclineit.com
-    to: blake@arclineit.com
-monitors:
-  - name: "Main Website"
-    type: http
-    url: "https://arclineit.com"
-    expected_status: 200
-    contains: "[arcline]" # optional string check in body
-  - name: "Control Panel"
-    type: http
-    url: "https://cp.arclineit.com"
-    expected_status: 200
-  - name: "SSH"
-    type: tcp
-    host: "server1.arclineit.com"
-    port: 22
-  - name: "Mail Server"
-    type: tcp
-    host: "mail.arclineit.com"
-    port: 587
-```
-## Web dashboard
-- `/` — current status of all monitors (live, auto-refresh)
-- `/history` — response time graph (ASCII sparklines or simple SVG)
-- `/metrics` — Prometheus-compatible text endpoint (optional)
-- Protected by basic auth (config: dashboard.username / dashboard.password)
-## Alert format (Discord example)
-```
-[DOWN] Main Website
-Expected 200, got 503
-Checked at 2026-03-03 14:32:01 UTC
-Response time: 8043ms (timeout)
-```
-## Tasks
-- [ ] Project scaffold
-- [ ] YAML config parser
-- [ ] HTTP monitor (status code, body contains, response time)
-- [ ] TCP monitor (dial timeout)
-- [ ] Scheduler (ticker per monitor, respect interval)
-- [ ] SQLite schema (monitors, checks, alerts_sent)
-- [ ] Result storage
-- [ ] Discord webhook alerter
-- [ ] SMTP email alerter
-- [ ] Alert cooldown logic (don't spam on sustained outage)
-- [ ] Recovery alert ("Main Website is back up, was down 12m 34s")
-- [ ] Web dashboard — current status page
-- [ ] Web dashboard — history / sparkline graph
-- [ ] /metrics Prometheus endpoint
-- [ ] Basic auth for dashboard
-- [ ] systemd unit file example
-- [ ] README with self-hosting guide
-- [ ] Cross-compile Makefile
+## Done
+- [x] Project scaffold
+- [x] YAML config parser
+- [x] HTTP monitor (status code, body contains, response time threshold)
+- [x] HTTP POST/PUT with custom body and headers
+- [x] TCP monitor (dial timeout)
+- [x] TLS certificate expiry monitor
+- [x] DNS resolution monitor (optional expected IP assertion)
+- [x] Per-monitor interval and timeout overrides
+- [x] Scheduler (ticker per monitor, immediate first check)
+- [x] SQLite schema (checks, alerts_sent)
+- [x] Result storage with configurable retention / auto-pruning
+- [x] Discord / Slack webhook alerter
+- [x] SMTP email alerter (multiple recipients)
+- [x] ntfy.sh alerter
+- [x] Gotify alerter
+- [x] Per-monitor alert routing (named alerters)
+- [x] Maintenance windows (suppress alerts on a schedule)
+- [x] Alert cooldown logic (don't spam on sustained outage)
+- [x] Recovery alert ("Main Website is back up, was down 12m 34s")
+- [x] Web dashboard — current status page (24h / 7d / 30d uptime)
+- [x] Web dashboard — history / SVG sparkline graph with down markers
+- [x] Web dashboard — incident log
+- [x] Public status page (no auth)
+- [x] /metrics Prometheus endpoint (up, response_ms, uptime_24h, uptime_7d)
+- [x] Basic auth for dashboard
+- [x] systemd unit file example
+- [x] README with self-hosting guide
+- [x] Cross-compile Makefile
+- [x] Structured logging (slog, text or JSON)
+- [x] `start`, `check`, `list`, `version` CLI subcommands
+## Ideas
+- [ ] ICMP/ping monitor
+- [ ] HTTP response header assertion
+- [ ] HTTP JSON path check
+- [ ] SSH command check
+- [ ] Generic webhook alerter (configurable template body)
+- [ ] Telegram alerter
+- [ ] PagerDuty alerter
+- [ ] Escalation policy (alert A immediately, alert B after N minutes still down)
+- [ ] `test-alert` subcommand
+- [ ] `validate` subcommand (parse config, print summary, exit non-zero on errors)
+- [ ] `--dry-run` flag for start (run checks, no alerts)
+- [ ] Per-monitor detail page with full history and time-axis chart
+- [ ] Uptime calendar heatmap (GitHub-style, per day)
+- [ ] CSV / JSON export of check history
+- [ ] JSON API (/api/v1/monitors, /api/v1/monitors/{name}/checks)
+- [ ] Environment variable substitution in config (${VAR})
+- [ ] Config hot-reload on SIGHUP
+- [ ] TLS for the dashboard itself
+- [ ] Database backup command