feat: remote commands, systemd units, process observability, broker auth split
- Command intake (reboot/shutdown) on infoscreen/{uuid}/commands with ack lifecycle
- MQTT_USER/MQTT_PASSWORD_BROKER split from identity vars; configure_mqtt_security() updated
- infoscreen-simclient.service: Type=notify, WatchdogSec=60, Restart=on-failure
- infoscreen-notify-failure@.service + script: retained MQTT alert when systemd gives up (Gap 3)
- _sd_notify() watchdog keepalive in simclient main loop (Gap 1)
- broker_connection block in health payload: reconnect_count, last_disconnect_at (Gap 2)
- COMMAND_MOCK_REBOOT_IMMEDIATE_COMPLETE canary flag with safety guard
- SERVER_TEAM_ACTIONS.md: server-side integration action items
- Docs: README, CHANGELOG, src/README, copilot-instructions updated
- 43 tests passing
41
CHANGELOG.md
@@ -2,6 +2,47 @@
## April 2026

### Remote Command Intake

- Added MQTT command intake on `infoscreen/{client_id}/commands` (supports `reboot` and `shutdown`).
- Added command acknowledgement publishing to `infoscreen/{client_id}/commands/ack` and `infoscreen/{client_id}/command/ack` with states `accepted`, `rejected`, `execution_started`, `completed`, `failed`.
- Added `COMMAND_HELPER_PATH` environment variable; command execution is delegated to an external shell helper so `simclient.py` requires no elevated privileges.
- Added deduplication of commands by `command_id` with configurable TTL (`COMMAND_DEDUPE_TTL_HOURS`) and max-entries cap (`COMMAND_DEDUPE_MAX_ENTRIES`).
- Added execution timeout (`COMMAND_EXEC_TIMEOUT_SEC`).
- Added `COMMAND_MOCK_REBOOT_IMMEDIATE_COMPLETE` flag for canary and test environments: it completes a mock reboot immediately, without waiting for a process restart. Safety-guarded: the flag only activates when the helper basename is `mock-command-helper.sh`.
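To illustrate the deduplication entry above, here is a minimal sketch of a `command_id` cache with a TTL and a size cap. It is not the code in `simclient.py`: the class name `CommandDedupeCache`, the default values, and the injectable `clock` are assumptions made for illustration. Only the two environment variable names come from this changelog.

```python
import time
from collections import OrderedDict


class CommandDedupeCache:
    """Remembers recently seen command IDs so a redelivered command is not run twice.

    Hypothetical sketch: ttl_hours would come from COMMAND_DEDUPE_TTL_HOURS and
    max_entries from COMMAND_DEDUPE_MAX_ENTRIES; the defaults here are assumed.
    """

    def __init__(self, ttl_hours=24.0, max_entries=1000, clock=time.time):
        self.ttl_sec = ttl_hours * 3600.0
        self.max_entries = max_entries
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._seen = OrderedDict()  # command_id -> first-seen time, oldest first

    def _evict(self, now):
        # Drop the oldest entry while it is expired or the cache is over its cap.
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            expired = now - seen_at >= self.ttl_sec
            if not expired and len(self._seen) <= self.max_entries:
                break
            self._seen.popitem(last=False)

    def is_duplicate(self, command_id):
        """Return True if command_id was already seen; otherwise record it."""
        now = self._clock()
        self._evict(now)
        if command_id in self._seen:
            return True
        self._seen[command_id] = now
        self._evict(now)  # enforce the size cap after inserting
        return False
```

In this sketch a command handler would call `is_duplicate(command_id)` before executing anything and skip execution when it returns `True`. How the real client acknowledges a duplicate is not stated here.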
### MQTT Broker Authentication Split

- Split broker connection credentials (`MQTT_USER`, `MQTT_PASSWORD_BROKER`) from legacy per-device identity fields (`MQTT_USERNAME`, `MQTT_PASSWORD`).
- `configure_mqtt_security()` now prefers `MQTT_USER`/`MQTT_PASSWORD_BROKER` for broker login, with fallback to legacy vars if broker-specific vars are absent.
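The preference order described above could look roughly like the following. This is a sketch, not the body of `configure_mqtt_security()`: the helper name `resolve_broker_credentials` is hypothetical, and treating each user/password pair as a unit (rather than falling back per variable) is an assumption.

```python
import os


def resolve_broker_credentials(env=None):
    """Return (username, password) for the broker login, or (None, None) if unset."""
    env = os.environ if env is None else env

    # Preferred: broker-specific credentials.
    user = env.get("MQTT_USER")
    password = env.get("MQTT_PASSWORD_BROKER")
    if user and password:
        return user, password

    # Fallback: legacy per-device identity variables.
    legacy_user = env.get("MQTT_USERNAME")
    legacy_password = env.get("MQTT_PASSWORD")
    if legacy_user and legacy_password:
        return legacy_user, legacy_password

    return None, None
```

Passing a plain dict as `env` keeps the lookup testable without touching the process environment.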
### Systemd Service Units

- Added `scripts/infoscreen-simclient.service`: systemd unit for `simclient.py` with `Type=notify`, `WatchdogSec=60`, `Restart=on-failure`, `StartLimitBurst=5`.
- Added `scripts/start-simclient.sh`: launcher script mirroring `start-display-manager.sh`.
- Updated `scripts/infoscreen-display.service` with `OnFailure=infoscreen-notify-failure@%n.service`.
- Updated `src/pi-setup.sh` to install and enable both units plus the failure notifier template.
### Process Watchdog (Gap 1: Hung Process Detection)

- Added zero-dependency `_sd_notify()` raw socket helper in `simclient.py` (no `systemd-python` package required).
- Sends `READY=1` on main loop entry and `WATCHDOG=1` on every 5-second iteration.
- Service unit uses `Type=notify` and `WatchdogSec=60`; systemd will restart the process if it stops sending keepalives for 60 seconds.
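For readers unfamiliar with the notify protocol, a dependency-free helper of this kind can be sketched as below. This is an illustrative version, not the `_sd_notify()` from `simclient.py`. The function name `sd_notify` and its boolean return value are assumptions. The protocol itself (a datagram sent to the Unix socket named by `NOTIFY_SOCKET`) is standard systemd behaviour.

```python
import os
import socket


def sd_notify(state):
    """Send a state string such as "READY=1" or "WATCHDOG=1" to systemd.

    Returns False when NOTIFY_SOCKET is unset (not started by systemd with
    notification enabled) or when sending fails, so callers can ignore it.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False

    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket,
        # which is addressed with a leading NUL byte instead.
        address = b"\0" + path[1:].encode("utf-8")
    else:
        address = path

    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
            sock.sendto(state.encode("utf-8"), address)
        return True
    except OSError:
        return False
```

With a helper like this, the main loop would call `sd_notify("READY=1")` once on entry and `sd_notify("WATCHDOG=1")` on each 5-second iteration, which stays well inside the 60-second `WatchdogSec` window.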
### OnFailure MQTT Notifier (Gap 3: systemd Give-Up Detection)

- Added `scripts/infoscreen-notify-failure@.service`: systemd template unit triggered by `OnFailure=`.
- Added `scripts/infoscreen-notify-failure.sh`: publishes a retained JSON payload to `infoscreen/{uuid}/service_failed` via `mosquitto_pub` so the monitoring dashboard gets an alert even when the process is fully dead.
- Payload: `{"event":"service_failed","unit":"<unit-name>","client_uuid":"...","failed_at":"<ISO-UTC>"}`.
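The notifier itself is a shell script, but the message shape can be sketched in Python, for example to build fixtures for server-side tests. The function name `build_service_failed_message` is hypothetical, and the exact timestamp format (seconds precision with a `Z` suffix) is an assumption, since the changelog only says `<ISO-UTC>`.

```python
import json
from datetime import datetime, timezone


def build_service_failed_message(unit, client_uuid, now=None):
    """Return (topic, payload_json) for a retained service-failure alert."""
    now = now or datetime.now(timezone.utc)
    payload = {
        "event": "service_failed",
        "unit": unit,
        "client_uuid": client_uuid,
        # Assumed format: UTC, seconds precision, trailing "Z".
        "failed_at": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    topic = f"infoscreen/{client_uuid}/service_failed"
    return topic, json.dumps(payload, separators=(",", ":"))
```

Publishing the result with the retain flag set (`-r` in `mosquitto_pub`) is what lets a dashboard that subscribes later still see the alert.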
### Health Payload Broker Connection Block (Gap 2: Broker vs. Process Ambiguity)

- Added `broker_connection` block to the health payload: `broker_reachable`, `reconnect_count`, `connect_count`, `last_disconnect_at`.
- `simclient.py` now updates `reconnect_count` and `connect_count` on every `on_connect` callback and records the `last_disconnect` timestamp on `on_disconnect`.
- `publish_health_message()` accepts an optional `connection_state` parameter; both heartbeat-success call sites pass the enriched state.
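A minimal sketch of the bookkeeping behind such a block is shown below. The class `BrokerConnectionState` and its method names are hypothetical and deliberately simpler than real MQTT client callbacks. Counting every connect after the first as a reconnect, deriving `broker_reachable` from the last callback seen, and the ISO timestamp format are all assumptions.

```python
from datetime import datetime, timezone


class BrokerConnectionState:
    """Tracks connect/disconnect events and renders a broker_connection block."""

    def __init__(self):
        self.connected = False
        self.connect_count = 0
        self.reconnect_count = 0
        self.last_disconnect = None

    def on_connect(self):
        self.connect_count += 1
        if self.connect_count > 1:
            # Assumption: every connect after the first counts as a reconnect.
            self.reconnect_count += 1
        self.connected = True

    def on_disconnect(self, now=None):
        self.connected = False
        self.last_disconnect = now or datetime.now(timezone.utc)

    def as_health_block(self):
        return {
            "broker_reachable": self.connected,
            "reconnect_count": self.reconnect_count,
            "connect_count": self.connect_count,
            "last_disconnect_at": (
                self.last_disconnect.isoformat() if self.last_disconnect else None
            ),
        }
```

A rising `reconnect_count` alongside a live process then points at the broker or network rather than at the client itself, which is the ambiguity this change set out to resolve.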
### TV Power Coordination

- Added Phase 1 TV power coordination on `infoscreen/groups/{group_id}/power/intent`.
- Added `POWER_CONTROL_MODE` with `local`, `hybrid`, and `mqtt` modes.
- Added `src/power_intent_state.json` and `src/power_state.json` for power IPC and telemetry.