Files
infoscreen-dev/TODO.md
RobbStarkAustria 0cd0d95612 feat: remote commands, systemd units, process observability, broker auth split
- Command intake (reboot/shutdown) on infoscreen/{uuid}/commands with ack lifecycle
- MQTT_USER/MQTT_PASSWORD_BROKER split from identity vars; configure_mqtt_security() updated
- infoscreen-simclient.service: Type=notify, WatchdogSec=60, Restart=on-failure
- infoscreen-notify-failure@.service + script: retained MQTT alert when systemd gives up (Gap 3)
- _sd_notify() watchdog keepalive in simclient main loop (Gap 1)
- broker_connection block in health payload: reconnect_count, last_disconnect_at (Gap 2)
- COMMAND_MOCK_REBOOT_IMMEDIATE_COMPLETE canary flag with safety guard
- SERVER_TEAM_ACTIONS.md: server-side integration action items
- Docs: README, CHANGELOG, src/README, copilot-instructions updated
- 43 tests passing
2026-04-05 08:36:50 +02:00

129 lines
7.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Project TODOs
This file tracks higher-level todos and design notes for the infoscreen client.
## Video playback (Raspberry Pi)
- Remove taskbar / window decorations in VLC window
- Goal: show video truly fullscreen without window title bar or OS panel/taskbar overlapping.
- Ideas / approaches:
- Use libVLC options from python-vlc: `--no-video-deco`, `--no-video-title-show`, `--video-on-top`, and call `player.set_fullscreen(True)` after playback starts.
- Run the Display Manager in a dedicated kiosk X session (no panel/desktop environment) or minimal window manager (openbox/matchbox) to avoid taskbar.
- Use `wmctrl` as a fallback to force fullscreen/above: `wmctrl -r <title> -b add,fullscreen,above`.
- Add an env var toggle, e.g. `VLC_KIOSK=1`, to enable these options from runtime.
- Acceptance criteria:
- Video occupies the full display area with no visible window controls or panels on top.
- Behaviour toggleable via env var.
- Add possibility to adjust sound level by HDMI-CEC using Python
- Goal: allow remote/automatic volume control over HDMI using CEC-capable hardware.
- Ideas / approaches:
- Use `libcec` bindings (e.g. `pycec` / `cec` packages) or call `cec-client` from shell to send volume commands to the TV/AVR.
- Map event volume (0.0-1.0) to CEC volume commands (some devices support absolute volume or key presses like `VOLUME_UP`/`VOLUME_DOWN`).
- Provide a small adapter module `src/hdmi_cec.py` that exposes `set_volume(level: float)` and `volume_step(up: bool)` used by `display_manager.py` when starting/stopping videos or on explicit volume events.
- Acceptance criteria:
- `set_volume()` issues appropriate CEC commands and returns success/failure.
- Document any platform limitations (some TVs don't support absolute volume via CEC).
## Systemd crash recovery (server team recommendation)
Reliable restart-on-crash for both processes must be handled by **systemd**, not by in-process watchdogs or ad-hoc shell scripts.
### What needs to be done
- `display_manager`: already has `scripts/infoscreen-display.service` with `Restart=on-failure` / `RestartSec=10`.
- Review `RestartSec` — may want a short backoff (e.g. 515 s) and `StartLimitIntervalSec` + `StartLimitBurst` to prevent thrash loops.
- `simclient`: **no service unit exists yet**.
- Create `scripts/infoscreen-simclient.service` modelled on the display service.
- Use `Restart=on-failure` and `RestartSec=10`.
- Wire `EnvironmentFile=/home/olafn/infoscreen-dev/.env` so the unit picks up `.env` variables automatically.
- Set `After=network-online.target` so MQTT connection is not attempted before the network is ready.
- Both units should be installed and enabled via `src/pi-setup.sh` (`systemctl enable --now`).
- After enabling, verify crash recovery with `kill -9 <pid>` and confirm systemd restarts the process within `RestartSec`.
### Acceptance criteria
- Both `simclient` and `display_manager` restart automatically within 15 s of any non-intentional exit.
- `systemctl status` shows `active (running)` after a crash-induced restart.
- `journalctl -u infoscreen-simclient` captures all process output (stdout + stderr).
- `pi-setup.sh` idempotently installs and enables both units.
### Notes
- Use `Restart=on-failure` — restarts on crashes and signals but not on clean `systemctl stop`, preserving operator control during deployments.
- The reboot/shutdown command flow publishes `execution_started` and then exits intentionally; systemd will restart simclient, and the recovery logic in the heartbeat loop will emit `completed` on reconnect. This is the intended lifecycle.
## Process health observability gaps
Two scenarios are currently undetected or ambiguous from the server/frontend perspective.
### Gap 1: Hung / deadlocked process ✅ implemented
**Solution implemented:** Zero-dependency `_sd_notify()` helper writes directly to `NOTIFY_SOCKET` (raw Unix socket, no extra package). `READY=1` is sent when the heartbeat loop starts; `WATCHDOG=1` is sent every 5 s in the main loop iteration. The service unit uses `Type=notify` + `WatchdogSec=60` — if the main loop freezes for 60 s, systemd kills and restarts the process automatically.
**To apply on device:**
```bash
sudo cp ~/infoscreen-dev/scripts/infoscreen-simclient.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl restart infoscreen-simclient
```
### Gap 2: MQTT broker unreachable vs. simclient dead ✅ implemented (client side)
**Solution implemented:** `connection_state` dict expanded with `reconnect_count` and `connect_count`. `publish_health_message()` now accepts `connection_state` and appends a `broker_connection` block to every health payload:
```json
"broker_connection": {
"broker_reachable": true,
"reconnect_count": 2,
"last_disconnect_at": "2026-04-04T10:00:00Z"
}
```
`broker_reachable` = `true` when MQTT is connected at publish time.
`reconnect_count` increments on every reconnection (first connect does not count).
`last_disconnect_at` is the UTC timestamp of the most recent disconnect.
**Server-side action still needed:**
- Display `reconnect_count` and `last_disconnect_at` in the frontend health dashboard.
- Alert heuristic: if **all** clients go silent simultaneously → likely broker issue; if only one → likely device issue.
### Gap 3: systemd gives up (StartLimitBurst exceeded) ✅ implemented
**Solution implemented:** `scripts/infoscreen-notify-failure@.service` (template unit) + `scripts/infoscreen-notify-failure.sh`. Both main units have `OnFailure=infoscreen-notify-failure@%n.service`. When systemd marks a service `failed`, the notifier runs once, reads broker credentials from `.env`, reads `client_uuid.txt`, and publishes a retained JSON payload to `infoscreen/{uuid}/service_failed` via `mosquitto_pub`.
**To apply on device:**
```bash
sudo cp ~/infoscreen-dev/scripts/infoscreen-notify-failure@.service /etc/systemd/system/
sudo cp ~/infoscreen-dev/scripts/infoscreen-simclient.service /etc/systemd/system/
sudo cp ~/infoscreen-dev/scripts/infoscreen-display.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl restart infoscreen-simclient infoscreen-display
```
**Topic:** `infoscreen/{client_uuid}/service_failed` (retained)
**Payload:** `{"event":"service_failed","unit":"infoscreen-simclient.service","client_uuid":"...","failed_at":"2026-..."}`
## Next-high-level items
- Add environment-controlled libVLC hw-accel toggle (`VLC_HW_ACCEL=1|0`) to `display_manager.py` so software decode can be forced when necessary.
- Add automated tests for video start/stop lifecycle (mock python-vlc) to ensure resources are released on event end.
- Add allowlist validation for `website` / `webuntis` event URLs
- Goal: restrict browser-based events to approved hosts and schemes even if an authenticated publisher sends an unsafe URL.
- Ideas / approaches:
- Add env-configurable allowlists for general website hosts and WebUntis hosts.
- Allow only `https` by default and reject `file:`, `data:`, `javascript:`, loopback, and private-address URLs unless explicitly allowed.
- Enforce the same validation on both server-side payload generation and client-side execution in `display_manager.py`.
- Acceptance criteria:
- Unsafe or unapproved URLs are rejected before Chromium is launched.
- WebUntis and approved website events still work with explicit allowlist configuration.
## Notes
- Keep all changes backward-compatible: external `vlc` binary fallback should still work.
- Document any new env vars in `README.md` and `.github/copilot-instructions.md` if added.
---
Generated: 2025-10-25