- Command intake (reboot/shutdown) on infoscreen/{uuid}/commands with ack lifecycle
- MQTT_USER/MQTT_PASSWORD_BROKER split from identity vars; configure_mqtt_security() updated
- infoscreen-simclient.service: Type=notify, WatchdogSec=60, Restart=on-failure
- infoscreen-notify-failure@.service + script: retained MQTT alert when systemd gives up (Gap 3)
- _sd_notify() watchdog keepalive in simclient main loop (Gap 1)
- broker_connection block in health payload: reconnect_count, last_disconnect_at (Gap 2)
- COMMAND_MOCK_REBOOT_IMMEDIATE_COMPLETE canary flag with safety guard
- SERVER_TEAM_ACTIONS.md: server-side integration action items
- Docs: README, CHANGELOG, src/README, copilot-instructions updated
- 43 tests passing
129 lines
7.5 KiB
Markdown
# Project TODOs

This file tracks higher-level todos and design notes for the infoscreen client.

## Video playback (Raspberry Pi)

- Remove taskbar / window decorations in VLC window
  - Goal: show video truly fullscreen, with no window title bar and no OS panel/taskbar overlapping it.
  - Ideas / approaches:
    - Use libVLC options from python-vlc: `--no-video-deco`, `--no-video-title-show`, `--video-on-top`, and call `player.set_fullscreen(True)` after playback starts.
    - Run the Display Manager in a dedicated kiosk X session (no panel/desktop environment) or a minimal window manager (openbox/matchbox) to avoid the taskbar.
    - Use `wmctrl` as a fallback to force fullscreen/above: `wmctrl -r <title> -b add,fullscreen,above`.
    - Add an env var toggle, e.g. `VLC_KIOSK=1`, to enable these options at runtime.
  - Acceptance criteria:
    - Video occupies the full display area with no visible window controls or panels on top.
    - Behaviour toggleable via env var.

- Add the ability to adjust the sound level via HDMI-CEC from Python
  - Goal: allow remote/automatic volume control over HDMI using CEC-capable hardware.
  - Ideas / approaches:
    - Use `libcec` bindings (e.g. `pycec` / `cec` packages) or call `cec-client` from the shell to send volume commands to the TV/AVR.
    - Map event volume (0.0-1.0) to CEC volume commands (some devices support absolute volume, others only key presses like `VOLUME_UP`/`VOLUME_DOWN`).
    - Provide a small adapter module `src/hdmi_cec.py` that exposes `set_volume(level: float)` and `volume_step(up: bool)`, used by `display_manager.py` when starting/stopping videos or on explicit volume events.
  - Acceptance criteria:
    - `set_volume()` issues appropriate CEC commands and returns success/failure.
    - Document any platform limitations (some TVs don't support absolute volume via CEC).

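A minimal sketch of the `VLC_KIOSK` toggle from the first item above, assuming the flag is read from the environment and simply switches the three libVLC options on or off. The helper name `kiosk_vlc_args` is hypothetical and not part of `display_manager.py`.

```python
import os


def kiosk_vlc_args(env=None):
    """Return extra libVLC options when the proposed VLC_KIOSK toggle is "1"."""
    env = os.environ if env is None else env
    if env.get("VLC_KIOSK", "0") != "1":
        return []
    return ["--no-video-deco", "--no-video-title-show", "--video-on-top"]


# Usage with python-vlc (needs the package and a display, so left as comments):
#   import vlc
#   instance = vlc.Instance(*kiosk_vlc_args())
#   player = instance.media_player_new()
#   player.set_media(instance.media_new("/path/to/video.mp4"))
#   player.play()
#   player.set_fullscreen(True)  # call after playback has started
```

Keeping the option list in one function means the `wmctrl` fallback and the kiosk X session stay independent of the toggle.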
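One possible shape for the `src/hdmi_cec.py` adapter, sketched under assumptions: it shells out to `cec-client` in single-command mode (`-s`) and sends the `volup` / `voldown` commands, which should be verified against the libCEC version on the device. `TOTAL_STEPS` and `steps_for_level` are hypothetical; they approximate an absolute level by counting key presses for TVs without absolute volume support.

```python
import shutil
import subprocess

TOTAL_STEPS = 100  # assumption: the TV exposes roughly 100 volume steps


def _clamp(value: float) -> float:
    return max(0.0, min(1.0, value))


def steps_for_level(current: float, target: float, total_steps: int = TOTAL_STEPS) -> int:
    """Signed number of key presses to move from current to target (both 0.0-1.0)."""
    return round((_clamp(target) - _clamp(current)) * total_steps)


def volume_step(up: bool, timeout: float = 10.0) -> bool:
    """Send one volume key press via cec-client; return True on success."""
    if shutil.which("cec-client") is None:
        return False  # cec-utils not installed
    command = "volup" if up else "voldown"
    try:
        result = subprocess.run(
            ["cec-client", "-s", "-d", "1"],
            input=command + "\n",
            text=True,
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

A `set_volume(level)` built on top of this would need to remember the last level it set, since key presses are relative.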
## Systemd crash recovery (server team recommendation)

Reliable restart-on-crash for both processes must be handled by **systemd**, not by in-process watchdogs or ad-hoc shell scripts.

### What needs to be done

- `display_manager`: already has `scripts/infoscreen-display.service` with `Restart=on-failure` / `RestartSec=10`.
  - Review `RestartSec`: a short backoff (e.g. 5–15 s) plus `StartLimitIntervalSec` and `StartLimitBurst` may be needed to prevent thrash loops.
- `simclient`: **no service unit exists yet**.
  - Create `scripts/infoscreen-simclient.service` modelled on the display service.
  - Use `Restart=on-failure` and `RestartSec=10`.
  - Wire `EnvironmentFile=/home/olafn/infoscreen-dev/.env` so the unit picks up `.env` variables automatically.
  - Set `After=network-online.target` (together with `Wants=network-online.target`, since `After=` alone only orders the units) so the MQTT connection is not attempted before the network is ready.
- Both units should be installed and enabled via `src/pi-setup.sh` (`systemctl enable --now`).
- After enabling, verify crash recovery with `kill -9 <pid>` and confirm systemd restarts the process within `RestartSec`.

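The `kill -9` verification step could be scripted roughly as follows. This is a sketch, not an existing project script: it only parses the `KEY=VALUE` output of `systemctl show -p MainPID -p ActiveState <unit>` taken before and after the kill, and the function names are hypothetical.

```python
def parse_show(output: str) -> dict:
    """Parse KEY=VALUE lines as printed by `systemctl show -p ...`."""
    pairs = (line.split("=", 1) for line in output.splitlines() if "=" in line)
    return {key: value for key, value in pairs}


def was_restarted(before: dict, after: dict) -> bool:
    """True if the unit is active again and runs under a new main PID."""
    new_pid = after.get("MainPID")
    return (
        after.get("ActiveState") == "active"
        and new_pid not in (None, "0", before.get("MainPID"))
    )


# Intended flow on the device (left as comments, needs systemd and root):
#   before = parse_show(<output of systemctl show -p MainPID -p ActiveState infoscreen-simclient>)
#   kill -9 <before["MainPID"]>
#   wait slightly longer than RestartSec
#   after = parse_show(<same systemctl show call>)
#   assert was_restarted(before, after)
```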
### Acceptance criteria

- Both `simclient` and `display_manager` restart automatically within 15 s of any unintentional exit.
- `systemctl status` shows `active (running)` after a crash-induced restart.
- `journalctl -u infoscreen-simclient` captures all process output (stdout + stderr).
- `pi-setup.sh` idempotently installs and enables both units.

### Notes

- Use `Restart=on-failure`: it restarts on crashes and signals but not on a clean `systemctl stop`, preserving operator control during deployments.
- The reboot/shutdown command flow publishes `execution_started` and then exits intentionally; systemd will restart simclient, and the recovery logic in the heartbeat loop will emit `completed` on reconnect. This is the intended lifecycle.

## Process health observability gaps

The following scenarios were previously undetected or ambiguous from the server/frontend perspective.

### Gap 1: Hung / deadlocked process ✅ implemented

**Solution implemented:** Zero-dependency `_sd_notify()` helper writes directly to `NOTIFY_SOCKET` (raw Unix socket, no extra package). `READY=1` is sent when the heartbeat loop starts; `WATCHDOG=1` is sent every 5 s from the main loop. The service unit uses `Type=notify` + `WatchdogSec=60`, so if the main loop freezes for 60 s, systemd kills and restarts the process automatically.

**To apply on device:**

```bash
sudo cp ~/infoscreen-dev/scripts/infoscreen-simclient.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl restart infoscreen-simclient
```

### Gap 2: MQTT broker unreachable vs. simclient dead ✅ implemented (client side)

**Solution implemented:** `connection_state` dict expanded with `reconnect_count` and `connect_count`. `publish_health_message()` now accepts `connection_state` and appends a `broker_connection` block to every health payload:

```json
"broker_connection": {
  "broker_reachable": true,
  "reconnect_count": 2,
  "last_disconnect_at": "2026-04-04T10:00:00Z"
}
```

- `broker_reachable` is `true` when MQTT is connected at publish time.
- `reconnect_count` increments on every reconnection (the first connect does not count).
- `last_disconnect_at` is the UTC timestamp of the most recent disconnect.

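The counting rules above can be illustrated with a small sketch. Only `reconnect_count`, `connect_count`, `last_disconnect_at` and `broker_reachable` come from this document; the `connected` key and all function names are assumptions, and the real simclient callbacks may be structured differently.

```python
from datetime import datetime, timezone


def new_connection_state() -> dict:
    return {
        "connected": False,  # hypothetical key backing broker_reachable
        "connect_count": 0,
        "reconnect_count": 0,
        "last_disconnect_at": None,
    }


def on_connected(state: dict) -> None:
    state["connected"] = True
    state["connect_count"] += 1
    if state["connect_count"] > 1:  # the first connect is not a reconnect
        state["reconnect_count"] += 1


def on_disconnected(state: dict, now=None) -> None:
    now = now or datetime.now(timezone.utc)
    state["connected"] = False
    state["last_disconnect_at"] = now.strftime("%Y-%m-%dT%H:%M:%SZ")


def broker_connection_block(state: dict) -> dict:
    """Build the block appended to the health payload."""
    return {
        "broker_reachable": state["connected"],
        "reconnect_count": state["reconnect_count"],
        "last_disconnect_at": state["last_disconnect_at"],
    }
```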
**Server-side action still needed:**

- Display `reconnect_count` and `last_disconnect_at` in the frontend health dashboard.
- Alert heuristic: if **all** clients go silent simultaneously → likely broker issue; if only one → likely device issue.

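The alert heuristic could start out as simple as the sketch below. It is server-side, not part of this repository, and the function name and return labels are hypothetical. How "silent" is determined (e.g. a missed-heartbeat timeout) is left to the server.

```python
def classify_silence(all_clients: set, silent_clients: set) -> str:
    """Rough guess at the cause when some clients stop reporting."""
    if not silent_clients:
        return "ok"
    if len(all_clients) > 1 and silent_clients >= all_clients:
        return "broker_suspected"  # everyone went quiet at once
    if len(silent_clients) == 1:
        return "device_suspected"  # a single client went quiet
    return "unclear"               # several but not all: needs a closer look
```

With a single registered client the two cases cannot be told apart, so the sketch reports `device_suspected` there.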
### Gap 3: systemd gives up (StartLimitBurst exceeded) ✅ implemented

**Solution implemented:** `scripts/infoscreen-notify-failure@.service` (template unit) + `scripts/infoscreen-notify-failure.sh`. Both main units have `OnFailure=infoscreen-notify-failure@%n.service`. When systemd marks a service `failed`, the notifier runs once, reads broker credentials from `.env`, reads `client_uuid.txt`, and publishes a retained JSON payload to `infoscreen/{uuid}/service_failed` via `mosquitto_pub`.

**To apply on device:**

```bash
sudo cp ~/infoscreen-dev/scripts/infoscreen-notify-failure@.service /etc/systemd/system/
sudo cp ~/infoscreen-dev/scripts/infoscreen-simclient.service /etc/systemd/system/
sudo cp ~/infoscreen-dev/scripts/infoscreen-display.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl restart infoscreen-simclient infoscreen-display
```

**Topic:** `infoscreen/{client_uuid}/service_failed` (retained)

**Payload:** `{"event":"service_failed","unit":"infoscreen-simclient.service","client_uuid":"...","failed_at":"2026-..."}`

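For server-side tests it can help to reproduce the message shape. The sketch below is a Python rendering of the topic and payload above, not the shell notifier itself; the `failed_at` timestamp format (ISO 8601 UTC) and the QoS value are assumptions, and both function names are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_service_failed(unit: str, client_uuid: str, now=None):
    """Return (topic, payload) for the retained service_failed alert."""
    now = now or datetime.now(timezone.utc)
    topic = f"infoscreen/{client_uuid}/service_failed"
    payload = json.dumps(
        {
            "event": "service_failed",
            "unit": unit,
            "client_uuid": client_uuid,
            "failed_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),  # assumed format
        },
        separators=(",", ":"),
    )
    return topic, payload


def mosquitto_pub_argv(topic, payload, host, port, user, password):
    """Argument list for a retained publish via mosquitto_pub (QoS 1 assumed)."""
    return [
        "mosquitto_pub", "-h", host, "-p", str(port),
        "-u", user, "-P", password,
        "-t", topic, "-m", payload, "-r", "-q", "1",
    ]
```

Since the message is retained, the server should expect to receive it again on every new subscription until it is cleared.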
## Next-high-level items

- Add environment-controlled libVLC hw-accel toggle (`VLC_HW_ACCEL=1|0`) to `display_manager.py` so software decode can be forced when necessary.
- Add automated tests for video start/stop lifecycle (mock python-vlc) to ensure resources are released on event end.
- Add allowlist validation for `website` / `webuntis` event URLs
  - Goal: restrict browser-based events to approved hosts and schemes even if an authenticated publisher sends an unsafe URL.
  - Ideas / approaches:
    - Add env-configurable allowlists for general website hosts and WebUntis hosts.
    - Allow only `https` by default and reject `file:`, `data:`, `javascript:`, loopback, and private-address URLs unless explicitly allowed.
    - Enforce the same validation on both server-side payload generation and client-side execution in `display_manager.py`.
  - Acceptance criteria:
    - Unsafe or unapproved URLs are rejected before Chromium is launched.
    - WebUntis and approved website events still work with explicit allowlist configuration.

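The URL allowlist check might look roughly like this. It is a sketch under assumptions: the env var name `WEBSITE_ALLOWED_HOSTS`, the comma-separated format, and both function names are hypothetical, and the check does not resolve DNS, so a hostname that points at a private address is only caught by keeping it off the allowlist.

```python
import ipaddress
import os
from urllib.parse import urlsplit


def hosts_from_env(name: str = "WEBSITE_ALLOWED_HOSTS", env=None) -> set:
    """Read a comma-separated host allowlist from the environment."""
    env = os.environ if env is None else env
    return {h.strip().lower() for h in env.get(name, "").split(",") if h.strip()}


def is_url_allowed(url, allowed_hosts, allowed_schemes=("https",), allow_private=False):
    """True only for allowlisted hosts on allowed schemes."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        return False
    if parts.scheme not in allowed_schemes or not host:
        return False  # rejects file:, data:, javascript:, http: by default
    if not allow_private:
        if host == "localhost":
            return False
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            ip = None  # not an IP literal
        if ip is not None and (ip.is_loopback or ip.is_private or ip.is_link_local):
            return False
    return host in allowed_hosts
```

Running the same function on the server before publishing and in `display_manager.py` before launching Chromium would keep both sides consistent.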
## Notes

- Keep all changes backward-compatible: external `vlc` binary fallback should still work.
- Document any new env vars in `README.md` and `.github/copilot-instructions.md` if added.

---

Generated: 2025-10-25