# Project TODOs

This file tracks higher-level todos and design notes for the infoscreen client.

## Video playback (Raspberry Pi)

- Remove taskbar / window decorations in VLC window
  - Goal: show video truly fullscreen without window title bar or OS panel/taskbar overlapping.
  - Ideas / approaches:
    - Use libVLC options from python-vlc: `--no-video-deco`, `--no-video-title-show`, `--video-on-top`, and call `player.set_fullscreen(True)` after playback starts.
    - Run the Display Manager in a dedicated kiosk X session (no panel/desktop environment) or minimal window manager (openbox/matchbox) to avoid taskbar.
    - Use `wmctrl` as a fallback to force fullscreen/above: `wmctrl -r <window-title> -b add,fullscreen,above` (`-r` requires a window argument).
    - Add an env var toggle, e.g. `VLC_KIOSK=1`, to enable these options at runtime.
  - Acceptance criteria:
    - Video occupies the full display area with no visible window controls or panels on top.
    - Behaviour toggleable via env var.

- Add possibility to adjust sound level by HDMI-CEC using Python
  - Goal: allow remote/automatic volume control over HDMI using CEC-capable hardware.
  - Ideas / approaches:
    - Use `libcec` bindings (e.g. `pycec` / `cec` packages) or call `cec-client` from shell to send volume commands to the TV/AVR.
    - Map event volume (0.0-1.0) to CEC volume commands (some devices support absolute volume or key presses like `VOLUME_UP`/`VOLUME_DOWN`).
    - Provide a small adapter module `src/hdmi_cec.py` that exposes `set_volume(level: float)` and `volume_step(up: bool)` used by `display_manager.py` when starting/stopping videos or on explicit volume events.
  - Acceptance criteria:
    - `set_volume()` issues appropriate CEC commands and returns success/failure.
    - Document any platform limitations (some TVs don't support absolute volume via CEC).

## Systemd crash recovery (server team recommendation)

Reliable restart-on-crash for both processes must be handled by **systemd**, not by in-process watchdogs or ad-hoc shell scripts.
### What needs to be done

- `display_manager`: already has `scripts/infoscreen-display.service` with `Restart=on-failure` / `RestartSec=10`.
  - Review `RestartSec`: may want a short backoff (e.g. 5–15 s) and `StartLimitIntervalSec` + `StartLimitBurst` to prevent thrash loops.
- `simclient`: **no service unit exists yet**.
  - Create `scripts/infoscreen-simclient.service` modelled on the display service.
  - Use `Restart=on-failure` and `RestartSec=10`.
  - Wire `EnvironmentFile=/home/olafn/infoscreen-dev/.env` so the unit picks up `.env` variables automatically.
  - Set `After=network-online.target` so MQTT connection is not attempted before the network is ready.
- Both units should be installed and enabled via `src/pi-setup.sh` (`systemctl enable --now`).
- After enabling, verify crash recovery with `kill -9 <pid>` and confirm systemd restarts the process within `RestartSec`.

### Acceptance criteria

- Both `simclient` and `display_manager` restart automatically within 15 s of any non-intentional exit.
- `systemctl status` shows `active (running)` after a crash-induced restart.
- `journalctl -u infoscreen-simclient` captures all process output (stdout + stderr).
- `pi-setup.sh` idempotently installs and enables both units.

### Notes

- Use `Restart=on-failure`: it restarts on crashes and signals but not on clean `systemctl stop`, preserving operator control during deployments.
- The reboot/shutdown command flow publishes `execution_started` and then exits intentionally; systemd will restart simclient, and the recovery logic in the heartbeat loop will emit `completed` on reconnect. This is the intended lifecycle.

## Process health observability gaps

Three scenarios are currently undetected or ambiguous from the server/frontend perspective.

### Gap 1: Hung / deadlocked process ✅ implemented

**Solution implemented:** Zero-dependency `_sd_notify()` helper writes directly to `NOTIFY_SOCKET` (raw Unix socket, no extra package).
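
For readers unfamiliar with the notify protocol, a helper of this kind can be sketched roughly as follows. This is a minimal illustration, not the project's actual `_sd_notify()`: the name `sd_notify_sketch` and the boolean return value are assumptions. The only protocol facts relied on are that systemd passes the socket path in `NOTIFY_SOCKET` and that a leading `@` denotes an abstract-namespace socket.

```python
import os
import socket


def sd_notify_sketch(message: str) -> bool:
    """Send one datagram (e.g. "READY=1" or "WATCHDOG=1") to systemd.

    Hypothetical helper for illustration. Returns False when the process
    was not started by systemd with a notify socket.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not running under systemd notify supervision
    if path.startswith("@"):
        # A leading "@" means a Linux abstract-namespace socket,
        # which is addressed with a leading NUL byte instead.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), path)
    return True
```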
`READY=1` is sent when the heartbeat loop starts; `WATCHDOG=1` is sent every 5 s in the main loop iteration. The service unit uses `Type=notify` + `WatchdogSec=60`: if the main loop freezes for 60 s, systemd kills and restarts the process automatically.

**To apply on device:**

```bash
sudo cp ~/infoscreen-dev/scripts/infoscreen-simclient.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl restart infoscreen-simclient
```

### Gap 2: MQTT broker unreachable vs. simclient dead ✅ implemented (client side)

**Solution implemented:** `connection_state` dict expanded with `reconnect_count` and `connect_count`. `publish_health_message()` now accepts `connection_state` and appends a `broker_connection` block to every health payload:

```json
"broker_connection": {
  "broker_reachable": true,
  "reconnect_count": 2,
  "last_disconnect_at": "2026-04-04T10:00:00Z"
}
```

- `broker_reachable` = `true` when MQTT is connected at publish time.
- `reconnect_count` increments on every reconnection (first connect does not count).
- `last_disconnect_at` is the UTC timestamp of the most recent disconnect.

**Server-side action still needed:**

- Display `reconnect_count` and `last_disconnect_at` in the frontend health dashboard.
- Alert heuristic: if **all** clients go silent simultaneously → likely broker issue; if only one → likely device issue.

### Gap 3: systemd gives up (StartLimitBurst exceeded) ✅ implemented

**Solution implemented:** `scripts/infoscreen-notify-failure@.service` (template unit) + `scripts/infoscreen-notify-failure.sh`. Both main units have `OnFailure=infoscreen-notify-failure@%n.service`. When systemd marks a service `failed`, the notifier runs once, reads broker credentials from `.env`, reads `client_uuid.txt`, and publishes a retained JSON payload to `infoscreen/{uuid}/service_failed` via `mosquitto_pub`.
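
The actual notifier is a shell script; purely as an illustration of the message it emits, the topic, payload and `mosquitto_pub` invocation could be modelled in Python as below. The function names, the compact JSON encoding and the `failed_at` timestamp format are assumptions made for this sketch. The `-h`, `-p`, `-t`, `-m` and `-r` (retain) flags are standard `mosquitto_pub` options; credential flags are omitted here.

```python
import json
from datetime import datetime, timezone


def build_failure_message(unit: str, client_uuid: str, failed_at: datetime) -> tuple[str, str]:
    """Return (topic, payload) for a service_failed notification (hypothetical helper)."""
    topic = f"infoscreen/{client_uuid}/service_failed"
    payload = json.dumps(
        {
            "event": "service_failed",
            "unit": unit,
            "client_uuid": client_uuid,
            # Assumed format: UTC, second precision, trailing "Z".
            "failed_at": failed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        separators=(",", ":"),
    )
    return topic, payload


def mosquitto_pub_argv(host: str, port: int, topic: str, payload: str) -> list[str]:
    """Build the argument list for a retained publish; "-r" sets the retain flag."""
    return ["mosquitto_pub", "-h", host, "-p", str(port), "-t", topic, "-m", payload, "-r"]
```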
**To apply on device:**

```bash
sudo cp ~/infoscreen-dev/scripts/infoscreen-notify-failure@.service /etc/systemd/system/
sudo cp ~/infoscreen-dev/scripts/infoscreen-simclient.service /etc/systemd/system/
sudo cp ~/infoscreen-dev/scripts/infoscreen-display.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl restart infoscreen-simclient infoscreen-display
```

**Topic:** `infoscreen/{client_uuid}/service_failed` (retained)

**Payload:** `{"event":"service_failed","unit":"infoscreen-simclient.service","client_uuid":"...","failed_at":"2026-..."}`

## Next high-level items

- Add environment-controlled libVLC hw-accel toggle (`VLC_HW_ACCEL=1|0`) to `display_manager.py` so software decode can be forced when necessary.
- Add automated tests for video start/stop lifecycle (mock python-vlc) to ensure resources are released on event end.
- Add allowlist validation for `website` / `webuntis` event URLs
  - Goal: restrict browser-based events to approved hosts and schemes even if an authenticated publisher sends an unsafe URL.
  - Ideas / approaches:
    - Add env-configurable allowlists for general website hosts and WebUntis hosts.
    - Allow only `https` by default and reject `file:`, `data:`, `javascript:`, loopback, and private-address URLs unless explicitly allowed.
    - Enforce the same validation on both server-side payload generation and client-side execution in `display_manager.py`.
  - Acceptance criteria:
    - Unsafe or unapproved URLs are rejected before Chromium is launched.
    - WebUntis and approved website events still work with explicit allowlist configuration.

## Notes

- Keep all changes backward-compatible: external `vlc` binary fallback should still work.
- Document any new env vars in `README.md` and `.github/copilot-instructions.md` if added.

---

Generated: 2025-10-25