Files
infoscreen-dev/TODO.md
RobbStarkAustria 0cd0d95612 feat: remote commands, systemd units, process observability, broker auth split
- Command intake (reboot/shutdown) on infoscreen/{uuid}/commands with ack lifecycle
- MQTT_USER/MQTT_PASSWORD_BROKER split from identity vars; configure_mqtt_security() updated
- infoscreen-simclient.service: Type=notify, WatchdogSec=60, Restart=on-failure
- infoscreen-notify-failure@.service + script: retained MQTT alert when systemd gives up (Gap 3)
- _sd_notify() watchdog keepalive in simclient main loop (Gap 1)
- broker_connection block in health payload: reconnect_count, last_disconnect_at (Gap 2)
- COMMAND_MOCK_REBOOT_IMMEDIATE_COMPLETE canary flag with safety guard
- SERVER_TEAM_ACTIONS.md: server-side integration action items
- Docs: README, CHANGELOG, src/README, copilot-instructions updated
- 43 tests passing
2026-04-05 08:36:50 +02:00

7.5 KiB
Raw Permalink Blame History

Project TODOs

This file tracks higher-level todos and design notes for the infoscreen client.

Video playback (Raspberry Pi)

  • Remove taskbar / window decorations in VLC window

    • Goal: show video truly fullscreen without window title bar or OS panel/taskbar overlapping.
    • Ideas / approaches:
      • Use libVLC options from python-vlc: --no-video-deco, --no-video-title-show, --video-on-top, and call player.set_fullscreen(True) after playback starts.
      • Run the Display Manager in a dedicated kiosk X session (no panel/desktop environment) or minimal window manager (openbox/matchbox) to avoid taskbar.
      • Use wmctrl as a fallback to force fullscreen/above: wmctrl -r <title> -b add,fullscreen,above.
      • Add an env var toggle, e.g. VLC_KIOSK=1, to enable these options from runtime.
    • Acceptance criteria:
      • Video occupies the full display area with no visible window controls or panels on top.
      • Behaviour toggleable via env var.
  • Add possibility to adjust sound level by HDMI-CEC using Python

    • Goal: allow remote/automatic volume control over HDMI using CEC-capable hardware.
    • Ideas / approaches:
      • Use libcec bindings (e.g. pycec / cec packages) or call cec-client from shell to send volume commands to the TV/AVR.
      • Map event volume (0.0-1.0) to CEC volume commands (some devices support absolute volume or key presses like VOLUME_UP/VOLUME_DOWN).
      • Provide a small adapter module src/hdmi_cec.py that exposes set_volume(level: float) and volume_step(up: bool) used by display_manager.py when starting/stopping videos or on explicit volume events.
    • Acceptance criteria:
      • set_volume() issues appropriate CEC commands and returns success/failure.
      • Document any platform limitations (some TVs don't support absolute volume via CEC).

Systemd crash recovery (server team recommendation)

Reliable restart-on-crash for both processes must be handled by systemd, not by in-process watchdogs or ad-hoc shell scripts.

What needs to be done

  • display_manager: already has scripts/infoscreen-display.service with Restart=on-failure / RestartSec=10.
    • Review RestartSec — may want a short backoff (e.g. 515 s) and StartLimitIntervalSec + StartLimitBurst to prevent thrash loops.
  • simclient: no service unit exists yet.
    • Create scripts/infoscreen-simclient.service modelled on the display service.
    • Use Restart=on-failure and RestartSec=10.
    • Wire EnvironmentFile=/home/olafn/infoscreen-dev/.env so the unit picks up .env variables automatically.
    • Set After=network-online.target so MQTT connection is not attempted before the network is ready.
  • Both units should be installed and enabled via src/pi-setup.sh (systemctl enable --now).
  • After enabling, verify crash recovery with kill -9 <pid> and confirm systemd restarts the process within RestartSec.

Acceptance criteria

  • Both simclient and display_manager restart automatically within 15 s of any non-intentional exit.
  • systemctl status shows active (running) after a crash-induced restart.
  • journalctl -u infoscreen-simclient captures all process output (stdout + stderr).
  • pi-setup.sh idempotently installs and enables both units.

Notes

  • Use Restart=on-failure — restarts on crashes and signals but not on clean systemctl stop, preserving operator control during deployments.
  • The reboot/shutdown command flow publishes execution_started and then exits intentionally; systemd will restart simclient, and the recovery logic in the heartbeat loop will emit completed on reconnect. This is the intended lifecycle.

Process health observability gaps

Two scenarios are currently undetected or ambiguous from the server/frontend perspective.

Gap 1: Hung / deadlocked process implemented

Solution implemented: Zero-dependency _sd_notify() helper writes directly to NOTIFY_SOCKET (raw Unix socket, no extra package). READY=1 is sent when the heartbeat loop starts; WATCHDOG=1 is sent every 5 s in the main loop iteration. The service unit uses Type=notify + WatchdogSec=60 — if the main loop freezes for 60 s, systemd kills and restarts the process automatically.

To apply on device:

sudo cp ~/infoscreen-dev/scripts/infoscreen-simclient.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl restart infoscreen-simclient

Gap 2: MQTT broker unreachable vs. simclient dead implemented (client side)

Solution implemented: connection_state dict expanded with reconnect_count and connect_count. publish_health_message() now accepts connection_state and appends a broker_connection block to every health payload:

"broker_connection": {
  "broker_reachable": true,
  "reconnect_count": 2,
  "last_disconnect_at": "2026-04-04T10:00:00Z"
}

broker_reachable = true when MQTT is connected at publish time.
reconnect_count increments on every reconnection (first connect does not count).
last_disconnect_at is the UTC timestamp of the most recent disconnect.

Server-side action still needed:

  • Display reconnect_count and last_disconnect_at in the frontend health dashboard.
  • Alert heuristic: if all clients go silent simultaneously → likely broker issue; if only one → likely device issue.

Gap 3: systemd gives up (StartLimitBurst exceeded) implemented

Solution implemented: scripts/infoscreen-notify-failure@.service (template unit) + scripts/infoscreen-notify-failure.sh. Both main units have OnFailure=infoscreen-notify-failure@%n.service. When systemd marks a service failed, the notifier runs once, reads broker credentials from .env, reads client_uuid.txt, and publishes a retained JSON payload to infoscreen/{uuid}/service_failed via mosquitto_pub.

To apply on device:

sudo cp ~/infoscreen-dev/scripts/infoscreen-notify-failure@.service /etc/systemd/system/
sudo cp ~/infoscreen-dev/scripts/infoscreen-simclient.service /etc/systemd/system/
sudo cp ~/infoscreen-dev/scripts/infoscreen-display.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl restart infoscreen-simclient infoscreen-display

Topic: infoscreen/{client_uuid}/service_failed (retained) Payload: {"event":"service_failed","unit":"infoscreen-simclient.service","client_uuid":"...","failed_at":"2026-..."}

Next-high-level items

  • Add environment-controlled libVLC hw-accel toggle (VLC_HW_ACCEL=1|0) to display_manager.py so software decode can be forced when necessary.
  • Add automated tests for video start/stop lifecycle (mock python-vlc) to ensure resources are released on event end.
  • Add allowlist validation for website / webuntis event URLs
    • Goal: restrict browser-based events to approved hosts and schemes even if an authenticated publisher sends an unsafe URL.
    • Ideas / approaches:
      • Add env-configurable allowlists for general website hosts and WebUntis hosts.
      • Allow only https by default and reject file:, data:, javascript:, loopback, and private-address URLs unless explicitly allowed.
      • Enforce the same validation on both server-side payload generation and client-side execution in display_manager.py.
    • Acceptance criteria:
      • Unsafe or unapproved URLs are rejected before Chromium is launched.
      • WebUntis and approved website events still work with explicit allowlist configuration.

Notes

  • Keep all changes backward-compatible: external vlc binary fallback should still work.
  • Document any new env vars in README.md and .github/copilot-instructions.md if added.

Generated: 2025-10-25