infoscreen-dev/SERVER_TEAM_ACTIONS.md
RobbStarkAustria 0cd0d95612 feat: remote commands, systemd units, process observability, broker auth split
- Command intake (reboot/shutdown) on infoscreen/{uuid}/commands with ack lifecycle
- MQTT_USER/MQTT_PASSWORD_BROKER split from identity vars; configure_mqtt_security() updated
- infoscreen-simclient.service: Type=notify, WatchdogSec=60, Restart=on-failure
- infoscreen-notify-failure@.service + script: retained MQTT alert when systemd gives up (Gap 3)
- _sd_notify() watchdog keepalive in simclient main loop (Gap 1)
- broker_connection block in health payload: reconnect_count, last_disconnect_at (Gap 2)
- COMMAND_MOCK_REBOOT_IMMEDIATE_COMPLETE canary flag with safety guard
- SERVER_TEAM_ACTIONS.md: server-side integration action items
- Docs: README, CHANGELOG, src/README, copilot-instructions updated
- 43 tests passing
2026-04-05 08:36:50 +02:00

Server Team Action Items — Infoscreen Client

This document lists everything the server/infrastructure/frontend team must implement to complete the client integration. The client-side code is production-ready for all items listed here.


1. MQTT Broker Hardening (prerequisite for everything else)

  • Disable anonymous access on the broker.
  • Create one broker account per client device:
    • Username convention: infoscreen-client-<uuid-prefix> (e.g. infoscreen-client-9b8d1856)
    • Provision the password to the device .env as MQTT_PASSWORD_BROKER=
  • Create a server/publisher account (e.g. infoscreen-server) for all server-side publishes.
  • Enforce ACLs:
| Topic | Publisher |
|---|---|
| infoscreen/{uuid}/commands | server only |
| infoscreen/{uuid}/command (alias) | server only |
| infoscreen/{uuid}/group_id | server only |
| infoscreen/events/{group_id} | server only |
| infoscreen/groups/+/power/intent | server only |
| infoscreen/{uuid}/commands/ack | client only |
| infoscreen/{uuid}/command/ack | client only |
| infoscreen/{uuid}/heartbeat | client only |
| infoscreen/{uuid}/health | client only |
| infoscreen/{uuid}/logs/# | client only |
| infoscreen/{uuid}/service_failed | client only |
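
The ACL table can be mirrored in server-side tests to catch a misconfigured broker early. The following is a minimal sketch, not broker configuration: `topic_matches`, `allowed_publisher`, and `broker_username` are hypothetical helper names, `{uuid}` and `{group_id}` are modelled as the single-level wildcard `+`, and the username prefix is assumed to be the first UUID group (8 hex characters), as in the example above.

```python
from typing import Optional

# Hypothetical model of the ACL table; {uuid} and {group_id} become "+".
SERVER_ONLY = [
    "infoscreen/+/commands",
    "infoscreen/+/command",
    "infoscreen/+/group_id",
    "infoscreen/events/+",
    "infoscreen/groups/+/power/intent",
]
CLIENT_ONLY = [
    "infoscreen/+/commands/ack",
    "infoscreen/+/command/ack",
    "infoscreen/+/heartbeat",
    "infoscreen/+/health",
    "infoscreen/+/logs/#",
    "infoscreen/+/service_failed",
]


def topic_matches(pattern: str, topic: str) -> bool:
    """Match a concrete topic against an MQTT filter with + and # wildcards."""
    p_parts = pattern.split("/")
    t_parts = topic.split("/")
    for i, p in enumerate(p_parts):
        if p == "#":
            return True  # matches the parent level and everything below it
        if i >= len(t_parts):
            return False
        if p != "+" and p != t_parts[i]:
            return False
    return len(p_parts) == len(t_parts)


def allowed_publisher(topic: str) -> Optional[str]:
    """Return "server", "client", or None if the topic is not in the table."""
    if any(topic_matches(p, topic) for p in SERVER_ONLY):
        return "server"
    if any(topic_matches(p, topic) for p in CLIENT_ONLY):
        return "client"
    return None


def broker_username(client_uuid: str) -> str:
    # Assumption: the prefix is the first UUID group, as in the example username.
    return "infoscreen-client-" + client_uuid.split("-")[0]
```

This model only answers which role may publish to a topic. A real broker ACL would additionally restrict each client account to its own `{uuid}` subtree.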

2. Reboot / Shutdown Command — Ack Lifecycle

The client publishes each ack status update to two topics per command (the canonical topic and a transitional alias):

  • infoscreen/{uuid}/commands/ack
  • infoscreen/{uuid}/command/ack

Ack payload schema (v1, frozen):

{
  "command_id": "07aab032-53c2-45ef-a5a3-6aa58e9d9fae",
  "status": "accepted | execution_started | completed | failed",
  "error_code": null,
  "error_message": null
}

Status lifecycle:

| Status | When | Notes |
|---|---|---|
| accepted | Command received and validated | Immediate |
| execution_started | Helper invoked | Immediate after accepted |
| completed | Execution confirmed | For reboot_host: arrives after reconnect (1090 s after execution_started) |
| failed | Helper returned error | error_code and error_message will be set |

Server must:

  • Track command_id through the full lifecycle and update status in DB/UI.
  • Surface failed + error_code to the operator UI.
  • Expect reboot_host completed to arrive after a reconnect delay — do not treat the gap as a timeout.
  • Use expires_at from the original command to determine when to abandon waiting.
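
The tracking requirements above can be sketched as a small state holder per command. This is an illustrative sketch under stated assumptions: `CommandTracker` and the initial `pending` state are hypothetical server-side names, and `expires_at` is assumed to be already parsed into a timezone-aware `datetime`. Because each ack arrives on two topics, a server subscribed to both will see duplicates, so repeated or out-of-order statuses are ignored.

```python
import json
from datetime import datetime, timezone

TERMINAL = {"completed", "failed"}
# Order of the non-failure statuses; "pending" is a hypothetical pre-ack state.
ORDER = {"pending": 0, "accepted": 1, "execution_started": 2, "completed": 3}


class CommandTracker:
    """Tracks one command_id from publish until a terminal ack or expiry."""

    def __init__(self, command_id: str, expires_at: datetime):
        self.command_id = command_id
        self.expires_at = expires_at
        self.status = "pending"
        self.error_code = None
        self.error_message = None

    def apply_ack(self, raw: bytes) -> bool:
        """Apply one ack payload; return True if the stored status changed."""
        ack = json.loads(raw)
        if ack.get("command_id") != self.command_id:
            return False
        if self.status in TERMINAL:
            return False  # ignore anything after completed/failed
        new = ack.get("status")
        if new == "failed":
            self.status = "failed"
            self.error_code = ack.get("error_code")
            self.error_message = ack.get("error_message")
            return True
        if new not in ORDER or ORDER[new] <= ORDER[self.status]:
            return False  # unknown status, duplicate, or out-of-order ack
        self.status = new
        return True

    def is_abandoned(self, now: datetime) -> bool:
        """True once expires_at has passed without a terminal status."""
        return self.status not in TERMINAL and now >= self.expires_at
```

Note that `is_abandoned` depends only on `expires_at`, so the gap between `execution_started` and `completed` during a reboot is never treated as a timeout on its own.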

3. Health Dashboard — Broker Connection Fields (Gap 2)

Every infoscreen/{uuid}/health payload now includes a broker_connection block:

{
  "timestamp": "2026-04-05T08:00:00.000000+00:00",
  "expected_state": { "event_id": 42 },
  "actual_state": {
    "process": "display_manager",
    "pid": 1234,
    "status": "running"
  },
  "broker_connection": {
    "broker_reachable": true,
    "reconnect_count": 2,
    "last_disconnect_at": "2026-04-04T10:30:00Z"
  }
}

Server must:

  • Display reconnect_count and last_disconnect_at per device in the health dashboard.
  • Implement alerting heuristic:
    • All clients go silent simultaneously → likely broker outage, not device crash.
    • Single client goes silent → device crash, network failure, or process hang.

4. Service-Failed MQTT Notification (Gap 3)

When systemd gives up restarting a service after repeated crashes (StartLimitBurst exceeded), the client automatically publishes a retained message:

Topic: infoscreen/{uuid}/service_failed

Payload:

{
  "event": "service_failed",
  "unit": "infoscreen-simclient.service",
  "client_uuid": "9b8d1856-ff34-4864-a726-12de072d0f77",
  "failed_at": "2026-04-05T08:00:00Z"
}

Server must:

  • Subscribe to infoscreen/+/service_failed on startup. The message is retained, so the broker delivers it on subscribe even if the failure occurred while the server was offline.
  • Alert the operator immediately when this topic receives a payload.
  • Clear the retained message once the device is acknowledged or recovered:
    mosquitto_pub -t "infoscreen/{uuid}/service_failed" -n --retain
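
On the receiving side, the handler has to tell a real failure apart from the zero-length message that clears the retained topic, since subscribers also receive the clear. The following is a minimal sketch of that check; `parse_service_failed` is a hypothetical name, and the device UUID is taken from the topic rather than from the payload.

```python
import json
from typing import Optional


def parse_service_failed(topic: str, payload: bytes) -> Optional[dict]:
    """Return an alert dict for a service_failed message, or None if no alert is due."""
    parts = topic.split("/")
    if len(parts) != 3 or parts[0] != "infoscreen" or parts[2] != "service_failed":
        return None
    if not payload:
        return None  # zero-length payload: the retained message was cleared
    data = json.loads(payload)
    if data.get("event") != "service_failed":
        return None
    return {
        "client_uuid": parts[1],
        "unit": data.get("unit"),
        "failed_at": data.get("failed_at"),
    }
```

A non-None result would be passed to whatever operator alerting the server already has; a None result for an empty payload can be used to mark the earlier alert as cleared.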
    

5. No Server Action Required

These items are fully implemented client-side and require no server changes:

  • systemd watchdog (WatchdogSec=60) — hangs detected and process restarted automatically.
  • Command deduplication — command_id deduplicated with 24-hour TTL.
  • Ack retry backoff — client retries ack publish on broker disconnect until expires_at.
  • Mock helper / test mode (COMMAND_MOCK_REBOOT_IMMEDIATE_COMPLETE) — development only.