- Command intake (reboot/shutdown) on infoscreen/{uuid}/commands with ack lifecycle
- MQTT_USER/MQTT_PASSWORD_BROKER split from identity vars; configure_mqtt_security() updated
- infoscreen-simclient.service: Type=notify, WatchdogSec=60, Restart=on-failure
- infoscreen-notify-failure@.service + script: retained MQTT alert when systemd gives up (Gap 3)
- _sd_notify() watchdog keepalive in simclient main loop (Gap 1)
- broker_connection block in health payload: reconnect_count, last_disconnect_at (Gap 2)
- COMMAND_MOCK_REBOOT_IMMEDIATE_COMPLETE canary flag with safety guard
- SERVER_TEAM_ACTIONS.md: server-side integration action items
- Docs: README, CHANGELOG, src/README, copilot-instructions updated
- 43 tests passing
Server Team Action Items — Infoscreen Client
This document lists everything the server/infrastructure/frontend team must implement to complete the client integration. The client-side code is production-ready for all items listed here.
1. MQTT Broker Hardening (prerequisite for everything else)
- Disable anonymous access on the broker.
- Create one broker account per client device:
  - Username convention: `infoscreen-client-<uuid-prefix>` (e.g. `infoscreen-client-9b8d1856`)
  - Provision the password to the device `.env` as `MQTT_PASSWORD_BROKER=`
- Create a server/publisher account (e.g. `infoscreen-server`) for all server-side publishes.
- Enforce ACLs:
| Topic | Publisher |
|---|---|
| `infoscreen/{uuid}/commands` | server only |
| `infoscreen/{uuid}/command` (alias) | server only |
| `infoscreen/{uuid}/group_id` | server only |
| `infoscreen/events/{group_id}` | server only |
| `infoscreen/groups/+/power/intent` | server only |
| `infoscreen/{uuid}/commands/ack` | client only |
| `infoscreen/{uuid}/command/ack` | client only |
| `infoscreen/{uuid}/heartbeat` | client only |
| `infoscreen/{uuid}/health` | client only |
| `infoscreen/{uuid}/logs/#` | client only |
| `infoscreen/{uuid}/service_failed` | client only |
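The publisher table above could be turned into per-user broker ACL entries along these lines. This is a minimal sketch that assumes a Mosquitto broker with a file-based `acl_file` (the `user` / `topic read|write` syntax). The function name `client_acl` and the read permissions are assumptions, since the table only states who may publish.

```python
# Hypothetical helper: render Mosquitto acl_file entries for one client device.
# Topic suffixes are taken from the publisher table; read grants are assumed.
CLIENT_WRITE = ["commands/ack", "command/ack", "heartbeat", "health",
                "logs/#", "service_failed"]
CLIENT_READ = ["commands", "command", "group_id"]


def client_acl(username: str, uuid: str) -> str:
    """Return ACL lines granting a device write access to its own topics only."""
    lines = [f"user {username}"]
    for suffix in CLIENT_WRITE:
        lines.append(f"topic write infoscreen/{uuid}/{suffix}")
    for suffix in CLIENT_READ:
        lines.append(f"topic read infoscreen/{uuid}/{suffix}")
    # Group-scoped topics: the group id is not known up front, so a single-level
    # wildcard is used for reading (assumption).
    lines.append("topic read infoscreen/events/+")
    lines.append("topic read infoscreen/groups/+/power/intent")
    return "\n".join(lines)


if __name__ == "__main__":
    print(client_acl("infoscreen-client-9b8d1856",
                     "9b8d1856-ff34-4864-a726-12de072d0f77"))
```

Because the device account gets no `write` entry for the server-only topics, a compromised device cannot publish commands to itself or to other devices, provided the broker denies topics that are not listed.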
2. Reboot / Shutdown Command — Ack Lifecycle
The client publishes ack status updates to two topics per command (canonical + transitional alias):
- `infoscreen/{uuid}/commands/ack`
- `infoscreen/{uuid}/command/ack`
Ack payload schema (v1, frozen):
{
"command_id": "07aab032-53c2-45ef-a5a3-6aa58e9d9fae",
"status": "accepted | execution_started | completed | failed",
"error_code": null,
"error_message": null
}
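On the server side, incoming acks could be checked against this schema before they touch the database. The following is a minimal sketch using only the standard library. The function name `parse_ack` is hypothetical, and treating absent error fields as null is an assumption rather than part of the frozen schema.

```python
import json

# Status values listed in the v1 ack schema.
ACK_STATUSES = {"accepted", "execution_started", "completed", "failed"}


def parse_ack(raw: bytes) -> dict:
    """Decode an ack payload and reject anything outside the v1 schema."""
    ack = json.loads(raw)
    if not isinstance(ack, dict):
        raise ValueError("ack payload must be a JSON object")
    for key in ("command_id", "status"):
        if key not in ack:
            raise ValueError(f"ack payload is missing '{key}'")
    if ack["status"] not in ACK_STATUSES:
        raise ValueError(f"unknown ack status: {ack['status']!r}")
    # Assumption: tolerate missing error fields by treating them as null.
    ack.setdefault("error_code", None)
    ack.setdefault("error_message", None)
    return ack
```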
Status lifecycle:
| Status | When | Notes |
|---|---|---|
| `accepted` | Command received and validated | Immediate |
| `execution_started` | Helper invoked | Immediate after `accepted` |
| `completed` | Execution confirmed | For `reboot_host`: arrives after reconnect (10–90 s after `execution_started`) |
| `failed` | Helper returned error | `error_code` and `error_message` will be set |
Server must:
- Track `command_id` through the full lifecycle and update status in DB/UI.
- Surface `failed` + `error_code` to the operator UI.
- Expect `reboot_host` `completed` to arrive after a reconnect delay; do not treat the gap as a timeout.
- Use `expires_at` from the original command to determine when to abandon waiting.
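One way to meet the requirements above is a small per-command tracker. This is a sketch under stated assumptions: the class name `CommandTracker` and the initial `pending` state are hypothetical, and duplicate acks are ignored because each ack is published on both the canonical and the alias topic.

```python
from datetime import datetime, timezone

# Position of each status in the lifecycle; both terminal states share a rank.
ORDER = {"accepted": 0, "execution_started": 1, "completed": 2, "failed": 2}
TERMINAL = {"completed", "failed"}


class CommandTracker:
    """Tracks one command_id from dispatch until a terminal ack or expiry."""

    def __init__(self, command_id: str, expires_at: datetime):
        self.command_id = command_id
        self.expires_at = expires_at
        self.status = "pending"  # hypothetical state before any ack arrives
        self.error_code = None
        self.error_message = None

    def apply_ack(self, ack: dict) -> bool:
        """Apply an ack; return True only if the stored status advanced."""
        if ack["command_id"] != self.command_id or self.status in TERMINAL:
            return False
        new_status = ack["status"]
        # Drop duplicates (two ack topics) and out-of-order deliveries.
        if ORDER[new_status] <= ORDER.get(self.status, -1):
            return False
        self.status = new_status
        if new_status == "failed":
            self.error_code = ack.get("error_code")
            self.error_message = ack.get("error_message")
        return True

    def is_abandoned(self, now: datetime) -> bool:
        """Give up only after expires_at, never because of the reboot gap."""
        return self.status not in TERMINAL and now >= self.expires_at


if __name__ == "__main__":
    tracker = CommandTracker("07aab032-53c2-45ef-a5a3-6aa58e9d9fae",
                             datetime(2026, 4, 5, 8, 5, tzinfo=timezone.utc))
    tracker.apply_ack({"command_id": tracker.command_id, "status": "accepted"})
    print(tracker.status)
```

Keying the timeout on `expires_at` rather than on the time since `execution_started` means the reconnect delay of a `reboot_host` command is never mistaken for a failure.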
3. Health Dashboard — Broker Connection Fields (Gap 2)
Every infoscreen/{uuid}/health payload now includes a broker_connection block:
{
"timestamp": "2026-04-05T08:00:00.000000+00:00",
"expected_state": { "event_id": 42 },
"actual_state": {
"process": "display_manager",
"pid": 1234,
"status": "running"
},
"broker_connection": {
"broker_reachable": true,
"reconnect_count": 2,
"last_disconnect_at": "2026-04-04T10:30:00Z"
}
}
Server must:
- Display `reconnect_count` and `last_disconnect_at` per device in the health dashboard.
- Implement alerting heuristic:
  - All clients go silent simultaneously → likely broker outage, not device crash.
  - Single client goes silent → device crash, network failure, or process hang.
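The alerting heuristic above could be sketched as a pure function over the last time each device was heard from. The function name `classify_silence`, the result labels, and the 180-second silence threshold are all assumptions chosen for illustration; the document does not specify them.

```python
from datetime import datetime, timedelta, timezone


def classify_silence(last_seen: dict[str, datetime], now: datetime,
                     silence_after: timedelta = timedelta(seconds=180)):
    """Return (verdict, silent_uuids) for the fleet.

    last_seen maps client UUID to the timestamp of its latest heartbeat or
    health message. The threshold is a placeholder, not a documented value.
    """
    silent = sorted(u for u, ts in last_seen.items() if now - ts > silence_after)
    if not silent:
        return "ok", []
    # Every known client silent at once points at the broker, not the devices.
    if len(last_seen) > 1 and len(silent) == len(last_seen):
        return "suspect_broker_outage", silent
    # Otherwise treat each silent client as an individual device problem.
    return "suspect_device_failure", silent


if __name__ == "__main__":
    now = datetime(2026, 4, 5, 8, 0, tzinfo=timezone.utc)
    fleet = {"9b8d1856": now - timedelta(minutes=10), "1c2d3e4f": now}
    print(classify_silence(fleet, now))
```

With a single registered device the two cases cannot be told apart, so the sketch falls back to reporting a device failure.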
4. Service-Failed MQTT Notification (Gap 3)
When systemd gives up restarting a service after repeated crashes (StartLimitBurst exceeded), the client automatically publishes a retained message:
Topic: infoscreen/{uuid}/service_failed
Payload:
{
"event": "service_failed",
"unit": "infoscreen-simclient.service",
"client_uuid": "9b8d1856-ff34-4864-a726-12de072d0f77",
"failed_at": "2026-04-05T08:00:00Z"
}
Server must:
- Subscribe to `infoscreen/+/service_failed` on startup. The message is retained, so the broker delivers it to new subscribers, and it survives a broker restart if broker persistence is enabled.
- Alert the operator immediately when this topic receives a payload.
- Clear the retained message once the device is acknowledged or recovered:
  mosquitto_pub -t "infoscreen/{uuid}/service_failed" -n --retain
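A message handler for this topic might look like the sketch below. It is library-agnostic: it takes the topic and raw payload that an MQTT client callback would supply. The function name `handle_service_failed` and the shape of the returned alert are assumptions. Note that clearing a retained message is itself a publish with an empty payload, so the handler has to ignore zero-length payloads.

```python
import json
from typing import Optional


def handle_service_failed(topic: str, payload: bytes) -> Optional[dict]:
    """Return an alert record for a service_failed message, or None to ignore."""
    parts = topic.split("/")
    if len(parts) != 3 or parts[0] != "infoscreen" or parts[2] != "service_failed":
        return None
    if not payload:
        # Empty payload: the retained alert was cleared, nothing to raise.
        return None
    event = json.loads(payload)
    if not isinstance(event, dict) or event.get("event") != "service_failed":
        return None
    return {
        "client_uuid": parts[1],  # taken from the topic, which ACLs protect
        "unit": event.get("unit"),
        "failed_at": event.get("failed_at"),
    }
```

Taking the UUID from the topic rather than from the payload relies on the broker ACLs from section 1, which restrict each device to publishing under its own topic prefix.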
5. No Server Action Required
These items are fully implemented client-side and require no server changes:
- systemd watchdog (`WatchdogSec=60`): hangs are detected and the process is restarted automatically.
- Command deduplication: `command_id` is deduplicated with a 24-hour TTL.
- Ack retry backoff: the client retries the ack publish on broker disconnect until `expires_at`.
- Mock helper / test mode (`COMMAND_MOCK_REBOOT_IMMEDIATE_COMPLETE`): development only.