## Client Team Implementation Spec (Raspberry Pi 5)

### Mission

Implement client-side command handling for reliable restart and shutdown with strict validation, idempotency, acknowledgements, and reboot recovery continuity.

### Ownership Boundaries

1. Client team owns command intake, execution, acknowledgement emission, and post-reboot continuity.
2. Platform team owns command issuance, lifecycle aggregation, and server-side escalation logic.
3. Client implementation must not assume managed PoE availability.

### Required Client Behaviors

### Frozen MQTT Topics and Schemas (v1)

1. Canonical command topic: infoscreen/{client_uuid}/commands.
2. Canonical ack topic: infoscreen/{client_uuid}/commands/ack.
3. Transitional compatibility topics during migration:
   - infoscreen/{client_uuid}/command
   - infoscreen/{client_uuid}/command/ack
4. QoS policy: commands use QoS 1; QoS 1 is recommended for acks.
5. Retain policy: commands and acks are non-retained.

Frozen command payload schema:

```json
{
  "schema_version": "1.0",
  "command_id": "5d1f8b4b-7e85-44fb-8f38-3f5d5da5e2e4",
  "client_uuid": "9b8d1856-ff34-4864-a726-12de072d0f77",
  "action": "reboot_host",
  "issued_at": "2026-04-03T12:48:10Z",
  "expires_at": "2026-04-03T12:52:10Z",
  "requested_by": 1,
  "reason": "operator_request"
}
```

Frozen ack payload schema:

```json
{
  "command_id": "5d1f8b4b-7e85-44fb-8f38-3f5d5da5e2e4",
  "status": "execution_started",
  "error_code": null,
  "error_message": null
}
```

Allowed ack status values:

1. accepted
2. execution_started
3. completed
4. failed

Frozen command action values for v1:

1. reboot_host
2. shutdown_host

Reserved but not emitted by server in v1:

1. restart_service

Validation snippets for helper scripts:

1. Human-readable snippets: implementation-plans/reboot-command-payload-schemas.md
2. Machine-validated JSON Schema: implementation-plans/reboot-command-payload-schemas.json

### 1. Command Intake

1. Subscribe to the canonical command topic with QoS 1.
2. Parse required fields: schema_version, command_id, action, issued_at, expires_at, reason, requested_by, and target metadata (client_uuid in the v1 schema).
3. Reject invalid payloads with a failed acknowledgement that includes error_code and a diagnostic message.
4. Reject stale commands when the current time exceeds expires_at.
5. Ignore already-processed command_id values.

### 2. Idempotency And Persistence

1. Persist each processed command_id and its execution result on local storage.
2. Persistence must survive service restart and full OS reboot.
3. On restart, reload the dedupe cache before processing newly delivered commands.

### 3. Acknowledgement Contract Behavior

1. Emit accepted immediately after successful validation and dedupe pass.
2. Emit execution_started immediately before invoking the command action.
3. Emit completed only when the local success condition is confirmed.
4. Emit failed with a structured error_code on validation or execution failure.
5. If MQTT is temporarily unavailable, retry the ack publish with bounded backoff until command expiry.

### 4. Execution Security Model

1. Execute via a systemd-managed privileged helper.
2. Allow only whitelisted operations:
   - reboot_host
   - shutdown_host
3. Optionally keep the restart_service handler as a reserved path, but do not require it for v1 conformance.
4. Disallow arbitrary shell commands and untrusted arguments.
5. Enforce a per-command execution timeout and terminate hung child processes.

### 5. Reboot Recovery Continuity

1. For the reboot_host action:
   - send execution_started
   - trigger reboot promptly
2. During startup:
   - emit heartbeat early
   - emit process-health once the service is ready
3. Keep the last command execution state available after reboot for reconciliation.

### 6. Time And Timeout Semantics

1. Use monotonic timers for local elapsed-time checks.
2. Use UTC wall-clock time only for protocol timestamps and expiry comparisons.
3. Target reconnect baseline on Pi 5 USB-SATA SSD: 90 seconds.
4. Accept a cold-boot and USB enumeration ceiling of up to 150 seconds.

### 7. Capability Reporting

1. Report recovery capability class:
   - software_only
   - managed_poe_available
   - manual_only
2. Report watchdog enabled status.
3. Report boot-source metadata for diagnostics.

### 8. Error Codes Minimum Set

1. invalid_schema
2. missing_field
3. stale_command
4. duplicate_command
5. permission_denied_local
6. execution_timeout
7. execution_failed
8. broker_unavailable
9. internal_error

### Acceptance Tests (Client Team)

1. Invalid schema payload is rejected and a failed ack is emitted.
2. Expired command is rejected and not executed.
3. Duplicate command_id is not executed twice.
4. reboot_host emits execution_started and reconnects with heartbeat in the expected window.
5. If the reserved restart_service handler is implemented, the action completes without host reboot and emits completed.
6. MQTT outage during the ack path retries correctly without duplicate execution.
7. Boot-loop protection cooperates with server-side lockout semantics.

### Delivery Artifacts

1. Client protocol conformance checklist.
2. Test evidence for all acceptance tests.
3. Runtime logs showing full lifecycle for one restart and one reboot scenario.
4. Known limitations list per image version.

### Definition Of Done

1. All acceptance tests pass on Pi 5 USB-SATA SSD test devices.
2. No duplicate execution observed under reconnect and retained-delivery edge cases.
3. Acknowledgement sequence is complete and machine-parseable for server correlation.
4. Reboot recovery continuity works without managed PoE dependencies.
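
### Non-Normative Sketch: Command Validation And Dedupe

The intake rules in section 1 and the persistence rules in section 2 could be sketched roughly as follows. This is an illustration only, not the reference implementation: the `DedupeStore` class, the JSON file layout, and the `validate_command` helper are hypothetical names, and the mapping of an unknown action to invalid_schema is an assumption. The authoritative field rules remain in the JSON Schema file listed above.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

# Field and action names come from the frozen v1 schema in this spec.
REQUIRED_FIELDS = ("schema_version", "command_id", "client_uuid", "action",
                   "issued_at", "expires_at", "requested_by", "reason")
ALLOWED_ACTIONS = ("reboot_host", "shutdown_host")


def parse_utc(value):
    """Parse a timestamp such as 2026-04-03T12:48:10Z into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


class DedupeStore:
    """Hypothetical store: processed command_id -> result, kept in one JSON file."""

    def __init__(self, path):
        self.path = path
        self.results = {}
        # Reload the dedupe cache before any new command is processed.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.results = json.load(fh)

    def seen(self, command_id):
        return command_id in self.results

    def record(self, command_id, status):
        self.results[command_id] = status
        # Write to a temp file, fsync, then rename, so a power cut or reboot
        # cannot leave a truncated store behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.results, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, self.path)


def validate_command(raw, store, now=None):
    """Return (ok, error_code, payload); error_code is None when ok is True."""
    now = now or datetime.now(timezone.utc)
    try:
        payload = json.loads(raw)
    except (ValueError, TypeError):
        return False, "invalid_schema", None
    if not isinstance(payload, dict):
        return False, "invalid_schema", None
    for field in REQUIRED_FIELDS:
        if field not in payload:
            return False, "missing_field", payload
    if not isinstance(payload["command_id"], str):
        return False, "invalid_schema", payload
    if payload["action"] not in ALLOWED_ACTIONS:
        # Assumption: an action outside the v1 set is treated as a schema error.
        return False, "invalid_schema", payload
    try:
        parse_utc(payload["issued_at"])
        expires_at = parse_utc(payload["expires_at"])
    except (ValueError, TypeError):
        return False, "invalid_schema", payload
    if now > expires_at:
        return False, "stale_command", payload
    if store.seen(payload["command_id"]):
        # The caller decides whether to ignore silently or report the duplicate.
        return False, "duplicate_command", payload
    return True, None, payload
```

A caller would typically construct one `DedupeStore` at service start, call `validate_command` for each delivered message, and call `store.record(...)` once the outcome of the command is known.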
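
### Non-Normative Sketch: Ack Retry With Bounded Backoff

Section 3 asks for ack publishing to be retried with bounded backoff until command expiry, and section 6 asks for monotonic timers for elapsed-time checks. A minimal sketch of that combination is shown below. The `publish` callable is a hypothetical stand-in for whatever MQTT client wrapper the client uses (assumed to return True on success); the delay values are placeholders, not agreed parameters.

```python
import json
import time

ACK_STATUSES = ("accepted", "execution_started", "completed", "failed")


def build_ack(command_id, status, error_code=None, error_message=None):
    """Serialize an ack using the frozen v1 ack payload fields."""
    if status not in ACK_STATUSES:
        raise ValueError("unknown ack status: %s" % status)
    return json.dumps({
        "command_id": command_id,
        "status": status,
        "error_code": error_code,
        "error_message": error_message,
    })


def publish_ack_with_retry(publish, topic, payload, seconds_until_expiry,
                           base_delay=1.0, max_delay=30.0,
                           sleep=time.sleep, clock=time.monotonic):
    """Call publish(topic, payload) until it returns True or the deadline passes.

    seconds_until_expiry is computed once by the caller from expires_at (UTC
    wall clock); the waiting itself uses a monotonic clock so that clock
    adjustments during the retry loop do not shorten or extend it.
    """
    deadline = clock() + seconds_until_expiry
    delay = base_delay
    while True:
        if publish(topic, payload):
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; double the delay up to max_delay.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting; production code would leave them at their defaults.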
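
### Non-Normative Sketch: Whitelisted Action Execution

The execution rules in section 4 (fixed whitelist, no shell, per-command timeout) might look like the following inside the privileged helper. This is a sketch under assumptions: the `systemctl reboot` and `systemctl poweroff` argument vectors, the 30-second default timeout, and the choice of permission_denied_local for a non-whitelisted action are illustrative and not fixed by this spec. `run_action` and `COMMAND_TABLE` are hypothetical names.

```python
import subprocess

# Fixed argument vectors only; nothing from the MQTT payload is interpolated.
COMMAND_TABLE = {
    "reboot_host": ("systemctl", "reboot"),
    "shutdown_host": ("systemctl", "poweroff"),
}


def run_action(action, timeout_s=30.0, runner=subprocess.run):
    """Run a whitelisted action; return None on success or an error_code string."""
    argv = COMMAND_TABLE.get(action)
    if argv is None:
        # Assumption: non-whitelisted actions are refused locally.
        return "permission_denied_local"
    try:
        # No shell is involved. With a timeout, subprocess.run kills the child
        # and raises TimeoutExpired if it does not finish in time.
        runner(list(argv), timeout=timeout_s, check=True)
    except subprocess.TimeoutExpired:
        return "execution_timeout"
    except (subprocess.CalledProcessError, OSError):
        return "execution_failed"
    return None
```

Passing `runner` as a parameter keeps the whitelist logic testable on a development machine without triggering a real reboot.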