## Reboot Reliability Kickoff Summary

### Objective

Ship a reliable, observable restart and shutdown workflow for Raspberry Pi 5 clients, with safe escalation and clear operator outcomes.

### What Is Included

1. Asynchronous command lifecycle with idempotent command_id handling.
2. Monitoring-first state visibility from queued to terminal outcomes.
3. Client acknowledgements for accepted, execution_started, completed, and failed.
4. Pi 5 USB-SATA SSD timeout baseline and tuning rules.
5. Capability-tier recovery with optional managed PoE escalation.

### What Is Not Included

1. Full maintenance module in client-management.
2. Required managed power-switch integration.
3. Production Wake-on-LAN rollout.

### Team Split

1. Platform team: API command lifecycle, safety controls, listener ack ingestion.
2. Web team: lifecycle-aware UX and command status display.
3. Client team: strict validation, dedupe, ack sequence, secure execution helper, reboot continuity.

### Ownership Matrix

| Team | Primary Plan File | Main Deliverables |
| --- | --- | --- |
| Platform team | implementation-plans/reboot-implementation-handoff-share.md | Command lifecycle backend, policy enforcement, listener ack mapping, safety lockout and escalation |
| Web team | implementation-plans/reboot-implementation-handoff-share.md | Lifecycle UI states, bulk safety UX, capability visibility, command status polling |
| Client team | implementation-plans/reboot-implementation-handoff-client-team.md | Command validation, dedupe persistence, ack sequence, secure execution helper, reboot continuity |
| Project coordination | implementation-plans/reboot-kickoff-summary.md | Phase sequencing, canary gates, rollback thresholds, cross-team sign-off tracking |

### Baseline Operational Defaults

1. Safety lockout: 3 reboot commands per client in rolling 15 minutes.
2. Escalation cooldown: 60 seconds.
3. Reconnect target on Pi 5 SSD: 90 seconds baseline, 150 seconds cold-boot ceiling.
4. Rollback canary trigger: failed plus timed_out above 5 percent.

### Frozen Contract Snapshot

1. Canonical command topic: infoscreen/{client_uuid}/commands.
2. Canonical ack topic: infoscreen/{client_uuid}/commands/ack.
3. Transitional compatibility topics during migration:
   - infoscreen/{client_uuid}/command
   - infoscreen/{client_uuid}/command/ack
4. Command schema version: 1.0.
5. Allowed command actions: reboot_host, shutdown_host.
6. Allowed ack status values: accepted, execution_started, completed, failed.
7. Validation snippets:
   - implementation-plans/reboot-command-payload-schemas.md
   - implementation-plans/reboot-command-payload-schemas.json

### Immediate Next Steps

1. Continue implementation in parallel by team against frozen contract.
2. Client team validates dedupe and expiry handling on canonical topics.
3. Platform team verifies ack-state transitions for accepted, execution_started, completed, failed.
4. Execute one-group canary and validate timing plus failure drills.
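
The ack-state verification, safety lockout, and canary trigger listed above could be exercised with something like the following minimal Python sketch. It is illustrative only, not the planned implementation. The names `ALLOWED_TRANSITIONS`, `is_valid_ack_transition`, `RebootLockout`, and `should_roll_back` are hypothetical. It also assumes that acks arrive in the order accepted, execution_started, completed, that failed may arrive at any non-terminal point, and that the lockout permits three commands and blocks the fourth within the window. The payload schema files remain the authoritative contract.

```python
from collections import deque

# Values taken from the kickoff summary.
LOCKOUT_MAX_COMMANDS = 3           # reboot commands per client
LOCKOUT_WINDOW_S = 15 * 60         # rolling 15 minutes
CANARY_ROLLBACK_THRESHOLD = 0.05   # failed plus timed_out above 5 percent

# ASSUMPTION: ordering of ack statuses. None means "no ack received yet".
ALLOWED_TRANSITIONS = {
    None: {"accepted", "failed"},
    "accepted": {"execution_started", "failed"},
    "execution_started": {"completed", "failed"},
    "completed": set(),   # terminal
    "failed": set(),      # terminal
}


def is_valid_ack_transition(previous, new):
    """Return True if ack status `new` may follow `previous`."""
    return new in ALLOWED_TRANSITIONS.get(previous, set())


class RebootLockout:
    """Rolling-window command limiter for a single client (hypothetical helper)."""

    def __init__(self, max_commands=LOCKOUT_MAX_COMMANDS, window_s=LOCKOUT_WINDOW_S):
        self.max_commands = max_commands
        self.window_s = window_s
        self._issued = deque()  # timestamps (seconds) of accepted commands

    def try_issue(self, now):
        """Record a command at time `now`; return False if locked out."""
        # Drop timestamps that have left the rolling window.
        while self._issued and now - self._issued[0] >= self.window_s:
            self._issued.popleft()
        if len(self._issued) >= self.max_commands:
            return False
        self._issued.append(now)
        return True


def should_roll_back(outcomes):
    """Return True if the failed plus timed_out share exceeds the threshold."""
    if not outcomes:
        return False
    bad = sum(1 for outcome in outcomes if outcome in ("failed", "timed_out"))
    return bad / len(outcomes) > CANARY_ROLLBACK_THRESHOLD
```

In a canary drill, each client would get its own `RebootLockout` instance fed with a monotonic clock, and `should_roll_back` would be evaluated over the terminal outcomes of the canary group.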