Restart Validation Checklist
Purpose: Validate end-to-end restart command flow after MQTT auth hardening.
Scope
- API command issue route: POST /api/clients/{uuid}/restart
- MQTT command topic: infoscreen/{uuid}/commands (compat: infoscreen/{uuid}/command)
- MQTT ACK topic: infoscreen/{uuid}/commands/ack (compat: infoscreen/{uuid}/command/ack)
- Status API: GET /api/clients/commands/{command_id}
Preconditions
- Stack is up and healthy (db, mqtt, server, listener, scheduler).
- You have an admin or superadmin account.
- At least one canary client is online and can process restart commands.
- .env has valid MQTT_USER / MQTT_PASSWORD.
1) Open Monitoring Session (MQTT)
On host/server:
set -a
. ./.env
set +a
mosquitto_sub -h 127.0.0.1 -p 1883 \
-u "$MQTT_USER" -P "$MQTT_PASSWORD" \
-t "infoscreen/+/commands" \
-t "infoscreen/+/commands/ack" \
-t "infoscreen/+/command" \
-t "infoscreen/+/command/ack" \
-v
Expected:
- Command publish appears on infoscreen/{uuid}/commands.
- ACK(s) appear on infoscreen/{uuid}/commands/ack.
2) Login and Keep Session Cookie
API_BASE="http://127.0.0.1:8000"
# Avoid the names USER/PASS: USER is a standard shell environment variable.
API_USER="<admin_or_superadmin_username>"
API_PASS="<password>"
curl -sS -X POST "$API_BASE/api/auth/login" \
  -H "Content-Type: application/json" \
  -d "{\"username\":\"$API_USER\",\"password\":\"$API_PASS\"}" \
  -c /tmp/infoscreen-cookies.txt
Expected:
- Login success response.
- Cookie jar file created at /tmp/infoscreen-cookies.txt.
3) Pick Target Client UUID
Option A: Use a known canary UUID.
Option B: Query alive clients:
curl -sS "$API_BASE/api/clients/with_alive_status" -b /tmp/infoscreen-cookies.txt
Choose one uuid where is_alive is true.
4) Issue Restart Command
CLIENT_UUID="<target_uuid>"
curl -sS -X POST "$API_BASE/api/clients/$CLIENT_UUID/restart" \
-H "Content-Type: application/json" \
-b /tmp/infoscreen-cookies.txt \
-d '{"reason":"canary_restart_validation"}'
Expected:
- HTTP 202 on success.
- JSON includes command.commandId and an initial status, typically published.
- In MQTT monitor, a command payload with:
  - schema_version: "1.0"
  - action: "reboot_host"
  - matching command_id.
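To carry the command ID into the next step without copying it by hand, the response body can be captured and parsed. The following is a minimal sketch, assuming the response JSON contains a single string field literally named commandId (the checklist refers to it as command.commandId). The helper name extract_command_id is hypothetical, and the sed pattern is only a rough stand-in for a real JSON parser such as jq.

```shell
# Hypothetical helper: print the "commandId" string value from a JSON body on stdin.
# Assumes the key appears once and its value is a quoted string.
extract_command_id() {
  tr -d '\n' | sed -n 's/.*"commandId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage sketch (commented out; needs the running stack and the cookie jar from step 2):
#   BODY=$(curl -sS -X POST "$API_BASE/api/clients/$CLIENT_UUID/restart" \
#     -H "Content-Type: application/json" \
#     -b /tmp/infoscreen-cookies.txt \
#     -d '{"reason":"canary_restart_validation"}')
#   COMMAND_ID=$(printf '%s' "$BODY" | extract_command_id)
#   echo "command id: $COMMAND_ID"
```

If the helper prints nothing, the request most likely failed or the response shape differs from this assumption, so inspect the raw body before continuing.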
5) Poll Command Lifecycle Until Terminal
COMMAND_ID="<command_id_from_previous_step>"
for i in $(seq 1 20); do
curl -sS "$API_BASE/api/clients/commands/$COMMAND_ID" -b /tmp/infoscreen-cookies.txt
echo
sleep 3
done
Expected status progression (typical):
queued -> publish_in_progress -> published -> ack_received -> execution_started -> completed
Failure/alternate terminal states:
- failed (check errorCode / errorMessage)
- blocked_safety (reboot lockout triggered)
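The polling loop can stop as soon as a terminal state is seen instead of always running all 20 iterations. The sketch below assumes that completed, failed and blocked_safety are the only terminal states (the ones named in this step) and that the status response exposes a string field literally named status. Both helper names are hypothetical, and the real field name may differ.

```shell
# Return 0 if the given status is one of the terminal states named in this checklist.
# Assumption: completed, failed and blocked_safety are the only terminal states.
is_terminal_status() {
  case "$1" in
    completed|failed|blocked_safety) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print the "status" string value from a JSON body on stdin.
# Assumes the key appears once and its value is a quoted string.
extract_status() {
  tr -d '\n' | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage sketch (commented out; needs the running stack):
#   for i in $(seq 1 20); do
#     STATUS=$(curl -sS "$API_BASE/api/clients/commands/$COMMAND_ID" \
#       -b /tmp/infoscreen-cookies.txt | extract_status)
#     echo "poll $i: $STATUS"
#     if is_terminal_status "$STATUS"; then break; fi
#     sleep 3
#   done
```

Keeping the terminal-state list in one case statement makes it easy to extend if the API defines further end states.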
6) Validate Offline/Timeout Behavior
- Repeat step 4 for an offline client (or stop client process first).
- Confirm command does not falsely end as completed.
- Confirm status remains non-success and has usable failure diagnostics.
7) Validate Safety Lockout
Current lockout in API route:
- Threshold: 3 reboot commands
- Window: 15 minutes
Test:
- Send 4 restart commands quickly for same uuid.
Expected:
- One request returns HTTP 429.
- Command entry has state blocked_safety with lockout error details.
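The lockout test can be scripted by sending the four requests in a loop and counting the 429 responses. This is a minimal sketch: the helper name count_http_code and the reason string lockout_validation are made up for illustration, while the curl options used (-o, -w with %{http_code}) are standard.

```shell
# Hypothetical helper: count the lines on stdin that exactly equal the given HTTP status code.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the helper's exit status at 0.
count_http_code() {
  grep -c -x "$1" || true
}

# Usage sketch (commented out; needs the running stack and the cookie jar from step 2):
#   for i in 1 2 3 4; do
#     curl -sS -o /dev/null -w '%{http_code}\n' \
#       -X POST "$API_BASE/api/clients/$CLIENT_UUID/restart" \
#       -H "Content-Type: application/json" \
#       -b /tmp/infoscreen-cookies.txt \
#       -d '{"reason":"lockout_validation"}'
#   done | count_http_code 429
```

With the threshold and window listed above, the usage sketch should print 1 if the lockout behaves as this step expects.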
8) Service Log Spot Check
docker compose logs --tail=150 server listener mqtt
Expected:
- No MQTT auth errors (Not authorized, Connection Refused: not authorised).
- Listener logs show ACK processing for command_id.
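Scanning the log output by eye is error-prone, so the auth check can be reduced to a grep. The sketch below matches both spellings quoted above, case-insensitively. The helper name find_mqtt_auth_errors is hypothetical, and the pattern only covers the two messages named in this checklist, so other auth-related log lines would go unnoticed.

```shell
# Hypothetical helper: print log lines on stdin that look like MQTT auth failures.
# Matches "Not authorized" and "not authorised" regardless of case.
# Exit status 0 means at least one matching line was found.
find_mqtt_auth_errors() {
  grep -i -E 'not authori[sz]ed'
}

# Usage sketch (commented out; needs the running stack):
#   if docker compose logs --tail=150 server listener mqtt 2>&1 | find_mqtt_auth_errors; then
#     echo "MQTT auth errors found"
#   else
#     echo "no MQTT auth errors in the last 150 lines per service"
#   fi
```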
9) Acceptance Criteria
- Restart command publish is visible on MQTT.
- ACK is received and mapped by listener.
- Status endpoint reaches correct terminal state.
- Safety lockout works under repeated restart attempts.
- No auth regression in broker/service logs.
Cleanup
rm -f /tmp/infoscreen-cookies.txt