feat: crash recovery, service_failed monitoring, broker health fields, command expiry sweep
- Add GET /api/clients/crashed endpoint (process_status=crashed or stale heartbeat)
- Add restart_app command action with same lifecycle + lockout as reboot_host
- Scheduler: crash auto-recovery loop (CRASH_RECOVERY_ENABLED flag, lockout, MQTT publish)
- Scheduler: unconditional command expiry sweep per poll cycle (sweep_expired_commands)
- Listener: subscribe to infoscreen/+/service_failed; persist service_failed_at + unit
- Listener: extract broker_connection block from health payload; persist reconnect_count + last_disconnect_at
- DB migration b1c2d3e4f5a6: service_failed_at, service_failed_unit, mqtt_reconnect_count, mqtt_last_disconnect_at on clients
- Add GET /api/clients/service_failed and POST /api/clients/<uuid>/clear_service_failed
- Monitoring overview API: include mqtt_reconnect_count + mqtt_last_disconnect_at per client
- Frontend: orange service-failed alert panel (hidden when empty, auto-refresh, acknowledge ("quittieren") action)
- Frontend: MQTT reconnect count + last disconnect in client detail panel
- MQTT auth hardening: listener/scheduler/server use env credentials; broker enforces allow_anonymous false
- Client command lifecycle foundation: ClientCommand model, reboot_host/shutdown_host, full ACK lifecycle
- Docs: TECH-CHANGELOG, DEV-CHANGELOG, MQTT_EVENT_PAYLOAD_GUIDE, copilot-instructions updated
- Add implementation-plans/, RESTART_VALIDATION_CHECKLIST.md, TODO.md
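The crashed-client criterion in the first bullet (process_status=crashed or a stale heartbeat) can be sketched as a pure predicate. This is a minimal illustration under stated assumptions: the `ClientSnapshot` type, the `last_heartbeat` field, the `stale_after` threshold, and the choice to treat a never-seen client as not crashed are all hypothetical, since the commit message does not say how staleness is measured.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class ClientSnapshot:
    # Hypothetical stand-in for a row of the clients table.
    uuid: str
    process_status: Optional[str]
    last_heartbeat: Optional[datetime]  # timezone-aware UTC, None if never seen


def is_crashed(client: ClientSnapshot, now: datetime, stale_after: timedelta) -> bool:
    """True if the client reports a crash or its last heartbeat is too old."""
    if client.process_status == "crashed":
        return True
    # Assumption: a client that never sent a heartbeat is not listed as crashed.
    if client.last_heartbeat is None:
        return False
    return now - client.last_heartbeat > stale_after


def crashed_clients(
    clients: Iterable[ClientSnapshot], now: datetime, stale_after: timedelta
) -> List[ClientSnapshot]:
    """Clients that an endpoint like GET /api/clients/crashed would return."""
    return [c for c in clients if is_crashed(c, now, stale_after)]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    clients = [
        ClientSnapshot("a", "crashed", now),
        ClientSnapshot("b", "running", now - timedelta(minutes=10)),
        ClientSnapshot("c", "running", now - timedelta(seconds=30)),
    ]
    print([c.uuid for c in crashed_clients(clients, now, timedelta(minutes=2))])  # ['a', 'b']
```

Keeping the predicate free of database access makes the crashed rule easy to unit-test and reuse from both the API route and the scheduler.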
.github/copilot-instructions.md (12 changed lines)
@@ -13,15 +13,16 @@ It is not a changelog and not a full architecture handbook.
 - Keep changes minimal, match existing patterns, and update docs in the same commit when behavior changes.
 
 ## Fast file map
-- `scheduler/scheduler.py` - scheduler loop, MQTT event publishing, TV power intent publishing
-- `scheduler/db_utils.py` - event formatting and power-intent helper logic
-- `listener/listener.py` - discovery/heartbeat/log/screenshot MQTT consumption
+- `scheduler/scheduler.py` - scheduler loop, MQTT event publishing, TV power intent publishing, crash auto-recovery, command expiry sweep
+- `scheduler/db_utils.py` - event formatting, power-intent helpers, crash recovery helpers, command expiry sweep
+- `listener/listener.py` - discovery/heartbeat/log/screenshot/service_failed MQTT consumption
 - `server/init_academic_periods.py` - idempotent academic-period seeding + auto-activation for current date
 - `server/initialize_database.py` - migration + bootstrap orchestration for local/manual setup
 - `server/routes/events.py` - event CRUD, recurrence handling, UTC normalization
 - `server/routes/eventmedia.py` - file manager, media upload/stream endpoints
 - `server/routes/groups.py` - group lifecycle, alive status, order persistence
 - `server/routes/system_settings.py` - system settings CRUD and supplement-table endpoint
+- `server/routes/clients.py` - client metadata, restart/shutdown/restart_app command issuing, command status, crashed/service_failed alert endpoints
 - `dashboard/src/settings.tsx` - settings UX and system-defaults integration
 - `dashboard/src/components/CustomEventModal.tsx` - event creation/editing UX
 - `dashboard/src/monitoring.tsx` - superadmin monitoring page
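The file map places the command expiry sweep in `scheduler/db_utils.py`, and the commit message says it runs unconditionally on every poll cycle. The following is a minimal in-memory sketch of that idea, not the project's `sweep_expired_commands`: the `Command` type, the status strings, and the set of terminal states are assumptions, because the ACK lifecycle states are not listed in this diff.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

# Hypothetical lifecycle states; the real ClientCommand statuses may differ.
TERMINAL_STATES = {"completed", "failed", "expired"}


@dataclass
class Command:
    command_id: str
    action: str           # e.g. "reboot_host", "restart_app"
    status: str           # e.g. "queued", "published", "completed"
    expires_at: datetime  # timezone-aware UTC deadline


def sweep_expired(commands: List[Command], now: datetime) -> List[str]:
    """Mark every non-terminal command whose deadline has passed as expired.

    Meant to be called once per scheduler poll cycle, whether or not anything
    else happened in that cycle. Returns the IDs that changed, for logging.
    """
    expired_ids = []
    for cmd in commands:
        if cmd.status not in TERMINAL_STATES and cmd.expires_at <= now:
            cmd.status = "expired"
            expired_ids.append(cmd.command_id)
    return expired_ids


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    cmds = [
        Command("1", "reboot_host", "published", now - timedelta(seconds=5)),
        Command("2", "restart_app", "completed", now - timedelta(seconds=5)),
        Command("3", "shutdown_host", "queued", now + timedelta(minutes=1)),
    ]
    print(sweep_expired(cmds, now))  # ['1']
```

Because terminal states are skipped, the sweep is idempotent: running it on every cycle never rewrites a command that already finished or expired.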
@@ -54,6 +55,9 @@ It is not a changelog and not a full architecture handbook.
 - Logs topic family: `infoscreen/{uuid}/logs/{error|warn|info}`
 - Health topic: `infoscreen/{uuid}/health`
 - Dashboard screenshot topic: `infoscreen/{uuid}/dashboard`
+- Client command topic (QoS1, non-retained): `infoscreen/{uuid}/commands` (compat alias: `infoscreen/{uuid}/command`)
+- Client command ack topic (QoS1, non-retained): `infoscreen/{uuid}/commands/ack` (compat alias: `infoscreen/{uuid}/command/ack`)
+- Service-failed topic (retained, client→server): `infoscreen/{uuid}/service_failed`
 - TV power intent Phase 1 topic (retained, QoS1): `infoscreen/groups/{group_id}/power/intent`
 
 TV power intent Phase 1 rules:
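Since the command and ack topics each have a compat alias, code that routes these messages has to treat both spellings as the same topic. A minimal sketch of classifying the per-client topics listed above follows; `classify_topic`, `SUFFIX_KINDS`, and the kind labels are hypothetical names, and the real listener may dispatch topics differently.

```python
from typing import Dict, Optional, Tuple

# Path segments after "infoscreen/{uuid}/" mapped to a normalized kind.
# The compat aliases (singular "command") map to the same kind as the canonical topics.
SUFFIX_KINDS: Dict[Tuple[str, ...], str] = {
    ("commands",): "command",
    ("command",): "command",
    ("commands", "ack"): "command_ack",
    ("command", "ack"): "command_ack",
    ("service_failed",): "service_failed",
    ("health",): "health",
    ("dashboard",): "dashboard",
}

LOG_LEVELS = {"error", "warn", "info"}


def classify_topic(topic: str) -> Optional[Tuple[str, str]]:
    """Return (client_uuid, kind) for a per-client infoscreen topic, or None."""
    parts = topic.split("/")
    # Group-scoped topics such as infoscreen/groups/{group_id}/power/intent are not per-client.
    if len(parts) < 3 or parts[0] != "infoscreen" or parts[1] in ("", "groups"):
        return None
    uuid, suffix = parts[1], tuple(parts[2:])
    if len(suffix) == 2 and suffix[0] == "logs" and suffix[1] in LOG_LEVELS:
        return uuid, f"logs/{suffix[1]}"
    kind = SUFFIX_KINDS.get(suffix)
    return (uuid, kind) if kind is not None else None


if __name__ == "__main__":
    for t in (
        "infoscreen/abc/command/ack",
        "infoscreen/abc/logs/warn",
        "infoscreen/groups/7/power/intent",
    ):
        print(t, "->", classify_topic(t))
```

Normalizing aliases in one table keeps the compat mapping in a single place, so dropping an alias later is a one-line change.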
@@ -82,7 +86,9 @@ TV power intent Phase 1 rules:
 - Scheduler: `POLL_INTERVAL_SECONDS`, `REFRESH_SECONDS`
 - Power intent: `POWER_INTENT_PUBLISH_ENABLED`, `POWER_INTENT_HEARTBEAT_ENABLED`, `POWER_INTENT_EXPIRY_MULTIPLIER`, `POWER_INTENT_MIN_EXPIRY_SECONDS`
 - Monitoring: `PRIORITY_SCREENSHOT_TTL_SECONDS`
+- Crash recovery: `CRASH_RECOVERY_ENABLED`, `CRASH_RECOVERY_GRACE_SECONDS`, `CRASH_RECOVERY_LOCKOUT_MINUTES`, `CRASH_RECOVERY_COMMAND_EXPIRY_SECONDS`
 - Core: `DB_CONN`, `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_NAME`, `ENV`
+- MQTT auth/connectivity: `MQTT_BROKER_HOST`, `MQTT_BROKER_PORT`, `MQTT_USER`, `MQTT_PASSWORD` (listener/scheduler/server should use authenticated broker access)
 
 ## Edit guardrails
 - Do not edit generated assets in `dashboard/dist/`.
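The four crash-recovery variables suggest a per-client gate: recover only when the feature is enabled, the crash has lasted longer than a grace period, and no recovery command was issued within the lockout window. The sketch below reads the variable names listed above, but every default value, the exact meaning of each variable, and the helpers `should_recover` and `recovery_command_deadline` are assumptions for illustration, not the scheduler's actual logic.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Mapping, Optional


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Interpret common truthy strings; fall back to the default when unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class CrashRecoveryConfig:
    enabled: bool
    grace: timedelta
    lockout: timedelta
    command_expiry: timedelta

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "CrashRecoveryConfig":
        # All defaults here are placeholders, not the project's real defaults.
        return cls(
            enabled=_env_bool(env, "CRASH_RECOVERY_ENABLED", False),
            grace=timedelta(seconds=int(env.get("CRASH_RECOVERY_GRACE_SECONDS", "60"))),
            lockout=timedelta(minutes=int(env.get("CRASH_RECOVERY_LOCKOUT_MINUTES", "15"))),
            command_expiry=timedelta(
                seconds=int(env.get("CRASH_RECOVERY_COMMAND_EXPIRY_SECONDS", "300"))
            ),
        )


def should_recover(
    cfg: CrashRecoveryConfig,
    crashed_since: Optional[datetime],
    last_recovery_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Decide whether to issue an automatic recovery command for one client."""
    if not cfg.enabled or crashed_since is None:
        return False
    if now - crashed_since < cfg.grace:
        return False  # still inside the grace period; the client may recover by itself
    if last_recovery_at is not None and now - last_recovery_at < cfg.lockout:
        return False  # lockout: a recovery command was issued recently
    return True


def recovery_command_deadline(cfg: CrashRecoveryConfig, now: datetime) -> datetime:
    """Assumed expiry timestamp for an issued recovery command."""
    return now + cfg.command_expiry
```

Passing the environment as a mapping (defaulting to `os.environ`) lets tests supply a plain dict instead of mutating process state.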