Files
infoscreen/DEV-CHANGELOG.md
Olaf 03e3c11e90 feat: crash recovery, service_failed monitoring, broker health fields, command expiry sweep
- Add GET /api/clients/crashed endpoint (process_status=crashed or stale heartbeat)
- Add restart_app command action with same lifecycle + lockout as reboot_host
- Scheduler: crash auto-recovery loop (CRASH_RECOVERY_ENABLED flag, lockout, MQTT publish)
- Scheduler: unconditional command expiry sweep per poll cycle (sweep_expired_commands)
- Listener: subscribe to infoscreen/+/service_failed; persist service_failed_at + unit
- Listener: extract broker_connection block from health payload; persist reconnect_count + last_disconnect_at
- DB migration b1c2d3e4f5a6: service_failed_at, service_failed_unit, mqtt_reconnect_count, mqtt_last_disconnect_at on clients
- Add GET /api/clients/service_failed and POST /api/clients/<uuid>/clear_service_failed
- Monitoring overview API: include mqtt_reconnect_count + mqtt_last_disconnect_at per client
- Frontend: orange service-failed alert panel (hidden when empty, auto-refresh, quittieren action)
- Frontend: MQTT reconnect count + last disconnect in client detail panel
- MQTT auth hardening: listener/scheduler/server use env credentials; broker enforces allow_anonymous false
- Client command lifecycle foundation: ClientCommand model, reboot_host/shutdown_host, full ACK lifecycle
- Docs: TECH-CHANGELOG, DEV-CHANGELOG, MQTT_EVENT_PAYLOAD_GUIDE, copilot-instructions updated
- Add implementation-plans/, RESTART_VALIDATION_CHECKLIST.md, TODO.md
2026-04-05 10:17:56 +00:00

57 lines
9.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# DEV-CHANGELOG
This changelog tracks all changes made in the development workspace, including internal, experimental, and in-progress updates. Entries here may not be reflected in public releases or the user-facing changelog.
---
## Unreleased (development workspace)
- Crash detection API: Added `GET /api/clients/crashed` returning clients with `process_status=crashed` or stale heartbeat; includes `crash_reason` field (`process_crashed` | `heartbeat_stale`).
- Crash auto-recovery (scheduler): Feature-flagged loop (`CRASH_RECOVERY_ENABLED`) scans crash candidates, issues `reboot_host` command, publishes to primary + compat MQTT topics; lockout window and expiry configurable via env.
- Command expiry sweep (scheduler): Unconditional per-cycle sweep in `sweep_expired_commands()` marks non-terminal `ClientCommand` rows past `expires_at` as `expired`.
- `restart_app` action registered in `server/routes/clients.py` API action map; sends same command lifecycle as `reboot_host`; safety lockout covers both actions.
- `service_failed` listener: subscribes to `infoscreen/+/service_failed` on every connect; persists `service_failed_at` + `service_failed_unit` to `Client`; empty payload (retain clear) silently ignored.
- Broker connection health: Listener health handler now extracts `broker_connection.reconnect_count` + `broker_connection.last_disconnect_at` and persists to `Client`.
- DB migration `b1c2d3e4f5a6`: adds `service_failed_at`, `service_failed_unit`, `mqtt_reconnect_count`, `mqtt_last_disconnect_at` to `clients` table.
- Model update: `models/models.py` Client class updated with all four new columns.
- `GET /api/clients/service_failed`: lists clients with `service_failed_at` set, admin-or-higher gated.
- `POST /api/clients/<uuid>/clear_service_failed`: clears DB flag and publishes empty retained MQTT to `infoscreen/{uuid}/service_failed`.
- Monitoring overview includes `mqtt_reconnect_count` + `mqtt_last_disconnect_at` per client.
- Frontend monitoring: orange service-failed alert panel (hidden when count=0), auto-refresh 15s, per-row Quittieren action.
- Frontend monitoring: client detail now shows MQTT reconnect count + last disconnect timestamp.
- Frontend types: `ServiceFailedClient`, `ServiceFailedClientsResponse`; helpers `fetchServiceFailedClients()`, `clearServiceFailed()` added to `dashboard/src/apiClients.ts`.
- `MQTT_EVENT_PAYLOAD_GUIDE.md`: added `service_failed` topic contract.
- MQTT auth hardening: Listener and scheduler now connect to broker with env-configured credentials (`MQTT_BROKER_HOST`, `MQTT_BROKER_PORT`, `MQTT_USER`, `MQTT_PASSWORD`) instead of anonymous fixed host/port defaults; optional TLS env toggles added in code path (`MQTT_TLS_*`).
- Broker auth enforcement: `mosquitto/config/mosquitto.conf` now disables anonymous access and enables password-file authentication. `docker-compose.yml` MQTT service now bootstraps/update password entries from env (`MQTT_USER`/`MQTT_PASSWORD`, optional canary user) before starting broker.
- Compose wiring: Added MQTT credential env propagation for listener/scheduler in both base and dev override compose files and switched MQTT healthcheck publish to authenticated mode.
- Backend implementation: Introduced client command lifecycle foundation for remote control in `server/routes/clients.py` with command persistence (`ClientCommand`), schema-based MQTT publish to `infoscreen/{uuid}/commands` (QoS1, non-retained), new endpoints `POST /api/clients/<uuid>/shutdown` and `GET /api/clients/commands/<command_id>`, and restart safety lockout (`blocked_safety` after 3 restarts in 15 minutes). Added migration `server/alembic/versions/aa12bb34cc56_add_client_commands_table.py` and model updates in `models/models.py`. Restart path keeps transitional legacy MQTT publish to `clients/{uuid}/restart` for compatibility.
- Listener integration: `listener/listener.py` now subscribes to `infoscreen/+/commands/ack` and updates command lifecycle states from client ACK payloads (`accepted`, `execution_started`, `completed`, `failed`).
- Frontend API client prep: Extended `dashboard/src/apiClients.ts` with `ClientCommand` typing and helper calls for lifecycle consumption (`shutdownClient`, `fetchClientCommandStatus`), and updated `restartClient` to accept optional reason payload.
- Contract freeze clarification: implementation-plan docs now explicitly freeze canonical MQTT topics (`infoscreen/{uuid}/commands`, `infoscreen/{uuid}/commands/ack`) and JSON schemas with examples; added transitional singular-topic compatibility aliases (`infoscreen/{uuid}/command`, `infoscreen/{uuid}/command/ack`) in server publish and listener ingest.
- Action value canonicalization: command payload actions are now frozen as host-level values (`reboot_host`, `shutdown_host`). API endpoint mapping is explicit (`/restart` -> `reboot_host`, `/shutdown` -> `shutdown_host`), and docs/examples were updated to remove `restart` payload ambiguity.
- Client helper snippets: Added frozen payload validation artifacts `implementation-plans/reboot-command-payload-schemas.md` and `implementation-plans/reboot-command-payload-schemas.json` (copy-ready snippets plus machine-validated JSON Schema).
- Documentation alignment: Added active reboot implementation handoff docs under `implementation-plans/` and linked them in `README.md` for immediate cross-team access (`reboot-implementation-handoff-share.md`, `reboot-implementation-handoff-client-team.md`, `reboot-kickoff-summary.md`).
- Programminfo GUI regression/fix: `dashboard/public/program-info.json` could not be loaded in Programminfo menu due to invalid JSON in the new alpha.16 changelog line (malformed quote in a text entry). Fixed JSON entry and verified file parses correctly again.
- Dashboard holiday banner fix: `dashboard/src/dashboard.tsx``loadHolidayStatus` now uses a stable `useCallback` with empty deps, preventing repeated re-creation on render. `useEffect` depends only on the stable callback reference.
- Dashboard Syncfusion stale-render fix: `MessageComponent` in the holiday banner now receives `key={`${severity}:${text}`}` to force remount when severity or text changes; without this Syncfusion cached stale DOM and the banner did not update reactively.
- Dashboard German text: Replaced transliterated forms (ae/oe/ue) with correct Umlauts throughout visible dashboard UI strings — `Präsentation`, `für`, `prüfen`, `Ferienüberschneidungen`, `verfügbar`, `Vorfälle`, `Ausfälle`.
- TV power intent (Phase 1): Scheduler publishes retained QoS1 group-level intents to `infoscreen/groups/{group_id}/power/intent` with transition+heartbeat semantics, startup/reconnect republish, and poll-based expiry (`max(3 × poll_interval_sec, 90s)`).
- TV power validation: Added unit/integration/canary coverage in `scheduler/test_power_intent_utils.py`, `scheduler/test_power_intent_scheduler.py`, and `test_power_intent_canary.py`.
- Monitoring system completion: End-to-end monitoring pipeline is active (MQTT logs/health → listener persistence → monitoring APIs → superadmin dashboard).
- Monitoring API: Added/active endpoints `GET /api/client-logs/monitoring-overview` and `GET /api/client-logs/recent-errors`; per-client logs via `GET /api/client-logs/<uuid>/logs`.
- Dashboard monitoring UI: Superadmin monitoring page is integrated and displays client health status, screenshots, process metadata, and recent error activity.
- Bugfix: Presentation flags `page_progress` and `auto_progress` now persist reliably across create/update and detached-occurrence flows.
- Frontend (Settings → Events): Added Presentations defaults (slideshow interval, page-progress, auto-progress) with load/save via `/api/system-settings`; UI uses Syncfusion controls.
- Backend defaults: Seeded `presentation_interval` ("10"), `presentation_page_progress` ("true"), `presentation_auto_progress` ("true") in `server/init_defaults.py` when missing.
- Data model: Added per-event fields `page_progress` and `auto_progress` on `Event`; Alembic migration applied successfully.
- Event modal (dashboard): Extended to show and persist presentation `pageProgress`/`autoProgress`; applies system defaults on create and preserves per-event values on edit; payload includes `page_progress`, `auto_progress`, and `slideshow_interval`.
- Scheduler behavior: Now publishes only currently active events per group (at "now"); clears retained topics by publishing `[]` for groups with no active events; normalizes naive timestamps and compares times in UTC; presentation payloads include `page_progress` and `auto_progress`.
- Recurrence handling: Still queries a 7day window to expand recurring events and apply exceptions; recurring events only deactivate after `recurrence_end` (UNTIL).
- Logging: Temporarily added filter diagnostics during debugging; removed verbose logs after verification.
- WebUntis event type: Implemented new `webuntis` type. Event creation resolves URL from system `supplement_table_url`; returns 400 if not configured. WebUntis behaves like Website on clients (shared website payload).
- Settings consolidation: Removed separate `webuntis_url` (if present during dev); WebUntis and Vertretungsplan share `supplement_table_url`. Removed `/api/system-settings/webuntis-url` endpoints; use `/api/system-settings/supplement-table`.
- Scheduler payloads: Added top-level `event_type` for all events; introduced unified nested `website` payload for both `website` and `webuntis` events: `{ "type": "browser", "url": "…" }`.
- Frontend: Program info bumped to `2025.1.0-alpha.13`; changelog includes WebUntis/Website unification and settings update. Event modal shows no per-event URL for WebUntis.
- Documentation: Added `MQTT_EVENT_PAYLOAD_GUIDE.md` and `WEBUNTIS_EVENT_IMPLEMENTATION.md`. Updated `.github/copilot-instructions.md` and `README.md` for unified Website/WebUntis handling and system settings usage.
Note: These changes are available in the development environment and may be included in future releases. For released changes, see TECH-CHANGELOG.md.