infoscreen/DEV-CHANGELOG.md
Olaf 03e3c11e90 feat: crash recovery, service_failed monitoring, broker health fields, command expiry sweep
- Add GET /api/clients/crashed endpoint (process_status=crashed or stale heartbeat)
- Add restart_app command action with same lifecycle + lockout as reboot_host
- Scheduler: crash auto-recovery loop (CRASH_RECOVERY_ENABLED flag, lockout, MQTT publish)
- Scheduler: unconditional command expiry sweep per poll cycle (sweep_expired_commands)
- Listener: subscribe to infoscreen/+/service_failed; persist service_failed_at + unit
- Listener: extract broker_connection block from health payload; persist reconnect_count + last_disconnect_at
- DB migration b1c2d3e4f5a6: service_failed_at, service_failed_unit, mqtt_reconnect_count, mqtt_last_disconnect_at on clients
- Add GET /api/clients/service_failed and POST /api/clients/<uuid>/clear_service_failed
- Monitoring overview API: include mqtt_reconnect_count + mqtt_last_disconnect_at per client
- Frontend: orange service-failed alert panel (hidden when empty, auto-refresh, acknowledge (Quittieren) action)
- Frontend: MQTT reconnect count + last disconnect in client detail panel
- MQTT auth hardening: listener/scheduler/server use env credentials; broker enforces allow_anonymous false
- Client command lifecycle foundation: ClientCommand model, reboot_host/shutdown_host, full ACK lifecycle
- Docs: TECH-CHANGELOG, DEV-CHANGELOG, MQTT_EVENT_PAYLOAD_GUIDE, copilot-instructions updated
- Add implementation-plans/, RESTART_VALIDATION_CHECKLIST.md, TODO.md
2026-04-05 10:17:56 +00:00


DEV-CHANGELOG

This changelog tracks all changes made in the development workspace, including internal, experimental, and in-progress updates. Entries here may not be reflected in public releases or the user-facing changelog.


Unreleased (development workspace)

  • Crash detection API: Added GET /api/clients/crashed returning clients with process_status=crashed or stale heartbeat; includes crash_reason field (process_crashed | heartbeat_stale).
  • Crash auto-recovery (scheduler): Feature-flagged loop (CRASH_RECOVERY_ENABLED) scans crash candidates, issues a reboot_host command, and publishes it to the primary and compatibility MQTT topics; the lockout window and command expiry are configurable via env.
  • Command expiry sweep (scheduler): Unconditional per-cycle sweep in sweep_expired_commands() marks non-terminal ClientCommand rows past expires_at as expired.
  • restart_app action: Registered in the server/routes/clients.py API action map; follows the same command lifecycle as reboot_host, and the safety lockout covers both actions.
  • service_failed listener: Subscribes to infoscreen/+/service_failed on every connect and persists service_failed_at + service_failed_unit to Client; an empty payload (retained-message clear) is silently ignored.
  • Broker connection health: Listener health handler now extracts broker_connection.reconnect_count + broker_connection.last_disconnect_at and persists to Client.
  • DB migration b1c2d3e4f5a6: adds service_failed_at, service_failed_unit, mqtt_reconnect_count, mqtt_last_disconnect_at to clients table.
  • Model update: models/models.py Client class updated with all four new columns.
  • GET /api/clients/service_failed: Lists clients with service_failed_at set; restricted to admin role or higher.
  • POST /api/clients/<uuid>/clear_service_failed: Clears the DB flag and publishes an empty retained MQTT message to infoscreen/{uuid}/service_failed.
  • Monitoring overview includes mqtt_reconnect_count + mqtt_last_disconnect_at per client.
  • Frontend monitoring: Orange service-failed alert panel (hidden when count=0), auto-refresh every 15 s, per-row Quittieren (acknowledge) action.
  • Frontend monitoring: client detail now shows MQTT reconnect count + last disconnect timestamp.
  • Frontend types: ServiceFailedClient, ServiceFailedClientsResponse; helpers fetchServiceFailedClients(), clearServiceFailed() added to dashboard/src/apiClients.ts.
  • MQTT_EVENT_PAYLOAD_GUIDE.md: added service_failed topic contract.
  • MQTT auth hardening: Listener and scheduler now connect to broker with env-configured credentials (MQTT_BROKER_HOST, MQTT_BROKER_PORT, MQTT_USER, MQTT_PASSWORD) instead of anonymous fixed host/port defaults; optional TLS env toggles added in code path (MQTT_TLS_*).
  • Broker auth enforcement: mosquitto/config/mosquitto.conf now disables anonymous access and enables password-file authentication. The docker-compose.yml MQTT service now bootstraps/updates password entries from env (MQTT_USER/MQTT_PASSWORD, optional canary user) before starting the broker.
  • Compose wiring: Added MQTT credential env propagation for listener/scheduler in both base and dev override compose files and switched MQTT healthcheck publish to authenticated mode.
  • Backend implementation: Introduced client command lifecycle foundation for remote control in server/routes/clients.py with command persistence (ClientCommand), schema-based MQTT publish to infoscreen/{uuid}/commands (QoS1, non-retained), new endpoints POST /api/clients/<uuid>/shutdown and GET /api/clients/commands/<command_id>, and restart safety lockout (blocked_safety after 3 restarts in 15 minutes). Added migration server/alembic/versions/aa12bb34cc56_add_client_commands_table.py and model updates in models/models.py. Restart path keeps transitional legacy MQTT publish to clients/{uuid}/restart for compatibility.
  • Listener integration: listener/listener.py now subscribes to infoscreen/+/commands/ack and updates command lifecycle states from client ACK payloads (accepted, execution_started, completed, failed).
  • Frontend API client prep: Extended dashboard/src/apiClients.ts with ClientCommand typing and helper calls for lifecycle consumption (shutdownClient, fetchClientCommandStatus), and updated restartClient to accept optional reason payload.
  • Contract freeze clarification: implementation-plan docs now explicitly freeze canonical MQTT topics (infoscreen/{uuid}/commands, infoscreen/{uuid}/commands/ack) and JSON schemas with examples; added transitional singular-topic compatibility aliases (infoscreen/{uuid}/command, infoscreen/{uuid}/command/ack) in server publish and listener ingest.
  • Action value canonicalization: command payload actions are now frozen as host-level values (reboot_host, shutdown_host). API endpoint mapping is explicit (/restart -> reboot_host, /shutdown -> shutdown_host), and docs/examples were updated to remove restart payload ambiguity.
  • Client helper snippets: Added frozen payload validation artifacts implementation-plans/reboot-command-payload-schemas.md and implementation-plans/reboot-command-payload-schemas.json (copy-ready snippets plus machine-validated JSON Schema).
  • Documentation alignment: Added active reboot implementation handoff docs under implementation-plans/ and linked them in README.md for immediate cross-team access (reboot-implementation-handoff-share.md, reboot-implementation-handoff-client-team.md, reboot-kickoff-summary.md).
  • Programminfo GUI regression/fix: dashboard/public/program-info.json could not be loaded in the Programminfo menu because the new alpha.16 changelog line contained invalid JSON (a malformed quote in a text entry). Fixed the JSON entry and verified that the file parses correctly again.
  • Dashboard holiday banner fix: In dashboard/src/dashboard.tsx, loadHolidayStatus now uses a stable useCallback with empty deps, which prevents it from being re-created on every render. The useEffect depends only on the stable callback reference.
  • Dashboard Syncfusion stale-render fix: MessageComponent in the holiday banner now receives key={`${severity}:${text}`} to force a remount when severity or text changes. Without it, Syncfusion kept its cached (stale) DOM and the banner did not update reactively.
  • Dashboard German text: Replaced transliterated forms (ae/oe/ue) with correct umlauts throughout the visible dashboard UI strings, e.g. Präsentation, für, prüfen, Ferienüberschneidungen, verfügbar, Vorfälle, Ausfälle.
  • TV power intent (Phase 1): Scheduler publishes retained QoS1 group-level intents to infoscreen/groups/{group_id}/power/intent with transition+heartbeat semantics, startup/reconnect republish, and poll-based expiry (max(3 × poll_interval_sec, 90s)).
  • TV power validation: Added unit/integration/canary coverage in scheduler/test_power_intent_utils.py, scheduler/test_power_intent_scheduler.py, and test_power_intent_canary.py.
  • Monitoring system completion: End-to-end monitoring pipeline is active (MQTT logs/health → listener persistence → monitoring APIs → superadmin dashboard).
  • Monitoring API: Endpoints GET /api/client-logs/monitoring-overview and GET /api/client-logs/recent-errors are active; per-client logs are available via GET /api/client-logs/<uuid>/logs.
  • Dashboard monitoring UI: Superadmin monitoring page is integrated and displays client health status, screenshots, process metadata, and recent error activity.
  • Bugfix: Presentation flags page_progress and auto_progress now persist reliably across create/update and detached-occurrence flows.
  • Frontend (Settings → Events): Added presentation defaults (slideshow interval, page-progress, auto-progress) with load/save via /api/system-settings; the UI uses Syncfusion controls.
  • Backend defaults: Seeded presentation_interval ("10"), presentation_page_progress ("true"), presentation_auto_progress ("true") in server/init_defaults.py when missing.
  • Data model: Added per-event fields page_progress and auto_progress on Event; Alembic migration applied successfully.
  • Event modal (dashboard): Extended to show and persist presentation pageProgress/autoProgress; applies system defaults on create and preserves per-event values on edit; payload includes page_progress, auto_progress, and slideshow_interval.
  • Scheduler behavior: Now publishes only currently active events per group (at "now"); clears retained topics by publishing [] for groups with no active events; normalizes naive timestamps and compares times in UTC; presentation payloads include page_progress and auto_progress.
  • Recurrence handling: Still queries a 7-day window to expand recurring events and apply exceptions; recurring events deactivate only after recurrence_end (UNTIL).
  • Logging: Temporarily added filter diagnostics during debugging; removed verbose logs after verification.
  • WebUntis event type: Implemented new webuntis type. Event creation resolves URL from system supplement_table_url; returns 400 if not configured. WebUntis behaves like Website on clients (shared website payload).
  • Settings consolidation: Removed separate webuntis_url (if present during dev); WebUntis and Vertretungsplan share supplement_table_url. Removed /api/system-settings/webuntis-url endpoints; use /api/system-settings/supplement-table.
  • Scheduler payloads: Added top-level event_type for all events; introduced unified nested website payload for both website and webuntis events: { "type": "browser", "url": "…" }.
  • Frontend: Program info bumped to 2025.1.0-alpha.13; changelog includes WebUntis/Website unification and settings update. Event modal shows no per-event URL for WebUntis.
  • Documentation: Added MQTT_EVENT_PAYLOAD_GUIDE.md and WEBUNTIS_EVENT_IMPLEMENTATION.md. Updated .github/copilot-instructions.md and README.md for unified Website/WebUntis handling and system settings usage.
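The crash classification behind GET /api/clients/crashed (process_crashed vs. heartbeat_stale) can be sketched as an in-memory check. This is an illustration, not the endpoint's code. ClientSnapshot, crash_reason() and crashed_clients() are hypothetical names. The 120-second HEARTBEAT_STALE_AFTER cutoff is also an assumption, because the changelog does not state the staleness threshold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Hypothetical cutoff: the changelog does not say how old a heartbeat must be to count as stale.
HEARTBEAT_STALE_AFTER = timedelta(seconds=120)


@dataclass
class ClientSnapshot:
    """Hypothetical stand-in for the Client columns this check reads."""
    uuid: str
    process_status: Optional[str]
    last_heartbeat: Optional[datetime]  # timezone-aware, UTC


def crash_reason(client: ClientSnapshot, now: datetime) -> Optional[str]:
    """Return 'process_crashed', 'heartbeat_stale', or None if the client looks healthy."""
    # A reported crash takes precedence over heartbeat age.
    if client.process_status == "crashed":
        return "process_crashed"
    # Assumption: a client that has never sent a heartbeat is not flagged.
    if client.last_heartbeat is not None and now - client.last_heartbeat > HEARTBEAT_STALE_AFTER:
        return "heartbeat_stale"
    return None


def crashed_clients(clients: List[ClientSnapshot], now: datetime) -> List[Dict[str, str]]:
    """Build one response row per client that has a crash_reason."""
    rows = []
    for client in clients:
        reason = crash_reason(client, now)
        if reason is not None:
            rows.append({"uuid": client.uuid, "crash_reason": reason})
    return rows
```

The same predicate could feed both the API listing and the scheduler's crash auto-recovery scan, so the two never disagree about which clients count as crashed.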
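The per-cycle command expiry rule (non-terminal ClientCommand rows past expires_at become expired) is shown below as an in-memory sketch. It is not the scheduler's sweep_expired_commands(). CommandRow and sweep_expired() are hypothetical. The terminal-state set is assumed from the states this changelog mentions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable

# Assumed terminal states, taken from the states mentioned in this changelog.
TERMINAL_STATES = frozenset({"completed", "failed", "expired", "blocked_safety"})


@dataclass
class CommandRow:
    """Hypothetical in-memory stand-in for a ClientCommand row."""
    command_id: str
    status: str
    expires_at: datetime


def sweep_expired(commands: Iterable[CommandRow], now: datetime) -> int:
    """Mark non-terminal commands whose expires_at lies in the past as expired.

    Returns the number of rows changed. Rows already in a terminal state are never touched,
    so running the sweep on every poll cycle is idempotent.
    """
    changed = 0
    for cmd in commands:
        if cmd.status not in TERMINAL_STATES and cmd.expires_at < now:
            cmd.status = "expired"
            changed += 1
    return changed
```

Against a real database, the same rule would more likely be a single bulk UPDATE filtered on status and expires_at than a Python loop.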
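The restart safety lockout (blocked_safety after 3 restarts in 15 minutes, shared by reboot_host and restart_app) amounts to a sliding-window count. The sketch below assumes a few things. The history holds only commands actually issued to the client. shutdown_host is not rate-limited. lockout_decision() and its "allowed" result are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

# Values from the changelog: 3 restarts within 15 minutes trigger the lockout.
LOCKOUT_WINDOW = timedelta(minutes=15)
LOCKOUT_MAX_RESTARTS = 3
# Both restart-type actions share one budget, per the restart_app entry.
RESTART_ACTIONS = frozenset({"reboot_host", "restart_app"})


def lockout_decision(
    history: Iterable[Tuple[str, datetime]], action: str, now: datetime
) -> str:
    """Return 'blocked_safety' or 'allowed' for a new command.

    history holds (action, issued_at) pairs for commands already issued to one client.
    """
    if action not in RESTART_ACTIONS:
        return "allowed"  # assumption: shutdown_host is not rate-limited in this sketch
    # Count restart-type commands inside the sliding window ending at now.
    recent = sum(
        1
        for past_action, issued_at in history
        if past_action in RESTART_ACTIONS and now - issued_at <= LOCKOUT_WINDOW
    )
    return "blocked_safety" if recent >= LOCKOUT_MAX_RESTARTS else "allowed"
```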
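The listener-side parsing for service_failed and for the broker_connection health block can be sketched as three small helpers. uuid_from_topic(), is_retain_clear() and broker_connection_fields() are hypothetical. The topic layout, the nested broker_connection.reconnect_count / last_disconnect_at keys and the target column names come from the entries above. The zero-length-payload rule reflects how MQTT clears a retained message.

```python
import json
from typing import Any, Dict, Optional


def uuid_from_topic(topic: str, suffix: str) -> Optional[str]:
    """Return {uuid} for topics shaped like infoscreen/{uuid}/<suffix>, else None."""
    parts = topic.split("/")
    if len(parts) == 3 and parts[0] == "infoscreen" and parts[1] and parts[2] == suffix:
        return parts[1]
    return None


def is_retain_clear(payload: bytes) -> bool:
    """A zero-length payload is how a retained MQTT message is cleared; such messages are skipped."""
    return len(payload) == 0


def broker_connection_fields(health_payload: bytes) -> Dict[str, Any]:
    """Map the broker_connection block of a health payload onto Client column names.

    Missing keys map to None. A health message without the block yields an empty dict,
    so existing column values can be left untouched.
    """
    health = json.loads(health_payload)
    block = health.get("broker_connection") if isinstance(health, dict) else None
    if not isinstance(block, dict):
        return {}
    return {
        "mqtt_reconnect_count": block.get("reconnect_count"),
        "mqtt_last_disconnect_at": block.get("last_disconnect_at"),
    }
```

Because the clear endpoint itself publishes an empty retained message to infoscreen/{uuid}/service_failed, skipping empty payloads keeps the listener from re-flagging a client it has just cleared.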
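The ACK-driven lifecycle update (accepted, execution_started, completed, failed) can be modeled as a forward-only state map. This is a sketch under stated assumptions, not listener.py. The ACK field names command_id and status stand in for the frozen schema, which is not reproduced here. "published" is a hypothetical pre-ACK state. The rule that late ACKs never move a command backwards is also an assumption.

```python
import json
from typing import Dict

# Assumed progress ranks; "published" is a hypothetical state set before any ACK arrives.
RANK = {"published": 0, "accepted": 1, "execution_started": 2, "completed": 3, "failed": 3}
ACK_STATUSES = frozenset({"accepted", "execution_started", "completed", "failed"})
# Assumed final states: once reached, later ACKs are ignored.
TERMINAL = frozenset({"completed", "failed", "expired", "blocked_safety"})


def apply_ack(states: Dict[str, str], raw_payload: bytes) -> bool:
    """Apply one ACK to an in-memory command_id -> status map; return True if it changed."""
    try:
        ack = json.loads(raw_payload)
    except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass ValueError
        return False
    if not isinstance(ack, dict):
        return False
    command_id, new_status = ack.get("command_id"), ack.get("status")  # hypothetical field names
    if not isinstance(command_id, str) or not isinstance(new_status, str):
        return False
    current = states.get(command_id)
    if current is None or current in TERMINAL or new_status not in ACK_STATUSES:
        return False
    # Ignore duplicate or out-of-order (late) ACKs instead of regressing the state.
    if RANK[new_status] <= RANK.get(current, 0):
        return False
    states[command_id] = new_status
    return True
```

With QoS1 delivery an ACK can arrive more than once, so treating duplicates as no-ops keeps the stored lifecycle stable.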
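The frozen action mapping (/restart -> reboot_host, /shutdown -> shutdown_host) and the canonical-plus-alias topics can be sketched as a pure message builder. command_messages() and ack_topic_uuid() are hypothetical helpers. The payload fields (command_id, action, reason) are illustrative and not the frozen JSON schema.

```python
import json
import uuid
from typing import List, Optional, Tuple

# Endpoint-to-action mapping as frozen in the changelog.
ENDPOINT_ACTIONS = {"restart": "reboot_host", "shutdown": "shutdown_host"}


def command_messages(client_uuid: str, endpoint: str, reason: str = "") -> List[Tuple[str, str]]:
    """Return (topic, json_payload) pairs: canonical topic first, then the transitional alias.

    Both topics carry the identical payload (same command_id), so a client that listens
    on both can de-duplicate.
    """
    action = ENDPOINT_ACTIONS[endpoint]  # raises KeyError for unmapped endpoints
    payload = json.dumps({
        "command_id": str(uuid.uuid4()),  # illustrative field names, not the frozen schema
        "action": action,
        "reason": reason,
    })
    topics = [f"infoscreen/{client_uuid}/commands", f"infoscreen/{client_uuid}/command"]
    return [(topic, payload) for topic in topics]


def ack_topic_uuid(topic: str) -> Optional[str]:
    """Accept both infoscreen/{uuid}/commands/ack and the alias infoscreen/{uuid}/command/ack."""
    parts = topic.split("/")
    if (
        len(parts) == 4
        and parts[0] == "infoscreen"
        and parts[1]
        and parts[2] in ("commands", "command")
        and parts[3] == "ack"
    ):
        return parts[1]
    return None
```

With paho-mqtt, each pair would be sent with client.publish(topic, payload, qos=1, retain=False), matching the QoS1, non-retained contract described above.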
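The TV power intent expiry rule, max(3 × poll_interval_sec, 90s), is easy to check with a few lines. Presumably the factor of 3 lets a retained intent survive two missed heartbeat republishes before it lapses. The function names here are hypothetical. The topic layout and the formula come from the Phase 1 entry.

```python
from datetime import datetime, timedelta

# Floor from the changelog: an intent never expires sooner than 90 seconds.
MIN_INTENT_TTL_SEC = 90


def intent_ttl_seconds(poll_interval_sec: int) -> int:
    """Expiry window for a power intent: max(3 x poll_interval_sec, 90 s)."""
    return max(3 * poll_interval_sec, MIN_INTENT_TTL_SEC)


def intent_topic(group_id: int) -> str:
    """Retained, QoS1 group-level topic that carries the power intent."""
    return f"infoscreen/groups/{group_id}/power/intent"


def intent_expires_at(issued_at: datetime, poll_interval_sec: int) -> datetime:
    """Absolute expiry time for an intent published at issued_at."""
    return issued_at + timedelta(seconds=intent_ttl_seconds(poll_interval_sec))
```

For example, a 15 s or 30 s poll interval hits the 90 s floor, while a 60 s interval yields 180 s.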
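The scheduler's publish rule (only events active at "now", compared in UTC, with [] for groups that have none) can be sketched as a filter. The sketch makes three assumptions. Naive timestamps are stored in UTC. An event is active on the half-open interval [start, end). Each event is a dict with hypothetical "start" and "end" keys.

```python
from datetime import datetime, timezone
from typing import Dict, List


def to_utc(ts: datetime) -> datetime:
    """Normalize to aware UTC; naive values are assumed to already be UTC."""
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def active_by_group(
    events_by_group: Dict[int, List[dict]], now: datetime
) -> Dict[int, List[dict]]:
    """Keep only events active at now; a group with none maps to [] (clears its retained topic)."""
    now_utc = to_utc(now)
    return {
        group_id: [e for e in events if to_utc(e["start"]) <= now_utc < to_utc(e["end"])]
        for group_id, events in events_by_group.items()
    }
```

Normalizing both sides before comparing matters in Python, because comparing a naive datetime with an aware one raises TypeError.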

Note: These changes are available in the development environment and may be included in future releases. For released changes, see TECH-CHANGELOG.md.