feat(monitoring): add server-side client logging and health infrastructure

- add Alembic migration c1d2e3f4g5h6 for client monitoring:
  - create client_logs table with FK to clients.uuid and performance indexes
  - extend clients with process/health tracking fields
- extend data model with ClientLog, LogLevel, ProcessStatus, and ScreenHealthStatus
- enhance listener MQTT handling:
  - subscribe to logs and health topics
  - persist client logs from infoscreen/{uuid}/logs/{level}
  - process health payloads and enrich heartbeat-derived client state
- add monitoring API blueprint server/routes/client_logs.py:
  - GET /api/client-logs/<uuid>/logs
  - GET /api/client-logs/summary
  - GET /api/client-logs/recent-errors
  - GET /api/client-logs/test
- register client_logs blueprint in server/wsgi.py
- align compose/dev runtime for listener live-code execution
- add client-side implementation docs:
  - CLIENT_MONITORING_SPECIFICATION.md
  - CLIENT_MONITORING_IMPLEMENTATION_GUIDE.md
- update TECH-CHANGELOG.md and copilot-instructions.md:
  - document monitoring changes
  - codify post-release technical-notes/no-version-bump convention
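
The listener changes listed above persist logs arriving on `infoscreen/{uuid}/logs/{level}`. As a rough illustration only, the topic could be split into its client UUID and level as sketched below. `parse_log_topic` and `ALLOWED_LEVELS` are hypothetical names, and this is not the code in `listener/listener.py`.

```python
from typing import Optional, Tuple

# Levels taken from the documented topic pattern
# infoscreen/{uuid}/logs/{error|warn|info}.
ALLOWED_LEVELS = {"error", "warn", "info"}


def parse_log_topic(topic: str) -> Optional[Tuple[str, str]]:
    """Return (client_uuid, LEVEL) for a log topic, or None if it does not match.

    Hypothetical helper for illustration; not taken from the listener itself.
    """
    parts = topic.split("/")
    if len(parts) != 4:
        return None
    prefix, client_uuid, segment, level = parts
    if prefix != "infoscreen" or segment != "logs" or not client_uuid:
        return None
    if level.lower() not in ALLOWED_LEVELS:
        return None
    # Upper-case the level so it lines up with the LogLevel names (ERROR, WARN, INFO).
    return client_uuid, level.upper()
```

Returning `None` for anything that does not match lets a single message callback ignore heartbeat, health, and discovery topics without raising.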
28
.github/copilot-instructions.md
vendored
@@ -124,9 +124,11 @@ Keep docs synced with code. When you change services/MQTT/API/UTC/env or dev/pro
  - Scheduler queries a future window (default: 7 days) to expand recurring events using RFC 5545 rules, applies event exceptions (skipped dates, detached occurrences), and publishes only events that are active at the current time (UTC). When a group has no active events, the scheduler clears its retained topic by publishing an empty list. Time comparisons are UTC; naive timestamps are normalized. Logging is concise; conversion lookups are cached and logged only once per media.
  - MQTT topics (paho-mqtt v2, use Callback API v2):
  - Discovery: `infoscreen/discovery` (JSON includes `uuid`, hw/ip data). ACK to `infoscreen/{uuid}/discovery_ack`. See `listener/listener.py`.
- - Heartbeat: `infoscreen/{uuid}/heartbeat` updates `Client.last_alive` (UTC).
+ - Heartbeat: `infoscreen/{uuid}/heartbeat` updates `Client.last_alive` (UTC); enhanced payload includes `current_process`, `process_pid`, `process_status`, `current_event_id`.
  - Event lists (retained): `infoscreen/events/{group_id}` from `scheduler/scheduler.py`.
  - Per-client group assignment (retained): `infoscreen/{uuid}/group_id` via `server/mqtt_helper.py`.
+ - Client logs: `infoscreen/{uuid}/logs/{error|warn|info}` with JSON payload (timestamp, message, context); QoS 1 for ERROR/WARN, QoS 0 for INFO.
+ - Client health: `infoscreen/{uuid}/health` with metrics (expected_state, actual_state, health_metrics); QoS 0, published every 5 seconds.
  - Screenshots: server-side folders `server/received_screenshots/` and `server/screenshots/`; Nginx exposes `/screenshots/{uuid}.jpg` via `server/wsgi.py` route.

  - Dev Container guidance: If extensions reappear inside the container, remove UI-only extensions from `devcontainer.json` `extensions` and map them in `remote.extensionKind` as `"ui"`.
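
The client-log bullet added in the hunk above gives a topic pattern, a JSON payload (timestamp, message, context), and a QoS rule per level. A minimal client-side sketch of assembling such a message follows. `build_log_message` and `QOS_BY_LEVEL` are hypothetical names, and the ISO 8601 UTC timestamp format is an assumption, since the diff does not state the format.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple

# QoS per level as documented: QoS 1 for ERROR/WARN, QoS 0 for INFO.
QOS_BY_LEVEL = {"error": 1, "warn": 1, "info": 0}


def build_log_message(
    client_uuid: str,
    level: str,
    message: str,
    context: Optional[Dict[str, Any]] = None,
    now: Optional[datetime] = None,
) -> Tuple[str, str, int]:
    """Return (topic, json_payload, qos) for one client log entry.

    Hypothetical helper; the timestamp format (ISO 8601, UTC) is an assumption.
    """
    level = level.lower()
    if level not in QOS_BY_LEVEL:
        raise ValueError(f"unsupported log level: {level!r}")
    timestamp = now or datetime.now(timezone.utc)
    payload = json.dumps(
        {
            "timestamp": timestamp.isoformat(),
            "message": message,
            "context": context or {},
        }
    )
    topic = f"infoscreen/{client_uuid}/logs/{level}"
    return topic, payload, QOS_BY_LEVEL[level]
```

The returned triple can be handed to an MQTT client's publish call as topic, payload, and QoS.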
@@ -146,6 +148,11 @@ Keep docs synced with code. When you change services/MQTT/API/UTC/env or dev/pro
  - `locked_until`: TIMESTAMP placeholder for account lockout (infrastructure in place, not yet enforced)
  - `deactivated_at`, `deactivated_by`: Soft-delete audit trail (FK self-reference); soft deactivation is the default, hard delete superadmin-only
  - Role hierarchy (privilege escalation enforced): `user` < `editor` < `admin` < `superadmin`
+ - Client monitoring (migration: `c1d2e3f4g5h6_add_client_monitoring.py`):
+ - `ClientLog` model: Centralized log storage with fields (id, client_uuid, timestamp, level, message, context, created_at); FK to clients.uuid (CASCADE)
+ - `Client` model extended: 7 health monitoring fields (`current_event_id`, `current_process`, `process_status`, `process_pid`, `last_screenshot_analyzed`, `screen_health_status`, `last_screenshot_hash`)
+ - Enums: `LogLevel` (ERROR, WARN, INFO, DEBUG), `ProcessStatus` (running, crashed, starting, stopped), `ScreenHealthStatus` (OK, BLACK, FROZEN, UNKNOWN)
+ - Indexes: (client_uuid, timestamp DESC), (level, timestamp DESC), (created_at DESC) for performance
  - System settings: `system_settings` key–value store via `SystemSetting` for global configuration (e.g., WebUntis/Vertretungsplan supplement-table). Managed through routes in `server/routes/system_settings.py`.
  - Presentation defaults (system-wide):
  - `presentation_interval` (seconds, default "10")
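
The enums and `ClientLog` fields listed in the hunk above can be pictured with plain standard-library types. This is only a sketch: the project's models are database-backed, the enum member values below are assumptions (only the member names come from the diff), and `ClientLogRecord` is a hypothetical stand-in for the real `ClientLog` model.

```python
import enum
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


class LogLevel(enum.Enum):
    ERROR = "ERROR"
    WARN = "WARN"
    INFO = "INFO"
    DEBUG = "DEBUG"


class ProcessStatus(enum.Enum):
    running = "running"
    crashed = "crashed"
    starting = "starting"
    stopped = "stopped"


class ScreenHealthStatus(enum.Enum):
    OK = "OK"
    BLACK = "BLACK"
    FROZEN = "FROZEN"
    UNKNOWN = "UNKNOWN"


@dataclass
class ClientLogRecord:
    """Stand-in mirroring the documented ClientLog fields (not the real model)."""

    client_uuid: str
    timestamp: datetime
    level: LogLevel
    message: str
    context: Optional[Dict[str, Any]] = None
    id: Optional[int] = None
    # created_at defaults to the current UTC time, matching the UTC convention.
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```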
@@ -189,6 +196,11 @@ Keep docs synced with code. When you change services/MQTT/API/UTC/env or dev/pro
  - `PUT /api/users/<id>/password` — admin password reset (requires backend check to reject self-reset for consistency)
  - `DELETE /api/users/<id>` — hard delete (superadmin only, with self-deletion check)
  - Auth routes (`server/routes/auth.py`): Enhanced to track login events (sets `last_login_at`, resets `failed_login_attempts` on success; increments `failed_login_attempts` and `last_failed_login_at` on failure). Self-service password change via `PUT /api/auth/change-password` requires current password verification.
+ - Client logs (`server/routes/client_logs.py`): Centralized log retrieval for monitoring:
+ - `GET /api/client-logs/<uuid>/logs` – Query client logs with filters (level, limit, since); admin_or_higher
+ - `GET /api/client-logs/summary` – Log counts by level per client (last 24h); admin_or_higher
+ - `GET /api/client-logs/recent-errors` – System-wide error monitoring; admin_or_higher
+ - `GET /api/client-logs/test` – Infrastructure validation (no auth); returns recent logs with counts

  Documentation maintenance: keep this file aligned with real patterns; update when routes/session/UTC rules change. Avoid long prose; link exact paths.

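
The summary route in the hunk above is described as log counts by level per client over the last 24 hours. The aggregation could look roughly like the in-memory sketch below. `summarize_logs` and the `LogRow` tuple shape are hypothetical, the real route presumably aggregates in the database, and the returned dictionary shape is an assumption rather than the documented response format.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Optional, Tuple

# (client_uuid, level, timestamp); an assumed row shape for illustration.
LogRow = Tuple[str, str, datetime]


def summarize_logs(
    rows: Iterable[LogRow],
    now: Optional[datetime] = None,
    window: timedelta = timedelta(hours=24),
) -> Dict[str, Dict[str, int]]:
    """Count log rows per client and level inside the trailing time window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    summary: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for client_uuid, level, timestamp in rows:
        # Rows older than the cutoff are ignored entirely.
        if timestamp >= cutoff:
            summary[client_uuid][level] += 1
    # Convert nested defaultdicts to plain dicts so the result serializes cleanly.
    return {uuid: dict(counts) for uuid, counts in summary.items()}
```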
@@ -364,7 +376,8 @@ Docs maintenance guardrails (solo-friendly): Update this file alongside code cha
  ## Quick examples
  - Add client description persists to DB and publishes group via MQTT: see `PUT /api/clients/<uuid>/description` in `routes/clients.py`.
  - Bulk group assignment emits retained messages for each client: `PUT /api/clients/group`.
- - Listener heartbeat path: `infoscreen/<uuid>/heartbeat` → sets `clients.last_alive`.
+ - Listener heartbeat path: `infoscreen/<uuid>/heartbeat` → sets `clients.last_alive` and captures process health data.
+ - Client monitoring flow: Client publishes to `infoscreen/{uuid}/logs/error` → listener stores in `client_logs` table → API serves via `/api/client-logs/<uuid>/logs` → dashboard displays (Phase 4, pending).

  ## Scheduler payloads: presentation extras
  - Presentation event payloads now include `page_progress` and `auto_progress` in addition to `slide_interval` and media files. These are sourced from per-event fields in the database (with system defaults applied on event creation).
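
The updated heartbeat bullet in the hunk above says the listener sets `clients.last_alive` and captures process health data. A hedged sketch of that step follows, using a plain dictionary in place of the `Client` row. `apply_heartbeat` and `HEALTH_FIELDS` are hypothetical names, and tolerating empty or non-JSON payloads (so plain heartbeats still refresh `last_alive`) is an assumption about the desired behaviour.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Fields named in the enhanced heartbeat payload.
HEALTH_FIELDS = ("current_process", "process_pid", "process_status", "current_event_id")


def apply_heartbeat(
    client: Dict[str, Any],
    raw_payload: bytes,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Update last_alive and copy any known health fields from the payload."""
    # last_alive is always refreshed, even when the payload carries no data.
    client["last_alive"] = now or datetime.now(timezone.utc)
    try:
        data = json.loads(raw_payload.decode("utf-8")) if raw_payload else {}
    except (UnicodeDecodeError, json.JSONDecodeError):
        data = {}
    if not isinstance(data, dict):
        return client
    for name in HEALTH_FIELDS:
        # Only documented fields are copied; unknown keys are ignored.
        if name in data:
            client[name] = data[name]
    return client
```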
@@ -393,3 +406,14 @@ Questions or unclear areas? Tell us if you need: exact devcontainer debugging st
  - Breaking changes must be prefixed with `BREAKING:`
  - Keep ≤ 8–10 bullets; summarize or group micro-changes
  - JSON hygiene: valid JSON, no trailing commas, don’t edit historical entries except typos
+
+ ## Versioning Convention (Tech vs UI)
+
+ - Use one unified app version across technical and user-facing release notes.
+ - `dashboard/public/program-info.json` is user-facing and should list only user-visible changes.
+ - `TECH-CHANGELOG.md` can include deeper technical details for the same released version.
+ - If server/infrastructure work is implemented but not yet released or not user-visible, document it under the latest released section as:
+ - `Backend technical work (post-release notes; no version bump)`
+ - Do not create a new version header in `TECH-CHANGELOG.md` for internal milestones alone.
+ - Bump version numbers when a release is actually cut/deployed (or when user-facing release notes are published), not for intermediate backend-only steps.
+ - When UI integration lands later, include the user-visible part in the next release version and reference prior post-release technical groundwork when useful.