5 Commits

Author SHA1 Message Date
a58e9d3fca feat(listener): migrate dashboard MQTT payload to v2-only grouped schema
- Replace _extract_image_and_timestamp() with v2-only _extract_dashboard_payload_fields()
- Add _classify_dashboard_payload() + parse metrics (v2_success, parse_failures)
- Add soft _validate_v2_required_fields() for warning-only field checks
- Remove legacy fallback after soak confirmed legacy_fallback=0
- Fix: forward msg.payload directly to handle_screenshot() to avoid re-wrap bug
- Add 33 parser tests in listener/test_listener_parser.py
- Add MQTT_PAYLOAD_MIGRATION_GUIDE.md documenting the 10-step migration process
- Update README.md and copilot-instructions.md to reflect v2-only schema
2026-03-30 14:18:34 +00:00
90ccbdf920 fix(dashboard): restore event visibility and fix lint errors in App.tsx
Appointments: no longer hide existing events on holiday dates
Resources: load all overlapping events per group, include inactive/past events, and reload on date/view navigation
App.tsx: replace any types in password input handlers with typed event shapes
2026-03-30 09:51:22 +00:00
24cdf07279 feat(monitoring): add priority screenshot pipeline with screenshot_type + docs cleanup
Implement end-to-end support for typed screenshots and priority rendering in monitoring.

Added
- Accept and forward screenshot_type from MQTT screenshot/dashboard payloads
  (periodic, event_start, event_stop)
- Extend screenshot upload handling to persist typed screenshots and metadata
- Add dedicated priority screenshot serving endpoint with fallback behavior
- Extend monitoring overview with priority screenshot fields and summary count
- Add configurable PRIORITY_SCREENSHOT_TTL_SECONDS window for active priority state

Fixed
- Ensure screenshot cache-busting updates reliably via screenshot hash updates
- Preserve normal periodic screenshot flow while introducing event_start/event_stop priority path

Improved
- Monitoring dashboard now displays screenshot type badges
- Adaptive polling: faster refresh while priority screenshots are active
- Priority screenshot presentation is surfaced immediately to operators

Docs
- Update README and copilot-instructions to match new screenshot_type behavior,
  priority endpoint, TTL config, monitoring fields, and retention model
- Remove redundant/duplicate documentation blocks and improve troubleshooting section clarity
2026-03-29 13:13:13 +00:00
9c330f984f feat(monitoring): complete monitoring pipeline and fix presentation flag persistence
add superadmin monitoring dashboard with protected route, menu entry, and monitoring data client
add monitoring overview API endpoint and improve log serialization/aggregation for dashboard use
extend listener health/log handling with robust status/event/timestamp normalization and screenshot payload extraction
improve screenshot persistence and retrieval (timestamp-aware uploads, latest screenshot endpoint fallback)
fix page_progress and auto_progress persistence/serialization across create, update, and detached occurrence flows
align technical and project docs to reflect implemented monitoring and no-version-bump backend changes
add documentation sync log entry and include minor compose env indentation cleanup
2026-03-24 11:18:33 +00:00
3107d0f671 feat(monitoring): add server-side client logging and health infrastructure
- add Alembic migration c1d2e3f4g5h6 for client monitoring:
  - create client_logs table with FK to clients.uuid and performance indexes
  - extend clients with process/health tracking fields
- extend data model with ClientLog, LogLevel, ProcessStatus, and ScreenHealthStatus
- enhance listener MQTT handling:
  - subscribe to logs and health topics
  - persist client logs from infoscreen/{uuid}/logs/{level}
  - process health payloads and enrich heartbeat-derived client state
- add monitoring API blueprint server/routes/client_logs.py:
  - GET /api/client-logs/<uuid>/logs
  - GET /api/client-logs/summary
  - GET /api/client-logs/recent-errors
  - GET /api/client-logs/test
- register client_logs blueprint in server/wsgi.py
- align compose/dev runtime for listener live-code execution
- add client-side implementation docs:
  - CLIENT_MONITORING_SPECIFICATION.md
  - CLIENT_MONITORING_IMPLEMENTATION_GUIDE.md
- update TECH-CHANGELOG.md and copilot-instructions.md:
  - document monitoring changes
  - codify post-release technical-notes/no-version-bump convention
2026-03-10 07:33:38 +00:00
25 changed files with 5281 additions and 119 deletions

View File

@@ -34,6 +34,7 @@ Keep docs synced with code. When you change services/MQTT/API/UTC/env or dev/pro
- `dashboard/src/settings.tsx` — settings UI (nested tabs; system defaults for presentations and videos)
- `dashboard/src/ressourcen.tsx` — timeline view showing all groups' active events in parallel
- `dashboard/src/ressourcen.css` — timeline and resource view styling
- `dashboard/src/monitoring.tsx` — superadmin-only monitoring dashboard for client health, screenshots, and logs
@@ -50,11 +51,32 @@ Keep docs synced with code. When you change services/MQTT/API/UTC/env or dev/pro
### Screenshot retention
- Screenshots sent via dashboard MQTT are stored in `server/screenshots/`.
- For each client, only the latest and last 20 timestamped screenshots are kept; older files are deleted automatically on each upload.
- Screenshot payloads support `screenshot_type` with values `periodic`, `event_start`, `event_stop`.
- `periodic` is the normal heartbeat/dashboard screenshot path; `event_start` and `event_stop` are high-priority screenshots for monitoring.
- For each client, the API keeps `{uuid}.jpg` as latest and the last 20 timestamped screenshots (`{uuid}_..._{type}.jpg`), deleting older timestamped files automatically.
- For high-priority screenshots, the API additionally maintains `{uuid}_priority.jpg` and metadata in `{uuid}_meta.json` (`latest_screenshot_type`, `last_priority_*`).
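A minimal sketch of this retention rule (hypothetical helper; the real logic lives in the screenshot upload handling):

```python
# Hypothetical sketch of the retention rule above, assuming timestamped
# filenames sort chronologically; the real logic lives in the upload route.
import glob
import os

KEEP_TIMESTAMPED = 20

def prune_screenshots(folder: str, uuid: str) -> None:
    # {uuid}.jpg (latest) does not match the glob and is never pruned here.
    timestamped = sorted(
        f for f in glob.glob(os.path.join(folder, f"{uuid}_*.jpg"))
        if not f.endswith("_priority.jpg")  # never prune the priority image
    )
    for old in timestamped[:-KEEP_TIMESTAMPED]:
        os.remove(old)
```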
## Recent changes since last commit
### Latest (January 2026)
### Latest (March 2026)
- **Monitoring System Completion (no version bump)**:
- End-to-end monitoring pipeline completed: MQTT logs/health → listener persistence → monitoring APIs → superadmin dashboard
- API now serves aggregated monitoring via `GET /api/client-logs/monitoring-overview` and system-wide recent errors via `GET /api/client-logs/recent-errors`
- Monitoring dashboard (`dashboard/src/monitoring.tsx`) is active and displays client health states, screenshots, process metadata, and recent log activity
- **Screenshot Priority Pipeline (no version bump)**:
- Listener forwards `screenshot_type` from MQTT screenshot/dashboard payloads to `POST /api/clients/<uuid>/screenshot`.
- API stores typed screenshots, tracks latest/priority metadata, and serves priority images via `GET /screenshots/<uuid>/priority`.
- Monitoring overview exposes screenshot priority state (`latestScreenshotType`, `priorityScreenshotType`, `priorityScreenshotReceivedAt`, `hasActivePriorityScreenshot`) and `summary.activePriorityScreenshots`.
- Monitoring UI shows screenshot type badges and switches to faster refresh while priority screenshots are active.
- **MQTT Dashboard Payload v2 Cutover (no version bump)**:
- Dashboard payload parsing in `listener/listener.py` is now v2-only (`message`, `content`, `runtime`, `metadata`).
- Legacy top-level dashboard fallback was removed after migration soak (`legacy_fallback=0`).
- Listener observability summarizes parser health using `v2_success` and `parse_failures` counters.
- **Presentation Flags Persistence Fix**:
- Fixed persistence for presentation `page_progress` and `auto_progress` to ensure values are reliably stored and returned across create/update paths and detached occurrences
### Earlier (January 2026)
- **Ressourcen Page (Timeline View)**:
- New 'Ressourcen' page with parallel timeline view showing active events for all room groups
@@ -119,15 +141,17 @@ Keep docs synced with code. When you change services/MQTT/API/UTC/env or dev/pro
## Service boundaries & data flow
- Database connection string is passed as `DB_CONN` (mysql+pymysql) to Python services.
- API builds its engine in `server/database.py` (loads `.env` only in development).
- Scheduler loads `DB_CONN` in `scheduler/db_utils.py`. Recurring events are expanded for the next 7 days, and event exceptions (skipped dates, detached occurrences) are respected. Only recurring events with recurrence_end in the future remain active. The scheduler publishes only events that are active at the current time and clears retained topics (publishes `[]`) for groups without active events. Time comparisons are UTC and naive timestamps are normalized.
- Listener also creates its own engine for writes to `clients`.
- Scheduler queries a future window (default: 7 days) to expand recurring events using RFC 5545 rules, applies event exceptions (skipped dates, detached occurrences), and publishes only events that are active at the current time (UTC). When a group has no active events, the scheduler clears its retained topic by publishing an empty list. Time comparisons are UTC; naive timestamps are normalized. Logging is concise; conversion lookups are cached and logged only once per media item.
- MQTT topics (paho-mqtt v2, use Callback API v2):
- Discovery: `infoscreen/discovery` (JSON includes `uuid`, hw/ip data). ACK to `infoscreen/{uuid}/discovery_ack`. See `listener/listener.py`.
- Heartbeat: `infoscreen/{uuid}/heartbeat` updates `Client.last_alive` (UTC).
- Heartbeat: `infoscreen/{uuid}/heartbeat` updates `Client.last_alive` (UTC); enhanced payload includes `current_process`, `process_pid`, `process_status`, `current_event_id`.
- Event lists (retained): `infoscreen/events/{group_id}` from `scheduler/scheduler.py`.
- Per-client group assignment (retained): `infoscreen/{uuid}/group_id` via `server/mqtt_helper.py`.
- Screenshots: server-side folders `server/received_screenshots/` and `server/screenshots/`; Nginx exposes `/screenshots/{uuid}.jpg` via `server/wsgi.py` route.
- Client logs: `infoscreen/{uuid}/logs/{error|warn|info}` with JSON payload (timestamp, message, context); QoS 1 for ERROR/WARN, QoS 0 for INFO.
- Client health: `infoscreen/{uuid}/health` with metrics (expected_state, actual_state, health_metrics); QoS 0, published every 5 seconds.
- Dashboard screenshots: `infoscreen/{uuid}/dashboard` uses grouped v2 payload blocks (`message`, `content`, `runtime`, `metadata`); listener reads screenshot data from `content.screenshot` and capture type from `metadata.capture.type` (example payload after this list).
- Screenshots: server-side folder `server/screenshots/`; API serves `/screenshots/{uuid}.jpg` (latest) and `/screenshots/{uuid}/priority` (active high-priority fallback to latest).
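A minimal illustrative v2 dashboard payload (the block names and `content.screenshot` / `metadata.capture.type` come from the notes above; the remaining keys are assumptions):

```json
{
  "message": { "type": "dashboard" },
  "content": { "screenshot": "<base64-jpeg>" },
  "runtime": { "uptime_seconds": 3600 },
  "metadata": { "capture": { "type": "periodic" } }
}
```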
- Dev Container guidance: If extensions reappear inside the container, remove UI-only extensions from `devcontainer.json` `extensions` and map them in `remote.extensionKind` as `"ui"`.
@@ -146,6 +170,11 @@ Keep docs synced with code. When you change services/MQTT/API/UTC/env or dev/pro
- `locked_until`: TIMESTAMP placeholder for account lockout (infrastructure in place, not yet enforced)
- `deactivated_at`, `deactivated_by`: Soft-delete audit trail (FK self-reference); soft deactivation is the default, hard delete superadmin-only
- Role hierarchy (privilege escalation enforced): `user` < `editor` < `admin` < `superadmin`
- Client monitoring (migration: `c1d2e3f4g5h6_add_client_monitoring.py`):
- `ClientLog` model: Centralized log storage with fields (id, client_uuid, timestamp, level, message, context, created_at); FK to clients.uuid (CASCADE)
- `Client` model extended: 7 health monitoring fields (`current_event_id`, `current_process`, `process_status`, `process_pid`, `last_screenshot_analyzed`, `screen_health_status`, `last_screenshot_hash`)
- Enums: `LogLevel` (ERROR, WARN, INFO, DEBUG), `ProcessStatus` (running, crashed, starting, stopped), `ScreenHealthStatus` (OK, BLACK, FROZEN, UNKNOWN)
- Indexes: (client_uuid, timestamp DESC), (level, timestamp DESC), (created_at DESC) for performance
- System settings: `system_settings` keyvalue store via `SystemSetting` for global configuration (e.g., WebUntis/Vertretungsplan supplement-table). Managed through routes in `server/routes/system_settings.py`.
- Presentation defaults (system-wide):
- `presentation_interval` (seconds, default "10")
@@ -189,6 +218,12 @@ Keep docs synced with code. When you change services/MQTT/API/UTC/env or dev/pro
- `PUT /api/users/<id>/password` — admin password reset (requires backend check to reject self-reset for consistency)
- `DELETE /api/users/<id>` — hard delete (superadmin only, with self-deletion check)
- Auth routes (`server/routes/auth.py`): Enhanced to track login events (sets `last_login_at`, resets `failed_login_attempts` on success; increments `failed_login_attempts` and `last_failed_login_at` on failure). Self-service password change via `PUT /api/auth/change-password` requires current password verification.
- Client logs (`server/routes/client_logs.py`): Centralized log retrieval for monitoring:
- `GET /api/client-logs/<uuid>/logs` Query client logs with filters (level, limit, since); admin_or_higher
- `GET /api/client-logs/summary` Log counts by level per client (last 24h); admin_or_higher
- `GET /api/client-logs/recent-errors` System-wide error monitoring; admin_or_higher
- `GET /api/client-logs/monitoring-overview` Includes screenshot priority fields per client plus `summary.activePriorityScreenshots`; superadmin_only
- `GET /api/client-logs/test` Infrastructure validation (no auth); returns recent logs with counts
Documentation maintenance: keep this file aligned with real patterns; update when routes/session/UTC rules change. Avoid long prose; link exact paths.
@@ -246,6 +281,13 @@ Keep docs synced with code. When you change services/MQTT/API/UTC/env or dev/pro
- API client in `dashboard/src/apiUsers.ts` for all user operations (listUsers, getUser, createUser, updateUser, resetUserPassword, deleteUser)
- Menu visibility: "Benutzer" menu item only visible to admin+ (role-gated in App.tsx)
- Monitoring page (`dashboard/src/monitoring.tsx`):
- Superadmin-only dashboard for client monitoring and diagnostics; menu item is hidden for lower roles and the route redirects non-superadmins.
- Uses `GET /api/client-logs/monitoring-overview` for aggregated live status, `GET /api/client-logs/recent-errors` for system-wide errors, and `GET /api/client-logs/<uuid>/logs` for per-client details.
- Shows per-client status (`healthy`, `warning`, `critical`, `offline`) based on heartbeat freshness, process state, screen state, and recent log counts.
- Displays latest screenshot preview and active priority screenshot (`/screenshots/{uuid}/priority` when active), screenshot type badges, current process metadata, and recent ERROR/WARN activity.
- Uses adaptive refresh: normal interval in steady state, faster polling while `activePriorityScreenshots > 0`.
- Settings page (`dashboard/src/settings.tsx`):
- Structure: Syncfusion TabComponent with role-gated tabs
- 📅 Academic Calendar (all users)
@@ -323,6 +365,7 @@ Note: Syncfusion usage in the dashboard is already documented above; if a UI for
- VITE_API_URL — Dashboard build-time base URL (prod); in dev the Vite proxy serves `/api` to `server:8000`.
- HEARTBEAT_GRACE_PERIOD_DEV / HEARTBEAT_GRACE_PERIOD_PROD — Groups "alive" window (defaults 180s dev / 170s prod). Clients send heartbeats every ~65s; grace periods allow 2 missed heartbeats plus safety margin.
- REFRESH_SECONDS — Optional scheduler republish interval; `0` disables periodic refresh.
- PRIORITY_SCREENSHOT_TTL_SECONDS — Optional monitoring priority window in seconds (default `120`); controls when event screenshots are considered active priority.
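A sketch of how this TTL can gate the active-priority check (hypothetical helper; the real check lives in the monitoring overview code):

```python
# Hypothetical sketch: a priority screenshot is "active" while younger than the TTL.
import os
from datetime import datetime, timezone

TTL = int(os.environ.get("PRIORITY_SCREENSHOT_TTL_SECONDS", "120"))

def has_active_priority(received_at: datetime) -> bool:
    age = (datetime.now(timezone.utc) - received_at).total_seconds()
    return age < TTL
```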
## Conventions & gotchas
- **Datetime Handling**:
@@ -332,7 +375,6 @@ Note: Syncfusion usage in the dashboard is already documented above; if a UI for
- Frontend **must** append 'Z' before parsing: `const utcStr = dateStr.endsWith('Z') ? dateStr : dateStr + 'Z'; new Date(utcStr);`
- Display in local timezone using `toLocaleTimeString('de-DE', { hour: '2-digit', minute: '2-digit' })`
- When sending to API, use `date.toISOString()` which includes 'Z' and is UTC
- Frontend must append `Z` to API strings before parsing; backend compares in UTC and returns ISO without `Z`.
- **JSON Naming Convention**:
- Backend uses snake_case internally (Python convention)
- API returns camelCase JSON (web standard): `startTime`, `endTime`, `groupId`, etc.
@@ -364,7 +406,8 @@ Docs maintenance guardrails (solo-friendly): Update this file alongside code cha
## Quick examples
- Add client description persists to DB and publishes group via MQTT: see `PUT /api/clients/<uuid>/description` in `routes/clients.py`.
- Bulk group assignment emits retained messages for each client: `PUT /api/clients/group`.
- Listener heartbeat path: `infoscreen/<uuid>/heartbeat` → sets `clients.last_alive`.
- Listener heartbeat path: `infoscreen/<uuid>/heartbeat` → sets `clients.last_alive` and captures process health data.
- Client monitoring flow: Client publishes to `infoscreen/{uuid}/logs/error` and `infoscreen/{uuid}/health` → listener stores/updates monitoring state → API serves `/api/client-logs/monitoring-overview`, `/api/client-logs/recent-errors`, and `/api/client-logs/<uuid>/logs` → superadmin monitoring dashboard displays live status.
## Scheduler payloads: presentation extras
- Presentation event payloads now include `page_progress` and `auto_progress` in addition to `slide_interval` and media files. These are sourced from per-event fields in the database (with system defaults applied on event creation).
@@ -393,3 +436,14 @@ Questions or unclear areas? Tell us if you need: exact devcontainer debugging st
- Breaking changes must be prefixed with `BREAKING:`
- Keep ≤ 8–10 bullets; summarize or group micro-changes
- JSON hygiene: valid JSON, no trailing commas, don't edit historical entries except typos
## Versioning Convention (Tech vs UI)
- Use one unified app version across technical and user-facing release notes.
- `dashboard/public/program-info.json` is user-facing and should list only user-visible changes.
- `TECH-CHANGELOG.md` can include deeper technical details for the same released version.
- If server/infrastructure work is implemented but not yet released or not user-visible, document it under the latest released section as:
- `Backend technical work (post-release notes; no version bump)`
- Do not create a new version header in `TECH-CHANGELOG.md` for internal milestones alone.
- Bump version numbers when a release is actually cut/deployed (or when user-facing release notes are published), not for intermediate backend-only steps.
- When UI integration lands later, include the user-visible part in the next release version and reference prior post-release technical groundwork when useful.

View File

@@ -98,3 +98,6 @@ exit 0 # warn only; do not block commit
- MQTT workers: `listener/listener.py`, `scheduler/scheduler.py`, `server/mqtt_helper.py`
- Frontend: `dashboard/vite.config.ts`, `dashboard/package.json`, `dashboard/src/*`
- Dev/Prod docs: `deployment.md`, `.env.example`
## Documentation sync log
- 2026-03-24: Synced docs for completed monitoring rollout and presentation flag persistence fix (`page_progress` / `auto_progress`). Updated `.github/copilot-instructions.md`, `README.md`, `TECH-CHANGELOG.md`, `DEV-CHANGELOG.md`, and `CLIENT_MONITORING_IMPLEMENTATION_GUIDE.md` without a user-version bump.

View File

@@ -0,0 +1,757 @@
# 🚀 Client Monitoring Implementation Guide
**Phase-based implementation guide for basic monitoring during the development phase**
---
## ✅ Phase 1: Server-Side Database Foundation
**Status:** ✅ COMPLETE
**Dependencies:** None - Already implemented
**Time estimate:** Completed
### ✅ Step 1.1: Database Migration
**File:** `server/alembic/versions/c1d2e3f4g5h6_add_client_monitoring.py`
**What it does:**
- Creates `client_logs` table for centralized logging
- Adds health monitoring columns to `clients` table
- Creates indexes for efficient querying
**To apply:**
```bash
cd /workspace/server
alembic upgrade head
```
### ✅ Step 1.2: Update Data Models
**File:** `models/models.py`
**What was added:**
- New enums: `LogLevel`, `ProcessStatus`, `ScreenHealthStatus`
- Updated `Client` model with health tracking fields
- New `ClientLog` model for log storage
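A condensed sketch of these additions (column types abridged and the project's existing declarative `Base` assumed; `models/models.py` is authoritative):

```python
# Abridged sketch; types/options simplified, existing declarative Base assumed.
import enum
from sqlalchemy import Column, DateTime, Enum, ForeignKey, Integer, String, Text

class LogLevel(enum.Enum):
    ERROR = "ERROR"
    WARN = "WARN"
    INFO = "INFO"
    DEBUG = "DEBUG"

class ProcessStatus(enum.Enum):
    running = "running"
    crashed = "crashed"
    starting = "starting"
    stopped = "stopped"

class ScreenHealthStatus(enum.Enum):
    OK = "OK"
    BLACK = "BLACK"
    FROZEN = "FROZEN"
    UNKNOWN = "UNKNOWN"

class ClientLog(Base):
    __tablename__ = "client_logs"
    id = Column(Integer, primary_key=True)
    client_uuid = Column(String(36), ForeignKey("clients.uuid", ondelete="CASCADE"))
    timestamp = Column(DateTime)
    level = Column(Enum(LogLevel))
    message = Column(Text)
    context = Column(Text)  # JSON-encoded structured context
    created_at = Column(DateTime)
```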
---
## 🔧 Phase 2: Server-Side Backend Logic
**Status:** ✅ COMPLETE
**Dependencies:** Phase 1 complete
**Time estimate:** 2-3 hours
### Step 2.1: Extend MQTT Listener
**File:** `listener/listener.py`
**What to add:**
```python
# Add new topic subscriptions in on_connect():
client.subscribe("infoscreen/+/logs/error")
client.subscribe("infoscreen/+/logs/warn")
client.subscribe("infoscreen/+/logs/info")  # Dev mode only
client.subscribe("infoscreen/+/health")

# Add new handler in on_message():
def handle_log_message(uuid, level, payload):
    """Store client log in database"""
    from models.models import ClientLog, LogLevel
    from server.database import Session
    from datetime import datetime, timezone
    import json

    session = Session()
    try:
        # Payload timestamps arrive as ISO 8601 strings; parse them for the DateTime column
        ts = payload.get('timestamp')
        if isinstance(ts, str):
            ts = datetime.fromisoformat(ts.replace('Z', '+00:00'))
        else:
            ts = datetime.now(timezone.utc)
        log_entry = ClientLog(
            client_uuid=uuid,
            timestamp=ts,
            level=LogLevel[level],
            message=payload.get('message', ''),
            context=json.dumps(payload.get('context', {}))
        )
        session.add(log_entry)
        session.commit()
        print(f"[LOG] {uuid} {level}: {payload.get('message', '')}")
    except Exception as e:
        print(f"Error saving log: {e}")
        session.rollback()
    finally:
        session.close()

def handle_health_message(uuid, payload):
    """Update client health status"""
    from models.models import Client, ProcessStatus
    from server.database import Session

    session = Session()
    try:
        client = session.query(Client).filter_by(uuid=uuid).first()
        if client:
            client.current_event_id = payload.get('expected_state', {}).get('event_id')
            client.current_process = payload.get('actual_state', {}).get('process')
            status_str = payload.get('actual_state', {}).get('status')
            if status_str:
                client.process_status = ProcessStatus[status_str]
            client.process_pid = payload.get('actual_state', {}).get('pid')
            session.commit()
    except Exception as e:
        print(f"Error updating health: {e}")
        session.rollback()
    finally:
        session.close()
```
**Topic routing logic:**
```python
# In on_message callback, add routing:
if topic.endswith('/logs/error'):
    handle_log_message(uuid, 'ERROR', payload)
elif topic.endswith('/logs/warn'):
    handle_log_message(uuid, 'WARN', payload)
elif topic.endswith('/logs/info'):
    handle_log_message(uuid, 'INFO', payload)
elif topic.endswith('/health'):
    handle_health_message(uuid, payload)
```
### Step 2.2: Create API Routes
**File:** `server/routes/client_logs.py` (NEW)
```python
from flask import Blueprint, jsonify, request
from server.database import Session
from server.permissions import admin_or_higher
from models.models import ClientLog, Client
from sqlalchemy import desc
import json

client_logs_bp = Blueprint("client_logs", __name__, url_prefix="/api/client-logs")

@client_logs_bp.route("/<uuid>/logs", methods=["GET"])
@admin_or_higher
def get_client_logs(uuid):
    """
    Get logs for a specific client.

    Query params:
      - level: ERROR, WARN, INFO, DEBUG (optional)
      - limit: number of entries (default 50, max 500)
      - since: ISO timestamp (optional)
    """
    session = Session()
    try:
        level = request.args.get('level')
        limit = min(int(request.args.get('limit', 50)), 500)
        since = request.args.get('since')
        query = session.query(ClientLog).filter_by(client_uuid=uuid)
        if level:
            from models.models import LogLevel
            query = query.filter_by(level=LogLevel[level])
        if since:
            from datetime import datetime
            since_dt = datetime.fromisoformat(since.replace('Z', '+00:00'))
            query = query.filter(ClientLog.timestamp >= since_dt)
        logs = query.order_by(desc(ClientLog.timestamp)).limit(limit).all()
        result = []
        for log in logs:
            result.append({
                "id": log.id,
                "timestamp": log.timestamp.isoformat() if log.timestamp else None,
                "level": log.level.value if log.level else None,
                "message": log.message,
                "context": json.loads(log.context) if log.context else {}
            })
        session.close()
        return jsonify({"logs": result, "count": len(result)})
    except Exception as e:
        session.close()
        return jsonify({"error": str(e)}), 500

@client_logs_bp.route("/summary", methods=["GET"])
@admin_or_higher
def get_logs_summary():
    """Get summary of errors/warnings across all clients (last 24 hours)"""
    session = Session()
    try:
        from sqlalchemy import func
        from datetime import datetime, timedelta

        since = datetime.utcnow() - timedelta(hours=24)
        stats = session.query(
            ClientLog.client_uuid,
            ClientLog.level,
            func.count(ClientLog.id).label('count')
        ).filter(
            ClientLog.timestamp >= since
        ).group_by(
            ClientLog.client_uuid,
            ClientLog.level
        ).all()
        result = {}
        for stat in stats:
            uuid = stat.client_uuid
            if uuid not in result:
                result[uuid] = {"ERROR": 0, "WARN": 0, "INFO": 0}
            result[uuid][stat.level.value] = stat.count
        session.close()
        return jsonify({"summary": result, "period_hours": 24})
    except Exception as e:
        session.close()
        return jsonify({"error": str(e)}), 500
```
**Register in `server/wsgi.py`:**
```python
from server.routes.client_logs import client_logs_bp
app.register_blueprint(client_logs_bp)
```
### Step 2.3: Add Health Data to Heartbeat Handler
**File:** `listener/listener.py` (extend existing heartbeat handler)
```python
# Modify existing heartbeat handler to capture health data
# (json, datetime/timezone, Session, Client, ProcessStatus and
# extract_uuid_from_topic are assumed to already exist in the listener)
def on_message(client, userdata, message):
    topic = message.topic
    # Existing heartbeat logic...
    if '/heartbeat' in topic:
        uuid = extract_uuid_from_topic(topic)
        try:
            payload = json.loads(message.payload.decode())
            session = Session()
            client_obj = session.query(Client).filter_by(uuid=uuid).first()
            if client_obj:
                # Update last_alive (existing)
                client_obj.last_alive = datetime.now(timezone.utc)
                # NEW: Update health data if present in heartbeat
                if 'process_status' in payload:
                    client_obj.process_status = ProcessStatus[payload['process_status']]
                if 'current_process' in payload:
                    client_obj.current_process = payload['current_process']
                if 'process_pid' in payload:
                    client_obj.process_pid = payload['process_pid']
                if 'current_event_id' in payload:
                    client_obj.current_event_id = payload['current_event_id']
            session.commit()
            session.close()
        except Exception as e:
            print(f"Error processing heartbeat: {e}")
```
---
## 🖥️ Phase 3: Client-Side Implementation
**Status:** ✅ COMPLETE
**Dependencies:** Phase 2 complete
**Time estimate:** 3-4 hours
### Step 3.1: Create Client Watchdog Script
**File:** `client/watchdog.py` (NEW - on client device)
```python
#!/usr/bin/env python3
"""
Client-side process watchdog
Monitors VLC, Chromium, PDF viewer and reports health to server
"""
import psutil
import paho.mqtt.client as mqtt
import json
import time
from datetime import datetime, timezone
import sys
import os


class MediaWatchdog:
    def __init__(self, client_uuid, mqtt_broker, mqtt_port=1883):
        self.uuid = client_uuid
        # paho-mqtt v2 requires an explicit callback API version (repo convention: v2)
        self.mqtt_client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
        self.mqtt_client.connect(mqtt_broker, mqtt_port, 60)
        self.mqtt_client.loop_start()
        self.current_process = None
        self.current_event_id = None
        self.restart_attempts = 0
        self.MAX_RESTARTS = 3

    def send_log(self, level, message, context=None):
        """Send log message to server via MQTT"""
        topic = f"infoscreen/{self.uuid}/logs/{level.lower()}"
        payload = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "message": message,
            "context": context or {}
        }
        self.mqtt_client.publish(topic, json.dumps(payload), qos=1)
        print(f"[{level}] {message}")

    def send_health(self, process_name, pid, status, event_id=None):
        """Send health status to server"""
        topic = f"infoscreen/{self.uuid}/health"
        payload = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "expected_state": {
                "event_id": event_id
            },
            "actual_state": {
                "process": process_name,
                "pid": pid,
                "status": status  # 'running', 'crashed', 'starting', 'stopped'
            }
        }
        self.mqtt_client.publish(topic, json.dumps(payload), qos=1, retain=False)

    def is_process_running(self, process_name):
        """Check if a process is running; return its PID or None"""
        for proc in psutil.process_iter(['name', 'pid']):
            try:
                if process_name.lower() in proc.info['name'].lower():
                    return proc.info['pid']
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                pass
        return None

    def monitor_loop(self):
        """Main monitoring loop"""
        print(f"Watchdog started for client {self.uuid}")
        self.send_log("INFO", "Watchdog service started", {"uuid": self.uuid})
        while True:
            try:
                # Check expected process (would be set by main event handler)
                if self.current_process:
                    pid = self.is_process_running(self.current_process)
                    if pid:
                        # Process is running
                        self.send_health(
                            self.current_process,
                            pid,
                            "running",
                            self.current_event_id
                        )
                        self.restart_attempts = 0  # Reset on success
                    else:
                        # Process crashed
                        self.send_log(
                            "ERROR",
                            f"Process {self.current_process} crashed or stopped",
                            {
                                "event_id": self.current_event_id,
                                "process": self.current_process,
                                "restart_attempt": self.restart_attempts
                            }
                        )
                        if self.restart_attempts < self.MAX_RESTARTS:
                            self.send_log("WARN", f"Attempting restart ({self.restart_attempts + 1}/{self.MAX_RESTARTS})")
                            self.restart_attempts += 1
                            # TODO: Implement restart logic (call event handler)
                        else:
                            self.send_log("ERROR", "Max restart attempts exceeded", {
                                "event_id": self.current_event_id
                            })
                time.sleep(5)  # Check every 5 seconds
            except KeyboardInterrupt:
                print("Watchdog stopped by user")
                break
            except Exception as e:
                self.send_log("ERROR", f"Watchdog error: {str(e)}", {
                    "exception": str(e),
                    "traceback": str(sys.exc_info())
                })
                time.sleep(10)  # Wait longer on error


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Usage: python watchdog.py <client_uuid> <mqtt_broker>")
        sys.exit(1)
    uuid = sys.argv[1]
    broker = sys.argv[2]
    watchdog = MediaWatchdog(uuid, broker)
    watchdog.monitor_loop()
```
### Step 3.2: Integrate with Existing Event Handler
**File:** `client/event_handler.py` (modify existing)
```python
# When starting a new event, notify watchdog
def play_event(event_data):
    event_type = event_data.get('event_type')
    event_id = event_data.get('id')
    if event_type == 'video':
        process_name = 'vlc'
        # Start VLC...
    elif event_type == 'website':
        process_name = 'chromium'
        # Start Chromium...
    elif event_type == 'presentation':
        process_name = 'pdf_viewer'  # or your PDF tool
        # Start PDF viewer...
    # Notify watchdog about expected process
    watchdog.current_process = process_name
    watchdog.current_event_id = event_id
    watchdog.restart_attempts = 0
```
### Step 3.3: Enhanced Heartbeat Payload
**File:** `client/heartbeat.py` (modify existing)
```python
# Modify existing heartbeat to include process status
def send_heartbeat(mqtt_client, uuid):
    # Get current process status
    current_process = None
    process_pid = None
    process_status = "stopped"
    # Check if expected process is running
    if watchdog.current_process:
        pid = watchdog.is_process_running(watchdog.current_process)
        if pid:
            current_process = watchdog.current_process
            process_pid = pid
            process_status = "running"
    payload = {
        "uuid": uuid,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Existing fields...
        # NEW health fields:
        "current_process": current_process,
        "process_pid": process_pid,
        "process_status": process_status,
        "current_event_id": watchdog.current_event_id
    }
    mqtt_client.publish(f"infoscreen/{uuid}/heartbeat", json.dumps(payload))
```
---
## 🎨 Phase 4: Dashboard UI Integration
**Status:** ✅ COMPLETE
**Dependencies:** Phases 2 & 3 complete
**Time estimate:** 2-3 hours
### Step 4.1: Create Log Viewer Component
**File:** `dashboard/src/ClientLogs.tsx` (NEW)
```typescript
import React from 'react';
import { GridComponent, ColumnsDirective, ColumnDirective, Page, Inject } from '@syncfusion/ej2-react-grids';

interface LogEntry {
  id: number;
  timestamp: string;
  level: 'ERROR' | 'WARN' | 'INFO' | 'DEBUG';
  message: string;
  context: any;
}

interface ClientLogsProps {
  clientUuid: string;
}

export const ClientLogs: React.FC<ClientLogsProps> = ({ clientUuid }) => {
  const [logs, setLogs] = React.useState<LogEntry[]>([]);
  const [loading, setLoading] = React.useState(false);

  const loadLogs = async (level?: string) => {
    setLoading(true);
    try {
      const params = new URLSearchParams({ limit: '50' });
      if (level) params.append('level', level);
      const response = await fetch(`/api/client-logs/${clientUuid}/logs?${params}`);
      const data = await response.json();
      setLogs(data.logs);
    } catch (err) {
      console.error('Failed to load logs:', err);
    } finally {
      setLoading(false);
    }
  };

  React.useEffect(() => {
    loadLogs();
    const interval = setInterval(() => loadLogs(), 30000); // Refresh every 30s
    return () => clearInterval(interval);
  }, [clientUuid]);

  const levelTemplate = (props: any) => {
    const colors = {
      ERROR: 'text-red-600 bg-red-100',
      WARN: 'text-yellow-600 bg-yellow-100',
      INFO: 'text-blue-600 bg-blue-100',
      DEBUG: 'text-gray-600 bg-gray-100'
    };
    return (
      <span className={`px-2 py-1 rounded ${colors[props.level as keyof typeof colors]}`}>
        {props.level}
      </span>
    );
  };

  return (
    <div>
      <div className="mb-4 flex gap-2">
        <button onClick={() => loadLogs()} className="e-btn e-primary">All</button>
        <button onClick={() => loadLogs('ERROR')} className="e-btn e-danger">Errors</button>
        <button onClick={() => loadLogs('WARN')} className="e-btn e-warning">Warnings</button>
        <button onClick={() => loadLogs('INFO')} className="e-btn e-info">Info</button>
      </div>
      <GridComponent
        dataSource={logs}
        allowPaging={true}
        pageSettings={{ pageSize: 20 }}
      >
        <ColumnsDirective>
          <ColumnDirective field='timestamp' headerText='Time' width='180' format='yMd HH:mm:ss' />
          <ColumnDirective field='level' headerText='Level' width='100' template={levelTemplate} />
          <ColumnDirective field='message' headerText='Message' width='400' />
        </ColumnsDirective>
        <Inject services={[Page]} />
      </GridComponent>
    </div>
  );
};
```
### Step 4.2: Add Health Indicators to Client Cards
**File:** `dashboard/src/clients.tsx` (modify existing)
```typescript
// Add health indicator to client card
const getHealthBadge = (client: Client) => {
  if (!client.process_status) {
    return <span className="badge badge-secondary">Unknown</span>;
  }
  const badges = {
    running: <span className="badge badge-success">Running</span>,
    crashed: <span className="badge badge-danger">Crashed</span>,
    starting: <span className="badge badge-warning">Starting</span>,
    stopped: <span className="badge badge-secondary">Stopped</span>
  };
  return badges[client.process_status] || null;
};

// In client card render:
<div className="client-card">
  <h3>{client.hostname || client.uuid}</h3>
  <div>Status: {getHealthBadge(client)}</div>
  <div>Process: {client.current_process || 'None'}</div>
  <div>Event ID: {client.current_event_id || 'None'}</div>
  <button onClick={() => showLogs(client.uuid)}>View Logs</button>
</div>
```
### Step 4.3: Add System Health Dashboard (Superadmin)
**File:** `dashboard/src/SystemMonitor.tsx` (NEW)
```typescript
import React from 'react';
import { ClientLogs } from './ClientLogs';

export const SystemMonitor: React.FC = () => {
  const [summary, setSummary] = React.useState<any>({});

  const loadSummary = async () => {
    const response = await fetch('/api/client-logs/summary');
    const data = await response.json();
    setSummary(data.summary);
  };

  React.useEffect(() => {
    loadSummary();
    const interval = setInterval(loadSummary, 30000);
    return () => clearInterval(interval);
  }, []);

  return (
    <div className="system-monitor">
      <h2>System Health Monitor (Superadmin)</h2>
      <div className="alert-panel">
        <h3>Active Issues</h3>
        {Object.entries(summary).map(([uuid, stats]: [string, any]) => (
          stats.ERROR > 0 || stats.WARN > 5 ? (
            <div key={uuid} className="alert">
              🔴 {uuid}: {stats.ERROR} errors, {stats.WARN} warnings (24h)
            </div>
          ) : null
        ))}
      </div>
      {/* Real-time log stream */}
      <div className="log-stream">
        <h3>Recent Logs (All Clients)</h3>
        {/* Implement real-time log aggregation */}
      </div>
    </div>
  );
};
```
---
## 🧪 Phase 5: Testing & Validation
**Status:** ✅ COMPLETE
**Dependencies:** All previous phases
**Time estimate:** 1-2 hours
### Step 5.1: Server-Side Tests
```bash
# Test database migration
cd /workspace/server
alembic upgrade head
alembic downgrade -1
alembic upgrade head
# Test API endpoints
curl -X GET "http://localhost:8000/api/client-logs/<uuid>/logs?limit=10"
curl -X GET "http://localhost:8000/api/client-logs/summary"
```
### Step 5.2: Client-Side Tests
```bash
# On client device
python3 watchdog.py <your-uuid> <mqtt-broker-ip>
# Simulate process crash
pkill vlc # Should trigger error log and restart attempt
# Check MQTT messages
mosquitto_sub -h <broker> -t "infoscreen/+/logs/#" -v
mosquitto_sub -h <broker> -t "infoscreen/+/health" -v
```
### Step 5.3: Dashboard Tests
1. Open dashboard and navigate to Clients page
2. Verify health indicators show correct status
3. Click "View Logs" and verify logs appear
4. Navigate to System Monitor (superadmin)
5. Verify summary statistics are correct
---
## 📝 Configuration Summary
### Environment Variables
**Server (docker-compose.yml):**
```yaml
- LOG_RETENTION_DAYS=90 # How long to keep logs
- DEBUG_MODE=true # Enable INFO level logging via MQTT
```
**Client:**
```bash
export MQTT_BROKER="your-server-ip"
export CLIENT_UUID="abc-123-def"
export WATCHDOG_ENABLED=true
```
### MQTT Topics Reference
| Topic Pattern | Direction | Purpose |
|--------------|-----------|---------|
| `infoscreen/{uuid}/logs/error` | Client → Server | Error messages |
| `infoscreen/{uuid}/logs/warn` | Client → Server | Warning messages |
| `infoscreen/{uuid}/logs/info` | Client → Server | Info (dev only) |
| `infoscreen/{uuid}/health` | Client → Server | Health metrics |
| `infoscreen/{uuid}/heartbeat` | Client → Server | Enhanced heartbeat |
### Database Tables
**client_logs:**
- Stores all centralized logs
- Indexed by client_uuid, timestamp, level
- Auto-cleanup after 90 days (recommended)
**clients (extended):**
- `current_event_id`: Which event should be playing
- `current_process`: Expected process name
- `process_status`: running/crashed/starting/stopped
- `process_pid`: Process ID
- `screen_health_status`: OK/BLACK/FROZEN/UNKNOWN
- `last_screenshot_analyzed`: Last analysis time
- `last_screenshot_hash`: For frozen detection
---
## 🎯 Next Steps After Implementation
1. **Deploy Phase 1-2** to staging environment
2. **Test with 1-2 pilot clients** before full rollout
3. **Monitor traffic & performance** (should be minimal)
4. **Fine-tune log levels** based on actual noise
5. **Add alerting** (email/Slack when errors > threshold)
6. **Implement screenshot analysis** (Phase 2 enhancement)
7. **Add trending/analytics** (which clients are least reliable)
---
## 🚨 Troubleshooting
**Logs not appearing in database:**
- Check MQTT broker logs: `docker logs infoscreen-mqtt`
- Verify listener subscriptions: Check `listener/listener.py` logs
- Test MQTT manually: `mosquitto_pub -h broker -t "infoscreen/test/logs/error" -m '{"message":"test"}'`
**High database growth:**
- Check log_retention cleanup cronjob
- Reduce INFO level logging frequency
- Add sampling (log every 10th occurrence instead of all)
**Client watchdog not detecting crashes:**
- Verify psutil can see processes: `ps aux | grep vlc`
- Check permissions (may need sudo for some process checks)
- Increase monitor loop frequency for faster detection
---
## ✅ Completion Checklist
- [x] Phase 1: Database migration applied
- [x] Phase 2: Listener extended for log topics
- [x] Phase 2: API endpoints created and tested
- [x] Phase 3: Client watchdog implemented
- [x] Phase 3: Enhanced heartbeat deployed
- [x] Phase 4: Dashboard log viewer working
- [x] Phase 4: Health indicators visible
- [x] Phase 5: End-to-end testing complete
- [x] Documentation updated with new features
- [x] Production deployment plan created
---
**Last Updated:** 2026-03-24
**Author:** GitHub Copilot
**For:** Infoscreen 2025 Project

View File

@@ -0,0 +1,979 @@
# Client-Side Monitoring Specification
**Version:** 1.0
**Date:** 2026-03-10
**For:** Infoscreen Client Implementation
**Server Endpoint:** `192.168.43.201:8000` (or your production server)
**MQTT Broker:** `192.168.43.201:1883` (or your production MQTT broker)
---
## 1. Overview
Each infoscreen client must implement health monitoring and logging capabilities to report status to the central server via MQTT.
### 1.1 Goals
- **Detect failures:** Process crashes, frozen screens, content mismatches
- **Provide visibility:** Real-time health status visible on server dashboard
- **Enable remote diagnosis:** Centralized log storage for debugging
- **Auto-recovery:** Attempt automatic restart on failure
### 1.2 Architecture
```
┌─────────────────────────────────────────┐
│ Infoscreen Client │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Media Player │ │ Watchdog │ │
│ │ (VLC/Chrome) │◄───│ Monitor │ │
│ └──────────────┘ └──────┬───────┘ │
│ │ │
│ ┌──────────────┐ │ │
│ │ Event Mgr │ │ │
│ │ (receives │ │ │
│ │ schedule) │◄───────────┘ │
│ └──────┬───────┘ │
│ │ │
│ ┌──────▼───────────────────────┐ │
│ │ MQTT Client │ │
│ │ - Heartbeat (every 60s) │ │
│ │ - Logs (error/warn/info) │ │
│ │ - Health metrics (every 5s) │ │
│ └──────┬────────────────────────┘ │
└─────────┼──────────────────────────────┘
│ MQTT over TCP
┌─────────────┐
│ MQTT Broker │
│ (server) │
└─────────────┘
```
### 1.3 Current Compatibility Notes
- The server now accepts both the original specification payloads and the currently implemented Phase 3 client payloads.
- `infoscreen/{uuid}/health` may currently contain a reduced payload with only `expected_state.event_id` and `actual_state.process|pid|status`. Additional `health_metrics` fields from this specification remain recommended.
- `event_id` is still specified as an integer. For compatibility with the current Phase 3 client, the server also tolerates string values such as `event_123` and extracts the numeric suffix where possible (see the sketch after this list).
- If the client sends `process_health` inside `infoscreen/{uuid}/dashboard`, the server treats it as a fallback source for `current_process`, `process_pid`, `process_status`, and `current_event_id`.
- Long term, the preferred client payload remains the structure in this specification so the server can surface richer monitoring data such as screen state and resource metrics.
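For the `event_123` tolerance mentioned above, the server-side normalization can be sketched as follows (hypothetical helper name):

```python
import re

def normalize_event_id(value):
    """Accept an int (42) or a string like 'event_123'; return int or None."""
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        match = re.search(r"(\d+)$", value)
        if match:
            return int(match.group(1))
    return None
```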
---
## 2. MQTT Protocol Specification
### 2.1 Connection Parameters
```
Broker: 192.168.43.201 (or DNS hostname)
Port: 1883 (standard MQTT)
Protocol: MQTT v3.1.1
Client ID: "infoscreen-{client_uuid}"
Clean Session: false (retain subscriptions)
Keep Alive: 60 seconds
Username/Password: (if configured on broker)
```
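A connection sketch matching these parameters with paho-mqtt v2 (the repo's convention is Callback API v2):

```python
import paho.mqtt.client as mqtt

client = mqtt.Client(
    mqtt.CallbackAPIVersion.VERSION2,
    client_id="infoscreen-9b8d1856-ff34-4864-a726-12de072d0f77",
    clean_session=False,  # keep subscriptions across reconnects (MQTT v3.1.1)
)
client.connect("192.168.43.201", 1883, keepalive=60)
client.loop_start()
```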
### 2.2 QoS Levels
- **Heartbeat:** QoS 0 (fire and forget, high frequency)
- **Logs (ERROR/WARN):** QoS 1 (at least once delivery, important)
- **Logs (INFO):** QoS 0 (optional, high volume)
- **Health metrics:** QoS 0 (frequent, latest value matters)
---
## 3. Topic Structure & Payload Formats
### 3.1 Log Messages
#### Topic Pattern:
```
infoscreen/{client_uuid}/logs/{level}
```
Where `{level}` is one of: `error`, `warn`, `info`
#### Payload Format (JSON):
```json
{
  "timestamp": "2026-03-10T07:30:00Z",
  "message": "Human-readable error description",
  "context": {
    "event_id": 42,
    "process": "vlc",
    "error_code": "NETWORK_TIMEOUT",
    "additional_key": "any relevant data"
  }
}
```
#### Field Specifications:
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `timestamp` | string (ISO 8601 UTC) | Yes | When the event occurred. Use `YYYY-MM-DDTHH:MM:SSZ` format |
| `message` | string | Yes | Human-readable description of the event (max 1000 chars) |
| `context` | object | No | Additional structured data (will be stored as JSON) |
#### Example Topics:
```
infoscreen/9b8d1856-ff34-4864-a726-12de072d0f77/logs/error
infoscreen/9b8d1856-ff34-4864-a726-12de072d0f77/logs/warn
infoscreen/9b8d1856-ff34-4864-a726-12de072d0f77/logs/info
```
#### When to Send Logs:
**ERROR (Always send):**
- Process crashed (VLC/Chromium/PDF viewer terminated unexpectedly)
- Content failed to load (404, network timeout, corrupt file)
- Hardware failure detected (display off, audio device missing)
- Exception caught in main event loop
- Maximum restart attempts exceeded
**WARN (Always send):**
- Process restarted automatically (after crash)
- High resource usage (CPU >80%, RAM >90%)
- Slow performance (frame drops, lag)
- Non-critical failures (screenshot capture failed, cache full)
- Fallback content displayed (primary source unavailable)
**INFO (Send in development, optional in production):**
- Process started successfully
- Event transition (switched from video to presentation)
- Content loaded successfully
- Watchdog service started/stopped
---
### 3.2 Health Metrics
#### Topic Pattern:
```
infoscreen/{client_uuid}/health
```
#### Payload Format (JSON):
```json
{
  "timestamp": "2026-03-10T07:30:00Z",
  "expected_state": {
    "event_id": 42,
    "event_type": "video",
    "media_file": "presentation.mp4",
    "started_at": "2026-03-10T07:15:00Z"
  },
  "actual_state": {
    "process": "vlc",
    "pid": 1234,
    "status": "running",
    "uptime_seconds": 900,
    "position": 45.3,
    "duration": 180.0
  },
  "health_metrics": {
    "screen_on": true,
    "last_frame_update": "2026-03-10T07:29:58Z",
    "frames_dropped": 2,
    "network_errors": 0,
    "cpu_percent": 15.3,
    "memory_mb": 234
  }
}
```
#### Field Specifications:
**expected_state:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `event_id` | integer | Yes | Current event ID from scheduler |
| `event_type` | string | Yes | `presentation`, `video`, `website`, `webuntis`, `message` |
| `media_file` | string | No | Filename or URL of current content |
| `started_at` | string (ISO 8601) | Yes | When this event started playing |
**actual_state:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `process` | string | Yes | `vlc`, `chromium`, `pdf_viewer`, `none` |
| `pid` | integer | No | Process ID (if running) |
| `status` | string | Yes | `running`, `crashed`, `starting`, `stopped` |
| `uptime_seconds` | integer | No | How long process has been running |
| `position` | float | No | Current playback position (seconds, for video/audio) |
| `duration` | float | No | Total content duration (seconds) |
**health_metrics:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `screen_on` | boolean | Yes | Is display powered on? |
| `last_frame_update` | string (ISO 8601) | No | Last time screen content changed |
| `frames_dropped` | integer | No | Video frames dropped (performance indicator) |
| `network_errors` | integer | No | Count of network errors in last interval |
| `cpu_percent` | float | No | CPU usage (0-100) |
| `memory_mb` | integer | No | RAM usage in megabytes |
#### Sending Frequency:
- **Normal operation:** Every 5 seconds
- **During startup/transition:** Every 1 second
- **After error:** Immediately + every 2 seconds until recovered
---
### 3.3 Enhanced Heartbeat
The existing heartbeat topic should be enhanced to include process status.
#### Topic Pattern:
```
infoscreen/{client_uuid}/heartbeat
```
#### Enhanced Payload Format (JSON):
```json
{
  "uuid": "9b8d1856-ff34-4864-a726-12de072d0f77",
  "timestamp": "2026-03-10T07:30:00Z",
  "current_process": "vlc",
  "process_pid": 1234,
  "process_status": "running",
  "current_event_id": 42
}
```
#### New Fields (add to existing heartbeat):
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `current_process` | string | No | Name of active media player process |
| `process_pid` | integer | No | Process ID |
| `process_status` | string | No | `running`, `crashed`, `starting`, `stopped` |
| `current_event_id` | integer | No | Event ID currently being displayed |
#### Sending Frequency:
- Keep existing: **Every 60 seconds**
- Include new fields if available
---
## 4. Process Monitoring Requirements
### 4.1 Processes to Monitor
| Media Type | Process Name | How to Detect |
|------------|--------------|---------------|
| Video | `vlc` | `ps aux \| grep vlc` or `pgrep vlc` |
| Website/WebUntis | `chromium` or `chromium-browser` | `pgrep chromium` |
| PDF Presentation | `evince`, `okular`, or custom viewer | `pgrep {viewer_name}` |
### 4.2 Monitoring Checks (Every 5 seconds)
#### Check 1: Process Alive
```
Goal: Verify expected process is running
Method:
- Get list of running processes (psutil or `ps`)
- Check if expected process name exists
- Match PID if known
Result:
- If missing → status = "crashed"
- If found → status = "running"
Action on crash:
- Send ERROR log immediately
- Attempt restart (max 3 attempts)
- Send WARN log on each restart
- If max restarts exceeded → send ERROR log, display fallback
```
#### Check 2: Process Responsive
```
Goal: Detect frozen processes
Method:
- For VLC: Query HTTP interface (status.json)
- For Chromium: Use DevTools Protocol (CDP)
- For custom viewers: Check last screen update time
Result:
- If same frame >30 seconds → likely frozen
- If playback position not advancing → frozen
Action on freeze:
- Send WARN log
- Force refresh (reload page, seek video, next slide)
- If refresh fails → restart process
```
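A sketch of the VLC branch of this check, polled from the 5-second monitor loop (the single-sample stall test is a simplification of the ">30 seconds" rule above):

```python
import requests

_last_position = None  # playback position seen on the previous poll

def vlc_position_stalled(password: str = "vlc_password") -> bool:
    """True if VLC reports 'playing' but the position did not advance."""
    global _last_position
    status = requests.get(
        "http://127.0.0.1:8080/requests/status.json",
        auth=("", password),  # matches `curl --user ":vlc_password"`
        timeout=2,
    ).json()
    position = status.get("time")
    stalled = status.get("state") == "playing" and position == _last_position
    _last_position = position
    return stalled
```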
#### Check 3: Content Match
```
Goal: Verify correct content is displayed
Method:
- Compare expected event_id with actual media/URL
- Check scheduled time window (is event still active?)
Result:
- Mismatch → content error
Action:
- Send WARN log
- Reload correct event from scheduler
```
---
## 5. Process Control Interface Requirements
### 5.1 VLC Control
**Requirement:** Enable VLC HTTP interface for monitoring
**Launch Command:**
```bash
vlc --intf http --http-host 127.0.0.1 --http-port 8080 --http-password "vlc_password" \
--fullscreen --loop /path/to/video.mp4
```
**Status Query:**
```bash
curl http://127.0.0.1:8080/requests/status.json --user ":vlc_password"
```
**Response Fields to Monitor:**
```json
{
  "state": "playing",   // "playing", "paused", "stopped"
  "position": 0.25,     // 0.0-1.0 (25% through)
  "time": 45,           // seconds into playback
  "length": 180,        // total duration in seconds
  "volume": 256         // 0-512
}
```
---
### 5.2 Chromium Control
**Requirement:** Enable Chrome DevTools Protocol (CDP)
**Launch Command:**
```bash
chromium --remote-debugging-port=9222 --kiosk --app=https://example.com
```
**Status Query:**
```bash
curl http://127.0.0.1:9222/json
```
**Response Fields to Monitor:**
```json
[
  {
    "url": "https://example.com",
    "title": "Page Title",
    "type": "page"
  }
]
```
**Advanced:** Use CDP WebSocket for events (page load, navigation, errors)
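Combining the page list above with Check 3 (content match) can be sketched as:

```python
import requests

def chromium_current_url(debug_port: int = 9222):
    """Return the URL of the first open page, or None if CDP is unreachable."""
    try:
        pages = requests.get(f"http://127.0.0.1:{debug_port}/json", timeout=2).json()
    except requests.RequestException:
        return None
    for page in pages:
        if page.get("type") == "page":
            return page.get("url")
    return None

# Usage: if chromium_current_url() != expected_url -> send WARN log and reload
```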
---
### 5.3 PDF Viewer (Custom or Standard)
**Option A: Standard Viewer (e.g., Evince)**
- No built-in API
- Monitor via process check + screenshot comparison
**Option B: Custom Python Viewer**
- Implement REST API for status queries
- Track: current page, total pages, last transition time
---
## 6. Watchdog Service Architecture
### 6.1 Service Components
**Component 1: Process Monitor Thread**
```
Responsibilities:
- Check process alive every 5 seconds
- Detect crashes and frozen processes
- Attempt automatic restart
- Send health metrics via MQTT
State Machine:
IDLE → STARTING → RUNNING → (if crash) → RESTARTING → RUNNING
→ (if max restarts) → FAILED
```
**Component 2: MQTT Publisher Thread**
```
Responsibilities:
- Maintain MQTT connection
- Send heartbeat every 60 seconds
- Send logs on-demand (queued from other components)
- Send health metrics every 5 seconds
- Reconnect on connection loss
```
**Component 3: Event Manager Integration**
```
Responsibilities:
- Receive event schedule from server
- Notify watchdog of expected process/content
- Launch media player processes
- Handle event transitions
```
### 6.2 Service Lifecycle
**On Startup:**
1. Load configuration (client UUID, MQTT broker, etc.)
2. Connect to MQTT broker
3. Send INFO log: "Watchdog service started"
4. Wait for first event from scheduler
**During Operation:**
1. Monitor loop runs every 5 seconds
2. Check expected vs actual process state
3. Send health metrics
4. Handle failures (log + restart)
**On Shutdown:**
1. Send INFO log: "Watchdog service stopping"
2. Gracefully stop monitored processes
3. Disconnect from MQTT
4. Exit cleanly
---
## 7. Auto-Recovery Logic
### 7.1 Restart Strategy
**Step 1: Detect Failure**
```
Trigger: Process not found in process list
Action:
- Log ERROR: "Process {name} crashed"
- Increment restart counter
- Check if within retry limit (max 3)
```
**Step 2: Attempt Restart**
```
If restart_attempts < MAX_RESTARTS:
- Log WARN: "Attempting restart ({attempt}/{MAX_RESTARTS})"
- Kill any zombie processes
- Wait 2 seconds (cooldown)
- Launch process with same parameters
- Wait 5 seconds for startup
- Verify process is running
- If success: reset restart counter, log INFO
- If fail: increment counter, repeat
```
**Step 3: Permanent Failure**
```
If restart_attempts >= MAX_RESTARTS:
- Log ERROR: "Max restart attempts exceeded, failing over"
- Display fallback content (static image with error message)
- Send notification to server (separate alert topic, optional)
- Wait for manual intervention or scheduler event change
```
### 7.2 Restart Cooldown
**Purpose:** Prevent rapid restart loops that waste resources
**Implementation:**
```
After each restart attempt:
- Wait 2 seconds before next restart
- After 3 failures: wait 30 seconds before trying again
- Reset counter on successful run >5 minutes
```
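A sketch of this schedule (`restart_process` and `is_running` stand in for the event handler's launch and process-check logic):

```python
import time

MAX_RESTARTS = 3

def restart_with_cooldown(restart_process, is_running) -> bool:
    """Apply the cooldown schedule above; True once the process runs again."""
    for attempt in range(1, MAX_RESTARTS + 1):
        time.sleep(2)    # cooldown between attempts
        restart_process()
        time.sleep(5)    # allow startup time
        if is_running():
            return True  # caller resets the counter after a stable run
    time.sleep(30)       # longer back-off after repeated failures
    return False
```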
---
## 8. Resource Monitoring
### 8.1 System Metrics to Track
**CPU Usage:**
```
Method: Read /proc/stat or use psutil.cpu_percent()
Frequency: Every 5 seconds
Threshold: Warn if >80% for >60 seconds
```
**Memory Usage:**
```
Method: Read /proc/meminfo or use psutil.virtual_memory()
Frequency: Every 5 seconds
Threshold: Warn if >90% for >30 seconds
```
**Display Status:**
```
Method: Check DPMS state or xset query
Frequency: Every 30 seconds
Threshold: Error if display off (unexpected)
```
**Network Connectivity:**
```
Method: Ping server or check MQTT connection
Frequency: Every 60 seconds
Threshold: Warn if no server connectivity
```
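A psutil-based sketch of the CPU/RAM checks (thresholds from above; the sustained-duration tracking and display/network checks are left out):

```python
import psutil

def sample_system_metrics() -> dict:
    """Collect the CPU/RAM values reported in the health payload."""
    mem = psutil.virtual_memory()
    return {
        # First cpu_percent() call after startup may return 0.0
        "cpu_percent": psutil.cpu_percent(interval=None),
        "memory_mb": int(mem.used / (1024 * 1024)),
        "memory_percent": mem.percent,
    }

def resource_warnings(metrics: dict) -> list:
    warnings = []
    if metrics["cpu_percent"] > 80:
        warnings.append(f"High CPU usage: {metrics['cpu_percent']:.0f}%")
    if metrics["memory_percent"] > 90:
        warnings.append(f"High memory usage: {metrics['memory_percent']:.0f}%")
    return warnings
```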
---
## 9. Development vs Production Mode
### 9.1 Development Mode
**Enable via:** Environment variable `DEBUG=true` or `ENV=development`
**Behavior:**
- Send INFO level logs
- More verbose logging to console
- Shorter monitoring intervals (faster feedback)
- Screenshot capture every 30 seconds
- No rate limiting on logs
### 9.2 Production Mode
**Enable via:** `ENV=production`
**Behavior:**
- Send only ERROR and WARN logs
- Minimal console output
- Standard monitoring intervals
- Screenshot capture every 60 seconds
- Rate limiting: max 10 logs per minute per level
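The production rate limit (max 10 logs per minute per level) can be sketched as:

```python
import time
from collections import defaultdict, deque

MAX_LOGS_PER_MINUTE = 10
_recent = defaultdict(deque)  # level -> send times within the last minute

def allow_log(level: str) -> bool:
    """True if another log of this level may be sent within the current minute."""
    now = time.monotonic()
    window = _recent[level]
    while window and now - window[0] > 60:
        window.popleft()
    if len(window) >= MAX_LOGS_PER_MINUTE:
        return False
    window.append(now)
    return True
```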
---
## 10. Configuration File Format
### 10.1 Recommended Config: JSON
**File:** `/etc/infoscreen/config.json` or `~/.config/infoscreen/config.json`
```json
{
  "client": {
    "uuid": "9b8d1856-ff34-4864-a726-12de072d0f77",
    "hostname": "infoscreen-room-101"
  },
  "mqtt": {
    "broker": "192.168.43.201",
    "port": 1883,
    "username": "",
    "password": "",
    "keepalive": 60
  },
  "monitoring": {
    "enabled": true,
    "health_interval_seconds": 5,
    "heartbeat_interval_seconds": 60,
    "max_restart_attempts": 3,
    "restart_cooldown_seconds": 2
  },
  "logging": {
    "level": "INFO",
    "send_info_logs": false,
    "console_output": true,
    "local_log_file": "/var/log/infoscreen/watchdog.log"
  },
  "processes": {
    "vlc": {
      "http_port": 8080,
      "http_password": "vlc_password"
    },
    "chromium": {
      "debug_port": 9222
    }
  }
}
```
---
## 11. Error Scenarios & Expected Behavior
### Scenario 1: VLC Crashes Mid-Video
```
1. Watchdog detects: process_status = "crashed"
2. Send ERROR log: "VLC process crashed"
3. Attempt 1: Restart VLC with same video, seek to last position
4. If success: Send INFO log "VLC restarted successfully"
5. If fail: Repeat 2 more times
6. After 3 failures: Send ERROR "Max restarts exceeded", show fallback
```
### Scenario 2: Network Timeout Loading Website
```
1. Chromium fails to load page (CDP reports error)
2. Send WARN log: "Page load timeout"
3. Attempt reload (Chromium refresh)
4. If success after 10s: Continue monitoring
5. If timeout again: Send ERROR, try restarting Chromium
```
### Scenario 3: Display Powers Off (Hardware)
```
1. DPMS check detects display off
2. Send ERROR log: "Display powered off"
3. Attempt to wake display (xset dpms force on)
4. If success: Send INFO log
5. If fail: Hardware issue, alert admin
```
### Scenario 4: High CPU Usage
```
1. CPU >80% for 60 seconds
2. Send WARN log: "High CPU usage: 85%"
3. Check if expected (e.g., video playback is normal)
4. If unexpected: investigate process causing it
5. If critical (>95%): consider restarting offending process
```
---
## 12. Testing & Validation
### 12.1 Manual Tests (During Development)
**Test 1: Process Crash Simulation**
```bash
# Start video, then kill VLC manually
killall vlc
# Expected: ERROR log sent, automatic restart within 5 seconds
```
**Test 2: MQTT Connectivity**
```bash
# Subscribe to all client topics on server
mosquitto_sub -h 192.168.43.201 -t "infoscreen/{uuid}/#" -v
# Expected: See heartbeat every 60s, health every 5s
```
**Test 3: Log Levels**
```bash
# Trigger error condition and verify log appears in database
curl http://192.168.43.201:8000/api/client-logs/test
# Expected: See new log entry with correct level/message
```
### 12.2 Acceptance Criteria
**Client must:**
1. Send heartbeat every 60 seconds without gaps
2. Send ERROR log within 5 seconds of process crash
3. Attempt automatic restart (max 3 times)
4. Report health metrics every 5 seconds
5. Survive MQTT broker restart (reconnect automatically)
6. Survive network interruption (buffer logs, send when reconnected)
7. Use correct timestamp format (ISO 8601 UTC)
8. Only send logs for real client UUID (FK constraint)
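Criterion 6 implies an offline buffer: queue log payloads while disconnected and flush them on reconnect. A sketch (class name and buffer size are illustrative; uses the paho Callback API v2 `on_connect` signature):
```python
import json
from collections import deque

class BufferedLogSender:
    """Buffer logs while MQTT is disconnected; flush on reconnect."""

    def __init__(self, mqtt_client, client_uuid: str, max_buffered: int = 500):
        self.client = mqtt_client
        self.uuid = client_uuid
        self.buffer = deque(maxlen=max_buffered)  # oldest entries drop if offline too long
        self.client.on_connect = self._on_connect

    def send(self, level: str, payload: dict) -> None:
        topic = f"infoscreen/{self.uuid}/logs/{level}"
        if self.client.is_connected():
            self.client.publish(topic, json.dumps(payload), qos=1)
        else:
            self.buffer.append((topic, payload))

    def _on_connect(self, client, userdata, flags, reason_code, properties):
        # Flush everything buffered while offline; timestamps were set at event time
        while self.buffer:
            topic, payload = self.buffer.popleft()
            client.publish(topic, json.dumps(payload), qos=1)
```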
---
## 13. Python Libraries (Recommended)
**For process monitoring:**
- `psutil` - Cross-platform process and system utilities
**For MQTT:**
- `paho-mqtt` - Official MQTT client (use v2.x with Callback API v2)
**For VLC control:**
- `requests` - HTTP client for status queries
**For Chromium control:**
- `websocket-client` or `pychrome` - Chrome DevTools Protocol
**For datetime:**
- `datetime` (stdlib) - Use `datetime.now(timezone.utc).isoformat()`
**Example requirements.txt:**
```
paho-mqtt>=2.0.0
psutil>=5.9.0
requests>=2.31.0
python-dateutil>=2.8.0
```
---
## 14. Security Considerations
### 14.1 MQTT Security
- If broker requires auth, store credentials in config file with restricted permissions (`chmod 600`)
- Consider TLS/SSL for MQTT (port 8883) if on untrusted network
- Use unique client ID to prevent impersonation
### 14.2 Process Control APIs
- VLC HTTP password should be random, not default
- Chromium debug port should bind to `127.0.0.1` only (not `0.0.0.0`)
- Restrict file system access for media player processes
### 14.3 Log Content
- **Do not log:** Passwords, API keys, personal data
- **Sanitize:** File paths (strip user directories), URLs (remove query params with tokens)
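A sketch of the two sanitizers (helper names are illustrative):
```python
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit

def sanitize_path(path: str) -> str:
    """Strip the user directory from paths before logging."""
    p = Path(path)
    try:
        return "~/" + str(p.relative_to(Path.home()))
    except ValueError:  # path is not under the home directory
        return str(p)

def sanitize_url(url: str) -> str:
    """Drop the query string, which may carry tokens or session IDs."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```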
---
## 15. Performance Targets
| Metric | Target | Acceptable | Critical |
|--------|--------|------------|----------|
| Health check interval | 5s | 10s | 30s |
| Crash detection time | <5s | <10s | <30s |
| Restart time | <10s | <20s | <60s |
| MQTT publish latency | <100ms | <500ms | <2s |
| CPU usage (watchdog) | <2% | <5% | <10% |
| RAM usage (watchdog) | <50MB | <100MB | <200MB |
| Log message size | <1KB | <10KB | <100KB |
---
## 16. Troubleshooting Guide (For Client Development)
### Issue: Logs not appearing in server database
**Check:**
1. Is MQTT broker reachable? (`mosquitto_pub` test from client)
2. Is client UUID correct and exists in `clients` table?
3. Is timestamp format correct (ISO 8601 with 'Z')?
4. Check server listener logs for errors
### Issue: Health metrics not updating
**Check:**
1. Is health loop running? (check watchdog service status)
2. Is MQTT connected? (check connection status in logs)
3. Is payload JSON valid? (use JSON validator)
### Issue: Process restarts in loop
**Check:**
1. Is media file/URL accessible?
2. Is process command correct? (test manually)
3. Check process exit code (crash reason)
4. Increase restart cooldown to avoid rapid loops
---
## 17. Complete Message Flow Diagram
```
┌─────────────────────────────────────────────────────────┐
│ Infoscreen Client │
│ │
│ Event Occurs: │
│ - Process crashed │
│ - High CPU usage │
│ - Content loaded │
│ │
│ ┌────────────────┐ │
│ │ Decision Logic │ │
│ │ - Is it ERROR?│ │
│ │ - Is it WARN? │ │
│ │ - Is it INFO? │ │
│ └────────┬───────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────┐ │
│ │ Build JSON Payload │ │
│ │ { │ │
│ │ "timestamp": "...", │ │
│ │ "message": "...", │ │
│ │ "context": {...} │ │
│ │ } │ │
│ └────────┬───────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────┐ │
│ │ MQTT Publish │ │
│ │ Topic: infoscreen/{uuid}/logs/error │
│ │ QoS: 1 │ │
│ └────────┬───────────────────────┘ │
└───────────┼──────────────────────────────────────────┘
            │ TCP/IP (MQTT Protocol)
     ┌──────────────┐
     │ MQTT Broker  │
     │ (Mosquitto)  │
     └──────┬───────┘
            │ Topic: infoscreen/+/logs/#
     ┌──────────────────────────────┐
     │ Listener Service             │
     │ (Python)                     │
     │                              │
     │ - Parse JSON                 │
     │ - Validate UUID              │
     │ - Store in database          │
     └──────┬───────────────────────┘
     ┌──────────────────────────────┐
     │ MariaDB Database             │
     │                              │
     │ Table: client_logs           │
     │  - client_uuid               │
     │  - timestamp                 │
     │  - level                     │
     │  - message                   │
     │  - context (JSON)            │
     └──────┬───────────────────────┘
            │ SQL Query
     ┌──────────────────────────────┐
     │ API Server (Flask)           │
     │                              │
     │ GET /api/client-logs/{uuid}/logs
     │ GET /api/client-logs/summary │
     └──────┬───────────────────────┘
            │ HTTP/JSON
     ┌──────────────────────────────┐
     │ Dashboard (React)            │
     │                              │
     │ - Display logs               │
     │ - Filter by level            │
     │ - Show health status         │
     └──────────────────────────────┘
```
---
## 18. Quick Reference Card
### MQTT Topics Summary
```
infoscreen/{uuid}/logs/error → Critical failures
infoscreen/{uuid}/logs/warn → Non-critical issues
infoscreen/{uuid}/logs/info → Informational (dev mode)
infoscreen/{uuid}/health → Health metrics (every 5s)
infoscreen/{uuid}/heartbeat → Enhanced heartbeat (every 60s)
```
### JSON Timestamp Format
```python
from datetime import datetime, timezone
timestamp = datetime.now(timezone.utc).isoformat()
# Output: "2026-03-10T07:30:00+00:00" or "2026-03-10T07:30:00Z"
```
### Process Status Values
```
"running" - Process is alive and responding
"crashed" - Process terminated unexpectedly
"starting" - Process is launching (startup phase)
"stopped" - Process intentionally stopped
```
### Restart Logic
```
Max attempts: 3
Cooldown: 2 seconds between attempts
Reset: After 5 minutes of successful operation
```
---
## 19. Contact & Support
**Server API Documentation:**
- Base URL: `http://192.168.43.201:8000`
- Health check: `GET /health`
- Test logs: `GET /api/client-logs/test` (no auth)
- Full API docs: See `CLIENT_MONITORING_IMPLEMENTATION_GUIDE.md` on server
**MQTT Broker:**
- Host: `192.168.43.201`
- Port: `1883` (standard), `9001` (WebSocket)
- Test tool: `mosquitto_pub` / `mosquitto_sub`
**Database Schema:**
- Table: `client_logs`
- Foreign Key: `client_uuid` → `clients.uuid` (ON DELETE CASCADE)
- Constraint: UUID must exist in clients table before logging
**Server-Side Logs:**
```bash
# View listener logs (processes MQTT messages)
docker compose logs -f listener
# View server logs (API requests)
docker compose logs -f server
```
---
## 20. Appendix: Example Implementations
### A. Minimal Python Watchdog (Pseudocode)
```python
import time
import json
import subprocess
import psutil
import paho.mqtt.client as mqtt
from datetime import datetime, timezone

class MinimalWatchdog:
    def __init__(self, client_uuid, mqtt_broker):
        self.uuid = client_uuid
        self.mqtt_client = mqtt.Client(callback_api_version=mqtt.CallbackAPIVersion.VERSION2)
        self.mqtt_client.connect(mqtt_broker, 1883, 60)
        self.mqtt_client.loop_start()
        self.expected_process = None
        self.restart_attempts = 0
        self.MAX_RESTARTS = 3

    def send_log(self, level, message, context=None):
        topic = f"infoscreen/{self.uuid}/logs/{level}"
        payload = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "message": message,
            "context": context or {}
        }
        self.mqtt_client.publish(topic, json.dumps(payload), qos=1)

    def is_process_running(self, process_name):
        for proc in psutil.process_iter(['name']):
            # proc.info['name'] can be None for some processes
            if proc.info['name'] and process_name in proc.info['name']:
                return True
        return False

    def restart_process(self):
        # Relaunch command is deployment-specific; a bare binary is shown as a stub
        self.restart_attempts += 1
        subprocess.Popen([self.expected_process])
        self.send_log("info", f"Restart attempt {self.restart_attempts}/{self.MAX_RESTARTS}")

    def monitor_loop(self):
        while True:
            if self.expected_process and not self.is_process_running(self.expected_process):
                self.send_log("error", f"{self.expected_process} crashed")
                if self.restart_attempts < self.MAX_RESTARTS:
                    self.restart_process()
                else:
                    self.send_log("error", "Max restarts exceeded")
            time.sleep(5)

# Usage:
watchdog = MinimalWatchdog("9b8d1856-ff34-4864-a726-12de072d0f77", "192.168.43.201")
watchdog.expected_process = "vlc"
watchdog.monitor_loop()
```
---
**END OF SPECIFICATION**
Questions? Refer to:
- `CLIENT_MONITORING_IMPLEMENTATION_GUIDE.md` (server repo)
- Server API: `http://192.168.43.201:8000/api/client-logs/test`
- MQTT test: `mosquitto_sub -h 192.168.43.201 -t infoscreen/#`

View File

@@ -5,6 +5,10 @@ This changelog tracks all changes made in the development workspace, including i
---
## Unreleased (development workspace)
- Monitoring system completion: End-to-end monitoring pipeline is active (MQTT logs/health → listener persistence → monitoring APIs → superadmin dashboard).
- Monitoring API: Added/active endpoints `GET /api/client-logs/monitoring-overview` and `GET /api/client-logs/recent-errors`; per-client logs via `GET /api/client-logs/<uuid>/logs`.
- Dashboard monitoring UI: Superadmin monitoring page is integrated and displays client health status, screenshots, process metadata, and recent error activity.
- Bugfix: Presentation flags `page_progress` and `auto_progress` now persist reliably across create/update and detached-occurrence flows.
- Frontend (Settings → Events): Added Presentations defaults (slideshow interval, page-progress, auto-progress) with load/save via `/api/system-settings`; UI uses Syncfusion controls.
- Backend defaults: Seeded `presentation_interval` ("10"), `presentation_page_progress` ("true"), `presentation_auto_progress` ("true") in `server/init_defaults.py` when missing.
- Data model: Added per-event fields `page_progress` and `auto_progress` on `Event`; Alembic migration applied successfully.

View File

@@ -0,0 +1,194 @@
# MQTT Payload Migration Guide
## Purpose
This guide describes a practical migration from the current dashboard screenshot payload to a grouped schema, with client-side implementation first and server-side migration second.
## Scope
- Environment: development and alpha systems (no production installs)
- Message topic: infoscreen/<client_id>/dashboard
- Capture types to preserve: periodic, event_start, event_stop
## Target Schema (v2)
The canonical message should be grouped into four logical blocks in this order:
1. message
2. content
3. runtime
4. metadata
Example shape:
```json
{
  "message": {
    "client_id": "<uuid>",
    "status": "alive"
  },
  "content": {
    "screenshot": {
      "filename": "latest.jpg",
      "data": "<base64>",
      "timestamp": "2026-03-30T10:15:41.123456+00:00",
      "size": 183245
    }
  },
  "runtime": {
    "system_info": {
      "hostname": "pi-display-01",
      "ip": "192.168.1.42",
      "uptime": 123456.7
    },
    "process_health": {
      "event_id": "evt-123",
      "event_type": "presentation",
      "current_process": "impressive",
      "process_pid": 4123,
      "process_status": "running",
      "restart_count": 0
    }
  },
  "metadata": {
    "schema_version": "2.0",
    "producer": "simclient",
    "published_at": "2026-03-30T10:15:42.004321+00:00",
    "capture": {
      "type": "periodic",
      "captured_at": "2026-03-30T10:15:41.123456+00:00",
      "age_s": 0.9,
      "triggered": false,
      "send_immediately": false
    },
    "transport": {
      "qos": 0,
      "publisher": "simclient"
    }
  }
}
```
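A minimal sketch of the single canonical builder called for in step 3 below, which produces this shape; argument names are illustrative, and the caller is assumed to pass pre-assembled sub-dicts:
```python
from datetime import datetime, timezone

def build_payload_v2(client_id, status, screenshot, system_info, process_health, capture):
    """One canonical assembly point for the grouped v2 dashboard payload."""
    return {
        "message": {"client_id": client_id, "status": status},
        "content": {"screenshot": screenshot},
        "runtime": {"system_info": system_info, "process_health": process_health},
        "metadata": {
            "schema_version": "2.0",
            "producer": "simclient",
            "published_at": datetime.now(timezone.utc).isoformat(),
            "capture": capture,  # {"type": "periodic"|"event_start"|"event_stop", ...}
            "transport": {"qos": 0, "publisher": "simclient"},
        },
    }
```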
## Step-by-Step: Client-Side First
1. Create a migration branch.
- Example: feature/payload-v2
2. Freeze a baseline sample from MQTT.
- Capture one payload via mosquitto_sub and store it for comparison.
3. Implement one canonical payload builder.
- Centralize JSON assembly in one function only.
- Do not duplicate payload construction across code paths.
4. Add versioned metadata.
- Set metadata.schema_version = "2.0".
- Add metadata.producer = "simclient".
- Add metadata.published_at in UTC ISO format.
5. Map existing data into grouped blocks.
- client_id/status -> message
- screenshot object -> content.screenshot
- system_info/process_health -> runtime
- capture mode and freshness -> metadata.capture
6. Preserve existing capture semantics.
- Keep type values unchanged: periodic, event_start, event_stop.
- Keep UTC ISO timestamps.
- Keep screenshot encoding and size behavior unchanged.
7. Optional short-term compatibility mode (recommended for one sprint).
- Either:
- Keep current legacy fields in parallel, or
- Add a legacy block with old field names.
- Goal: prevent immediate server breakage while parser updates are merged.
8. Improve publish logs for verification.
- Log schema_version, metadata.capture.type, metadata.capture.age_s.
9. Validate all three capture paths end-to-end.
- periodic capture
- event_start trigger capture
- event_stop trigger capture
10. Lock the client contract.
- Save one validated JSON sample per capture type.
- Use those samples in server parser tests.
## Step-by-Step: Server-Side Migration
1. Add support for grouped v2 parsing.
- Parse from message/content/runtime/metadata first.
2. Add fallback parser for legacy payload (temporary).
- If grouped keys are absent, parse old top-level keys.
3. Normalize to one internal server model.
- Convert both parser paths into one DTO/entity used by dashboard logic.
4. Validate required fields.
- Required:
- message.client_id
- message.status
- metadata.schema_version
- metadata.capture.type
- Optional:
- runtime.process_health
- content.screenshot (may be absent when no screenshot is available)
5. Update dashboard consumers.
- Read grouped fields from internal model (not raw old keys).
6. Add migration observability.
- Counters:
- v2 parse success
- legacy fallback usage
- parse failures
- Warning log for unknown schema_version.
7. Run mixed-format integration tests.
- New client -> new server
- Legacy client -> new server (fallback path)
8. Cut over to v2 preferred.
- Keep fallback for short soak period only.
9. Remove fallback and legacy assumptions.
- After stability window, remove old parser path.
10. Final cleanup.
- Keep one schema doc and test fixtures.
- Remove temporary compatibility switches.
## Legacy to v2 Field Mapping
| Legacy field | v2 field |
|---|---|
| client_id | message.client_id |
| status | message.status |
| screenshot | content.screenshot |
| screenshot_type | metadata.capture.type |
| screenshot_age_s | metadata.capture.age_s |
| timestamp | metadata.published_at |
| system_info | runtime.system_info |
| process_health | runtime.process_health |
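A minimal sketch of the server-side normalization from steps 1-3, driven by this mapping table; the function and internal key names are illustrative:
```python
def normalize_dashboard_payload(payload: dict) -> dict:
    """Normalize v2 (grouped) or legacy (flat) payloads into one internal dict."""
    if "message" in payload and "metadata" in payload:
        meta = payload.get("metadata", {})
        capture = meta.get("capture", {})
        return {
            "client_id": payload["message"].get("client_id"),
            "status": payload["message"].get("status"),
            "screenshot": payload.get("content", {}).get("screenshot"),
            "capture_type": capture.get("type"),
            "capture_age_s": capture.get("age_s"),
            "published_at": meta.get("published_at"),
            "system_info": payload.get("runtime", {}).get("system_info"),
            "process_health": payload.get("runtime", {}).get("process_health"),
        }
    # Temporary legacy fallback: old top-level keys (see mapping table above)
    return {
        "client_id": payload.get("client_id"),
        "status": payload.get("status"),
        "screenshot": payload.get("screenshot"),
        "capture_type": payload.get("screenshot_type"),
        "capture_age_s": payload.get("screenshot_age_s"),
        "published_at": payload.get("timestamp"),
        "system_info": payload.get("system_info"),
        "process_health": payload.get("process_health"),
    }
```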
## Acceptance Criteria
1. All capture types parse and display correctly.
- periodic
- event_start
- event_stop
2. Screenshot payload integrity is unchanged.
- filename, data, timestamp, size remain valid.
3. Metadata is centrally visible at message end.
- schema_version, capture metadata, transport metadata all inside metadata.
4. No regression in dashboard update timing.
- Triggered screenshots still publish quickly.
## Suggested Timeline (Dev Only)
1. Day 1: client v2 payload implementation + local tests
2. Day 2: server v2 parser + fallback
3. Day 3-5: soak in dev, monitor parse metrics
4. Day 6+: remove fallback and finalize v2-only

View File

@@ -0,0 +1,533 @@
# Phase 3: Client-Side Monitoring Implementation
**Status**: ✅ COMPLETE
**Date**: March 11, 2026
**Architecture**: Two-process design with health-state bridge
---
## Overview
This document describes the **Phase 3** client-side monitoring implementation integrated into the existing infoscreen-dev codebase. The implementation adds:
1. **Health-state tracking** for all display processes (Impressive, Chromium, VLC)
2. **Tiered logging**: Local rotating logs + selective MQTT transmission
3. **Process crash detection** with bounded restart attempts
4. **MQTT health/log topics** feeding the monitoring server
5. **Impressive-aware process mapping** (presentations → impressive, websites → chromium, videos → vlc)
---
## Architecture
### Two-Process Design
```
┌─────────────────────────────────────────────────────────┐
│ simclient.py (MQTT Client) │
│ - Discovers device, sends heartbeat │
│ - Downloads presentation files │
│ - Reads health state from display_manager │
│ - Publishes health/log messages to MQTT │
│ - Sends screenshots for dashboard │
└────────┬────────────────────────────────────┬───────────┘
         │                                     │
         │ reads: current_process_health.json │
         │                                     │
         │ writes: current_event.json          │
         │                                     │
┌────────▼────────────────────────────────────▼───────────┐
│ display_manager.py (Display Control) │
│ - Monitors events and manages displays │
│ - Launches Impressive (presentations) │
│ - Launches Chromium (websites) │
│ - Launches VLC (videos) │
│ - Tracks process health and crashes │
│ - Detects and restarts crashed processes │
│ - Writes health state to JSON bridge │
│ - Captures screenshots to shared folder │
└─────────────────────────────────────────────────────────┘
```
---
## Implementation Details
### 1. Health State Tracking (display_manager.py)
**File**: `src/display_manager.py`
**New Class**: `ProcessHealthState`
Tracks process health and persists to JSON for simclient to read:
```python
class ProcessHealthState:
    """Track and persist process health state for monitoring integration.

    Attributes:
    - event_id: Currently active event identifier
    - event_type: presentation, website, video, or None
    - process_name: impressive, chromium-browser, vlc, or None
    - process_pid: Process ID or None for libvlc
    - status: running, crashed, starting, stopped
    - restart_count: Number of restart attempts
    - max_restarts: Maximum allowed restarts (3)
    """
```
Methods:
- `update_running()` - Mark process as started (logs to monitoring.log)
- `update_crashed()` - Mark process as crashed (warning to monitoring.log)
- `update_restart_attempt()` - Increment restart counter (logs attempt and checks max)
- `update_stopped()` - Mark process as stopped (info to monitoring.log)
- `save()` - Persist state to `src/current_process_health.json`
**New Health State File**: `src/current_process_health.json`
```json
{
"event_id": "event_123",
"event_type": "presentation",
"current_process": "impressive",
"process_pid": 1234,
"process_status": "running",
"restart_count": 0,
"timestamp": "2026-03-11T10:30:45.123456+00:00"
}
```
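The `save()` method could write the bridge file atomically so simclient never observes a half-written JSON; a sketch of that idea (the actual implementation may differ):
```python
import json
import os
import tempfile
from datetime import datetime, timezone

HEALTH_STATE_PATH = "src/current_process_health.json"

def save_health_state(state: dict) -> None:
    """Write the bridge file atomically via a temp file + rename."""
    state = dict(state, timestamp=datetime.now(timezone.utc).isoformat())
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(HEALTH_STATE_PATH) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, HEALTH_STATE_PATH)  # atomic on POSIX
    except BaseException:
        os.unlink(tmp_path)
        raise
```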
### 2. Monitoring Logger (both files)
**Local Rotating Logs**: 5 files × 5 MB each = 25 MB max per device
**display_manager.py**:
```python
import logging
from logging.handlers import RotatingFileHandler

MONITORING_LOG_PATH = "logs/monitoring.log"
monitoring_logger = logging.getLogger("monitoring")
monitoring_handler = RotatingFileHandler(MONITORING_LOG_PATH, maxBytes=5 * 1024 * 1024, backupCount=5)
monitoring_logger.addHandler(monitoring_handler)
```
**simclient.py**:
- Shares same `logs/monitoring.log` file
- Both processes write to monitoring logger for health events
- Local logs are never transmitted; the rotating files stay on the device for technician inspection
**Log Filtering** (tiered strategy):
- **ERROR**: Local + MQTT (published to `infoscreen/{uuid}/logs/error`)
- **WARN**: Local + MQTT (published to `infoscreen/{uuid}/logs/warn`)
- **INFO**: Local only (unless `DEBUG_MODE=1`)
- **DEBUG**: Local only (always)
### 3. Process Mapping with Impressive Support
**display_manager.py** - When starting processes:
| Event Type | Process Name | Health Status |
|-----------|--------------|---------------|
| presentation | `impressive` | tracked with PID |
| website/webpage/webuntis | `chromium` or `chromium-browser` | tracked with PID |
| video | `vlc` | tracked (may have no PID if using libvlc) |
**Per-Process Updates**:
- Presentation: `health.update_running('event_id', 'presentation', 'impressive', pid)`
- Website: `health.update_running('event_id', 'website', browser_name, pid)`
- Video: `health.update_running('event_id', 'video', 'vlc', pid or None)`
### 4. Crash Detection and Restart Logic
**display_manager.py** - `process_events()` method:
```
If process not running AND same event_id:
├─ Check exit code
├─ If presentation with exit code 0: Normal completion (no restart)
├─ Else: Mark crashed
│ ├─ health.update_crashed()
│ └─ health.update_restart_attempt()
│ ├─ If restart_count > max_restarts: Give up
│ └─ Else: Restart display (loop back to start_display_for_event)
└─ Log to monitoring.log at each step
```
**Restart Logic**:
- Max 3 restart attempts per event
- Restarts only if same event still active
- Graceful exit (code 0) for Impressive auto-quit presentations is treated as normal
- All crashes logged to monitoring.log with context
### 5. MQTT Health and Log Topics
**simclient.py** - New functions:
**`read_health_state()`**
- Reads `src/current_process_health.json` written by display_manager
- Returns dict or None if no active process
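A sketch of how this read could look; the staleness cutoff is an added assumption, not confirmed behavior:
```python
import json
import os
import time

HEALTH_STATE_PATH = "src/current_process_health.json"

def read_health_state(max_age_s: float = 120.0):
    """Read the bridge file; return None if it is missing, invalid, or stale."""
    try:
        if time.time() - os.path.getmtime(HEALTH_STATE_PATH) > max_age_s:
            return None  # display_manager stopped updating; treat as no active process
        with open(HEALTH_STATE_PATH, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        return None
```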
**`publish_health_message(client, client_id)`**
- Topic: `infoscreen/{uuid}/health`
- QoS: 1 (reliable)
- Payload:
```json
{
  "timestamp": "2026-03-11T10:30:45.123456+00:00",
  "expected_state": {
    "event_id": "event_123"
  },
  "actual_state": {
    "process": "impressive",
    "pid": 1234,
    "status": "running"
  }
}
```
**`publish_log_message(client, client_id, level, message, context)`**
- Topics: `infoscreen/{uuid}/logs/error` or `infoscreen/{uuid}/logs/warn`
- QoS: 1 (reliable)
- Log level filtering (only ERROR/WARN sent unless DEBUG_MODE=1)
- Payload:
```json
{
  "timestamp": "2026-03-11T10:30:45.123456+00:00",
  "message": "Process started: event_id=123 event_type=presentation process=impressive pid=1234",
  "context": {
    "event_id": "event_123",
    "process": "impressive",
    "event_type": "presentation"
  }
}
```
**Enhanced Dashboard Heartbeat**:
- Topic: `infoscreen/{uuid}/dashboard`
- Now includes `process_health` block with event_id, process name, status, restart count
### 6. Integration Points
**Existing Features Preserved**:
- ✅ Impressive PDF presentations with auto-advance and loop
- ✅ Chromium website display with auto-scroll injection
- ✅ VLC video playback (python-vlc preferred, binary fallback)
- ✅ Screenshot capture and transmission
- ✅ HDMI-CEC TV control
- ✅ Two-process architecture
**New Integration Points**:
| File | Function | Change |
|------|----------|--------|
| display_manager.py | `__init__()` | Initialize `ProcessHealthState()` |
| display_manager.py | `start_presentation()` | Call `health.update_running()` with impressive |
| display_manager.py | `start_video()` | Call `health.update_running()` with vlc |
| display_manager.py | `start_webpage()` | Call `health.update_running()` with chromium |
| display_manager.py | `process_events()` | Detect crashes, call `health.update_crashed()` and `update_restart_attempt()` |
| display_manager.py | `stop_current_display()` | Call `health.update_stopped()` |
| simclient.py | `screenshot_service_thread()` | (No changes to interval) |
| simclient.py | Main heartbeat loop | Call `publish_health_message()` after successful heartbeat |
| simclient.py | `send_screenshot_heartbeat()` | Read health state and include in dashboard payload |
---
## Logging Hierarchy
### Local Rotating Files (5 × 5 MB)
**`logs/display_manager.log`** (existing - updated):
- Display event processing
- Process lifecycle (start/stop)
- HDMI-CEC operations
- Presentation status
- Video/website startup
**`logs/simclient.log`** (existing - updated):
- MQTT connection/reconnection
- Discovery and heartbeat
- File downloads
- Group membership changes
- Dashboard payload info
**`logs/monitoring.log`** (NEW):
- Process health events (start, crash, restart, stop)
- Both display_manager and simclient write here
- Centralized health tracking
- Technician-focused: "What happened to the processes?"
```
# Example monitoring.log entries:
2026-03-11 10:30:45 [INFO] Process started: event_id=event_123 event_type=presentation process=impressive pid=1234
2026-03-11 10:35:20 [WARNING] Process crashed: event_id=event_123 event_type=presentation process=impressive restart_count=0/3
2026-03-11 10:35:20 [WARNING] Restarting process: attempt 1/3 for impressive
2026-03-11 10:35:25 [INFO] Process started: event_id=event_123 event_type=presentation process=impressive pid=1245
```
### MQTT Transmission (Selective)
**Always sent** (when error occurs):
- `infoscreen/{uuid}/logs/error` - Critical failures
- `infoscreen/{uuid}/logs/warn` - Restarts, crashes, missing binaries
**Development mode only** (if DEBUG_MODE=1):
- `infoscreen/{uuid}/logs/info` - Event start/stop, process running status
**Never sent**:
- DEBUG messages (local-only debug details)
- INFO messages in production
---
## Environment Variables
No new required variables. Existing configuration supports monitoring:
```bash
# Existing (unchanged):
ENV=development|production
DEBUG_MODE=0|1 # Enables INFO logs to MQTT
LOG_LEVEL=DEBUG|INFO|WARNING|ERROR # Local log verbosity
HEARTBEAT_INTERVAL=5|60 # seconds
SCREENSHOT_INTERVAL=30|300 # seconds (display_manager_screenshot_capture)
# Recommended for monitoring:
SCREENSHOT_CAPTURE_INTERVAL=30 # How often display_manager captures screenshots
SCREENSHOT_MAX_WIDTH=800 # Downscale for bandwidth
SCREENSHOT_JPEG_QUALITY=70 # Balance quality/size
# File server (if different from MQTT broker):
FILE_SERVER_HOST=192.168.1.100
FILE_SERVER_PORT=8000
FILE_SERVER_SCHEME=http
```
---
## Testing Validation
### System-Level Test Sequence
**1. Start Services**:
```bash
# Terminal 1: Display Manager
./scripts/start-display-manager.sh
# Terminal 2: MQTT Client
./scripts/start-dev.sh
# Terminal 3: Monitor logs
tail -f logs/monitoring.log
```
**2. Trigger Each Event Type**:
```bash
# Via test menu or MQTT publish:
./scripts/test-display-manager.sh # Options 1-3 trigger events
```
**3. Verify Health State File**:
```bash
# Check health state gets written immediately
cat src/current_process_health.json
# Should show: event_id, event_type, current_process (impressive/chromium/vlc), process_status=running
```
**4. Check MQTT Topics**:
```bash
# Monitor health messages:
mosquitto_sub -h localhost -t "infoscreen/+/health" -v
# Monitor log messages:
mosquitto_sub -h localhost -t "infoscreen/+/logs/#" -v
# Monitor dashboard heartbeat:
mosquitto_sub -h localhost -t "infoscreen/+/dashboard" -v | head -c 500 && echo "..."
```
**5. Simulate Process Crash**:
```bash
# Find impressive/chromium/vlc PID:
ps aux | grep -E 'impressive|chromium|vlc'
# Kill process:
kill -9 <pid>
# Watch monitoring.log for crash detection and restart
tail -f logs/monitoring.log
# Should see: [WARNING] Process crashed... [WARNING] Restarting process...
```
**6. Verify Server Integration**:
```bash
# Server receives health messages:
sqlite3 infoscreen.db "SELECT process_status, current_process, restart_count FROM clients WHERE uuid='...';"
# Should show latest status from health message
# Server receives logs:
sqlite3 infoscreen.db "SELECT level, message FROM client_logs WHERE client_uuid='...' ORDER BY timestamp DESC LIMIT 10;"
# Should show ERROR/WARN entries from crashes/restarts
```
---
## Troubleshooting
### Health State File Not Created
**Symptom**: `src/current_process_health.json` missing
**Causes**:
- No event active (file only created when display starts)
- display_manager not running
**Check**:
```bash
ps aux | grep display_manager
tail -f logs/display_manager.log | grep "Process started\|Process stopped"
```
### MQTT Health Messages Not Arriving
**Symptom**: No health messages on `infoscreen/{uuid}/health` topic
**Causes**:
- simclient not reading health state file
- MQTT connection dropped
- Health update function not called
**Check**:
```bash
# Check health file exists and is recent:
ls -l src/current_process_health.json
stat src/current_process_health.json | grep Modify
# Monitor simclient logs:
tail -f logs/simclient.log | grep -E "Health|heartbeat|publish"
# Verify MQTT connection:
mosquitto_sub -h localhost -t "infoscreen/+/heartbeat" -v
```
### Restart Loop (Process Keeps Crashing)
**Symptom**: monitoring.log shows repeated crashes and restarts
**Check**:
```bash
# Read last log lines of the process (stored by display_manager):
tail -f logs/impressive.out.log # for presentations
tail -f logs/browser.out.log # for websites
tail -f logs/video_player.out.log # for videos
```
**Common Causes**:
- Missing binary (impressive not installed, chromium not found, vlc not available)
- Corrupt presentation file
- Invalid URL for website
- Insufficient permissions for screenshots
### Log Messages Not Reaching Server
**Symptom**: client_logs table in server DB is empty
**Causes**:
- Log level filtering: INFO messages in production are local-only
- Logs only published on ERROR/WARN
- MQTT publish failing silently
**Check**:
```bash
# Force DEBUG_MODE to see all logs:
export DEBUG_MODE=1
export LOG_LEVEL=DEBUG
# Restart simclient and trigger event
# Monitor local logs first:
tail -f logs/monitoring.log | grep -i error
```
---
## Performance Considerations
**Bandwidth per Client**:
- Health message: ~200 bytes per heartbeat interval (every 5-60s)
- Screenshot heartbeat: ~50-100 KB (every 30-300s)
- Log messages: ~100-500 bytes per crash/error (rare)
- **Total**: ~0.5-2 MB/day per device (very minimal)
**Disk Space on Client**:
- Monitoring logs: 5 files × 5 MB = 25 MB max
- Display manager logs: 5 files × 2 MB = 10 MB max
- MQTT client logs: 5 files × 2 MB = 10 MB max
- Screenshots: 20 files × 50-100 KB = 1-2 MB max
- **Total**: ~50 MB max (typical for Raspberry Pi USB/SSD)
**Rotation Strategy**:
- Old files automatically deleted when size limit reached
- Technician can SSH and `tail -f` any time
- No database overhead (file-based rotation is minimal CPU)
---
## Integration with Server (Phase 2)
The client implementation sends data to the server's Phase 2 endpoints:
**Expected Server Implementation** (from CLIENT_MONITORING_SETUP.md):
1. **MQTT Listener** receives and stores:
- `infoscreen/{uuid}/logs/error`, `/logs/warn`, `/logs/info`
- `infoscreen/{uuid}/health` messages
- Updates `clients` table with health fields
2. **Database Tables**:
- `clients.process_status`: running/crashed/starting/stopped
- `clients.current_process`: impressive/chromium/vlc/None
- `clients.process_pid`: PID value
- `clients.current_event_id`: Active event
- `client_logs`: table stores logs with level/message/context
3. **API Endpoints**:
- `GET /api/client-logs/{uuid}/logs?level=ERROR&limit=50`
- `GET /api/client-logs/summary` (errors/warnings across all clients)
---
## Summary of Changes
### Files Modified
1. **`src/display_manager.py`**:
- Added `psutil` import for future process monitoring
- Added `ProcessHealthState` class (60 lines)
- Added monitoring logger setup (8 lines)
- Added `health.update_running()` calls in `start_presentation()`, `start_video()`, `start_webpage()`
- Added crash detection and restart logic in `process_events()`
- Added `health.update_stopped()` in `stop_current_display()`
2. **`src/simclient.py`**:
- Added `timezone` import
- Added monitoring logger setup (8 lines)
- Added `read_health_state()` function
- Added `publish_health_message()` function
- Added `publish_log_message()` function (with level filtering)
- Updated `send_screenshot_heartbeat()` to include health data
- Updated heartbeat loop to call `publish_health_message()`
### Files Created
1. **`src/current_process_health.json`** (at runtime):
- Bridge file between display_manager and simclient
- Shared volume compatible (works in container setup)
2. **`logs/monitoring.log`** (at runtime):
- New rotating log file (5 × 5MB)
- Health events from both processes
---
## Next Steps
1. **Deploy to test client** and run validation sequence above
2. **Deploy server Phase 2** (if not yet done) to receive health/log messages
3. **Verify database updates** in server-side `clients` and `client_logs` tables
4. **Test dashboard UI** (Phase 4) to display health indicators
5. **Configure alerting** (email/Slack) for ERROR level messages
---
**Implementation Date**: March 11, 2026
**Part of**: Infoscreen 2025 Client Monitoring System
**Status**: Production Ready (with server Phase 2 integration)

View File

@@ -39,6 +39,7 @@ A comprehensive multi-service digital signage solution for educational instituti
Data flow summary:
- Listener: consumes discovery and heartbeat messages from the MQTT Broker and updates the API Server (client registration/heartbeats).
- Listener screenshot flow: consumes `infoscreen/{uuid}/screenshot` and `infoscreen/{uuid}/dashboard`. Dashboard messages use grouped v2 schema (`message`, `content`, `runtime`, `metadata`); screenshot data is read from `content.screenshot`, capture type from `metadata.capture.type`, and forwarded to `POST /api/clients/{uuid}/screenshot`.
- Scheduler: reads events from the API Server and publishes only currently active content to the MQTT Broker (retained topics per group). When a group has no active events, the scheduler clears its retained topic by publishing an empty list. All time comparisons are done in UTC; any naive timestamps are normalized.
- Clients: send discovery/heartbeat via the MQTT Broker (handled by the Listener) and receive content from the Scheduler via MQTT.
- Worker: receives conversion commands directly from the API Server and reports results/status back to the API (no MQTT involved).
@@ -225,17 +226,15 @@ For detailed deployment instructions, see:
## Recent changes since last commit
- Video / Streaming support: Added end-to-end support for video events. The API and dashboard now allow creating `video` events referencing uploaded media. The server exposes a range-capable streaming endpoint at `/api/eventmedia/stream/<media_id>/<filename>` so clients can seek during playback.
- Scheduler metadata: Scheduler now performs a best-effort HEAD probe for video stream URLs and includes basic metadata in the retained MQTT payload: `mime_type`, `size` (bytes) and `accept_ranges` (bool). Placeholders for richer metadata (`duration`, `resolution`, `bitrate`, `qualities`, `thumbnails`, `checksum`) are emitted as null/empty until a background worker fills them.
- Dashboard & uploads: The dashboard's FileManager upload limits were increased (to support Full-HD uploads) and client-side validation enforces a maximum video length (10 minutes). The event modal exposes playback flags (`autoplay`, `loop`, `volume`, `muted`) and initializes them from system defaults for new events.
- DB model & API: `Event` includes `muted` in addition to `autoplay`, `loop`, and `volume`; endpoints accept, persist, and return these fields for video events. Events reference uploaded media via `event_media_id`.
- Settings UI: Settings page refactored to nested tabs; added Events → Videos defaults (autoplay, loop, volume, mute) backed by system settings keys (`video_autoplay`, `video_loop`, `video_volume`, `video_muted`).
- Academic Calendar UI: Merged “School Holidays Import” and “List” into a single “📥 Import & Liste” tab; nested tab selection is persisted with controlled `selectedItem` state to avoid jumps.
- Monitoring system: End-to-end monitoring is now implemented. The listener ingests `logs/*` and `health` MQTT topics, the API exposes monitoring endpoints (`/api/client-logs/monitoring-overview`, `/api/client-logs/recent-errors`, `/api/client-logs/<uuid>/logs`), and the superadmin dashboard page shows live client status, screenshots, and recent errors.
- Screenshot priority flow: Screenshot payloads now support `screenshot_type` (`periodic`, `event_start`, `event_stop`). `event_start` and `event_stop` are treated as high-priority screenshots; the API stores typed screenshots, maintains priority metadata, and serves active priority screenshots through `/screenshots/{uuid}/priority`.
- MQTT dashboard payload v2 cutover: Listener parsing is now v2-only for dashboard JSON payloads (`message/content/runtime/metadata`). Legacy top-level dashboard fallback has been removed after migration completion; parser observability tracks `v2_success` and `parse_failures`.
- Presentation persistence fix: Fixed persistence of presentation flags so `page_progress` and `auto_progress` are reliably stored and returned for create/update flows and detached occurrences.
- Additional improvements: Video/streaming, scheduler metadata, settings defaults, and UI refinements remain documented in the detailed sections below.
These changes are designed to be safe if metadata extraction or probes fail — clients should still attempt playback using the provided `url` and fall back to requesting/resolving richer metadata when available.
See `MQTT_EVENT_PAYLOAD_GUIDE.md` for details.
- `infoscreen/{uuid}/group_id` - Client group assignment
## 🧩 Developer Environment Notes (Dev Container)
- Extensions: UI-only `Dev Containers` runs on the host UI; not installed inside the container to avoid reinstallation loops. See `/.devcontainer/devcontainer.json` (`remote.extensionKind`).
@@ -345,8 +344,9 @@ mosquitto_sub -h localhost -t "infoscreen/+/heartbeat" -v
- `POST /api/conversions/{media_id}/pdf` - Request conversion
- `GET /api/conversions/{media_id}/status` - Check conversion status
- `GET /api/eventmedia/stream/<media_id>/<filename>` - Stream media with byte-range support (206) for seeking
- `POST /api/clients/{uuid}/screenshot` - Upload screenshot for client (base64 JPEG)
- **Screenshot retention:** Only the latest and last 20 timestamped screenshots per client are stored on the server. Older screenshots are automatically deleted.
- `POST /api/clients/{uuid}/screenshot` - Upload screenshot for client (base64 JPEG, optional `timestamp`, optional `screenshot_type` = `periodic|event_start|event_stop`)
- **Screenshot retention:** The API stores `{uuid}.jpg` as latest plus the last 20 timestamped screenshots per client; older timestamped files are deleted automatically.
- **Priority screenshots:** For `event_start`/`event_stop`, the API also keeps `{uuid}_priority.jpg` and metadata (`{uuid}_meta.json`) used by monitoring priority selection.
### System Settings
- `GET /api/system-settings` - List all system settings (admin+)
@@ -380,7 +380,11 @@ mosquitto_sub -h localhost -t "infoscreen/+/heartbeat" -v
### Health & Monitoring
- `GET /health` - Service health check
- `GET /api/screenshots/{uuid}.jpg` - Client screenshots
- `GET /screenshots/{uuid}.jpg` - Latest client screenshot
- `GET /screenshots/{uuid}/priority` - Active high-priority screenshot (falls back to latest)
- `GET /api/client-logs/monitoring-overview` - Aggregated monitoring overview for dashboard (superadmin)
- `GET /api/client-logs/recent-errors` - Recent error feed across clients (admin+)
- `GET /api/client-logs/{uuid}/logs` - Filtered per-client logs (admin+)
## 🎨 Frontend Features
@@ -444,6 +448,11 @@ mosquitto_sub -h localhost -t "infoscreen/+/heartbeat" -v
- Real-time event status: shows currently running events with type, title, and time window
- Filters out unassigned groups for focused view
- Resource-based Syncfusion timeline scheduler with resize and drag-drop support
- **Monitoring**: Superadmin-only monitoring dashboard
- Live client health states (`healthy`, `warning`, `critical`, `offline`) from heartbeat/process/log data
- Latest screenshot preview with screenshot-type badges (`periodic`, `event_start`, `event_stop`) and process metadata per client
- Active priority screenshots are surfaced immediately and polled faster while priority items are active
- System-wide recent error stream and per-client log drill-down
- **Program info**: Version, build info, tech stack and paginated changelog (reads `dashboard/public/program-info.json`)
## 🔒 Security & Authentication
@@ -474,7 +483,8 @@ mosquitto_sub -h localhost -t "infoscreen/+/heartbeat" -v
- MQTT: Pub/sub functionality test
- Dashboard: Nginx availability
- **Scheduler**: Logging is concise; conversion lookups are cached and logged only once per media.
- Monitoring API: `/api/client-logs/monitoring-overview` and `/api/client-logs/recent-errors` for live diagnostics
- Monitoring overview includes screenshot priority state (`latestScreenshotType`, `priorityScreenshotType`, `priorityScreenshotReceivedAt`, `hasActivePriorityScreenshot`) and `summary.activePriorityScreenshots`
### Logging Strategy
- **Development**: Docker Compose logs with service prefixes
@@ -549,7 +559,6 @@ docker exec -it infoscreen-db mysqladmin ping
# Restart dependent services
```
**MQTT communication issues**
**Vite import-analysis errors (Syncfusion splitbuttons)**
```bash
# Symptom
@@ -565,6 +574,8 @@ docker compose rm -sf dashboard
docker volume rm <project>_dashboard-node-modules <project>_dashboard-vite-cache || true
docker compose up -d --build dashboard
```
**MQTT communication issues**
```bash
# Test MQTT broker
mosquitto_pub -h localhost -t test -m "hello"

View File

@@ -56,6 +56,58 @@ Notes for integrators:
- CSS follows modern Material 3 color-function notation (`rgb(r g b / alpha%)`)
- Syncfusion ScheduleComponent requires TimelineViews, Resize, and DragAndDrop modules injected
Backend technical work (post-release notes; no version bump):
- 📊 **Client Monitoring Infrastructure (Server-Side) (2026-03-10)**:
- Database schema: New Alembic migration `c1d2e3f4g5h6_add_client_monitoring.py` (idempotent) adds:
- `client_logs` table: Stores centralized logs with columns (id, client_uuid, timestamp, level, message, context, created_at)
- Foreign key: `client_logs.client_uuid` → `clients.uuid` (ON DELETE CASCADE)
- Health monitoring columns added to `clients` table: `current_event_id`, `current_process`, `process_status`, `process_pid`, `last_screenshot_analyzed`, `screen_health_status`, `last_screenshot_hash`
- Indexes for performance: (client_uuid, timestamp DESC), (level, timestamp DESC), (created_at DESC)
- Data models (`models/models.py`):
- New enums: `LogLevel` (ERROR, WARN, INFO, DEBUG), `ProcessStatus` (running, crashed, starting, stopped), `ScreenHealthStatus` (OK, BLACK, FROZEN, UNKNOWN)
- New model: `ClientLog` with foreign key to `Client` (CASCADE on delete)
- Extended `Client` model with 7 health monitoring fields
- MQTT listener extensions (`listener/listener.py`):
- New topic subscriptions: `infoscreen/+/logs/error`, `infoscreen/+/logs/warn`, `infoscreen/+/logs/info`, `infoscreen/+/health`
- Log handler: Parses JSON payloads, creates `ClientLog` entries, validates client UUID exists (FK constraint)
- Health handler: Updates client state from MQTT health messages
- Enhanced heartbeat handler: Captures `process_status`, `current_process`, `process_pid`, `current_event_id` from payload
- API endpoints (`server/routes/client_logs.py`):
- `GET /api/client-logs/<uuid>/logs`: Retrieve client logs with filters (level, limit, since); authenticated (admin_or_higher)
- `GET /api/client-logs/summary`: Get log counts by level per client for the last 24h; authenticated (admin_or_higher)
- `GET /api/client-logs/monitoring-overview`: Aggregated monitoring overview for dashboard clients/statuses; authenticated (admin_or_higher)
- `GET /api/client-logs/recent-errors`: System-wide error monitoring; authenticated (admin_or_higher)
- `GET /api/client-logs/test`: Infrastructure validation endpoint (no auth required)
- Blueprint registered in `server/wsgi.py` as `client_logs_bp`
- Dev environment fix: Updated `docker-compose.override.yml` listener service to use `working_dir: /workspace` and direct command path for live code reload
- 🖥️ **Monitoring Dashboard Integration (2026-03-24)**:
- Frontend monitoring dashboard (`dashboard/src/monitoring.tsx`) is active and wired to monitoring APIs
- Superadmin-only route/menu integration completed in `dashboard/src/App.tsx`
- Added dashboard monitoring API client (`dashboard/src/apiClientMonitoring.ts`) for overview and recent errors
- 🐛 **Presentation Flags Persistence Fix (2026-03-24)**:
- Fixed persistence for presentation flags `page_progress` and `auto_progress` across create/update and detached-occurrence flows
- API serialization now reliably returns stored values for presentation behavior fields
- 📡 **MQTT Protocol Extensions**:
- New log topics: `infoscreen/{uuid}/logs/{error|warn|info}` with JSON payload (timestamp, message, context)
- New health topic: `infoscreen/{uuid}/health` with metrics (expected_state, actual_state, health_metrics)
- Enhanced heartbeat: `infoscreen/{uuid}/heartbeat` now includes `current_process`, `process_pid`, `process_status`, `current_event_id`
- QoS levels: ERROR/WARN logs use QoS 1 (at least once), INFO/health use QoS 0 (fire and forget)
- 📖 **Documentation**:
- New file: `CLIENT_MONITORING_SPECIFICATION.md`, a comprehensive 20-section technical spec for client-side implementation (MQTT protocol, process monitoring, auto-recovery, payload formats, testing guide)
- New file: `CLIENT_MONITORING_IMPLEMENTATION_GUIDE.md`, a 5-phase implementation guide (database, backend, client watchdog, dashboard UI, testing)
- Updated `.github/copilot-instructions.md`: Added MQTT topics section, client monitoring integration notes
- **Validation**:
- End-to-end testing completed: MQTT message → listener → database → API confirmed working
- Test flow: Published message to `infoscreen/{real-uuid}/logs/error` → listener logs showed receipt → database stored entry → test API returned log data
- Known client UUIDs validated: 9b8d1856-ff34-4864-a726-12de072d0f77, 7f65c615-5827-4ada-9ac8-4727c2e8ee55, bdbfff95-0b2b-4265-8cc7-b0284509540a
Notes for integrators:
- Tiered logging strategy: ERROR/WARN always centralized (QoS 1), INFO dev-only (QoS 0), DEBUG local-only
- Monitoring dashboard is implemented and consumes `/api/client-logs/monitoring-overview`, `/api/client-logs/recent-errors`, and `/api/client-logs/<uuid>/logs`
- Foreign key constraint prevents logging for non-existent clients (data integrity enforced)
- Migration is idempotent and can be safely rerun after interruption
- Use `GET /api/client-logs/test` for quick infrastructure validation without authentication
## 2025.1.0-beta.1 (TBD)
- 🔐 **User Management & Role-Based Access Control**:
- Backend: Implemented comprehensive user management API (`server/routes/users.py`) with 6 endpoints (GET, POST, PUT, DELETE users + password reset).

View File

@@ -1,5 +1,5 @@
import React, { useState } from 'react';
import { BrowserRouter as Router, Routes, Route, Link, Outlet, useNavigate } from 'react-router-dom';
import { BrowserRouter as Router, Routes, Route, Link, Outlet, useNavigate, Navigate } from 'react-router-dom';
import { SidebarComponent } from '@syncfusion/ej2-react-navigations';
import { ButtonComponent } from '@syncfusion/ej2-react-buttons';
import { DropDownButtonComponent } from '@syncfusion/ej2-react-splitbuttons';
@@ -19,6 +19,7 @@ import {
Settings,
Monitor,
MonitorDotIcon,
Activity,
LogOut,
Wrench,
Info,
@@ -31,6 +32,7 @@ const sidebarItems = [
{ name: 'Ressourcen', path: '/ressourcen', icon: Boxes, minRole: 'editor' },
{ name: 'Raumgruppen', path: '/infoscr_groups', icon: MonitorDotIcon, minRole: 'admin' },
{ name: 'Infoscreen-Clients', path: '/clients', icon: Monitor, minRole: 'admin' },
{ name: 'Monitor-Dashboard', path: '/monitoring', icon: Activity, minRole: 'superadmin' },
{ name: 'Erweiterungsmodus', path: '/setup', icon: Wrench, minRole: 'admin' },
{ name: 'Medien', path: '/medien', icon: Image, minRole: 'editor' },
{ name: 'Benutzer', path: '/benutzer', icon: User, minRole: 'admin' },
@@ -49,6 +51,7 @@ import Benutzer from './users';
import Einstellungen from './settings';
import SetupMode from './SetupMode';
import Programminfo from './programminfo';
import MonitoringDashboard from './monitoring';
import Logout from './logout';
import Login from './login';
import { useAuth } from './useAuth';
@@ -436,7 +439,7 @@ const Layout: React.FC = () => {
type="password"
placeholder="Aktuelles Passwort"
value={pwdCurrent}
input={(e: any) => setPwdCurrent(e.value)}
input={(e: { value?: string }) => setPwdCurrent(e.value ?? '')}
disabled={pwdBusy}
/>
</div>
@@ -446,7 +449,7 @@ const Layout: React.FC = () => {
type="password"
placeholder="Mindestens 6 Zeichen"
value={pwdNew}
input={(e: any) => setPwdNew(e.value)}
input={(e: { value?: string }) => setPwdNew(e.value ?? '')}
disabled={pwdBusy}
/>
</div>
@@ -456,7 +459,7 @@ const Layout: React.FC = () => {
type="password"
placeholder="Wiederholen"
value={pwdConfirm}
input={(e: any) => setPwdConfirm(e.value)}
input={(e: { value?: string }) => setPwdConfirm(e.value ?? '')}
disabled={pwdBusy}
/>
</div>
@@ -480,6 +483,14 @@ const App: React.FC = () => {
return <>{children}</>;
};
const RequireSuperadmin: React.FC<{ children: React.ReactNode }> = ({ children }) => {
const { isAuthenticated, loading, user } = useAuth();
if (loading) return <div style={{ padding: 24 }}>Lade ...</div>;
if (!isAuthenticated) return <Login />;
if (user?.role !== 'superadmin') return <Navigate to="/" replace />;
return <>{children}</>;
};
return (
<ToastProvider>
<Routes>
@@ -499,6 +510,14 @@ const App: React.FC = () => {
<Route path="benutzer" element={<Benutzer />} />
<Route path="einstellungen" element={<Einstellungen />} />
<Route path="clients" element={<Infoscreens />} />
<Route
path="monitoring"
element={
<RequireSuperadmin>
<MonitoringDashboard />
</RequireSuperadmin>
}
/>
<Route path="setup" element={<SetupMode />} />
<Route path="programminfo" element={<Programminfo />} />
</Route>

View File

@@ -0,0 +1,111 @@
export interface MonitoringLogEntry {
  id: number;
  timestamp: string | null;
  level: 'ERROR' | 'WARN' | 'INFO' | 'DEBUG' | null;
  message: string;
  context: Record<string, unknown>;
  client_uuid?: string;
}
export interface MonitoringClient {
  uuid: string;
  hostname?: string | null;
  description?: string | null;
  ip?: string | null;
  model?: string | null;
  groupId?: number | null;
  groupName?: string | null;
  registrationTime?: string | null;
  lastAlive?: string | null;
  isAlive: boolean;
  status: 'healthy' | 'warning' | 'critical' | 'offline';
  currentEventId?: number | null;
  currentProcess?: string | null;
  processStatus?: string | null;
  processPid?: number | null;
  screenHealthStatus?: string | null;
  lastScreenshotAnalyzed?: string | null;
  lastScreenshotHash?: string | null;
  latestScreenshotType?: 'periodic' | 'event_start' | 'event_stop' | null;
  priorityScreenshotType?: 'event_start' | 'event_stop' | null;
  priorityScreenshotReceivedAt?: string | null;
  hasActivePriorityScreenshot?: boolean;
  screenshotUrl: string;
  logCounts24h: {
    error: number;
    warn: number;
    info: number;
    debug: number;
  };
  latestLog?: MonitoringLogEntry | null;
  latestError?: MonitoringLogEntry | null;
}
export interface MonitoringOverview {
  summary: {
    totalClients: number;
    onlineClients: number;
    offlineClients: number;
    healthyClients: number;
    warningClients: number;
    criticalClients: number;
    errorLogs: number;
    warnLogs: number;
    activePriorityScreenshots: number;
  };
  periodHours: number;
  gracePeriodSeconds: number;
  since: string;
  timestamp: string;
  clients: MonitoringClient[];
}
export interface ClientLogsResponse {
  client_uuid: string;
  logs: MonitoringLogEntry[];
  count: number;
  limit: number;
}
async function parseJsonResponse<T>(response: Response, fallbackMessage: string): Promise<T> {
  const data = await response.json();
  if (!response.ok) {
    throw new Error(data.error || fallbackMessage);
  }
  return data as T;
}
export async function fetchMonitoringOverview(hours = 24): Promise<MonitoringOverview> {
  const response = await fetch(`/api/client-logs/monitoring-overview?hours=${hours}`, {
    credentials: 'include',
  });
  return parseJsonResponse<MonitoringOverview>(response, 'Fehler beim Laden der Monitoring-Übersicht');
}
export async function fetchRecentClientErrors(limit = 20): Promise<MonitoringLogEntry[]> {
  const response = await fetch(`/api/client-logs/recent-errors?limit=${limit}`, {
    credentials: 'include',
  });
  const data = await parseJsonResponse<{ errors: MonitoringLogEntry[] }>(
    response,
    'Fehler beim Laden der letzten Fehler'
  );
  return data.errors;
}
export async function fetchClientMonitoringLogs(
  uuid: string,
  options: { level?: string; limit?: number } = {}
): Promise<MonitoringLogEntry[]> {
  const params = new URLSearchParams();
  if (options.level && options.level !== 'ALL') {
    params.set('level', options.level);
  }
  params.set('limit', String(options.limit ?? 100));
  const response = await fetch(`/api/client-logs/${uuid}/logs?${params.toString()}`, {
    credentials: 'include',
  });
  const data = await parseJsonResponse<ClientLogsResponse>(response, 'Fehler beim Laden der Client-Logs');
  return data.logs;
}

View File

@@ -523,28 +523,10 @@ const Appointments: React.FC = () => {
}, [holidays, allowScheduleOnHolidays]);
const dataSource = useMemo(() => {
// Filter: Events with SkipHolidays=true (from internal Event type) are never shown on holidays
const filteredEvents = events.filter(ev => {
if (ev.SkipHolidays) {
// If event falls within a holiday, hide it
const s = ev.StartTime instanceof Date ? ev.StartTime : new Date(ev.StartTime);
const e = ev.EndTime instanceof Date ? ev.EndTime : new Date(ev.EndTime);
for (const h of holidays) {
const hs = new Date(h.start_date + 'T00:00:00');
const he = new Date(h.end_date + 'T23:59:59');
if (
(s >= hs && s <= he) ||
(e >= hs && e <= he) ||
(s <= hs && e >= he)
) {
return false;
}
}
}
return true;
});
return [...filteredEvents, ...holidayDisplayEvents, ...holidayBlockEvents];
}, [events, holidayDisplayEvents, holidayBlockEvents, holidays]);
// Existing events should always be visible; holiday skipping for recurring events
// is handled via RecurrenceException from the backend.
return [...events, ...holidayDisplayEvents, ...holidayBlockEvents];
}, [events, holidayDisplayEvents, holidayBlockEvents]);
// Removed dataSource logging
@@ -1227,37 +1209,6 @@ const Appointments: React.FC = () => {
}
}}
eventRendered={(args: EventRenderedArgs) => {
// Always hide events that skip holidays when they fall on holidays, regardless of toggle
if (args.data) {
const ev = args.data as unknown as Partial<Event>;
if (ev.SkipHolidays && !args.data.isHoliday) {
const s =
args.data.StartTime instanceof Date
? args.data.StartTime
: new Date(args.data.StartTime);
const e =
args.data.EndTime instanceof Date ? args.data.EndTime : new Date(args.data.EndTime);
if (isWithinHolidayRange(s, e)) {
args.cancel = true;
return;
}
}
}
// Blende Nicht-Ferien-Events aus, falls sie in Ferien fallen und Terminieren nicht erlaubt ist
// Hide events on holidays if not allowed
if (!allowScheduleOnHolidays && args.data && !args.data.isHoliday) {
const s =
args.data.StartTime instanceof Date
? args.data.StartTime
: new Date(args.data.StartTime);
const e =
args.data.EndTime instanceof Date ? args.data.EndTime : new Date(args.data.EndTime);
if (isWithinHolidayRange(s, e)) {
args.cancel = true;
return;
}
}
if (selectedGroupId && args.data && args.data.Id) {
const groupColor = getGroupColor(selectedGroupId, groups);

View File

@@ -0,0 +1,373 @@
.monitoring-page {
display: flex;
flex-direction: column;
gap: 1.25rem;
padding: 0.5rem 0.25rem 1rem;
}
.monitoring-header-row {
display: flex;
justify-content: space-between;
align-items: flex-start;
gap: 1rem;
flex-wrap: wrap;
}
.monitoring-title {
margin: 0;
font-size: 1.75rem;
font-weight: 700;
color: #5c4318;
}
.monitoring-subtitle {
margin: 0.35rem 0 0;
color: #6b7280;
max-width: 60ch;
}
.monitoring-toolbar {
display: flex;
align-items: end;
gap: 0.75rem;
flex-wrap: wrap;
}
.monitoring-toolbar-field {
display: flex;
flex-direction: column;
gap: 0.35rem;
min-width: 190px;
}
.monitoring-toolbar-field-compact {
min-width: 160px;
}
.monitoring-toolbar-field label {
font-size: 0.875rem;
font-weight: 600;
color: #5b4b32;
}
.monitoring-meta-row {
display: flex;
gap: 1rem;
flex-wrap: wrap;
color: #6b7280;
font-size: 0.92rem;
}
.monitoring-summary-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(180px, 1fr));
gap: 1rem;
}
.monitoring-metric-card {
overflow: hidden;
}
.monitoring-metric-content {
display: flex;
flex-direction: column;
gap: 0.35rem;
}
.monitoring-metric-title {
font-size: 0.9rem;
font-weight: 600;
color: #6b7280;
}
.monitoring-metric-value {
font-size: 2rem;
font-weight: 700;
color: #1f2937;
line-height: 1;
}
.monitoring-metric-subtitle {
font-size: 0.85rem;
color: #64748b;
}
.monitoring-main-grid {
display: grid;
grid-template-columns: minmax(0, 2fr) minmax(320px, 1fr);
gap: 1rem;
align-items: start;
}
.monitoring-sidebar-column {
display: flex;
flex-direction: column;
gap: 1rem;
}
.monitoring-panel {
background: #fff;
border: 1px solid #e5e7eb;
border-radius: 16px;
padding: 1.1rem;
box-shadow: 0 12px 40px rgb(120 89 28 / 8%);
}
.monitoring-clients-panel {
min-width: 0;
}
.monitoring-panel-header {
display: flex;
justify-content: space-between;
align-items: center;
gap: 0.75rem;
margin-bottom: 0.85rem;
}
.monitoring-panel-header-stacked {
align-items: end;
flex-wrap: wrap;
}
.monitoring-panel-header h3 {
margin: 0;
font-size: 1.1rem;
font-weight: 700;
}
.monitoring-panel-header span {
color: #6b7280;
font-size: 0.9rem;
}
.monitoring-detail-card .e-card-content {
padding-top: 0;
}
.monitoring-detail-list {
display: flex;
flex-direction: column;
gap: 0.75rem;
}
.monitoring-detail-row {
display: flex;
justify-content: space-between;
gap: 1rem;
align-items: flex-start;
border-bottom: 1px solid #f1f5f9;
padding-bottom: 0.55rem;
}
.monitoring-detail-row span {
color: #64748b;
font-size: 0.9rem;
}
.monitoring-detail-row strong {
text-align: right;
color: #111827;
}
.monitoring-status-badge {
display: inline-flex;
align-items: center;
justify-content: center;
padding: 0.22rem 0.6rem;
border-radius: 999px;
font-weight: 700;
font-size: 0.78rem;
letter-spacing: 0.01em;
}
.monitoring-screenshot {
width: 100%;
border-radius: 12px;
border: 1px solid #e5e7eb;
background: linear-gradient(135deg, #f8fafc, #e2e8f0);
min-height: 180px;
object-fit: cover;
}
.monitoring-screenshot-meta {
margin-top: 0.55rem;
font-size: 0.88rem;
color: #64748b;
display: flex;
flex-direction: column;
gap: 0.35rem;
}
.monitoring-shot-type {
display: inline-flex;
align-items: center;
border-radius: 999px;
padding: 0.15rem 0.55rem;
font-size: 0.78rem;
font-weight: 700;
}
.monitoring-shot-type-periodic {
background: #e2e8f0;
color: #334155;
}
.monitoring-shot-type-event {
background: #ffedd5;
color: #9a3412;
}
.monitoring-shot-type-active {
box-shadow: 0 0 0 2px #fdba74;
}
.monitoring-error-box {
display: flex;
flex-direction: column;
gap: 0.5rem;
padding: 0.85rem;
border-radius: 12px;
background: linear-gradient(135deg, #fff1f2, #fee2e2);
border: 1px solid #fecdd3;
}
.monitoring-error-time {
color: #9f1239;
font-size: 0.85rem;
font-weight: 600;
}
.monitoring-error-message {
color: #4c0519;
font-weight: 600;
}
.monitoring-mono {
font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, 'Liberation Mono', 'Courier New', monospace;
font-size: 0.85rem;
}
.monitoring-log-detail-row {
display: flex;
justify-content: space-between;
gap: 1rem;
align-items: flex-start;
border-bottom: 1px solid #f1f5f9;
padding-bottom: 0.55rem;
}
.monitoring-log-detail-row span {
color: #64748b;
font-size: 0.9rem;
}
.monitoring-log-detail-row strong {
text-align: right;
color: #111827;
}
.monitoring-log-context {
margin: 0;
background: #f8fafc;
border: 1px solid #e2e8f0;
border-radius: 10px;
padding: 0.75rem;
white-space: pre-wrap;
overflow-wrap: anywhere;
max-height: 280px;
overflow: auto;
font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, 'Liberation Mono', 'Courier New', monospace;
font-size: 0.84rem;
color: #0f172a;
}
.monitoring-log-dialog-content {
display: flex;
flex-direction: column;
gap: 1rem;
padding: 0.9rem 1rem 0.55rem;
}
.monitoring-log-dialog-body {
min-height: 340px;
display: flex;
flex-direction: column;
justify-content: space-between;
}
.monitoring-log-dialog-actions {
margin-top: 0.5rem;
padding: 0 1rem 0.9rem;
display: flex;
justify-content: flex-end;
}
.monitoring-log-context-title {
font-weight: 600;
margin-bottom: 0.55rem;
}
.monitoring-log-dialog-content .monitoring-log-detail-row {
padding: 0.1rem 0 0.75rem;
}
.monitoring-log-dialog-content .monitoring-log-context {
padding: 0.95rem;
border-radius: 12px;
}
.monitoring-lower-grid {
display: grid;
grid-template-columns: repeat(2, minmax(0, 1fr));
gap: 1rem;
}
@media (width <= 1200px) {
.monitoring-main-grid,
.monitoring-lower-grid {
grid-template-columns: 1fr;
}
}
@media (width <= 720px) {
.monitoring-page {
padding: 0.25rem 0 0.75rem;
}
.monitoring-title {
font-size: 1.5rem;
}
.monitoring-header-row,
.monitoring-panel-header,
.monitoring-detail-row,
.monitoring-log-detail-row {
flex-direction: column;
align-items: flex-start;
}
.monitoring-detail-row strong,
.monitoring-log-detail-row strong {
text-align: left;
}
.monitoring-toolbar,
.monitoring-toolbar-field,
.monitoring-toolbar-field-compact {
width: 100%;
}
.monitoring-log-dialog-content {
padding: 0.4rem 0.2rem 0.1rem;
gap: 0.75rem;
}
.monitoring-log-dialog-body {
min-height: 300px;
}
.monitoring-log-dialog-actions {
padding: 0 0.2rem 0.4rem;
}
}

View File: MonitoringDashboard.tsx

@@ -0,0 +1,573 @@
import React from 'react';
import {
fetchClientMonitoringLogs,
fetchMonitoringOverview,
fetchRecentClientErrors,
type MonitoringClient,
type MonitoringLogEntry,
type MonitoringOverview,
} from './apiClientMonitoring';
import { useAuth } from './useAuth';
import { ButtonComponent } from '@syncfusion/ej2-react-buttons';
import { DropDownListComponent } from '@syncfusion/ej2-react-dropdowns';
import {
GridComponent,
ColumnsDirective,
ColumnDirective,
Inject,
Page,
Search,
Sort,
Toolbar,
} from '@syncfusion/ej2-react-grids';
import { MessageComponent } from '@syncfusion/ej2-react-notifications';
import { DialogComponent } from '@syncfusion/ej2-react-popups';
import './monitoring.css';
const REFRESH_INTERVAL_MS = 15000;
const PRIORITY_REFRESH_INTERVAL_MS = 3000;
const hourOptions = [
{ text: 'Letzte 6 Stunden', value: 6 },
{ text: 'Letzte 24 Stunden', value: 24 },
{ text: 'Letzte 72 Stunden', value: 72 },
{ text: 'Letzte 168 Stunden', value: 168 },
];
const logLevelOptions = [
{ text: 'Alle Logs', value: 'ALL' },
{ text: 'ERROR', value: 'ERROR' },
{ text: 'WARN', value: 'WARN' },
{ text: 'INFO', value: 'INFO' },
{ text: 'DEBUG', value: 'DEBUG' },
];
const statusPalette: Record<string, { label: string; color: string; background: string }> = {
healthy: { label: 'Stabil', color: '#166534', background: '#dcfce7' },
warning: { label: 'Warnung', color: '#92400e', background: '#fef3c7' },
critical: { label: 'Kritisch', color: '#991b1b', background: '#fee2e2' },
offline: { label: 'Offline', color: '#334155', background: '#e2e8f0' },
};
function parseUtcDate(value?: string | null): Date | null {
if (!value) return null;
const trimmed = value.trim();
if (!trimmed) return null;
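// Timestamps without an explicit timezone offset are treated as UTC by appending 'Z'.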
const hasTimezone = /[zZ]$|[+-]\d{2}:?\d{2}$/.test(trimmed);
const utcValue = hasTimezone ? trimmed : `${trimmed}Z`;
const parsed = new Date(utcValue);
if (Number.isNaN(parsed.getTime())) return null;
return parsed;
}
function formatTimestamp(value?: string | null): string {
if (!value) return 'Keine Daten';
const date = parseUtcDate(value);
if (!date) return value;
return date.toLocaleString('de-DE');
}
function formatRelative(value?: string | null): string {
if (!value) return 'Keine Daten';
const date = parseUtcDate(value);
if (!date) return 'Unbekannt';
const diffMs = Date.now() - date.getTime();
const diffMinutes = Math.floor(diffMs / 60000);
const diffHours = Math.floor(diffMinutes / 60);
const diffDays = Math.floor(diffHours / 24);
if (diffMinutes < 1) return 'gerade eben';
if (diffMinutes < 60) return `vor ${diffMinutes} Min.`;
if (diffHours < 24) return `vor ${diffHours} Std.`;
return `vor ${diffDays} Tag${diffDays === 1 ? '' : 'en'}`;
}
function statusBadge(status: string) {
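// Render a colored status pill; unknown statuses fall back to the offline palette.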
const palette = statusPalette[status] || statusPalette.offline;
return (
<span
className="monitoring-status-badge"
style={{ color: palette.color, backgroundColor: palette.background }}
>
{palette.label}
</span>
);
}
function screenshotTypeBadge(type?: string | null, hasPriority = false) {
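// Map screenshot_type to a badge; unknown or missing types fall back to the periodic style.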
const normalized = (type || 'periodic').toLowerCase();
const map: Record<string, { label: string; className: string }> = {
periodic: { label: 'Periodisch', className: 'monitoring-shot-type-periodic' },
event_start: { label: 'Event-Start', className: 'monitoring-shot-type-event' },
event_stop: { label: 'Event-Stopp', className: 'monitoring-shot-type-event' },
};
const info = map[normalized] || map.periodic;
const classes = `monitoring-shot-type ${info.className}${hasPriority ? ' monitoring-shot-type-active' : ''}`;
return <span className={classes}>{info.label}</span>;
}
function renderMetricCard(title: string, value: number, subtitle: string, accent: string) {
return (
<div className="e-card monitoring-metric-card" style={{ borderTop: `4px solid ${accent}` }}>
<div className="e-card-content monitoring-metric-content">
<div className="monitoring-metric-title">{title}</div>
<div className="monitoring-metric-value">{value}</div>
<div className="monitoring-metric-subtitle">{subtitle}</div>
</div>
</div>
);
}
function renderContext(context?: Record<string, unknown>): string {
if (!context || Object.keys(context).length === 0) {
return 'Kein Kontext vorhanden';
}
try {
return JSON.stringify(context, null, 2);
} catch {
return 'Kontext konnte nicht formatiert werden';
}
}
function buildScreenshotUrl(client: MonitoringClient, overviewTimestamp?: string | null): string {
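// Cache-busting: append a version param derived from the newest screenshot hash or timestamp so the browser refetches when the image changes.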
const refreshKey = client.lastScreenshotHash || client.lastScreenshotAnalyzed || overviewTimestamp;
if (!refreshKey) {
return client.screenshotUrl;
}
const separator = client.screenshotUrl.includes('?') ? '&' : '?';
return `${client.screenshotUrl}${separator}v=${encodeURIComponent(refreshKey)}`;
}
const MonitoringDashboard: React.FC = () => {
const { user } = useAuth();
const [hours, setHours] = React.useState<number>(24);
const [logLevel, setLogLevel] = React.useState<string>('ALL');
const [overview, setOverview] = React.useState<MonitoringOverview | null>(null);
const [recentErrors, setRecentErrors] = React.useState<MonitoringLogEntry[]>([]);
const [clientLogs, setClientLogs] = React.useState<MonitoringLogEntry[]>([]);
const [selectedClientUuid, setSelectedClientUuid] = React.useState<string | null>(null);
const [loading, setLoading] = React.useState<boolean>(true);
const [error, setError] = React.useState<string | null>(null);
const [logsLoading, setLogsLoading] = React.useState<boolean>(false);
const [screenshotErrored, setScreenshotErrored] = React.useState<boolean>(false);
const selectedClientUuidRef = React.useRef<string | null>(null);
const [selectedLogEntry, setSelectedLogEntry] = React.useState<MonitoringLogEntry | null>(null);
const selectedClient = React.useMemo<MonitoringClient | null>(() => {
if (!overview || !selectedClientUuid) return null;
return overview.clients.find(client => client.uuid === selectedClientUuid) || null;
}, [overview, selectedClientUuid]);
const selectedClientScreenshotUrl = React.useMemo<string | null>(() => {
if (!selectedClient) return null;
return buildScreenshotUrl(selectedClient, overview?.timestamp || null);
}, [selectedClient, overview?.timestamp]);
React.useEffect(() => {
selectedClientUuidRef.current = selectedClientUuid;
}, [selectedClientUuid]);
const loadOverview = React.useCallback(async (requestedHours: number, preserveSelection = true) => {
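// preserveSelection keeps the current client selected across refreshes; otherwise the first client in the new overview is selected.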
setLoading(true);
setError(null);
try {
const [overviewData, errorsData] = await Promise.all([
fetchMonitoringOverview(requestedHours),
fetchRecentClientErrors(25),
]);
setOverview(overviewData);
setRecentErrors(errorsData);
const currentSelection = selectedClientUuidRef.current;
const nextSelectedUuid =
preserveSelection && currentSelection && overviewData.clients.some(client => client.uuid === currentSelection)
? currentSelection
: overviewData.clients[0]?.uuid || null;
setSelectedClientUuid(nextSelectedUuid);
setScreenshotErrored(false);
} catch (loadError) {
setError(loadError instanceof Error ? loadError.message : 'Monitoring-Daten konnten nicht geladen werden');
} finally {
setLoading(false);
}
}, []);
React.useEffect(() => {
loadOverview(hours, false);
}, [hours, loadOverview]);
React.useEffect(() => {
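// Adaptive polling: refresh every 3 s while a priority screenshot is active, otherwise every 15 s.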
const hasActivePriorityScreenshots = (overview?.summary.activePriorityScreenshots || 0) > 0;
const intervalMs = hasActivePriorityScreenshots ? PRIORITY_REFRESH_INTERVAL_MS : REFRESH_INTERVAL_MS;
const intervalId = window.setInterval(() => {
loadOverview(hours);
}, intervalMs);
return () => window.clearInterval(intervalId);
}, [hours, loadOverview, overview?.summary.activePriorityScreenshots]);
React.useEffect(() => {
if (!selectedClientUuid) {
setClientLogs([]);
return;
}
let active = true;
const loadLogs = async () => {
setLogsLoading(true);
try {
const logs = await fetchClientMonitoringLogs(selectedClientUuid, { level: logLevel, limit: 100 });
if (active) {
setClientLogs(logs);
}
} catch (loadError) {
if (active) {
setClientLogs([]);
setError(loadError instanceof Error ? loadError.message : 'Client-Logs konnten nicht geladen werden');
}
} finally {
if (active) {
setLogsLoading(false);
}
}
};
loadLogs();
return () => {
active = false;
};
}, [selectedClientUuid, logLevel]);
React.useEffect(() => {
setScreenshotErrored(false);
}, [selectedClientUuid]);
if (!user || user.role !== 'superadmin') {
return (
<MessageComponent severity="Error" content="Dieses Monitoring-Dashboard ist nur für Superadministratoren sichtbar." />
);
}
const clientGridData = (overview?.clients || []).map(client => ({
...client,
displayName: client.description || client.hostname || client.uuid,
lastAliveDisplay: formatTimestamp(client.lastAlive),
currentProcessDisplay: client.currentProcess || 'kein Prozess',
processStatusDisplay: client.processStatus || 'unbekannt',
errorCount: client.logCounts24h.error,
warnCount: client.logCounts24h.warn,
}));
return (
<div className="monitoring-page">
<div className="monitoring-header-row">
<div>
<h2 className="monitoring-title">Monitor-Dashboard</h2>
<p className="monitoring-subtitle">
Live-Zustand der Infoscreen-Clients, Prozessstatus und zentrale Fehlerprotokolle.
</p>
</div>
<div className="monitoring-toolbar">
<div className="monitoring-toolbar-field">
<label>Zeitraum</label>
<DropDownListComponent
dataSource={hourOptions}
fields={{ text: 'text', value: 'value' }}
value={hours}
change={(args: { value: number }) => setHours(Number(args.value))}
/>
</div>
<ButtonComponent cssClass="e-primary" onClick={() => loadOverview(hours)} disabled={loading}>
Aktualisieren
</ButtonComponent>
</div>
</div>
{error && <MessageComponent severity="Error" content={error} />}
{overview && (
<div className="monitoring-meta-row">
<span>Stand: {formatTimestamp(overview.timestamp)}</span>
<span>Alive-Fenster: {overview.gracePeriodSeconds} Sekunden</span>
<span>Betrachtungszeitraum: {overview.periodHours} Stunden</span>
</div>
)}
<div className="monitoring-summary-grid">
{renderMetricCard('Clients gesamt', overview?.summary.totalClients || 0, 'Registrierte Displays', '#7c3aed')}
{renderMetricCard('Online', overview?.summary.onlineClients || 0, 'Heartbeat innerhalb der Grace-Periode', '#15803d')}
{renderMetricCard('Warnungen', overview?.summary.warningClients || 0, 'Warn-Logs oder Übergangszustände', '#d97706')}
{renderMetricCard('Kritisch', overview?.summary.criticalClients || 0, 'Crashs oder Fehler-Logs', '#dc2626')}
{renderMetricCard('Offline', overview?.summary.offlineClients || 0, 'Keine frischen Signale', '#475569')}
{renderMetricCard('Prioritäts-Screens', overview?.summary.activePriorityScreenshots || 0, 'Event-Start/Stop aktiv', '#ea580c')}
{renderMetricCard('Fehler-Logs', overview?.summary.errorLogs || 0, 'Im gewählten Zeitraum', '#b91c1c')}
</div>
{loading && !overview ? (
<MessageComponent severity="Info" content="Monitoring-Daten werden geladen ..." />
) : (
<div className="monitoring-main-grid">
<div className="monitoring-panel monitoring-clients-panel">
<div className="monitoring-panel-header">
<h3>Client-Zustand</h3>
<span>{overview?.clients.length || 0} Einträge</span>
</div>
<GridComponent
dataSource={clientGridData}
allowPaging={true}
pageSettings={{ pageSize: 10 }}
allowSorting={true}
toolbar={['Search']}
height={460}
rowSelected={(args: { data: MonitoringClient }) => {
setSelectedClientUuid(args.data.uuid);
}}
>
<ColumnsDirective>
<ColumnDirective
field="status"
headerText="Status"
width="120"
template={(props: MonitoringClient) => statusBadge(props.status)}
/>
<ColumnDirective field="displayName" headerText="Client" width="190" />
<ColumnDirective field="groupName" headerText="Gruppe" width="150" />
<ColumnDirective field="currentProcessDisplay" headerText="Prozess" width="130" />
<ColumnDirective field="processStatusDisplay" headerText="Prozessstatus" width="130" />
<ColumnDirective field="errorCount" headerText="ERROR" textAlign="Right" width="90" />
<ColumnDirective field="warnCount" headerText="WARN" textAlign="Right" width="90" />
<ColumnDirective field="lastAliveDisplay" headerText="Letztes Signal" width="170" />
</ColumnsDirective>
<Inject services={[Page, Search, Sort, Toolbar]} />
</GridComponent>
</div>
<div className="monitoring-sidebar-column">
<div className="e-card monitoring-detail-card">
<div className="e-card-header">
<div className="e-card-header-caption">
<div className="e-card-title">Aktiver Client</div>
</div>
</div>
<div className="e-card-content">
{selectedClient ? (
<div className="monitoring-detail-list">
<div className="monitoring-detail-row">
<span>Name</span>
<strong>{selectedClient.description || selectedClient.hostname || selectedClient.uuid}</strong>
</div>
<div className="monitoring-detail-row">
<span>Status</span>
<strong>{statusBadge(selectedClient.status)}</strong>
</div>
<div className="monitoring-detail-row">
<span>UUID</span>
<strong className="monitoring-mono">{selectedClient.uuid}</strong>
</div>
<div className="monitoring-detail-row">
<span>Raumgruppe</span>
<strong>{selectedClient.groupName || 'Nicht zugeordnet'}</strong>
</div>
<div className="monitoring-detail-row">
<span>Prozess</span>
<strong>{selectedClient.currentProcess || 'kein Prozess'}</strong>
</div>
<div className="monitoring-detail-row">
<span>PID</span>
<strong>{selectedClient.processPid || 'keine PID'}</strong>
</div>
<div className="monitoring-detail-row">
<span>Event-ID</span>
<strong>{selectedClient.currentEventId || 'keine Zuordnung'}</strong>
</div>
<div className="monitoring-detail-row">
<span>Letztes Signal</span>
<strong>{formatRelative(selectedClient.lastAlive)}</strong>
</div>
<div className="monitoring-detail-row">
<span>Bildschirmstatus</span>
<strong>{selectedClient.screenHealthStatus || 'UNKNOWN'}</strong>
</div>
<div className="monitoring-detail-row">
<span>Letzte Analyse</span>
<strong>{formatTimestamp(selectedClient.lastScreenshotAnalyzed)}</strong>
</div>
<div className="monitoring-detail-row">
<span>Screenshot-Typ</span>
<strong>
{screenshotTypeBadge(
selectedClient.latestScreenshotType,
!!selectedClient.hasActivePriorityScreenshot
)}
</strong>
</div>
{selectedClient.priorityScreenshotReceivedAt && (
<div className="monitoring-detail-row">
<span>Priorität empfangen</span>
<strong>{formatTimestamp(selectedClient.priorityScreenshotReceivedAt)}</strong>
</div>
)}
</div>
) : (
<MessageComponent severity="Info" content="Wählen Sie links einen Client aus." />
)}
</div>
</div>
<div className="e-card monitoring-detail-card">
<div className="e-card-header">
<div className="e-card-header-caption">
<div className="e-card-title">Der letzte Screenshot</div>
</div>
</div>
<div className="e-card-content">
{selectedClient ? (
<>
{screenshotErrored ? (
<MessageComponent severity="Warning" content="Für diesen Client liegt noch kein Screenshot vor." />
) : (
<img
src={selectedClientScreenshotUrl || selectedClient.screenshotUrl}
alt={`Screenshot ${selectedClient.uuid}`}
className="monitoring-screenshot"
onError={() => setScreenshotErrored(true)}
/>
)}
<div className="monitoring-screenshot-meta">
<span>Empfangen: {formatTimestamp(selectedClient.lastScreenshotAnalyzed)}</span>
<span>
Typ:{' '}
{screenshotTypeBadge(
selectedClient.latestScreenshotType,
!!selectedClient.hasActivePriorityScreenshot
)}
</span>
</div>
</>
) : (
<MessageComponent severity="Info" content="Kein Client ausgewählt." />
)}
</div>
</div>
<div className="e-card monitoring-detail-card">
<div className="e-card-header">
<div className="e-card-header-caption">
<div className="e-card-title">Letzter Fehler</div>
</div>
</div>
<div className="e-card-content">
{selectedClient?.latestError ? (
<div className="monitoring-error-box">
<div className="monitoring-error-time">{formatTimestamp(selectedClient.latestError.timestamp)}</div>
<div className="monitoring-error-message">{selectedClient.latestError.message}</div>
</div>
) : (
<MessageComponent severity="Success" content="Kein ERROR-Log für den ausgewählten Client gefunden." />
)}
</div>
</div>
</div>
</div>
)}
<div className="monitoring-lower-grid">
<div className="monitoring-panel">
<div className="monitoring-panel-header monitoring-panel-header-stacked">
<div>
<h3>Client-Logs</h3>
<span>{selectedClient ? `Client ${selectedClient.uuid}` : 'Kein Client ausgewählt'}</span>
</div>
<div className="monitoring-toolbar-field monitoring-toolbar-field-compact">
<label>Level</label>
<DropDownListComponent
dataSource={logLevelOptions}
fields={{ text: 'text', value: 'value' }}
value={logLevel}
change={(args: { value: string }) => setLogLevel(String(args.value))}
/>
</div>
</div>
{logsLoading && <MessageComponent severity="Info" content="Client-Logs werden geladen ..." />}
<GridComponent
dataSource={clientLogs}
allowPaging={true}
pageSettings={{ pageSize: 8 }}
allowSorting={true}
height={320}
rowSelected={(args: { data: MonitoringLogEntry }) => {
setSelectedLogEntry(args.data);
}}
>
<ColumnsDirective>
<ColumnDirective field="timestamp" headerText="Zeit" width="180" template={(props: MonitoringLogEntry) => formatTimestamp(props.timestamp)} />
<ColumnDirective field="level" headerText="Level" width="90" />
<ColumnDirective field="message" headerText="Nachricht" width="360" />
</ColumnsDirective>
<Inject services={[Page, Sort]} />
</GridComponent>
</div>
<div className="monitoring-panel">
<div className="monitoring-panel-header">
<h3>Letzte Fehler systemweit</h3>
<span>{recentErrors.length} Einträge</span>
</div>
<GridComponent dataSource={recentErrors} allowPaging={true} pageSettings={{ pageSize: 8 }} allowSorting={true} height={320}>
<ColumnsDirective>
<ColumnDirective field="timestamp" headerText="Zeit" width="180" template={(props: MonitoringLogEntry) => formatTimestamp(props.timestamp)} />
<ColumnDirective field="client_uuid" headerText="Client" width="220" />
<ColumnDirective field="message" headerText="Nachricht" width="360" />
</ColumnsDirective>
<Inject services={[Page, Sort]} />
</GridComponent>
</div>
</div>
<DialogComponent
isModal={true}
visible={!!selectedLogEntry}
width="860px"
minHeight="420px"
header="Log-Details"
animationSettings={{ effect: 'None' }}
buttons={[]}
showCloseIcon={true}
close={() => setSelectedLogEntry(null)}
>
{selectedLogEntry && (
<div className="monitoring-log-dialog-body">
<div className="monitoring-log-dialog-content">
<div className="monitoring-log-detail-row">
<span>Zeit</span>
<strong>{formatTimestamp(selectedLogEntry.timestamp)}</strong>
</div>
<div className="monitoring-log-detail-row">
<span>Level</span>
<strong>{selectedLogEntry.level || 'Unbekannt'}</strong>
</div>
<div className="monitoring-log-detail-row">
<span>Nachricht</span>
<strong style={{ whiteSpace: 'normal', textAlign: 'left' }}>{selectedLogEntry.message}</strong>
</div>
<div>
<div className="monitoring-log-context-title">Kontext</div>
<pre className="monitoring-log-context">{renderContext(selectedLogEntry.context)}</pre>
</div>
</div>
<div className="monitoring-log-dialog-actions">
<ButtonComponent onClick={() => setSelectedLogEntry(null)}>Schließen</ButtonComponent>
</div>
</div>
)}
</DialogComponent>
</div>
);
};
export default MonitoringDashboard;

View File: Ressourcen.tsx

@@ -33,7 +33,7 @@ const Ressourcen: React.FC = () => {
const [groupOrder, setGroupOrder] = useState<number[]>([]);
const [showOrderPanel, setShowOrderPanel] = useState<boolean>(false);
const [timelineView] = useState<TimelineView>('day');
const [viewDate] = useState<Date>(() => {
const [viewDate, setViewDate] = useState<Date>(() => {
const now = new Date();
now.setHours(0, 0, 0, 0);
return now;
@@ -110,23 +110,31 @@ const Ressourcen: React.FC = () => {
for (const group of groups) {
try {
console.log(`[Ressourcen] Fetching events for group "${group.name}" (ID: ${group.id})`);
const apiEvents = await fetchEvents(group.id.toString(), false, {
const apiEvents = await fetchEvents(group.id.toString(), true, {
start,
end,
});
console.log(`[Ressourcen] Got ${apiEvents?.length || 0} events for group "${group.name}"`);
if (Array.isArray(apiEvents) && apiEvents.length > 0) {
const event = apiEvents[0];
const eventTitle = event.subject || event.title || 'Unnamed Event';
const eventType = event.type || event.event_type || 'other';
const eventStart = event.startTime || event.start;
const eventEnd = event.endTime || event.end;
for (const event of apiEvents) {
const eventTitle = event.subject || event.title || 'Unnamed Event';
const eventType = event.type || event.event_type || 'other';
const eventStart = event.startTime || event.start;
const eventEnd = event.endTime || event.end;
if (!eventStart || !eventEnd) {
continue;
}
if (eventStart && eventEnd) {
const parsedStart = parseUTCDate(eventStart);
const parsedEnd = parseUTCDate(eventEnd);
// Keep only events that overlap the visible range.
if (parsedEnd < start || parsedStart > end) {
continue;
}
// Capitalize first letter of event type
const formattedType = eventType.charAt(0).toUpperCase() + eventType.slice(1);
@@ -138,7 +146,6 @@ const Ressourcen: React.FC = () => {
ResourceId: group.id,
EventType: eventType,
});
console.log(`[Ressourcen] Group "${group.name}" has event: ${eventTitle}`);
}
}
} catch (error) {
@@ -324,6 +331,16 @@ const Ressourcen: React.FC = () => {
group={{ resources: ['Groups'], allowGroupEdit: false }}
timeScale={{ interval: 60, slotCount: 1 }}
rowAutoHeight={false}
actionComplete={(args) => {
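// Keep viewDate in sync with toolbar navigation so events reload for the visible range.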
if (args.requestType === 'dateNavigate' || args.requestType === 'viewNavigate') {
const selected = scheduleRef.current?.selectedDate;
if (selected) {
const normalized = new Date(selected);
normalized.setHours(0, 0, 0, 0);
setViewDate(normalized);
}
}
}}
>
<ViewsDirective>
<ViewDirective option="TimelineDay" displayName="Tag"></ViewDirective>

View File: docker-compose.yml

@@ -18,8 +18,9 @@ services:
environment:
- DB_CONN=mysql+pymysql://${DB_USER}:${DB_PASSWORD}@db/${DB_NAME}
- DB_URL=mysql+pymysql://${DB_USER}:${DB_PASSWORD}@db/${DB_NAME}
- ENV=${ENV:-development}
- FLASK_SECRET_KEY=${FLASK_SECRET_KEY:-dev-secret-key-change-in-production}
- API_BASE_URL=http://server:8000
- ENV=${ENV:-development}
- FLASK_SECRET_KEY=${FLASK_SECRET_KEY:-dev-secret-key-change-in-production}
- DEFAULT_SUPERADMIN_USERNAME=${DEFAULT_SUPERADMIN_USERNAME:-superadmin}
- DEFAULT_SUPERADMIN_PASSWORD=${DEFAULT_SUPERADMIN_PASSWORD}
# 🔧 ENTFERNT: Volume-Mount ist nur für die Entwicklung

View File: listener/listener.py

@@ -3,15 +3,17 @@ import json
import logging
import datetime
import base64
import re
import requests
import paho.mqtt.client as mqtt
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from models.models import Client
from models.models import Client, ClientLog, LogLevel, ProcessStatus, ScreenHealthStatus
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s [%(levelname)s] %(message)s')
# Load .env in development
if os.getenv("ENV", "development") == "development":
# Load .env only when Docker has not already configured us: compose always sets API_BASE_URL, so its absence means we are running outside a container.
_api_already_set = bool(os.environ.get("API_BASE_URL"))
if not _api_already_set and os.getenv("ENV", "development") == "development":
try:
from dotenv import load_dotenv
load_dotenv(".env")
@@ -30,6 +32,288 @@ Session = sessionmaker(bind=engine)
# API configuration
API_BASE_URL = os.getenv("API_BASE_URL", "http://server:8000")
# Dashboard payload migration observability
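# Emit an aggregate metrics log line every N classified dashboard payloads (a non-positive value disables it).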
DASHBOARD_METRICS_LOG_EVERY = int(os.getenv("DASHBOARD_METRICS_LOG_EVERY", "5"))
DASHBOARD_PARSE_METRICS = {
"v2_success": 0,
"parse_failures": 0,
}
def normalize_process_status(value):
if value is None:
return None
if isinstance(value, ProcessStatus):
return value
normalized = str(value).strip().lower()
if not normalized:
return None
try:
return ProcessStatus(normalized)
except ValueError:
return None
def normalize_event_id(value):
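# Accept ints, digit strings, or identifiers with a numeric suffix such as "evt-7".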
if value is None or isinstance(value, bool):
return None
if isinstance(value, int):
return value
if isinstance(value, float):
return int(value)
normalized = str(value).strip()
if not normalized:
return None
if normalized.isdigit():
return int(normalized)
match = re.search(r"(\d+)$", normalized)
if match:
return int(match.group(1))
return None
def parse_timestamp(value):
if not value:
return None
if isinstance(value, (int, float)):
try:
ts_value = float(value)
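# Heuristic: epoch values above ~1e12 are milliseconds; convert them to seconds.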
if ts_value > 1e12:
ts_value = ts_value / 1000.0
return datetime.datetime.fromtimestamp(ts_value, datetime.UTC)
except (TypeError, ValueError, OverflowError):
return None
try:
value_str = str(value).strip()
if value_str.isdigit():
ts_value = float(value_str)
if ts_value > 1e12:
ts_value = ts_value / 1000.0
return datetime.datetime.fromtimestamp(ts_value, datetime.UTC)
parsed = datetime.datetime.fromisoformat(value_str.replace('Z', '+00:00'))
if parsed.tzinfo is None:
return parsed.replace(tzinfo=datetime.UTC)
return parsed.astimezone(datetime.UTC)
except ValueError:
return None
def infer_screen_health_status(payload_data):
explicit = payload_data.get('screen_health_status')
if explicit:
try:
return ScreenHealthStatus[str(explicit).strip().upper()]
except KeyError:
pass
metrics = payload_data.get('health_metrics') or {}
if metrics.get('screen_on') is False:
return ScreenHealthStatus.BLACK
last_frame_update = parse_timestamp(metrics.get('last_frame_update'))
if last_frame_update:
age_seconds = (datetime.datetime.now(datetime.UTC) - last_frame_update).total_seconds()
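# No fresh frame for more than 30 seconds counts as a frozen screen.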
if age_seconds > 30:
return ScreenHealthStatus.FROZEN
return ScreenHealthStatus.OK
return None
def apply_monitoring_update(client_obj, *, event_id=None, process_name=None, process_pid=None,
process_status=None, last_seen=None, screen_health_status=None,
last_screenshot_analyzed=None):
if last_seen:
client_obj.last_alive = last_seen
normalized_event_id = normalize_event_id(event_id)
if normalized_event_id is not None:
client_obj.current_event_id = normalized_event_id
if process_name is not None:
client_obj.current_process = process_name
if process_pid is not None:
client_obj.process_pid = process_pid
normalized_status = normalize_process_status(process_status)
if normalized_status is not None:
client_obj.process_status = normalized_status
if screen_health_status is not None:
client_obj.screen_health_status = screen_health_status
if last_screenshot_analyzed is not None:
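# Only advance the analysis timestamp; ignore screenshots that arrive out of order.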
existing = client_obj.last_screenshot_analyzed
if existing is not None and existing.tzinfo is None:
existing = existing.replace(tzinfo=datetime.UTC)
candidate = last_screenshot_analyzed
if candidate.tzinfo is None:
candidate = candidate.replace(tzinfo=datetime.UTC)
if existing is None or candidate >= existing:
client_obj.last_screenshot_analyzed = candidate
def _normalize_screenshot_type(raw_type):
if raw_type is None:
return None
normalized = str(raw_type).strip().lower()
if normalized in ("periodic", "event_start", "event_stop"):
return normalized
return None
def _classify_dashboard_payload(data):
"""
Classify dashboard payload into migration categories for observability.
"""
if not isinstance(data, dict):
return "parse_failures", None
message_obj = data.get("message") if isinstance(data.get("message"), dict) else None
content_obj = data.get("content") if isinstance(data.get("content"), dict) else None
metadata_obj = data.get("metadata") if isinstance(data.get("metadata"), dict) else None
schema_version = metadata_obj.get("schema_version") if metadata_obj else None
# v2 detection: grouped blocks available with metadata.
if message_obj is not None and content_obj is not None and metadata_obj is not None:
return "v2_success", schema_version
return "parse_failures", schema_version
def _record_dashboard_parse_metric(mode, uuid, schema_version=None, reason=None):
if mode not in DASHBOARD_PARSE_METRICS:
mode = "parse_failures"
DASHBOARD_PARSE_METRICS[mode] += 1
total = sum(DASHBOARD_PARSE_METRICS.values())
if mode == "v2_success":
if schema_version is None:
logging.warning(f"Dashboard payload from {uuid}: missing metadata.schema_version for grouped payload")
else:
version_text = str(schema_version).strip()
if not version_text.startswith("2"):
logging.warning(f"Dashboard payload from {uuid}: unknown schema_version={version_text}")
if mode == "parse_failures":
if reason:
logging.warning(f"Dashboard payload parse failure for {uuid}: {reason}")
else:
logging.warning(f"Dashboard payload parse failure for {uuid}")
if DASHBOARD_METRICS_LOG_EVERY > 0 and total % DASHBOARD_METRICS_LOG_EVERY == 0:
logging.info(
"Dashboard payload metrics: "
f"total={total}, "
f"v2_success={DASHBOARD_PARSE_METRICS['v2_success']}, "
f"parse_failures={DASHBOARD_PARSE_METRICS['parse_failures']}"
)
def _validate_v2_required_fields(data, uuid):
"""
Soft validation of required v2 fields for grouped dashboard payloads.
Logs a WARNING for each missing field. Never drops the message.
"""
message_obj = data.get("message") if isinstance(data.get("message"), dict) else {}
metadata_obj = data.get("metadata") if isinstance(data.get("metadata"), dict) else {}
capture_obj = metadata_obj.get("capture") if isinstance(metadata_obj.get("capture"), dict) else {}
missing = []
if not message_obj.get("client_id"):
missing.append("message.client_id")
if not message_obj.get("status"):
missing.append("message.status")
if not metadata_obj.get("schema_version"):
missing.append("metadata.schema_version")
if not capture_obj.get("type"):
missing.append("metadata.capture.type")
if missing:
logging.warning(
f"Dashboard v2 payload from {uuid} missing required fields: {', '.join(missing)}"
)
def _extract_dashboard_payload_fields(data):
"""
Parse dashboard payload fields from the grouped v2 schema only.
"""
if not isinstance(data, dict):
return {
"image": None,
"timestamp": None,
"screenshot_type": None,
"status": None,
"process_health": {},
}
# v2 grouped payload blocks
message_obj = data.get("message") if isinstance(data.get("message"), dict) else None
content_obj = data.get("content") if isinstance(data.get("content"), dict) else None
runtime_obj = data.get("runtime") if isinstance(data.get("runtime"), dict) else None
metadata_obj = data.get("metadata") if isinstance(data.get("metadata"), dict) else None
screenshot_obj = None
if isinstance(content_obj, dict) and isinstance(content_obj.get("screenshot"), dict):
screenshot_obj = content_obj.get("screenshot")
capture_obj = metadata_obj.get("capture") if metadata_obj and isinstance(metadata_obj.get("capture"), dict) else None
# Screenshot type comes from v2 metadata.capture.type.
screenshot_type = _normalize_screenshot_type(capture_obj.get("type") if capture_obj else None)
# Image from v2 content.screenshot.
image_value = None
for container in (screenshot_obj,):
if not isinstance(container, dict):
continue
for key in ("data", "image"):
value = container.get(key)
if isinstance(value, str) and value:
image_value = value
break
if image_value is not None:
break
# Timestamp precedence: v2 screenshot.timestamp -> capture.captured_at -> metadata.published_at
timestamp_value = None
timestamp_candidates = [
screenshot_obj.get("timestamp") if screenshot_obj else None,
capture_obj.get("captured_at") if capture_obj else None,
metadata_obj.get("published_at") if metadata_obj else None,
]
for value in timestamp_candidates:
if value is not None:
timestamp_value = value
break
# Monitoring fields from v2 message/runtime.
status_value = (message_obj or {}).get("status")
process_health = (runtime_obj or {}).get("process_health")
if not isinstance(process_health, dict):
process_health = {}
return {
"image": image_value,
"timestamp": timestamp_value,
"screenshot_type": screenshot_type,
"status": status_value,
"process_health": process_health,
}
def handle_screenshot(uuid, payload):
"""
@@ -40,13 +324,21 @@ def handle_screenshot(uuid, payload):
# Try to parse as JSON first
try:
data = json.loads(payload.decode())
if "image" in data:
extracted = _extract_dashboard_payload_fields(data)
image_b64 = extracted["image"]
timestamp_value = extracted["timestamp"]
screenshot_type = extracted["screenshot_type"]
if image_b64:
# Payload is JSON with base64 image
api_payload = {"image": data["image"]}
api_payload = {"image": image_b64}
if timestamp_value is not None:
api_payload["timestamp"] = timestamp_value
if screenshot_type:
api_payload["screenshot_type"] = screenshot_type
headers = {"Content-Type": "application/json"}
logging.debug(f"Forwarding base64 screenshot from {uuid} to API")
else:
logging.warning(f"Screenshot JSON from {uuid} missing 'image' field")
logging.warning(f"Screenshot JSON from {uuid} missing image/data field")
return
except (json.JSONDecodeError, UnicodeDecodeError):
# Payload is raw binary image data - encode to base64 for API
@@ -78,7 +370,14 @@ def on_connect(client, userdata, flags, reasonCode, properties):
client.subscribe("infoscreen/+/heartbeat")
client.subscribe("infoscreen/+/screenshot")
client.subscribe("infoscreen/+/dashboard")
logging.info(f"MQTT connected (reasonCode: {reasonCode}); (re)subscribed to discovery, heartbeats, screenshots, and dashboards")
# Subscribe to monitoring topics
client.subscribe("infoscreen/+/logs/error")
client.subscribe("infoscreen/+/logs/warn")
client.subscribe("infoscreen/+/logs/info")
client.subscribe("infoscreen/+/health")
logging.info(f"MQTT connected (reasonCode: {reasonCode}); (re)subscribed to discovery, heartbeats, screenshots, dashboards, logs, and health")
except Exception as e:
logging.error(f"Subscribe failed on connect: {e}")
@@ -94,24 +393,37 @@ def on_message(client, userdata, msg):
try:
payload_text = msg.payload.decode()
data = json.loads(payload_text)
shot = data.get("screenshot")
if isinstance(shot, dict):
# Prefer 'data' field (base64) inside screenshot object
image_b64 = shot.get("data")
if image_b64:
logging.debug(f"Dashboard enthält Screenshot für {uuid}; Weiterleitung an API")
# Build a lightweight JSON with image field for API handler
api_payload = json.dumps({"image": image_b64}).encode("utf-8")
handle_screenshot(uuid, api_payload)
parse_mode, schema_version = _classify_dashboard_payload(data)
_record_dashboard_parse_metric(parse_mode, uuid, schema_version=schema_version)
if parse_mode == "v2_success":
_validate_v2_required_fields(data, uuid)
extracted = _extract_dashboard_payload_fields(data)
image_b64 = extracted["image"]
ts_value = extracted["timestamp"]
screenshot_type = extracted["screenshot_type"]
if image_b64:
logging.debug(f"Dashboard enthält Screenshot für {uuid}; Weiterleitung an API")
# Forward original v2 payload so handle_screenshot can parse grouped fields.
handle_screenshot(uuid, msg.payload)
# Update last_alive and process-health fields when the client reports status "alive"
if data.get("status") == "alive":
if extracted["status"] == "alive":
session = Session()
client_obj = session.query(Client).filter_by(uuid=uuid).first()
if client_obj:
client_obj.last_alive = datetime.datetime.now(datetime.UTC)
process_health = extracted["process_health"]
apply_monitoring_update(
client_obj,
last_seen=datetime.datetime.now(datetime.UTC),
event_id=process_health.get('event_id'),
process_name=process_health.get('current_process') or process_health.get('process'),
process_pid=process_health.get('process_pid') or process_health.get('pid'),
process_status=process_health.get('process_status') or process_health.get('status'),
)
session.commit()
session.close()
except Exception as e:
_record_dashboard_parse_metric("parse_failures", uuid, reason=str(e))
logging.error(f"Fehler beim Verarbeiten des Dashboard-Payloads von {uuid}: {e}")
return
@@ -124,15 +436,110 @@ def on_message(client, userdata, msg):
# Heartbeat-Handling
if topic.startswith("infoscreen/") and topic.endswith("/heartbeat"):
uuid = topic.split("/")[1]
try:
# Parse payload to get optional health data
payload_data = json.loads(msg.payload.decode())
except (json.JSONDecodeError, UnicodeDecodeError):
payload_data = {}
session = Session()
client_obj = session.query(Client).filter_by(uuid=uuid).first()
if client_obj:
client_obj.last_alive = datetime.datetime.now(datetime.UTC)
apply_monitoring_update(
client_obj,
last_seen=datetime.datetime.now(datetime.UTC),
event_id=payload_data.get('current_event_id'),
process_name=payload_data.get('current_process'),
process_pid=payload_data.get('process_pid'),
process_status=payload_data.get('process_status'),
)
session.commit()
logging.info(
f"Heartbeat von {uuid} empfangen, last_alive (UTC) aktualisiert.")
logging.info(f"Heartbeat von {uuid} empfangen, last_alive (UTC) aktualisiert.")
session.close()
return
# Log-Handling (ERROR, WARN, INFO)
if topic.startswith("infoscreen/") and "/logs/" in topic:
parts = topic.split("/")
if len(parts) >= 4:
uuid = parts[1]
level_str = parts[3].upper() # 'error', 'warn', 'info' -> 'ERROR', 'WARN', 'INFO'
try:
payload_data = json.loads(msg.payload.decode())
message = payload_data.get('message', '')
timestamp_str = payload_data.get('timestamp')
context = payload_data.get('context', {})
# Parse timestamp or use current time
if timestamp_str:
try:
log_timestamp = datetime.datetime.fromisoformat(timestamp_str.replace('Z', '+00:00'))
if log_timestamp.tzinfo is None:
log_timestamp = log_timestamp.replace(tzinfo=datetime.UTC)
except ValueError:
log_timestamp = datetime.datetime.now(datetime.UTC)
else:
log_timestamp = datetime.datetime.now(datetime.UTC)
# Store in database
session = Session()
try:
log_level = LogLevel[level_str]
log_entry = ClientLog(
client_uuid=uuid,
timestamp=log_timestamp,
level=log_level,
message=message,
context=json.dumps(context) if context else None
)
session.add(log_entry)
session.commit()
logging.info(f"[{level_str}] {uuid}: {message}")
except Exception as e:
logging.error(f"Error saving log from {uuid}: {e}")
session.rollback()
finally:
session.close()
except (json.JSONDecodeError, UnicodeDecodeError) as e:
logging.error(f"Could not parse log payload from {uuid}: {e}")
return
# Health-Handling
if topic.startswith("infoscreen/") and topic.endswith("/health"):
uuid = topic.split("/")[1]
try:
payload_data = json.loads(msg.payload.decode())
session = Session()
client_obj = session.query(Client).filter_by(uuid=uuid).first()
if client_obj:
# Expected state as scheduled for this client
expected = payload_data.get('expected_state', {})
# Actual state observed on the client
actual = payload_data.get('actual_state', {})
screen_health_status = infer_screen_health_status(payload_data)
apply_monitoring_update(
client_obj,
last_seen=datetime.datetime.now(datetime.UTC),
event_id=expected.get('event_id'),
process_name=actual.get('process'),
process_pid=actual.get('pid'),
process_status=actual.get('status'),
screen_health_status=screen_health_status,
last_screenshot_analyzed=parse_timestamp((payload_data.get('health_metrics') or {}).get('last_frame_update')),
)
session.commit()
logging.debug(f"Health update from {uuid}: {actual.get('process')} ({actual.get('status')})")
session.close()
except (json.JSONDecodeError, UnicodeDecodeError) as e:
logging.error(f"Could not parse health payload from {uuid}: {e}")
except Exception as e:
logging.error(f"Error processing health from {uuid}: {e}")
return
# Discovery-Handling
if topic == "infoscreen/discovery":

View File: listener/test_listener_parser.py

@@ -0,0 +1,378 @@
"""
Mixed-format unit tests for the dashboard payload parser.
Tests cover:
- Legacy top-level payload is rejected (v2-only mode)
- v2 grouped payload: periodic capture
- v2 grouped payload: event_start capture
- v2 grouped payload: event_stop capture
- Classification into v2_success / parse_failures
- Soft required-field validation (v2 only, never drops message)
- Edge cases: missing image, missing status, non-dict payload
"""
import sys
import os
import logging
import importlib.util
# listener/ has no __init__.py — load the module directly from its file path
os.environ.setdefault("DB_CONN", "sqlite:///:memory:") # prevent DB engine errors on import
_LISTENER_PATH = os.path.join(os.path.dirname(__file__), "listener.py")
_spec = importlib.util.spec_from_file_location("listener_module", _LISTENER_PATH)
_mod = importlib.util.module_from_spec(_spec)
_spec.loader.exec_module(_mod)
_extract_dashboard_payload_fields = _mod._extract_dashboard_payload_fields
_classify_dashboard_payload = _mod._classify_dashboard_payload
_validate_v2_required_fields = _mod._validate_v2_required_fields
_normalize_screenshot_type = _mod._normalize_screenshot_type
DASHBOARD_PARSE_METRICS = _mod.DASHBOARD_PARSE_METRICS
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
IMAGE_B64 = "aGVsbG8=" # base64("hello")
LEGACY_PAYLOAD = {
"client_id": "uuid-legacy",
"status": "alive",
"screenshot": {
"data": IMAGE_B64,
"timestamp": "2026-03-30T10:00:00+00:00",
},
"screenshot_type": "periodic",
"process_health": {
"current_process": "impressive",
"process_pid": 1234,
"process_status": "running",
"event_id": 42,
},
}
def _make_v2(capture_type):
return {
"message": {
"client_id": "uuid-v2",
"status": "alive",
},
"content": {
"screenshot": {
"filename": "latest.jpg",
"data": IMAGE_B64,
"timestamp": "2026-03-30T10:15:41.123456+00:00",
"size": 6,
}
},
"runtime": {
"system_info": {
"hostname": "pi-display-01",
"ip": "192.168.1.42",
"uptime": 12345.0,
},
"process_health": {
"event_id": "evt-7",
"event_type": "presentation",
"current_process": "impressive",
"process_pid": 4123,
"process_status": "running",
"restart_count": 0,
},
},
"metadata": {
"schema_version": "2.0",
"producer": "simclient",
"published_at": "2026-03-30T10:15:42.004321+00:00",
"capture": {
"type": capture_type,
"captured_at": "2026-03-30T10:15:41.123456+00:00",
"age_s": 0.9,
"triggered": capture_type != "periodic",
"send_immediately": capture_type != "periodic",
},
"transport": {"qos": 0, "publisher": "simclient"},
},
}
V2_PERIODIC = _make_v2("periodic")
V2_EVT_START = _make_v2("event_start")
V2_EVT_STOP = _make_v2("event_stop")
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def assert_eq(label, actual, expected):
assert actual == expected, f"FAIL [{label}]: expected {expected!r}, got {actual!r}"
def assert_not_none(label, actual):
assert actual is not None, f"FAIL [{label}]: expected non-None, got None"
def assert_none(label, actual):
assert actual is None, f"FAIL [{label}]: expected None, got {actual!r}"
def assert_warns(label, fn, substring):
"""Assert that fn() emits a logging.WARNING containing substring."""
records = []
handler = logging.handlers_collector(records)
logger = logging.getLogger()
logger.addHandler(handler)
try:
fn()
finally:
logger.removeHandler(handler)
warnings = [r.getMessage() for r in records if r.levelno == logging.WARNING]
assert any(substring in w for w in warnings), (
f"FAIL [{label}]: no WARNING containing {substring!r} found in {warnings}"
)
class _CapturingHandler(logging.Handler):
def __init__(self, records):
super().__init__()
self._records = records
def emit(self, record):
self._records.append(record)
def capture_warnings(fn):
"""Run fn(), return list of WARNING message strings."""
records = []
handler = _CapturingHandler(records)
logger = logging.getLogger()
logger.addHandler(handler)
try:
fn()
finally:
logger.removeHandler(handler)
return [r.getMessage() for r in records if r.levelno == logging.WARNING]
# ---------------------------------------------------------------------------
# Tests: _normalize_screenshot_type
# ---------------------------------------------------------------------------
def test_normalize_known_types():
for t in ("periodic", "event_start", "event_stop"):
assert_eq(f"normalize_{t}", _normalize_screenshot_type(t), t)
assert_eq(f"normalize_{t}_upper", _normalize_screenshot_type(t.upper()), t)
def test_normalize_unknown_returns_none():
assert_none("normalize_unknown", _normalize_screenshot_type("unknown"))
assert_none("normalize_none", _normalize_screenshot_type(None))
assert_none("normalize_empty", _normalize_screenshot_type(""))
# ---------------------------------------------------------------------------
# Tests: _classify_dashboard_payload
# ---------------------------------------------------------------------------
def test_classify_legacy():
mode, ver = _classify_dashboard_payload(LEGACY_PAYLOAD)
assert_eq("classify_legacy_mode", mode, "parse_failures")
assert_none("classify_legacy_version", ver)
def test_classify_v2_periodic():
mode, ver = _classify_dashboard_payload(V2_PERIODIC)
assert_eq("classify_v2_periodic_mode", mode, "v2_success")
assert_eq("classify_v2_periodic_version", ver, "2.0")
def test_classify_v2_event_start():
mode, ver = _classify_dashboard_payload(V2_EVT_START)
assert_eq("classify_v2_event_start_mode", mode, "v2_success")
def test_classify_v2_event_stop():
mode, ver = _classify_dashboard_payload(V2_EVT_STOP)
assert_eq("classify_v2_event_stop_mode", mode, "v2_success")
def test_classify_non_dict():
mode, ver = _classify_dashboard_payload("not a dict")
assert_eq("classify_non_dict", mode, "parse_failures")
def test_classify_empty_dict():
mode, ver = _classify_dashboard_payload({})
assert_eq("classify_empty_dict", mode, "parse_failures")
# ---------------------------------------------------------------------------
# Tests: _extract_dashboard_payload_fields — legacy payload rejected in v2-only mode
# ---------------------------------------------------------------------------
def test_legacy_image_not_extracted():
r = _extract_dashboard_payload_fields(LEGACY_PAYLOAD)
assert_none("legacy_image", r["image"])
def test_legacy_screenshot_type_missing():
r = _extract_dashboard_payload_fields(LEGACY_PAYLOAD)
assert_none("legacy_screenshot_type", r["screenshot_type"])
def test_legacy_status_missing():
r = _extract_dashboard_payload_fields(LEGACY_PAYLOAD)
assert_none("legacy_status", r["status"])
def test_legacy_process_health_empty():
r = _extract_dashboard_payload_fields(LEGACY_PAYLOAD)
assert_eq("legacy_process_health", r["process_health"], {})
def test_legacy_timestamp_missing():
r = _extract_dashboard_payload_fields(LEGACY_PAYLOAD)
assert_none("legacy_timestamp", r["timestamp"])
# ---------------------------------------------------------------------------
# Tests: _extract_dashboard_payload_fields — v2 periodic
# ---------------------------------------------------------------------------
def test_v2_periodic_image():
r = _extract_dashboard_payload_fields(V2_PERIODIC)
assert_eq("v2_periodic_image", r["image"], IMAGE_B64)
def test_v2_periodic_screenshot_type():
r = _extract_dashboard_payload_fields(V2_PERIODIC)
assert_eq("v2_periodic_type", r["screenshot_type"], "periodic")
def test_v2_periodic_status():
r = _extract_dashboard_payload_fields(V2_PERIODIC)
assert_eq("v2_periodic_status", r["status"], "alive")
def test_v2_periodic_process_health():
r = _extract_dashboard_payload_fields(V2_PERIODIC)
assert_eq("v2_periodic_pid", r["process_health"]["process_pid"], 4123)
assert_eq("v2_periodic_process", r["process_health"]["current_process"], "impressive")
def test_v2_periodic_timestamp_prefers_screenshot():
r = _extract_dashboard_payload_fields(V2_PERIODIC)
# screenshot.timestamp must take precedence over capture.captured_at / published_at
assert_eq("v2_periodic_ts", r["timestamp"], "2026-03-30T10:15:41.123456+00:00")
# ---------------------------------------------------------------------------
# Tests: _extract_dashboard_payload_fields — v2 event_start
# ---------------------------------------------------------------------------
def test_v2_event_start_type():
r = _extract_dashboard_payload_fields(V2_EVT_START)
assert_eq("v2_event_start_type", r["screenshot_type"], "event_start")
def test_v2_event_start_image():
r = _extract_dashboard_payload_fields(V2_EVT_START)
assert_eq("v2_event_start_image", r["image"], IMAGE_B64)
# ---------------------------------------------------------------------------
# Tests: _extract_dashboard_payload_fields — v2 event_stop
# ---------------------------------------------------------------------------
def test_v2_event_stop_type():
r = _extract_dashboard_payload_fields(V2_EVT_STOP)
assert_eq("v2_event_stop_type", r["screenshot_type"], "event_stop")
def test_v2_event_stop_image():
r = _extract_dashboard_payload_fields(V2_EVT_STOP)
assert_eq("v2_event_stop_image", r["image"], IMAGE_B64)
# ---------------------------------------------------------------------------
# Tests: _extract_dashboard_payload_fields — edge cases
# ---------------------------------------------------------------------------
def test_non_dict_returns_nulls():
r = _extract_dashboard_payload_fields("bad")
assert_none("non_dict_image", r["image"])
assert_none("non_dict_type", r["screenshot_type"])
assert_none("non_dict_status", r["status"])
def test_missing_image_returns_none():
payload = {**V2_PERIODIC, "content": {"screenshot": {"timestamp": "2026-03-30T10:00:00+00:00"}}}
r = _extract_dashboard_payload_fields(payload)
assert_none("missing_image", r["image"])
def test_missing_process_health_returns_empty_dict():
import copy
payload = copy.deepcopy(V2_PERIODIC)
del payload["runtime"]["process_health"]
r = _extract_dashboard_payload_fields(payload)
assert_eq("missing_ph", r["process_health"], {})
def test_timestamp_fallback_to_captured_at_when_no_screenshot_ts():
import copy
payload = copy.deepcopy(V2_PERIODIC)
del payload["content"]["screenshot"]["timestamp"]
r = _extract_dashboard_payload_fields(payload)
assert_eq("ts_fallback_captured_at", r["timestamp"], "2026-03-30T10:15:41.123456+00:00")
def test_timestamp_fallback_to_published_at_when_no_capture_ts():
import copy
payload = copy.deepcopy(V2_PERIODIC)
del payload["content"]["screenshot"]["timestamp"]
del payload["metadata"]["capture"]["captured_at"]
r = _extract_dashboard_payload_fields(payload)
assert_eq("ts_fallback_published_at", r["timestamp"], "2026-03-30T10:15:42.004321+00:00")
# ---------------------------------------------------------------------------
# Tests: _validate_v2_required_fields (soft — never raises)
# ---------------------------------------------------------------------------
def test_v2_valid_payload_no_warnings():
warnings = capture_warnings(lambda: _validate_v2_required_fields(V2_PERIODIC, "uuid-v2"))
assert warnings == [], f"FAIL: unexpected warnings for valid payload: {warnings}"
def test_v2_missing_client_id_warns():
import copy
payload = copy.deepcopy(V2_PERIODIC)
del payload["message"]["client_id"]
warnings = capture_warnings(lambda: _validate_v2_required_fields(payload, "uuid-v2"))
assert any("message.client_id" in w for w in warnings), f"FAIL: {warnings}"
def test_v2_missing_status_warns():
import copy
payload = copy.deepcopy(V2_PERIODIC)
del payload["message"]["status"]
warnings = capture_warnings(lambda: _validate_v2_required_fields(payload, "uuid-v2"))
assert any("message.status" in w for w in warnings), f"FAIL: {warnings}"
def test_v2_missing_schema_version_warns():
import copy
payload = copy.deepcopy(V2_PERIODIC)
del payload["metadata"]["schema_version"]
warnings = capture_warnings(lambda: _validate_v2_required_fields(payload, "uuid-v2"))
assert any("metadata.schema_version" in w for w in warnings), f"FAIL: {warnings}"
def test_v2_missing_capture_type_warns():
import copy
payload = copy.deepcopy(V2_PERIODIC)
del payload["metadata"]["capture"]["type"]
warnings = capture_warnings(lambda: _validate_v2_required_fields(payload, "uuid-v2"))
assert any("metadata.capture.type" in w for w in warnings), f"FAIL: {warnings}"
def test_v2_multiple_missing_fields_all_reported():
import copy
payload = copy.deepcopy(V2_PERIODIC)
del payload["message"]["client_id"]
del payload["metadata"]["capture"]["type"]
warnings = capture_warnings(lambda: _validate_v2_required_fields(payload, "uuid-v2"))
assert len(warnings) == 1, f"FAIL: expected 1 combined warning, got {warnings}"
assert "message.client_id" in warnings[0], f"FAIL: {warnings}"
assert "metadata.capture.type" in warnings[0], f"FAIL: {warnings}"
# ---------------------------------------------------------------------------
# Runner
# ---------------------------------------------------------------------------
def run_all():
tests = {k: v for k, v in globals().items() if k.startswith("test_") and callable(v)}
passed = failed = 0
for name, fn in sorted(tests.items()):
try:
fn()
print(f" PASS {name}")
passed += 1
except AssertionError as e:
print(f" FAIL {name}: {e}")
failed += 1
except Exception as e:
print(f" ERROR {name}: {type(e).__name__}: {e}")
failed += 1
print(f"\n{passed} passed, {failed} failed out of {passed + failed} tests")
return failed == 0
if __name__ == "__main__":
ok = run_all()
sys.exit(0 if ok else 1)

View File: models/models.py

@@ -21,6 +21,27 @@ class AcademicPeriodType(enum.Enum):
trimester = "trimester"
class LogLevel(enum.Enum):
ERROR = "ERROR"
WARN = "WARN"
INFO = "INFO"
DEBUG = "DEBUG"
class ProcessStatus(enum.Enum):
running = "running"
crashed = "crashed"
starting = "starting"
stopped = "stopped"
class ScreenHealthStatus(enum.Enum):
OK = "OK"
BLACK = "BLACK"
FROZEN = "FROZEN"
UNKNOWN = "UNKNOWN"
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True, autoincrement=True)
@@ -106,6 +127,31 @@ class Client(Base):
is_active = Column(Boolean, default=True, nullable=False)
group_id = Column(Integer, ForeignKey(
'client_groups.id'), nullable=False, default=1)
# Health monitoring fields
current_event_id = Column(Integer, nullable=True)
current_process = Column(String(50), nullable=True) # 'vlc', 'chromium', 'pdf_viewer'
process_status = Column(Enum(ProcessStatus), nullable=True)
process_pid = Column(Integer, nullable=True)
last_screenshot_analyzed = Column(TIMESTAMP(timezone=True), nullable=True)
screen_health_status = Column(Enum(ScreenHealthStatus), nullable=True, server_default='UNKNOWN')
last_screenshot_hash = Column(String(32), nullable=True)
class ClientLog(Base):
__tablename__ = 'client_logs'
id = Column(Integer, primary_key=True, autoincrement=True)
client_uuid = Column(String(36), ForeignKey('clients.uuid', ondelete='CASCADE'), nullable=False, index=True)
timestamp = Column(TIMESTAMP(timezone=True), nullable=False, index=True)
level = Column(Enum(LogLevel), nullable=False, index=True)
message = Column(Text, nullable=False)
context = Column(Text, nullable=True) # JSON stored as text
created_at = Column(TIMESTAMP(timezone=True), server_default=func.current_timestamp(), nullable=False)
__table_args__ = (
Index('ix_client_logs_client_timestamp', 'client_uuid', 'timestamp'),
Index('ix_client_logs_level_timestamp', 'level', 'timestamp'),
)
class EventType(enum.Enum):

View File

@@ -306,6 +306,7 @@ def format_event_with_media(event):
"autoplay": getattr(event, "autoplay", True),
"loop": getattr(event, "loop", False),
"volume": getattr(event, "volume", 0.8),
"muted": getattr(event, "muted", False),
# Best-effort metadata to help clients decide how to stream
"mime_type": mime_type,
"size": size,

View File

@@ -0,0 +1,84 @@
"""add client monitoring tables and columns
Revision ID: c1d2e3f4g5h6
Revises: 4f0b8a3e5c20
Create Date: 2026-03-09 21:08:38.000000
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = 'c1d2e3f4g5h6'
down_revision = '4f0b8a3e5c20'
branch_labels = None
depends_on = None
def upgrade():
bind = op.get_bind()
inspector = sa.inspect(bind)
# 1. Add health monitoring columns to clients table (safe on rerun)
existing_client_columns = {c['name'] for c in inspector.get_columns('clients')}
if 'current_event_id' not in existing_client_columns:
op.add_column('clients', sa.Column('current_event_id', sa.Integer(), nullable=True))
if 'current_process' not in existing_client_columns:
op.add_column('clients', sa.Column('current_process', sa.String(50), nullable=True))
if 'process_status' not in existing_client_columns:
op.add_column('clients', sa.Column('process_status', sa.Enum('running', 'crashed', 'starting', 'stopped', name='processstatus'), nullable=True))
if 'process_pid' not in existing_client_columns:
op.add_column('clients', sa.Column('process_pid', sa.Integer(), nullable=True))
if 'last_screenshot_analyzed' not in existing_client_columns:
op.add_column('clients', sa.Column('last_screenshot_analyzed', sa.TIMESTAMP(timezone=True), nullable=True))
if 'screen_health_status' not in existing_client_columns:
op.add_column('clients', sa.Column('screen_health_status', sa.Enum('OK', 'BLACK', 'FROZEN', 'UNKNOWN', name='screenhealthstatus'), nullable=True, server_default='UNKNOWN'))
if 'last_screenshot_hash' not in existing_client_columns:
op.add_column('clients', sa.Column('last_screenshot_hash', sa.String(32), nullable=True))
# 2. Create client_logs table (safe on rerun)
if not inspector.has_table('client_logs'):
op.create_table('client_logs',
sa.Column('id', sa.Integer(), autoincrement=True, nullable=False),
sa.Column('client_uuid', sa.String(36), nullable=False),
sa.Column('timestamp', sa.TIMESTAMP(timezone=True), nullable=False),
sa.Column('level', sa.Enum('ERROR', 'WARN', 'INFO', 'DEBUG', name='loglevel'), nullable=False),
sa.Column('message', sa.Text(), nullable=False),
sa.Column('context', sa.JSON(), nullable=True),
sa.Column('created_at', sa.TIMESTAMP(timezone=True), server_default=sa.func.current_timestamp(), nullable=False),
sa.PrimaryKeyConstraint('id'),
sa.ForeignKeyConstraint(['client_uuid'], ['clients.uuid'], ondelete='CASCADE'),
mysql_charset='utf8mb4',
mysql_collate='utf8mb4_unicode_ci',
mysql_engine='InnoDB'
)
# 3. Create indexes for efficient querying (safe on rerun)
# Re-inspect so the client_logs table created above is visible to the checks below
inspector = sa.inspect(bind)
client_log_indexes = {idx['name'] for idx in inspector.get_indexes('client_logs')} if inspector.has_table('client_logs') else set()
client_indexes = {idx['name'] for idx in inspector.get_indexes('clients')}
if 'ix_client_logs_client_timestamp' not in client_log_indexes:
op.create_index('ix_client_logs_client_timestamp', 'client_logs', ['client_uuid', 'timestamp'])
if 'ix_client_logs_level_timestamp' not in client_log_indexes:
op.create_index('ix_client_logs_level_timestamp', 'client_logs', ['level', 'timestamp'])
if 'ix_clients_process_status' not in client_indexes:
op.create_index('ix_clients_process_status', 'clients', ['process_status'])
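# Note: the inspector guards above make upgrade() safe to re-apply against a
# partially migrated database; nothing is created twice.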
def downgrade():
# Drop indexes
op.drop_index('ix_clients_process_status', table_name='clients')
op.drop_index('ix_client_logs_level_timestamp', table_name='client_logs')
op.drop_index('ix_client_logs_client_timestamp', table_name='client_logs')
# Drop table
op.drop_table('client_logs')
# Drop columns from clients
op.drop_column('clients', 'last_screenshot_hash')
op.drop_column('clients', 'screen_health_status')
op.drop_column('clients', 'last_screenshot_analyzed')
op.drop_column('clients', 'process_pid')
op.drop_column('clients', 'process_status')
op.drop_column('clients', 'current_process')
op.drop_column('clients', 'current_event_id')

View File

@@ -0,0 +1,491 @@
from flask import Blueprint, jsonify, request
from server.database import Session
from server.permissions import admin_or_higher, superadmin_only
from models.models import ClientLog, Client, ClientGroup, LogLevel
from sqlalchemy import desc, func
from datetime import datetime, timedelta, timezone
import json
import os
import glob
from server.serializers import dict_to_camel_case
client_logs_bp = Blueprint("client_logs", __name__, url_prefix="/api/client-logs")
PRIORITY_SCREENSHOT_TTL_SECONDS = int(os.environ.get("PRIORITY_SCREENSHOT_TTL_SECONDS", "120"))
def _grace_period_seconds():
env = os.environ.get("ENV", "production").lower()
if env in ("development", "dev"):
return int(os.environ.get("HEARTBEAT_GRACE_PERIOD_DEV", "180"))
return int(os.environ.get("HEARTBEAT_GRACE_PERIOD_PROD", "170"))
def _to_utc(dt):
if dt is None:
return None
if dt.tzinfo is None:
return dt.replace(tzinfo=timezone.utc)
return dt.astimezone(timezone.utc)
def _is_client_alive(last_alive, is_active):
if not last_alive or not is_active:
return False
return (datetime.now(timezone.utc) - _to_utc(last_alive)) <= timedelta(seconds=_grace_period_seconds())
def _safe_context(raw_context):
if not raw_context:
return {}
try:
return json.loads(raw_context)
except (TypeError, json.JSONDecodeError):
return {"raw": raw_context}
def _serialize_log_entry(log, include_client_uuid=False):
if not log:
return None
entry = {
"id": log.id,
"timestamp": log.timestamp.isoformat() if log.timestamp else None,
"level": log.level.value if log.level else None,
"message": log.message,
"context": _safe_context(log.context),
}
if include_client_uuid:
entry["client_uuid"] = log.client_uuid
return entry
def _determine_client_status(is_alive, process_status, screen_health_status, log_counts):
if not is_alive:
return "offline"
if process_status == "crashed" or screen_health_status in ("BLACK", "FROZEN"):
return "critical"
if log_counts.get("ERROR", 0) > 0:
return "critical"
if process_status in ("starting", "stopped") or log_counts.get("WARN", 0) > 0:
return "warning"
return "healthy"
def _infer_last_screenshot_ts(client_uuid):
screenshots_dir = os.path.join(os.path.dirname(__file__), "..", "screenshots")
candidate_files = []
latest_file = os.path.join(screenshots_dir, f"{client_uuid}.jpg")
if os.path.exists(latest_file):
candidate_files.append(latest_file)
candidate_files.extend(glob.glob(os.path.join(screenshots_dir, f"{client_uuid}_*.jpg")))
if not candidate_files:
return None
try:
newest_path = max(candidate_files, key=os.path.getmtime)
return datetime.fromtimestamp(os.path.getmtime(newest_path), timezone.utc)
except Exception:
return None
def _load_screenshot_metadata(client_uuid):
screenshots_dir = os.path.join(os.path.dirname(__file__), "..", "screenshots")
metadata_path = os.path.join(screenshots_dir, f"{client_uuid}_meta.json")
if not os.path.exists(metadata_path):
return {}
try:
with open(metadata_path, "r", encoding="utf-8") as metadata_file:
data = json.load(metadata_file)
return data if isinstance(data, dict) else {}
except Exception:
return {}
def _is_priority_screenshot_active(priority_received_at):
if not priority_received_at:
return False
try:
normalized = str(priority_received_at).replace("Z", "+00:00")
parsed = datetime.fromisoformat(normalized)
parsed_utc = _to_utc(parsed)
except Exception:
return False
return (datetime.now(timezone.utc) - parsed_utc) <= timedelta(seconds=PRIORITY_SCREENSHOT_TTL_SECONDS)
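# Example: with the default TTL of 120 seconds, a priority screenshot received
# at 12:00:00Z keeps the priority URL active until 12:02:00Z; afterwards the
# overview falls back to the regular /screenshots/<uuid> URL.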
@client_logs_bp.route("/test", methods=["GET"])
def test_client_logs():
"""Test endpoint to verify logging infrastructure (no auth required)"""
session = Session()
try:
# Count total logs
total_logs = session.query(func.count(ClientLog.id)).scalar()
# Count by level
error_count = session.query(func.count(ClientLog.id)).filter_by(level=LogLevel.ERROR).scalar()
warn_count = session.query(func.count(ClientLog.id)).filter_by(level=LogLevel.WARN).scalar()
info_count = session.query(func.count(ClientLog.id)).filter_by(level=LogLevel.INFO).scalar()
# Get last 5 logs
recent_logs = session.query(ClientLog).order_by(desc(ClientLog.timestamp)).limit(5).all()
recent = []
for log in recent_logs:
recent.append({
"client_uuid": log.client_uuid,
"level": log.level.value if log.level else None,
"message": log.message,
"timestamp": log.timestamp.isoformat() if log.timestamp else None
})
session.close()
return jsonify({
"status": "ok",
"infrastructure": "working",
"total_logs": total_logs,
"counts": {
"ERROR": error_count,
"WARN": warn_count,
"INFO": info_count
},
"recent_5": recent
})
except Exception as e:
session.close()
return jsonify({"status": "error", "message": str(e)}), 500
@client_logs_bp.route("/<uuid>/logs", methods=["GET"])
@admin_or_higher
def get_client_logs(uuid):
"""
Get logs for a specific client
Query params:
- level: ERROR, WARN, INFO, DEBUG (optional)
- limit: number of entries (default 50, max 500)
- since: ISO timestamp (optional)
Example: /api/client-logs/abc-123/logs?level=ERROR&limit=100
"""
session = Session()
try:
# Verify client exists
client = session.query(Client).filter_by(uuid=uuid).first()
if not client:
session.close()
return jsonify({"error": "Client not found"}), 404
# Parse query parameters
level_param = request.args.get('level')
limit = min(int(request.args.get('limit', 50)), 500)
since_param = request.args.get('since')
# Build query
query = session.query(ClientLog).filter_by(client_uuid=uuid)
# Filter by log level
if level_param:
try:
level_enum = LogLevel[level_param.upper()]
query = query.filter_by(level=level_enum)
except KeyError:
session.close()
return jsonify({"error": f"Invalid level: {level_param}. Must be ERROR, WARN, INFO, or DEBUG"}), 400
# Filter by timestamp
if since_param:
try:
# Handle both with and without 'Z' suffix
since_str = since_param.replace('Z', '+00:00')
since_dt = datetime.fromisoformat(since_str)
if since_dt.tzinfo is None:
since_dt = since_dt.replace(tzinfo=timezone.utc)
query = query.filter(ClientLog.timestamp >= since_dt)
except ValueError:
session.close()
return jsonify({"error": "Invalid timestamp format. Use ISO 8601"}), 400
# Execute query
logs = query.order_by(desc(ClientLog.timestamp)).limit(limit).all()
# Format results
result = []
for log in logs:
result.append(_serialize_log_entry(log))
session.close()
return jsonify({
"client_uuid": uuid,
"logs": result,
"count": len(result),
"limit": limit
})
except Exception as e:
session.close()
return jsonify({"error": f"Server error: {str(e)}"}), 500
@client_logs_bp.route("/summary", methods=["GET"])
@admin_or_higher
def get_logs_summary():
"""
Get summary of errors/warnings across all clients in last 24 hours
Returns count of ERROR, WARN, INFO logs per client
Example response:
{
"summary": {
"client-uuid-1": {"ERROR": 5, "WARN": 12, "INFO": 45},
"client-uuid-2": {"ERROR": 0, "WARN": 3, "INFO": 20}
},
"period_hours": 24,
"timestamp": "2026-03-09T21:00:00Z"
}
"""
session = Session()
try:
# Get hours parameter (default 24, max 168 = 1 week)
hours = min(int(request.args.get('hours', 24)), 168)
since = datetime.now(timezone.utc) - timedelta(hours=hours)
# Query log counts grouped by client and level
stats = session.query(
ClientLog.client_uuid,
ClientLog.level,
func.count(ClientLog.id).label('count')
).filter(
ClientLog.timestamp >= since
).group_by(
ClientLog.client_uuid,
ClientLog.level
).all()
# Build summary dictionary
summary = {}
for stat in stats:
uuid = stat.client_uuid
if uuid not in summary:
# Initialize all levels to 0
summary[uuid] = {
"ERROR": 0,
"WARN": 0,
"INFO": 0,
"DEBUG": 0
}
summary[uuid][stat.level.value] = stat.count
# Get client info for enrichment
clients = session.query(Client.uuid, Client.hostname, Client.description).all()
client_info = {c.uuid: {"hostname": c.hostname, "description": c.description} for c in clients}
# Enrich summary with client info
enriched_summary = {}
for uuid, counts in summary.items():
enriched_summary[uuid] = {
"counts": counts,
"info": client_info.get(uuid, {})
}
session.close()
return jsonify({
"summary": enriched_summary,
"period_hours": hours,
"since": since.isoformat(),
"timestamp": datetime.now(timezone.utc).isoformat()
})
except Exception as e:
session.close()
return jsonify({"error": f"Server error: {str(e)}"}), 500
@client_logs_bp.route("/monitoring-overview", methods=["GET"])
@superadmin_only
def get_monitoring_overview():
"""Return a dashboard-friendly monitoring overview for all clients."""
session = Session()
try:
hours = min(int(request.args.get("hours", 24)), 168)
since = datetime.now(timezone.utc) - timedelta(hours=hours)
clients = (
session.query(Client, ClientGroup.name.label("group_name"))
.outerjoin(ClientGroup, Client.group_id == ClientGroup.id)
.order_by(ClientGroup.name.asc(), Client.description.asc(), Client.hostname.asc(), Client.uuid.asc())
.all()
)
log_stats = (
session.query(
ClientLog.client_uuid,
ClientLog.level,
func.count(ClientLog.id).label("count"),
)
.filter(ClientLog.timestamp >= since)
.group_by(ClientLog.client_uuid, ClientLog.level)
.all()
)
counts_by_client = {}
for stat in log_stats:
if stat.client_uuid not in counts_by_client:
counts_by_client[stat.client_uuid] = {
"ERROR": 0,
"WARN": 0,
"INFO": 0,
"DEBUG": 0,
}
counts_by_client[stat.client_uuid][stat.level.value] = stat.count
clients_payload = []
summary_counts = {
"total_clients": 0,
"online_clients": 0,
"offline_clients": 0,
"healthy_clients": 0,
"warning_clients": 0,
"critical_clients": 0,
"error_logs": 0,
"warn_logs": 0,
"active_priority_screenshots": 0,
}
for client, group_name in clients:
log_counts = counts_by_client.get(
client.uuid,
{"ERROR": 0, "WARN": 0, "INFO": 0, "DEBUG": 0},
)
is_alive = _is_client_alive(client.last_alive, client.is_active)
process_status = client.process_status.value if client.process_status else None
screen_health_status = client.screen_health_status.value if client.screen_health_status else None
status = _determine_client_status(is_alive, process_status, screen_health_status, log_counts)
latest_log = (
session.query(ClientLog)
.filter_by(client_uuid=client.uuid)
.order_by(desc(ClientLog.timestamp))
.first()
)
latest_error = (
session.query(ClientLog)
.filter_by(client_uuid=client.uuid, level=LogLevel.ERROR)
.order_by(desc(ClientLog.timestamp))
.first()
)
screenshot_ts = client.last_screenshot_analyzed or _infer_last_screenshot_ts(client.uuid)
screenshot_meta = _load_screenshot_metadata(client.uuid)
latest_screenshot_type = screenshot_meta.get("latest_screenshot_type") or "periodic"
priority_screenshot_type = screenshot_meta.get("last_priority_screenshot_type")
priority_screenshot_received_at = screenshot_meta.get("last_priority_received_at")
has_active_priority = _is_priority_screenshot_active(priority_screenshot_received_at)
screenshot_url = f"/screenshots/{client.uuid}/priority" if has_active_priority else f"/screenshots/{client.uuid}"
clients_payload.append({
"uuid": client.uuid,
"hostname": client.hostname,
"description": client.description,
"ip": client.ip,
"model": client.model,
"group_id": client.group_id,
"group_name": group_name,
"registration_time": client.registration_time.isoformat() if client.registration_time else None,
"last_alive": client.last_alive.isoformat() if client.last_alive else None,
"is_alive": is_alive,
"status": status,
"current_event_id": client.current_event_id,
"current_process": client.current_process,
"process_status": process_status,
"process_pid": client.process_pid,
"screen_health_status": screen_health_status,
"last_screenshot_analyzed": screenshot_ts.isoformat() if screenshot_ts else None,
"last_screenshot_hash": client.last_screenshot_hash,
"latest_screenshot_type": latest_screenshot_type,
"priority_screenshot_type": priority_screenshot_type,
"priority_screenshot_received_at": priority_screenshot_received_at,
"has_active_priority_screenshot": has_active_priority,
"screenshot_url": screenshot_url,
"log_counts_24h": {
"error": log_counts["ERROR"],
"warn": log_counts["WARN"],
"info": log_counts["INFO"],
"debug": log_counts["DEBUG"],
},
"latest_log": _serialize_log_entry(latest_log),
"latest_error": _serialize_log_entry(latest_error),
})
summary_counts["total_clients"] += 1
summary_counts["error_logs"] += log_counts["ERROR"]
summary_counts["warn_logs"] += log_counts["WARN"]
if has_active_priority:
summary_counts["active_priority_screenshots"] += 1
if is_alive:
summary_counts["online_clients"] += 1
else:
summary_counts["offline_clients"] += 1
if status == "healthy":
summary_counts["healthy_clients"] += 1
elif status == "warning":
summary_counts["warning_clients"] += 1
elif status == "critical":
summary_counts["critical_clients"] += 1
payload = {
"summary": summary_counts,
"period_hours": hours,
"grace_period_seconds": _grace_period_seconds(),
"since": since.isoformat(),
"timestamp": datetime.now(timezone.utc).isoformat(),
"clients": clients_payload,
}
session.close()
return jsonify(dict_to_camel_case(payload))
except Exception as e:
session.close()
return jsonify({"error": f"Server error: {str(e)}"}), 500
@client_logs_bp.route("/recent-errors", methods=["GET"])
@admin_or_higher
def get_recent_errors():
"""
Get recent ERROR logs across all clients
Query params:
- limit: number of entries (default 20, max 100)
Useful for system-wide error monitoring
"""
session = Session()
try:
limit = min(int(request.args.get('limit', 20)), 100)
# Get recent errors from all clients
logs = session.query(ClientLog).filter_by(
level=LogLevel.ERROR
).order_by(
desc(ClientLog.timestamp)
).limit(limit).all()
result = []
for log in logs:
result.append(_serialize_log_entry(log, include_client_uuid=True))
session.close()
return jsonify({
"errors": result,
"count": len(result)
})
except Exception as e:
session.close()
return jsonify({"error": f"Server error: {str(e)}"}), 500

View File

@@ -4,10 +4,58 @@ from flask import Blueprint, request, jsonify
from server.permissions import admin_or_higher
from server.mqtt_helper import publish_client_group, delete_client_group_message, publish_multiple_client_groups
import sys
import os
import glob
import base64
import hashlib
import json
from datetime import datetime, timezone
sys.path.append('/workspace')
clients_bp = Blueprint("clients", __name__, url_prefix="/api/clients")
VALID_SCREENSHOT_TYPES = {"periodic", "event_start", "event_stop"}
def _normalize_screenshot_type(raw_type):
if raw_type is None:
return "periodic"
normalized = str(raw_type).strip().lower()
if normalized in VALID_SCREENSHOT_TYPES:
return normalized
return "periodic"
def _parse_screenshot_timestamp(raw_timestamp):
if raw_timestamp is None:
return None
try:
if isinstance(raw_timestamp, (int, float)):
ts_value = float(raw_timestamp)
if ts_value > 1e12:
ts_value = ts_value / 1000.0
return datetime.fromtimestamp(ts_value, timezone.utc)
if isinstance(raw_timestamp, str):
ts = raw_timestamp.strip()
if not ts:
return None
if ts.isdigit():
ts_value = float(ts)
if ts_value > 1e12:
ts_value = ts_value / 1000.0
return datetime.fromtimestamp(ts_value, timezone.utc)
ts_normalized = ts.replace("Z", "+00:00") if ts.endswith("Z") else ts
parsed = datetime.fromisoformat(ts_normalized)
if parsed.tzinfo is None:
return parsed.replace(tzinfo=timezone.utc)
return parsed.astimezone(timezone.utc)
except Exception:
return None
return None
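# Accepted shapes (all normalize to timezone-aware UTC datetimes):
#     _parse_screenshot_timestamp(1767000000)              # epoch seconds
#     _parse_screenshot_timestamp(1767000000000)           # epoch milliseconds (> 1e12)
#     _parse_screenshot_timestamp("2026-03-09T21:08:38Z")  # ISO 8601, 'Z' or offset
# Unparseable values return None; upload_screenshot then falls back to "now".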
@clients_bp.route("/sync-all-groups", methods=["POST"])
@admin_or_higher
@@ -281,24 +329,24 @@ def upload_screenshot(uuid):
Screenshots are stored as {uuid}.jpg in the screenshots folder.
Keeps last 20 screenshots per client (auto-cleanup).
"""
-import os
-import base64
-import glob
-from datetime import datetime
session = Session()
client = session.query(Client).filter_by(uuid=uuid).first()
if not client:
session.close()
return jsonify({"error": "Client nicht gefunden"}), 404
session.close()
try:
screenshot_timestamp = None
screenshot_type = "periodic"
# Handle JSON payload with base64-encoded image
if request.is_json:
data = request.get_json()
if "image" not in data:
return jsonify({"error": "Missing 'image' field in JSON payload"}), 400
screenshot_timestamp = _parse_screenshot_timestamp(data.get("timestamp"))
screenshot_type = _normalize_screenshot_type(data.get("screenshot_type") or data.get("screenshotType"))
# Decode base64 image
image_data = base64.b64decode(data["image"])
@@ -314,8 +362,9 @@ def upload_screenshot(uuid):
os.makedirs(screenshots_dir, exist_ok=True)
# Store screenshot with timestamp to track latest
-timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
-filename = f"{uuid}_{timestamp}.jpg"
now_utc = screenshot_timestamp or datetime.now(timezone.utc)
timestamp = now_utc.strftime("%Y%m%d_%H%M%S_%f")
filename = f"{uuid}_{timestamp}_{screenshot_type}.jpg"
filepath = os.path.join(screenshots_dir, filename)
with open(filepath, "wb") as f:
@@ -326,9 +375,42 @@ def upload_screenshot(uuid):
with open(latest_filepath, "wb") as f:
f.write(image_data)
# Keep a dedicated copy for high-priority event screenshots.
if screenshot_type in ("event_start", "event_stop"):
priority_filepath = os.path.join(screenshots_dir, f"{uuid}_priority.jpg")
with open(priority_filepath, "wb") as f:
f.write(image_data)
metadata_path = os.path.join(screenshots_dir, f"{uuid}_meta.json")
metadata = {}
if os.path.exists(metadata_path):
try:
with open(metadata_path, "r", encoding="utf-8") as meta_file:
metadata = json.load(meta_file)
except Exception:
metadata = {}
metadata.update({
"latest_screenshot_type": screenshot_type,
"latest_received_at": now_utc.isoformat(),
})
if screenshot_type in ("event_start", "event_stop"):
metadata["last_priority_screenshot_type"] = screenshot_type
metadata["last_priority_received_at"] = now_utc.isoformat()
with open(metadata_path, "w", encoding="utf-8") as meta_file:
json.dump(metadata, meta_file)
# Update screenshot receive timestamp for monitoring dashboard
client.last_screenshot_analyzed = now_utc
client.last_screenshot_hash = hashlib.md5(image_data).hexdigest()
session.commit()
# Cleanup: keep only last 20 timestamped screenshots per client
pattern = os.path.join(screenshots_dir, f"{uuid}_*.jpg")
-existing_screenshots = sorted(glob.glob(pattern))
existing_screenshots = sorted(
[path for path in glob.glob(pattern) if not path.endswith("_priority.jpg")]
)
# Keep last 20, delete older ones
max_screenshots = 20
@@ -345,11 +427,15 @@ def upload_screenshot(uuid):
"success": True,
"message": f"Screenshot received for client {uuid}",
"filename": filename,
"size": len(image_data)
"size": len(image_data),
"screenshot_type": screenshot_type,
}), 200
except Exception as e:
session.rollback()
return jsonify({"error": f"Failed to process screenshot: {str(e)}"}), 500
finally:
session.close()
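# Illustrative upload (a sketch; the host and exact route path are assumptions,
# the JSON fields match the parsing above):
#     requests.post(
#         f"https://infoscreen.example/api/clients/{uuid}/screenshot",
#         json={
#             "image": base64.b64encode(jpeg_bytes).decode("ascii"),
#             "timestamp": "2026-03-09T21:08:38Z",
#             "screenshot_type": "event_start",  # or "event_stop" / "periodic"
#         },
#     )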
@clients_bp.route("/<uuid>", methods=["DELETE"])

View File

@@ -104,6 +104,9 @@ def get_events():
"end_time": e.end.isoformat() if e.end else None,
"is_all_day": False,
"media_id": e.event_media_id,
"slideshow_interval": e.slideshow_interval,
"page_progress": e.page_progress,
"auto_progress": e.auto_progress,
"type": e.event_type.value if e.event_type else None,
"icon": get_icon_for_type(e.event_type.value if e.event_type else None),
# Recurrence metadata
@@ -267,6 +270,8 @@ def detach_event_occurrence(event_id, occurrence_date):
'event_type': master.event_type,
'event_media_id': master.event_media_id,
'slideshow_interval': getattr(master, 'slideshow_interval', None),
'page_progress': getattr(master, 'page_progress', None),
'auto_progress': getattr(master, 'auto_progress', None),
'created_by': master.created_by,
}
@@ -318,6 +323,8 @@ def detach_event_occurrence(event_id, occurrence_date):
event_type=master_data['event_type'],
event_media_id=master_data['event_media_id'],
slideshow_interval=master_data['slideshow_interval'],
page_progress=data.get("page_progress", master_data['page_progress']),
auto_progress=data.get("auto_progress", master_data['auto_progress']),
recurrence_rule=None,
recurrence_end=None,
skip_holidays=False,
@@ -361,11 +368,15 @@ def create_event():
event_type = data["event_type"]
event_media_id = None
slideshow_interval = None
page_progress = None
auto_progress = None
# Presentation: adopt event_media_id, slideshow_interval, page_progress, auto_progress
if event_type == "presentation":
event_media_id = data.get("event_media_id")
slideshow_interval = data.get("slideshow_interval")
page_progress = data.get("page_progress")
auto_progress = data.get("auto_progress")
if not event_media_id:
return jsonify({"error": "event_media_id required for presentation"}), 400
@@ -443,6 +454,8 @@ def create_event():
is_active=True,
event_media_id=event_media_id,
slideshow_interval=slideshow_interval,
page_progress=page_progress,
auto_progress=auto_progress,
autoplay=autoplay,
loop=loop,
volume=volume,
@@ -519,6 +532,10 @@ def update_event(event_id):
event.event_type = data.get("event_type", event.event_type)
event.event_media_id = data.get("event_media_id", event.event_media_id)
event.slideshow_interval = data.get("slideshow_interval", event.slideshow_interval)
if "page_progress" in data:
event.page_progress = data.get("page_progress")
if "auto_progress" in data:
event.auto_progress = data.get("auto_progress")
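# Note: the "key in data" checks let callers clear page_progress/auto_progress
# explicitly by sending null, while omitting the keys preserves stored values.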
# Video-specific fields
if "autoplay" in data:
event.autoplay = data.get("autoplay")

View File

@@ -8,6 +8,7 @@ from server.routes.holidays import holidays_bp
from server.routes.academic_periods import academic_periods_bp
from server.routes.groups import groups_bp
from server.routes.clients import clients_bp
from server.routes.client_logs import client_logs_bp
from server.routes.auth import auth_bp
from server.routes.users import users_bp
from server.routes.system_settings import system_settings_bp
@@ -46,6 +47,7 @@ else:
app.register_blueprint(auth_bp)
app.register_blueprint(users_bp)
app.register_blueprint(clients_bp)
app.register_blueprint(client_logs_bp)
app.register_blueprint(groups_bp)
app.register_blueprint(events_bp)
app.register_blueprint(event_exceptions_bp)
@@ -66,13 +68,31 @@ def index():
return "Hello from InfoscreenAPI!"
@app.route("/screenshots/<uuid>/priority")
def get_priority_screenshot(uuid):
normalized_uuid = uuid[:-4] if uuid.lower().endswith('.jpg') else uuid
priority_filename = f"{normalized_uuid}_priority.jpg"
priority_path = os.path.join("screenshots", priority_filename)
if os.path.exists(priority_path):
return send_from_directory("screenshots", priority_filename)
return get_screenshot(uuid)
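# Resolution order for /screenshots/<uuid>/priority:
#   1. serve screenshots/<uuid>_priority.jpg if a priority copy exists
#   2. otherwise fall through to the regular latest-screenshot lookup below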
@app.route("/screenshots/<uuid>")
@app.route("/screenshots/<uuid>.jpg")
def get_screenshot(uuid):
pattern = os.path.join("screenshots", f"{uuid}*.jpg")
normalized_uuid = uuid[:-4] if uuid.lower().endswith('.jpg') else uuid
latest_filename = f"{normalized_uuid}.jpg"
latest_path = os.path.join("screenshots", latest_filename)
if os.path.exists(latest_path):
return send_from_directory("screenshots", latest_filename)
pattern = os.path.join("screenshots", f"{normalized_uuid}_*.jpg")
# Exclude the dedicated priority copy so it cannot shadow newer periodic shots
files = [path for path in glob.glob(pattern) if not path.endswith("_priority.jpg")]
if not files:
# No screenshot found: return an error with a placeholder image URL
return jsonify({"error": "Screenshot not found", "dummy": "https://placehold.co/400x300?text=No+Screenshot"}), 404
files.sort(reverse=True)
filename = os.path.basename(files[0])
return send_from_directory("screenshots", filename)