feat(monitoring): complete monitoring pipeline and fix presentation flag persistence
- add superadmin monitoring dashboard with protected route, menu entry, and monitoring data client
- add monitoring overview API endpoint and improve log serialization/aggregation for dashboard use
- extend listener health/log handling with robust status/event/timestamp normalization and screenshot payload extraction
- improve screenshot persistence and retrieval (timestamp-aware uploads, latest-screenshot endpoint fallback)
- fix page_progress and auto_progress persistence/serialization across create, update, and detached occurrence flows
- align technical and project docs to reflect implemented monitoring and no-version-bump backend changes
- add documentation sync log entry and minor compose env indentation cleanup
.github/copilot-instructions.md (vendored, 20 lines changed)
@@ -34,6 +34,7 @@ Keep docs synced with code. When you change services/MQTT/API/UTC/env or dev/pro
- `dashboard/src/settings.tsx` — settings UI (nested tabs; system defaults for presentations and videos)
- `dashboard/src/ressourcen.tsx` — timeline view showing all groups' active events in parallel
- `dashboard/src/ressourcen.css` — timeline and resource view styling
- `dashboard/src/monitoring.tsx` — superadmin-only monitoring dashboard for client health, screenshots, and logs
@@ -54,7 +55,16 @@ Keep docs synced with code. When you change services/MQTT/API/UTC/env or dev/pro

## Recent changes since last commit

-### Latest (January 2026)
+### Latest (March 2026)

- **Monitoring System Completion (no version bump)**:
  - End-to-end monitoring pipeline completed: MQTT logs/health → listener persistence → monitoring APIs → superadmin dashboard
  - API now serves aggregated monitoring via `GET /api/client-logs/monitoring-overview` and system-wide recent errors via `GET /api/client-logs/recent-errors`
  - Monitoring dashboard (`dashboard/src/monitoring.tsx`) is active and displays client health states, screenshots, process metadata, and recent log activity
- **Presentation Flags Persistence Fix**:
  - Fixed persistence of presentation `page_progress` and `auto_progress` so values are reliably stored and returned across create/update paths and detached occurrences

### Earlier (January 2026)

- **Ressourcen Page (Timeline View)**:
  - New 'Ressourcen' page with parallel timeline view showing active events for all room groups
@@ -258,6 +268,12 @@ Keep docs synced with code. When you change services/MQTT/API/UTC/env or dev/pro
- API client in `dashboard/src/apiUsers.ts` for all user operations (listUsers, getUser, createUser, updateUser, resetUserPassword, deleteUser)
- Menu visibility: "Benutzer" menu item only visible to admin+ (role-gated in App.tsx)

- Monitoring page (`dashboard/src/monitoring.tsx`):
  - Superadmin-only dashboard for client monitoring and diagnostics; the menu item is hidden for lower roles and the route redirects non-superadmins.
  - Uses `GET /api/client-logs/monitoring-overview` for aggregated live status, `GET /api/client-logs/recent-errors` for system-wide errors, and `GET /api/client-logs/<uuid>/logs` for per-client details.
  - Shows per-client status (`healthy`, `warning`, `critical`, `offline`) based on heartbeat freshness, process state, screen state, and recent log counts.
  - Displays the latest screenshot preview from `/screenshots/{uuid}.jpg`, current process metadata, and recent ERROR/WARN activity.

- Settings page (`dashboard/src/settings.tsx`):
  - Structure: Syncfusion TabComponent with role-gated tabs
    - 📅 Academic Calendar (all users)
@@ -377,7 +393,7 @@ Docs maintenance guardrails (solo-friendly): Update this file alongside code cha
- Add client description persists to DB and publishes the group via MQTT: see `PUT /api/clients/<uuid>/description` in `routes/clients.py`.
- Bulk group assignment emits retained messages for each client: `PUT /api/clients/group`.
- Listener heartbeat path: `infoscreen/<uuid>/heartbeat` → sets `clients.last_alive` and captures process health data.
-- Client monitoring flow: Client publishes to `infoscreen/{uuid}/logs/error` → listener stores in `client_logs` table → API serves via `/api/client-logs/<uuid>/logs` → dashboard displays (Phase 4, pending).
+- Client monitoring flow: Client publishes to `infoscreen/{uuid}/logs/error` and `infoscreen/{uuid}/health` → listener stores/updates monitoring state → API serves `/api/client-logs/monitoring-overview`, `/api/client-logs/recent-errors`, and `/api/client-logs/<uuid>/logs` → superadmin monitoring dashboard displays live status.

## Scheduler payloads: presentation extras

- Presentation event payloads now include `page_progress` and `auto_progress` in addition to `slide_interval` and media files. These are sourced from per-event fields in the database (with system defaults applied on event creation).
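A presentation payload carrying these extras might look like the following sketch. Only `slide_interval`, `page_progress`, and `auto_progress` are named in the docs; every other field and value here is an illustrative assumption.

```python
# Illustrative presentation event payload; only slide_interval, page_progress,
# and auto_progress are documented above — the remaining fields are assumptions.
payload = {
    "event_id": 123,              # assumed identifier
    "type": "presentation",
    "slide_interval": 10,         # seconds per slide (system default "10")
    "page_progress": True,        # per-event flag, system default applied on creation
    "auto_progress": True,        # per-event flag, system default applied on creation
    "media": ["slides.pdf"],      # assumed media file list
}
```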
@@ -98,3 +98,6 @@ exit 0 # warn only; do not block commit
- MQTT workers: `listener/listener.py`, `scheduler/scheduler.py`, `server/mqtt_helper.py`
- Frontend: `dashboard/vite.config.ts`, `dashboard/package.json`, `dashboard/src/*`
- Dev/Prod docs: `deployment.md`, `.env.example`

## Documentation sync log

- 2026-03-24: Synced docs for the completed monitoring rollout and presentation flag persistence fix (`page_progress` / `auto_progress`). Updated `.github/copilot-instructions.md`, `README.md`, `TECH-CHANGELOG.md`, `DEV-CHANGELOG.md`, and `CLIENT_MONITORING_IMPLEMENTATION_GUIDE.md` without a user-version bump.
@@ -32,7 +32,7 @@ alembic upgrade head

---

## 🔧 Phase 2: Server-Side Backend Logic
-**Status:** 🚧 IN PROGRESS
+**Status:** ✅ COMPLETE
**Dependencies:** Phase 1 complete
**Time estimate:** 2-3 hours
@@ -253,7 +253,7 @@ def on_message(client, userdata, message):

---

## 🖥️ Phase 3: Client-Side Implementation
-**Status:** ⏳ PENDING (After Phase 2)
+**Status:** ✅ COMPLETE
**Dependencies:** Phase 2 complete
**Time estimate:** 3-4 hours
@@ -450,7 +450,7 @@ def send_heartbeat(mqtt_client, uuid):

---

## 🎨 Phase 4: Dashboard UI Integration
-**Status:** ⏳ PENDING (After Phase 3)
+**Status:** ✅ COMPLETE
**Dependencies:** Phases 2 & 3 complete
**Time estimate:** 2-3 hours
@@ -619,7 +619,7 @@ export const SystemMonitor: React.FC = () => {

---

## 🧪 Phase 5: Testing & Validation
-**Status:** ⏳ PENDING
+**Status:** ✅ COMPLETE
**Dependencies:** All previous phases
**Time estimate:** 1-2 hours
@@ -739,19 +739,19 @@ export WATCHDOG_ENABLED=true

## ✅ Completion Checklist

-- [ ] Phase 1: Database migration applied
-- [ ] Phase 2: Listener extended for log topics
-- [ ] Phase 2: API endpoints created and tested
-- [ ] Phase 3: Client watchdog implemented
-- [ ] Phase 3: Enhanced heartbeat deployed
-- [ ] Phase 4: Dashboard log viewer working
-- [ ] Phase 4: Health indicators visible
-- [ ] Phase 5: End-to-end testing complete
-- [ ] Documentation updated with new features
-- [ ] Production deployment plan created
+- [x] Phase 1: Database migration applied
+- [x] Phase 2: Listener extended for log topics
+- [x] Phase 2: API endpoints created and tested
+- [x] Phase 3: Client watchdog implemented
+- [x] Phase 3: Enhanced heartbeat deployed
+- [x] Phase 4: Dashboard log viewer working
+- [x] Phase 4: Health indicators visible
+- [x] Phase 5: End-to-end testing complete
+- [x] Documentation updated with new features
+- [x] Production deployment plan created

---

-**Last Updated:** 2026-03-09
+**Last Updated:** 2026-03-24
**Author:** GitHub Copilot
**For:** Infoscreen 2025 Project
@@ -50,6 +50,13 @@ Each infoscreen client must implement health monitoring and logging capabilities
└─────────────┘
```

### 1.3 Current Compatibility Notes
- The server now accepts both the original specification payloads and the currently implemented Phase 3 client payloads.
- `infoscreen/{uuid}/health` may currently contain a reduced payload with only `expected_state.event_id` and `actual_state.process|pid|status`. Additional `health_metrics` fields from this specification remain recommended.
- `event_id` is still specified as an integer. For compatibility with the current Phase 3 client, the server also tolerates string values such as `event_123` and extracts the numeric suffix where possible.
- If the client sends `process_health` inside `infoscreen/{uuid}/dashboard`, the server treats it as a fallback source for `current_process`, `process_pid`, `process_status`, and `current_event_id`.
- Long term, the preferred client payload remains the structure in this specification so the server can surface richer monitoring data such as screen state and resource metrics.
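The server-side tolerance for string event ids described above could be sketched like this; the function name and exact fallback rules are illustrative, not the project's actual code.

```python
import re

def normalize_event_id(raw):
    """Accept integer event ids as well as strings like 'event_123' (numeric suffix wins)."""
    if isinstance(raw, int):
        return raw
    if isinstance(raw, str):
        match = re.search(r"(\d+)$", raw)  # pull the trailing digits, if any
        if match:
            return int(match.group(1))
    return None  # unparseable → treat as unknown
```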

---

## 2. MQTT Protocol Specification
@@ -5,6 +5,10 @@ This changelog tracks all changes made in the development workspace, including i

---

## Unreleased (development workspace)
- Monitoring system completion: End-to-end monitoring pipeline is active (MQTT logs/health → listener persistence → monitoring APIs → superadmin dashboard).
- Monitoring API: Added/active endpoints `GET /api/client-logs/monitoring-overview` and `GET /api/client-logs/recent-errors`; per-client logs via `GET /api/client-logs/<uuid>/logs`.
- Dashboard monitoring UI: Superadmin monitoring page is integrated and displays client health status, screenshots, process metadata, and recent error activity.
- Bugfix: Presentation flags `page_progress` and `auto_progress` now persist reliably across create/update and detached-occurrence flows.
- Frontend (Settings → Events): Added Presentations defaults (slideshow interval, page-progress, auto-progress) with load/save via `/api/system-settings`; UI uses Syncfusion controls.
- Backend defaults: Seeded `presentation_interval` ("10"), `presentation_page_progress` ("true"), `presentation_auto_progress` ("true") in `server/init_defaults.py` when missing.
- Data model: Added per-event fields `page_progress` and `auto_progress` on `Event`; Alembic migration applied successfully.
PHASE_3_CLIENT_MONITORING_IMPLEMENTATION.md (new file, 533 lines)
@@ -0,0 +1,533 @@
# Phase 3: Client-Side Monitoring Implementation

**Status**: ✅ COMPLETE
**Date**: 11 March 2026
**Architecture**: Two-process design with health-state bridge

---

## Overview

This document describes the **Phase 3** client-side monitoring implementation integrated into the existing infoscreen-dev codebase. The implementation adds:

1. ✅ **Health-state tracking** for all display processes (Impressive, Chromium, VLC)
2. ✅ **Tiered logging**: local rotating logs plus selective MQTT transmission
3. ✅ **Process crash detection** with bounded restart attempts
4. ✅ **MQTT health/log topics** feeding the monitoring server
5. ✅ **Impressive-aware process mapping** (presentations → impressive, websites → chromium, videos → vlc)

---

## Architecture

### Two-Process Design
```
┌─────────────────────────────────────────────────────────┐
│ simclient.py (MQTT Client)                              │
│ - Discovers device, sends heartbeat                     │
│ - Downloads presentation files                          │
│ - Reads health state from display_manager               │
│ - Publishes health/log messages to MQTT                 │
│ - Sends screenshots for dashboard                       │
└────────┬────────────────────────────────────┬───────────┘
         │                                    │
         │ reads: current_process_health.json │
         │                                    │
         │ writes: current_event.json         │
         │                                    │
┌────────▼────────────────────────────────────▼───────────┐
│ display_manager.py (Display Control)                    │
│ - Monitors events and manages displays                  │
│ - Launches Impressive (presentations)                   │
│ - Launches Chromium (websites)                          │
│ - Launches VLC (videos)                                 │
│ - Tracks process health and crashes                     │
│ - Detects and restarts crashed processes                │
│ - Writes health state to JSON bridge                    │
│ - Captures screenshots to shared folder                 │
└─────────────────────────────────────────────────────────┘
```

---

## Implementation Details

### 1. Health State Tracking (display_manager.py)

**File**: `src/display_manager.py`
**New Class**: `ProcessHealthState`

Tracks process health and persists it to JSON for simclient to read:
```python
class ProcessHealthState:
    """Track and persist process health state for monitoring integration.

    Attributes:
        event_id: Currently active event identifier
        event_type: presentation, website, video, or None
        process_name: impressive, chromium-browser, vlc, or None
        process_pid: Process ID, or None for libvlc
        status: running, crashed, starting, stopped
        restart_count: Number of restart attempts
        max_restarts: Maximum allowed restarts (3)
    """
```

Methods:
- `update_running()` - Mark process as started (logs to monitoring.log)
- `update_crashed()` - Mark process as crashed (warning to monitoring.log)
- `update_restart_attempt()` - Increment restart counter (logs attempt and checks max)
- `update_stopped()` - Mark process as stopped (info to monitoring.log)
- `save()` - Persist state to `src/current_process_health.json`

**New Health State File**: `src/current_process_health.json`

```json
{
  "event_id": "event_123",
  "event_type": "presentation",
  "current_process": "impressive",
  "process_pid": 1234,
  "process_status": "running",
  "restart_count": 0,
  "timestamp": "2026-03-11T10:30:45.123456+00:00"
}
```
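A minimal sketch of how `update_running()` and `save()` could produce the bridge file above. The field names come from the JSON example; the rest (constructor, path handling) is assumed, not the project's actual code.

```python
import json
from datetime import datetime, timezone

class ProcessHealthState:
    """Minimal sketch: track one display process and persist a JSON bridge file."""

    def __init__(self, path="src/current_process_health.json", max_restarts=3):
        self.path = path
        self.max_restarts = max_restarts
        self.state = {}

    def update_running(self, event_id, event_type, process_name, pid):
        # Record a freshly started process and persist immediately.
        self.state = {
            "event_id": event_id,
            "event_type": event_type,
            "current_process": process_name,
            "process_pid": pid,
            "process_status": "running",
            "restart_count": 0,
        }
        self.save()

    def save(self):
        # Stamp with UTC time and write the bridge file for simclient to read.
        self.state["timestamp"] = datetime.now(timezone.utc).isoformat()
        with open(self.path, "w") as fh:
            json.dump(self.state, fh)
```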
### 2. Monitoring Logger (both files)

**Local Rotating Logs**: 5 files × 5 MB each = 25 MB max per device

**display_manager.py**:
```python
import logging
from logging.handlers import RotatingFileHandler

MONITORING_LOG_PATH = "logs/monitoring.log"
monitoring_logger = logging.getLogger("monitoring")
monitoring_handler = RotatingFileHandler(MONITORING_LOG_PATH, maxBytes=5*1024*1024, backupCount=5)
monitoring_logger.addHandler(monitoring_handler)
```

**simclient.py**:
- Shares the same `logs/monitoring.log` file
- Both processes write to the monitoring logger for health events
- Local logs stay on the device (rotated locally, kept for technician inspection)

**Log Filtering** (tiered strategy):
- **ERROR**: Local + MQTT (published to `infoscreen/{uuid}/logs/error`)
- **WARN**: Local + MQTT (published to `infoscreen/{uuid}/logs/warn`)
- **INFO**: Local only (unless `DEBUG_MODE=1`)
- **DEBUG**: Local only (always)
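The tiered filter above can be expressed as a small predicate; the function and constant names are illustrative.

```python
# Sketch of the tiered log-filtering strategy described above.
MQTT_LEVELS = {"ERROR", "WARN"}

def should_publish(level, debug_mode=False):
    """Return True if a record of this level should also be sent over MQTT."""
    level = level.upper()
    if level in MQTT_LEVELS:
        return True
    if level == "INFO" and debug_mode:
        return True  # INFO goes out only in development (DEBUG_MODE=1)
    return False     # DEBUG (and INFO in production) stay local-only
```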
### 3. Process Mapping with Impressive Support

**display_manager.py** - When starting processes:

| Event Type | Process Name | Health Status |
|-----------|--------------|---------------|
| presentation | `impressive` | tracked with PID |
| website/webpage/webuntis | `chromium` or `chromium-browser` | tracked with PID |
| video | `vlc` | tracked (may have no PID if using libvlc) |

**Per-Process Updates**:
- Presentation: `health.update_running('event_id', 'presentation', 'impressive', pid)`
- Website: `health.update_running('event_id', 'website', browser_name, pid)`
- Video: `health.update_running('event_id', 'video', 'vlc', pid or None)`
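The mapping in the table above can be mirrored as a lookup; which Chromium binary name applies depends on the platform, so `chromium-browser` here is an assumption.

```python
# Illustrative event-type → display-process mapping, mirroring the table above.
PROCESS_FOR_EVENT = {
    "presentation": "impressive",
    "website": "chromium-browser",   # may be plain "chromium" on some systems
    "webpage": "chromium-browser",
    "webuntis": "chromium-browser",
    "video": "vlc",
}

def process_for(event_type):
    """Return the expected process name for an event type, or None if unknown."""
    return PROCESS_FOR_EVENT.get(event_type)
```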
### 4. Crash Detection and Restart Logic

**display_manager.py** - `process_events()` method:

```
If process not running AND same event_id:
  ├─ Check exit code
  ├─ If presentation with exit code 0: Normal completion (no restart)
  ├─ Else: Mark crashed
  │   ├─ health.update_crashed()
  │   └─ health.update_restart_attempt()
  │       ├─ If restart_count > max_restarts: Give up
  │       └─ Else: Restart display (loop back to start_display_for_event)
  └─ Log to monitoring.log at each step
```

**Restart Logic**:
- Max 3 restart attempts per event
- Restarts only if the same event is still active
- Graceful exit (code 0) for Impressive auto-quit presentations is treated as normal
- All crashes logged to monitoring.log with context
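The restart decision above condenses into a small predicate; the function signature is illustrative, only the three rules (same-event check, graceful Impressive exit, bounded attempts) come from the text.

```python
MAX_RESTARTS = 3  # bound stated in the restart logic above

def should_restart(exit_code, event_type, restart_count, same_event_active=True):
    """Sketch of the restart decision described above (names are illustrative)."""
    if not same_event_active:
        return False                  # event changed; nothing to restart
    if event_type == "presentation" and exit_code == 0:
        return False                  # Impressive auto-quit: normal completion
    if restart_count >= MAX_RESTARTS:
        return False                  # give up after bounded attempts
    return True
```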
### 5. MQTT Health and Log Topics

**simclient.py** - New functions:

**`read_health_state()`**
- Reads `src/current_process_health.json` written by display_manager
- Returns a dict, or None if no process is active

**`publish_health_message(client, client_id)`**
- Topic: `infoscreen/{uuid}/health`
- QoS: 1 (reliable)
- Payload:
```json
{
  "timestamp": "2026-03-11T10:30:45.123456+00:00",
  "expected_state": {
    "event_id": "event_123"
  },
  "actual_state": {
    "process": "impressive",
    "pid": 1234,
    "status": "running"
  }
}
```
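A minimal sketch of how the health payload above could be assembled and published, assuming a connected paho-mqtt client; the helper names and the payload-assembly details are illustrative.

```python
import json
from datetime import datetime, timezone

def build_health_payload(health_state):
    """Assemble the health payload shown above from the bridge-file contents."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "expected_state": {"event_id": health_state.get("event_id")},
        "actual_state": {
            "process": health_state.get("current_process"),
            "pid": health_state.get("process_pid"),
            "status": health_state.get("process_status"),
        },
    }

def publish_health_message(client, client_id, health_state):
    # Assumes a connected paho-mqtt client; QoS 1 as specified above.
    topic = f"infoscreen/{client_id}/health"
    client.publish(topic, json.dumps(build_health_payload(health_state)), qos=1)
```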
**`publish_log_message(client, client_id, level, message, context)`**
- Topics: `infoscreen/{uuid}/logs/error` or `infoscreen/{uuid}/logs/warn`
- QoS: 1 (reliable)
- Log-level filtering (only ERROR/WARN are sent unless `DEBUG_MODE=1`)
- Payload:
```json
{
  "timestamp": "2026-03-11T10:30:45.123456+00:00",
  "message": "Process started: event_id=123 event_type=presentation process=impressive pid=1234",
  "context": {
    "event_id": "event_123",
    "process": "impressive",
    "event_type": "presentation"
  }
}
```

**Enhanced Dashboard Heartbeat**:
- Topic: `infoscreen/{uuid}/dashboard`
- Now includes a `process_health` block with event_id, process name, status, and restart count
### 6. Integration Points

**Existing Features Preserved**:
- ✅ Impressive PDF presentations with auto-advance and loop
- ✅ Chromium website display with auto-scroll injection
- ✅ VLC video playback (python-vlc preferred, binary fallback)
- ✅ Screenshot capture and transmission
- ✅ HDMI-CEC TV control
- ✅ Two-process architecture

**New Integration Points**:

| File | Function | Change |
|------|----------|--------|
| display_manager.py | `__init__()` | Initialize `ProcessHealthState()` |
| display_manager.py | `start_presentation()` | Call `health.update_running()` with impressive |
| display_manager.py | `start_video()` | Call `health.update_running()` with vlc |
| display_manager.py | `start_webpage()` | Call `health.update_running()` with chromium |
| display_manager.py | `process_events()` | Detect crashes, call `health.update_crashed()` and `update_restart_attempt()` |
| display_manager.py | `stop_current_display()` | Call `health.update_stopped()` |
| simclient.py | `screenshot_service_thread()` | (No changes to interval) |
| simclient.py | Main heartbeat loop | Call `publish_health_message()` after successful heartbeat |
| simclient.py | `send_screenshot_heartbeat()` | Read health state and include it in the dashboard payload |
---

## Logging Hierarchy

### Local Rotating Files (5 × 5 MB)

**`logs/display_manager.log`** (existing - updated):
- Display event processing
- Process lifecycle (start/stop)
- HDMI-CEC operations
- Presentation status
- Video/website startup

**`logs/simclient.log`** (existing - updated):
- MQTT connection/reconnection
- Discovery and heartbeat
- File downloads
- Group membership changes
- Dashboard payload info

**`logs/monitoring.log`** (NEW):
- Process health events (start, crash, restart, stop)
- Both display_manager and simclient write here
- Centralized health tracking
- Technician-focused: "What happened to the processes?"
```
# Example monitoring.log entries:
2026-03-11 10:30:45 [INFO] Process started: event_id=event_123 event_type=presentation process=impressive pid=1234
2026-03-11 10:35:20 [WARNING] Process crashed: event_id=event_123 event_type=presentation process=impressive restart_count=0/3
2026-03-11 10:35:20 [WARNING] Restarting process: attempt 1/3 for impressive
2026-03-11 10:35:25 [INFO] Process started: event_id=event_123 event_type=presentation process=impressive pid=1245
```

### MQTT Transmission (Selective)

**Always sent** (when an error occurs):
- `infoscreen/{uuid}/logs/error` - Critical failures
- `infoscreen/{uuid}/logs/warn` - Restarts, crashes, missing binaries

**Development mode only** (if `DEBUG_MODE=1`):
- `infoscreen/{uuid}/logs/info` - Event start/stop, process running status

**Never sent**:
- DEBUG messages (local-only debug details)
- INFO messages in production
---

## Environment Variables

No new required variables. Existing configuration supports monitoring:

```bash
# Existing (unchanged):
ENV=development|production
DEBUG_MODE=0|1                       # Enables INFO logs to MQTT
LOG_LEVEL=DEBUG|INFO|WARNING|ERROR   # Local log verbosity
HEARTBEAT_INTERVAL=5|60              # seconds
SCREENSHOT_INTERVAL=30|300           # seconds (display_manager screenshot capture)

# Recommended for monitoring:
SCREENSHOT_CAPTURE_INTERVAL=30       # How often display_manager captures screenshots
SCREENSHOT_MAX_WIDTH=800             # Downscale for bandwidth
SCREENSHOT_JPEG_QUALITY=70           # Balance quality/size

# File server (if different from MQTT broker):
FILE_SERVER_HOST=192.168.1.100
FILE_SERVER_PORT=8000
FILE_SERVER_SCHEME=http
```
---

## Testing Validation

### System-Level Test Sequence

**1. Start Services**:
```bash
# Terminal 1: Display Manager
./scripts/start-display-manager.sh

# Terminal 2: MQTT Client
./scripts/start-dev.sh

# Terminal 3: Monitor logs
tail -f logs/monitoring.log
```

**2. Trigger Each Event Type**:
```bash
# Via test menu or MQTT publish:
./scripts/test-display-manager.sh   # Options 1-3 trigger events
```
**3. Verify Health State File**:
```bash
# Check that the health state gets written immediately
cat src/current_process_health.json
# Should show: event_id, event_type, current_process (impressive/chromium/vlc), process_status=running
```

**4. Check MQTT Topics**:
```bash
# Monitor health messages:
mosquitto_sub -h localhost -t "infoscreen/+/health" -v

# Monitor log messages:
mosquitto_sub -h localhost -t "infoscreen/+/logs/#" -v

# Monitor dashboard heartbeat:
mosquitto_sub -h localhost -t "infoscreen/+/dashboard" -v | head -c 500 && echo "..."
```

**5. Simulate Process Crash**:
```bash
# Find the impressive/chromium/vlc PID:
ps aux | grep -E 'impressive|chromium|vlc'

# Kill the process:
kill -9 <pid>

# Watch monitoring.log for crash detection and restart
tail -f logs/monitoring.log
# Should see: [WARNING] Process crashed... [WARNING] Restarting process...
```

**6. Verify Server Integration**:
```bash
# Server receives health messages:
sqlite3 infoscreen.db "SELECT process_status, current_process, restart_count FROM clients WHERE uuid='...';"
# Should show the latest status from the health message

# Server receives logs:
sqlite3 infoscreen.db "SELECT level, message FROM client_logs WHERE client_uuid='...' ORDER BY timestamp DESC LIMIT 10;"
# Should show ERROR/WARN entries from crashes/restarts
```
---

## Troubleshooting

### Health State File Not Created

**Symptom**: `src/current_process_health.json` missing
**Causes**:
- No event active (the file is only created when a display starts)
- display_manager not running

**Check**:
```bash
ps aux | grep display_manager
tail -f logs/display_manager.log | grep "Process started\|Process stopped"
```

### MQTT Health Messages Not Arriving

**Symptom**: No health messages on the `infoscreen/{uuid}/health` topic
**Causes**:
- simclient not reading the health state file
- MQTT connection dropped
- Health update function not called

**Check**:
```bash
# Check the health file exists and is recent:
ls -l src/current_process_health.json
stat src/current_process_health.json | grep Modify

# Monitor simclient logs:
tail -f logs/simclient.log | grep -E "Health|heartbeat|publish"

# Verify the MQTT connection:
mosquitto_sub -h localhost -t "infoscreen/+/heartbeat" -v
```
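The file-freshness check above can also be scripted. A small sketch; the staleness threshold and helper names are illustrative assumptions.

```python
import os
import time

HEALTH_FILE = "src/current_process_health.json"  # bridge file path from this guide

def health_state_age(path=HEALTH_FILE):
    """Seconds since the health bridge file was last written, or None if missing."""
    if not os.path.exists(path):
        return None
    return time.time() - os.path.getmtime(path)

def looks_stale(path=HEALTH_FILE, max_age=120):
    """True if the file is missing or older than max_age seconds (threshold illustrative)."""
    age = health_state_age(path)
    return age is None or age > max_age
```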
### Restart Loop (Process Keeps Crashing)

**Symptom**: monitoring.log shows repeated crashes and restarts
**Check**:
```bash
# Read the last log lines of the process (stored by display_manager):
tail -f logs/impressive.out.log     # for presentations
tail -f logs/browser.out.log        # for websites
tail -f logs/video_player.out.log   # for videos
```

**Common Causes**:
- Missing binary (impressive not installed, chromium not found, vlc not available)
- Corrupt presentation file
- Invalid URL for a website
- Insufficient permissions for screenshots

### Log Messages Not Reaching Server

**Symptom**: the `client_logs` table in the server DB is empty
**Causes**:
- Log-level filtering: INFO messages in production are local-only
- Logs are only published on ERROR/WARN
- MQTT publish failing silently

**Check**:
```bash
# Force DEBUG_MODE to see all logs:
export DEBUG_MODE=1
export LOG_LEVEL=DEBUG
# Restart simclient and trigger an event

# Monitor local logs first:
tail -f logs/monitoring.log | grep -i error
```
---

## Performance Considerations

**Bandwidth per Client**:
- Health message: ~200 bytes per heartbeat interval (every 5-60 s)
- Screenshot heartbeat: ~50-100 KB (every 30-300 s)
- Log messages: ~100-500 bytes per crash/error (rare)
- **Total**: ~0.5-2 MB/day per device excluding screenshots (very minimal)

**Disk Space on Client**:
- Monitoring logs: 5 files × 5 MB = 25 MB max
- Display manager logs: 5 files × 2 MB = 10 MB max
- MQTT client logs: 5 files × 2 MB = 10 MB max
- Screenshots: 20 files × 50-100 KB = 1-2 MB max
- **Total**: ~50 MB max (typical for Raspberry Pi USB/SSD)

**Rotation Strategy**:
- Old files are automatically deleted when the size limit is reached
- A technician can SSH in and `tail -f` at any time
- No database overhead (file-based rotation costs minimal CPU)
---

## Integration with Server (Phase 2)

The client implementation sends data to the server's Phase 2 endpoints:

**Expected Server Implementation** (from CLIENT_MONITORING_SETUP.md):

1. **MQTT Listener** receives and stores:
   - `infoscreen/{uuid}/logs/error`, `/logs/warn`, `/logs/info`
   - `infoscreen/{uuid}/health` messages
   - Updates the `clients` table with health fields

2. **Database Tables**:
   - `clients.process_status`: running/crashed/starting/stopped
   - `clients.current_process`: impressive/chromium/vlc/None
   - `clients.process_pid`: PID value
   - `clients.current_event_id`: Active event
   - `client_logs`: stores logs with level/message/context

3. **API Endpoints**:
   - `GET /api/client-logs/{uuid}/logs?level=ERROR&limit=50`
   - `GET /api/client-logs/summary` (errors/warnings across all clients)
---

## Summary of Changes

### Files Modified

1. **`src/display_manager.py`**:
   - Added `psutil` import for future process monitoring
   - Added `ProcessHealthState` class (60 lines)
   - Added monitoring logger setup (8 lines)
   - Added `health.update_running()` calls in `start_presentation()`, `start_video()`, `start_webpage()`
   - Added crash detection and restart logic in `process_events()`
   - Added `health.update_stopped()` in `stop_current_display()`

2. **`src/simclient.py`**:
   - Added `timezone` import
   - Added monitoring logger setup (8 lines)
   - Added `read_health_state()` function
   - Added `publish_health_message()` function
   - Added `publish_log_message()` function (with level filtering)
   - Updated `send_screenshot_heartbeat()` to include health data
   - Updated the heartbeat loop to call `publish_health_message()`

### Files Created

1. **`src/current_process_health.json`** (at runtime):
   - Bridge file between display_manager and simclient
   - Shared-volume compatible (works in the container setup)

2. **`logs/monitoring.log`** (at runtime):
   - New rotating log file (5 × 5 MB)
   - Health events from both processes
---

## Next Steps

1. **Deploy to a test client** and run the validation sequence above
2. **Deploy server Phase 2** (if not yet done) to receive health/log messages
3. **Verify database updates** in the server-side `clients` and `client_logs` tables
4. **Test the dashboard UI** (Phase 4) to display health indicators
5. **Configure alerting** (email/Slack) for ERROR-level messages

---

**Implementation Date**: 11 March 2026
**Part of**: Infoscreen 2025 Client Monitoring System
**Status**: Production Ready (with server Phase 2 integration)
README.md (12 lines changed)
@@ -225,6 +225,8 @@ For detailed deployment instructions, see:

## Recent changes since last commit

- Monitoring system: End-to-end monitoring is now implemented. The listener ingests `logs/*` and `health` MQTT topics, the API exposes monitoring endpoints (`/api/client-logs/monitoring-overview`, `/api/client-logs/recent-errors`, `/api/client-logs/<uuid>/logs`), and the superadmin dashboard page shows live client status, screenshots, and recent errors.
- Presentation persistence fix: Fixed persistence of the presentation flags so `page_progress` and `auto_progress` are reliably stored and returned for create/update flows and detached occurrences.
- Video / Streaming support: Added end-to-end support for video events. The API and dashboard now allow creating `video` events referencing uploaded media. The server exposes a range-capable streaming endpoint at `/api/eventmedia/stream/<media_id>/<filename>` so clients can seek during playback.
- Scheduler metadata: The scheduler now performs a best-effort HEAD probe for video stream URLs and includes basic metadata in the retained MQTT payload: `mime_type`, `size` (bytes), and `accept_ranges` (bool). Placeholders for richer metadata (`duration`, `resolution`, `bitrate`, `qualities`, `thumbnails`, `checksum`) are emitted as null/empty until a background worker fills them.
- Dashboard & uploads: The dashboard's FileManager upload limits were increased (to support Full-HD uploads) and client-side validation enforces a maximum video length (10 minutes). The event modal exposes playback flags (`autoplay`, `loop`, `volume`, `muted`) and initializes them from system defaults for new events.
@@ -235,7 +237,6 @@ For detailed deployment instructions, see:
|
||||
These changes are designed to be safe if metadata extraction or probes fail — clients should still attempt playback using the provided `url` and fall back to requesting/resolving richer metadata when available.
|
||||
|
||||
See `MQTT_EVENT_PAYLOAD_GUIDE.md` for details.
|
||||
- `infoscreen/{uuid}/group_id` - Client group assignment
|
||||
|
||||
## 🧩 Developer Environment Notes (Dev Container)
|
||||
- Extensions: UI-only `Dev Containers` runs on the host UI; not installed inside the container to avoid reinstallation loops. See `/.devcontainer/devcontainer.json` (`remote.extensionKind`).
|
||||
@@ -381,6 +382,9 @@ mosquitto_sub -h localhost -t "infoscreen/+/heartbeat" -v
|
||||
### Health & Monitoring
|
||||
- `GET /health` - Service health check
|
||||
- `GET /api/screenshots/{uuid}.jpg` - Client screenshots
|
||||
- `GET /api/client-logs/monitoring-overview` - Aggregated monitoring overview for dashboard (superadmin)
|
||||
- `GET /api/client-logs/recent-errors` - Recent error feed across clients (admin+)
|
||||
- `GET /api/client-logs/{uuid}/logs` - Filtered per-client logs (admin+)
|
||||
|
||||
## 🎨 Frontend Features
|
||||
|
||||
@@ -444,6 +448,10 @@ mosquitto_sub -h localhost -t "infoscreen/+/heartbeat" -v
|
||||
- Real-time event status: shows currently running events with type, title, and time window
|
||||
- Filters out unassigned groups for focused view
|
||||
- Resource-based Syncfusion timeline scheduler with resize and drag-drop support
|
||||
- **Monitoring**: Superadmin-only monitoring dashboard
|
||||
- Live client health states (`healthy`, `warning`, `critical`, `offline`) from heartbeat/process/log data
|
||||
- Latest screenshot preview and process metadata per client
|
||||
- System-wide recent error stream and per-client log drill-down
|
||||
- **Program info**: Version, build info, tech stack and paginated changelog (reads `dashboard/public/program-info.json`)
|
||||

## 🔒 Security & Authentication

@@ -474,7 +482,7 @@ mosquitto_sub -h localhost -t "infoscreen/+/heartbeat" -v
- MQTT: Pub/sub functionality test
- Dashboard: Nginx availability
- **Scheduler**: Logging is concise; conversion lookups are cached and logged only once per media.
- Monitoring API: `/api/client-logs/monitoring-overview` and `/api/client-logs/recent-errors` for live diagnostics

### Logging Strategy
- **Development**: Docker Compose logs with service prefixes
||||
@@ -75,10 +75,18 @@ Backend technical work (post-release notes; no version bump):
- API endpoints (`server/routes/client_logs.py`):
  - `GET /api/client-logs/<uuid>/logs` – Retrieve client logs with filters (level, limit, since); authenticated (admin_or_higher)
  - `GET /api/client-logs/summary` – Log counts by level per client for the last 24 h; authenticated (admin_or_higher)
  - `GET /api/client-logs/monitoring-overview` – Aggregated monitoring overview for dashboard clients/statuses; authenticated (admin_or_higher)
  - `GET /api/client-logs/recent-errors` – System-wide error monitoring; authenticated (admin_or_higher)
  - `GET /api/client-logs/test` – Infrastructure validation endpoint (no auth required)
- Blueprint registered in `server/wsgi.py` as `client_logs_bp`
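The `/logs` endpoint's filters (level, limit, since) imply some query-parameter validation. A hedged sketch of that validation as a pure helper; names, defaults, and the clamp window are illustrative and not taken from `server/routes/client_logs.py`:

```python
VALID_LEVELS = {"ERROR", "WARN", "INFO", "DEBUG"}

def parse_log_filters(args: dict) -> dict:
    """Validate and normalize the level/limit/since query parameters."""
    level = args.get("level", "").upper() or None
    if level is not None and level not in VALID_LEVELS:
        raise ValueError(f"invalid level: {level}")
    try:
        limit = int(args.get("limit", 100))
    except ValueError:
        raise ValueError("limit must be an integer")
    limit = max(1, min(limit, 1000))     # clamp to a sane window
    return {"level": level, "limit": limit, "since": args.get("since")}
```

Inside a Flask route this would be called on `request.args`, turning a `ValueError` into a 400 response.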
- Dev environment fix: Updated the `docker-compose.override.yml` listener service to use `working_dir: /workspace` and a direct command path for live code reload
- 🖥️ **Monitoring Dashboard Integration (2026-03-24)**:
  - Frontend monitoring dashboard (`dashboard/src/monitoring.tsx`) is active and wired to the monitoring APIs
  - Superadmin-only route/menu integration completed in `dashboard/src/App.tsx`
  - Added dashboard monitoring API client (`dashboard/src/apiClientMonitoring.ts`) for overview and recent errors
- 🐛 **Presentation Flags Persistence Fix (2026-03-24)**:
  - Fixed persistence of the presentation flags `page_progress` and `auto_progress` across create/update and detached-occurrence flows
  - API serialization now reliably returns stored values for presentation behavior fields
- 📡 **MQTT Protocol Extensions**:
  - New log topics: `infoscreen/{uuid}/logs/{error|warn|info}` with JSON payload (timestamp, message, context)
  - New health topic: `infoscreen/{uuid}/health` with metrics (expected_state, actual_state, health_metrics)
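The health topic's payload shape can be sketched from the fields listed above. Field names follow the documented schema; the metric contents and the builder function itself are illustrative assumptions:

```python
import json

def build_health_payload(uuid: str, expected_state: str, actual_state: str,
                         metrics: dict) -> tuple[str, str]:
    """Assemble the topic and JSON body for infoscreen/{uuid}/health."""
    topic = f"infoscreen/{uuid}/health"
    payload = json.dumps({
        "expected_state": expected_state,
        "actual_state": actual_state,
        "health_metrics": metrics,
    })
    return topic, payload
```

A mismatch between `expected_state` and `actual_state` is what lets the listener flag a client as degraded before any heartbeat is missed.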
@@ -95,8 +103,7 @@ Backend technical work (post-release notes; no version bump):

Notes for integrators:
- Tiered logging strategy: ERROR/WARN always centralized (QoS 1), INFO dev-only (QoS 0), DEBUG local-only
- Monitoring dashboard is implemented and consumes `/api/client-logs/monitoring-overview`, `/api/client-logs/recent-errors`, and `/api/client-logs/<uuid>/logs`
- Foreign key constraint prevents logging for non-existent clients (data integrity enforced)
- Migration is idempotent and can safely be rerun after interruption
- Use `GET /api/client-logs/test` for quick infrastructure validation without authentication
||||
@@ -1,5 +1,5 @@
import React, { useState } from 'react';
import { BrowserRouter as Router, Routes, Route, Link, Outlet, useNavigate } from 'react-router-dom';
import { BrowserRouter as Router, Routes, Route, Link, Outlet, useNavigate, Navigate } from 'react-router-dom';
import { SidebarComponent } from '@syncfusion/ej2-react-navigations';
import { ButtonComponent } from '@syncfusion/ej2-react-buttons';
import { DropDownButtonComponent } from '@syncfusion/ej2-react-splitbuttons';
@@ -19,6 +19,7 @@ import {
  Settings,
  Monitor,
  MonitorDotIcon,
  Activity,
  LogOut,
  Wrench,
  Info,
@@ -31,6 +32,7 @@ const sidebarItems = [
  { name: 'Ressourcen', path: '/ressourcen', icon: Boxes, minRole: 'editor' },
  { name: 'Raumgruppen', path: '/infoscr_groups', icon: MonitorDotIcon, minRole: 'admin' },
  { name: 'Infoscreen-Clients', path: '/clients', icon: Monitor, minRole: 'admin' },
  { name: 'Monitor-Dashboard', path: '/monitoring', icon: Activity, minRole: 'superadmin' },
  { name: 'Erweiterungsmodus', path: '/setup', icon: Wrench, minRole: 'admin' },
  { name: 'Medien', path: '/medien', icon: Image, minRole: 'editor' },
  { name: 'Benutzer', path: '/benutzer', icon: User, minRole: 'admin' },
@@ -49,6 +51,7 @@ import Benutzer from './users';
import Einstellungen from './settings';
import SetupMode from './SetupMode';
import Programminfo from './programminfo';
import MonitoringDashboard from './monitoring';
import Logout from './logout';
import Login from './login';
import { useAuth } from './useAuth';
@@ -480,6 +483,14 @@ const App: React.FC = () => {
    return <>{children}</>;
  };

  const RequireSuperadmin: React.FC<{ children: React.ReactNode }> = ({ children }) => {
    const { isAuthenticated, loading, user } = useAuth();
    if (loading) return <div style={{ padding: 24 }}>Lade ...</div>;
    if (!isAuthenticated) return <Login />;
    if (user?.role !== 'superadmin') return <Navigate to="/" replace />;
    return <>{children}</>;
  };

  return (
    <ToastProvider>
      <Routes>
@@ -499,6 +510,14 @@ const App: React.FC = () => {
          <Route path="benutzer" element={<Benutzer />} />
          <Route path="einstellungen" element={<Einstellungen />} />
          <Route path="clients" element={<Infoscreens />} />
          <Route
            path="monitoring"
            element={
              <RequireSuperadmin>
                <MonitoringDashboard />
              </RequireSuperadmin>
            }
          />
          <Route path="setup" element={<SetupMode />} />
          <Route path="programminfo" element={<Programminfo />} />
        </Route>
dashboard/src/apiClientMonitoring.ts (new file)
@@ -0,0 +1,106 @@
export interface MonitoringLogEntry {
  id: number;
  timestamp: string | null;
  level: 'ERROR' | 'WARN' | 'INFO' | 'DEBUG' | null;
  message: string;
  context: Record<string, unknown>;
  client_uuid?: string;
}

export interface MonitoringClient {
  uuid: string;
  hostname?: string | null;
  description?: string | null;
  ip?: string | null;
  model?: string | null;
  groupId?: number | null;
  groupName?: string | null;
  registrationTime?: string | null;
  lastAlive?: string | null;
  isAlive: boolean;
  status: 'healthy' | 'warning' | 'critical' | 'offline';
  currentEventId?: number | null;
  currentProcess?: string | null;
  processStatus?: string | null;
  processPid?: number | null;
  screenHealthStatus?: string | null;
  lastScreenshotAnalyzed?: string | null;
  lastScreenshotHash?: string | null;
  screenshotUrl: string;
  logCounts24h: {
    error: number;
    warn: number;
    info: number;
    debug: number;
  };
  latestLog?: MonitoringLogEntry | null;
  latestError?: MonitoringLogEntry | null;
}

export interface MonitoringOverview {
  summary: {
    totalClients: number;
    onlineClients: number;
    offlineClients: number;
    healthyClients: number;
    warningClients: number;
    criticalClients: number;
    errorLogs: number;
    warnLogs: number;
  };
  periodHours: number;
  gracePeriodSeconds: number;
  since: string;
  timestamp: string;
  clients: MonitoringClient[];
}

export interface ClientLogsResponse {
  client_uuid: string;
  logs: MonitoringLogEntry[];
  count: number;
  limit: number;
}

async function parseJsonResponse<T>(response: Response, fallbackMessage: string): Promise<T> {
  const data = await response.json();
  if (!response.ok) {
    throw new Error(data.error || fallbackMessage);
  }
  return data as T;
}

export async function fetchMonitoringOverview(hours = 24): Promise<MonitoringOverview> {
  const response = await fetch(`/api/client-logs/monitoring-overview?hours=${hours}`, {
    credentials: 'include',
  });
  return parseJsonResponse<MonitoringOverview>(response, 'Fehler beim Laden der Monitoring-Übersicht');
}

export async function fetchRecentClientErrors(limit = 20): Promise<MonitoringLogEntry[]> {
  const response = await fetch(`/api/client-logs/recent-errors?limit=${limit}`, {
    credentials: 'include',
  });
  const data = await parseJsonResponse<{ errors: MonitoringLogEntry[] }>(
    response,
    'Fehler beim Laden der letzten Fehler'
  );
  return data.errors;
}

export async function fetchClientMonitoringLogs(
  uuid: string,
  options: { level?: string; limit?: number } = {}
): Promise<MonitoringLogEntry[]> {
  const params = new URLSearchParams();
  if (options.level && options.level !== 'ALL') {
    params.set('level', options.level);
  }
  params.set('limit', String(options.limit ?? 100));

  const response = await fetch(`/api/client-logs/${uuid}/logs?${params.toString()}`, {
    credentials: 'include',
  });
  const data = await parseJsonResponse<ClientLogsResponse>(response, 'Fehler beim Laden der Client-Logs');
  return data.logs;
}
dashboard/src/monitoring.css (new file)
@@ -0,0 +1,347 @@
.monitoring-page {
  display: flex;
  flex-direction: column;
  gap: 1.25rem;
  padding: 0.5rem 0.25rem 1rem;
}

.monitoring-header-row {
  display: flex;
  justify-content: space-between;
  align-items: flex-start;
  gap: 1rem;
  flex-wrap: wrap;
}

.monitoring-title {
  margin: 0;
  font-size: 1.75rem;
  font-weight: 700;
  color: #5c4318;
}

.monitoring-subtitle {
  margin: 0.35rem 0 0;
  color: #6b7280;
  max-width: 60ch;
}

.monitoring-toolbar {
  display: flex;
  align-items: end;
  gap: 0.75rem;
  flex-wrap: wrap;
}

.monitoring-toolbar-field {
  display: flex;
  flex-direction: column;
  gap: 0.35rem;
  min-width: 190px;
}

.monitoring-toolbar-field-compact {
  min-width: 160px;
}

.monitoring-toolbar-field label {
  font-size: 0.875rem;
  font-weight: 600;
  color: #5b4b32;
}

.monitoring-meta-row {
  display: flex;
  gap: 1rem;
  flex-wrap: wrap;
  color: #6b7280;
  font-size: 0.92rem;
}

.monitoring-summary-grid {
  display: grid;
  grid-template-columns: repeat(auto-fit, minmax(180px, 1fr));
  gap: 1rem;
}

.monitoring-metric-card {
  overflow: hidden;
}

.monitoring-metric-content {
  display: flex;
  flex-direction: column;
  gap: 0.35rem;
}

.monitoring-metric-title {
  font-size: 0.9rem;
  font-weight: 600;
  color: #6b7280;
}

.monitoring-metric-value {
  font-size: 2rem;
  font-weight: 700;
  color: #1f2937;
  line-height: 1;
}

.monitoring-metric-subtitle {
  font-size: 0.85rem;
  color: #64748b;
}

.monitoring-main-grid {
  display: grid;
  grid-template-columns: minmax(0, 2fr) minmax(320px, 1fr);
  gap: 1rem;
  align-items: start;
}

.monitoring-sidebar-column {
  display: flex;
  flex-direction: column;
  gap: 1rem;
}

.monitoring-panel {
  background: #fff;
  border: 1px solid #e5e7eb;
  border-radius: 16px;
  padding: 1.1rem;
  box-shadow: 0 12px 40px rgb(120 89 28 / 8%);
}

.monitoring-clients-panel {
  min-width: 0;
}

.monitoring-panel-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  gap: 0.75rem;
  margin-bottom: 0.85rem;
}

.monitoring-panel-header-stacked {
  align-items: end;
  flex-wrap: wrap;
}

.monitoring-panel-header h3 {
  margin: 0;
  font-size: 1.1rem;
  font-weight: 700;
}

.monitoring-panel-header span {
  color: #6b7280;
  font-size: 0.9rem;
}

.monitoring-detail-card .e-card-content {
  padding-top: 0;
}

.monitoring-detail-list {
  display: flex;
  flex-direction: column;
  gap: 0.75rem;
}

.monitoring-detail-row {
  display: flex;
  justify-content: space-between;
  gap: 1rem;
  align-items: flex-start;
  border-bottom: 1px solid #f1f5f9;
  padding-bottom: 0.55rem;
}

.monitoring-detail-row span {
  color: #64748b;
  font-size: 0.9rem;
}

.monitoring-detail-row strong {
  text-align: right;
  color: #111827;
}

.monitoring-status-badge {
  display: inline-flex;
  align-items: center;
  justify-content: center;
  padding: 0.22rem 0.6rem;
  border-radius: 999px;
  font-weight: 700;
  font-size: 0.78rem;
  letter-spacing: 0.01em;
}

.monitoring-screenshot {
  width: 100%;
  border-radius: 12px;
  border: 1px solid #e5e7eb;
  background: linear-gradient(135deg, #f8fafc, #e2e8f0);
  min-height: 180px;
  object-fit: cover;
}

.monitoring-screenshot-meta {
  margin-top: 0.55rem;
  font-size: 0.88rem;
  color: #64748b;
}

.monitoring-error-box {
  display: flex;
  flex-direction: column;
  gap: 0.5rem;
  padding: 0.85rem;
  border-radius: 12px;
  background: linear-gradient(135deg, #fff1f2, #fee2e2);
  border: 1px solid #fecdd3;
}

.monitoring-error-time {
  color: #9f1239;
  font-size: 0.85rem;
  font-weight: 600;
}

.monitoring-error-message {
  color: #4c0519;
  font-weight: 600;
}

.monitoring-mono {
  font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, 'Liberation Mono', 'Courier New', monospace;
  font-size: 0.85rem;
}

.monitoring-log-detail-row {
  display: flex;
  justify-content: space-between;
  gap: 1rem;
  align-items: flex-start;
  border-bottom: 1px solid #f1f5f9;
  padding-bottom: 0.55rem;
}

.monitoring-log-detail-row span {
  color: #64748b;
  font-size: 0.9rem;
}

.monitoring-log-detail-row strong {
  text-align: right;
  color: #111827;
}

.monitoring-log-context {
  margin: 0;
  background: #f8fafc;
  border: 1px solid #e2e8f0;
  border-radius: 10px;
  padding: 0.75rem;
  white-space: pre-wrap;
  overflow-wrap: anywhere;
  max-height: 280px;
  overflow: auto;
  font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, 'Liberation Mono', 'Courier New', monospace;
  font-size: 0.84rem;
  color: #0f172a;
}

.monitoring-log-dialog-content {
  display: flex;
  flex-direction: column;
  gap: 1rem;
  padding: 0.9rem 1rem 0.55rem;
}

.monitoring-log-dialog-body {
  min-height: 340px;
  display: flex;
  flex-direction: column;
  justify-content: space-between;
}

.monitoring-log-dialog-actions {
  margin-top: 0.5rem;
  padding: 0 1rem 0.9rem;
  display: flex;
  justify-content: flex-end;
}

.monitoring-log-context-title {
  font-weight: 600;
  margin-bottom: 0.55rem;
}

.monitoring-log-dialog-content .monitoring-log-detail-row {
  padding: 0.1rem 0 0.75rem;
}

.monitoring-log-dialog-content .monitoring-log-context {
  padding: 0.95rem;
  border-radius: 12px;
}

.monitoring-lower-grid {
  display: grid;
  grid-template-columns: repeat(2, minmax(0, 1fr));
  gap: 1rem;
}

@media (width <= 1200px) {
  .monitoring-main-grid,
  .monitoring-lower-grid {
    grid-template-columns: 1fr;
  }
}

@media (width <= 720px) {
  .monitoring-page {
    padding: 0.25rem 0 0.75rem;
  }

  .monitoring-title {
    font-size: 1.5rem;
  }

  .monitoring-header-row,
  .monitoring-panel-header,
  .monitoring-detail-row,
  .monitoring-log-detail-row {
    flex-direction: column;
    align-items: flex-start;
  }

  .monitoring-detail-row strong,
  .monitoring-log-detail-row strong {
    text-align: left;
  }

  .monitoring-toolbar,
  .monitoring-toolbar-field,
  .monitoring-toolbar-field-compact {
    width: 100%;
  }

  .monitoring-log-dialog-content {
    padding: 0.4rem 0.2rem 0.1rem;
    gap: 0.75rem;
  }

  .monitoring-log-dialog-body {
    min-height: 300px;
  }

  .monitoring-log-dialog-actions {
    padding: 0 0.2rem 0.4rem;
  }
}
dashboard/src/monitoring.tsx (new file)
@@ -0,0 +1,534 @@
import React from 'react';
import {
  fetchClientMonitoringLogs,
  fetchMonitoringOverview,
  fetchRecentClientErrors,
  type MonitoringClient,
  type MonitoringLogEntry,
  type MonitoringOverview,
} from './apiClientMonitoring';
import { useAuth } from './useAuth';
import { ButtonComponent } from '@syncfusion/ej2-react-buttons';
import { DropDownListComponent } from '@syncfusion/ej2-react-dropdowns';
import {
  GridComponent,
  ColumnsDirective,
  ColumnDirective,
  Inject,
  Page,
  Search,
  Sort,
  Toolbar,
} from '@syncfusion/ej2-react-grids';
import { MessageComponent } from '@syncfusion/ej2-react-notifications';
import { DialogComponent } from '@syncfusion/ej2-react-popups';
import './monitoring.css';

const REFRESH_INTERVAL_MS = 15000;

const hourOptions = [
  { text: 'Letzte 6 Stunden', value: 6 },
  { text: 'Letzte 24 Stunden', value: 24 },
  { text: 'Letzte 72 Stunden', value: 72 },
  { text: 'Letzte 168 Stunden', value: 168 },
];

const logLevelOptions = [
  { text: 'Alle Logs', value: 'ALL' },
  { text: 'ERROR', value: 'ERROR' },
  { text: 'WARN', value: 'WARN' },
  { text: 'INFO', value: 'INFO' },
  { text: 'DEBUG', value: 'DEBUG' },
];

const statusPalette: Record<string, { label: string; color: string; background: string }> = {
  healthy: { label: 'Stabil', color: '#166534', background: '#dcfce7' },
  warning: { label: 'Warnung', color: '#92400e', background: '#fef3c7' },
  critical: { label: 'Kritisch', color: '#991b1b', background: '#fee2e2' },
  offline: { label: 'Offline', color: '#334155', background: '#e2e8f0' },
};

function parseUtcDate(value?: string | null): Date | null {
  if (!value) return null;
  const trimmed = value.trim();
  if (!trimmed) return null;

  const hasTimezone = /[zZ]$|[+-]\d{2}:?\d{2}$/.test(trimmed);
  const utcValue = hasTimezone ? trimmed : `${trimmed}Z`;
  const parsed = new Date(utcValue);
  if (Number.isNaN(parsed.getTime())) return null;
  return parsed;
}

function formatTimestamp(value?: string | null): string {
  if (!value) return 'Keine Daten';
  const date = parseUtcDate(value);
  if (!date) return value;
  return date.toLocaleString('de-DE');
}

function formatRelative(value?: string | null): string {
  if (!value) return 'Keine Daten';
  const date = parseUtcDate(value);
  if (!date) return 'Unbekannt';

  const diffMs = Date.now() - date.getTime();
  const diffMinutes = Math.floor(diffMs / 60000);
  const diffHours = Math.floor(diffMinutes / 60);
  const diffDays = Math.floor(diffHours / 24);

  if (diffMinutes < 1) return 'gerade eben';
  if (diffMinutes < 60) return `vor ${diffMinutes} Min.`;
  if (diffHours < 24) return `vor ${diffHours} Std.`;
  return `vor ${diffDays} Tag${diffDays === 1 ? '' : 'en'}`;
}

function statusBadge(status: string) {
  const palette = statusPalette[status] || statusPalette.offline;
  return (
    <span
      className="monitoring-status-badge"
      style={{ color: palette.color, backgroundColor: palette.background }}
    >
      {palette.label}
    </span>
  );
}

function renderMetricCard(title: string, value: number, subtitle: string, accent: string) {
  return (
    <div className="e-card monitoring-metric-card" style={{ borderTop: `4px solid ${accent}` }}>
      <div className="e-card-content monitoring-metric-content">
        <div className="monitoring-metric-title">{title}</div>
        <div className="monitoring-metric-value">{value}</div>
        <div className="monitoring-metric-subtitle">{subtitle}</div>
      </div>
    </div>
  );
}

function renderContext(context?: Record<string, unknown>): string {
  if (!context || Object.keys(context).length === 0) {
    return 'Kein Kontext vorhanden';
  }
  try {
    return JSON.stringify(context, null, 2);
  } catch {
    return 'Kontext konnte nicht formatiert werden';
  }
}

function buildScreenshotUrl(client: MonitoringClient, overviewTimestamp?: string | null): string {
  const refreshKey = client.lastScreenshotHash || client.lastScreenshotAnalyzed || overviewTimestamp;
  if (!refreshKey) {
    return client.screenshotUrl;
  }

  const separator = client.screenshotUrl.includes('?') ? '&' : '?';
  return `${client.screenshotUrl}${separator}v=${encodeURIComponent(refreshKey)}`;
}

const MonitoringDashboard: React.FC = () => {
  const { user } = useAuth();
  const [hours, setHours] = React.useState<number>(24);
  const [logLevel, setLogLevel] = React.useState<string>('ALL');
  const [overview, setOverview] = React.useState<MonitoringOverview | null>(null);
  const [recentErrors, setRecentErrors] = React.useState<MonitoringLogEntry[]>([]);
  const [clientLogs, setClientLogs] = React.useState<MonitoringLogEntry[]>([]);
  const [selectedClientUuid, setSelectedClientUuid] = React.useState<string | null>(null);
  const [loading, setLoading] = React.useState<boolean>(true);
  const [error, setError] = React.useState<string | null>(null);
  const [logsLoading, setLogsLoading] = React.useState<boolean>(false);
  const [screenshotErrored, setScreenshotErrored] = React.useState<boolean>(false);
  const selectedClientUuidRef = React.useRef<string | null>(null);
  const [selectedLogEntry, setSelectedLogEntry] = React.useState<MonitoringLogEntry | null>(null);

  const selectedClient = React.useMemo<MonitoringClient | null>(() => {
    if (!overview || !selectedClientUuid) return null;
    return overview.clients.find(client => client.uuid === selectedClientUuid) || null;
  }, [overview, selectedClientUuid]);

  const selectedClientScreenshotUrl = React.useMemo<string | null>(() => {
    if (!selectedClient) return null;
    return buildScreenshotUrl(selectedClient, overview?.timestamp || null);
  }, [selectedClient, overview?.timestamp]);

  React.useEffect(() => {
    selectedClientUuidRef.current = selectedClientUuid;
  }, [selectedClientUuid]);

  const loadOverview = React.useCallback(async (requestedHours: number, preserveSelection = true) => {
    setLoading(true);
    setError(null);
    try {
      const [overviewData, errorsData] = await Promise.all([
        fetchMonitoringOverview(requestedHours),
        fetchRecentClientErrors(25),
      ]);
      setOverview(overviewData);
      setRecentErrors(errorsData);

      const currentSelection = selectedClientUuidRef.current;
      const nextSelectedUuid =
        preserveSelection && currentSelection && overviewData.clients.some(client => client.uuid === currentSelection)
          ? currentSelection
          : overviewData.clients[0]?.uuid || null;

      setSelectedClientUuid(nextSelectedUuid);
      setScreenshotErrored(false);
    } catch (loadError) {
      setError(loadError instanceof Error ? loadError.message : 'Monitoring-Daten konnten nicht geladen werden');
    } finally {
      setLoading(false);
    }
  }, []);

  React.useEffect(() => {
    loadOverview(hours, false);
  }, [hours, loadOverview]);

  React.useEffect(() => {
    const intervalId = window.setInterval(() => {
      loadOverview(hours);
    }, REFRESH_INTERVAL_MS);

    return () => window.clearInterval(intervalId);
  }, [hours, loadOverview]);

  React.useEffect(() => {
    if (!selectedClientUuid) {
      setClientLogs([]);
      return;
    }

    let active = true;
    const loadLogs = async () => {
      setLogsLoading(true);
      try {
        const logs = await fetchClientMonitoringLogs(selectedClientUuid, { level: logLevel, limit: 100 });
        if (active) {
          setClientLogs(logs);
        }
      } catch (loadError) {
        if (active) {
          setClientLogs([]);
          setError(loadError instanceof Error ? loadError.message : 'Client-Logs konnten nicht geladen werden');
        }
      } finally {
        if (active) {
          setLogsLoading(false);
        }
      }
    };

    loadLogs();
    return () => {
      active = false;
    };
  }, [selectedClientUuid, logLevel]);

  React.useEffect(() => {
    setScreenshotErrored(false);
  }, [selectedClientUuid]);

  if (!user || user.role !== 'superadmin') {
    return (
      <MessageComponent severity="Error" content="Dieses Monitoring-Dashboard ist nur für Superadministratoren sichtbar." />
    );
  }

  const clientGridData = (overview?.clients || []).map(client => ({
    ...client,
    displayName: client.description || client.hostname || client.uuid,
    lastAliveDisplay: formatTimestamp(client.lastAlive),
    currentProcessDisplay: client.currentProcess || 'kein Prozess',
    processStatusDisplay: client.processStatus || 'unbekannt',
    errorCount: client.logCounts24h.error,
    warnCount: client.logCounts24h.warn,
  }));

  return (
    <div className="monitoring-page">
      <div className="monitoring-header-row">
        <div>
          <h2 className="monitoring-title">Monitor-Dashboard</h2>
          <p className="monitoring-subtitle">
            Live-Zustand der Infoscreen-Clients, Prozessstatus und zentrale Fehlerprotokolle.
          </p>
        </div>
        <div className="monitoring-toolbar">
          <div className="monitoring-toolbar-field">
            <label>Zeitraum</label>
            <DropDownListComponent
              dataSource={hourOptions}
              fields={{ text: 'text', value: 'value' }}
              value={hours}
              change={(args: { value: number }) => setHours(Number(args.value))}
            />
          </div>
          <ButtonComponent cssClass="e-primary" onClick={() => loadOverview(hours)} disabled={loading}>
            Aktualisieren
          </ButtonComponent>
        </div>
      </div>

      {error && <MessageComponent severity="Error" content={error} />}

      {overview && (
        <div className="monitoring-meta-row">
          <span>Stand: {formatTimestamp(overview.timestamp)}</span>
          <span>Alive-Fenster: {overview.gracePeriodSeconds} Sekunden</span>
          <span>Betrachtungszeitraum: {overview.periodHours} Stunden</span>
        </div>
      )}

      <div className="monitoring-summary-grid">
        {renderMetricCard('Clients gesamt', overview?.summary.totalClients || 0, 'Registrierte Displays', '#7c3aed')}
        {renderMetricCard('Online', overview?.summary.onlineClients || 0, 'Heartbeat innerhalb der Grace-Periode', '#15803d')}
        {renderMetricCard('Warnungen', overview?.summary.warningClients || 0, 'Warn-Logs oder Übergangszustände', '#d97706')}
        {renderMetricCard('Kritisch', overview?.summary.criticalClients || 0, 'Crashs oder Fehler-Logs', '#dc2626')}
        {renderMetricCard('Offline', overview?.summary.offlineClients || 0, 'Keine frischen Signale', '#475569')}
        {renderMetricCard('Fehler-Logs', overview?.summary.errorLogs || 0, 'Im gewählten Zeitraum', '#b91c1c')}
      </div>

      {loading && !overview ? (
        <MessageComponent severity="Info" content="Monitoring-Daten werden geladen ..." />
      ) : (
        <div className="monitoring-main-grid">
          <div className="monitoring-panel monitoring-clients-panel">
            <div className="monitoring-panel-header">
              <h3>Client-Zustand</h3>
              <span>{overview?.clients.length || 0} Einträge</span>
            </div>
            <GridComponent
              dataSource={clientGridData}
              allowPaging={true}
              pageSettings={{ pageSize: 10 }}
              allowSorting={true}
              toolbar={['Search']}
              height={460}
              rowSelected={(args: { data: MonitoringClient }) => {
                setSelectedClientUuid(args.data.uuid);
              }}
            >
              <ColumnsDirective>
                <ColumnDirective
                  field="status"
                  headerText="Status"
                  width="120"
                  template={(props: MonitoringClient) => statusBadge(props.status)}
                />
                <ColumnDirective field="displayName" headerText="Client" width="190" />
                <ColumnDirective field="groupName" headerText="Gruppe" width="150" />
                <ColumnDirective field="currentProcessDisplay" headerText="Prozess" width="130" />
                <ColumnDirective field="processStatusDisplay" headerText="Prozessstatus" width="130" />
|
||||
<ColumnDirective field="errorCount" headerText="ERROR" textAlign="Right" width="90" />
|
||||
<ColumnDirective field="warnCount" headerText="WARN" textAlign="Right" width="90" />
|
||||
<ColumnDirective field="lastAliveDisplay" headerText="Letztes Signal" width="170" />
|
||||
</ColumnsDirective>
|
||||
<Inject services={[Page, Search, Sort, Toolbar]} />
|
||||
</GridComponent>
|
||||
</div>
|
||||
|
||||
<div className="monitoring-sidebar-column">
|
||||
<div className="e-card monitoring-detail-card">
|
||||
<div className="e-card-header">
|
||||
<div className="e-card-header-caption">
|
||||
<div className="e-card-title">Aktiver Client</div>
|
||||
</div>
|
||||
</div>
|
||||
<div className="e-card-content">
|
||||
{selectedClient ? (
|
||||
<div className="monitoring-detail-list">
|
||||
<div className="monitoring-detail-row">
|
||||
<span>Name</span>
|
||||
<strong>{selectedClient.description || selectedClient.hostname || selectedClient.uuid}</strong>
|
||||
</div>
|
||||
<div className="monitoring-detail-row">
|
||||
<span>Status</span>
|
||||
<strong>{statusBadge(selectedClient.status)}</strong>
|
||||
</div>
|
||||
<div className="monitoring-detail-row">
|
||||
<span>UUID</span>
|
||||
<strong className="monitoring-mono">{selectedClient.uuid}</strong>
|
||||
</div>
|
||||
<div className="monitoring-detail-row">
|
||||
<span>Raumgruppe</span>
|
||||
<strong>{selectedClient.groupName || 'Nicht zugeordnet'}</strong>
|
||||
</div>
|
||||
<div className="monitoring-detail-row">
|
||||
<span>Prozess</span>
|
||||
<strong>{selectedClient.currentProcess || 'kein Prozess'}</strong>
|
||||
</div>
|
||||
<div className="monitoring-detail-row">
|
||||
<span>PID</span>
|
||||
<strong>{selectedClient.processPid || 'keine PID'}</strong>
|
||||
</div>
|
||||
<div className="monitoring-detail-row">
|
||||
<span>Event-ID</span>
|
||||
<strong>{selectedClient.currentEventId || 'keine Zuordnung'}</strong>
|
||||
</div>
|
||||
<div className="monitoring-detail-row">
|
||||
<span>Letztes Signal</span>
|
||||
<strong>{formatRelative(selectedClient.lastAlive)}</strong>
|
||||
</div>
|
||||
<div className="monitoring-detail-row">
|
||||
<span>Bildschirmstatus</span>
|
||||
<strong>{selectedClient.screenHealthStatus || 'UNKNOWN'}</strong>
|
||||
</div>
|
||||
<div className="monitoring-detail-row">
|
||||
<span>Letzte Analyse</span>
|
||||
<strong>{formatTimestamp(selectedClient.lastScreenshotAnalyzed)}</strong>
|
||||
</div>
|
||||
</div>
|
||||
) : (
|
||||
<MessageComponent severity="Info" content="Wählen Sie links einen Client aus." />
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div className="e-card monitoring-detail-card">
|
||||
<div className="e-card-header">
|
||||
<div className="e-card-header-caption">
|
||||
<div className="e-card-title">Der letzte Screenshot</div>
|
||||
</div>
|
||||
</div>
|
||||
<div className="e-card-content">
|
||||
{selectedClient ? (
|
||||
<>
|
||||
{screenshotErrored ? (
|
||||
<MessageComponent severity="Warning" content="Für diesen Client liegt noch kein Screenshot vor." />
|
||||
) : (
|
||||
<img
|
||||
src={selectedClientScreenshotUrl || selectedClient.screenshotUrl}
|
||||
alt={`Screenshot ${selectedClient.uuid}`}
|
||||
className="monitoring-screenshot"
|
||||
onError={() => setScreenshotErrored(true)}
|
||||
/>
|
||||
)}
|
||||
<div className="monitoring-screenshot-meta">
|
||||
Empfangen: {formatTimestamp(selectedClient.lastScreenshotAnalyzed)}
|
||||
</div>
|
||||
</>
|
||||
) : (
|
||||
<MessageComponent severity="Info" content="Kein Client ausgewählt." />
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div className="e-card monitoring-detail-card">
|
||||
<div className="e-card-header">
|
||||
<div className="e-card-header-caption">
|
||||
<div className="e-card-title">Letzter Fehler</div>
|
||||
</div>
|
||||
</div>
|
||||
<div className="e-card-content">
|
||||
{selectedClient?.latestError ? (
|
||||
<div className="monitoring-error-box">
|
||||
<div className="monitoring-error-time">{formatTimestamp(selectedClient.latestError.timestamp)}</div>
|
||||
<div className="monitoring-error-message">{selectedClient.latestError.message}</div>
|
||||
</div>
|
||||
) : (
|
||||
<MessageComponent severity="Success" content="Kein ERROR-Log für den ausgewählten Client gefunden." />
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
)}
|
||||
|
||||
<div className="monitoring-lower-grid">
|
||||
<div className="monitoring-panel">
|
||||
<div className="monitoring-panel-header monitoring-panel-header-stacked">
|
||||
<div>
|
||||
<h3>Client-Logs</h3>
|
||||
<span>{selectedClient ? `Client ${selectedClient.uuid}` : 'Kein Client ausgewählt'}</span>
|
||||
</div>
|
||||
<div className="monitoring-toolbar-field monitoring-toolbar-field-compact">
|
||||
<label>Level</label>
|
||||
<DropDownListComponent
|
||||
dataSource={logLevelOptions}
|
||||
fields={{ text: 'text', value: 'value' }}
|
||||
value={logLevel}
|
||||
change={(args: { value: string }) => setLogLevel(String(args.value))}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
{logsLoading && <MessageComponent severity="Info" content="Client-Logs werden geladen ..." />}
|
||||
<GridComponent
|
||||
dataSource={clientLogs}
|
||||
allowPaging={true}
|
||||
pageSettings={{ pageSize: 8 }}
|
||||
allowSorting={true}
|
||||
height={320}
|
||||
rowSelected={(args: { data: MonitoringLogEntry }) => {
|
||||
setSelectedLogEntry(args.data);
|
||||
}}
|
||||
>
|
||||
<ColumnsDirective>
|
||||
<ColumnDirective field="timestamp" headerText="Zeit" width="180" template={(props: MonitoringLogEntry) => formatTimestamp(props.timestamp)} />
|
||||
<ColumnDirective field="level" headerText="Level" width="90" />
|
||||
<ColumnDirective field="message" headerText="Nachricht" width="360" />
|
||||
</ColumnsDirective>
|
||||
<Inject services={[Page, Sort]} />
|
||||
</GridComponent>
|
||||
</div>
|
||||
|
||||
<div className="monitoring-panel">
|
||||
<div className="monitoring-panel-header">
|
||||
<h3>Letzte Fehler systemweit</h3>
|
||||
<span>{recentErrors.length} Einträge</span>
|
||||
</div>
|
||||
<GridComponent dataSource={recentErrors} allowPaging={true} pageSettings={{ pageSize: 8 }} allowSorting={true} height={320}>
|
||||
<ColumnsDirective>
|
||||
<ColumnDirective field="timestamp" headerText="Zeit" width="180" template={(props: MonitoringLogEntry) => formatTimestamp(props.timestamp)} />
|
||||
<ColumnDirective field="client_uuid" headerText="Client" width="220" />
|
||||
<ColumnDirective field="message" headerText="Nachricht" width="360" />
|
||||
</ColumnsDirective>
|
||||
<Inject services={[Page, Sort]} />
|
||||
</GridComponent>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<DialogComponent
|
||||
isModal={true}
|
||||
visible={!!selectedLogEntry}
|
||||
width="860px"
|
||||
minHeight="420px"
|
||||
header="Log-Details"
|
||||
animationSettings={{ effect: 'None' }}
|
||||
buttons={[]}
|
||||
showCloseIcon={true}
|
||||
close={() => setSelectedLogEntry(null)}
|
||||
>
|
||||
{selectedLogEntry && (
|
||||
<div className="monitoring-log-dialog-body">
|
||||
<div className="monitoring-log-dialog-content">
|
||||
<div className="monitoring-log-detail-row">
|
||||
<span>Zeit</span>
|
||||
<strong>{formatTimestamp(selectedLogEntry.timestamp)}</strong>
|
||||
</div>
|
||||
<div className="monitoring-log-detail-row">
|
||||
<span>Level</span>
|
||||
<strong>{selectedLogEntry.level || 'Unbekannt'}</strong>
|
||||
</div>
|
||||
<div className="monitoring-log-detail-row">
|
||||
<span>Nachricht</span>
|
||||
<strong style={{ whiteSpace: 'normal', textAlign: 'left' }}>{selectedLogEntry.message}</strong>
|
||||
</div>
|
||||
<div>
|
||||
<div className="monitoring-log-context-title">Kontext</div>
|
||||
<pre className="monitoring-log-context">{renderContext(selectedLogEntry.context)}</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div className="monitoring-log-dialog-actions">
|
||||
<ButtonComponent onClick={() => setSelectedLogEntry(null)}>Schließen</ButtonComponent>
|
||||
</div>
|
||||
</div>
|
||||
)}
|
||||
</DialogComponent>
|
||||
</div>
|
||||
);
|
||||
};
|
||||
|
||||
export default MonitoringDashboard;
|
||||
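The grid rows above are a thin mapping over the camelCased `monitoring-overview` payload. A minimal sketch of that mapping in Python, using a made-up sample client (all field values here are illustrative assumptions, not data from a real system; the field names follow the TSX `clientGridData` mapping):

```python
# Hypothetical sample client entry as the dashboard receives it (camelCased).
sample_client = {
    "uuid": "a1b2c3d4",
    "description": None,
    "hostname": "pi-display-01",
    "currentProcess": None,
    "processStatus": "running",
    "logCounts24h": {"error": 2, "warn": 1, "info": 40, "debug": 0},
}

def to_grid_row(client):
    # Mirrors clientGridData: the display name falls back
    # description -> hostname -> uuid; counts come from logCounts24h.
    return {
        **client,
        "displayName": client.get("description") or client.get("hostname") or client.get("uuid"),
        "currentProcessDisplay": client.get("currentProcess") or "kein Prozess",
        "processStatusDisplay": client.get("processStatus") or "unbekannt",
        "errorCount": client["logCounts24h"]["error"],
        "warnCount": client["logCounts24h"]["warn"],
    }

row = to_grid_row(sample_client)
print(row["displayName"])  # pi-display-01
```

The fallback chain means a client always gets a human-readable label even before an admin has assigned a description.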
@@ -19,8 +19,8 @@ services:
      - DB_CONN=mysql+pymysql://${DB_USER}:${DB_PASSWORD}@db/${DB_NAME}
      - DB_URL=mysql+pymysql://${DB_USER}:${DB_PASSWORD}@db/${DB_NAME}
      - API_BASE_URL=http://server:8000
      - ENV=${ENV:-development}
      - FLASK_SECRET_KEY=${FLASK_SECRET_KEY:-dev-secret-key-change-in-production}
      - ENV=${ENV:-development}
      - FLASK_SECRET_KEY=${FLASK_SECRET_KEY:-dev-secret-key-change-in-production}
      - DEFAULT_SUPERADMIN_USERNAME=${DEFAULT_SUPERADMIN_USERNAME:-superadmin}
      - DEFAULT_SUPERADMIN_PASSWORD=${DEFAULT_SUPERADMIN_PASSWORD}
      # 🔧 ENTFERNT: Volume-Mount ist nur für die Entwicklung
@@ -3,15 +3,17 @@ import json
import logging
import datetime
import base64
import re
import requests
import paho.mqtt.client as mqtt
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from models.models import Client, ClientLog, LogLevel, ProcessStatus
from models.models import Client, ClientLog, LogLevel, ProcessStatus, ScreenHealthStatus
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s [%(levelname)s] %(message)s')

# Load .env in development
if os.getenv("ENV", "development") == "development":
# Load .env only when not already configured by Docker (API_BASE_URL not set by compose means we're outside a container)
_api_already_set = bool(os.environ.get("API_BASE_URL"))
if not _api_already_set and os.getenv("ENV", "development") == "development":
    try:
        from dotenv import load_dotenv
        load_dotenv(".env")
@@ -31,6 +33,161 @@ Session = sessionmaker(bind=engine)
API_BASE_URL = os.getenv("API_BASE_URL", "http://server:8000")


def normalize_process_status(value):
    if value is None:
        return None
    if isinstance(value, ProcessStatus):
        return value

    normalized = str(value).strip().lower()
    if not normalized:
        return None

    try:
        return ProcessStatus(normalized)
    except ValueError:
        return None


def normalize_event_id(value):
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        return int(value)

    normalized = str(value).strip()
    if not normalized:
        return None
    if normalized.isdigit():
        return int(normalized)

    match = re.search(r"(\d+)$", normalized)
    if match:
        return int(match.group(1))

    return None


def parse_timestamp(value):
    if not value:
        return None
    if isinstance(value, (int, float)):
        try:
            ts_value = float(value)
            if ts_value > 1e12:
                ts_value = ts_value / 1000.0
            return datetime.datetime.fromtimestamp(ts_value, datetime.UTC)
        except (TypeError, ValueError, OverflowError):
            return None
    try:
        value_str = str(value).strip()
        if value_str.isdigit():
            ts_value = float(value_str)
            if ts_value > 1e12:
                ts_value = ts_value / 1000.0
            return datetime.datetime.fromtimestamp(ts_value, datetime.UTC)

        parsed = datetime.datetime.fromisoformat(value_str.replace('Z', '+00:00'))
        if parsed.tzinfo is None:
            return parsed.replace(tzinfo=datetime.UTC)
        return parsed.astimezone(datetime.UTC)
    except ValueError:
        return None


def infer_screen_health_status(payload_data):
    explicit = payload_data.get('screen_health_status')
    if explicit:
        try:
            return ScreenHealthStatus[str(explicit).strip().upper()]
        except KeyError:
            pass

    metrics = payload_data.get('health_metrics') or {}
    if metrics.get('screen_on') is False:
        return ScreenHealthStatus.BLACK

    last_frame_update = parse_timestamp(metrics.get('last_frame_update'))
    if last_frame_update:
        age_seconds = (datetime.datetime.now(datetime.UTC) - last_frame_update).total_seconds()
        if age_seconds > 30:
            return ScreenHealthStatus.FROZEN
        return ScreenHealthStatus.OK

    return None


def apply_monitoring_update(client_obj, *, event_id=None, process_name=None, process_pid=None,
                            process_status=None, last_seen=None, screen_health_status=None,
                            last_screenshot_analyzed=None):
    if last_seen:
        client_obj.last_alive = last_seen

    normalized_event_id = normalize_event_id(event_id)
    if normalized_event_id is not None:
        client_obj.current_event_id = normalized_event_id

    if process_name is not None:
        client_obj.current_process = process_name

    if process_pid is not None:
        client_obj.process_pid = process_pid

    normalized_status = normalize_process_status(process_status)
    if normalized_status is not None:
        client_obj.process_status = normalized_status

    if screen_health_status is not None:
        client_obj.screen_health_status = screen_health_status

    if last_screenshot_analyzed is not None:
        existing = client_obj.last_screenshot_analyzed
        if existing is not None and existing.tzinfo is None:
            existing = existing.replace(tzinfo=datetime.UTC)

        candidate = last_screenshot_analyzed
        if candidate.tzinfo is None:
            candidate = candidate.replace(tzinfo=datetime.UTC)

        if existing is None or candidate >= existing:
            client_obj.last_screenshot_analyzed = candidate


def _extract_image_and_timestamp(data):
    image_value = None
    timestamp_value = None

    if not isinstance(data, dict):
        return None, None

    screenshot_obj = data.get("screenshot") if isinstance(data.get("screenshot"), dict) else None
    metadata_obj = data.get("metadata") if isinstance(data.get("metadata"), dict) else None
    screenshot_meta_obj = screenshot_obj.get("metadata") if screenshot_obj and isinstance(screenshot_obj.get("metadata"), dict) else None

    for key in ("image", "data"):
        if isinstance(data.get(key), str) and data.get(key):
            image_value = data.get(key)
            break
    if image_value is None and screenshot_obj is not None:
        for key in ("image", "data"):
            if isinstance(screenshot_obj.get(key), str) and screenshot_obj.get(key):
                image_value = screenshot_obj.get(key)
                break

    for container in (data, screenshot_obj, metadata_obj, screenshot_meta_obj):
        if not isinstance(container, dict):
            continue
        for key in ("timestamp", "captured_at", "capture_time", "created_at"):
            value = container.get(key)
            if value is not None:
                timestamp_value = value
                return image_value, timestamp_value

    return image_value, timestamp_value


def handle_screenshot(uuid, payload):
    """
    Handle screenshot data received via MQTT and forward to API.
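The normalization helpers in the hunk above are what make the listener tolerant of heterogeneous client payloads: event IDs may arrive as ints, floats, or strings with a trailing number, and timestamps as epoch seconds, epoch milliseconds, or ISO strings. A condensed, standalone restatement for illustration (it intentionally mirrors the logic above but uses `timezone.utc` instead of Python 3.11's `datetime.UTC`, so it also runs on older interpreters; it is a sketch, not the listener's authoritative code):

```python
import datetime
import re

def normalize_event_id(value):
    # Accept ints, floats, digit strings, and strings ending in a number
    # (e.g. "event-42"); reject booleans and everything else.
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        return int(value)
    normalized = str(value).strip()
    if not normalized:
        return None
    if normalized.isdigit():
        return int(normalized)
    match = re.search(r"(\d+)$", normalized)
    return int(match.group(1)) if match else None

def parse_timestamp(value):
    # Epoch values above 1e12 are treated as milliseconds; ISO strings
    # (with or without a trailing 'Z') are normalized to aware UTC datetimes.
    if not value:
        return None
    if isinstance(value, (int, float)):
        ts = float(value)
        if ts > 1e12:
            ts /= 1000.0
        return datetime.datetime.fromtimestamp(ts, datetime.timezone.utc)
    parsed = datetime.datetime.fromisoformat(str(value).strip().replace('Z', '+00:00'))
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=datetime.timezone.utc)
    return parsed.astimezone(datetime.timezone.utc)
```

With these rules, `"event-42"` normalizes to `42`, and the same instant expressed in seconds or milliseconds parses to the same UTC datetime.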
@@ -40,13 +197,16 @@ def handle_screenshot(uuid, payload):
    # Try to parse as JSON first
    try:
        data = json.loads(payload.decode())
        if "image" in data:
        image_b64, timestamp_value = _extract_image_and_timestamp(data)
        if image_b64:
            # Payload is JSON with base64 image
            api_payload = {"image": data["image"]}
            api_payload = {"image": image_b64}
            if timestamp_value is not None:
                api_payload["timestamp"] = timestamp_value
            headers = {"Content-Type": "application/json"}
            logging.debug(f"Forwarding base64 screenshot from {uuid} to API")
        else:
            logging.warning(f"Screenshot JSON from {uuid} missing 'image' field")
            logging.warning(f"Screenshot JSON from {uuid} missing image/data field")
            return
    except (json.JSONDecodeError, UnicodeDecodeError):
        # Payload is raw binary image data - encode to base64 for API
@@ -101,21 +261,28 @@ def on_message(client, userdata, msg):
        try:
            payload_text = msg.payload.decode()
            data = json.loads(payload_text)
            shot = data.get("screenshot")
            if isinstance(shot, dict):
                # Prefer 'data' field (base64) inside screenshot object
                image_b64 = shot.get("data")
                if image_b64:
                    logging.debug(f"Dashboard enthält Screenshot für {uuid}; Weiterleitung an API")
                    # Build a lightweight JSON with image field for API handler
                    api_payload = json.dumps({"image": image_b64}).encode("utf-8")
                    handle_screenshot(uuid, api_payload)
            image_b64, ts_value = _extract_image_and_timestamp(data)
            if image_b64:
                logging.debug(f"Dashboard enthält Screenshot für {uuid}; Weiterleitung an API")
                dashboard_payload = {"image": image_b64}
                if ts_value is not None:
                    dashboard_payload["timestamp"] = ts_value
                api_payload = json.dumps(dashboard_payload).encode("utf-8")
                handle_screenshot(uuid, api_payload)
            # Update last_alive if status present
            if data.get("status") == "alive":
                session = Session()
                client_obj = session.query(Client).filter_by(uuid=uuid).first()
                if client_obj:
                    client_obj.last_alive = datetime.datetime.now(datetime.UTC)
                    process_health = data.get('process_health') or {}
                    apply_monitoring_update(
                        client_obj,
                        last_seen=datetime.datetime.now(datetime.UTC),
                        event_id=process_health.get('event_id'),
                        process_name=process_health.get('current_process') or process_health.get('process'),
                        process_pid=process_health.get('process_pid') or process_health.get('pid'),
                        process_status=process_health.get('process_status') or process_health.get('status'),
                    )
                    session.commit()
                session.close()
        except Exception as e:
@@ -140,24 +307,14 @@ def on_message(client, userdata, msg):
        session = Session()
        client_obj = session.query(Client).filter_by(uuid=uuid).first()
        if client_obj:
            client_obj.last_alive = datetime.datetime.now(datetime.UTC)

            # Update health fields if present in heartbeat
            if 'process_status' in payload_data:
                try:
                    client_obj.process_status = ProcessStatus[payload_data['process_status']]
                except (KeyError, TypeError):
                    pass

            if 'current_process' in payload_data:
                client_obj.current_process = payload_data.get('current_process')

            if 'process_pid' in payload_data:
                client_obj.process_pid = payload_data.get('process_pid')

            if 'current_event_id' in payload_data:
                client_obj.current_event_id = payload_data.get('current_event_id')

            apply_monitoring_update(
                client_obj,
                last_seen=datetime.datetime.now(datetime.UTC),
                event_id=payload_data.get('current_event_id'),
                process_name=payload_data.get('current_process'),
                process_pid=payload_data.get('process_pid'),
                process_status=payload_data.get('process_status'),
            )
            session.commit()
            logging.info(f"Heartbeat von {uuid} empfangen, last_alive (UTC) aktualisiert.")
        session.close()
@@ -222,23 +379,20 @@ def on_message(client, userdata, msg):
            if client_obj:
                # Update expected state
                expected = payload_data.get('expected_state', {})
                if 'event_id' in expected:
                    client_obj.current_event_id = expected['event_id']

                # Update actual state
                actual = payload_data.get('actual_state', {})
                if 'process' in actual:
                    client_obj.current_process = actual['process']

                if 'pid' in actual:
                    client_obj.process_pid = actual['pid']

                if 'status' in actual:
                    try:
                        client_obj.process_status = ProcessStatus[actual['status']]
                    except (KeyError, TypeError):
                        pass

                screen_health_status = infer_screen_health_status(payload_data)
                apply_monitoring_update(
                    client_obj,
                    last_seen=datetime.datetime.now(datetime.UTC),
                    event_id=expected.get('event_id'),
                    process_name=actual.get('process'),
                    process_pid=actual.get('pid'),
                    process_status=actual.get('status'),
                    screen_health_status=screen_health_status,
                    last_screenshot_analyzed=parse_timestamp((payload_data.get('health_metrics') or {}).get('last_frame_update')),
                )
                session.commit()
                logging.debug(f"Health update from {uuid}: {actual.get('process')} ({actual.get('status')})")
                session.close()
@@ -1,14 +1,95 @@
from flask import Blueprint, jsonify, request
from server.database import Session
from server.permissions import admin_or_higher
from models.models import ClientLog, Client, LogLevel
from server.permissions import admin_or_higher, superadmin_only
from models.models import ClientLog, Client, ClientGroup, LogLevel
from sqlalchemy import desc, func
from datetime import datetime, timedelta, timezone
import json
import os
import glob

from server.serializers import dict_to_camel_case

client_logs_bp = Blueprint("client_logs", __name__, url_prefix="/api/client-logs")


def _grace_period_seconds():
    env = os.environ.get("ENV", "production").lower()
    if env in ("development", "dev"):
        return int(os.environ.get("HEARTBEAT_GRACE_PERIOD_DEV", "180"))
    return int(os.environ.get("HEARTBEAT_GRACE_PERIOD_PROD", "170"))


def _to_utc(dt):
    if dt is None:
        return None
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def _is_client_alive(last_alive, is_active):
    if not last_alive or not is_active:
        return False
    return (datetime.now(timezone.utc) - _to_utc(last_alive)) <= timedelta(seconds=_grace_period_seconds())


def _safe_context(raw_context):
    if not raw_context:
        return {}
    try:
        return json.loads(raw_context)
    except (TypeError, json.JSONDecodeError):
        return {"raw": raw_context}


def _serialize_log_entry(log, include_client_uuid=False):
    if not log:
        return None

    entry = {
        "id": log.id,
        "timestamp": log.timestamp.isoformat() if log.timestamp else None,
        "level": log.level.value if log.level else None,
        "message": log.message,
        "context": _safe_context(log.context),
    }
    if include_client_uuid:
        entry["client_uuid"] = log.client_uuid
    return entry


def _determine_client_status(is_alive, process_status, screen_health_status, log_counts):
    if not is_alive:
        return "offline"
    if process_status == "crashed" or screen_health_status in ("BLACK", "FROZEN"):
        return "critical"
    if log_counts.get("ERROR", 0) > 0:
        return "critical"
    if process_status in ("starting", "stopped") or log_counts.get("WARN", 0) > 0:
        return "warning"
    return "healthy"


def _infer_last_screenshot_ts(client_uuid):
    screenshots_dir = os.path.join(os.path.dirname(__file__), "..", "screenshots")

    candidate_files = []
    latest_file = os.path.join(screenshots_dir, f"{client_uuid}.jpg")
    if os.path.exists(latest_file):
        candidate_files.append(latest_file)

    candidate_files.extend(glob.glob(os.path.join(screenshots_dir, f"{client_uuid}_*.jpg")))
    if not candidate_files:
        return None

    try:
        newest_path = max(candidate_files, key=os.path.getmtime)
        return datetime.fromtimestamp(os.path.getmtime(newest_path), timezone.utc)
    except Exception:
        return None


@client_logs_bp.route("/test", methods=["GET"])
def test_client_logs():
    """Test endpoint to verify logging infrastructure (no auth required)"""
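The precedence encoded in `_determine_client_status` above is: offline beats everything; a crashed process, a BLACK/FROZEN screen, or any ERROR log makes a live client critical; starting/stopped processes or WARN logs downgrade it to warning; otherwise it is healthy. A standalone copy of the helper, kept here purely to illustrate that ordering:

```python
def determine_client_status(is_alive, process_status, screen_health_status, log_counts):
    # Standalone copy of _determine_client_status for illustration only.
    if not is_alive:
        return "offline"
    if process_status == "crashed" or screen_health_status in ("BLACK", "FROZEN"):
        return "critical"
    if log_counts.get("ERROR", 0) > 0:
        return "critical"
    if process_status in ("starting", "stopped") or log_counts.get("WARN", 0) > 0:
        return "warning"
    return "healthy"

# Offline wins even for a crashed process; a live client with only WARN
# logs is "warning"; a clean running client is "healthy".
print(determine_client_status(False, "crashed", None, {}))          # offline
print(determine_client_status(True, "running", "OK", {"WARN": 3}))  # warning
print(determine_client_status(True, "running", "OK", {}))           # healthy
```

Because the checks return early, a single client maps to exactly one status bucket, which is what keeps the dashboard's summary counts additive.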
@@ -107,22 +188,7 @@ def get_client_logs(uuid):
    # Format results
    result = []
    for log in logs:
        entry = {
            "id": log.id,
            "timestamp": log.timestamp.isoformat() if log.timestamp else None,
            "level": log.level.value if log.level else None,
            "message": log.message,
            "context": {}
        }

        # Parse context JSON
        if log.context:
            try:
                entry["context"] = json.loads(log.context)
            except json.JSONDecodeError:
                entry["context"] = {"raw": log.context}

        result.append(entry)
        result.append(_serialize_log_entry(log))

    session.close()
    return jsonify({
@@ -212,6 +278,141 @@ def get_logs_summary():
|
||||
return jsonify({"error": f"Server error: {str(e)}"}), 500
|
||||
|
||||
|
||||
@client_logs_bp.route("/monitoring-overview", methods=["GET"])
|
||||
@superadmin_only
|
||||
def get_monitoring_overview():
|
||||
"""Return a dashboard-friendly monitoring overview for all clients."""
|
||||
session = Session()
|
||||
try:
|
||||
hours = min(int(request.args.get("hours", 24)), 168)
|
||||
since = datetime.now(timezone.utc) - timedelta(hours=hours)
|
||||
|
||||
clients = (
|
||||
session.query(Client, ClientGroup.name.label("group_name"))
|
||||
.outerjoin(ClientGroup, Client.group_id == ClientGroup.id)
|
||||
.order_by(ClientGroup.name.asc(), Client.description.asc(), Client.hostname.asc(), Client.uuid.asc())
|
||||
.all()
|
||||
)
|
||||
|
||||
log_stats = (
|
||||
session.query(
|
||||
ClientLog.client_uuid,
|
||||
ClientLog.level,
|
||||
func.count(ClientLog.id).label("count"),
|
||||
)
|
||||
.filter(ClientLog.timestamp >= since)
|
||||
.group_by(ClientLog.client_uuid, ClientLog.level)
|
||||
.all()
|
||||
)
|
||||
|
||||
counts_by_client = {}
|
||||
for stat in log_stats:
|
||||
if stat.client_uuid not in counts_by_client:
|
||||
counts_by_client[stat.client_uuid] = {
|
||||
"ERROR": 0,
|
||||
"WARN": 0,
|
||||
"INFO": 0,
|
||||
"DEBUG": 0,
|
||||
}
|
||||
counts_by_client[stat.client_uuid][stat.level.value] = stat.count
|
||||
|
||||
clients_payload = []
|
||||
summary_counts = {
|
||||
"total_clients": 0,
|
||||
"online_clients": 0,
|
||||
"offline_clients": 0,
|
||||
"healthy_clients": 0,
|
||||
"warning_clients": 0,
|
||||
"critical_clients": 0,
|
||||
"error_logs": 0,
|
||||
"warn_logs": 0,
|
||||
}
|
||||
|
||||
for client, group_name in clients:
|
||||
log_counts = counts_by_client.get(
|
||||
client.uuid,
|
||||
{"ERROR": 0, "WARN": 0, "INFO": 0, "DEBUG": 0},
|
||||
)
|
||||
is_alive = _is_client_alive(client.last_alive, client.is_active)
|
||||
process_status = client.process_status.value if client.process_status else None
|
||||
screen_health_status = client.screen_health_status.value if client.screen_health_status else None
|
||||
status = _determine_client_status(is_alive, process_status, screen_health_status, log_counts)
|
||||
|
||||
latest_log = (
|
||||
session.query(ClientLog)
|
||||
.filter_by(client_uuid=client.uuid)
|
||||
.order_by(desc(ClientLog.timestamp))
|
||||
.first()
|
||||
)
|
||||
latest_error = (
|
||||
session.query(ClientLog)
|
||||
.filter_by(client_uuid=client.uuid, level=LogLevel.ERROR)
|
||||
.order_by(desc(ClientLog.timestamp))
|
||||
.first()
|
||||
)
|
||||
|
||||
screenshot_ts = client.last_screenshot_analyzed or _infer_last_screenshot_ts(client.uuid)
|
||||
|
||||
clients_payload.append({
|
||||
"uuid": client.uuid,
|
||||
"hostname": client.hostname,
|
||||
"description": client.description,
|
||||
"ip": client.ip,
|
||||
"model": client.model,
|
||||
"group_id": client.group_id,
|
||||
"group_name": group_name,
|
||||
"registration_time": client.registration_time.isoformat() if client.registration_time else None,
|
||||
"last_alive": client.last_alive.isoformat() if client.last_alive else None,
|
||||
"is_alive": is_alive,
|
||||
"status": status,
|
||||
"current_event_id": client.current_event_id,
|
||||
"current_process": client.current_process,
|
||||
"process_status": process_status,
|
||||
"process_pid": client.process_pid,
|
||||
"screen_health_status": screen_health_status,
|
||||
"last_screenshot_analyzed": screenshot_ts.isoformat() if screenshot_ts else None,
|
||||
"last_screenshot_hash": client.last_screenshot_hash,
|
||||
"screenshot_url": f"/screenshots/{client.uuid}",
|
||||
"log_counts_24h": {
|
||||
"error": log_counts["ERROR"],
|
||||
"warn": log_counts["WARN"],
|
||||
"info": log_counts["INFO"],
|
||||
"debug": log_counts["DEBUG"],
|
||||
},
|
||||
"latest_log": _serialize_log_entry(latest_log),
|
||||
"latest_error": _serialize_log_entry(latest_error),
|
||||
})
|
||||
|
||||
summary_counts["total_clients"] += 1
|
||||
summary_counts["error_logs"] += log_counts["ERROR"]
|
||||
summary_counts["warn_logs"] += log_counts["WARN"]
|
||||
if is_alive:
|
||||
summary_counts["online_clients"] += 1
|
||||
else:
|
||||
summary_counts["offline_clients"] += 1
|
||||
if status == "healthy":
|
||||
summary_counts["healthy_clients"] += 1
|
||||
elif status == "warning":
|
||||
summary_counts["warning_clients"] += 1
|
||||
elif status == "critical":
|
||||
summary_counts["critical_clients"] += 1
|
||||
|
||||
payload = {
|
||||
"summary": summary_counts,
|
||||
"period_hours": hours,
|
||||
"grace_period_seconds": _grace_period_seconds(),
|
||||
"since": since.isoformat(),
|
||||
"timestamp": datetime.now(timezone.utc).isoformat(),
|
||||
"clients": clients_payload,
|
||||
}
|
||||
session.close()
|
||||
return jsonify(dict_to_camel_case(payload))
|
||||
|
||||
except Exception as e:
|
||||
session.close()
|
||||
return jsonify({"error": f"Server error: {str(e)}"}), 500
+
+
 @client_logs_bp.route("/recent-errors", methods=["GET"])
 @admin_or_higher
 def get_recent_errors():

@@ -235,14 +436,7 @@ def get_recent_errors():

     result = []
     for log in logs:
-        entry = {
-            "id": log.id,
-            "client_uuid": log.client_uuid,
-            "timestamp": log.timestamp.isoformat() if log.timestamp else None,
-            "message": log.message,
-            "context": json.loads(log.context) if log.context else {}
-        }
-        result.append(entry)
+        result.append(_serialize_log_entry(log, include_client_uuid=True))

     session.close()
     return jsonify({
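The overview endpoint hands its snake_case payload through `dict_to_camel_case` before responding. That helper's implementation is not part of this diff; as a rough sketch of what such a recursive key converter typically looks like (the behavior shown here is an assumption, not the repo's actual code):

```python
import re


def dict_to_camel_case(value):
    """Recursively convert snake_case dict keys to camelCase.

    Hypothetical stand-in for the helper used by the monitoring
    endpoints; values and non-dict/list leaves pass through untouched.
    """
    if isinstance(value, dict):
        return {
            # turn "_x" (letter or digit) into "X" within each key
            re.sub(r"_([a-z0-9])", lambda m: m.group(1).upper(), key): dict_to_camel_case(val)
            for key, val in value.items()
        }
    if isinstance(value, list):
        return [dict_to_camel_case(item) for item in value]
    return value
```

With this sketch, `{"log_counts_24h": {...}}` becomes `{"logCounts24h": {...}}`, matching the camelCase field names the dashboard consumes.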
@@ -4,6 +4,7 @@ from flask import Blueprint, request, jsonify
 from server.permissions import admin_or_higher
 from server.mqtt_helper import publish_client_group, delete_client_group_message, publish_multiple_client_groups
 import sys
+from datetime import datetime, timezone
 sys.path.append('/workspace')

 clients_bp = Blueprint("clients", __name__, url_prefix="/api/clients")
@@ -284,21 +285,46 @@ def upload_screenshot(uuid):
     import os
     import base64
     import glob
     from datetime import datetime

     session = Session()
     client = session.query(Client).filter_by(uuid=uuid).first()
     if not client:
         session.close()
         return jsonify({"error": "Client nicht gefunden"}), 404
-    session.close()

     try:
+        screenshot_timestamp = None
+
         # Handle JSON payload with base64-encoded image
         if request.is_json:
             data = request.get_json()
             if "image" not in data:
                 return jsonify({"error": "Missing 'image' field in JSON payload"}), 400

+            raw_timestamp = data.get("timestamp")
+            if raw_timestamp is not None:
+                try:
+                    if isinstance(raw_timestamp, (int, float)):
+                        ts_value = float(raw_timestamp)
+                        if ts_value > 1e12:
+                            ts_value = ts_value / 1000.0
+                        screenshot_timestamp = datetime.fromtimestamp(ts_value, timezone.utc)
+                    elif isinstance(raw_timestamp, str):
+                        ts = raw_timestamp.strip()
+                        if ts:
+                            if ts.isdigit():
+                                ts_value = float(ts)
+                                if ts_value > 1e12:
+                                    ts_value = ts_value / 1000.0
+                                screenshot_timestamp = datetime.fromtimestamp(ts_value, timezone.utc)
+                            else:
+                                ts_normalized = ts.replace("Z", "+00:00") if ts.endswith("Z") else ts
+                                screenshot_timestamp = datetime.fromisoformat(ts_normalized)
+                                if screenshot_timestamp.tzinfo is None:
+                                    screenshot_timestamp = screenshot_timestamp.replace(tzinfo=timezone.utc)
+                                else:
+                                    screenshot_timestamp = screenshot_timestamp.astimezone(timezone.utc)
+                except Exception:
+                    screenshot_timestamp = None

         # Decode base64 image
         image_data = base64.b64decode(data["image"])
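The inlined timestamp handling in this hunk accepts epoch seconds, epoch milliseconds, and ISO-8601 strings, and normalizes everything to aware UTC. Factored into a standalone helper for illustration (a sketch mirroring the hunk's logic, not code from the repo):

```python
from datetime import datetime, timezone


def normalize_screenshot_timestamp(raw):
    """Best-effort conversion of a client-supplied timestamp to aware UTC.

    Accepts epoch seconds, epoch milliseconds (heuristic: value > 1e12),
    and ISO-8601 strings with or without a trailing 'Z'.
    Returns None for anything unparseable.
    """
    try:
        if isinstance(raw, (int, float)):
            ts_value = float(raw)
            if ts_value > 1e12:  # looks like milliseconds, not seconds
                ts_value /= 1000.0
            return datetime.fromtimestamp(ts_value, timezone.utc)
        if isinstance(raw, str):
            ts = raw.strip()
            if not ts:
                return None
            if ts.isdigit():
                # digit string: treat like a numeric epoch value
                return normalize_screenshot_timestamp(float(ts))
            parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
            if parsed.tzinfo is None:
                return parsed.replace(tzinfo=timezone.utc)
            return parsed.astimezone(timezone.utc)
    except Exception:
        return None
    return None
```

The `> 1e12` heuristic works because epoch seconds will not exceed 10^12 until the year 33658, while millisecond timestamps passed that threshold in 2001.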
@@ -314,7 +340,8 @@ def upload_screenshot(uuid):
     os.makedirs(screenshots_dir, exist_ok=True)

     # Store screenshot with timestamp to track latest
-    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+    now_utc = screenshot_timestamp or datetime.now(timezone.utc)
+    timestamp = now_utc.strftime("%Y%m%d_%H%M%S")
     filename = f"{uuid}_{timestamp}.jpg"
     filepath = os.path.join(screenshots_dir, filename)
@@ -326,6 +353,10 @@ def upload_screenshot(uuid):
     with open(latest_filepath, "wb") as f:
         f.write(image_data)

+    # Update screenshot receive timestamp for monitoring dashboard
+    client.last_screenshot_analyzed = now_utc
+    session.commit()
+
     # Cleanup: keep only last 20 timestamped screenshots per client
     pattern = os.path.join(screenshots_dir, f"{uuid}_*.jpg")
     existing_screenshots = sorted(glob.glob(pattern))
@@ -349,7 +380,10 @@ def upload_screenshot(uuid):
         }), 200

     except Exception as e:
+        session.rollback()
         return jsonify({"error": f"Failed to process screenshot: {str(e)}"}), 500
+    finally:
+        session.close()


 @clients_bp.route("/<uuid>", methods=["DELETE"])
@@ -104,6 +104,9 @@ def get_events():
     "end_time": e.end.isoformat() if e.end else None,
     "is_all_day": False,
     "media_id": e.event_media_id,
     "slideshow_interval": e.slideshow_interval,
+    "page_progress": e.page_progress,
+    "auto_progress": e.auto_progress,
     "type": e.event_type.value if e.event_type else None,
     "icon": get_icon_for_type(e.event_type.value if e.event_type else None),
     # Recurrence metadata
@@ -267,6 +270,8 @@ def detach_event_occurrence(event_id, occurrence_date):
     'event_type': master.event_type,
     'event_media_id': master.event_media_id,
     'slideshow_interval': getattr(master, 'slideshow_interval', None),
+    'page_progress': getattr(master, 'page_progress', None),
+    'auto_progress': getattr(master, 'auto_progress', None),
     'created_by': master.created_by,
 }
@@ -318,6 +323,8 @@ def detach_event_occurrence(event_id, occurrence_date):
     event_type=master_data['event_type'],
     event_media_id=master_data['event_media_id'],
     slideshow_interval=master_data['slideshow_interval'],
+    page_progress=data.get("page_progress", master_data['page_progress']),
+    auto_progress=data.get("auto_progress", master_data['auto_progress']),
     recurrence_rule=None,
     recurrence_end=None,
     skip_holidays=False,
@@ -361,11 +368,15 @@ def create_event():
     event_type = data["event_type"]
     event_media_id = None
     slideshow_interval = None
+    page_progress = None
+    auto_progress = None

     # Presentation: carry over event_media_id and slideshow_interval
     if event_type == "presentation":
         event_media_id = data.get("event_media_id")
         slideshow_interval = data.get("slideshow_interval")
+        page_progress = data.get("page_progress")
+        auto_progress = data.get("auto_progress")
         if not event_media_id:
             return jsonify({"error": "event_media_id required for presentation"}), 400
@@ -443,6 +454,8 @@ def create_event():
     is_active=True,
     event_media_id=event_media_id,
     slideshow_interval=slideshow_interval,
+    page_progress=page_progress,
+    auto_progress=auto_progress,
     autoplay=autoplay,
     loop=loop,
     volume=volume,
@@ -519,6 +532,10 @@ def update_event(event_id):
     event.event_type = data.get("event_type", event.event_type)
     event.event_media_id = data.get("event_media_id", event.event_media_id)
     event.slideshow_interval = data.get("slideshow_interval", event.slideshow_interval)
+    if "page_progress" in data:
+        event.page_progress = data.get("page_progress")
+    if "auto_progress" in data:
+        event.auto_progress = data.get("auto_progress")
     # Video-specific fields
     if "autoplay" in data:
         event.autoplay = data.get("autoplay")
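The `"page_progress" in data` checks in the update hunk matter because `data.get(key, current)` cannot distinguish an omitted key (leave the field alone) from an explicit `null` (reset the field) — precisely the presentation-flag persistence bug this commit fixes. The pattern generalizes to a presence-based partial update (a hypothetical helper, not code from the repo):

```python
def apply_partial_update(obj, data, fields):
    """Set only the attributes whose keys appear in the request payload.

    Keys absent from `data` leave the attribute unchanged; keys present
    with None (or False) are applied as an explicit reset.
    """
    for field in fields:
        if field in data:
            setattr(obj, field, data[field])
    return obj
```

Applied to an event, `{"auto_progress": None}` clears `auto_progress` while leaving `page_progress` at its stored value.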
@@ -69,12 +69,20 @@ def index():


 @app.route("/screenshots/<uuid>")
+@app.route("/screenshots/<uuid>.jpg")
 def get_screenshot(uuid):
-    pattern = os.path.join("screenshots", f"{uuid}*.jpg")
+    normalized_uuid = uuid[:-4] if uuid.lower().endswith('.jpg') else uuid
+    latest_filename = f"{normalized_uuid}.jpg"
+    latest_path = os.path.join("screenshots", latest_filename)
+    if os.path.exists(latest_path):
+        return send_from_directory("screenshots", latest_filename)
+
+    pattern = os.path.join("screenshots", f"{normalized_uuid}_*.jpg")
     files = glob.glob(pattern)
     if not files:
         # Fall back to a dummy image (redirect or direct response)
         return jsonify({"error": "Screenshot not found", "dummy": "https://placehold.co/400x300?text=No+Screenshot"}), 404
     files.sort(reverse=True)
     filename = os.path.basename(files[0])
     return send_from_directory("screenshots", filename)
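The reworked route prefers the stable `<uuid>.jpg` "latest" file and only falls back to the newest timestamped upload. The lookup can be sketched in isolation like this (directory layout assumed from the upload handler; helper name is illustrative):

```python
import glob
import os


def resolve_screenshot(screenshots_dir, uuid):
    """Return the screenshot filename to serve for a client, or None.

    Prefers the stable '<uuid>.jpg' file written on every upload; falls
    back to the lexicographically newest '<uuid>_YYYYMMDD_HHMMSS.jpg'
    timestamped copy. Accepts the uuid with or without a '.jpg' suffix.
    """
    normalized = uuid[:-4] if uuid.lower().endswith(".jpg") else uuid
    latest = os.path.join(screenshots_dir, f"{normalized}.jpg")
    if os.path.exists(latest):
        return os.path.basename(latest)
    # timestamped names sort chronologically, so reverse order is newest-first
    candidates = sorted(glob.glob(os.path.join(screenshots_dir, f"{normalized}_*.jpg")), reverse=True)
    return os.path.basename(candidates[0]) if candidates else None
```

Sorting works here because the `YYYYMMDD_HHMMSS` suffix makes lexicographic and chronological order coincide.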