Phase 3: Client-Side Monitoring Implementation
Status: ✅ COMPLETE
Date: 11 March 2026
Architecture: Two-process design with health-state bridge
Overview
This document describes the Phase 3 client-side monitoring implementation integrated into the existing infoscreen-dev codebase. The implementation adds:
- ✅ Health-state tracking for all display processes (Impressive, Chromium, VLC)
- ✅ Tiered logging: Local rotating logs + selective MQTT transmission
- ✅ Process crash detection with bounded restart attempts
- ✅ MQTT health/log topics feeding the monitoring server
- ✅ Impressive-aware process mapping (presentations → impressive, websites → chromium, videos → vlc)
Architecture
Two-Process Design
┌─────────────────────────────────────────────────────────┐
│ simclient.py (MQTT Client) │
│ - Discovers device, sends heartbeat │
│ - Downloads presentation files │
│ - Reads health state from display_manager │
│ - Publishes health/log messages to MQTT │
│ - Sends screenshots for dashboard │
└────────┬────────────────────────────────────┬───────────┘
│ │
│ reads: current_process_health.json │
│ │
│ writes: current_event.json │
│ │
┌────────▼────────────────────────────────────▼───────────┐
│ display_manager.py (Display Control) │
│ - Monitors events and manages displays │
│ - Launches Impressive (presentations) │
│ - Launches Chromium (websites) │
│ - Launches VLC (videos) │
│ - Tracks process health and crashes │
│ - Detects and restarts crashed processes │
│ - Writes health state to JSON bridge │
│ - Captures screenshots to shared folder │
└─────────────────────────────────────────────────────────┘
Implementation Details
1. Health State Tracking (display_manager.py)
File: src/display_manager.py
New Class: ProcessHealthState
Tracks process health and persists to JSON for simclient to read:
class ProcessHealthState:
    """Track and persist process health state for monitoring integration.

    - event_id: Currently active event identifier
    - event_type: presentation, website, video, or None
    - process_name: impressive, chromium-browser, vlc, or None
    - process_pid: Process ID or None for libvlc
    - status: running, crashed, starting, stopped
    - restart_count: Number of restart attempts
    - max_restarts: Maximum allowed restarts (3)
    """
Methods:
- update_running() - Mark process as started (logs to monitoring.log)
- update_crashed() - Mark process as crashed (warning to monitoring.log)
- update_restart_attempt() - Increment restart counter (logs attempt and checks max)
- update_stopped() - Mark process as stopped (info to monitoring.log)
- save() - Persist state to src/current_process_health.json
New Health State File: src/current_process_health.json
{
"event_id": "event_123",
"event_type": "presentation",
"current_process": "impressive",
"process_pid": 1234,
"process_status": "running",
"restart_count": 0,
"timestamp": "2026-03-11T10:30:45.123456+00:00"
}
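Because simclient reads this file while display_manager may be rewriting it, the writer should avoid exposing a half-written file. The following is a minimal sketch of one way to do that, not the actual ProcessHealthState code: HealthSnapshot is a hypothetical stand-in class, and the write-to-temp-then-rename approach is an assumption about how save() could be made safe.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class HealthSnapshot:
    """Hypothetical stand-in for ProcessHealthState (field names follow the JSON above)."""

    event_id: Optional[str] = None
    event_type: Optional[str] = None
    current_process: Optional[str] = None
    process_pid: Optional[int] = None
    process_status: str = "stopped"
    restart_count: int = 0

    def to_dict(self) -> dict:
        data = asdict(self)
        # Timezone-aware UTC timestamp, matching the "+00:00" suffix in the example.
        data["timestamp"] = datetime.now(timezone.utc).isoformat()
        return data

    def save(self, path: str) -> None:
        """Write to a temp file in the target directory, then rename over the target."""
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(self.to_dict(), handle, indent=2)
            # os.replace swaps the file in one step, so a reader sees old or new content.
            os.replace(tmp_path, path)
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise
```

The temp file is created next to the target so that the rename stays on the same filesystem, which also matters for the shared-volume container setup.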
2. Monitoring Logger (both files)
Local Rotating Logs: 5 MB per file with 5 rotated backups = about 25 MB of rotated history per device (plus the active file)
display_manager.py:
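import logging
from logging.handlers import RotatingFileHandler
MONITORING_LOG_PATH = "logs/monitoring.log"
monitoring_logger = logging.getLogger("monitoring")
monitoring_handler = RotatingFileHandler(MONITORING_LOG_PATH, maxBytes=5*1024*1024, backupCount=5)
monitoring_logger.addHandler(monitoring_handler)  # attach so records reach the rotating file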
simclient.py:
- Shares the same logs/monitoring.log file
- Both processes write to the monitoring logger for health events
- Local logs stay on the device (persisted for technician inspection), subject to the rotation limits above
Log Filtering (tiered strategy):
- ERROR: Local + MQTT (published to infoscreen/{uuid}/logs/error)
- WARN: Local + MQTT (published to infoscreen/{uuid}/logs/warn)
- INFO: Local only (unless DEBUG_MODE=1)
- DEBUG: Local only (always)
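One way to express this tiering with the standard logging module is a second handler whose level threshold decides what leaves the device. This is a sketch under assumptions: MqttForwardHandler and the forward callback are hypothetical names, and the real simclient may filter inside publish_log_message() instead.

```python
import logging


class MqttForwardHandler(logging.Handler):
    """Hypothetical handler: passes records at or above a threshold to a callback."""

    def __init__(self, forward, debug_mode=False):
        # Production forwards WARNING and above; DEBUG_MODE=1 also lets INFO through.
        super().__init__(level=logging.INFO if debug_mode else logging.WARNING)
        self._forward = forward

    def emit(self, record):
        try:
            self._forward(record.levelname, self.format(record))
        except Exception:
            self.handleError(record)


if __name__ == "__main__":
    sent = []
    logger = logging.getLogger("monitoring.demo")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(MqttForwardHandler(lambda level, msg: sent.append((level, msg))))
    logger.info("Process started")   # stays local in production
    logger.warning("Process crashed")  # forwarded
    print(sent)
```

The local RotatingFileHandler stays attached to the same logger, so every record is still written to disk regardless of what is forwarded.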
3. Process Mapping with Impressive Support
display_manager.py - When starting processes:
| Event Type | Process Name | Health Status |
|---|---|---|
| presentation | impressive | tracked with PID |
| website/webpage/webuntis | chromium or chromium-browser | tracked with PID |
| video | vlc | tracked (may have no PID if using libvlc) |
Per-Process Updates:
- Presentation: health.update_running('event_id', 'presentation', 'impressive', pid)
- Website: health.update_running('event_id', 'website', browser_name, pid)
- Video: health.update_running('event_id', 'video', 'vlc', pid or None)
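The mapping in the table can be written as a small lookup. The sketch below is illustrative only: PROCESS_BY_EVENT_TYPE and process_for_event() are hypothetical names, and how display_manager.py actually detects the installed browser binary is not shown here.

```python
from typing import Optional

# Event type -> process family, following the table above.
PROCESS_BY_EVENT_TYPE = {
    "presentation": "impressive",
    "website": "chromium",
    "webpage": "chromium",
    "webuntis": "chromium",
    "video": "vlc",
}


def process_for_event(event_type: str, browser_name: str = "chromium") -> Optional[str]:
    """Return the process name to report in the health state, or None if unknown."""
    name = PROCESS_BY_EVENT_TYPE.get(event_type)
    if name == "chromium":
        # The installed binary may be "chromium" or "chromium-browser".
        return browser_name
    return name
```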
4. Crash Detection and Restart Logic
display_manager.py - process_events() method:
If process not running AND same event_id:
├─ Check exit code
├─ If presentation with exit code 0: Normal completion (no restart)
├─ Else: Mark crashed
│ ├─ health.update_crashed()
│ └─ health.update_restart_attempt()
│ ├─ If restart_count > max_restarts: Give up
│ └─ Else: Restart display (loop back to start_display_for_event)
└─ Log to monitoring.log at each step
Restart Logic:
- Max 3 restart attempts per event
- Restarts only if same event still active
- Graceful exit (code 0) for Impressive auto-quit presentations is treated as normal
- All crashes logged to monitoring.log with context
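The decision tree above reduces to a small pure function, which makes the bounded-restart rule easy to test in isolation. This is a sketch of the rule as described, not the process_events() code itself; decide_after_exit() and its return values are hypothetical.

```python
def decide_after_exit(event_type, exit_code, restart_count, max_restarts=3):
    """Return (action, new_restart_count) for a process that is no longer running.

    action is "completed", "restart", or "give_up".
    """
    # Impressive quitting with exit code 0 is a normal end of the presentation.
    if event_type == "presentation" and exit_code == 0:
        return "completed", restart_count

    # Anything else counts as a crash: record the attempt, then check the bound.
    attempt = restart_count + 1
    if attempt > max_restarts:
        return "give_up", attempt
    return "restart", attempt
```

With max_restarts=3 this allows restart attempts 1 through 3 and gives up on what would be the fourth.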
5. MQTT Health and Log Topics
simclient.py - New functions:
read_health_state()
- Reads src/current_process_health.json written by display_manager
- Returns dict or None if no active process
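A minimal sketch of such a reader follows. It assumes that a missing file, unparsable JSON, or an empty current_process all mean "no active process"; the real function may distinguish these cases.

```python
import json
from typing import Optional


def read_health_state(path: str = "src/current_process_health.json") -> Optional[dict]:
    """Return the health state dict, or None when no active process is recorded."""
    try:
        with open(path, encoding="utf-8") as handle:
            state = json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        # File not written yet, or caught mid-write: treat as "nothing to report".
        return None
    if not isinstance(state, dict) or not state.get("current_process"):
        return None
    return state
```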
publish_health_message(client, client_id)
- Topic: infoscreen/{uuid}/health
- QoS: 1 (reliable)
- Payload:
{
"timestamp": "2026-03-11T10:30:45.123456+00:00",
"expected_state": {
"event_id": "event_123"
},
"actual_state": {
"process": "impressive",
"pid": 1234,
"status": "running"
}
}
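The payload uses different key names than the bridge file (process vs. current_process, status vs. process_status), so a translation step is needed. The helper below is a sketch of that mapping; build_health_payload() is a hypothetical name and the key correspondence is inferred from the two JSON examples in this document.

```python
import json
from datetime import datetime, timezone


def build_health_payload(state: dict) -> str:
    """Map the bridge-file fields onto the MQTT health payload structure."""
    payload = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "expected_state": {"event_id": state.get("event_id")},
        "actual_state": {
            "process": state.get("current_process"),
            "pid": state.get("process_pid"),
            "status": state.get("process_status"),
        },
    }
    return json.dumps(payload)
```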
publish_log_message(client, client_id, level, message, context)
- Topics: infoscreen/{uuid}/logs/error or infoscreen/{uuid}/logs/warn
- QoS: 1 (reliable)
- Log level filtering (only ERROR/WARN sent unless DEBUG_MODE=1)
- Payload:
{
"timestamp": "2026-03-11T10:30:45.123456+00:00",
"message": "Process started: event_id=123 event_type=presentation process=impressive pid=1234",
"context": {
"event_id": "event_123",
"process": "impressive",
"event_type": "presentation"
}
}
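Putting topic selection, level filtering, and payload together, a sketch of the publishing function could look like this. It only assumes that client exposes publish(topic, payload, qos=...) as the paho-mqtt client does; the debug_mode parameter and the boolean return value are illustrative additions, not the actual signature.

```python
import json
from datetime import datetime, timezone

# Level name -> topic suffix; DEBUG is intentionally absent (never sent).
TOPIC_SUFFIX = {"ERROR": "error", "WARN": "warn", "WARNING": "warn", "INFO": "info"}


def publish_log_message(client, client_id, level, message, context=None, debug_mode=False):
    """Publish a log entry if its level qualifies; return True when something was sent."""
    suffix = TOPIC_SUFFIX.get(level.upper())
    if suffix is None or (suffix == "info" and not debug_mode):
        return False
    payload = json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "message": message,
        "context": context or {},
    })
    client.publish(f"infoscreen/{client_id}/logs/{suffix}", payload, qos=1)
    return True
```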
Enhanced Dashboard Heartbeat:
- Topic: infoscreen/{uuid}/dashboard
- Now includes process_health block with event_id, process name, status, restart count
6. Integration Points
Existing Features Preserved:
- ✅ Impressive PDF presentations with auto-advance and loop
- ✅ Chromium website display with auto-scroll injection
- ✅ VLC video playback (python-vlc preferred, binary fallback)
- ✅ Screenshot capture and transmission
- ✅ HDMI-CEC TV control
- ✅ Two-process architecture
New Integration Points:
| File | Function | Change |
|---|---|---|
| display_manager.py | __init__() | Initialize ProcessHealthState() |
| display_manager.py | start_presentation() | Call health.update_running() with impressive |
| display_manager.py | start_video() | Call health.update_running() with vlc |
| display_manager.py | start_webpage() | Call health.update_running() with chromium |
| display_manager.py | process_events() | Detect crashes, call health.update_crashed() and update_restart_attempt() |
| display_manager.py | stop_current_display() | Call health.update_stopped() |
| simclient.py | screenshot_service_thread() | (No changes to interval) |
| simclient.py | Main heartbeat loop | Call publish_health_message() after successful heartbeat |
| simclient.py | send_screenshot_heartbeat() | Read health state and include in dashboard payload |
Logging Hierarchy
Local Rotating Files (5 × 5 MB)
logs/display_manager.log (existing - updated):
- Display event processing
- Process lifecycle (start/stop)
- HDMI-CEC operations
- Presentation status
- Video/website startup
logs/simclient.log (existing - updated):
- MQTT connection/reconnection
- Discovery and heartbeat
- File downloads
- Group membership changes
- Dashboard payload info
logs/monitoring.log (NEW):
- Process health events (start, crash, restart, stop)
- Both display_manager and simclient write here
- Centralized health tracking
- Technician-focused: "What happened to the processes?"
# Example monitoring.log entries:
2026-03-11 10:30:45 [INFO] Process started: event_id=event_123 event_type=presentation process=impressive pid=1234
2026-03-11 10:35:20 [WARNING] Process crashed: event_id=event_123 event_type=presentation process=impressive restart_count=0/3
2026-03-11 10:35:20 [WARNING] Restarting process: attempt 1/3 for impressive
2026-03-11 10:35:25 [INFO] Process started: event_id=event_123 event_type=presentation process=impressive pid=1245
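Since these entries follow a fixed "timestamp [LEVEL] message" layout with key=value pairs, they are easy to post-process during troubleshooting. The parser below is a small sketch that assumes exactly the layout of the example entries above; parse_monitoring_line() is a hypothetical helper, not part of the codebase.

```python
import re

LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>[A-Z]+)\] (?P<message>.*)$"
)
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_monitoring_line(line):
    """Split one monitoring.log line into timestamp, level, message and key=value fields."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["fields"] = dict(FIELD_RE.findall(entry["message"]))
    return entry


if __name__ == "__main__":
    sample = (
        "2026-03-11 10:35:20 [WARNING] Process crashed: event_id=event_123 "
        "event_type=presentation process=impressive restart_count=0/3"
    )
    print(parse_monitoring_line(sample)["fields"])
```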
MQTT Transmission (Selective)
Always sent (when error occurs):
- infoscreen/{uuid}/logs/error - Critical failures
- infoscreen/{uuid}/logs/warn - Restarts, crashes, missing binaries
Development mode only (if DEBUG_MODE=1):
- infoscreen/{uuid}/logs/info - Event start/stop, process running status
Never sent:
- DEBUG messages (local-only debug details)
- INFO messages in production
Environment Variables
No new required variables. Existing configuration supports monitoring:
# Existing (unchanged):
ENV=development|production
DEBUG_MODE=0|1 # Enables INFO logs to MQTT
LOG_LEVEL=DEBUG|INFO|WARNING|ERROR # Local log verbosity
HEARTBEAT_INTERVAL=5|60 # seconds
SCREENSHOT_INTERVAL=30|300 # seconds (display_manager_screenshot_capture)
# Recommended for monitoring:
SCREENSHOT_CAPTURE_INTERVAL=30 # How often display_manager captures screenshots
SCREENSHOT_MAX_WIDTH=800 # Downscale for bandwidth
SCREENSHOT_JPEG_QUALITY=70 # Balance quality/size
# File server (if different from MQTT broker):
FILE_SERVER_HOST=192.168.1.100
FILE_SERVER_PORT=8000
FILE_SERVER_SCHEME=http
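For illustration, the monitoring-related variables could be read as shown below. This is a sketch only: load_monitoring_config() is a hypothetical helper, and the fallback values simply reuse the recommended values listed above rather than the defaults the real code applies.

```python
import os


def load_monitoring_config(env=None):
    """Read monitoring-related settings from the environment (or a dict for testing)."""
    env = os.environ if env is None else env
    return {
        "debug_mode": env.get("DEBUG_MODE", "0") == "1",
        "log_level": env.get("LOG_LEVEL", "INFO").upper(),
        # Fallbacks below are the "recommended" values, used here as assumptions.
        "screenshot_capture_interval": int(env.get("SCREENSHOT_CAPTURE_INTERVAL", "30")),
        "screenshot_max_width": int(env.get("SCREENSHOT_MAX_WIDTH", "800")),
        "screenshot_jpeg_quality": int(env.get("SCREENSHOT_JPEG_QUALITY", "70")),
    }
```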
Testing Validation
System-Level Test Sequence
1. Start Services:
# Terminal 1: Display Manager
./scripts/start-display-manager.sh
# Terminal 2: MQTT Client
./scripts/start-dev.sh
# Terminal 3: Monitor logs
tail -f logs/monitoring.log
2. Trigger Each Event Type:
# Via test menu or MQTT publish:
./scripts/test-display-manager.sh # Options 1-3 trigger events
3. Verify Health State File:
# Check health state gets written immediately
cat src/current_process_health.json
# Should show: event_id, event_type, current_process (impressive/chromium/vlc), process_status=running
4. Check MQTT Topics:
# Monitor health messages:
mosquitto_sub -h localhost -t "infoscreen/+/health" -v
# Monitor log messages:
mosquitto_sub -h localhost -t "infoscreen/+/logs/#" -v
# Monitor dashboard heartbeat:
mosquitto_sub -h localhost -t "infoscreen/+/dashboard" -v | head -c 500 && echo "..."
5. Simulate Process Crash:
# Find impressive/chromium/vlc PID:
ps aux | grep -E 'impressive|chromium|vlc'
# Kill process:
kill -9 <pid>
# Watch monitoring.log for crash detection and restart
tail -f logs/monitoring.log
# Should see: [WARNING] Process crashed... [WARNING] Restarting process...
6. Verify Server Integration:
# Server receives health messages:
sqlite3 infoscreen.db "SELECT process_status, current_process, restart_count FROM clients WHERE uuid='...';"
# Should show latest status from health message
# Server receives logs:
sqlite3 infoscreen.db "SELECT level, message FROM client_logs WHERE client_uuid='...' ORDER BY timestamp DESC LIMIT 10;"
# Should show ERROR/WARN entries from crashes/restarts
Troubleshooting
Health State File Not Created
Symptom: src/current_process_health.json missing
Causes:
- No event active (file only created when display starts)
- display_manager not running
Check:
ps aux | grep display_manager
tail -f logs/display_manager.log | grep "Process started\|Process stopped"
MQTT Health Messages Not Arriving
Symptom: No health messages on infoscreen/{uuid}/health topic
Causes:
- simclient not reading health state file
- MQTT connection dropped
- Health update function not called
Check:
# Check health file exists and is recent:
ls -l src/current_process_health.json
stat src/current_process_health.json | grep Modify
# Monitor simclient logs:
tail -f logs/simclient.log | grep -E "Health|heartbeat|publish"
# Verify MQTT connection:
mosquitto_sub -h localhost -t "infoscreen/+/heartbeat" -v
Restart Loop (Process Keeps Crashing)
Symptom: monitoring.log shows repeated crashes and restarts
Check:
# Read last log lines of the process (stored by display_manager):
tail -f logs/impressive.out.log # for presentations
tail -f logs/browser.out.log # for websites
tail -f logs/video_player.out.log # for videos
Common Causes:
- Missing binary (impressive not installed, chromium not found, vlc not available)
- Corrupt presentation file
- Invalid URL for website
- Insufficient permissions for screenshots
Log Messages Not Reaching Server
Symptom: client_logs table in server DB is empty
Causes:
- Log level filtering: INFO messages in production are local-only
- Logs only published on ERROR/WARN
- MQTT publish failing silently
Check:
# Force DEBUG_MODE to see all logs:
export DEBUG_MODE=1
export LOG_LEVEL=DEBUG
# Restart simclient and trigger event
# Monitor local logs first:
tail -f logs/monitoring.log | grep -i error
Performance Considerations
Bandwidth per Client:
- Health message: ~200 bytes per heartbeat interval (every 5-60s)
- Screenshot heartbeat: ~50-100 KB (every 30-300s)
- Log messages: ~100-500 bytes per crash/error (rare)
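- Total: health and log traffic alone is roughly 0.3-3.5 MB/day per device; at the sizes and intervals above, screenshots add roughly 14-290 MB/day, so the screenshot interval dominates bandwidth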
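As a rough check, the per-message sizes and intervals listed above can be converted into daily volumes. The calculation below assumes 1 KB = 1000 bytes and a device that runs 24 hours a day; both are simplifications.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_bytes(message_bytes, interval_seconds):
    """Bytes per day for a message of fixed size sent at a fixed interval."""
    return message_bytes * SECONDS_PER_DAY // interval_seconds


# Health messages: ~200 bytes every 5-60 s.
health_low = daily_bytes(200, 60)       # 288_000 bytes, about 0.3 MB/day
health_high = daily_bytes(200, 5)       # 3_456_000 bytes, about 3.5 MB/day

# Screenshot heartbeats: ~50-100 KB every 30-300 s.
shots_low = daily_bytes(50_000, 300)    # 14_400_000 bytes, about 14 MB/day
shots_high = daily_bytes(100_000, 30)   # 288_000_000 bytes, about 288 MB/day

if __name__ == "__main__":
    for label, value in [
        ("health (60 s)", health_low),
        ("health (5 s)", health_high),
        ("screenshots (50 KB / 300 s)", shots_low),
        ("screenshots (100 KB / 30 s)", shots_high),
    ]:
        print(f"{label}: {value / 1_000_000:.1f} MB/day")
```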
Disk Space on Client:
- Monitoring logs: 5 files × 5 MB = 25 MB max
- Display manager logs: 5 files × 2 MB = 10 MB max
- MQTT client logs: 5 files × 2 MB = 10 MB max
- Screenshots: 20 files × 50-100 KB = 1-2 MB max
- Total: ~50 MB max (well within typical Raspberry Pi USB/SSD storage)
Rotation Strategy:
- Old files automatically deleted when size limit reached
- Technician can SSH and tail -f any time
- No database overhead (file-based rotation is minimal CPU)
Integration with Server (Phase 2)
The client implementation sends data to the server's Phase 2 endpoints:
Expected Server Implementation (from CLIENT_MONITORING_SETUP.md):
- MQTT Listener receives and stores:
  - infoscreen/{uuid}/logs/error, /logs/warn, /logs/info
  - infoscreen/{uuid}/health messages
  - Updates clients table with health fields
- Database Tables:
  - clients.process_status: running/crashed/starting/stopped
  - clients.current_process: impressive/chromium/vlc/None
  - clients.process_pid: PID value
  - clients.current_event_id: Active event
  - client_logs: table stores logs with level/message/context
- API Endpoints:
  - GET /api/client-logs/{uuid}/logs?level=ERROR&limit=50
  - GET /api/client-logs/summary (errors/warnings across all clients)
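When scripting checks against the log endpoint, building the URL with proper escaping avoids mistakes with UUIDs and query parameters. The helper below is a sketch: build_client_logs_url() is hypothetical, the base URL in the example is a placeholder, and the path and parameters are taken from the endpoint listed above.

```python
from urllib.parse import quote, urlencode


def build_client_logs_url(base_url, client_uuid, level="ERROR", limit=50):
    """Build the URL for GET /api/client-logs/{uuid}/logs with level and limit filters."""
    path = f"/api/client-logs/{quote(client_uuid, safe='')}/logs"
    query = urlencode({"level": level, "limit": limit})
    return f"{base_url.rstrip('/')}{path}?{query}"


if __name__ == "__main__":
    # Placeholder host; substitute the real server address.
    print(build_client_logs_url("http://server.example:8000", "my-client-uuid"))
```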
Summary of Changes
Files Modified
- src/display_manager.py:
  - Added psutil import for future process monitoring
  - Added ProcessHealthState class (60 lines)
  - Added monitoring logger setup (8 lines)
  - Added health.update_running() calls in start_presentation(), start_video(), start_webpage()
  - Added crash detection and restart logic in process_events()
  - Added health.update_stopped() in stop_current_display()
- src/simclient.py:
  - Added timezone import
  - Added monitoring logger setup (8 lines)
  - Added read_health_state() function
  - Added publish_health_message() function
  - Added publish_log_message() function (with level filtering)
  - Updated send_screenshot_heartbeat() to include health data
  - Updated heartbeat loop to call publish_health_message()
Files Created
- src/current_process_health.json (at runtime):
  - Bridge file between display_manager and simclient
  - Shared volume compatible (works in container setup)
- logs/monitoring.log (at runtime):
  - New rotating log file (5 × 5 MB)
  - Health events from both processes
Next Steps
- Deploy to test client and run validation sequence above
- Deploy server Phase 2 (if not yet done) to receive health/log messages
- Verify database updates in server-side clients and client_logs tables
- Test dashboard UI (Phase 4) to display health indicators
- Configure alerting (email/Slack) for ERROR level messages
Implementation Date: 11 March 2026
Part of: Infoscreen 2025 Client Monitoring System
Status: Production Ready (with server Phase 2 integration)