feat(client-monitoring): finalize client-side monitoring and UTC logging

- add process health bridge and monitoring flow between display_manager and simclient
- publish health + warn/error log topics over MQTT
- standardize log/payload/screenshot timestamps to UTC (Z) to avoid DST drift
- improve video handling: python-vlc fullscreen enforcement and runtime PID reporting
- update README and copilot instructions with monitoring architecture and troubleshooting
- add Phase 3 monitoring implementation documentation
- update gitignore for new runtime/log artifacts
RobbStarkAustria
2026-03-11 20:24:38 +01:00
parent 1c445f4ba7
commit 80e5ce98a0
6 changed files with 837 additions and 18 deletions


@@ -14,8 +14,9 @@
- **Display logic**: `src/display_manager.py` (controls presentations/video/web)
- **MQTT client**: `src/simclient.py` (event management, heartbeat, discovery)
- **Runtime state**: `src/current_event.json` (current active event)
- **Process health bridge**: `src/current_process_health.json` (display_manager -> simclient)
- **Config**: `src/config/client_uuid.txt`, `src/config/last_group_id.txt`, `.env`
- **Logs**: `logs/display_manager.log`, `logs/simclient.log`
- **Logs**: `logs/display_manager.log`, `logs/simclient.log`, `logs/monitoring.log`
- **Screenshots**: `src/screenshots/` (shared volume between processes)
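
The process health bridge listed above can be pictured as a small JSON file that `display_manager.py` writes and `simclient.py` reads. The following is a minimal sketch under stated assumptions, not the repository's code: the helper names `write_health_state` and `read_health_file` and the payload keys are hypothetical, and the atomic-rename strategy is one reasonable choice rather than a confirmed detail.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_health_state(path: Path, state: dict) -> None:
    """Writer side (hypothetical): replace the file atomically so a reader
    never observes a half-written JSON document."""
    stamp = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    payload = dict(state, timestamp=stamp)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(payload, tmp)
    os.replace(tmp_name, path)  # rename within one directory is atomic on POSIX


def read_health_file(path: Path) -> Optional[dict]:
    """Reader side (hypothetical): return the last state, or None if the file
    is missing or not valid JSON."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```

Writing to a temporary file in the same directory and renaming it keeps the two processes decoupled without any locking.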
### Common Tasks Quick Reference
@@ -23,6 +24,8 @@
|------|------|-------------------|
| Add event type | `display_manager.py` | `start_display_for_event()` |
| Modify presentation | `display_manager.py` | `start_presentation()` |
| Modify process monitoring | `display_manager.py` | `ProcessHealthState`, `process_events()` |
| Publish health/log topics | `simclient.py` | `read_health_state()`, `publish_health_message()`, `publish_log_message()` |
| Change MQTT topics | `simclient.py` | Topic constants/handlers |
| Update screenshot | `display_manager.py` | `_capture_screenshot()` |
| File downloads | `simclient.py` | `resolve_file_url()` |
@@ -62,6 +65,8 @@
### MQTT Communication Patterns
- **Discovery**: `infoscreen/discovery` → `infoscreen/{client_id}/discovery_ack`
- **Heartbeat**: Regular `infoscreen/{client_id}/heartbeat` messages
- **Health**: `infoscreen/{client_id}/health` (event/process/pid/status)
- **Client logs**: `infoscreen/{client_id}/logs/error|warn` (selective forwarding)
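
As an illustration of the topic patterns above, the sketch below expands the health and log topics for a given client and serializes a health payload. The function names and payload keys are assumptions for illustration (the source only names the event/process/pid/status grouping), and reading `logs/error|warn` as two separate topics is an interpretation.

```python
import json
from datetime import datetime, timezone

TOPIC_PREFIX = "infoscreen"
FORWARDED_LEVELS = ("error", "warn")  # selective forwarding: nothing below warn


def health_topic(client_id: str) -> str:
    return f"{TOPIC_PREFIX}/{client_id}/health"


def log_topic(client_id: str, level: str) -> str:
    if level not in FORWARDED_LEVELS:
        raise ValueError(f"level {level!r} is not forwarded over MQTT")
    return f"{TOPIC_PREFIX}/{client_id}/logs/{level}"


def build_health_payload(event_id, process: str, pid: int, status: str) -> str:
    """Serialize a health message; key names here are hypothetical."""
    stamp = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    return json.dumps({
        "event_id": event_id,
        "process": process,
        "pid": pid,
        "status": status,
        "timestamp": stamp,
    })
```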
### MQTT Reconnection & Heartbeat (Nov 2025)
- The client uses the Paho MQTT v2 callback API with `client.loop_start()` and `client.reconnect_delay_set()` to handle automatic reconnection.
- `on_connect` re-subscribes to all topics (`discovery_ack`, `config`, `group_id`, current group events) and re-sends discovery on reconnect to re-register with the server.
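
The reconnect behaviour described above might look roughly like the following sketch. It assumes paho-mqtt 2.x; the factory `make_on_connect`, the `connect` helper, the discovery payload, and the exact `config`, `group_id`, and group-event topic strings are assumptions, since only the `discovery` and `discovery_ack` topics are spelled out in this document.

```python
import json
from typing import Callable, Optional


def make_on_connect(client_id: str, group_id: Optional[str]) -> Callable:
    """Build an on_connect callback with the paho v2 signature."""
    def on_connect(client, userdata, flags, reason_code, properties=None):
        if getattr(reason_code, "is_failure", False):
            return  # connection was refused; nothing to subscribe to yet
        base = f"infoscreen/{client_id}"
        # Assumed topic layout for config/group_id/events.
        topics = [f"{base}/discovery_ack", f"{base}/config", f"{base}/group_id"]
        if group_id is not None:
            topics.append(f"infoscreen/events/{group_id}")
        for topic in topics:
            client.subscribe(topic)
        # Re-send discovery so the server re-registers this client.
        client.publish("infoscreen/discovery", json.dumps({"uuid": client_id}))
    return on_connect


def connect(host: str, client_id: str, group_id: Optional[str] = None):
    import paho.mqtt.client as mqtt  # lazy import; requires paho-mqtt >= 2.0
    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, client_id=client_id)
    client.on_connect = make_on_connect(client_id, group_id)
    client.reconnect_delay_set(min_delay=1, max_delay=60)
    client.connect(host, 1883, keepalive=60)
    client.loop_start()  # background thread drives reconnects
    return client
```

Because the callback runs on every successful (re)connect, subscriptions that the broker dropped with the session are restored without extra bookkeeping.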
@@ -255,11 +260,20 @@ FILE_SERVER_SCHEME=http # http or https
- **Fallback**: External vlc binary
- **Fields**: `url`, `autoplay` (bool), `loop` (bool), `volume` (0.0-1.0 → 0-100)
- **URL rewriting**: `server` host → configured file server
- **Fullscreen**: enforced for python-vlc on startup (with short retry toggles); external fallback uses `--fullscreen`
- **Monitoring PID semantics**: python-vlc runs in-process, so PID is `display_manager.py` runtime PID; external fallback uses external `vlc` PID
- **HW decode errors**: `h264_v4l2m2m` failures are normal if V4L2 M2M is unavailable; use software decode
- Robust payload parsing with fallbacks
- Topic-specific message handlers
- Retained message support where appropriate
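
For the video bullets above, the `volume` range mapping and the monitoring PID semantics can be sketched as two small helpers. Both names (`volume_to_percent`, `monitored_pid`) are hypothetical, and clamping out-of-range volumes is an assumption rather than documented behaviour.

```python
import os
import subprocess
from typing import Optional


def volume_to_percent(volume: float) -> int:
    """Map an event volume in 0.0-1.0 to 0-100, clamping out-of-range input."""
    return round(max(0.0, min(1.0, float(volume))) * 100)


def monitored_pid(external_proc: Optional[subprocess.Popen]) -> int:
    """Return the PID to report for a running video.

    python-vlc plays in-process, so the reported PID is this process's own;
    the external fallback reports the PID of the spawned vlc process.
    """
    if external_proc is not None:
        return external_proc.pid
    return os.getpid()
```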
### Logging & Timestamp Policy (Mar 2026)
- Client logs are standardized to UTC with `Z` suffix to avoid DST/localtime drift.
- Applies to `display_manager.log`, `simclient.log`, and `monitoring.log`.
- MQTT payload timestamps for heartbeat/dashboard/health/log messages are UTC ISO timestamps.
- Screenshot metadata timestamps included by `simclient.py` are UTC ISO timestamps.
- Prefer UTC-aware calls (`datetime.now(timezone.utc)`) and UTC log formatters for new code.
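
One way to follow the policy above in new code is a formatter that renders log times in UTC with a `Z` suffix, plus a helper for payload timestamps. This is a sketch only: the class name `UTCFormatter`, the helper `utc_now_iso`, and the millisecond precision are assumptions, not the formatter the project actually ships.

```python
import logging
from datetime import datetime, timezone


class UTCFormatter(logging.Formatter):
    """Render %(asctime)s as an ISO 8601 UTC timestamp ending in 'Z'."""

    def formatTime(self, record, datefmt=None):
        dt = datetime.fromtimestamp(record.created, tz=timezone.utc)
        # %f yields microseconds; trim to milliseconds before appending Z.
        return dt.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def utc_now_iso() -> str:
    """UTC ISO timestamp for MQTT payloads and screenshot metadata."""
    now = datetime.now(timezone.utc)
    return now.isoformat(timespec="seconds").replace("+00:00", "Z")


handler = logging.StreamHandler()
handler.setFormatter(UTCFormatter("%(asctime)s %(levelname)s %(message)s"))
```

Attaching `handler` to a logger makes every record independent of the host's local timezone and DST changes.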
## Hardware Considerations
### Target Platform
@@ -490,6 +504,7 @@ The screenshot capture and transmission system has been implemented with separat
- **Large payloads**: Reduce `SCREENSHOT_MAX_WIDTH` or `SCREENSHOT_JPEG_QUALITY`
- **Stale screenshots**: Check `latest.jpg` symlink, verify display_manager is running
- **MQTT errors**: Check dashboard topic logs for publish return codes
- **Pulse overflow in remote sessions**: warnings like `pulse audio output error: overflow, flushing` can occur with NoMachine/dummy displays; if HDMI playback is stable, treat as environment-related
### Testing & Troubleshooting
**Setup:**
- X11: `sudo apt install scrot imagemagick`