fix(screenshots): harden event-triggered MQTT screenshot flow and clean up docs

- fix race where periodic captures could overwrite pending event_start and event_stop metadata before simclient published
- keep latest.jpg and meta.json synchronized so triggered screenshots are not lost
- add stale pending trigger self-healing to recover from old or invalid metadata states
- improve non-interactive capture reliability with DISPLAY and XAUTHORITY fallbacks
- allow periodic idle captures in development mode so dashboard previews stay fresh without active events
- add deeper simclient screenshot diagnostics for trigger and metadata handling
- add regression test script for metadata preservation and trigger delivery
- add root-cause and fix documentation for the screenshot MQTT issue
- align and deduplicate README screenshot and troubleshooting sections; update release notes to March 2026
- fix scripts/start-dev.sh .env loading to ignore comments safely and remove export invalid identifier warnings
RobbStarkAustria
2026-03-29 10:38:29 +02:00
parent cda126018f
commit d6090a6179
7 changed files with 556 additions and 244 deletions


@@ -10,6 +10,8 @@
- **Client-side resize/compress** screenshots before MQTT transmission
- **Server renders PPTX → PDF via Gotenberg** (client only displays PDFs, no LibreOffice needed)
- **Keep screenshot consent notice in docs** when describing dashboard screenshot feature
- **Event-start/event-stop screenshots must preserve metadata** - See SCREENSHOT_MQTT_FIX.md for the critical race condition that was fixed
- **Screenshot updates must keep `latest.jpg` and `meta.json` in sync** (simclient prefers `latest.jpg`)
### Key Files & Locations
- **Display logic**: `src/display_manager.py` (controls presentations/video/web)
@@ -408,6 +410,25 @@ When working on this codebase:
- `Lade Datei herunter von: http://<broker-ip>:8000/...`
- Followed by `"GET /... HTTP/1.1" 200` and `Datei erfolgreich heruntergeladen:`
### Screenshot MQTT Transmission Issue (Event-Start/Event-Stop)
- **Symptom**: Event-triggered screenshots (event_start, event_stop) do NOT appear on the dashboard; only periodic screenshots are transmitted
- **Root Cause**: Race condition in metadata/file-pointer handling: periodic captures can overwrite event-triggered metadata or `latest.jpg` before simclient processes it (see SCREENSHOT_MQTT_FIX.md for details)
- **What to check**:
  - Display manager logs show event_start/event_stop screenshots ARE being captured: `Screenshot captured: ... type=event_start`
  - But `meta.json` is stale or `latest.jpg` is not updated
  - MQTT heartbeats lack screenshot data at event transitions
- **How to verify the fix**:
  - Run `./test-screenshot-meta-fix.sh`; it should output `[SUCCESS] Event-triggered metadata preserved!`
  - Check display_manager.py: `_write_screenshot_meta()` has protection logic to skip periodic overwrites of event metadata
  - Check display_manager.py: periodic `latest.jpg` updates are also protected when triggered metadata is pending
  - Check simclient.py: `screenshot_service_thread()` logs show pending event-triggered captures being processed immediately
- **Permanent Fix**: Already applied in display_manager.py and simclient.py. Prevents periodic captures from overwriting pending trigger state and includes stale-trigger self-healing.
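
The protection and self-healing described above can be sketched as follows. This is a minimal illustration, not the code in display_manager.py: the function names, the metadata fields (`type`, `captured_at`, `sent`), and the 60-second staleness window are all assumptions made for the example.

```python
import json
import os
import tempfile
import time

# Hypothetical values; the real project may use different names and limits.
TRIGGERED_TYPES = {"event_start", "event_stop"}
STALE_AFTER_SECONDS = 60


def read_meta(path):
    """Return the metadata dict, or None if the file is missing or invalid."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):
        return None
    return data if isinstance(data, dict) else None


def has_pending_trigger(meta, now):
    """True only for a fresh, unsent, well-formed event-triggered entry."""
    if not meta or meta.get("type") not in TRIGGERED_TYPES:
        return False
    if meta.get("sent"):
        return False
    ts = meta.get("captured_at")
    if isinstance(ts, bool) or not isinstance(ts, (int, float)):
        return False  # invalid timestamp: treat as stale (self-healing)
    return 0 <= now - ts <= STALE_AFTER_SECONDS


def write_meta(path, capture_type, filename, now=None):
    """Write metadata atomically; skip periodic writes over a pending trigger."""
    now = time.time() if now is None else now
    if capture_type == "periodic" and has_pending_trigger(read_meta(path), now):
        return False  # keep the event-triggered entry for the publisher
    meta = {"type": capture_type, "file": filename,
            "captured_at": now, "sent": False}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(meta, fh)
    os.replace(tmp_path, path)  # atomic rename, readers never see partial JSON
    return True
```

In this sketch a periodic capture is dropped only while a recent, unsent trigger entry exists; once the entry is older than the window or unreadable, periodic writes resume, which is the lock-up the self-healing is meant to prevent.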
### Screenshot Capture After Restart (No Active Event)
- In `ENV=development`, display_manager performs periodic idle captures so the dashboard does not appear dead during no-event windows.
- In `ENV=production`, periodic captures remain event/process-driven unless `SCREENSHOT_ALWAYS=1`.
- If display_manager is started from non-interactive shells (systemd/nohup/ssh), it now attempts `DISPLAY=:0` and `XAUTHORITY=~/.Xauthority` fallback for X11 capture tools.
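
The `DISPLAY`/`XAUTHORITY` fallback mentioned above might look roughly like this. The helper name `build_capture_env` and its parameters are hypothetical, and checking that the `.Xauthority` file exists before using it is an assumption of this sketch rather than confirmed behavior.

```python
import os


def build_capture_env(base_env=None, home=None):
    """Return a copy of the environment with X11 fallbacks filled in.

    Existing DISPLAY/XAUTHORITY values are never overridden.
    """
    env = dict(os.environ if base_env is None else base_env)
    home = os.path.expanduser("~") if home is None else home

    # Non-interactive shells (systemd, nohup, ssh) often have no DISPLAY.
    if not env.get("DISPLAY"):
        env["DISPLAY"] = ":0"

    # Only point XAUTHORITY at ~/.Xauthority if that file exists (assumption).
    if not env.get("XAUTHORITY"):
        candidate = os.path.join(home, ".Xauthority")
        if os.path.exists(candidate):
            env["XAUTHORITY"] = candidate

    return env
```

The resulting dictionary could then be passed as `env=` to `subprocess.run` when invoking an X11 capture tool such as `scrot`.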
## Important Notes for AI Assistants
### Virtual Environment Requirements (Critical)
@@ -465,7 +486,8 @@ The screenshot capture and transmission system has been implemented with separat
- **Processing**: Downscales to max width (default 800px), JPEG compresses (default quality 70)
- **Output**: Creates timestamped files (`screenshot_YYYYMMDD_HHMMSS.jpg`) plus `latest.jpg` symlink
- **Rotation**: Keeps max N files (default 20), deletes older
- **Timing**: Only captures when display process is active (unless `SCREENSHOT_ALWAYS=1`)
- **Timing**: In production, captures run only while the display process is active (unless `SCREENSHOT_ALWAYS=1`); in development, periodic idle captures keep the dashboard fresh
- **Reliability**: Stale/invalid pending trigger metadata is ignored automatically to avoid lock-up of periodic updates
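
As an illustration of the rotation rule above (keep at most N timestamped files, delete the older ones), here is a small sketch. The function name `rotate_screenshots` is hypothetical; it relies on the `screenshot_YYYYMMDD_HHMMSS.jpg` naming so that lexicographic order matches chronological order, and it never touches `latest.jpg`.

```python
import glob
import os


def rotate_screenshots(directory, max_files=20):
    """Delete the oldest timestamped screenshots beyond max_files.

    Returns the list of removed paths.
    """
    # Timestamped names sort chronologically when sorted as strings.
    files = sorted(glob.glob(os.path.join(directory, "screenshot_*.jpg")))
    excess = files[:-max_files] if max_files > 0 else files
    for path in excess:
        os.remove(path)
    return excess
```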
### Transmission Strategy (simclient.py)
- **Source**: Prefers `screenshots/latest.jpg` if present, falls back to newest timestamped file
@@ -510,6 +532,7 @@ The screenshot capture and transmission system has been implemented with separat
- **Stale screenshots**: Check `latest.jpg` symlink, verify display_manager is running
- **MQTT errors**: Check dashboard topic logs for publish return codes
- **Pulse overflow in remote sessions**: warnings like `pulse audio output error: overflow, flushing` can occur with NoMachine/dummy displays; if HDMI playback is stable, treat as environment-related
- **After restarts**: Ensure both processes are restarted (`simclient.py` and `display_manager.py`) so metadata consumption and capture behavior use the same code version
### Testing & Troubleshooting
**Setup:**
- X11: `sudo apt install scrot imagemagick`