infoscreen-dev/SCREENSHOT_MQTT_FIX.md
RobbStarkAustria d6090a6179 fix(screenshots): harden event-triggered MQTT screenshot flow and cleanup docs
- fix race where periodic captures could overwrite pending event_start and event_stop metadata before simclient published
- keep latest.jpg and meta.json synchronized so triggered screenshots are not lost
- add stale pending trigger self-healing to recover from old or invalid metadata states
- improve non-interactive capture reliability with DISPLAY and XAUTHORITY fallbacks
- allow periodic idle captures in development mode so dashboard previews stay fresh without active events
- add deeper simclient screenshot diagnostics for trigger and metadata handling
- add regression test script for metadata preservation and trigger delivery
- add root-cause and fix documentation for the screenshot MQTT issue
- align and deduplicate README screenshot and troubleshooting sections; update release notes to March 2026
- fix scripts/start-dev.sh .env loading to ignore comments safely and remove export invalid identifier warnings
2026-03-29 10:38:29 +02:00


# Screenshot MQTT Transmission Issue - Root Cause & Fix
## Issue Summary
Event-triggered screenshots (event_start, event_stop) were being captured by display_manager.py but **NOT being transmitted** via MQTT from simclient.py, resulting in empty or missing data on the dashboard.
## Root Cause: Race Condition in Metadata Handling
### The Problem Timeline
1. **T=06:05:33.516Z** - Event starts (event_115)
   - display_manager captures `screenshot_20260329_060533.jpg` (event_start)
   - Writes `meta.json` with `"send_immediately": true, "type": "event_start"`
2. **T=06:05:33.517-06:05:47 (up to 14 seconds later)**
   - simclient's screenshot_service_thread sleeps 1-2 seconds
   - WINDOW: Still hasn't read the event_start meta.json
3. **T=06:05:47.935Z** - Periodic screenshot capture
   - display_manager captures `screenshot_20260329_060547.jpg` (periodic)
   - **BUG**: Calls `_write_screenshot_meta("periodic", ...)` which **overwrites meta.json**
   - NEW meta.json: `"send_immediately": false, "type": "periodic"`
4. **T=06:05:48 (next tick)**
   - simclient finally reads meta.json
   - Sees: `send_immediately=false, type=periodic`
   - Never transmits the event_start screenshot!
Result: Event-triggered screenshot lost, periodic screenshot sent late instead.
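The timeline above can be replayed in isolation. The following is a minimal sketch, not the project's code: `write_meta_unguarded` and `replay_race` are hypothetical names, and the metadata is reduced to the two fields quoted above.

```python
import json
import tempfile
from pathlib import Path


def write_meta_unguarded(meta_path: Path, capture_type: str, send_immediately: bool) -> None:
    """Hypothetical writer with no protection: always replaces meta.json."""
    meta = {"type": capture_type, "send_immediately": send_immediately}
    meta_path.write_text(json.dumps(meta))


def replay_race() -> dict:
    """Replay the timeline: event_start is written, then a periodic capture
    lands before the reader has polled meta.json."""
    with tempfile.TemporaryDirectory() as tmp:
        meta_path = Path(tmp) / "meta.json"
        write_meta_unguarded(meta_path, "event_start", True)  # step 1: event starts
        # step 2: the reader has not polled yet
        write_meta_unguarded(meta_path, "periodic", False)    # step 3: periodic capture
        return json.loads(meta_path.read_text())              # step 4: reader finally polls


if __name__ == "__main__":
    # The reader only ever sees the periodic metadata.
    print(replay_race())
```

When the reader polls at step 4, the only metadata left is the periodic entry, so nothing tells it that an event_start screenshot was waiting.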
## Symptoms Observed
- Display manager logs show event_start/event_stop captures with correct file sizes
- MQTT messages from simclient show no screenshot data or empty arrays
- Dashboard receives only periodic screenshots, missing event transitions
- meta.json only contains periodic metadata, never event-triggered
## The Fix
### Part 1: display_manager.py - Protect Event Metadata
Modified `_write_screenshot_meta()` method to **prevent periodic screenshots from overwriting pending event-triggered metadata**:
```python
# Before writing a periodic screenshot's metadata, check whether event-triggered
# metadata is still pending (send_immediately=True).
# `existing_meta` is assumed to hold the dict previously loaded from meta.json.
if not send_immediately and capture_type == "periodic":
    if existing_meta.get('send_immediately'):
        # Skip writing - preserve the event-triggered metadata
        logging.debug(f"Skipping periodic meta to preserve pending {existing_meta.get('type')}")
        return
```
**Result**: Once event_start metadata is written, it remains in place until simclient processes it (within 1 second); periodic captures can no longer overwrite it.
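For experimenting outside the application, the guard can be wrapped in a self-contained function. This is a sketch under assumptions: `read_meta`, `write_meta_guarded`, and `demo` are hypothetical helpers, and the real `_write_screenshot_meta()` may take different arguments and store more fields.

```python
import json
import logging
import tempfile
from pathlib import Path


def read_meta(meta_path: Path) -> dict:
    """Return the current metadata, or {} if the file is missing or unreadable."""
    try:
        return json.loads(meta_path.read_text())
    except (OSError, json.JSONDecodeError):
        return {}


def write_meta_guarded(meta_path: Path, capture_type: str, send_immediately: bool) -> bool:
    """Write metadata unless a periodic capture would replace a pending trigger.

    Returns True if meta.json was written, False if the write was skipped.
    """
    existing = read_meta(meta_path)
    is_periodic = capture_type == "periodic" and not send_immediately
    if is_periodic and existing.get("send_immediately"):
        logging.debug("Skipping periodic meta to preserve pending %s", existing.get("type"))
        return False
    meta = {"type": capture_type, "send_immediately": send_immediately}
    meta_path.write_text(json.dumps(meta))
    return True


def demo() -> dict:
    """Write event_start metadata, then attempt a periodic write on top of it."""
    with tempfile.TemporaryDirectory() as tmp:
        meta_path = Path(tmp) / "meta.json"
        write_meta_guarded(meta_path, "event_start", True)
        skipped = not write_meta_guarded(meta_path, "periodic", False)
        return {"skipped": skipped, "meta": read_meta(meta_path)}


if __name__ == "__main__":
    print(demo())
```

In this sketch a missing or corrupt `meta.json` is treated as "nothing pending", so a periodic write proceeds. Whether the real method behaves the same way is not stated here.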
### Part 2: simclient.py - Enhanced Logging
Added diagnostic logging to screenshot_service_thread to show:
- When meta.json is detected and its contents
- When triggered screenshots are being sent
- File information for troubleshooting
**Result**: Better visibility into what's happening with metadata processing.
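As an illustration of the kind of diagnostic line this refers to, here is a minimal sketch. The logger name, the `describe_meta` helper, and the message format are all hypothetical and do not reproduce simclient's actual log output.

```python
import logging
from typing import Optional

# Hypothetical logger name, chosen for this example only.
logger = logging.getLogger("simclient.screenshot")


def describe_meta(meta: dict, image_size: Optional[int]) -> str:
    """Build a one-line summary of meta.json contents and the image file size."""
    size = f"{image_size} bytes" if image_size is not None else "missing"
    return (
        f"meta.json detected: type={meta.get('type')} "
        f"send_immediately={meta.get('send_immediately')} image={size}"
    )


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    logger.debug(describe_meta({"type": "event_start", "send_immediately": True}, 1024))
```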
## Verification
Test script `test-screenshot-meta-fix.sh` confirms:
```
[PROTECTED] Not overwriting pending event_start (send_immediately=True)
Current meta.json preserved: {"type": "event_start", "send_immediately": true, ...}
[SUCCESS] Event-triggered metadata preserved!
```
## How It Works Now
1. display_manager captures event_start, writes meta.json with `send_immediately=true`
2. Next periodic capture: `_write_screenshot_meta()` detects pending flag, **skips updating** meta.json
3. simclient reads meta.json within 1 second, sees `send_immediately=true`
4. Immediately calls `send_screenshot_heartbeat()`, transmits event_start screenshot
5. Clears the `send_immediately` flag
6. On next periodic capture, meta.json is safely updated
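Steps 3 to 5 describe the reader side of the handshake. A minimal sketch of one poll iteration, assuming a hypothetical `process_meta_once` helper and a `send` callback standing in for `send_screenshot_heartbeat()` (whose real signature is not shown in this document):

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def process_meta_once(meta_path: Path, send: Callable[[dict], None]) -> bool:
    """One poll iteration: send if a trigger is pending, then clear the flag.

    Returns True if a triggered screenshot was sent.
    """
    try:
        meta = json.loads(meta_path.read_text())
    except (OSError, json.JSONDecodeError):
        return False  # nothing readable yet; try again on the next tick
    if not meta.get("send_immediately"):
        return False
    send(dict(meta))  # pass a copy so the callback sees the pending state
    meta["send_immediately"] = False
    meta_path.write_text(json.dumps(meta))  # periodic writes are allowed again
    return True


def demo() -> tuple:
    """Poll twice against pending event_start metadata."""
    sent = []
    with tempfile.TemporaryDirectory() as tmp:
        meta_path = Path(tmp) / "meta.json"
        meta_path.write_text(json.dumps({"type": "event_start", "send_immediately": True}))
        first = process_meta_once(meta_path, sent.append)
        second = process_meta_once(meta_path, sent.append)
    return first, second, [m["type"] for m in sent]


if __name__ == "__main__":
    print(demo())
```

The sketch clears the flag only after `send` returns, so an exception during transmission leaves the trigger pending for the next poll. That ordering is a design choice of this example, not a documented property of simclient.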
## Key Files Modified
- `src/display_manager.py` - Line ~1742: `_write_screenshot_meta()` protection logic
- `src/simclient.py` - Line ~727: Enhanced logging in `screenshot_service_thread()`
## Testing
Run the verification test:
```bash
./test-screenshot-meta-fix.sh
```
Expected output: `[SUCCESS] Event-triggered metadata preserved!`
## Impact
- event_start and event_stop screenshots are now transmitted via MQTT
- Dashboard now receives complete event lifecycle data
- Clearer logs help diagnose future screenshot transmission issues