fix(screenshots): harden event-triggered MQTT screenshot flow and cleanup docs
- fix race where periodic captures could overwrite pending event_start and event_stop metadata before simclient published
- keep latest.jpg and meta.json synchronized so triggered screenshots are not lost
- add stale pending trigger self-healing to recover from old or invalid metadata states
- improve non-interactive capture reliability with DISPLAY and XAUTHORITY fallbacks
- allow periodic idle captures in development mode so dashboard previews stay fresh without active events
- add deeper simclient screenshot diagnostics for trigger and metadata handling
- add regression test script for metadata preservation and trigger delivery
- add root-cause and fix documentation for the screenshot MQTT issue
- align and deduplicate README screenshot and troubleshooting sections; update release notes to March 2026
- fix scripts/start-dev.sh .env loading to ignore comments safely and remove export invalid identifier warnings
94
SCREENSHOT_MQTT_FIX.md
Normal file
@@ -0,0 +1,94 @@
# Screenshot MQTT Transmission Issue - Root Cause & Fix

## Issue Summary
Event-triggered screenshots (`event_start`, `event_stop`) were being captured by `display_manager.py` but were **not being transmitted** via MQTT by `simclient.py`, leaving screenshot data empty or missing on the dashboard.

## Root Cause: Race Condition in Metadata Handling

### The Problem Timeline
1. **T=06:05:33.516Z** - Event starts (event_115)
   - display_manager captures `screenshot_20260329_060533.jpg` (event_start)
   - Writes `meta.json` with `"send_immediately": true, "type": "event_start"`

2. **T=06:05:33.517-06:05:47 (up to 14 seconds later)**
   - simclient's `screenshot_service_thread` sleeps 1-2 seconds
   - WINDOW: Still hasn't read the event_start meta.json

3. **T=06:05:47.935Z** - Periodic screenshot capture
   - display_manager captures `screenshot_20260329_060547.jpg` (periodic)
   - **BUG**: Calls `_write_screenshot_meta("periodic", ...)` which **overwrites meta.json**
   - NEW meta.json: `"send_immediately": false, "type": "periodic"`

4. **T=06:05:48 (next tick)**
   - simclient finally reads meta.json
   - Sees: `send_immediately=false, type=periodic`
   - Never transmits the event_start screenshot!

Result: Event-triggered screenshot lost, periodic screenshot sent late instead.
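
The timeline above can be replayed deterministically in a few lines. This is a minimal sketch, not the project's code: `write_meta_unprotected` and `replay_timeline` are hypothetical names, and only the `type` and `send_immediately` fields are taken from the description above.

```python
import json
import tempfile
from pathlib import Path


def write_meta_unprotected(meta_path: Path, capture_type: str, send_immediately: bool) -> None:
    """Hypothetical pre-fix writer: always replaces meta.json, whatever it held."""
    meta = {"type": capture_type, "send_immediately": send_immediately}
    meta_path.write_text(json.dumps(meta))


def replay_timeline(meta_path: Path) -> dict:
    """Replay the timeline in order and return what the reader ends up seeing."""
    write_meta_unprotected(meta_path, "event_start", True)  # step 1: event capture
    # step 2: the reader has not looked at meta.json yet
    write_meta_unprotected(meta_path, "periodic", False)    # step 3: periodic capture
    return json.loads(meta_path.read_text())                # step 4: reader finally reads


with tempfile.TemporaryDirectory() as tmp:
    seen = replay_timeline(Path(tmp) / "meta.json")
    print(seen)
```

Because the reader only ever sees the last write, the `event_start` trigger is gone by the time it looks, which matches the symptoms listed in the next section.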

## Symptoms Observed
- Display manager logs show event_start/event_stop captures with correct file sizes
- MQTT messages from simclient show no screenshot data or empty arrays
- Dashboard receives only periodic screenshots, missing event transitions
- meta.json only contains periodic metadata, never event-triggered

## The Fix

### Part 1: display_manager.py - Protect Event Metadata
Modified the `_write_screenshot_meta()` method to **prevent periodic screenshots from overwriting pending event-triggered metadata**:

```python
# Before writing a periodic screenshot's metadata, check if event-triggered
# metadata is still pending (send_immediately=True)
if not send_immediately and capture_type == "periodic":
    if existing_meta.get('send_immediately'):
        # Skip writing - preserve the event-triggered metadata
        logging.debug(f"Skipping periodic meta to preserve pending {existing_meta['type']}")
        return
```

**Result**: Once event_start metadata is written, it stays in place until simclient processes it (within 1 second), uninterrupted by periodic captures.
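
For context, the guard can be tried end to end with a self-contained sketch. The names `read_meta` and `write_screenshot_meta` are hypothetical stand-ins, the real method's signature is not shown in this document, and the write-to-temp-then-rename step is an added assumption rather than something the fix is known to do.

```python
import json
import logging
import tempfile
from pathlib import Path


def read_meta(meta_path: Path) -> dict:
    """Return the current metadata, or {} if the file is missing or unreadable."""
    try:
        return json.loads(meta_path.read_text())
    except (OSError, json.JSONDecodeError):
        return {}


def write_screenshot_meta(meta_path: Path, capture_type: str, send_immediately: bool) -> bool:
    """Write metadata unless a periodic capture would clobber a pending trigger.

    Returns True if meta.json was written, False if the write was skipped.
    """
    existing_meta = read_meta(meta_path)
    if not send_immediately and capture_type == "periodic":
        if existing_meta.get("send_immediately"):
            logging.debug("Skipping periodic meta to preserve pending %s",
                          existing_meta.get("type"))
            return False
    # Assumption: write to a sibling temp file and rename it into place,
    # so a reader never observes a half-written meta.json.
    tmp_path = meta_path.with_name(meta_path.name + ".tmp")
    tmp_path.write_text(json.dumps({"type": capture_type,
                                    "send_immediately": send_immediately}))
    tmp_path.replace(meta_path)
    return True


with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "meta.json"
    wrote_event = write_screenshot_meta(meta_path, "event_start", True)
    wrote_periodic = write_screenshot_meta(meta_path, "periodic", False)
    preserved = read_meta(meta_path)
    print(wrote_event, wrote_periodic, preserved)
```

Note that the read-then-write check in this sketch is not atomic across processes. It narrows the window described in the timeline but would need a file lock to close it completely.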

### Part 2: simclient.py - Enhanced Logging
Added diagnostic logging to `screenshot_service_thread` to show:
- When meta.json is detected and its contents
- When triggered screenshots are being sent
- File information for troubleshooting

**Result**: Better visibility into what's happening with metadata processing.
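
As an illustration of the kind of diagnostics listed above, the sketch below builds one log line from meta.json and the image file. It is an assumption about shape only: the logger name, the `log_meta_state` helper, and the message format are hypothetical and are not taken from simclient.py.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("simclient.screenshot")  # hypothetical logger name


def log_meta_state(meta_path: Path, image_path: Path) -> str:
    """Log and return a one-line summary of meta.json and the image file."""
    try:
        meta = json.loads(meta_path.read_text())
    except FileNotFoundError:
        summary = f"meta.json not found at {meta_path}"
        logger.debug(summary)
        return summary
    except json.JSONDecodeError as exc:
        summary = f"meta.json unreadable: {exc}"
        logger.warning(summary)
        return summary
    # File size helps distinguish a missing or empty capture from a good one.
    size = image_path.stat().st_size if image_path.exists() else None
    summary = (
        f"meta type={meta.get('type')} "
        f"send_immediately={meta.get('send_immediately')} "
        f"image={image_path.name} size_bytes={size}"
    )
    logger.info(summary)
    return summary


with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "meta.json"
    image_path = Path(tmp) / "latest.jpg"
    meta_path.write_text(json.dumps({"type": "event_start", "send_immediately": True}))
    image_path.write_bytes(b"jpg")  # placeholder bytes, not a real JPEG
    summary = log_meta_state(meta_path, image_path)
    missing = log_meta_state(Path(tmp) / "absent.json", image_path)
```

Returning the summary string as well as logging it keeps the helper easy to check in a test without capturing log output.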

## Verification

Test script `test-screenshot-meta-fix.sh` confirms:
```
[PROTECTED] Not overwriting pending event_start (send_immediately=True)
Current meta.json preserved: {"type": "event_start", "send_immediately": true, ...}
[SUCCESS] Event-triggered metadata preserved!
```

## How It Works Now

1. display_manager captures event_start and writes meta.json with `send_immediately=true`
2. On the next periodic capture, `_write_screenshot_meta()` detects the pending flag and **skips updating** meta.json
3. simclient reads meta.json within 1 second and sees `send_immediately=true`
4. simclient immediately calls `send_screenshot_heartbeat()`, transmitting the event_start screenshot
5. simclient clears the `send_immediately` flag
6. On the next periodic capture, meta.json is safely updated
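
Steps 3 to 5 on the simclient side can be sketched as a single poll iteration. This is illustrative only: `poll_once` is a hypothetical name, the `send` callable stands in for `send_screenshot_heartbeat()`, and clearing the flag by rewriting meta.json is an assumption about how the flag is cleared.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def poll_once(meta_path: Path, send: Callable[[dict], None]) -> bool:
    """Run one poll iteration; return True if a triggered screenshot was sent."""
    try:
        meta = json.loads(meta_path.read_text())
    except (OSError, json.JSONDecodeError):
        return False  # nothing usable to act on this tick
    if not meta.get("send_immediately"):
        return False  # periodic metadata: left to the regular schedule
    send(dict(meta))  # stand-in for send_screenshot_heartbeat(); pass a copy
    meta["send_immediately"] = False  # step 5: clear the flag
    meta_path.write_text(json.dumps(meta))
    return True


sent = []
with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "meta.json"
    meta_path.write_text(json.dumps({"type": "event_start", "send_immediately": True}))
    first = poll_once(meta_path, sent.append)   # trigger is pending: sends
    second = poll_once(meta_path, sent.append)  # flag already cleared: no resend
    final_meta = json.loads(meta_path.read_text())
```

Clearing the flag only after the send call means a failed send leaves the trigger pending for the next iteration, at the cost of a possible duplicate if the failure happens after transmission.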

## Key Files Modified
- `src/display_manager.py` - Line ~1742: `_write_screenshot_meta()` protection logic
- `src/simclient.py` - Line ~727: Enhanced logging in `screenshot_service_thread()`

## Testing
Run the verification test:
```bash
./test-screenshot-meta-fix.sh
```

Expected output: `[SUCCESS] Event-triggered metadata preserved!`

## Impact
- `event_start` and `event_stop` screenshots are now transmitted over MQTT
- Dashboard now receives complete event lifecycle data
- Clearer logs help diagnose future screenshot transmission issues