feat(mqtt): finalize dashboard screenshot payload v2 and trigger flow

- switch dashboard payload to grouped schema v2.0 in simclient
- support immediate event-triggered screenshot sends via meta.json signaling
- update README and copilot instructions to document v2 payload and trigger behavior
- update migration checklist to reflect completed client/server rollout
RobbStarkAustria
2026-03-30 17:53:58 +02:00
parent 77db2bc565
commit 25cf4e3322
4 changed files with 85 additions and 70 deletions


@@ -12,6 +12,9 @@
 - **Keep screenshot consent notice in docs** when describing dashboard screenshot feature
 - **Event-start/event-stop screenshots must preserve metadata** - See SCREENSHOT_MQTT_FIX.md for critical race condition that was fixed
 - **Screenshot updates must keep `latest.jpg` and `meta.json` in sync** (simclient prefers `latest.jpg`)
+- **Dashboard payload uses grouped v2 schema** (`message/content/runtime/metadata`, `schema_version="2.0"`)
+- **Event-triggered screenshots**: `display_manager` arms a `threading.Timer` after start/stop, captures, writes `meta.json` with `send_immediately=true`; simclient fires within ≤1s
+- **Payload assembly is centralized** in `_build_dashboard_payload()` — do not build dashboard JSON at call sites
 ### Key Files & Locations
 - **Display logic**: `src/display_manager.py` (controls presentations/video/web)
@@ -488,31 +491,49 @@ The screenshot capture and transmission system has been implemented with separat
 - **Rotation**: Keeps max N files (default 20), deletes older
 - **Timing**: Production captures when display process is active (unless `SCREENSHOT_ALWAYS=1`); development allows periodic idle captures to keep dashboard fresh
 - **Reliability**: Stale/invalid pending trigger metadata is ignored automatically to avoid lock-up of periodic updates
+- **Event-triggered captures**: `_trigger_event_screenshot(type, delay)` arms a one-shot `threading.Timer` after event start/stop; timer is cancelled and replaced on rapid event switches; default delays: presentation=4s, video=2s, web=5s (env-configurable)
+- **IPC signal file** (`screenshots/meta.json`): written atomically by `display_manager` after each capture; contains `type`, `captured_at`, `file`, `send_immediately`; `send_immediately=true` for event-triggered, `false` for periodic
 ### Transmission Strategy (simclient.py)
 - **Source**: Prefers `screenshots/latest.jpg` if present, falls back to newest timestamped file
 - **Topic**: `infoscreen/{client_id}/dashboard`
-- **Format**: JSON with base64-encoded image data
-- **Payload Structure**:
+- **Format**: JSON with base64-encoded image data, grouped v2 schema
+- **Schema version**: `"2.0"` (legacy flat fields removed; all fields grouped)
+- **Payload builder**: `_build_dashboard_payload()` in `simclient.py` — single source of truth
+- **Payload Structure** (v2):
 ```json
 {
-  "timestamp": "ISO datetime",
-  "client_id": "UUID",
-  "status": "alive",
+  "message": { "client_id": "UUID", "status": "alive" },
+  "content": {
     "screenshot": {
       "filename": "latest.jpg",
       "data": "base64...",
       "timestamp": "ISO datetime",
       "size": 12345
+    }
   },
-  "system_info": {
-    "hostname": "...",
-    "ip": "...",
-    "uptime": 123456.78
+  "runtime": {
+    "system_info": { "hostname": "...", "ip": "...", "uptime": 123456.78 },
+    "process_health": { "event_type": "...", "process_status": "...", ... }
+  },
+  "metadata": {
+    "schema_version": "2.0",
+    "producer": "simclient",
+    "published_at": "ISO datetime",
+    "capture": {
+      "type": "periodic | event_start | event_stop",
+      "captured_at": "ISO datetime",
+      "age_s": 0.9,
+      "triggered": false,
+      "send_immediately": false
+    },
+    "transport": { "topic": "infoscreen/.../dashboard", "qos": 0, "publisher": "simclient" }
   }
 }
 ```
-- **Logging**: Logs publish success/failure with file size for monitoring
+- **Capture types**: `periodic` (interval-based), `event_start` (N seconds after event launch), `event_stop` (1s after process killed)
+- **Triggered send**: `display_manager` sets `send_immediately=true` in `meta.json`; simclient 1-second tick detects and fires within ≤1s
+- **Logging**: `Dashboard published: schema=2.0 type=<type> screenshot=<file> (<bytes>) age=<s>`
 ### Scalability Considerations
 - **Client-side resize/compress**: Reduces bandwidth and broker load (recommended for 50+ clients)
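Editor's sketch for the hunk above: the `meta.json` handshake has two halves, an atomic write on the `display_manager` side and a check on each 1-second tick on the `simclient` side. The code below illustrates both under stated assumptions. The function names and the `max_age_s` staleness cutoff are hypothetical; only the four field names (`type`, `captured_at`, `file`, `send_immediately`) come from the documentation in this diff.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def write_meta_atomic(meta_path, capture_type, image_file, send_immediately):
    """Write meta.json via a temp file plus os.replace(), so readers never see a partial file."""
    meta = {
        "type": capture_type,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "file": image_file,
        "send_immediately": bool(send_immediately),
    }
    directory = os.path.dirname(os.path.abspath(meta_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(meta, handle)
    os.replace(tmp_path, meta_path)  # atomic when both paths are on the same filesystem
    return meta


def pending_trigger(meta_path, max_age_s=60.0):
    """Return the metadata if it requests an immediate send; None if absent, invalid, or stale."""
    try:
        with open(meta_path, encoding="utf-8") as handle:
            meta = json.load(handle)
        captured_at = datetime.fromisoformat(meta["captured_at"])
    except (OSError, ValueError, KeyError, TypeError):
        return None  # missing or malformed metadata must not block periodic sends
    if captured_at.tzinfo is None:
        captured_at = captured_at.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    age_s = (datetime.now(timezone.utc) - captured_at).total_seconds()
    if meta.get("send_immediately") is not True or age_s > max_age_s:
        return None
    return meta
```

A tick loop would call `pending_trigger()` once per second and publish when it returns a dict. Clearing or rewriting the flag after a successful send is left out here and would be needed to avoid repeated sends.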


@@ -4,43 +4,43 @@ Use this checklist to migrate from legacy flat dashboard payload to grouped v2 p
 ## A. Client Implementation
-- [ ] Create branch for migration work.
-- [ ] Capture one baseline message from MQTT (legacy format).
-- [ ] Implement one canonical payload builder function.
-- [ ] Emit grouped blocks in this order: `message`, `content`, `runtime`, `metadata`.
-- [ ] Add `metadata.schema_version = "2.0"`.
-- [ ] Add `metadata.producer = "simclient"`.
-- [ ] Add `metadata.published_at` in UTC ISO format.
-- [ ] Map capture type to `metadata.capture.type` (`periodic`, `event_start`, `event_stop`).
-- [ ] Map screenshot freshness to `metadata.capture.age_s`.
-- [ ] Keep screenshot object unchanged in semantics (`filename`, `data`, `timestamp`, `size`).
-- [ ] Keep trigger behavior unchanged (periodic and triggered sends still work).
-- [ ] Add publish log fields: schema version, capture type, age.
-- [ ] Validate all 3 paths end-to-end:
-  - [ ] periodic
-  - [ ] event_start
-  - [ ] event_stop
+- [x] Create branch for migration work.
+- [x] Capture one baseline message from MQTT (legacy format).
+- [x] Implement one canonical payload builder function.
+- [x] Emit grouped blocks in this order: `message`, `content`, `runtime`, `metadata`.
+- [x] Add `metadata.schema_version = "2.0"`.
+- [x] Add `metadata.producer = "simclient"`.
+- [x] Add `metadata.published_at` in UTC ISO format.
+- [x] Map capture type to `metadata.capture.type` (`periodic`, `event_start`, `event_stop`).
+- [x] Map screenshot freshness to `metadata.capture.age_s`.
+- [x] Keep screenshot object unchanged in semantics (`filename`, `data`, `timestamp`, `size`).
+- [x] Keep trigger behavior unchanged (periodic and triggered sends still work).
+- [x] Add publish log fields: schema version, capture type, age.
+- [x] Validate all 3 paths end-to-end:
+  - [x] periodic
+  - [x] event_start
+  - [x] event_stop
 ## B. Server Migration
-- [ ] Add grouped v2 parser (`message/content/runtime/metadata`).
-- [ ] Add temporary legacy fallback parser.
-- [ ] Normalize both parsers into one internal server model.
-- [ ] Mark required fields:
-  - [ ] `message.client_id`
-  - [ ] `message.status`
-  - [ ] `metadata.schema_version`
-  - [ ] `metadata.capture.type`
-- [ ] Keep optional fields tolerated (`runtime.process_health`, `content.screenshot`).
-- [ ] Update dashboard consumers to use normalized model (not raw legacy keys).
-- [ ] Add migration counters:
-  - [ ] v2 parse success
-  - [ ] legacy fallback usage
-  - [ ] parse failures
-- [ ] Test compatibility matrix:
-  - [ ] new client -> new server
-  - [ ] legacy client -> new server
-- [ ] Run short soak in dev.
+- [x] Add grouped v2 parser (`message/content/runtime/metadata`).
+- [x] Add temporary legacy fallback parser.
+- [x] Normalize both parsers into one internal server model.
+- [x] Mark required fields:
+  - [x] `message.client_id`
+  - [x] `message.status`
+  - [x] `metadata.schema_version`
+  - [x] `metadata.capture.type`
+- [x] Keep optional fields tolerated (`runtime.process_health`, `content.screenshot`).
+- [x] Update dashboard consumers to use normalized model (not raw legacy keys).
+- [x] Add migration counters:
+  - [x] v2 parse success
+  - [x] legacy fallback usage
+  - [x] parse failures
+- [x] Test compatibility matrix:
+  - [x] new client -> new server
+  - [x] legacy client -> new server
+- [x] Run short soak in dev.
 ## C. Cutover and Cleanup
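Editor's sketch for the checklist above: section B asks for a v2 parser, a temporary legacy fallback, one normalized model, and three migration counters. One way this could fit together is shown below. The server code is not part of this commit, so every name here (`normalize_dashboard`, the counter keys, the shape of the normalized dict) is hypothetical. The legacy field names are the flat keys removed from `_build_dashboard_payload()` in this same commit, and defaulting a missing `screenshot_type` to `periodic` is an assumption.

```python
from collections import Counter

# Hypothetical counter keys: v2_ok, legacy_fallback, parse_failure
counters = Counter()


def _parse_v2(payload):
    """Grouped v2 schema; raises KeyError/TypeError when a required field is missing."""
    message = payload["message"]
    metadata = payload["metadata"]
    return {
        "client_id": message["client_id"],
        "status": message["status"],
        "schema_version": metadata["schema_version"],
        "capture_type": metadata["capture"]["type"],
        # Optional blocks are tolerated when absent.
        "screenshot": payload.get("content", {}).get("screenshot"),
        "process_health": payload.get("runtime", {}).get("process_health"),
    }


def _parse_legacy(payload):
    """Flat legacy schema, kept only as a temporary fallback."""
    return {
        "client_id": payload["client_id"],
        "status": payload["status"],
        "schema_version": "legacy",
        "capture_type": payload.get("screenshot_type", "periodic"),  # assumed default
        "screenshot": payload.get("screenshot"),
        "process_health": payload.get("process_health"),
    }


def normalize_dashboard(payload):
    """Try v2 first, then legacy; return one internal model or None on failure."""
    if isinstance(payload, dict):
        for counter_name, parser in (("v2_ok", _parse_v2), ("legacy_fallback", _parse_legacy)):
            try:
                model = parser(payload)
            except (KeyError, TypeError, AttributeError):
                continue
            counters[counter_name] += 1
            return model
    counters["parse_failure"] += 1
    return None
```

Dashboard consumers would then read only the normalized dict, so removing the legacy parser during cutover does not touch them.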


@@ -394,7 +394,7 @@ The MQTT client ([src/simclient.py](src/simclient.py)) downloads presentation fi
 #### Client → Server
 - `infoscreen/discovery` - Initial client announcement
 - `infoscreen/{client_id}/heartbeat` - Regular status updates
-- `infoscreen/{client_id}/dashboard` - Screenshot images (base64)
+- `infoscreen/{client_id}/dashboard` - Dashboard payload v2 (grouped schema: message/content/runtime/metadata, includes screenshot base64, capture type, schema version)
 - `infoscreen/{client_id}/health` - Process health state (`event_id`, process, pid, status)
 - `infoscreen/{client_id}/logs/error` - Forwarded client error logs
 - `infoscreen/{client_id}/logs/warn` - Forwarded client warning logs
@@ -587,7 +587,8 @@ stat src/screenshots/latest.jpg
 **Verify simclient is reading screenshots:**
 ```bash
 tail -f logs/simclient.log | grep -i screenshot
-# Should show: "Dashboard heartbeat sent with screenshot: latest.jpg"
+# Should show: "Dashboard published: schema=2.0 type=periodic screenshot=latest.jpg"
+# For event transitions: "Dashboard published: schema=2.0 type=event_start ..."
 ```
 ## 📚 Documentation
@@ -771,3 +772,8 @@ For issues or questions:
 - Stale/invalid pending trigger metadata now self-heals instead of blocking periodic updates.
 - Display environment fallbacks (`DISPLAY=:0`, `XAUTHORITY`) improved for non-interactive starts.
 - Development mode allows periodic idle captures to keep dashboard previews fresh when no event is active.
+- Event-triggered screenshots added: `display_manager` captures a screenshot shortly after every event start and stop and signals `simclient` via `meta.json` (`send_immediately=true`). Capture delays are content-type aware (presentation: 4s, video: 2s, web: 5s, configurable via `.env`).
+- `simclient` screenshot service thread now runs on a 1-second tick instead of a blocking sleep, so triggered sends fire within ≤1s of the `meta.json` signal.
+- Dashboard payload migrated to grouped v2 schema (`message`, `content`, `runtime`, `metadata`). Legacy flat fields removed. `metadata.schema_version` is `"2.0"`. Payload assembly centralized in `_build_dashboard_payload()`.
+- Tunable trigger delays added: `SCREENSHOT_TRIGGER_DELAY_PRESENTATION`, `SCREENSHOT_TRIGGER_DELAY_VIDEO`, `SCREENSHOT_TRIGGER_DELAY_WEB`.
+- Rapid event switches handled safely: pending trigger timer is cancelled and replaced when a new event starts before the delay expires.
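Editor's sketch for the changelog above: the three `SCREENSHOT_TRIGGER_DELAY_*` variables and their defaults (presentation 4s, video 2s, web 5s) are named in this commit, but the lookup code is not shown. A possible reading of them is sketched below. The function name `trigger_delay` and the fallback to the default on invalid or negative values are assumptions, not confirmed behavior.

```python
import os

# Defaults as documented in this commit (seconds).
DEFAULT_TRIGGER_DELAYS = {"presentation": 4.0, "video": 2.0, "web": 5.0}


def trigger_delay(content_type, environ=None):
    """Return the capture delay for a content type, honoring SCREENSHOT_TRIGGER_DELAY_<TYPE>."""
    environ = os.environ if environ is None else environ
    default = DEFAULT_TRIGGER_DELAYS[content_type]  # KeyError for unknown content types
    raw = environ.get(f"SCREENSHOT_TRIGGER_DELAY_{content_type.upper()}")
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default  # assumption: ignore unparsable values
    return value if value >= 0 else default  # assumption: ignore negative values
```

Passing `environ` explicitly keeps the function testable without mutating the process environment; in normal use it would be called as `trigger_delay("video")`.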


@@ -698,19 +698,6 @@ def _build_dashboard_payload(client_id: str, screenshot_info: dict, health: dict
     }
     payload = {
-        # Legacy fields kept during migration so existing server parsing remains intact.
-        "timestamp": published_at,
-        "client_id": client_id,
-        "status": "alive",
-        "screenshot_type": capture_type,
-        "screenshot": screenshot_info,
-        "screenshot_age_s": screenshot_age_s,
-        "system_info": {
-            "hostname": socket.gethostname(),
-            "ip": get_ip(),
-            "uptime": time.time()  # Could be replaced with actual uptime
-        },
-        # New grouped schema (v2-compat)
         "message": {
             "client_id": client_id,
             "status": "alive",
@@ -727,7 +714,7 @@ def _build_dashboard_payload(client_id: str, screenshot_info: dict, health: dict
             "process_health": process_health_payload,
         },
         "metadata": {
-            "schema_version": "2.0-compat",
+            "schema_version": "2.0",
             "producer": "simclient",
             "published_at": published_at,
             "capture": capture_meta,
@@ -739,9 +726,6 @@ def _build_dashboard_payload(client_id: str, screenshot_info: dict, health: dict
         },
     }
-    if process_health_payload:
-        payload["process_health"] = process_health_payload
     return payload
@@ -766,10 +750,14 @@ def send_screenshot_heartbeat(client, client_id, capture_type: str = "periodic",
     payload = json.dumps(heartbeat_data)
     res = client.publish(dashboard_topic, payload, qos=0)
     if res.rc == mqtt.MQTT_ERR_SUCCESS:
+        age_str = f", age={heartbeat_data['metadata']['capture']['age_s']}s" if heartbeat_data['metadata']['capture']['age_s'] is not None else ""
         if screenshot_info:
-            logging.info(f"Dashboard heartbeat sent with screenshot: {screenshot_info['filename']} ({screenshot_info['size']} bytes)")
+            logging.info(
+                f"Dashboard published: schema=2.0 type={capture_type}"
+                f" screenshot={screenshot_info['filename']} ({screenshot_info['size']} bytes){age_str}"
+            )
         else:
-            logging.info("Dashboard heartbeat sent (no screenshot available)")
+            logging.info(f"Dashboard published: schema=2.0 type={capture_type} (no screenshot)")
     elif res.rc == mqtt.MQTT_ERR_NO_CONN:
         logging.warning("Dashboard heartbeat publish returned NO_CONN; will retry on next interval")
     else: