fix(screenshots): harden event-triggered MQTT screenshot flow and cleanup docs

- fix race where periodic captures could overwrite pending event_start and event_stop metadata before simclient published
- keep latest.jpg and meta.json synchronized so triggered screenshots are not lost
- add stale pending trigger self-healing to recover from old or invalid metadata states
- improve non-interactive capture reliability with DISPLAY and XAUTHORITY fallbacks
- allow periodic idle captures in development mode so dashboard previews stay fresh without active events
- add deeper simclient screenshot diagnostics for trigger and metadata handling
- add regression test script for metadata preservation and trigger delivery
- add root-cause and fix documentation for the screenshot MQTT issue
- align and deduplicate README screenshot and troubleshooting sections; update release notes to March 2026
- fix scripts/start-dev.sh .env loading to safely ignore comments and eliminate export "not a valid identifier" warnings
RobbStarkAustria
2026-03-29 10:38:29 +02:00
parent cda126018f
commit d6090a6179
7 changed files with 556 additions and 244 deletions

README.md

@@ -299,6 +299,17 @@ Interactive menu for testing:
**Loop mode (infinite):**
```bash
./scripts/test-impressive-loop.sh
```
### Test MQTT Connectivity
```bash
./scripts/test-mqtt.sh
```
Verifies MQTT broker connectivity and topic access.
### Test Screenshot Capture
```bash
@@ -315,17 +326,6 @@ python3 src/display_manager.py &
sleep 15
ls -lh src/screenshots/
```
```
Verifies MQTT broker connectivity and topics.
### Test Screenshot Capture
```bash
./scripts/test-screenshot.sh
```
Captures test screenshot for dashboard monitoring.
## 🔧 Configuration Details
@@ -353,7 +353,7 @@ All configuration is done via `.env` file in the project root. Copy `.env.templa
#### Screenshot Configuration
- `SCREENSHOT_ALWAYS` - Force screenshot capture even when no display is active
- `0` - Only capture when presentation/video/web is active (recommended for production)
- `0` - In production: capture only when a display process is active; in development: periodic idle captures are allowed so the dashboard stays fresh
- `1` - Always capture screenshots (useful for testing)
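A minimal Python sketch of the capture decision described for `SCREENSHOT_ALWAYS`. It assumes the development/production mode reaches the code as a string such as `"development"`; the actual variable name and values read by `display_manager.py` are not shown in this diff, and `should_capture` is a hypothetical name.

```python
def should_capture(display_active: bool, screenshot_always: str, env_mode: str) -> bool:
    """Decide whether a periodic capture should run (illustrative sketch).

    Assumption: env_mode is "development" or "production"; the real
    variable and values used by display_manager.py may differ.
    """
    if screenshot_always.strip() == "1":
        return True  # SCREENSHOT_ALWAYS=1: always capture (testing)
    if display_active:
        return True  # presentation/video/web process is running
    # SCREENSHOT_ALWAYS=0 and nothing displayed: only development allows idle captures
    return env_mode.strip().lower() == "development"
```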
#### File/API Server Configuration
@@ -511,6 +511,13 @@ This is the fastest workaround if hardware decode is not required or not availab
./scripts/test-mqtt.sh
```
### MQTT reconnect and heartbeat behavior
- On reconnect, the client re-subscribes all topics in `on_connect` and re-sends discovery to re-register.
- Heartbeats are sent only when connected. During brief reconnect windows, Paho may return rc=4 (`NO_CONN`).
- A single rc=4 warning after broker restarts or short network stalls is expected; the next heartbeat usually succeeds.
- Investigate only if rc=4 repeats across multiple intervals without subsequent successful heartbeat logs.
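The short retry on rc=4 and the "investigate only if it repeats" rule can be sketched as follows. This is an illustration, not simclient's code: `publish` is any callable returning a Paho return code (for example `lambda: client.publish(topic, payload).rc`), and the retry count, delay, and threshold of 3 intervals are assumed values.

```python
import time
from typing import Callable

MQTT_ERR_NO_CONN = 4  # same value as paho.mqtt.client.MQTT_ERR_NO_CONN


def publish_heartbeat(publish: Callable[[], int], retries: int = 1, delay: float = 0.5) -> int:
    """Publish once; if the client is mid-reconnect (rc=4), retry briefly."""
    rc = publish()
    attempts = 0
    while rc == MQTT_ERR_NO_CONN and attempts < retries:
        time.sleep(delay)
        rc = publish()
        attempts += 1
    return rc


class NoConnTracker:
    """Flags rc=4 only when it repeats across several heartbeat intervals."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive = 0

    def record(self, rc: int) -> bool:
        # Any result other than rc=4 resets the streak.
        self.consecutive = self.consecutive + 1 if rc == MQTT_ERR_NO_CONN else 0
        return self.consecutive >= self.threshold
```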
### Monitoring and UTC timestamps
Client-side monitoring is implemented with a health-state bridge between `display_manager.py` and `simclient.py`.
@@ -541,8 +548,11 @@ Warnings such as `pulse audio output error: overflow, flushing` can appear when
```bash
echo $WAYLAND_DISPLAY # Set if Wayland
echo $DISPLAY # Set if X11
echo $XAUTHORITY # Should point to ~/.Xauthority for X11 captures
```
If `DISPLAY` is empty for non-interactive starts (systemd/nohup/ssh), the display manager now falls back to `:0` and tries `~/.Xauthority` automatically.
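A sketch of those fallbacks, assuming they are applied to the environment passed to the screenshot tool's subprocess; `capture_env` is a hypothetical name, not a function from `display_manager.py`.

```python
from pathlib import Path
from typing import Mapping, Optional


def capture_env(base: Mapping[str, str], home: Optional[Path] = None) -> dict:
    """Copy `base` and apply DISPLAY/XAUTHORITY fallbacks for non-interactive starts."""
    env = dict(base)
    if not env.get("DISPLAY"):
        env["DISPLAY"] = ":0"  # assume the first local X server
    if not env.get("XAUTHORITY"):
        xauth = (home or Path.home()) / ".Xauthority"
        if xauth.is_file():
            env["XAUTHORITY"] = str(xauth)  # only set if the cookie file exists
    return env
```

For example: `subprocess.run(["scrot", "/tmp/shot.png"], env=capture_env(os.environ), check=True)`.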
**Install appropriate screenshot tool:**
```bash
# For X11:
@@ -565,31 +575,25 @@ tail -f logs/display_manager.log | grep -i screenshot
# Should show: "Screenshot session=wayland" or "Screenshot session=x11"
```
**If you see stale dashboard images after restarts:**
```bash
cat src/screenshots/meta.json
stat src/screenshots/latest.jpg
```
- If `send_immediately` is stuck `true` for old metadata, restart both processes so simclient consumes and clears it.
- If the `latest.jpg` timestamp does not change while new `screenshot_*.jpg` files keep appearing, update to the latest code (which fixes the periodic `latest.jpg` update path) and restart display_manager.
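The two checks above can also be scripted. This read-only sketch uses the modification time of `meta.json` as a proxy for "old metadata" with an assumed 10-minute threshold; it reads only the `send_immediately` key and assumes nothing else about the file's schema. `diagnose` is a hypothetical helper.

```python
import json
import time
from pathlib import Path


def diagnose(shot_dir: Path, max_pending_age: float = 600.0) -> list:
    """Report stale-dashboard symptoms without modifying anything."""
    problems = []
    meta_path = shot_dir / "meta.json"
    if meta_path.is_file():
        try:
            meta = json.loads(meta_path.read_text())
        except json.JSONDecodeError:
            meta = None
            problems.append("meta.json is not valid JSON")
        # File mtime stands in for "old metadata"; no timestamp field is assumed.
        age = time.time() - meta_path.stat().st_mtime
        if isinstance(meta, dict) and meta.get("send_immediately") is True and age > max_pending_age:
            problems.append(f"meta.json: send_immediately still true after {age:.0f}s")

    latest = shot_dir / "latest.jpg"
    newest = max(shot_dir.glob("screenshot_*.jpg"), key=lambda p: p.stat().st_mtime, default=None)
    if latest.is_symlink() and not latest.exists():
        problems.append("latest.jpg is a dangling symlink")
    elif newest and latest.exists() and latest.stat().st_mtime < newest.stat().st_mtime:
        # stat() follows symlinks, so this compares the file latest.jpg resolves to
        problems.append("latest.jpg is older than the newest screenshot_*.jpg")
    return problems
```

Run it as `print(diagnose(Path("src/screenshots")))`; an empty list means neither symptom was found.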
**Verify simclient is reading screenshots:**
```bash
tail -f logs/simclient.log | grep -i screenshot
# Should show: "Dashboard heartbeat sent with screenshot: latest.jpg"
```
- All topic subscriptions are restored in `on_connect`, and a discovery message is re-sent on reconnect to re-register the client.
- Heartbeats are sent only when connected; if publish occurs during a brief reconnect window, Paho may return rc=4 (NO_CONN). The client performs a short retry and logs the outcome.
- Occasional `Heartbeat publish failed with code: 4` after broker restart or transient network hiccups is expected and not dangerous. It indicates "not connected at this instant"; the next heartbeat typically succeeds.
- When to investigate: repeated rc=4 with no succeeding "Heartbeat sent" entries over multiple intervals.
### Screenshots not uploading
**Test screenshot capture:**
```bash
./scripts/test-screenshot.sh
ls -l src/screenshots/
```
**Check DISPLAY variable:**
```bash
echo $DISPLAY # Should be :0
```
## 📚 Documentation
- **IMPRESSIVE_INTEGRATION.md** - Detailed presentation system documentation
- **HDMI_CEC_SETUP.md** - HDMI-CEC setup and troubleshooting
- **src/DISPLAY_MANAGER.md** - Display Manager architecture
- **src/IMPLEMENTATION_SUMMARY.md** - Implementation overview
- **src/README.md** - MQTT client documentation
@@ -720,176 +724,11 @@ CEC_POWER_OFF_WAIT=2
### Testing
```bash
## 📸 Screenshot System
The system includes automatic screenshot capture for dashboard monitoring with support for both X11 and Wayland display servers.
### Consent Notice (Required)
By enabling dashboard screenshots, operators confirm they are authorized to capture and transmit the displayed content.
- Screenshots are sent over MQTT and can include personal data, sensitive documents, or classroom/office information shown on screen.
- Obtain required user/owner consent before enabling screenshot monitoring in production.
- Apply local policy and legal requirements (for example GDPR/DSGVO) for retention, access control, and disclosure.
- This system captures image frames only; it does not record microphone audio.
### Architecture
**Two-process design (plus a shared directory):**
1. **display_manager.py** - Captures screenshots on host OS (has access to display)
2. **simclient.py** - Transmits screenshots via MQTT (runs in container)
3. **Shared directory** - `src/screenshots/` volume-mounted between processes
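Because the two processes communicate only through files in `src/screenshots/`, a reader can observe a half-written file unless the writer replaces it atomically. A common pattern, shown here as an assumption rather than a description of what `display_manager.py` does, is to write a temporary file in the same directory and `os.replace` it over the target (`write_json_atomic` is a hypothetical name):

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, data: dict) -> None:
    """Write JSON so a concurrent reader sees the old or the new file, never a partial one."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp, path)  # atomic rename within one filesystem (POSIX)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # do not leave temp files behind on failure
        raise
```

The temporary file has to live in the same directory (and therefore on the same filesystem or volume) for the rename to be atomic.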
### Screenshot Capture (display_manager.py)
## Recent changes (Nov 2025)
The following notable changes were added after the previous release and are included in this branch:
### Screenshot System Implementation
- **Screenshot capture** added to `display_manager.py` with background thread
- **Session detection**: Automatic Wayland vs X11 detection with appropriate tool selection
- **Wayland support**: `grim`, `gnome-screenshot`, `spectacle` (in order)
- **X11 support**: `scrot`, `import` (ImageMagick), `xwd`+`convert` (in order)
- **File management**: Timestamped screenshots plus `latest.jpg` symlink, automatic rotation
- **Transmission**: Enhanced `simclient.py` to prefer `latest.jpg`, added detailed logging
- **Dashboard topic**: Structured JSON payload with screenshot, system info, and client status
- **Configuration**: New environment variables `SCREENSHOT_CAPTURE_INTERVAL`, `SCREENSHOT_INTERVAL`, `SCREENSHOT_ALWAYS`
- **Testing mode**: `SCREENSHOT_ALWAYS=1` forces capture even without active display
### Previous Changes (Oct 2025)
- **Wayland**: `grim` → `gnome-screenshot` → `spectacle`
- **X11**: `scrot` → `import` (ImageMagick) → `xwd`+`convert`
**Processing pipeline:**
1. Capture full-resolution screenshot to PNG
2. Downscale and compress to JPEG (hardcoded settings in display_manager.py)
3. Save timestamped file: `screenshot_YYYYMMDD_HHMMSS.jpg`
4. Create/update `latest.jpg` symlink for easy access
5. Rotate old screenshots (automatic cleanup)
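Steps 3 to 5 can be sketched with the standard library alone. The function name and the retention count of 20 are assumptions. Pointing the symlink at a relative target is a design choice that keeps `latest.jpg` valid even when the container mounts the directory at a different path.

```python
import os
from datetime import datetime
from pathlib import Path
from typing import Optional


def store_screenshot(shot_dir: Path, jpeg: bytes, keep: int = 20,
                     now: Optional[datetime] = None) -> Path:
    """Save a timestamped JPEG, repoint latest.jpg, and rotate old files (sketch)."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    now = now or datetime.now()
    target = shot_dir / f"screenshot_{now:%Y%m%d_%H%M%S}.jpg"
    target.write_bytes(jpeg)

    # Repoint latest.jpg atomically: create a temp symlink, then rename it over the old link.
    tmp_link = shot_dir / "latest.jpg.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(target.name, tmp_link)  # relative target, resolved inside shot_dir
    os.replace(tmp_link, shot_dir / "latest.jpg")

    # Rotation: YYYYMMDD_HHMMSS names sort chronologically; keep only the newest `keep`.
    shots = sorted(shot_dir.glob("screenshot_*.jpg"))
    for old in shots[:-keep]:
        old.unlink()
    return target
```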
**Capture timing:**
- Only captures when a display process is active (presentation/video/web)
- Can be forced with `SCREENSHOT_ALWAYS=1` for testing
- Interval configured via `SCREENSHOT_CAPTURE_INTERVAL` (default: 180 seconds)
### Screenshot Transmission (simclient.py)
**Source selection:**
- Prefers `latest.jpg` symlink (fastest, most recent)
- Falls back to newest timestamped file if symlink missing
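A sketch of that selection order. `pick_screenshot` is a hypothetical helper, and it relies on `YYYYMMDD_HHMMSS` file names sorting chronologically:

```python
from pathlib import Path
from typing import Optional


def pick_screenshot(shot_dir: Path) -> Optional[Path]:
    """Prefer latest.jpg; otherwise return the newest timestamped file, or None."""
    latest = shot_dir / "latest.jpg"
    if latest.exists():  # exists() follows symlinks, so a dangling link falls through
        return latest
    # Timestamped names sort chronologically, so the maximum name is the newest file.
    return max(shot_dir.glob("screenshot_*.jpg"), default=None)
```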
**MQTT topic:**
```
infoscreen/{client_id}/dashboard
```
**Payload structure:**
```json
{
"timestamp": "2025-11-30T14:23:45.123456",
"client_id": "abc123-def456-789",
"status": "alive",
"screenshot": {
"filename": "latest.jpg",
"data": "<base64-encoded-image>",
"timestamp": "2025-11-30T14:23:40.000000",
"size": 45678
},
"system_info": {
"hostname": "infoscreen-pi-01",
"ip": "192.168.1.50",
"uptime": 1732977825.123456
}
}
```
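For illustration, a payload of that shape can be assembled as below. This is a sketch, not simclient's code: it assumes `size` is the raw JPEG size before base64 encoding, takes `system_info` as an opaque dict, and emits UTC ISO 8601 timestamps with an explicit `+00:00` offset, which the sample above does not show. `build_dashboard_message` is a hypothetical name.

```python
import base64
import json
from datetime import datetime, timezone
from pathlib import Path


def build_dashboard_message(client_id: str, shot: Path, system_info: dict) -> tuple:
    """Return (topic, json_payload) shaped like the sample payload."""
    data = shot.read_bytes()
    shot_ts = datetime.fromtimestamp(shot.stat().st_mtime, timezone.utc)
    payload = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "client_id": client_id,
        "status": "alive",
        "screenshot": {
            "filename": shot.name,  # "latest.jpg" when the symlink is used
            "data": base64.b64encode(data).decode("ascii"),
            "timestamp": shot_ts.isoformat(),
            "size": len(data),  # raw bytes; base64 adds roughly one third on the wire
        },
        "system_info": system_info,
    }
    return f"infoscreen/{client_id}/dashboard", json.dumps(payload)
```

The returned pair can then be published with Paho's `client.publish(topic, payload)`.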
### Configuration
Configuration is done via environment variables in `.env` file. See the "Environment Variables" section above for complete documentation.
Key settings:
- `SCREENSHOT_CAPTURE_INTERVAL` - How often display_manager.py captures screenshots (default: 180 seconds)
- `SCREENSHOT_INTERVAL` - How often simclient.py transmits screenshots via MQTT (default: 180 seconds)
- `SCREENSHOT_ALWAYS` - Force capture even when no display is active (useful for testing, default: 0)
### Scalability Recommendations
**Small deployments (<10 clients):**
- Default settings work well
- `SCREENSHOT_CAPTURE_INTERVAL=30-60`, `SCREENSHOT_INTERVAL=60`
**Medium deployments (10-50 clients):**
- Reduce capture frequency: `SCREENSHOT_CAPTURE_INTERVAL=60-120`
- Reduce transmission frequency: `SCREENSHOT_INTERVAL=120-180`
- Ensure the broker has adequate bandwidth
**Large deployments (50+ clients):**
- Further reduce frequency: `SCREENSHOT_CAPTURE_INTERVAL=180`, `SCREENSHOT_INTERVAL=180-300`
- Monitor MQTT broker load and consider retained message limits
- Consider staggering screenshot intervals across clients
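One way to stagger intervals without central coordination, sketched here as an assumption rather than an existing feature, is to derive a fixed per-client offset from a hash of `client_id`. Clients then spread across the interval instead of all sending at the same moment. Both function names are hypothetical.

```python
import hashlib


def stagger_offset(client_id: str, interval: int) -> int:
    """Deterministic per-client offset in [0, interval) seconds."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % interval


def next_send_time(now: float, client_id: str, interval: int) -> float:
    """Next time >= now that falls on this client's slot within the interval."""
    offset = stagger_offset(client_id, interval)
    behind = (now - offset) % interval  # seconds since this client's last slot
    return now if behind == 0 else now - behind + interval
```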
**Very large deployments (200+ clients):**
- Consider HTTP storage + MQTT metadata pattern instead of base64-over-MQTT
- Implement screenshot upload to file server, publish only URL via MQTT
- Implement hash-based deduplication to skip identical screenshots
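A minimal sketch of hash-based deduplication (not an existing feature; the class name is hypothetical). SHA-256 over the JPEG bytes only catches byte-identical frames; skipping visually similar frames would require a perceptual hash instead.

```python
import hashlib
from typing import Optional


class ScreenshotDeduplicator:
    """Skip a transmission when the image bytes match the last one sent."""

    def __init__(self) -> None:
        self._last_digest: Optional[str] = None

    def should_send(self, image: bytes) -> bool:
        digest = hashlib.sha256(image).hexdigest()
        if digest == self._last_digest:
            return False  # identical to the previous image
        self._last_digest = digest
        return True
```

If a publish fails after `should_send` returned `True`, the sender would need to reset the stored digest so the same image is retried.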
**Note:** Screenshot image processing (resize, compression quality) is currently hardcoded in [src/display_manager.py](src/display_manager.py). Future versions may expose these as environment variables.
### Troubleshooting
**No screenshots being captured:**
```bash
# Check session type
echo "Wayland: $WAYLAND_DISPLAY" # Set if Wayland
echo "X11: $DISPLAY" # Set if X11
# Check logs for tool detection
tail -f logs/display_manager.log | grep screenshot
# Install appropriate tools
sudo apt install scrot imagemagick # X11
sudo apt install grim # Wayland
```
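**Screenshots too large:**
Resize width and JPEG quality are currently hardcoded in `src/display_manager.py` (see the note above), so lower them there. Set the following in `.env` only if your version reads them:
```bash
# Reduce quality and size (effective only if display_manager.py reads these variables)
SCREENSHOT_MAX_WIDTH=640
SCREENSHOT_JPEG_QUALITY=50
```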
**Not transmitting over MQTT:**
```bash
# Check simclient logs
tail -f logs/simclient.log | grep -i dashboard
# Should see:
# "Dashboard heartbeat sent with screenshot: latest.jpg (45678 bytes)"
# If NO_CONN errors, check MQTT broker connectivity
```
---
**Last Updated:** November 2025
**Status:** ✅ Production Ready
**Tested On:** Raspberry Pi 5, Raspberry Pi OS (Bookworm)
## Recent changes (Nov 2025)
echo "on 0" | cec-client -s -d 1 # Turn on
echo "standby 0" | cec-client -s -d 1 # Turn off
echo "pow 0" | cec-client -s -d 1 # Check status
```
### Documentation
See [HDMI_CEC_SETUP.md](HDMI_CEC_SETUP.md) for complete documentation including:
- Detailed setup instructions
- Troubleshooting guide
- TV compatibility information
- Advanced configuration options
## 🤝 Contributing
1. Test changes with `./scripts/test-display-manager.sh`
@@ -911,24 +750,24 @@ For issues or questions:
---
**Last Updated:** October 2025
**Last Updated:** March 2026
**Status:** ✅ Production Ready
**Tested On:** Raspberry Pi 5, Raspberry Pi OS (Bookworm)
## Recent changes (Oct 2025)
## Recent Changes
The following notable changes were added after the previous release and are included in this branch:
### November 2025
- Event handling: support for scheduler-provided `event_type` values (new types: `presentation`, `webuntis`, `webpage`, `website`). The display manager now prefers `event_type` when selecting which renderer to start.
- Web display: Chromium is launched in kiosk mode for web events. `website` events (scheduler) and legacy `web` keys are both supported and normalized.
- Auto-scroll feature: automatic scrolling for long websites implemented. Two mechanisms are available:
- CDP injection: The display manager attempts to inject a small auto-scroll script via Chrome DevTools Protocol (DevTools websocket) when possible (uses `websocket-client` and `requests`). Default injection duration: 60s.
- Extension fallback: When DevTools websocket handshakes are blocked (403), a tiny local Chrome extension (`src/chrome_autoscroll`) is loaded via `--load-extension` to run a content script that performs the auto-scroll reliably.
- Autoscroll is enabled only for scheduler events with `event_type: "website"` (not for general `web` or `webpage`). The extension and CDP injection are used only when autoscroll is requested for that event type.
- New test utilities:
- `scripts/test_cdp.py` — quick DevTools JSON listing + Runtime.evaluate tester
- `scripts/test_cdp_origins.py` — tries several Origin headers to diagnose 403 handshakes
- Dependencies: `src/requirements.txt` updated to include `websocket-client` (used by the CDP injector).
- Small refactors and improved logging in `src/display_manager.py` to make event dispatch and browser injection more robust.
- Screenshot pipeline implemented with a two-process model (`display_manager.py` capture, `simclient.py` transmission).
- Wayland/X11 screenshot tool fallback chains added.
- Dashboard payload format extended with screenshot and system metadata.
- Scheduler event type support extended (`presentation`, `webuntis`, `webpage`, `website`).
- Website autoscroll support added (CDP injection + extension fallback).
If you rely on autoscroll in production, review the security considerations around `--remote-debugging-port` (DevTools) and prefer the extension fallback if your Chromium build enforces strict websocket Origin policies.
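The event-type rule and the CDP injection path can be sketched as below. The DevTools `/json` target listing, its `webSocketDebuggerUrl` field, and the `Runtime.evaluate` command are standard Chrome DevTools Protocol pieces. The helper names, scroll step, and tick interval are assumptions; only the 60 s default comes from the text above.

```python
import json
from typing import Optional


def wants_autoscroll(event: dict) -> bool:
    """Autoscroll only for scheduler events with event_type "website"."""
    return event.get("event_type") == "website"


def pick_page_ws_url(targets: list) -> Optional[str]:
    """Pick the first page target from the DevTools /json listing."""
    for target in targets:
        if target.get("type") == "page" and "webSocketDebuggerUrl" in target:
            return target["webSocketDebuggerUrl"]
    return None


def runtime_evaluate(msg_id: int, expression: str) -> str:
    """Serialize a CDP Runtime.evaluate command for the DevTools websocket."""
    return json.dumps({"id": msg_id, "method": "Runtime.evaluate",
                       "params": {"expression": expression}})


def autoscroll_script(duration_s: int = 60, step_px: int = 2, tick_ms: int = 30) -> str:
    """Hypothetical in-page loop: scroll step_px every tick_ms until duration_s elapses."""
    return (
        "(() => { const end = Date.now() + %d * 1000;"
        " const timer = setInterval(() => { window.scrollBy(0, %d);"
        " if (Date.now() > end) clearInterval(timer); }, %d); })()"
        % (duration_s, step_px, tick_ms)
    )
```

With `websocket-client`, the message would be sent via `websocket.create_connection(url).send(runtime_evaluate(1, autoscroll_script()))`, where `url` comes from `pick_page_ws_url`.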
### March 2026
- Event-trigger screenshots (`event_start`, `event_stop`) hardened against periodic overwrite races.
- `latest.jpg` and `meta.json` synchronization improved for reliable dashboard updates.
- Stale/invalid pending trigger metadata now self-heals instead of blocking periodic updates.
- Display environment fallbacks (`DISPLAY=:0`, `XAUTHORITY`) improved for non-interactive starts.
- Development mode allows periodic idle captures to keep dashboard previews fresh when no event is active.
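The first three items amount to a guard that the periodic capture loop checks before replacing `latest.jpg` and `meta.json`. In this sketch only `send_immediately` and the `event_start`/`event_stop` values come from the README. The `trigger` and `captured_at` field names, the 5-minute staleness window, and the function names are hypothetical.

```python
import time
from typing import Optional

# Trigger values from the README; the field that stores them ("trigger") is hypothetical.
TRIGGERS = {"event_start", "event_stop"}


def pending_trigger_is_valid(meta: dict, now: float, max_age: float = 300.0) -> bool:
    """True if meta.json describes a fresh, well-formed pending event trigger."""
    if meta.get("send_immediately") is not True:
        return False  # nothing pending
    if meta.get("trigger") not in TRIGGERS:
        return False  # invalid: pending flag without a known trigger type
    captured_at = meta.get("captured_at")  # hypothetical epoch-seconds field
    if not isinstance(captured_at, (int, float)):
        return False
    # Future timestamps (clock skew) and anything older than max_age count as stale.
    return 0 <= now - captured_at <= max_age


def periodic_may_overwrite(meta: Optional[dict], now: Optional[float] = None) -> bool:
    """Periodic captures may replace latest.jpg/meta.json only if no valid trigger is pending."""
    if not isinstance(meta, dict):
        return True  # missing or unparseable metadata: self-heal by overwriting
    now = time.time() if now is None else now
    return not pending_trigger_is_valid(meta, now)
```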