diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 5903197..019fb82 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -10,6 +10,8 @@ - ✅ **Client-side resize/compress** screenshots before MQTT transmission - ✅ **Server renders PPTX → PDF via Gotenberg** (client only displays PDFs, no LibreOffice needed) - ✅ **Keep screenshot consent notice in docs** when describing dashboard screenshot feature +- ✅ **Event-start/event-stop screenshots must preserve metadata** - See SCREENSHOT_MQTT_FIX.md for critical race condition that was fixed +- ✅ **Screenshot updates must keep `latest.jpg` and `meta.json` in sync** (simclient prefers `latest.jpg`) ### Key Files & Locations - **Display logic**: `src/display_manager.py` (controls presentations/video/web) @@ -408,6 +410,25 @@ When working on this codebase: - `Lade Datei herunter von: http://:8000/...` - Followed by `"GET /... HTTP/1.1" 200` and `Datei erfolgreich heruntergeladen:` +### Screenshot MQTT Transmission Issue (Event-Start/Event-Stop) +- **Symptom**: Event-triggered screenshots (event_start, event_stop) are NOT appearing on the dashboard, only periodic screenshots transmitted +- **Root Cause**: Race condition in metadata/file-pointer handling where periodic captures can overwrite event-triggered metadata or `latest.jpg` before simclient processes it (See SCREENSHOT_MQTT_FIX.md for details) +- **What to check**: + - Display manager logs show event_start/event_stop screenshots ARE being captured: `Screenshot captured: ... 
type=event_start` + - But `meta.json` is stale or `latest.jpg` does not move + - MQTT heartbeats lack screenshot data at event transitions +- **How to verify the fix**: + - Run: `./test-screenshot-meta-fix.sh` should output `[SUCCESS] Event-triggered metadata preserved!` + - Check display_manager.py: `_write_screenshot_meta()` has protection logic to skip periodic overwrites of event metadata + - Check display_manager.py: periodic `latest.jpg` updates are also protected when triggered metadata is pending + - Check simclient.py: `screenshot_service_thread()` logs show pending event-triggered captures being processed immediately +- **Permanent Fix**: Already applied in display_manager.py and simclient.py. Prevents periodic captures from overwriting pending trigger state and includes stale-trigger self-healing. + +### Screenshot Capture After Restart (No Active Event) +- In `ENV=development`, display_manager performs periodic idle captures so dashboard does not appear dead during no-event windows. +- In `ENV=production`, periodic captures remain event/process-driven unless `SCREENSHOT_ALWAYS=1`. +- If display_manager is started from non-interactive shells (systemd/nohup/ssh), it now attempts `DISPLAY=:0` and `XAUTHORITY=~/.Xauthority` fallback for X11 capture tools. 
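The capture rules in the bullets above can be summarized as one small decision function. This is a minimal sketch rather than the project's code: `should_capture_periodic` and its parameter names are hypothetical, and it assumes that `ENV`, `SCREENSHOT_ALWAYS`, and "a display process is running" are the only inputs to the decision.

```python
def should_capture_periodic(env: str, screenshot_always: bool, process_active: bool) -> bool:
    """Decide whether a periodic screenshot should be taken on this tick.

    Hypothetical helper for illustration; the name is not taken from the codebase.
    """
    # SCREENSHOT_ALWAYS=1 forces capture in every environment.
    if screenshot_always:
        return True
    # An active display process (presentation/video/web) always qualifies.
    if process_active:
        return True
    # Idle captures are only allowed in development, to keep the dashboard fresh.
    return env == "development"


if __name__ == "__main__":
    # Print the full decision table for both environments.
    for env in ("development", "production"):
        for always in (False, True):
            for active in (False, True):
                print(env, always, active, should_capture_periodic(env, always, active))
```

Under these assumptions, the only combination that suppresses a periodic capture is production with no active process and `SCREENSHOT_ALWAYS` unset.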
+ ## Important Notes for AI Assistants ### Virtual Environment Requirements (Critical) @@ -465,7 +486,8 @@ The screenshot capture and transmission system has been implemented with separat - **Processing**: Downscales to max width (default 800px), JPEG compresses (default quality 70) - **Output**: Creates timestamped files (`screenshot_YYYYMMDD_HHMMSS.jpg`) plus `latest.jpg` symlink - **Rotation**: Keeps max N files (default 20), deletes older -- **Timing**: Only captures when display process is active (unless `SCREENSHOT_ALWAYS=1`) +- **Timing**: Production captures when display process is active (unless `SCREENSHOT_ALWAYS=1`); development allows periodic idle captures to keep dashboard fresh +- **Reliability**: Stale/invalid pending trigger metadata is ignored automatically to avoid lock-up of periodic updates ### Transmission Strategy (simclient.py) - **Source**: Prefers `screenshots/latest.jpg` if present, falls back to newest timestamped file @@ -510,6 +532,7 @@ The screenshot capture and transmission system has been implemented with separat - **Stale screenshots**: Check `latest.jpg` symlink, verify display_manager is running - **MQTT errors**: Check dashboard topic logs for publish return codes - **Pulse overflow in remote sessions**: warnings like `pulse audio output error: overflow, flushing` can occur with NoMachine/dummy displays; if HDMI playback is stable, treat as environment-related +- **After restarts**: Ensure both processes are restarted (`simclient.py` and `display_manager.py`) so metadata consumption and capture behavior use the same code version ### Testing & Troubleshooting **Setup:** - X11: `sudo apt install scrot imagemagick` diff --git a/README.md b/README.md index 409e32e..ca7c01a 100644 --- a/README.md +++ b/README.md @@ -299,6 +299,17 @@ Interactive menu for testing: **Loop mode (infinite):** ```bash +./scripts/test-impressive-loop.sh +``` + +### Test MQTT Connectivity + +```bash +./scripts/test-mqtt.sh +``` + +Verifies MQTT broker 
connectivity and topic access. + ### Test Screenshot Capture ```bash @@ -315,17 +326,6 @@ python3 src/display_manager.py & sleep 15 ls -lh src/screenshots/ ``` -``` - -Verifies MQTT broker connectivity and topics. - -### Test Screenshot Capture - -```bash -./scripts/test-screenshot.sh -``` - -Captures test screenshot for dashboard monitoring. ## 🔧 Configuration Details @@ -353,7 +353,7 @@ All configuration is done via `.env` file in the project root. Copy `.env.templa #### Screenshot Configuration - `SCREENSHOT_ALWAYS` - Force screenshot capture even when no display is active - - `0` - Only capture when presentation/video/web is active (recommended for production) + - `0` - In production: capture only when a display process is active; in development: periodic idle captures are allowed so dashboard stays fresh - `1` - Always capture screenshots (useful for testing) #### File/API Server Configuration @@ -511,6 +511,13 @@ This is the fastest workaround if hardware decode is not required or not availab ./scripts/test-mqtt.sh ``` +### MQTT reconnect and heartbeat behavior + +- On reconnect, the client re-subscribes all topics in `on_connect` and re-sends discovery to re-register. +- Heartbeats are sent only when connected. During brief reconnect windows, Paho may return rc=4 (`NO_CONN`). +- A single rc=4 warning after broker restarts or short network stalls is expected; the next heartbeat usually succeeds. +- Investigate only if rc=4 repeats across multiple intervals without subsequent successful heartbeat logs. + ### Monitoring and UTC timestamps Client-side monitoring is implemented with a health-state bridge between `display_manager.py` and `simclient.py`. 
@@ -541,8 +548,11 @@ Warnings such as `pulse audio output error: overflow, flushing` can appear when ```bash echo $WAYLAND_DISPLAY # Set if Wayland echo $DISPLAY # Set if X11 +echo $XAUTHORITY # Should point to ~/.Xauthority for X11 captures ``` +If `DISPLAY` is empty for non-interactive starts (systemd/nohup/ssh), the display manager now falls back to `:0` and tries `~/.Xauthority` automatically. + **Install appropriate screenshot tool:** ```bash # For X11: @@ -565,31 +575,25 @@ tail -f logs/display_manager.log | grep -i screenshot # Should show: "Screenshot session=wayland" or "Screenshot session=x11" ``` +**If you see stale dashboard images after restarts:** +```bash +cat src/screenshots/meta.json +stat src/screenshots/latest.jpg +``` + +- If `send_immediately` is stuck `true` for old metadata, restart both processes so simclient consumes and clears it. +- If `latest.jpg` timestamp does not move while new `screenshot_*.jpg` files appear, update to latest code (fix for periodic `latest.jpg` update path) and restart display_manager. + **Verify simclient is reading screenshots:** ```bash tail -f logs/simclient.log | grep -i screenshot # Should show: "Dashboard heartbeat sent with screenshot: latest.jpg" -```ll topic subscriptions are restored in `on_connect` and a discovery message is re-sent on reconnect to re-register the client. -- Heartbeats are sent only when connected; if publish occurs during a brief reconnect window, Paho may return rc=4 (NO_CONN). The client performs a short retry and logs the outcome. -- Occasional `Heartbeat publish failed with code: 4` after broker restart or transient network hiccups is expected and not dangerous. It indicates "not connected at this instant"; the next heartbeat typically succeeds. -- When to investigate: repeated rc=4 with no succeeding "Heartbeat sent" entries over multiple intervals. 
- -### Screenshots not uploading - -**Test screenshot capture:** -```bash -./scripts/test-screenshot.sh -ls -l screenshots/ -``` - -**Check DISPLAY variable:** -```bash -echo $DISPLAY # Should be :0 ``` ## 📚 Documentation - **IMPRESSIVE_INTEGRATION.md** - Detailed presentation system documentation +- **HDMI_CEC_SETUP.md** - HDMI-CEC setup and troubleshooting - **src/DISPLAY_MANAGER.md** - Display Manager architecture - **src/IMPLEMENTATION_SUMMARY.md** - Implementation overview - **src/README.md** - MQTT client documentation @@ -720,176 +724,11 @@ CEC_POWER_OFF_WAIT=2 ### Testing ```bash -## 📸 Screenshot System - -The system includes automatic screenshot capture for dashboard monitoring with support for both X11 and Wayland display servers. - -### Consent Notice (Required) - -By enabling dashboard screenshots, operators confirm they are authorized to capture and transmit the displayed content. - -- Screenshots are sent over MQTT and can include personal data, sensitive documents, or classroom/office information shown on screen. -- Obtain required user/owner consent before enabling screenshot monitoring in production. -- Apply local policy and legal requirements (for example GDPR/DSGVO) for retention, access control, and disclosure. -- This system captures image frames only; it does not record microphone audio. - -### Architecture - -**Two-process design:** -1. **display_manager.py** - Captures screenshots on host OS (has access to display) -2. **simclient.py** - Transmits screenshots via MQTT (runs in container) -3. 
**Shared directory** - `src/screenshots/` volume-mounted between processes - -### Screenshot Capture (display_manager.py) - -## Recent changes (Nov 2025) - -The following notable changes were added after the previous release and are included in this branch: - -### Screenshot System Implementation -- **Screenshot capture** added to `display_manager.py` with background thread -- **Session detection**: Automatic Wayland vs X11 detection with appropriate tool selection -- **Wayland support**: `grim`, `gnome-screenshot`, `spectacle` (in order) -- **X11 support**: `scrot`, `import` (ImageMagick), `xwd`+`convert` (in order) -- **File management**: Timestamped screenshots plus `latest.jpg` symlink, automatic rotation -- **Transmission**: Enhanced `simclient.py` to prefer `latest.jpg`, added detailed logging -- **Dashboard topic**: Structured JSON payload with screenshot, system info, and client status -- **Configuration**: New environment variables `SCREENSHOT_CAPTURE_INTERVAL`, `SCREENSHOT_INTERVAL`, `SCREENSHOT_ALWAYS` -- **Testing mode**: `SCREENSHOT_ALWAYS=1` forces capture even without active display - -### Previous Changes (Oct 2025) - - **Wayland**: `grim` → `gnome-screenshot` → `spectacle` - - **X11**: `scrot` → `import` (ImageMagick) → `xwd`+`convert` - -**Processing pipeline:** -1. Capture full-resolution screenshot to PNG -2. Downscale and compress to JPEG (hardcoded settings in display_manager.py) -3. Save timestamped file: `screenshot_YYYYMMDD_HHMMSS.jpg` -4. Create/update `latest.jpg` symlink for easy access -5. 
Rotate old screenshots (automatic cleanup) - -**Capture timing:** -- Only captures when a display process is active (presentation/video/web) -- Can be forced with `SCREENSHOT_ALWAYS=1` for testing -- Interval configured via `SCREENSHOT_CAPTURE_INTERVAL` (default: 180 seconds) - -### Screenshot Transmission (simclient.py) - -**Source selection:** -- Prefers `latest.jpg` symlink (fastest, most recent) -- Falls back to newest timestamped file if symlink missing - -**MQTT topic:** -``` -infoscreen/{client_id}/dashboard -``` - -**Payload structure:** -```json -{ - "timestamp": "2025-11-30T14:23:45.123456", - "client_id": "abc123-def456-789", - "status": "alive", - "screenshot": { - "filename": "latest.jpg", - "data": "", - "timestamp": "2025-11-30T14:23:40.000000", - "size": 45678 - }, - "system_info": { - "hostname": "infoscreen-pi-01", - "ip": "192.168.1.50", - "uptime": 1732977825.123456 - } -} -``` - -### Configuration - -Configuration is done via environment variables in `.env` file. See the "Environment Variables" section above for complete documentation. 
- -Key settings: -- `SCREENSHOT_CAPTURE_INTERVAL` - How often display_manager.py captures screenshots (default: 180 seconds) -- `SCREENSHOT_INTERVAL` - How often simclient.py transmits screenshots via MQTT (default: 180 seconds) -- `SCREENSHOT_ALWAYS` - Force capture even when no display is active (useful for testing, default: 0) - -### Scalability Recommendations - -**Small deployments (<10 clients):** -- Default settings work well -- `SCREENSHOT_CAPTURE_INTERVAL=30-60`, `SCREENSHOT_INTERVAL=60` - -**Medium deployments (10-50 clients):** -- Reduce capture frequency: `SCREENSHOT_CAPTURE_INTERVAL=60-120` -- Reduce transmission frequency: `SCREENSHOT_INTERVAL=120-180` -- Ensure broker has adequate bandwidth - -**Large deployments (50+ clients):** -- Further reduce frequency: `SCREENSHOT_CAPTURE_INTERVAL=180`, `SCREENSHOT_INTERVAL=180-300` -- Monitor MQTT broker load and consider retained message limits -- Consider staggering screenshot intervals across clients - -**Very large deployments (200+ clients):** -- Consider HTTP storage + MQTT metadata pattern instead of base64-over-MQTT -- Implement screenshot upload to file server, publish only URL via MQTT -- Implement hash-based deduplication to skip identical screenshots - -**Note:** Screenshot image processing (resize, compression quality) is currently hardcoded in [src/display_manager.py](src/display_manager.py). Future versions may expose these as environment variables. 
- -### Troubleshooting - -**No screenshots being captured:** -```bash -# Check session type -echo "Wayland: $WAYLAND_DISPLAY" # Set if Wayland -echo "X11: $DISPLAY" # Set if X11 - -# Check logs for tool detection -tail -f logs/display_manager.log | grep screenshot - -# Install appropriate tools -sudo apt install scrot imagemagick # X11 -sudo apt install grim # Wayland -``` - -**Screenshots too large:** -```bash -# Reduce quality and size -SCREENSHOT_MAX_WIDTH=640 -SCREENSHOT_JPEG_QUALITY=50 -``` - -**Not transmitting over MQTT:** -```bash -# Check simclient logs -tail -f logs/simclient.log | grep -i dashboard - -# Should see: -# "Dashboard heartbeat sent with screenshot: latest.jpg (45678 bytes)" - -# If NO_CONN errors, check MQTT broker connectivity -``` - ---- - -**Last Updated:** November 2025 -**Status:** ✅ Production Ready -**Tested On:** Raspberry Pi 5, Raspberry Pi OS (Bookworm) - -## Recent changes (Nov 2025) echo "on 0" | cec-client -s -d 1 # Turn on echo "standby 0" | cec-client -s -d 1 # Turn off echo "pow 0" | cec-client -s -d 1 # Check status ``` -### Documentation - -See [HDMI_CEC_SETUP.md](HDMI_CEC_SETUP.md) for complete documentation including: -- Detailed setup instructions -- Troubleshooting guide -- TV compatibility information -- Advanced configuration options - ## 🤝 Contributing 1. Test changes with `./scripts/test-display-manager.sh` @@ -911,24 +750,24 @@ For issues or questions: --- -**Last Updated:** October 2025 +**Last Updated:** March 2026 **Status:** ✅ Production Ready **Tested On:** Raspberry Pi 5, Raspberry Pi OS (Bookworm) -## Recent changes (Oct 2025) +## Recent Changes -The following notable changes were added after the previous release and are included in this branch: +### November 2025 -- Event handling: support for scheduler-provided `event_type` values (new types: `presentation`, `webuntis`, `webpage`, `website`). The display manager now prefers `event_type` when selecting which renderer to start. 
-- Web display: Chromium is launched in kiosk mode for web events. `website` events (scheduler) and legacy `web` keys are both supported and normalized. -- Auto-scroll feature: automatic scrolling for long websites implemented. Two mechanisms are available: - - CDP injection: The display manager attempts to inject a small auto-scroll script via Chrome DevTools Protocol (DevTools websocket) when possible (uses `websocket-client` and `requests`). Default injection duration: 60s. - - Extension fallback: When DevTools websocket handshakes are blocked (403), a tiny local Chrome extension (`src/chrome_autoscroll`) is loaded via `--load-extension` to run a content script that performs the auto-scroll reliably. -- Autoscroll enabled only for scheduler events with `event_type: "website"` (not for general `web` or `webpage`). The extension and CDP injection are only used when autoscroll is requested for that event type. -- New test utilities: - - `scripts/test_cdp.py` — quick DevTools JSON listing + Runtime.evaluate tester - - `scripts/test_cdp_origins.py` — tries several Origin headers to diagnose 403 handshakes -- Dependencies: `src/requirements.txt` updated to include `websocket-client` (used by the CDP injector). -- Small refactors and improved logging in `src/display_manager.py` to make event dispatch and browser injection more robust. +- Screenshot pipeline implemented with a two-process model (`display_manager.py` capture, `simclient.py` transmission). +- Wayland/X11 screenshot tool fallback chains added. +- Dashboard payload format extended with screenshot and system metadata. +- Scheduler event type support extended (`presentation`, `webuntis`, `webpage`, `website`). +- Website autoscroll support added (CDP injection + extension fallback). -If you rely on autoscroll in production, review the security considerations around `--remote-debugging-port` (DevTools) and prefer the extension fallback if your Chromium build enforces strict websocket Origin policies. 
+### March 2026 + +- Event-trigger screenshots (`event_start`, `event_stop`) hardened against periodic overwrite races. +- `latest.jpg` and `meta.json` synchronization improved for reliable dashboard updates. +- Stale/invalid pending trigger metadata now self-heals instead of blocking periodic updates. +- Display environment fallbacks (`DISPLAY=:0`, `XAUTHORITY`) improved for non-interactive starts. +- Development mode allows periodic idle captures to keep dashboard previews fresh when no event is active. diff --git a/SCREENSHOT_MQTT_FIX.md b/SCREENSHOT_MQTT_FIX.md new file mode 100644 index 0000000..80473c0 --- /dev/null +++ b/SCREENSHOT_MQTT_FIX.md @@ -0,0 +1,94 @@ +# Screenshot MQTT Transmission Issue - Root Cause & Fix + +## Issue Summary +Event-triggered screenshots (event_start, event_stop) were being captured by display_manager.py but **NOT being transmitted** via MQTT from simclient.py, resulting in empty or missing data on the dashboard. + +## Root Cause: Race Condition in Metadata Handling + +### The Problem Timeline +1. **T=06:05:33.516Z** - Event starts (event_115) + - display_manager captures `screenshot_20260329_060533.jpg` (event_start) + - Writes `meta.json` with `"send_immediately": true, "type": "event_start"` + +2. **T=06:05:33.517-06:05:47 (up to 14 seconds later)** + - simclient's screenshot_service_thread sleeps 1-2 seconds + - WINDOW: Still hasn't read the event_start meta.json + +3. **T=06:05:47.935Z** - Periodic screenshot capture + - display_manager captures `screenshot_20260329_060547.jpg` (periodic) + - **BUG**: Calls `_write_screenshot_meta("periodic", ...)` which **overwrites meta.json** + - NEW meta.json: `"send_immediately": false, "type": "periodic"` + +4. **T=06:05:48 (next tick)** + - simclient finally reads meta.json + - Sees: `send_immediately=false, type=periodic` + - Never transmits the event_start screenshot! + +Result: Event-triggered screenshot lost, periodic screenshot sent late instead. 
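The timeline above can be replayed in a few lines of Python. The following is a simplified sketch, not the code in `display_manager.py`: `write_meta` and `simulate` are hypothetical names, the `meta.json` payload is reduced to two fields, and the `guard` flag models a check that skips periodic writes while a triggered capture is still pending. Freshness and staleness checks are deliberately left out.

```python
import json
import os
import tempfile


def write_meta(meta_path: str, capture_type: str, send_immediately: bool, guard: bool) -> bool:
    """Write meta.json atomically; return False if the guard skipped the write."""
    if guard and capture_type == "periodic" and os.path.exists(meta_path):
        with open(meta_path, "r", encoding="utf-8") as f:
            existing = json.load(f)
        # A pending triggered capture must not be replaced by periodic metadata.
        if existing.get("send_immediately"):
            return False
    # Write to a temp file and rename so readers never see a partial file.
    tmp_path = meta_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"type": capture_type, "send_immediately": send_immediately}, f)
    os.replace(tmp_path, meta_path)
    return True


def simulate(guard: bool) -> dict:
    """Replay the timeline: event_start write, periodic write, then the reader's view."""
    with tempfile.TemporaryDirectory() as d:
        meta_path = os.path.join(d, "meta.json")
        write_meta(meta_path, "event_start", True, guard)   # event starts
        write_meta(meta_path, "periodic", False, guard)     # periodic capture fires
        with open(meta_path, "r", encoding="utf-8") as f:
            return json.load(f)                             # what the reader finally sees


if __name__ == "__main__":
    print("unguarded:", simulate(guard=False))
    print("guarded:  ", simulate(guard=True))
```

Without the guard, the reader only ever sees the periodic entry; with it, the `event_start` entry survives until it is consumed.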
+ +## Symptoms Observed +- Display manager logs show event_start/event_stop captures with correct file sizes +- MQTT messages from simclient show no screenshot data or empty arrays +- Dashboard receives only periodic screenshots, missing event transitions +- meta.json only contains periodic metadata, never event-triggered + +## The Fix + +### Part 1: display_manager.py - Protect Event Metadata +Modified the `_write_screenshot_meta()` method to **prevent periodic screenshots from overwriting pending event-triggered metadata**: + +```python +# Before writing a periodic screenshot's metadata, check if event-triggered +# metadata is still pending (send_immediately=True) +if not send_immediately and capture_type == "periodic": + if existing_meta.get('send_immediately'): + # Skip writing - preserve the event-triggered metadata + logging.debug(f"Skipping periodic meta to preserve pending {existing_meta['type']}") + return +``` + +**Result**: Once event_start metadata is written, it stays there until simclient processes it (within 1 second), uninterrupted by periodic captures. + +### Part 2: simclient.py - Enhanced Logging +Added diagnostic logging to `screenshot_service_thread()` to show: +- When meta.json is detected and its contents +- When triggered screenshots are being sent +- File information for troubleshooting + +**Result**: Better visibility into what's happening with metadata processing. + +## Verification + +Test script `test-screenshot-meta-fix.sh` confirms: +``` +[PROTECTED] Not overwriting pending event_start (send_immediately=True) +Current meta.json preserved: {"type": "event_start", "send_immediately": true, ...} +[SUCCESS] Event-triggered metadata preserved! +``` + +## How It Works Now + +1. display_manager captures event_start, writes meta.json with `send_immediately=true` +2. Next periodic capture: `_write_screenshot_meta()` detects pending flag, **skips updating** meta.json +3. simclient reads meta.json within 1 second, sees `send_immediately=true` +4. 
Immediately calls `send_screenshot_heartbeat()`, transmits event_start screenshot +5. Clears the `send_immediately` flag +6. On next periodic capture, meta.json is safely updated + +## Key Files Modified +- `src/display_manager.py` - Line ~1742: `_write_screenshot_meta()` protection logic +- `src/simclient.py` - Line ~727: Enhanced logging in `screenshot_service_thread()` + +## Testing +Run the verification test: +```bash +./test-screenshot-meta-fix.sh +``` + +Expected output: `[SUCCESS] Event-triggered metadata preserved!` + +## Impact +- Event-start and event-stop screenshots are now transmitted via MQTT +- Dashboard now receives complete event lifecycle data +- Clearer logs help diagnose future screenshot transmission issues + diff --git a/scripts/start-dev.sh b/scripts/start-dev.sh index ec41cff..defbc82 100755 --- a/scripts/start-dev.sh +++ b/scripts/start-dev.sh @@ -1,5 +1,14 @@ #!/bin/bash +set -e + cd "$(dirname "$0")/.." source venv/bin/activate -export $(cat .env | xargs) + +# Load .env in a shell-safe way (supports comments and quoted values). +if [ -f .env ]; then + set -a + source .env + set +a +fi + python3 src/simclient.py diff --git a/src/display_manager.py b/src/display_manager.py index abc1192..f7e9f87 100644 --- a/src/display_manager.py +++ b/src/display_manager.py @@ -39,6 +39,19 @@ for env_path in env_paths: load_dotenv(env_path) break +# Best-effort display env bootstrap for non-interactive starts (nohup/systemd/ssh). +# If both Wayland and X11 vars are missing or empty, default to X11 :0 which is the +# common kiosk display on Raspberry Pi deployments. +if not os.environ.get("WAYLAND_DISPLAY") and not os.environ.get("DISPLAY"): + # Assign directly: os.getenv("DISPLAY", ":0") would return "" for an empty DISPLAY. + os.environ["DISPLAY"] = ":0" + +# X11 capture tools may also require XAUTHORITY when started outside a desktop +# session shell; default to ~/.Xauthority when available. 
+if os.environ.get("DISPLAY") and not os.environ.get("XAUTHORITY"): + xauth_default = os.path.join(os.path.expanduser("~"), ".Xauthority") + if os.path.exists(xauth_default): + os.environ["XAUTHORITY"] = xauth_default + # Configuration ENV = os.getenv("ENV", "development") LOG_LEVEL = os.getenv("LOG_LEVEL", "DEBUG" if ENV == "development" else "INFO") @@ -48,6 +61,10 @@ SCREENSHOT_MAX_WIDTH = int(os.getenv("SCREENSHOT_MAX_WIDTH", "800")) # Width to SCREENSHOT_JPEG_QUALITY = int(os.getenv("SCREENSHOT_JPEG_QUALITY", "70")) # JPEG quality 1-95 SCREENSHOT_MAX_FILES = int(os.getenv("SCREENSHOT_MAX_FILES", "20")) # Rotate old screenshots SCREENSHOT_ALWAYS = os.getenv("SCREENSHOT_ALWAYS", "0").lower() in ("1","true","yes") +# Delay (seconds) before triggered screenshot fires after event start/stop +SCREENSHOT_TRIGGER_DELAY_PRESENTATION = int(os.getenv("SCREENSHOT_TRIGGER_DELAY_PRESENTATION", "4")) +SCREENSHOT_TRIGGER_DELAY_VIDEO = int(os.getenv("SCREENSHOT_TRIGGER_DELAY_VIDEO", "2")) +SCREENSHOT_TRIGGER_DELAY_WEB = int(os.getenv("SCREENSHOT_TRIGGER_DELAY_WEB", "5")) CHECK_INTERVAL = int(os.getenv("DISPLAY_CHECK_INTERVAL", "5")) # seconds PRESENTATION_DIR = os.path.join(os.path.dirname(__file__), "presentation") EVENT_FILE = os.path.join(os.path.dirname(__file__), "current_event.json") @@ -590,6 +607,9 @@ class DisplayManager: self._screenshot_thread = threading.Thread(target=self._screenshot_loop, daemon=True) self._screenshot_thread.start() + # Pending one-shot timer for event-triggered screenshots (event_start / event_stop) + self._pending_trigger_timer: Optional[threading.Timer] = None + self._load_client_settings(force=True) def _normalize_volume_level(self, value, default: float = 1.0) -> float: @@ -880,7 +900,9 @@ class DisplayManager: self.health.update_stopped() self.current_process = None self.current_event_data = None - + # Capture a screenshot ~1s after stop so the dashboard shows the cleared screen + self._trigger_event_screenshot("event_stop", 1.0) + # Turn 
off TV when display stops (with configurable delay) if turn_off_tv: self.cec.turn_off(delayed=True) @@ -1431,13 +1453,17 @@ class DisplayManager: def start_display_for_event(self, event: Dict) -> Optional[DisplayProcess]: """Start appropriate display software for the given event""" + process = None + handled = False + # First, respect explicit event_type if provided by scheduler etype = event.get('event_type') if etype: etype = etype.lower() if etype == 'presentation': - return self.start_presentation(event) - if etype in ('webuntis', 'webpage', 'website'): + process = self.start_presentation(event) + handled = True + elif etype in ('webuntis', 'webpage', 'website'): # webuntis and webpage both show a browser kiosk # Ensure the URL is taken from 'website.url' or 'web.url' # Normalize event to include a 'web' key so start_webpage can use it @@ -1448,18 +1474,25 @@ class DisplayManager: event['web']['url'] = event['website'].get('url') # Only enable autoscroll for explicit scheduler event_type 'website' autoscroll_flag = (etype == 'website') - return self.start_webpage(event, autoscroll_enabled=autoscroll_flag) + process = self.start_webpage(event, autoscroll_enabled=autoscroll_flag) + handled = True - # Fallback to legacy keys - if 'presentation' in event: - return self.start_presentation(event) - elif 'video' in event: - return self.start_video(event) - elif 'web' in event: - return self.start_webpage(event) - else: - logging.error(f"Unknown event type/structure: {list(event.keys())}") - return None + if not handled: + # Fallback to legacy keys + if 'presentation' in event: + process = self.start_presentation(event) + elif 'video' in event: + process = self.start_video(event) + elif 'web' in event: + process = self.start_webpage(event) + else: + logging.error(f"Unknown event type/structure: {list(event.keys())}") + + if process is not None: + delay = self._get_trigger_delay(event) + self._trigger_event_screenshot("event_start", delay) + + return process def 
_command_exists(self, command: str) -> bool: """Check if a command exists in PATH""" @@ -1718,6 +1751,130 @@ class DisplayManager: # ------------------------------------------------------------- # Screenshot capture subsystem # ------------------------------------------------------------- + + def _write_screenshot_meta(self, capture_type: str, final_path: str, send_immediately: bool = False): + """Write screenshots/meta.json atomically so simclient can detect new captures. + + IMPORTANT: Protect event-triggered metadata from being overwritten by periodic captures. + If a periodic screenshot is captured while an event-triggered one is still pending + transmission (send_immediately=True), skip writing meta.json to preserve the event's metadata. + + Args: + capture_type: 'periodic', 'event_start', or 'event_stop' + final_path: absolute path of the just-written screenshot file + send_immediately: True for triggered (event) captures, False for periodic ones + """ + try: + def _pending_trigger_is_valid(meta: Dict) -> bool: + """Return True only for fresh, actionable pending trigger metadata. + + This prevents a stale/corrupt pending flag from permanently blocking + periodic updates (meta.json/latest.jpg) if simclient was down or test + data left send_immediately=True behind. 
+ """ + try: + if not meta.get('send_immediately'): + return False + mtype = str(meta.get('type') or '') + if mtype not in ('event_start', 'event_stop'): + return False + mfile = str(meta.get('file') or '').strip() + if not mfile: + return False + file_path = os.path.join(self.screenshot_dir, mfile) + if not os.path.exists(file_path): + logging.warning( + f"Ignoring stale pending screenshot meta: missing file '{mfile}'" + ) + return False + + captured_at_raw = meta.get('captured_at') + if not captured_at_raw: + return False + captured_at = datetime.fromisoformat(str(captured_at_raw).replace('Z', '+00:00')) + age_s = (datetime.now(timezone.utc) - captured_at.astimezone(timezone.utc)).total_seconds() + + # Guard against malformed/future timestamps that could lock + # the pipeline by appearing permanently "fresh". + if age_s < -5: + logging.warning( + f"Ignoring invalid pending screenshot meta: future captured_at (age={age_s:.1f}s)" + ) + return False + + # Triggered screenshots should be consumed quickly (<= 1s). Use a + # generous safety window to avoid false negatives under load. 
+                    if age_s > 30:
+                        logging.warning(
+                            f"Ignoring stale pending screenshot meta: type={mtype}, age={age_s:.1f}s"
+                        )
+                        return False
+
+                    return True
+                except Exception:
+                    return False
+
+            meta_path = os.path.join(self.screenshot_dir, 'meta.json')
+
+            # PROTECTION: Don't overwrite pending event-triggered metadata with periodic capture
+            if not send_immediately and capture_type == "periodic":
+                try:
+                    if os.path.exists(meta_path):
+                        with open(meta_path, 'r', encoding='utf-8') as f:
+                            existing_meta = json.load(f)
+                        # If there's a pending event-triggered capture, skip this periodic write
+                        if _pending_trigger_is_valid(existing_meta):
+                            logging.debug(f"Skipping periodic meta.json to preserve pending {existing_meta.get('type')} (send_immediately=True)")
+                            return
+                except Exception:
+                    pass # If we can't read existing meta, proceed with writing new one
+
+            meta = {
+                "captured_at": datetime.now(timezone.utc).isoformat(),
+                "file": os.path.basename(final_path),
+                "type": capture_type,
+                "send_immediately": send_immediately,
+            }
+
+            tmp_path = meta_path + '.tmp'
+            with open(tmp_path, 'w', encoding='utf-8') as f:
+                json.dump(meta, f)
+            os.replace(tmp_path, meta_path)
+            logging.debug(f"Screenshot meta written: type={capture_type}, send_immediately={send_immediately}")
+        except Exception as e:
+            logging.debug(f"Could not write screenshot meta: {e}")
+
+    def _get_trigger_delay(self, event: Dict) -> float:
+        """Return the post-launch capture delay in seconds appropriate for the event type."""
+        etype = (event.get('event_type') or '').lower()
+        if etype == 'presentation' or 'presentation' in event:
+            return float(SCREENSHOT_TRIGGER_DELAY_PRESENTATION)
+        if etype in ('webuntis', 'webpage', 'website') or 'web' in event:
+            return float(SCREENSHOT_TRIGGER_DELAY_WEB)
+        if 'video' in event:
+            return float(SCREENSHOT_TRIGGER_DELAY_VIDEO)
+        return float(SCREENSHOT_TRIGGER_DELAY_PRESENTATION) # safe default
+
+    def _trigger_event_screenshot(self, capture_type: str, delay: float):
+        """Arm a one-shot timer to capture a triggered screenshot after *delay* seconds.
+
+        Cancels any already-pending trigger so rapid event switches only produce
+        one screenshot after the final transition settles, not one per intermediate state.
+        """
+        if self._pending_trigger_timer is not None:
+            self._pending_trigger_timer.cancel()
+            self._pending_trigger_timer = None
+
+        def _do_capture():
+            self._pending_trigger_timer = None
+            self._capture_screenshot(capture_type)
+
+        t = threading.Timer(delay, _do_capture)
+        t.daemon = True
+        t.start()
+        self._pending_trigger_timer = t
+        logging.debug(f"Screenshot trigger armed: type={capture_type}, delay={delay}s")
+
     def _screenshot_loop(self):
         """Background loop that captures screenshots periodically while an event is active.
 
@@ -1731,7 +1888,12 @@ class DisplayManager:
                     continue
                 now = time.time()
                 if now - last_capture >= SCREENSHOT_CAPTURE_INTERVAL:
-                    if SCREENSHOT_ALWAYS or (self.current_process and self.current_process.is_running()):
+                    process_active = bool(self.current_process and self.current_process.is_running())
+                    # In development we keep dashboard screenshots fresh even when idle,
+                    # otherwise dashboards can look "dead" with stale images.
+                    capture_idle_in_dev = (ENV == "development")
+
+                    if SCREENSHOT_ALWAYS or process_active or capture_idle_in_dev:
                         self._capture_screenshot()
                         last_capture = now
                     else:
@@ -1743,7 +1905,7 @@ class DisplayManager:
                 logging.debug(f"Screenshot loop error: {e}")
                 time.sleep(5)
 
-    def _capture_screenshot(self):
+    def _capture_screenshot(self, capture_type: str = "periodic"):
         """Capture a screenshot of the current display and store it in the shared screenshots directory.
 
         Strategy:
@@ -1841,8 +2003,15 @@ class DisplayManager:
                     logging.debug(f"xwd/convert pipeline failed: {e}")
 
             if not captured:
-                # Warn only occasionally
-                logging.warning("No screenshot tool available for current session. For X11, install 'scrot' or ImageMagick. For Wayland, install 'grim' or 'gnome-screenshot'.")
+                # Capture can fail in headless/TTY sessions even when tools exist.
+                logging.warning(
+                    "Screenshot capture failed for current session "
+                    f"(DISPLAY={os.environ.get('DISPLAY')}, "
+                    f"WAYLAND_DISPLAY={os.environ.get('WAYLAND_DISPLAY')}, "
+                    f"XDG_SESSION_TYPE={os.environ.get('XDG_SESSION_TYPE')}). "
+                    "Ensure display-manager runs in a desktop session or exports DISPLAY/XAUTHORITY. "
+                    "For X11 install/use 'scrot' or ImageMagick; for Wayland use 'grim' or 'gnome-screenshot'."
+                )
                 return
 
             # Open image and downscale/compress
@@ -1868,13 +2037,29 @@ class DisplayManager:
 
             # Maintain latest.jpg as an atomic copy so readers never see a missing
             # or broken pointer while a new screenshot is being published.
-            latest_link = os.path.join(self.screenshot_dir, 'latest.jpg')
-            try:
-                latest_tmp = os.path.join(self.screenshot_dir, 'latest.jpg.tmp')
-                shutil.copyfile(final_path, latest_tmp)
-                os.replace(latest_tmp, latest_link)
-            except Exception as e:
-                logging.debug(f"Could not update latest.jpg: {e}")
+            # PROTECTION: Don't update latest.jpg for periodic captures if event-triggered is pending
+            should_update_latest = True
+            if capture_type == "periodic":
+                try:
+                    meta_path = os.path.join(self.screenshot_dir, 'meta.json')
+                    if os.path.exists(meta_path):
+                        with open(meta_path, 'r', encoding='utf-8') as f:
+                            existing_meta = json.load(f)
+                        # If there's a pending event-triggered capture, don't update latest.jpg
+                        if _pending_trigger_is_valid(existing_meta):
+                            should_update_latest = False
+                            logging.debug(f"Skipping latest.jpg update to preserve pending {existing_meta.get('type')} screenshot")
+                except Exception:
+                    pass # If we can't read meta, proceed with updating latest.jpg
+
+            latest_link = os.path.join(self.screenshot_dir, 'latest.jpg')
+            if should_update_latest:
+                try:
+                    latest_tmp = os.path.join(self.screenshot_dir, 'latest.jpg.tmp')
+                    shutil.copyfile(final_path, latest_tmp)
+                    os.replace(latest_tmp, latest_link)
+                except Exception as e:
+                    logging.debug(f"Could not update latest.jpg: {e}")
 
             # Rotate old screenshots
             try:
@@ -1894,7 +2079,8 @@ class DisplayManager:
             except Exception:
                 pass
             logged_size = size if size is not None else 'unknown'
-            logging.info(f"Screenshot captured: {os.path.basename(final_path)} ({logged_size} bytes)")
+            self._write_screenshot_meta(capture_type, final_path, send_immediately=(capture_type != "periodic"))
+            logging.info(f"Screenshot captured: {os.path.basename(final_path)} ({logged_size} bytes) type={capture_type}")
         except Exception as e:
             logging.debug(f"Screenshot capture failure: {e}")
 
diff --git a/src/simclient.py b/src/simclient.py
index 47de4d4..7711b9d 100644
--- a/src/simclient.py
+++ b/src/simclient.py
@@ -144,6 +144,9 @@ logging.info(f"Monitoring logger initialized: {MONITORING_LOG_PATH}")
 # Health state file (written by display_manager, read by simclient)
 HEALTH_STATE_FILE = os.path.join(os.path.dirname(__file__), "current_process_health.json")
 CLIENT_SETTINGS_FILE = os.path.join(os.path.dirname(__file__), "config", "client_settings.json")
+# Screenshot IPC (written by display_manager, polled by simclient)
+SCREENSHOT_DIR = os.path.join(os.path.dirname(__file__), "screenshots")
+SCREENSHOT_META_FILE = os.path.join(SCREENSHOT_DIR, "meta.json")
 
 discovered = False
 
@@ -635,19 +638,56 @@ def publish_log_message(client, client_id, level: str, message: str, context: di
         logging.debug(f"Error publishing log: {e}")
 
 
-def send_screenshot_heartbeat(client, client_id):
+def _read_and_clear_meta():
+    """Read screenshots/meta.json and atomically clear the send_immediately flag.
+
+    Returns the parsed dict (with the *original* send_immediately value) if the
+    file exists and is valid JSON, else None. The flag is cleared on disk before
+    returning so a crash between read and publish does not re-send on the next tick.
+    """
+    try:
+        if not os.path.exists(SCREENSHOT_META_FILE):
+            return None
+        with open(SCREENSHOT_META_FILE, 'r', encoding='utf-8') as f:
+            meta = json.load(f)
+        if meta.get('send_immediately'):
+            # Write cleared copy atomically so the flag is gone before we return
+            cleared = dict(meta)
+            cleared['send_immediately'] = False
+            tmp_path = SCREENSHOT_META_FILE + '.tmp'
+            with open(tmp_path, 'w', encoding='utf-8') as f:
+                json.dump(cleared, f)
+            os.replace(tmp_path, SCREENSHOT_META_FILE)
+        return meta # original dict; send_immediately is True if it was set
+    except Exception as e:
+        logging.debug(f"Could not read screenshot meta: {e}")
+        return None
+
+
+def send_screenshot_heartbeat(client, client_id, capture_type: str = "periodic"):
     """Send heartbeat with screenshot to server for dashboard monitoring"""
     try:
         screenshot_info = get_latest_screenshot()
-        
+
         # Also read health state and include in heartbeat
         health = read_health_state()
-        
+
+        # Compute screenshot age so the server can flag stale images
+        screenshot_age_s = None
+        if screenshot_info:
+            try:
+                ts = datetime.fromisoformat(screenshot_info["timestamp"])
+                screenshot_age_s = round((datetime.now(timezone.utc) - ts).total_seconds(), 1)
+            except Exception:
+                pass
+
         heartbeat_data = {
             "timestamp": datetime.now(timezone.utc).isoformat(),
             "client_id": client_id,
             "status": "alive",
+            "screenshot_type": capture_type,
             "screenshot": screenshot_info,
+            "screenshot_age_s": screenshot_age_s,
             "system_info": {
                 "hostname": socket.gethostname(),
                 "ip": get_ip(),
@@ -685,18 +725,46 @@ def send_screenshot_heartbeat(client, client_id):
 
 
 def screenshot_service_thread(client, client_id):
-    """Background thread for screenshot monitoring and transmission"""
-    logging.info(f"Screenshot service started with {SCREENSHOT_INTERVAL}s interval")
-    
+    """Background thread for screenshot monitoring and transmission.
+
+    Runs on a 1-second tick. A heartbeat is sent when either:
+    - display_manager set send_immediately=True in screenshots/meta.json
+      (event_start / event_stop triggered captures); fired within <=1 second, OR
+    - the periodic SCREENSHOT_INTERVAL has elapsed since the last send.
+
+    The interval timer resets on every send, so a triggered send pushes out the
+    next periodic heartbeat rather than causing a double-send shortly after.
+    """
+    logging.info(f"Screenshot service started with {SCREENSHOT_INTERVAL}s periodic interval")
+    last_sent = 0.0
+    last_meta_type = None
+
     while True:
         try:
-            send_screenshot_heartbeat(client, client_id)
-            time.sleep(SCREENSHOT_INTERVAL)
+            time.sleep(1)
+            now = time.time()
+            meta = _read_and_clear_meta()
+            triggered = bool(meta and meta.get('send_immediately'))
+            interval_due = (now - last_sent) >= SCREENSHOT_INTERVAL
+
+            if meta:
+                current_type = meta.get('type', 'unknown')
+                if current_type != last_meta_type:
+                    logging.debug(f"Meta.json detected: type={current_type}, send_immediately={meta.get('send_immediately')}, file={meta.get('file')}")
+                    last_meta_type = current_type
+
+            if triggered or interval_due:
+                capture_type = meta['type'] if (triggered and meta) else "periodic"
+                if triggered:
+                    logging.info(f"Sending triggered screenshot: type={capture_type}")
+                send_screenshot_heartbeat(client, client_id, capture_type)
+                last_sent = now
         except Exception as e:
             logging.error(f"Screenshot service error: {e}")
             time.sleep(60) # Wait a minute before retrying
 
 
+
 def main():
     global discovered
     print(f"[{datetime.now(timezone.utc).isoformat()}] simclient.py: program started")
diff --git a/test-screenshot-meta-fix.sh b/test-screenshot-meta-fix.sh
new file mode 100644
index 0000000..8daad1b
--- /dev/null
+++ b/test-screenshot-meta-fix.sh
@@ -0,0 +1,93 @@
+#!/bin/bash
+# Test script to verify event-triggered screenshot protection
+# Tests BOTH metadata and latest.jpg file protection
+
+set -e
+
+SCREENSHOT_DIR="src/screenshots"
+META_FILE="$SCREENSHOT_DIR/meta.json"
+LATEST_FILE="$SCREENSHOT_DIR/latest.jpg"
+
+echo "=== Screenshot Event-Triggered Protection Test ==="
+echo ""
+echo "Testing that periodic screenshots don't overwrite event-triggered captures..."
+echo ""
+
+# Create test directory if needed
+mkdir -p "$SCREENSHOT_DIR"
+
+# Step 1: Create mock event_start screenshot and metadata
+echo "Step 1: Simulating event_start screenshot and metadata..."
+echo "MOCK EVENT_START IMAGE" > "$LATEST_FILE"
+cat > "$META_FILE" << 'EOF'
+{"captured_at": "2026-03-29T10:05:33.516Z", "file": "screenshot_20260329_100533.jpg", "type": "event_start", "send_immediately": true}
+EOF
+echo " Created: $META_FILE"
+echo " Created: $LATEST_FILE (with event_start content)"
+echo ""
+
+# Step 2: Simulate periodic screenshot capture
+echo "Step 2: Simulating periodic screenshot capture (should NOT overwrite meta or latest.jpg)..."
+python3 << 'PYEOF'
+import json
+import os
+import sys
+
+screenshot_dir = "src/screenshots"
+meta_path = os.path.join(screenshot_dir, 'meta.json')
+latest_path = os.path.join(screenshot_dir, 'latest.jpg')
+
+# Read current meta
+with open(meta_path, 'r', encoding='utf-8') as f:
+    existing_meta = json.load(f)
+
+print("[CHECK] Meta status: type={}, send_immediately={}".format(
+    existing_meta.get('type', 'unknown'),
+    existing_meta.get('send_immediately', False)
+))
+
+# Simulate _write_screenshot_meta() protection for metadata
+should_update_meta = True
+if existing_meta.get('send_immediately'):
+    should_update_meta = False
+    print("[PROTECTED] Would skip meta.json update - pending {} still marked send_immediately=True".format(
+        existing_meta.get('type')))
+
+# Simulate latest.jpg update protection
+should_update_latest = True
+if existing_meta.get('send_immediately'):
+    should_update_latest = False
+    print("[PROTECTED] Would skip latest.jpg update - pending {} not yet transmitted".format(
+        existing_meta.get('type')))
+
+if not should_update_meta and not should_update_latest:
+    print("[SUCCESS] Both meta.json and latest.jpg protected from periodic overwrite!")
+    sys.exit(0)
+else:
+    print("[FAILED] Would have overwritten protected files!")
+    sys.exit(1)
+PYEOF
+
+echo ""
+echo "Step 3: Verify both files still contain event_start metadata..."
+if grep -q '"type": "event_start"' "$META_FILE" && grep -q '"send_immediately": true' "$META_FILE"; then
+    if grep -q "MOCK EVENT_START IMAGE" "$LATEST_FILE"; then
+        echo "[SUCCESS] Both files preserved correctly!"
+        echo " - meta.json: Still contains type=event_start with send_immediately=true"
+        echo " - latest.jpg: Still contains original event_start image"
+        echo ""
+        echo "Test passed! The fix prevents periodic screenshots from overwriting"
+        echo "both the metadata AND the actual screenshot file when event-triggered"
+        echo "captures are pending transmission."
+    else
+        echo "[FAILED] latest.jpg was overwritten!"
+        exit 1
+    fi
+else
+    echo "[FAILED] meta.json was overwritten!"
+    exit 1
+fi
+
+echo ""
+echo "=== Test Complete ==="
+
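The shell test above simulates the protection by checking only `send_immediately`, while the `display_manager.py` helper in this diff also rejects pending metadata older than 30 seconds. The following is a minimal sketch of that combined rule, useful for exercising the stale-trigger path in isolation. It is an illustration under stated assumptions, not the code from the change: only the tail of `_pending_trigger_is_valid` is visible here, so `parse_captured_at`, `pending_trigger_is_valid`, and `may_overwrite` are hypothetical names, and treating a naive timestamp as UTC is an assumption.

```python
import json
import os
import tempfile
from datetime import datetime, timedelta, timezone

MAX_PENDING_AGE_S = 30  # mirrors the 30 s cutoff visible in the diff


def parse_captured_at(value: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat() only accepts a 'Z' suffix from Python 3.11 on,
    # and the test fixture above writes "...516Z".
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    ts = datetime.fromisoformat(value)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return ts


def pending_trigger_is_valid(meta, now: datetime) -> bool:
    """Return True when meta describes a fresh, not-yet-sent triggered capture."""
    if not isinstance(meta, dict) or not meta.get("send_immediately"):
        return False
    try:
        age_s = (now - parse_captured_at(meta["captured_at"])).total_seconds()
    except (KeyError, TypeError, ValueError):
        return False  # unparsable metadata is treated as not pending
    return age_s <= MAX_PENDING_AGE_S


def may_overwrite(meta_path: str, capture_type: str, now: datetime) -> bool:
    """Decide whether a capture may replace meta.json and latest.jpg."""
    if capture_type != "periodic":
        return True  # triggered captures always publish
    try:
        with open(meta_path, "r", encoding="utf-8") as f:
            existing = json.load(f)
    except (OSError, ValueError):
        return True  # missing or unreadable meta: proceed, as the diff does
    return not pending_trigger_is_valid(existing, now)


if __name__ == "__main__":
    now = datetime(2026, 3, 29, 10, 5, 40, tzinfo=timezone.utc)
    fixture = {
        "captured_at": "2026-03-29T10:05:33.516Z",
        "type": "event_start",
        "send_immediately": True,
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "meta.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(fixture, f)
        print(may_overwrite(path, "periodic", now))                          # False
        print(may_overwrite(path, "periodic", now + timedelta(seconds=60)))  # True
```

Passing `now` explicitly keeps the check deterministic, so the fresh and stale cases can both be asserted without sleeping for 30 seconds.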