feat(client-monitoring): finalize client-side monitoring and UTC logging

- add process health bridge and monitoring flow between display_manager and simclient
- publish health + warn/error log topics over MQTT
- standardize log/payload/screenshot timestamps to UTC (Z) to avoid DST drift
- improve video handling: python-vlc fullscreen enforcement and runtime PID reporting
- update README and copilot instructions with monitoring architecture and troubleshooting
- add Phase 3 monitoring implementation documentation
- update gitignore for new runtime/log artifacts
This commit is contained in:
RobbStarkAustria
2026-03-11 20:24:38 +01:00
parent 1c445f4ba7
commit 80e5ce98a0
6 changed files with 837 additions and 18 deletions

View File

@@ -14,8 +14,9 @@
- **Display logic**: `src/display_manager.py` (controls presentations/video/web) - **Display logic**: `src/display_manager.py` (controls presentations/video/web)
- **MQTT client**: `src/simclient.py` (event management, heartbeat, discovery) - **MQTT client**: `src/simclient.py` (event management, heartbeat, discovery)
- **Runtime state**: `src/current_event.json` (current active event) - **Runtime state**: `src/current_event.json` (current active event)
- **Process health bridge**: `src/current_process_health.json` (display_manager -> simclient)
- **Config**: `src/config/client_uuid.txt`, `src/config/last_group_id.txt`, `.env` - **Config**: `src/config/client_uuid.txt`, `src/config/last_group_id.txt`, `.env`
- **Logs**: `logs/display_manager.log`, `logs/simclient.log` - **Logs**: `logs/display_manager.log`, `logs/simclient.log`, `logs/monitoring.log`
- **Screenshots**: `src/screenshots/` (shared volume between processes) - **Screenshots**: `src/screenshots/` (shared volume between processes)
### Common Tasks Quick Reference ### Common Tasks Quick Reference
@@ -23,6 +24,8 @@
|------|------|-------------------| |------|------|-------------------|
| Add event type | `display_manager.py` | `start_display_for_event()` | | Add event type | `display_manager.py` | `start_display_for_event()` |
| Modify presentation | `display_manager.py` | `start_presentation()` | | Modify presentation | `display_manager.py` | `start_presentation()` |
| Modify process monitoring | `display_manager.py` | `ProcessHealthState`, `process_events()` |
| Publish health/log topics | `simclient.py` | `read_health_state()`, `publish_health_message()`, `publish_log_message()` |
| Change MQTT topics | `simclient.py` | Topic constants/handlers | | Change MQTT topics | `simclient.py` | Topic constants/handlers |
| Update screenshot | `display_manager.py` | `_capture_screenshot()` | | Update screenshot | `display_manager.py` | `_capture_screenshot()` |
| File downloads | `simclient.py` | `resolve_file_url()` | | File downloads | `simclient.py` | `resolve_file_url()` |
@@ -62,6 +65,8 @@
### MQTT Communication Patterns ### MQTT Communication Patterns
- **Discovery**: `infoscreen/discovery``infoscreen/{client_id}/discovery_ack` - **Discovery**: `infoscreen/discovery``infoscreen/{client_id}/discovery_ack`
- **Heartbeat**: Regular `infoscreen/{client_id}/heartbeat` messages - **Heartbeat**: Regular `infoscreen/{client_id}/heartbeat` messages
- **Health**: `infoscreen/{client_id}/health` (event/process/pid/status)
- **Client logs**: `infoscreen/{client_id}/logs/error|warn` (selective forwarding)
### MQTT Reconnection & Heartbeat (Nov 2025) ### MQTT Reconnection & Heartbeat (Nov 2025)
- The client uses Paho MQTT v2 callback API with `client.loop_start()` and `client.reconnect_delay_set()` to handle automatic reconnection. - The client uses Paho MQTT v2 callback API with `client.loop_start()` and `client.reconnect_delay_set()` to handle automatic reconnection.
- `on_connect` re-subscribes to all topics (`discovery_ack`, `config`, `group_id`, current group events) and re-sends discovery on reconnect to re-register with the server. - `on_connect` re-subscribes to all topics (`discovery_ack`, `config`, `group_id`, current group events) and re-sends discovery on reconnect to re-register with the server.
@@ -255,11 +260,20 @@ FILE_SERVER_SCHEME=http # http or https
- **Fallback**: External vlc binary - **Fallback**: External vlc binary
- **Fields**: `url`, `autoplay` (bool), `loop` (bool), `volume` (0.0-1.0 → 0-100) - **Fields**: `url`, `autoplay` (bool), `loop` (bool), `volume` (0.0-1.0 → 0-100)
- **URL rewriting**: `server` host → configured file server - **URL rewriting**: `server` host → configured file server
- **Fullscreen**: enforced for python-vlc on startup (with short retry toggles); external fallback uses `--fullscreen`
- **Monitoring PID semantics**: python-vlc runs in-process, so PID is `display_manager.py` runtime PID; external fallback uses external `vlc` PID
- **HW decode errors**: `h264_v4l2m2m` failures are normal if V4L2 M2M unavailable; use software decode - **HW decode errors**: `h264_v4l2m2m` failures are normal if V4L2 M2M unavailable; use software decode
- Robust payload parsing with fallbacks - Robust payload parsing with fallbacks
- Topic-specific message handlers - Topic-specific message handlers
- Retained message support where appropriate - Retained message support where appropriate
### Logging & Timestamp Policy (Mar 2026)
- Client logs are standardized to UTC with `Z` suffix to avoid DST/localtime drift.
- Applies to `display_manager.log`, `simclient.log`, and `monitoring.log`.
- MQTT payload timestamps for heartbeat/dashboard/health/log messages are UTC ISO timestamps.
- Screenshot metadata timestamps included by `simclient.py` are UTC ISO timestamps.
- Prefer UTC-aware calls (`datetime.now(timezone.utc)`) and UTC log formatters for new code.
## Hardware Considerations ## Hardware Considerations
### Target Platform ### Target Platform
@@ -490,6 +504,7 @@ The screenshot capture and transmission system has been implemented with separat
- **Large payloads**: Reduce `SCREENSHOT_MAX_WIDTH` or `SCREENSHOT_JPEG_QUALITY` - **Large payloads**: Reduce `SCREENSHOT_MAX_WIDTH` or `SCREENSHOT_JPEG_QUALITY`
- **Stale screenshots**: Check `latest.jpg` symlink, verify display_manager is running - **Stale screenshots**: Check `latest.jpg` symlink, verify display_manager is running
- **MQTT errors**: Check dashboard topic logs for publish return codes - **MQTT errors**: Check dashboard topic logs for publish return codes
- **Pulse overflow in remote sessions**: warnings like `pulse audio output error: overflow, flushing` can occur with NoMachine/dummy displays; if HDMI playback is stable, treat as environment-related
### Testing & Troubleshooting ### Testing & Troubleshooting
**Setup:** **Setup:**
- X11: `sudo apt install scrot imagemagick` - X11: `sudo apt install scrot imagemagick`

1
.gitignore vendored
View File

@@ -134,3 +134,4 @@ logs/
.git.legacy_backup* .git.legacy_backup*
src/pi-dev-setup-new.sh src/pi-dev-setup-new.sh
src/current_process_health.json

View File

@@ -0,0 +1,533 @@
# Phase 3: Client-Side Monitoring Implementation
**Status**: ✅ COMPLETE
**Date**: 11. März 2026
**Architecture**: Two-process design with health-state bridge
---
## Overview
This document describes the **Phase 3** client-side monitoring implementation integrated into the existing infoscreen-dev codebase. The implementation adds:
1.**Health-state tracking** for all display processes (Impressive, Chromium, VLC)
2.**Tiered logging**: Local rotating logs + selective MQTT transmission
3.**Process crash detection** with bounded restart attempts
4.**MQTT health/log topics** feeding the monitoring server
5.**Impressive-aware process mapping** (presentations → impressive, websites → chromium, videos → vlc)
---
## Architecture
### Two-Process Design
```
┌─────────────────────────────────────────────────────────┐
│ simclient.py (MQTT Client) │
│ - Discovers device, sends heartbeat │
│ - Downloads presentation files │
│ - Reads health state from display_manager │
│ - Publishes health/log messages to MQTT │
│ - Sends screenshots for dashboard │
└────────┬────────────────────────────────────┬───────────┘
│ │
│ reads: current_process_health.json │
│ │
│ writes: current_event.json │
│ │
┌────────▼────────────────────────────────────▼───────────┐
│ display_manager.py (Display Control) │
│ - Monitors events and manages displays │
│ - Launches Impressive (presentations) │
│ - Launches Chromium (websites) │
│ - Launches VLC (videos) │
│ - Tracks process health and crashes │
│ - Detects and restarts crashed processes │
│ - Writes health state to JSON bridge │
│ - Captures screenshots to shared folder │
└─────────────────────────────────────────────────────────┘
```
---
## Implementation Details
### 1. Health State Tracking (display_manager.py)
**File**: `src/display_manager.py`
**New Class**: `ProcessHealthState`
Tracks process health and persists to JSON for simclient to read:
```python
class ProcessHealthState:
"""Track and persist process health state for monitoring integration"""
- event_id: Currently active event identifier
- event_type: presentation, website, video, or None
- process_name: impressive, chromium-browser, vlc, or None
- process_pid: Process ID or None for libvlc
- status: running, crashed, starting, stopped
- restart_count: Number of restart attempts
- max_restarts: Maximum allowed restarts (3)
```
Methods:
- `update_running()` - Mark process as started (logs to monitoring.log)
- `update_crashed()` - Mark process as crashed (warning to monitoring.log)
- `update_restart_attempt()` - Increment restart counter (logs attempt and checks max)
- `update_stopped()` - Mark process as stopped (info to monitoring.log)
- `save()` - Persist state to `src/current_process_health.json`
**New Health State File**: `src/current_process_health.json`
```json
{
"event_id": "event_123",
"event_type": "presentation",
"current_process": "impressive",
"process_pid": 1234,
"process_status": "running",
"restart_count": 0,
"timestamp": "2026-03-11T10:30:45.123456+00:00"
}
```
### 2. Monitoring Logger (both files)
**Local Rotating Logs**: 5 files × 5 MB each = 25 MB max per device
**display_manager.py**:
```python
MONITORING_LOG_PATH = "logs/monitoring.log"
monitoring_logger = logging.getLogger("monitoring")
monitoring_handler = RotatingFileHandler(MONITORING_LOG_PATH, maxBytes=5*1024*1024, backupCount=5)
```
**simclient.py**:
- Shares same `logs/monitoring.log` file
- Both processes write to monitoring logger for health events
- Local logs never rotate (persisted for technician inspection)
**Log Filtering** (tiered strategy):
- **ERROR**: Local + MQTT (published to `infoscreen/{uuid}/logs/error`)
- **WARN**: Local + MQTT (published to `infoscreen/{uuid}/logs/warn`)
- **INFO**: Local only (unless `DEBUG_MODE=1`)
- **DEBUG**: Local only (always)
### 3. Process Mapping with Impressive Support
**display_manager.py** - When starting processes:
| Event Type | Process Name | Health Status |
|-----------|--------------|---------------|
| presentation | `impressive` | tracked with PID |
| website/webpage/webuntis | `chromium` or `chromium-browser` | tracked with PID |
| video | `vlc` | tracked (may have no PID if using libvlc) |
**Per-Process Updates**:
- Presentation: `health.update_running('event_id', 'presentation', 'impressive', pid)`
- Website: `health.update_running('event_id', 'website', browser_name, pid)`
- Video: `health.update_running('event_id', 'video', 'vlc', pid or None)`
### 4. Crash Detection and Restart Logic
**display_manager.py** - `process_events()` method:
```
If process not running AND same event_id:
├─ Check exit code
├─ If presentation with exit code 0: Normal completion (no restart)
├─ Else: Mark crashed
│ ├─ health.update_crashed()
│ └─ health.update_restart_attempt()
│ ├─ If restart_count > max_restarts: Give up
│ └─ Else: Restart display (loop back to start_display_for_event)
└─ Log to monitoring.log at each step
```
**Restart Logic**:
- Max 3 restart attempts per event
- Restarts only if same event still active
- Graceful exit (code 0) for Impressive auto-quit presentations is treated as normal
- All crashes logged to monitoring.log with context
### 5. MQTT Health and Log Topics
**simclient.py** - New functions:
**`read_health_state()`**
- Reads `src/current_process_health.json` written by display_manager
- Returns dict or None if no active process
**`publish_health_message(client, client_id)`**
- Topic: `infoscreen/{uuid}/health`
- QoS: 1 (reliable)
- Payload:
```json
{
"timestamp": "2026-03-11T10:30:45.123456+00:00",
"expected_state": {
"event_id": "event_123"
},
"actual_state": {
"process": "impressive",
"pid": 1234,
"status": "running"
}
}
```
**`publish_log_message(client, client_id, level, message, context)`**
- Topics: `infoscreen/{uuid}/logs/error` or `infoscreen/{uuid}/logs/warn`
- QoS: 1 (reliable)
- Log level filtering (only ERROR/WARN sent unless DEBUG_MODE=1)
- Payload:
```json
{
"timestamp": "2026-03-11T10:30:45.123456+00:00",
"message": "Process started: event_id=123 event_type=presentation process=impressive pid=1234",
"context": {
"event_id": "event_123",
"process": "impressive",
"event_type": "presentation"
}
}
```
**Enhanced Dashboard Heartbeat**:
- Topic: `infoscreen/{uuid}/dashboard`
- Now includes `process_health` block with event_id, process name, status, restart count
### 6. Integration Points
**Existing Features Preserved**:
- ✅ Impressive PDF presentations with auto-advance and loop
- ✅ Chromium website display with auto-scroll injection
- ✅ VLC video playback (python-vlc preferred, binary fallback)
- ✅ Screenshot capture and transmission
- ✅ HDMI-CEC TV control
- ✅ Two-process architecture
**New Integration Points**:
| File | Function | Change |
|------|----------|--------|
| display_manager.py | `__init__()` | Initialize `ProcessHealthState()` |
| display_manager.py | `start_presentation()` | Call `health.update_running()` with impressive |
| display_manager.py | `start_video()` | Call `health.update_running()` with vlc |
| display_manager.py | `start_webpage()` | Call `health.update_running()` with chromium |
| display_manager.py | `process_events()` | Detect crashes, call `health.update_crashed()` and `update_restart_attempt()` |
| display_manager.py | `stop_current_display()` | Call `health.update_stopped()` |
| simclient.py | `screenshot_service_thread()` | (No changes to interval) |
| simclient.py | Main heartbeat loop | Call `publish_health_message()` after successful heartbeat |
| simclient.py | `send_screenshot_heartbeat()` | Read health state and include in dashboard payload |
---
## Logging Hierarchy
### Local Rotating Files (5 × 5 MB)
**`logs/display_manager.log`** (existing - updated):
- Display event processing
- Process lifecycle (start/stop)
- HDMI-CEC operations
- Presentation status
- Video/website startup
**`logs/simclient.log`** (existing - updated):
- MQTT connection/reconnection
- Discovery and heartbeat
- File downloads
- Group membership changes
- Dashboard payload info
**`logs/monitoring.log`** (NEW):
- Process health events (start, crash, restart, stop)
- Both display_manager and simclient write here
- Centralized health tracking
- Technician-focused: "What happened to the processes?"
```
# Example monitoring.log entries:
2026-03-11 10:30:45 [INFO] Process started: event_id=event_123 event_type=presentation process=impressive pid=1234
2026-03-11 10:35:20 [WARNING] Process crashed: event_id=event_123 event_type=presentation process=impressive restart_count=0/3
2026-03-11 10:35:20 [WARNING] Restarting process: attempt 1/3 for impressive
2026-03-11 10:35:25 [INFO] Process started: event_id=event_123 event_type=presentation process=impressive pid=1245
```
### MQTT Transmission (Selective)
**Always sent** (when error occurs):
- `infoscreen/{uuid}/logs/error` - Critical failures
- `infoscreen/{uuid}/logs/warn` - Restarts, crashes, missing binaries
**Development mode only** (if DEBUG_MODE=1):
- `infoscreen/{uuid}/logs/info` - Event start/stop, process running status
**Never sent**:
- DEBUG messages (local-only debug details)
- INFO messages in production
---
## Environment Variables
No new required variables. Existing configuration supports monitoring:
```bash
# Existing (unchanged):
ENV=development|production
DEBUG_MODE=0|1 # Enables INFO logs to MQTT
LOG_LEVEL=DEBUG|INFO|WARNING|ERROR # Local log verbosity
HEARTBEAT_INTERVAL=5|60 # seconds
SCREENSHOT_INTERVAL=30|300 # seconds (display_manager_screenshot_capture)
# Recommended for monitoring:
SCREENSHOT_CAPTURE_INTERVAL=30 # How often display_manager captures screenshots
SCREENSHOT_MAX_WIDTH=800 # Downscale for bandwidth
SCREENSHOT_JPEG_QUALITY=70 # Balance quality/size
# File server (if different from MQTT broker):
FILE_SERVER_HOST=192.168.1.100
FILE_SERVER_PORT=8000
FILE_SERVER_SCHEME=http
```
---
## Testing Validation
### System-Level Test Sequence
**1. Start Services**:
```bash
# Terminal 1: Display Manager
./scripts/start-display-manager.sh
# Terminal 2: MQTT Client
./scripts/start-dev.sh
# Terminal 3: Monitor logs
tail -f logs/monitoring.log
```
**2. Trigger Each Event Type**:
```bash
# Via test menu or MQTT publish:
./scripts/test-display-manager.sh # Options 1-3 trigger events
```
**3. Verify Health State File**:
```bash
# Check health state gets written immediately
cat src/current_process_health.json
# Should show: event_id, event_type, current_process (impressive/chromium/vlc), process_status=running
```
**4. Check MQTT Topics**:
```bash
# Monitor health messages:
mosquitto_sub -h localhost -t "infoscreen/+/health" -v
# Monitor log messages:
mosquitto_sub -h localhost -t "infoscreen/+/logs/#" -v
# Monitor dashboard heartbeat:
mosquitto_sub -h localhost -t "infoscreen/+/dashboard" -v | head -c 500 && echo "..."
```
**5. Simulate Process Crash**:
```bash
# Find impressive/chromium/vlc PID:
ps aux | grep -E 'impressive|chromium|vlc'
# Kill process:
kill -9 <pid>
# Watch monitoring.log for crash detection and restart
tail -f logs/monitoring.log
# Should see: [WARNING] Process crashed... [WARNING] Restarting process...
```
**6. Verify Server Integration**:
```bash
# Server receives health messages:
sqlite3 infoscreen.db "SELECT process_status, current_process, restart_count FROM clients WHERE uuid='...';"
# Should show latest status from health message
# Server receives logs:
sqlite3 infoscreen.db "SELECT level, message FROM client_logs WHERE client_uuid='...' ORDER BY timestamp DESC LIMIT 10;"
# Should show ERROR/WARN entries from crashes/restarts
```
---
## Troubleshooting
### Health State File Not Created
**Symptom**: `src/current_process_health.json` missing
**Causes**:
- No event active (file only created when display starts)
- display_manager not running
**Check**:
```bash
ps aux | grep display_manager
tail -f logs/display_manager.log | grep "Process started\|Process stopped"
```
### MQTT Health Messages Not Arriving
**Symptom**: No health messages on `infoscreen/{uuid}/health` topic
**Causes**:
- simclient not reading health state file
- MQTT connection dropped
- Health update function not called
**Check**:
```bash
# Check health file exists and is recent:
ls -l src/current_process_health.json
stat src/current_process_health.json | grep Modify
# Monitor simclient logs:
tail -f logs/simclient.log | grep -E "Health|heartbeat|publish"
# Verify MQTT connection:
mosquitto_sub -h localhost -t "infoscreen/+/heartbeat" -v
```
### Restart Loop (Process Keeps Crashing)
**Symptom**: monitoring.log shows repeated crashes and restarts
**Check**:
```bash
# Read last log lines of the process (stored by display_manager):
tail -f logs/impressive.out.log # for presentations
tail -f logs/browser.out.log # for websites
tail -f logs/video_player.out.log # for videos
```
**Common Causes**:
- Missing binary (impressive not installed, chromium not found, vlc not available)
- Corrupt presentation file
- Invalid URL for website
- Insufficient permissions for screenshots
### Log Messages Not Reaching Server
**Symptom**: client_logs table in server DB is empty
**Causes**:
- Log level filtering: INFO messages in production are local-only
- Logs only published on ERROR/WARN
- MQTT publish failing silently
**Check**:
```bash
# Force DEBUG_MODE to see all logs:
export DEBUG_MODE=1
export LOG_LEVEL=DEBUG
# Restart simclient and trigger event
# Monitor local logs first:
tail -f logs/monitoring.log | grep -i error
```
---
## Performance Considerations
**Bandwidth per Client**:
- Health message: ~200 bytes per heartbeat interval (every 5-60s)
- Screenshot heartbeat: ~50-100 KB (every 30-300s)
- Log messages: ~100-500 bytes per crash/error (rare)
- **Total**: ~0.5-2 MB/day per device (very minimal)
**Disk Space on Client**:
- Monitoring logs: 5 files × 5 MB = 25 MB max
- Display manager logs: 5 files × 2 MB = 10 MB max
- MQTT client logs: 5 files × 2 MB = 10 MB max
- Screenshots: 20 files × 50-100 KB = 1-2 MB max
- **Total**: ~50 MB max (typical for Raspberry Pi USB/SSD)
**Rotation Strategy**:
- Old files automatically deleted when size limit reached
- Technician can SSH and `tail -f` any time
- No database overhead (file-based rotation is minimal CPU)
---
## Integration with Server (Phase 2)
The client implementation sends data to the server's Phase 2 endpoints:
**Expected Server Implementation** (from CLIENT_MONITORING_SETUP.md):
1. **MQTT Listener** receives and stores:
- `infoscreen/{uuid}/logs/error`, `/logs/warn`, `/logs/info`
- `infoscreen/{uuid}/health` messages
- Updates `clients` table with health fields
2. **Database Tables**:
- `clients.process_status`: running/crashed/starting/stopped
- `clients.current_process`: impressive/chromium/vlc/None
- `clients.process_pid`: PID value
- `clients.current_event_id`: Active event
- `client_logs`: table stores logs with level/message/context
3. **API Endpoints**:
- `GET /api/client-logs/{uuid}/logs?level=ERROR&limit=50`
- `GET /api/client-logs/summary` (errors/warnings across all clients)
---
## Summary of Changes
### Files Modified
1. **`src/display_manager.py`**:
- Added `psutil` import for future process monitoring
- Added `ProcessHealthState` class (60 lines)
- Added monitoring logger setup (8 lines)
- Added `health.update_running()` calls in `start_presentation()`, `start_video()`, `start_webpage()`
- Added crash detection and restart logic in `process_events()`
- Added `health.update_stopped()` in `stop_current_display()`
2. **`src/simclient.py`**:
- Added `timezone` import
- Added monitoring logger setup (8 lines)
- Added `read_health_state()` function
- Added `publish_health_message()` function
- Added `publish_log_message()` function (with level filtering)
- Updated `send_screenshot_heartbeat()` to include health data
- Updated heartbeat loop to call `publish_health_message()`
### Files Created
1. **`src/current_process_health.json`** (at runtime):
- Bridge file between display_manager and simclient
- Shared volume compatible (works in container setup)
2. **`logs/monitoring.log`** (at runtime):
- New rotating log file (5 × 5MB)
- Health events from both processes
---
## Next Steps
1. **Deploy to test client** and run validation sequence above
2. **Deploy server Phase 2** (if not yet done) to receive health/log messages
3. **Verify database updates** in server-side `clients` and `client_logs` tables
4. **Test dashboard UI** (Phase 4) to display health indicators
5. **Configure alerting** (email/Slack) for ERROR level messages
---
**Implementation Date**: 11. März 2026
**Part of**: Infoscreen 2025 Client Monitoring System
**Status**: Production Ready (with server Phase 2 integration)

View File

@@ -11,6 +11,7 @@ Digital signage system for Raspberry Pi that displays presentations, videos, and
- **MQTT Integration** - Real-time event management from central server - **MQTT Integration** - Real-time event management from central server
- **Group Management** - Organize clients into groups for targeted content - **Group Management** - Organize clients into groups for targeted content
- **Heartbeat Monitoring** - Regular status updates and screenshot dashboard - **Heartbeat Monitoring** - Regular status updates and screenshot dashboard
- **Client Process Monitoring** - Health-state bridge, crash/restart tracking, and monitoring log
- **Screenshot Dashboard** - Automatic screen capture with Wayland/X11 support, client-side compression - **Screenshot Dashboard** - Automatic screen capture with Wayland/X11 support, client-side compression
- **Multi-Content Support** - Presentations, videos, and web pages - **Multi-Content Support** - Presentations, videos, and web pages
- **Kiosk Mode** - Full-screen display with automatic startup - **Kiosk Mode** - Full-screen display with automatic startup
@@ -217,7 +218,13 @@ Notes:
- `loop` (boolean): loop playback indefinitely. - `loop` (boolean): loop playback indefinitely.
- `volume` (float): 0.01.0 (mapped internally to VLC's 0100 volume scale). - `volume` (float): 0.01.0 (mapped internally to VLC's 0100 volume scale).
- If `python-vlc` is not installed, the Display Manager will fall back to launching the external `vlc` binary. - If `python-vlc` is not installed, the Display Manager will fall back to launching the external `vlc` binary.
- The manager attempts to make the player window fullscreen and remove window decorations. For a truly panel-free fullscreen (no taskbar), run the Display Manager inside a minimal kiosk X session or a dedicated user session without a desktop panel (see the kiosk notes below). - Fullscreen behavior:
- External VLC fallback uses `--fullscreen`.
- `python-vlc` mode enforces fullscreen on startup and retries fullscreen toggling briefly because video outputs may attach asynchronously.
- For a truly panel-free fullscreen (no taskbar), run the Display Manager inside a minimal kiosk X session or a dedicated user session without a desktop panel.
- Monitoring PID behavior:
- External VLC fallback reports the external `vlc` process PID.
- `python-vlc` mode is in-process, so monitoring reports the `display_manager.py` runtime PID.
## 🌐 Web Events ## 🌐 Web Events
@@ -380,6 +387,9 @@ The MQTT client ([src/simclient.py](src/simclient.py)) downloads presentation fi
- `infoscreen/discovery` - Initial client announcement - `infoscreen/discovery` - Initial client announcement
- `infoscreen/{client_id}/heartbeat` - Regular status updates - `infoscreen/{client_id}/heartbeat` - Regular status updates
- `infoscreen/{client_id}/dashboard` - Screenshot images (base64) - `infoscreen/{client_id}/dashboard` - Screenshot images (base64)
- `infoscreen/{client_id}/health` - Process health state (`event_id`, process, pid, status)
- `infoscreen/{client_id}/logs/error` - Forwarded client error logs
- `infoscreen/{client_id}/logs/warn` - Forwarded client warning logs
#### Server → Client #### Server → Client
- `infoscreen/{client_id}/discovery_ack` - Server response with client ID - `infoscreen/{client_id}/discovery_ack` - Server response with client ID
@@ -493,6 +503,30 @@ This is the fastest workaround if hardware decode is not required or not availab
./scripts/test-mqtt.sh ./scripts/test-mqtt.sh
``` ```
### Monitoring and UTC timestamps
Client-side monitoring is implemented with a health-state bridge between `display_manager.py` and `simclient.py`.
- Health bridge file: `src/current_process_health.json`
- Local monitoring log: `logs/monitoring.log`
- Process states: `running`, `crashed`, `stopped`
- Restart tracking: bounded restart attempts per active event
UTC timestamp policy:
- `display_manager.log`, `simclient.log`, and `monitoring.log` are written in UTC (`...Z`)
- MQTT payload timestamps (heartbeat/dashboard/health/log messages) are UTC ISO timestamps
- Screenshot metadata timestamps are UTC ISO timestamps
This prevents daylight-saving and timezone drift issues across clients.
### VLC/PulseAudio warnings in remote sessions
Warnings such as `pulse audio output error: overflow, flushing` can appear when testing through remote desktop/audio forwarding (for example, NoMachine) or virtual/dummy display setups.
- If playback and audio are stable on a real HDMI display, this is usually non-fatal.
- If warnings appear only in remote sessions, treat them as environment-related rather than a core video playback bug.
### Screenshots not uploading ### Screenshots not uploading
**Check which session type you're running:** **Check which session type you're running:**

View File

@@ -23,6 +23,7 @@ from logging.handlers import RotatingFileHandler
from typing import Optional, Dict, List, IO from typing import Optional, Dict, List, IO
from urllib.parse import urlparse from urllib.parse import urlparse
import requests import requests
import psutil
import json as _json import json as _json
import threading import threading
import time as _time import time as _time
@@ -67,12 +68,15 @@ os.makedirs(os.path.dirname(LOG_PATH), exist_ok=True)
logging.basicConfig( logging.basicConfig(
level=getattr(logging, LOG_LEVEL.upper(), logging.INFO), level=getattr(logging, LOG_LEVEL.upper(), logging.INFO),
format="%(asctime)s [%(levelname)s] %(message)s", format="%(asctime)s.%(msecs)03dZ [%(levelname)s] %(message)s",
datefmt="%Y-%m-%dT%H:%M:%S",
handlers=[ handlers=[
RotatingFileHandler(LOG_PATH, maxBytes=2*1024*1024, backupCount=5), RotatingFileHandler(LOG_PATH, maxBytes=2*1024*1024, backupCount=5),
logging.StreamHandler() logging.StreamHandler()
] ]
) )
# Force all logging timestamps to UTC (affects %(asctime)s in all formatters).
logging.Formatter.converter = time.gmtime
# Log CEC mode after logging is configured # Log CEC mode after logging is configured
if ENV == "development": if ENV == "development":
@@ -82,6 +86,90 @@ elif CEC_ENABLED:
else: else:
logging.info("[CEC] HDMI-CEC disabled in configuration") logging.info("[CEC] HDMI-CEC disabled in configuration")
# Setup monitoring logger (separate from main display_manager.log for health/crash tracking)
MONITORING_LOG_PATH = os.path.join(os.path.dirname(__file__), "..", "logs", "monitoring.log")
os.makedirs(os.path.dirname(MONITORING_LOG_PATH), exist_ok=True)
monitoring_logger = logging.getLogger("monitoring")
monitoring_logger.setLevel(getattr(logging, LOG_LEVEL.upper(), logging.INFO))
monitoring_handler = RotatingFileHandler(MONITORING_LOG_PATH, maxBytes=5*1024*1024, backupCount=5)
monitoring_handler.setFormatter(logging.Formatter("%(asctime)s.%(msecs)03dZ [%(levelname)s] %(message)s", "%Y-%m-%dT%H:%M:%S"))
monitoring_logger.addHandler(monitoring_handler)
monitoring_logger.propagate = False # Don't duplicate to main logger
logging.info(f"Monitoring logger initialized: {MONITORING_LOG_PATH}")
# Health state file (shared bridge between display_manager and simclient)
HEALTH_STATE_FILE = os.path.join(os.path.dirname(__file__), "current_process_health.json")
class ProcessHealthState:
"""Track and persist process health state for monitoring integration"""
def __init__(self):
self.event_id = None
self.event_type = None
self.process_name = None
self.process_pid = None
self.status = "stopped" # running, crashed, starting, stopped
self.restart_count = 0
self.max_restarts = 3
self.last_update = datetime.now(timezone.utc).isoformat()
def to_dict(self) -> Dict:
return {
"event_id": self.event_id,
"event_type": self.event_type,
"current_process": self.process_name,
"process_pid": self.process_pid,
"process_status": self.status,
"restart_count": self.restart_count,
"timestamp": datetime.now(timezone.utc).isoformat()
}
def save(self):
"""Persist health state to file for simclient to read"""
try:
with open(HEALTH_STATE_FILE, "w") as f:
json.dump(self.to_dict(), f, indent=2)
except Exception as e:
monitoring_logger.error(f"Failed to save health state: {e}")
def update_running(self, event_id: str, event_type: str, process_name: str, pid: Optional[int]):
"""Mark process as running"""
self.event_id = event_id
self.event_type = event_type
self.process_name = process_name
self.process_pid = pid
self.status = "running"
self.restart_count = 0
self.save()
monitoring_logger.info(f"Process started: event_id={event_id} event_type={event_type} process={process_name} pid={pid}")
def update_crashed(self):
"""Mark process as crashed"""
self.status = "crashed"
self.save()
monitoring_logger.warning(f"Process crashed: event_id={self.event_id} event_type={self.event_type} process={self.process_name} restart_count={self.restart_count}/{self.max_restarts}")
def update_restart_attempt(self):
"""Increment restart counter and check if max exceeded"""
self.restart_count += 1
self.save()
if self.restart_count <= self.max_restarts:
monitoring_logger.warning(f"Restarting process: attempt {self.restart_count}/{self.max_restarts} for {self.process_name}")
else:
monitoring_logger.error(f"Max restart attempts exceeded for {self.process_name}: event_id={self.event_id}")
def update_stopped(self):
"""Mark process as stopped (normal event end)"""
self.status = "stopped"
self.event_id = None
self.event_type = None
self.process_name = None
self.process_pid = None
self.restart_count = 0
self.save()
monitoring_logger.info("Process stopped (event ended or no active event)")
class HDMICECController: class HDMICECController:
"""Controls HDMI-CEC to turn TV on/off automatically """Controls HDMI-CEC to turn TV on/off automatically
@@ -327,7 +415,7 @@ class DisplayProcess:
self.player = player self.player = player
self.event_type = event_type self.event_type = event_type
self.event_id = event_id self.event_id = event_id
self.start_time = datetime.now() self.start_time = datetime.now(timezone.utc)
self.log_file = log_file self.log_file = log_file
self.log_path = log_path self.log_path = log_path
@@ -473,6 +561,9 @@ class DisplayManager:
self.last_file_mtime: Optional[float] = None self.last_file_mtime: Optional[float] = None
self.running = True self.running = True
# Initialize health state tracking for process monitoring
self.health = ProcessHealthState()
# Initialize HDMI-CEC controller # Initialize HDMI-CEC controller
self.cec = HDMICECController( self.cec = HDMICECController(
enabled=CEC_ENABLED, enabled=CEC_ENABLED,
@@ -629,7 +720,9 @@ class DisplayManager:
time.sleep(1) time.sleep(1)
if self.current_process.is_running(): if self.current_process.is_running():
self.current_process.terminate(force=True) self.current_process.terminate(force=True)
# Update health state to reflect process stopped
self.health.update_stopped()
self.current_process = None self.current_process = None
self.current_event_data = None self.current_event_data = None
@@ -852,6 +945,9 @@ class DisplayManager:
event_id = self.get_event_identifier(event) event_id = self.get_event_identifier(event)
logging.info(f"Presentation started with PID: {process.pid}") logging.info(f"Presentation started with PID: {process.pid}")
# Update health state for monitoring
self.health.update_running(event_id, 'presentation', 'impressive', process.pid)
return DisplayProcess(process, 'presentation', event_id, log_file=impressive_log, log_path=impressive_log_path) return DisplayProcess(process, 'presentation', event_id, log_file=impressive_log, log_path=impressive_log_path)
except Exception as e: except Exception as e:
@@ -904,6 +1000,8 @@ class DisplayManager:
) )
event_id = self.get_event_identifier(event) event_id = self.get_event_identifier(event)
logging.info(f"Video started with PID: {process.pid} (external vlc)") logging.info(f"Video started with PID: {process.pid} (external vlc)")
# Update health state for monitoring
self.health.update_running(event_id, 'video', 'vlc', process.pid)
return DisplayProcess(process=process, event_type='video', event_id=event_id, log_file=video_log, log_path=video_log_path) return DisplayProcess(process=process, event_type='video', event_id=event_id, log_file=video_log, log_path=video_log_path)
else: else:
logging.error("No video player found (python-vlc or vlc binary)") logging.error("No video player found (python-vlc or vlc binary)")
@@ -911,7 +1009,24 @@ class DisplayManager:
# Use libvlc via python-vlc # Use libvlc via python-vlc
try: try:
instance = vlc.Instance() # Keep python-vlc behavior aligned with external vlc fullscreen mode.
instance = vlc.Instance('--fullscreen', '--no-video-title-show')
def _force_fullscreen(player_obj, label: str):
"""Retry fullscreen toggle because video outputs may attach asynchronously."""
if not player_obj:
return
def _worker():
for delay in (0.0, 0.4, 1.0):
if delay > 0:
time.sleep(delay)
try:
player_obj.set_fullscreen(True)
except Exception as e:
logging.debug(f"Could not force fullscreen for {label}: {e}")
threading.Thread(target=_worker, daemon=True).start()
autoplay = bool(video.get('autoplay', True)) autoplay = bool(video.get('autoplay', True))
loop_flag = bool(video.get('loop', False)) loop_flag = bool(video.get('loop', False))
@@ -957,10 +1072,14 @@ class DisplayManager:
if autoplay: if autoplay:
try: try:
mlp.play() mlp.play()
_force_fullscreen(mp, 'python-vlc MediaListPlayer')
except Exception as e: except Exception as e:
logging.error(f"Failed to play media list: {e}") logging.error(f"Failed to play media list: {e}")
event_id = self.get_event_identifier(event) event_id = self.get_event_identifier(event)
logging.info("Video started via python-vlc (MediaListPlayer)" ) runtime_pid = os.getpid()
logging.info(f"Video started via python-vlc (MediaListPlayer), runtime PID: {runtime_pid}")
# python-vlc runs in-process (no external vlc child), so publish this process PID.
self.health.update_running(event_id, 'video', 'vlc', runtime_pid)
return DisplayProcess(process=None, event_type='video', event_id=event_id, log_file=None, log_path=None, player=mlp) return DisplayProcess(process=None, event_type='video', event_id=event_id, log_file=None, log_path=None, player=mlp)
else: else:
@@ -976,11 +1095,15 @@ class DisplayManager:
if autoplay: if autoplay:
try: try:
mp.play() mp.play()
_force_fullscreen(mp, 'python-vlc MediaPlayer')
except Exception as e: except Exception as e:
logging.error(f"Failed to start media player: {e}") logging.error(f"Failed to start media player: {e}")
event_id = self.get_event_identifier(event) event_id = self.get_event_identifier(event)
logging.info("Video started via python-vlc (MediaPlayer)") runtime_pid = os.getpid()
logging.info(f"Video started via python-vlc (MediaPlayer), runtime PID: {runtime_pid}")
# python-vlc runs in-process (no external vlc child), so publish this process PID.
self.health.update_running(event_id, 'video', 'vlc', runtime_pid)
return DisplayProcess(process=None, event_type='video', event_id=event_id, log_file=None, log_path=None, player=mp) return DisplayProcess(process=None, event_type='video', event_id=event_id, log_file=None, log_path=None, player=mp)
except Exception as e: except Exception as e:
@@ -1137,6 +1260,10 @@ class DisplayManager:
event_id = self.get_event_identifier(event) event_id = self.get_event_identifier(event)
logging.info(f"Webpage started with PID: {process.pid}") logging.info(f"Webpage started with PID: {process.pid}")
# Update health state for monitoring (track chromium browser)
browser_name = 'chromium-browser' if self._command_exists('chromium-browser') else 'chromium'
self.health.update_running(event_id, etype, browser_name, process.pid)
# Inject auto-scroll JS via Chrome DevTools Protocol (CDP) if enabled and available # Inject auto-scroll JS via Chrome DevTools Protocol (CDP) if enabled and available
if autoscroll_enabled: if autoscroll_enabled:
@@ -1367,9 +1494,20 @@ class DisplayManager:
if self.current_process.event_type == 'presentation' and exit_code == 0: if self.current_process.event_type == 'presentation' and exit_code == 0:
logging.info("Presentation process ended with exit code 0 (likely normal completion).") logging.info("Presentation process ended with exit code 0 (likely normal completion).")
self.current_process = None self.current_process = None
# Update health state to show normal completion
self.health.update_stopped()
# Don't turn off TV yet - event might still be active # Don't turn off TV yet - event might still be active
return return
# Process crashed unexpectedly
self.health.update_crashed()
self.health.update_restart_attempt()
if self.health.restart_count > self.health.max_restarts:
logging.error(f"Max restart attempts exceeded - giving up on {self.health.process_name}")
self.current_process = None
return
logging.info("Restarting display process...") logging.info("Restarting display process...")
self.current_process = None self.current_process = None
# Don't turn off TV when restarting same event # Don't turn off TV when restarting same event
@@ -1413,10 +1551,8 @@ class DisplayManager:
# Log timezone information for debugging # Log timezone information for debugging
now_utc = datetime.now(timezone.utc) now_utc = datetime.now(timezone.utc)
now_local = datetime.now()
logging.info(f"Current time (UTC): {now_utc.strftime('%Y-%m-%d %H:%M:%S %Z')}") logging.info(f"Current time (UTC): {now_utc.strftime('%Y-%m-%d %H:%M:%S %Z')}")
logging.info(f"Current time (Local): {now_local.strftime('%Y-%m-%d %H:%M:%S')}") logging.info("Event times are expected in UTC format")
logging.info("Note: Event times are expected in UTC format")
while self.running: while self.running:
try: try:
@@ -1469,7 +1605,7 @@ class DisplayManager:
Maintains a 'latest.jpg' file and rotates older screenshots beyond SCREENSHOT_MAX_FILES. Maintains a 'latest.jpg' file and rotates older screenshots beyond SCREENSHOT_MAX_FILES.
""" """
try: try:
ts = datetime.now().strftime('%Y%m%d_%H%M%S') ts = datetime.now(timezone.utc).strftime('%Y%m%d_%H%M%S')
raw_path = os.path.join(self.screenshot_dir, f'screenshot_{ts}.png') raw_path = os.path.join(self.screenshot_dir, f'screenshot_{ts}.png')
final_path = os.path.join(self.screenshot_dir, f'screenshot_{ts}.jpg') final_path = os.path.join(self.screenshot_dir, f'screenshot_{ts}.jpg')

View File

@@ -15,7 +15,7 @@ import logging
from dotenv import load_dotenv from dotenv import load_dotenv
import requests import requests
import base64 import base64
from datetime import datetime from datetime import datetime, timezone
import threading import threading
from urllib.parse import urlsplit, urlunsplit, unquote from urllib.parse import urlsplit, urlunsplit, unquote
@@ -123,9 +123,26 @@ if DEBUG_MODE:
log_handlers.append(logging.StreamHandler()) log_handlers.append(logging.StreamHandler())
logging.basicConfig( logging.basicConfig(
level=getattr(logging, LOG_LEVEL.upper(), logging.INFO), level=getattr(logging, LOG_LEVEL.upper(), logging.INFO),
format="%(asctime)s [%(levelname)s] %(message)s", format="%(asctime)s.%(msecs)03dZ [%(levelname)s] %(message)s",
datefmt="%Y-%m-%dT%H:%M:%S",
handlers=log_handlers handlers=log_handlers
) )
# Force all logging timestamps to UTC (affects %(asctime)s in all formatters).
logging.Formatter.converter = time.gmtime
# Setup monitoring logger (separate for health/crash events, local rotation only)
MONITORING_LOG_PATH = os.path.join(os.path.dirname(__file__), "..", "logs", "monitoring.log")
os.makedirs(os.path.dirname(MONITORING_LOG_PATH), exist_ok=True)
monitoring_logger = logging.getLogger("monitoring")
monitoring_logger.setLevel(getattr(logging, LOG_LEVEL.upper(), logging.INFO))
monitoring_handler = RotatingFileHandler(MONITORING_LOG_PATH, maxBytes=5*1024*1024, backupCount=5)
monitoring_handler.setFormatter(logging.Formatter("%(asctime)s.%(msecs)03dZ [%(levelname)s] %(message)s", "%Y-%m-%dT%H:%M:%S"))
monitoring_logger.addHandler(monitoring_handler)
monitoring_logger.propagate = False # Don't duplicate to main logger
logging.info(f"Monitoring logger initialized: {MONITORING_LOG_PATH}")
# Health state file (written by display_manager, read by simclient)
HEALTH_STATE_FILE = os.path.join(os.path.dirname(__file__), "current_process_health.json")
discovered = False discovered = False
@@ -485,7 +502,7 @@ def get_latest_screenshot():
return { return {
"filename": os.path.basename(preferred_path), "filename": os.path.basename(preferred_path),
"data": screenshot_data, "data": screenshot_data,
"timestamp": datetime.fromtimestamp(file_stats.st_mtime).isoformat(), "timestamp": datetime.fromtimestamp(file_stats.st_mtime, tz=timezone.utc).isoformat(),
"size": file_stats.st_size "size": file_stats.st_size
} }
except Exception as e: except Exception as e:
@@ -514,7 +531,7 @@ def get_latest_screenshot():
info = { info = {
"filename": latest_file, "filename": latest_file,
"data": screenshot_data, "data": screenshot_data,
"timestamp": datetime.fromtimestamp(file_stats.st_mtime).isoformat(), "timestamp": datetime.fromtimestamp(file_stats.st_mtime, tz=timezone.utc).isoformat(),
"size": file_stats.st_size "size": file_stats.st_size
} }
logging.debug(f"Selected latest screenshot: {latest_file} ({file_stats.st_size} bytes)") logging.debug(f"Selected latest screenshot: {latest_file} ({file_stats.st_size} bytes)")
@@ -525,13 +542,82 @@ def get_latest_screenshot():
return None return None
def read_health_state():
"""Read the health state file written by display_manager"""
try:
if not os.path.exists(HEALTH_STATE_FILE):
return None
with open(HEALTH_STATE_FILE, 'r') as f:
return json.load(f)
except Exception as e:
logging.debug(f"Could not read health state file: {e}")
return None
def publish_health_message(client, client_id):
"""Publish health status to server via MQTT"""
try:
health = read_health_state()
if not health:
return # No active process
payload = {
"timestamp": datetime.now(timezone.utc).isoformat(),
"expected_state": {
"event_id": health.get("event_id")
},
"actual_state": {
"process": health.get("current_process"),
"pid": health.get("process_pid"),
"status": health.get("process_status")
}
}
topic = f"infoscreen/{client_id}/health"
res = client.publish(topic, json.dumps(payload), qos=1)
if res.rc == mqtt.MQTT_ERR_SUCCESS:
logging.debug(f"Health message published: {health.get('current_process')} status={health.get('process_status')}")
else:
logging.debug(f"Health publish failed with code: {res.rc}")
except Exception as e:
logging.debug(f"Error publishing health: {e}")
def publish_log_message(client, client_id, level: str, message: str, context: dict = None):
"""Publish log message to server via MQTT (only if level is ERROR or WARN, unless DEBUG_MODE)"""
try:
# Filter logs: only send ERROR/WARN to server, keep INFO/DEBUG local-only unless DEBUG_MODE
if level.upper() == "INFO" and not DEBUG_MODE:
return # Keep INFO logs local only in production
if level.upper() == "DEBUG":
return # DEBUG logs always local-only
topic = f"infoscreen/{client_id}/logs/{level.lower()}"
payload = {
"timestamp": datetime.now(timezone.utc).isoformat(),
"message": message,
"context": context or {}
}
res = client.publish(topic, json.dumps(payload), qos=1)
if res.rc == mqtt.MQTT_ERR_SUCCESS:
monitoring_logger.log(getattr(logging, level.upper(), logging.INFO), f"[MQTT] {message}")
else:
logging.debug(f"Log publish failed ({level}) with code: {res.rc}")
except Exception as e:
logging.debug(f"Error publishing log: {e}")
def send_screenshot_heartbeat(client, client_id): def send_screenshot_heartbeat(client, client_id):
"""Send heartbeat with screenshot to server for dashboard monitoring""" """Send heartbeat with screenshot to server for dashboard monitoring"""
try: try:
screenshot_info = get_latest_screenshot() screenshot_info = get_latest_screenshot()
# Also read health state and include in heartbeat
health = read_health_state()
heartbeat_data = { heartbeat_data = {
"timestamp": datetime.now().isoformat(), "timestamp": datetime.now(timezone.utc).isoformat(),
"client_id": client_id, "client_id": client_id,
"status": "alive", "status": "alive",
"screenshot": screenshot_info, "screenshot": screenshot_info,
@@ -542,6 +628,17 @@ def send_screenshot_heartbeat(client, client_id):
} }
} }
# Include health info if available (from display_manager)
if health:
heartbeat_data["process_health"] = {
"event_id": health.get("event_id"),
"event_type": health.get("event_type"),
"current_process": health.get("current_process"),
"process_pid": health.get("process_pid"),
"process_status": health.get("process_status"),
"restart_count": health.get("restart_count", 0)
}
# Send to dashboard monitoring topic # Send to dashboard monitoring topic
dashboard_topic = f"infoscreen/{client_id}/dashboard" dashboard_topic = f"infoscreen/{client_id}/dashboard"
payload = json.dumps(heartbeat_data) payload = json.dumps(heartbeat_data)
@@ -575,7 +672,7 @@ def screenshot_service_thread(client, client_id):
def main(): def main():
global discovered global discovered
print(f"[{datetime.now().isoformat()}] simclient.py: program started") print(f"[{datetime.now(timezone.utc).isoformat()}] simclient.py: program started")
logging.info("Client starting - deleting old event file if present") logging.info("Client starting - deleting old event file if present")
delete_event_file() delete_event_file()
@@ -840,6 +937,8 @@ def main():
result = client.publish(f"infoscreen/{client_id}/heartbeat", "alive", qos=0) result = client.publish(f"infoscreen/{client_id}/heartbeat", "alive", qos=0)
if result.rc == mqtt.MQTT_ERR_SUCCESS: if result.rc == mqtt.MQTT_ERR_SUCCESS:
logging.info("Heartbeat sent.") logging.info("Heartbeat sent.")
# Also send health and screenshot heartbeats
publish_health_message(client, client_id)
elif result.rc == mqtt.MQTT_ERR_NO_CONN: elif result.rc == mqtt.MQTT_ERR_NO_CONN:
logging.debug("Heartbeat publish returned NO_CONN; retrying in 2s...") logging.debug("Heartbeat publish returned NO_CONN; retrying in 2s...")
time.sleep(2) time.sleep(2)
@@ -847,6 +946,7 @@ def main():
retry = client.publish(f"infoscreen/{client_id}/heartbeat", "alive", qos=0) retry = client.publish(f"infoscreen/{client_id}/heartbeat", "alive", qos=0)
if retry.rc == mqtt.MQTT_ERR_SUCCESS: if retry.rc == mqtt.MQTT_ERR_SUCCESS:
logging.info("Heartbeat sent after retry.") logging.info("Heartbeat sent after retry.")
publish_health_message(client, client_id)
else: else:
logging.warning(f"Heartbeat publish failed after retry with code: {retry.rc}") logging.warning(f"Heartbeat publish failed after retry with code: {retry.rc}")
else: else: