infoscreen/CLIENT_MONITORING_SPECIFICATION.md
olafn 3107d0f671 feat(monitoring): add server-side client logging and health infrastructure
- add Alembic migration c1d2e3f4g5h6 for client monitoring:
  - create client_logs table with FK to clients.uuid and performance indexes
  - extend clients with process/health tracking fields
- extend data model with ClientLog, LogLevel, ProcessStatus, and ScreenHealthStatus
- enhance listener MQTT handling:
  - subscribe to logs and health topics
  - persist client logs from infoscreen/{uuid}/logs/{level}
  - process health payloads and enrich heartbeat-derived client state
- add monitoring API blueprint server/routes/client_logs.py:
  - GET /api/client-logs/<uuid>/logs
  - GET /api/client-logs/summary
  - GET /api/client-logs/recent-errors
  - GET /api/client-logs/test
- register client_logs blueprint in server/wsgi.py
- align compose/dev runtime for listener live-code execution
- add client-side implementation docs:
  - CLIENT_MONITORING_SPECIFICATION.md
  - CLIENT_MONITORING_IMPLEMENTATION_GUIDE.md
- update TECH-CHANGELOG.md and copilot-instructions.md:
  - document monitoring changes
  - codify post-release technical-notes/no-version-bump convention
2026-03-10 07:33:38 +00:00


Client-Side Monitoring Specification

Version: 1.0
Date: 2026-03-10
For: Infoscreen Client Implementation
Server Endpoint: 192.168.43.201:8000 (or your production server)
MQTT Broker: 192.168.43.201:1883 (or your production MQTT broker)


1. Overview

Each infoscreen client must implement health monitoring and logging capabilities to report status to the central server via MQTT.

1.1 Goals

  • Detect failures: Process crashes, frozen screens, content mismatches
  • Provide visibility: Real-time health status visible on server dashboard
  • Enable remote diagnosis: Centralized log storage for debugging
  • Auto-recovery: Attempt automatic restart on failure

1.2 Architecture

┌─────────────────────────────────────────┐
│         Infoscreen Client               │
│                                         │
│  ┌──────────────┐    ┌──────────────┐  │
│  │ Media Player │    │   Watchdog   │  │
│  │ (VLC/Chrome) │◄───│   Monitor    │  │
│  └──────────────┘    └──────┬───────┘  │
│                              │          │
│  ┌──────────────┐            │          │
│  │  Event Mgr   │            │          │
│  │  (receives   │            │          │
│  │   schedule)  │◄───────────┘          │
│  └──────┬───────┘                       │
│         │                               │
│  ┌──────▼───────────────────────┐      │
│  │     MQTT Client               │      │
│  │  - Heartbeat (every 60s)      │      │
│  │  - Logs (error/warn/info)     │      │
│  │  - Health metrics (every 5s)  │      │
│  └──────┬────────────────────────┘      │
└─────────┼──────────────────────────────┘
          │
          │ MQTT over TCP
          ▼
    ┌─────────────┐
    │ MQTT Broker │
    │  (server)   │
    └─────────────┘

2. MQTT Protocol Specification

2.1 Connection Parameters

Broker: 192.168.43.201 (or DNS hostname)
Port: 1883 (standard MQTT)
Protocol: MQTT v3.1.1
Client ID: "infoscreen-{client_uuid}"
Clean Session: false (retain subscriptions)
Keep Alive: 60 seconds
Username/Password: (if configured on broker)

2.2 QoS Levels

  • Heartbeat: QoS 0 (fire and forget; a missed heartbeat is superseded by the next one)
  • Logs (ERROR/WARN): QoS 1 (at least once delivery, important)
  • Logs (INFO): QoS 0 (optional, high volume)
  • Health metrics: QoS 0 (frequent, latest value matters)
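
The QoS assignments above can be captured in one small lookup so that publishers do not hard-code the values. This is a minimal sketch; the function name `qos_for` and the `kind` labels are illustrative assumptions, not part of the server contract.

```python
from typing import Optional


def qos_for(kind: str, level: Optional[str] = None) -> int:
    """Return the MQTT QoS for a message kind: 'heartbeat', 'health' or 'log'.

    Hypothetical helper; the mapping mirrors the QoS list in section 2.2.
    """
    if kind == "log":
        if level in ("error", "warn"):
            return 1  # at-least-once delivery for important logs
        if level == "info":
            return 0  # optional, high volume
        raise ValueError(f"unknown log level: {level!r}")
    if kind in ("heartbeat", "health"):
        return 0  # fire and forget; only the latest value matters
    raise ValueError(f"unknown message kind: {kind!r}")
```

Raising on unknown input, rather than defaulting to QoS 0, is a design choice that surfaces typos in level names early.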

3. Topic Structure & Payload Formats

3.1 Log Messages

Topic Pattern:

infoscreen/{client_uuid}/logs/{level}

Where {level} is one of: error, warn, info

Payload Format (JSON):

{
  "timestamp": "2026-03-10T07:30:00Z",
  "message": "Human-readable error description",
  "context": {
    "event_id": 42,
    "process": "vlc",
    "error_code": "NETWORK_TIMEOUT",
    "additional_key": "any relevant data"
  }
}

Field Specifications:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| timestamp | string (ISO 8601 UTC) | Yes | When the event occurred. Use YYYY-MM-DDTHH:MM:SSZ format |
| message | string | Yes | Human-readable description of the event (max 1000 chars) |
| context | object | No | Additional structured data (will be stored as JSON) |

Example Topics:

infoscreen/9b8d1856-ff34-4864-a726-12de072d0f77/logs/error
infoscreen/9b8d1856-ff34-4864-a726-12de072d0f77/logs/warn
infoscreen/9b8d1856-ff34-4864-a726-12de072d0f77/logs/info
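
A log message consists of a topic and a JSON payload built from the fields above. The sketch below shows one way to assemble both. The function name `build_log_message` is hypothetical, and truncating `message` to 1000 characters (rather than rejecting it) is an assumption about how to honour the stated limit.

```python
import json
from datetime import datetime, timezone

LOG_LEVELS = ("error", "warn", "info")
MAX_MESSAGE_CHARS = 1000  # limit from the field table


def build_log_message(client_uuid, level, message, context=None, now=None):
    """Return (topic, payload_json) for one log entry.

    `now` is an optional timezone-aware datetime, mainly useful for testing.
    """
    if level not in LOG_LEVELS:
        raise ValueError(f"level must be one of {LOG_LEVELS}")
    now = now or datetime.now(timezone.utc)
    payload = {
        # Spec format: YYYY-MM-DDTHH:MM:SSZ (UTC, no microseconds)
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "message": message[:MAX_MESSAGE_CHARS],  # assumption: truncate, do not reject
    }
    if context:
        payload["context"] = context  # optional field, omitted when empty
    topic = f"infoscreen/{client_uuid}/logs/{level}"
    return topic, json.dumps(payload)
```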

When to Send Logs:

ERROR (Always send):

  • Process crashed (VLC/Chromium/PDF viewer terminated unexpectedly)
  • Content failed to load (404, network timeout, corrupt file)
  • Hardware failure detected (display off, audio device missing)
  • Exception caught in main event loop
  • Maximum restart attempts exceeded

WARN (Always send):

  • Process restarted automatically (after crash)
  • High resource usage (CPU >80%, RAM >90%)
  • Slow performance (frame drops, lag)
  • Non-critical failures (screenshot capture failed, cache full)
  • Fallback content displayed (primary source unavailable)

INFO (Send in development, optional in production):

  • Process started successfully
  • Event transition (switched from video to presentation)
  • Content loaded successfully
  • Watchdog service started/stopped

3.2 Health Metrics

Topic Pattern:

infoscreen/{client_uuid}/health

Payload Format (JSON):

{
  "timestamp": "2026-03-10T07:30:00Z",
  "expected_state": {
    "event_id": 42,
    "event_type": "video",
    "media_file": "presentation.mp4",
    "started_at": "2026-03-10T07:15:00Z"
  },
  "actual_state": {
    "process": "vlc",
    "pid": 1234,
    "status": "running",
    "uptime_seconds": 900,
    "position": 45.3,
    "duration": 180.0
  },
  "health_metrics": {
    "screen_on": true,
    "last_frame_update": "2026-03-10T07:29:58Z",
    "frames_dropped": 2,
    "network_errors": 0,
    "cpu_percent": 15.3,
    "memory_mb": 234
  }
}

Field Specifications:

expected_state:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| event_id | integer | Yes | Current event ID from scheduler |
| event_type | string | Yes | presentation, video, website, webuntis, message |
| media_file | string | No | Filename or URL of current content |
| started_at | string (ISO 8601) | Yes | When this event started playing |

actual_state:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| process | string | Yes | vlc, chromium, pdf_viewer, none |
| pid | integer | No | Process ID (if running) |
| status | string | Yes | running, crashed, starting, stopped |
| uptime_seconds | integer | No | How long process has been running |
| position | float | No | Current playback position (seconds, for video/audio) |
| duration | float | No | Total content duration (seconds) |

health_metrics:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| screen_on | boolean | Yes | Is display powered on? |
| last_frame_update | string (ISO 8601) | No | Last time screen content changed |
| frames_dropped | integer | No | Video frames dropped (performance indicator) |
| network_errors | integer | No | Count of network errors in last interval |
| cpu_percent | float | No | CPU usage (0-100) |
| memory_mb | integer | No | RAM usage in megabytes |
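
Before publishing, a client can check a health payload against the required fields and allowed values listed in the three tables above. The following is a minimal sketch of such a check; `validate_health_payload` and the problem strings are illustrative assumptions, and the server may validate differently.

```python
# Required fields per section, taken from the field tables in section 3.2.
REQUIRED_FIELDS = {
    "expected_state": ("event_id", "event_type", "started_at"),
    "actual_state": ("process", "status"),
    "health_metrics": ("screen_on",),
}
# Enumerated values, also taken from the field tables.
ALLOWED_VALUES = {
    "expected_state": {
        "event_type": {"presentation", "video", "website", "webuntis", "message"},
    },
    "actual_state": {
        "process": {"vlc", "chromium", "pdf_viewer", "none"},
        "status": {"running", "crashed", "starting", "stopped"},
    },
}


def validate_health_payload(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload passes."""
    problems = []
    for section, fields in REQUIRED_FIELDS.items():
        data = payload.get(section)
        if not isinstance(data, dict):
            problems.append(f"missing section: {section}")
            continue
        for field in fields:
            if field not in data:
                problems.append(f"missing field: {section}.{field}")
        for field, allowed in ALLOWED_VALUES.get(section, {}).items():
            if field in data and data[field] not in allowed:
                problems.append(f"invalid value: {section}.{field}={data[field]!r}")
    return problems
```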

Sending Frequency:

  • Normal operation: Every 5 seconds
  • During startup/transition: Every 1 second
  • After error: Immediately + every 2 seconds until recovered

3.3 Enhanced Heartbeat

The existing heartbeat topic should be enhanced to include process status.

Topic Pattern:

infoscreen/{client_uuid}/heartbeat

Enhanced Payload Format (JSON):

{
  "uuid": "9b8d1856-ff34-4864-a726-12de072d0f77",
  "timestamp": "2026-03-10T07:30:00Z",
  "current_process": "vlc",
  "process_pid": 1234,
  "process_status": "running",
  "current_event_id": 42
}

New Fields (add to existing heartbeat):

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| current_process | string | No | Name of active media player process |
| process_pid | integer | No | Process ID |
| process_status | string | No | running, crashed, starting, stopped |
| current_event_id | integer | No | Event ID currently being displayed |

Sending Frequency:

  • Keep existing: Every 60 seconds
  • Include new fields if available
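
Because all new heartbeat fields are optional, the payload builder can simply omit the ones that are unknown. A minimal sketch follows; the function name `build_heartbeat` is hypothetical, and it assumes that `uuid` and `timestamp` are the fields already present in the existing heartbeat.

```python
def build_heartbeat(client_uuid, timestamp, current_process=None,
                    process_pid=None, process_status=None,
                    current_event_id=None):
    """Return the heartbeat payload, including only the new fields that are known."""
    # Assumption: these two keys form the existing heartbeat.
    payload = {"uuid": client_uuid, "timestamp": timestamp}
    optional = {
        "current_process": current_process,
        "process_pid": process_pid,
        "process_status": process_status,
        "current_event_id": current_event_id,
    }
    # Skip fields that are not available instead of sending nulls.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```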

4. Process Monitoring Requirements

4.1 Processes to Monitor

| Media Type | Process Name | How to Detect |
|------------|--------------|---------------|
| Video | vlc | ps aux \| grep vlc or pgrep vlc |
| Website/WebUntis | chromium or chromium-browser | pgrep chromium |
| PDF Presentation | evince, okular, or custom viewer | pgrep {viewer_name} |

4.2 Monitoring Checks (Every 5 seconds)

Check 1: Process Alive

Goal: Verify expected process is running
Method: 
  - Get list of running processes (psutil or `ps`)
  - Check if expected process name exists
  - Match PID if known
Result:
  - If missing → status = "crashed"
  - If found → status = "running"
Action on crash:
  - Send ERROR log immediately
  - Attempt restart (max 3 attempts)
  - Send WARN log on each restart
  - If max restarts exceeded → send ERROR log, display fallback
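
The decision part of Check 1 can be kept separate from the process listing, which makes it easy to test. In this sketch, `check_process_alive` is a hypothetical name and `processes` is assumed to be an iterable of `(pid, name)` pairs that the caller collects, for example with psutil.

```python
def check_process_alive(expected_name, processes, known_pid=None):
    """Return "running" or "crashed" for the expected process.

    processes: iterable of (pid, name) pairs; name may be None when the
    process list could not read it.
    """
    for pid, name in processes:
        if name and expected_name in name:
            # When a PID is known, the name match alone is not enough.
            if known_pid is None or pid == known_pid:
                return "running"
    return "crashed"
```

Substring matching lets `chromium` also match `chromium-browser`, at the cost of possible false positives for unrelated processes with similar names.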

Check 2: Process Responsive

Goal: Detect frozen processes
Method:
  - For VLC: Query HTTP interface (status.json)
  - For Chromium: Use DevTools Protocol (CDP)
  - For custom viewers: Check last screen update time
Result:
  - If same frame >30 seconds → likely frozen
  - If playback position not advancing → frozen
Action on freeze:
  - Send WARN log
  - Force refresh (reload page, seek video, next slide)
  - If refresh fails → restart process
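
Freeze detection reduces to tracking how long an observed value (playback position, frame hash, page number) has stayed the same. The class below is a minimal sketch of that idea; `FreezeDetector` is a hypothetical name, and applying the 30-second limit to playback position as well as to frames is an assumption.

```python
class FreezeDetector:
    """Flags a value that has not changed for longer than timeout_seconds."""

    def __init__(self, timeout_seconds=30.0):
        self.timeout = timeout_seconds
        self._last_value = None
        self._last_change = None  # time of the most recent change

    def update(self, value, now):
        """Record the latest observation; return True if it looks frozen.

        now: a monotonic time in seconds, e.g. time.monotonic().
        """
        if self._last_change is None or value != self._last_value:
            self._last_value = value
            self._last_change = now
            return False
        return (now - self._last_change) > self.timeout
```

A deliberately paused video would also be reported as frozen, so callers may want to skip the check while the player reports a paused state.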

Check 3: Content Match

Goal: Verify correct content is displayed
Method:
  - Compare expected event_id with actual media/URL
  - Check scheduled time window (is event still active?)
Result:
  - Mismatch → content error
Action:
  - Send WARN log
  - Reload correct event from scheduler

5. Process Control Interface Requirements

5.1 VLC Control

Requirement: Enable VLC HTTP interface for monitoring

Launch Command:

vlc --intf http --http-host 127.0.0.1 --http-port 8080 --http-password "vlc_password" \
    --fullscreen --loop /path/to/video.mp4

Status Query:

curl http://127.0.0.1:8080/requests/status.json --user ":vlc_password"

Response Fields to Monitor:

{
  "state": "playing",     // "playing", "paused", "stopped"
  "position": 0.25,       // 0.0-1.0 (25% through)
  "time": 45,             // seconds into playback
  "length": 180,          // total duration in seconds
  "volume": 256           // 0-512
}
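
Note that VLC's `position` is a 0.0-1.0 fraction, while the health payload's `position` is in seconds, so `time` and `length` are the fields to forward. The sketch below maps a status.json body to the actual_state fields from section 3.2. The function name is hypothetical, and treating any successful response as status "running" is an assumption.

```python
import json


def vlc_status_to_actual_state(status_json: str, pid=None) -> dict:
    """Map a VLC status.json body to actual_state fields."""
    data = json.loads(status_json)
    # Assumption: if the HTTP interface answered, the process is running.
    state = {"process": "vlc", "status": "running"}
    if pid is not None:
        state["pid"] = pid
    if "time" in data:
        state["position"] = float(data["time"])    # seconds into playback
    if "length" in data:
        state["duration"] = float(data["length"])  # total duration in seconds
    return state
```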

5.2 Chromium Control

Requirement: Enable Chrome DevTools Protocol (CDP)

Launch Command:

chromium --remote-debugging-port=9222 --kiosk --app=https://example.com

Status Query:

curl http://127.0.0.1:9222/json

Response Fields to Monitor:

[
  {
    "url": "https://example.com",
    "title": "Page Title",
    "type": "page"
  }
]

Advanced: Use CDP WebSocket for events (page load, navigation, errors)
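
For the basic check, the target list returned by the /json endpoint can be searched for a page that shows the expected URL. This is a minimal sketch; `find_page_target` is a hypothetical helper, and matching by URL prefix is an assumption that tolerates trailing paths or redirects within the same site.

```python
import json


def find_page_target(targets_json: str, expected_url: str):
    """Return the first 'page' target whose URL starts with expected_url, else None."""
    for target in json.loads(targets_json):
        if target.get("type") != "page":
            continue  # skip non-page targets such as workers or extensions
        if target.get("url", "").startswith(expected_url):
            return target
    return None
```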


5.3 PDF Viewer (Custom or Standard)

Option A: Standard Viewer (e.g., Evince)

  • No built-in API
  • Monitor via process check + screenshot comparison

Option B: Custom Python Viewer

  • Implement REST API for status queries
  • Track: current page, total pages, last transition time

6. Watchdog Service Architecture

6.1 Service Components

Component 1: Process Monitor Thread

Responsibilities:
  - Check process alive every 5 seconds
  - Detect crashes and frozen processes
  - Attempt automatic restart
  - Send health metrics via MQTT

State Machine:
  IDLE → STARTING → RUNNING → (if crash) → RESTARTING → RUNNING
                             → (if max restarts) → FAILED
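
The state machine above can be encoded as an explicit transition table so that illegal transitions fail loudly. This sketch is one reading of the diagram: allowing FAILED to be reached from both RUNNING and RESTARTING is an assumption, and the names `State`, `TRANSITIONS` and `transition` are illustrative.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    STARTING = "starting"
    RUNNING = "running"
    RESTARTING = "restarting"
    FAILED = "failed"


# Allowed transitions; FAILED is terminal in this sketch.
TRANSITIONS = {
    State.IDLE: {State.STARTING},
    State.STARTING: {State.RUNNING},
    State.RUNNING: {State.RESTARTING, State.FAILED},
    State.RESTARTING: {State.RUNNING, State.FAILED},
    State.FAILED: set(),
}


def transition(current: State, target: State) -> State:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```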

Component 2: MQTT Publisher Thread

Responsibilities:
  - Maintain MQTT connection
  - Send heartbeat every 60 seconds
  - Send logs on-demand (queued from other components)
  - Send health metrics every 5 seconds
  - Reconnect on connection loss

Component 3: Event Manager Integration

Responsibilities:
  - Receive event schedule from server
  - Notify watchdog of expected process/content
  - Launch media player processes
  - Handle event transitions

6.2 Service Lifecycle

On Startup:

  1. Load configuration (client UUID, MQTT broker, etc.)
  2. Connect to MQTT broker
  3. Send INFO log: "Watchdog service started"
  4. Wait for first event from scheduler

During Operation:

  1. Monitor loop runs every 5 seconds
  2. Check expected vs actual process state
  3. Send health metrics
  4. Handle failures (log + restart)

On Shutdown:

  1. Send INFO log: "Watchdog service stopping"
  2. Gracefully stop monitored processes
  3. Disconnect from MQTT
  4. Exit cleanly

7. Auto-Recovery Logic

7.1 Restart Strategy

Step 1: Detect Failure

Trigger: Process not found in process list
Action:
  - Log ERROR: "Process {name} crashed"
  - Increment restart counter
  - Check if within retry limit (max 3)

Step 2: Attempt Restart

If restart_attempts < MAX_RESTARTS:
  - Log WARN: "Attempting restart ({attempt}/{MAX_RESTARTS})"
  - Kill any leftover processes of the crashed player
  - Wait 2 seconds (cooldown)
  - Launch process with same parameters
  - Wait 5 seconds for startup
  - Verify process is running
  - If success: reset restart counter, log INFO
  - If fail: increment counter, repeat

Step 3: Permanent Failure

If restart_attempts >= MAX_RESTARTS:
  - Log ERROR: "Max restart attempts exceeded, failing over"
  - Display fallback content (static image with error message)
  - Send notification to server (separate alert topic, optional)
  - Wait for manual intervention or scheduler event change

7.2 Restart Cooldown

Purpose: Prevent rapid restart loops that waste resources

Implementation:

After each restart attempt:
  - Wait 2 seconds before next restart
  - After 3 failures: wait 30 seconds before trying again
  - Reset counter on successful run >5 minutes
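
The restart limits and cooldown rules can be held in one small bookkeeping object that the monitor loop consults. The class below is a sketch under the values stated above (3 attempts, 2-second cooldown, 30-second wait after 3 failures, reset after 5 minutes); `RestartPolicy` and its method names are hypothetical.

```python
class RestartPolicy:
    """Tracks restart attempts and the delay before the next one."""

    def __init__(self, max_attempts=3, cooldown=2.0, backoff=30.0, stable_after=300.0):
        self.max_attempts = max_attempts
        self.cooldown = cooldown          # seconds between normal attempts
        self.backoff = backoff            # seconds to wait once attempts are used up
        self.stable_after = stable_after  # uptime that counts as a successful run
        self.attempts = 0

    @property
    def exhausted(self):
        """True once the maximum number of attempts has been used."""
        return self.attempts >= self.max_attempts

    def next_delay(self):
        """Seconds to wait before the next restart attempt."""
        return self.backoff if self.exhausted else self.cooldown

    def record_attempt(self):
        self.attempts += 1

    def record_uptime(self, uptime_seconds):
        """Reset the counter after a run longer than stable_after seconds."""
        if uptime_seconds > self.stable_after:
            self.attempts = 0
```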

8. Resource Monitoring

8.1 System Metrics to Track

CPU Usage:

Method: Read /proc/stat or use psutil.cpu_percent()
Frequency: Every 5 seconds
Threshold: Warn if >80% for >60 seconds

Memory Usage:

Method: Read /proc/meminfo or use psutil.virtual_memory()
Frequency: Every 5 seconds
Threshold: Warn if >90% for >30 seconds

Display Status:

Method: Check DPMS state or xset query
Frequency: Every 30 seconds
Threshold: Error if display off (unexpected)

Network Connectivity:

Method: Ping server or check MQTT connection
Frequency: Every 60 seconds
Threshold: Warn if no server connectivity
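
The CPU and memory thresholds only trigger when the value stays high for a sustained period, which requires remembering when the limit was first exceeded. A minimal sketch of that logic follows; `SustainedThreshold` is a hypothetical name, and the caller is assumed to supply a monotonic clock value.

```python
class SustainedThreshold:
    """Reports True once a value has stayed above limit for more than hold_seconds."""

    def __init__(self, limit, hold_seconds):
        self.limit = limit
        self.hold_seconds = hold_seconds
        self._since = None  # time at which the value first exceeded the limit

    def update(self, value, now):
        """Feed one sample; now is a monotonic time in seconds."""
        if value <= self.limit:
            self._since = None  # back to normal, restart the timer
            return False
        if self._since is None:
            self._since = now
        return (now - self._since) > self.hold_seconds
```

For the values in this section, the CPU check would be `SustainedThreshold(80.0, 60.0)` and the memory check `SustainedThreshold(90.0, 30.0)`, each fed every 5 seconds.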

9. Development vs Production Mode

9.1 Development Mode

Enable via: Environment variable DEBUG=true or ENV=development

Behavior:

  • Send INFO level logs
  • More verbose logging to console
  • Shorter monitoring intervals (faster feedback)
  • Screenshot capture every 30 seconds
  • No rate limiting on logs

9.2 Production Mode

Enable via: ENV=production

Behavior:

  • Send only ERROR and WARN logs
  • Minimal console output
  • Standard monitoring intervals
  • Screenshot capture every 60 seconds
  • Rate limiting: max 10 logs per minute per level
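
The production rate limit (max 10 logs per minute per level) can be enforced with a sliding window of send times per level. The class below is a sketch; `LogRateLimiter` is a hypothetical name, and the choice of a sliding window rather than fixed one-minute buckets is an assumption.

```python
from collections import defaultdict, deque


class LogRateLimiter:
    """Allows at most max_per_window logs per level within window_seconds."""

    def __init__(self, max_per_window=10, window_seconds=60.0):
        self.max_per_window = max_per_window
        self.window = window_seconds
        self._sent = defaultdict(deque)  # level -> send times, oldest first

    def allow(self, level, now):
        """Return True and record the send if the log may go out now."""
        sent = self._sent[level]
        # Drop send times that have left the window.
        while sent and now - sent[0] >= self.window:
            sent.popleft()
        if len(sent) >= self.max_per_window:
            return False
        sent.append(now)
        return True
```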

10. Configuration File Format

File: /etc/infoscreen/config.json or ~/.config/infoscreen/config.json

{
  "client": {
    "uuid": "9b8d1856-ff34-4864-a726-12de072d0f77",
    "hostname": "infoscreen-room-101"
  },
  "mqtt": {
    "broker": "192.168.43.201",
    "port": 1883,
    "username": "",
    "password": "",
    "keepalive": 60
  },
  "monitoring": {
    "enabled": true,
    "health_interval_seconds": 5,
    "heartbeat_interval_seconds": 60,
    "max_restart_attempts": 3,
    "restart_cooldown_seconds": 2
  },
  "logging": {
    "level": "INFO",
    "send_info_logs": false,
    "console_output": true,
    "local_log_file": "/var/log/infoscreen/watchdog.log"
  },
  "processes": {
    "vlc": {
      "http_port": 8080,
      "http_password": "vlc_password"
    },
    "chromium": {
      "debug_port": 9222
    }
  }
}
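
When loading this file, it helps to fall back to the documented defaults for keys that are missing and to ignore keys the client does not know. The sketch below does this for the "monitoring" section only; `MonitoringConfig` and `load_monitoring_config` are hypothetical names, and the defaults are copied from the example above.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class MonitoringConfig:
    enabled: bool = True
    health_interval_seconds: int = 5
    heartbeat_interval_seconds: int = 60
    max_restart_attempts: int = 3
    restart_cooldown_seconds: int = 2


def load_monitoring_config(raw_json: str) -> MonitoringConfig:
    """Parse the "monitoring" section, applying defaults for missing keys."""
    section = json.loads(raw_json).get("monitoring", {})
    known_names = {f.name for f in fields(MonitoringConfig)}
    # Unknown keys are ignored so that newer config files still load.
    known = {key: value for key, value in section.items() if key in known_names}
    return MonitoringConfig(**known)
```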

11. Error Scenarios & Expected Behavior

Scenario 1: VLC Crashes Mid-Video

1. Watchdog detects: process_status = "crashed"
2. Send ERROR log: "VLC process crashed"
3. Attempt 1: Restart VLC with same video, seek to last position
4. If success: Send INFO log "VLC restarted successfully"
5. If fail: Repeat 2 more times
6. After 3 failures: Send ERROR "Max restarts exceeded", show fallback

Scenario 2: Network Timeout Loading Website

1. Chromium fails to load page (CDP reports error)
2. Send WARN log: "Page load timeout"
3. Attempt reload (Chromium refresh)
4. If success after 10s: Continue monitoring
5. If timeout again: Send ERROR, try restarting Chromium

Scenario 3: Display Powers Off (Hardware)

1. DPMS check detects display off
2. Send ERROR log: "Display powered off"
3. Attempt to wake display (xset dpms force on)
4. If success: Send INFO log
5. If fail: Hardware issue, alert admin

Scenario 4: High CPU Usage

1. CPU >80% for 60 seconds
2. Send WARN log: "High CPU usage: 85%"
3. Check if expected (e.g., video playback is normal)
4. If unexpected: investigate process causing it
5. If critical (>95%): consider restarting offending process

12. Testing & Validation

12.1 Manual Tests (During Development)

Test 1: Process Crash Simulation

# Start video, then kill VLC manually
killall vlc
# Expected: ERROR log sent within 5 seconds, followed by an automatic restart (target: under 10 seconds)

Test 2: MQTT Connectivity

# Subscribe to all client topics on server
mosquitto_sub -h 192.168.43.201 -t "infoscreen/{uuid}/#" -v
# Expected: See heartbeat every 60s, health every 5s

Test 3: Log Levels

# Trigger error condition and verify log appears in database
curl http://192.168.43.201:8000/api/client-logs/test
# Expected: See new log entry with correct level/message

12.2 Acceptance Criteria

Client must:

  1. Send heartbeat every 60 seconds without gaps
  2. Send ERROR log within 5 seconds of process crash
  3. Attempt automatic restart (max 3 times)
  4. Report health metrics every 5 seconds
  5. Survive MQTT broker restart (reconnect automatically)
  6. Survive network interruption (buffer logs, send when reconnected)
  7. Use correct timestamp format (ISO 8601 UTC)
  8. Only send logs for a client UUID that exists in the server's clients table (FK constraint)
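
Criterion 6 (buffer logs during a network interruption) can be met with a bounded in-memory queue that is flushed after reconnect. The class below is a minimal sketch: `OfflineLogBuffer` is a hypothetical name, the 500-entry cap is an arbitrary assumption, and `publish` stands for any callable that returns True on success.

```python
from collections import deque


class OfflineLogBuffer:
    """Holds unsent log messages; the oldest are dropped when the buffer is full."""

    def __init__(self, max_entries=500):
        self._queue = deque(maxlen=max_entries)

    def __len__(self):
        return len(self._queue)

    def add(self, topic, payload, qos):
        self._queue.append((topic, payload, qos))

    def flush(self, publish):
        """Send buffered entries oldest first; stop at the first failure.

        publish(topic, payload, qos) must return True on success.
        Returns the number of entries sent.
        """
        sent = 0
        while self._queue:
            topic, payload, qos = self._queue[0]
            if not publish(topic, payload, qos):
                break  # still offline; keep the entry for the next flush
            self._queue.popleft()
            sent += 1
        return sent
```

With paho-mqtt, `publish` could wrap `client.publish()` and compare the returned `rc` with `MQTT_ERR_SUCCESS`; that wiring is left out here to keep the sketch free of dependencies.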

13. Recommended Libraries

For process monitoring:

  • psutil - Cross-platform process and system utilities

For MQTT:

  • paho-mqtt - Official MQTT client (use v2.x with Callback API v2)

For VLC control:

  • requests - HTTP client for status queries

For Chromium control:

  • websocket-client or pychrome - Chrome DevTools Protocol

For datetime:

  • datetime (stdlib) - Use datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"); plain isoformat() adds microseconds and ends in "+00:00" rather than "Z"

Example requirements.txt:

paho-mqtt>=2.0.0
psutil>=5.9.0
requests>=2.31.0
python-dateutil>=2.8.0

14. Security Considerations

14.1 MQTT Security

  • If broker requires auth, store credentials in config file with restricted permissions (chmod 600)
  • Consider TLS/SSL for MQTT (port 8883) if on untrusted network
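  • Use a unique client ID per device; a duplicate ID makes the broker disconnect the existing session. A client ID alone does not prevent impersonation, so rely on broker authentication for that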

14.2 Process Control APIs

  • VLC HTTP password should be random, not default
  • Chromium debug port should bind to 127.0.0.1 only (not 0.0.0.0)
  • Restrict file system access for media player processes

14.3 Log Content

  • Do not log: Passwords, API keys, personal data
  • Sanitize: File paths (strip user directories), URLs (remove query params with tokens)
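
The two sanitizing rules can be applied with the standard library before a value is placed in a log's context. This is a sketch only: the function names are hypothetical, dropping the whole query string and any embedded credentials is a stricter simplification of "remove query params with tokens", and the home-directory pattern assumes Linux or macOS style paths.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def sanitize_url(url: str) -> str:
    """Strip credentials, query string and fragment from a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    # Simplification: drop the entire query rather than only token parameters.
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


def sanitize_path(path: str) -> str:
    """Replace a leading user directory such as /home/<name> with ~."""
    return re.sub(r"^/(home|Users)/[^/]+", "~", path)
```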

15. Performance Targets

| Metric | Target | Acceptable | Critical |
|--------|--------|------------|----------|
| Health check interval | 5s | 10s | 30s |
| Crash detection time | <5s | <10s | <30s |
| Restart time | <10s | <20s | <60s |
| MQTT publish latency | <100ms | <500ms | <2s |
| CPU usage (watchdog) | <2% | <5% | <10% |
| RAM usage (watchdog) | <50MB | <100MB | <200MB |
| Log message size | <1KB | <10KB | <100KB |

16. Troubleshooting Guide (For Client Development)

Issue: Logs not appearing in server database

Check:

  1. Is MQTT broker reachable? (mosquitto_pub test from client)
  2. Is client UUID correct and exists in clients table?
  3. Is timestamp format correct (ISO 8601 with 'Z')?
  4. Check server listener logs for errors

Issue: Health metrics not updating

Check:

  1. Is health loop running? (check watchdog service status)
  2. Is MQTT connected? (check connection status in logs)
  3. Is payload JSON valid? (use JSON validator)

Issue: Process restarts in loop

Check:

  1. Is media file/URL accessible?
  2. Is process command correct? (test manually)
  3. Check process exit code (crash reason)
  4. Increase restart cooldown to avoid rapid loops

17. Complete Message Flow Diagram

┌─────────────────────────────────────────────────────────┐
│                    Infoscreen Client                     │
│                                                          │
│  Event Occurs:                                           │
│    - Process crashed                                     │
│    - High CPU usage                                      │
│    - Content loaded                                      │
│                                                          │
│  ┌────────────────┐                                     │
│  │ Decision Logic │                                     │
│  │  - Is it ERROR?│                                     │
│  │  - Is it WARN? │                                     │
│  │  - Is it INFO? │                                     │
│  └────────┬───────┘                                     │
│           │                                              │
│           ▼                                              │
│  ┌────────────────────────────────┐                    │
│  │ Build JSON Payload              │                    │
│  │ {                               │                    │
│  │   "timestamp": "...",           │                    │
│  │   "message": "...",             │                    │
│  │   "context": {...}              │                    │
│  │ }                               │                    │
│  └────────┬───────────────────────┘                    │
│           │                                              │
│           ▼                                              │
│  ┌────────────────────────────────┐                    │
│  │ MQTT Publish                    │                    │
│  │ Topic: infoscreen/{uuid}/logs/error                 │
│  │ QoS: 1                          │                    │
│  └────────┬───────────────────────┘                    │
└───────────┼──────────────────────────────────────────┘
            │
            │ TCP/IP (MQTT Protocol)
            │
            ▼
     ┌──────────────┐
     │ MQTT Broker  │
     │ (Mosquitto)  │
     └──────┬───────┘
            │
            │ Topic: infoscreen/+/logs/#
            │
            ▼
     ┌──────────────────────────────┐
     │   Listener Service            │
     │   (Python)                    │
     │                               │
     │  - Parse JSON                 │
     │  - Validate UUID              │
     │  - Store in database          │
     └──────┬───────────────────────┘
            │
            ▼
     ┌──────────────────────────────┐
     │   MariaDB Database            │
     │                               │
     │   Table: client_logs          │
     │   - client_uuid               │
     │   - timestamp                 │
     │   - level                     │
     │   - message                   │
     │   - context (JSON)            │
     └──────┬───────────────────────┘
            │
            │ SQL Query
            │
            ▼
     ┌──────────────────────────────┐
     │   API Server (Flask)          │
     │                               │
     │   GET /api/client-logs/{uuid}/logs
     │   GET /api/client-logs/summary
     └──────┬───────────────────────┘
            │
            │ HTTP/JSON
            │
            ▼
     ┌──────────────────────────────┐
     │   Dashboard (React)           │
     │                               │
     │   - Display logs              │
     │   - Filter by level           │
     │   - Show health status        │
     └───────────────────────────────┘

18. Quick Reference Card

MQTT Topics Summary

infoscreen/{uuid}/logs/error    → Critical failures
infoscreen/{uuid}/logs/warn     → Non-critical issues
infoscreen/{uuid}/logs/info     → Informational (dev mode)
infoscreen/{uuid}/health        → Health metrics (every 5s)
infoscreen/{uuid}/heartbeat     → Enhanced heartbeat (every 60s)

JSON Timestamp Format

from datetime import datetime, timezone
timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
# Example output: "2026-03-10T07:30:00Z"
# Note: isoformat() would add microseconds and end in "+00:00" instead of "Z"

Process Status Values

"running"  - Process is alive and responding
"crashed"  - Process terminated unexpectedly
"starting" - Process is launching (startup phase)
"stopped"  - Process intentionally stopped

Restart Logic

Max attempts: 3
Cooldown: 2 seconds between attempts
Reset: After 5 minutes of successful operation

19. Contact & Support

Server API Documentation:

  • Base URL: http://192.168.43.201:8000
  • Health check: GET /health
  • Test logs: GET /api/client-logs/test (no auth)
  • Full API docs: See CLIENT_MONITORING_IMPLEMENTATION_GUIDE.md on server

MQTT Broker:

  • Host: 192.168.43.201
  • Port: 1883 (standard), 9001 (WebSocket)
  • Test tool: mosquitto_pub / mosquitto_sub

Database Schema:

  • Table: client_logs
  • Foreign Key: client_uuid → clients.uuid (ON DELETE CASCADE)
  • Constraint: UUID must exist in clients table before logging

Server-Side Logs:

# View listener logs (processes MQTT messages)
docker compose logs -f listener

# View server logs (API requests)
docker compose logs -f server

20. Appendix: Example Implementations

A. Minimal Python Watchdog (Pseudocode)

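import json
import subprocess
import time
from datetime import datetime, timezone

import psutil
import paho.mqtt.client as mqtt


class MinimalWatchdog:
    """Illustrative sketch only: no freeze detection, health metrics or log buffering."""

    MAX_RESTARTS = 3
    RESTART_COOLDOWN = 2  # seconds to wait before relaunching
    CHECK_INTERVAL = 5    # seconds between process checks

    def __init__(self, client_uuid, mqtt_broker, command):
        self.uuid = client_uuid
        self.command = command              # argv list used to (re)launch the player
        self.expected_process = command[0]  # process name to look for
        self.restart_attempts = 0
        self.failed = False                 # set once MAX_RESTARTS is exceeded
        self.mqtt_client = mqtt.Client(
            callback_api_version=mqtt.CallbackAPIVersion.VERSION2,
            client_id=f"infoscreen-{client_uuid}",
        )
        self.mqtt_client.connect(mqtt_broker, 1883, 60)
        self.mqtt_client.loop_start()

    def send_log(self, level, message, context=None):
        topic = f"infoscreen/{self.uuid}/logs/{level}"
        payload = {
            # Spec format: YYYY-MM-DDTHH:MM:SSZ
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "message": message,
            "context": context or {},
        }
        # ERROR/WARN use QoS 1, INFO uses QoS 0 (see section 2.2)
        qos = 0 if level == "info" else 1
        self.mqtt_client.publish(topic, json.dumps(payload), qos=qos)

    def is_process_running(self, process_name):
        for proc in psutil.process_iter(["name"]):
            # The name can be None when access to the process is denied
            if process_name in (proc.info["name"] or ""):
                return True
        return False

    def restart_process(self):
        self.restart_attempts += 1
        self.send_log("warn", f"Attempting restart ({self.restart_attempts}/{self.MAX_RESTARTS})")
        time.sleep(self.RESTART_COOLDOWN)
        subprocess.Popen(self.command)

    def monitor_loop(self):
        while True:
            if not self.failed and not self.is_process_running(self.expected_process):
                self.send_log("error", f"{self.expected_process} crashed")
                if self.restart_attempts < self.MAX_RESTARTS:
                    self.restart_process()
                else:
                    # Report once, then stop retrying until manual intervention
                    self.send_log("error", "Max restarts exceeded")
                    self.failed = True
            time.sleep(self.CHECK_INTERVAL)


# Usage (the VLC command line is an example; adjust it to the actual player setup):
watchdog = MinimalWatchdog(
    "9b8d1856-ff34-4864-a726-12de072d0f77",
    "192.168.43.201",
    ["vlc", "--fullscreen", "--loop", "/path/to/video.mp4"],
)
watchdog.monitor_loop()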

END OF SPECIFICATION

Questions? Refer to:

  • CLIENT_MONITORING_IMPLEMENTATION_GUIDE.md (server repo)
  • Server API: http://192.168.43.201:8000/api/client-logs/test
  • MQTT test: mosquitto_sub -h 192.168.43.201 -t infoscreen/#