feat(monitoring): add server-side client logging and health infrastructure

- add Alembic migration c1d2e3f4g5h6 for client monitoring:
  - create client_logs table with FK to clients.uuid and performance indexes
  - extend clients with process/health tracking fields
- extend data model with ClientLog, LogLevel, ProcessStatus, and ScreenHealthStatus
- enhance listener MQTT handling:
  - subscribe to logs and health topics
  - persist client logs from infoscreen/{uuid}/logs/{level}
  - process health payloads and enrich heartbeat-derived client state
- add monitoring API blueprint server/routes/client_logs.py:
  - GET /api/client-logs/<uuid>/logs
  - GET /api/client-logs/summary
  - GET /api/client-logs/recent-errors
  - GET /api/client-logs/test
- register client_logs blueprint in server/wsgi.py
- align compose/dev runtime for listener live-code execution
- add client-side implementation docs:
  - CLIENT_MONITORING_SPECIFICATION.md
  - CLIENT_MONITORING_IMPLEMENTATION_GUIDE.md
- update TECH-CHANGELOG.md and copilot-instructions.md:
  - document monitoring changes
  - codify post-release technical-notes/no-version-bump convention
2026-03-10 07:33:38 +00:00
parent 7746e26385
commit 3107d0f671
10 changed files with 2307 additions and 6 deletions


@@ -0,0 +1,972 @@
# Client-Side Monitoring Specification
**Version:** 1.0
**Date:** 2026-03-10
**For:** Infoscreen Client Implementation
**Server Endpoint:** `192.168.43.201:8000` (or your production server)
**MQTT Broker:** `192.168.43.201:1883` (or your production MQTT broker)
---
## 1. Overview
Each infoscreen client must implement health monitoring and logging capabilities to report status to the central server via MQTT.
### 1.1 Goals
- **Detect failures:** Process crashes, frozen screens, content mismatches
- **Provide visibility:** Real-time health status visible on server dashboard
- **Enable remote diagnosis:** Centralized log storage for debugging
- **Auto-recovery:** Attempt automatic restart on failure
### 1.2 Architecture
```
┌─────────────────────────────────────────┐
│            Infoscreen Client            │
│                                         │
│  ┌──────────────┐    ┌──────────────┐   │
│  │ Media Player │    │   Watchdog   │   │
│  │ (VLC/Chrome) │◄───│   Monitor    │   │
│  └──────────────┘    └──────┬───────┘   │
│                             │           │
│  ┌──────────────┐           │           │
│  │  Event Mgr   │           │           │
│  │  (receives   │           │           │
│  │  schedule)   │◄──────────┘           │
│  └──────┬───────┘                       │
│         │                               │
│  ┌──────▼───────────────────────┐       │
│  │ MQTT Client                  │       │
│  │ - Heartbeat (every 60s)      │       │
│  │ - Logs (error/warn/info)     │       │
│  │ - Health metrics (every 5s)  │       │
│  └──────┬───────────────────────┘       │
└─────────┼───────────────────────────────┘
          │ MQTT over TCP
          ▼
   ┌─────────────┐
   │ MQTT Broker │
   │  (server)   │
   └─────────────┘
```
---
## 2. MQTT Protocol Specification
### 2.1 Connection Parameters
```
Broker: 192.168.43.201 (or DNS hostname)
Port: 1883 (standard MQTT)
Protocol: MQTT v3.1.1
Client ID: "infoscreen-{client_uuid}"
Clean Session: false (persistent session: broker keeps subscriptions and queued QoS 1 messages)
Keep Alive: 60 seconds
Username/Password: (if configured on broker)
```
### 2.2 QoS Levels
- **Heartbeat:** QoS 0 (fire and forget; a lost heartbeat is superseded by the next one)
- **Logs (ERROR/WARN):** QoS 1 (at least once delivery, important)
- **Logs (INFO):** QoS 0 (optional, high volume)
- **Health metrics:** QoS 0 (frequent, latest value matters)
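The mapping above can be kept in one small helper so that every publish call chooses its QoS the same way. This is a minimal sketch; the function name `qos_for` and the `kind` labels are illustrative assumptions, not part of the server contract.

```python
from typing import Optional


def qos_for(kind: str, level: Optional[str] = None) -> int:
    """Return the MQTT QoS for a message, following the list in section 2.2.

    kind  -- "heartbeat", "health", or "log" (hypothetical labels)
    level -- log level; only consulted when kind == "log"
    """
    if kind == "log" and level in ("error", "warn"):
        return 1  # at-least-once delivery for important logs
    if kind in ("heartbeat", "health", "log"):
        return 0  # fire and forget
    raise ValueError(f"unknown message kind: {kind!r}")
```

A caller would pass the result straight to the publish call, for example `publish(topic, payload, qos=qos_for("log", "warn"))`.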
---
## 3. Topic Structure & Payload Formats
### 3.1 Log Messages
#### Topic Pattern:
```
infoscreen/{client_uuid}/logs/{level}
```
Where `{level}` is one of: `error`, `warn`, `info`
#### Payload Format (JSON):
```json
{
  "timestamp": "2026-03-10T07:30:00Z",
  "message": "Human-readable error description",
  "context": {
    "event_id": 42,
    "process": "vlc",
    "error_code": "NETWORK_TIMEOUT",
    "additional_key": "any relevant data"
  }
}
```
#### Field Specifications:
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `timestamp` | string (ISO 8601 UTC) | Yes | When the event occurred. Use `YYYY-MM-DDTHH:MM:SSZ` format |
| `message` | string | Yes | Human-readable description of the event (max 1000 chars) |
| `context` | object | No | Additional structured data (will be stored as JSON) |
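As an illustration of the topic pattern and field rules above, the following sketch builds one log message with the standard library only. The helper name `build_log_message` is hypothetical, and truncating over-long messages (rather than rejecting them) is an assumption; the specification only states the 1000-character limit.

```python
import json
from datetime import datetime, timezone

MAX_MESSAGE_CHARS = 1000  # limit taken from the field table


def build_log_message(client_uuid, level, message, context=None, now=None):
    """Return (topic, payload_json) for one log entry."""
    if level not in ("error", "warn", "info"):
        raise ValueError(f"unsupported level: {level!r}")
    now = now or datetime.now(timezone.utc)
    payload = {
        # YYYY-MM-DDTHH:MM:SSZ, always expressed in UTC
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Assumption: over-long messages are truncated, not rejected
        "message": message[:MAX_MESSAGE_CHARS],
    }
    if context:
        payload["context"] = context  # optional field, omitted when empty
    return f"infoscreen/{client_uuid}/logs/{level}", json.dumps(payload)
```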
#### Example Topics:
```
infoscreen/9b8d1856-ff34-4864-a726-12de072d0f77/logs/error
infoscreen/9b8d1856-ff34-4864-a726-12de072d0f77/logs/warn
infoscreen/9b8d1856-ff34-4864-a726-12de072d0f77/logs/info
```
#### When to Send Logs:
**ERROR (Always send):**
- Process crashed (VLC/Chromium/PDF viewer terminated unexpectedly)
- Content failed to load (404, network timeout, corrupt file)
- Hardware failure detected (display off, audio device missing)
- Exception caught in main event loop
- Maximum restart attempts exceeded
**WARN (Always send):**
- Process restarted automatically (after crash)
- High resource usage (CPU >80%, RAM >90%)
- Slow performance (frame drops, lag)
- Non-critical failures (screenshot capture failed, cache full)
- Fallback content displayed (primary source unavailable)
**INFO (Send in development, optional in production):**
- Process started successfully
- Event transition (switched from video to presentation)
- Content loaded successfully
- Watchdog service started/stopped
---
### 3.2 Health Metrics
#### Topic Pattern:
```
infoscreen/{client_uuid}/health
```
#### Payload Format (JSON):
```json
{
  "timestamp": "2026-03-10T07:30:00Z",
  "expected_state": {
    "event_id": 42,
    "event_type": "video",
    "media_file": "presentation.mp4",
    "started_at": "2026-03-10T07:15:00Z"
  },
  "actual_state": {
    "process": "vlc",
    "pid": 1234,
    "status": "running",
    "uptime_seconds": 900,
    "position": 45.3,
    "duration": 180.0
  },
  "health_metrics": {
    "screen_on": true,
    "last_frame_update": "2026-03-10T07:29:58Z",
    "frames_dropped": 2,
    "network_errors": 0,
    "cpu_percent": 15.3,
    "memory_mb": 234
  }
}
```
#### Field Specifications:
**expected_state:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `event_id` | integer | Yes | Current event ID from scheduler |
| `event_type` | string | Yes | `presentation`, `video`, `website`, `webuntis`, `message` |
| `media_file` | string | No | Filename or URL of current content |
| `started_at` | string (ISO 8601) | Yes | When this event started playing |
**actual_state:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `process` | string | Yes | `vlc`, `chromium`, `pdf_viewer`, `none` |
| `pid` | integer | No | Process ID (if running) |
| `status` | string | Yes | `running`, `crashed`, `starting`, `stopped` |
| `uptime_seconds` | integer | No | How long process has been running |
| `position` | float | No | Current playback position (seconds, for video/audio) |
| `duration` | float | No | Total content duration (seconds) |
**health_metrics:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `screen_on` | boolean | Yes | Is display powered on? |
| `last_frame_update` | string (ISO 8601) | No | Last time screen content changed |
| `frames_dropped` | integer | No | Video frames dropped (performance indicator) |
| `network_errors` | integer | No | Count of network errors in last interval |
| `cpu_percent` | float | No | CPU usage (0-100) |
| `memory_mb` | integer | No | RAM usage in megabytes |
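The required/optional split in the three tables above could be enforced on the client before publishing. The sketch below is one way to do that; `REQUIRED_FIELDS` and `build_health_payload` are hypothetical names, and raising `ValueError` on a missing field is an assumption about how a client might want to fail.

```python
import json
from datetime import datetime, timezone

# Required fields per section, copied from the tables above
REQUIRED_FIELDS = {
    "expected_state": ("event_id", "event_type", "started_at"),
    "actual_state": ("process", "status"),
    "health_metrics": ("screen_on",),
}


def build_health_payload(expected_state, actual_state, health_metrics, now=None):
    """Check required fields and return the health payload as a JSON string."""
    sections = {
        "expected_state": expected_state,
        "actual_state": actual_state,
        "health_metrics": health_metrics,
    }
    for name, fields in REQUIRED_FIELDS.items():
        missing = [field for field in fields if field not in sections[name]]
        if missing:
            raise ValueError(f"{name} is missing required fields: {missing}")
    now = now or datetime.now(timezone.utc)
    payload = {"timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"), **sections}
    return json.dumps(payload)
```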
#### Sending Frequency:
- **Normal operation:** Every 5 seconds
- **During startup/transition:** Every 1 second
- **After error:** Immediately + every 2 seconds until recovered
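The three frequencies above can be expressed as a single lookup that the health loop calls before each sleep. This is a sketch under assumptions: the `recovering` flag is a hypothetical piece of state the watchdog would set after an error and clear once the process is healthy again.

```python
def health_interval_seconds(status: str, recovering: bool = False) -> float:
    """Return the delay before the next health message."""
    if recovering:
        return 2.0  # after an error, until recovered
    if status == "starting":
        return 1.0  # startup or event transition
    return 5.0      # normal operation
```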
---
### 3.3 Enhanced Heartbeat
The existing heartbeat topic should be enhanced to include process status.
#### Topic Pattern:
```
infoscreen/{client_uuid}/heartbeat
```
#### Enhanced Payload Format (JSON):
```json
{
  "uuid": "9b8d1856-ff34-4864-a726-12de072d0f77",
  "timestamp": "2026-03-10T07:30:00Z",
  "current_process": "vlc",
  "process_pid": 1234,
  "process_status": "running",
  "current_event_id": 42
}
```
#### New Fields (add to existing heartbeat):
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `current_process` | string | No | Name of active media player process |
| `process_pid` | integer | No | Process ID |
| `process_status` | string | No | `running`, `crashed`, `starting`, `stopped` |
| `current_event_id` | integer | No | Event ID currently being displayed |
#### Sending Frequency:
- Keep existing: **Every 60 seconds**
- Include new fields if available
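Because all four new fields are optional, a client can simply leave out whatever it does not know yet. A minimal sketch, with the hypothetical helper name `build_heartbeat`:

```python
import json
from datetime import datetime, timezone


def build_heartbeat(client_uuid, process=None, pid=None, status=None,
                    event_id=None, now=None):
    """Return the heartbeat payload as JSON, including only known fields."""
    now = now or datetime.now(timezone.utc)
    payload = {
        "uuid": client_uuid,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    optional = {
        "current_process": process,
        "process_pid": pid,
        "process_status": status,
        "current_event_id": event_id,
    }
    # Skip fields that are not available instead of sending nulls
    payload.update({key: value for key, value in optional.items() if value is not None})
    return json.dumps(payload)
```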
---
## 4. Process Monitoring Requirements
### 4.1 Processes to Monitor
| Media Type | Process Name | How to Detect |
|------------|--------------|---------------|
| Video | `vlc` | `ps aux \| grep vlc` or `pgrep vlc` |
| Website/WebUntis | `chromium` or `chromium-browser` | `pgrep chromium` |
| PDF Presentation | `evince`, `okular`, or custom viewer | `pgrep {viewer_name}` |
### 4.2 Monitoring Checks (Every 5 seconds)
#### Check 1: Process Alive
```
Goal: Verify expected process is running
Method:
- Get list of running processes (psutil or `ps`)
- Check if expected process name exists
- Match PID if known
Result:
- If missing → status = "crashed"
- If found → status = "running"
Action on crash:
- Send ERROR log immediately
- Attempt restart (max 3 attempts)
- Send WARN log on each restart
- If max restarts exceeded → send ERROR log, display fallback
```
#### Check 2: Process Responsive
```
Goal: Detect frozen processes
Method:
- For VLC: Query HTTP interface (status.json)
- For Chromium: Use DevTools Protocol (CDP)
- For custom viewers: Check last screen update time
Result:
- If same frame >30 seconds → likely frozen
- If playback position not advancing → frozen
Action on freeze:
- Send WARN log
- Force refresh (reload page, seek video, next slide)
- If refresh fails → restart process
```
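The "playback position not advancing" rule from Check 2 can be sketched as a small detector that is fed one position sample per monitoring cycle. The class name `FreezeDetector` is hypothetical, and the sketch assumes the caller only feeds samples while the player reports that it is playing (a paused video would otherwise look frozen).

```python
import time


class FreezeDetector:
    """Flags a player as frozen when its playback position stops advancing."""

    def __init__(self, freeze_after_seconds=30.0):
        self.freeze_after = freeze_after_seconds
        self._last_position = None
        self._last_change = None

    def update(self, position, now=None):
        """Record a position sample; return True if the player looks frozen."""
        now = time.monotonic() if now is None else now
        if self._last_change is None or position != self._last_position:
            # First sample, or the position moved: restart the timer
            self._last_position = position
            self._last_change = now
            return False
        return (now - self._last_change) > self.freeze_after
```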
#### Check 3: Content Match
```
Goal: Verify correct content is displayed
Method:
- Compare expected event_id with actual media/URL
- Check scheduled time window (is event still active?)
Result:
- Mismatch → content error
Action:
- Send WARN log
- Reload correct event from scheduler
```
---
## 5. Process Control Interface Requirements
### 5.1 VLC Control
**Requirement:** Enable VLC HTTP interface for monitoring
**Launch Command:**
```bash
vlc --intf http --http-host 127.0.0.1 --http-port 8080 --http-password "vlc_password" \
--fullscreen --loop /path/to/video.mp4
```
**Status Query:**
```bash
curl http://127.0.0.1:8080/requests/status.json --user ":vlc_password"
```
**Response Fields to Monitor:**
```json
{
  "state": "playing",   // "playing", "paused", "stopped"
  "position": 0.25,     // 0.0-1.0 (25% through)
  "time": 45,           // seconds into playback
  "length": 180,        // total duration in seconds
  "volume": 256         // 0-512
}
```
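The same query can be made from Python with the standard library, and the response mapped onto the `actual_state` object from section 3.2. This is a sketch: the function names are hypothetical, and treating both `playing` and `paused` as `running` is an assumption about how the two vocabularies should line up.

```python
import base64
import json
import urllib.request


def fetch_vlc_status(password, host="127.0.0.1", port=8080, timeout=2.0):
    """Query VLC's HTTP interface; the user name is empty, as in the curl call."""
    request = urllib.request.Request(f"http://{host}:{port}/requests/status.json")
    token = base64.b64encode(f":{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def vlc_to_actual_state(status, pid=None):
    """Map VLC response fields onto the actual_state object."""
    alive = status.get("state") in ("playing", "paused")  # assumption, see lead-in
    state = {
        "process": "vlc",
        "status": "running" if alive else "stopped",
        "position": float(status.get("time", 0)),    # seconds into playback
        "duration": float(status.get("length", 0)),  # total seconds
    }
    if pid is not None:
        state["pid"] = pid
    return state
```

Note that `actual_state.position` is specified in seconds, so the sketch uses VLC's `time` field rather than its 0.0-1.0 `position` field.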
---
### 5.2 Chromium Control
**Requirement:** Enable Chrome DevTools Protocol (CDP)
**Launch Command:**
```bash
chromium --remote-debugging-port=9222 --kiosk --app=https://example.com
```
**Status Query:**
```bash
curl http://127.0.0.1:9222/json
```
**Response Fields to Monitor:**
```json
[
  {
    "url": "https://example.com",
    "title": "Page Title",
    "type": "page"
  }
]
```
**Advanced:** Use CDP WebSocket for events (page load, navigation, errors)
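For the basic (non-WebSocket) check, the `/json` response is enough to see whether the expected page is open. A minimal sketch, assuming prefix matching on the URL is acceptable (redirects or trailing slashes may need looser rules); `find_page_target` is a hypothetical helper name.

```python
import json


def find_page_target(targets_json, expected_url):
    """Return the first target of type "page" whose URL starts with expected_url."""
    for target in json.loads(targets_json):
        if target.get("type") == "page" and target.get("url", "").startswith(expected_url):
            return target
    return None  # expected page is not open
```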
---
### 5.3 PDF Viewer (Custom or Standard)
**Option A: Standard Viewer (e.g., Evince)**
- No built-in API
- Monitor via process check + screenshot comparison
**Option B: Custom Python Viewer**
- Implement REST API for status queries
- Track: current page, total pages, last transition time
---
## 6. Watchdog Service Architecture
### 6.1 Service Components
**Component 1: Process Monitor Thread**
```
Responsibilities:
- Check process alive every 5 seconds
- Detect crashes and frozen processes
- Attempt automatic restart
- Send health metrics via MQTT
State Machine:
IDLE → STARTING → RUNNING → (if crash) → RESTARTING → RUNNING
                                                    → (if max restarts) → FAILED
```
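The state machine above can be made explicit with an enum and a transition table, so that illegal jumps fail loudly. This is a sketch: the transitions marked as assumptions in the comments are not shown in the diagram and are only one plausible reading of the rest of this document.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    STARTING = "starting"
    RUNNING = "running"
    RESTARTING = "restarting"
    FAILED = "failed"


# Allowed transitions, read off the state machine above
TRANSITIONS = {
    State.IDLE: {State.STARTING},
    State.STARTING: {State.RUNNING},
    State.RUNNING: {State.RESTARTING, State.IDLE},   # RUNNING -> IDLE is an assumption (event ended)
    State.RESTARTING: {State.RUNNING, State.FAILED},
    State.FAILED: {State.STARTING},                  # assumption: a new scheduler event restarts the cycle
}


def transition(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```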
**Component 2: MQTT Publisher Thread**
```
Responsibilities:
- Maintain MQTT connection
- Send heartbeat every 60 seconds
- Send logs on-demand (queued from other components)
- Send health metrics every 5 seconds
- Reconnect on connection loss
```
**Component 3: Event Manager Integration**
```
Responsibilities:
- Receive event schedule from server
- Notify watchdog of expected process/content
- Launch media player processes
- Handle event transitions
```
### 6.2 Service Lifecycle
**On Startup:**
1. Load configuration (client UUID, MQTT broker, etc.)
2. Connect to MQTT broker
3. Send INFO log: "Watchdog service started"
4. Wait for first event from scheduler
**During Operation:**
1. Monitor loop runs every 5 seconds
2. Check expected vs actual process state
3. Send health metrics
4. Handle failures (log + restart)
**On Shutdown:**
1. Send INFO log: "Watchdog service stopping"
2. Gracefully stop monitored processes
3. Disconnect from MQTT
4. Exit cleanly
---
## 7. Auto-Recovery Logic
### 7.1 Restart Strategy
**Step 1: Detect Failure**
```
Trigger: Process not found in process list
Action:
- Log ERROR: "Process {name} crashed"
- Increment restart counter
- Check if within retry limit (max 3)
```
**Step 2: Attempt Restart**
```
If restart_attempts < MAX_RESTARTS:
- Log WARN: "Attempting restart ({attempt}/{MAX_RESTARTS})"
- Kill any zombie processes
- Wait 2 seconds (cooldown)
- Launch process with same parameters
- Wait 5 seconds for startup
- Verify process is running
- If success: reset restart counter, log INFO
- If fail: increment counter, repeat
```
**Step 3: Permanent Failure**
```
If restart_attempts >= MAX_RESTARTS:
- Log ERROR: "Max restart attempts exceeded, failing over"
- Display fallback content (static image with error message)
- Send notification to server (separate alert topic, optional)
- Wait for manual intervention or scheduler event change
```
### 7.2 Restart Cooldown
**Purpose:** Prevent rapid restart loops that waste resources
**Implementation:**
```
After each restart attempt:
- Wait 2 seconds before next restart
- After 3 failures: wait 30 seconds before trying again
- Reset counter on successful run >5 minutes
```
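The cooldown rules can be sketched as a small policy object that the watchdog consults before each restart. The class and method names are hypothetical, and the sketch reads "after 3 failures" as "every attempt beyond the third waits 30 seconds", which is only one possible interpretation.

```python
class RestartPolicy:
    """Tracks restart attempts and returns the wait before the next one."""

    def __init__(self, max_attempts=3, cooldown=2.0, long_cooldown=30.0,
                 stable_after=300.0):
        self.max_attempts = max_attempts
        self.cooldown = cooldown
        self.long_cooldown = long_cooldown
        self.stable_after = stable_after  # 5 minutes of successful operation
        self.attempts = 0

    def next_delay(self):
        """Register a restart attempt and return how long to wait before it."""
        self.attempts += 1
        if self.attempts > self.max_attempts:
            return self.long_cooldown
        return self.cooldown

    def record_uptime(self, uptime_seconds):
        """Reset the counter once the process has run long enough."""
        if uptime_seconds > self.stable_after:
            self.attempts = 0
```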
---
## 8. Resource Monitoring
### 8.1 System Metrics to Track
**CPU Usage:**
```
Method: Read /proc/stat or use psutil.cpu_percent()
Frequency: Every 5 seconds
Threshold: Warn if >80% for >60 seconds
```
**Memory Usage:**
```
Method: Read /proc/meminfo or use psutil.virtual_memory()
Frequency: Every 5 seconds
Threshold: Warn if >90% for >30 seconds
```
**Display Status:**
```
Method: Check DPMS state or xset query
Frequency: Every 30 seconds
Threshold: Error if display off (unexpected)
```
**Network Connectivity:**
```
Method: Ping server or check MQTT connection
Frequency: Every 60 seconds
Threshold: Warn if no server connectivity
```
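The CPU and memory thresholds above are "sustained" conditions: a single spike should not trigger a warning. One way to sketch that, using a hypothetical `SustainedThreshold` class that is fed one sample per monitoring cycle:

```python
class SustainedThreshold:
    """Fires once a value has stayed above a limit for longer than hold_seconds."""

    def __init__(self, limit, hold_seconds):
        self.limit = limit
        self.hold_seconds = hold_seconds
        self._above_since = None

    def update(self, value, now):
        """Record a sample taken at time `now` (seconds); return True if the alarm fires."""
        if value <= self.limit:
            self._above_since = None  # back to normal, reset the timer
            return False
        if self._above_since is None:
            self._above_since = now
        return (now - self._above_since) > self.hold_seconds


# Thresholds from this section
cpu_alarm = SustainedThreshold(limit=80.0, hold_seconds=60)
memory_alarm = SustainedThreshold(limit=90.0, hold_seconds=30)
```

The samples themselves would come from `psutil.cpu_percent()` and `psutil.virtual_memory().percent`, as named above.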
---
## 9. Development vs Production Mode
### 9.1 Development Mode
**Enable via:** Environment variable `DEBUG=true` or `ENV=development`
**Behavior:**
- Send INFO level logs
- More verbose logging to console
- Shorter monitoring intervals (faster feedback)
- Screenshot capture every 30 seconds
- No rate limiting on logs
### 9.2 Production Mode
**Enable via:** `ENV=production`
**Behavior:**
- Send only ERROR and WARN logs
- Minimal console output
- Standard monitoring intervals
- Screenshot capture every 60 seconds
- Rate limiting: max 10 logs per minute per level
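The production rate limit (max 10 logs per minute per level) could be implemented as a sliding window per level. This is a sketch with a hypothetical class name; whether dropped logs should be counted or summarized later is left open here.

```python
import time
from collections import defaultdict, deque


class LogRateLimiter:
    """Allows at most max_per_window logs per level within a sliding window."""

    def __init__(self, max_per_window=10, window_seconds=60.0):
        self.max_per_window = max_per_window
        self.window = window_seconds
        self._sent = defaultdict(deque)  # level -> timestamps of recent sends

    def allow(self, level, now=None):
        """Return True if a log at this level may be sent now."""
        now = time.monotonic() if now is None else now
        recent = self._sent[level]
        while recent and now - recent[0] >= self.window:
            recent.popleft()  # forget sends that left the window
        if len(recent) >= self.max_per_window:
            return False
        recent.append(now)
        return True
```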
---
## 10. Configuration File Format
### 10.1 Recommended Config: JSON
**File:** `/etc/infoscreen/config.json` or `~/.config/infoscreen/config.json`
```json
{
  "client": {
    "uuid": "9b8d1856-ff34-4864-a726-12de072d0f77",
    "hostname": "infoscreen-room-101"
  },
  "mqtt": {
    "broker": "192.168.43.201",
    "port": 1883,
    "username": "",
    "password": "",
    "keepalive": 60
  },
  "monitoring": {
    "enabled": true,
    "health_interval_seconds": 5,
    "heartbeat_interval_seconds": 60,
    "max_restart_attempts": 3,
    "restart_cooldown_seconds": 2
  },
  "logging": {
    "level": "INFO",
    "send_info_logs": false,
    "console_output": true,
    "local_log_file": "/var/log/infoscreen/watchdog.log"
  },
  "processes": {
    "vlc": {
      "http_port": 8080,
      "http_password": "vlc_password"
    },
    "chromium": {
      "debug_port": 9222
    }
  }
}
```
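A client might load the `monitoring` section of this file into a typed object with defaults, so that a partial config still works. The sketch below is an assumption about client structure, not a required design; `MonitoringConfig` and `load_monitoring_config` are hypothetical names, and unknown keys are silently ignored.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class MonitoringConfig:
    # Defaults mirror the example config above
    enabled: bool = True
    health_interval_seconds: int = 5
    heartbeat_interval_seconds: int = 60
    max_restart_attempts: int = 3
    restart_cooldown_seconds: int = 2


def load_monitoring_config(text):
    """Parse config JSON text and return its "monitoring" section with defaults."""
    raw = json.loads(text).get("monitoring", {})
    known = {field.name for field in fields(MonitoringConfig)}
    return MonitoringConfig(**{key: value for key, value in raw.items() if key in known})
```

Usage would be along the lines of `load_monitoring_config(open("/etc/infoscreen/config.json").read())`.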
---
## 11. Error Scenarios & Expected Behavior
### Scenario 1: VLC Crashes Mid-Video
```
1. Watchdog detects: process_status = "crashed"
2. Send ERROR log: "VLC process crashed"
3. Attempt 1: Restart VLC with same video, seek to last position
4. If success: Send INFO log "VLC restarted successfully"
5. If fail: Repeat 2 more times
6. After 3 failures: Send ERROR "Max restarts exceeded", show fallback
```
### Scenario 2: Network Timeout Loading Website
```
1. Chromium fails to load page (CDP reports error)
2. Send WARN log: "Page load timeout"
3. Attempt reload (Chromium refresh)
4. If success after 10s: Continue monitoring
5. If timeout again: Send ERROR, try restarting Chromium
```
### Scenario 3: Display Powers Off (Hardware)
```
1. DPMS check detects display off
2. Send ERROR log: "Display powered off"
3. Attempt to wake display (xset dpms force on)
4. If success: Send INFO log
5. If fail: Hardware issue, alert admin
```
### Scenario 4: High CPU Usage
```
1. CPU >80% for 60 seconds
2. Send WARN log: "High CPU usage: 85%"
3. Check if expected (e.g., video playback is normal)
4. If unexpected: investigate process causing it
5. If critical (>95%): consider restarting offending process
```
---
## 12. Testing & Validation
### 12.1 Manual Tests (During Development)
**Test 1: Process Crash Simulation**
```bash
# Start video, then kill VLC manually
killall vlc
# Expected: ERROR log sent within 5 seconds, followed by an automatic restart attempt
```
**Test 2: MQTT Connectivity**
```bash
# Subscribe to all client topics on server
mosquitto_sub -h 192.168.43.201 -t "infoscreen/{uuid}/#" -v
# Expected: See heartbeat every 60s, health every 5s
```
**Test 3: Log Levels**
```bash
# Trigger error condition and verify log appears in database
curl http://192.168.43.201:8000/api/client-logs/test
# Expected: See new log entry with correct level/message
```
### 12.2 Acceptance Criteria
**Client must:**
1. Send heartbeat every 60 seconds without gaps
2. Send ERROR log within 5 seconds of process crash
3. Attempt automatic restart (max 3 times)
4. Report health metrics every 5 seconds
5. Survive MQTT broker restart (reconnect automatically)
6. Survive network interruption (buffer logs, send when reconnected)
7. Use correct timestamp format (ISO 8601 UTC)
8. Only send logs under a client UUID that already exists in the server's `clients` table (FK constraint)
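Criterion 6 (buffer logs during a network interruption) can be sketched with a bounded queue that is drained once the connection returns. The class name, the 500-entry cap, and the `publish` callback contract are all assumptions made for illustration.

```python
from collections import deque


class OfflineLogBuffer:
    """Holds log messages while MQTT is down; the oldest entries are dropped first."""

    def __init__(self, max_entries=500):
        self._queue = deque(maxlen=max_entries)

    def __len__(self):
        return len(self._queue)

    def add(self, topic, payload, qos):
        self._queue.append((topic, payload, qos))

    def flush(self, publish):
        """Send buffered messages in order via publish(topic, payload, qos).

        publish must return True on success. On the first failure the
        remaining messages stay queued. Returns the number of messages sent.
        """
        sent = 0
        while self._queue:
            topic, payload, qos = self._queue[0]
            if not publish(topic, payload, qos):
                break
            self._queue.popleft()
            sent += 1
        return sent
```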
---
## 13. Python Libraries (Recommended)
**For process monitoring:**
- `psutil` - Cross-platform process and system utilities
**For MQTT:**
- `paho-mqtt` - Eclipse Paho MQTT client (use v2.x with Callback API v2)
**For VLC control:**
- `requests` - HTTP client for status queries
**For Chromium control:**
- `websocket-client` or `pychrome` - Chrome DevTools Protocol
**For datetime:**
- `datetime` (stdlib) - Use `datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")`; plain `isoformat()` emits microseconds and a `+00:00` offset instead of `Z`
**Example requirements.txt:**
```
paho-mqtt>=2.0.0
psutil>=5.9.0
requests>=2.31.0
python-dateutil>=2.8.0
```
---
## 14. Security Considerations
### 14.1 MQTT Security
- If broker requires auth, store credentials in config file with restricted permissions (`chmod 600`)
- Consider TLS/SSL for MQTT (port 8883) if on untrusted network
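- Use a unique client ID per device; two clients sharing an ID will disconnect each other. A client ID does not authenticate a client, so rely on broker credentials or TLS to prevent impersonation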
### 14.2 Process Control APIs
- VLC HTTP password should be random, not default
- Chromium debug port should bind to `127.0.0.1` only (not `0.0.0.0`)
- Restrict file system access for media player processes
### 14.3 Log Content
- **Do not log:** Passwords, API keys, personal data
- **Sanitize:** File paths (strip user directories), URLs (remove query params with tokens)
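The two sanitizing rules above might look like this in practice. This is a deliberately conservative sketch: it drops the whole query string and fragment rather than only token parameters, and it only handles `/home/<user>` paths. Both choices are assumptions, and the function names are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def sanitize_url(url):
    """Drop the query string and fragment, which may carry tokens."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def sanitize_path(path):
    """Replace a leading /home/<user> with ~ so user names stay out of logs."""
    return re.sub(r"^/home/[^/]+", "~", path)
```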
---
## 15. Performance Targets
| Metric | Target | Acceptable | Critical |
|--------|--------|------------|----------|
| Health check interval | 5s | 10s | 30s |
| Crash detection time | <5s | <10s | <30s |
| Restart time | <10s | <20s | <60s |
| MQTT publish latency | <100ms | <500ms | <2s |
| CPU usage (watchdog) | <2% | <5% | <10% |
| RAM usage (watchdog) | <50MB | <100MB | <200MB |
| Log message size | <1KB | <10KB | <100KB |
---
## 16. Troubleshooting Guide (For Client Development)
### Issue: Logs not appearing in server database
**Check:**
1. Is MQTT broker reachable? (`mosquitto_pub` test from client)
2. Is client UUID correct and exists in `clients` table?
3. Is timestamp format correct (ISO 8601 with 'Z')?
4. Check server listener logs for errors
### Issue: Health metrics not updating
**Check:**
1. Is health loop running? (check watchdog service status)
2. Is MQTT connected? (check connection status in logs)
3. Is payload JSON valid? (use JSON validator)
### Issue: Process restarts in loop
**Check:**
1. Is media file/URL accessible?
2. Is process command correct? (test manually)
3. Check process exit code (crash reason)
4. Increase restart cooldown to avoid rapid loops
---
## 17. Complete Message Flow Diagram
```
┌─────────────────────────────────────────────────────────┐
│                    Infoscreen Client                    │
│                                                         │
│  Event Occurs:                                          │
│  - Process crashed                                      │
│  - High CPU usage                                       │
│  - Content loaded                                       │
│                                                         │
│  ┌────────────────┐                                     │
│  │ Decision Logic │                                     │
│  │ - Is it ERROR? │                                     │
│  │ - Is it WARN?  │                                     │
│  │ - Is it INFO?  │                                     │
│  └────────┬───────┘                                     │
│           │                                             │
│           ▼                                             │
│  ┌──────────────────────────────────────┐               │
│  │ Build JSON Payload                   │               │
│  │ {                                    │               │
│  │   "timestamp": "...",                │               │
│  │   "message": "...",                  │               │
│  │   "context": {...}                   │               │
│  │ }                                    │               │
│  └────────┬─────────────────────────────┘               │
│           │                                             │
│           ▼                                             │
│  ┌──────────────────────────────────────┐               │
│  │ MQTT Publish                         │               │
│  │ Topic: infoscreen/{uuid}/logs/error  │               │
│  │ QoS: 1                               │               │
│  └────────┬─────────────────────────────┘               │
└───────────┼─────────────────────────────────────────────┘
            │ TCP/IP (MQTT Protocol)
            ▼
     ┌──────────────┐
     │ MQTT Broker  │
     │ (Mosquitto)  │
     └──────┬───────┘
            │ Topic: infoscreen/+/logs/#
            ▼
     ┌──────────────────────────────────┐
     │ Listener Service                 │
     │ (Python)                         │
     │                                  │
     │ - Parse JSON                     │
     │ - Validate UUID                  │
     │ - Store in database              │
     └──────┬───────────────────────────┘
            ▼
     ┌──────────────────────────────────┐
     │ MariaDB Database                 │
     │                                  │
     │ Table: client_logs               │
     │ - client_uuid                    │
     │ - timestamp                      │
     │ - level                          │
     │ - message                        │
     │ - context (JSON)                 │
     └──────┬───────────────────────────┘
            │ SQL Query
            ▼
     ┌──────────────────────────────────┐
     │ API Server (Flask)               │
     │                                  │
     │ GET /api/client-logs/{uuid}/logs │
     │ GET /api/client-logs/summary     │
     └──────┬───────────────────────────┘
            │ HTTP/JSON
            ▼
     ┌──────────────────────────────────┐
     │ Dashboard (React)                │
     │                                  │
     │ - Display logs                   │
     │ - Filter by level                │
     │ - Show health status             │
     └──────────────────────────────────┘
```
---
## 18. Quick Reference Card
### MQTT Topics Summary
```
infoscreen/{uuid}/logs/error → Critical failures
infoscreen/{uuid}/logs/warn → Non-critical issues
infoscreen/{uuid}/logs/info → Informational (dev mode)
infoscreen/{uuid}/health → Health metrics (every 5s)
infoscreen/{uuid}/heartbeat → Enhanced heartbeat (every 60s)
```
### JSON Timestamp Format
```python
from datetime import datetime, timezone
timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
# Example output: "2026-03-10T07:30:00Z"
# Note: isoformat() would emit microseconds and "+00:00" instead of "Z"
```
### Process Status Values
```
"running" - Process is alive and responding
"crashed" - Process terminated unexpectedly
"starting" - Process is launching (startup phase)
"stopped" - Process intentionally stopped
```
### Restart Logic
```
Max attempts: 3
Cooldown: 2 seconds between attempts
Reset: After 5 minutes of successful operation
```
---
## 19. Contact & Support
**Server API Documentation:**
- Base URL: `http://192.168.43.201:8000`
- Health check: `GET /health`
- Test logs: `GET /api/client-logs/test` (no auth)
- Full API docs: See `CLIENT_MONITORING_IMPLEMENTATION_GUIDE.md` on server
**MQTT Broker:**
- Host: `192.168.43.201`
- Port: `1883` (standard), `9001` (WebSocket)
- Test tool: `mosquitto_pub` / `mosquitto_sub`
**Database Schema:**
- Table: `client_logs`
- Foreign Key: `client_uuid` → `clients.uuid` (ON DELETE CASCADE)
- Constraint: UUID must exist in clients table before logging
**Server-Side Logs:**
```bash
# View listener logs (processes MQTT messages)
docker compose logs -f listener
# View server logs (API requests)
docker compose logs -f server
```
---
## 20. Appendix: Example Implementations
### A. Minimal Python Watchdog (Pseudocode)
```python
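import json
import subprocess
import time
from datetime import datetime, timezone

import paho.mqtt.client as mqtt
import psutil


class MinimalWatchdog:
    MAX_RESTARTS = 3

    def __init__(self, client_uuid, mqtt_broker):
        self.uuid = client_uuid
        # Client ID and persistent session as described in section 2.1
        self.mqtt_client = mqtt.Client(
            callback_api_version=mqtt.CallbackAPIVersion.VERSION2,
            client_id=f"infoscreen-{client_uuid}",
            clean_session=False,
        )
        self.mqtt_client.connect(mqtt_broker, 1883, 60)
        self.mqtt_client.loop_start()
        self.expected_process = None  # process name to watch, e.g. "vlc"
        self.launch_command = None    # argv list used to relaunch it (illustrative)
        self.restart_attempts = 0
        self.failed = False

    def send_log(self, level, message, context=None):
        topic = f"infoscreen/{self.uuid}/logs/{level}"
        payload = {
            # Section 3.1 format; isoformat() would emit "+00:00" instead of "Z"
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "message": message,
            "context": context or {},
        }
        qos = 0 if level == "info" else 1  # QoS levels from section 2.2
        self.mqtt_client.publish(topic, json.dumps(payload), qos=qos)

    def is_process_running(self, process_name):
        for proc in psutil.process_iter(["name"]):
            # name is None when psutil is denied access to the process
            if process_name in (proc.info["name"] or ""):
                return True
        return False

    def restart_process(self):
        self.restart_attempts += 1
        self.send_log("warn", f"Attempting restart ({self.restart_attempts}/{self.MAX_RESTARTS})")
        time.sleep(2)  # cooldown before relaunch
        if self.launch_command:
            subprocess.Popen(self.launch_command)

    def monitor_loop(self):
        while True:
            if self.expected_process and not self.failed:
                if not self.is_process_running(self.expected_process):
                    self.send_log("error", f"{self.expected_process} crashed")
                    if self.restart_attempts < self.MAX_RESTARTS:
                        self.restart_process()
                    else:
                        self.send_log("error", "Max restarts exceeded")
                        self.failed = True  # stop retrying until reconfigured
            time.sleep(5)


# Usage:
if __name__ == "__main__":
    watchdog = MinimalWatchdog("9b8d1856-ff34-4864-a726-12de072d0f77", "192.168.43.201")
    watchdog.expected_process = "vlc"
    watchdog.launch_command = ["vlc", "--fullscreen", "--loop", "/path/to/video.mp4"]
    watchdog.monitor_loop()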
```
---
**END OF SPECIFICATION**
Questions? Refer to:
- `CLIENT_MONITORING_IMPLEMENTATION_GUIDE.md` (server repo)
- Server API: `http://192.168.43.201:8000/api/client-logs/test`
- MQTT test: `mosquitto_sub -h 192.168.43.201 -t infoscreen/#`