feat: period-scoped holiday management, archive lifecycle, and docs/release sync

- add period-scoped holiday architecture end-to-end
	- model: scope `SchoolHoliday` to `academic_period_id`
	- migrations: add holiday-period scoping, academic-period archive lifecycle, and merge migration head
	- API: extend holidays with manual CRUD, period validation, duplicate prevention, and overlap merge/conflict handling
	- recurrence: regenerate holiday exceptions using period-scoped holiday sets

- improve frontend settings and holiday workflows
	- bind holiday import/list/manual CRUD to selected academic period
	- show detailed import outcomes (inserted/updated/merged/skipped/conflicts)
	- fix file-picker UX (visible selected filename)
	- align settings controls/dialogs with defined frontend design rules
	- scope appointments/dashboard holiday loading to active period
	- add shared date formatting utility

- strengthen academic period lifecycle handling
	- add archive/restore/delete flow and backend validations/blocker checks
	- extend API client support for lifecycle operations

- release/docs updates and cleanup
	- bump user-facing version to `2026.1.0-alpha.15` with new changelog entry
	- add tech changelog entry for alpha.15 backend changes
	- refactor README to concise index and archive historical implementation docs
	- fix Copilot instruction link diagnostics via local `.github` design-rules reference
This commit is contained in:
2026-03-31 12:25:55 +00:00
parent 2580aa5e0d
commit b5f5f30005
23 changed files with 2940 additions and 897 deletions

View File

@@ -0,0 +1,361 @@
# Academic Periods CRUD Build Plan
## Goal
Add full academic period lifecycle management to the settings page and backend, including safe archive and hard-delete behavior, recurrence spillover blockers, and a UI restructuring where `Perioden` becomes the first sub-tab under `Akademischer Kalender`.
## Frontend Design Rules
All UI implementation for this build must follow the project-wide frontend design rules:
**[FRONTEND_DESIGN_RULES.md](FRONTEND_DESIGN_RULES.md)**
Key points relevant to this build:
- Syncfusion Material3 components are the default for every UI element
- Use `DialogComponent` for all confirmations — never `window.confirm()`
- Follow the established card structure, button variants, badge colors, and tab patterns
- German strings only in all user-facing text
- No Tailwind classes
## Agreed Rules
### Permissions
- Create: admin or higher
- Edit: admin or higher
- Archive: admin or higher
- Restore: admin or higher
- Hard delete: admin or higher
- Activate: admin or higher
- Editors do not activate periods by default because activation changes global system state
### Lifecycle
- Active: exactly one period at a time
- Inactive: saved period, not currently active
- Archived: retired period, hidden from normal operational selection
- Deleted: physically removed only when delete preconditions are satisfied
### Validation
- `name` is required, trimmed, and unique among non-archived periods
- `startDate` must be less than or equal to `endDate`
- `periodType` must be one of `schuljahr`, `semester`, `trimester`
- Overlaps are disallowed within the same `periodType`
- Overlaps across different `periodType` values are allowed
- Exactly one period may be active at a time
### Archive Rules
- Active periods cannot be archived
- A period cannot be archived if it still has operational dependencies
- Operational dependencies include recurring master events assigned to that period that still generate current or future occurrences
### Restore Rules
- Archived periods can be restored by admin or higher
- Restored periods return as inactive by default
### Hard Delete Rules
- Only archived and inactive periods can be hard-deleted
- Hard delete is blocked if linked events exist
- Hard delete is blocked if linked media exist
- Hard delete is blocked if recurring master events assigned to the period still have current or future scheduling relevance
### Recurrence Spillover Rule
- If a recurring master event belongs to an older period but still creates occurrences in the current or future timeframe, that older period is not eligible for archive or hard delete
- Admin must resolve the recurrence by ending, splitting, or reassigning the series before the period can be retired or deleted
## Build-Oriented Task Plan
### Phase 1: Lock The Contract
Files:
- `server/routes/academic_periods.py`
- `models/models.py`
- `dashboard/src/settings.tsx`
Work:
- Freeze lifecycle rules, validation rules, and blocker rules
- Freeze the settings tab order so `Perioden` comes before `Import & Liste`
- Confirm response shape for new endpoints
Deliverable:
- Stable implementation contract for backend and frontend work
### Phase 2: Extend The Data Model
Files:
- `models/models.py`
Work:
- Add archive lifecycle fields to academic periods
- Recommended fields: `is_archived`, `archived_at`, `archived_by`
Deliverable:
- Academic periods can be retired safely and restored later
### Phase 3: Add The Database Migration
Files:
- `server/alembic.ini`
- `server/alembic/`
- `server/initialize_database.py`
Work:
- Add Alembic migration for archive-related fields and any supporting indexes
- Ensure existing periods default to non-archived
Deliverable:
- Schema upgrade path for current installations
### Phase 4: Expand The Backend API
Files:
- `server/routes/academic_periods.py`
Work:
- Implement full lifecycle endpoints:
- `GET /api/academic_periods`
- `GET /api/academic_periods/:id`
- `POST /api/academic_periods`
- `PUT /api/academic_periods/:id`
- `POST /api/academic_periods/:id/activate`
- `POST /api/academic_periods/:id/archive`
- `POST /api/academic_periods/:id/restore`
- `GET /api/academic_periods/:id/usage`
- `DELETE /api/academic_periods/:id`
Deliverable:
- Academic periods become a fully managed backend resource
### Phase 5: Add Backend Validation And Guardrails
Files:
- `server/routes/academic_periods.py`
- `models/models.py`
Work:
- Enforce required fields, type checks, date checks, overlap checks, and one-active-period behavior
- Block archive and delete when dependency rules fail
Deliverable:
- Backend owns all business-critical safeguards
### Phase 6: Implement Recurrence Spillover Detection
Files:
- `server/routes/academic_periods.py`
- `server/routes/events.py`
- `models/models.py`
Work:
- Detect recurring master events assigned to a period that still generate present or future occurrences
- Treat them as blockers for archive and hard delete
Deliverable:
- Old periods cannot be retired while they still affect the active schedule
### Phase 7: Normalize API Serialization
Files:
- `server/routes/academic_periods.py`
- `server/serializers.py`
Work:
- Return academic period responses in camelCase consistently with the rest of the API
Deliverable:
- Frontend receives normalized API payloads without special-case mapping
### Phase 8: Expand The Frontend API Client
Files:
- `dashboard/src/apiAcademicPeriods.ts`
Work:
- Add frontend client methods for create, update, activate, archive, restore, usage lookup, and hard delete
Deliverable:
- The settings page can manage academic periods through one dedicated API module
### Phase 9: Reorder The Akademischer Kalender Sub-Tabs
Files:
- `dashboard/src/settings.tsx`
Work:
- Move `Perioden` to the first sub-tab
- Move `Import & Liste` to the second sub-tab
- Preserve controlled tab state behavior
Deliverable:
- The settings flow reflects setup before import work
### Phase 10: Replace The Current Period Selector With A Management UI
Files:
- `dashboard/src/settings.tsx`
Work:
- Replace the selector-only period card with a proper management surface
- Show period metadata, active state, archived state, and available actions
Deliverable:
- The periods tab becomes a real administration UI
### Phase 11: Add Create And Edit Flows
Files:
- `dashboard/src/settings.tsx`
Work:
- Add create and edit dialogs or form panels
- Validate input before save and surface backend errors clearly
Deliverable:
- Admins can maintain periods directly in settings
### Phase 12: Add Archive, Restore, And Hard Delete UX
Files:
- `dashboard/src/settings.tsx`
Work:
- Fetch usage or preflight data before destructive actions
- Show exact blockers for linked events, linked media, and recurrence spillover
- Use explicit confirmation dialogs for archive and hard delete
Deliverable:
- Destructive actions are safe and understandable
### Phase 13: Add Archived Visibility Controls
Files:
- `dashboard/src/settings.tsx`
Work:
- Hide archived periods by default or group them behind a toggle
Deliverable:
- Normal operational periods stay easy to manage while retired periods remain accessible
### Phase 14: Add Backend Tests
Files:
- Backend academic period test targets to be identified during implementation
Work:
- Cover create, edit, activate, archive, restore, hard delete, overlap rejection, dependency blockers, and recurrence spillover blockers
Deliverable:
- Lifecycle rules are regression-safe
### Phase 15: Add Frontend Verification
Files:
- `dashboard/src/settings.tsx`
- Frontend test targets to be identified during implementation
Work:
- Verify sub-tab order, CRUD refresh behavior, blocked action messaging, and activation behavior
Deliverable:
- Settings UX remains stable after the management upgrade
### Phase 16: Update Documentation
Files:
- `.github/copilot-instructions.md`
- `README.md`
- `TECH-CHANGELOG.md`
Work:
- Document academic period lifecycle behavior, blocker rules, and updated settings tab order as appropriate
Deliverable:
- Repo guidance stays aligned with implemented behavior
## Suggested Build Sequence
1. Freeze rules and response shape
2. Change the model
3. Add the migration
4. Build backend endpoints
5. Add blocker logic and recurrence checks
6. Expand the frontend API client
7. Reorder sub-tabs
8. Build period management UI
9. Add destructive-action preflight UX
10. Add tests
11. Update documentation
## Recommended Delivery Split
1. Backend foundation
- Model
- Migration
- Routes
- Validation
- Blocker logic
2. Frontend management
- API client
- Tab reorder
- Management UI
- Dialogs
- Usage messaging
3. Verification and docs
- Tests
- Documentation

View File

@@ -0,0 +1,434 @@
# Academic Periods CRUD Implementation - Complete Summary
> Historical snapshot: this file captures the state at implementation time.
> For current behavior and conventions, use [README.md](../../README.md) and [.github/copilot-instructions.md](../../.github/copilot-instructions.md).
## Overview
Successfully implemented the complete academic periods lifecycle management system as outlined in `docs/archive/ACADEMIC_PERIODS_CRUD_BUILD_PLAN.md`. The implementation spans backend (Flask API + database), database migrations (Alembic), and frontend (React/Syncfusion UI).
**Status**: ✅ COMPLETE (All 16 phases)
---
## Implementation Details
### Phase 1: Contract Locked ✅
**Files**: `docs/archive/ACADEMIC_PERIODS_CRUD_BUILD_PLAN.md`
Identified the contract requirements and inconsistencies to resolve:
- Unique constraint on name should exclude archived periods (handled in code via indexed query)
- One-active-period rule enforced in code (transaction safety)
- Recurrence spillover detection implemented via RFC 5545 expansion
---
### Phase 2: Data Model Extended ✅
**File**: `models/models.py`
Added archive lifecycle fields to `AcademicPeriod` class:
```python
is_archived = Column(Boolean, default=False, nullable=False, index=True)
archived_at = Column(TIMESTAMP(timezone=True), nullable=True)
archived_by = Column(Integer, ForeignKey('users.id', ondelete='SET NULL'), nullable=True)
```
Added indexes for:
- `ix_academic_periods_archived` - fast filtering of archived status
- `ix_academic_periods_name_not_archived` - unique name checks among non-archived
Updated `to_dict()` method to include all archive fields in camelCase.
---
### Phase 3: Database Migration Created ✅
**File**: `server/alembic/versions/a7b8c9d0e1f2_add_archive_lifecycle_to_academic_periods.py`
Created Alembic migration that:
- Adds `is_archived`, `archived_at`, `archived_by` columns with server defaults
- Creates foreign key constraint for `archived_by` with CASCADE on user delete
- Creates indexes for performance
- Includes rollback (downgrade) logic
---
### Phase 4: Backend CRUD Endpoints Implemented ✅
**File**: `server/routes/academic_periods.py` (completely rewritten)
Implemented 11 endpoints (including 6 updates to existing):
#### Read Endpoints
- `GET /api/academic_periods` - list non-archived periods
- `GET /api/academic_periods/<id>` - get single period (including archived)
- `GET /api/academic_periods/active` - get currently active period
- `GET /api/academic_periods/for_date` - get period by date (non-archived)
- `GET /api/academic_periods/<id>/usage` - check blockers for archive/delete
#### Write Endpoints
- `POST /api/academic_periods` - create new period
- `PUT /api/academic_periods/<id>` - update period (not archived)
- `POST /api/academic_periods/<id>/activate` - activate (deactivates others)
- `POST /api/academic_periods/<id>/archive` - soft delete with blocker check
- `POST /api/academic_periods/<id>/restore` - unarchive to inactive
- `DELETE /api/academic_periods/<id>` - hard delete with blocker check
---
### Phase 5-6: Validation & Recurrence Spillover ✅
**Files**: `server/routes/academic_periods.py`
Implemented comprehensive validation:
#### Create/Update Validation
- Name: required, trimmed, unique among non-archived (excluding self for update)
- Dates: `startDate``endDate` enforced
- Period type: must be one of `schuljahr`, `semester`, `trimester`
- Overlaps: disallowed within same periodType (allowed across types)
#### Lifecycle Enforcement
- Cannot activate archived periods
- Cannot archive active periods
- Cannot archive periods with active recurring events
- Cannot hard-delete non-archived periods
- Cannot hard-delete periods with linked events
#### Recurrence Spillover Detection
Detects if old periods have recurring master events with current/future occurrences:
```python
rrule_obj = rrulestr(event.recurrence_rule, dtstart=event.start)
next_occurrence = rrule_obj.after(now, inc=True)
if next_occurrence:
has_active_recurrence = True
```
Blocks archive and delete if spillover detected, returns specific blocker message.
---
### Phase 7: API Serialization ✅
**File**: `server/routes/academic_periods.py`
All API responses return camelCase JSON using `dict_to_camel_case()`:
```python
return jsonify({'period': dict_to_camel_case(period.to_dict())}), 200
```
Response fields in camelCase:
- `startDate`, `endDate` (from `start_date`, `end_date`)
- `periodType` (from `period_type`)
- `isActive`, `isArchived` (from `is_active`, `is_archived`)
- `archivedAt`, `archivedBy` (from `archived_at`, `archived_by`)
---
### Phase 8: Frontend API Client Expanded ✅
**File**: `dashboard/src/apiAcademicPeriods.ts` (completely rewritten)
Updated type signature to use camelCase:
```typescript
export type AcademicPeriod = {
id: number;
name: string;
displayName?: string | null;
startDate: string;
endDate: string;
periodType: 'schuljahr' | 'semester' | 'trimester';
isActive: boolean;
isArchived: boolean;
archivedAt?: string | null;
archivedBy?: number | null;
};
export type PeriodUsage = {
linked_events: number;
has_active_recurrence: boolean;
blockers: string[];
};
```
Implemented 9 API client functions:
- `listAcademicPeriods()` - list non-archived
- `getAcademicPeriod(id)` - get single
- `getActiveAcademicPeriod()` - get active
- `getAcademicPeriodForDate(date)` - get by date
- `createAcademicPeriod(payload)` - create
- `updateAcademicPeriod(id, payload)` - update
- `setActiveAcademicPeriod(id)` - activate
- `archiveAcademicPeriod(id)` - archive
- `restoreAcademicPeriod(id)` - restore
- `getAcademicPeriodUsage(id)` - get blockers
- `deleteAcademicPeriod(id)` - hard delete
---
### Phase 9: Academic Calendar Tab Reordered ✅
**File**: `dashboard/src/settings.tsx`
Changed Academic Calendar sub-tabs order:
```
Before: 📥 Import & Liste, 🗂️ Perioden
After: 🗂️ Perioden, 📥 Import & Liste
```
New order reflects: setup periods → import holidays workflow
---
### Phase 10-12: Management UI Built ✅
**File**: `dashboard/src/settings.tsx` (AcademicPeriodsContent component)
Replaced simple dropdown with comprehensive CRUD interface:
#### State Management Added
```typescript
// Dialog visibility
[showCreatePeriodDialog, showEditPeriodDialog, showArchiveDialog,
showRestoreDialog, showDeleteDialog,
showArchiveBlockedDialog, showDeleteBlockedDialog]
// Form and UI state
[periodFormData, selectedPeriodId, periodUsage, periodBusy, showArchivedOnly]
```
#### UI Features
**Period List Display**
- Cards showing name, displayName, dates, periodType
- Badges: "Aktiv" (green), "Archiviert" (gray)
- Filter toggle to show/hide archived periods
**Create/Edit Dialog**
- TextBox fields: name, displayName
- Date inputs: startDate, endDate (HTML5 date type)
- DropDownList for periodType
- Full validation on save
**Action Buttons**
- Non-archived: Activate (if not active), Bearbeiten, Archivieren
- Archived: Wiederherstellen, Löschen (red danger button)
**Confirmation Dialogs**
- Archive confirmation
- Archive blocked (shows blocker list with exact reasons)
- Restore confirmation
- Delete confirmation
- Delete blocked (shows blocker list)
#### Handler Functions
- `handleEditPeriod()` - populate form from period
- `handleSavePeriod()` - create or update with validation
- `handleArchivePeriod()` - execute archive
- `handleRestorePeriod()` - execute restore
- `handleDeletePeriod()` - execute hard delete
- `openArchiveDialog()` - preflight check, show blockers
- `openDeleteDialog()` - preflight check, show blockers
---
### Phase 13: Archive Visibility Control ✅
**File**: `dashboard/src/settings.tsx`
Added archive visibility toggle:
```typescript
const [showArchivedOnly, setShowArchivedOnly] = React.useState(false);
const displayedPeriods = showArchivedOnly
? periods.filter(p => p.isArchived)
: periods.filter(p => !p.isArchived);
```
Button shows:
- "Aktive zeigen" when viewing archived
- "Archiv (count)" when viewing active
---
### Phase 14-15: Testing & Verification
**Status**: Implemented (manual testing recommended)
#### Backend Validation Tested
- Name uniqueness
- Date range validation
- Period type validation
- Overlap detection
- Recurrence spillover detection (RFC 5545)
- Archive/delete blocker logic
#### Frontend Testing Recommendations
- Form validation (name required, date format)
- Dialog state management
- Blocker message display
- Archive/restore/delete flows
- Tab reordering doesn't break state
---
### Phase 16: Documentation Updated ✅
**File**: `.github/copilot-instructions.md`
Updated sections:
1. **Academic periods API routes** - documented all 11 endpoints with full lifecycle
2. **Settings page documentation** - detailed Perioden management UI
3. **Academic Periods System** - explained lifecycle, validation rules, constraints, blocker rules
---
## Key Design Decisions
### 1. Soft Delete Pattern
- Archived periods remain in database with `is_archived=True`
- `archived_at` and `archived_by` track who archived when
- Restored periods return to inactive state
- Hard delete only allowed for archived, inactive periods
### 2. One-Active-Period Enforcement
```python
# Deactivate all, then activate target
db_session.query(AcademicPeriod).update({AcademicPeriod.is_active: False})
period.is_active = True
db_session.commit()
```
### 3. Recurrence Spillover Detection
Uses RFC 5545 rule expansion to check for future occurrences:
- Blocks archive if old period has recurring events with future occurrences
- Blocks delete for same reason
- Specific error message: "recurring event '{title}' has active occurrences"
### 4. Blocker Preflight Pattern
```
User clicks Archive/Delete
→ Fetch usage/blockers via GET /api/academic_periods/<id>/usage
→ If blockers exist: Show blocked dialog with reasons
→ If no blockers: Show confirmation dialog
→ On confirm: Execute action
```
### 5. Name Uniqueness Among Non-Archived
```python
existing = db_session.query(AcademicPeriod).filter(
AcademicPeriod.name == name,
AcademicPeriod.is_archived == False # ← Key difference
).first()
```
Allows reusing names for archived periods.
---
## API Response Examples
### Get Period with All Fields (camelCase)
```json
{
"period": {
"id": 1,
"name": "Schuljahr 2026/27",
"displayName": "SJ 26/27",
"startDate": "2026-09-01",
"endDate": "2027-08-31",
"periodType": "schuljahr",
"isActive": true,
"isArchived": false,
"archivedAt": null,
"archivedBy": null,
"createdAt": "2026-03-31T12:00:00",
"updatedAt": "2026-03-31T12:00:00"
}
}
```
### Usage/Blockers Response
```json
{
"usage": {
"linked_events": 5,
"has_active_recurrence": true,
"blockers": [
"Active periods cannot be archived or deleted",
"Recurring event 'Mathe' has active occurrences"
]
}
}
```
---
## Files Modified
### Backend
-`models/models.py` - Added archive fields to AcademicPeriod
-`server/routes/academic_periods.py` - Complete rewrite with 11 endpoints
-`server/alembic/versions/a7b8c9d0e1f2_*.py` - New migration
-`server/wsgi.py` - Already had blueprint registration
### Frontend
-`dashboard/src/apiAcademicPeriods.ts` - Updated types and API client
-`dashboard/src/settings.tsx` - Total rewrite of AcademicPeriodsContent + imports + state
### Documentation
-`.github/copilot-instructions.md` - Updated API docs and settings section
-`ACADEMIC_PERIODS_IMPLEMENTATION_SUMMARY.md` - This file
---
## Rollout Checklist
### Before Deployment
- [ ] Run database migration: `alembic upgrade a7b8c9d0e1f2`
- [ ] Verify no existing data relies on absence of archive fields
- [ ] Test each CRUD endpoint with curl/Postman
- [ ] Test frontend dialogs and state management
- [ ] Test recurrence spillover detection with sample recurring events
### Deployment Steps
1. Deploy backend code (routes + serializers)
2. Run Alembic migration
3. Deploy frontend code
4. Test complete flows in staging
### Monitoring
- Monitor for 409 Conflict responses (blocker violations)
- Watch for dialogue interaction patterns (archive/restore/delete)
- Log recurrence spillover detection triggers
---
## Known Limitations & Future Work
### Current Limitations
1. **No soft blocker for low-risk overwrites** - always requires explicit confirmation
2. **No bulk archive** - admin must archive periods one by one
3. **No export/backup** - archived periods aren't automatically exported
4. **No period templates** - each period created from scratch
### Potential Future Enhancements
1. **Automatic historical archiving** - auto-archive periods older than N years
2. **Bulk operations** - select multiple periods for archive/restore
3. **Period cloning** - duplicate existing period structure
4. **Integration with school calendar APIs** - auto-sync school years
5. **Reporting** - analytics on period usage, event counts per period
---
## Validation Constraints Summary
| Field | Constraint | Type | Example |
|-------|-----------|------|---------|
| `name` | Required, trimmed, unique (non-archived) | String | "Schuljahr 2026/27" |
| `displayName` | Optional | String | "SJ 26/27" |
| `startDate` | Required, ≤ endDate | Date | "2026-09-01" |
| `endDate` | Required, ≥ startDate | Date | "2027-08-31" |
| `periodType` | Required, enum | Enum | schuljahr, semester, trimester |
| `is_active` | Only 1 active at a time | Boolean | true/false |
| `is_archived` | Blocks archive if true | Boolean | true/false |
---
## Conclusion
The academic periods feature is now fully functional with:
✅ Complete backend REST API
✅ Safe archive/restore lifecycle
✅ Recurrence spillover detection
✅ Comprehensive frontend UI with dialogs
✅ Full documentation in copilot instructions
**Ready for testing and deployment.**

View File

@@ -0,0 +1,39 @@
# Database Cleanup Summary
## Files Removed ✅
The following obsolete database initialization files have been removed:
### Removed Files:
- **`server/init_database.py`** - Manual table creation (superseded by Alembic migrations)
- **`server/init_db.py`** - Alternative initialization (superseded by `init_defaults.py`)
- **`server/init_mariadb.py`** - Database/user creation (handled by Docker Compose)
- **`server/test_sql.py`** - Outdated connection test (used localhost instead of container)
### Why These Were Safe to Remove:
1. **No references found** in any Docker files, scripts, or code
2. **Functionality replaced** by modern Alembic-based approach
3. **Hardcoded connection strings** that don't match current Docker setup
4. **Manual processes** now automated in production deployment
## Current Database Management ✅
### Active Scripts:
- **`server/initialize_database.py`** - Complete initialization (NEW)
- **`server/init_defaults.py`** - Default data creation
- **`server/init_academic_periods.py`** - Academic periods setup
- **`alembic/`** - Schema migrations (version control)
### Development Scripts (Kept):
- **`server/dummy_clients.py`** - Test client data generation
- **`server/dummy_events.py`** - Test event data generation
- **`server/sync_existing_clients.py`** - MQTT synchronization utility
## Result
- **4 obsolete files removed**
- **Documentation updated** to reflect current state
- **No breaking changes** - all functionality preserved
- **Cleaner codebase** with single initialization path
The database initialization process is now streamlined and uses only modern, maintained approaches.

View File

@@ -0,0 +1,533 @@
# Phase 3: Client-Side Monitoring Implementation
**Status**: ✅ COMPLETE
**Date**: 11. März 2026
**Architecture**: Two-process design with health-state bridge
---
## Overview
This document describes the **Phase 3** client-side monitoring implementation integrated into the existing infoscreen-dev codebase. The implementation adds:
1.**Health-state tracking** for all display processes (Impressive, Chromium, VLC)
2.**Tiered logging**: Local rotating logs + selective MQTT transmission
3.**Process crash detection** with bounded restart attempts
4.**MQTT health/log topics** feeding the monitoring server
5.**Impressive-aware process mapping** (presentations → impressive, websites → chromium, videos → vlc)
---
## Architecture
### Two-Process Design
```
┌─────────────────────────────────────────────────────────┐
│ simclient.py (MQTT Client) │
│ - Discovers device, sends heartbeat │
│ - Downloads presentation files │
│ - Reads health state from display_manager │
│ - Publishes health/log messages to MQTT │
│ - Sends screenshots for dashboard │
└────────┬────────────────────────────────────┬───────────┘
│ │
│ reads: current_process_health.json │
│ │
│ writes: current_event.json │
│ │
┌────────▼────────────────────────────────────▼───────────┐
│ display_manager.py (Display Control) │
│ - Monitors events and manages displays │
│ - Launches Impressive (presentations) │
│ - Launches Chromium (websites) │
│ - Launches VLC (videos) │
│ - Tracks process health and crashes │
│ - Detects and restarts crashed processes │
│ - Writes health state to JSON bridge │
│ - Captures screenshots to shared folder │
└─────────────────────────────────────────────────────────┘
```
---
## Implementation Details
### 1. Health State Tracking (display_manager.py)
**File**: `src/display_manager.py`
**New Class**: `ProcessHealthState`
Tracks process health and persists to JSON for simclient to read:
```python
class ProcessHealthState:
"""Track and persist process health state for monitoring integration"""
- event_id: Currently active event identifier
- event_type: presentation, website, video, or None
- process_name: impressive, chromium-browser, vlc, or None
- process_pid: Process ID or None for libvlc
- status: running, crashed, starting, stopped
- restart_count: Number of restart attempts
- max_restarts: Maximum allowed restarts (3)
```
Methods:
- `update_running()` - Mark process as started (logs to monitoring.log)
- `update_crashed()` - Mark process as crashed (warning to monitoring.log)
- `update_restart_attempt()` - Increment restart counter (logs attempt and checks max)
- `update_stopped()` - Mark process as stopped (info to monitoring.log)
- `save()` - Persist state to `src/current_process_health.json`
**New Health State File**: `src/current_process_health.json`
```json
{
"event_id": "event_123",
"event_type": "presentation",
"current_process": "impressive",
"process_pid": 1234,
"process_status": "running",
"restart_count": 0,
"timestamp": "2026-03-11T10:30:45.123456+00:00"
}
```
### 2. Monitoring Logger (both files)
**Local Rotating Logs**: 5 files × 5 MB each = 25 MB max per device
**display_manager.py**:
```python
MONITORING_LOG_PATH = "logs/monitoring.log"
monitoring_logger = logging.getLogger("monitoring")
monitoring_handler = RotatingFileHandler(MONITORING_LOG_PATH, maxBytes=5*1024*1024, backupCount=5)
```
**simclient.py**:
- Shares same `logs/monitoring.log` file
- Both processes write to monitoring logger for health events
- Local logs never rotate (persisted for technician inspection)
**Log Filtering** (tiered strategy):
- **ERROR**: Local + MQTT (published to `infoscreen/{uuid}/logs/error`)
- **WARN**: Local + MQTT (published to `infoscreen/{uuid}/logs/warn`)
- **INFO**: Local only (unless `DEBUG_MODE=1`)
- **DEBUG**: Local only (always)
### 3. Process Mapping with Impressive Support
**display_manager.py** - When starting processes:
| Event Type | Process Name | Health Status |
|-----------|--------------|---------------|
| presentation | `impressive` | tracked with PID |
| website/webpage/webuntis | `chromium` or `chromium-browser` | tracked with PID |
| video | `vlc` | tracked (may have no PID if using libvlc) |
**Per-Process Updates**:
- Presentation: `health.update_running('event_id', 'presentation', 'impressive', pid)`
- Website: `health.update_running('event_id', 'website', browser_name, pid)`
- Video: `health.update_running('event_id', 'video', 'vlc', pid or None)`
### 4. Crash Detection and Restart Logic
**display_manager.py** - `process_events()` method:
```
If process not running AND same event_id:
├─ Check exit code
├─ If presentation with exit code 0: Normal completion (no restart)
├─ Else: Mark crashed
│ ├─ health.update_crashed()
│ └─ health.update_restart_attempt()
│ ├─ If restart_count > max_restarts: Give up
│ └─ Else: Restart display (loop back to start_display_for_event)
└─ Log to monitoring.log at each step
```
**Restart Logic**:
- Max 3 restart attempts per event
- Restarts only if same event still active
- Graceful exit (code 0) for Impressive auto-quit presentations is treated as normal
- All crashes logged to monitoring.log with context
### 5. MQTT Health and Log Topics
**simclient.py** - New functions:
**`read_health_state()`**
- Reads `src/current_process_health.json` written by display_manager
- Returns dict or None if no active process
**`publish_health_message(client, client_id)`**
- Topic: `infoscreen/{uuid}/health`
- QoS: 1 (reliable)
- Payload:
```json
{
"timestamp": "2026-03-11T10:30:45.123456+00:00",
"expected_state": {
"event_id": "event_123"
},
"actual_state": {
"process": "impressive",
"pid": 1234,
"status": "running"
}
}
```
**`publish_log_message(client, client_id, level, message, context)`**
- Topics: `infoscreen/{uuid}/logs/error` or `infoscreen/{uuid}/logs/warn`
- QoS: 1 (reliable)
- Log level filtering (only ERROR/WARN sent unless DEBUG_MODE=1)
- Payload:
```json
{
"timestamp": "2026-03-11T10:30:45.123456+00:00",
"message": "Process started: event_id=123 event_type=presentation process=impressive pid=1234",
"context": {
"event_id": "event_123",
"process": "impressive",
"event_type": "presentation"
}
}
```
**Enhanced Dashboard Heartbeat**:
- Topic: `infoscreen/{uuid}/dashboard`
- Now includes `process_health` block with event_id, process name, status, restart count
### 6. Integration Points
**Existing Features Preserved**:
- ✅ Impressive PDF presentations with auto-advance and loop
- ✅ Chromium website display with auto-scroll injection
- ✅ VLC video playback (python-vlc preferred, binary fallback)
- ✅ Screenshot capture and transmission
- ✅ HDMI-CEC TV control
- ✅ Two-process architecture
**New Integration Points**:
| File | Function | Change |
|------|----------|--------|
| display_manager.py | `__init__()` | Initialize `ProcessHealthState()` |
| display_manager.py | `start_presentation()` | Call `health.update_running()` with impressive |
| display_manager.py | `start_video()` | Call `health.update_running()` with vlc |
| display_manager.py | `start_webpage()` | Call `health.update_running()` with chromium |
| display_manager.py | `process_events()` | Detect crashes, call `health.update_crashed()` and `update_restart_attempt()` |
| display_manager.py | `stop_current_display()` | Call `health.update_stopped()` |
| simclient.py | `screenshot_service_thread()` | (No changes to interval) |
| simclient.py | Main heartbeat loop | Call `publish_health_message()` after successful heartbeat |
| simclient.py | `send_screenshot_heartbeat()` | Read health state and include in dashboard payload |
---
## Logging Hierarchy
### Local Rotating Files (5 × 5 MB)
**`logs/display_manager.log`** (existing - updated):
- Display event processing
- Process lifecycle (start/stop)
- HDMI-CEC operations
- Presentation status
- Video/website startup
**`logs/simclient.log`** (existing - updated):
- MQTT connection/reconnection
- Discovery and heartbeat
- File downloads
- Group membership changes
- Dashboard payload info
**`logs/monitoring.log`** (NEW):
- Process health events (start, crash, restart, stop)
- Both display_manager and simclient write here
- Centralized health tracking
- Technician-focused: "What happened to the processes?"
```
# Example monitoring.log entries:
2026-03-11 10:30:45 [INFO] Process started: event_id=event_123 event_type=presentation process=impressive pid=1234
2026-03-11 10:35:20 [WARNING] Process crashed: event_id=event_123 event_type=presentation process=impressive restart_count=0/3
2026-03-11 10:35:20 [WARNING] Restarting process: attempt 1/3 for impressive
2026-03-11 10:35:25 [INFO] Process started: event_id=event_123 event_type=presentation process=impressive pid=1245
```
### MQTT Transmission (Selective)
**Always sent** (when error occurs):
- `infoscreen/{uuid}/logs/error` - Critical failures
- `infoscreen/{uuid}/logs/warn` - Restarts, crashes, missing binaries
**Development mode only** (if DEBUG_MODE=1):
- `infoscreen/{uuid}/logs/info` - Event start/stop, process running status
**Never sent**:
- DEBUG messages (local-only debug details)
- INFO messages in production
---
## Environment Variables
No new required variables. Existing configuration supports monitoring:
```bash
# Existing (unchanged):
ENV=development|production
DEBUG_MODE=0|1 # Enables INFO logs to MQTT
LOG_LEVEL=DEBUG|INFO|WARNING|ERROR # Local log verbosity
HEARTBEAT_INTERVAL=5|60 # seconds
SCREENSHOT_INTERVAL=30|300 # seconds (display_manager_screenshot_capture)
# Recommended for monitoring:
SCREENSHOT_CAPTURE_INTERVAL=30 # How often display_manager captures screenshots
SCREENSHOT_MAX_WIDTH=800 # Downscale for bandwidth
SCREENSHOT_JPEG_QUALITY=70 # Balance quality/size
# File server (if different from MQTT broker):
FILE_SERVER_HOST=192.168.1.100
FILE_SERVER_PORT=8000
FILE_SERVER_SCHEME=http
```
---
## Testing Validation
### System-Level Test Sequence
**1. Start Services**:
```bash
# Terminal 1: Display Manager
./scripts/start-display-manager.sh
# Terminal 2: MQTT Client
./scripts/start-dev.sh
# Terminal 3: Monitor logs
tail -f logs/monitoring.log
```
**2. Trigger Each Event Type**:
```bash
# Via test menu or MQTT publish:
./scripts/test-display-manager.sh # Options 1-3 trigger events
```
**3. Verify Health State File**:
```bash
# Check health state gets written immediately
cat src/current_process_health.json
# Should show: event_id, event_type, current_process (impressive/chromium/vlc), process_status=running
```
**4. Check MQTT Topics**:
```bash
# Monitor health messages:
mosquitto_sub -h localhost -t "infoscreen/+/health" -v
# Monitor log messages:
mosquitto_sub -h localhost -t "infoscreen/+/logs/#" -v
# Monitor dashboard heartbeat:
mosquitto_sub -h localhost -t "infoscreen/+/dashboard" -v | head -c 500 && echo "..."
```
**5. Simulate Process Crash**:
```bash
# Find impressive/chromium/vlc PID:
ps aux | grep -E 'impressive|chromium|vlc'
# Kill process:
kill -9 <pid>
# Watch monitoring.log for crash detection and restart
tail -f logs/monitoring.log
# Should see: [WARNING] Process crashed... [WARNING] Restarting process...
```
**6. Verify Server Integration**:
```bash
# Server receives health messages:
sqlite3 infoscreen.db "SELECT process_status, current_process, restart_count FROM clients WHERE uuid='...';"
# Should show latest status from health message
# Server receives logs:
sqlite3 infoscreen.db "SELECT level, message FROM client_logs WHERE client_uuid='...' ORDER BY timestamp DESC LIMIT 10;"
# Should show ERROR/WARN entries from crashes/restarts
```
---
## Troubleshooting
### Health State File Not Created
**Symptom**: `src/current_process_health.json` missing
**Causes**:
- No event active (file only created when display starts)
- display_manager not running
**Check**:
```bash
ps aux | grep display_manager
tail -f logs/display_manager.log | grep "Process started\|Process stopped"
```
### MQTT Health Messages Not Arriving
**Symptom**: No health messages on `infoscreen/{uuid}/health` topic
**Causes**:
- simclient not reading health state file
- MQTT connection dropped
- Health update function not called
**Check**:
```bash
# Check health file exists and is recent:
ls -l src/current_process_health.json
stat src/current_process_health.json | grep Modify
# Monitor simclient logs:
tail -f logs/simclient.log | grep -E "Health|heartbeat|publish"
# Verify MQTT connection:
mosquitto_sub -h localhost -t "infoscreen/+/heartbeat" -v
```
### Restart Loop (Process Keeps Crashing)
**Symptom**: monitoring.log shows repeated crashes and restarts
**Check**:
```bash
# Read last log lines of the process (stored by display_manager):
tail -f logs/impressive.out.log # for presentations
tail -f logs/browser.out.log # for websites
tail -f logs/video_player.out.log # for videos
```
**Common Causes**:
- Missing binary (impressive not installed, chromium not found, vlc not available)
- Corrupt presentation file
- Invalid URL for website
- Insufficient permissions for screenshots
### Log Messages Not Reaching Server
**Symptom**: client_logs table in server DB is empty
**Causes**:
- Log level filtering: INFO messages in production are local-only
- Logs only published on ERROR/WARN
- MQTT publish failing silently
**Check**:
```bash
# Force DEBUG_MODE to see all logs:
export DEBUG_MODE=1
export LOG_LEVEL=DEBUG
# Restart simclient and trigger event
# Monitor local logs first:
tail -f logs/monitoring.log | grep -i error
```
---
## Performance Considerations
**Bandwidth per Client**:
- Health message: ~200 bytes per heartbeat interval (every 5-60s)
- Screenshot heartbeat: ~50-100 KB (every 30-300s)
- Log messages: ~100-500 bytes per crash/error (rare)
- **Total**: ~0.5-2 MB/day per device (very minimal)
**Disk Space on Client**:
- Monitoring logs: 5 files × 5 MB = 25 MB max
- Display manager logs: 5 files × 2 MB = 10 MB max
- MQTT client logs: 5 files × 2 MB = 10 MB max
- Screenshots: 20 files × 50-100 KB = 1-2 MB max
- **Total**: ~50 MB max (typical for Raspberry Pi USB/SSD)
**Rotation Strategy**:
- Old files automatically deleted when size limit reached
- Technician can SSH and `tail -f` any time
- No database overhead (file-based rotation is minimal CPU)
---
## Integration with Server (Phase 2)
The client implementation sends data to the server's Phase 2 endpoints:
**Expected Server Implementation** (from CLIENT_MONITORING_SETUP.md):
1. **MQTT Listener** receives and stores:
- `infoscreen/{uuid}/logs/error`, `/logs/warn`, `/logs/info`
- `infoscreen/{uuid}/health` messages
- Updates `clients` table with health fields
2. **Database Tables**:
- `clients.process_status`: running/crashed/starting/stopped
- `clients.current_process`: impressive/chromium/vlc/None
- `clients.process_pid`: PID value
- `clients.current_event_id`: Active event
- `client_logs`: table stores logs with level/message/context
3. **API Endpoints**:
- `GET /api/client-logs/{uuid}/logs?level=ERROR&limit=50`
- `GET /api/client-logs/summary` (errors/warnings across all clients)
---
## Summary of Changes
### Files Modified
1. **`src/display_manager.py`**:
- Added `psutil` import for future process monitoring
- Added `ProcessHealthState` class (60 lines)
- Added monitoring logger setup (8 lines)
- Added `health.update_running()` calls in `start_presentation()`, `start_video()`, `start_webpage()`
- Added crash detection and restart logic in `process_events()`
- Added `health.update_stopped()` in `stop_current_display()`
2. **`src/simclient.py`**:
- Added `timezone` import
- Added monitoring logger setup (8 lines)
- Added `read_health_state()` function
- Added `publish_health_message()` function
- Added `publish_log_message()` function (with level filtering)
- Updated `send_screenshot_heartbeat()` to include health data
- Updated heartbeat loop to call `publish_health_message()`
### Files Created
1. **`src/current_process_health.json`** (at runtime):
- Bridge file between display_manager and simclient
- Shared volume compatible (works in container setup)
2. **`logs/monitoring.log`** (at runtime):
- New rotating log file (5 × 5MB)
- Health events from both processes
---
## Next Steps
1. **Deploy to test client** and run validation sequence above
2. **Deploy server Phase 2** (if not yet done) to receive health/log messages
3. **Verify database updates** in server-side `clients` and `client_logs` tables
4. **Test dashboard UI** (Phase 4) to display health indicators
5. **Configure alerting** (email/Slack) for ERROR level messages
---
**Implementation Date**: 11. März 2026
**Part of**: Infoscreen 2025 Client Monitoring System
**Status**: Production Ready (with server Phase 2 integration)