docs: refactor docs structure and tighten assistant instruction policy

shrink root README into a landing page with a docs map and focused contributor guidance
add TV_POWER_RUNBOOK as the canonical TV power rollout and canary runbook
add CHANGELOG and move project history out of README-style docs
refactor src README into a developer-focused guide (architecture, runtime files, MQTT, debugging)
prune redundant older HDMI docs and keep a canonical HDMI_CEC_SETUP path
update copilot instructions to a high-signal policy format with strict anti-shadow-README design rules
align references across docs to current files, scripts, and TV power behavior
RobbStarkAustria
2026-04-01 10:01:58 +02:00
parent fb0980aa88
commit 82f43f75ba
20 changed files with 2228 additions and 2267 deletions

TV_POWER_RUNBOOK.md
# TV Power Runbook
Operational runbook for Phase 1 TV power coordination using MQTT power intent plus local HDMI-CEC fallback.
## Scope
This runbook covers:
- `POWER_CONTROL_MODE` rollout
- canary validation
- expected log signatures
- rollback
- common failure checks

Contract reference:
- [TV_POWER_INTENT_SERVER_CONTRACT_V1.md](TV_POWER_INTENT_SERVER_CONTRACT_V1.md)
## Topics and Runtime Files
Power intent topic (Phase 1, subscribed by the client):
- `infoscreen/groups/{group_id}/power/intent`

Telemetry topic (published by the client):
- `infoscreen/{client_id}/power/state`

Runtime files:
- `src/power_intent_state.json`
- `src/power_state.json`
- `src/current_process_health.json`
## Power Control Modes
- `local`: ignore MQTT intent and use local event-time CEC logic.
- `hybrid`: prefer fresh MQTT intent and fall back to local timing when missing, stale, or invalid.
- `mqtt`: MQTT intent is authoritative; stale or missing intent triggers safe delayed-off behavior.

Recommended rollout path:
1. Start with `local`.
2. Canary with `hybrid`.
3. Roll out `hybrid` fleet-wide only after the Fleet Rollout Gate criteria are met.
4. Use `mqtt` only if you explicitly want strict server authority.
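The mode semantics above amount to a small decision table. The sketch below restates them as a shell function; it is not the display manager's implementation. The function name `select_power_source`, the intent-status labels (`fresh`, `stale`, `missing`, `invalid`), and the return values `local` and `delayed_off` are hypothetical. Only `mqtt_intent` and `local_fallback` correspond to `source` values that appear in this runbook.

```bash
# Hypothetical sketch of the POWER_CONTROL_MODE semantics; not the display manager's code.
# intent_status: fresh | stale | missing | invalid (illustrative labels)
select_power_source() {
  local mode="$1" intent_status="$2"
  case "$mode" in
    local)
      # MQTT intent is ignored; local event-time CEC logic decides.
      echo "local"
      ;;
    hybrid)
      # Use a fresh intent; fall back to local timing when it is stale, missing, or invalid.
      if [[ "$intent_status" == "fresh" ]]; then
        echo "mqtt_intent"
      else
        echo "local_fallback"
      fi
      ;;
    mqtt)
      # Intent is authoritative. Stale or missing intent triggers a safe delayed off;
      # treating an invalid intent the same way is an assumption of this sketch.
      if [[ "$intent_status" == "fresh" ]]; then
        echo "mqtt_intent"
      else
        echo "delayed_off"
      fi
      ;;
    *)
      echo "unknown POWER_CONTROL_MODE: $mode" >&2
      return 1
      ;;
  esac
}
```

For example, `select_power_source hybrid stale` prints `local_fallback`, which is the behavior that makes `hybrid` the safer canary choice: losing the server never removes local timing.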
## Gate 1: Local Mode
Set in `.env`:
```bash
POWER_CONTROL_MODE=local
```
Expected startup log signature:
```text
[INFO] Power control mode: local
```
Expected behavior:
- MQTT power intents are not applied.
- Existing CEC behavior is unchanged.
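To confirm which mode a client actually started with, you can extract the most recent `Power control mode:` line from its log. This is a minimal sketch. The helper name is hypothetical, and you pass whichever log file carries the startup lines (the manual checks in this runbook tail `logs/display_manager.log` and `src/simclient.log`).

```bash
# Hypothetical helper: print the mode from the most recent "Power control mode:" log line.
# Prints nothing if no such line exists.
# Usage: current_power_mode <log_file>
current_power_mode() {
  local log_file="$1"
  # "Power control mode: hybrid" -> fields: Power / control / mode: / hybrid
  grep -o 'Power control mode: [a-z]*' "$log_file" | tail -n 1 | awk '{print $4}'
}
```

The same check works for Gate 2 and for rollback, since each restart writes a fresh startup line.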
## Gate 2: Hybrid Canary
On one client or one canary group, set the mode in `.env`, then restart services:
```bash
# In .env:
POWER_CONTROL_MODE=hybrid

# Then, in a shell:
./scripts/restart-all.sh
```
Expected startup logs:
```text
[INFO] Power state service thread started
[INFO] Subscribed to power intent topic: infoscreen/groups/<id>/power/intent
[INFO] Power control mode: hybrid
```
### Valid ON Intent
Expected sequence:
```text
[INFO] Power intent accepted: id=<uuid> desired_state=on reason=active_event ...
[INFO] Applying MQTT power intent ON id=<uuid> reason=active_event
[INFO] TV turned ON successfully
[INFO] Power state published: state=on source=mqtt_intent result=ok
```
### Valid OFF Intent
Expected sequence:
```text
[INFO] Power intent accepted: id=<uuid> desired_state=off reason=no_active_event ...
[INFO] Applying MQTT power intent OFF id=<uuid> reason=no_active_event
[INFO] Power state published: state=off source=mqtt_intent result=ok
```
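When validating the ON and OFF sequences above on a canary, it helps to check that the expected lines appear in order for one specific intent ID. This is a minimal sketch with a hypothetical helper name. It matches substrings, so the elided `...` tails of the real lines are ignored. The accepted and applying lines are matched with the intent ID, and the remaining lines by fixed text only, because they do not carry the ID. Ordering is checked by line number within a single log file.

```bash
# Hypothetical helper: verify the expected log lines for one intent appear in order.
# Usage: check_intent_sequence <log_file> <intent_id> <on|off>
check_intent_sequence() {
  local log_file="$1" id="$2" state="$3"
  local -a patterns
  case "$state" in
    on)
      patterns=(
        "Power intent accepted: id=${id} desired_state=on"
        "Applying MQTT power intent ON id=${id}"
        "TV turned ON successfully"
        "Power state published: state=on source=mqtt_intent result=ok"
      ) ;;
    off)
      patterns=(
        "Power intent accepted: id=${id} desired_state=off"
        "Applying MQTT power intent OFF id=${id}"
        "Power state published: state=off source=mqtt_intent result=ok"
      ) ;;
    *)
      echo "state must be on or off" >&2
      return 2 ;;
  esac
  local last=0 pattern found
  for pattern in "${patterns[@]}"; do
    # First line after the previous match that contains the fixed string.
    found=$(awk -v start="$last" -v pat="$pattern" \
      'NR > start && index($0, pat) { print NR; exit }' "$log_file")
    if [[ -z "$found" ]]; then
      echo "missing or out of order: $pattern"
      return 1
    fi
    last=$found
  done
  echo "ok"
}
```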
### Expired Intent
Expected rejection:
```text
[WARNING] Rejected power intent: intent expired
```
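Repeated expiries usually come down to the client clock versus the intent's expiry time. The sketch below shows the kind of comparison involved, assuming the intent carries an absolute UTC expiry timestamp in ISO 8601 form. The actual field name and any grace period are defined in the contract document, not here, and the helper name is hypothetical. It relies on GNU `date -d` for parsing.

```bash
# Hypothetical helper: decide whether an intent's expiry time has already passed.
# Assumes an ISO 8601 UTC timestamp such as 2026-04-01T06:00:05Z; requires GNU date.
# Usage: intent_expired <expires_at> [now_epoch]
intent_expired() {
  local expires_at="$1" now="${2:-$(date -u +%s)}"
  local expires_epoch
  # Return 2 if the timestamp cannot be parsed.
  expires_epoch=$(date -u -d "$expires_at" +%s) || return 2
  if (( now >= expires_epoch )); then
    echo "expired"
  else
    echo "valid for $(( expires_epoch - now ))s"
  fi
}
```

Run it against an intent taken from `src/power_intent_state.json` on a client that keeps rejecting intents: if fresh server intents come back `expired`, suspect the client clock before the server.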
### Malformed Intent
Expected rejection:
```text
[WARNING] Rejected power intent: missing required field: intent_id
```
### Retained Clear
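Clearing the retained intent means publishing a zero-length retained payload to the topic. The broker removes the stored retained message and also delivers the empty payload to current subscribers.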
Expected log:
```text
[INFO] Power intent retained message cleared (empty payload)
```
This is normal and should not be treated as a parse error.
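Option 5 of `./scripts/test-power-intent.sh` performs this clear for you. To do it by hand with `mosquitto_pub`, publish a null (zero-length) message with the retain flag set (`-r -n`). In this sketch the helper name, broker host, and group ID are placeholders, and broker-specific options such as port, credentials, or TLS are not shown. Setting `DRY_RUN=1` prints the command instead of running it.

```bash
# Hypothetical helper: clear the retained power intent for one group.
# Usage: clear_retained_intent <broker_host> <group_id>
clear_retained_intent() {
  local host="$1" group_id="$2"
  local topic="infoscreen/groups/${group_id}/power/intent"
  # -r: retain flag; -n: send a zero-length (null) message, which removes the retained message.
  local -a cmd=(mosquitto_pub -h "$host" -t "$topic" -r -n)
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```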
## Validation Commands
Run the test scripts:
```bash
./scripts/test-power-intent.sh
./scripts/test-hdmi-cec.sh
```
Useful `test-power-intent.sh` options:
- Option 1: publish valid ON intent.
- Option 2: publish valid OFF intent.
- Option 3: publish stale intent.
- Option 4: publish malformed intent.
- Option 5: clear retained topic with an empty retained payload.
- Option 6: inspect runtime JSON files.
- Option 8: subscribe to the power-state topic.

Useful manual checks:
```bash
tail -f logs/display_manager.log src/simclient.log
cat src/power_intent_state.json
cat src/power_state.json
cat src/current_process_health.json
```
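To follow only the power-related lines while tailing the logs above, you can pipe `tail -f` through a filter keyed on the signatures documented in this runbook. This is a minimal sketch: the function name is hypothetical, and the pattern covers only the messages shown here, so other power-related messages the services may emit will be dropped.

```bash
# Hypothetical filter: keep only the power-related log signatures documented in this runbook.
# --line-buffered (GNU grep) flushes each match immediately, which matters behind tail -f.
power_log_filter() {
  grep --line-buffered -E 'Power control mode|[Pp]ower intent|Power state|TV turned'
}

# Usage:
# tail -f logs/display_manager.log src/simclient.log | power_log_filter
```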
## Rollback
To leave canary mode, set the mode back in `.env`, then restart services:
```bash
# In .env:
POWER_CONTROL_MODE=local

# Then, in a shell:
./scripts/restart-all.sh
```
Expected result:
- MQTT power intent handling becomes inactive.
- Local CEC fallback remains in place.
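Both the canary switch and the rollback come down to changing one line in `.env`. This is a minimal sketch of an idempotent edit, assuming `.env` uses plain `KEY=value` lines. The helper name is hypothetical, it uses GNU `sed -i`, and it does not restart anything, so run `./scripts/restart-all.sh` afterwards.

```bash
# Hypothetical helper: set POWER_CONTROL_MODE in an env file, replacing any existing line.
# Usage: set_power_mode <local|hybrid|mqtt> [env_file]
set_power_mode() {
  local mode="$1" env_file="${2:-.env}"
  case "$mode" in
    local|hybrid|mqtt) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  if grep -q '^POWER_CONTROL_MODE=' "$env_file" 2>/dev/null; then
    # Replace the existing assignment in place (GNU sed).
    sed -i "s/^POWER_CONTROL_MODE=.*/POWER_CONTROL_MODE=${mode}/" "$env_file"
  else
    # Ensure the file ends with a newline before appending a new line.
    if [[ -s "$env_file" && -n "$(tail -c 1 "$env_file")" ]]; then
      echo >> "$env_file"
    fi
    echo "POWER_CONTROL_MODE=${mode}" >> "$env_file"
  fi
}
```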
## Fleet Rollout Gate
Roll out `hybrid` more widely only after:
- zero unintended TV-off events between adjacent events,
- valid ON/OFF actions apply cleanly,
- duplicate refreshes are logged as `result=skipped`,
- stale and malformed intents are rejected without side effects,
- retained clear events no longer produce noisy warnings.

Suggested observation window:
- at least 7 days on a canary client or canary group.
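During the observation window, a tally of the relevant log signatures helps check several of these criteria at once. This is a minimal sketch with a hypothetical helper name. It counts only the messages documented in this runbook and cannot detect unintended TV-off events on its own; those still need comparison against the event schedule.

```bash
# Hypothetical helper: count gate-relevant log signatures across one or more log files.
# Usage: power_gate_summary logs/display_manager.log src/simclient.log
power_gate_summary() {
  (( $# > 0 )) || { echo "usage: power_gate_summary <log_file>..." >&2; return 1; }
  local label pattern count
  while IFS='|' read -r label pattern; do
    # One total across all files; grep -c prints 0 (and exits 1) when nothing matches.
    count=$(cat -- "$@" | grep -cF -- "$pattern")
    printf '%s=%s\n' "$label" "$count"
  done <<'EOF'
applied_ok|source=mqtt_intent result=ok
skipped|result=skipped
rejected_total|Rejected power intent
rejected_expired|Rejected power intent: intent expired
retained_cleared|Power intent retained message cleared
EOF
}
```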
## Common Symptoms
| Symptom | Check | Likely Action |
|---|---|---|
| Intent never arrives | `src/power_intent_state.json` missing or invalid | Check broker connectivity and group assignment |
| `intent expired` appears repeatedly | Client clock and server publish cadence | Verify NTP sync and the server refresh interval |
| TV turns off between adjacent events | `src/power_state.json` shows `local_fallback` or a stale intent at the transition | Inspect server timing and boundary coverage |
| Repeated power state publishes with `result=skipped` | Whether they coincide with duplicate intent refreshes | None if so; this is normal dedupe behavior |
| Clearing the retained intent logs a warning | Whether an old code path is still running | Restart services and verify the latest code is deployed |
## Dashboard Observability
`src/current_process_health.json` includes a `power_control` block similar to:
```json
"power_control": {
  "mode": "hybrid",
  "source": "mqtt_intent",
  "last_intent_id": "4a7fe3bc-...",
  "last_action": "on",
  "last_power_at": "2026-04-01T06:00:05Z"
}
```
This is the fastest local check for what the display manager last did and why.
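For a compact view of that block on a client, a small helper can read the health file and print the fields as `key=value` pairs. This is a minimal sketch, assuming `python3` is available on the client (`jq` would work equally well if installed). The helper name is hypothetical, and the field names are taken from the example above.

```bash
# Hypothetical helper: print the power_control block from the health file as key=value pairs.
# Usage: power_control_summary [src/current_process_health.json]
power_control_summary() {
  local health_file="${1:-src/current_process_health.json}"
  # The Python script is read from the heredoc; the file path arrives as sys.argv[1].
  python3 - "$health_file" <<'PY'
import json
import sys

with open(sys.argv[1]) as f:
    block = json.load(f).get("power_control", {})
for key in ("mode", "source", "last_intent_id", "last_action", "last_power_at"):
    print(f"{key}={block.get(key, '')}")
PY
}
```

Missing fields print as empty values rather than failing, so a partially written health file still yields a readable summary.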