docs: archive legacy guides and streamline copilot instructions governance
This commit is contained in:
193
docs/archive/TV_POWER_PHASE_1_CANARY_VALIDATION.md
Normal file
193
docs/archive/TV_POWER_PHASE_1_CANARY_VALIDATION.md
Normal file
@@ -0,0 +1,193 @@
|
||||
# TV Power Coordination Canary Validation Checklist (Phase 1)
|
||||
|
||||
Manual verification checklist for Phase-1 server-side group-level power-intent publishing before production rollout.
|
||||
|
||||
## Preconditions
|
||||
- Scheduler running with `POWER_INTENT_PUBLISH_ENABLED=true`
|
||||
- One canary group selected for testing (example: group_id=1)
|
||||
- Mosquitto broker running and accessible
|
||||
- Database with seeded test data (canary group with events)
|
||||
|
||||
## Validation Scenarios
|
||||
|
||||
### 1. Baseline Payload Structure
|
||||
**Goal**: Retained topic shows correct Phase-1 contract.
|
||||
|
||||
Instructions:
|
||||
1. Subscribe to `infoscreen/groups/1/power/intent` (canary group, QoS 1)
|
||||
2. Verify received payload contains:
|
||||
- `schema_version: "1.0"`
|
||||
- `group_id: 1`
|
||||
- `desired_state: "on"` or `"off"` (string)
|
||||
- `reason: "active_event"` or `"no_active_event"` (string)
|
||||
- `intent_id: "<uuid>"` (not empty, valid UUID v4 format)
|
||||
- `issued_at: "2026-03-31T14:22:15Z"` (ISO 8601 with Z suffix)
|
||||
- `expires_at: "2026-03-31T14:24:00Z"` (ISO 8601 with Z suffix, always > issued_at)
|
||||
- `poll_interval_sec: 30` (integer, matches scheduler poll interval)
|
||||
- `active_event_ids: [...]` (array; empty when off)
|
||||
- `event_window_start: "...Z"` or `null`
|
||||
- `event_window_end: "...Z"` or `null`
|
||||
|
||||
**Pass criteria**: All fields present, correct types and formats, no extra/malformed fields.
|
||||
|
||||
### 2. Event Start Transition
|
||||
**Goal**: ON intent published immediately when event becomes active.
|
||||
|
||||
Instructions:
|
||||
1. Create an event for canary group starting 2 minutes from now
|
||||
2. Wait for event start time
|
||||
3. Check retained topic immediately after event start
|
||||
4. Verify `desired_state: "on"` and `reason: "active_event"`
|
||||
5. Note the `intent_id` value
|
||||
|
||||
**Pass criteria**:
|
||||
- `desired_state: "on"` appears within 30 seconds of event start
|
||||
- No OFF in between (if a prior OFF existed)
|
||||
|
||||
### 3. Event End Transition
|
||||
**Goal**: OFF intent published when last active event ends.
|
||||
|
||||
Instructions:
|
||||
1. In setup from Scenario 2, wait for the event to end (< 5 min duration)
|
||||
2. Check retained topic after end time
|
||||
3. Verify `desired_state: "off"` and `reason: "no_active_event"`
|
||||
|
||||
**Pass criteria**:
|
||||
- `desired_state: "off"` appears within 30 seconds of event end
|
||||
- New `intent_id` generated (different from Scenario 2)
|
||||
|
||||
### 4. Adjacent Events (No OFF Blip)
|
||||
**Goal**: When one event ends and next starts immediately after, no OFF is published.
|
||||
|
||||
Instructions:
|
||||
1. Create two consecutive events for canary group, each 3 minutes:
|
||||
- Event A: 14:00-14:03
|
||||
- Event B: 14:03-14:06
|
||||
2. Watch retained topic through both event boundaries
|
||||
3. Capture all `desired_state` changes
|
||||
|
||||
**Pass criteria**:
|
||||
- `desired_state: "on"` throughout both events
|
||||
- No OFF at 14:03 (boundary between them)
|
||||
- One or two transitions total (transition at A start only, or at A start + semantic change reasons)
|
||||
|
||||
### 5. Heartbeat Republish (Unchanged Intent)
|
||||
**Goal**: Intent republishes each poll cycle with same intent_id if state unchanged.
|
||||
|
||||
Instructions:
|
||||
1. Create a long-duration event (15+ minutes) for canary group
|
||||
2. Subscribe to power intent topic
|
||||
3. Capture timestamps and intent_ids for 3 consecutive poll cycles (90 seconds with default 30s polls)
|
||||
4. Verify:
|
||||
- Payload received at T, T+30s, T+60s
|
||||
- Same `intent_id` across all three
|
||||
- Different `issued_at` timestamps (should increment by ~30s)
|
||||
|
||||
**Pass criteria**:
|
||||
- At least 3 payloads received within ~90 seconds
|
||||
- Same `intent_id` for all
|
||||
- Each `issued_at` is later than previous
|
||||
- Each `expires_at` is 90 seconds after its `issued_at`
|
||||
|
||||
### 6. Scheduler Restart (Immediate Republish)
|
||||
**Goal**: On scheduler process start, immediate published active intent.
|
||||
|
||||
Instructions:
|
||||
1. Create and start an event for canary group (duration ≥ 5 minutes)
|
||||
2. Wait for event to be active
|
||||
3. Kill and restart scheduler process
|
||||
4. Check retained topic within 5 seconds after restart
|
||||
5. Verify `desired_state: "on"` and `reason: "active_event"`
|
||||
|
||||
**Pass criteria**:
|
||||
- Correct ON intent retained within 5 seconds of restart
|
||||
- No OFF published during restart/reconnect
|
||||
|
||||
### 7. Broker Reconnection (Retained Recovery)
|
||||
**Goal**: On MQTT reconnect, scheduler republishes cached intents.
|
||||
|
||||
Instructions:
|
||||
1. Create and start an event for canary group
|
||||
2. Subscribe to power intent topic
|
||||
3. Note the current `intent_id` and payload
|
||||
4. Restart Mosquitto broker (simulates network interruption)
|
||||
5. Verify retained topic is immediately republished after reconnect
|
||||
|
||||
**Pass criteria**:
|
||||
- Correct ON intent reappears on retained topic within 5 seconds of broker restart
|
||||
- Same `intent_id` (no new transition UUID)
|
||||
- Publish metrics show `retained_republish_total` incremented
|
||||
|
||||
### 8. Feature Flag Disable
|
||||
**Goal**: No power-intent publishes when feature disabled.
|
||||
|
||||
Instructions:
|
||||
1. Set `POWER_INTENT_PUBLISH_ENABLED=false` in scheduler env
|
||||
2. Restart scheduler
|
||||
3. Create and start a new event for canary group
|
||||
4. Subscribe to power intent topic
|
||||
5. Wait 90 seconds
|
||||
|
||||
**Pass criteria**:
|
||||
- No messages on `infoscreen/groups/1/power/intent`
|
||||
- Scheduler logs show no `event=power_intent_publish*` lines
|
||||
|
||||
### 9. Scheduler Logs Inspection
|
||||
**Goal**: Logs contain structured fields for observability.
|
||||
|
||||
Instructions:
|
||||
1. Run canary with one active event
|
||||
2. Collect scheduler logs for 60 seconds
|
||||
3. Filter for `event=power_intent_publish` lines
|
||||
|
||||
**Pass criteria**:
|
||||
- Each log line contains: `group_id`, `desired_state`, `reason`, `intent_id`, `issued_at`, `expires_at`, `transition_publish`, `heartbeat_publish`, `topic`, `qos`, `retained`
|
||||
- No malformed JSON in payloads
|
||||
- Error logs (if any) are specific and actionable
|
||||
|
||||
### 10. Expiry Validation
|
||||
**Goal**: Payloads never published with `expires_at <= issued_at`.
|
||||
|
||||
Instructions:
|
||||
1. Capture power-intent payloads for 120+ seconds
|
||||
2. Parse `issued_at` and `expires_at` for each
|
||||
3. Verify `expires_at > issued_at` for all
|
||||
|
||||
**Pass criteria**:
|
||||
- All 100% of payloads have valid expiry window
|
||||
- Typical delta is 90 seconds (min expiry)
|
||||
|
||||
## Summary Report Template
|
||||
|
||||
After running all scenarios, capture:
|
||||
|
||||
```
|
||||
Canary Validation Report
|
||||
Date: [date]
|
||||
Scheduler version: [git commit hash]
|
||||
Test group ID: [id]
|
||||
Environment: [dev/test/prod]
|
||||
|
||||
Scenario Results:
|
||||
1. Baseline Payload: ✓/✗ [notes]
|
||||
2. Event Start: ✓/✗ [notes]
|
||||
3. Event End: ✓/✗ [notes]
|
||||
4. Adjacent Events: ✓/✗ [notes]
|
||||
5. Heartbeat Republish: ✓/✗ [notes]
|
||||
6. Restart: ✓/✗ [notes]
|
||||
7. Broker Reconnect: ✓/✗ [notes]
|
||||
8. Feature Flag: ✓/✗ [notes]
|
||||
9. Logs: ✓/✗ [notes]
|
||||
10. Expiry Validation: ✓/✗ [notes]
|
||||
|
||||
Overall: [Ready for production / Blockers found]
|
||||
Issues: [list if any]
|
||||
```
|
||||
|
||||
## Rollout Gate
|
||||
Power-intent Phase 1 is ready for production rollout only when:
|
||||
- All 10 scenarios pass
|
||||
- Zero unintended OFF between adjacent events
|
||||
- All log fields present and correct
|
||||
- Feature flag default remains `false`
|
||||
- Transition latency <= 30 seconds nominal case
|
||||
Reference in New Issue
Block a user