TV Power Coordination Canary Validation Checklist (Phase 1)

Manual verification checklist for Phase 1 of server-side, group-level power-intent publishing, to be completed before production rollout.

Preconditions

  • Scheduler running with POWER_INTENT_PUBLISH_ENABLED=true
  • One canary group selected for testing (example: group_id=1)
  • Mosquitto broker running and accessible
  • Database with seeded test data (canary group with events)

Validation Scenarios

1. Baseline Payload Structure

Goal: The retained topic carries a payload that matches the Phase-1 contract.

Instructions:

  1. Subscribe to infoscreen/groups/1/power/intent (canary group, QoS 1)
  2. Verify received payload contains:
    • schema_version: "v1"
    • group_id: 1
    • desired_state: "on" or "off" (string)
    • reason: "active_event" or "no_active_event" (string)
    • intent_id: "<uuid>" (not empty, valid UUID v4 format)
    • issued_at: "2026-03-31T14:22:15Z" (ISO 8601 with Z suffix)
    • expires_at: "2026-03-31T14:24:00Z" (ISO 8601 with Z suffix, always > issued_at)
    • poll_interval_sec: 30 (integer, matches scheduler poll interval)

Pass criteria: All fields present, correct types and formats, no extra/malformed fields.
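Once payloads are captured (for example with mosquitto_sub, which prints one payload per line by default), the field checks above can be scripted. The sketch below is a hypothetical helper, not part of the scheduler: validate_payload takes one decoded payload and returns a list of contract violations. It assumes second-precision timestamps as in the example payload; relax UTC_Z if the scheduler emits fractional seconds.

```python
from __future__ import annotations

import re
import uuid
from datetime import datetime

# Phase-1 contract: field name -> expected JSON-decoded Python type.
EXPECTED_FIELDS = {
    "schema_version": str,
    "group_id": int,
    "desired_state": str,
    "reason": str,
    "intent_id": str,
    "issued_at": str,
    "expires_at": str,
    "poll_interval_sec": int,
}
# Assumption: second precision with a literal Z suffix, e.g. 2026-03-31T14:22:15Z.
UTC_Z = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def parse_utc(value: str) -> datetime | None:
    """Parse an ISO 8601 UTC timestamp with Z suffix; return None if malformed."""
    if not UTC_Z.fullmatch(value):
        return None
    try:
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:  # pattern matched but the date is invalid, e.g. month 13
        return None


def validate_payload(payload: dict, group_id: int = 1, poll_interval_sec: int = 30) -> list[str]:
    """Return contract violations for one decoded payload (empty list means pass)."""
    errors = [f"unexpected field: {k}" for k in sorted(set(payload) - set(EXPECTED_FIELDS))]
    type_errors = []
    for name, typ in EXPECTED_FIELDS.items():
        if name not in payload:
            type_errors.append(f"missing field: {name}")
        elif type(payload[name]) is not typ:  # also rejects bool where int is expected
            type_errors.append(f"{name}: expected {typ.__name__}")
    if type_errors:  # the value checks below rely on correct types
        return errors + type_errors

    if payload["schema_version"] != "v1":
        errors.append("schema_version must be 'v1'")
    if payload["group_id"] != group_id:
        errors.append(f"group_id must be {group_id}")
    if payload["desired_state"] not in ("on", "off"):
        errors.append("desired_state must be 'on' or 'off'")
    if payload["reason"] not in ("active_event", "no_active_event"):
        errors.append("reason must be 'active_event' or 'no_active_event'")
    try:
        if uuid.UUID(payload["intent_id"]).version != 4:
            errors.append("intent_id is not a UUID v4")
    except ValueError:
        errors.append("intent_id is not a valid UUID")
    issued, expires = parse_utc(payload["issued_at"]), parse_utc(payload["expires_at"])
    if issued is None or expires is None:
        errors.append("issued_at/expires_at must be ISO 8601 UTC with Z suffix")
    elif expires <= issued:
        errors.append("expires_at must be later than issued_at")
    if payload["poll_interval_sec"] != poll_interval_sec:
        errors.append(f"poll_interval_sec must be {poll_interval_sec}")
    return errors
```

Feed it json.loads(line) for each captured line; any non-empty result is a Scenario 1 failure.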

2. Event Start Transition

Goal: An ON intent is published as soon as the event becomes active (within one poll cycle).

Instructions:

  1. Create an event for canary group starting 2 minutes from now
  2. Wait for event start time
  3. Check retained topic immediately after event start
  4. Verify desired_state: "on" and reason: "active_event"
  5. Note the intent_id value

Pass criteria:

  • desired_state: "on" appears within 30 seconds of event start
  • No intermediate OFF is published between the event start and the ON intent (relevant when an OFF intent was retained beforehand)
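The latency and no-OFF criteria above can be checked against a timestamped capture. This is a hypothetical helper: it assumes each captured message is stored as a (received_at, payload) pair, with received_at taken from the capture host's clock, which must be in sync with the clock used for event times.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def first_transition_after(
    observations: list[tuple[datetime, dict]],
    boundary: datetime,
    expected_state: str,
    max_latency: timedelta = timedelta(seconds=30),
) -> tuple[timedelta, str]:
    """Return (latency, intent_id) of the first expected_state payload after boundary.

    observations holds (received_at, payload) pairs in arrival order. Raises
    AssertionError if a different desired_state arrives first after the
    boundary, if the expected state never arrives, or if it arrives too late.
    """
    for received_at, payload in observations:
        if received_at < boundary:
            continue  # retained or heartbeat messages from before the boundary
        state = payload["desired_state"]
        assert state == expected_state, (
            f"got {state!r} at {received_at:%H:%M:%S} before {expected_state!r}"
        )
        latency = received_at - boundary
        assert latency <= max_latency, f"latency {latency} exceeds {max_latency}"
        return latency, payload["intent_id"]
    raise AssertionError(f"no {expected_state!r} payload after {boundary}")
```

The same helper fits Scenario 3 (expected_state="off", boundary at the event end) and Scenario 6 (boundary at the restart time, max_latency=timedelta(seconds=5)).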

3. Event End Transition

Goal: OFF intent published when last active event ends.

Instructions:

  1. Using the event from Scenario 2 (give it a duration under 5 minutes), wait for it to end
  2. Check retained topic after end time
  3. Verify desired_state: "off" and reason: "no_active_event"

Pass criteria:

  • desired_state: "off" appears within 30 seconds of event end
  • A new intent_id is generated (different from the one noted in Scenario 2)
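Because heartbeats repeat the previous intent_id, a captured stream can be collapsed into its transitions before checking the end-of-event criteria. The helpers below are hypothetical; they take decoded payloads in arrival order.

```python
from __future__ import annotations


def intent_transitions(payloads: list[dict]) -> list[tuple[str, str, str]]:
    """Collapse a captured payload stream into its transitions.

    Only payloads whose intent_id differs from the previous payload's are kept,
    as (intent_id, desired_state, reason) tuples in arrival order.
    """
    transitions = []
    for p in payloads:
        if not transitions or transitions[-1][0] != p["intent_id"]:
            transitions.append((p["intent_id"], p["desired_state"], p["reason"]))
    return transitions


def check_event_end(payloads: list[dict], on_intent_id: str) -> None:
    """Assert the stream ends in an OFF transition with a fresh intent_id."""
    transitions = intent_transitions(payloads)
    assert transitions, "no payloads captured"
    last_id, state, reason = transitions[-1]
    assert (state, reason) == ("off", "no_active_event"), (state, reason)
    assert last_id != on_intent_id, "OFF intent reused the ON intent_id"
```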

4. Adjacent Events (No OFF Blip)

Goal: When one event ends and the next starts immediately afterwards, no OFF is published.

Instructions:

  1. Create two consecutive events for canary group, each 3 minutes:
    • Event A: 14:00-14:03
    • Event B: 14:03-14:06
  2. Watch retained topic through both event boundaries
  3. Capture all desired_state changes

Pass criteria:

  • desired_state: "on" throughout both events
  • No OFF at 14:03 (boundary between them)
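  • One or two transitions in total: one at the start of Event A, plus at most one further transition if the scheduler detects a semantic change in the intent at the A/B boundary (desired_state must still remain "on")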
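The boundary check can be expressed over a timestamped capture spanning both events. This hypothetical helper assumes (received_at, payload) pairs in arrival order and ignores anything before the first ON, since start latency is already covered by Scenario 2.

```python
from __future__ import annotations

from datetime import datetime


def check_no_off_blip(
    observations: list[tuple[datetime, dict]],
    a_start: datetime,
    b_end: datetime,
    max_intents: int = 2,
) -> int:
    """Assert desired_state stays "on" from the first ON after a_start until b_end.

    Returns the number of distinct intent_ids seen in that span.
    """
    span = [p for t, p in observations if a_start <= t < b_end]
    # Skip anything before the first ON; start latency is Scenario 2's concern.
    while span and span[0]["desired_state"] != "on":
        span.pop(0)
    assert span, "no ON intent observed during the adjacent events"
    offs = [p for p in span if p["desired_state"] != "on"]
    assert not offs, f"OFF blip observed: {offs[0]}"
    intent_ids = list(dict.fromkeys(p["intent_id"] for p in span))  # keeps order
    assert len(intent_ids) <= max_intents, f"too many transitions: {intent_ids}"
    return len(intent_ids)
```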

5. Heartbeat Republish (Unchanged Intent)

Goal: While the state is unchanged, the intent is republished every poll cycle with the same intent_id.

Instructions:

  1. Create a long-duration event (15+ minutes) for canary group
  2. Subscribe to power intent topic
  3. Capture timestamps and intent_ids for 3 consecutive poll cycles (90 seconds with default 30s polls)
  4. Verify:
    • Payload received at T, T+30s, T+60s
    • Same intent_id across all three
    • Different issued_at timestamps (should increment by ~30s)

Pass criteria:

  • At least 3 payloads received within ~90 seconds
  • Same intent_id for all
  • Each issued_at is later than previous
  • Each expires_at is at least 90 seconds after its issued_at (90 seconds is the minimum expiry window)
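A hypothetical checker for three or more consecutive heartbeats of one unchanged intent follows. The 5-second tolerance on the poll gap is an assumption, and 90 seconds is treated as a minimum expiry window (Scenario 10 calls it the "min expiry"); change the comparison if an exact 90 seconds is required.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def parse_utc(value: str) -> datetime:
    """Parse 'YYYY-MM-DDTHH:MM:SSZ' (second precision assumed)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def check_heartbeats(
    payloads: list[dict],
    poll: timedelta = timedelta(seconds=30),
    tolerance: timedelta = timedelta(seconds=5),  # assumed slack per cycle
    min_expiry: timedelta = timedelta(seconds=90),
) -> list[float]:
    """Check consecutive heartbeats of one unchanged intent; return gaps in seconds."""
    assert len(payloads) >= 3, f"only {len(payloads)} payloads captured"
    intent_ids = {p["intent_id"] for p in payloads}
    assert len(intent_ids) == 1, f"intent_id changed: {sorted(intent_ids)}"
    issued = [parse_utc(p["issued_at"]) for p in payloads]
    gaps = []
    for prev, cur in zip(issued, issued[1:]):
        assert cur > prev, f"issued_at not increasing: {prev} -> {cur}"
        assert abs((cur - prev) - poll) <= tolerance, f"gap {cur - prev} not ~{poll}"
        gaps.append((cur - prev).total_seconds())
    for p, t in zip(payloads, issued):
        window = parse_utc(p["expires_at"]) - t
        assert window >= min_expiry, f"expiry window {window} < {min_expiry}"
    return gaps
```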

6. Scheduler Restart (Immediate Republish)

Goal: On scheduler process start, the active intent is published immediately.

Instructions:

  1. Create and start an event for canary group (duration ≥ 5 minutes)
  2. Wait for event to be active
  3. Kill and restart scheduler process
  4. Check retained topic within 5 seconds after restart
  5. Verify desired_state: "on" and reason: "active_event"

Pass criteria:

  • Correct ON intent retained within 5 seconds of restart
  • No OFF published during restart/reconnect

7. Broker Reconnection (Retained Recovery)

Goal: On MQTT reconnect, the scheduler republishes its cached intents.

Instructions:

  1. Create and start an event for canary group
  2. Subscribe to power intent topic
  3. Note the current intent_id and payload
  4. Restart the Mosquitto broker (simulates a network interruption)
  5. Verify the intent is republished to the retained topic immediately after the scheduler reconnects

Pass criteria:

  • Correct ON intent reappears on retained topic within 5 seconds of broker restart
  • Same intent_id (no new transition UUID)
  • Publish metrics show that retained_republish_total has incremented
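The reconnect criteria reduce to a few comparisons. This is a hypothetical sketch: how retained_republish_total is read is not specified in this checklist, so the two counter readings are passed in as plain integers. The metric check matters because, if broker persistence is enabled, Mosquitto can restore the retained message itself after a restart.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def check_reconnect(
    before: dict,
    after: dict,
    broker_restarted_at: datetime,
    after_received_at: datetime,
    republish_count_before: int,
    republish_count_after: int,
    max_delay: timedelta = timedelta(seconds=5),
) -> None:
    """Assert the post-restart payload is the same ON intent, republished promptly."""
    assert (after["desired_state"], after["reason"]) == ("on", "active_event")
    assert after["intent_id"] == before["intent_id"], "reconnect produced a new intent_id"
    delay = after_received_at - broker_restarted_at
    assert timedelta(0) <= delay <= max_delay, f"republish delay {delay}"
    assert republish_count_after > republish_count_before, (
        "retained_republish_total did not increase"
    )
```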

8. Feature Flag Disable

Goal: No power-intent messages are published when the feature is disabled.

Instructions:

  1. Set POWER_INTENT_PUBLISH_ENABLED=false in scheduler env
  2. Restart scheduler
  3. Create and start a new event for canary group
  4. Subscribe to power intent topic
  5. Wait 90 seconds

Pass criteria:

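  • No new messages on infoscreen/groups/1/power/intent (a retained intent left over from earlier scenarios is still delivered once on subscribe; clear it beforehand or disregard it)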
  • Scheduler logs show no event=power_intent_publish* lines

9. Scheduler Logs Inspection

Goal: Logs contain structured fields for observability.

Instructions:

  1. Run the canary with one active event
  2. Collect scheduler logs for 60 seconds
  3. Filter for event=power_intent_publish lines

Pass criteria:

  • Each log line contains: group_id, desired_state, reason, intent_id, issued_at, expires_at, transition_publish, heartbeat_publish, topic, qos, retained
  • No malformed JSON in payloads
  • Error logs (if any) are specific and actionable
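The field-presence check can be scripted over collected log lines. This is a hypothetical sketch that assumes the scheduler writes logfmt-style key=value pairs (as the event=power_intent_publish filter suggests); it uses shlex so that quoted values stay intact, and tokens without "=" (such as a timestamp prefix) are ignored.

```python
from __future__ import annotations

import shlex

REQUIRED_FIELDS = (
    "group_id", "desired_state", "reason", "intent_id", "issued_at", "expires_at",
    "transition_publish", "heartbeat_publish", "topic", "qos", "retained",
)


def parse_kv(line: str) -> dict[str, str]:
    """Split a key=value log line into a dict; tokens without '=' are ignored.

    Assumes logfmt-style quoting; shlex raises ValueError on unbalanced quotes.
    """
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def log_field_problems(lines: list[str]) -> dict[int, list[str]]:
    """Map 1-based line numbers of event=power_intent_publish lines to their problems."""
    problems = {}
    for number, line in enumerate(lines, start=1):
        try:
            fields = parse_kv(line)
        except ValueError:  # e.g. an unclosed quote
            if "event=power_intent_publish" in line:
                problems[number] = ["unparseable line"]
            continue
        if fields.get("event") != "power_intent_publish":
            continue
        missing = [name for name in REQUIRED_FIELDS if name not in fields]
        if missing:
            problems[number] = missing
    return problems
```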

10. Expiry Validation

Goal: Payloads never published with expires_at <= issued_at.

Instructions:

  1. Capture power-intent payloads for 120+ seconds
  2. Parse issued_at and expires_at for each
  3. Verify expires_at > issued_at for all

Pass criteria:

  • 100% of payloads have a valid expiry window (expires_at > issued_at)
  • The typical delta is 90 seconds (the minimum expiry window)
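A hypothetical sketch for this scenario: it takes the raw captured lines (for example mosquitto_sub output without -v, one JSON payload per line), fails on malformed JSON or a non-positive window, and tallies the deltas so the typical value can be read off. Second-precision timestamps are assumed.

```python
from __future__ import annotations

import json
from collections import Counter
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # second precision, as in Scenario 1's example


def expiry_deltas(raw_payloads: list[str]) -> Counter:
    """Count expires_at - issued_at (whole seconds) across raw JSON payloads.

    Raises json.JSONDecodeError for malformed JSON and AssertionError for any
    payload whose expiry window is not strictly positive.
    """
    deltas = Counter()
    for raw in raw_payloads:
        if not raw.strip():
            continue  # skip blank lines in the capture
        payload = json.loads(raw)
        issued = datetime.strptime(payload["issued_at"], TS_FORMAT)
        expires = datetime.strptime(payload["expires_at"], TS_FORMAT)
        delta = int((expires - issued).total_seconds())
        assert delta > 0, f"invalid expiry window ({delta}s): {raw}"
        deltas[delta] += 1
    return deltas
```

expiry_deltas(lines).most_common(1) gives the typical delta and how often it occurred.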

Summary Report Template

After running all scenarios, capture:

Canary Validation Report
Date: [date]
Scheduler version: [git commit hash]
Test group ID: [id]
Environment: [dev/test/prod]

Scenario Results:
1. Baseline Payload: ✓/✗ [notes]
2. Event Start: ✓/✗ [notes]
3. Event End: ✓/✗ [notes]
4. Adjacent Events: ✓/✗ [notes]
5. Heartbeat Republish: ✓/✗ [notes]
6. Restart: ✓/✗ [notes]
7. Broker Reconnect: ✓/✗ [notes]
8. Feature Flag: ✓/✗ [notes]
9. Logs: ✓/✗ [notes]
10. Expiry Validation: ✓/✗ [notes]

Overall: [Ready for production / Blockers found]
Issues: [list if any]

Rollout Gate

Power-intent Phase 1 is ready for production rollout only when:

  • All 10 scenarios pass
  • Zero unintended OFF between adjacent events
  • All log fields present and correct
  • Feature flag default remains false
  • Transition latency is <= 30 seconds in the nominal case