feat: crash recovery, service_failed monitoring, broker health fields, command expiry sweep
- Add GET /api/clients/crashed endpoint (process_status=crashed or stale heartbeat)
- Add restart_app command action with the same lifecycle and lockout as reboot_host
- Scheduler: crash auto-recovery loop (CRASH_RECOVERY_ENABLED flag, lockout, MQTT publish)
- Scheduler: unconditional command expiry sweep on every poll cycle (sweep_expired_commands)
- Listener: subscribe to infoscreen/+/service_failed; persist service_failed_at and service_failed_unit
- Listener: extract the broker_connection block from the health payload; persist reconnect_count and last_disconnect_at
- DB migration b1c2d3e4f5a6: add service_failed_at, service_failed_unit, mqtt_reconnect_count, mqtt_last_disconnect_at to clients
- Add GET /api/clients/service_failed and POST /api/clients/<uuid>/clear_service_failed
- Monitoring overview API: include mqtt_reconnect_count and mqtt_last_disconnect_at per client
- Frontend: orange service-failed alert panel (hidden when empty, auto-refresh, acknowledge ("quittieren") action)
- Frontend: MQTT reconnect count and last disconnect time in the client detail panel
- MQTT auth hardening: listener, scheduler, and server use credentials from environment variables; broker enforces allow_anonymous false
- Client command lifecycle foundation: ClientCommand model, reboot_host/shutdown_host actions, full ACK lifecycle
- Docs: TECH-CHANGELOG, DEV-CHANGELOG, MQTT_EVENT_PAYLOAD_GUIDE, copilot-instructions updated
- Add implementation-plans/, RESTART_VALIDATION_CHECKLIST.md, TODO.md
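The crash auto-recovery bullets combine three checks in each scheduler pass: the CRASH_RECOVERY_ENABLED flag, the crash criteria behind /api/clients/crashed (process_status=crashed or a stale heartbeat), and the per-client lockout. The following is a minimal sketch of that selection step under stated assumptions. The ClientState shape, the 3-minute staleness threshold, the 15-minute lockout window, the accepted flag spellings, and every function name here are hypothetical and do not come from the commit.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

# Hypothetical values: the commit does not state the real thresholds.
HEARTBEAT_STALE_AFTER = timedelta(minutes=3)
RECOVERY_LOCKOUT = timedelta(minutes=15)


@dataclass
class ClientState:
    """Hypothetical view of the client fields the recovery loop needs."""
    uuid: str
    process_status: Optional[str]
    last_heartbeat_at: Optional[datetime]
    last_restart_issued_at: Optional[datetime] = None


def crash_recovery_enabled() -> bool:
    # CRASH_RECOVERY_ENABLED comes from the commit; the accepted truthy spellings are assumed.
    value = os.environ.get("CRASH_RECOVERY_ENABLED", "false")
    return value.strip().lower() in {"1", "true", "yes"}


def is_crashed(client: ClientState, now: datetime) -> bool:
    """Crashed = process_status 'crashed' or a heartbeat older than the threshold."""
    if client.process_status == "crashed":
        return True
    # Assumption: a client that never sent a heartbeat is not treated as stale.
    if client.last_heartbeat_at is None:
        return False
    return now - client.last_heartbeat_at > HEARTBEAT_STALE_AFTER


def in_lockout(client: ClientState, now: datetime) -> bool:
    """True while a previous restart is still inside the lockout window."""
    issued = client.last_restart_issued_at
    return issued is not None and now - issued < RECOVERY_LOCKOUT


def select_recovery_targets(clients: Iterable[ClientState],
                            now: Optional[datetime] = None) -> List[str]:
    """UUIDs that one scheduler pass would send restart_app to."""
    if not crash_recovery_enabled():
        return []
    if now is None:
        now = datetime.now(timezone.utc)
    return [c.uuid for c in clients if is_crashed(c, now) and not in_lockout(c, now)]
```

Each returned UUID would then receive a restart_app command and an MQTT publish. That part is not sketched. Keeping the selection a pure function of the client list and `now` makes the lockout testable without a broker.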
@@ -147,6 +147,14 @@ class Client(Base):
    screen_health_status = Column(Enum(ScreenHealthStatus), nullable=True, server_default='UNKNOWN')
    last_screenshot_hash = Column(String(32), nullable=True)

    # Systemd service-failed tracking
    service_failed_at = Column(TIMESTAMP(timezone=True), nullable=True)
    service_failed_unit = Column(String(128), nullable=True)

    # MQTT broker connection health
    mqtt_reconnect_count = Column(Integer, nullable=True)
    mqtt_last_disconnect_at = Column(TIMESTAMP(timezone=True), nullable=True)


class ClientLog(Base):
    __tablename__ = 'client_logs'
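The listener fills the four new Client columns from two MQTT sources: messages on infoscreen/+/service_failed, and the broker_connection block of the health payload. The following sketch expresses that mapping as plain functions that return column updates. Several details are assumptions rather than the listener's actual code: the payload key names (unit, reconnect_count, last_disconnect_at), storing the receive time as service_failed_at, and the helper names.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def uuid_from_service_failed_topic(topic: str) -> Optional[str]:
    """Return the client UUID for topics matching infoscreen/+/service_failed."""
    parts = topic.split("/")
    if len(parts) == 3 and parts[0] == "infoscreen" and parts[1] and parts[2] == "service_failed":
        return parts[1]
    return None


def service_failed_update(payload: bytes, received_at: datetime) -> dict:
    """Column updates for a service_failed message (payload key 'unit' is assumed)."""
    try:
        data = json.loads(payload or b"{}")
    except ValueError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    unit = data.get("unit")
    return {
        # Assumption: the receive time is stored, not a client-side timestamp.
        "service_failed_at": received_at,
        # Truncate to the String(128) column width.
        "service_failed_unit": str(unit)[:128] if unit else None,
    }


def broker_connection_update(health: dict) -> dict:
    """Column updates from the health payload's broker_connection block."""
    block = health.get("broker_connection")
    if not isinstance(block, dict):
        return {}  # no block (e.g. older clients): leave the columns untouched
    update = {}
    count = block.get("reconnect_count")
    if isinstance(count, int) and not isinstance(count, bool) and count >= 0:
        update["mqtt_reconnect_count"] = count
    last = block.get("last_disconnect_at")
    if isinstance(last, str):
        try:
            # fromisoformat() rejects a trailing "Z" before Python 3.11.
            parsed = datetime.fromisoformat(last.replace("Z", "+00:00"))
        except ValueError:
            parsed = None
        if parsed is not None:
            if parsed.tzinfo is None:
                parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC
            update["mqtt_last_disconnect_at"] = parsed
    return update
```

Returning an empty dict when the block is missing lets the caller apply the update unconditionally without overwriting earlier values with NULL.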
@@ -164,6 +172,33 @@ class ClientLog(Base):
    )


class ClientCommand(Base):
    __tablename__ = 'client_commands'

    id = Column(Integer, primary_key=True, autoincrement=True)
    command_id = Column(String(36), nullable=False, unique=True, index=True)
    client_uuid = Column(String(36), ForeignKey('clients.uuid', ondelete='CASCADE'), nullable=False, index=True)
    action = Column(String(32), nullable=False, index=True)
    status = Column(String(40), nullable=False, index=True)
    reason = Column(Text, nullable=True)
    requested_by = Column(Integer, ForeignKey('users.id', ondelete='SET NULL'), nullable=True, index=True)
    issued_at = Column(TIMESTAMP(timezone=True), nullable=False)
    expires_at = Column(TIMESTAMP(timezone=True), nullable=False)
    published_at = Column(TIMESTAMP(timezone=True), nullable=True)
    acked_at = Column(TIMESTAMP(timezone=True), nullable=True)
    execution_started_at = Column(TIMESTAMP(timezone=True), nullable=True)
    completed_at = Column(TIMESTAMP(timezone=True), nullable=True)
    failed_at = Column(TIMESTAMP(timezone=True), nullable=True)
    error_code = Column(String(64), nullable=True)
    error_message = Column(Text, nullable=True)
    created_at = Column(TIMESTAMP(timezone=True), server_default=func.current_timestamp(), nullable=False)
    updated_at = Column(TIMESTAMP(timezone=True), server_default=func.current_timestamp(), onupdate=func.current_timestamp(), nullable=False)

    __table_args__ = (
        Index('ix_client_commands_client_status_created', 'client_uuid', 'status', 'created_at'),
    )


class EventType(enum.Enum):
    presentation = "presentation"
    website = "website"
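The ClientCommand model stores one timestamp per lifecycle step plus expires_at. That is enough to drive both the ACK lifecycle and the per-poll expiry sweep. The following is a sketch in plain Python rather than SQLAlchemy. It rests on several assumptions: a hypothetical status vocabulary (queued, published, acked, execution_started, completed, failed, expired), hypothetical allowed transitions, a hypothetical 5-minute default TTL, and stamping an expired command in failed_at. sweep_expired only illustrates the idea behind the commit's sweep_expired_commands; it is not that function.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

DEFAULT_TTL = timedelta(minutes=5)  # hypothetical command lifetime

# Hypothetical status vocabulary; each step mirrors one timestamp column.
TRANSITIONS = {
    "queued": {"published", "expired"},
    "published": {"acked", "failed", "expired"},
    "acked": {"execution_started", "failed", "expired"},
    "execution_started": {"completed", "failed"},  # never expired mid-execution
}
STAMP_COLUMN = {
    "published": "published_at",
    "acked": "acked_at",
    "execution_started": "execution_started_at",
    "completed": "completed_at",
    "failed": "failed_at",
    "expired": "failed_at",  # assumption: expiry is recorded as a failure
}


@dataclass
class Command:
    command_id: str
    client_uuid: str
    action: str
    issued_at: datetime
    expires_at: datetime
    status: str = "queued"
    published_at: Optional[datetime] = None
    acked_at: Optional[datetime] = None
    execution_started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    failed_at: Optional[datetime] = None
    error_code: Optional[str] = None


def issue(command_id: str, client_uuid: str, action: str,
          now: Optional[datetime] = None, ttl: timedelta = DEFAULT_TTL) -> Command:
    """Create a queued command whose expires_at is issued_at + ttl."""
    if now is None:
        now = datetime.now(timezone.utc)
    return Command(command_id, client_uuid, action, issued_at=now, expires_at=now + ttl)


def transition(cmd: Command, new_status: str, at: datetime) -> None:
    """Move to new_status if allowed and stamp the matching timestamp column."""
    if new_status not in TRANSITIONS.get(cmd.status, set()):
        raise ValueError(f"illegal transition {cmd.status} -> {new_status}")
    cmd.status = new_status
    setattr(cmd, STAMP_COLUMN[new_status], at)


def sweep_expired(commands: Iterable[Command], now: datetime) -> List[str]:
    """Expire every command past expires_at whose status still allows it."""
    expired = []
    for cmd in commands:
        if cmd.expires_at <= now and "expired" in TRANSITIONS.get(cmd.status, set()):
            transition(cmd, "expired", now)
            cmd.error_code = "expired"  # hypothetical error code
            expired.append(cmd.command_id)
    return expired
```

In this sketch, commands already in execution_started are deliberately left alone, because a reboot or restart may legitimately outlive expires_at. The commit does not say whether the real sweep behaves the same way. Running the sweep on every poll is idempotent here, since expired commands are terminal.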