weatherstation-datacollector/DATABASE_CONNECTIVITY_FIXES.md
olaf f55c1fe6f1 Pool sensor v2: VCC monitoring, database resilience, receiver improvements
- Added voltage monitoring table and storage pipeline
- Extended pool payload to 17 bytes with VCC field (protocol v2)
- Improved database connection pool resilience (reduced pool size, aggressive recycling, pool disposal on failure)
- Added environment variable support for database configuration
- Fixed receiver MQTT deprecation warning (CallbackAPIVersion.VERSION2)
- Silenced excessive RSSI status logging in receiver
- Added reset flag tracking and reporting
- Updated Docker compose with DB config and log rotation limits
2026-01-25 11:25:15 +00:00


Database Connectivity Issues - Analysis & Fixes

Problem Summary

The NAS container experiences intermittent database connectivity failures with the error:

Exception during reset or similar
_mysql_connector.MySQLInterfaceError: Lost connection to MySQL server during query

Meanwhile, Docker Desktop deployments work reliably and MySQL Workbench can connect without issues.


Root Causes Identified

1. Aggressive Connection Pool Settings

  • Old config: pool_size=5 + max_overflow=10 = up to 15 simultaneous connections
  • Problem: Creates excessive connections that exhaust database resources or trigger connection limits
  • Result: Pool reset failures when trying to return/reset dead connections

2. Insufficient Connection Recycling

  • Old config: pool_recycle=1800 (30 minutes)
  • Problem: Connections held too long; database may timeout/close them due to wait_timeout or network issues
  • Result: When SQLAlchemy tries to reuse connections, they're already dead
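A complementary safeguard (not among the applied fixes below, but built into SQLAlchemy) is pool_pre_ping, which tests each connection at checkout so one already killed by the server's wait_timeout is replaced transparently. A minimal sketch, using a local SQLite file as a stand-in for the real MariaDB URL:

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool

# Sketch: pool_pre_ping issues a lightweight ping before each checkout, so a
# connection killed server-side is recreated instead of surfacing
# "Lost connection to MySQL server during query". SQLite URL is illustrative.
engine = create_engine(
    "sqlite:///demo.db",
    poolclass=QueuePool,
    pool_size=3,
    max_overflow=5,
    pool_recycle=300,    # proactively refresh connections every 5 minutes
    pool_pre_ping=True,  # detect dead connections before handing them out
)

with engine.connect() as conn:
    assert conn.execute(text("SELECT 1")).scalar() == 1
```

pool_pre_ping adds one round-trip per checkout, which is usually negligible next to the cost of a failed query.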

3. Conflicting autocommit Setting

  • Old config: autocommit=True in connect_args
  • Problem: When autocommit is enabled there is nothing to roll back, yet SQLAlchemy still attempts a rollback during pool reset
  • Result: Rollback fails on dead connections → traceback logged

4. Pool Reset on Dead Connections

  • Old config: pool_reset_on_return="none" was already correct, but the pool was never disposed after a failure
  • Problem: When a connection dies, the pool kept trying to reuse it
  • Result: Repeated failures until the next retry window (30 seconds)

5. Network/Database Timeout Issues (NAS-specific)

  • Likely cause: NAS MariaDB has aggressive connection timeouts
  • Or: Container network has higher packet loss/latency than Docker Desktop
  • Or: Pool exhaustion prevents new connections from being established

Applied Fixes

Fix 1: Conservative Connection Pool (Lines 183-195)

pool_size=3,             # Reduced from 5
max_overflow=5,          # Reduced from 10
pool_recycle=300,        # Reduced from 1800 (every 5 mins vs 30 mins)
# autocommit=True removed from connect_args; let SQLAlchemy manage transactions

Why this works:

  • Fewer simultaneous connections = less resource contention
  • Aggressive recycling = avoids stale connections killed by database
  • Proper transaction management = cleaner rollback handling
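Assembled as keyword arguments to SQLAlchemy's create_engine(), the new settings look roughly like this (a sketch; the values mirror the documented fix, and the connect_timeout default follows the DB_CONNECT_TIMEOUT variable introduced in Fix 3):

```python
import os

def engine_kwargs() -> dict:
    """Conservative pool settings from Fix 1 (sketch; values as documented)."""
    return {
        "pool_size": 3,                  # reduced from 5
        "max_overflow": 5,               # reduced from 10
        "pool_recycle": 300,             # reduced from 1800 (5 min vs 30 min)
        "pool_reset_on_return": "none",  # skip rollback-on-return (root cause 4)
        "connect_args": {
            # autocommit=True removed; SQLAlchemy manages transactions
            "connect_timeout": int(os.getenv("DB_CONNECT_TIMEOUT", "5")),
        },
    }
```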

Fix 2: Pool Disposal on Connection Failure (Lines 530-533)

except exc.OperationalError as e:
    sql_engine.dispose()  # ← CRITICAL: Force all connections to be closed/recreated
    logger.warning(f"Lost database connectivity: {e}")

Why this works:

  • When connection fails, dump the entire pool
  • Next connection attempt gets fresh connections
  • Avoids repeated failures trying to reuse dead connections
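In context, the dispose-on-failure pattern looks roughly like the sketch below. The function name, table, and row shape are illustrative, not the actual datacollector.py code:

```python
import logging
from sqlalchemy import create_engine, exc, text

logger = logging.getLogger("datacollector")

def store_batch(engine, rows) -> bool:
    """Write one batch; on connection failure, dispose the whole pool so the
    next attempt starts from fresh connections (sketch of Fix 2)."""
    try:
        with engine.begin() as conn:
            for row in rows:
                conn.execute(text("INSERT INTO readings (value) VALUES (:v)"), row)
        return True
    except exc.OperationalError as e:
        engine.dispose()  # drop every pooled connection, dead or alive
        logger.warning("Lost database connectivity: %s", e)
        return False
```

Returning False lets the caller fall back to the local SQLite queue instead of retrying immediately against a dead pool.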

Fix 3: Environment Variable Support (Lines 169-175)

DB_HOST = os.getenv("DB_HOST", "192.168.43.102")
DB_PORT = int(os.getenv("DB_PORT", "3306"))
# ... etc

Why this matters:

  • Different deployments can now use different database hosts
  • Docker Desktop can use 192.168.43.102
  • NAS can use mariadb (Docker DNS) or different IP if needed

The NAS MariaDB should have appropriate timeout settings:

-- Check current settings
SHOW VARIABLES LIKE 'wait_timeout';
SHOW VARIABLES LIKE 'interactive_timeout';
SHOW VARIABLES LIKE 'max_connections';
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Recommended settings (in /etc/mysql/mariadb.conf.d/50-server.cnf)
[mysqld]
wait_timeout = 600                 # 10 minutes (allow idle connections longer)
interactive_timeout = 600
max_connections = 100              # Ensure enough for pool + workbench
max_allowed_packet = 64M

Deployment Instructions

For Docker Desktop:

# Use default or override in your compose
docker-compose -f docker-compose.yml up

For NAS:

Update your docker-compose or environment file:

environment:
  - DB_HOST=192.168.43.102  # or your NAS's actual IP/hostname
  - DB_PORT=3306
  - DB_USER=weatherdata
  - DB_PASSWORD=cfCU$$swM!HfK82%*  # '$' escaped as '$$' for Compose variable interpolation
  - DB_NAME=weatherdata
  - DB_CONNECT_TIMEOUT=5

Monitoring

The application now logs database configuration at startup:

DB config: host=192.168.43.102:3306, user=weatherdata, db=weatherdata

Monitor the logs for:

  • "Database reachable again" → Connection recovered
  • "Lost database connectivity" → Transient failure detected and pool disposed
  • "Stored batch locally" → Data queued to SQLite while DB unavailable
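The "Stored batch locally" path can be pictured as a small SQLite-backed queue that is appended to while the database is down and drained once it recovers. A minimal sketch; the table and column names here are assumptions, not the actual schema:

```python
import json
import sqlite3

def store_batch_locally(db_path: str, batch: list) -> None:
    """Append one batch (a list of reading dicts) to the local SQLite queue."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pending_batches (payload TEXT NOT NULL)"
        )
        conn.execute(
            "INSERT INTO pending_batches (payload) VALUES (?)", (json.dumps(batch),)
        )

def drain_pending(db_path: str) -> list:
    """Pop all queued batches, oldest first, once the database is reachable."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT rowid, payload FROM pending_batches ORDER BY rowid"
        ).fetchall()
        conn.execute("DELETE FROM pending_batches")
        return [json.loads(payload) for _, payload in rows]
```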

Testing

Test 1: Verify Environment Variables

# Run container with override
docker run -e DB_HOST=test-host ... python datacollector.py
# Check log: "DB config: host=test-host:3306"

Test 2: Simulate Connection Loss

# In Python shell connected to container
import requests
requests.get('http://container:port/shutdown')  # Reconnect simulation
# Should see: "Database still unreachable" → "Database reachable again"

Test 3: Monitor Pool State

Enable pool logging:

echo_pool=True  # Line 195 in datacollector.py
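Besides echo_pool=True (which writes to the "sqlalchemy.pool" logger), SQLAlchemy's pool events give programmatic visibility into checkouts. A small sketch, with SQLite standing in for the real engine:

```python
from sqlalchemy import create_engine, event, text

# "checkout" is a standard SQLAlchemy pool event that fires each time a
# connection is handed out from the pool; the in-memory SQLite engine
# below is a stand-in for the real MariaDB engine.
engine = create_engine("sqlite://", echo_pool=True)

checkouts = []

@event.listens_for(engine, "checkout")
def _count_checkout(dbapi_conn, conn_record, conn_proxy):
    checkouts.append(conn_record)  # record every pool checkout

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))

assert len(checkouts) == 1
```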

Expected Behavior After Fix

  • Connection pool adapts to transient failures
  • Stale connections are recycled frequently
  • Pool is disposed on failure to prevent cascading errors
  • Different environments can specify different hosts
  • Data is cached locally if database is temporarily unavailable