weatherstation-datacollector/DATABASE_CONNECTIVITY_FIXES.md
olaf f55c1fe6f1 Pool sensor v2: VCC monitoring, database resilience, receiver improvements
- Added voltage monitoring table and storage pipeline
- Extended pool payload to 17 bytes with VCC field (protocol v2)
- Improved database connection pool resilience (reduced pool size, aggressive recycling, pool disposal on failure)
- Added environment variable support for database configuration
- Fixed receiver MQTT deprecation warning (CallbackAPIVersion.VERSION2)
- Silenced excessive RSSI status logging in receiver
- Added reset flag tracking and reporting
- Updated Docker compose with DB config and log rotation limits
2026-01-25 11:25:15 +00:00


Database Connectivity Issues - Analysis & Fixes

Problem Summary

The NAS container experiences intermittent database connectivity failures with the error:

Exception during reset or similar
_mysql_connector.MySQLInterfaceError: Lost connection to MySQL server during query

Meanwhile, Docker Desktop deployments work reliably and MySQL Workbench can connect without issues.


Root Causes Identified

1. Aggressive Connection Pool Settings

  • Old config: pool_size=5 + max_overflow=10 = up to 15 simultaneous connections
  • Problem: Creates excessive connections that exhaust database resources or trigger connection limits
  • Result: Pool reset failures when trying to return/reset dead connections

2. Insufficient Connection Recycling

  • Old config: pool_recycle=1800 (30 minutes)
  • Problem: Connections held too long; database may timeout/close them due to wait_timeout or network issues
  • Result: When SQLAlchemy tries to reuse connections, they're already dead
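A complementary safeguard (not among the applied fixes below, but built into SQLAlchemy) is pool_pre_ping, which tests each connection at checkout so one already killed by the server's wait_timeout is replaced transparently. A minimal sketch, using a local SQLite file as a stand-in for the real MariaDB URL:

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool

# Sketch: pool_pre_ping issues a lightweight ping before each checkout, so a
# connection killed server-side is recreated instead of surfacing
# "Lost connection to MySQL server during query". SQLite URL is illustrative.
engine = create_engine(
    "sqlite:///demo.db",
    poolclass=QueuePool,
    pool_size=3,
    max_overflow=5,
    pool_recycle=300,    # proactively refresh connections every 5 minutes
    pool_pre_ping=True,  # detect dead connections before handing them out
)

with engine.connect() as conn:
    assert conn.execute(text("SELECT 1")).scalar() == 1
```

pool_pre_ping adds one round-trip per checkout, which is usually negligible next to the cost of a failed query.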

3. Conflicting autocommit Setting

  • Old config: autocommit=True in connect_args
  • Problem: When autocommit is enabled there is nothing to roll back, yet SQLAlchemy still attempts a rollback during pool reset
  • Result: Rollback fails on dead connections → traceback logged

4. Pool Reset on Dead Connections

  • Old config: pool_reset_on_return="none" was already correct, but the pool was never disposed after a failure
  • Problem: When a connection dies, the pool kept trying to reuse it
  • Result: Repeated failures until the next retry window (30 seconds)

5. Network/Database Timeout Issues (NAS-specific)

  • Likely cause: NAS MariaDB has aggressive connection timeouts
  • Or: Container network has higher packet loss/latency than Docker Desktop
  • Or: Pool exhaustion prevents new connections from being established

Applied Fixes

Fix 1: Conservative Connection Pool (Lines 183-195)

pool_size=3,             # Reduced from 5
max_overflow=5,          # Reduced from 10
pool_recycle=300,        # Reduced from 1800 (every 5 mins vs 30 mins)
# autocommit=True removed from connect_args; let SQLAlchemy manage transactions

Why this works:

  • Fewer simultaneous connections = less resource contention
  • Aggressive recycling = avoids stale connections killed by database
  • Proper transaction management = cleaner rollback handling
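Assembled as keyword arguments to SQLAlchemy's create_engine(), the new settings look roughly like this (a sketch; the values mirror the documented fix, and the connect_timeout default follows the DB_CONNECT_TIMEOUT variable introduced in Fix 3):

```python
import os

def engine_kwargs() -> dict:
    """Conservative pool settings from Fix 1 (sketch; values as documented)."""
    return {
        "pool_size": 3,                  # reduced from 5
        "max_overflow": 5,               # reduced from 10
        "pool_recycle": 300,             # reduced from 1800 (5 min vs 30 min)
        "pool_reset_on_return": "none",  # skip rollback-on-return (root cause 4)
        "connect_args": {
            # autocommit=True removed; SQLAlchemy manages transactions
            "connect_timeout": int(os.getenv("DB_CONNECT_TIMEOUT", "5")),
        },
    }
```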

Fix 2: Pool Disposal on Connection Failure (Lines 530-533)

except exc.OperationalError as e:
    sql_engine.dispose()  # ← CRITICAL: Force all connections to be closed/recreated
    logger.warning(f"Lost database connectivity: {e}")

Why this works:

  • When connection fails, dump the entire pool
  • Next connection attempt gets fresh connections
  • Avoids repeated failures trying to reuse dead connections
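In context, the dispose-on-failure pattern looks roughly like the sketch below. The function name, table, and row shape are illustrative, not the actual datacollector.py code:

```python
import logging
from sqlalchemy import create_engine, exc, text

logger = logging.getLogger("datacollector")

def store_batch(engine, rows) -> bool:
    """Write one batch; on connection failure, dispose the whole pool so the
    next attempt starts from fresh connections (sketch of Fix 2)."""
    try:
        with engine.begin() as conn:
            for row in rows:
                conn.execute(text("INSERT INTO readings (value) VALUES (:v)"), row)
        return True
    except exc.OperationalError as e:
        engine.dispose()  # drop every pooled connection, dead or alive
        logger.warning("Lost database connectivity: %s", e)
        return False
```

Returning False lets the caller fall back to the local SQLite queue instead of retrying immediately against a dead pool.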

Fix 3: Environment Variable Support (Lines 169-175)

DB_HOST = os.getenv("DB_HOST", "192.168.43.102")
DB_PORT = int(os.getenv("DB_PORT", "3306"))
# ... etc

Why this matters:

  • Different deployments can now use different database hosts
  • Docker Desktop can use 192.168.43.102
  • NAS can use mariadb (Docker DNS) or different IP if needed

The NAS MariaDB should have appropriate timeout settings:

-- Check current settings
SHOW VARIABLES LIKE 'wait_timeout';
SHOW VARIABLES LIKE 'interactive_timeout';
SHOW VARIABLES LIKE 'max_connections';
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Recommended settings (in /etc/mysql/mariadb.conf.d/50-server.cnf)
[mysqld]
wait_timeout = 600                 # 10 minutes (allow idle connections longer)
interactive_timeout = 600
max_connections = 100              # Ensure enough for pool + workbench
max_allowed_packet = 64M

Deployment Instructions

For Docker Desktop:

# Use default or override in your compose
docker-compose -f docker-compose.yml up

For NAS:

Update your docker-compose or environment file:

environment:
  - DB_HOST=192.168.43.102  # or your NAS's actual IP/hostname
  - DB_PORT=3306
  - DB_USER=weatherdata
  - DB_PASSWORD=cfCU$$swM!HfK82%*  # '$' escaped as '$$' for Compose variable interpolation
  - DB_NAME=weatherdata
  - DB_CONNECT_TIMEOUT=5

Monitoring

The application now logs database configuration at startup:

DB config: host=192.168.43.102:3306, user=weatherdata, db=weatherdata

Monitor the logs for:

  • "Database reachable again" → Connection recovered
  • "Lost database connectivity" → Transient failure detected and pool disposed
  • "Stored batch locally" → Data queued to SQLite while DB unavailable
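The "Stored batch locally" path can be pictured as a small SQLite-backed queue that is appended to while the database is down and drained once it recovers. A minimal sketch; the table and column names here are assumptions, not the actual schema:

```python
import json
import sqlite3

def store_batch_locally(db_path: str, batch: list) -> None:
    """Append one batch (a list of reading dicts) to the local SQLite queue."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pending_batches (payload TEXT NOT NULL)"
        )
        conn.execute(
            "INSERT INTO pending_batches (payload) VALUES (?)", (json.dumps(batch),)
        )

def drain_pending(db_path: str) -> list:
    """Pop all queued batches, oldest first, once the database is reachable."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT rowid, payload FROM pending_batches ORDER BY rowid"
        ).fetchall()
        conn.execute("DELETE FROM pending_batches")
        return [json.loads(payload) for _, payload in rows]
```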

Testing

Test 1: Verify Environment Variables

# Run container with override
docker run -e DB_HOST=test-host ... python datacollector.py
# Check log: "DB config: host=test-host:3306"

Test 2: Simulate Connection Loss

# In Python shell connected to container
import requests
requests.get('http://container:port/shutdown')  # Reconnect simulation
# Should see: "Database still unreachable" → "Database reachable again"

Test 3: Monitor Pool State

Enable pool logging:

echo_pool=True  # Line 195 in datacollector.py
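Besides echo_pool=True (which writes to the "sqlalchemy.pool" logger), SQLAlchemy's pool events give programmatic visibility into checkouts. A small sketch, with SQLite standing in for the real engine:

```python
from sqlalchemy import create_engine, event, text

# "checkout" is a standard SQLAlchemy pool event that fires each time a
# connection is handed out from the pool; the in-memory SQLite engine
# below is a stand-in for the real MariaDB engine.
engine = create_engine("sqlite://", echo_pool=True)

checkouts = []

@event.listens_for(engine, "checkout")
def _count_checkout(dbapi_conn, conn_record, conn_proxy):
    checkouts.append(conn_record)  # record every pool checkout

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))

assert len(checkouts) == 1
```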

Expected Behavior After Fix

  • Connection pool adapts to transient failures
  • Stale connections are recycled frequently
  • Pool is disposed on failure to prevent cascading errors
  • Different environments can specify different hosts
  • Data is cached locally if database is temporarily unavailable