# Database Connectivity Issues - Analysis & Fixes

## Problem Summary

The NAS container experiences **intermittent database connectivity** failures with the error:

```
Exception during reset or similar
_mysql_connector.MySQLInterfaceError: Lost connection to MySQL server during query
```

Meanwhile, the same stack runs reliably under Docker Desktop, and MySQL Workbench can connect without issues.

---

## Root Causes Identified

### 1. **Aggressive Connection Pool Settings**

- **Old config**: `pool_size=5` + `max_overflow=10` = up to 15 simultaneous connections
- **Problem**: Creates excessive connections that exhaust database resources or hit the server's connection limit
- **Result**: Pool reset failures when trying to return/reset dead connections

### 2. **Insufficient Connection Recycling**

- **Old config**: `pool_recycle=1800` (30 minutes)
- **Problem**: Connections are held too long; the database may time them out and close them (via `wait_timeout`), or the network may kill them
- **Result**: When SQLAlchemy tries to reuse these connections, they are already dead

### 3. **Conflicting autocommit Setting**

- **Old config**: `autocommit=True` in `connect_args`
- **Problem**: With autocommit enabled there is nothing to roll back, but SQLAlchemy still issues a rollback during pool reset
- **Result**: The rollback fails on dead connections → traceback logged

### 4. **Pool Reset on Dead Connections**

- **Config**: `pool_reset_on_return="none"` (correct), but the pool **was not disposed on failure**
- **Problem**: When a connection died, the pool kept trying to reuse it
- **Result**: Repeated failures until the next retry window (30 seconds)
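For reference, the old settings described above combine into an engine configuration roughly like this. This is a sketch for illustration only, not the literal code from `datacollector.py`; the URL, credentials, and variable name are placeholders:

```python
# Sketch of the OLD (problematic) engine setup -- illustration only
from sqlalchemy import create_engine

sql_engine = create_engine(
    "mysql+mysqlconnector://user:password@192.168.43.102:3306/weatherdata",
    pool_size=5,                      # up to 5 persistent connections...
    max_overflow=10,                  # ...plus 10 burst connections = 15 total
    pool_recycle=1800,                # connections may idle for 30 minutes
    pool_reset_on_return="none",      # no reset on return (correct by itself)
    connect_args={"autocommit": True},  # conflicts with rollback during pool reset
)
```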
### 5. **Network/Database Timeout Issues (NAS-specific)**

- **Likely cause**: The NAS MariaDB has aggressive connection timeouts
- **Or**: The container network has higher packet loss/latency than Docker Desktop
- **Or**: Pool exhaustion prevents new connections from being established

---

## Applied Fixes

### ✅ Fix 1: Conservative Connection Pool (Lines 183-195)

```python
pool_size=3,       # Reduced from 5
max_overflow=5,    # Reduced from 10
pool_recycle=300,  # Reduced from 1800 (recycle every 5 mins vs 30 mins)
autocommit=False,  # Removed - let SQLAlchemy manage transactions
```

**Why this works:**

- Fewer simultaneous connections = less resource contention
- Aggressive recycling = avoids stale connections killed by the database
- Proper transaction management = cleaner rollback handling

### ✅ Fix 2: Pool Disposal on Connection Failure (Lines 530-533)

```python
except exc.OperationalError as e:
    sql_engine.dispose()  # ← CRITICAL: force all connections to be closed/recreated
    logger.warning(f"Lost database connectivity: {e}")
```

**Why this works:**

- When a connection fails, dump the entire pool
- The next connection attempt gets fresh connections
- Avoids repeated failures from trying to reuse dead connections

### ✅ Fix 3: Environment Variable Support (Lines 169-175)

```python
DB_HOST = os.getenv("DB_HOST", "192.168.43.102")
DB_PORT = int(os.getenv("DB_PORT", "3306"))
# ... etc.
```
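The elided lines extend the same pattern to the remaining settings. A sketch, assuming the variable names from the compose file in the deployment section (`DB_USER`, `DB_NAME`, `DB_CONNECT_TIMEOUT`); the defaults are illustrative:

```python
import os

# Read connection settings from the environment, falling back to defaults
# (names assumed from the deployment section's compose file)
DB_HOST = os.getenv("DB_HOST", "192.168.43.102")
DB_PORT = int(os.getenv("DB_PORT", "3306"))
DB_USER = os.getenv("DB_USER", "weatherdata")
DB_NAME = os.getenv("DB_NAME", "weatherdata")
DB_CONNECT_TIMEOUT = int(os.getenv("DB_CONNECT_TIMEOUT", "5"))

# Startup log line (matches the format shown in the Monitoring section;
# the password is deliberately never logged)
print(f"DB config: host={DB_HOST}:{DB_PORT}, user={DB_USER}, db={DB_NAME}")
```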
**Why this matters:**

- Different deployments can now use different database hosts
- Docker Desktop can use `192.168.43.102`
- The NAS can use `mariadb` (Docker DNS) or a different IP if needed

---

## Recommended MariaDB Configuration

The NAS MariaDB should have appropriate timeout settings:

```sql
-- Check current settings
SHOW VARIABLES LIKE 'wait_timeout';
SHOW VARIABLES LIKE 'interactive_timeout';
SHOW VARIABLES LIKE 'max_connections';
SHOW VARIABLES LIKE 'max_allowed_packet';
```

Recommended settings (in `/etc/mysql/mariadb.conf.d/50-server.cnf`):

```ini
[mysqld]
wait_timeout = 600        # 10 minutes (allow idle connections longer)
interactive_timeout = 600
max_connections = 100     # Ensure enough for pool + Workbench
max_allowed_packet = 64M
```

---

## Deployment Instructions

### For Docker Desktop:

```bash
# Use default or override in your compose file
docker-compose -f docker-compose.yml up
```

### For NAS:

Update your docker-compose or environment file:

```yaml
environment:
  - DB_HOST=192.168.43.102  # or your NAS's actual IP/hostname
  - DB_PORT=3306
  - DB_USER=weatherdata
  - DB_PASSWORD=cfCU$swM!HfK82%*
  - DB_NAME=weatherdata
  - DB_CONNECT_TIMEOUT=5
```

---

## Monitoring

The application now logs the database configuration at startup:

```
DB config: host=192.168.43.102:3306, user=weatherdata, db=weatherdata
```

Monitor the logs for:

- **"Database reachable again"** → Connection recovered
- **"Lost database connectivity"** → Transient failure detected and pool disposed
- **"Stored batch locally"** → Data queued to SQLite while the DB is unavailable

---

## Testing

### Test 1: Verify Environment Variables

```bash
# Run container with override
docker run -e DB_HOST=test-host ...
```
```bash
python datacollector.py
# Check log: "DB config: host=test-host:3306"
```

### Test 2: Simulate Connection Loss

```python
# In a Python shell connected to the container
import requests
requests.get('http://container:port/shutdown')  # Reconnect simulation
# Should see: "Database still unreachable" → "Database reachable again"
```

### Test 3: Monitor Pool State

Enable pool logging:

```python
echo_pool=True  # Line 195 in datacollector.py
```

---

## Expected Behavior After Fix

- ✅ Connection pool adapts to transient failures
- ✅ Stale connections are recycled frequently
- ✅ Pool is disposed on failure to prevent cascading errors
- ✅ Different environments can specify different hosts
- ✅ Data is cached locally if the database is temporarily unavailable
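The dispose-on-failure logic from Fix 2 generalizes to a small reusable pattern. Below is a minimal sketch with hypothetical names (`run_with_pool_recovery` and `FakeEngine` are not in `datacollector.py`); the exception type is a parameter so the idea can be demonstrated without a live database:

```python
import logging

logger = logging.getLogger("datacollector")

def run_with_pool_recovery(engine, operation, failure_exc):
    """Run operation(engine); on a connectivity failure, dispose the whole
    pool so the next attempt starts from fresh connections instead of
    repeatedly pulling dead ones out of the pool."""
    try:
        return operation(engine)
    except failure_exc as e:
        engine.dispose()  # drop every pooled connection, dead or alive
        logger.warning(f"Lost database connectivity: {e}")
        return None

# Stand-in engine to demonstrate the behavior without a real database
class FakeEngine:
    def __init__(self):
        self.disposed = False
    def dispose(self):
        self.disposed = True

def flaky_query(engine):
    raise ConnectionError("Lost connection to MySQL server during query")

engine = FakeEngine()
result = run_with_pool_recovery(engine, flaky_query, ConnectionError)
print(engine.disposed, result)  # → True None
```

In the real code the caught exception is `sqlalchemy.exc.OperationalError`, as shown in Fix 2, and disposal is done inline rather than through a helper.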