Pool sensor v2: VCC monitoring, database resilience, receiver improvements
- Added voltage monitoring table and storage pipeline
- Extended pool payload to 17 bytes with VCC field (protocol v2)
- Improved database connection pool resilience (reduced pool size, aggressive recycling, pool disposal on failure)
- Added environment variable support for database configuration
- Fixed receiver MQTT deprecation warning (CallbackAPIVersion.VERSION2)
- Silenced excessive RSSI status logging in receiver
- Added reset flag tracking and reporting
- Updated Docker compose with DB config and log rotation limits
DATABASE_CONNECTIVITY_FIXES.md (new file, 176 lines):
# Database Connectivity Issues - Analysis & Fixes

## Problem Summary

The NAS container experiences **intermittent database connectivity** failures with the error:

```
Exception during reset or similar
_mysql_connector.MySQLInterfaceError: Lost connection to MySQL server during query
```

Meanwhile, Docker Desktop runs the same stack reliably, and MySQL Workbench can connect without issues.

---
## Root Causes Identified

### 1. **Aggressive Connection Pool Settings**

- **Old config**: `pool_size=5` + `max_overflow=10` = up to 15 simultaneous connections
- **Problem**: Creates excessive connections that exhaust database resources or trigger connection limits
- **Result**: Pool reset failures when trying to return/reset dead connections

### 2. **Insufficient Connection Recycling**

- **Old config**: `pool_recycle=1800` (30 minutes)
- **Problem**: Connections are held too long; the database may time out and close them due to `wait_timeout` or network issues
- **Result**: When SQLAlchemy tries to reuse connections, they're already dead

### 3. **Conflicting autocommit Setting**

- **Old config**: `autocommit=True` in `connect_args`
- **Problem**: With autocommit enabled there is nothing to roll back, but SQLAlchemy still attempts a rollback during pool reset
- **Result**: Rollback fails on dead connections → traceback logged

### 4. **Pool Reset on Dead Connections**

- **Config**: `pool_reset_on_return="none"` (correct), but the pool **was not disposed on failure**
- **Problem**: When a connection dies, the pool keeps trying to reuse it
- **Result**: Repeated failures until the next retry window (30 seconds)

### 5. **Network/Database Timeout Issues (NAS-specific)**

- **Likely cause**: The NAS MariaDB has aggressive connection timeouts
- **Or**: The container network has higher packet loss/latency than Docker Desktop
- **Or**: Pool exhaustion prevents new connections from being established

---
## Applied Fixes

### ✅ Fix 1: Conservative Connection Pool (Lines 183-195)

```python
pool_size=3,        # Reduced from 5
max_overflow=5,     # Reduced from 10
pool_recycle=300,   # Reduced from 1800 (recycle every 5 minutes instead of 30)
autocommit=False,   # Was True - let SQLAlchemy manage transactions
```

**Why this works:**

- Fewer simultaneous connections = less resource contention
- Aggressive recycling = avoids stale connections killed by the database
- Proper transaction management = cleaner rollback handling
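The settings above could be gathered into a single helper so the `create_engine` call site stays readable. This is a sketch, not the project's actual code; the helper name is hypothetical, and the `DB_CONNECT_TIMEOUT` default mirrors the compose file's environment section.

```python
# Sketch: collect Fix 1's pool settings in one place. Hypothetical helper,
# not the actual datacollector.py code; pass the result to create_engine(url, **kwargs).
import os

def engine_kwargs() -> dict:
    """Conservative pool settings: small pool, fast recycle, quick connect timeout."""
    return {
        "pool_size": 3,                  # reduced from 5
        "max_overflow": 5,               # reduced from 10
        "pool_recycle": 300,             # recycle every 5 minutes, well under wait_timeout
        "pool_reset_on_return": "none",  # skip rollback-on-return for dead connections
        "connect_args": {
            "connect_timeout": int(os.getenv("DB_CONNECT_TIMEOUT", "5")),
        },
    }
```

Keeping the settings in one dict also makes it easy to log or assert on them at startup.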
### ✅ Fix 2: Pool Disposal on Connection Failure (Lines 530-533)

```python
except exc.OperationalError as e:
    sql_engine.dispose()  # ← CRITICAL: force all connections to be closed/recreated
    logger.warning(f"Lost database connectivity: {e}")
```

**Why this works:**

- When a connection fails, the entire pool is discarded
- The next connection attempt gets fresh connections
- Avoids repeated failures from trying to reuse dead connections
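The control flow behind Fix 2 can be sketched stand-alone with a stub engine; in the real collector the exception is `sqlalchemy.exc.OperationalError` and the caller queues the batch to SQLite, so the stub classes and `store_batch` name here are purely illustrative.

```python
# Minimal sketch of the dispose-on-failure pattern. Stub classes stand in
# for the SQLAlchemy engine so the flow is runnable on its own.
class OperationalError(Exception):
    pass

class StubEngine:
    def __init__(self):
        self.disposed = 0
        self.fail_next = True  # simulate one dead pooled connection

    def connect(self):
        if self.fail_next:
            self.fail_next = False
            raise OperationalError("Lost connection to MySQL server during query")
        return "connection"

    def dispose(self):
        self.disposed += 1  # all pooled connections closed; next connect is fresh

def store_batch(engine):
    try:
        return engine.connect()
    except OperationalError:
        engine.dispose()  # dump the whole pool, not just the dead connection
        return None       # caller queues the batch locally and retries later
```

The key point: after `dispose()`, the very next attempt succeeds instead of cycling through the remaining dead connections in the pool.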
### ✅ Fix 3: Environment Variable Support (Lines 169-175)

```python
DB_HOST = os.getenv("DB_HOST", "192.168.43.102")
DB_PORT = int(os.getenv("DB_PORT", "3306"))
# ... etc
```

**Why this matters:**

- Different deployments can now use different database hosts
- Docker Desktop can use `192.168.43.102`
- The NAS can use `mariadb` (Docker DNS) or a different IP if needed
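Building on those variables, the full connection URL might be assembled as follows; the `mysql+mysqlconnector` driver string and the `database_url` helper are assumptions, not confirmed project code.

```python
# Sketch: assemble a SQLAlchemy URL from the environment-driven settings.
# Driver string and helper name are assumptions.
import os

DB_HOST = os.getenv("DB_HOST", "192.168.43.102")
DB_PORT = int(os.getenv("DB_PORT", "3306"))
DB_USER = os.getenv("DB_USER", "weatherdata")
DB_NAME = os.getenv("DB_NAME", "weatherdata")

def database_url(password: str) -> str:
    """Combine host, port, user and database name into one connection URL."""
    return f"mysql+mysqlconnector://{DB_USER}:{password}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
```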
---

## Recommended MariaDB Configuration

The NAS MariaDB should have appropriate timeout settings:

```sql
-- Check current settings
SHOW VARIABLES LIKE 'wait_timeout';
SHOW VARIABLES LIKE 'interactive_timeout';
SHOW VARIABLES LIKE 'max_connections';
SHOW VARIABLES LIKE 'max_allowed_packet';
```

Recommended settings (in `/etc/mysql/mariadb.conf.d/50-server.cnf`):

```ini
[mysqld]
wait_timeout        = 600   # 10 minutes (allow idle connections to live longer)
interactive_timeout = 600
max_connections     = 100   # Enough for the pool plus Workbench sessions
max_allowed_packet  = 64M
```
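The relationship between `pool_recycle` and `wait_timeout` is the crux of root cause 2 and can be sanity-checked in one line; this helper and its 50% safety margin are illustrative, not from the project.

```python
# Illustrative check: pool_recycle must sit safely below the server's
# wait_timeout, or the database kills idle connections before SQLAlchemy
# recycles them. The 0.5 margin is an arbitrary safety factor.
def recycle_is_safe(pool_recycle: int, wait_timeout: int, margin: float = 0.5) -> bool:
    return pool_recycle <= wait_timeout * margin
```

With the fixed settings (`pool_recycle=300` vs `wait_timeout=600`), recycling happens at half the server timeout; the old 30-minute recycle fails this check.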
---

## Deployment Instructions

### For Docker Desktop:

```bash
# Use the defaults or override them in your compose file
docker-compose -f docker-compose.yml up
```
### For NAS:

Update your docker-compose or environment file:

```yaml
environment:
  - DB_HOST=192.168.43.102   # or your NAS's actual IP/hostname
  - DB_PORT=3306
  - DB_USER=weatherdata
  - DB_PASSWORD=cfCU$swM!HfK82%*
  - DB_NAME=weatherdata
  - DB_CONNECT_TIMEOUT=5
```
---

## Monitoring

The application now logs the database configuration at startup:

```
DB config: host=192.168.43.102:3306, user=weatherdata, db=weatherdata
```

Monitor the logs for:

- **"Database reachable again"** → Connection recovered
- **"Lost database connectivity"** → Transient failure detected and pool disposed
- **"Stored batch locally"** → Data queued to SQLite while the DB is unavailable

---
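The startup line can be reproduced with the stdlib logger; the logger name `datacollector` and helper function are assumptions for illustration.

```python
# Reconstruction of the startup config log line using the stdlib logger.
# Logger name and function name are assumptions.
import logging

logger = logging.getLogger("datacollector")

def log_db_config(host: str, port: int, user: str, db: str) -> str:
    """Emit (and return) the one-line DB config summary logged at startup."""
    msg = f"DB config: host={host}:{port}, user={user}, db={db}"
    logger.info(msg)
    return msg
```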
## Testing

### Test 1: Verify Environment Variables

```bash
# Run the container with an override
docker run -e DB_HOST=test-host ... python datacollector.py
# Check the log for: "DB config: host=test-host:3306"
```

### Test 2: Simulate Connection Loss

```python
# In a Python shell connected to the container
import requests
requests.get('http://container:port/shutdown')  # Reconnect simulation
# Should see: "Database still unreachable" → "Database reachable again"
```

### Test 3: Monitor Pool State

Enable pool logging:

```python
echo_pool=True  # Line 195 in datacollector.py
```

---
## Expected Behavior After Fix

- ✅ Connection pool adapts to transient failures
- ✅ Stale connections are recycled frequently
- ✅ Pool is disposed on failure to prevent cascading errors
- ✅ Different environments can specify different hosts
- ✅ Data is cached locally if the database is temporarily unavailable