# Recommended Implementation: PPTX-to-PDF Conversion System

## Architecture Overview

**Asynchronous server-side conversion with database tracking**

```
User Upload → API saves PPTX + DB entry → Job in Queue
                                               ↓
Client requests → API checks DB status → PDF ready? → Download PDF
                                        → Pending?  → "Please wait"
                                        → Failed?   → Retry/Error
```

## 1. Database Schema

```sql
CREATE TABLE media_files (
    id UUID PRIMARY KEY,
    filename VARCHAR(255),
    original_path VARCHAR(512),
    file_type VARCHAR(10),
    mime_type VARCHAR(100),
    uploaded_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE conversions (
    id UUID PRIMARY KEY,
    source_file_id UUID REFERENCES media_files(id) ON DELETE CASCADE,
    target_format VARCHAR(10),   -- 'pdf'
    target_path VARCHAR(512),    -- Path to the generated PDF
    status VARCHAR(20),          -- 'pending', 'processing', 'ready', 'failed'
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    error_message TEXT,
    file_hash VARCHAR(64)        -- Hash of the PPTX for cache invalidation
);

CREATE INDEX idx_conversions_source ON conversions(source_file_id, target_format);
```
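
The `file_hash` column lets the upload path skip re-converting a file that hasn't actually changed. A minimal sketch of the `calculate_hash` helper referenced in the upload code below (assuming SHA-256; the chunked read is so large decks don't load fully into RAM):

```python
import hashlib

def calculate_hash(file_path, chunk_size=65536):
    """SHA-256 of the file contents, streamed in chunks."""
    digest = hashlib.sha256()
    with open(file_path, 'rb') as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()  # 64 hex characters, matching VARCHAR(64)
```

Before enqueueing, the API can compare this hash against the latest conversion's `file_hash` and reuse the existing PDF if they match.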

## 2. Components

### **API Server (existing)**
- Accepts uploads
- Creates DB entries
- Enqueues jobs
- Delivers status and files

### **Background Worker (new)**
- Runs as a separate process in the **same container** as the API
- Processes conversion jobs from the queue
- Multiple worker instances can run in parallel
- Technology: Python RQ, Celery, or similar

### **Message Queue**
- Redis (recommended to start: simple, fast)
- Alternative: RabbitMQ for more features

### **Redis Container (new)**
- Separate container for Redis
- Handles the job queue
- Minimal resource footprint

## 3. Detailed Workflow

### **Upload Process:**

```python
@app.post("/upload")
async def upload_file(file):
    # 1. Save the PPTX
    file_path = save_to_disk(file)

    # 2. DB entry for the original file
    file_record = db.create_media_file({
        'filename': file.filename,
        'original_path': file_path,
        'file_type': 'pptx'
    })

    # 3. Create the conversion record
    conversion = db.create_conversion({
        'source_file_id': file_record.id,
        'target_format': 'pdf',
        'status': 'pending',
        'file_hash': calculate_hash(file_path)
    })

    # 4. Enqueue the job (asynchronous!)
    queue.enqueue(convert_to_pdf, conversion.id)

    # 5. Return to the user immediately
    return {
        'file_id': file_record.id,
        'status': 'uploaded',
        'conversion_status': 'pending'
    }
```
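
The `save_to_disk` helper above is assumed rather than defined; one possible sketch (the UUID prefix avoids collisions between uploads with the same filename, and `/data/uploads` matches the volume mount in the compose file):

```python
import uuid
from pathlib import Path

def save_to_disk(file, upload_dir='/data/uploads'):
    """Persist an uploaded file under a collision-free name and return its path."""
    upload_dir = Path(upload_dir)
    upload_dir.mkdir(parents=True, exist_ok=True)
    target = upload_dir / f"{uuid.uuid4()}_{Path(file.filename).name}"
    target.write_bytes(file.file.read())  # FastAPI's UploadFile exposes .file
    return str(target)
```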

### **Worker Process:**

```python
import subprocess
from pathlib import Path

def convert_to_pdf(conversion_id):
    conversion = db.get_conversion(conversion_id)
    source_file = db.get_media_file(conversion.source_file_id)

    # Status update: processing
    db.update_conversion(conversion_id, {
        'status': 'processing',
        'started_at': now()
    })

    try:
        # LibreOffice conversion
        pdf_path = f"/data/converted/{conversion.id}.pdf"
        subprocess.run([
            'libreoffice',
            '--headless',
            '--convert-to', 'pdf',
            '--outdir', '/data/converted/',
            source_file.original_path
        ], check=True, timeout=300)  # don't let a hung conversion block the worker

        # LibreOffice names the output after the source file, not the conversion id,
        # so move it to the path we track in the DB
        produced = Path('/data/converted') / (Path(source_file.original_path).stem + '.pdf')
        produced.rename(pdf_path)

        # Success
        db.update_conversion(conversion_id, {
            'status': 'ready',
            'target_path': pdf_path,
            'completed_at': now()
        })

    except Exception as e:
        # Error
        db.update_conversion(conversion_id, {
            'status': 'failed',
            'error_message': str(e),
            'completed_at': now()
        })
```
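
Phase 1 of the migration path calls for simple retry logic; a minimal, generic sketch that could wrap the LibreOffice call above (the attempt count and delay are assumptions, not tuned values):

```python
import time

def with_retries(fn, attempts=3, delay_seconds=10):
    """Call fn(); on failure wait and retry, re-raising the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)
```

Wrapping only the conversion call means a transient failure (e.g. a busy LibreOffice instance) doesn't immediately mark the conversion as `failed`.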

### **Client Download:**

```python
@app.get("/files/{file_id}/display")
async def get_display_file(file_id):
    file = db.get_media_file(file_id)

    # Only for PPTX: check the PDF conversion
    if file.file_type == 'pptx':
        conversion = db.get_latest_conversion(file.id, target_format='pdf')

        if not conversion:
            # Shouldn't happen, but just to be safe
            trigger_new_conversion(file.id)
            return {'status': 'pending', 'message': 'Conversion is being created'}

        if conversion.status == 'ready':
            return FileResponse(conversion.target_path)

        elif conversion.status == 'failed':
            # Optional: auto-retry
            trigger_new_conversion(file.id)
            return {'status': 'failed', 'error': conversion.error_message}

        else:  # pending or processing
            return {'status': conversion.status, 'message': 'Please wait...'}

    # Serve other file types directly
    return FileResponse(file.original_path)
```

## 4. Docker Setup

```yaml
version: '3.8'

services:
  # Your API server
  api:
    build: ./api
    command: uvicorn main:app --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/converted:/data/converted
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
    depends_on:
      - redis
      - postgres
    restart: unless-stopped

  # Worker (same codebase as the API, different command)
  worker:
    build: ./api          # Same build as the API!
                          # NB: this image must have LibreOffice installed
    command: python worker.py    # or: rq worker
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/converted:/data/converted
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
    depends_on:
      - redis
      - postgres
    restart: unless-stopped
    # Optional: multiple workers (Swarm only; with plain Compose,
    # use `docker compose up --scale worker=2` instead)
    deploy:
      replicas: 2

  # Redis - separate container
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    # Optional: persistence
    command: redis-server --appendonly yes
    restart: unless-stopped

  # Your existing Postgres
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=infoscreen
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    restart: unless-stopped

  # Optional: Redis Commander (UI for debugging)
  redis-commander:
    image: rediscommander/redis-commander
    environment:
      - REDIS_HOSTS=local:redis:6379
    ports:
      - "8081:8081"
    depends_on:
      - redis

volumes:
  redis-data:
  postgres-data:
```

## 5. Container Communication

Containers communicate via **Docker's internal network**:

```python
# In your API/worker code:
import redis

# Connection to Redis
redis_client = redis.from_url('redis://redis:6379')
#                                      ^^^^^
#          Service name = hostname on the Docker network
```

Docker automatically creates DNS entries, so `redis` resolves to the Redis container.

## 6. Client Behavior (Pi5)

```python
# On the Pi5 client
def display_file(file_id, max_attempts=60):
    for _ in range(max_attempts):  # bounded loop instead of unbounded recursion
        response = api.get(f"/files/{file_id}/display")

        if response.content_type == 'application/pdf':
            # PDF is ready: download and display it
            pdf_path = download(response)
            subprocess.run(['impressive', pdf_path])
            return

        if response.json()['status'] in ['pending', 'processing']:
            # Wait and retry
            show_loading_screen("Presentation is being prepared...")
            time.sleep(5)
            continue

        # Error
        show_error_screen("Error loading presentation")
        return

    show_error_screen("Timed out waiting for conversion")
```
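
The fixed 5-second poll works, but for slow conversions an exponential backoff is kinder to the server. A sketch of a delay generator (base, factor, and cap are illustrative):

```python
def backoff_delays(attempts, base=2.0, factor=1.5, cap=30.0):
    """Yield one wait time per retry, growing geometrically up to a cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor
```

The client loop would then `time.sleep(d)` for each `d` instead of a constant 5 seconds.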

## 7. Additional Features

### **Cache Invalidation on PPTX Update:**

```python
@app.put("/files/{file_id}")
async def update_file(file_id, new_file):
    # Mark old conversions as obsolete
    db.mark_conversions_as_obsolete(file_id)

    # Update the file
    update_media_file(file_id, new_file)

    # Trigger a new conversion
    trigger_conversion(file_id, 'pdf')
```
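
`trigger_conversion` (like `trigger_new_conversion` in the download endpoint) is assumed rather than defined; a sketch with the DB and queue injected, mirroring the job pattern from the upload path (the string job name stands in for the worker function and is illustrative):

```python
def trigger_conversion(db, queue, file_id, target_format='pdf'):
    """Create a fresh 'pending' conversion row and enqueue its job."""
    conversion = db.create_conversion({
        'source_file_id': file_id,
        'target_format': target_format,
        'status': 'pending',
    })
    queue.enqueue('convert_to_pdf', conversion.id)  # job name is illustrative
    return conversion
```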

### **Status API for Monitoring:**

```python
@app.get("/admin/conversions/status")
async def get_conversion_stats():
    return {
        'pending': db.count(status='pending'),
        'processing': db.count(status='processing'),
        'failed': db.count(status='failed'),
        'avg_duration_seconds': db.avg_duration()
    }
```

### **Cleanup Job (Cron):**

```python
def cleanup_old_conversions():
    # Remove PDFs belonging to deleted files
    db.delete_orphaned_conversions()

    # Clean up old failed conversions
    db.delete_old_failed_conversions(older_than_days=7)
```
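
One gap the cleanup job could also cover: a worker that crashes mid-job leaves its row stuck in `'processing'` forever. A sketch of a staleness check the cleanup could use before resetting such rows to `'pending'` (the 15-minute threshold is an assumption):

```python
from datetime import datetime, timedelta, timezone

def is_stale(started_at, max_minutes=15, now=None):
    """True if a 'processing' conversion has been running implausibly long."""
    now = now or datetime.now(timezone.utc)
    return (now - started_at) > timedelta(minutes=max_minutes)
```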

## 8. Redis Container Details

### **Why a Separate Container?**

✅ **Separation of Concerns**: Each service has its own responsibility
✅ **Independent Lifecycle Management**: Redis can be restarted/updated independently
✅ **Better Scaling**: Redis can be moved to different hardware
✅ **Easier Backup**: Redis data can be backed up separately
✅ **Standard Docker Pattern**: Microservices architecture

### **Resource Usage:**
- RAM: ~10-50 MB for this use case
- CPU: minimal
- Disk: only for persistence (optional)

For 10 clients with occasional PPTX uploads, this load is negligible.

## 9. Advantages of This Solution

✅ **Scalable**: Workers can be scaled horizontally
✅ **Performant**: Clients don't wait for conversion
✅ **Robust**: Status tracking and error handling
✅ **Maintainable**: Clear separation of responsibilities
✅ **Transparent**: Status queryable at any time
✅ **Efficient**: One-time conversion per file
✅ **Future-proof**: Easily extensible to other formats
✅ **Professional**: Industry-standard architecture

## 10. Migration Path

### **Phase 1 (MVP):**
- 1 worker process in the API container
- Redis for the queue (separate container)
- Basic DB schema
- Simple retry logic

### **Phase 2 (as needed):**
- Multiple worker instances
- Dedicated conversion-service container
- Monitoring & alerting
- Prioritization logic
- Advanced caching strategies

**Start simple, scale when needed!**

## 11. Key Decisions Summary

| Aspect | Decision | Reason |
|--------|----------|--------|
| **Conversion Location** | Server-side | One conversion per file, consistent results |
| **Conversion Timing** | Asynchronous (on upload) | No client waiting time, predictable performance |
| **Data Storage** | Database-tracked | Status visibility, robust error handling |
| **Queue System** | Redis (separate container) | Standard pattern, scalable, maintainable |
| **Worker Architecture** | Background process in API container | Simple start, easy to separate later |

## 12. File Flow Diagram

```
┌─────────────┐
│ User Upload │
│   (PPTX)    │
└──────┬──────┘
       │
       ▼
┌──────────────────┐
│ API Server       │
│ 1. Save PPTX     │
│ 2. Create DB rec │
│ 3. Enqueue job   │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│ Redis Queue      │◄─────┐
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ Worker Process   │      │
│ 1. Get job       │      │
│ 2. Convert PPTX  │      │
│ 3. Update DB     │      │
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ PDF Storage      │      │
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ Client Requests  │      │
│ 1. Check DB      │      │
│ 2. Download PDF  │      │
│ 3. Display       │──────┘
└──────────────────┘
  (via impressive)
```

## 13. Implementation Checklist

### Database Setup
- [ ] Create `media_files` table
- [ ] Create `conversions` table
- [ ] Add indexes for performance
- [ ] Set up foreign key constraints

### API Changes
- [ ] Modify upload endpoint to create DB records
- [ ] Add conversion job enqueueing
- [ ] Implement file download endpoint with status checking
- [ ] Add status API for monitoring
- [ ] Implement cache invalidation on file update

### Worker Setup
- [ ] Create worker script/module
- [ ] Implement LibreOffice conversion logic
- [ ] Add error handling and retry logic
- [ ] Set up logging and monitoring

### Docker Configuration
- [ ] Add Redis container to docker-compose.yml
- [ ] Configure worker container
- [ ] Set up volume mounts for file storage
- [ ] Configure environment variables
- [ ] Set up container dependencies

### Client Updates
- [ ] Modify client to check conversion status
- [ ] Implement retry logic for pending conversions
- [ ] Add loading/waiting screens
- [ ] Implement error handling

### Testing
- [ ] Test upload → conversion → download flow
- [ ] Test multiple concurrent conversions
- [ ] Test error handling (corrupted PPTX, etc.)
- [ ] Test cache invalidation on file update
- [ ] Load test with multiple clients

### Monitoring & Operations
- [ ] Set up logging for conversions
- [ ] Implement cleanup job for old files
- [ ] Add metrics for conversion times
- [ ] Set up alerts for failed conversions
- [ ] Document backup procedures

---

**This architecture provides a solid foundation that's simple to start with but scales professionally as your needs grow!**