# Recommended Implementation: PPTX-to-PDF Conversion System

## Architecture Overview

**Asynchronous server-side conversion with database tracking**

```
User Upload → API saves PPTX + DB entry → Job in Queue
                                              ↓
Client requests → API checks DB status → PDF ready? → Download PDF
                                       → Pending?   → "Please wait"
                                       → Failed?    → Retry/Error
```

## 1. Database Schema

```sql
CREATE TABLE media_files (
    id UUID PRIMARY KEY,
    filename VARCHAR(255),
    original_path VARCHAR(512),
    file_type VARCHAR(10),
    mime_type VARCHAR(100),
    uploaded_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE conversions (
    id UUID PRIMARY KEY,
    source_file_id UUID REFERENCES media_files(id) ON DELETE CASCADE,
    target_format VARCHAR(10),  -- 'pdf'
    target_path VARCHAR(512),   -- Path to generated PDF
    status VARCHAR(20),         -- 'pending', 'processing', 'ready', 'failed'
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    error_message TEXT,
    file_hash VARCHAR(64)       -- Hash of PPTX for cache invalidation
);

CREATE INDEX idx_conversions_source ON conversions(source_file_id, target_format);
```

## 2. Components

### **API Server (existing)**
- Accepts uploads
- Creates DB entries
- Enqueues jobs
- Delivers status and files

### **Background Worker (new)**
- Runs as a separate process in the **same container** as the API
- Processes conversion jobs from the queue
- Multiple worker instances can run in parallel
- Technology: Python RQ, Celery, or similar

### **Message Queue**
- Redis (recommended to start: simple, fast)
- Alternative: RabbitMQ for more features

### **Redis Container (new)**
- Separate container for Redis
- Handles the job queue
- Minimal resource footprint

## 3. Detailed Workflow

### **Upload Process:**

```python
@app.post("/upload")
async def upload_file(file):
    # 1. Save PPTX
    file_path = save_to_disk(file)

    # 2. DB entry for the original file
    file_record = db.create_media_file({
        'filename': file.filename,
        'original_path': file_path,
        'file_type': 'pptx'
    })

    # 3. Create conversion record
    conversion = db.create_conversion({
        'source_file_id': file_record.id,
        'target_format': 'pdf',
        'status': 'pending',
        'file_hash': calculate_hash(file_path)
    })

    # 4. Enqueue job (asynchronous!)
    queue.enqueue(convert_to_pdf, conversion.id)

    # 5. Return to the user immediately
    return {
        'file_id': file_record.id,
        'status': 'uploaded',
        'conversion_status': 'pending'
    }
```

### **Worker Process:**

```python
import os
import subprocess

def convert_to_pdf(conversion_id):
    conversion = db.get_conversion(conversion_id)
    source_file = db.get_media_file(conversion.source_file_id)

    # Status update: processing
    db.update_conversion(conversion_id, {
        'status': 'processing',
        'started_at': now()
    })

    try:
        # LibreOffice conversion
        subprocess.run([
            'libreoffice', '--headless',
            '--convert-to', 'pdf',
            '--outdir', '/data/converted/',
            source_file.original_path
        ], check=True)

        # LibreOffice names the output after the source file's basename,
        # so rename it to a stable, conversion-specific path
        stem = os.path.splitext(os.path.basename(source_file.original_path))[0]
        pdf_path = f"/data/converted/{conversion.id}.pdf"
        os.rename(f"/data/converted/{stem}.pdf", pdf_path)

        # Success
        db.update_conversion(conversion_id, {
            'status': 'ready',
            'target_path': pdf_path,
            'completed_at': now()
        })
    except Exception as e:
        # Error
        db.update_conversion(conversion_id, {
            'status': 'failed',
            'error_message': str(e),
            'completed_at': now()
        })
```

### **Client Download:**

```python
@app.get("/files/{file_id}/display")
async def get_display_file(file_id):
    file = db.get_media_file(file_id)

    # Only for PPTX: check the PDF conversion
    if file.file_type == 'pptx':
        conversion = db.get_latest_conversion(file.id, target_format='pdf')

        if not conversion:
            # Shouldn't happen, but just to be safe
            trigger_new_conversion(file.id)
            return {'status': 'pending', 'message': 'Conversion is being created'}

        if conversion.status == 'ready':
            return FileResponse(conversion.target_path)
        elif conversion.status == 'failed':
            # Optional: auto-retry
            trigger_new_conversion(file.id)
            return {'status': 'failed', 'error': conversion.error_message}
        else:  # pending or processing
            return {'status': conversion.status, 'message': 'Please wait...'}

    # Serve other file types directly
    return FileResponse(file.original_path)
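# Note: the upload endpoint above calls calculate_hash() without defining it.
# A minimal sketch (assumption: SHA-256 over the file contents; any stable
# digest works for the cache-invalidation column in the schema):
import hashlib

def calculate_hash(file_path, chunk_size=65536):
    # Stream the file in chunks so large PPTX files aren't loaded into RAM
    digest = hashlib.sha256()
    with open(file_path, 'rb') as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()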
```

## 4. Docker Setup

```yaml
version: '3.8'

services:
  # Your API server
  api:
    build: ./api
    command: uvicorn main:app --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/converted:/data/converted
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
    depends_on:
      - redis
      - postgres
    restart: unless-stopped

  # Worker (same codebase as the API, different command)
  worker:
    build: ./api  # Same build as the API!
    command: python worker.py  # or: rq worker
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/converted:/data/converted
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
    depends_on:
      - redis
      - postgres
    restart: unless-stopped
    # Optional: multiple workers
    deploy:
      replicas: 2

  # Redis - separate container
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data  # Optional: persistence
    command: redis-server --appendonly yes
    restart: unless-stopped

  # Your existing Postgres
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=infoscreen
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    restart: unless-stopped

  # Optional: Redis Commander (UI for debugging)
  redis-commander:
    image: rediscommander/redis-commander
    environment:
      - REDIS_HOSTS=local:redis:6379
    ports:
      - "8081:8081"
    depends_on:
      - redis

volumes:
  redis-data:
  postgres-data:
```

## 5. Container Communication

Containers communicate via **Docker's internal network**:

```python
# In your API/worker code:
import redis

# Connection to Redis
redis_client = redis.from_url('redis://redis:6379')
#                                      ^^^^^
#                     Container name = hostname in the Docker network
```

Docker automatically creates DNS entries, so `redis` resolves to the Redis container.
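Since both the API and the worker connect to Redis at startup, it can help to gate them on Redis actually answering, not merely being started. One possible addition to the compose file above (a sketch; `depends_on` with `condition: service_healthy` requires a healthcheck on the referenced service):

```yaml
services:
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  worker:
    depends_on:
      redis:
        condition: service_healthy
```

The same `condition: service_healthy` block can be applied to the `api` service if it opens its Redis connection eagerly.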
## 6. Client Behavior (Pi5)

```python
# On the Pi5 client
def display_file(file_id):
    # Poll in a loop instead of recursing on every retry
    while True:
        response = api.get(f"/files/{file_id}/display")

        if response.content_type == 'application/pdf':
            # PDF is ready: download it, then display it
            downloaded_pdf = download_to_disk(response)
            subprocess.run(['impressive', downloaded_pdf])
            return

        if response.json()['status'] in ('pending', 'processing'):
            # Wait and retry
            show_loading_screen("Presentation is being prepared...")
            time.sleep(5)
            continue

        # Error
        show_error_screen("Error loading presentation")
        return
```

## 7. Additional Features

### **Cache Invalidation on PPTX Update:**

```python
@app.put("/files/{file_id}")
async def update_file(file_id, new_file):
    # Delete old conversions
    db.mark_conversions_as_obsolete(file_id)

    # Update the file
    update_media_file(file_id, new_file)

    # Trigger a new conversion
    trigger_conversion(file_id, 'pdf')
```

### **Status API for Monitoring:**

```python
@app.get("/admin/conversions/status")
async def get_conversion_stats():
    return {
        'pending': db.count(status='pending'),
        'processing': db.count(status='processing'),
        'failed': db.count(status='failed'),
        'avg_duration_seconds': db.avg_duration()
    }
```

### **Cleanup Job (Cronjob):**

```python
def cleanup_old_conversions():
    # Remove PDFs belonging to deleted files
    db.delete_orphaned_conversions()

    # Clean up old failed conversions
    db.delete_old_failed_conversions(older_than_days=7)
```

## 8. Redis Container Details

### **Why a Separate Container?**

✅ **Separation of Concerns**: each service has its own responsibility
✅ **Independent Lifecycle Management**: Redis can be restarted/updated on its own
✅ **Better Scaling**: Redis can be moved to different hardware
✅ **Easier Backup**: Redis data can be backed up separately
✅ **Standard Docker Pattern**: microservices architecture

### **Resource Usage:**
- RAM: ~10-50 MB for this use case
- CPU: minimal
- Disk: only for persistence (optional)

For 10 clients with occasional PPTX uploads, this is no problem at all.
## 9. Advantages of This Solution

✅ **Scalable**: workers can be scaled horizontally
✅ **Performant**: clients don't wait for conversion
✅ **Robust**: status tracking and error handling
✅ **Maintainable**: clear separation of responsibilities
✅ **Transparent**: status can be queried at any time
✅ **Efficient**: one-time conversion per file
✅ **Future-proof**: easily extensible to other formats
✅ **Professional**: industry-standard architecture

## 10. Migration Path

### **Phase 1 (MVP):**
- 1 worker process in the API container
- Redis for the queue (separate container)
- Basic DB schema
- Simple retry logic

### **Phase 2 (as needed):**
- Multiple worker instances
- Dedicated conversion service container
- Monitoring & alerting
- Prioritization logic
- Advanced caching strategies

**Start simple, scale when needed!**

## 11. Key Decisions Summary

| Aspect | Decision | Reason |
|--------|----------|--------|
| **Conversion Location** | Server-side | One conversion per file, consistent results |
| **Conversion Timing** | Asynchronous (on upload) | No client waiting time, predictable performance |
| **Data Storage** | Database-tracked | Status visibility, robust error handling |
| **Queue System** | Redis (separate container) | Standard pattern, scalable, maintainable |
| **Worker Architecture** | Background process in API container | Simple start, easy to separate later |

## 12. File Flow Diagram

```
┌─────────────┐
│ User Upload │
│   (PPTX)    │
└──────┬──────┘
       │
       ▼
┌──────────────────┐
│ API Server       │
│ 1. Save PPTX     │
│ 2. Create DB rec │
│ 3. Enqueue job   │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│ Redis Queue      │◄─────┐
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ Worker Process   │      │
│ 1. Get job       │      │
│ 2. Convert PPTX  │      │
│ 3. Update DB     │      │
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ PDF Storage      │      │
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ Client Requests  │      │
│ 1. Check DB      │      │
│ 2. Download PDF  │      │
│ 3. Display       │──────┘
└──────────────────┘   (via impressive)
```

## 13. Implementation Checklist

### Database Setup
- [ ] Create `media_files` table
- [ ] Create `conversions` table
- [ ] Add indexes for performance
- [ ] Set up foreign key constraints

### API Changes
- [ ] Modify the upload endpoint to create DB records
- [ ] Add conversion job enqueueing
- [ ] Implement the file download endpoint with status checking
- [ ] Add a status API for monitoring
- [ ] Implement cache invalidation on file update

### Worker Setup
- [ ] Create the worker script/module
- [ ] Implement the LibreOffice conversion logic
- [ ] Add error handling and retry logic
- [ ] Set up logging and monitoring

### Docker Configuration
- [ ] Add the Redis container to docker-compose.yml
- [ ] Configure the worker container
- [ ] Set up volume mounts for file storage
- [ ] Configure environment variables
- [ ] Set up container dependencies

### Client Updates
- [ ] Modify the client to check conversion status
- [ ] Implement retry logic for pending conversions
- [ ] Add loading/waiting screens
- [ ] Implement error handling

### Testing
- [ ] Test the upload → conversion → download flow
- [ ] Test multiple concurrent conversions
- [ ] Test error handling (corrupted PPTX, etc.)
- [ ] Test cache invalidation on file update
- [ ] Load-test with multiple clients

### Monitoring & Operations
- [ ] Set up logging for conversions
- [ ] Implement the cleanup job for old files
- [ ] Add metrics for conversion times
- [ ] Set up alerts for failed conversions
- [ ] Document backup procedures

---

**This architecture provides a solid foundation that's simple to start with but scales professionally as your needs grow!**
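As an addendum to the checklist item "Implement retry logic for pending conversions": the polling loop can be kept generic on the client so it is easy to test. A minimal sketch with exponential backoff (function and parameter names are illustrative, not part of the API described above):

```python
import time

def poll_until_ready(fetch_status, timeout=120, initial_delay=2, max_delay=30):
    """Poll a status callback with exponential backoff.

    fetch_status: callable returning 'ready', 'failed', 'pending' or 'processing'.
    Returns the final status, or 'timeout' if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ('ready', 'failed'):
            return status
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # back off: 2s, 4s, 8s, ... capped
    return 'timeout'
```

On the Pi5, `fetch_status` would wrap the `/files/{file_id}/display` call; the fixed `time.sleep(5)` shown in section 6 is the simpler alternative if backoff isn't needed.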