DB/model
- Add Conversion model + ConversionStatus enum (pending, processing, ready, failed)
- Alembic migrations: create conversions table, indexes, unique (source_event_media_id, target_format, file_hash), and NOT NULL on file_hash
API
- Enqueue on upload (ppt|pptx|odp) in routes/eventmedia.py: compute sha256, upsert Conversion, enqueue job
- New routes:
  - POST /api/conversions/<media_id>/pdf: ensure/enqueue conversion
  - GET /api/conversions/<media_id>/status: latest status/details
  - GET /api/files/converted/<path>: serve converted PDFs
- Register conversions blueprint in wsgi
Worker
- server/worker.py: convert_event_media_to_pdf
  - Calls Gotenberg /forms/libreoffice/convert, writes to server/media/converted/
  - Updates Conversion status, timestamps, error messages
- Fix media root resolution to /server/media
- Prefer function enqueue over string path; expose server.worker in package init for RQ string compatibility
Queue/infra
- server/task_queue.py: RQ queue helper (REDIS_URL, default redis://redis:6379/0)
- docker-compose:
  - Add redis and gotenberg services
  - Add worker service (rq worker conversions)
  - Pass REDIS_URL and GOTENBERG_URL to server/worker
  - Mount shared media volume in prod for API/worker parity
- docker-compose.override:
  - Add dev redis/gotenberg/worker services
  - Ensure PYTHONPATH + working_dir allow importing server.worker
  - Use rq CLI instead of python -m rq for worker
  - Dashboard dev: run as appropriate user/root and pre-create/chown caches to avoid EACCES
Dashboard dev UX
- Vite: set cacheDir .vite to avoid EACCES in node_modules
- Disable Node inspector by default to avoid port conflicts
Docs
- Update copilot-instructions.md with conversion system: flow, services, env vars, endpoints, storage paths, and data model
Recommended Implementation: PPTX-to-PDF Conversion System
Architecture Overview
Asynchronous server-side conversion with database tracking
User Upload → API saves PPTX + DB entry → Job in Queue
↓
Client requests → API checks DB status → PDF ready? → Download PDF
                                       → Pending? → "Please wait"
                                       → Failed? → Retry/Error
1. Database Schema
CREATE TABLE media_files (
    id            UUID PRIMARY KEY,
    filename      VARCHAR(255),
    original_path VARCHAR(512),
    file_type     VARCHAR(10),
    mime_type     VARCHAR(100),
    uploaded_at   TIMESTAMP,
    updated_at    TIMESTAMP
);

CREATE TABLE conversions (
    id             UUID PRIMARY KEY,
    source_file_id UUID REFERENCES media_files(id) ON DELETE CASCADE,
    target_format  VARCHAR(10),   -- 'pdf'
    target_path    VARCHAR(512),  -- Path to generated PDF
    status         VARCHAR(20),   -- 'pending', 'processing', 'ready', 'failed'
    started_at     TIMESTAMP,
    completed_at   TIMESTAMP,
    error_message  TEXT,
    file_hash      VARCHAR(64)    -- Hash of PPTX for cache invalidation
);

CREATE INDEX idx_conversions_source ON conversions(source_file_id, target_format);
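The four values of the `status` column behave like a small state machine. The following is a minimal sketch of how the allowed transitions could be checked in application code. The names `ConversionStatus`, `ALLOWED_TRANSITIONS` and `can_transition` are illustrative, and the `failed` → `pending` retry edge is an assumption rather than something the schema enforces.

```python
from enum import Enum


class ConversionStatus(str, Enum):
    """Status values stored in conversions.status."""
    PENDING = "pending"
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"


# Assumed transition rules: a failed conversion may be re-queued,
# a ready conversion is final.
ALLOWED_TRANSITIONS = {
    ConversionStatus.PENDING: {ConversionStatus.PROCESSING},
    ConversionStatus.PROCESSING: {ConversionStatus.READY, ConversionStatus.FAILED},
    ConversionStatus.FAILED: {ConversionStatus.PENDING},
    ConversionStatus.READY: set(),
}


def can_transition(current, new):
    """Return True if moving from `current` to `new` is allowed."""
    return ConversionStatus(new) in ALLOWED_TRANSITIONS[ConversionStatus(current)]
```

A worker could call `can_transition(row.status, "processing")` before picking up a job, so that a duplicate job for an already finished conversion is skipped.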
2. Components
API Server (existing)
- Accepts uploads
- Creates DB entries
- Enqueues jobs
- Delivers status and files
Background Worker (new)
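- Runs as a separate process from the API (in the same container for the MVP; the Docker setup in section 4 runs it as its own service built from the same image)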
- Processes conversion jobs from queue
- Can run multiple worker instances in parallel
- Technology: Python RQ, Celery, or similar
Message Queue
- Redis (recommended to start with: simple, fast)
- Alternative: RabbitMQ for more features
Redis Container (new)
- Separate container for Redis
- Handles job queue
- Minimal resource footprint
3. Detailed Workflow
Upload Process:
@app.post("/upload")
async def upload_file(file):
    # 1. Save PPTX
    file_path = save_to_disk(file)

    # 2. DB entry for original file
    file_record = db.create_media_file({
        'filename': file.filename,
        'original_path': file_path,
        'file_type': 'pptx'
    })

    # 3. Create conversion record
    conversion = db.create_conversion({
        'source_file_id': file_record.id,
        'target_format': 'pdf',
        'status': 'pending',
        'file_hash': calculate_hash(file_path)
    })

    # 4. Enqueue job (asynchronous!)
    queue.enqueue(convert_to_pdf, conversion.id)

    # 5. Return immediately to user
    return {
        'file_id': file_record.id,
        'status': 'uploaded',
        'conversion_status': 'pending'
    }
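The upload handler relies on a `calculate_hash` helper that is not shown. A minimal sketch, assuming SHA-256 is the intended algorithm (its hex digest is 64 characters, which matches the `VARCHAR(64)` `file_hash` column), could look like this. The chunk size is an arbitrary choice.

```python
import hashlib


def calculate_hash(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks so that large presentations
    do not have to fit into memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```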
Worker Process:
import os
import subprocess

def convert_to_pdf(conversion_id):
    conversion = db.get_conversion(conversion_id)
    source_file = db.get_media_file(conversion.source_file_id)

    # Status update: processing
    db.update_conversion(conversion_id, {
        'status': 'processing',
        'started_at': now()
    })

    try:
        # LibreOffice conversion. With --outdir, LibreOffice names the output
        # after the source file (<source name>.pdf), not after the conversion ID.
        out_dir = '/data/converted'
        subprocess.run([
            'libreoffice',
            '--headless',
            '--convert-to', 'pdf',
            '--outdir', out_dir,
            source_file.original_path
        ], check=True)

        # Rename the generated file to the path stored in the DB
        stem = os.path.splitext(os.path.basename(source_file.original_path))[0]
        pdf_path = os.path.join(out_dir, f"{conversion.id}.pdf")
        os.replace(os.path.join(out_dir, f"{stem}.pdf"), pdf_path)

        # Success
        db.update_conversion(conversion_id, {
            'status': 'ready',
            'target_path': pdf_path,
            'completed_at': now()
        })
    except Exception as e:
        # Error
        db.update_conversion(conversion_id, {
            'status': 'failed',
            'error_message': str(e),
            'completed_at': now()
        })
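The path handling in the worker is easier to test when it is separated from the subprocess call. The sketch below is a hypothetical helper (`build_conversion` is not part of the original design). It returns the LibreOffice command, the file LibreOffice is expected to write (named after the source file), and the final path keyed by conversion ID.

```python
from pathlib import Path


def build_conversion(source_path, out_dir, conversion_id):
    """Return (command, produced_path, final_path) for one conversion.

    produced_path: where LibreOffice is expected to write the PDF
                   (source file name with the extension replaced).
    final_path:    where the worker should move it, keyed by conversion ID.
    """
    source = Path(source_path)
    out = Path(out_dir)
    command = [
        "libreoffice",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out),
        str(source),
    ]
    produced_path = out / f"{source.stem}.pdf"
    final_path = out / f"{conversion_id}.pdf"
    return command, produced_path, final_path
```

Keeping this logic free of side effects means the naming can be verified without LibreOffice installed.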
Client Download:
@app.get("/files/{file_id}/display")
async def get_display_file(file_id):
    file = db.get_media_file(file_id)

    # Only for PPTX: check PDF conversion
    if file.file_type == 'pptx':
        conversion = db.get_latest_conversion(file.id, target_format='pdf')

        if not conversion:
            # Shouldn't happen, but just to be safe
            trigger_new_conversion(file.id)
            return {'status': 'pending', 'message': 'Conversion is being created'}

        if conversion.status == 'ready':
            return FileResponse(conversion.target_path)
        elif conversion.status == 'failed':
            # Optional: Auto-retry
            trigger_new_conversion(file.id)
            return {'status': 'failed', 'error': conversion.error_message}
        else:  # pending or processing
            return {'status': conversion.status, 'message': 'Please wait...'}

    # Serve other file types directly
    return FileResponse(file.original_path)
4. Docker Setup
version: '3.8'

services:
  # Your API Server
  api:
    build: ./api
    command: uvicorn main:app --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/converted:/data/converted
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
    depends_on:
      - redis
      - postgres
    restart: unless-stopped

  # Worker (same codebase as API, different command)
  worker:
    build: ./api  # Same build as API!
    command: python worker.py  # or: rq worker
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/converted:/data/converted
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
    depends_on:
      - redis
      - postgres
    restart: unless-stopped
    # Optional: Multiple workers
    deploy:
      replicas: 2

  # Redis - separate container
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    # Optional: persistent configuration
    command: redis-server --appendonly yes
    restart: unless-stopped

  # Your existing Postgres
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=infoscreen
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    restart: unless-stopped

  # Optional: Redis Commander (UI for debugging)
  redis-commander:
    image: rediscommander/redis-commander
    environment:
      - REDIS_HOSTS=local:redis:6379
    ports:
      - "8081:8081"
    depends_on:
      - redis

volumes:
  redis-data:
  postgres-data:
5. Container Communication
Containers communicate via Docker's internal network:
# In your API/Worker code:
import redis

# Connection to Redis
redis_client = redis.from_url('redis://redis:6379')
#                                      ^^^^^
#                  Compose service name = hostname on the Docker network
Docker automatically creates DNS entries, so redis resolves to the Redis container.
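Because the hostname depends on the Compose service name, it helps to read the connection URL from the `REDIS_URL` environment variable rather than hard-coding it. The sketch below uses only the standard library to show how such a URL breaks down. The helper name `redis_settings` and the default URL are assumptions for illustration.

```python
import os
from urllib.parse import urlparse

# Assumed default: the Compose service named "redis", database 0
DEFAULT_REDIS_URL = "redis://redis:6379/0"


def redis_settings(env=None):
    """Parse REDIS_URL into host, port and database number."""
    env = os.environ if env is None else env
    url = urlparse(env.get("REDIS_URL", DEFAULT_REDIS_URL))
    db = int(url.path.lstrip("/") or 0)  # a missing path means database 0
    return {"host": url.hostname, "port": url.port or 6379, "db": db}
```

Outside Docker (for example when running the worker on the host during development), setting `REDIS_URL=redis://localhost:6379` is then enough to point the same code at a local Redis.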
6. Client Behavior (Pi5)
# On the Pi5 client
import subprocess
import time

def display_file(file_id):
    # Loop instead of recursing, so long waits cannot exhaust the call stack
    while True:
        response = api.get(f"/files/{file_id}/display")

        if response.headers.get('Content-Type', '').startswith('application/pdf'):
            # PDF is ready (download_and_display is assumed to return the local path)
            downloaded_pdf = download_and_display(response)
            subprocess.run(['impressive', downloaded_pdf])
            return

        if response.json()['status'] in ['pending', 'processing']:
            # Wait and retry
            show_loading_screen("Presentation is being prepared...")
            time.sleep(5)
            continue

        # Error
        show_error_screen("Error loading presentation")
        return
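The wait-and-retry step can also be bounded so that a stuck conversion does not keep the client polling forever. This is a minimal sketch with hypothetical names: `fetch_status` stands in for whatever call returns the current conversion status, and the attempt limit and delay are arbitrary defaults.

```python
import time


def wait_for_pdf(fetch_status, max_attempts=60, delay_seconds=5, sleep=time.sleep):
    """Poll until the status is 'ready' or 'failed', or attempts run out.

    Returns 'ready', 'failed', or 'timeout'.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in ("ready", "failed"):
            return status
        # Still pending or processing: wait before the next poll,
        # except after the final attempt.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    return "timeout"
```

Passing `sleep` as a parameter keeps the function testable without real delays.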
7. Additional Features
Cache Invalidation on PPTX Update:
@app.put("/files/{file_id}")
async def update_file(file_id, new_file):
    # Mark old conversions as obsolete
    db.mark_conversions_as_obsolete(file_id)

    # Update file
    update_media_file(file_id, new_file)

    # Trigger new conversion
    trigger_conversion(file_id, 'pdf')
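The `file_hash` column makes it possible to skip reconversion when an "updated" file is byte-for-byte identical to the one already converted. A sketch of that check, assuming SHA-256 hex digests are stored (the function name `needs_reconversion` is illustrative):

```python
import hashlib


def needs_reconversion(new_content, stored_hash):
    """Return True if the uploaded bytes differ from the converted version.

    `stored_hash` is the file_hash of the latest conversion, or None
    if the file has never been converted.
    """
    return hashlib.sha256(new_content).hexdigest() != stored_hash
```

The update endpoint could call this first and only mark conversions obsolete when it returns `True`.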
Status API for Monitoring:
@app.get("/admin/conversions/status")
async def get_conversion_stats():
    return {
        'pending': db.count(status='pending'),
        'processing': db.count(status='processing'),
        'failed': db.count(status='failed'),
        'avg_duration_seconds': db.avg_duration()
    }
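The `avg_duration_seconds` value can be derived from the `started_at` and `completed_at` columns. The following sketch computes it in Python over already-fetched rows represented as dicts, which is an assumption. In practice the same calculation could be done in SQL.

```python
from datetime import datetime


def average_duration_seconds(conversions):
    """Average conversion time in seconds, ignoring unfinished rows.

    Returns None when no conversion has both timestamps set.
    """
    durations = [
        (c["completed_at"] - c["started_at"]).total_seconds()
        for c in conversions
        if c.get("started_at") and c.get("completed_at")
    ]
    return sum(durations) / len(durations) if durations else None
```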
Cleanup Job (Cronjob):
def cleanup_old_conversions():
    # Remove PDFs from deleted files
    db.delete_orphaned_conversions()

    # Clean up old failed conversions
    db.delete_old_failed_conversions(older_than_days=7)
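The cleanup above covers database rows. The PDF files on disk need the same treatment, since `ON DELETE CASCADE` removes the row but not the file. A minimal sketch, assuming the set of still-referenced `target_path` values has been loaded from the database (the function name is hypothetical):

```python
from pathlib import Path


def remove_orphaned_pdfs(converted_dir, known_paths):
    """Delete PDFs in `converted_dir` that no conversion row references.

    Returns the names of the removed files.
    """
    known = {Path(p).resolve() for p in known_paths}
    removed = []
    for pdf in sorted(Path(converted_dir).glob("*.pdf")):
        if pdf.resolve() not in known:
            pdf.unlink()
            removed.append(pdf.name)
    return removed
```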
8. Redis Container Details
Why Separate Container?
✅ Separation of Concerns: Each service has its own responsibility
✅ Independent Lifecycle Management: Redis can be restarted/updated independently
✅ Better Scaling: Redis can be moved to different hardware
✅ Easier Backup: Redis data can be backed up separately
✅ Standard Docker Pattern: Microservices architecture
Resource Usage:
- RAM: ~10-50 MB for your use case
- CPU: Minimal
- Disk: Only for persistence (optional)
For 10 clients with occasional PPTX uploads, this footprint is negligible.
9. Advantages of This Solution
✅ Scalable: Workers can be scaled horizontally
✅ Performant: Clients don't wait for conversion
✅ Robust: Status tracking and error handling
✅ Maintainable: Clear separation of responsibilities
✅ Transparent: Status queryable at any time
✅ Efficient: One-time conversion per file
✅ Future-proof: Easily extensible for other formats
✅ Professional: Industry-standard architecture
10. Migration Path
Phase 1 (MVP):
- 1 worker process in API container
- Redis for queue (separate container)
- Basic DB schema
- Simple retry logic
Phase 2 (as needed):
- Multiple worker instances
- Dedicated conversion service container
- Monitoring & alerting
- Prioritization logic
- Advanced caching strategies
Start simple, scale when needed!
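For the "simple retry logic" in Phase 1, one common option is to space out retries with a capped exponential backoff. The sketch below is only an illustration. The function name, base delay, and cap are assumptions, not values from this design.

```python
def retry_delay(attempt, base_seconds=5, cap_seconds=300):
    """Seconds to wait before retry number `attempt` (0-based).

    The delay doubles with each attempt and is capped at `cap_seconds`.
    """
    return min(cap_seconds, base_seconds * 2 ** attempt)
```

With these defaults the first retries wait 5, 10, 20 and 40 seconds, and later ones never wait more than 5 minutes.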
11. Key Decisions Summary
| Aspect | Decision | Reason |
|---|---|---|
| Conversion Location | Server-side | One conversion per file, consistent results |
| Conversion Timing | Asynchronous (on upload) | No client waiting time, predictable performance |
| Data Storage | Database-tracked | Status visibility, robust error handling |
| Queue System | Redis (separate container) | Standard pattern, scalable, maintainable |
| Worker Architecture | Background process in API container | Simple start, easy to separate later |
12. File Flow Diagram
┌─────────────┐
│ User Upload │
│   (PPTX)    │
└──────┬──────┘
       │
       ▼
┌──────────────────┐
│ API Server       │
│ 1. Save PPTX     │
│ 2. Create DB rec │
│ 3. Enqueue job   │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│ Redis Queue      │◄─────┐
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ Worker Process   │      │
│ 1. Get job       │      │
│ 2. Convert PPTX  │      │
│ 3. Update DB     │      │
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ PDF Storage      │      │
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ Client Requests  │      │
│ 1. Check DB      │      │
│ 2. Download PDF  │      │
│ 3. Display       │──────┘
└──────────────────┘
  (via impressive)
13. Implementation Checklist
Database Setup
- Create media_files table
- Create conversions table
- Add indexes for performance
- Set up foreign key constraints
API Changes
- Modify upload endpoint to create DB records
- Add conversion job enqueueing
- Implement file download endpoint with status checking
- Add status API for monitoring
- Implement cache invalidation on file update
Worker Setup
- Create worker script/module
- Implement LibreOffice conversion logic
- Add error handling and retry logic
- Set up logging and monitoring
Docker Configuration
- Add Redis container to docker-compose.yml
- Configure worker container
- Set up volume mounts for file storage
- Configure environment variables
- Set up container dependencies
Client Updates
- Modify client to check conversion status
- Implement retry logic for pending conversions
- Add loading/waiting screens
- Implement error handling
Testing
- Test upload → conversion → download flow
- Test multiple concurrent conversions
- Test error handling (corrupted PPTX, etc.)
- Test cache invalidation on file update
- Load test with multiple clients
Monitoring & Operations
- Set up logging for conversions
- Implement cleanup job for old files
- Add metrics for conversion times
- Set up alerts for failed conversions
- Document backup procedures
This architecture provides a solid foundation that's simple to start with but scales professionally as your needs grow!