# Recommended Implementation: PPTX-to-PDF Conversion System

## Architecture Overview

**Asynchronous server-side conversion with database tracking**

```
User Upload → API saves PPTX + DB entry → Job in Queue
                                              ↓
Client requests → API checks DB status → PDF ready?  → Download PDF
                                         → Pending?  → "Please wait"
                                         → Failed?   → Retry/Error
```

## 1. Database Schema

```sql
CREATE TABLE media_files (
    id UUID PRIMARY KEY,
    filename VARCHAR(255),
    original_path VARCHAR(512),
    file_type VARCHAR(10),
    mime_type VARCHAR(100),
    uploaded_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE conversions (
    id UUID PRIMARY KEY,
    source_file_id UUID REFERENCES media_files(id) ON DELETE CASCADE,
    target_format VARCHAR(10),   -- 'pdf'
    target_path VARCHAR(512),    -- Path to the generated PDF
    status VARCHAR(20),          -- 'pending', 'processing', 'ready', 'failed'
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    error_message TEXT,
    file_hash VARCHAR(64)        -- Hash of the PPTX for cache invalidation
);

CREATE INDEX idx_conversions_source ON conversions(source_file_id, target_format);
```
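
The central query against this schema is "latest conversion for a given source file and format". The sketch below exercises it with an in-memory SQLite stand-in (UUIDs stored as TEXT); the `latest_conversion` helper name is illustrative, not from the codebase:

```python
import sqlite3
import uuid

# In-memory SQLite stand-in for the schema above (UUIDs stored as TEXT).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE media_files (
    id TEXT PRIMARY KEY,
    filename TEXT,
    original_path TEXT,
    file_type TEXT
);
CREATE TABLE conversions (
    id TEXT PRIMARY KEY,
    source_file_id TEXT REFERENCES media_files(id) ON DELETE CASCADE,
    target_format TEXT,
    target_path TEXT,
    status TEXT,
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    error_message TEXT,
    file_hash TEXT
);
CREATE INDEX idx_conversions_source ON conversions(source_file_id, target_format);
""")

def latest_conversion(conn, source_file_id, target_format="pdf"):
    """The lookup the download endpoint performs: newest matching conversion."""
    return conn.execute(
        "SELECT id, status FROM conversions "
        "WHERE source_file_id = ? AND target_format = ? "
        "ORDER BY started_at DESC LIMIT 1",
        (source_file_id, target_format),
    ).fetchone()

file_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO media_files VALUES (?, 'deck.pptx', '/data/uploads/deck.pptx', 'pptx')",
    (file_id,),
)
conn.execute(
    "INSERT INTO conversions (id, source_file_id, target_format, status, file_hash) "
    "VALUES (?, ?, 'pdf', 'pending', 'abc')",
    (str(uuid.uuid4()), file_id),
)
print(latest_conversion(conn, file_id)[1])  # status of the newest conversion
```

The `idx_conversions_source` index covers exactly the `WHERE` clause of this query.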

## 2. Components

### **API Server (existing)**
- Accepts uploads
- Creates DB entries
- Enqueues jobs
- Delivers status and files

### **Background Worker (new)**
- Runs as a separate process sharing the API codebase (in the API container for the simplest start, or as its own container as shown in the Docker setup below)
- Processes conversion jobs from the queue
- Multiple worker instances can run in parallel
- Technology: Python RQ, Celery, or similar

### **Message Queue**
- Redis (recommended to start: simple, fast)
- Alternative: RabbitMQ for more features

### **Redis Container (new)**
- Separate container for Redis
- Handles the job queue
- Minimal resource footprint

## 3. Detailed Workflow

### **Upload Process:**

```python
@app.post("/upload")
async def upload_file(file):
    # 1. Save the PPTX
    file_path = save_to_disk(file)

    # 2. DB entry for the original file
    file_record = db.create_media_file({
        'filename': file.filename,
        'original_path': file_path,
        'file_type': 'pptx',
    })

    # 3. Create the conversion record
    conversion = db.create_conversion({
        'source_file_id': file_record.id,
        'target_format': 'pdf',
        'status': 'pending',
        'file_hash': calculate_hash(file_path),
    })

    # 4. Enqueue the job (asynchronous!)
    queue.enqueue(convert_to_pdf, conversion.id)

    # 5. Return immediately to the user
    return {
        'file_id': file_record.id,
        'status': 'uploaded',
        'conversion_status': 'pending',
    }
```
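
The `calculate_hash` helper used above is not defined in the sketch; a minimal streaming SHA-256 implementation (chunked, so large decks never load fully into memory) could look like this:

```python
import hashlib

def calculate_hash(file_path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks to keep memory flat."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

The result goes into `conversions.file_hash`; comparing it on re-upload tells you whether a cached PDF is still valid.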

### **Worker Process:**

```python
def convert_to_pdf(conversion_id):
    conversion = db.get_conversion(conversion_id)
    source_file = db.get_media_file(conversion.source_file_id)

    # Status update: processing
    db.update_conversion(conversion_id, {
        'status': 'processing',
        'started_at': now(),
    })

    try:
        # LibreOffice conversion. Note: LibreOffice names the output after
        # the source file's stem, so rename it to the conversion-id path.
        subprocess.run([
            'libreoffice',
            '--headless',
            '--convert-to', 'pdf',
            '--outdir', '/data/converted/',
            source_file.original_path,
        ], check=True)

        stem = Path(source_file.original_path).stem
        produced = f"/data/converted/{stem}.pdf"
        pdf_path = f"/data/converted/{conversion.id}.pdf"
        os.rename(produced, pdf_path)

        # Success
        db.update_conversion(conversion_id, {
            'status': 'ready',
            'target_path': pdf_path,
            'completed_at': now(),
        })

    except Exception as e:
        # Failure
        db.update_conversion(conversion_id, {
            'status': 'failed',
            'error_message': str(e),
            'completed_at': now(),
        })
```
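
To avoid enqueueing duplicate jobs (the same file uploaded twice, or a status poll racing an upload), the enqueue path can first consult the latest conversion row. A sketch of that decision; the function name and inputs are illustrative, not from the codebase:

```python
def should_enqueue(existing_status, existing_hash, current_hash) -> bool:
    """Decide whether a new conversion job is needed for a source file.

    existing_status / existing_hash come from the latest conversion row
    (None if there is none); current_hash is the SHA-256 of the file on disk.
    """
    if existing_status is None:
        return True                      # no conversion yet
    if existing_hash != current_hash:
        return True                      # file changed since last conversion
    if existing_status == "failed":
        return True                      # retry a failed conversion
    return False                         # pending/processing/ready and unchanged
```

With this guard, re-requesting a file whose conversion is already `pending` or `ready` is a no-op, which keeps the queue free of redundant work.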

### **Client Download:**

```python
@app.get("/files/{file_id}/display")
async def get_display_file(file_id):
    file = db.get_media_file(file_id)

    # Only for PPTX: check the PDF conversion
    if file.file_type == 'pptx':
        conversion = db.get_latest_conversion(file.id, target_format='pdf')

        if not conversion:
            # Shouldn't happen, but just to be safe
            trigger_new_conversion(file.id)
            return {'status': 'pending', 'message': 'Conversion is being created'}

        if conversion.status == 'ready':
            return FileResponse(conversion.target_path)

        elif conversion.status == 'failed':
            # Optional: auto-retry
            trigger_new_conversion(file.id)
            return {'status': 'failed', 'error': conversion.error_message}

        else:  # pending or processing
            return {'status': conversion.status, 'message': 'Please wait...'}

    # Serve other file types directly
    return FileResponse(file.original_path)
```
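
Returning JSON with HTTP 200 for a not-yet-ready file makes client handling ambiguous; mapping conversion status to an explicit HTTP status is cheap. The mapping below is a suggestion, not taken from the source:

```python
def status_response(conversion_status: str):
    """Map a conversion status to (http_status, payload) for non-file responses."""
    if conversion_status in ("pending", "processing"):
        # 202 Accepted: request understood, result not ready yet
        return 202, {"status": conversion_status, "message": "Please wait..."}
    if conversion_status == "failed":
        # 502: the conversion backend failed; the client may retry later
        return 502, {"status": "failed"}
    # 'ready' is handled separately by streaming the PDF itself
    return 200, {"status": conversion_status}
```

The Pi5 client can then branch on the status code alone instead of parsing the body first.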

## 4. Docker Setup

```yaml
version: '3.8'

services:
  # Your API server
  api:
    build: ./api
    command: uvicorn main:app --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/converted:/data/converted
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
    depends_on:
      - redis
      - postgres
    restart: unless-stopped

  # Worker (same codebase as the API, different command)
  worker:
    build: ./api  # same build as the API!
    command: python worker.py  # or: rq worker
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/converted:/data/converted
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
    depends_on:
      - redis
      - postgres
    restart: unless-stopped
    # Optional: multiple workers
    deploy:
      replicas: 2

  # Redis - separate container
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    # Optional: persistent configuration
    command: redis-server --appendonly yes
    restart: unless-stopped

  # Your existing Postgres
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=infoscreen
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    restart: unless-stopped

  # Optional: Redis Commander (UI for debugging)
  redis-commander:
    image: rediscommander/redis-commander
    environment:
      - REDIS_HOSTS=local:redis:6379
    ports:
      - "8081:8081"
    depends_on:
      - redis

volumes:
  redis-data:
  postgres-data:
```

## 5. Container Communication

Containers communicate via **Docker's internal network**:

```python
# In your API/worker code:
import redis

# Connection to Redis
redis_client = redis.from_url('redis://redis:6379')
#                                      ^^^^^
#                    Service name = hostname on the Docker network
```

Docker automatically creates DNS entries, so `redis` resolves to the Redis container.
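
Rather than hard-coding the URL, read it from the `REDIS_URL` environment variable that the compose file sets. A small parsing sketch (the `/0` database suffix and the `redis_target` name are assumptions for illustration; `redis.from_url` does the same parsing internally):

```python
import os
from urllib.parse import urlsplit

def redis_target(url=None):
    """Split a redis:// URL into (host, port, db); default matches the compose service."""
    url = url or os.environ.get("REDIS_URL", "redis://redis:6379/0")
    parts = urlsplit(url)
    # hostname 'redis' is resolved by Docker's internal DNS
    return parts.hostname, parts.port or 6379, int(parts.path.lstrip("/") or 0)
```

This keeps local development (e.g. `REDIS_URL=redis://localhost:6379/0`) and the containerized setup on the same code path.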

## 6. Client Behavior (Pi5)

```python
# On the Pi5 client (helpers like api.get / download / show_* are placeholders)
def display_file(file_id):
    while True:
        response = api.get(f"/files/{file_id}/display")

        if response.headers.get('Content-Type') == 'application/pdf':
            # PDF is ready
            pdf_path = download(response)
            subprocess.run(['impressive', pdf_path])
            return

        status = response.json().get('status')
        if status in ('pending', 'processing'):
            # Wait and poll again (a loop rather than recursion, so long
            # waits cannot exhaust the stack)
            show_loading_screen("Presentation is being prepared...")
            time.sleep(5)
        else:
            # Error
            show_error_screen("Error loading presentation")
            return
```
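
A fixed 5-second sleep works, but a capped exponential backoff is gentler on the API when conversions take long. A sketch of the schedule (base, factor, and cap are suggested values, not from the source):

```python
def backoff_schedule(base=5, factor=2, cap=60):
    """Yield retry delays in seconds: 5, 10, 20, 40, then capped at 60."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)
```

Usage on the client: `for delay in backoff_schedule(): poll(); time.sleep(delay)`.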

## 7. Additional Features

### **Cache Invalidation on PPTX Update:**

```python
@app.put("/files/{file_id}")
async def update_file(file_id, new_file):
    # Mark old conversions as obsolete
    db.mark_conversions_as_obsolete(file_id)

    # Update the file
    update_media_file(file_id, new_file)

    # Trigger a new conversion
    trigger_conversion(file_id, 'pdf')
```

### **Status API for Monitoring:**

```python
@app.get("/admin/conversions/status")
async def get_conversion_stats():
    return {
        'pending': db.count(status='pending'),
        'processing': db.count(status='processing'),
        'failed': db.count(status='failed'),
        'avg_duration_seconds': db.avg_duration(),
    }
```
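
The `db.avg_duration()` call above can be computed from the `started_at`/`completed_at` pairs of finished conversions; a sketch over plain POSIX timestamps:

```python
def avg_duration_seconds(spans):
    """Average (completed_at - started_at) over finished conversions.

    `spans` is an iterable of (started, completed) pairs as POSIX
    timestamps (floats); returns None when there is nothing to average.
    """
    durations = [completed - started for started, completed in spans]
    if not durations:
        return None
    return sum(durations) / len(durations)
```

In SQL this is simply `AVG(completed_at - started_at)` over rows with `status = 'ready'`.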

### **Cleanup Job (Cronjob):**

```python
def cleanup_old_conversions():
    # Remove PDFs from deleted files
    db.delete_orphaned_conversions()

    # Clean up old failed conversions
    db.delete_old_failed_conversions(older_than_days=7)
```
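
Because the schema uses `ON DELETE CASCADE`, deleting a media file removes its conversion rows automatically, but the PDF files those rows produced stay on disk. Orphan cleanup therefore compares files against rows; a sketch (function name is illustrative):

```python
from pathlib import Path

def orphaned_pdfs(converted_dir, live_target_paths):
    """PDFs on disk that no conversion row references any more.

    `live_target_paths` is the set of target_path values still in the DB.
    """
    live = {str(Path(p)) for p in live_target_paths}
    return [p for p in Path(converted_dir).glob("*.pdf") if str(p) not in live]
```

A cron job can then unlink each returned path.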

## 8. Redis Container Details

### **Why a Separate Container?**

✅ **Separation of concerns**: each service has its own responsibility
✅ **Independent lifecycle management**: Redis can be restarted/updated independently
✅ **Better scaling**: Redis can be moved to different hardware
✅ **Easier backup**: Redis data can be backed up separately
✅ **Standard Docker pattern**: microservices architecture

### **Resource Usage:**
- RAM: ~10-50 MB for this use case
- CPU: minimal
- Disk: only for persistence (optional)

For 10 clients with occasional PPTX uploads, this footprint is negligible.

## 9. Advantages of This Solution

✅ **Scalable**: workers can be scaled horizontally
✅ **Performant**: clients don't wait for conversion
✅ **Robust**: status tracking and error handling
✅ **Maintainable**: clear separation of responsibilities
✅ **Transparent**: status queryable at any time
✅ **Efficient**: one-time conversion per file
✅ **Future-proof**: easily extensible to other formats
✅ **Professional**: industry-standard architecture

## 10. Migration Path

### **Phase 1 (MVP):**
- 1 worker process in the API container
- Redis for the queue (separate container)
- Basic DB schema
- Simple retry logic

### **Phase 2 (as needed):**
- Multiple worker instances
- Dedicated conversion-service container
- Monitoring & alerting
- Prioritization logic
- Advanced caching strategies

**Start simple, scale when needed!**

## 11. Key Decisions Summary

| Aspect | Decision | Reason |
|--------|----------|--------|
| **Conversion Location** | Server-side | One conversion per file, consistent results |
| **Conversion Timing** | Asynchronous (on upload) | No client waiting time, predictable performance |
| **Data Storage** | Database-tracked | Status visibility, robust error handling |
| **Queue System** | Redis (separate container) | Standard pattern, scalable, maintainable |
| **Worker Architecture** | Background process in API container | Simple start, easy to separate later |

## 12. File Flow Diagram

```
┌─────────────┐
│ User Upload │
│   (PPTX)    │
└──────┬──────┘
       │
       ▼
┌──────────────────┐
│ API Server       │
│ 1. Save PPTX     │
│ 2. Create DB rec │
│ 3. Enqueue job   │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│ Redis Queue      │◄─────┐
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ Worker Process   │      │
│ 1. Get job       │      │
│ 2. Convert PPTX  │      │
│ 3. Update DB     │      │
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ PDF Storage      │      │
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ Client Requests  │      │
│ 1. Check DB      │      │
│ 2. Download PDF  │      │
│ 3. Display       │──────┘
└──────────────────┘
  (via impressive)
```

## 13. Implementation Checklist

### Database Setup
- [ ] Create `media_files` table
- [ ] Create `conversions` table
- [ ] Add indexes for performance
- [ ] Set up foreign key constraints

### API Changes
- [ ] Modify upload endpoint to create DB records
- [ ] Add conversion job enqueueing
- [ ] Implement file download endpoint with status checking
- [ ] Add status API for monitoring
- [ ] Implement cache invalidation on file update

### Worker Setup
- [ ] Create worker script/module
- [ ] Implement LibreOffice conversion logic
- [ ] Add error handling and retry logic
- [ ] Set up logging and monitoring

### Docker Configuration
- [ ] Add Redis container to docker-compose.yml
- [ ] Configure worker container
- [ ] Set up volume mounts for file storage
- [ ] Configure environment variables
- [ ] Set up container dependencies

### Client Updates
- [ ] Modify client to check conversion status
- [ ] Implement retry logic for pending conversions
- [ ] Add loading/waiting screens
- [ ] Implement error handling

### Testing
- [ ] Test upload → conversion → download flow
- [ ] Test multiple concurrent conversions
- [ ] Test error handling (corrupted PPTX, etc.)
- [ ] Test cache invalidation on file update
- [ ] Load test with multiple clients

### Monitoring & Operations
- [ ] Set up logging for conversions
- [ ] Implement cleanup job for old files
- [ ] Add metrics for conversion times
- [ ] Set up alerts for failed conversions
- [ ] Document backup procedures

---

**This architecture provides a solid foundation that's simple to start with but scales professionally as your needs grow!**