# Recommended Implementation: PPTX-to-PDF Conversion System with Gotenberg

## Architecture Overview

**Asynchronous server-side conversion using Gotenberg with shared storage**

```
User Upload → API saves PPTX → Job in Queue → Worker calls Gotenberg API
                                                      ↓
                                    Gotenberg converts via shared volume
                                                      ↓
Client requests → API checks DB status → PDF ready? → Download PDF from shared storage
                                       → Pending?   → "Please wait"
                                       → Failed?    → Retry/Error
```

## 1. Database Schema

```sql
CREATE TABLE media_files (
    id UUID PRIMARY KEY,
    filename VARCHAR(255),
    original_path VARCHAR(512),
    file_type VARCHAR(10),
    mime_type VARCHAR(100),
    uploaded_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE conversions (
    id UUID PRIMARY KEY,
    source_file_id UUID REFERENCES media_files(id) ON DELETE CASCADE,
    target_format VARCHAR(10),   -- 'pdf'
    target_path VARCHAR(512),    -- Path to generated PDF
    status VARCHAR(20),          -- 'pending', 'processing', 'ready', 'failed'
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    error_message TEXT,
    file_hash VARCHAR(64)        -- Hash of PPTX for cache invalidation
);

CREATE INDEX idx_conversions_source ON conversions(source_file_id, target_format);
```

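
The `status` column moves through four values. As a minimal sketch, a small guard can reject unexpected status changes before they reach the database. The transition map is an assumption inferred from the workflow in this document: it treats `ready` and `failed` as terminal and assumes a retry creates a new conversion row. `can_transition` is a hypothetical helper, not part of any library.

```python
# Allowed status transitions, inferred from the workflow in this document.
# Assumption: a retry creates a NEW conversion row, so 'ready' and 'failed'
# are terminal states for an existing row.
ALLOWED_TRANSITIONS = {
    'pending': {'processing'},
    'processing': {'ready', 'failed'},
    'ready': set(),
    'failed': set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if moving a conversion from `current` to `new` is expected."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```

A worker could call `can_transition(conversion.status, 'processing')` before picking up a job, which also protects against two workers processing the same row.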
## 2. Components

### **API Server (existing)**
- Accepts uploads
- Creates DB entries
- Enqueues jobs
- Delivers status and files

### **Background Worker (new)**
- Runs as a separate process in the **same container** as the API (the Compose file in section 4 runs it as its own service built from the same image)
- Processes conversion jobs from the queue
- Calls the Gotenberg API for conversion
- Updates the database with results
- Technology: Python RQ, Celery, or similar

### **Gotenberg Container (new)**
- Dedicated conversion service
- HTTP API for document conversion
- Handles LibreOffice conversions internally
- Accesses files via shared volume

### **Message Queue**
- Redis (recommended to start with: simple, fast)
- Alternative: RabbitMQ for more features

### **Redis Container (separate)**
- Handles the job queue
- Minimal resource footprint

### **Shared Storage**
- Docker volume mounted to all containers that need file access
- API, Worker, and Gotenberg all access the same files
- Simplifies file exchange between services

## 3. Detailed Workflow

### **Upload Process:**

```python
from fastapi import UploadFile

# `app`, `db`, `queue`, `save_to_disk`, and `calculate_hash` are the
# application's own objects and helpers.
@app.post("/upload")
async def upload_file(file: UploadFile):
    # 1. Save PPTX to shared volume
    file_path = save_to_disk(file)  # e.g., /shared/uploads/abc123.pptx

    # 2. DB entry for original file
    file_record = db.create_media_file({
        'filename': file.filename,
        'original_path': file_path,
        'file_type': 'pptx'
    })

    # 3. Create conversion record
    conversion = db.create_conversion({
        'source_file_id': file_record.id,
        'target_format': 'pdf',
        'status': 'pending',
        'file_hash': calculate_hash(file_path)
    })

    # 4. Enqueue job (asynchronous!)
    queue.enqueue(convert_to_pdf_via_gotenberg, conversion.id)

    # 5. Return immediately to user
    return {
        'file_id': file_record.id,
        'status': 'uploaded',
        'conversion_status': 'pending'
    }
```

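
The upload handler calls `calculate_hash` without defining it. A minimal sketch follows, assuming SHA-256, which matches the 64-character `file_hash` column. The chunk size is an arbitrary choice.

```python
import hashlib


def calculate_hash(file_path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Reading in chunks keeps memory use flat for large presentations.
    """
    digest = hashlib.sha256()
    with open(file_path, 'rb') as f:
        # iter() with a sentinel stops as soon as read() returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Hashing inside the request handler adds latency for very large uploads; moving it into the worker is an option if that becomes noticeable.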
### **Worker Process (calls Gotenberg):**

```python
import os

import requests

GOTENBERG_URL = os.getenv('GOTENBERG_URL', 'http://gotenberg:3000')

def convert_to_pdf_via_gotenberg(conversion_id):
    conversion = db.get_conversion(conversion_id)
    source_file = db.get_media_file(conversion.source_file_id)

    # Status update: processing
    db.update_conversion(conversion_id, {
        'status': 'processing',
        'started_at': now()
    })

    try:
        # Prepare output path
        pdf_filename = f"{conversion.id}.pdf"
        pdf_path = f"/shared/converted/{pdf_filename}"

        # Call Gotenberg API
        # The PPTX is sent as a multipart upload; the PDF comes back
        # in the response body.
        with open(source_file.original_path, 'rb') as f:
            files = {
                'files': (os.path.basename(source_file.original_path), f)
            }

            # The request must run while the file is still open
            response = requests.post(
                f'{GOTENBERG_URL}/forms/libreoffice/convert',
                files=files,
                timeout=300  # 5 minutes timeout
            )
        response.raise_for_status()

        # Save PDF to shared volume
        with open(pdf_path, 'wb') as pdf_file:
            pdf_file.write(response.content)

        # Success
        db.update_conversion(conversion_id, {
            'status': 'ready',
            'target_path': pdf_path,
            'completed_at': now()
        })

    except requests.exceptions.Timeout:
        db.update_conversion(conversion_id, {
            'status': 'failed',
            'error_message': 'Conversion timeout after 5 minutes',
            'completed_at': now()
        })
    except requests.exceptions.RequestException as e:
        db.update_conversion(conversion_id, {
            'status': 'failed',
            'error_message': f'Gotenberg API error: {str(e)}',
            'completed_at': now()
        })
    except Exception as e:
        db.update_conversion(conversion_id, {
            'status': 'failed',
            'error_message': str(e),
            'completed_at': now()
        })
```

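
The worker writes the PDF straight to its final path, so a client that requests the file mid-write could receive a truncated PDF. One way to avoid this is to write to a temporary file in the same directory and rename it into place. The sketch below is an optional hardening step, not part of the original design, and `write_atomically` is a hypothetical helper.

```python
import os
import tempfile
from typing import Iterable


def write_atomically(chunks: Iterable[bytes], final_path: str) -> None:
    """Write chunks to a temp file next to final_path, then rename it into place.

    os.replace is atomic when source and target are on the same filesystem,
    so readers see either the old file (or none) or the complete new one.
    """
    directory = os.path.dirname(final_path) or '.'
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.part')
    try:
        with os.fdopen(fd, 'wb') as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave partial files behind on failure
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In the worker this could replace the plain `open(pdf_path, 'wb')` block, for example `write_atomically([response.content], pdf_path)`, or `write_atomically(response.iter_content(chunk_size=65536), pdf_path)` when the request is made with `stream=True`.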
### **Alternative: Direct File Access via Shared Volume**

Gotenberg's `/forms/libreoffice/convert` route takes the document as a multipart upload, so the worker still sends the file over HTTP in this variant. What the shared volume saves is copying between API and worker: the worker opens the source in place and writes the PDF where the API can serve it. For large files, streaming the response to disk avoids holding the whole PDF in memory:

```python
def convert_to_pdf_via_gotenberg_shared(conversion_id):
    conversion = db.get_conversion(conversion_id)
    source_file = db.get_media_file(conversion.source_file_id)

    db.update_conversion(conversion_id, {
        'status': 'processing',
        'started_at': now()
    })

    try:
        pdf_filename = f"{conversion.id}.pdf"
        pdf_path = f"/shared/converted/{pdf_filename}"

        # The worker reads the source straight from the shared volume
        # and uploads it to Gotenberg as multipart form data
        with open(source_file.original_path, 'rb') as f:
            files = {'files': f}

            with requests.post(
                f'{GOTENBERG_URL}/forms/libreoffice/convert',
                files=files,
                timeout=300,
                stream=True  # Do not buffer the whole PDF in memory
            ) as response:
                response.raise_for_status()

                # Write result to shared volume in chunks
                with open(pdf_path, 'wb') as pdf_file:
                    for chunk in response.iter_content(chunk_size=64 * 1024):
                        pdf_file.write(chunk)

        db.update_conversion(conversion_id, {
            'status': 'ready',
            'target_path': pdf_path,
            'completed_at': now()
        })

    except Exception as e:
        db.update_conversion(conversion_id, {
            'status': 'failed',
            'error_message': str(e),
            'completed_at': now()
        })
```

### **Client Download:**

```python
from fastapi.responses import FileResponse

@app.get("/files/{file_id}/display")
async def get_display_file(file_id: str):
    file = db.get_media_file(file_id)

    # Only for PPTX: check PDF conversion
    if file.file_type == 'pptx':
        conversion = db.get_latest_conversion(file.id, target_format='pdf')

        if not conversion:
            # Shouldn't happen, but just to be safe
            trigger_new_conversion(file.id)
            return {'status': 'pending', 'message': 'Conversion is being created'}

        if conversion.status == 'ready':
            # Serve PDF from shared storage
            return FileResponse(conversion.target_path, media_type='application/pdf')

        elif conversion.status == 'failed':
            # Optional: Auto-retry. Cap the number of attempts, otherwise
            # every client poll re-enqueues a file that can never convert.
            trigger_new_conversion(file.id)
            return {'status': 'failed', 'error': conversion.error_message}

        else:  # pending or processing
            return {'status': conversion.status, 'message': 'Please wait...'}

    # Serve other file types directly
    return FileResponse(file.original_path)
```

## 4. Docker Setup

```yaml
version: '3.8'

services:
  # Your API Server
  api:
    build: ./api
    command: uvicorn main:app --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
    volumes:
      - shared-storage:/shared  # Shared volume
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
      - GOTENBERG_URL=http://gotenberg:3000
    depends_on:
      - redis
      - postgres
      - gotenberg
    restart: unless-stopped

  # Worker (same codebase as API, different command)
  worker:
    build: ./api  # Same build as API!
    command: python worker.py  # or: rq worker
    volumes:
      - shared-storage:/shared  # Shared volume
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
      - GOTENBERG_URL=http://gotenberg:3000
    depends_on:
      - redis
      - postgres
      - gotenberg
    restart: unless-stopped
    # Optional: Multiple workers
    deploy:
      replicas: 2

  # Gotenberg - Document Conversion Service
  gotenberg:
    image: gotenberg/gotenberg:8
    # Gotenberg doesn't need the shared volume if files are sent via HTTP
    # But mount it if you want direct file access
    volumes:
      - shared-storage:/shared  # Optional: for direct file access
    environment:
      # Gotenberg configuration
      # NOTE: Gotenberg documents its settings as command-line flags
      # (e.g. --api-timeout=300s, --log-level=info). Verify that your image
      # version honors these environment variables before relying on them.
      - GOTENBERG_API_TIMEOUT=300s
      - GOTENBERG_LOG_LEVEL=info
    restart: unless-stopped
    # Resource limits (optional but recommended)
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 2G
        reservations:
          cpus: '0.5'
          memory: 512M

  # Redis - separate container
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes
    restart: unless-stopped

  # Your existing Postgres
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=infoscreen
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    restart: unless-stopped

  # Optional: Redis Commander (UI for debugging)
  redis-commander:
    image: rediscommander/redis-commander
    environment:
      - REDIS_HOSTS=local:redis:6379
    ports:
      - "8081:8081"
    depends_on:
      - redis

volumes:
  shared-storage:  # New: Shared storage for all file operations
  redis-data:
  postgres-data:
```

## 5. Storage Structure

```
/shared/
├── uploads/          # Original uploaded files (PPTX, etc.)
│   ├── abc123.pptx
│   ├── def456.pptx
│   └── ...
└── converted/        # Converted PDF files
    ├── uuid-1.pdf
    ├── uuid-2.pdf
    └── ...
```

## 6. Gotenberg Integration Details

### **Gotenberg API Endpoints:**

Gotenberg provides various conversion endpoints:

```text
# LibreOffice conversion (for PPTX, DOCX, ODT, etc.)
POST http://gotenberg:3000/forms/libreoffice/convert

# HTML to PDF
POST http://gotenberg:3000/forms/chromium/convert/html

# Markdown to PDF
POST http://gotenberg:3000/forms/chromium/convert/markdown

# Merge PDFs
POST http://gotenberg:3000/forms/pdfengines/merge
```

### **Example Conversion Request:**

```python
import os

import requests

def convert_with_gotenberg(input_file_path, output_file_path):
    """
    Convert document using Gotenberg
    """
    with open(input_file_path, 'rb') as f:
        files = {
            'files': (os.path.basename(input_file_path), f,
                      'application/vnd.openxmlformats-officedocument.presentationml.presentation')
        }

        # Optional: Add conversion parameters
        data = {
            'landscape': 'false',       # Portrait mode
            'nativePageRanges': '1-',   # All pages
        }

        # Send the request while the file is still open
        response = requests.post(
            'http://gotenberg:3000/forms/libreoffice/convert',
            files=files,
            data=data,
            timeout=300
        )

    if response.status_code == 200:
        with open(output_file_path, 'wb') as out:
            out.write(response.content)
        return True
    else:
        raise Exception(f"Gotenberg error: {response.status_code} - {response.text}")
```

### **Advanced Options:**

Form field names differ between Gotenberg versions, so check each field against the documentation for the image you deploy.

```python
# With custom PDF properties
data = {
    'landscape': 'false',
    'nativePageRanges': '1-10',   # Only first 10 pages
    # PDF/A format. Gotenberg 8 documents this field as 'pdfa'
    # (e.g. 'PDF/A-2b'); verify the name and value for your version.
    'pdfFormat': 'PDF/A-1a',
    'exportFormFields': 'false',
}

# With password protection (verify that your Gotenberg version
# supports these fields on the LibreOffice route)
data = {
    'userPassword': 'secret123',
    'ownerPassword': 'admin456',
}
```

## 7. Client Behavior (Pi5)

```python
import subprocess
import time

# On the Pi5 client
# Assumes `api` returns requests-style responses and that
# `download_and_display` saves the PDF and returns its local path.
def display_file(file_id, max_attempts=60):
    # Loop instead of recursing, so a long wait cannot exhaust the stack
    for _ in range(max_attempts):
        response = api.get(f"/files/{file_id}/display")

        if response.headers.get('Content-Type', '').startswith('application/pdf'):
            # PDF is ready
            downloaded_pdf = download_and_display(response)
            subprocess.run(['impressive', downloaded_pdf])
            return

        if response.json()['status'] in ['pending', 'processing']:
            # Wait and retry
            show_loading_screen("Presentation is being prepared...")
            time.sleep(5)
            continue

        # Error
        break

    show_error_screen("Error loading presentation")
```

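
A fixed five-second poll is simple, but with many screens it can load the API while a long conversion runs. As a sketch of one alternative, the helper below polls with an increasing delay and an overall time budget. `wait_until_ready`, its default values, and the `'timeout'` return value are illustrative assumptions rather than part of the design above.

```python
import time
from typing import Callable


def wait_until_ready(
    fetch_status: Callable[[], str],
    max_wait_seconds: float = 600.0,
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it leaves 'pending'/'processing'.

    Returns the first non-waiting status (e.g. 'ready' or 'failed'),
    or 'timeout' once the time budget has been used up.
    """
    waited = 0.0
    delay = initial_delay
    while True:
        status = fetch_status()
        if status not in ('pending', 'processing'):
            return status
        if waited >= max_wait_seconds:
            return 'timeout'
        sleep(delay)
        waited += delay
        # Double the delay each round, up to max_delay
        delay = min(delay * 2, max_delay)
```

The `sleep` parameter exists so the loop can be tested without real waiting. On the client, `fetch_status` would wrap the call to `/files/{file_id}/display`.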
## 8. Additional Features

### **Cache Invalidation on PPTX Update:**

```python
@app.put("/files/{file_id}")
async def update_file(file_id, new_file):
    # Delete old conversions and PDFs
    conversions = db.get_conversions_for_file(file_id)
    for conv in conversions:
        if conv.target_path and os.path.exists(conv.target_path):
            os.remove(conv.target_path)

    db.mark_conversions_as_obsolete(file_id)

    # Update file
    update_media_file(file_id, new_file)

    # Trigger new conversion
    trigger_conversion(file_id, 'pdf')
```

### **Status API for Monitoring:**

```python
@app.get("/admin/conversions/status")
async def get_conversion_stats():
    return {
        'pending': db.count(status='pending'),
        'processing': db.count(status='processing'),
        'failed': db.count(status='failed'),
        'avg_duration_seconds': db.avg_duration(),
        'gotenberg_health': check_gotenberg_health()
    }

def check_gotenberg_health():
    try:
        response = requests.get(
            f'{GOTENBERG_URL}/health',
            timeout=5
        )
        return response.status_code == 200
    except requests.exceptions.RequestException:
        # Connection errors and timeouts both count as unhealthy
        return False
```

### **Cleanup Job (Cronjob):**

```python
def cleanup_old_conversions():
    # Remove PDFs from deleted files
    orphaned = db.get_orphaned_conversions()
    for conv in orphaned:
        if conv.target_path and os.path.exists(conv.target_path):
            os.remove(conv.target_path)
        db.delete_conversion(conv.id)

    # Clean up old failed conversions
    old_failed = db.get_old_failed_conversions(older_than_days=7)
    for conv in old_failed:
        db.delete_conversion(conv.id)
```

## 9. Advantages of Using Gotenberg

- ✅ **Specialized Service**: Optimized specifically for document conversion
- ✅ **No LibreOffice Management**: Gotenberg handles LibreOffice lifecycle internally
- ✅ **Better Resource Management**: Isolated conversion process
- ✅ **HTTP API**: Clean, standard interface
- ✅ **Production Ready**: Battle-tested, actively maintained
- ✅ **Multiple Formats**: Supports PPTX, DOCX, ODT, HTML, Markdown, etc.
- ✅ **PDF Features**: Merge, encrypt, watermark PDFs
- ✅ **Health Checks**: Built-in health endpoint
- ✅ **Horizontal Scaling**: Can run multiple Gotenberg instances
- ✅ **Memory Safe**: Automatic cleanup and restart on issues

## 10. Migration Path

### **Phase 1 (MVP):**
- 1 worker process in API container
- Redis for queue (separate container)
- Gotenberg for conversion (separate container)
- Basic DB schema
- Shared volume for file exchange
- Simple retry logic

### **Phase 2 (as needed):**
- Multiple worker instances
- Multiple Gotenberg instances (load balancing)
- Monitoring & alerting
- Prioritization logic
- Advanced caching strategies
- PDF optimization/compression

**Start simple, scale when needed!**

## 11. Key Decisions Summary

| Aspect | Decision | Reason |
|--------|----------|--------|
| **Conversion Location** | Server-side (Gotenberg) | One conversion per file, consistent results |
| **Conversion Service** | Dedicated Gotenberg container | Specialized, production-ready, better isolation |
| **Conversion Timing** | Asynchronous (on upload) | No client waiting time, predictable performance |
| **Data Storage** | Database-tracked | Status visibility, robust error handling |
| **File Exchange** | Shared Docker volume | Simple, efficient, no network overhead |
| **Queue System** | Redis (separate container) | Standard pattern, scalable, maintainable |
| **Worker Architecture** | Background process in API container | Simple start, easy to separate later |

## 12. File Flow Diagram

```
┌─────────────┐
│ User Upload │
│   (PPTX)    │
└──────┬──────┘
       │
       ▼
┌──────────────────────┐
│ API Server           │
│ 1. Save to /shared   │
│ 2. Create DB record  │
│ 3. Enqueue job       │
└──────┬───────────────┘
       │
       ▼
┌──────────────────┐
│ Redis Queue      │
└──────┬───────────┘
       │
       ▼
┌──────────────────────┐
│ Worker Process       │
│ 1. Get job           │
│ 2. Call Gotenberg    │
│ 3. Update DB         │
└──────┬───────────────┘
       │
       ▼
┌──────────────────────┐
│ Gotenberg            │
│ 1. Receive PPTX      │
│ 2. Convert PPTX      │
│ 3. Return PDF        │
└──────┬───────────────┘
       │
       ▼
┌──────────────────────┐
│ Worker saves PDF     │
│ to /shared/converted │
└──────┬───────────────┘
       │
       ▼
┌──────────────────────┐
│ Client Requests      │
│ 1. Check DB          │
│ 2. Download PDF      │
│ 3. Display           │
└──────────────────────┘
    (via impressive)
```

## 13. Implementation Checklist

### Database Setup
- [ ] Create `media_files` table
- [ ] Create `conversions` table
- [ ] Add indexes for performance
- [ ] Set up foreign key constraints

### Storage Setup
- [ ] Create shared Docker volume
- [ ] Set up directory structure (/shared/uploads, /shared/converted)
- [ ] Configure proper permissions

### API Changes
- [ ] Modify upload endpoint to save to shared storage
- [ ] Create DB records for uploads
- [ ] Add conversion job enqueueing
- [ ] Implement file download endpoint with status checking
- [ ] Add status API for monitoring
- [ ] Implement cache invalidation on file update

### Worker Setup
- [ ] Create worker script/module
- [ ] Implement Gotenberg API calls
- [ ] Add error handling and retry logic
- [ ] Set up logging and monitoring
- [ ] Handle timeouts and failures

### Docker Configuration
- [ ] Add Gotenberg container to docker-compose.yml
- [ ] Add Redis container to docker-compose.yml
- [ ] Configure worker container
- [ ] Set up shared volume mounts
- [ ] Configure environment variables
- [ ] Set up container dependencies
- [ ] Configure resource limits for Gotenberg

### Client Updates
- [ ] Modify client to check conversion status
- [ ] Implement retry logic for pending conversions
- [ ] Add loading/waiting screens
- [ ] Implement error handling

### Testing
- [ ] Test upload → conversion → download flow
- [ ] Test multiple concurrent conversions
- [ ] Test error handling (corrupted PPTX, etc.)
- [ ] Test Gotenberg timeout handling
- [ ] Test cache invalidation on file update
- [ ] Load test with multiple clients
- [ ] Test Gotenberg health checks

### Monitoring & Operations
- [ ] Set up logging for conversions
- [ ] Monitor Gotenberg health endpoint
- [ ] Implement cleanup job for old files
- [ ] Add metrics for conversion times
- [ ] Set up alerts for failed conversions
- [ ] Monitor shared storage disk usage
- [ ] Document backup procedures

### Security
- [ ] Validate file types before conversion
- [ ] Set file size limits
- [ ] Sanitize filenames
- [ ] Implement rate limiting
- [ ] Secure inter-container communication

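
The first and third security items can be sketched together: check the extension against the leading bytes of the upload, and reduce the client-supplied name to a safe basename. The allowed extensions (`.ppt`, `.pptx`, `.odp`) follow the validation practice listed in section 16. `is_allowed_upload` and `sanitize_filename` are hypothetical helpers. The magic-byte check is only a cheap pre-filter and does not prove the file is a valid presentation.

```python
import os
import re

ZIP_MAGIC = b'PK\x03\x04'                        # .pptx and .odp are ZIP containers
OLE_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'  # legacy .ppt (OLE compound file)

ALLOWED = {'.pptx': ZIP_MAGIC, '.odp': ZIP_MAGIC, '.ppt': OLE_MAGIC}


def is_allowed_upload(filename: str, head: bytes) -> bool:
    """Accept only known extensions whose leading bytes match the container format."""
    ext = os.path.splitext(filename)[1].lower()
    magic = ALLOWED.get(ext)
    return magic is not None and head.startswith(magic)


def sanitize_filename(filename: str) -> str:
    """Drop directory parts and replace anything outside [A-Za-z0-9._-]."""
    # Treat backslashes as separators so Windows-style paths are stripped too
    name = os.path.basename(filename.replace('\\', '/'))
    name = re.sub(r'[^A-Za-z0-9._-]', '_', name)
    # Avoid hidden files and empty names
    return name.lstrip('.') or 'upload'
```

For example, the upload handler could read the first eight bytes of the file, call `is_allowed_upload(file.filename, head)`, and reject the request before anything is written to `/shared/uploads`.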
## 14. Gotenberg Configuration Options

### **Environment Variables:**

Gotenberg's documentation describes these settings as command-line flags (for example `--api-timeout`, `--api-port`, `--log-level`, `--libreoffice-auto-start`, `--chromium-disable-routes`). Treat the variable names below as unverified and confirm them against the Gotenberg version you deploy.

```yaml
gotenberg:
  image: gotenberg/gotenberg:8
  environment:
    # API Configuration
    - GOTENBERG_API_TIMEOUT=300s
    - GOTENBERG_API_PORT=3000

    # Logging
    - GOTENBERG_LOG_LEVEL=info  # debug, info, warn, error

    # LibreOffice
    - GOTENBERG_LIBREOFFICE_DISABLE_ROUTES=false
    - GOTENBERG_LIBREOFFICE_AUTO_START=true

    # Chromium (if needed for HTML/Markdown)
    - GOTENBERG_CHROMIUM_DISABLE_ROUTES=true  # Disable if not needed

    # Resource limits
    - GOTENBERG_LIBREOFFICE_MAX_QUEUE_SIZE=100
```

### **Custom Gotenberg Configuration:**

For advanced configurations, create a `gotenberg.yml` (confirm first that your Gotenberg version reads a configuration file; otherwise pass the same settings as flags):

```yaml
api:
  timeout: 300s
  port: 3000

libreoffice:
  autoStart: true
  maxQueueSize: 100

chromium:
  disableRoutes: true
```

Mount it in docker-compose:

```yaml
gotenberg:
  image: gotenberg/gotenberg:8
  volumes:
    - ./gotenberg.yml:/etc/gotenberg/config.yml:ro
    - shared-storage:/shared
```

## 15. Troubleshooting

### **Common Issues:**

**Gotenberg timeout:**
```python
# Increase timeout for large files
# (Gotenberg's own API timeout must be at least as long)
response = requests.post(
    f'{GOTENBERG_URL}/forms/libreoffice/convert',
    files=files,
    timeout=600  # 10 minutes for large PPTX
)
```

**Memory issues:**
```yaml
# Increase Gotenberg memory limit
gotenberg:
  deploy:
    resources:
      limits:
        memory: 4G
```

**File permission issues:**
```bash
# Ensure proper permissions on shared volume
# (adjust the UID/GID to the user your containers run as)
chmod -R 755 /shared
chown -R 1000:1000 /shared
```

**Gotenberg not responding:**
```python
import logging

logger = logging.getLogger(__name__)

# Check health before conversion
def ensure_gotenberg_healthy():
    try:
        response = requests.get(f'{GOTENBERG_URL}/health', timeout=5)
        if response.status_code != 200:
            raise Exception("Gotenberg unhealthy")
    except Exception as e:
        logger.error(f"Gotenberg health check failed: {e}")
        raise
```

---

**This architecture provides a production-ready, scalable solution using Gotenberg as a specialized conversion service with efficient file sharing via Docker volumes!**

## 16. Best Practices Specific to Infoscreen

- Idempotency by content: Always compute a SHA-256 of the uploaded source and include it in the unique key `(source_event_media_id, target_format, file_hash)`. This prevents duplicate work for identical content and auto-busts cache on change.
- Strict MIME/type validation: Accept only .ppt, .pptx, .odp for conversion. Reject unknown types early. Consider reading the first bytes (magic) for extra safety.
- Bounded retries with jitter: Retry conversions on transient HTTP 5xx or timeouts up to N times with exponential backoff. Do not retry on 4xx or clear user errors.
- Output naming: Derive deterministic output paths under `media/converted/`, e.g., `<basename>.pdf`. Ensure no path traversal and sanitize names.
- Timeouts and size limits: Enforce server-side max upload size and per-job conversion timeout (e.g., 10 minutes). Return clear errors for oversized/long-running files.
- Isolation and quotas: Set CPU/memory limits for Gotenberg; consider a concurrency cap per worker to avoid DB starvation.
- Health probes before work: Check Gotenberg `/health` prior to enqueue spikes; fail-fast to avoid queue pile-ups when Gotenberg is down.
- Observability: Log job IDs, file hashes, durations, and sizes. Expose a small `/api/conversions/status` summary for operational visibility.
- Cleanup policy: Periodically delete orphaned conversions (media deleted) and failed jobs older than X days. Keep successful PDFs aligned with DB rows.
- Security: Never trust client paths; always resolve relative to the known media root. Do not expose the shared volume directly; serve via API only.
- Backpressure: If queue length exceeds a threshold, surface 503/"try later" on new uploads or pause enqueue to protect the system.

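
The "bounded retries with jitter" practice can be sketched as two small helpers: one decides whether a failure is worth retrying, the other computes the wait before the next attempt. The "full jitter" variant and the default `base` and `cap` values are assumptions chosen for illustration. `is_retryable` and `backoff_delay` are hypothetical names.

```python
import random
from typing import Optional


def is_retryable(status_code: Optional[int], timed_out: bool = False) -> bool:
    """Retry on timeouts and HTTP 5xx; never on 4xx or other responses."""
    if timed_out:
        return True
    return status_code is not None and 500 <= status_code < 600


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter exponential backoff.

    Returns a delay drawn uniformly from [0, min(cap, base * 2**attempt)],
    where `attempt` starts at 0 for the first retry.
    """
    upper = min(cap, base * (2 ** attempt))
    return (rng or random).uniform(0, upper)
```

A worker would typically loop over `range(max_retries)`, call Gotenberg, and on a retryable failure sleep for `backoff_delay(attempt)` before trying again. Once the attempts are used up it marks the conversion as `failed`.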