infoscreen/pptx_conversion_guide.md
Commit fcc0dfbb0f by olaf: feat(conversions): end-to-end PPT/PPTX/ODP -> PDF pipeline with RQ worker + Gotenberg
DB/model

Add Conversion model + ConversionStatus enum (pending, processing, ready, failed)
Alembic migrations: create conversions table, indexes, unique (source_event_media_id, target_format, file_hash), and NOT NULL on file_hash
API

Enqueue on upload (ppt|pptx|odp) in routes/eventmedia.py: compute sha256, upsert Conversion, enqueue job
New routes:
POST /api/conversions/<media_id>/pdf — ensure/enqueue conversion
GET /api/conversions/<media_id>/status — latest status/details
GET /api/files/converted/<path> — serve converted PDFs
Register conversions blueprint in wsgi
Worker

server/worker.py: convert_event_media_to_pdf
Calls Gotenberg /forms/libreoffice/convert, writes to server/media/converted/
Updates Conversion status, timestamps, error messages
Fix media root resolution to /server/media
Prefer function enqueue over string path; expose server.worker in package init for RQ string compatibility
Queue/infra

server/task_queue.py: RQ queue helper (REDIS_URL, default redis://redis:6379/0)
docker-compose:
Add redis and gotenberg services
Add worker service (rq worker conversions)
Pass REDIS_URL and GOTENBERG_URL to server/worker
Mount shared media volume in prod for API/worker parity
docker-compose.override:
Add dev redis/gotenberg/worker services
Ensure PYTHONPATH + working_dir allow importing server.worker
Use rq CLI instead of python -m rq for worker
Dashboard dev: run as appropriate user/root and pre-create/chown caches to avoid EACCES
Dashboard dev UX

Vite: set cacheDir .vite to avoid EACCES in node_modules
Disable Node inspector by default to avoid port conflicts
Docs

Update copilot-instructions.md with conversion system: flow, services, env vars, endpoints, storage paths, and data model
2025-10-07 19:06:09 +00:00

Recommended Implementation: PPTX-to-PDF Conversion System

Architecture Overview

Asynchronous server-side conversion with database tracking

User Upload → API saves PPTX + DB entry → Job in Queue 
                                                ↓
Client requests → API checks DB status → PDF ready? → Download PDF
                                       → Pending? → "Please wait"
                                       → Failed? → Retry/Error

1. Database Schema

CREATE TABLE media_files (
    id UUID PRIMARY KEY,
    filename VARCHAR(255),
    original_path VARCHAR(512),
    file_type VARCHAR(10),
    mime_type VARCHAR(100),
    uploaded_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE conversions (
    id UUID PRIMARY KEY,
    source_file_id UUID REFERENCES media_files(id) ON DELETE CASCADE,
    target_format VARCHAR(10),          -- 'pdf'
    target_path VARCHAR(512),           -- Path to generated PDF
    status VARCHAR(20),                 -- 'pending', 'processing', 'ready', 'failed'
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    error_message TEXT,
    file_hash VARCHAR(64)               -- Hash of PPTX for cache invalidation
);

CREATE INDEX idx_conversions_source ON conversions(source_file_id, target_format);
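The file_hash column enables deduplication: with a unique constraint on (source_file_id, target_format, file_hash), as in the commit message above, re-uploading an unchanged file cannot create a second conversion row. A minimal sketch using an in-memory SQLite stand-in (the ensure_conversion helper is hypothetical, not part of the schema):

```python
import sqlite3

# In-memory stand-in for the real database; the unique constraint mirrors
# the one described in the commit message.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE conversions (
        id TEXT PRIMARY KEY,
        source_file_id TEXT,
        target_format TEXT,
        status TEXT,
        file_hash TEXT,
        UNIQUE (source_file_id, target_format, file_hash)
    )
""")

def ensure_conversion(conn, conv_id, source_id, fmt, file_hash):
    """Insert a pending conversion unless an identical one is already tracked."""
    try:
        conn.execute(
            "INSERT INTO conversions (id, source_file_id, target_format, status, file_hash) "
            "VALUES (?, ?, ?, 'pending', ?)",
            (conv_id, source_id, fmt, file_hash),
        )
        return True   # new conversion row created, job should be enqueued
    except sqlite3.IntegrityError:
        return False  # identical (file, format, hash) already tracked

created = ensure_conversion(conn, "c1", "f1", "pdf", "abc123")
duplicate = ensure_conversion(conn, "c2", "f1", "pdf", "abc123")
```

Only the first call creates a row; the second is rejected by the constraint, so the upload path can skip enqueueing.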

2. Components

API Server (existing)

  • Accepts uploads
  • Creates DB entries
  • Enqueues jobs
  • Delivers status and files

Background Worker (new)

  • Runs as a separate process (initially in the same container as the API, or as its own container as in the Docker setup in section 4)
  • Processes conversion jobs from queue
  • Can run multiple worker instances in parallel
  • Technology: Python RQ, Celery, or similar

Message Queue

  • Redis (recommended for start - simple, fast)
  • Alternative: RabbitMQ for more features

Redis Container (new)

  • Separate container for Redis
  • Handles job queue
  • Minimal resource footprint
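A queue helper in the spirit of the commit's server/task_queue.py might look like the following. This is a sketch: the function names are assumptions, REDIS_URL and the default redis://redis:6379/0 match the compose setup, and the RQ/redis imports are deferred so the module stays importable without Redis running.

```python
import os

# Default matches the Docker network hostname used in the compose file.
DEFAULT_REDIS_URL = "redis://redis:6379/0"

def redis_url():
    """REDIS_URL from the environment, falling back to the compose default."""
    return os.environ.get("REDIS_URL", DEFAULT_REDIS_URL)

def get_queue(name="conversions"):
    """Lazily build an RQ queue bound to the configured Redis instance."""
    from redis import Redis
    from rq import Queue
    return Queue(name, connection=Redis.from_url(redis_url()))
```

Usage in the upload path would then be `get_queue().enqueue(convert_to_pdf, conversion.id)`, and the worker container runs `rq worker conversions` against the same URL.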

3. Detailed Workflow

Upload Process:

@app.post("/upload")
async def upload_file(file: UploadFile):
    # 1. Save PPTX
    file_path = save_to_disk(file)
    
    # 2. DB entry for original file
    file_record = db.create_media_file({
        'filename': file.filename,
        'original_path': file_path,
        'file_type': 'pptx'
    })
    
    # 3. Create conversion record
    conversion = db.create_conversion({
        'source_file_id': file_record.id,
        'target_format': 'pdf',
        'status': 'pending',
        'file_hash': calculate_hash(file_path)
    })
    
    # 4. Enqueue job (asynchronous!)
    queue.enqueue(convert_to_pdf, conversion.id)
    
    # 5. Return immediately to user
    return {
        'file_id': file_record.id,
        'status': 'uploaded',
        'conversion_status': 'pending'
    }
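The calculate_hash call above is left undefined; a streaming SHA-256, matching the "compute sha256" step in the commit message, could look like this (chunked so large decks don't get loaded into memory at once):

```python
import hashlib

def calculate_hash(file_path, chunk_size=1024 * 1024):
    """SHA-256 hex digest of a file, read in chunks to keep memory flat."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```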

Worker Process:

def convert_to_pdf(conversion_id):
    conversion = db.get_conversion(conversion_id)
    source_file = db.get_media_file(conversion.source_file_id)
    
    # Status update: processing
    db.update_conversion(conversion_id, {
        'status': 'processing',
        'started_at': now()
    })
    
    try:
        # LibreOffice conversion: the output is named after the source file
        # (<basename>.pdf in --outdir), so convert first, then rename the
        # result to the conversion-ID path we track in the DB
        subprocess.run([
            'libreoffice',
            '--headless',
            '--convert-to', 'pdf',
            '--outdir', '/data/converted/',
            source_file.original_path
        ], check=True)
        
        base = os.path.splitext(os.path.basename(source_file.original_path))[0]
        pdf_path = f"/data/converted/{conversion.id}.pdf"
        os.rename(f"/data/converted/{base}.pdf", pdf_path)
        
        # Success
        db.update_conversion(conversion_id, {
            'status': 'ready',
            'target_path': pdf_path,
            'completed_at': now()
        })
        
    except Exception as e:
        # Error
        db.update_conversion(conversion_id, {
            'status': 'failed',
            'error_message': str(e),
            'completed_at': now()
        })

Client Download:

@app.get("/files/{file_id}/display")
async def get_display_file(file_id):
    file = db.get_media_file(file_id)
    
    # Only for PPTX: check PDF conversion
    if file.file_type == 'pptx':
        conversion = db.get_latest_conversion(file.id, target_format='pdf')
        
        if not conversion:
            # Shouldn't happen, but just to be safe
            trigger_new_conversion(file.id)
            return {'status': 'pending', 'message': 'Conversion is being created'}
        
        if conversion.status == 'ready':
            return FileResponse(conversion.target_path)
        
        elif conversion.status == 'failed':
            # Optional: Auto-retry
            trigger_new_conversion(file.id)
            return {'status': 'failed', 'error': conversion.error_message}
        
        else:  # pending or processing
            return {'status': conversion.status, 'message': 'Please wait...'}
    
    # Serve other file types directly
    return FileResponse(file.original_path)

4. Docker Setup

version: '3.8'

services:
  # Your API Server
  api:
    build: ./api
    command: uvicorn main:app --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/converted:/data/converted
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
    depends_on:
      - redis
      - postgres
    restart: unless-stopped

  # Worker (same codebase as API, different command)
  worker:
    build: ./api  # Same build as API - the image must also include LibreOffice
    command: python worker.py  # or: rq worker conversions
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/converted:/data/converted
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
    depends_on:
      - redis
      - postgres
    restart: unless-stopped
    # Optional: Multiple workers
    deploy:
      replicas: 2

  # Redis - separate container
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    # Optional: persistent configuration
    command: redis-server --appendonly yes
    restart: unless-stopped

  # Your existing Postgres
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=infoscreen
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    restart: unless-stopped

  # Optional: Redis Commander (UI for debugging)
  redis-commander:
    image: rediscommander/redis-commander
    environment:
      - REDIS_HOSTS=local:redis:6379
    ports:
      - "8081:8081"
    depends_on:
      - redis

volumes:
  redis-data:
  postgres-data:

5. Container Communication

Containers communicate via Docker's internal network:

# In your API/Worker code:
import redis

# Connection to Redis
redis_client = redis.from_url('redis://redis:6379')
#                              ^^^^^^ 
#                              Container name = hostname in Docker network

Docker automatically creates DNS entries, so redis resolves to the Redis container.

6. Client Behavior (Pi5)

# On the Pi5 client
def display_file(file_id, max_attempts=60):
    for _ in range(max_attempts):
        response = api.get(f"/files/{file_id}/display")
        
        if response.content_type == 'application/pdf':
            # PDF is ready
            downloaded_pdf = download(response)
            subprocess.run(['impressive', downloaded_pdf])
            return
        
        if response.json()['status'] in ['pending', 'processing']:
            # Wait and retry (a loop rather than recursion, so a long
            # conversion can't grow the call stack without bound)
            show_loading_screen("Presentation is being prepared...")
            time.sleep(5)
            continue
        
        # Error
        show_error_screen("Error loading presentation")
        return
    
    show_error_screen("Timed out waiting for presentation")

7. Additional Features

Cache Invalidation on PPTX Update:

@app.put("/files/{file_id}")
async def update_file(file_id, new_file):
    # Delete old conversions
    db.mark_conversions_as_obsolete(file_id)
    
    # Update file
    update_media_file(file_id, new_file)
    
    # Trigger new conversion
    trigger_conversion(file_id, 'pdf')

Status API for Monitoring:

@app.get("/admin/conversions/status")
async def get_conversion_stats():
    return {
        'pending': db.count(status='pending'),
        'processing': db.count(status='processing'),
        'failed': db.count(status='failed'),
        'avg_duration_seconds': db.avg_duration()
    }

Cleanup Job (Cronjob):

def cleanup_old_conversions():
    # Remove PDFs from deleted files
    db.delete_orphaned_conversions()
    
    # Clean up old failed conversions
    db.delete_old_failed_conversions(older_than_days=7)
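The db.delete_old_failed_conversions helper above reduces to a cutoff comparison on completed_at. A sketch against the conversions schema, using an in-memory SQLite stand-in for the real database and ISO-8601 strings for timestamps (both assumptions of this example):

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def delete_old_failed_conversions(conn, older_than_days=7, now=None):
    """Delete failed conversions completed before the cutoff; return the count."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=older_than_days)).isoformat()
    cur = conn.execute(
        "DELETE FROM conversions WHERE status = 'failed' AND completed_at < ?",
        (cutoff,),
    )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conversions (id TEXT, status TEXT, completed_at TEXT)")
now = datetime(2025, 10, 7, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO conversions VALUES (?, ?, ?)",
    [
        ("old-fail", "failed", (now - timedelta(days=30)).isoformat()),
        ("new-fail", "failed", (now - timedelta(days=1)).isoformat()),
        ("old-ok",   "ready",  (now - timedelta(days=30)).isoformat()),
    ],
)
removed = delete_old_failed_conversions(conn, older_than_days=7, now=now)
```

Only the old failed row is removed; ready conversions are untouched regardless of age, since their PDFs are still being served.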

8. Redis Container Details

Why Separate Container?

  • Separation of concerns: each service has its own responsibility
  • Independent lifecycle management: Redis can be restarted/updated independently
  • Better scaling: Redis can be moved to different hardware
  • Easier backup: Redis data can be backed up separately
  • Standard Docker pattern: microservices architecture

Resource Usage:

  • RAM: ~10-50 MB for your use case
  • CPU: Minimal
  • Disk: Only for persistence (optional)

For 10 clients with occasional PPTX uploads, this overhead is negligible.

9. Advantages of This Solution

  • Scalable: workers can be scaled horizontally
  • Performant: clients don't wait for conversion
  • Robust: status tracking and error handling
  • Maintainable: clear separation of responsibilities
  • Transparent: status queryable at any time
  • Efficient: one-time conversion per file
  • Future-proof: easily extensible for other formats
  • Professional: industry-standard architecture

10. Migration Path

Phase 1 (MVP):

  • 1 worker process in API container
  • Redis for queue (separate container)
  • Basic DB schema
  • Simple retry logic

Phase 2 (as needed):

  • Multiple worker instances
  • Dedicated conversion service container
  • Monitoring & alerting
  • Prioritization logic
  • Advanced caching strategies

Start simple, scale when needed!

11. Key Decisions Summary

| Aspect | Decision | Reason |
|---|---|---|
| Conversion location | Server-side | One conversion per file, consistent results |
| Conversion timing | Asynchronous (on upload) | No client waiting time, predictable performance |
| Data storage | Database-tracked | Status visibility, robust error handling |
| Queue system | Redis (separate container) | Standard pattern, scalable, maintainable |
| Worker architecture | Background process in API container | Simple start, easy to separate later |

12. File Flow Diagram

┌─────────────┐
│ User Upload │
│   (PPTX)    │
└──────┬──────┘
       │
       ▼
┌──────────────────┐
│   API Server     │
│ 1. Save PPTX     │
│ 2. Create DB rec │
│ 3. Enqueue job   │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│  Redis Queue     │◄─────┐
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│  Worker Process  │      │
│ 1. Get job       │      │
│ 2. Convert PPTX  │      │
│ 3. Update DB     │      │
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│   PDF Storage    │      │
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ Client Requests  │      │
│ 1. Check DB      │      │
│ 2. Download PDF  │      │
│ 3. Display       │──────┘
└──────────────────┘
  (via impressive)

13. Implementation Checklist

Database Setup

  • Create media_files table
  • Create conversions table
  • Add indexes for performance
  • Set up foreign key constraints

API Changes

  • Modify upload endpoint to create DB records
  • Add conversion job enqueueing
  • Implement file download endpoint with status checking
  • Add status API for monitoring
  • Implement cache invalidation on file update

Worker Setup

  • Create worker script/module
  • Implement LibreOffice conversion logic
  • Add error handling and retry logic
  • Set up logging and monitoring

Docker Configuration

  • Add Redis container to docker-compose.yml
  • Configure worker container
  • Set up volume mounts for file storage
  • Configure environment variables
  • Set up container dependencies

Client Updates

  • Modify client to check conversion status
  • Implement retry logic for pending conversions
  • Add loading/waiting screens
  • Implement error handling

Testing

  • Test upload → conversion → download flow
  • Test multiple concurrent conversions
  • Test error handling (corrupted PPTX, etc.)
  • Test cache invalidation on file update
  • Load test with multiple clients

Monitoring & Operations

  • Set up logging for conversions
  • Implement cleanup job for old files
  • Add metrics for conversion times
  • Set up alerts for failed conversions
  • Document backup procedures

This architecture provides a solid foundation that's simple to start with but scales professionally as your needs grow!