infoscreen/pptx_conversion_guide.md
Commit fcc0dfbb0f by olaf: feat(conversions): end-to-end PPT/PPTX/ODP -> PDF pipeline with RQ worker + Gotenberg
DB/model

Add Conversion model + ConversionStatus enum (pending, processing, ready, failed)
Alembic migrations: create conversions table, indexes, unique (source_event_media_id, target_format, file_hash), and NOT NULL on file_hash
API

Enqueue on upload (ppt|pptx|odp) in routes/eventmedia.py: compute sha256, upsert Conversion, enqueue job
New routes:
POST /api/conversions/<media_id>/pdf — ensure/enqueue conversion
GET /api/conversions/<media_id>/status — latest status/details
GET /api/files/converted/<path> — serve converted PDFs
Register conversions blueprint in wsgi
Worker

server/worker.py: convert_event_media_to_pdf
Calls Gotenberg /forms/libreoffice/convert, writes to server/media/converted/
Updates Conversion status, timestamps, error messages
Fix media root resolution to /server/media
Prefer function enqueue over string path; expose server.worker in package init for RQ string compatibility
Queue/infra

server/task_queue.py: RQ queue helper (REDIS_URL, default redis://redis:6379/0)
docker-compose:
Add redis and gotenberg services
Add worker service (rq worker conversions)
Pass REDIS_URL and GOTENBERG_URL to server/worker
Mount shared media volume in prod for API/worker parity
docker-compose.override:
Add dev redis/gotenberg/worker services
Ensure PYTHONPATH + working_dir allow importing server.worker
Use rq CLI instead of python -m rq for worker
Dashboard dev: run as appropriate user/root and pre-create/chown caches to avoid EACCES
Dashboard dev UX

Vite: set cacheDir .vite to avoid EACCES in node_modules
Disable Node inspector by default to avoid port conflicts
Docs

Update copilot-instructions.md with conversion system: flow, services, env vars, endpoints, storage paths, and data model
2025-10-07 19:06:09 +00:00

Recommended Implementation: PPTX-to-PDF Conversion System

Architecture Overview

Asynchronous server-side conversion with database tracking

User Upload → API saves PPTX + DB entry → Job in Queue 
                                                ↓
Client requests → API checks DB status → PDF ready? → Download PDF
                                       → Pending? → "Please wait"
                                       → Failed? → Retry/Error

1. Database Schema

CREATE TABLE media_files (
    id UUID PRIMARY KEY,
    filename VARCHAR(255),
    original_path VARCHAR(512),
    file_type VARCHAR(10),
    mime_type VARCHAR(100),
    uploaded_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE conversions (
    id UUID PRIMARY KEY,
    source_file_id UUID REFERENCES media_files(id) ON DELETE CASCADE,
    target_format VARCHAR(10),          -- 'pdf'
    target_path VARCHAR(512),           -- Path to generated PDF
    status VARCHAR(20),                 -- 'pending', 'processing', 'ready', 'failed'
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    error_message TEXT,
    file_hash VARCHAR(64)               -- Hash of PPTX for cache invalidation
);

CREATE INDEX idx_conversions_source ON conversions(source_file_id, target_format);
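The file_hash column enables deduplication: with a unique constraint on (source_file_id, target_format, file_hash), as in the commit message above, re-uploading an unchanged file cannot create a second conversion row. A minimal sketch using an in-memory SQLite stand-in (the ensure_conversion helper is hypothetical, not part of the schema):

```python
import sqlite3

# In-memory stand-in for the real database; the unique constraint mirrors
# the one described in the commit message.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE conversions (
        id TEXT PRIMARY KEY,
        source_file_id TEXT,
        target_format TEXT,
        status TEXT,
        file_hash TEXT,
        UNIQUE (source_file_id, target_format, file_hash)
    )
""")

def ensure_conversion(conn, conv_id, source_id, fmt, file_hash):
    """Insert a pending conversion unless an identical one is already tracked."""
    try:
        conn.execute(
            "INSERT INTO conversions (id, source_file_id, target_format, status, file_hash) "
            "VALUES (?, ?, ?, 'pending', ?)",
            (conv_id, source_id, fmt, file_hash),
        )
        return True   # new conversion row created, job should be enqueued
    except sqlite3.IntegrityError:
        return False  # identical (file, format, hash) already tracked

created = ensure_conversion(conn, "c1", "f1", "pdf", "abc123")
duplicate = ensure_conversion(conn, "c2", "f1", "pdf", "abc123")
```

Only the first call creates a row; the second is rejected by the constraint, so the upload path can skip enqueueing.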

2. Components

API Server (existing)

  • Accepts uploads
  • Creates DB entries
  • Enqueues jobs
  • Delivers status and files

Background Worker (new)

  • Runs as a separate process (initially in the same container as the API, or as its own container as in the Docker setup in section 4)
  • Processes conversion jobs from queue
  • Can run multiple worker instances in parallel
  • Technology: Python RQ, Celery, or similar

Message Queue

  • Redis (recommended for start - simple, fast)
  • Alternative: RabbitMQ for more features

Redis Container (new)

  • Separate container for Redis
  • Handles job queue
  • Minimal resource footprint
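A queue helper in the spirit of the commit's server/task_queue.py might look like the following. This is a sketch: the function names are assumptions, REDIS_URL and the default redis://redis:6379/0 match the compose setup, and the RQ/redis imports are deferred so the module stays importable without Redis running.

```python
import os

# Default matches the Docker network hostname used in the compose file.
DEFAULT_REDIS_URL = "redis://redis:6379/0"

def redis_url():
    """REDIS_URL from the environment, falling back to the compose default."""
    return os.environ.get("REDIS_URL", DEFAULT_REDIS_URL)

def get_queue(name="conversions"):
    """Lazily build an RQ queue bound to the configured Redis instance."""
    from redis import Redis
    from rq import Queue
    return Queue(name, connection=Redis.from_url(redis_url()))
```

Usage in the upload path would then be `get_queue().enqueue(convert_to_pdf, conversion.id)`, and the worker container runs `rq worker conversions` against the same URL.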

3. Detailed Workflow

Upload Process:

@app.post("/upload")
async def upload_file(file: UploadFile):
    # 1. Save PPTX
    file_path = save_to_disk(file)
    
    # 2. DB entry for original file
    file_record = db.create_media_file({
        'filename': file.filename,
        'original_path': file_path,
        'file_type': 'pptx'
    })
    
    # 3. Create conversion record
    conversion = db.create_conversion({
        'source_file_id': file_record.id,
        'target_format': 'pdf',
        'status': 'pending',
        'file_hash': calculate_hash(file_path)
    })
    
    # 4. Enqueue job (asynchronous!)
    queue.enqueue(convert_to_pdf, conversion.id)
    
    # 5. Return immediately to user
    return {
        'file_id': file_record.id,
        'status': 'uploaded',
        'conversion_status': 'pending'
    }
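The calculate_hash call above is left undefined; a streaming SHA-256, matching the "compute sha256" step in the commit message, could look like this (chunked so large decks don't get loaded into memory at once):

```python
import hashlib

def calculate_hash(file_path, chunk_size=1024 * 1024):
    """SHA-256 hex digest of a file, read in chunks to keep memory flat."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```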

Worker Process:

def convert_to_pdf(conversion_id):
    conversion = db.get_conversion(conversion_id)
    source_file = db.get_media_file(conversion.source_file_id)
    
    # Status update: processing
    db.update_conversion(conversion_id, {
        'status': 'processing',
        'started_at': now()
    })
    
    try:
        # LibreOffice conversion: the output is named after the source file
        # (<basename>.pdf in --outdir), so convert first, then rename the
        # result to the conversion-ID path we track in the DB
        subprocess.run([
            'libreoffice',
            '--headless',
            '--convert-to', 'pdf',
            '--outdir', '/data/converted/',
            source_file.original_path
        ], check=True)
        
        base = os.path.splitext(os.path.basename(source_file.original_path))[0]
        pdf_path = f"/data/converted/{conversion.id}.pdf"
        os.rename(f"/data/converted/{base}.pdf", pdf_path)
        
        # Success
        db.update_conversion(conversion_id, {
            'status': 'ready',
            'target_path': pdf_path,
            'completed_at': now()
        })
        
    except Exception as e:
        # Error
        db.update_conversion(conversion_id, {
            'status': 'failed',
            'error_message': str(e),
            'completed_at': now()
        })

Client Download:

@app.get("/files/{file_id}/display")
async def get_display_file(file_id):
    file = db.get_media_file(file_id)
    
    # Only for PPTX: check PDF conversion
    if file.file_type == 'pptx':
        conversion = db.get_latest_conversion(file.id, target_format='pdf')
        
        if not conversion:
            # Shouldn't happen, but just to be safe
            trigger_new_conversion(file.id)
            return {'status': 'pending', 'message': 'Conversion is being created'}
        
        if conversion.status == 'ready':
            return FileResponse(conversion.target_path)
        
        elif conversion.status == 'failed':
            # Optional: Auto-retry
            trigger_new_conversion(file.id)
            return {'status': 'failed', 'error': conversion.error_message}
        
        else:  # pending or processing
            return {'status': conversion.status, 'message': 'Please wait...'}
    
    # Serve other file types directly
    return FileResponse(file.original_path)

4. Docker Setup

version: '3.8'

services:
  # Your API Server
  api:
    build: ./api
    command: uvicorn main:app --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/converted:/data/converted
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
    depends_on:
      - redis
      - postgres
    restart: unless-stopped

  # Worker (same codebase as API, different command)
  worker:
    build: ./api  # Same build as API - the image must also include LibreOffice
    command: python worker.py  # or: rq worker conversions
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/converted:/data/converted
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
    depends_on:
      - redis
      - postgres
    restart: unless-stopped
    # Optional: Multiple workers
    deploy:
      replicas: 2

  # Redis - separate container
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    # Optional: persistent configuration
    command: redis-server --appendonly yes
    restart: unless-stopped

  # Your existing Postgres
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=infoscreen
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    restart: unless-stopped

  # Optional: Redis Commander (UI for debugging)
  redis-commander:
    image: rediscommander/redis-commander
    environment:
      - REDIS_HOSTS=local:redis:6379
    ports:
      - "8081:8081"
    depends_on:
      - redis

volumes:
  redis-data:
  postgres-data:

5. Container Communication

Containers communicate via Docker's internal network:

# In your API/Worker code:
import redis

# Connection to Redis
redis_client = redis.from_url('redis://redis:6379')
#                              ^^^^^^ 
#                              Container name = hostname in Docker network

Docker automatically creates DNS entries, so redis resolves to the Redis container.

6. Client Behavior (Pi5)

# On the Pi5 client
def display_file(file_id, max_attempts=60):
    for _ in range(max_attempts):
        response = api.get(f"/files/{file_id}/display")
        
        if response.content_type == 'application/pdf':
            # PDF is ready
            downloaded_pdf = download(response)
            subprocess.run(['impressive', downloaded_pdf])
            return
        
        if response.json()['status'] in ['pending', 'processing']:
            # Wait and retry (a loop rather than recursion, so a long
            # conversion can't grow the call stack without bound)
            show_loading_screen("Presentation is being prepared...")
            time.sleep(5)
            continue
        
        # Error
        show_error_screen("Error loading presentation")
        return
    
    show_error_screen("Timed out waiting for presentation")

7. Additional Features

Cache Invalidation on PPTX Update:

@app.put("/files/{file_id}")
async def update_file(file_id, new_file):
    # Delete old conversions
    db.mark_conversions_as_obsolete(file_id)
    
    # Update file
    update_media_file(file_id, new_file)
    
    # Trigger new conversion
    trigger_conversion(file_id, 'pdf')

Status API for Monitoring:

@app.get("/admin/conversions/status")
async def get_conversion_stats():
    return {
        'pending': db.count(status='pending'),
        'processing': db.count(status='processing'),
        'failed': db.count(status='failed'),
        'avg_duration_seconds': db.avg_duration()
    }

Cleanup Job (Cronjob):

def cleanup_old_conversions():
    # Remove PDFs from deleted files
    db.delete_orphaned_conversions()
    
    # Clean up old failed conversions
    db.delete_old_failed_conversions(older_than_days=7)
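The db.delete_old_failed_conversions helper above reduces to a cutoff comparison on completed_at. A sketch against the conversions schema, using an in-memory SQLite stand-in for the real database and ISO-8601 strings for timestamps (both assumptions of this example):

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def delete_old_failed_conversions(conn, older_than_days=7, now=None):
    """Delete failed conversions completed before the cutoff; return the count."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=older_than_days)).isoformat()
    cur = conn.execute(
        "DELETE FROM conversions WHERE status = 'failed' AND completed_at < ?",
        (cutoff,),
    )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conversions (id TEXT, status TEXT, completed_at TEXT)")
now = datetime(2025, 10, 7, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO conversions VALUES (?, ?, ?)",
    [
        ("old-fail", "failed", (now - timedelta(days=30)).isoformat()),
        ("new-fail", "failed", (now - timedelta(days=1)).isoformat()),
        ("old-ok",   "ready",  (now - timedelta(days=30)).isoformat()),
    ],
)
removed = delete_old_failed_conversions(conn, older_than_days=7, now=now)
```

Only the old failed row is removed; ready conversions are untouched regardless of age, since their PDFs are still being served.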

8. Redis Container Details

Why Separate Container?

  • Separation of concerns: each service has its own responsibility
  • Independent lifecycle management: Redis can be restarted/updated independently
  • Better scaling: Redis can be moved to different hardware
  • Easier backup: Redis data can be backed up separately
  • Standard Docker pattern: microservices architecture

Resource Usage:

  • RAM: ~10-50 MB for your use case
  • CPU: Minimal
  • Disk: Only for persistence (optional)

For 10 clients with occasional PPTX uploads, this overhead is negligible.

9. Advantages of This Solution

  • Scalable: workers can be scaled horizontally
  • Performant: clients don't wait for conversion
  • Robust: status tracking and error handling
  • Maintainable: clear separation of responsibilities
  • Transparent: status queryable at any time
  • Efficient: one-time conversion per file
  • Future-proof: easily extensible for other formats
  • Professional: industry-standard architecture

10. Migration Path

Phase 1 (MVP):

  • 1 worker process in API container
  • Redis for queue (separate container)
  • Basic DB schema
  • Simple retry logic

Phase 2 (as needed):

  • Multiple worker instances
  • Dedicated conversion service container
  • Monitoring & alerting
  • Prioritization logic
  • Advanced caching strategies

Start simple, scale when needed!

11. Key Decisions Summary

| Aspect | Decision | Reason |
|---|---|---|
| Conversion location | Server-side | One conversion per file, consistent results |
| Conversion timing | Asynchronous (on upload) | No client waiting time, predictable performance |
| Data storage | Database-tracked | Status visibility, robust error handling |
| Queue system | Redis (separate container) | Standard pattern, scalable, maintainable |
| Worker architecture | Background process in API container | Simple start, easy to separate later |

12. File Flow Diagram

┌─────────────┐
│ User Upload │
│   (PPTX)    │
└──────┬──────┘
       │
       ▼
┌──────────────────┐
│   API Server     │
│ 1. Save PPTX     │
│ 2. Create DB rec │
│ 3. Enqueue job   │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│  Redis Queue     │◄─────┐
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│  Worker Process  │      │
│ 1. Get job       │      │
│ 2. Convert PPTX  │      │
│ 3. Update DB     │      │
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│   PDF Storage    │      │
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ Client Requests  │      │
│ 1. Check DB      │      │
│ 2. Download PDF  │      │
│ 3. Display       │──────┘
└──────────────────┘
  (via impressive)

13. Implementation Checklist

Database Setup

  • Create media_files table
  • Create conversions table
  • Add indexes for performance
  • Set up foreign key constraints

API Changes

  • Modify upload endpoint to create DB records
  • Add conversion job enqueueing
  • Implement file download endpoint with status checking
  • Add status API for monitoring
  • Implement cache invalidation on file update

Worker Setup

  • Create worker script/module
  • Implement LibreOffice conversion logic
  • Add error handling and retry logic
  • Set up logging and monitoring

Docker Configuration

  • Add Redis container to docker-compose.yml
  • Configure worker container
  • Set up volume mounts for file storage
  • Configure environment variables
  • Set up container dependencies

Client Updates

  • Modify client to check conversion status
  • Implement retry logic for pending conversions
  • Add loading/waiting screens
  • Implement error handling

Testing

  • Test upload → conversion → download flow
  • Test multiple concurrent conversions
  • Test error handling (corrupted PPTX, etc.)
  • Test cache invalidation on file update
  • Load test with multiple clients

Monitoring & Operations

  • Set up logging for conversions
  • Implement cleanup job for old files
  • Add metrics for conversion times
  • Set up alerts for failed conversions
  • Document backup procedures

This architecture provides a solid foundation that's simple to start with but scales professionally as your needs grow!