File: `infoscreen/pptx_conversion_guide.md`

Commit fcc0dfbb0f by olaf (2025-10-07 19:06:09 +00:00): feat(conversions): end-to-end PPT/PPTX/ODP -> PDF pipeline with RQ worker + Gotenberg

DB/model

- Add Conversion model + ConversionStatus enum (pending, processing, ready, failed)
- Alembic migrations: create conversions table, indexes, unique (source_event_media_id, target_format, file_hash), and NOT NULL on file_hash

API

- Enqueue on upload (ppt|pptx|odp) in routes/eventmedia.py: compute sha256, upsert Conversion, enqueue job
- New routes:
  - POST /api/conversions/<media_id>/pdf — ensure/enqueue conversion
  - GET /api/conversions/<media_id>/status — latest status/details
  - GET /api/files/converted/<path> — serve converted PDFs
- Register conversions blueprint in wsgi

Worker

- server/worker.py: convert_event_media_to_pdf
  - Calls Gotenberg /forms/libreoffice/convert, writes to server/media/converted/
  - Updates Conversion status, timestamps, error messages
- Fix media root resolution to /server/media
- Prefer function enqueue over string path; expose server.worker in package init for RQ string compatibility

Queue/infra

- server/task_queue.py: RQ queue helper (REDIS_URL, default redis://redis:6379/0)
- docker-compose:
  - Add redis and gotenberg services
  - Add worker service (rq worker conversions)
  - Pass REDIS_URL and GOTENBERG_URL to server/worker
  - Mount shared media volume in prod for API/worker parity
- docker-compose.override:
  - Add dev redis/gotenberg/worker services
  - Ensure PYTHONPATH + working_dir allow importing server.worker
  - Use rq CLI instead of python -m rq for worker
  - Dashboard dev: run as appropriate user/root and pre-create/chown caches to avoid EACCES

Dashboard dev UX

- Vite: set cacheDir .vite to avoid EACCES in node_modules
- Disable Node inspector by default to avoid port conflicts

Docs

- Update copilot-instructions.md with conversion system: flow, services, env vars, endpoints, storage paths, and data model

---

# Recommended Implementation: PPTX-to-PDF Conversion System
## Architecture Overview
**Asynchronous server-side conversion with database tracking**
```
User Upload     → API saves PPTX + DB entry → Job in Queue
Client requests → API checks DB status      → PDF ready? → Download PDF
                                            → Pending?   → "Please wait"
                                            → Failed?    → Retry/Error
```
## 1. Database Schema
```sql
CREATE TABLE media_files (
    id UUID PRIMARY KEY,
    filename VARCHAR(255),
    original_path VARCHAR(512),
    file_type VARCHAR(10),
    mime_type VARCHAR(100),
    uploaded_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE conversions (
    id UUID PRIMARY KEY,
    source_file_id UUID REFERENCES media_files(id) ON DELETE CASCADE,
    target_format VARCHAR(10),  -- 'pdf'
    target_path VARCHAR(512),   -- Path to generated PDF
    status VARCHAR(20),         -- 'pending', 'processing', 'ready', 'failed'
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    error_message TEXT,
    file_hash VARCHAR(64)       -- Hash of PPTX for cache invalidation
);

CREATE INDEX idx_conversions_source ON conversions(source_file_id, target_format);
```
## 2. Components
### **API Server (existing)**
- Accepts uploads
- Creates DB entries
- Enqueues jobs
- Delivers status and files
### **Background Worker (new)**
- Runs as separate process in **same container** as API
- Processes conversion jobs from queue
- Can run multiple worker instances in parallel
- Technology: Python RQ, Celery, or similar
### **Message Queue**
- Redis (recommended to start with: simple, fast)
- Alternative: RabbitMQ for more features
### **Redis Container (new)**
- Separate container for Redis
- Handles job queue
- Minimal resource footprint
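The contract between these components is small: the API records a `pending` row and pushes a job id; a worker pulls the id, converts, and updates the status. Before wiring in Redis and RQ, that contract can be sketched in-process with the standard library (all names here are illustrative, not the project's actual API):

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for the Redis-backed job queue
status = {}            # stand-in for the conversions table

def enqueue_conversion(conversion_id):
    """What the API does on upload: record 'pending', push the job id."""
    status[conversion_id] = 'pending'
    jobs.put(conversion_id)

def worker_loop():
    """What a worker process does: pull ids, convert, update status."""
    while True:
        conversion_id = jobs.get()
        if conversion_id is None:  # shutdown sentinel
            break
        status[conversion_id] = 'processing'
        # ... the real conversion would happen here ...
        status[conversion_id] = 'ready'
        jobs.task_done()

worker = threading.Thread(target=worker_loop, daemon=True)
worker.start()

enqueue_conversion('conv-1')
jobs.join()       # wait until the worker has finished the job
jobs.put(None)    # stop the worker
print(status['conv-1'])  # → ready
```

Swapping `queue.Queue` for Redis/RQ changes the transport, not the contract: the API still only ever writes `pending` and enqueues, and only workers move a job to `processing`/`ready`/`failed`.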
## 3. Detailed Workflow
### **Upload Process:**
```python
@app.post("/upload")
async def upload_file(file):
    # 1. Save PPTX
    file_path = save_to_disk(file)

    # 2. DB entry for the original file
    file_record = db.create_media_file({
        'filename': file.filename,
        'original_path': file_path,
        'file_type': 'pptx'
    })

    # 3. Create conversion record
    conversion = db.create_conversion({
        'source_file_id': file_record.id,
        'target_format': 'pdf',
        'status': 'pending',
        'file_hash': calculate_hash(file_path)
    })

    # 4. Enqueue job (asynchronous!)
    queue.enqueue(convert_to_pdf, conversion.id)

    # 5. Return immediately to the user
    return {
        'file_id': file_record.id,
        'status': 'uploaded',
        'conversion_status': 'pending'
    }
```
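The handler above calls a `calculate_hash` helper that the guide doesn't define. A minimal sketch, reading in chunks so large decks don't get loaded into RAM at once:

```python
import hashlib

def calculate_hash(file_path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, streamed in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(file_path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

The 64-character hex digest is what the schema's `file_hash VARCHAR(64)` column is sized for.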
### **Worker Process:**
```python
import os
import subprocess

def convert_to_pdf(conversion_id):
    conversion = db.get_conversion(conversion_id)
    source_file = db.get_media_file(conversion.source_file_id)

    # Status update: processing
    db.update_conversion(conversion_id, {
        'status': 'processing',
        'started_at': now()
    })

    try:
        # LibreOffice conversion. Note: LibreOffice names the output after
        # the source file, so rename it to the conversion id afterwards.
        out_dir = '/data/converted'
        subprocess.run([
            'libreoffice',
            '--headless',
            '--convert-to', 'pdf',
            '--outdir', out_dir,
            source_file.original_path
        ], check=True)

        stem = os.path.splitext(os.path.basename(source_file.original_path))[0]
        produced_path = os.path.join(out_dir, f'{stem}.pdf')
        pdf_path = os.path.join(out_dir, f'{conversion.id}.pdf')
        os.rename(produced_path, pdf_path)

        # Success
        db.update_conversion(conversion_id, {
            'status': 'ready',
            'target_path': pdf_path,
            'completed_at': now()
        })
    except Exception as e:
        # Error
        db.update_conversion(conversion_id, {
            'status': 'failed',
            'error_message': str(e),
            'completed_at': now()
        })
```
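The commit at the top of this file actually routes conversion through Gotenberg's `/forms/libreoffice/convert` endpoint instead of a local `libreoffice` binary, which keeps LibreOffice out of the worker image. A sketch of that variant, assuming the `requests` package and a `GOTENBERG_URL` such as `http://gotenberg:3000` (the helper name is illustrative):

```python
import os
import requests

GOTENBERG_URL = os.environ.get('GOTENBERG_URL', 'http://gotenberg:3000')

def convert_via_gotenberg(source_path, target_path, timeout=120):
    """POST an office file to Gotenberg's LibreOffice route, save the PDF."""
    with open(source_path, 'rb') as f:
        resp = requests.post(
            f"{GOTENBERG_URL}/forms/libreoffice/convert",
            files={'files': (os.path.basename(source_path), f)},
            timeout=timeout,
        )
    resp.raise_for_status()  # surface failures so the worker marks 'failed'
    with open(target_path, 'wb') as out:
        out.write(resp.content)
    return target_path
```

In the worker above, this call would replace the `subprocess.run` step; everything else (status updates, error handling) stays the same.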
### **Client Download:**
```python
@app.get("/files/{file_id}/display")
async def get_display_file(file_id):
    file = db.get_media_file(file_id)

    # Only for PPTX: check the PDF conversion
    if file.file_type == 'pptx':
        conversion = db.get_latest_conversion(file.id, target_format='pdf')

        if not conversion:
            # Shouldn't happen, but just to be safe
            trigger_new_conversion(file.id)
            return {'status': 'pending', 'message': 'Conversion is being created'}

        if conversion.status == 'ready':
            return FileResponse(conversion.target_path)
        elif conversion.status == 'failed':
            # Optional: auto-retry
            trigger_new_conversion(file.id)
            return {'status': 'failed', 'error': conversion.error_message}
        else:  # pending or processing
            return {'status': conversion.status, 'message': 'Please wait...'}

    # Serve other file types directly
    return FileResponse(file.original_path)
```
## 4. Docker Setup
```yaml
version: '3.8'

services:
  # Your API server
  api:
    build: ./api
    command: uvicorn main:app --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/converted:/data/converted
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
    depends_on:
      - redis
      - postgres
    restart: unless-stopped

  # Worker (same codebase as API, different command)
  worker:
    build: ./api              # Same build as the API!
    command: python worker.py # or: rq worker
    volumes:
      - ./data/uploads:/data/uploads
      - ./data/converted:/data/converted
    environment:
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/infoscreen
    depends_on:
      - redis
      - postgres
    restart: unless-stopped
    # Optional: multiple workers
    deploy:
      replicas: 2

  # Redis - separate container
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
    # Optional: persistent configuration
    command: redis-server --appendonly yes
    restart: unless-stopped

  # Your existing Postgres
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=infoscreen
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    restart: unless-stopped

  # Optional: Redis Commander (UI for debugging)
  redis-commander:
    image: rediscommander/redis-commander
    environment:
      - REDIS_HOSTS=local:redis:6379
    ports:
      - "8081:8081"
    depends_on:
      - redis

volumes:
  redis-data:
  postgres-data:
```
## 5. Container Communication
Containers communicate via **Docker's internal network**:
```python
# In your API/Worker code:
import redis

# Connection to Redis. 'redis' in the URL is the container name,
# which acts as a hostname inside the Docker network.
redis_client = redis.from_url('redis://redis:6379')
```
Docker automatically creates DNS entries, so `redis` resolves to the Redis container.
## 6. Client Behavior (Pi5)
```python
# On the Pi5 client
def display_file(file_id, max_attempts=60):
    for _ in range(max_attempts):
        response = api.get(f"/files/{file_id}/display")

        if response.content_type == 'application/pdf':
            # PDF is ready: save it and hand it to impressive
            pdf_path = download_to_disk(response)
            subprocess.run(['impressive', pdf_path])
            return
        elif response.json()['status'] in ['pending', 'processing']:
            # Wait and retry
            show_loading_screen("Presentation is being prepared...")
            time.sleep(5)
        else:
            # Error
            show_error_screen("Error loading presentation")
            return

    show_error_screen("Timed out waiting for conversion")
```
## 7. Additional Features
### **Cache Invalidation on PPTX Update:**
```python
@app.put("/files/{file_id}")
async def update_file(file_id, new_file):
    # Mark old conversions as obsolete
    db.mark_conversions_as_obsolete(file_id)

    # Update the file
    update_media_file(file_id, new_file)

    # Trigger a new conversion
    trigger_conversion(file_id, 'pdf')
```
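The `file_hash` column from the schema is what makes invalidation cheap: a re-upload only needs a fresh conversion when the bytes actually changed. The decision logic can be isolated in a small pure helper (illustrative names, assuming a `conversion` row object with `file_hash` and `status` attributes):

```python
def needs_reconversion(conversion, new_hash):
    """Decide whether an uploaded file requires a new PDF conversion.

    `conversion` is the latest conversion row (or None if there is none);
    `new_hash` is the SHA-256 of the newly uploaded bytes.
    """
    if conversion is None:
        return True                        # never converted
    if conversion.file_hash != new_hash:
        return True                        # content changed
    return conversion.status == 'failed'   # retry failures; reuse ready/pending
```

Unchanged bytes with a `ready` (or still in-flight) conversion skip the queue entirely, which matches the unique `(source, target_format, file_hash)` constraint from the commit above.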
### **Status API for Monitoring:**
```python
@app.get("/admin/conversions/status")
async def get_conversion_stats():
return {
'pending': db.count(status='pending'),
'processing': db.count(status='processing'),
'failed': db.count(status='failed'),
'avg_duration_seconds': db.avg_duration()
}
```
### **Cleanup Job (Cronjob):**
```python
def cleanup_old_conversions():
    # Remove PDFs belonging to deleted files
    db.delete_orphaned_conversions()

    # Clean up old failed conversions
    db.delete_old_failed_conversions(older_than_days=7)
```
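The orphan sweep reduces to a single statement against the section 1 schema. With `ON DELETE CASCADE` in place the database already removes conversions when a `media_files` row is deleted, so this pass mainly guards against rows created while foreign keys were unenforced. A sketch against an in-memory SQLite copy of the tables:

```python
import sqlite3

ORPHAN_SQL = """
DELETE FROM conversions
WHERE source_file_id NOT IN (SELECT id FROM media_files)
"""

# Minimal stand-in tables to demonstrate the statement
db = sqlite3.connect(':memory:')
db.executescript("""
CREATE TABLE media_files (id TEXT PRIMARY KEY);
CREATE TABLE conversions (id TEXT PRIMARY KEY, source_file_id TEXT, status TEXT);
INSERT INTO media_files VALUES ('f1');
INSERT INTO conversions VALUES ('c1', 'f1', 'ready');   -- kept
INSERT INTO conversions VALUES ('c2', 'gone', 'ready'); -- orphan, removed
""")

db.execute(ORPHAN_SQL)
remaining = [row[0] for row in db.execute("SELECT id FROM conversions")]
print(remaining)  # → ['c1']
```

Deleting the PDF files themselves would follow the same pattern: select the orphaned `target_path` values first, unlink them, then delete the rows.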
## 8. Redis Container Details
### **Why Separate Container?**
- **Separation of concerns**: each service has its own responsibility
- **Independent lifecycle management**: Redis can be restarted/updated independently
- **Better scaling**: Redis can be moved to different hardware
- **Easier backup**: Redis data can be backed up separately
- **Standard Docker pattern**: microservices architecture
### **Resource Usage:**
- RAM: ~10-50 MB for your use case
- CPU: Minimal
- Disk: Only for persistence (optional)
For 10 clients with occasional PPTX uploads, this footprint is negligible.
## 9. Advantages of This Solution
- **Scalable**: workers can be scaled horizontally
- **Performant**: clients don't wait for conversion
- **Robust**: status tracking and error handling
- **Maintainable**: clear separation of responsibilities
- **Transparent**: status queryable at any time
- **Efficient**: one-time conversion per file
- **Future-proof**: easily extensible to other formats
- **Professional**: industry-standard architecture
## 10. Migration Path
### **Phase 1 (MVP):**
- 1 worker process in API container
- Redis for queue (separate container)
- Basic DB schema
- Simple retry logic
### **Phase 2 (as needed):**
- Multiple worker instances
- Dedicated conversion service container
- Monitoring & alerting
- Prioritization logic
- Advanced caching strategies
**Start simple, scale when needed!**
## 11. Key Decisions Summary
| Aspect | Decision | Reason |
|--------|----------|--------|
| **Conversion Location** | Server-side | One conversion per file, consistent results |
| **Conversion Timing** | Asynchronous (on upload) | No client waiting time, predictable performance |
| **Data Storage** | Database-tracked | Status visibility, robust error handling |
| **Queue System** | Redis (separate container) | Standard pattern, scalable, maintainable |
| **Worker Architecture** | Background process in API container | Simple start, easy to separate later |
## 12. File Flow Diagram
```
┌─────────────┐
│ User Upload │
│   (PPTX)    │
└──────┬──────┘
       │
       ▼
┌──────────────────┐
│ API Server       │
│ 1. Save PPTX     │
│ 2. Create DB rec │
│ 3. Enqueue job   │
└──────┬───────────┘
       │
       ▼
┌──────────────────┐
│ Redis Queue      │◄─────┐
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ Worker Process   │      │
│ 1. Get job       │      │
│ 2. Convert PPTX  │      │
│ 3. Update DB     │      │
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ PDF Storage      │      │
└──────┬───────────┘      │
       │                  │
       ▼                  │
┌──────────────────┐      │
│ Client Requests  │      │
│ 1. Check DB      │      │
│ 2. Download PDF  │      │
│ 3. Display       │──────┘
│ (via impressive) │
└──────────────────┘
```
## 13. Implementation Checklist
### Database Setup
- [ ] Create `media_files` table
- [ ] Create `conversions` table
- [ ] Add indexes for performance
- [ ] Set up foreign key constraints
### API Changes
- [ ] Modify upload endpoint to create DB records
- [ ] Add conversion job enqueueing
- [ ] Implement file download endpoint with status checking
- [ ] Add status API for monitoring
- [ ] Implement cache invalidation on file update
### Worker Setup
- [ ] Create worker script/module
- [ ] Implement LibreOffice conversion logic
- [ ] Add error handling and retry logic
- [ ] Set up logging and monitoring
### Docker Configuration
- [ ] Add Redis container to docker-compose.yml
- [ ] Configure worker container
- [ ] Set up volume mounts for file storage
- [ ] Configure environment variables
- [ ] Set up container dependencies
### Client Updates
- [ ] Modify client to check conversion status
- [ ] Implement retry logic for pending conversions
- [ ] Add loading/waiting screens
- [ ] Implement error handling
### Testing
- [ ] Test upload → conversion → download flow
- [ ] Test multiple concurrent conversions
- [ ] Test error handling (corrupted PPTX, etc.)
- [ ] Test cache invalidation on file update
- [ ] Load test with multiple clients
### Monitoring & Operations
- [ ] Set up logging for conversions
- [ ] Implement cleanup job for old files
- [ ] Add metrics for conversion times
- [ ] Set up alerts for failed conversions
- [ ] Document backup procedures
---
**This architecture provides a solid foundation that's simple to start with but scales professionally as your needs grow!**