feat(conversions): end-to-end PPT/PPTX/ODP -> PDF pipeline with RQ worker + Gotenberg

DB/model

Add Conversion model + ConversionStatus enum (pending, processing, ready, failed)
Alembic migrations: create the conversions table and its indexes, add a unique constraint on (source_event_media_id, target_format, file_hash), and make file_hash NOT NULL

API

Enqueue on upload (ppt|pptx|odp) in routes/eventmedia.py: compute sha256, upsert Conversion, enqueue job
New routes:
  POST /api/conversions/<media_id>/pdf: ensure/enqueue conversion
  GET /api/conversions/<media_id>/status: latest status/details
  GET /api/files/converted/<path>: serve converted PDFs
Register conversions blueprint in wsgi
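
The ensure/enqueue behavior of the POST route can be illustrated with a small, framework-free sketch. It assumes a plain dict in place of the conversions table and a plain list in place of the RQ queue; `ensure_conversion` and both stand-ins are hypothetical names used only for illustration. Only the status values and the (source_event_media_id, target_format, file_hash) key are taken from this commit.

```python
from enum import Enum


class ConversionStatus(Enum):
    pending = "pending"
    processing = "processing"
    ready = "ready"
    failed = "failed"


def ensure_conversion(store, queue, media_id, file_hash, target_format="pdf"):
    """Return the status for this key, enqueueing a job only when needed.

    store: dict standing in for the conversions table (hypothetical).
    queue: list standing in for the RQ queue (hypothetical).
    """
    # The key mirrors the unique constraint on the conversions table.
    key = (media_id, target_format, file_hash)
    status = store.get(key)
    if status is None:
        status = ConversionStatus.pending
        store[key] = status
    # Pending or failed conversions are (re)enqueued; processing and ready
    # rows are left untouched.
    if status in {ConversionStatus.pending, ConversionStatus.failed}:
        queue.append(key)
    return status
```

Calling it twice for a row that is already ready leaves the queue unchanged, which is the property that makes the endpoint safe to call repeatedly.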

Worker

server/worker.py: convert_event_media_to_pdf
  Calls Gotenberg /forms/libreoffice/convert, writes to server/media/converted/
  Updates Conversion status, timestamps, error messages
Fix media root resolution to /server/media
Prefer function enqueue over string path; expose server.worker in package init for RQ string compatibility
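
As a rough illustration of the Gotenberg call, the sketch below posts one file to /forms/libreoffice/convert as multipart/form-data (form field `files`) and writes the response body as a PDF. It is a stdlib-only sketch, not the code in server/worker.py: the helper names, the default base URL http://gotenberg:3000, and the `<stem>.pdf` output naming are assumptions, and the real worker may use a different HTTP client.

```python
import os
import uuid
import urllib.request

# Assumed default; the commit only states that GOTENBERG_URL is passed in.
GOTENBERG_URL = os.environ.get("GOTENBERG_URL", "http://gotenberg:3000")


def build_multipart(filename, data, boundary=None):
    """Encode one file as multipart/form-data under the form field 'files'."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert_to_pdf(src_path, out_dir, base_url=GOTENBERG_URL, timeout=120):
    """POST src_path to Gotenberg and write the returned PDF into out_dir."""
    with open(src_path, "rb") as f:
        body, content_type = build_multipart(os.path.basename(src_path), f.read())
    req = urllib.request.Request(
        base_url.rstrip("/") + "/forms/libreoffice/convert",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(src_path))[0]
    out_path = os.path.join(out_dir, stem + ".pdf")  # naming is an assumption
    # urlopen raises urllib.error.HTTPError on non-2xx responses, which a
    # worker could catch to mark the Conversion as failed.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        pdf = resp.read()
    with open(out_path, "wb") as out:
        out.write(pdf)
    return out_path
```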

Queue/infra

server/task_queue.py: RQ queue helper (REDIS_URL, default redis://redis:6379/0)
docker-compose:
  Add redis and gotenberg services
  Add worker service (rq worker conversions)
  Pass REDIS_URL and GOTENBERG_URL to server/worker
  Mount shared media volume in prod for API/worker parity
docker-compose.override:
  Add dev redis/gotenberg/worker services
  Ensure PYTHONPATH + working_dir allow importing server.worker
  Use rq CLI instead of python -m rq for worker
  Dashboard dev: run as appropriate user/root and pre-create/chown caches to avoid EACCES
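
A queue helper along the lines described for server/task_queue.py might look like the sketch below. The REDIS_URL variable, its default, and the queue name `conversions` come from this commit; `resolve_redis_url` and the lazy imports are assumptions made so the URL logic can be checked without redis or rq installed.

```python
import os

DEFAULT_REDIS_URL = "redis://redis:6379/0"


def resolve_redis_url(env=None):
    """Return REDIS_URL from the environment, falling back to the default."""
    env = os.environ if env is None else env
    return env.get("REDIS_URL") or DEFAULT_REDIS_URL


def get_queue(name="conversions"):
    """Build an RQ queue bound to the configured Redis instance."""
    # Imported lazily (an assumption of this sketch) so that importing the
    # module does not require redis/rq to be present.
    from redis import Redis
    from rq import Queue

    return Queue(name, connection=Redis.from_url(resolve_redis_url()))
```

With this shape, the API enqueues onto the same named queue that `rq worker conversions` consumes, so both containers only need to agree on REDIS_URL.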

Dashboard dev UX

Vite: set cacheDir .vite to avoid EACCES in node_modules
Disable Node inspector by default to avoid port conflicts

Docs

Update copilot-instructions.md with conversion system: flow, services, env vars, endpoints, storage paths, and data model
2025-10-07 19:06:09 +00:00
parent 80bf8bc58d
commit fcc0dfbb0f
20 changed files with 1809 additions and 422 deletions

@@ -1,7 +1,10 @@
 from re import A
 from flask import Blueprint, request, jsonify, send_from_directory
 from server.database import Session
-from models.models import EventMedia, MediaType
+from models.models import EventMedia, MediaType, Conversion, ConversionStatus
+from server.task_queue import get_queue
+from server.worker import convert_event_media_to_pdf
+import hashlib
 import os
 eventmedia_bp = Blueprint('eventmedia', __name__, url_prefix='/api/eventmedia')
@@ -134,6 +137,41 @@ def filemanager_upload():
         uploaded_at=datetime.now(timezone.utc)
     )
     session.add(media)
+    session.commit()
+    # Enqueue conversion for office presentation types
+    if media_type in {MediaType.ppt, MediaType.pptx, MediaType.odp}:
+        # compute file hash
+        h = hashlib.sha256()
+        with open(file_path, 'rb') as f:
+            for chunk in iter(lambda: f.read(8192), b""):
+                h.update(chunk)
+        file_hash = h.hexdigest()
+        # upsert Conversion row
+        conv = (
+            session.query(Conversion)
+            .filter_by(
+                source_event_media_id=media.id,
+                target_format='pdf',
+                file_hash=file_hash,
+            )
+            .one_or_none()
+        )
+        if not conv:
+            conv = Conversion(
+                source_event_media_id=media.id,
+                target_format='pdf',
+                status=ConversionStatus.pending,
+                file_hash=file_hash,
+            )
+            session.add(conv)
+            session.commit()
+        if conv.status in {ConversionStatus.pending, ConversionStatus.failed}:
+            q = get_queue()
+            q.enqueue(convert_event_media_to_pdf, conv.id)
     session.commit()
     return jsonify({'success': True})
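
The hashing step in the hunk above reads the upload in fixed-size chunks so that large presentations are never loaded into memory at once. Pulled out as a standalone helper it could look like this; `sha256_of_file` is a hypothetical name, since the commit keeps the loop inline.

```python
import hashlib


def sha256_of_file(path, chunk_size=8192):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Because the digest depends only on file content, re-uploading an identical file yields the same file_hash and therefore the same Conversion key.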