feat(conversions): end-to-end PPT/PPTX/ODP -> PDF pipeline with RQ worker + Gotenberg

DB/model

- Add Conversion model + ConversionStatus enum (pending, processing, ready, failed)
- Alembic migrations: create conversions table, indexes, unique constraint on (source_event_media_id, target_format, file_hash), and NOT NULL on file_hash

API

- Enqueue on upload (ppt|pptx|odp) in routes/eventmedia.py: compute sha256, upsert Conversion, enqueue job
- New routes:
  - POST /api/conversions/<media_id>/pdf: ensure/enqueue conversion
  - GET /api/conversions/<media_id>/status: latest status/details
  - GET /api/files/converted/<path>: serve converted PDFs
- Register conversions blueprint in wsgi

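The upload-side hashing step can be sketched as follows. This is a minimal illustration, not the code in routes/eventmedia.py: the helper names `needs_conversion` and `sha256_of_file` are hypothetical, and only the extension list and the use of sha256 come from the commit message.

```python
import hashlib
from pathlib import Path

# Extensions named in the commit message as conversion triggers.
CONVERTIBLE_EXTENSIONS = {".ppt", ".pptx", ".odp"}


def needs_conversion(filename):
    """Return True if the uploaded file should be converted to PDF."""
    return Path(filename).suffix.lower() in CONVERTIBLE_EXTENSIONS


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large decks are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    # 64 hex characters, which fits the String(64) file_hash column.
    return digest.hexdigest()
```

The hex digest would then serve as the `file_hash` part of the (source_event_media_id, target_format, file_hash) key when the Conversion row is upserted.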
Worker

- server/worker.py: convert_event_media_to_pdf
  - Calls Gotenberg /forms/libreoffice/convert, writes to server/media/converted/
  - Updates Conversion status, timestamps, error messages
- Fix media root resolution to /server/media
- Prefer function enqueue over string path; expose server.worker in package init for RQ string compatibility

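The status bookkeeping around a conversion job might look roughly like the sketch below. It is not the actual `convert_event_media_to_pdf`: the `record` dict stands in for a Conversion row, `convert` is an injected callable standing in for the HTTP call to Gotenberg's /forms/libreoffice/convert, and the output file naming and the choice to set `completed_at` on failure are assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path


def run_conversion(record, source_path, media_root, convert):
    """Drive one conversion and record its outcome.

    record     -- dict standing in for a Conversion row (hypothetical)
    convert    -- callable taking source bytes and returning PDF bytes;
                  in the real worker this would be the Gotenberg request
    media_root -- directory that target_path is relative to
    """
    record["status"] = "processing"
    record["started_at"] = datetime.now(timezone.utc)
    try:
        pdf_bytes = convert(Path(source_path).read_bytes())
        # Assumed naming scheme: converted/<source stem>.pdf
        relative = Path("converted") / (Path(source_path).stem + ".pdf")
        output = Path(media_root) / relative
        output.parent.mkdir(parents=True, exist_ok=True)
        output.write_bytes(pdf_bytes)
        record["target_path"] = relative.as_posix()
        record["status"] = "ready"
        record["error_message"] = None
    except Exception as exc:
        # Keep the failure reason so the status endpoint can report it.
        record["status"] = "failed"
        record["error_message"] = str(exc)
    finally:
        record["completed_at"] = datetime.now(timezone.utc)
    return record
```

Injecting `convert` keeps the state transitions testable without a running Gotenberg container.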
Queue/infra

- server/task_queue.py: RQ queue helper (REDIS_URL, default redis://redis:6379/0)
- docker-compose:
  - Add redis and gotenberg services
  - Add worker service (rq worker conversions)
  - Pass REDIS_URL and GOTENBERG_URL to server/worker
  - Mount shared media volume in prod for API/worker parity
- docker-compose.override:
  - Add dev redis/gotenberg/worker services
  - Ensure PYTHONPATH + working_dir allow importing server.worker
  - Use rq CLI instead of python -m rq for worker
  - Dashboard dev: run as appropriate user/root and pre-create/chown caches to avoid EACCES

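A queue helper along the lines described for server/task_queue.py could be sketched like this. The function names are hypothetical; the REDIS_URL variable, its default, and the `conversions` queue name are taken from the commit message, and the rq/redis imports are deferred so the URL resolution can be exercised without those packages installed.

```python
import os

DEFAULT_REDIS_URL = "redis://redis:6379/0"  # default named in the commit message
QUEUE_NAME = "conversions"                  # matches `rq worker conversions`


def get_redis_url(env=None):
    """Resolve the Redis URL from the environment, falling back to the default."""
    env = os.environ if env is None else env
    return env.get("REDIS_URL") or DEFAULT_REDIS_URL


def get_queue():
    """Build the RQ queue the worker service listens on."""
    # Imported lazily so that only callers that enqueue need rq and redis.
    from redis import Redis
    from rq import Queue

    return Queue(QUEUE_NAME, connection=Redis.from_url(get_redis_url()))
```

With a helper like this, passing the job function object to `get_queue().enqueue(...)` rather than a dotted string path corresponds to the "prefer function enqueue" note in the Worker section.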
Dashboard dev UX

- Vite: set cacheDir .vite to avoid EACCES in node_modules
- Disable Node inspector by default to avoid port conflicts

Docs

- Update copilot-instructions.md with conversion system: flow, services, env vars, endpoints, storage paths, and data model
2025-10-07 19:06:09 +00:00
parent 80bf8bc58d
commit fcc0dfbb0f
20 changed files with 1809 additions and 422 deletions

@@ -227,3 +227,45 @@ class SchoolHoliday(Base):
            "source_file_name": self.source_file_name,
            "imported_at": self.imported_at.isoformat() if self.imported_at else None,
        }


# --- Conversions: Track PPT/PPTX/ODP -> PDF processing state ---


class ConversionStatus(enum.Enum):
    pending = "pending"
    processing = "processing"
    ready = "ready"
    failed = "failed"


class Conversion(Base):
    __tablename__ = 'conversions'

    id = Column(Integer, primary_key=True, autoincrement=True)
    # Source media to be converted
    source_event_media_id = Column(
        Integer,
        ForeignKey('event_media.id', ondelete='CASCADE'),
        nullable=False,
        index=True,
    )
    target_format = Column(String(10), nullable=False,
                           index=True)  # e.g. 'pdf'
    # relative to server/media
    target_path = Column(String(512), nullable=True)
    status = Column(Enum(ConversionStatus), nullable=False,
                    default=ConversionStatus.pending)
    file_hash = Column(String(64), nullable=False)  # sha256 of source file
    started_at = Column(TIMESTAMP(timezone=True), nullable=True)
    completed_at = Column(TIMESTAMP(timezone=True), nullable=True)
    error_message = Column(Text, nullable=True)
    __table_args__ = (
        # Fast lookup per media/format
        Index('ix_conv_source_target', 'source_event_media_id', 'target_format'),
        # Operational filtering
        Index('ix_conv_status_target', 'status', 'target_format'),
        # Idempotency: same source + target + file content should be unique
        UniqueConstraint('source_event_media_id', 'target_format',
                         'file_hash', name='uq_conv_source_target_hash'),
    )
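
The idempotency guarantee of uq_conv_source_target_hash can be illustrated with a small stand-alone sketch. It uses the standard-library sqlite3 module in place of the project's database and SQLAlchemy session, reduces the table to the relevant columns, and introduces a hypothetical `ensure_conversion` helper; none of this is the actual upsert code.

```python
import sqlite3


def make_db():
    """Create an in-memory table carrying the same unique constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE conversions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            source_event_media_id INTEGER NOT NULL,
            target_format TEXT NOT NULL,
            file_hash TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'pending',
            CONSTRAINT uq_conv_source_target_hash
                UNIQUE (source_event_media_id, target_format, file_hash)
        )
        """
    )
    return conn


def ensure_conversion(conn, media_id, target_format, file_hash):
    """Return (row_id, created); reuse the existing row for identical content."""
    key = (media_id, target_format, file_hash)
    try:
        cur = conn.execute(
            "INSERT INTO conversions "
            "(source_event_media_id, target_format, file_hash) VALUES (?, ?, ?)",
            key,
        )
        conn.commit()
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # The unique constraint fired: the same file was already registered.
        row = conn.execute(
            "SELECT id FROM conversions WHERE source_event_media_id = ? "
            "AND target_format = ? AND file_hash = ?",
            key,
        ).fetchone()
        return row[0], False
```

Re-uploading an unchanged file maps to the existing row, while a changed file (different sha256) gets a fresh row and therefore a fresh conversion.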