# Local AI Video Upscaling on Ubuntu (RTX 4070 Super)
This project uses:
- `ffmpeg` / `ffprobe` for demux/remux and frame handling
- `Real-ESRGAN (PyTorch + CUDA)` as default upscaler backend on NVIDIA GPUs
- optional legacy `realesrgan-ncnn-vulkan` backend
- `upscale_video.py` as Python controller/orchestrator
## 1) System setup (Ubuntu)
### NVIDIA driver + Vulkan check
```bash
nvidia-smi
vulkaninfo | head
```
If any of these tools are missing:
```bash
sudo apt update
sudo apt install -y ffmpeg vulkan-tools mesa-vulkan-drivers unzip wget
```
Install/update NVIDIA driver with Ubuntu tooling if needed:
```bash
sudo ubuntu-drivers autoinstall
sudo reboot
```
## 2) Install Real-ESRGAN backend
### Default (recommended): PyTorch + CUDA
Inside project venv:
```bash
python -m pip install -r requirements.txt
python -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
```
The script automatically downloads model weights on first run into `~/.cache/realesrgan`.
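On an offline machine you can check the cache before the first run. The weight filename below is an assumption based on the download URLs later in this README; adjust it if the script names the files differently:
```bash
# Check whether a PyTorch weight file is already cached, to avoid a
# first-run download. The filename used here is an assumption.
has_weight() {
  # $1 = weights dir, $2 = weight filename
  [ -f "$1/$2" ]
}

WEIGHTS_DIR="${HOME}/.cache/realesrgan"
if has_weight "$WEIGHTS_DIR" RealESRGAN_x4plus.pth; then
  echo "weights cached"
else
  echo "weights will be downloaded on first run"
fi
```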
### Optional legacy: ncnn-vulkan binary
### a) Download and install binary
Download `realesrgan-ncnn-vulkan` Linux release from the official Real-ESRGAN releases page,
extract it, and add the binary folder to your `PATH`.
Example:
```bash
mkdir -p ~/tools && cd ~/tools
# Replace URL with latest Linux release zip from official Real-ESRGAN releases
wget <REAL_ESRGAN_LINUX_RELEASE_ZIP_URL> -O realesrgan.zip
unzip realesrgan.zip -d realesrgan
REAL_ESRGAN_DIR="$(find "$HOME/tools/realesrgan" -maxdepth 2 -type f -name realesrgan-ncnn-vulkan -printf '%h\n' | head -n 1)"
echo "Found binary dir: $REAL_ESRGAN_DIR"
echo "export PATH=\"$REAL_ESRGAN_DIR:\$PATH\"" >> ~/.bashrc
source ~/.bashrc
command -v realesrgan-ncnn-vulkan
realesrgan-ncnn-vulkan -h
```
If the command is still not found, locate the binary and call it by its absolute path:
```bash
find "$HOME/tools/realesrgan" -maxdepth 3 -type f -name realesrgan-ncnn-vulkan
```
### b) Download model files
The binary needs model files to work. Download them from the official Real-ESRGAN repository:
```bash
cd ~/tools/realesrgan
mkdir -p models
cd models
# These .pth weights are used by the PyTorch backend; the ncnn binary cannot load them
wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.5.0/realesr-animevideov3.pth
wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth
wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth
wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.5.0/realesr-animevideov3-x2.pth -O realesr-animevideov3-x2.pth
wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.5.0/realesr-animevideov3-x3.pth -O realesr-animevideov3-x3.pth
wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.5.0/realesr-animevideov3-x4.pth -O realesr-animevideov3-x4.pth
# The ncnn-vulkan binary itself needs converted .param/.bin model files;
# those come from the complete model pack below
```
**Important**: The ncnn-vulkan version needs specific model formats. If models still fail, download the complete model pack:
```bash
cd ~/tools/realesrgan
wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.5.0/realesrgan-ncnn-vulkan-20220424-ubuntu.zip
unzip -j realesrgan-ncnn-vulkan-20220424-ubuntu.zip "realesrgan-ncnn-vulkan/models/*" -d models/
```
## 3) Create and use the Python environment
Recommended (pyenv + venv; avoids PEP 668 externally-managed-environment and system-pip issues):
```bash
cd /home/admin_n/python/video-upscaling
pyenv install -s 3.11.14
pyenv shell 3.11.14
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```
The script works without `tqdm`, but with dependencies installed you get a clean single-line progress bar.
Note: Python `3.14` currently fails with `basicsr/realesrgan` build errors. Use Python `3.11.x`.
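Since the build breaks on newer interpreters, a guard like the one below can catch a wrong Python before any packages are installed. It only accepts `3.11.x`, mirroring the note above; loosen it if other versions turn out to work for you:
```bash
# Refuse to proceed on a Python that is known to break basicsr/realesrgan.
python_ok() {
  case "$1" in
    3.11.*|3.11) return 0 ;;
    *) return 1 ;;
  esac
}

ver="$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])' 2>/dev/null || echo unknown)"
if python_ok "$ver"; then
  echo "Python $ver: OK"
else
  echo "Python $ver: use 3.11.x instead" >&2
fi
```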
## 4) Run the Python controller
From this project directory:
```bash
python3 upscale_video.py \
  -i input.mp4 \
  -o output_upscaled.mp4 \
  --backend pytorch \
  --model realesrgan-x4plus \
  --scale 2
```
PyTorch backend uses `.pth` model weights. You can pass a custom weight file via `--model-path /path/model.pth`.
For legacy ncnn backend, pass `--backend ncnn` plus your existing ncnn binary/model setup.
By default, temporary working files are created on `/mnt/winsteam`.
Override if needed with `--temp-root /some/other/path`.
By default, GPU selection uses `--gpu-id auto`.
To force a specific Vulkan GPU, pass e.g. `--gpu-id 0`.
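For batches of files it is convenient to derive the output name from the input instead of typing it each time. `upscaled_name` is a hypothetical helper, not part of `upscale_video.py`:
```bash
# Derive "<stem>_upscaled.<ext>" from an input filename.
upscaled_name() {
  in="$1"
  echo "${in%.*}_upscaled.${in##*.}"
}

IN="input.mp4"
OUT="$(upscaled_name "$IN")"
echo "would run: python3 upscale_video.py -i $IN -o $OUT --backend pytorch --scale 2"
```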
### Useful options
- `--model realesr-animevideov3` for animation/anime-like sources
- `--model realesrgan-x4plus` for natural/live-action footage
- `--backend pytorch|ncnn` choose upscaler backend (default `pytorch`)
- `--model-path /path/to/model.pth` for custom PyTorch weight file
- `--weights-dir ~/.cache/realesrgan` where auto-downloaded PyTorch weights are stored
- `--scale 2|3|4`
- `--tile-size 128` (or 256) if you hit VRAM limits
- `--jobs 2:2:2` to tune throughput (ncnn backend only)
- `--crf 14` for higher output quality (bigger file)
- `--keep-temp` to keep extracted and processed frame directories
- `--temp-root /mnt/winsteam` for temp workspace location
- `--gpu-id auto` (or `--gpu-id 0`, `--gpu-id 1`, etc.)
- `--fp32` for PyTorch FP32 inference (default is FP16 on CUDA)
- `--test-seconds 60` to process only first N seconds for validation
- `--pre-vf "hqdn3d=1.5:1.5:6:6"` to denoise/deblock before upscaling
During upscaling, the script prints live status every ~2 seconds:
- processed/total frames
- percentage
- current average fps
- ETA (remaining time)
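The math behind those numbers is straightforward and can be reproduced outside the script; `eta_seconds` below is a sketch, not the controller's actual code:
```bash
# Average fps and ETA from frame counts and elapsed wall-clock seconds:
#   fps = processed / elapsed
#   eta = (total - processed) / fps
eta_seconds() {
  # $1 = processed frames, $2 = total frames, $3 = elapsed seconds
  awk -v p="$1" -v t="$2" -v e="$3" 'BEGIN {
    if (p == 0 || e == 0) { print "n/a"; exit }
    fps = p / e
    printf "%.0f\n", (t - p) / fps
  }'
}

# 300 of 3000 frames in 60 s -> 5 fps average, 2700 frames left -> 540 s ETA
eta_seconds 300 3000 60
```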
**Note**: Audio is automatically re-encoded to AAC at 192 kbps for maximum compatibility.
**Progress display**: Real-ESRGAN's verbose output is suppressed. The script shows clean progress with tqdm (if installed) or simple periodic updates otherwise.
**Aspect ratio handling**: Input frames are normalized to square pixels before upscaling (`SAR=1`).
For anamorphic sources (for example 720x576 PAL 16:9), this avoids “squeezed” frame geometry.
The conversion uses non-cropping width expansion, so source frame content is preserved.
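The target width follows from the display aspect ratio: width = height × DAR, so PAL 720x576 at 16:9 becomes 1024x576 before upscaling. A sketch of that arithmetic (the even-rounding detail is an assumption, since video codecs want even dimensions):
```bash
# Width of the square-pixel frame for an anamorphic source:
#   width = height * DAR, rounded to the nearest even integer.
square_pixel_width() {
  # $1 = height, $2 = DAR numerator, $3 = DAR denominator
  awk -v h="$1" -v n="$2" -v d="$3" 'BEGIN {
    w = h * n / d
    printf "%d\n", int(w / 2 + 0.5) * 2
  }'
}

# PAL 16:9: 720x576 anamorphic -> 1024x576
square_pixel_width 576 16 9
```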
## 5) Typical tuning for RTX 4070 Super
Start with:
- `--scale 2`
- `--jobs 2:2:2`
- `--tile-size 0`
If you see memory errors, lower memory pressure using:
- `--tile-size 128`
- `--jobs 1:2:2`
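The fallback from full-frame to smaller tiles can be automated with a retry loop. `run_upscale` here only *simulates* an out-of-memory failure for large tiles so the pattern is demonstrable without a GPU; replace its body with the real command:
```bash
# Try progressively smaller tiles until a run succeeds.
run_upscale() {
  tile="$1"
  # Real call would be something like:
  #   python3 upscale_video.py -i input.mp4 -o output.mp4 --tile-size "$tile"
  if [ "$tile" -eq 0 ] || [ "$tile" -gt 128 ]; then
    return 1   # simulated CUDA out-of-memory
  fi
  return 0
}

for tile in 0 256 128; do
  if run_upscale "$tile"; then
    echo "succeeded with --tile-size $tile"
    break
  fi
  echo "tile size $tile failed, retrying smaller" >&2
done
```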
## 6) Optional quality upgrades
For best final quality, you can output with HEVC:
```bash
python3 upscale_video.py -i input.mp4 -o output_hevc.mp4 --codec libx265 --crf 18
```
## 7) GPU ID mapping (`nvidia-smi` vs Vulkan `-g`)
This section is mainly relevant for legacy `ncnn` backend. PyTorch backend usually follows CUDA GPU indexing.
Check NVIDIA GPUs:
```bash
nvidia-smi --query-gpu=index,name,uuid,pci.bus_id --format=csv,noheader
```
Check Vulkan devices:
```bash
vulkaninfo --summary
```
Match by GPU name or UUID:
- `nvidia-smi` UUID format: `GPU-xxxxxxxx-...`
- `vulkaninfo` UUID format: `xxxxxxxx-...` (same value without `GPU-` prefix)
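Following the prefix rule above, matching the two tools' UUIDs is a one-line string operation; `vk_uuid` is just an illustrative helper name:
```bash
# Normalize an nvidia-smi UUID to the form vulkaninfo prints,
# by dropping the leading "GPU-" prefix.
vk_uuid() {
  echo "${1#GPU-}"
}

smi_uuid="GPU-3b7a9c12-0000-0000-0000-000000000000"
vk_uuid "$smi_uuid"
```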
Example from this setup:
- Vulkan `GPU0` = `NVIDIA GeForce RTX 4070 SUPER`
- `nvidia-smi` index `1` = `NVIDIA GeForce RTX 4070 SUPER`
So for the RTX 4070 SUPER here, use Vulkan ID `-g 0`.
With this script, that is:
```bash
python3 upscale_video.py -i input.mp4 -o output.mp4 --gpu-id 0
```
Quick test run example (first 60 seconds only):
```bash
python3 upscale_video.py -i input.mp4 -o output_test.mp4 --model realesrgan-x4plus --scale 2 --gpu-id 0 --test-seconds 60
```
---
If you want, this can be extended with:
- batch-folder processing
- automatic model selection by content type
- optional frame interpolation (RIFE) for smoother motion