A Customized Version of the Original SUPIR Project

Further customized from: https://github.com/yushan777/SUPIR-Demo

Deployment Steps:

For Pod Mode:

- Push the Docker image to a registry (Docker Hub, etc.)
- In RunPod:
  - Create a new Pod
  - Select your custom image
  - Attach the network volume at /workspace
  - Set ports: 7860 for Gradio
  - Launch with environment: MODE_TO_RUN=pod

For Serverless:

- Use the same Docker image
- In RunPod:
  - Create a Serverless Endpoint
  - Use the same image
  - Attach the network volume
  - Set environment: MODE_TO_RUN=serverless
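
Both deployment modes share one image and differ only in the MODE_TO_RUN environment variable. The sketch below shows one way a startup script could read that variable. The helper name `resolve_mode` and the fallback to `pod` when the variable is unset are assumptions for illustration, not the behaviour of this repository's start.sh or handler.py.

```python
import os

# The two values named in the deployment steps above.
VALID_MODES = ("pod", "serverless")


def resolve_mode(env=None):
    """Return the run mode read from MODE_TO_RUN (hypothetical helper).

    Falling back to "pod" when the variable is missing is an assumption.
    """
    env = os.environ if env is None else env
    mode = env.get("MODE_TO_RUN", "pod").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"MODE_TO_RUN must be one of {VALID_MODES}, got {mode!r}")
    return mode


if __name__ == "__main__":
    print(f"Starting in {resolve_mode()} mode")
```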

--

Removed the heavy LLaVA implementation.
Added safetensors support.
Updated dependencies.
Replaced SoftMax with SDPA for default attention.
Removed use_linear_control_scale (linear_s_stage2) and use_linear_cfg_scale (linear_CFG) arguments.
- Uses the start and end scale values to determine whether linear scaling will be used/have effect or not.
Renamed arguments to make settings a bit more intuitive (more alignment with kijai's SUPIR ComfyUI custom nodes)
- spt_linear_CFG -> cfg_scale_start
- s_cfg -> cfg_scale_end
- spt_linear_s_stage2 -> control_scale_start
- s_stage2 -> control_scale_end
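
Because the separate linear-scaling switches were removed, the start and end values alone decide whether a schedule has any effect. A minimal sketch of that idea follows, assuming interpolation by step index; the actual sampler may interpolate on a different quantity (for example the noise level), and `linear_scale` is a hypothetical name.

```python
def linear_scale(step, total_steps, start, end):
    """Interpolate linearly from `start` to `end` across the sampling steps.

    When start == end the result is constant, so no separate on/off flag
    is needed. Step-index interpolation is an assumption of this sketch.
    """
    if total_steps <= 1 or start == end:
        return float(end)
    t = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return start + (end - start) * t


# Values taken from the CLI example: cfg 2.0 -> 4.0, control scale fixed at 0.9
cfg_per_step = [linear_scale(i, 50, 2.0, 4.0) for i in range(50)]
control_per_step = [linear_scale(i, 50, 0.9, 0.9) for i in range(50)]
```

Swapping the start and end values gives a decreasing schedule.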
Added --skip_denoise_stage argument to bypass the artifact removal preprocessing step that uses the specialized VAE denoise encoder. This step usually leaves the image slightly softened (before the sampling stage) so that artifacts are not treated as detail to be enhanced. You might want to skip it if your image is already high quality.
Refactor: Renamed symbol upsacle in original code to upscale
Moved CLIP paths to a yaml config file.
Exposed sampler_tile_size and sampler_tile_stride to make them overridable when using TiledRestoreEDMSampler
SUPIR Settings saved into PNGInfo metadata
Parallel processing for Tiled VAE encoding/decoding
Improved memory management. Each run clears unused GPU memory (VRAM), cleans up leftover Python objects, and releases unused RAM back to the system (Linux only).
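
The three cleanup steps in the last item could look roughly like the sketch below. It is an assumption about how such a cleanup might be written, not the code used in this repository: it relies on Python's `gc.collect()`, PyTorch's `torch.cuda.empty_cache()`, and glibc's `malloc_trim` called through `ctypes`. The function name `free_memory` is hypothetical.

```python
import ctypes
import gc
import sys


def free_memory():
    """Best-effort cleanup after a run; returns a report of what was done."""
    report = {
        "gc_collected": gc.collect(),  # number of unreachable objects found
        "cuda_cache_cleared": False,
        "malloc_trimmed": False,
    }

    # Release cached, unused VRAM if PyTorch and a CUDA device are present.
    try:
        import torch

        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            report["cuda_cache_cleared"] = True
    except ImportError:
        pass

    # Ask glibc to hand freed heap pages back to the OS (Linux only).
    if sys.platform.startswith("linux"):
        try:
            ctypes.CDLL("libc.so.6").malloc_trim(0)
            report["malloc_trimmed"] = True
        except (OSError, AttributeError):
            pass  # not glibc, or malloc_trim unavailable

    return report
```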

Installation

Prerequisites:

Python 3.12
Git

Clone repo

git clone https://github.com/yushan777/SUPIR-Demo.git
cd SUPIR-Demo

# For Linux only
chmod +x *.sh

Install Environment

# Linux
./install_linux_local.sh

# Linux (Vast.ai)
./install_vastai.sh

# Windows
install_win_local.bat

Download Models

You can download the models in a separate terminal while the venv is being installed.

# Linux
./download_models.sh

# Windows
download_models.bat

Manually Downloading The Models

If you prefer to download the models manually or in your own time, the links are below.
Additionally, if you already have these models, you can simply symlink them to these locations to save storage space.
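
As a sketch of the symlink approach, the helper below links an existing model file into one of the model directories. The name `link_model` and the example source path are hypothetical. On Windows, creating symlinks may require Developer Mode or elevated privileges.

```python
from pathlib import Path


def link_model(existing_file, target_dir):
    """Symlink an already-downloaded model file into target_dir.

    Returns the path of the link. An existing entry with the same name
    is left untouched.
    """
    existing = Path(existing_file).resolve()
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    link = target / existing.name
    if not link.exists() and not link.is_symlink():
        link.symlink_to(existing)
    return link


# Example with a hypothetical source path:
# link_model("/data/checkpoints/my_sdxl_model.safetensors", "models/SDXL")
```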

SmolVLM-500M-Instruct

For captioning input image in the Gradio demo.

SmolVLM-500M-Instruct
Place all files into models/SmolVLM-500M-Instruct

SUPIR Models

Unless you have more than 24GB of VRAM, you should download the FP16 variants.

FP16 Versions

SUPIR-v0Q (FP16)
SUPIR-v0F (FP16)
Download and place the model files in the models/SUPIR/ directory.

FP32 Versions

SUPIR-v0Q (FP32)
SUPIR-v0F (FP32)
Download and place the model files in the models/SUPIR/ directory.

CLIP Models

CLIP Encoder-1
Place in models/CLIP1
CLIP Encoder-2
Place in models/CLIP2

SDXL Model

Juggernaut-XL_v9_RunDiffusionPhoto_v2
Place in models/SDXL
You can use your own preferred SDXL model. One that specialises in realistic, photographic output will work better.

There are two SUPIR model variants: v0Q and v0F.

SUPIR-v0Q: The v0Q model (Quality) is trained on a wide range of degradations, making it robust and effective across varied real-world scenarios. However, this broad generalization comes at a cost: when applied to images with only mild degradation, v0Q might overcompensate, hallucinate, or alter details that are already mostly intact. This behavior stems from its training bias toward assuming significant visual damage.
SUPIR-v0F: In contrast, the v0F model (Fidelity) is specifically trained on lighter degradation patterns. Its Stage1 encoder is tuned to better preserve fine details and structure, resulting in restorations that are more faithful to the input when the degradation is minimal. As a result, v0F is the preferred choice for high-fidelity restoration where subtle preservation is more critical than aggressive enhancement.

If necessary, edit Custom Path for Checkpoints. Otherwise leave these alone.

* [options/SUPIR_v0.yaml] --> SDXL_CKPT, SUPIR_CKPT_Q, SUPIR_CKPT_F.
* [options/SUPIR_v0_tiled.yaml] --> SDXL_CKPT, SUPIR_CKPT_Q, SUPIR_CKPT_F.

Gradio Demo

source venv/bin/activate
python3 run_supir_gradio.py

# or you can start it with the bash script (contains the above two commands)
chmod +x launch_gradio.sh
./launch_gradio.sh

Default Settings

Default settings can be set in the file defaults.json. If it doesn't exist, copy defaults_example.json and rename the copy to defaults.json.
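
The copy step can also be scripted. This is a minimal sketch, assuming it runs from the repository root; `ensure_defaults` is a hypothetical helper and not part of the project.

```python
import shutil
from pathlib import Path


def ensure_defaults(repo_dir="."):
    """Create defaults.json from defaults_example.json if it is missing."""
    repo = Path(repo_dir)
    target = repo / "defaults.json"
    if not target.exists():
        # Keeps any existing defaults.json untouched.
        shutil.copyfile(repo / "defaults_example.json", target)
    return target
```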

CLI Demo

# for cli test
python3 run_supir_cli.py --img_path 'input/bottle.png' --save_dir ./output --SUPIR_sign Q --upscale 2 --use_tile_vae --loading_half_params

python3 run_supir_cli.py \
--img_path 'input/woman-low-res-sq.jpg' \
--save_dir ./output \
--SUPIR_sign Q \
--upscale 2 \
--seed 1234567891 \
--img_caption 'A woman has dark brown eyes, dark curly hair wearing a dark scarf on her head. She is wearing a black shirt on with a pattern on it. The wall behind her is brown and green.' \
--edm_steps=50 \
--s_churn=5 \
--cfg_scale_start=2.0 \
--cfg_scale_end=4.0 \
--control_scale_start=0.9 \
--control_scale_end=0.9 \
--loading_half_params \
--use_tile_vae
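
To estimate the working resolution that `--upscale 2` produces, the sketch below follows the rules given under `Upscale by` in the Arguments table: multiply both dimensions, enlarge further if the shorter side is still under 1024, then snap to a multiple of 64. The function name and the round-half-up snapping are assumptions. The project's own rounding may differ slightly.

```python
def upscaled_size(width, height, upscale=2.0, min_size=1024, multiple=64):
    """Estimate the working resolution for a given input size (sketch)."""
    w, h = width * upscale, height * upscale

    # Enlarge further (keeping aspect ratio) if the shorter side is too small.
    short = min(w, h)
    if short < min_size:
        factor = min_size / short
        w, h = w * factor, h * factor

    def snap(value):
        # Nearest multiple, rounding halves up (assumed rounding rule).
        return max(multiple, int(value / multiple + 0.5) * multiple)

    return snap(w), snap(h)


print(upscaled_size(512, 512, upscale=2))   # square 512 px input
print(upscaled_size(1000, 700, upscale=3))  # larger landscape input
```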

Tested on Linux Mint, WSL, and Windows 11. It seems to run faster under Linux.

Processing Times / Memory Usage

Sampler: TiledRestoreEDMSampler
Tiled VAE: True
Number of Workers: 1
Linux, 64GB RAM

Upscale	4090 Time	4090 VRAM	4080 Time	4080 VRAM	4070 Time	4070 VRAM
2x	111 secs	14.0GB	227 secs	13.7GB	301 secs	11.7GB
3x	315 secs	14.1GB	475 secs	13.8GB	652 secs	11.7GB
4x	606 secs	14.6GB	910 secs	13.9GB	1625 secs	11.7GB
5x	992 secs	15.0GB	1492 secs	14.6GB	OOM	OOM

Arguments

Argument	Description
`img_path`	Path to the input image. (required)
`save_dir`	Directory to save the output.
`SUPIR_sign`	Model type. Options: `['F', 'Q']` Default: `'Q'` Q model (Quality) Trained on diverse, heavy degradations, making it robust for real-world damage. However, it may overcorrect or hallucinate when used on lightly degraded images due to its bias toward severe restoration. F model (Fidelity) Optimized for mild degradations, preserving fine details and structure. Ideal for high-fidelity tasks where subtle restoration is preferred over aggressive enhancement.
`skip_denoise_stage`	Skips the VAE denoiser stage. Default: `'False'` Bypasses the artifact removal preprocessing step that uses the specialized VAE denoise encoder. This step usually leaves the image slightly softened (if you inspect it at this stage), which keeps SUPIR from treating low-res/compression artifacts as detail to be enhanced. You may wish to skip this step if: - 1) You want to do your own pre-processing OR - 2) The input image is clean and free of low-res/compression artifacts or other degradations - 3) The denoise stage can sometimes make closeups of skin textures look a bit unnatural.
`sampler_mode`	Sampler choice. Options: `['TiledRestoreEDMSampler', 'RestoreEDMSampler']` Default: `'TiledRestoreEDMSampler' (uses less VRAM)`
`seed`	Random seed for reproducibility. Default: `1234`
`Use Upscale to..`	If on, use the `Upscale to width` and `Upscale to height` values for upscaling. If off, the `Upscale by` factor will be used.
`Upscale to width`	Upscale input image width to specified dimension if `Use Upscale to..` is on. Minimum: 1024
`Upscale to height`	Upscale input image height to specified dimension if `Use Upscale to..` is on. Minimum: 1024
`Upscale by`	Upscale factor for the input image. Default: `2` Upscaling of the input image is performed before the denoising and sampling stage. Both dimensions are multiplied by the upscale value. If the smaller of the dimensions is still < 1024px, the image is further enlarged to minimum of 1024px (aspect ratio maintained).
`***`	Notes about upscaling: The minimum of 1024 gives SDXL a comfortable working resolution. Dimensions are snapped to the nearest multiple of 64. The sweet spot seems to be between 2x and 4x for 1024x1024 inputs, or between 4x and 8x for 512x512 inputs. Beyond that, the quality begins to collapse. The higher the scale factor, the slower the process.
`min_size`	Minimum output resolution. Default: `1024`
`num_samples`	Number of images to generate per input. Default: `1`
`img_caption`	Specific caption for the input image. Default: `''` This caption is combined with `a_prompt`.
`a_prompt`	Additional positive prompt (appended to input caption). Default: `Cinematic, High Contrast, highly detailed, taken using a Canon EOS R camera, hyper detailed photo - realistic maximum detail, 32k, Color Grading, ultra HD, extreme meticulous detailing, skin pore detailing, hyper sharpness, perfect without deformations.`
`n_prompt`	Negative prompt. Default: `painting, oil painting, illustration, drawing, art, sketch, cartoon, CG Style, 3D render, unreal engine, blurring, dirty, messy, worst quality, low quality, frames, watermark, signature, jpeg artifacts, deformed, lowres, over-smooth`
`edm_steps`	Number of diffusion steps. Default: `50`
`s_churn`	Controls how much extra randomness is added during sampling. This helps the model explore more options and avoid getting stuck on a limited result. Default: `5` `0`: No noise (deterministic) `1–5`: Mild/moderate `6–10+`: Strong
`s_noise`	Scales s_churn noise strength. Default: `1.003` Slightly < 1: More stable Slightly > 1: More variation
`cfg_scale_start`	Prompt guidance strength start. Default: `2.0`
`cfg_scale_end`	Prompt guidance strength end. Default: `4` `1.0`: Weak (ignores prompt) `7.5`: Strong (follows prompt closely) If `cfg_scale_start` and `cfg_scale_end` have the same value, no scaling occurs. When these values differ, linear scheduling is applied from start to end. They can also be reversed for creative strategies.
`control_scale_start`	Structural guidance from input image, start strength. Default: `0.9`
`control_scale_end`	Structural guidance from input image, end strength. Default: `0.9` `0.0`: Disabled `0.1–0.5`: Light `0.6–1.0`: Balanced (default) `1.1–1.5+`: Very strong Same value = fixed. Different values = scheduled.
`restoration_scale`	Early-stage restoration strength. Controls how strongly the model pulls the structure of the output image back toward the original image. Only applies during the early stages of sampling when the noise level is high. Default: `0` (disabled).
`color_fix_type`	Color adjustment method. Default: `'Wavelet'` Options: `['None', 'AdaIn', 'Wavelet']`
`loading_half_params`	Loads the SUPIR model weights in half precision (FP16). Default: `False` Reduces VRAM usage and increases speed at the cost of slight precision loss.
`diff_dtype`	Precision to use for the diffusion model only. Allows overriding default precision independently, unless `loading_half_params` is set. Default: `'fp16'` Options: `['fp32', 'fp16', 'bf16']`
`ae_dtype`	Autoencoder precision. Default: `'bf16'` Options: `['fp32', 'bf16']`
`use_tile_vae`	Enables tile-based encoding/decoding for memory efficiency with large images. Default: `False`
`encoder_tile_size`	Tile size when encoding (when `use_tile_vae` is enabled). TileVAE code has recommended tile sizes based on available VRAM if a CUDA device is available. Encoder: - VRAM > 16GB: 3072 - VRAM > 12GB: 2048 - VRAM > 8GB: 1536 - VRAM <= 8GB: 960 - No GPU: 512
`decoder_tile_size`	Tile size when decoding (when `use_tile_vae` is enabled). TileVAE code has recommended tile sizes based on available VRAM if a CUDA device is available. Decoder: - VRAM > 30GB: 256 - VRAM > 16GB: 192 - VRAM > 12GB: 128 - VRAM > 8GB: 96 - VRAM <= 8GB: 64 - No GPU: 64
`Number of Workers`	Number of parallel CPU processes for VAE encoding/decoding. Improves speed on multi-core CPUs by efficiently preparing data for the GPU. Default: `4`
`sampler_tile_size`	Tile size for `TiledRestoreEDMSampler`. This is the size of each tile that the image is divided into during tiled sampling. Example: `tile_size` of 128 → image is split into 128×128 pixel tiles.
`sampler_tile_stride`	Tile stride for `TiledRestoreEDMSampler`. Controls overlap between tiles during sampling. Smaller `tile_stride` = more overlap, better blending, more compute. Larger `tile_stride` = less overlap, faster, may cause seams. `Overlap = tile_size - tile_stride` Examples: - tile_size = 128, stride = 64 → 64 px overlap.
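
The relationship between `sampler_tile_size` and `sampler_tile_stride` can be checked with a small sketch. It computes the overlap and the tile start positions along one axis. Clamping the last tile to the image edge is an assumption of this sketch, and the real `TiledRestoreEDMSampler` may place tiles differently. Both function names are hypothetical.

```python
def tile_overlap(tile_size, tile_stride):
    """Overlap between neighbouring tiles: tile_size - tile_stride."""
    return tile_size - tile_stride


def tile_starts(length, tile_size, tile_stride):
    """Start offsets of tiles covering `length` units along one axis."""
    if tile_size <= 0 or tile_stride <= 0:
        raise ValueError("tile_size and tile_stride must be positive")
    if length <= tile_size:
        return [0]  # a single tile covers the whole axis
    starts = list(range(0, length - tile_size, tile_stride))
    starts.append(length - tile_size)  # last tile ends exactly at the edge
    return starts


print(tile_overlap(128, 64))      # overlap for the example in the table
print(tile_starts(256, 128, 64))  # tile offsets along a 256-unit axis
```

A smaller stride yields more start positions for the same length, which is where the extra compute comes from.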

Images from Pixabay
Original SUPIR Repository
Kijai's SUPIR Custom Nodes for ComfyUI


NotAmaan/suptest
