NotAmaan/suptest

A Customized Version of the Original SUPIR Project

Deployment Steps:

For Pod Mode:

  1. Push Docker image to registry (Docker Hub, etc.)
  2. In RunPod:
  • Create new Pod
  • Select your custom image
  • Attach network volume at /workspace
  • Set ports: 7860 for Gradio
  • Launch with environment: MODE_TO_RUN=pod

For Serverless:

  1. Same Docker image
  2. In RunPod:
  • Create Serverless Endpoint
  • Use same image
  • Attach network volume
  • Set environment: MODE_TO_RUN=serverless
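
Both deployments use the same image and differ only in the MODE_TO_RUN environment variable. As a rough illustration of how a container entrypoint could branch on it, here is a minimal sketch. The function names, the fallback to "pod" when the variable is unset, and the description strings are assumptions for illustration and are not taken from this repository.

```python
import os

# The two values documented in the deployment steps above.
VALID_MODES = ("pod", "serverless")


def resolve_mode(env=None):
    """Read MODE_TO_RUN from the environment (hypothetical helper).

    Falling back to "pod" when the variable is unset is an assumption.
    """
    env = os.environ if env is None else env
    mode = env.get("MODE_TO_RUN", "pod").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"MODE_TO_RUN must be one of {VALID_MODES}, got {mode!r}")
    return mode


def describe_startup(mode):
    """Return what the entrypoint would start for each mode."""
    if mode == "pod":
        return "start the Gradio UI on port 7860"
    return "start the serverless request handler"


if __name__ == "__main__":
    print(describe_startup(resolve_mode()))
```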

--

  • Removed the heavy LLaVA implementation.
  • Added safetensors support.
  • Updated dependencies.
  • Replaced SoftMax with SDPA for default attention.
  • Removed use_linear_control_scale (linear_s_stage2) and use_linear_cfg_scale (linear_CFG) arguments.
    • Uses the start and end scale values to determine whether linear scaling will be used/have effect or not.
  • Renamed arguments to make settings a bit more intuitive (more alignment with kijai's SUPIR ComfyUI custom nodes)
    • spt_linear_CFG -> cfg_scale_start
    • s_cfg -> cfg_scale_end
    • spt_linear_s_stage2 -> control_scale_start
    • s_stage2 -> control_scale_end
  • Added --skip_denoise_stage argument to bypass the artifact removal preprocessing step that uses the specialized VAE denoise encoder. That step usually leaves the image slightly softened before the sampling stage, so that artifacts are not treated as detail to be enhanced. You might want to skip it if your image is already high quality.
  • Refactor: Renamed symbol upsacle in original code to upscale
  • Moved CLIP paths to a yaml config file.
  • Exposed sampler_tile_size and sampler_tile_stride to make them overridable when using TiledRestoreEDMSampler
  • SUPIR Settings saved into PNGInfo metadata
  • Parallel processing for Tiled VAE encoding/decoding
  • Improved memory management. On each run, it clears unused GPU memory (VRAM), cleans up leftover Python objects, and releases unused RAM back to the system (Linux only).
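
The memory management item in the list above names three cleanup steps. One common way to do each of them is sketched below. This is an illustration under stated assumptions, not the code used in this repository: the function name free_memory is hypothetical, PyTorch is treated as optional, and the RAM release assumes glibc's malloc_trim, which is why that step is limited to Linux.

```python
import ctypes
import gc
import sys


def free_memory():
    """Best-effort cleanup; returns which steps actually ran (hypothetical helper)."""
    steps = {"gc": False, "cuda_cache": False, "malloc_trim": False}

    # 1. Collect unreachable Python objects so the memory they hold can be freed.
    gc.collect()
    steps["gc"] = True

    # 2. Release cached but unused VRAM (only when PyTorch and CUDA are present).
    try:
        import torch

        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            steps["cuda_cache"] = True
    except ImportError:
        pass

    # 3. Linux with glibc only: ask the allocator to hand free heap pages back to the OS.
    if sys.platform.startswith("linux"):
        try:
            ctypes.CDLL("libc.so.6").malloc_trim(0)
            steps["malloc_trim"] = True
        except (OSError, AttributeError):
            pass  # non-glibc systems do not provide libc.so.6 / malloc_trim

    return steps


if __name__ == "__main__":
    print(free_memory())
```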

Installation

Prerequisites:

  • Python 3.12
  • Git

Clone repo

git clone https://github.com/yushan777/SUPIR-Demo.git
cd SUPIR-Demo

# For Linux only
chmod +x *.sh

Install Environment

# Linux
./install_linux_local.sh

# Linux (Vast.ai)
./install_vastai.sh

# Windows
install_win_local.bat

Download Models

You can download the models in a separate terminal while the venv is being installed.

# Linux
./download_models.sh

# Windows
download_models.bat

Manually Downloading The Models

If you prefer to download the models manually or in your own time, the links are below.
Additionally, if you already have these models, you can simply symlink them to the expected locations to save storage space.
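
The symlink approach mentioned above could look roughly like this. It is a minimal sketch: the helper name link_model and both paths in the usage lines are placeholders, not the locations this project expects, so substitute the paths from your own setup. On Windows, creating symlinks may require administrator rights or Developer Mode.

```python
from pathlib import Path


def link_model(existing_file, expected_path):
    """Create a symlink at expected_path pointing to an already downloaded model."""
    existing = Path(existing_file).resolve()
    link = Path(expected_path)
    if not existing.is_file():
        raise FileNotFoundError(f"model file not found: {existing}")
    # Create the target folder if it is missing.
    link.parent.mkdir(parents=True, exist_ok=True)
    # Leave anything that is already in place untouched.
    if link.is_symlink() or link.exists():
        return link
    link.symlink_to(existing)
    return link


# Usage (placeholder paths, adjust to your setup):
# link_model("/data/models/some_model.safetensors", "models/some_model.safetensors")
```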

SmolVLM-500M-Instruct

For captioning input image in the Gradio demo.

SUPIR Models

Unless you have more than 24GB of VRAM, you should download the FP16 variants.

FP16 Versions

FP32 Versions

CLIP Models

SDXL Model

There are two SUPIR model variants: v0Q and v0F.

  • SUPIR-v0Q The v0Q model (Quality) is trained on a wide range of degradations, making it robust and effective across varied real-world scenarios. However, this broad generalization comes at a cost—when applied to images with only mild degradation, v0Q might overcompensate, hallucinate or alter details that are already mostly intact. This behavior stems from its training bias toward assuming significant visual damage.

  • SUPIR-v0F In contrast, the v0F model (Fidelity) is specifically trained on lighter degradation patterns. Its Stage1 encoder is tuned to better preserve fine details and structure, resulting in restorations that are more faithful to the input when the degradation is minimal. As a result, v0F is the preferred choice for high-fidelity restoration where subtle preservation is more critical than aggressive enhancement.

  1. If necessary, edit the custom checkpoint paths. Otherwise, leave these alone.
    * [options/SUPIR_v0.yaml] --> SDXL_CKPT, SUPIR_CKPT_Q, SUPIR_CKPT_F.
    * [options/SUPIR_v0_tiled.yaml] --> SDXL_CKPT, SUPIR_CKPT_Q, SUPIR_CKPT_F.
    

Gradio Demo

source venv/bin/activate
python3 run_supir_gradio.py

# or you can start it with the bash script (contains the above two commands)
chmod +x launch_gradio.sh
./launch_gradio.sh

Default Settings

Default settings can be set in the file defaults.json. If it doesn't exist, copy defaults_example.json and rename the copy to defaults.json.
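
The copy-and-rename step can also be scripted. The sketch below assumes both files live in the same directory (the repository root by default). The helper name ensure_defaults is hypothetical.

```python
import shutil
from pathlib import Path


def ensure_defaults(directory="."):
    """Create defaults.json from defaults_example.json if it does not exist yet."""
    directory = Path(directory)
    target = directory / "defaults.json"
    example = directory / "defaults_example.json"
    if not target.exists():
        # An existing defaults.json is never overwritten.
        shutil.copyfile(example, target)
    return target


if __name__ == "__main__":
    print(ensure_defaults())
```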

CLI Demo

# for cli test
python3 run_supir.py --img_path 'input/bottle.png' --save_dir ./output --SUPIR_sign Q --upscale 2 --use_tile_vae --loading_half_params

python3 run_supir.py \
--img_path 'input/woman-low-res-sq.jpg' \
--save_dir ./output \
--SUPIR_sign Q \
--upscale 2 \
--seed 1234567891 \
--img_caption 'A woman has dark brown eyes, dark curly hair wearing a dark scarf on her head. She is wearing a black shirt on with a pattern on it. The wall behind her is brown and green.' \
--edm_steps=50 \
--s_churn=5 \
--cfg_scale_start=2.0 \
--cfg_scale_end=4.0 \
--control_scale_start=0.9 \
--control_scale_end=0.9 \
--loading_half_params \
--use_tile_vae
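

To run the first command above over a whole folder, a small wrapper script is one option. The sketch below only uses flags that appear in the commands above. The helper names, the list of image extensions, and the dry-run default are assumptions for illustration.

```python
import subprocess
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed set of input formats


def build_command(img_path, save_dir="./output", sign="Q", upscale=2):
    """Build the argument list for one run_supir.py invocation."""
    return [
        sys.executable, "run_supir.py",
        "--img_path", str(img_path),
        "--save_dir", str(save_dir),
        "--SUPIR_sign", sign,
        "--upscale", str(upscale),
        "--use_tile_vae",
        "--loading_half_params",
    ]


def run_folder(input_dir, save_dir="./output", dry_run=True):
    """Build (and optionally run) one command per image found in input_dir."""
    images = sorted(p for p in Path(input_dir).iterdir()
                    if p.suffix.lower() in IMAGE_SUFFIXES)
    commands = [build_command(p, save_dir) for p in images]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failure
    return commands
```

Calling run_folder("input") returns the commands without running them. Pass dry_run=False from inside the activated venv to process the images.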

Tested on Linux Mint, WSL, and Windows 11. It seems to run faster under Linux.


Processing Times / Memory Usage

Sampler: TiledRestoreEDMSampler
Tiled VAE: True
Number of Workers: 1
Linux, 64GB RAM

Upscale   4090 Time   4090 VRAM   4080 Time   4080 VRAM   4070 Time   4070 VRAM
2x        111 secs    14.0GB      227 secs    13.7GB      301 secs    11.7GB
3x        315 secs    14.1GB      475 secs    13.8GB      652 secs    11.7GB
4x        606 secs    14.6GB      910 secs    13.9GB      1625 secs   11.7GB
5x        992 secs    15.0GB      1492 secs   14.6GB      OOM         OOM

Arguments

Argument Description
img_path Path to the input image. (required)
save_dir Directory to save the output.
SUPIR_sign Model type. Options: ['F', 'Q']
Default: 'Q'
Q model (Quality) Trained on diverse, heavy degradations, making it robust for real-world damage. However, it may overcorrect or hallucinate when used on lightly degraded images due to its bias toward severe restoration.
F model (Fidelity) Optimized for mild degradations, preserving fine details and structure. Ideal for high-fidelity tasks where subtle restoration is preferred over aggressive enhancement.
skip_denoise_stage Skips the VAE Denoiser Stage. Default: 'False'
Bypasses the artifact removal preprocessing step that uses the specialized VAE denoise encoder. That step usually leaves the image slightly softened (if you inspect it at this stage), which keeps SUPIR from treating low-res/compression artifacts as detail to be enhanced.
You may wish to skip this step if:
- 1) You want to do your own pre-processing OR
- 2) Input image is clean and free of low-res/compression artifacts or other degradations
     - Can sometimes make closeups of skin textures a bit unnatural.
sampler_mode Sampler choice. Options: ['TiledRestoreEDMSampler', 'RestoreEDMSampler']
Default: 'TiledRestoreEDMSampler' (uses less VRAM)
seed Random seed for reproducibility. Default: 1234
Use Upscale to.. If on, use Upscale to width and Upscale to height values for upscaling. If off, then Upscale by factor will be used.
Upscale to width Upscale input image width to specified dimension if Use Upscale to.. is on.
Minimum: 1024
Upscale to height Upscale input image height to specified dimension if Use Upscale to.. is on.
Minimum: 1024
Upscale by Upscale factor for the input image.
Default: 2
Upscaling of the input image is performed before the denoising and sampling stage.
Both dimensions are multiplied by the upscale value. If the smaller dimension is still < 1024px, the image is further enlarged so that it reaches the 1024px minimum (aspect ratio maintained).
*** Notes about Upscaling:
The reason for the minimum of 1024 is to give SDXL a comfortable working resolution. Note that dimensions are snapped to the nearest multiple of 64. The sweet spot seems to be between 2x and 4x (1024x1024) or 4x and 8x (512x512). Beyond that, the quality begins to collapse.
The higher the scale factor, the slower the process.
min_size Minimum output resolution. Default: 1024
num_samples Number of images to generate per input. Default: 1
img_caption Specific caption for the input image.
Default: ''
This caption is combined with a_prompt.
a_prompt Additional positive prompt (appended to input caption).
Default:
Cinematic, High Contrast, highly detailed, taken using a Canon EOS R camera, hyper detailed photo - realistic maximum detail, 32k, Color Grading, ultra HD, extreme meticulous detailing, skin pore detailing, hyper sharpness, perfect without deformations.
n_prompt Negative prompt.
Default:
painting, oil painting, illustration, drawing, art, sketch, cartoon, CG Style, 3D render, unreal engine, blurring, dirty, messy, worst quality, low quality, frames, watermark, signature, jpeg artifacts, deformed, lowres, over-smooth
edm_steps Number of diffusion steps. Default: 50
s_churn Controls how much extra randomness is added during sampling. This helps the model explore more options and avoid getting stuck on a limited result. Default: 5
0: No noise (deterministic)
1–5: Mild/moderate
6–10+: Strong
s_noise Scales s_churn noise strength. Default: 1.003
Slightly < 1: More stable
Slightly > 1: More variation
cfg_scale_start Prompt guidance strength start.
Default: 2.0
cfg_scale_end Prompt guidance strength end.
Default: 4
1.0: Weak (ignores prompt)
7.5: Strong (follows prompt closely)
If cfg_scale_start and cfg_scale_end have the same value, no scaling occurs. When these values differ, linear scheduling is applied from start to end. They can also be reversed for creative strategies.
control_scale_start Structural guidance from input image, start strength. Default: 0.9
control_scale_end Structural guidance from input image, end strength. Default: 0.9
0.0: Disabled
0.1–0.5: Light
0.6–1.0: Balanced (default)
1.1–1.5+: Very strong
Same value = fixed. Different values = scheduled.
restoration_scale Early-stage restoration strength. Controls how strongly the model pulls the structure of the output image back toward the original image. Only applies during the early stages of sampling when the noise level is high.
Default: 0 (disabled).
color_fix_type Color adjustment method. Default: 'Wavelet'
Options: ['None', 'AdaIn', 'Wavelet']
loading_half_params Loads the SUPIR model weights in half precision (FP16).
Default: False
Reduces VRAM usage and increases speed at the cost of slight precision loss.
diff_dtype Precision to use for the diffusion model only.
Allows overriding default precision independently, unless loading_half_params is set.
Default: 'fp16'
Options: ['fp32', 'fp16', 'bf16']
ae_dtype Autoencoder precision.
Default: 'bf16'
Options: ['fp32', 'bf16']
use_tile_vae Enables tile-based encoding/decoding for memory efficiency with large images.
Default: False
encoder_tile_size Tile size when encoding (when use_tile_vae is enabled).
TileVAE code has recommended tile sizes based on available VRAM if a CUDA device is available.
Encoder:
- VRAM > 16GB: 3072
- VRAM > 12GB: 2048
- VRAM > 8GB: 1536
- VRAM <= 8GB: 960
- No GPU: 512
decoder_tile_size Tile size when decoding (when use_tile_vae is enabled).
TileVAE code has recommended tile sizes based on available VRAM if a CUDA device is available.
Decoder:
- VRAM > 30GB: 256
- VRAM > 16GB: 192
- VRAM > 12GB: 128
- VRAM > 8GB: 96
- VRAM <= 8GB: 64
- No GPU: 64
Number of Workers Number of parallel CPU processes for VAE encoding/decoding.
Improves speed on multi-core CPUs by efficiently preparing data for the GPU.
Default: 4
sampler_tile_size Tile size for TiledRestoreEDMSampler.
This is the size of each tile that the image is divided into during tiled sampling.
Example: tile_size of 128 → image is split into 128×128 pixel tiles.
sampler_tile_stride Tile stride for TiledRestoreEDMSampler.
Controls overlap between tiles during sampling.
Smaller tile_stride = more overlap, better blending, more compute.
Larger tile_stride = less overlap, faster, may cause seams.
Overlap = tile_size - tile_stride
Examples:
- tile_size = 128, stride = 64 → 64 px overlap.
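
Three of the numeric rules in the table above (linear scheduling between start and end values, the upscale sizing rule, and tile overlap) can be sketched in a few lines. This is an illustration of the rules as described, not the project's code. All function names are hypothetical, and several details are assumptions: interpolating per step index, applying the 1024px minimum before snapping to 64, rounding ties upward, and adding a final tile flush with the image edge.

```python
def scale_at_step(step, total_steps, start, end):
    """Linear schedule: start at the first step, end at the last step."""
    if total_steps <= 1 or start == end:
        return float(end)  # same value = fixed, no scheduling
    t = step / (total_steps - 1)
    return start + (end - start) * t


def snap(value, multiple=64):
    """Round to the nearest multiple (ties round up), never below one multiple."""
    return max(multiple, int(value / multiple + 0.5) * multiple)


def target_size(width, height, upscale=2, min_size=1024, multiple=64):
    """Multiply both sides, enforce the minimum on the shorter side, then snap."""
    w, h = width * upscale, height * upscale
    shorter = min(w, h)
    if shorter < min_size:
        factor = min_size / shorter  # same factor on both sides keeps the aspect ratio
        w, h = w * factor, h * factor
    return snap(w, multiple), snap(h, multiple)


def tile_starts(length, tile_size, tile_stride):
    """Start offsets of tiles along one axis; overlap = tile_size - tile_stride."""
    last = max(length - tile_size, 0)
    starts = list(range(0, last + 1, tile_stride))
    if starts[-1] != last:
        starts.append(last)  # extra tile so the far edge is covered
    return starts


if __name__ == "__main__":
    print(scale_at_step(0, 50, 2.0, 4.0), scale_at_step(49, 50, 2.0, 4.0))
    print(target_size(300, 400, upscale=2))
    print(tile_starts(256, 128, 64))
```

With a 300x400 input at 2x, the shorter side (600) is below 1024, so both sides are enlarged by the same factor before snapping.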

Images from Pixabay
Original SUPIR Repository
Kijai's SUPIR Custom Nodes for ComfyUI
