Conversation
|
I forgot to tag @ManuelFay |
|
Surely pretty nice - but we can't merge things that haven't been tested through training! @QuentinJGMace has been training Qwen3VL models recently, and there's a branch open already. I'll leave this one open so he can cherry-pick what he wants from both branches! |
|
Hey @selimcavas ! Thanks for the contrib. I'm not sure about implementing support for MoE models, as I don't think we'll train one (and none exists at the moment). But if one is trained one day, I'll be happy to merge the code to support it. As @ManuelFay said, I've been experimenting a bit with Qwen3. Since I'm soon on (long) holidays I'm not sure when a new model will come out, but one should eventually :) |
|
Maybe we can pass this off to @mlconti1 ? |
|
Okay, I might try training a model by adjusting the params. I currently have an RTX 5090. Approximately how many GPU hours (H100) does it take to train a full model such as ColQwen2.5? I planned to train the Qwen3 VL 2B model. |
I'm casually training ColQwen3-VL-2B on an RTX 5090. I'm expecting it to take roughly 16 hours, with checkpoints every 250 steps and tracking via wandb. The PR is in my fork, if you want to have a look: https://github.com/athrael-soju/colpali/pull/6/files I think it has the potential to be a great ColPali model, and the recipe is already there from previous models, so why not? |
|
Hi, sorry for the delay, just came back from holidays too! |
It plateaued before 1 epoch, unfortunately. I've been having issues with the dataset and also had to update some files from colpali_engine to get it to run. I recall not having any of these issues when I was experimenting with colintern. Feel free to check my PR if you get a chance, but I'll try again soon. |
|
@ManuelFay Last month I trained a ColQwen3 based on the Qwen3-VL-2B-Instruct model on 2 Nvidia A100 GPUs. May I open a PR from my fork to your repo? I'd welcome your criticism and suggestions! |
|
Hey hey! Any differences from the code here? |
It inits the colqwen3-base model and inherits from the Qwen3VLForConditionalGeneration class. |
|
Hello everyone, when will this PR be merged?

Test package environment:

```python
import torch
from transformers.utils.import_utils import is_flash_attn_2_available

from colpali_engine.models.qwen3 import ColQwen3, ColQwen3Processor

model = ColQwen3.from_pretrained(
    "TomoroAI/tomoro-colqwen3-embed-4b",
    dtype=torch.bfloat16,
    device_map="cuda:0",  # or "mps" if on Apple Silicon
    attn_implementation="flash_attention_2" if is_flash_attn_2_available() else None,
).eval()
```

When I tried the script above, I got the error messages below. |
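For anyone less familiar with how these Col* checkpoints are used once loaded: at retrieval time, ColPali-style models score a query against a page with late interaction (MaxSim) over the two sets of token embeddings. A minimal sketch in plain torch (the tensor shapes and random values here are purely illustrative, not the model's real dimensions):

```python
import torch


def maxsim_score(query_embs: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    # query_embs: (num_query_tokens, dim), doc_embs: (num_doc_tokens, dim).
    # Each query token takes its maximum similarity over all document tokens,
    # and the per-token maxima are summed into one relevance score.
    sim = query_embs @ doc_embs.T  # (num_query_tokens, num_doc_tokens)
    return sim.max(dim=1).values.sum()


# Toy multi-vectors: a "relevant" doc sharing the query's tokens should
# score higher than an unrelated random one.
torch.manual_seed(0)
q = torch.randn(8, 128)
d_relevant = q.clone()
d_random = torch.randn(64, 128)
print(maxsim_score(q, d_relevant) > maxsim_score(q, d_random))  # tensor(True)
```

In practice you'd get `query_embs` and `doc_embs` from the model's forward pass on processed queries and page images, and the processor exposes a batched scoring helper, but the core computation is just this max-then-sum.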
I am currently unable to test the training script due to GPU constraints; this is mostly a draft implementation done with Codex. MoE processing is currently the same as dense, but I kept the implementation separate to leave room for the two to diverge later. |