Conversation
|
I forgot to tag @ManuelFay |
|
Surely pretty nice - but we can't merge things that haven't been tested through training! @QuentinJGMace has been training Qwen3VL models recently, and there's a branch open already. I'll leave this one open so he can cherry-pick what he wants from both branches! |
|
Hey @selimcavas ! Thanks for the contrib. I'm not sure about implementing support for MoE models, as I don't think we'll train one (and none exists at the moment). But if one is trained one day, I'll be happy to merge the code to support it. As @ManuelFay said, I've been experimenting a bit with Qwen3. Since I'm soon on (long) holidays I'm not sure when a new model will come out, but one should eventually :) |
|
Maybe we can pass this off to @mlconti1 ? |
|
Okay, I might try training a model by adjusting the params. I currently have an RTX 5090. Approximately how many GPU hours (H100) does it take to train a full model such as ColQwen2.5? I planned to train the Qwen3 VL 2B model. |
I'm casually training ColQwen3-VL-2B on an RTX 5090. I'm expecting it to take roughly 16 hours, with checkpoints every 250 steps and tracking via wandb. The PR is in my fork, if you want to have a look: https://github.com/athrael-soju/colpali/pull/6/files I think it has the potential to be a great ColPali model, and the recipe is already there from previous models, so why not? |
|
Hi, sorry for the delay, just came back from holidays too! |
It plateaued before 1 epoch, unfortunately. I've been having issues with the dataset and also had to update some files from colpali_engine to get it to run. I recall not having any of these issues when I was experimenting with colintern. Feel free to check my PR if you get a chance, but I'll try again soon. |
|
@ManuelFay Last month I trained a ColQwen3 based on the Qwen3-VL-2B-Instruct model on 2 Nvidia A100 GPUs. May I open a PR from my fork to your repo? I'd welcome your criticism and suggestions! |
|
Hey hey! Any differences from the code here? |
It inits the colqwen3-base model and inherits from the Qwen3VLForConditionalGeneration class. |
|
Hello everyone, when will this PR be merged?

Test package environment:

```python
import torch
from transformers.utils.import_utils import is_flash_attn_2_available

from colpali_engine.models.qwen3 import ColQwen3, ColQwen3Processor

model = ColQwen3.from_pretrained(
    "TomoroAI/tomoro-colqwen3-embed-4b",
    dtype=torch.bfloat16,
    device_map="cuda:0",  # or "mps" if on Apple Silicon
    attn_implementation="flash_attention_2" if is_flash_attn_2_available() else None,
).eval()
```

When I tried the script above, I got the error messages below. |
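For anyone less familiar with how these Col* checkpoints are used once loaded: at retrieval time, ColPali-style models score a query against a page with late interaction (MaxSim) over the two sets of token embeddings. A minimal sketch in plain torch (the tensor shapes and random values here are purely illustrative, not the model's real dimensions):

```python
import torch


def maxsim_score(query_embs: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    # query_embs: (num_query_tokens, dim), doc_embs: (num_doc_tokens, dim).
    # Each query token takes its maximum similarity over all document tokens,
    # and the per-token maxima are summed into one relevance score.
    sim = query_embs @ doc_embs.T  # (num_query_tokens, num_doc_tokens)
    return sim.max(dim=1).values.sum()


# Toy multi-vectors: a "relevant" doc sharing the query's tokens should
# score higher than an unrelated random one.
torch.manual_seed(0)
q = torch.randn(8, 128)
d_relevant = q.clone()
d_random = torch.randn(64, 128)
print(maxsim_score(q, d_relevant) > maxsim_score(q, d_random))  # tensor(True)
```

In practice you'd get `query_embs` and `doc_embs` from the model's forward pass on processed queries and page images, and the processor exposes a batched scoring helper, but the core computation is just this max-then-sum.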
I am currently unable to test the training script due to GPU constraints; this is mostly a draft implementation done with Codex. MoE processing is currently the same as dense, but I kept the implementation separate to leave room for the two to diverge later. |