
Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards

Honghao Chen1,2,3#, Xingzhou Lou1,2#, Xiaokun Feng1,2#, Kaiqi Huang1,2, Xinlong Wang3

1CASIA, 2UCAS, 3BAAI
# Equal Contribution
[Paper]

In this work, we introduce Chain of Step (CoS) reasoning for vision-language models, which enables accurate assessment of reasoning step quality and leads to effective reinforcement learning and inference-time scaling with fine-grained rewards. Experimental results across multiple benchmarks demonstrate the effectiveness of CoS. More importantly, we conduct extensive empirical analysis and ablations to unveil CoS's appealing properties. We hope this paper offers insights into more complex multi-modal reasoning.

ShareGPT-Step-300K

Note: You can directly use our SFT dataset (special tokens have been added) through the following link, or you can access the raw step data to customize your SFT dataset. For customization, you can modify get_sft_json.py to generate your SFT data accordingly (a minimal sketch follows the table below).

| Description | Links |
| :-- | :-- |
| ShareGPT-Step-300K.jsonl (the SFT jsonl) | 🤗 HF link |
| images.zip (image files) | 🤗 HF link |
| raw_jsonl.zip (raw step jsonl for customization) | 🤗 HF link |
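
The sketch below illustrates what a get_sft_json.py-style customization might look like when building SFT data from the raw step jsonl. The field names ("image", "question", "steps", "answer") and the step special tokens are assumptions for illustration only; check raw_jsonl.zip and get_sft_json.py for the actual schema and tokens used by CoS.

```python
# Hypothetical sketch: convert raw step records into ShareGPT-style SFT data.
# Field names and special tokens are assumptions -- consult get_sft_json.py.
import json

STEP_START, STEP_END = "<step>", "</step>"  # assumed step special tokens

def build_sft_record(raw):
    """Wrap each reasoning step in special tokens and emit one SFT conversation."""
    reasoning = "".join(f"{STEP_START}{s}{STEP_END}" for s in raw["steps"])
    return {
        "image": raw["image"],
        "conversations": [
            {"from": "human", "value": "<image>\n" + raw["question"]},
            {"from": "gpt", "value": reasoning + raw["answer"]},
        ],
    }

with open("raw_steps.jsonl") as fin, open("my_sft.jsonl", "w") as fout:
    for line in fin:
        record = build_sft_record(json.loads(line))
        fout.write(json.dumps(record, ensure_ascii=False) + "\n")
```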

PRM & Data

Note: You can directly use our training jsonl file to train the PRM (special tokens have been added in a fixed format) through the following link, or you can access the raw data to customize your dataset. For customization, you can modify get_prm_json.py to generate your data accordingly (a minimal sketch follows the table below).

| Description | Links |
| :-- | :-- |
| CoS-PRM (the PRM model) | 🤗 HF link |
| prm_data_raw.json (raw PRM data) | 🤗 HF link |
| prm_data_train.jsonl (PRM training jsonl) | 🤗 HF link |
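
As with the SFT data, building a custom PRM training set from prm_data_raw.json is essentially a reformatting step that pairs each reasoning step with a fine-grained label. The sketch below is a hypothetical illustration; the field names ("steps", "labels") and the "+"/"-" step labels are assumptions, so consult get_prm_json.py and prm_data_raw.json for the exact format expected by the PRM trainer.

```python
# Hypothetical sketch: turn raw PRM records into a training jsonl where every
# reasoning step carries its own correctness label. Field names are assumed.
import json

def build_prm_record(raw):
    """Pair each reasoning step with a per-step correctness label."""
    assert len(raw["steps"]) == len(raw["labels"])
    return {
        "image": raw["image"],
        "question": raw["question"],
        "steps": raw["steps"],
        "step_labels": ["+" if ok else "-" for ok in raw["labels"]],
    }

with open("prm_data_raw.json") as fin:
    raw_items = json.load(fin)

with open("my_prm_train.jsonl", "w") as fout:
    for item in raw_items:
        fout.write(json.dumps(build_prm_record(item), ensure_ascii=False) + "\n")
```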

Checkpoints

| Description | Links |
| :-- | :-- |
| CoS-SFT (the SFT model) | 🤗 HF link |
| CoS (the RL model) | 🤗 HF link |
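
A minimal loading sketch, assuming the released checkpoints follow the standard Hugging Face format (as InternVL-based models typically do). "path/to/CoS" is a placeholder, not the actual repo id; replace it with the checkpoint downloaded from the links above.

```python
# Minimal sketch of loading a released checkpoint; the path is a placeholder.
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("path/to/CoS", trust_remote_code=True).eval()
tokenizer = AutoTokenizer.from_pretrained("path/to/CoS", trust_remote_code=True)
```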

ToDo List

  • SFT Dataset
  • PRM & Dataset
  • Training & Inference code
  • Checkpoints

License

Apache License 2.0

Citation

@article{chen2025unveiling,
  title={Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards},
  author={Chen, Honghao and Lou, Xingzhou and Feng, Xiaokun and Huang, Kaiqi and Wang, Xinlong},
  journal={arXiv preprint arXiv:2509.19003},
  year={2025}
}

Acknowledgement

We thank the following repositories for their excellent work: InternVL, LLaVa-NeXt, TAP.
