🔮 Spaeing the Unseen

Spa3R: Predictive Spatial Field Modeling for 3D Visual Reasoning

Haoyi Jiang¹, Liu Liu², Xinjie Wang², Yonghao He³, Wei Sui³, Zhizhong Su², Wenyu Liu¹, Xinggang Wang¹

¹Huazhong University of Science & Technology, ²Horizon Robotics, ³D-Robotics


Installation

Please clone this repository with --recursive so that the bundled submodules (vggt and lmms-eval) are fetched.
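A minimal clone sketch; the URL is inferred from the repository name (hustvl/Spa3R) and is an assumption, not stated in this README:

```shell
# Clone together with the vendored submodules (submodules/vggt, submodules/lmms-eval).
git clone --recursive https://github.com/hustvl/Spa3R.git
cd Spa3R

# If the repository was already cloned without --recursive,
# the submodules can be fetched afterwards:
git submodule update --init --recursive
```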

# Core dependencies
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

# Bundled submodules (requires a --recursive clone)
pip install submodules/vggt
pip install -e submodules/lmms-eval

Data Preparation

1. Pre-training

We utilize a combination of large-scale indoor scene datasets: ScanNet and ScanNet++.

2. Instruction Tuning

  • For video-centric VSI-Bench, we fine-tune on VSI-590K.
  • For image-based benchmarks, we use a composite training set. Please refer to the VG-LLM datasets.

Training

1. Spa3R Pre-training

To train the Predictive Spatial Field Modeling (PSFM) framework from scratch:

export PYTHONPATH=.
python scripts/train_spa3r.py

2. Spa3-VLM Instruction Tuning

Set the path to the pre-trained Spa3R checkpoint in the script: geometry_encoder_path=/path/to/spa3r.ckpt

bash scripts/train_vlm_sft.sh

Evaluation

To evaluate Spa3-VLM on spatial reasoning benchmarks:

bash scripts/eval_vlm.sh

Citation

If you find our work helpful for your research, please consider starring this repository ⭐ and citing our work:

@article{Spa3R,
  title={Spa3R: Predictive Spatial Field Modeling for 3D Visual Reasoning},
  author={Haoyi Jiang and Liu Liu and Xinjie Wang and Yonghao He and Wei Sui and Zhizhong Su and Wenyu Liu and Xinggang Wang},
  journal={arXiv preprint arXiv:2602.21186},
  year=2026
}

License

This project is released under the MIT License.
