We release ZoomEarth🌍, a vision-language model designed for visual reasoning and question answering on ultra-high-resolution remote sensing imagery through active perception. ZoomEarth also integrates seamlessly with downstream models for tasks such as cloud removal, denoising, segmentation, and image editing via simple tool interfaces, demonstrating strong extensibility.
- 2025.11.22 🎉🎉🎉 We release code that supports faster inference with vLLM!
- 2025.11.18 🎉🎉🎉 ZoomEarth: Active Perception for Ultra-High-Resolution Geospatial Vision-Language Tasks is now available on arXiv!
- 2025.11.15 🎉🎉🎉 ZoomEarth-3B is publicly available on huggingface 🤗!
- 2025.11.15 🎉🎉🎉 LRS-GRO is publicly available on huggingface 🤗!
Demo video: ZoomEarth-Demo.mp4
Our model, ZoomEarth, is built upon Qwen2.5-VL-3B, a powerful VLM. It supports fine-grained reasoning, spatial context interpretation, and multi-level object understanding.
- Model weights: ZoomEarth-3B 🤗
- Training scripts: src/train/
- Evaluation scripts: src/eval/
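As a quick-start sketch, the released weights can be loaded through the standard Qwen2.5-VL interface in transformers. This assumes the checkpoint is compatible with those classes; the model path, image, and question below are placeholders, and the full active-perception pipeline is what src/demo.py runs.

# Minimal sketch: plain single-turn inference with the ZoomEarth-3B weights.
# Assumes compatibility with transformers' Qwen2.5-VL classes; paths and prompts are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_path = "PATH_TO_ZOOM_EARTH_MODEL"  # local checkpoint dir or Hugging Face repo id
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)
# max_pixels can be passed here to cap vision tokens for very large images
processor = AutoProcessor.from_pretrained(model_path)

image = Image.open("example_scene.png")  # an ultra-high-resolution remote sensing image
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "How many bridges cross the river in this scene?"},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)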
LRS-GRO contains high-resolution satellite images annotated with:
- Multi-level question types (global, regional, object)
- Bounding boxes and spatial relations
- Reasoning-based and factual QAs
| Split | #Images | #Questions | Avg. Resolution |
|---|---|---|---|
| SFT | 88 | 1011 | 5000 |
| RL | 228 | 2500 | 5000 |
| Test | 908 | 9734 | 5000 |
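For illustration, an annotation record in the spirit of the list above might look like the following; field names and values are hypothetical, so consult the dataset card for the actual schema.

# Hypothetical LRS-GRO-style record, for illustration only; the real schema may differ.
example_record = {
    "image": "scene_00042.png",
    "level": "object",                    # question level: global / regional / object
    "question": "What type of vehicle is parked next to the largest hangar?",
    "answer": "A cargo truck",
    "bbox": [10240, 5632, 10410, 5790],   # [x1, y1, x2, y2] in image pixels
    "qa_type": "reasoning",               # reasoning-based vs. factual QA
}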
Download:
LRS-GRO 🤗
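If you prefer to download programmatically, a minimal sketch with huggingface_hub (the repo id below is a placeholder; use the id behind the LRS-GRO link above):

# Minimal sketch: fetch the dataset snapshot locally with huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<LRS-GRO-repo-id>",   # replace with the actual Hugging Face dataset id
    repo_type="dataset",
    local_dir="./data/LRS-GRO",
)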
Step 1. Create a conda environment and activate it.
conda create -n zoom-earth python=3.10 -y
conda activate zoom-earth
Step 2. Install PyTorch (We use PyTorch 2.4.1 / CUDA 12.1)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
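Optionally, verify that the CUDA build was installed (a quick sanity check, not required):

# Optional sanity check for the PyTorch install.
import torch
print(torch.__version__)          # expected: 2.4.1+cu121
print(torch.cuda.is_available())  # expected: True on a machine with a CUDA 12.1 driver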
Step 3. Install other dependencies
pip install -r requirements.txt
Step 4. Configure NLTK local corpora (for WordNet)
# (1) Download WordNet data to a local directory (optional if already exists)
python -m nltk.downloader wordnet -d ./nltk_data
# (2) In your code, add the following before importing WordNet
import nltk
local_corpora = "./nltk_data"
nltk.data.path.insert(0, local_corpora)
from nltk.corpus import wordnet as wn
Then replace local_corpora with the actual path in src/eval/eval.py and src/train/RL/src/open-r1-multimodal/src/open_r1/custom/customized_funcs.py.
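After the snippet above, the following should print a WordNet gloss without triggering any download, which confirms the local corpora are being used (the query word is arbitrary):

# Should succeed using only the local ./nltk_data corpora.
print(wn.synsets("bridge")[0].definition())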
Run the demo with:
python src/demo.py

To train ZoomEarth, first run bash ./run_scripts/train_sft.sh to start the SFT training phase.
After that, run bash ./run_scripts/train_rl.sh to start the RL training phase.
To evaluate the model on LRS-GRO, first run bash ./run_scripts/infer.sh to generate the inference file.
After that, run bash ./run_scripts/eval.sh to get detailed evaluation results.
Or infer with vLLM:
First, install vLLM in your environment, then start the vLLM service:
VLLM_USE_MODELSCOPE=true vllm serve \
PATH_TO_ZOOM_EARTH_MODEL \
--served-model-name ZoomEarth \
--max_model_len 2048 \
--host 0.0.0.0 \
--port 8000
Finally, run python ./src/eval/infer_vllm.py --exp_name zoomearth-infer; after inference finishes, you will find the results in result/zoomearth-infer.jsonl.
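You can also query the running service directly through vLLM's OpenAI-compatible API instead of using the provided script; a minimal sketch, where the image URL and question are placeholders and the model name must match --served-model-name:

# Minimal sketch: query the vLLM OpenAI-compatible endpoint started above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="ZoomEarth",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/scene.png"}},
            {"type": "text", "text": "How many ships are docked in the harbor?"},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)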
If you have questions or would like to collaborate, please contact us at:
📧 liuruixun6343@gmail.com
If you find our work useful, please cite us:
@misc{liu2025zoomearthactiveperceptionultrahighresolution,
title={ZoomEarth: Active Perception for Ultra-High-Resolution Geospatial Vision-Language Tasks},
author={Ruixun Liu and Bowen Fu and Jiayi Song and Kaiyu Li and Wanchen Li and Lanxuan Xue and Hui Qiao and Weizhan Zhang and Deyu Meng and Xiangyong Cao},
year={2025},
eprint={2511.12267},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.12267},
}
Thanks to FAIR1M, GLH-Bridge, and STAR for the images; to LRS-VQA, MME-RealWorld, XLRS-Bench, and GeoLLaVA-8K for the benchmarks; and to VLM-R1 for the training framework code.
© 2025 ZoomEarth Project. Released under the Apache 2.0 License.