ZoomEarth: Active Perception for Ultra-High-Resolution Geospatial Vision-Language Tasks

[Project][arXiv]

🔥🔥🔥 ZoomEarth

We release ZoomEarth🌍, a vision-language model designed for visual reasoning and question answering on ultra-high-resolution remote sensing imagery through active perception. Moreover, ZoomEarth can seamlessly integrate with downstream models for tasks such as cloud removal, denoising, segmentation, and image editing through simple tool interfaces (see the sketch below), demonstrating strong extensibility.
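As a hedged illustration only, the Python sketch below shows one way such a tool interface could be wired up: downstream models register themselves in a registry, and the VLM's tool calls are dispatched by name. The register_tool and run_tool names are hypothetical and are not the repository's actual API.

# Hypothetical sketch of a tool-dispatch layer; every name here is
# illustrative and is NOT the repository's actual interface.
from typing import Callable, Dict

TOOLS: Dict[str, Callable] = {}

def register_tool(name: str):
    # Register a downstream model (cloud removal, denoising, ...) as a tool.
    def decorator(fn: Callable) -> Callable:
        TOOLS[name] = fn
        return fn
    return decorator

@register_tool("denoise")
def denoise(image_path: str) -> str:
    # Stand-in for a real denoising model: takes an image path and
    # returns the path of the processed image.
    return image_path

def run_tool(name: str, **kwargs):
    # Dispatch a tool call emitted by the VLM to the registered model.
    return TOOLS[name](**kwargs)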

🎊 News and Updates

  • 2025.11.22 🎉🎉🎉 We release the code that supports faster inference with vLLM!
  • 2025.11.18 🎉🎉🎉 ZoomEarth: Active Perception for Ultra-High-Resolution Geospatial Vision-Language Tasks is now available on arXiv!
  • 2025.11.15 🎉🎉🎉 ZoomEarth-3B is publicly available on huggingface🤗!
  • 2025.11.15 🎉🎉🎉 LRS-GRO is publicly available on huggingface🤗!

🎞️ Demo Video

ZoomEarth-Demo.mp4

🧠 Model

Our model, ZoomEarth, is built upon Qwen2.5-VL-3B, a powerful VLM. It supports fine-grained reasoning, spatial context interpretation, and multi-level object understanding.
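Because ZoomEarth keeps the Qwen2.5-VL architecture, it should load with the standard Qwen2.5-VL classes in transformers. A minimal loading sketch, assuming the hub id earth-insights/ZoomEarth-3B (verify the actual id on the model card):

# Minimal loading sketch; the hub id below is an assumption -- check the
# ZoomEarth-3B model card on Hugging Face for the real one.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "earth-insights/ZoomEarth-3B"  # hypothetical hub id
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)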

🛰️ Dataset

LRS-GRO contains high-resolution satellite images annotated with:

  • Multi-level question types (global, regional, object)
  • Bounding boxes and spatial relations
  • Reasoning-based and factual QAs
Split   #Images   #Questions   Avg. Resolution
SFT        88        1011         5000
RL        228        2500         5000
Test      908        9734         5000

Download:
LRS-GRO 🤗
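
If LRS-GRO is published as a standard Hugging Face dataset, it should load directly with the datasets library. A minimal sketch, assuming the repo id earth-insights/LRS-GRO and a test split matching the table above (verify both on the dataset card):

# Minimal loading sketch; the repo id and split names are assumptions --
# check the LRS-GRO dataset card on Hugging Face for the real ones.
from datasets import load_dataset

ds = load_dataset("earth-insights/LRS-GRO")  # hypothetical repo id
print(ds)             # available splits and their sizes
print(ds["test"][0])  # inspect one QA record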

⚙️ Installation

Step 1. Create a conda environment and activate it.

conda create -n zoom-earth python=3.10 -y
conda activate zoom-earth

Step 2. Install PyTorch (We use PyTorch 2.4.1 / CUDA 12.1)

pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121

Step 3. Install other dependencies

pip install -r requirements.txt

Step 4. Configure NLTK local corpora (for WordNet)

# (1) Download WordNet data to a local directory (optional if already exists)
python -m nltk.downloader wordnet -d ./nltk_data

# (2) In your code, add the following before importing WordNet
import nltk

local_corpora = "./nltk_data"
nltk.data.path.insert(0, local_corpora)

from nltk.corpus import wordnet as wn

Then replace local_corpora with the actual path in src/eval/eval.py and src/train/RL/src/open-r1-multimodal/src/open_r1/custom/customized_funcs.py.

🚀 Quick start

python src/demo.py

🚂 Train

To train ZoomEarth, first run bash ./run_scripts/train_sft.sh to start the SFT training phase.

After that, run bash ./run_scripts/train_rl.sh to start the RL training phase.

📋 Test

To evaluate the model on LRS-GRO, first run bash ./run_scripts/infer.sh to generate the inference file.

After that, run bash ./run_scripts/eval.sh to get detailed evaluation results.

Or infer with vLLM:

First install vLLM in your environment, then start the vLLM service:

VLLM_USE_MODELSCOPE=true vllm serve \
PATH_TO_ZOOM_EARTH_MODEL \
--served-model-name ZoomEarth \
--max-model-len 2048 \
--host 0.0.0.0 \
--port 8000

Finally, run python ./src/eval/infer_vllm.py --exp_name zoomearth-infer; after inference, you will find the results in result/zoomearth-infer.jsonl.
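
Since the vLLM server exposes an OpenAI-compatible endpoint, you can also query the served model directly. A minimal sketch using the openai client; the prompt and image path are placeholders, and the image is passed as a base64 data URL:

# Query the vLLM OpenAI-compatible server started above; the prompt and
# image path are placeholders for illustration.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="ZoomEarth",  # matches --served-model-name above
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": "How many bridges are in this image?"},
        ],
    }],
)
print(response.choices[0].message.content)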

📬 Contact

If you have questions or would like to collaborate, please contact us at:
📧 liuruixun6343@gmail.com

📧 HappyBug@stu.xjtu.edu.cn

Citation

If you find our work useful, please cite us:

@misc{liu2025zoomearthactiveperceptionultrahighresolution,
  title={ZoomEarth: Active Perception for Ultra-High-Resolution Geospatial Vision-Language Tasks},
  author={Ruixun Liu and Bowen Fu and Jiayi Song and Kaiyu Li and Wanchen Li and Lanxuan Xue and Hui Qiao and Weizhan Zhang and Deyu Meng and Xiangyong Cao},
  year={2025},
  eprint={2511.12267},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.12267},
}

❤ Acknowledgments

Thanks to FAIR1M, GLH-Bridge, and STAR for the images; to the benchmarks LRS-VQA, MME-RealWorld, XLRS-Bench, and GeoLLaVA-8K; and to the VLM-R1 training framework code.

© 2025 ZoomEarth Project. Released under the Apache 2.0 License.
