DescribeEarth: Describe Anything for Remote Sensing Images

Author: Kaiyu Li*, Zixuan Jiang*, Xiangyong Cao✉, Jiayu Wang, Yuchen Xiao, Deyu Meng, Zhi Wang

News

2025-10-01: 🔥🔥🔥The paper, code, dataset and benchmark are released.

Introduction

Automated textual description of remote sensing images is essential for applications such as environmental monitoring, urban planning, and disaster management. However, most existing methods only generate captions at the image level, lacking fine-grained object-level interpretation.

To address this gap, we propose Geo-DLC, a new task of object-level fine-grained image captioning for remote sensing. To support this task, we introduce:

DE-Dataset: a large-scale dataset with 25 categories and 261,806 annotated instances, providing detailed descriptions of object attributes, relationships, and contexts.
DE-Benchmark: an LLM-assisted question-answering evaluation suite to systematically measure model performance on Geo-DLC.
DescribeEarth: a Multi-modal Large Language Model (MLLM) explicitly designed for Geo-DLC, featuring a scale-adaptive focal strategy and a domain-guided fusion module to capture both high-resolution details and global context.

DescribeEarth consistently outperforms state-of-the-art general MLLMs on DE-Benchmark, achieving superior factual accuracy, descriptive richness, and grammatical soundness across diverse remote sensing scenarios.

Installation

see here

Usage

DEMO

cd scripts
python app.py

De-Dataset & De-Benchmark

De-Dataset can be downloaded from here. The dataset is formatted in the form as follow:

DE-Dataset
- {DIOR, DOTA}
- - image
- - description

Use bash scripts/format_data.sh to format data for training.

De-Benchmark can be downloaded from huggingface.

Quick Start

the Pretrained checkpoints of DescribeEarth can be downloaded from huggingface. To use it, put the whole folder in weights/.

Inference

python inference.py --model_dir <model_dir> --image <image_dir> --bbox <4-points-bbox/2-points-bbox>

Example

python inference.py --model_dir ../weights/DescribeEarth_0930 --image ./example1/image.jpg --bbox 36.0 332.0 311.0 325.0 317.0 584.0 42.0 591.0

Result

The object of category baseball_field within the specified polygon bounding box is a well-defined outdoor sports facility designed for baseball. The field features a central dirt infield area, clearly demarcated from the surrounding grassy outfield. The infield includes a pitcher's mound and bases, indicating its purpose for baseball games. The surrounding area consists of a large, open grassy field, typical of a baseball diamond layout. Adjacent to the field are structures that appear to be part of a larger complex, possibly including facilities such as dugouts or storage areas. The overall layout and design confirm this as a dedicated baseball field. There are no visible signs of current activity on the field itself.

Training

Following Qwen2.5-VL baseline, do the following to train on DE-dataset / your own dataset:

Edit Qwen2.5-VL/qwen-vl-finetune/qwenvl/data/__init__.py for the Path to the Formatted dataset.
Download pretrained weights (merged checkpoint of Qwen2.5-VL-3B and RemoteCLIP-vit-b32) from huggingface.
bash script/sft.sh under Qwen2.5-VL/qwen-vl-finetune

Evaluating

Use scripts/openai_valid.py to evaluate DescribeEarth and other models.

python openai_valid.py <path to QA.json> <path to image_dataset> -o <output_dir> --generator <'api' or 'local'> --api-key <api_key> --model_dir <model_dir>

Use calculate_score.py to get the final results.

BibTeX

@article{li2025describeearth,
  title={DescribeEarth: Describe Anything for Remote Sensing Images},
  author={Li, Kaiyu and Jiang, Zixuan and Cao, Xiangyong and Wang, Jiayu and Xiao Yuchen and Meng, Deyu and Wang, Zhi},
  journal={arXiv preprint arXiv:2509.25654},
  year={2025}
}

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
DescribeEarth_Qwen2.5-VL		DescribeEarth_Qwen2.5-VL
data		data
environments		environments
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DescribeEarth: Describe Anything for Remote Sensing Images

News

Introduction

Installation

Usage

DEMO

De-Dataset & De-Benchmark

Quick Start

Inference

Training

Evaluating

BibTeX

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

earth-insights/DescribeEarth

Folders and files

Latest commit

History

Repository files navigation

DescribeEarth: Describe Anything for Remote Sensing Images

News

Introduction

Installation

Usage

DEMO

De-Dataset & De-Benchmark

Quick Start

Inference

Training

Evaluating

BibTeX

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages