- Overview
- Repository Structure
- Results & Benchmarks
- Installation & Environment Setup
- Training
- Evaluation
- Tests
## Overview

ResQ is an autonomous agent designed to solve the Search-and-Rescue problem in partially observable disaster environments. It implements a hybrid architecture combining:

- Double DQN: Handles high-level decision-making. The agent processes limited local observations (a $3 \times 3$ grid) to decide between step-by-step exploration and triggering the A* macro-action.
- A* Macro-action: Invoked by the agent to rapidly secure detected victims within the local observation grid and transport them to the drop-off zone via optimal paths (see the sketch below).
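For illustration only, the sketch below shows one way a macro-action can wrap an A* path inside a single high-level decision. The environment attributes (`grid`, `agent_pos`, `victim_pos`, `dropoff_pos`), the `a_star` helper's signature, and the action names are assumptions for this sketch and do not mirror the repository's actual `a_star.py` or `environment/ResQEnv` implementation.

```python
# Hypothetical sketch: a macro-action that replays an A* plan as one decision.
from enum import IntEnum

class Action(IntEnum):
    UP = 0
    DOWN = 1
    LEFT = 2
    RIGHT = 3
    MACRO_RESCUE = 4  # high-level action handled by A*

def step_with_macro(env, action):
    """Execute one high-level decision; the macro-action spans many primitive steps."""
    if action != Action.MACRO_RESCUE:
        return env.step(action)  # ordinary one-cell move

    # Plan to the detected victim, then to the drop-off zone; a_star is assumed
    # to return a list of primitive movement actions.
    path = a_star(env.grid, start=env.agent_pos, goal=env.victim_pos)
    path += a_star(env.grid, start=env.victim_pos, goal=env.dropoff_pos)

    obs, total_reward, terminated, truncated, info = None, 0.0, False, False, {}
    for move in path:
        obs, reward, terminated, truncated, info = env.step(move)
        total_reward += reward
        if terminated or truncated:
            break
    return obs, total_reward, terminated, truncated, info
```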
The ResQ agent is trained and evaluated against two realistic Search-and-Rescue baselines:

- Lawnmower Baseline: A systematic, non-adaptive strategy that executes a snake-like sweep pattern to exhaustively cover the map row by row.
- Greedy Observation Baseline: A heuristic agent that mimics human intuition in low-visibility conditions. It prioritizes immediate rescue upon victim detection and otherwise navigates to the least-visited adjacent cell to explore new areas (sketched below).
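As a rough sketch of the greedy rule just described, and not the actual `baseline/greedy_observation.py`, the least-visited-neighbour choice could look like this (obstacle handling omitted; tie-breaking and bookkeeping are assumptions):

```python
# Illustrative sketch of the "least-visited adjacent cell" rule.
import random

def choose_greedy_move(agent_pos, visit_counts, grid):
    """Pick the in-bounds neighbour with the fewest prior visits, breaking ties randomly."""
    r, c = agent_pos
    neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    candidates = [p for p in neighbours
                  if 0 <= p[0] < len(grid) and 0 <= p[1] < len(grid[0])]
    fewest = min(visit_counts.get(p, 0) for p in candidates)
    return random.choice([p for p in candidates if visit_counts.get(p, 0) == fewest])
```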
This project evaluates whether reinforcement learning can outperform hand-crafted heuristic strategies in a partially observable environment.
## Repository Structure

```text
resq-agent/
├── baseline/
│   ├── lawnmower.py
│   └── greedy_observation.py
├── environment/
│   └── ResQEnv/...
├── images/
│   └── demo.gif
├── model/
├── a_star.py
├── callback.py
├── config.py
├── evaluate.py
├── train.py
├── test.py
├── requirements.txt
└── .gitignore
```
## Results & Benchmarks

We evaluated the ResQ agent against the two baselines over 2000 randomized episodes/maps.
The table below compares the navigational efficiency and reliability of each strategy.

| Strategy | Avg. Steps (lower is faster) | Gap vs. Lawnmower | Stuck Rate |
|---|---|---|---|
| Lawnmower Baseline | 322.95 | Lower bound (reference) | 0.0% (guaranteed) |
| ResQ Agent | 333.78 | +3.3% (near-optimal) | 0.0% (n = 2000 maps) |
| Greedy Baseline | 350.10 | +8.4% | > 0% |
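The gap column is measured relative to the Lawnmower lower bound; for example, the ResQ agent's figure is $(333.78 - 322.95) / 322.95 \approx 3.3\%$.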
## Installation & Environment Setup

```bash
git clone https://github.com/johnnyau19/resq-agent
cd resq-agent

python3 -m venv venv

# macOS/Linux
source venv/bin/activate
# Windows
venv\Scripts\activate

pip install -r requirements.txt
```

## Training

Training is controlled by `train.py`, using Stable-Baselines3 DQN with custom callbacks and TensorBoard logging.
```bash
python train.py
```

You will be prompted to enter the desired number of training timesteps when the script starts.

This will:
- Train a Double DQN agent
- Log learning curves to `./logs/tensorboard/`
- Save periodic checkpoints to `./logs/checkpoints/`
- Use our `BaselineCompareCallback`, which:
  - Evaluates the agent against the Lawnmower baseline
  - Automatically saves the best-performing model to `/models/best_overall_model/`
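As a rough, minimal sketch of the Stable-Baselines3 pieces mentioned above (not the actual `train.py`), the training setup could be assembled as follows. The environment, hyperparameters, and save path are placeholders; the repository's own `ResQEnv` and `BaselineCompareCallback` would take their place.

```python
# Minimal Stable-Baselines3 DQN training sketch with checkpointing and TensorBoard logging.
import gymnasium as gym
from stable_baselines3 import DQN
from stable_baselines3.common.callbacks import CheckpointCallback

env = gym.make("CartPole-v1")  # stand-in for the ResQ environment

checkpoint_cb = CheckpointCallback(
    save_freq=10_000,               # write a checkpoint every 10k steps
    save_path="./logs/checkpoints/",
)

model = DQN(
    "MlpPolicy",
    env,
    tensorboard_log="./logs/tensorboard/",  # learning curves viewable in TensorBoard
    verbose=1,
)
model.learn(total_timesteps=200_000, callback=checkpoint_cb)
model.save("./models/final_model")  # best-model saving is handled by the repo's callback
```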
## Evaluation

Run the full evaluation:

```bash
python evaluate.py
```

This will:
- Run the trained DQN model plus the Lawnmower and Greedy baselines
- Evaluate performance over 2000 randomized episodes (unique seed per run)
- Display real-time Matplotlib performance curves
- Print summary statistics
Example output:

```text
DQN average steps: XXX
Lawnmower average steps: XXX
Greedy average steps: XXX
```
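For reference, a stripped-down version of such an evaluation loop might look like the sketch below. It assumes a saved SB3 model and uses a stand-in environment; the real `evaluate.py` also runs the baselines, seeds each episode, and draws the Matplotlib curves.

```python
# Illustrative sketch: average episode length of a saved DQN policy.
import gymnasium as gym
import numpy as np
from stable_baselines3 import DQN

env = gym.make("CartPole-v1")            # stand-in for the ResQ environment
model = DQN.load("./models/final_model")  # placeholder path

episode_steps = []
for episode in range(100):                # evaluate.py uses 2000 randomized episodes
    obs, _ = env.reset(seed=episode)      # unique seed per run
    steps, done = 0, False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        steps += 1
    episode_steps.append(steps)

print(f"DQN average steps: {np.mean(episode_steps):.2f}")
```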
## Tests

```bash
python test.py
```

Runs simple validation checks for the environment and agent logic.
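A minimal sketch of the kind of sanity check `test.py` could perform, using Stable-Baselines3's built-in environment checker; the `ResQEnv` import path here is an assumption for illustration.

```python
# Verify the custom environment follows the Gymnasium API expected by SB3.
from stable_baselines3.common.env_checker import check_env
from environment.ResQEnv import ResQEnv  # hypothetical import path

env = ResQEnv()
check_env(env, warn=True)  # checks observation/action spaces and reset/step behaviour
print("Environment check passed.")
```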

