IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards

"When a metric is used as a target, it ceases to be a good metric." — Goodhart's Law

IFDecorator addresses this fundamental challenge in RLVR training, where models often exploit verification shortcuts rather than truly understanding user intent, leading to the critical over-optimization problem.

🌟 Framework Overview

The IFDecorator Framework: A synergistic architecture combining three components - the cooperative-adversarial data flywheel that evolves instruction-verification pairs, IntentCheck for robust intent alignment, and trip wires for proactive reward hacking detection. This unified approach transforms RLVR training into a robust and sample-efficient pipeline.

📊 Data

The datasets are available on Hugging Face: guox18/IFDecorator. Each data entry includes a "difficulty" label rather than a "complexity" label:

Difficulty: Pass rate under corresponding verification.
Complexity: Number of constraints.

🚀 Key Results

🎯 Superior Performance with Robust Hack Resistance

Breaking the Trade-off: IFDecorator achieves the optimal balance between instruction following performance and hack resistance. Our framework guides models toward the upper-right region, where strong instruction following capability coexists with robust resistance to reward hacking - a combination that traditional RLVR approaches struggle to achieve.

📈 Dataset Statistics

Difficulty instead of Complexity: Instructions with different complexity levels may have varying actual difficulty. Our data flywheel quantifies difficulty through pass rates, ensuring efficient training.

🏗️ Code

Data Processing Pipeline (`modules/`)

preprocess/: Data collection and preprocessing
enhance/: Data evolving
postprocess/: Post-processing and filtering

Reinforcement Learning Training (`training/`)

reward/ and reward_manager/: Reward Design
Training recipes for Qwen2.5-7B and Qwen2.5-32B models

Monitoring (`monitoring/`)

Instructions with trap (probe.jsonl)
Trigger and capture reward hacking.

🚀 Quick Start

Prerequisites

Python 3.10

Installation

1. Basic Environment Setup

# Clone the repository
git clone <repository-url>
cd code

# Install dependencies for flywheel
pip install -r requirements.txt

Data Pipeline Execution

The data preparation process consists of three sequential steps:

Step 1: Preprocessing

cd modules/preprocess
./run_preprocess.sh <input_dir> <output_path> [seed]

Step 2: Enhancement Pipeline

cd modules/enhance
./run_pipeline.sh

Step 3: Postprocessing

cd modules/postprocess
./run_postprocess.sh [pipeline_num] [input_file]

Reinforcement Learning

1. Install VERL Environment

# Clone VERL repository
git clone https://github.com/volcengine/verl.git
cd verl

# Checkout specific commit for compatibility
git checkout 5c5b92819db93dd47ad3403f41ef9b871c47874c

# Install VERL
pip install .

Important: Different VERL versions may have different output formats regarding special tokens. Use commit 5c5b92819db93dd47ad3403f41ef9b871c47874c for guaranteed compatibility

You have two options for reward manager:

Option A: Replace the reward manager with our custom implementation
Option B: Use the official batch reward manager (recommended for newer VERL versions)

3. Start Training

Navigate to the recipe directory and run the appropriate training script:

cd recipe

# For Qwen2.5-7B model
bash run_qwen2_5-7b.sh

# For Qwen2.5-32B model  
bash run_qwen2_5-32b.sh

Reward Hacking Detection

You can monitor and detect potential reward hacking using our tripwires system:

cd tripwires
bash run_hacking_prob.sh

📄 License

This project is licensed under the Creative Commons Attribution 4.0 International License - see the LICENSE file for details.

📚 Citation

If you use this work in your research, please cite:

@misc{guo2025ifdecoratorwrappinginstructionfollowing,
      title={IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards}, 
      author={Xu Guo and Tianyi Liang and Tong Jian and Xiaogui Yang and Ling-I Wu and Chenhui Li and Zhihui Lu and Qipeng Guo and Kai Chen},
      year={2025},
      eprint={2508.04632},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.04632}, 
}

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
docs		docs
infer		infer
modules		modules
pics		pics
recipe		recipe
scripts		scripts
tripwires		tripwires
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
__init__.py		__init__.py
global_config.py		global_config.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards

🌟 Framework Overview

📊 Data

🚀 Key Results

🎯 Superior Performance with Robust Hack Resistance

📈 Dataset Statistics

🏗️ Code

Data Processing Pipeline (`modules/`)

Reinforcement Learning Training (`training/`)

Monitoring (`monitoring/`)

🚀 Quick Start

Prerequisites

Installation

1. Basic Environment Setup

Data Pipeline Execution

Step 1: Preprocessing

Step 2: Enhancement Pipeline

Step 3: Postprocessing

Reinforcement Learning

1. Install VERL Environment

3. Start Training

Reward Hacking Detection

📄 License

📚 Citation

About

Uh oh!

Releases

Packages

Languages

License

guox18/IFDecorator

Folders and files

Latest commit

History

Repository files navigation

IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards

🌟 Framework Overview

📊 Data

🚀 Key Results

🎯 Superior Performance with Robust Hack Resistance

📈 Dataset Statistics

🏗️ Code

Data Processing Pipeline (modules/)

Reinforcement Learning Training (training/)

Monitoring (monitoring/)

🚀 Quick Start

Prerequisites

Installation

1. Basic Environment Setup

Data Pipeline Execution

Step 1: Preprocessing

Step 2: Enhancement Pipeline

Step 3: Postprocessing

Reinforcement Learning

1. Install VERL Environment

3. Start Training

Reward Hacking Detection

📄 License

📚 Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Data Processing Pipeline (`modules/`)

Reinforcement Learning Training (`training/`)

Monitoring (`monitoring/`)

Packages