V0 is a Generalist Value Model designed to predict the expected performance of any model on unseen instructions without requiring parameter updates or additional rollouts. By treating a policy's dynamic capability as explicit context, V0 serves as an efficient resource scheduler for LLM training and deployment.
Function: V0 uses a model's historical performance to predict how it will perform on unseen instructions without running the model itself.
In LLM policy-gradient RL training (e.g., PPO), value models are typically coupled to a specific policy. V0 reframes this paradigm by:
- State Zero Estimation: Focusing on the initial prompt to predict success rates before generation.
- Dynamic Profiling: Using instruction-performance pairs to perceive capability shifts without retraining.
- Resource Scheduling: Optimizing sampling budgets in GRPO and acting as an intelligent router during deployment (a minimal scheduling sketch follows this list).
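
As a concrete illustration of the scheduling use case, the sketch below splits a fixed GRPO rollout budget across prompts in proportion to predicted difficulty. The `predict_success` callable, the `allocate_grpo_budget` helper, and the budget rule are illustrative assumptions, not the released API.

```python
# Minimal sketch (not the released API): use state-zero success-rate
# predictions to spend more rollouts on prompts the current policy is
# predicted to struggle with.
from typing import Callable, Dict, List


def allocate_grpo_budget(
    prompts: List[str],
    predict_success: Callable[[str], float],  # hypothetical V0 predictor: prompt -> success rate in [0, 1]
    total_rollouts: int,
    min_per_prompt: int = 2,
) -> Dict[str, int]:
    """Assign more rollouts to prompts with lower predicted success."""
    difficulty = {p: 1.0 - predict_success(p) for p in prompts}
    z = sum(difficulty.values()) or 1.0
    remaining = total_rollouts - min_per_prompt * len(prompts)
    budget = {}
    for p in prompts:
        extra = int(round(remaining * difficulty[p] / z))
        budget[p] = min_per_prompt + max(extra, 0)
    return budget  # note: rounding may shift the exact total slightly


if __name__ == "__main__":
    # Toy predictor standing in for V0's state-zero value estimate.
    toy_predict = lambda p: 0.9 if "easy" in p else 0.3
    print(allocate_grpo_budget(["easy sum", "hard proof"], toy_predict, total_rollouts=16))
```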
The V0 architecture consists of a Semantic Backbone for embedding extraction and a Residual Query Adapter that integrates static instruction queries with dynamic capability queries. The fused representations are then processed by a TabPFN inference head to produce value predictions.
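
A rough picture of how these components might compose, assuming a frozen embedding backbone and using a plain linear head as a stand-in for the TabPFN inference head. Class names, shapes, and the pooling of the performance history are illustrative assumptions, not the repository's code.

```python
import torch
import torch.nn as nn


class ResidualQueryAdapter(nn.Module):
    """Hypothetical adapter that fuses a static instruction query with a
    dynamic capability query (pooled from instruction-performance history)
    through a residual connection."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, static_q: torch.Tensor, dynamic_q: torch.Tensor) -> torch.Tensor:
        fused = self.proj(torch.cat([static_q, dynamic_q], dim=-1))
        return static_q + fused  # residual integration of the dynamic context


class V0Sketch(nn.Module):
    """Illustrative composition: semantic backbone -> adapter -> value head.
    The backbone and the linear head are stand-ins for the released modules."""

    def __init__(self, backbone: nn.Module, dim: int):
        super().__init__()
        self.backbone = backbone                 # frozen embedding extractor
        self.adapter = ResidualQueryAdapter(dim)
        self.value_head = nn.Linear(dim, 1)      # placeholder for the TabPFN inference head

    def forward(self, prompt_emb: torch.Tensor, history_emb: torch.Tensor) -> torch.Tensor:
        static_q = self.backbone(prompt_emb)                 # (B, D) instruction query
        dynamic_q = self.backbone(history_emb).mean(dim=1)   # (B, K, D) -> (B, D) capability query
        fused = self.adapter(static_q, dynamic_q)
        return torch.sigmoid(self.value_head(fused))          # predicted success rate in [0, 1]


# Usage with dummy tensors (batch of 4 prompts, 16 history pairs, 768-d embeddings):
model = V0Sketch(backbone=nn.Identity(), dim=768)
scores = model(torch.randn(4, 768), torch.randn(4, 16, 768))  # shape (4, 1)
```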
Clone the repository and install the dependencies:
```bash
git clone https://github.com/Now-Join-Us/V0.git
cd V0
pip install -r requirements.txt
```

To start training:

```bash
python main_train.py
```

To launch the local demo:
```bash
python demo.py
```

If you find V0 useful in your research, please cite our work:
```bibtex
@article{generalist_value_model_v0,
  author  = {Yi-Kai Zhang and Zhiyuan Yao and Hongyan Hao and Yueqing Sun and Qi Gu and Hui Su and Xunliang Cai and De-Chuan Zhan and Han-Jia Ye},
  title   = {V0: A Generalist Value Model for Any Policy at State Zero},
  journal = {CoRR},
  volume  = {abs/2602.03584},
  year    = {2026}
}
```
