The code repository for "$V_0$: A Generalist Value Model for Any Policy at State Zero"


V0: A Generalist Value Model for Any Policy at State Zero


V0 is a Generalist Value Model designed to predict the expected performance of any model on unseen instructions without requiring parameter updates or additional rollouts. By treating a policy's dynamic capability as explicit context, V0 serves as an efficient resource scheduler for LLM training and deployment.

💡 Overview

Function: V0 uses a model's historical performance to predict how it will perform on unseen instructions without running the model itself.

In policy-gradient RL training of LLMs (e.g., PPO), the value model (critic) is typically trained alongside, and coupled to, the specific policy it evaluates. V0 reframes this paradigm by:

  • State Zero Estimation: Focusing on the initial prompt to predict success rates before generation.
  • Dynamic Profiling: Using instruction-performance pairs to perceive capability shifts without retraining.
  • Resource Scheduling: Optimizing sampling budgets in GRPO and acting as an intelligent router during deployment.
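The scheduling role can be illustrated with a short sketch. Given per-prompt success-rate predictions (assumed here to come from V0), `allocate_rollouts` skips prompts predicted to be almost always or almost never solved, because in GRPO a group whose rollouts all receive the same reward has zero group-normalized advantage and therefore contributes no gradient. It then splits the rollout budget in proportion to the Bernoulli variance p * (1 - p). `route` picks the cheapest model predicted to clear a success threshold. Both functions, their parameters (`eps`, `min_rollouts`, `threshold`), and the allocation and routing rules are hypothetical illustrations, not the rules used by V0.

```python
# Hypothetical sketch: how per-prompt success-rate predictions could drive
# GRPO sampling budgets and deployment routing. Not the V0 implementation.


def allocate_rollouts(predicted, total_budget, eps=0.05, min_rollouts=2):
    """Split roughly `total_budget` rollouts across prompts.

    Prompts whose predicted success rate lies within `eps` of 0 or 1 are
    skipped: if every rollout in a GRPO group gets the same reward, the
    group-normalized advantages are all zero. Remaining prompts receive
    budget in proportion to the Bernoulli variance p * (1 - p).
    """
    weights = {
        prompt: p * (1.0 - p)
        for prompt, p in predicted.items()
        if eps < p < 1.0 - eps
    }
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {}  # nothing informative to sample
    return {
        prompt: max(min_rollouts, round(total_budget * w / total_weight))
        for prompt, w in weights.items()
    }


def route(predicted_by_model, costs, threshold=0.7):
    """Return the cheapest model whose predicted success rate meets
    `threshold`; otherwise fall back to the model predicted to do best."""
    eligible = [m for m, p in predicted_by_model.items() if p >= threshold]
    if eligible:
        return min(eligible, key=lambda m: costs[m])
    return max(predicted_by_model, key=predicted_by_model.get)


if __name__ == "__main__":
    predicted = {"q1": 0.0, "q2": 0.5, "q3": 0.9, "q4": 0.99}
    print(allocate_rollouts(predicted, total_budget=32))
    print(route({"small": 0.8, "large": 0.95}, {"small": 1.0, "large": 10.0}))
```

With the predictions above and a budget of 32, q1 and q4 are skipped and the budget is split 24/8 between q2 and q3; the router picks `small` because it already clears the threshold at a tenth of the cost. Because of rounding and the `min_rollouts` floor, the allocated total can differ slightly from `total_budget`.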

Motivation

Method

The V0 architecture consists of a Semantic Backbone for embedding extraction and a Residual Query Adapter that integrates static and dynamic queries. These features are processed by a TabPFN inference head to generate value predictions.
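As a rough illustration of the data flow described above (embed instructions, then predict a value for a new instruction from a policy's instruction-performance context), the sketch below uses two clearly swapped-in stand-ins: a hashed bag-of-words `embed` in place of the Semantic Backbone, and a softmax-weighted kernel average (Nadaraya-Watson regression) in place of the TabPFN inference head. The Residual Query Adapter is omitted. `embed`, `predict_success`, `DIM`, and `temperature` are hypothetical names, not part of the repository.

```python
import math
import zlib

DIM = 256  # hypothetical embedding size for the toy backbone


def embed(text):
    """Toy stand-in for the Semantic Backbone: a hashed bag-of-words
    vector, L2-normalized. V0 uses a learned backbone instead."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % DIM] += 1.0  # deterministic hash
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def predict_success(context, query, temperature=0.1):
    """Predict a policy's success rate on `query` from its history.

    `context` is a non-empty list of (instruction, success_rate) pairs
    observed for one policy. Historical success rates are averaged with
    softmax weights over cosine similarity to the query (kernel
    regression), a simple stand-in for the TabPFN head.
    """
    if not context:
        raise ValueError("context must contain at least one pair")
    q = embed(query)
    sims = [sum(a * b for a, b in zip(q, embed(instr))) for instr, _ in context]
    m = max(sims)  # subtract the max for numerical stability
    weights = [math.exp((s - m) / temperature) for s in sims]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, context)) / total


if __name__ == "__main__":
    history = [
        ("solve the integral of x squared", 0.9),
        ("prove the prime number theorem", 0.1),
    ]
    print(predict_success(history, "solve the integral of x cubed"))
```

Like TabPFN, this estimator has no trainable parameters at prediction time: it conditions only on the context rows, so a policy's shifting capability is reflected by updating its instruction-performance history rather than retraining weights.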

Figure: Method pipeline.

🚀 Getting Started

Installation

Clone the repository and install the dependencies:

git clone https://github.com/Now-Join-Us/V0.git
cd V0
pip install -r requirements.txt

Training

To train V0:

python main_train.py

Evaluation & Demo

To launch the local demo:

python demo.py

📖 Citation

If you find V0 useful in your research, please cite our work:

@article{generalist_value_model_v0,
  author       = {Yi-Kai Zhang and Zhiyuan Yao and Hongyan Hao and Yueqing Sun and Qi Gu and Hui Su and Xunliang Cai and De-Chuan Zhan and Han-Jia Ye},
  title        = {V0: A Generalist Value Model for Any Policy at State Zero},
  journal      = {CoRR},
  volume       = {abs/2602.03584},
  year         = {2026}
}
