- [2025.10.26] 🎉 Our project introduction has been featured on DeepWiki!
- [2025.10.16] 🎉 Our paper has been accepted by the NeurIPS 2025 Efficient Reasoning Workshop!
- [2025.10.13] 📸 Excited to have a tutorial video for AgentFlow covered by Discover AI on YouTube!
- [2025.10.10] 🚀 Our X post received 1K+ likes! Feel free to check out the post and join the discussion! 💬
- [2025.10.08] 🔥 We are honored to be featured as 🤗 HuggingFace Daily Paper #2.
AgentFlow is a trainable, tool-integrated agentic framework designed to overcome the scalability and generalization limits of today's tool-augmented reasoning approaches.
Unlike prevailing approaches such as Search-R1, which train a single LLM to interleave reasoning steps with tool calls, AgentFlow introduces a modular agentic system with four specialized modules: 🧠 Planner, 🛠️ Executor, ✅ Verifier, and ⚙️ Generator.
For effective planning and tool use, the framework directly optimizes the planner agent within the system in an online fashion using Flow-based Group Refined Policy Optimization (Flow-GRPO), achieving superior performance across diverse domains with improved tool-calling reliability and long-horizon reasoning capabilities.
- 🧩 Modular Agentic System — Four specialized agent modules (Planner, Executor, Verifier, Generator) that coordinate via evolving memory and integrated tools across multiple turns (see the sketch after this list).
- 🔌 Multi-Tool Integration — Seamlessly connect with diverse tool ecosystems, including `base_generator`, `python_coder`, `google_search`, `wikipedia_search`, `web_search`, and more.
- 🎯 Flow-GRPO Algorithm — Enables in-the-flow agent optimization for long-horizon reasoning tasks with sparse rewards.
- 📊 Proven Results — AgentFlow (7B backbone) beats top baselines on 10 benchmarks, with +14.9% on search, +14.0% on agentic, +14.5% on math, and +4.1% on science tasks, even outperforming ~200B-parameter GPT-4o.
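To make the coordination pattern concrete, here is a minimal sketch of the multi-turn loop the four modules run. The class and method names are illustrative assumptions, not AgentFlow's actual API:

```python
# Minimal sketch of the four-module loop; names are illustrative, not
# AgentFlow's actual API.
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Evolving memory shared by all modules across turns."""
    records: list = field(default_factory=list)

    def add(self, turn, action, result):
        self.records.append({"turn": turn, "action": action, "result": result})

def solve(query, planner, executor, verifier, generator, max_turns=10):
    memory = Memory()
    for turn in range(max_turns):
        action = planner.plan(query, memory)    # pick the next sub-goal + tool
        result = executor.execute(action)       # run the chosen tool
        memory.add(turn, action, result)        # memory evolves every turn
        if verifier.is_solved(query, memory):   # stop once evidence suffices
            break
    return generator.generate(query, memory)    # compose the final answer
```

The key design choice is that only the Planner is trained with Flow-GRPO; the other modules stay fixed by default, which is what makes the optimization happen "in the flow" of the running system.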
- ⚙️ Setup
- ⚡ Quick Start on AgentFlow Inference
- 🔥 Quick Start on AgentFlow Flow-GRPO Training
- 🎯 AgentFlow Benchmark
- 🧩 Use Your Own Model in AgentFlow
- 🤝 Core Contributors
- 🎓 Advisors
- 🙏 Acknowledgements
- 🌟 Contributing
- Python 3.11 (recommended)

```bash
bash setup.sh
source .venv/bin/activate

# (Optional) Install `parallel` for running benchmark experiments in parallel:
sudo apt-get update
sudo apt-get install parallel
```

Copy the `.env.template` file from `agentflow/.env.template` and rename it to `.env`, then place it in the `agentflow/` folder. Update the following variables with your own API keys:
- `OPENAI_API_KEY` (for judging responses)
- `GOOGLE_API_KEY` (for the Google Search tool)
- `DASHSCOPE_API_KEY` ([optional] for calling Qwen-2.5-7B-Instruct as the engine for agents and tools)
- `TOGETHER_API_KEY` ([optional] alternative for calling Qwen-2.5-7B-Instruct as the engine for agents and tools; recommended for international users)
- More ways: serve the Qwen2.5-7B-Instruct model with vLLM (for details, see serve_vllm_local.md).
Please check the API Key Setup Guide for detailed instructions on how to obtain these keys.
```bash
cp agentflow/.env.template agentflow/.env
# Then edit agentflow/.env with your API keys
```

Before running inference or training, we recommend verifying that your API keys and environment are properly configured.
Run the following command to test all integrated tools:
```bash
cd agentflow/agentflow
bash ./tools/test_all_tools.sh
```

Example output:
```
Testing all tools...
✅ base_generator passed
✅ google_search passed
✅ python_coder passed
✅ wikipedia_search passed
...
✅ All tests passed
```
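Beyond the shell script above, a small standalone check (a hypothetical helper, not part of the repo) can confirm the keys from `agentflow/.env` are actually visible to your Python process:

```python
# Hypothetical helper (not part of the repo): confirm the API keys from
# agentflow/.env are visible to the current Python process.
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv("agentflow/.env")

REQUIRED = ["OPENAI_API_KEY", "GOOGLE_API_KEY"]
OPTIONAL = ["DASHSCOPE_API_KEY", "TOGETHER_API_KEY"]

for key in REQUIRED + OPTIONAL:
    if os.getenv(key):
        print(f"{key}: set")
    else:
        print(f"{key}: {'MISSING' if key in REQUIRED else 'unset (optional)'}")
```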
Verify that your LLM engines (OpenAI, DashScope, Gemini, etc.) are correctly initialized and responding:
```bash
python agentflow/scripts/test_llm_engine.py
```

Example output:
```
🚀 Starting fault-tolerant test for 11 engines...
✅ Passed: 4
  • gpt-4o → ChatOpenAI
  • dashscope-qwen2.5-3b-instruct → ChatDashScope
  • gemini-1.5-flash → ChatGemini
  • deepseek-chat → ChatDeepseek
...
🎉 All engines initialized successfully!
```
AgentFlow provides a modular agentic system with four specialized modules (planner, executor, verifier, generator) that coordinate through evolving memory and a toolkit over multiple turns to solve complex reasoning tasks.
To quickly experience the system in action, run the command below (don't forget to set up your API keys):
```bash
python quick_start.py
```

Example output of `python quick_start.py`:
```
==> Initializing agentflow...
==> Setting up tools...
==> 🎯 Reasoning Steps from AgentFlow (Deep Thinking...)
==> 🔍 Step 0: Query Analysis
==> 🎯 Step 1: Action Prediction (Google_Search_Tool)
==> 🛠️ Step 1: Command Execution (Google_Search_Tool)
...
**Answer:** The capital of France is Paris.
==> ✅ Query Solved!
**Process Summary:**
1. **Query Analysis:** Identified as a factual question about the capital of France.
2. **Tool Selection:** Used Google Search for accurate information.
3. **Execution:** Confirmed Paris as the capital.
4. **Verification:** Cross-referenced sources for reliability.
**Answer:** The capital of France is Paris.
```
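If you prefer to drive the system from Python instead of the script, a minimal sketch might look like the following. It assumes a `construct_solver` factory in `agentflow/agentflow/solver.py` (the module referenced later in this README), so treat the exact signature as an assumption and check the source:

```python
# Minimal sketch of programmatic use. construct_solver and its signature are
# assumptions based on agentflow/agentflow/solver.py; verify against the source.
from agentflow.agentflow.solver import construct_solver

solver = construct_solver(llm_engine_name="dashscope")  # planner engine
output = solver.solve("What is the capital of France?")
print(output)
```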
For effective planning and tool use, the framework directly optimizes the planner agent within the system in an online fashion using Flow-GRPO. Below is a quick start for training.
Before diving in, we recommend verifying that AgentFlow's tools, LLM engines, and network configuration are properly set up. See test_env.md for detailed testing instructions.
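As a rough illustration of the group-refined update at the heart of Flow-GRPO, here is a sketch of the general GRPO-style recipe (our simplification for intuition, not the paper's exact objective): sample a group of rollouts per query, score each trajectory with the sparse final reward, normalize within the group, and broadcast each trajectory-level advantage to every planner turn.

```python
# Sketch of a GRPO-style group-relative advantage, broadcast to each planner
# turn. Illustrates the general recipe, not the paper's exact loss.
import statistics

def group_advantages(rewards, eps=1e-6):
    """Normalize trajectory rewards within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 rollouts for one query, sparse 0/1 reward on the final answer.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_advantages(rewards)

# Broadcast each trajectory's advantage to all of its planner turns.
turns_per_rollout = [3, 5, 2, 4]
per_turn_adv = [[a] * n for a, n in zip(advantages, turns_per_rollout)]
print(advantages)       # ~[1.0, -1.0, -1.0, 1.0]
print(per_turn_adv[0])  # the same advantage at every turn of rollout 0
```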
We mix two datasets for training: NQ (Natural Questions) for agentic search and DeepMath-103K for mathematical reasoning.
```bash
# train data
python data/get_train_data.py
# validation data
python data/aime24_data.py
```

After that, the data directory should look like:
```
data/
├── train/
│   └── combined_train.parquet (182,190 samples)
├── val/
│   └── aime24.parquet (30 samples)
├── aime24_data.py
└── get_train_data.py
```
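As an optional sanity check on the generated files (assuming pandas with a parquet engine such as pyarrow is installed), you can print their shapes and columns without assuming any particular schema:

```python
# Optional sanity check on the generated parquet files (assumes pandas +
# pyarrow). Prints row counts and column names without assuming a schema.
import pandas as pd

for path in ["data/train/combined_train.parquet", "data/val/aime24.parquet"]:
    df = pd.read_parquet(path)
    print(f"{path}: {len(df)} rows, columns={list(df.columns)}")
```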
Start AgentFlow training using Flow-GRPO with tmux:
```bash
# Create tmux session and start agentflow service (Window 0)
tmux new-session -s agentflow
bash train/serve_with_logs.sh

# Create new window (Ctrl+B then C) and start training (Window 1)
bash train/train_with_logs.sh
```

Configuration:
All training hyperparameters are in `train/config.yaml` (model settings, tools, RL parameters, resources, etc.).
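To see what is tunable at a glance, you can dump the config's top-level sections; this sketch only assumes the file is valid YAML (PyYAML, `pip install pyyaml`), since the actual keys are whatever `train/config.yaml` defines:

```python
# Print the top-level sections of the training config. Assumes only that the
# file is valid YAML; the actual keys are whatever train/config.yaml defines.
import yaml  # pip install pyyaml

with open("train/config.yaml") as f:
    config = yaml.safe_load(f)

for section, value in config.items():
    print(f"{section} -> {type(value).__name__}")
```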
Logging: We provide comprehensive logging to monitor training. See logs.md for more details.
Serve the trained planner model with vLLM (here we deploy our 7B Flow-GRPO planner model):

```bash
bash scripts/serve_vllm.sh
```

Run inference on specific benchmark tasks:
```bash
cd test
# Run Bamboogle benchmark
bash bamboogle/run.sh
```

After running, each task folder (e.g., `test/bamboogle/`) will contain:
- `data/`: Contains the evaluation dataset (e.g., `data.json`).
- `logs/`: Contains detailed execution logs for each problem index (organized by model label).
- `results/`: Contains the model's generated answers (`output_i.json`) and final evaluation scores (`finalscore_*.log`).
You can find more benchmarking details in benchmark.md.
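For a quick look at a finished run without opening files by hand, something like this works; it is a hypothetical helper that only globs the file names listed above and prints the score logs verbatim:

```python
# Hypothetical helper: summarize a finished benchmark run using only the
# file names listed above (results/output_i.json, results/finalscore_*.log).
from pathlib import Path

task_dir = Path("test/bamboogle")
answers = sorted(task_dir.glob("results/output_*.json"))
print(f"{len(answers)} answer files under {task_dir}/results/")

for score_log in task_dir.glob("results/finalscore_*.log"):
    print(f"--- {score_log} ---")
    print(score_log.read_text())
```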
AgentFlow supports different LLM engines for each agent module. See llm_engine.md for supported models and `factory.py` for the corresponding `model_string` configuration:

Planner Agent:
- Modify the `llm_engine_name` parameter in the corresponding `run.sh` script (e.g., `test/bamboogle/run.sh`).
Other Agents (Executor, Verifier, Generator):
- By default, these agents use a fixed LLM engine (Qwen-2.5-7B-Instruct via DashScope).
- To use your own model, modify `self.llm_engine_fixed` in `agentflow/agentflow/models/planner.py:19`:

  ```python
  self.llm_engine_fixed = create_llm_engine(model_string="your-engine", is_multimodal=False, temperature=temperature)
  ```

- and modify the `llm_engine_name` parameter in the Executor instantiation in `agentflow/agentflow/solver.py:232`:

  ```python
  # Instantiate Executor
  executor = Executor(
      # llm_engine_name=llm_engine_name,
      llm_engine_name="dashscope",
      root_cache_dir=root_cache_dir,
      verbose=verbose,
      # base_url=base_url,
      temperature=temperature
  )
  ```

- For detailed information on supported engines and `model_string` formats, see llm_engine.md.
AgentFlow (Qwen-2.5-7B-Instruct Backbone) outperforms top baselines on 10 benchmarks:
- +14.9% on search
- +14.0% on agentic reasoning
- +14.5% on math
- +4.1% on science
💡 Even surpasses larger proprietary models like GPT-4o (~200B).
- Improved planning and decision-making
- Enhanced tool-calling reliability
- Positive scaling trends with model size & reasoning turns
Explore more in our paper or project page.
Core Contributors: Zhuofeng Li, Haoxiang Zhang, Pan Lu

Advisors: James Zou, Yejin Choi, Yu Zhang
We thank the following open-source projects:
- verl for the excellent RL framework design.
- vLLM for fast LLM inference support.
- Verl-Tool and agent-lightning for their early-stage exploration in agentic RL training.
We thank Lambda for GPU support!
We warmly welcome open-source contributions to AgentFlow! If you're interested in contributing, collaborating, or reporting issues, please feel free to open an issue or submit a pull request (PR). You can also reach us at zhuofengli12345@gmail.com, isaacpfino@gmail.com, or lupantech@gmail.com, or join our Slack community: AgentFlow.
We also look forward to your feedback and suggestions!
```bibtex
@article{li2025flow,
  title={In-the-Flow Agentic System Optimization for Effective Planning and Tool Use},
  author={Li, Zhuofeng and Zhang, Haoxiang and Han, Seungju and Liu, Sheng and Xie, Jianwen and Zhang, Yu and Choi, Yejin and Zou, James and Lu, Pan},
  journal={arXiv preprint arXiv:2510.05592},
  year={2025}
}
```



