The OpenThoughts-Agent project has been running RL training with SkyRL-train and Harbor for a while.
The Harbor + SkyRL-train integration lets users run RL training on terminal-use style tasks while focusing only on the data.
See the initial release: https://www.openthoughts.ai/blog/agent
For that project, all the code resided in a fork (so that we could make project-specific hot fixes): https://github.com/mlfoundations/SkyRL
This issue tracks upstreaming those changes to the main branch of SkyRL; the upstreamed versions will be much more robust than what is currently on main.
- [train] Make RayPPOTrainer.train to be async, unifying event loop #868 (see the async trainer sketch after this list)
- [train][OpenAI] Make engine_init_kwargs.chat_template config apply to vllm /chat/completions #890
- [train] Enable custom chat template for get_response_ids_and_loss_mask_from_messages #981 (see the chat-template loss-mask sketch after this list)
- 3/N: step-wise training
- 4/N: async RL + Harbor
- 5/N: Integrate with Harbor's QueueOrchestrator to let Harbor handle retries and concurrency limits: Queue orchestrator laude-institute/harbor#527 (see the orchestration sketch after this list)
- Integration tests and Harbor configs (ensure all the defaults still work)
- Other small ones
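
For reference, here is a minimal sketch of what driving the whole train loop from a single asyncio event loop can look like. It only illustrates the pattern behind the async-train item above; `AsyncTrainer`, `generate_rollouts`, and `update_policy` are hypothetical names, not SkyRL's actual classes or methods.

```python
# Minimal sketch of a PPO-style train loop driven from one asyncio event loop.
# All names here (AsyncTrainer, generate_rollouts, update_policy) are
# hypothetical illustrations, not SkyRL's actual API.
import asyncio


class AsyncTrainer:
    def __init__(self, num_steps: int):
        self.num_steps = num_steps

    async def generate_rollouts(self, step: int) -> list[dict]:
        # Stand-in for awaiting async inference (e.g. HTTP calls to an
        # OpenAI-compatible server) without blocking the event loop.
        await asyncio.sleep(0)
        return [{"step": step, "reward": 0.0}]

    async def update_policy(self, rollouts: list[dict]) -> None:
        # Stand-in for the (typically synchronous) optimizer step; pushing it
        # to a thread keeps the event loop responsive.
        await asyncio.to_thread(lambda: None)

    async def train(self) -> None:
        # One async entry point: rollout generation and policy updates share
        # the same loop instead of spinning up nested event loops.
        for step in range(self.num_steps):
            rollouts = await self.generate_rollouts(step)
            await self.update_policy(rollouts)


if __name__ == "__main__":
    asyncio.run(AsyncTrainer(num_steps=2).train())
```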
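
Similarly, a rough sketch of how a custom chat template can feed both tokenization and the assistant-token loss mask, using Hugging Face's `apply_chat_template`. The helper name and the prefix-masking shortcut are assumptions for illustration, not the actual `get_response_ids_and_loss_mask_from_messages` implementation.

```python
# Sketch of deriving response token ids and an assistant-only loss mask from a
# messages list with an (optional) custom chat template. The assumption that
# the prompt tokenization is a strict prefix of the full tokenization is an
# illustrative simplification, not SkyRL's actual implementation.
from transformers import AutoTokenizer


def response_ids_and_loss_mask(messages, tokenizer, chat_template=None):
    # Ids up to and including the generation prompt (everything before the
    # final assistant message).
    prompt_ids = tokenizer.apply_chat_template(
        messages[:-1],
        chat_template=chat_template,
        add_generation_prompt=True,
        tokenize=True,
    )
    # Ids for the full conversation, including the assistant response.
    full_ids = tokenizer.apply_chat_template(
        messages,
        chat_template=chat_template,
        add_generation_prompt=False,
        tokenize=True,
    )
    # Compute loss only on tokens the policy generated; this assumes the
    # template renders the prompt identically in both calls.
    loss_mask = [0] * len(prompt_ids) + [1] * (len(full_ids) - len(prompt_ids))
    return full_ids, loss_mask


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
    msgs = [
        {"role": "user", "content": "List the files in the current directory."},
        {"role": "assistant", "content": "ls -la"},
    ]
    ids, mask = response_ids_and_loss_mask(msgs, tok)
    print(len(ids), sum(mask))
```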
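
Finally, a generic sketch of the queue-plus-retries pattern the QueueOrchestrator item is about: a bounded pool of workers pulls tasks from a queue and retries failures with backoff. Harbor's real API is defined in laude-institute/harbor#527; everything below is hypothetical.

```python
# Generic sketch of queue-based orchestration with a concurrency limit and
# per-task retries. Names and structure are illustrative only; Harbor's actual
# QueueOrchestrator is tracked in laude-institute/harbor#527.
import asyncio


async def run_with_retries(task_id: int, max_retries: int = 3) -> str:
    for attempt in range(1, max_retries + 1):
        try:
            await asyncio.sleep(0)  # stand-in for running a terminal-use task
            return f"task {task_id}: ok"
        except Exception:
            if attempt == max_retries:
                raise
            await asyncio.sleep(2 ** attempt)  # exponential backoff before retrying


async def orchestrate(task_ids, max_concurrency: int = 4):
    queue = asyncio.Queue()
    for tid in task_ids:
        queue.put_nowait(tid)
    results = []

    async def worker():
        while True:
            try:
                tid = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            results.append(await run_with_retries(tid))

    # The number of workers bounds how many tasks run concurrently.
    await asyncio.gather(*(worker() for _ in range(max_concurrency)))
    return results


if __name__ == "__main__":
    print(asyncio.run(orchestrate(range(8))))
```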