Conversation

@tyler-griggs (Member)

Summary

This PR adds the Tinker HTTP API server, which enables external training orchestration using the SkyRL-Train backend. Tools like tinker-cookbook can run supervised fine-tuning (SFT) by making HTTP calls to control training, rather than embedding the training logic directly.

Key capabilities:

  • HTTP API for forward_backward() and optim_step() operations (see the client sketch after this list)
  • Support for LoRA training with dynamic model creation
  • External control of mini-batching and training loops
  • Per-sequence loss outputs for fine-grained control
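
As a rough sketch of how an external client might drive these operations over HTTP (the endpoint paths and payload fields below are illustrative assumptions, not the actual contract; see types.py and api.py for the real one):

# Hypothetical client sketch of one externally controlled training step.
# Endpoint paths and JSON fields are assumptions for illustration.
import requests

BASE_URL = "http://localhost:8001"

def train_step(minibatches, learning_rate):
    # Accumulate gradients over externally controlled mini-batches.
    losses = []
    for batch in minibatches:
        resp = requests.post(f"{BASE_URL}/forward_backward", json={"batch": batch})
        resp.raise_for_status()
        # Per-sequence losses give the client fine-grained control.
        losses.extend(resp.json()["per_sequence_losses"])
    # Apply the optimizer step with an externally scheduled learning rate.
    requests.post(f"{BASE_URL}/optim_step", json={"learning_rate": learning_rate}).raise_for_status()
    return sum(losses) / len(losses)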

Changes

New directories:

  • skyrl_train/tinker/ - Tinker API server and engine

    • api.py - FastAPI server with training endpoints
    • engine.py - Background process handling training requests
    • backends/skyrl_train.py - SkyRL backend implementation
    • types.py - Type definitions for API contracts
    • loss_fns.py - Loss function implementations (JAX made optional)
  • skyrl_train/tx_utils/ - Shared utilities

    • generator.py, log.py, models.py, etc.

Key modifications:

  • Updated all imports from tx.* to skyrl_train.*
  • Made JAX imports optional (only needed for the JAX backend, not SkyRL); a sketch of the guard pattern follows this list
  • Removed JAX backend references (only SkyRL backend supported)
  • Fixed engine subprocess path to use skyrl_train.tinker.engine
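
The optional-JAX change likely follows the standard soft-dependency pattern shown below; this is a sketch of the approach, and the exact guard and function names in loss_fns.py may differ:

# Sketch of the optional-import pattern that makes JAX a soft dependency.
try:
    import jax
    import jax.numpy as jnp
    HAS_JAX = True
except ImportError:
    HAS_JAX = False

def jax_cross_entropy(logits, targets):
    # JAX-only loss path; raising a clear error here lets the SkyRL
    # backend import this module without JAX installed.
    if not HAS_JAX:
        raise ImportError("JAX backend requested but jax is not installed.")
    logprobs = jax.nn.log_softmax(logits)
    picked = jnp.take_along_axis(logprobs, targets[..., None], axis=-1)
    return -picked.mean()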

Architecture

tinker-cookbook (sl_loop.py)
  └─ HTTP requests to Tinker API
       └─ skyrl-train/skyrl_train/tinker/api.py (FastAPI server)
            └─ skyrl-train/skyrl_train/tinker/engine.py (background process)
                 └─ skyrl-train/skyrl_train/tinker/backends/skyrl_train.py
                      └─ WorkerDispatch → Ray actors → PolicyWorker
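
A minimal sketch of this flow, assuming a queue-based handoff between the API process and the engine process; the queue names, endpoint shape, and single-in-flight assumption are simplifications, not the actual implementation:

# Sketch of the api.py -> engine.py handoff: the FastAPI process forwards
# each request to a background engine process over a queue pair and blocks
# on the reply.
import multiprocessing as mp

from fastapi import FastAPI

app = FastAPI()
request_queue: mp.Queue = mp.Queue()  # api -> engine (hypothetical)
result_queue: mp.Queue = mp.Queue()   # engine -> api (hypothetical)

@app.post("/forward_backward")
def forward_backward(payload: dict):
    # Hand off to the engine process and wait for its result
    # (assumes one request in flight at a time).
    request_queue.put(("forward_backward", payload))
    return result_queue.get()

def engine_loop(backend):
    # Runs in the background process (python -m skyrl_train.tinker.engine):
    # pulls operations off the queue and dispatches to the backend.
    while True:
        op, payload = request_queue.get()
        if op == "forward_backward":
            result_queue.put(backend.forward_backward(payload))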

Usage

Start the Tinker API server:

cd ~/SkyRL/skyrl-train
uv run --extra vllm python -m skyrl_train.tinker.api \
    --base-model Qwen/Qwen3-0.6B \
    --backend skyrl_train \
    --port 8001

Run supervised fine-tuning from tinker-cookbook:

cd ~/tinker-cookbook
TINKER_API_KEY=test uv run python -m tinker_cookbook.recipes.sl_loop \
    base_url="http://localhost:8001" \
    model_name="Qwen/Qwen3-0.6B" \
    batch_size=4 \
    lora_rank=8

Test Results

Successfully tested with sl_loop.py from tinker-cookbook:

  • Training runs at ~0.5-0.9s per step
  • Loss decreases as expected (from an initial 3.10, with per-step values in the 2.5-3.7 range)
  • Forward/backward takes ~0.33-0.36s per mini-batch
  • Gradient norms in expected range (678-1750)

Background

This integration was needed to enable external training orchestration for SFT workloads. The original Tinker code lived in skyrl-tx but required Flash Attention, which had environment issues there. Copying it into skyrl-train lets it use the working Flash Attention installation and the existing SkyRL infrastructure.

Next Steps

  • Add tests for Tinker API endpoints
  • Add documentation for API usage
  • Consider re-enabling checkpointing support in sl_loop
  • Evaluate performance optimizations (ray.put() for batches; see the sketch below)
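
For the ray.put() item, the idea is to serialize a large batch into Ray's object store once and share the resulting ObjectRef across actors, rather than re-pickling the batch into each remote call. A sketch (the PolicyWorker actor here is a hypothetical stand-in for the real worker):

import ray

ray.init(ignore_reinit_error=True)

@ray.remote
class PolicyWorker:  # hypothetical stand-in for the real Ray actor
    def forward_backward(self, batch):
        return len(batch["input_ids"])  # placeholder work

policy_workers = [PolicyWorker.remote() for _ in range(2)]
batch = {"input_ids": [[1, 2, 3]] * 1024}  # stand-in for a large batch

# ray.put() serializes the batch into the object store once; each remote
# call then shares the same ObjectRef instead of re-pickling the batch.
batch_ref = ray.put(batch)
results = ray.get([w.forward_backward.remote(batch_ref) for w in policy_workers])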

🤖 Generated with Claude Code

tyler-griggs and others added 6 commits January 28, 2026 00:29
This enables SkyRLTrainBackend to use AdamParams.learning_rate from
Tinker's optim_step() requests, allowing external learning rate schedules.

Changes:
- Add set_lr() to PolicyWorkerBase and CriticWorkerBase
- Add set_lr() dispatch method to WorkerDispatch
- Update SkyRLTrainBackend.optim_step() to apply learning rate before stepping
- Add GPU tests for set_lr functionality

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tinker manages learning rate externally via set_lr(), so we disable
SkyRL's internal scheduler by setting it to "constant" with no warmup.
This prevents conflicts between Tinker's LR schedule and SkyRL's scheduler.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
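
As a rough illustration of this override (the config keys below are hypothetical stand-ins, not SkyRL's actual schema):

# Hypothetical config override: pin SkyRL's scheduler to a constant
# schedule with no warmup so it never fights Tinker's externally
# applied learning rate. Key names are assumptions.
cfg.trainer.policy.optimizer_config.scheduler = "constant"
cfg.trainer.policy.optimizer_config.num_warmup_steps = 0
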
When set_lr() updates the optimizer's param_groups directly, get_lr()
needs to read from the same source. Previously get_lr() read from
scheduler.get_last_lr() which would return stale values after set_lr().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
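
Taken together, these commits amount to roughly the following pattern (a simplified sketch against a standard PyTorch optimizer; the real worker classes carry more state):

# Sketch of set_lr()/get_lr() reading and writing the same source of
# truth (the optimizer's param_groups), per the commits above.
class PolicyWorkerBase:
    def __init__(self, optimizer):
        self.optimizer = optimizer

    def set_lr(self, lr: float) -> None:
        # Write the externally scheduled learning rate into every param group.
        for group in self.optimizer.param_groups:
            group["lr"] = lr

    def get_lr(self) -> float:
        # Read back from param_groups, not scheduler.get_last_lr(), so the
        # value reflects any preceding set_lr() call.
        return self.optimizer.param_groups[0]["lr"]
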
This adds the Tinker HTTP API server that enables external training
orchestration (e.g., from tinker-cookbook sl_loop.py) using the
SkyRL-Train backend.

Key changes:
- Added skyrl_train/tinker/ with API server, engine, and SkyRL backend
- Added skyrl_train/tx_utils/ for shared utilities
- Made JAX optional in loss_fns.py (only needed for JAX backend)
- Updated all imports from tx.* to skyrl_train.*
- Fixed engine subprocess path to use skyrl_train module

This enables running supervised fine-tuning with external control:
  # Start server:
  uv run --extra vllm python -m skyrl_train.tinker.api \
      --base-model Qwen/Qwen3-0.6B --backend skyrl_train --port 8001

  # Run training from tinker-cookbook:
  TINKER_API_KEY=test uv run python -m tinker_cookbook.recipes.sl_loop \
      base_url="http://localhost:8001" model_name="Qwen/Qwen3-0.6B"

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>