Conversation

@tyler-griggs (Member)

Summary

This PR adds the Tinker HTTP API server, which enables external training orchestration using the SkyRL-Train backend. Tools like tinker-cookbook can run supervised fine-tuning (SFT) by making HTTP calls to control training, rather than embedding the training logic directly.

Key capabilities:

  • HTTP API for forward_backward() and optim_step() operations (see the client sketch after this list)
  • Support for LoRA training with dynamic model creation
  • External control of mini-batching and training loops
  • Per-sequence loss outputs for fine-grained control
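
As a rough sketch of how an external client might drive these operations over HTTP (the endpoint paths and payload fields below are illustrative assumptions, not the actual contract; see types.py and api.py for the real one):

# Hypothetical client sketch of one externally controlled training step.
# Endpoint paths and JSON fields are assumptions for illustration.
import requests

BASE_URL = "http://localhost:8001"

def train_step(minibatches, learning_rate):
    # Accumulate gradients over externally controlled mini-batches.
    losses = []
    for batch in minibatches:
        resp = requests.post(f"{BASE_URL}/forward_backward", json={"batch": batch})
        resp.raise_for_status()
        # Per-sequence losses give the client fine-grained control.
        losses.extend(resp.json()["per_sequence_losses"])
    # Apply the optimizer step with an externally scheduled learning rate.
    requests.post(f"{BASE_URL}/optim_step", json={"learning_rate": learning_rate}).raise_for_status()
    return sum(losses) / len(losses)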

Changes

New directories:

  • skyrl_train/tinker/ - Tinker API server and engine

    • api.py - FastAPI server with training endpoints
    • engine.py - Background process handling training requests
    • backends/skyrl_train.py - SkyRL backend implementation
    • types.py - Type definitions for API contracts
    • loss_fns.py - Loss function implementations (JAX made optional)
  • skyrl_train/tx_utils/ - Shared utilities

    • generator.py, log.py, models.py, etc.

Key modifications:

  • Updated all imports from tx.* to skyrl_train.*
  • Made JAX imports optional (only needed for the JAX backend, not SkyRL); a sketch of the guard pattern follows this list
  • Removed JAX backend references (only SkyRL backend supported)
  • Fixed engine subprocess path to use skyrl_train.tinker.engine
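
The optional-JAX change likely follows the standard soft-dependency pattern shown below; this is a sketch of the approach, and the exact guard and function names in loss_fns.py may differ:

# Sketch of the optional-import pattern that makes JAX a soft dependency.
try:
    import jax
    import jax.numpy as jnp
    HAS_JAX = True
except ImportError:
    HAS_JAX = False

def jax_cross_entropy(logits, targets):
    # JAX-only loss path; raising a clear error here lets the SkyRL
    # backend import this module without JAX installed.
    if not HAS_JAX:
        raise ImportError("JAX backend requested but jax is not installed.")
    logprobs = jax.nn.log_softmax(logits)
    picked = jnp.take_along_axis(logprobs, targets[..., None], axis=-1)
    return -picked.mean()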

Architecture

tinker-cookbook (sl_loop.py)
  └─ HTTP requests to Tinker API
       └─ skyrl-train/skyrl_train/tinker/api.py (FastAPI server)
            └─ skyrl-train/skyrl_train/tinker/engine.py (background process)
                 └─ skyrl-train/skyrl_train/tinker/backends/skyrl_train.py
                      └─ WorkerDispatch → Ray actors → PolicyWorker
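
A minimal sketch of this flow, assuming a queue-based handoff between the API process and the engine process; the queue names, endpoint shape, and single-in-flight assumption are simplifications, not the actual implementation:

# Sketch of the api.py -> engine.py handoff: the FastAPI process forwards
# each request to a background engine process over a queue pair and blocks
# on the reply.
import multiprocessing as mp

from fastapi import FastAPI

app = FastAPI()
request_queue: mp.Queue = mp.Queue()  # api -> engine (hypothetical)
result_queue: mp.Queue = mp.Queue()   # engine -> api (hypothetical)

@app.post("/forward_backward")
def forward_backward(payload: dict):
    # Hand off to the engine process and wait for its result
    # (assumes one request in flight at a time).
    request_queue.put(("forward_backward", payload))
    return result_queue.get()

def engine_loop(backend):
    # Runs in the background process (python -m skyrl_train.tinker.engine):
    # pulls operations off the queue and dispatches to the backend.
    while True:
        op, payload = request_queue.get()
        if op == "forward_backward":
            result_queue.put(backend.forward_backward(payload))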

Usage

Start the Tinker API server:

cd ~/SkyRL/skyrl-train
uv run --extra vllm python -m skyrl_train.tinker.api \
    --base-model Qwen/Qwen3-0.6B \
    --backend skyrl_train \
    --port 8001

Run supervised fine-tuning from tinker-cookbook:

cd ~/tinker-cookbook
TINKER_API_KEY=test uv run python -m tinker_cookbook.recipes.sl_loop \
    base_url="http://localhost:8001" \
    model_name="Qwen/Qwen3-0.6B" \
    batch_size=4 \
    lora_rank=8

Test Results

Successfully tested with sl_loop.py from tinker-cookbook:

  • Training runs at ~0.5-0.9s per step
  • Loss decreases as expected (from an initial 3.10, with per-step values in the 2.5-3.7 range)
  • Forward/backward takes ~0.33-0.36s per mini-batch
  • Gradient norms in expected range (678-1750)

Background

This integration was needed to enable external training orchestration for SFT workloads. The original Tinker code lived in skyrl-tx but required Flash Attention, which had environment issues there. Copying it into skyrl-train lets it use the working Flash Attention installation and the existing SkyRL infrastructure.

Next Steps

  • Add tests for Tinker API endpoints
  • Add documentation for API usage
  • Consider re-enabling checkpointing support in sl_loop
  • Evaluate performance optimizations (ray.put() for batches; see the sketch below)
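
For the ray.put() item, the idea is to serialize a large batch into Ray's object store once and share the resulting ObjectRef across actors, rather than re-pickling the batch into each remote call. A sketch (the PolicyWorker actor here is a hypothetical stand-in for the real worker):

import ray

ray.init(ignore_reinit_error=True)

@ray.remote
class PolicyWorker:  # hypothetical stand-in for the real Ray actor
    def forward_backward(self, batch):
        return len(batch["input_ids"])  # placeholder work

policy_workers = [PolicyWorker.remote() for _ in range(2)]
batch = {"input_ids": [[1, 2, 3]] * 1024}  # stand-in for a large batch

# ray.put() serializes the batch into the object store once; each remote
# call then shares the same ObjectRef instead of re-pickling the batch.
batch_ref = ray.put(batch)
results = ray.get([w.forward_backward.remote(batch_ref) for w in policy_workers])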

🤖 Generated with Claude Code

tyler-griggs and others added 6 commits January 28, 2026 00:29
This enables SkyRLTrainBackend to use AdamParams.learning_rate from
Tinker's optim_step() requests, allowing external learning rate schedules.

Changes:
- Add set_lr() to PolicyWorkerBase and CriticWorkerBase
- Add set_lr() dispatch method to WorkerDispatch
- Update SkyRLTrainBackend.optim_step() to apply learning rate before stepping
- Add GPU tests for set_lr functionality

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tinker manages learning rate externally via set_lr(), so we disable
SkyRL's internal scheduler by setting it to "constant" with no warmup.
This prevents conflicts between Tinker's LR schedule and SkyRL's scheduler.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
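
As a rough illustration of this override (the config keys below are hypothetical stand-ins, not SkyRL's actual schema):

# Hypothetical config override: pin SkyRL's scheduler to a constant
# schedule with no warmup so it never fights Tinker's externally
# applied learning rate. Key names are assumptions.
cfg.trainer.policy.optimizer_config.scheduler = "constant"
cfg.trainer.policy.optimizer_config.num_warmup_steps = 0
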
When set_lr() updates the optimizer's param_groups directly, get_lr()
needs to read from the same source. Previously get_lr() read from
scheduler.get_last_lr() which would return stale values after set_lr().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
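
Taken together, these commits amount to roughly the following pattern (a simplified sketch against a standard PyTorch optimizer; the real worker classes carry more state):

# Sketch of set_lr()/get_lr() reading and writing the same source of
# truth (the optimizer's param_groups), per the commits above.
class PolicyWorkerBase:
    def __init__(self, optimizer):
        self.optimizer = optimizer

    def set_lr(self, lr: float) -> None:
        # Write the externally scheduled learning rate into every param group.
        for group in self.optimizer.param_groups:
            group["lr"] = lr

    def get_lr(self) -> float:
        # Read back from param_groups, not scheduler.get_last_lr(), so the
        # value reflects any preceding set_lr() call.
        return self.optimizer.param_groups[0]["lr"]
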
This adds the Tinker HTTP API server that enables external training
orchestration (e.g., from tinker-cookbook sl_loop.py) using the
SkyRL-Train backend.

Key changes:
- Added skyrl_train/tinker/ with API server, engine, and SkyRL backend
- Added skyrl_train/tx_utils/ for shared utilities
- Made JAX optional in loss_fns.py (only needed for JAX backend)
- Updated all imports from tx.* to skyrl_train.*
- Fixed engine subprocess path to use skyrl_train module

This enables running supervised fine-tuning with external control:
  # Start server:
  uv run --extra vllm python -m skyrl_train.tinker.api \
      --base-model Qwen/Qwen3-0.6B --backend skyrl_train --port 8001

  # Run training from tinker-cookbook:
  TINKER_API_KEY=test uv run python -m tinker_cookbook.recipes.sl_loop \
      base_url="http://localhost:8001" model_name="Qwen/Qwen3-0.6B"

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>