@dzorlu dzorlu commented Jan 27, 2026

Summary

Fixes rope_scaling handling for both vLLM inference engines and HuggingFace FSDP training to enable YaRN context extension.

Problem

When using a rope_scaling config (e.g., YaRN for context extension), training fails with:

  1. vLLM: TypeError: AsyncEngineArgs.__init__() got an unexpected keyword argument 'rope_scaling'
  2. HuggingFace: TypeError: Qwen3ForCausalLM.__init__() got an unexpected keyword argument 'rope_scaling'

Solution

vLLM Fix (ray_wrapped_inference_engine.py)

  • vLLM >= 0.8.3 no longer accepts rope_scaling as a direct engine kwarg; rope config must be passed via hf_overrides["rope_parameters"] (reference: https://docs.vllm.ai/en/latest/examples/offline_inference/context_extension/)
  • The OmegaConf DictConfig is converted to a regular dict to avoid struct-mode errors

# Before (wrong)
rope_engine_kwargs["rope_scaling"] = rope_scaling

# After (correct)
hf_overrides["rope_parameters"] = dict(rope_scaling)
rope_engine_kwargs["hf_overrides"] = hf_overrides

HuggingFace Fix (model_wrapper.py)

  • HuggingFace models don't accept rope_scaling as a from_pretrained() kwarg
  • Must set on the model config object instead

# Before (wrong)
self.model = model_class.from_pretrained(..., rope_scaling=rope_scaling)

# After (correct)
model_config.rope_scaling = dict(rope_scaling)
self.model = model_class.from_pretrained(..., config=model_config)
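
A standalone sketch of the config-based loading path (the model name and rope values below are illustrative):

from transformers import AutoConfig, AutoModelForCausalLM

rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768,
}

# Set rope_scaling on the config object, not as a from_pretrained() kwarg.
model_config = AutoConfig.from_pretrained("Qwen/Qwen3-8B")
model_config.rope_scaling = dict(rope_scaling)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", config=model_config)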

Files Changed

  • skyrl_train/inference_engines/ray_wrapped_inference_engine.py
  • skyrl_train/model_wrapper.py

Test Plan

  • Run training with YaRN rope scaling config:
    ++trainer.rope_scaling.rope_type=yarn
    ++trainer.rope_scaling.factor=2.0
    ++trainer.rope_scaling.original_max_position_embeddings=32768
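
With factor=2.0 over an original 32768-token window, YaRN extends the effective context to 32768 × 2 = 65536 tokens, so training on sequences longer than 32768 tokens should exercise both fixes without either TypeError above.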

🤖 Generated with Claude Code

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request resolves the TypeError issues related to rope_scaling in both the vLLM inference engines and HuggingFace FSDP training. The changes correctly adapt how rope_scaling and rope_theta are passed, matching the updated requirements of vLLM (via hf_overrides["rope_parameters"]) and HuggingFace (set directly on model_config). The implementation also handles OmegaConf DictConfig conversion robustly, preventing runtime errors. The solution is well-aligned with the problem description and provides a clear fix for context extension with YaRN.

dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Jan 27, 2026
paper.md:
- Simplified overview (no infrastructure details)
- Added per-environment breakdown tables for v0.1 and v0.2
- Removed Critical Issues section (moved to model_issues.md)
- Added empty results tables for held-out environments
- Reference model_issues.md for fixes

model_issues.md:
- Added v0.2.1 changelog entry for rope_scaling fix
- Links to upstream PR NovaSky-AI#976

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>