Β 
switchAILocal Logo

switchAILocal

One local endpoint. All your AI providers.

Quick Start • Installation • Setup Providers • API Reference


What is switchAILocal?

switchAILocal is a unified API gateway that lets you use all your AI providers through a single OpenAI-compatible endpoint running on your machine.

Key Benefits

| Feature | Description |
| --- | --- |
| 🎨 Modern Web UI | Single-file React dashboard to configure providers, manage model routing, and adjust settings (226 KB, zero dependencies) |
| 🔑 Use Your Subscriptions | Connect Gemini CLI, Claude Code, Codex, Ollama, and more; no API keys needed |
| 🎯 Single Endpoint | Any OpenAI-compatible tool works with `http://localhost:18080` |
| 📎 CLI Attachments | Pass files and folders directly to CLI providers via `extra_body.cli` (see the sketch below) |
| 🧠 Superbrain Intelligence | Autonomous self-healing: monitors executions, diagnoses failures with AI, auto-responds to prompts, restarts with corrective flags, and routes to fallback providers |
| ⚖️ Load Balancing | Round-robin across multiple accounts per provider |
| 🔄 Intelligent Failover | Smart routing to alternatives based on capabilities and success rates |
| 🔒 Local-First | Everything runs on your machine; your data never leaves it |
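
Any OpenAI SDK can drive the CLI-attachment feature through `extra_body`. Below is a minimal sketch; the exact shape of the `cli` payload (an `attachments` list of paths here) is an assumption for illustration, so check the Advanced Features guide for the actual schema.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="sk-test-123")

# Hypothetical payload shape: extra_body.cli is documented above, but the
# "attachments" key and path-list format are illustrative assumptions.
completion = client.chat.completions.create(
    model="geminicli:gemini-2.5-pro",
    messages=[{"role": "user", "content": "Summarize these files."}],
    extra_body={"cli": {"attachments": ["./README.md", "./docs/"]}},
)
print(completion.choices[0].message.content)
```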

Supported Providers

CLI Tools (Use Your Paid Subscriptions)

| Provider | CLI Tool | Prefix | Status |
| --- | --- | --- | --- |
| Google Gemini | `gemini` | `geminicli:` | ✅ Ready |
| Anthropic Claude | `claude` | `claudecli:` | ✅ Ready |
| OpenAI Codex | `codex` | `codex:` | ✅ Ready |
| Mistral Vibe | `vibe` | `vibe:` | ✅ Ready |
| OpenCode | `opencode` | `opencode:` | ✅ Ready |

Local Models

| Provider | Prefix | Status |
| --- | --- | --- |
| Ollama | `ollama:` | ✅ Ready |
| LM Studio | `lmstudio:` | ✅ Ready |

Cloud APIs

| Provider | Prefix | Status |
| --- | --- | --- |
| Traylinx switchAI | `switchai:` | ✅ Ready |
| Google AI Studio | `gemini:` | ✅ Ready |
| Anthropic API | `claude:` | ✅ Ready |
| OpenAI API | `openai:` | ✅ Ready |
| OpenRouter | `openai-compat:` | ✅ Ready |

Quick Start

1. Clone & Start (The Easy Way)

We provide a unified Hub Script (`ail.sh`) to manage everything.

```bash
git clone https://github.com/traylinx/switchAILocal.git
cd switchAILocal

# Start locally (builds automatically)
./ail.sh start

# OR start with Docker (add --build to force rebuild)
./ail.sh start --docker --build
```

2. Connect Your Providers

Choose the authentication method that works best for you:

Option A: Local CLI Wrappers (Recommended - Zero Setup)

If you already have the `gemini`, `claude`, or `vibe` CLI tools installed and authenticated, switchAILocal uses them automatically. No additional login required!

```bash
# Just use the CLI prefix - it works immediately
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "geminicli:gemini-2.5-pro", "messages": [...]}'
```

- ✅ Zero configuration - uses your existing CLI authentication
- ✅ Works immediately - no `--login` needed
- ✅ Supports: `geminicli:`, `claudecli:`, `codex:`, `vibe:`, `opencode:`

Option B: API Keys (Standard)

Add your AI Studio or Anthropic API keys to `config.yaml`:

```yaml
gemini:
  api-key: "your-gemini-api-key"
claude:
  api-key: "your-claude-api-key"
```

Then use the prefixes without the `cli` suffix: `gemini:`, `claude:`.
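
For example, once the keys are in place, the same model routes through the API-key provider (a minimal sketch; the model id is one documented elsewhere in this README):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="sk-test-123")

# "gemini:" (no "cli") selects the Google AI Studio API-key provider
completion = client.chat.completions.create(
    model="gemini:gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```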

Option C: OAuth Cloud Proxy (Advanced - Alternative to CLI)

Only needed if:

- ❌ You don't have the CLI tools installed
- ❌ You don't have API keys
- ✅ You want switchAILocal to manage OAuth tokens directly

```bash
# Optional OAuth login (alternative to CLI wrappers)
./switchAILocal --login        # Google Gemini OAuth
./switchAILocal --claude-login # Anthropic Claude OAuth
```

⚠️ Note: This requires the `GEMINI_CLIENT_ID` and `GEMINI_CLIENT_SECRET` environment variables. Most users should use Option A (CLI wrappers) instead.
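
If you do go this route, export the credentials before logging in (a minimal sketch; the placeholder values are hypothetical):

```bash
# Hypothetical values - substitute your own Google OAuth client credentials
export GEMINI_CLIENT_ID="your-client-id.apps.googleusercontent.com"
export GEMINI_CLIENT_SECRET="your-client-secret"
./switchAILocal --login
```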

📖 See the Provider Guide for detailed setup instructions.

3. Check Status

```bash
./ail.sh status
```

The server runs on `http://localhost:18080`.


Usage Examples

Basic Request (Auto-Routing)

When you omit the provider prefix, switchAILocal automatically routes to an available provider:

```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Explicit Provider Selection

Use the `provider:model` format to route to a specific provider:

```bash
# Force Gemini CLI
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "geminicli:gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Force Ollama
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "ollama:llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Force Claude CLI
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "claudecli:claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Force LM Studio
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "lmstudio:mistral-7b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

List Available Models

```bash
curl http://localhost:18080/v1/models \
  -H "Authorization: Bearer sk-test-123"
```

SDK Integration

Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",  # Must match a key in config.yaml
)

# Recommended: Auto-routing (switchAILocal picks the best available provider)
completion = client.chat.completions.create(
    model="gemini-2.5-pro",  # No prefix = auto-route to any logged-in provider
    messages=[
        {"role": "user", "content": "What is the meaning of life?"}
    ]
)
print(completion.choices[0].message.content)

# Streaming example
stream = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

# Optional: Explicit provider selection (use prefix only when needed)
completion = client.chat.completions.create(
    model="ollama:llama3.2",  # Force Ollama provider
    messages=[{"role": "user", "content": "Hello!"}]
)
```

JavaScript/Node.js (OpenAI SDK)

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:18080/v1',
  apiKey: 'sk-test-123', // Must match a key in config.yaml
});

async function main() {
  // Auto-routing
  const completion = await client.chat.completions.create({
    model: 'gemini-2.5-pro',
    messages: [
      { role: 'user', content: 'What is the meaning of life?' }
    ],
  });

  console.log(completion.choices[0].message.content);

  // Explicit provider selection
  const ollamaResponse = await client.chat.completions.create({
    model: 'ollama:llama3.2',  // Force Ollama
    messages: [
      { role: 'user', content: 'Hello!' }
    ],
  });
}

main();
```

Streaming Example (Python)

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

stream = client.chat.completions.create(
    model="geminicli:gemini-2.5-pro",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Configuration

All settings are in `config.yaml`. Copy the example to get started:

```bash
cp config.example.yaml config.yaml
```

Key configuration options:

```yaml
# Server port (default: 18080)
port: 18080

# Enable Ollama integration
ollama:
  enabled: true
  base-url: "http://localhost:11434"

# Enable LM Studio
lmstudio:
  enabled: true
  base-url: "http://localhost:1234/v1"

# Enable LUA plugins for request/response modification
plugin:
  enabled: true
  plugin-dir: "./plugins"
```

📖 See Configuration Guide for all options.


Cortex Router: Intelligent Model Selection

The Cortex Router plugin provides intelligent, multi-tier routing that automatically selects the optimal model based on request content.

Quick Start

Enable intelligent routing in `config.yaml`:

```yaml
plugin:
  enabled: true
  enabled-plugins:
    - "cortex-router"

intelligence:
  enabled: true
  router-model: "ollama:qwen:0.5b"  # Fast classification model
  matrix:
    coding: "switchai-chat"
    reasoning: "switchai-reasoner"
    fast: "switchai-fast"
    secure: "ollama:llama3.2"  # Local model for sensitive data
```

How It Works

When you use `model="auto"` or `model="cortex"`, the router analyzes your request through multiple tiers:

1. Reflex Tier (<1ms): Pattern matching for obvious cases (code blocks → coding model, PII → secure model)
2. Semantic Tier (<20ms): Embedding-based intent matching (requires Phase 2)
3. Cognitive Tier (200-500ms): LLM-based classification with confidence scoring

```python
# Automatic intelligent routing
completion = client.chat.completions.create(
    model="auto",  # Let Cortex Router decide
    messages=[{"role": "user", "content": "Write a Python function to sort a list"}]
)
# → Routes to the coding model automatically
```

Phase 2 Features (Optional)

Enable advanced features for even smarter routing:

```yaml
intelligence:
  enabled: true

  # Semantic matching (faster than LLM classification)
  embedding:
    enabled: true
  semantic-tier:
    enabled: true

  # Skill-based prompt augmentation
  skill-matching:
    enabled: true

  # Quality-based model cascading
  cascade:
    enabled: true
```

21 Pre-built Skills, including:

- Language experts (Go, Python, TypeScript)
- Infrastructure (Docker, Kubernetes, DevOps)
- Security, Testing, Debugging
- Frontend, Vision, and more
📖 See Cortex Router Phase 2 Guide for full documentation.


Documentation

For Users

| Guide | Description |
| --- | --- |
| Installation | Getting started guide |
| Configuration | All configuration options |
| Providers | Setting up AI providers |
| API Reference | REST API documentation |
| Intelligent Systems | Memory, Heartbeat, Steering, and Hooks |
| Advanced Features | Payload overrides, failover, and more |
| State Box | Secure state management & configuration |
| Management Dashboard | Modern web UI for provider setup, model routing & settings |

Build from Source

```bash
# Build the main server
go build -o switchAILocal ./cmd/server

# Build the Management UI (optional)
./ail_ui.sh
```
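
Once built, the server binary can presumably be launched directly with `config.yaml` in the working directory; this is an assumption, and `./ail.sh start` remains the documented path.

```bash
# Assumption: running the binary with no flags serves on the configured port
./switchAILocal
```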

For Developers

| Guide | Description |
| --- | --- |
| SDK Usage | Embed switchAILocal in your Go apps |
| LUA Plugins | Custom request/response hooks |
| SDK Advanced | Create custom providers |

Contributing

Contributions are welcome!

  1. Fork the repository
  2. Create your feature branch (`git checkout -b feature/amazing-feature`)
  3. Commit your changes
  4. Push and open a Pull Request

License

MIT License - see LICENSE for details.


Maintained by Sebastian Schkudlara
