One local endpoint. All your AI providers.
Quick Start • Installation • Setup Providers • API Reference
switchAILocal is a unified API gateway that lets you use all your AI providers through a single OpenAI-compatible endpoint running on your machine.
| Feature | Description |
|---|---|
| Modern Web UI | Single-file React dashboard to configure providers, manage model routing, and adjust settings (226 KB, zero dependencies) |
| Use Your Subscriptions | Connect Gemini CLI, Claude Code, Codex, Ollama, and more, with no API keys needed |
| Single Endpoint | Any OpenAI-compatible tool works with http://localhost:18080 |
| CLI Attachments | Pass files and folders directly to CLI providers via `extra_body.cli` |
| Superbrain Intelligence | Autonomous self-healing: monitors executions, diagnoses failures with AI, auto-responds to prompts, restarts with corrective flags, and routes to fallback providers |
| Load Balancing | Round-robin across multiple accounts per provider |
| Intelligent Failover | Smart routing to alternatives based on capabilities and success rates |
| Local-First | Everything runs on your machine; your data never leaves |
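The `extra_body.cli` attachment feature above is not shown elsewhere in this README, so here is a hedged sketch of what such a request body might look like. The `cli`/`paths` schema is an assumption for illustration only; with the official OpenAI Python SDK you would pass the same object via `extra_body`, which the SDK merges into the request JSON.

```python
import json

# Hypothetical request body for CLI attachments. The README names the
# field `extra_body.cli` but does not document its schema, so the
# {"paths": [...]} shape below is an assumption, not confirmed API.
body = {
    "model": "geminicli:gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Summarize these files"}],
    "cli": {"paths": ["./README.md", "./docs/"]},  # assumed schema
}
print(json.dumps(body, indent=2))
```

With the OpenAI SDK the equivalent call would be `client.chat.completions.create(..., extra_body={"cli": {...}})`, since `extra_body` fields are merged into the JSON payload the SDK sends.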
| Provider | CLI Tool | Prefix | Status |
|---|---|---|---|
| Google Gemini | `gemini` | `geminicli:` | ✅ Ready |
| Anthropic Claude | `claude` | `claudecli:` | ✅ Ready |
| OpenAI Codex | `codex` | `codex:` | ✅ Ready |
| Mistral Vibe | `vibe` | `vibe:` | ✅ Ready |
| OpenCode | `opencode` | `opencode:` | ✅ Ready |
| Provider | Prefix | Status |
|---|---|---|
| Ollama | `ollama:` | ✅ Ready |
| LM Studio | `lmstudio:` | ✅ Ready |
| Provider | Prefix | Status |
|---|---|---|
| Traylinx switchAI | `switchai:` | ✅ Ready |
| Google AI Studio | `gemini:` | ✅ Ready |
| Anthropic API | `claude:` | ✅ Ready |
| OpenAI API | `openai:` | ✅ Ready |
| OpenRouter | `openai-compat:` | ✅ Ready |
We provide a unified Hub Script (`ail.sh`) to manage everything.
```bash
git clone https://github.com/traylinx/switchAILocal.git
cd switchAILocal

# Start locally (builds automatically)
./ail.sh start

# OR start with Docker (add --build to force rebuild)
./ail.sh start --docker --build
```

Choose the authentication method that works best for you:
If you already have gemini, claude, or vibe CLI tools installed and authenticated, switchAILocal uses them automatically. No additional login required!
```bash
# Just use the CLI prefix - it works immediately
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "geminicli:gemini-2.5-pro", "messages": [...]}'
```

- ✅ Zero configuration - uses your existing CLI authentication
- ✅ Works immediately - no `--login` needed
- ✅ Supports: `geminicli:`, `claudecli:`, `codex:`, `vibe:`, `opencode:`
Add your AI Studio or Anthropic API keys to `config.yaml`:

```yaml
gemini:
  api-key: "your-gemini-api-key"

claude:
  api-key: "your-claude-api-key"
```

Then use the prefixes without the `cli` suffix: `gemini:`, `claude:`
Only needed if:
- ❌ You don't have the CLI tools installed
- ❌ You don't have API keys
- ❌ You want switchAILocal to manage OAuth tokens directly
```bash
# Optional OAuth login (alternative to CLI wrappers)
./switchAILocal --login          # Google Gemini OAuth
./switchAILocal --claude-login   # Anthropic Claude OAuth
```

Google OAuth requires the `GEMINI_CLIENT_ID` and `GEMINI_CLIENT_SECRET` environment variables. Most users should use Option A (CLI wrappers) instead.
📖 See the Provider Guide for detailed setup instructions.
```bash
./ail.sh status
```

The server runs on http://localhost:18080.
When you omit the provider prefix, switchAILocal automatically routes to an available provider:
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Use the `provider:model` format to route to a specific provider:
```bash
# Force Gemini CLI
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "geminicli:gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Force Ollama
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "ollama:llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Force Claude CLI
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "claudecli:claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Force LM Studio
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "lmstudio:mistral-7b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

List the available models:

```bash
curl http://localhost:18080/v1/models \
  -H "Authorization: Bearer sk-test-123"
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",  # Must match a key in config.yaml
)

# Recommended: Auto-routing (switchAILocal picks the best available provider)
completion = client.chat.completions.create(
    model="gemini-2.5-pro",  # No prefix = auto-route to any logged-in provider
    messages=[
        {"role": "user", "content": "What is the meaning of life?"}
    ]
)
print(completion.choices[0].message.content)

# Streaming example
stream = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

# Optional: Explicit provider selection (use prefix only when needed)
completion = client.chat.completions.create(
    model="ollama:llama3.2",  # Force Ollama provider
    messages=[{"role": "user", "content": "Hello!"}]
)
```

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:18080/v1',
  apiKey: 'sk-test-123', // Must match a key in config.yaml
});

async function main() {
  // Auto-routing
  const completion = await client.chat.completions.create({
    model: 'gemini-2.5-pro',
    messages: [
      { role: 'user', content: 'What is the meaning of life?' }
    ],
  });
  console.log(completion.choices[0].message.content);

  // Explicit provider selection
  const ollamaResponse = await client.chat.completions.create({
    model: 'ollama:llama3.2', // Force Ollama
    messages: [
      { role: 'user', content: 'Hello!' }
    ],
  });
}

main();
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

stream = client.chat.completions.create(
    model="geminicli:gemini-2.5-pro",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

All settings are in `config.yaml`. Copy the example to get started:
```bash
cp config.example.yaml config.yaml
```

Key configuration options:

```yaml
# Server port (default: 18080)
port: 18080

# Enable Ollama integration
ollama:
  enabled: true
  base-url: "http://localhost:11434"

# Enable LM Studio
lmstudio:
  enabled: true
  base-url: "http://localhost:1234/v1"

# Enable LUA plugins for request/response modification
plugin:
  enabled: true
  plugin-dir: "./plugins"
```

📖 See the Configuration Guide for all options.
The Cortex Router plugin provides intelligent, multi-tier routing that automatically selects the optimal model based on request content.
Enable intelligent routing in config.yaml:
```yaml
plugin:
  enabled: true
  enabled-plugins:
    - "cortex-router"

intelligence:
  enabled: true
  router-model: "ollama:qwen:0.5b"  # Fast classification model
  matrix:
    coding: "switchai-chat"
    reasoning: "switchai-reasoner"
    fast: "switchai-fast"
    secure: "ollama:llama3.2"  # Local model for sensitive data
```

When you use `model="auto"` or `model="cortex"`, the router analyzes your request through multiple tiers:
- Reflex Tier (<1ms): Pattern matching for obvious cases (code blocks → coding model, PII → secure model)
- Semantic Tier (<20ms): Embedding-based intent matching (requires Phase 2)
- Cognitive Tier (200-500ms): LLM-based classification with confidence scoring
```python
# Automatic intelligent routing
completion = client.chat.completions.create(
    model="auto",  # Let Cortex Router decide
    messages=[{"role": "user", "content": "Write a Python function to sort a list"}]
)
# → Routes to a coding model automatically
```

Enable advanced features for even smarter routing:
```yaml
intelligence:
  enabled: true

  # Semantic matching (faster than LLM classification)
  embedding:
    enabled: true
  semantic-tier:
    enabled: true

  # Skill-based prompt augmentation
  skill-matching:
    enabled: true

  # Quality-based model cascading
  cascade:
    enabled: true
```

21 pre-built Skills are included:
- Language experts (Go, Python, TypeScript)
- Infrastructure (Docker, Kubernetes, DevOps)
- Security, Testing, Debugging
- Frontend, Vision, and more
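One way to picture the quality-based cascading option: call the cheapest model first and escalate only when a quality check fails. This is a conceptual sketch with stand-in model names, a stub scorer, and a fake `call_model`; it is not switchAILocal's implementation.

```python
# Conceptual sketch of quality-based cascading. `call_model`,
# `quality_score`, and the tier names are all stand-ins for illustration.

def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real chat.completions call through the gateway.
    return f"[{model}] answer to: {prompt}"

def quality_score(answer: str) -> float:
    # Stub heuristic; a real cascade might use length checks,
    # self-rating, or a verifier model.
    return 0.4 if answer.startswith("[switchai-fast]") else 0.9

def cascade(prompt: str, tiers=("switchai-fast", "switchai-chat"),
            threshold: float = 0.7) -> str:
    answer = ""
    for model in tiers:
        answer = call_model(model, prompt)
        if quality_score(answer) >= threshold:
            break  # good enough; stop escalating
    return answer

print(cascade("Explain goroutines"))
```

The design trade-off is latency versus quality: most requests stop at the fast tier, and only low-scoring answers pay for a second, stronger model.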
📖 See the Cortex Router Phase 2 Guide for full documentation.
| Guide | Description |
|---|---|
| Installation | Getting started guide |
| Configuration | All configuration options |
| Providers | Setting up AI providers |
| API Reference | REST API documentation |
| Intelligent Systems | Memory, Heartbeat, Steering, and Hooks |
| Advanced Features | Payload overrides, failover, and more |
| State Box | Secure state management & configuration |
| Management Dashboard | Modern web UI for provider setup, model routing & settings |
```bash
# Build the main server
go build -o switchAILocal ./cmd/server

# Build the Management UI (optional)
./ail_ui.sh
```

| Guide | Description |
|---|---|
| SDK Usage | Embed switchAILocal in your Go apps |
| LUA Plugins | Custom request/response hooks |
| SDK Advanced | Create custom providers |
Contributions are welcome!
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes
- Push and open a Pull Request
MIT License - see LICENSE for details.