LiteLLM Proxy 多模型路由實戰

LiteLLM 是什麼？

LiteLLM 是一個 LLM API 代理，提供標準的 OpenAI 兼容接口，背後可以路由到 100+ LLM 提供商。

Client → LiteLLM Proxy (:4000) → DeepSeek / ModelStudio / Ollama / ...

安裝

pip install litellm[proxy]

配置

litellm_config.yaml：

model_list:
  - model_name: deepseek-flash
    litellm_params:
      model: deepseek/deepseek-chat
      api_key: sk-xxx
  - model_name: deepseek-pro
    litellm_params:
      model: deepseek/deepseek-reasoner
      api_key: sk-xxx
  - model_name: qwen-max
    litellm_params:
      model: dashscope/qwen-max
      api_key: sk-xxx
  - model_name: local-qwen
    litellm_params:
      model: ollama/qwen2.5:7b
      api_base: http://localhost:11434

router_settings:
  routing_strategy: "usage-based"  # 負載均衡
  allowed_fails: 3
  num_retries: 2
  fallbacks:
    - deepseek-flash: [qwen-max, local-qwen]

general_settings:
  master_key: sk-litellm-admin

啟動：

litellm --config litellm_config.yaml --port 4000

在 Claude Code 中使用

{
  "apiKey": "sk-litellm-admin",
  "baseURL": "http://localhost:4000/v1",
  "model": "deepseek-flash"
}

成本追蹤

LiteLLM 自動記錄每次調用的 token 使用和成本。訪問 http://localhost:4000/ui 查看 Dashboard。

LiteLLM 是什麼？

安裝

配置

在 Claude Code 中使用

成本追蹤

推薦閱讀

相關文章

Claude Code 深度使用指南

DeepSeek V4 Flash / V4 Pro API 最佳實踐與效能對比

其他推理框架對比：llama.cpp、MLX、ExLlamaV2 完整指南

技能資訊