kpt-labs-vault/05-Dashboard/Ideen/LiteLLM-Integration.md

# LiteLLM Integration — Zukünftiger Plan

> **Status:** Zurückgestellt ( Juni 2026 )
> **Grund:** Aktuell wird alles direkt über Ollama (lokal) und OpenRouter (API) genutzt. LiteLLM als Proxy ist nicht notwendig solange wir nur 2-3 Provider haben.

## Was ist LiteLLM?

**LiteLLM** ist ein **OpenAI-kompatibler LLM Proxy / API Gateway**.

```
Dein Code → LiteLLM Proxy (:4000) → OpenRouter / Ollama / Anthropic / Google / ...
```

### Vorteile

| Feature | Beschreibung |
|---------|-------------|
| **Einheitliche API** | OpenAI-kompatible API für ALLE Anbieter |
| **Multi-Provider Fallback** | Automatischer Wechsel bei Ausfall |
| **Kosten-Tracking** | Token-Nutzung pro Anbieter tracken |
| **Rate Limiting** | Zentrale Kontrolle |
| **Load Balancing** | Requests auf mehrere Keys verteilen |

## Wann LiteLLM sinnvoll wird

- **5+ AI-Provider** gleichzeitig
- **Automatisches Failover** zwischen Anbietern nötig
- **Kosten-Tracking** pro Team/User
- **Rate Limiting** für verschiedene User-Gruppen
- **Model-Routing** (einfache Anfragen → billiges Modell, komplexe → teures)

## Konfiguration (für später)

### docker-compose.yml
```yaml
litellm:
  image: ghcr.io/berriai/litellm:latest
  ports:
    - "4000:4000"
  volumes:
    - ./litellm-config.yaml:/app/config.yaml
  environment:
    - OPENROUTER_API_KEY=${OPENROUTER_KEY_PRIMARY}
    - OPENROUTER_FALLBACK_KEY=${OPENROUTER_KEY_FALLBACK1}
    - OLLAMA_BASE_URL=http://ollama:11434
```

### litellm-config.yaml
```yaml:
model_list:
  - model_name: hermes-default
    litellm_params:
      model: openrouter/anthropic/claude-sonnet-4
      api_key: ${OPENROUTER_KEY_PRIMARY}
  - model_name: hermes-fast
    litellm_params:
      model: openrouter/google/gemini-2.0-flash-001:free
      api_key: ${OPENROUTER_KEY_FALLBACK1}
  - model_name: hermes-local
    litellm_params:
      model: ollama/llama3.1:8b
      api_base: http://ollama:11434

fallbacks:
  - hermes-default: [hermes-fast, hermes-local]
  - hermes-fast: [hermes-local]
```

### Nächste Schritte (wenn implementiert)

1. LiteLLM Container in docker-compose.yml
2. Config mit allen Providern
3. Scripts auf `localhost:4000/v1/chat/completions` umstellen
4. Fallback-Konfiguration testen
5. Kosten-Tracking Dashboard

## Aktueller Stand (Juni 2026)

- ❌ LiteLLM Container läuft OHNE Konfiguration
- ✅ Ollama direkt: `http://localhost:11434`
- ✅ OpenRouter direkt: `https://openrouter.ai/api/v1`
- ✅ API Key Rotation: PRIMARY → FALLBACK1 → Ollama

## Referenzen

- Docs: https://docs.litellm.ai/
- GitHub: https://github.com/BerriAI/litellm
- Docker: https://docs.litellm.ai/docs/proxy/docker