kpt-labs-vault/05-Dashboard/Ideen/LiteLLM-Re-Implementation.md

# LiteLLM Re-Implementierungsplan

> Erstellt: 2026-06-22
> Status: ⏳ Idee / Backlog

## 🎯 Ziel

LiteLLM als **LLM Proxy / API Gateway** wieder einfach nutzen — aber diesmal **richtig konfiguriert** und **optional** (Fallback auf direkte Ollama/OpenRouter-Calls).

## 📋 Aktueller Stand (Juni 2026)

- ❌ LiteLLM Container wurde entfernt
- ❌ Scanner, Memory (ChromaDB), Redis wurden entfernt
- ✅ Ollama läuft direkt auf localhost:11434
- ✅ OpenRouter Keys sind in .env hinterlegt
- ✅ YouTube-Watcher nutzt Ollama direkt

## 🔄 Warum LiteLLM wieder einführen?

| Vorteil | Beschreibung |
|---------|--------------|
| **Einheitliche API** | OpenAI-kompatible API für ALLE Anbieter |
| **Multi-Provider Fallback** | Automatischer Wechsel bei 429/Outage |
| **Kosten-Tracking** | Token-Nutzung pro Anbieter tracken |
| **Rate Limiting** | Zentrale Kontrolle |
| **Model-Routing** | Automatisches Routing basierend auf Availability |

## 🏗️ Architektur (geplant)

```
┌─────────────────────────────────────────────────────────┐
│                    KPT-LABS System                       │
│                                                          │
│  Dashboard / Scripts / Bots                              │
│         │                                                │
│         ▼                                                │
│  ┌─────────────┐     ┌──────────────┐                   │
│  │  LiteLLM    │────▶│  OpenRouter  │ (Primary)         │
│  │  Proxy      │     └──────────────┘                   │
│  │  :4000      │     ┌──────────────┐                   │
│  │             │────▶│  Ollama      │ (Local, Free)     │
│  │             │     │  :11434      │                   │
│  │             │     └──────────────┘                   │
│  │             │     ┌──────────────┐                   │
│  │             │────▶│  NVIDIA API  │ (Backup)          │
│  └─────────────┘     └──────────────┘                   │
│                                                          │
│  Fallback-Kette:                                         │
│  OpenRouter PRIMARY → OpenRouter FALLBACK1 → Ollama     │
└─────────────────────────────────────────────────────────┘
```

## 📝 Konfiguration (config.yaml)

```yaml
model_list:
  # === OpenRouter (Primary) ===
  - model_name: "openrouter/primary"
    litellm_params:
      model: "openrouter/anthropic/claude-sonnet-4"
      api_key: "${OPENROUTER_KEY_PRIMARY}"

  # === OpenRouter (Fallback 1) ===
  - model_name: "openrouter/fallback1"
    litellm_params:
      model: "openrouter/anthropic/claude-sonnet-4"
      api_key: "${OPENROUTER_KEY_FALLBACK1}"

  # === Ollama (Local, Free) ===
  - model_name: "ollama/llama3.1:8b"
    litellm_params:
      model: "ollama/llama3.1:8b"
      api_base: "http://localhost:11434"

  - model_name: "ollama/gemma4:12b"
    litellm_params:
      model: "ollama/gemma4:12b"
      api_base: "http://localhost:11434"

  # === NVIDIA (Backup) ===
  - model_name: "nvidia/backup"
    litellm_params:
      model: "nvidia/nemotron-3-ultra-550b"
      api_key: "${NVIDIA_API_KEY}"

# Router Settings
router_settings:
  routing_strategy: "simple-shuffle"
  fallbacks:
    - ["openrouter/primary"]: ["openrouter/fallback1"]
    - ["openrouter/fallback1"]: ["ollama/llama3.1:8b"]
    - ["ollama/llama3.1:8b"]: ["ollama/gemma4:12b"]

# General Settings
general_settings:
  master_key: "${LITELLM_MASTER_KEY}"
  database_url: "sqlite:///litellm.db"
```

## 🐳 Docker Compose (geplant)

```yaml
services:
  litellm:
    image: ghcr.io/berriai/litellm:latest
    ports:
      - "4000:4000"
    volumes:
      - ./litellm/config.yaml:/app/config.yaml
    env_file: .env
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

## 🔧 Anpassungen im KPT-LABS Code

### 1. YouTube-Watcher (`yt_pipeline_v8.py`)
```python
# Vorher (direkt Ollama):
OLLAMA_URL = "http://localhost:11434"

# Nachher (über LiteLLM):
LITELLM_URL = "http://localhost:4000"
MODEL = "ollama/llama3.1:8b"  # oder "openrouter/primary"
```

### 2. API Key Rotation (`api_key_monitor.py`)
```python
# Vorher: Direkte OpenRouter-Calls
# Nachher: LiteLLM Health-Check + Key-Rotation
LITELLM_URL = "http://localhost:4000"
```

### 3. Dashboard API
```python
# Vorher: Direkte Ollama-Calls
# Nachher: LiteLLM Proxy
```

## 📊 Kostenvergleich

| Anbieter | Kosten | Geschwindigkeit | Verfügbarkeit |
|----------|--------|-----------------|---------------|
| Ollama (lokal) | $0 | ~3-15s | 24/7 |
| OpenRouter Primary | ~$1/Tag | ~1-3s | Rate-Limited |
| OpenRouter Fallback1 | ~$1/Tag | ~1-3s | Rate-Limited |
| NVIDIA | ~$0.50/Tag | ~2-5s | Rate-Limited |

## 🚀 Implementierungsplan

### Phase 1: LiteLLM Grundkonfiguration
- [ ] `litellm/config.yaml` erstellen
- [ ] Docker Compose Service hinzufügen
- [ ] Health-Check testen
- [ ] Model-Liste verifizieren

### Phase 2: Fallback-Kette testen
- [ ] OpenRouter 429 simulieren
- [ ] Automatischer Wechsel zu Ollama testen
- [ ] Latenz messen

### Phase 3: KPT-LABS Code anpassen
- [ ] YouTube-Watcher auf LiteLLM umstellen
- [ ] API Key Rotation anpassen
- [ ] Dashboard API anpassen

### Phase 4: Monitoring
- [ ] LiteLLM Dashboard aktivieren
- [ ] Kosten-Tracking einrichten
- [ ] Alerts bei 429/Outage

## ⚠️ Risiken

- **Single Point of Failure**: Wenn LiteLLM down → kein LLM verfügbar
  - **Mitigation**: Fallback auf direkte Ollama-Calls im Code
- **Komplexität**: Mehr Moving Parts
  - **Mitigation**: Einfache Config, gutes Monitoring
- **Kosten**: LiteLLM selbst ist kostenlos, aber OpenRouter-Kosten bleiben

## 📚 Referenzen

- [LiteLLM Docs](https://docs.litellm.ai/)
- [LiteLLM Docker](https://docs.litellm.ai/docs/proxy/docker)
- [OpenRouter API](https://openrouter.ai/docs/api-reference/overview)
- [Ollama API](https://github.com/ollama/ollama/blob/main/docs/api.md)