Floki b9485f5384 📊 auto-update: system snapshot 2026-06-22

2026-06-22 13:01:07 +02:00

6.4 KiB

Raw Permalink Blame History

LiteLLM Re-Implementierungsplan

Erstellt: 2026-06-22 Status: ⏳ Idee / Backlog

🎯 Ziel

LiteLLM als LLM Proxy / API Gateway wieder einfach nutzen — aber diesmal richtig konfiguriert und optional (Fallback auf direkte Ollama/OpenRouter-Calls).

📋 Aktueller Stand (Juni 2026)

❌ LiteLLM Container wurde entfernt
❌ Scanner, Memory (ChromaDB), Redis wurden entfernt
✅ Ollama läuft direkt auf localhost:11434
✅ OpenRouter Keys sind in .env hinterlegt
✅ YouTube-Watcher nutzt Ollama direkt

🔄 Warum LiteLLM wieder einführen?

Vorteil	Beschreibung
Einheitliche API	OpenAI-kompatible API für ALLE Anbieter
Multi-Provider Fallback	Automatischer Wechsel bei 429/Outage
Kosten-Tracking	Token-Nutzung pro Anbieter tracken
Rate Limiting	Zentrale Kontrolle
Model-Routing	Automatisches Routing basierend auf Availability

🏗️ Architektur (geplant)

┌─────────────────────────────────────────────────────────┐
│                    KPT-LABS System                       │
│                                                          │
│  Dashboard / Scripts / Bots                              │
│         │                                                │
│         ▼                                                │
│  ┌─────────────┐     ┌──────────────┐                   │
│  │  LiteLLM    │────▶│  OpenRouter  │ (Primary)         │
│  │  Proxy      │     └──────────────┘                   │
│  │  :4000      │     ┌──────────────┐                   │
│  │             │────▶│  Ollama      │ (Local, Free)     │
│  │             │     │  :11434      │                   │
│  │             │     └──────────────┘                   │
│  │             │     ┌──────────────┐                   │
│  │             │────▶│  NVIDIA API  │ (Backup)          │
│  └─────────────┘     └──────────────┘                   │
│                                                          │
│  Fallback-Kette:                                         │
│  OpenRouter PRIMARY → OpenRouter FALLBACK1 → Ollama     │
└─────────────────────────────────────────────────────────┘

📝 Konfiguration (config.yaml)

model_list:
  # === OpenRouter (Primary) ===
  - model_name: "openrouter/primary"
    litellm_params:
      model: "openrouter/anthropic/claude-sonnet-4"
      api_key: "${OPENROUTER_KEY_PRIMARY}"
  
  # === OpenRouter (Fallback 1) ===
  - model_name: "openrouter/fallback1"
    litellm_params:
      model: "openrouter/anthropic/claude-sonnet-4"
      api_key: "${OPENROUTER_KEY_FALLBACK1}"
  
  # === Ollama (Local, Free) ===
  - model_name: "ollama/llama3.1:8b"
    litellm_params:
      model: "ollama/llama3.1:8b"
      api_base: "http://localhost:11434"
  
  - model_name: "ollama/gemma4:12b"
    litellm_params:
      model: "ollama/gemma4:12b"
      api_base: "http://localhost:11434"

  # === NVIDIA (Backup) ===
  - model_name: "nvidia/backup"
    litellm_params:
      model: "nvidia/nemotron-3-ultra-550b"
      api_key: "${NVIDIA_API_KEY}"

# Router Settings
router_settings:
  routing_strategy: "simple-shuffle"
  fallbacks:
    - ["openrouter/primary"]: ["openrouter/fallback1"]
    - ["openrouter/fallback1"]: ["ollama/llama3.1:8b"]
    - ["ollama/llama3.1:8b"]: ["ollama/gemma4:12b"]

# General Settings
general_settings:
  master_key: "${LITELLM_MASTER_KEY}"
  database_url: "sqlite:///litellm.db"

🐳 Docker Compose (geplant)

services:
  litellm:
    image: ghcr.io/berriai/litellm:latest
    ports:
      - "4000:4000"
    volumes:
      - ./litellm/config.yaml:/app/config.yaml
    env_file: .env
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

🔧 Anpassungen im KPT-LABS Code

1. YouTube-Watcher (`yt_pipeline_v8.py`)

# Vorher (direkt Ollama):
OLLAMA_URL = "http://localhost:11434"

# Nachher (über LiteLLM):
LITELLM_URL = "http://localhost:4000"
MODEL = "ollama/llama3.1:8b"  # oder "openrouter/primary"

2. API Key Rotation (`api_key_monitor.py`)

# Vorher: Direkte OpenRouter-Calls
# Nachher: LiteLLM Health-Check + Key-Rotation
LITELLM_URL = "http://localhost:4000"

3. Dashboard API

# Vorher: Direkte Ollama-Calls
# Nachher: LiteLLM Proxy

📊 Kostenvergleich

Anbieter	Kosten	Geschwindigkeit	Verfügbarkeit
Ollama (lokal)	$0	~3-15s	24/7
OpenRouter Primary	~$1/Tag	~1-3s	Rate-Limited
OpenRouter Fallback1	~$1/Tag	~1-3s	Rate-Limited
NVIDIA	~$0.50/Tag	~2-5s	Rate-Limited

🚀 Implementierungsplan

Phase 1: LiteLLM Grundkonfiguration

litellm/config.yaml erstellen
Docker Compose Service hinzufügen
Health-Check testen
Model-Liste verifizieren

Phase 2: Fallback-Kette testen

OpenRouter 429 simulieren
Automatischer Wechsel zu Ollama testen
Latenz messen

Phase 3: KPT-LABS Code anpassen

YouTube-Watcher auf LiteLLM umstellen
API Key Rotation anpassen
Dashboard API anpassen

Phase 4: Monitoring

LiteLLM Dashboard aktivieren
Kosten-Tracking einrichten
Alerts bei 429/Outage

⚠️ Risiken

Single Point of Failure: Wenn LiteLLM down → kein LLM verfügbar
- Mitigation: Fallback auf direkte Ollama-Calls im Code
Komplexität: Mehr Moving Parts
- Mitigation: Einfache Config, gutes Monitoring
Kosten: LiteLLM selbst ist kostenlos, aber OpenRouter-Kosten bleiben

6.4 KiB Raw Permalink Blame History

LiteLLM Re-Implementierungsplan

🎯 Ziel

📋 Aktueller Stand (Juni 2026)

🔄 Warum LiteLLM wieder einführen?

🏗️ Architektur (geplant)

📝 Konfiguration (config.yaml)

🐳 Docker Compose (geplant)

🔧 Anpassungen im KPT-LABS Code

1. YouTube-Watcher (yt_pipeline_v8.py)

2. API Key Rotation (api_key_monitor.py)

3. Dashboard API

📊 Kostenvergleich

🚀 Implementierungsplan

Phase 1: LiteLLM Grundkonfiguration

Phase 2: Fallback-Kette testen

Phase 3: KPT-LABS Code anpassen

Phase 4: Monitoring

⚠️ Risiken

📚 Referenzen

6.4 KiB

Raw Permalink Blame History

1. YouTube-Watcher (`yt_pipeline_v8.py`)

2. API Key Rotation (`api_key_monitor.py`)