1 comment

  • bee003 4 hours ago
    I built an LLM routing engine for my startup and just open-sourced it. MIT licensed, pip install, no external dependencies beyond pydantic.

    It decides which LLM to call for each request — optimizing for cost, latency, and quality simultaneously.

    What's in the box (11K lines of Python):

    - Thompson Sampling for self-learning model selection (learns from outcomes, no labels needed)
    - Implementation of Berkeley's ARBITRAGE paper for advantage-aware model switching
    - Energy Oracle that estimates Joules, Watt-hours, and CO2 per inference request
    - Semantic cache with embedding similarity (50-90% savings on repeated queries)
    - Context compression (system dedup, whitespace normalization, old-turn summarization)
    - Provider health tracking with circuit breakers
    - Shadow mode with LLM-as-judge quality comparison
    - Pluggable storage (memory/SQLite/Postgres)
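    To give a feel for the semantic-cache idea: match incoming queries against past ones by embedding similarity, and return the cached response when similarity clears a threshold. This is a toy sketch using a bag-of-words "embedding" for self-containment (a real setup would use a sentence-embedding model); none of these names are the library's actual API.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; stands in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.entries = []  # list of (embedding, cached response)
        self.threshold = threshold

    def get(self, query):
        # Return the cached response for the most similar past query,
        # if it clears the similarity threshold; else None (cache miss).
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: the LLM call is skipped entirely
        return None

    def put(self, query, response):
        self.entries.append((embed(query), response))

cache = SemanticCache(threshold=0.8)
cache.put("what is the capital of france", "Paris")
hit = cache.get("what is the capital of france ?")   # near-duplicate query
miss = cache.get("how do i bake bread")              # unrelated query
```

    The savings come from near-duplicate phrasings ("what's France's capital?") hitting the cache even though an exact-match key would miss.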

    The core insight: ~90% of prompts don't need a frontier model. The hard part is knowing which 90%. Thompson Sampling figures this out automatically from request outcomes.
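    A minimal sketch of how that learning works, assuming a standard Beta-Bernoulli Thompson Sampling setup (an illustration, not the library's actual implementation): keep a Beta posterior over each model's success rate, sample from each posterior, route to the argmax, and update with the observed outcome.

```python
import random

class ThompsonRouter:
    """Pick a model per request via Thompson Sampling on Beta posteriors."""

    def __init__(self, models):
        # [alpha, beta] = [successes + 1, failures + 1]; Beta(1, 1) prior.
        self.stats = {m: [1, 1] for m in models}

    def choose(self):
        # Sample a plausible success rate for each model, route to the best.
        draws = {m: random.betavariate(a, b) for m, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, model, success):
        # Outcome feedback: success bumps alpha, failure bumps beta.
        self.stats[model][0 if success else 1] += 1

random.seed(0)
router = ThompsonRouter(["cheap-model", "frontier-model"])
picks = {"cheap-model": 0, "frontier-model": 0}
for _ in range(500):
    m = router.choose()
    picks[m] += 1
    # Simulated outcomes: the cheap model happens to succeed far more often
    # on this traffic, so the router should converge toward it.
    success = random.random() < (0.9 if m == "cheap-model" else 0.3)
    router.update(m, success)
```

    Because sampling from a wide posterior occasionally produces a high draw for an under-explored model, the router keeps probing alternatives instead of locking in early, which is why no labels are needed, only per-request outcomes.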

    I was paying $2K+/month routing everything through GPT and Claude. After building this, the same traffic costs ~$400 with no measurable quality drop on simple tasks.

    The competing routers I know of (OpenRouter, Martian, Unify) are closed-source. I couldn't find an open-source router that actually learns, so I built one.

    Limitations I'll be honest about: it's a component library today, not a drop-in proxy. You wire it into your stack. A high-level router.chat() wrapper is coming.
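    For readers wondering what "wire it into your stack" means in practice, glue code might look roughly like this. Everything here is hypothetical stand-in code, not astrai-router's API: StubRouter, StubCache, and call_llm are placeholders for the real components.

```python
import random

class StubRouter:
    """Minimal bandit stand-in; the real router would learn from outcomes."""
    def __init__(self, models):
        self.models = list(models)
    def choose(self):
        return random.choice(self.models)
    def update(self, model, success):
        pass  # a learning router would update its posterior here

class StubCache:
    """Exact-match stand-in for a semantic cache."""
    def __init__(self):
        self.store = {}
    def get(self, prompt):
        return self.store.get(prompt)
    def put(self, prompt, answer):
        self.store[prompt] = answer

def call_llm(model, prompt):
    # Stand-in for a real provider SDK call.
    return f"[{model}] response to: {prompt}"

def handle_request(prompt, router, cache):
    cached = cache.get(prompt)        # 1. try the cache first
    if cached is not None:
        return cached
    model = router.choose()           # 2. ask the router which model to use
    answer = call_llm(model, prompt)  # 3. call the chosen provider
    ok = len(answer) > 0              # 4. cheap outcome signal (placeholder)
    router.update(model, ok)          # 5. feed the outcome back
    cache.put(prompt, answer)
    return answer

router = StubRouter(["cheap-model", "frontier-model"])
cache = StubCache()
first = handle_request("hello", router, cache)
second = handle_request("hello", router, cache)  # served from cache
```

    The point is that each piece (cache, router, health tracking) is a separate component you call at the right step of your request path, rather than a proxy you point your base URL at.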

    https://github.com/beee003/astrai-router

    Happy to answer questions about the routing algorithms, energy modeling, or Thompson Sampling implementation.