1 comment

  • bee003 4 hours ago
    I built an LLM routing engine for my startup and just open-sourced it. MIT licensed, pip install, no external dependencies beyond pydantic.

    It decides which LLM to call for each request — optimizing for cost, latency, and quality simultaneously.

    What's in the box (11K lines of Python):

    - Thompson Sampling for self-learning model selection (learns from outcomes, no labels needed)
    - Implementation of Berkeley's ARBITRAGE paper for advantage-aware model switching
    - Energy Oracle that estimates Joules, Watt-hours, and CO2 per inference request
    - Semantic cache with embedding similarity (50-90% savings on repeated queries)
    - Context compression (system dedup, whitespace normalization, old-turn summarization)
    - Provider health tracking with circuit breakers
    - Shadow mode with LLM-as-judge quality comparison
    - Pluggable storage (memory/SQLite/Postgres)
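    To give a feel for the semantic-cache idea: match incoming queries against past ones by embedding similarity, and return the cached response when similarity clears a threshold. This is a toy sketch using a bag-of-words "embedding" for self-containment (a real setup would use a sentence-embedding model); none of these names are the library's actual API.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; stands in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.entries = []  # list of (embedding, cached response)
        self.threshold = threshold

    def get(self, query):
        # Return the cached response for the most similar past query,
        # if it clears the similarity threshold; else None (cache miss).
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: the LLM call is skipped entirely
        return None

    def put(self, query, response):
        self.entries.append((embed(query), response))

cache = SemanticCache(threshold=0.8)
cache.put("what is the capital of france", "Paris")
hit = cache.get("what is the capital of france ?")   # near-duplicate query
miss = cache.get("how do i bake bread")              # unrelated query
```

    The savings come from near-duplicate phrasings ("what's France's capital?") hitting the cache even though an exact-match key would miss.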

    The core insight: ~90% of prompts don't need a frontier model. The hard part is knowing which 90%. Thompson Sampling figures this out automatically from request outcomes.
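    A minimal sketch of how that learning works, assuming a standard Beta-Bernoulli Thompson Sampling setup (an illustration, not the library's actual implementation): keep a Beta posterior over each model's success rate, sample from each posterior, route to the argmax, and update with the observed outcome.

```python
import random

class ThompsonRouter:
    """Pick a model per request via Thompson Sampling on Beta posteriors."""

    def __init__(self, models):
        # [alpha, beta] = [successes + 1, failures + 1]; Beta(1, 1) prior.
        self.stats = {m: [1, 1] for m in models}

    def choose(self):
        # Sample a plausible success rate for each model, route to the best.
        draws = {m: random.betavariate(a, b) for m, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, model, success):
        # Outcome feedback: success bumps alpha, failure bumps beta.
        self.stats[model][0 if success else 1] += 1

random.seed(0)
router = ThompsonRouter(["cheap-model", "frontier-model"])
picks = {"cheap-model": 0, "frontier-model": 0}
for _ in range(500):
    m = router.choose()
    picks[m] += 1
    # Simulated outcomes: the cheap model happens to succeed far more often
    # on this traffic, so the router should converge toward it.
    success = random.random() < (0.9 if m == "cheap-model" else 0.3)
    router.update(m, success)
```

    Because sampling from a wide posterior occasionally produces a high draw for an under-explored model, the router keeps probing alternatives instead of locking in early, which is why no labels are needed, only per-request outcomes.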

    I was paying $2K+/month routing everything through GPT and Claude. After building this, the same traffic costs ~$400 with no measurable quality drop on simple tasks.

    The competing routers I know of (OpenRouter, Martian, Unify) are closed-source. I couldn't find an open-source router that actually learns, so I built one.

    Limitations I'll be honest about: it's a component library today, not a drop-in proxy. You wire it into your stack. A high-level router.chat() wrapper is coming.
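    For readers wondering what "wire it into your stack" means in practice, glue code might look roughly like this. Everything here is hypothetical stand-in code, not astrai-router's API: StubRouter, StubCache, and call_llm are placeholders for the real components.

```python
import random

class StubRouter:
    """Minimal bandit stand-in; the real router would learn from outcomes."""
    def __init__(self, models):
        self.models = list(models)
    def choose(self):
        return random.choice(self.models)
    def update(self, model, success):
        pass  # a learning router would update its posterior here

class StubCache:
    """Exact-match stand-in for a semantic cache."""
    def __init__(self):
        self.store = {}
    def get(self, prompt):
        return self.store.get(prompt)
    def put(self, prompt, answer):
        self.store[prompt] = answer

def call_llm(model, prompt):
    # Stand-in for a real provider SDK call.
    return f"[{model}] response to: {prompt}"

def handle_request(prompt, router, cache):
    cached = cache.get(prompt)        # 1. try the cache first
    if cached is not None:
        return cached
    model = router.choose()           # 2. ask the router which model to use
    answer = call_llm(model, prompt)  # 3. call the chosen provider
    ok = len(answer) > 0              # 4. cheap outcome signal (placeholder)
    router.update(model, ok)          # 5. feed the outcome back
    cache.put(prompt, answer)
    return answer

router = StubRouter(["cheap-model", "frontier-model"])
cache = StubCache()
first = handle_request("hello", router, cache)
second = handle_request("hello", router, cache)  # served from cache
```

    The point is that each piece (cache, router, health tracking) is a separate component you call at the right step of your request path, rather than a proxy you point your base URL at.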

    https://github.com/beee003/astrai-router

    Happy to answer questions about the routing algorithms, energy modeling, or Thompson Sampling implementation.