Cutting LLM Batch Inference Time in Half: Dynamic Prefix Bucketing at Scale
(daft.ai)
5 points | by ykev | 3 hours ago | 1 comment
sammysidhu | 3 hours ago
Part of the Daft team here! Happy to answer any questions.