Cutting LLM Batch Inference Time in Half: Dynamic Prefix Bucketing at Scale
(daft.ai)
5 points | by ykev | 3 hours ago | 1 comment
sammysidhu | 3 hours ago
Part of the Daft team here! Happy to answer any questions.