7 comments

  • throwa356262 1 day ago
    Better performance than TQ and better quality than FP16?

    Am I reading this right??

    • qeternity 1 day ago
      It's not better quality: 59.3% vs. 59.4% for FP16 on AIME 25.
      • sheepscreek 1 day ago
        A 0.1-point gap is within the margin of error. Depending on the performance boost, a minuscule quality hit might be worth taking.
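        A rough sketch of that margin-of-error point, under loud assumptions: AIME 25 has 30 problems, and the number of sampled answers per problem (64 here) is a made-up illustrative value, not something stated in the thread or the paper. It also treats every graded answer as an independent coin flip, which real benchmark runs only approximate. The helper names are hypothetical.

```python
import math


def accuracy_standard_error(p: float, n: int) -> float:
    """Binomial standard error of an accuracy p measured over n graded answers."""
    return math.sqrt(p * (1.0 - p) / n)


def gap_in_standard_errors(p_a: float, p_b: float, n: int) -> float:
    """Size of the accuracy gap in units of the standard error of the difference.

    Assumes both scores come from n independent, unpaired graded answers.
    """
    se_diff = math.sqrt(
        accuracy_standard_error(p_a, n) ** 2 + accuracy_standard_error(p_b, n) ** 2
    )
    return abs(p_a - p_b) / se_diff


# Assumption: 30 problems, 64 sampled answers each (hypothetical sample count).
N_ANSWERS = 30 * 64

se_quantized = accuracy_standard_error(0.593, N_ANSWERS)
gap = gap_in_standard_errors(0.593, 0.594, N_ANSWERS)

print(f"standard error of one score: {se_quantized:.4f}")
print(f"gap in standard errors: {gap:.3f}")
```

        Under these assumptions the standard error of a single score is about one percentage point, so a 0.1-point gap is a small fraction of one standard error. A different sample count changes the numbers but not the conclusion unless the sample count is orders of magnitude larger.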
    • electroglyph 1 day ago
      Any divergence from full precision is error, even if the benchmark score is better.
      • 7e 23 hours ago
        Just pretend that it is the next update step in training. You didn’t train your model to step=inf, I hope?
    • thefox96 1 day ago
      Faster than FP16, not better quality, I guess.
  • lukasc-ch 12 hours ago
    ... and it's on llama.cpp thanks to this guy! https://www.reddit.com/r/LocalLLaMA/comments/1txlhxu/i_imple...
  • v3ss0n 1 day ago
    Why is this not a PR for vLLM?
    • woadwarrior01 1 day ago
      Last I heard, vLLM was backed by a company that has raised $150m in seed funding. I'm sure they've got the resources to port it.
    • electronsoup 1 day ago
      Why is this not a PR for llama.cpp?
    • esafak 1 day ago
      It's the output of a research paper; the authors are not trying to build up vLLM, and they probably have no incentive to do so. You can submit a PR, though! It's easier now while the divergence is low, so don't wait. Since there are six authors, I bet you could get help with the inevitable review chores if you just take the step of creating the PR.

      edit: It might not be clear that it is based on vLLM 0.22, which is the current version: https://github.com/huawei-csl/KVarN/commit/d6290e99098d7426d.... All you have to do is create a diff off it; it's fairly straightforward.

      • jmalicki 1 day ago
        And with the help of AI, pointing an AI at this paper and saying "make a vLLM PR from this paper" tends to work surprisingly well, even if you need to nudge it a little along the way.
    • thefox96 1 day ago
      it should be easy to do btw
  • 0xjeffro 1 day ago
    Far ahead of the pack (yao yao ling xian).