Large Language Diffusion Models

(arxiv.org)

5 points | by kadushka 6 hours ago

2 comments

  • Alex-Programs 6 hours ago
    Previously posted, with little attention: https://news.ycombinator.com/item?id=43080189

    I really think there ought to be more discussion of this paper.

    Copying from my previous comment: a first-generation diffusion model is beating Llama 3, a model with a huge amount of tuning and improvement work behind it, in some areas. And it's from China again!

    A whole new "tree" of development has opened up. With so many possibilities - traditional scaling laws, out-loud chain of thought, in-model layer-repeating chain of thought, and now diffusion models - it seems unlikely to me that LLMs are going to hit a wall that the river of technological progress cannot flow around.

    I wonder how well they'll work at translation. The paper indicates that they're rather good at poetry.

    Interesting times.

    • kadushka 6 hours ago
      I'm still reading the paper, but my main question is how slow the model is compared to an LLM of the same size. It seems that to get the best accuracy they need to set the number of sampling steps equal to the number of tokens to be generated. Does that make it comparable in speed to an LLM?
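
      The trade-off in that question can be sketched with a toy decoding loop. This is an illustration, not the paper's actual algorithm: it assumes a masked-diffusion scheme where every denoising step runs one full-sequence forward pass and commits the highest-confidence predictions, leaving the rest masked. The `dummy_model` function is a made-up stand-in for the network. With the step count equal to the sequence length, the number of forward passes matches autoregressive decoding (one pass per token); fewer steps mean fewer passes at some cost in accuracy.

      ```python
      import random

      MASK = -1  # sentinel for a not-yet-generated position

      def dummy_model(seq):
          # Stand-in for the network: returns (predicted_token, confidence)
          # for every position. A real model would run a forward pass over
          # the whole sequence here.
          return [(random.randrange(1000), random.random()) for _ in seq]

      def diffusion_decode(length, num_steps, seed=0):
          """Toy masked-diffusion decoding: start fully masked, and at each
          denoising step keep only the highest-confidence predictions,
          leaving the rest masked for later steps."""
          random.seed(seed)
          seq = [MASK] * length
          per_step = max(1, length // num_steps)  # positions committed per step
          passes = 0
          preds = None
          for _ in range(num_steps):
              if MASK not in seq:
                  break
              preds = dummy_model(seq)  # one full-sequence forward pass
              passes += 1
              masked = [i for i, t in enumerate(seq) if t == MASK]
              # Commit the most confident masked positions this step.
              masked.sort(key=lambda i: preds[i][1], reverse=True)
              for i in masked[:per_step]:
                  seq[i] = preds[i][0]
          # Finalize any stragglers with the last step's predictions.
          for i, t in enumerate(seq):
              if t == MASK:
                  seq[i] = preds[i][0]
          return seq, passes

      _, passes = diffusion_decode(length=32, num_steps=32)
      print(passes)  # 32 forward passes: same count as autoregressive decoding
      _, passes = diffusion_decode(length=32, num_steps=8)
      print(passes)  # 8 passes: fewer steps trade accuracy for speed
      ```

      Even when the pass counts match, per-pass cost differs: an autoregressive model attends over a cached prefix, while this scheme re-processes the full length-L sequence each step, so raw step counts alone don't settle the speed question.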