Mamba-3

(together.ai)

109 points | by matt_d 3 days ago

2 comments

  • nl 3 hours ago
    I'm looking forward to comparing this to Inception 2 (the text diffusion model) which in my experience is very fast and reasonably high quality.
    • PhilippGille 26 minutes ago
      You mean Mercury 2, by Inception: https://openrouter.ai/inception/mercury-2
    • jychang 1 hour ago
      That's completely different. That's like saying you want to compare the Nvidia 5090 GPU to the latest Call of Duty.
    • cubefox 2 hours ago
      Mamba-3 is an architecture while diffusion is, I believe, a type of objective. So these are not mutually exclusive and therefore not comparable.
  • robofanatic 3 hours ago
    > Mamba-3 is a new state space model (SSM) designed with inference efficiency as the primary goal — a departure from Mamba-2, which optimized for training speed. The key upgrades are a more expressive recurrence formula, complex-valued state tracking, and a MIMO (multi-input, multi-output) variant that boosts accuracy without slowing down decoding.

    Why can’t they simply say:

    Mamba-3 focuses on being faster and more efficient when making predictions, rather than just being fast to train like Mamba-2.

    • esquire_900 3 hours ago
      This is sort of what their first sentence states? Except your line implies that they are fast in both training and inference, whereas they imply they are focusing on inference and giving up training speed for it.

      It's a nice opening as it is imo

      • cubefox 1 hour ago
        They don't say anything about dropping training speed.
    • E-Reverance 3 hours ago
      The first sentence basically does though, no?
      • robofanatic 2 hours ago
        Of course, my only objection was the language. LLMs are now old enough to leave the jargon behind and talk in simple, easy-to-understand terms.
        • oersted 1 hour ago
          I’d argue the opposite: the terminology is fairly mainstream by now, and “inference” has a much more specific sense than “making predictions”.
    • mufasachan 1 hour ago
      The blog is technical, so technical terms in the TL;DR seem relevant to me.
    • arendtio 1 hour ago
      I don't get the downvotes, as I had trouble understanding the intro as well. It seems it was written for a very specific audience.
      • qeternity 1 hour ago
        Yes, it is written for a specific audience.

        That is not a reason for snark.

        As other commenters have noted, it’s well written.

      • magicalhippo 36 minutes ago
        > I don't get the downvotes

        Because the blog post is a technical one and the intro contains very common jargon, and the proposed alternative was wrong.

    • camillomiller 1 hour ago
      I don’t know why you’re being downvoted. As a longtime editor, I find your version immensely better. It looks like the original was probably not human-written.