116 points | by teamchong 7 hours ago
7 comments
- fork of a fork of a quantization library
- suspicious burst of ~nothing comments from new accounts
- 6 comments 7 hours in, 4 flagged/dead, 2 also spammy and/or confused
- Demo shows it's worse: 800 ms instead of 2.6 ms for text embedding search
- "but it saves space" - yes. 1.2 MB in RAM instead of 7.2 MB to turn search in 1s instead of sub-frame.
- Keyword matching on TurboQuant being somethign cool
- It's not even wrong to do this with the output embeddings, there's way more obvious ways to save even more space
- README is a LLM thinking author is asking of work, not a README explaining anything
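For scale on the space-vs-latency point above: a minimal sketch of plain int8 scalar quantization of an embedding matrix. The corpus size (4700 vectors x 384 dims, ~7.2 MB float32) is chosen to match the thread's numbers, not taken from the library; the thread's 1.2 MB figure implies ~6x compression, while simple int8 gives about 4x.

```python
import numpy as np

# Hypothetical corpus sized to match the thread's 7.2 MB float32 figure;
# 4700 * 384 * 4 bytes ~= 7.2 MB. Not the actual library's data.
rng = np.random.default_rng(0)
n_vecs, dim = 4700, 384
emb = rng.standard_normal((n_vecs, dim)).astype(np.float32)

# Per-vector symmetric int8 quantization: x ~= scale * q, q in [-127, 127]
scales = np.abs(emb).max(axis=1, keepdims=True) / 127.0
q = np.round(emb / scales).astype(np.int8)

float_mb = emb.nbytes / 1e6
int8_mb = (q.nbytes + scales.nbytes) / 1e6
print(f"float32: {float_mb:.1f} MB, int8: {int8_mb:.1f} MB")  # ~4x smaller

# Dequantized dot products stay close to the exact ones, so the accuracy
# cost is small; the thread's complaint is the ~300x latency cost
# (800 ms vs 2.6 ms), not the compression itself.
query = emb[0]
exact = emb @ query
approx = (q.astype(np.float32) * scales) @ query
rel_err = np.max(np.abs(exact - approx)) / np.abs(exact).max()
print(f"max relative dot-product error: {rel_err:.4f}")
```

The point being: the memory saving is real but modest in absolute terms, and straightforward quantization does not require giving up sub-frame search latency.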
I guess I need to dig into this and see if it's faster and has more use cases! Thanks for publishing your work