Show HN: Local-first fast CPU image to text for screenshots, PDFs, webpages

(github.com)

11 points | by mrkn1 8 hours ago

6 comments

- How well do you think this'll work with code? I mean taking code screenshots and converting them into actual code for VS Code.

mrkn1 4 hours ago

Just ran

  textsnap "https://i.ytimg.com/vi/LBNDfxjEYlA/maxresdefault.jpg"

and got this

  $('.count').each(function () {
  $('this').prop('Counter', 0).animate({
    Counter: $('this').text()
  }, {
      duration: 4000,
      easing: 'swing',
      step: 'function (now) {
          $('this").text(Math.ceil(now));
      }
    }); 
  });

abstract257 6 hours ago
Curious how it does on multi-page scanned PDFs vs. single screenshots? The ORT vision/decoder split is the part that usually makes or breaks CPU VLM OCR...
- krunck 5 hours ago
  I had to extract the page images from the PDF for it to work, then run it on each extracted page image.
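  A rough sketch of that workaround, not the tool's own PDF support. It assumes Poppler's `pdftoppm` is installed, that `textsnap` accepts a local image path (the thread only shows it called with a URL), and that it prints the recognized text to stdout. The helper names `render_pages` and `ocr_pages` are made up for illustration.

```python
import subprocess
from pathlib import Path


def render_pages(pdf_path, out_dir, dpi=150, run=subprocess.run):
    """Render each PDF page to a PNG with Poppler's pdftoppm, in page order."""
    prefix = Path(out_dir) / "page"
    # pdftoppm writes page-1.png, page-2.png, ... (possibly zero-padded).
    run(["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(prefix)], check=True)

    def page_number(path):
        # "page-10" -> 10, so page 10 sorts after page 2.
        return int(path.stem.rsplit("-", 1)[1])

    return sorted(Path(out_dir).glob("page-*.png"), key=page_number)


def ocr_pages(image_paths, run=subprocess.run):
    """Run textsnap once per page image and collect the text of each page."""
    texts = []
    for path in image_paths:
        # ASSUMPTION: textsnap takes a local file path and writes text to stdout.
        result = run(["textsnap", str(path)], capture_output=True, text=True, check=True)
        texts.append(result.stdout)
    return texts
```

  Usage would look something like `pages = render_pages("scan.pdf", "out")` followed by `print("\n\n".join(ocr_pages(pages)))`, with `out` being an existing empty directory. A higher `dpi` may help on small print, at the cost of slower OCR per page.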
monosma 4 hours ago
What was the reason for adopting PaddleOCR? Can other OCR models be used as well?
- mrkn1 3 hours ago
  No reason other than their Q4 model working reasonably well and fast on my CPU laptop. It should work with any ONNX VLM model.
kouru225 4 hours ago
Roman alphabet only or does this work with other alphabets?
- mrkn1 3 hours ago
  109 languages, including other alphabets.
garrett2558 7 hours ago
Very cool, I'm building my own local-first product as well
- mrkn1 7 hours ago
  thank you! what is it about?
BIGFOOT_EXISTS 5 hours ago
Now this is legit cool, keep up the great work.
- mrkn1 5 hours ago
  thank you!