Appreciate the concept, seems deeply useful if a bit underbaked at present.
Active STT offers a "No STT loaded" option, which mentions that it requires a multimodal LLM like Gemma 4. But even when I use Gemma 4 features, Ctrl+S to dictate doesn't work unless I Voice Edit and then quickly Dictate as soon as it processes the silence. Sometimes, if Dictation is triggered on silence, it'll just paste whatever text is on screen. There's no way to dismiss the popup with the text before it vanishes on its own, and no way to preview what the TTS voices sound like without manually triggering something to be said.
It seems like this will be a great tool soon, but currently there are very many rough edges that would benefit greatly from a nice heavy sanding pass.
So it's a dictation tool? Then why does "voice to text" barely appear on the page? Why are you describing it here as an AI assistant but the page doesn't say anything about that? "Understands my screen"? Why does my dictation software need to understand my screen? I don't know what "text generation", "AI editing" or "AI writing" even mean.