I've been going down this rabbit hole too. I ended up building DictaFlow (https://dictaflow.vercel.app/) because I needed something that specifically works in VDI/Citrix environments where clipboard pasting is blocked (I work in finance).
It uses a 'character-typing' method instead of clipboard injection, so it's compatible with pretty much anything remote. Also kept it super lightweight (<50MB RAM) for Windows users who don't want to run a full local server stack.
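For anyone wondering what "character-typing" means in practice, here's a rough sketch. It is not DictaFlow's actual code. It assumes a Windows approach where each character is sent as a synthetic Unicode keystroke, along the lines of Win32 `SendInput` with the `KEYEVENTF_UNICODE` flag, so nothing ever touches the clipboard. The helper names (`utf16_code_units`, `plan_unicode_keystrokes`) are hypothetical. The sketch only plans the key-down/key-up events, so it runs on any OS. Actually delivering them would take a platform call that isn't shown here.

```python
# Hypothetical sketch: plan the keystrokes a "character-typing" tool could send
# instead of writing to the clipboard. Pure Python, so it runs anywhere; actually
# delivering the events on Windows would need a call such as SendInput (not shown).

# Win32 flag values used in KEYBDINPUT.dwFlags when calling SendInput.
KEYEVENTF_KEYUP = 0x0002
KEYEVENTF_UNICODE = 0x0004


def utf16_code_units(text: str) -> list[int]:
    """Split text into UTF-16 code units (characters outside the BMP become surrogate pairs)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def plan_unicode_keystrokes(text: str) -> list[tuple[int, int]]:
    """Return (wScan, dwFlags) pairs: a key-down then a key-up for each UTF-16 code unit."""
    events = []
    for unit in utf16_code_units(text):
        events.append((unit, KEYEVENTF_UNICODE))                    # key down
        events.append((unit, KEYEVENTF_UNICODE | KEYEVENTF_KEYUP))  # key up
    return events


if __name__ == "__main__":
    for scan, flags in plan_unicode_keystrokes("Hi"):
        print(hex(scan), hex(flags))
```

Sending Unicode code units, rather than virtual-key codes, means the receiving side doesn't need a matching keyboard layout. That is roughly why typed input tends to get through remote sessions where a paste would be blocked.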
Cool to see Handy using the newer models—local voice tech is finally getting good.