Hi,
I found your project quite interesting, so I looked into it. I admit I've never worked with Twitch overlays or voice recognition, but I do know that Google has an API for speech-to-text recognition. It would be fairly easy to get this working with that API, but Google Cloud can be expensive, and since you're running a Twitch channel I imagine the volume of audio would be rather large. However, it's also possible to run other speech recognition software locally on your machine for free. I imagine I can test with Google Cloud first, since that's the quickest way to get something working, and if it turns out to be too slow or expensive we can switch to running the recognition locally on your machine.
Lance