How was this made
Blood, sweat and tears
We have
A pipe server that
Reads in a config dict
Creates a TTS engine object and holds it in memory to reduce cold-start time
Speaks using sounddevice (heavily reliant on py3-tts-wrapper)
Client - calling executable
You can pass it a config and a string; if you pass no string, it uses the pasteboard text
Calls the pipe service
A GUI configuration editor
Qt-based editor
Note: it calls client.exe with temporary configs.
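The split described above (a long-lived server that keeps the engine warm, and a short-lived client that sends it a config and some text) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it swaps in Python's standard multiprocessing.connection over a localhost socket as the transport (the same module can also listen on a Windows named pipe address), and FakeEngine, PipeServer, send_request and clipboard_reader are all hypothetical names invented for the example.

```python
import threading
from multiprocessing.connection import Client, Listener


class FakeEngine:
    """Hypothetical stand-in for a real TTS engine: records text, plays no audio."""

    def __init__(self, config):
        self.config = config
        self.spoken = []

    def speak(self, text):
        self.spoken.append(text)


class PipeServer:
    """Keeps engines in memory so repeat requests skip the cold start."""

    def __init__(self, address=("127.0.0.1", 0)):
        # Port 0 asks the OS for a free port; the real address is read back below.
        self.listener = Listener(address)
        self.address = self.listener.address
        self.engines = {}  # engine name -> warm engine object

    def get_engine(self, config):
        name = config["engine"]
        if name not in self.engines:
            # Only the first request for an engine pays the creation cost.
            self.engines[name] = FakeEngine(config)
        return self.engines[name]

    def serve(self, max_requests):
        # Handle a fixed number of requests so the demo terminates.
        for _ in range(max_requests):
            with self.listener.accept() as conn:
                request = conn.recv()
                engine = self.get_engine(request["config"])
                engine.speak(request["text"])
                conn.send({"status": "ok", "spoken": request["text"]})
        self.listener.close()


def send_request(address, config, text=None, clipboard_reader=lambda: ""):
    """Client side: fall back to clipboard text when no string is given."""
    if text is None:
        text = clipboard_reader()
    with Client(address) as conn:
        conn.send({"config": config, "text": text})
        return conn.recv()


if __name__ == "__main__":
    server = PipeServer()
    worker = threading.Thread(target=server.serve, args=(2,))
    worker.start()
    config = {"engine": "demo"}
    print(send_request(server.address, config, "Hello"))
    print(send_request(server.address, config, clipboard_reader=lambda: "clipboard text"))
    worker.join()
```

Both requests in the demo reuse the same engine object, which is the point of keeping the server running between calls.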
A lot of magic goes into making this work, though. This includes:
TTS-Wrapper - a unified wrapper around a range of TTS engines. We need it for a single, consistent way to call get_voices, speak, speak_streamed etc.
Sherpa-Onnx - a really nice tooling pipeline for VITS models that run on the edge.
MMS and models readied for Sherpa-Onnx - this work from Meta was a massive help, and we converted their models for (Sherpa-)Onnx. We made some things along the way, like a nice JSON file with details on the voices. Commercial providers: please note the licence these models are under.
Qt/Qt threading. We had "fun" with threads. Never again will I do it like this.
Encryption of keys, and of a hideous JSON file from Google, in a GitHub Action. That wasted us a week.
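To picture why a unified wrapper matters for calls such as get_voices and speak, here is a small sketch of the idea. It assumes nothing about TTS-Wrapper's real classes: SpeechEngine, CloudEngine, OfflineEngine and speak_with_first_voice are hypothetical names, and speak returns a string here instead of playing audio so the example stays self-contained.

```python
from typing import Protocol


class SpeechEngine(Protocol):
    """Hypothetical common interface; not TTS-Wrapper's actual API."""

    def get_voices(self) -> list: ...

    def speak(self, text: str) -> str: ...


class CloudEngine:
    """Stand-in for an online engine."""

    def get_voices(self):
        return [{"id": "cloud-en-1", "language": "en"}]

    def speak(self, text):
        return f"[cloud] {text}"


class OfflineEngine:
    """Stand-in for an on-device engine."""

    def get_voices(self):
        return [{"id": "offline-en-1", "language": "en"}]

    def speak(self, text):
        return f"[offline] {text}"


def speak_with_first_voice(engine: SpeechEngine, text: str) -> tuple:
    # The caller never needs to know which engine it was handed.
    voice = engine.get_voices()[0]
    return voice["id"], engine.speak(text)


if __name__ == "__main__":
    for engine in (CloudEngine(), OfflineEngine()):
        print(speak_with_first_voice(engine, "Hello"))
```

Because the calling code depends only on the shared method names, swapping engines becomes a configuration change instead of a code change.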
Credits
Will Wade (original v1, refactoring v2 several times, dealing with encryption, build scripts and generally pulling my hair out)
Acer Jay Costillo (Qt work and refactoring)
Gavin Henderson - for making the call on baking in creds. I hated that and threw the idea out several times.
Simon Poole - CTO at Smartbox, for making me aware of MMS.
What's next?
SAPI Bridge. This is really what is needed. C++ developers - we need your help. See the Roadmap.