How was this made?

Blood, sweat and tears

We have

  1. A pipe server that

    1. Reads in a config dict

    2. Creates a TTS engine object and holds it in memory to reduce cold-start time

    3. Speaks using sounddevice (heavily reliant on py3-tts-wrapper)

  2. A client - the calling executable

    1. You can pass it a config and a string. With no string, it uses the pasteboard text

    2. Calls the pipe service

  3. A GUI Configuration editor

    1. Qt-based editor

    2. Note: it calls client.exe with temporary configs.
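To make the server/client split above a bit more concrete, here is a minimal sketch of the pattern, not the project's actual code. It assumes Python's standard multiprocessing.connection module over a localhost socket rather than a real named pipe, and FakeEngine, serve, send_text and read_clipboard are all hypothetical names invented for illustration. The point is only that the engine object is built once and reused for every request, and that the client falls back to clipboard text when no string is given.

```python
import threading
from multiprocessing.connection import Client, Listener


class FakeEngine:
    """Hypothetical stand-in for a real TTS engine object."""

    def __init__(self, config):
        self.config = config
        self.spoken = []

    def speak(self, text):
        # A real engine would synthesize and play audio here.
        self.spoken.append(text)


def serve(listener, engine, max_requests):
    """Answer requests using one long-lived engine (no per-call start-up)."""
    for _ in range(max_requests):
        with listener.accept() as conn:
            request = conn.recv()  # expects a dict like {"text": "..."}
            engine.speak(request["text"])
            conn.send({"status": "ok"})


def send_text(address, text=None, read_clipboard=lambda: ""):
    """Client side: send text, or clipboard text when none is given."""
    if text is None:
        text = read_clipboard()  # hypothetical clipboard reader
    with Client(address) as conn:
        conn.send({"text": text})
        return conn.recv()


def demo():
    """Run the server in a thread and send it two requests."""
    engine = FakeEngine({"voice": "example"})  # created once, kept in memory
    with Listener(("127.0.0.1", 0)) as listener:  # port 0: pick any free port
        worker = threading.Thread(target=serve, args=(listener, engine, 2))
        worker.start()
        first = send_text(listener.address, "hello")
        second = send_text(listener.address, read_clipboard=lambda: "from clipboard")
        worker.join()
    return engine.spoken, first, second
```

Calling demo() starts the server thread, sends one explicit string and one "clipboard" string, and returns what the engine was asked to speak. On Windows the same module can listen on a named pipe address instead of a socket, which is closer to the setup described above.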

There is a lot of magic making this work, though. This includes:

  • TTS-Wrapper - a unified wrapper around a range of TTS engines. We need it so there is a single way to call get_voices, speak, speak_streamed etc.

  • Sherpa-Onnx - a really nice tooling pipeline for dealing with VITS models that run on the edge.

  • MMS and models readied for Sherpa-Onnx - this work from Meta was a massive help, and we converted their models for (Sherpa-)Onnx. We made some things along the way, like a nice JSON file with details on the voices. Commercial providers: please note the licence these are under.

  • Qt/Qt threading. We had "fun" with threads. Never again will I do it like this.

  • Encrypting keys and a hideous JSON file from Google in a GitHub Action. That wasted us a week.
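As a rough illustration of why a unified wrapper matters (the TTS-Wrapper point above), here is a small sketch. It is not TTS-Wrapper's real API: SpeechEngine, CloudEngine, LocalEngine and speak_in_language are hypothetical names, and the voice dict shape is an assumption. It only shows that once every engine exposes the same get_voices and speak methods, the calling code no longer cares which engine it has.

```python
from abc import ABC, abstractmethod


class SpeechEngine(ABC):
    """Hypothetical common interface; not the actual TTS-Wrapper API."""

    @abstractmethod
    def get_voices(self):
        """Return a list of dicts with 'id' and 'language' keys."""

    @abstractmethod
    def speak(self, text, voice_id):
        """Speak text with the given voice; returns a description here."""


class CloudEngine(SpeechEngine):
    """Pretend online engine with two voices."""

    def get_voices(self):
        return [
            {"id": "cloud-en-1", "language": "en"},
            {"id": "cloud-fr-1", "language": "fr"},
        ]

    def speak(self, text, voice_id):
        return f"cloud:{voice_id}:{text}"


class LocalEngine(SpeechEngine):
    """Pretend offline engine with a single voice."""

    def get_voices(self):
        return [{"id": "local-en", "language": "en"}]

    def speak(self, text, voice_id):
        return f"local:{voice_id}:{text}"


def speak_in_language(engine, text, language):
    """Works with any engine, because they all share one interface."""
    for voice in engine.get_voices():
        if voice["language"] == language:
            return engine.speak(text, voice["id"])
    raise LookupError(f"no {language} voice available")
```

The same speak_in_language call works for both engines, which is the property the GUI and the pipe server rely on when switching between providers.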

Credits

  • Will Wade (original v1, refactoring v2 several times, dealing with encryption, build scripts and generally pulling my hair out)

  • Acer Jay Costillo (QT work and refactoring)

  • Gavin Henderson - for making the call on baking in creds. I hated that and several times threw the idea out.

  • Simon Poole - CTO at Smartbox for making me aware of MMS.

What's next?

  • SAPI Bridge. This is really what is needed. C++ developers - we need your help. See Roadmap
