You first need an Azure subscription - Create one for free.
Create a Speech resource in the Azure portal.
Get your Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Azure AI services resources, see Get the keys for your resource.
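Once you have the key and region, one common way to keep them out of your config files is to read them from environment variables. A minimal sketch, assuming the variable names SPEECH_KEY and SPEECH_REGION (our choice for this example; the application itself may store them differently):

```python
import os


def load_speech_credentials(env=os.environ):
    """Return (key, region) from the environment, or raise if either is missing.

    SPEECH_KEY and SPEECH_REGION are assumed names for this sketch.
    """
    key = env.get("SPEECH_KEY", "").strip()
    region = env.get("SPEECH_REGION", "").strip()
    missing = [
        name
        for name, value in (("SPEECH_KEY", key), ("SPEECH_REGION", region))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return key, region
```

Passing a plain dict instead of os.environ makes the function easy to try out without touching your real environment.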
Creating a service account for OAuth 2.0 involves generating credentials for a non-human user, often used in server-to-server interactions. Here's how you can create OAuth 2.0 credentials using a service account for Google APIs:
Create a Service Account:
Go to the Google Cloud Console: Visit the Google Cloud Console.
Create a New Project: If you don't already have a project, create a new one in the developer console.
Enable APIs: Enable the APIs that your service account will be using. For example, if you're using Google Drive API, enable that API for your project.
Create a Service Account:
In the Google Cloud Console, navigate to "IAM & Admin" > "Service accounts."
Click on "Create Service Account."
Enter a name for the service account and an optional description.
Choose the role for the service account. This determines the permissions it will have.
Click "Continue" to proceed.
Create and Download Credentials:
On the next screen, you can grant the service account a role in your project. You can also skip this step and grant roles later.
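Open the service account and create a new key (in the console this is under "Keys" > "Add key" > "Create new key"), choosing the JSON format. The downloaded JSON key file contains the credentials for your service account.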
Keep this JSON file secure and do not expose it publicly.
Use the Service Account Credentials:
In your code, load the credentials from the JSON key file. The credentials can be used to authenticate and access the APIs on behalf of the service account.
Grant Required Permissions:
If you skipped assigning roles during the service account creation, you can now grant roles to the service account by navigating to "IAM & Admin" > "IAM" and adding the service account's email address with the appropriate roles.
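As a rough illustration of the "load the credentials from the JSON key file" step above, here is a sketch that reads the key file and checks it looks like a service account key before you hand it to a client library. The function name and the choice of fields to check are ours, not part of any Google library:

```python
import json

# Fields we expect in a service account key file; this sketch checks a subset.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def load_service_account_info(path):
    """Load a service account JSON key file and do basic sanity checks."""
    with open(path, encoding="utf-8") as handle:
        info = json.load(handle)
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError("Key file is missing field(s): " + ", ".join(missing))
    if info["type"] != "service_account":
        raise ValueError("Not a service account key: type=" + repr(info["type"]))
    return info
```

If you use the google-auth package, the returned dict can be passed to google.oauth2.service_account.Credentials.from_service_account_info(info) to build a credentials object.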
It works on Python 3.10 or 3.11. The dependencies (and there are a lot!) aren't well supported on other versions.
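If you want to fail early on an unsupported interpreter, a small guard like this can help. It is only a sketch; the project itself may not ship such a check, and the names are ours:

```python
import sys

# Versions this README says the project works on.
SUPPORTED_VERSIONS = {(3, 10), (3, 11)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is one we support."""
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS
```

Call is_supported() at startup and exit with a clear message if it returns False.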
We use a GitHub Action to build the application (see the workflow here).
Blood, sweat and tears
We have:
- A pipe server that
  - reads in a config dict
  - creates a TTS engine object and holds it in memory to reduce cold-start time
  - speaks using sounddevice (heavily reliant on py3-tts-wrapper)
- A client (the calling executable)
  - You can pass it a config and a string, or no string, in which case it uses the pasteboard text
  - It calls the pipe service
- A GUI configuration editor
  - Qt-based editor
  - Note: it calls client.exe with temporary configs.
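To make the server/client split above a bit more concrete, here is a toy sketch of the idea: the engine is created once and kept alive, and each request only has to send text over a pipe. This is not the project's actual code. It uses multiprocessing.Pipe and a thread as a stand-in for the real pipe service, and FakeEngine is a hypothetical placeholder for a real TTS engine:

```python
import threading
from multiprocessing import Pipe


class FakeEngine:
    """Hypothetical stand-in for a TTS engine; records text instead of playing audio."""

    def __init__(self, config):
        self.config = config
        self.spoken = []

    def speak(self, text):
        self.spoken.append(text)


def serve(conn, engine):
    """Answer requests until a None sentinel arrives, reusing the same engine."""
    while True:
        request = conn.recv()
        if request is None:
            break
        engine.speak(request["text"])
        conn.send({"status": "ok", "count": len(engine.spoken)})


def demo():
    client_end, server_end = Pipe()
    engine = FakeEngine({"engine": "fake"})  # built once, so later calls skip setup
    worker = threading.Thread(target=serve, args=(server_end, engine))
    worker.start()
    replies = []
    for text in ("hello", "world"):
        client_end.send({"text": text})  # what the client executable would do
        replies.append(client_end.recv())
    client_end.send(None)  # ask the server loop to stop
    worker.join()
    return engine.spoken, replies
```

The point of the design is that the expensive part (creating the engine) happens once in the long-lived server, while the client stays small and quick to start.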
There is a lot of magic to make this work though. This includes:
- a unified wrapper around a range of TTS engines. This is needed as we need a unified way to call get_voices, speak, speak_streamed, etc.
- a really nice tooling pipeline to deal with VITS models that run on the edge.
- massive help from this work from Meta, whose models we converted for (Sherpa-)ONNX. We made some things along the way, like a nice JSON file with details on the voices. Commercial providers: please note the licence these are under.
- Qt and Qt threading. We had "fun" with threads. Never again will I do it like this.
- Encrypting keys, and a hideous JSON file from Google, in a GitHub Action. That cost us a week.
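The "unified wrapper" idea mentioned above can be sketched as a small abstract interface that every engine implements. The method names get_voices, speak and speak_streamed come from this README; the class names, signatures and return shapes below are assumptions for illustration and are not py3-tts-wrapper's actual API:

```python
from abc import ABC, abstractmethod


class TTSEngine(ABC):
    """Hypothetical common interface so callers don't care which engine is used."""

    @abstractmethod
    def get_voices(self):
        """Return a list of dicts describing the available voices."""

    @abstractmethod
    def speak(self, text):
        """Synthesize and play the whole text."""

    def speak_streamed(self, text):
        """Default: fall back to speak() for engines without streaming."""
        self.speak(text)


class EchoEngine(TTSEngine):
    """Toy engine that records what it was asked to say."""

    def __init__(self):
        self.spoken = []

    def get_voices(self):
        return [{"id": "echo-1", "language": "en"}]

    def speak(self, text):
        self.spoken.append(text)


def say(engine, text, streamed=False):
    """Caller code works the same for any TTSEngine subclass."""
    if streamed:
        engine.speak_streamed(text)
    else:
        engine.speak(text)
```

With an interface like this, the GUI and client only need to know about TTSEngine, and adding another engine means writing one more subclass.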
Will Wade (original v1, refactoring v2 several times, dealing with encryption, build scripts and generally pulling my hair out)
Acer Jay Costillo (QT work and refactoring)
Gavin Henderson - for making the call on baking in creds. I hated that and several times threw the idea out.
Simon Poole - CTO at Smartbox for making me aware of MMS.
What's next?
You can use the command-line --style flag for Azure voices. If you do this, follow it with one of the style names listed below. You can change the strength of a style with --styledegree, which takes a value from 0.1 to 2. The default is 1, so 2 would double it. Be warned: some voices don't have all styles.
SAPI Bridge. This is really what is needed. C++ developers - we need your help. See
| Flag | Description | Type | Required | Default | Example |
| --- | --- | --- | --- | --- | --- |
| -s, --style | Specifies the voice style for Azure Text-to-Speech. | String | No | None | --style "sad" |
| -sd, --styledegree | Specifies the degree of the style for Azure TTS. | Float | No | None | --styledegree 1.5 |
| -c, --config | Path to a defined config file. | String | No | None | --config "C:\somepath\some.cfg" |
| -l, --listvoices | List voices to see what's available. | Bool | No | None | |
| -p, --preview | Only preview the voice. | Bool | No | None | |
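For readers who want to see how the command-line options listed above fit together, here is a sketch using argparse. The flag names and types come from this README; whether the real client uses argparse, and the program name "client", are assumptions:

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="client")
    parser.add_argument("-s", "--style", type=str, default=None,
                        help="Voice style for Azure Text-to-Speech")
    parser.add_argument("-sd", "--styledegree", type=float, default=None,
                        help="Degree of the style for Azure TTS")
    parser.add_argument("-c", "--config", type=str, default=None,
                        help="Path to a config file")
    # store_true flags default to False here, not None as in the list above.
    parser.add_argument("-l", "--listvoices", action="store_true",
                        help="List the available voices")
    parser.add_argument("-p", "--preview", action="store_true",
                        help="Only preview the voice")
    return parser
```

For example, build_parser().parse_args(["--style", "sad", "--styledegree", "1.5"]) gives an object with style set to "sad" and styledegree set to 1.5.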
advertisement_upbeat
affectionate
angry
assistant
calm
chat
cheerful
customerservice
depressed
disgruntled
documentary-narration
embarrassed
empathetic
envious
excited
fearful
friendly
gentle
hopeful
lyrical
narration-professional
narration-relaxed
newscast
newscast-casual
newscast-formal
poetry-reading
sad
serious
shouting
sports_commentary
sports_commentary_excited
whispering
terrified
unfriendly
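In Azure's SSML, a style and its degree are expressed with the mstts:express-as element. The sketch below shows roughly what a --style and --styledegree pair corresponds to. How this tool builds its requests internally isn't shown here, so treat the function as an illustration; the voice name in the usage note is just an example, and the 0.1 to 2 range check follows this README:

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice, style, styledegree=1.0, lang="en-US"):
    """Return an SSML string that speaks `text` in the given style."""
    if not 0.1 <= styledegree <= 2:
        raise ValueError("styledegree must be between 0.1 and 2")
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)} "
        f"styledegree={quoteattr(str(styledegree))}>"
        f"{escape(text)}"
        "</mstts:express-as></voice></speak>"
    )
```

For example, build_ssml("Oh no.", "en-US-JennyNeural", "sad", 1.5) wraps the text in an express-as element with style "sad" at one and a half times the default strength. Remember that not every voice supports every style.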