Using SSML with the Real Voice plugin
With the Real Voice plugin, you can generate the audio version of a post from either plain text or SSML (Speech Synthesis Markup Language). This document explains how to select SSML as the source of the audio version of your articles and provides general information on SSML.
Select SSML as the source
To select SSML as a source, proceed as follows:
- Visit the post for which you want to generate the audio version
- Open the Text to Speech post sidebar section
- In the Document Type selector, choose “SSML”
- Enter the text with SSML tags in the Document (Text/SSML) textarea
- Save your post with the Publish button
Each text-to-speech converter has its own SSML implementation. An SSML document that is valid for one service, for example Amazon Polly, may not work with another, such as Azure Text to Speech.
To find the elements supported by the text-to-speech converter in use, please refer to the official documentation pages:
- SpeechSynthesis (Browser) – SpeechSynthesisUtterance: text property
- Amazon Polly – Supported SSML Tags
- Google Cloud Text-to-Speech – Speech Synthesis Markup Language (SSML)
- Azure Text to Speech – Speech Synthesis Markup Language (SSML) overview
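Before comparing a document against one of the pages above, it can help to list the elements it actually uses. The following is a minimal sketch, not part of the plugin: the function name used_tags and the allowed set are hypothetical, and the set of supported elements has to be filled in by hand from the documentation of the converter in use.

```python
import xml.etree.ElementTree as ET


def used_tags(document: str) -> set[str]:
    """Return the names of all elements that appear in an SSML document."""
    root = ET.fromstring(document)
    # Drop an XML namespace prefix such as "{http://...}speak" if one is present.
    return {element.tag.rsplit("}", 1)[-1] for element in root.iter()}


# Hypothetical list: copy the real one from your converter's documentation.
allowed = {"speak", "p", "s", "break"}

document = '<speak><s>That is a <emphasis>big</emphasis> achievement!</s></speak>'
unsupported = used_tags(document) - allowed
print(sorted(unsupported))
```

Any name printed here is an element to look up in the converter's documentation before saving the post.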
Using common SSML tags
The speak root element should always wrap the SSML document.
<speak>I can speak.</speak>
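If you prepare SSML outside the post editor, a quick check that the document is well-formed XML with speak as its root can catch mistakes before you paste it in. This is a minimal sketch using Python's standard library; check_ssml is a hypothetical helper, not something the plugin provides, and it does not verify which elements a given converter supports.

```python
import xml.etree.ElementTree as ET


def check_ssml(document: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError as error:
        return [f"not well-formed XML: {error}"]
    # Ignore an XML namespace prefix on the root element, if any.
    name = root.tag.rsplit("}", 1)[-1]
    if name != "speak":
        return [f"root element is <{name}>, expected <speak>"]
    return []


print(check_ssml("<speak>I can speak.</speak>"))
print(check_ssml("<p>I have no speak wrapper.</p>"))
```

The first call prints an empty list; the second reports that the root element is not speak.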
Use the break element to add a pause in the speech.
In this example, a three-second pause has been added.
<speak>I can pause three seconds <break time="3s"/>and start again.</speak>
The prosody element allows you to control the pitch, speaking rate, and volume of the output.
In this example, the prosody tag is used to modify the volume of a sentence.
<speak> <s>I can speak at the default volume.</s> <s><prosody volume="+6dB"> I can speak at approximately twice the original signal amplitude </prosody></s> </speak>
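The "approximately twice" in the example follows from the usual decibel convention for amplitude, where a change of d dB corresponds to a factor of 10^(d/20). The short sketch below works through that arithmetic; amplitude_ratio is a name chosen for illustration, and how a particular converter applies the value is described in its own documentation.

```python
def amplitude_ratio(decibels: float) -> float:
    """Convert a volume change in dB to a signal amplitude factor."""
    return 10 ** (decibels / 20)


print(round(amplitude_ratio(6), 3))  # 1.995, i.e. roughly double
```

By the same formula, a negative value such as -6dB roughly halves the amplitude.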
The emphasis element emphasizes a word or phrase. Note that the way emphasis is applied depends on the specific context and the language.
The example below emphasizes the word “big”:
<speak> <s>That is a <emphasis>big</emphasis> achievement!</s> </speak>
The p element represents a paragraph.
Below is an example of a single paragraph:
<speak> <p>The sun sets, painting the sky in hues of orange and pink. A gentle breeze whispers through the trees.</p> </speak>
The s element represents a sentence.
Below is an example of a paragraph that includes two sentences.
<speak> <p> <s>The sun sets over the horizon.</s> <s>Colors blaze across the sky, creating a breathtaking evening panorama.</s> </p> </speak>
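If you already have the plain text of a post, the p and s markup can be generated rather than typed by hand. The sketch below is one possible approach, not a feature of the plugin: text_to_ssml is a hypothetical helper, paragraphs are assumed to be separated by blank lines, and sentences are split naively after ".", "!" or "?", which will mis-split abbreviations such as "e.g.".

```python
import re
from xml.sax.saxutils import escape


def text_to_ssml(text: str) -> str:
    """Wrap plain text in speak, p and s elements."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    parts = ["<speak>"]
    for paragraph in paragraphs:
        # Naive split: a sentence ends at ".", "!" or "?" followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        # escape() replaces &, < and > so the result stays well-formed XML.
        body = "".join(f"<s>{escape(sentence)}</s>" for sentence in sentences)
        parts.append(f"<p>{body}</p>")
    parts.append("</speak>")
    return "".join(parts)


print(text_to_ssml("The sun sets over the horizon. Colors blaze across the sky."))
```

The output can then be pasted into the Document (Text/SSML) textarea and adjusted by hand where the sentence split is wrong.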
To learn more about SSML, please refer to the W3C specification.
At this time, the ElevenLabs service doesn’t support SSML. As a consequence, when ElevenLabs is the converter in use, you are restricted to plain text.