Using SSML with the Real Voice plugin

With the Real Voice plugin, you can generate the audio version of a post from either plain text or SSML (Speech Synthesis Markup Language). This document explains how to select SSML as the source of the audio version of your articles and gives general information on SSML.

Select SSML as the source

To select SSML as a source, proceed as follows:

  1. Visit the post for which you want to generate the audio version
  2. Open the Text to Speech post sidebar section
  3. In the Document Type selector, choose “SSML”
  4. Enter the text with SSML tags in the Document (Text/SSML) textarea
  5. Save your post with the Publish button

SSML standards

Each text-to-speech converter implements SSML in its own way. An SSML document that is valid for Amazon Polly, for example, may not work with Azure Text to Speech.

To find the elements supported by the text-to-speech converter in use, please refer to the official documentation of that service.

Using common SSML tags

speak

The speak root element should always wrap the SSML document.

<speak>I can speak.</speak>
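
Because the textarea accepts raw markup, it can help to check that a document is well-formed XML with a speak root before pasting it in. Below is a minimal sketch in Python using only the standard library. The function name is_wrapped_in_speak is hypothetical and not part of the plugin, and the check says nothing about which tags a particular text-to-speech service accepts.

```python
import xml.etree.ElementTree as ET


def is_wrapped_in_speak(ssml: str) -> bool:
    """Return True if ssml is well-formed XML whose root element is <speak>."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        # Plain text, unclosed tags, or mismatched tags all end up here.
        return False
    # Drop a namespace prefix such as "{http://www.w3.org/2001/10/synthesis}".
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name == "speak"


print(is_wrapped_in_speak("<speak>I can speak.</speak>"))
print(is_wrapped_in_speak("I can speak."))
```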

break

Use this element to add a pause in the speech.

In this example, a three-second pause has been added.

<speak>I can pause three seconds <break time="3s"/>and start again.</speak>
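
If the text is prepared outside the editor, the same markup can be generated rather than typed by hand. The following Python sketch joins text fragments with a break element between them. The helper name join_with_pauses is an assumption made for illustration, and only the time attribute shown above is used.

```python
from xml.sax.saxutils import escape


def join_with_pauses(fragments: list[str], seconds: int) -> str:
    """Join text fragments into one SSML document with a pause between each pair."""
    pause = f' <break time="{seconds}s"/>'
    # escape() converts &, < and > so plain text cannot break the markup.
    body = pause.join(escape(fragment) for fragment in fragments)
    return f"<speak>{body}</speak>"


print(join_with_pauses(["I can pause three seconds", "and start again."], 3))
```

Called with the two fragments above, the function reproduces the example document exactly.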

prosody

This element allows you to control the pitch, speaking rate, and volume of the output.

In this example, the prosody tag is used to modify the volume of a sentence.

<speak>
   <s>I can speak at the default volume.</s>
   <s><prosody volume="+6dB">
       I can speak at approximately twice the original signal amplitude.
   </prosody></s>
</speak>
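
The statement that +6dB corresponds to roughly twice the amplitude follows from the standard decibel relation for amplitudes, ratio = 10^(dB/20). The short Python sketch below works through that arithmetic. The function name db_to_amplitude_ratio is hypothetical, and how a given service actually applies the volume change may differ.

```python
def db_to_amplitude_ratio(decibels: float) -> float:
    """Convert a volume change in dB to an amplitude ratio: 10 ** (dB / 20)."""
    return 10 ** (decibels / 20)


# Print the amplitude ratio for a few common volume changes.
for change in (-6, 0, 6):
    print(f"{change:+d} dB -> x{db_to_amplitude_ratio(change):.2f}")
```

A change of +6 dB gives a ratio of about 1.995, and -6 dB gives about 0.501, that is, roughly half the amplitude.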

emphasis

This element emphasizes a word or phrase. Note that the way emphasis is rendered depends on the context and the language.

The example below emphasizes the word “big” in the sentence:

<speak>
  <s>That is a <emphasis>big</emphasis> achievement!</s>
</speak>

p

This element represents a paragraph.

Below is an example of a single paragraph:

<speak>
  <p>The sun sets, painting the sky in hues of orange and pink. A gentle breeze whispers through the trees.</p>
</speak>

s

This element represents a sentence.

Below is an example of a paragraph that includes two sentences.

<speak>
  <p>
    <s>The sun sets over the horizon.</s>
    <s>Colors blaze across the sky, creating a breathtaking evening panorama.</s>
  </p>
</speak>
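
For longer posts, the p and s structure can be produced from plain text automatically. The Python sketch below is one possible approach, not something the plugin provides: the function name text_to_ssml is hypothetical, paragraphs are assumed to be separated by blank lines, and sentences are split naively at ., ! or ? followed by whitespace.

```python
import re
from xml.sax.saxutils import escape


def text_to_ssml(text: str) -> str:
    """Wrap blank-line-separated paragraphs in <p> and their sentences in <s>."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    parts = ["<speak>"]
    for paragraph in paragraphs:
        parts.append("  <p>")
        # Naive split: a sentence ends at ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            parts.append(f"    <s>{escape(sentence)}</s>")
        parts.append("  </p>")
    parts.append("</speak>")
    return "\n".join(parts)


print(text_to_ssml("The sun sets over the horizon. Colors blaze across the sky."))
```

The naive split breaks on abbreviations such as "Dr.", so the generated markup is best reviewed before it is saved.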

W3C documentation

To learn more about SSML, please refer to the W3C specification.

Note

At this time, the ElevenLabs service doesn’t support SSML. If you use this service, you are restricted to plain text.