Add Azure Text to Speech to Your WordPress Site

In this article, we explore methods for converting WordPress articles to audio files using Azure Text to Speech.

We will first cover how to set up Azure Text to Speech in WordPress with a plugin. The tutorial’s last part provides examples for developing a custom integration.

General Information on Azure Text to Speech

Azure Text to Speech is a Microsoft Azure service that produces realistic voices in multiple languages. It can be used commercially in applications such as phone apps, virtual assistants, and web accessibility features. In this tutorial, we will use this service to convert the content of WordPress posts to audio.

Pricing

For the most up-to-date information on the pricing of the speech services, please see the Azure AI Speech pricing page.

Azure Limits and Quota

Before using Azure Text to Speech in your application, make sure that the limits and quotas meet your needs. The details on this subject are available on this page.

Available Voices

Azure TTS includes a multitude of voices that support multiple languages. In PHP, you can get a list of voices at any time using the Text to Speech REST API.
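As a minimal sketch of that lookup: the voices list endpoint of the Speech service returns a JSON array in which each voice has fields such as ShortName and Locale. The function names below are hypothetical, and the sample JSON is a hand-written excerpt that only mimics the shape of the response; it is not real output.

```php
<?php
// Hypothetical helpers; the names are illustrative and not part of any plugin or SDK.

/**
 * Build the URL of the voices list endpoint for a given Azure region.
 */
function example_voices_list_url( string $region ): string {
	return 'https://' . $region . '.tts.speech.microsoft.com/cognitiveservices/voices/list';
}

/**
 * Return the short names of the voices that match a locale (case-insensitive).
 *
 * @param string $json   JSON body returned by the voices list endpoint.
 * @param string $locale Locale to match, for example "en-US".
 */
function example_voice_short_names( string $json, string $locale ): array {
	$voices = json_decode( $json, true );
	if ( ! is_array( $voices ) ) {
		return array();
	}

	$names = array();
	foreach ( $voices as $voice ) {
		if ( isset( $voice['Locale'], $voice['ShortName'] )
			&& 0 === strcasecmp( $voice['Locale'], $locale ) ) {
			$names[] = $voice['ShortName'];
		}
	}

	return $names;
}

// Hand-written sample that mimics the shape of the response (assumption, not real output).
$sample = '[{"ShortName":"en-US-ChristopherNeural","Locale":"en-US"},'
	. '{"ShortName":"it-IT-ElsaNeural","Locale":"it-IT"}]';

print_r( example_voice_short_names( $sample, 'en-us' ) );
```

In WordPress, the JSON body could be fetched by passing the URL to wp_remote_get() with the Ocp-Apim-Subscription-Key header set to your resource key, and then read with wp_remote_retrieve_body().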

Configure Azure Text to Speech and Get the API Key

The first step is to log in to Microsoft Azure from the homepage of the platform. Once there, hover the mouse over Speech Services and click the Create button.

Services in the Azure platform homepage.

Now configure the project details by selecting the Region, Resource Group, and the remaining configuration options. Then click Review + create.

Configuration of the service in Microsoft Azure.

The resource deployment will take a few seconds. When the process is completed, proceed to the Keys and endpoint section, copy one of the two available keys and store it in a secure place. This key will be used to authenticate the HTTP requests made to the Azure API.

Two API keys associated with the service are available on the Azure platform.

Add Azure Text to Speech to WordPress Using the Real Voice Plugin

This section will describe how to convert your WordPress articles to audio using the Microsoft Azure Text to Speech integration included in the Real Voice plugin for WordPress.

First Plugin Configuration

First, download the free version from wordpress.org or buy the Pro version from our site.

Once the plugin is installed and activated, proceed to its settings and select Azure Text to Speech as the text-to-speech converter.

Using the **Text-to-speech Converter** option, we select Azure Text to Speech as the converter used by the Real Voice plugin.

Then, under the Azure Text-to-Speech section, enter the API key previously copied from the Azure platform.

In the **Azure Text-to-speech** section, using the **Azure Speech Resource Key** option, we configure the API key of the service.

Finally, select the Azure region that best fits your needs. The region closest to the web server that hosts your WordPress site is usually the most appropriate.

Additional options to customize your request to the Azure API are available. Specifically:

User Agent – Add a custom user agent to identify your request. For example, you might set “yoursite.com” to register that these requests have been sent from your site.
Output Format – Here, you set the output format of the generated audio file. This selection affects the generated audio quality and the space occupied by the files.
Voice – This option selects the voice used to generate the audio version of the post. Note that beside each voice, you will see the language and locale associated with it. If, for example, you are interested in an English voice with a United States accent, select a voice with the suffix “en-us.”

To conclude the configuration process of the plugin, make sure that the post types for which you want to generate the audio are selected in the Post Types option. The plugin adds its text-to-speech tools, which perform the conversion and list the audio files associated with the post, only to the post types selected in this option.

Generate the Spoken Version of an Article

Now, edit the post that you want to convert to audio.

Then open the Audio File post editor sidebar section and click Generate File to convert the post to audio using the post content as the text to convert.

The text-to-speech tools added by the Real Voice plugin are available in the post editor sidebar.

Note that for more precise results, it is recommended that you convert the exact text or SSML using the DOCUMENT (TEXT/SSML) field in the Text to Speech post editor sidebar.

Listen to the Generated Speech on the WordPress Front End

You can now verify if the audio produced meets your needs. To achieve this, simply visit the article for which you have generated the spoken version. Just below the article title, you will find an audio player. Click on the Play button of the audio player to listen to the audio version of the post.

Update the Audio Version of a Post

The plugin keeps the mp3 files containing the audio versions of the articles indefinitely. These files are stored in the WordPress upload folder.

If you update an article and want to re-generate the corresponding audio file, you can do it at any time by clicking the Generate File button again.

Developing an Integration of Azure TTS in WordPress

Set Up a Few Variables

First, set up a few variables to configure the API request.

// Set up a few variables.
$text                      = 'Speak this.';
$region                    = 'eastus';
$user_agent                = 'Default Agent';
$x_microsoft_output_format = 'audio-24khz-160kbitrate-mono-mp3';
$voice_short_name          = 'en-US-ChristopherNeural';
$api_url                   = 'https://' . $region
                             . '.tts.speech.microsoft.com/cognitiveservices/v1';
$key                       = 'YOUR_AZURE_SPEECH_RESOURCE_KEY';

Prepare the Content to Convert

Prepare the content to convert as SSML. Note that our example text is enclosed in SSML tags.

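// Escape the text so that characters such as & and < do not produce invalid SSML.
$cont = '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="' . esc_attr( $voice_short_name ) . '">
    ' . htmlspecialchars( $text, ENT_XML1 | ENT_QUOTES, 'UTF-8' ) . '
</voice>
</speak>';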
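When the text comes from real post content rather than a fixed string, it usually contains HTML markup and characters such as & or < that would make the SSML invalid if inserted as they are. The sketch below, with hypothetical function names, strips the markup and escapes the text before wrapping it in the same speak and voice elements. It relies on plain PHP functions only, so it can run outside WordPress.

```php
<?php
// Hypothetical helpers; the function names are assumptions made for this example.

/**
 * Reduce post HTML to plain text suitable for speech synthesis.
 */
function example_post_html_to_text( string $html ): string {
	// Add a space after common block-level closing tags so adjacent blocks do not merge.
	$html = (string) preg_replace( '#</(p|div|li|h[1-6])>#i', '$0 ', $html );

	// Remove the markup and turn entities such as &amp; back into characters.
	$text = html_entity_decode( strip_tags( $html ), ENT_QUOTES | ENT_HTML5, 'UTF-8' );

	// Collapse runs of whitespace into single spaces.
	return trim( (string) preg_replace( '/\s+/u', ' ', $text ) );
}

/**
 * Wrap plain text in SSML, escaping every value for XML.
 */
function example_build_ssml( string $text, string $voice, string $lang = 'en-US' ): string {
	$flags = ENT_XML1 | ENT_QUOTES;

	return '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="'
		. htmlspecialchars( $lang, $flags, 'UTF-8' ) . '">'
		. '<voice name="' . htmlspecialchars( $voice, $flags, 'UTF-8' ) . '">'
		. htmlspecialchars( $text, $flags, 'UTF-8' )
		. '</voice></speak>';
}

$ssml = example_build_ssml(
	example_post_html_to_text( '<p>Tom &amp; Jerry say <em>hi</em>.</p>' ),
	'en-US-ChristopherNeural'
);

echo $ssml;
```

Inside WordPress, wp_strip_all_tags() is a reasonable alternative to strip_tags(), and the post content could be read with get_post_field( 'post_content', $post_id ).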

Prepare HTTP Headers and the Related Arguments

To reliably get the proper response from the Azure Text to Speech API, you need to include specific values in the HTTP headers and the HTTP request arguments.

// Request headers.
$headers = array(
	'Ocp-Apim-Subscription-Key' => $key,
	'Content-Type'              => 'application/ssml+xml',
	'Host'                      => $region . '.tts.speech.microsoft.com',
	'Content-Length'            => strlen( $cont ),
	'User-Agent'                => $user_agent,
	'X-Microsoft-OutputFormat'  => $x_microsoft_output_format,
);

// HTTP request arguments.
$args = array(
	'method'      => 'POST',
	'headers'     => $headers,
	'body'        => $cont,
	'data_format' => 'body',
	'timeout'     => 60,
);

Performing a Request to the API With wp_remote_request()

Submit the request using wp_remote_request(), a WordPress function that allows us to easily perform an HTTP request and receive the related response. Note that this function is an alternative to using cURL.

// Send the HTTP request.
$response = wp_remote_request( $api_url, $args );

Handle the Response

Finally, we can handle the response. If the HTTP request fails or the API returns a status code other than 200, we save the error message in a PHP variable. If there are no errors, the returned audio data is saved to an mp3 file in a custom server location using file_put_contents().

// Check for errors.
if ( is_wp_error( $response ) ) {

	$result = array(
		'error'   => true,
		'message' => $response->get_error_message(),
	);

} elseif ( 200 !== $response['response']['code'] ) {

	$result = array(
		'error'   => true,
		'message' => $response['response']['message'],
	);

} else {

	// Get the audio data.
	$audio_data = wp_remote_retrieve_body( $response );

	// Set the location where the mp3 file will be stored.
	$file_path = '/home/yoursite/public_html/mp3/example.mp3';

	// Create the audio file in a custom directory of your server.
	$result = file_put_contents( $file_path, $audio_data );

}
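The hard-coded path above is enough for a single file. As a sketch of one way to store one file per post and to report write failures in the same array format used for the request errors, consider the hypothetical helper below. The function name and the post-{ID}.mp3 naming scheme are assumptions made for this example.

```php
<?php
// Hypothetical helper; the name and the file naming scheme are assumptions.

/**
 * Save audio data as "post-{ID}.mp3" inside a directory.
 *
 * @return array Either array( 'error' => true, 'message' => ... )
 *               or array( 'error' => false, 'file_path' => ... ).
 */
function example_save_audio( string $dir, int $post_id, string $audio_data ): array {
	if ( '' === $audio_data ) {
		return array( 'error' => true, 'message' => 'The audio data is empty.' );
	}

	// Create the target directory when it does not exist yet.
	if ( ! is_dir( $dir ) && ! mkdir( $dir, 0755, true ) ) {
		return array( 'error' => true, 'message' => 'Unable to create the directory.' );
	}

	$file_path = rtrim( $dir, '/\\' ) . '/post-' . $post_id . '.mp3';

	// file_put_contents() returns false on failure, so check the result explicitly.
	if ( false === file_put_contents( $file_path, $audio_data ) ) {
		return array( 'error' => true, 'message' => 'Unable to write the audio file.' );
	}

	return array( 'error' => false, 'file_path' => $file_path );
}

// Demonstration with placeholder bytes written to the system temporary directory.
$result = example_save_audio( sys_get_temp_dir() . '/tts-example', 42, 'fake audio bytes' );

var_dump( $result['error'] );
```

In WordPress, the base directory could come from wp_upload_dir(), which keeps the generated files inside the upload folder.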

Integrating an Audio Player With the HTML AUDIO Element

The most basic method of adding an audio player capable of playing an audio file is to use the audio HTML element.

By using the the_content filter, you can prepend the HTML audio player to a post and reference the related audio file:

// Add the audio player at the beginning of the post content.
add_filter( 'the_content', 'add_player_html' );

function add_player_html( $content ) {

	// Build an HTML audio player that references the file with the audio version of the post.
	$player = '<audio controls>
			<source src="https://example.com/wp-content/uploads/2024/03/example.mp3" type="audio/mpeg">
			Your browser does not support the audio element.
		</audio>';

	// Prepend the player to the original content instead of replacing it.
	return $player . $content;

}
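The example above references a fixed file URL. A minimal sketch of a reusable variant, in which the URL is passed as a parameter and escaped before being placed in the markup, might look like the following. The function names are hypothetical.

```php
<?php
// Hypothetical helpers; the function names are assumptions made for this example.

/**
 * Build the markup of an HTML audio player for an mp3 file URL.
 */
function example_audio_player_html( string $audio_url ): string {
	return '<audio controls>'
		. '<source src="' . htmlspecialchars( $audio_url, ENT_QUOTES, 'UTF-8' ) . '" type="audio/mpeg">'
		. 'Your browser does not support the audio element.'
		. '</audio>';
}

/**
 * Prepend the player to the content, or return the content untouched
 * when no audio file is available.
 */
function example_prepend_player( string $content, string $audio_url ): string {
	if ( '' === $audio_url ) {
		return $content;
	}

	return example_audio_player_html( $audio_url ) . $content;
}

echo example_prepend_player(
	'<p>Post body.</p>',
	'https://example.com/wp-content/uploads/2024/03/example.mp3'
);
```

In WordPress, example_prepend_player() could be called from a the_content filter callback, with the URL looked up per post (for example from post meta) and escaped with esc_url() instead of htmlspecialchars().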

Perform Conversions With a Dedicated Button on a Meta Box

To conclude the implementation, you may consider adding a “Convert to audio” button in a dedicated meta box. When the button is clicked, the PHP code described in the previous sections, which sends the API request and generates the audio file, will run.

You can activate the script, for example, by sending an AJAX request, placing the request in a REST API endpoint, or simply reloading the page and checking for a specific query parameter.
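As a rough sketch of the query parameter approach, the code below adds a meta box with a link styled as a button and checks for the parameter when the post editor loads. The parameter name example_convert_to_audio, the nonce action, the meta box ID, and the function names are all assumptions made for this example. The hooks are registered only when WordPress is loaded, so the helper can also be exercised on its own.

```php
<?php
// Hypothetical names throughout: the query parameter, the nonce action,
// the meta box ID, and the function names are assumptions for this sketch.

/**
 * Return the ID of the post to convert, or 0 when the request
 * does not ask for a conversion.
 */
function example_convert_request_post_id( array $query ): int {
	if ( isset( $query['example_convert_to_audio'], $query['post'] )
		&& '1' === (string) $query['example_convert_to_audio'] ) {
		return max( 0, (int) $query['post'] );
	}

	return 0;
}

/**
 * Render the meta box: a link that reloads the editor with the query parameter.
 */
function example_render_convert_meta_box( $post ) {
	$url = add_query_arg( 'example_convert_to_audio', '1', get_edit_post_link( $post->ID, 'raw' ) );
	$url = wp_nonce_url( $url, 'example_convert_' . $post->ID );

	echo '<a class="button" href="' . esc_url( $url ) . '">Convert to audio</a>';
}

// Register the hooks only when WordPress is loaded.
if ( function_exists( 'add_action' ) ) {

	add_action( 'add_meta_boxes', function () {
		add_meta_box(
			'example-convert-to-audio',
			'Audio version',
			'example_render_convert_meta_box',
			'post',
			'side'
		);
	} );

	// Runs when the post editing screen (post.php) is loaded.
	add_action( 'load-post.php', function () {
		$post_id = example_convert_request_post_id( $_GET );

		if ( $post_id > 0
			&& current_user_can( 'edit_post', $post_id )
			&& check_admin_referer( 'example_convert_' . $post_id ) ) {
			// Run the request and response handling code from the previous sections here.
		}
	} );
}
```

The nonce and capability checks are there because a plain link can be triggered by anyone who obtains the URL; an AJAX or REST API variant would need equivalent checks.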