Text to Speech API

Our Text to Speech API allows you to automatically generate audio in 100 languages, with 800 voices. You can batch-produce audio files from external content, integrate our realistic text to speech voices into your software, and a lot more.

This page explains how to use our text to speech API to create audio files.

NOTE: The easiest way to run simple batch conversion jobs is to use our command-line utility. This page contains information for people who want to build their own integration.

Choose between Streaming or Polling API

Narakeet offers two ways to integrate with the text to speech API:

  1. Short content (streaming) API is simpler and faster, but restricted to relatively short content.
  2. Long content (JSON polling) API is more complex, but allows significantly larger and longer conversions.

If you want to build audio on the fly for short sentences, such as synthesising individual paragraphs or labels for user interface elements, use the short content (streaming) API. To convert large documents, build audiobooks, or produce uncompressed output for professional videos, use the long content (polling) API.

Here is a quick summary of the limitations and differences between the APIs.

| Feature | Short content (streaming) API | Long content (polling) API |
| --- | --- | --- |
| Maximum content length | 1 KB | 1024 KB |
| Supported formats | M4A, MP3 | M4A, MP3, WAV |
| Process duration | 30 seconds | 45 minutes |

When executing the requests, you select the API with the accept header. If you provide application/octet-stream as the accept header, the short content (streaming) API will be used, and you will get the result back as a binary stream. If you do not provide the accept header, the long content (polling) API will be used, and you will get back a status URL that you can poll for results.
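As a rough illustration of the selection rule above, the Python sketch below picks an API from the script size and output format, then builds the matching headers. The helper names (`choose_api`, `build_headers`) are hypothetical, and the byte limits assume that 1 KB means 1024 bytes measured on the UTF-8 encoded script, which the table does not state explicitly.

```python
# Limits taken from the comparison table; treating 1 KB as 1024 bytes is an assumption.
STREAMING_MAX_BYTES = 1024
POLLING_MAX_BYTES = 1024 * 1024
STREAMING_FORMATS = {"m4a", "mp3"}
POLLING_FORMATS = {"m4a", "mp3", "wav"}


def choose_api(script: str, audio_format: str) -> str:
    """Return "streaming" or "polling" for a script and output format (hypothetical helper)."""
    fmt = audio_format.lower()
    size = len(script.encode("utf-8"))  # assumption: the limit applies to encoded bytes
    if fmt not in POLLING_FORMATS:
        raise ValueError(f"unsupported format: {audio_format}")
    if size > POLLING_MAX_BYTES:
        raise ValueError("script exceeds the long content limit")
    if fmt in STREAMING_FORMATS and size <= STREAMING_MAX_BYTES:
        return "streaming"
    return "polling"


def build_headers(api_key: str, api: str) -> dict:
    """Headers for a build request; only the streaming API sets the accept header."""
    headers = {"Content-Type": "text/plain", "x-api-key": api_key}
    if api == "streaming":
        headers["accept"] = "application/octet-stream"
    return headers
```

With these assumptions, a short MP3 request would go to the streaming API, while any WAV request or a script above the short content limit would fall back to polling.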

API Endpoints

There are three endpoints for audio project build requests, which produce different output formats:

  • https://api.narakeet.com/text-to-speech/wav creates uncompressed 16-bit PCM wav files (highest quality, largest file)
  • https://api.narakeet.com/text-to-speech/mp3 creates compressed MP3 files (smaller file, good quality)
  • https://api.narakeet.com/text-to-speech/m4a creates compressed MPEG-4 files (best combination of file size and quality)

Note that the WAV endpoint only works for the long content (polling) API. M4A and MP3 endpoints support both short content API (streaming) and long content API (JSON polling).

Authenticating requests

To use the API, you will need a Narakeet API key. For information on how to get a key, check out our guide on Managing API Keys.

You should provide the API key as a header to all requests to the public REST endpoints, using the x-api-key header.

Short content API (Streaming)

The short content API requires just one request, and returns the audio as a binary stream.

To request an audio file build, use one of the endpoints, and:

  • Use the POST HTTP method
  • Set the Content Type to text/plain (see Converting Subtitle files for additional values)
  • Provide your API key in the x-api-key header
  • Specify an accept header with the value application/octet-stream
  • In the request body, provide a UTF-8 encoded script text

The snippet below will generate an M4A file using the text “Hi there, this is your API speaking”, and save it to result.m4a.

curl -d "Hi there, this is your API speaking" -H "Content-Type: text/plain" -H "x-api-key: $APIKEY" -H "accept: application/octet-stream" --output result.m4a https://api.narakeet.com/text-to-speech/m4a 

Note that on Windows, if you use CURL from the terminal, you may need to URL encode the content before sending it. See the tips section for an example.

You can read the audio duration of the generated file, rounded up to the nearest second, from the x-duration-seconds header.
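For integrations that do not shell out to cURL, the same request could be sketched in Python with the standard library only. This is a minimal sketch, not the official client: the function names are hypothetical, and the code simply mirrors the headers listed above.

```python
import urllib.request
from typing import Optional

API_BASE = "https://api.narakeet.com/text-to-speech"


def build_streaming_request(api_key: str, text: str, audio_format: str = "m4a") -> urllib.request.Request:
    """Prepare the POST request without sending it (hypothetical helper)."""
    return urllib.request.Request(
        f"{API_BASE}/{audio_format}",
        data=text.encode("utf-8"),  # the body must be UTF-8 encoded script text
        method="POST",
        headers={
            "Content-Type": "text/plain",
            "x-api-key": api_key,
            "accept": "application/octet-stream",  # selects the streaming API
        },
    )


def synthesize_to_file(api_key: str, text: str, path: str, audio_format: str = "m4a") -> Optional[str]:
    """Send the request, save the audio, and return the x-duration-seconds header value."""
    request = build_streaming_request(api_key, text, audio_format)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response:
        with open(path, "wb") as audio_file:
            audio_file.write(response.read())
        return response.headers.get("x-duration-seconds")
```

Calling `synthesize_to_file(api_key, "Hi there, this is your API speaking", "result.m4a")` should then behave like the cURL command above, with the duration header returned as a string.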

See Configuring Audio Tasks for information on selecting the voice and adjusting the reading speed.

Narakeet API NodeJS/JavaScript example

For a simple example of how to access the short content (streaming) API from JavaScript/NodeJS, check out https://github.com/narakeet/text-to-speech-api-nodejs-example.

Narakeet API Python example

For a simple example of how to access the short content (streaming) API from Python, check out https://github.com/narakeet/text-to-speech-api-python-example.

Narakeet API CSharp/.NET Core example

For a simple example of how to access the short content (streaming) API from CSharp/.NET Core, check out https://github.com/narakeet/text-to-speech-api-csharp-example.

Narakeet API PHP example

For a simple example of how to access the short content (streaming) API from PHP, check out https://github.com/narakeet/text-to-speech-api-php-example

Narakeet API Java example

For a simple example of how to access the short content (streaming) API from Java, check out https://github.com/narakeet/text-to-speech-api-java-example

Narakeet API Dart example

For a simple example of how to access the short content (streaming) API from Dart, check out https://github.com/narakeet/text-to-speech-api-dart-example

Error handling

If an error occurs during audio conversion, the short content API reports it in the immediate response. The response has status code 400 (for user errors) or 500 (for server errors). The response type is application/json, and the body is a JSON object with more information about the error.
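A client could classify such a response along these lines. The sketch below is an assumption-laden illustration: `parse_api_error` is a hypothetical helper, and because the fields of the error object are not listed here, it returns the decoded JSON untouched rather than guessing at field names.

```python
import json


def parse_api_error(status: int, body: bytes) -> dict:
    """Classify an error response by status code and decode its JSON body (hypothetical helper)."""
    if status == 400:
        kind = "user"
    elif status == 500:
        kind = "server"
    else:
        kind = "unknown"  # other status codes are not described in this guide
    try:
        details = json.loads(body.decode("utf-8"))
    except ValueError:  # covers both invalid UTF-8 and invalid JSON
        details = {"raw": body.decode("utf-8", errors="replace")}
    return {"status": status, "kind": kind, "details": details}
```

With `urllib.request`, the status and body would typically come from a caught `urllib.error.HTTPError`, for example `parse_api_error(error.code, error.read())`.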

Long Content API (JSON Polling)

The long content API allows running longer and larger jobs. To be fault-tolerant, it does not require you to keep a single HTTPS connection open for a long period of time. Instead, you make several short requests. This integration is more complicated than the short content API, but it allows for better resilience and longer processing.

To create an audio file using the long content API, execute the following steps:

  1. Request an audio build, which will provide you with a status URL
  2. Poll the status URL periodically until the build finishes. This will provide you with a URL of the audio file, valid for 24 hours
  3. Download the audio from the URL, or consume the result in some other way (for example, send the URL to another service).

NOTE: Requests to storage endpoints (steps 2 and 3) do not require authentication. The storage URLs provided to you by the REST API are already pre-signed with authentication tokens. Do not include your API key as a separate header when performing those requests.

Long Content API NodeJs example

For a simple example of how to access the long content (polling) API from NodeJs/Javascript, check out https://github.com/narakeet/text-to-speech-polling-api-nodejs-example.

Long Content API Python example

For a simple example of how to access the long content (polling) API from Python, check out https://github.com/narakeet/text-to-speech-polling-api-python-example.

Long Content API PHP example

For a simple example of how to access the long content (polling) API from PHP, check out https://github.com/narakeet/text-to-speech-polling-api-php-example.

Long Content API Java example

For a simple example of how to access the long content (polling) API from Java, check out https://github.com/narakeet/text-to-speech-polling-api-java-example.

Long Content API CSharp example

For a simple example of how to access the long content (polling) API from C#/.NET Core, check out https://github.com/narakeet/text-to-speech-polling-api-csharp-example/.

Step 1: Request an audio build

To request an audio file build, use one of the endpoints, and:

  • Use the POST HTTP method
  • Set the Content Type to text/plain (see Converting Subtitle files for additional values)
  • Provide your API key in the x-api-key header
  • Do not set the accept header
  • In the request body, provide UTF-8 encoded script text

The response will be a JSON structure containing the field statusUrl. This is the URL where you can periodically poll for results.

The snippet below will trigger the build using CURL and extract the status URL:

BODY="Hi there, this is your API speaking"
API_RESPONSE=$(curl -d "$BODY" -H "Content-Type: text/plain" -H "x-api-key: $APIKEY" https://api.narakeet.com/text-to-speech/wav)
STATUS_URL=$(echo "$API_RESPONSE" | jq -r .statusUrl)

Note that on Windows, if you use CURL from the terminal, you may need to URL encode the content before sending it. See the tips section for an example.
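The same step could look roughly like this in Python, using only the standard library. The function names are hypothetical; the only response field the sketch relies on is statusUrl, as described above.

```python
import json
import urllib.request


def build_long_request(api_key: str, script: str, audio_format: str = "wav") -> urllib.request.Request:
    """Prepare a long content build request; no accept header is set on it."""
    return urllib.request.Request(
        f"https://api.narakeet.com/text-to-speech/{audio_format}",
        data=script.encode("utf-8"),  # UTF-8 encoded script text
        method="POST",
        headers={"Content-Type": "text/plain", "x-api-key": api_key},
    )


def extract_status_url(response_body: bytes) -> str:
    """Pull the statusUrl field out of the JSON response."""
    return json.loads(response_body.decode("utf-8"))["statusUrl"]


def request_build(api_key: str, script: str, audio_format: str = "wav") -> str:
    """Send the build request and return the status URL to poll."""
    request = build_long_request(api_key, script, audio_format)
    with urllib.request.urlopen(request) as response:
        return extract_status_url(response.read())
```

Keeping the request construction separate from the network call makes it possible to check the headers in a unit test without contacting the API.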

Step 2: Poll for results

To get the status of your build job, poll the status URL returned by the previous step periodically. We recommend polling every 5-10 seconds.

  • Use the GET HTTP method
  • Do not provide the API key in the headers. The URL already has all appropriate authorisations

The status URL will contain the build job status as a JSON object, with the following properties:

  • finished: boolean value (true/false) signalling if the audio build completed. The value true means you should stop polling.
  • percent: numerical value between 0 and 100, signalling the progress of the audio build.
  • succeeded: once the task is finished, a boolean value (true/false) signalling if the audio was built, or if there was an error. The value true means that you can download the resulting audio.
  • result: if the task succeeded, a string value with a secure URL, valid for 24 hours, where you can download the audio file.
  • message: if the task failed, a string value detailing the error
  • durationInSeconds: If the task succeeded, an integer value with the generated audio duration in seconds, rounded up to the nearest second.
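A polling loop over this status object might be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the 5 second default comes from the recommended polling range, and the cap of 600 polls is an arbitrary safety limit chosen to exceed the 45 minute process duration.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_status(status_url: str) -> dict:
    """GET the status URL without an API key; the URL already carries its authorisation."""
    with urllib.request.urlopen(status_url) as response:
        return json.loads(response.read().decode("utf-8"))


def poll_until_finished(
    status_url: str,
    fetch: Callable[[str], dict] = fetch_status,
    interval_seconds: float = 5.0,
    max_polls: int = 600,  # assumption: an arbitrary cap, not an API limit
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the status reports finished, then return the final status object."""
    for _ in range(max_polls):
        status = fetch(status_url)
        if status.get("finished"):
            return status
        sleep(interval_seconds)
    raise TimeoutError(f"build did not finish after {max_polls} polls")
```

The `fetch` and `sleep` parameters exist only so the loop can be exercised with fake responses; in normal use the defaults would be left in place.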

Step 3: Download the result

Once the status URL contains finished value true and succeeded value true, you will find the URL of the resulting audio file in the result field. This is a secure, temporary URL that expires in 24 hours, so you should promptly download the audio file or process it in some other way.
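Downloading the result could be as simple as the sketch below, which streams the response to disk instead of holding a large WAV file in memory. The function name is hypothetical, and no API key header is sent because the result URL is already pre-signed.

```python
import shutil
import urllib.request


def download_audio(result_url: str, destination: str) -> None:
    """Stream the audio from the result URL to a local file, without any API key header."""
    with urllib.request.urlopen(result_url) as response, open(destination, "wb") as audio_file:
        shutil.copyfileobj(response, audio_file)
```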

Error handling

If there is an error with starting the task, the request endpoint will return status code 400 for user errors, and 500 for server errors. The response type will be application/json, and the body of the response will be a JSON object containing more information about the error.

Once the task starts, the status URL will contain more information on processing. In case of an error, the status URL will respond with a JSON object. You can detect an error by the following properties:

  • finished: boolean value true (the job is over)
  • succeeded: boolean value false (the job failed)
  • message: error message
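Given a status object that has already been fetched, the checks above could be expressed as follows. `BuildFailed` and `result_url_from_status` are hypothetical names used only for this sketch.

```python
class BuildFailed(Exception):
    """Raised when the status object reports a failed build (hypothetical exception type)."""


def result_url_from_status(status: dict) -> str:
    """Return the audio URL from a finished status object, or raise with the reported message."""
    if not status.get("finished"):
        raise ValueError("build is still running; keep polling")
    if not status.get("succeeded"):
        # finished is true and succeeded is false: the job failed
        raise BuildFailed(status.get("message", "build failed without a message"))
    return status["result"]
```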

Configuring audio tasks

You can use the full power of Narakeet audio scripting through the API. Here is how to configure your audio conversion job.

Selecting the default voice

You can select the default voice either by appending a voice query string parameter, or by supplying the voice header in your script. All our Text to Speech voices are supported through the REST interface.

curl --data-binary "@my-script.txt" -H "Content-Type: text/plain" -H "x-api-key: $APIKEY" -H "accept: application/octet-stream" --output result.mp3 "https://api.narakeet.com/text-to-speech/mp3?voice=mickey"

Controlling the reading speed

You can select the default voice speed either by appending a voice-speed query string parameter, or by supplying the voice-speed header in your script. Quote the URL so that the shell does not interpret the & character.

curl --data-binary "@my-script.txt" -H "Content-Type: text/plain" -H "x-api-key: $APIKEY" -H "accept: application/octet-stream" --output result.mp3 "https://api.narakeet.com/text-to-speech/mp3?voice=mickey&voice-speed=1.1"

Controlling voice volume

You can select the default voice volume either by appending a voice-volume query string parameter, or by supplying the voice-volume header in your script.

curl --data-binary "@my-script.txt" -H "Content-Type: text/plain" -H "x-api-key: $APIKEY" -H "accept: application/octet-stream" --output result.mp3 "https://api.narakeet.com/text-to-speech/mp3?voice=mickey&voice-volume=soft"
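When building requests in code rather than on the command line, the query string is best assembled with a URL library so that values are escaped correctly. A small Python sketch, with `build_endpoint_url` as a hypothetical helper name:

```python
from urllib.parse import urlencode


def build_endpoint_url(audio_format: str, **options: str) -> str:
    """Append options such as voice or voice-speed as query string parameters."""
    url = f"https://api.narakeet.com/text-to-speech/{audio_format}"
    # Python keyword arguments cannot contain hyphens, so voice_speed becomes voice-speed.
    query = urlencode({name.replace("_", "-"): value for name, value in options.items()})
    return f"{url}?{query}" if query else url
```

For example, `build_endpoint_url("mp3", voice="mickey", voice_volume="soft")` produces the same URL as the cURL command above.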

Configuring other options

Narakeet scripts support setting default options in a header section (enclosed in --- above and below, at the start of the script file). You can use the header section to set the default voice speed, volume, choose a voice and a lot more. For more information, check out the Script header formatting reference. For example, the following script sets the default voice and pitch.

---
voice: Victoria
voice-pitch: high
---

This script will be read by Victoria, in high pitch
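If the script text is generated by a program, the header section can be prepended before sending the request. The sketch below assumes only the layout shown above (one `name: value` line per option between two --- lines); `with_script_header` is a hypothetical helper, and it does not validate option names or values.

```python
def with_script_header(body: str, **settings: str) -> str:
    """Prefix a script with a header section enclosed in --- lines."""
    if not settings:
        return body
    # Keyword arguments cannot contain hyphens, so voice_pitch becomes voice-pitch.
    lines = [f"{name.replace('_', '-')}: {value}" for name, value in settings.items()]
    return "---\n" + "\n".join(lines) + "\n---\n\n" + body
```

Calling `with_script_header("This script will be read by Victoria, in high pitch", voice="Victoria", voice_pitch="high")` reproduces the script shown above.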

Converting subtitle files (SRT and VTT)

You can use the Content-Type header to control how Narakeet interprets your input. By default, the text/plain content type will read out the entire body of the request as a Narakeet script. You can also automatically convert popular subtitle and closed caption file formats (SubRip SRT and WebVTT) by supplying a different content type.

  • for SubRip (.srt) files, use application/x-subrip or text/srt
  • for WebVTT (.vtt) files, use text/vtt

Provide the subtitle file contents in the request body, and make sure that the content is UTF-8 encoded.

Note that Narakeet aligns entire sentences when processing subtitle and closed caption files, and does not automatically compress the audio if the chosen voice speaks slower than the subtitle timings dictate. If the voice you choose reads content slower than your subtitles, you may need to increase the voice speed.

curl --data-binary "@subtitles.vtt" -H "Content-Type: text/vtt" -H "x-api-key: $APIKEY" "https://api.narakeet.com/text-to-speech/mp3?voice=marion&voice-speed=1.1"
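A client that accepts several input file types could choose the content type from the file extension. This is a minimal sketch: the helper name is hypothetical, and falling back to text/plain for any other extension is an assumption.

```python
from pathlib import Path

# Content types from the list above; application/x-subrip is used for .srt files.
SUBTITLE_CONTENT_TYPES = {".srt": "application/x-subrip", ".vtt": "text/vtt"}


def content_type_for(filename: str) -> str:
    """Pick the Content-Type header value for an input file (hypothetical helper)."""
    # Assumption: anything that is not a known subtitle format is a plain Narakeet script.
    return SUBTITLE_CONTENT_TYPES.get(Path(filename).suffix.lower(), "text/plain")
```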

For more information on how converting subtitle files to audio works, and on the limitations of Narakeet when turning subtitles into speech, see our guide on how to make closed captions and subtitles for text to speech audio.

Tips and tricks

Using international characters on Windows

This trick is not necessary for UTF-8 Linux or macOS terminals.

The Windows terminal and cURL do not handle Unicode characters well. To pass Unicode characters outside the basic ASCII range, use one of the following options:

  1. Save the content as UTF-8 encoded into a file, then use the --data-binary option (see the next tip for an example)
  2. URL-encode the content, and then post it with content type application/x-www-form-urlencoded.

Do not use the --data-urlencode option of cURL. It has the same problem as posting with --data, so you will need to URL-encode the content yourself, for example using the encodeURIComponent JavaScript method in NodeJS or your browser.

Here is an example:

curl --data "Rad%C5%A1ej%20by%20som%20i%C5%A1iel%20do%20da%C5%BE%C4%8Fa." -H "Content-Type: application/x-www-form-urlencoded" -H "x-api-key: %APIKEY%" https://api.narakeet.com/text-to-speech/wav?voice=juraj
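If Python is more convenient than JavaScript for preparing the content, the standard library can produce an equivalent encoding. In this sketch, `encode_script` is a hypothetical helper, and the `safe` set is chosen to match the characters that encodeURIComponent leaves unescaped.

```python
from urllib.parse import quote


def encode_script(text: str) -> str:
    """Percent-encode UTF-8 text, leaving the same characters unescaped as encodeURIComponent."""
    return quote(text, safe="!*'()", encoding="utf-8")
```

Encoding the sentence "Radšej by som išiel do dažďa." this way yields the request body used in the cURL example above.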

Sending files using cURL

If you use cURL, instead of pasting larger scripts into a command line, save them into a text file and then use the --data-binary option to load a file.

curl --data-binary "@my-script.txt" -H "Content-Type: text/plain" -H "x-api-key: $APIKEY" https://api.narakeet.com/text-to-speech/wav

Do not use the --data cURL option for sending files. It strips carriage returns and newlines when reading from a file, which breaks multi-line scripts. The --data-binary option preserves newlines and whitespace.

Getting the generated audio duration

You can get the audio file duration rounded up to the nearest second.

If you use the streaming API, retrieve it using the x-duration-seconds response header. See https://github.com/narakeet/text-to-speech-api-php-example/blob/master/tts-extract-duration.php for an example.

In the long content polling API, the final status JSON will contain a field called durationInSeconds, containing the audio duration.

More information

  • See this flow in action, implemented using Node.js, in the narakeet/api-client GitHub project.
  • For general API limitations and pricing, see the main Developer API page.