Create multi-lingual content

You can easily make video and audio files with several different languages and voices.

If you’re making language lessons, voiceovers for movies or video games in which characters speak different languages, or narration that includes foreign words, you may need to produce audio in more than one language. Narakeet supports this, but because each AI voice has a primary language, you may need to mark up the script to show which language to use where. This tutorial shows several options for creating multilingual content.

Choosing the right method

There are several methods of creating multilingual content. Two questions determine which one fits:

  1. Should the voice be consistent across languages? For example, a single person switching between languages to pronounce a foreign name or a phrase in a different language has a consistent voice. Multiple people reading different parts of a script will not have a consistent voice throughout.
  2. If the voice needs to be consistent, should the foreign part sound like a native speaker of the primary language or of the foreign language? For example, you can get a Russian voice to read out an English street name with a heavy Russian accent, or with an English accent.

When the voice should not be consistent, the best option is to use multiple speakers.

When the voice needs to be consistent and should have a native accent in both languages, use a multilingual (polyglot) voice and set the language using a narration span or the narration-language stage direction.

When the voice needs to be consistent but should keep the accent of its primary language, use a single-language voice and set the language using a narration span or the narration-language stage direction.
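
The decision above can be summarised as a small lookup. The following Python sketch is only an illustration: the function name `choose_method` and the strings it returns are made up for this example and are not part of Narakeet.

```python
def choose_method(consistent_voice: bool, native_accent: bool = False) -> str:
    """Map the two questions above to the approach the tutorial suggests.

    Hypothetical helper for illustration only, not a Narakeet API.
    """
    if not consistent_voice:
        # Different characters may sound different: one voice per language.
        return "multiple speakers"
    if native_accent:
        # Same voice, native accent in both languages.
        return "polyglot voice with a language switch"
    # Same voice, keeping the accent of its primary language.
    return "single-language voice with a language switch"


print(choose_method(consistent_voice=True, native_accent=True))
```

In both single-voice cases, "a language switch" stands for either a narration span or the narration-language stage direction, depending on how long the foreign text is.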

Testing the pronunciation

If you’re using multiple languages with a single voice, it’s very important to test the pronunciation of foreign words. Our voices are primarily trained in one language, and although some can speak multiple languages, we recommend that you verify the pronunciation using the Preview function.

Use multiple speakers - one per language

For movie voiceovers, podcasts, radio content or video game narration that contains a dialogue between characters speaking different languages, you can use the voice stage direction to change the active speaker. For example, to create a dialogue between an English-speaking role and an Italian-speaking role, you can use the English-speaking voice Amy and the Italian-speaking voice Alessandra.

(voice: Amy)

Hello, I'm Amy.

(voice: Alessandra)

Buongiorno!
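
If the dialogue comes from a spreadsheet or a subtitle file, a script like the one above can be assembled programmatically. This is a minimal Python sketch; `dialogue_script` is a hypothetical helper name, and it assumes that a voice stays active until the next voice stage direction, so the direction is written only when the speaker changes.

```python
def dialogue_script(lines):
    """Build a script from (voice, text) pairs.

    Assumption: a voice stage direction stays in effect until the next
    one, so it is emitted only when the speaker changes.
    """
    parts = []
    current_voice = None
    for voice, text in lines:
        if voice != current_voice:
            parts.append(f"(voice: {voice})")
            current_voice = voice
        parts.append(text)
    # Separate stage directions and paragraphs with blank lines.
    return "\n\n".join(parts) + "\n"


script = dialogue_script([
    ("Amy", "Hello, I'm Amy."),
    ("Alessandra", "Buongiorno!"),
])
print(script)
```

For the two lines shown, the result matches the hand-written script above.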


Use a single speaker for a few words in a sentence

When you just want to include a few foreign words in a single sentence, for example to read out a company name or the name of a person or a town, you can mark the foreign phrase using a narration span. Enclose the phrase in square brackets ([]), and immediately follow it with the language code in curly braces ({}). You can use an ISO 639-1 alpha-2 code (for example, en for English or de for German), or an ISO 639-1 alpha-2 code followed by a dash and an ISO 3166-1 alpha-2 region code (for example, fr-CA for Canadian French).

For example, Amy is an English name and the Italian voice Alessandra would not read it correctly if it was just included in Italian text. However, you can mark it with a narration span to tell the AI voice the content should be in English. Here is how Alessandra could greet Amy properly:

(voice: Alessandra)

Buongiorno [Amy]{en}!


Using narration spans is particularly useful for language lessons, where the language instructor needs to point out the differences between various languages. Here is an example of an English voice reading out a word that’s spelled the same in both English and German.

(voice: charles)

The German word [See]{de} and the English word See have the same spelling: `see`


When using a voice trained for one language to speak another, it’s critical to test the output. Many voices can read popular words from major languages (such as English), but support for less frequent words or less popular languages may vary a lot. Use the Preview function to try out voice/content combinations without spending credits.
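
When many names or terms need marking, the narration span syntax from this section can be generated rather than typed by hand. The Python sketch below is an illustration under stated assumptions: `narration_span` is a hypothetical helper, the regular expression checks only the shape of the code (two lowercase letters, optionally a dash and two uppercase letters) and not whether the language exists or is supported, and phrases containing square brackets are rejected because this tutorial does not describe how to escape them.

```python
import re

# Shape check only: "en", "de", "fr-CA". This does not verify that the
# code is a real ISO code or that a given voice supports the language.
LANGUAGE_CODE = re.compile(r"[a-z]{2}(-[A-Z]{2})?")


def narration_span(phrase: str, language: str) -> str:
    """Wrap a phrase as [phrase]{language}. Hypothetical helper."""
    if not LANGUAGE_CODE.fullmatch(language):
        raise ValueError(f"unexpected language code: {language!r}")
    if "[" in phrase or "]" in phrase:
        # Escaping rules are not covered here, so refuse instead of guessing.
        raise ValueError("phrase must not contain square brackets")
    return f"[{phrase}]{{{language}}}"


greeting = f"Buongiorno {narration_span('Amy', 'en')}!"
print(greeting)
```

The generated line is the same as the Alessandra greeting shown earlier in this section. The result still needs checking with the Preview function, since the helper cannot tell how a particular voice will pronounce the phrase.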

Use a single speaker for a larger piece of text

When the content in a different language is longer, marking narration spans can be a bit tedious. For situations like that, it’s useful to temporarily switch to a different language using the narration-language stage direction.

Here is an example of Helmut, a German-language voice, reading a German paragraph followed by its English translation.

(voice: helmut)

Berlin hat zahlreiche Fließgewässer und Seen. Die Spree mündet in Spandau in die Havel, die den Westen Berlins in Nord-Süd-Richtung durchfließt.

(narration-language: en)

Berlin has numerous rivers and lakes. The Spree flows into the Havel in Spandau, which crosses the west of Berlin in a north-south direction.


The narration-language stage direction applies to all the text that follows it, not just a single paragraph. If you want to switch back to the original language after the foreign content, make sure to use the narration-language stage direction again.
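
Because the direction stays in effect, a generated script needs a second direction to return to the original language. The Python sketch below shows one way to keep the two together; `switch_language` is a hypothetical helper, and the caller has to supply the code of the language to restore, since nothing in this tutorial describes a "reset" value.

```python
def switch_language(text: str, language: str, restore: str) -> str:
    """Surround text with narration-language directions.

    The first direction switches to `language`; the second switches back
    to `restore`, because the direction applies to everything after it.
    Hypothetical helper for illustration.
    """
    return "\n\n".join([
        f"(narration-language: {language})",
        text,
        f"(narration-language: {restore})",
    ])


script = "\n\n".join([
    "(voice: helmut)",
    "Berlin hat zahlreiche Fließgewässer und Seen.",
    switch_language("Berlin has numerous rivers and lakes.", "en", "de"),
    "Die Spree mündet in Spandau in die Havel.",
])
print(script)
```

In this sketch the final German sentence is read in German again, because the second direction restores `de` before it.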

Use a polyglot voice for a native accent

In the previous example, Helmut speaks English with a heavy German accent. This is usually good for authenticity, for example in a voicemail message aimed at a local audience. If you’re producing long-form content, a more native accent for the foreign-language content might be more suitable. Some voices are trained to speak with a native accent in many different languages, and they have better support for longer content.

Dietrich is a German voice that is trained to speak English. Compare the output below to the previous example to hear the difference:

(voice: dietrich)

Berlin hat zahlreiche Fließgewässer und Seen. Die Spree mündet in Spandau in die Havel, die den Westen Berlins in Nord-Süd-Richtung durchfließt.

(narration-language: en)

Berlin has numerous rivers and lakes. The Spree flows into the Havel in Spandau, which crosses the west of Berlin in a north-south direction.


Polyglot voices

The following polyglot voices are trained for many different languages, and you can choose one of them if you want to create a native accent in the foreign language.

  • Mei (Mandarin Chinese)
  • Xiaoming (Mandarin Chinese)
  • Quan (Mandarin Chinese)
  • Gertrud (Standard German)
  • Dietrich (Standard German)
  • Daisy (British English)
  • Benedict (British English)
  • Betty (American English)
  • Gary (American English)
  • Raymond (American English)
  • Cindy (American English)
  • Elvira (Castilian Spanish)
  • Paquita (Castilian Spanish)
  • Ricardo (Castilian Spanish)
  • Jacques (Metropolitan French)
  • Lucienne (Metropolitan French)
  • Victor (Metropolitan French)
  • Giovanni (Italian)
  • Pietro (Italian)
  • Giulio (Italian)
  • Toshiro (Japanese)
  • Ji-sub (Korean)
  • Murilo (Brazilian Portuguese)

Choose one of them to get the foreign content in a native accent, and for better foreign word support. Again, make sure to test that the voice supports your language using the Preview function.
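
When scripts are generated automatically, it can help to check whether the chosen voice is on the polyglot list before switching languages for a long passage. The Python sketch below copies the names from the list above as a snapshot, so it may go out of date. `is_polyglot` is a hypothetical helper, and the lowercase comparison is an assumption based on the examples in this tutorial, which write voice names both capitalised and in lowercase.

```python
# Snapshot of the polyglot voices listed in this tutorial (may go stale).
POLYGLOT_VOICES = {
    "mei", "xiaoming", "quan",                     # Mandarin Chinese
    "gertrud", "dietrich",                         # Standard German
    "daisy", "benedict",                           # British English
    "betty", "gary", "raymond", "cindy",           # American English
    "elvira", "paquita", "ricardo",                # Castilian Spanish
    "jacques", "lucienne", "victor",               # Metropolitan French
    "giovanni", "pietro", "giulio",                # Italian
    "toshiro",                                     # Japanese
    "ji-sub",                                      # Korean
    "murilo",                                      # Brazilian Portuguese
}


def is_polyglot(voice: str) -> bool:
    """Return True if the voice name appears in the snapshot above."""
    return voice.strip().lower() in POLYGLOT_VOICES


for name in ("Dietrich", "helmut"):
    print(name, is_polyglot(name))
```

A voice that is not on the list can still read foreign text, as the Helmut example shows; the check only indicates whether a native accent can be expected.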

Defaulting a polyglot voice to a different language

You can also use polyglot voices as extra options for languages that have only a few other voices. To do that, add the narration-language stage direction at the start of the scene. For multi-scene scripts, you can set the language globally using the narration-language header, so that you do not have to set it on each individual scene. Here is an example of Gertrud (a polyglot voice that normally speaks German) reading text in Croatian:

---
voice: gertrud
narration-language: hr-HR
---

Temelji hrvatske države nalaze se u razdoblju ranoga srednjeg vijeka kada su Hrvati osnovali svoje dvije kneževine: Panonsku i Primorsku Hrvatsku.

This option is useful mostly when creating scripted videos and using the text to speech API. If you are using the web interface, using the stage direction to set the language is probably simpler.
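
For scripted workflows, the header can be prepended to the text before the script is submitted. The Python sketch below only builds the script text and does not call any Narakeet API; `script_with_header` is a hypothetical helper, and the header layout is copied from the Gertrud example above.

```python
def script_with_header(voice: str, language: str, body: str) -> str:
    """Prefix a script body with a voice and narration-language header.

    Hypothetical helper; the layout mirrors the example in this section.
    """
    header = "\n".join([
        "---",
        f"voice: {voice}",
        f"narration-language: {language}",
        "---",
    ])
    # One blank line between the header and the body, one trailing newline.
    return f"{header}\n\n{body.strip()}\n"


script = script_with_header(
    "gertrud",
    "hr-HR",
    "Temelji hrvatske države nalaze se u razdoblju ranoga srednjeg vijeka.",
)
print(script)
```

How the resulting text is uploaded or sent depends on the tool you use, and is outside the scope of this sketch.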

More information