A Python library for multilingual phonemization and Text-to-Speech (TTS) using ONNX models.
phoonnx is a comprehensive toolkit for performing high-quality, efficient TTS inference using ONNX-compatible models.
It provides a flexible framework for text normalization, phonemization, and speech synthesis, with built-in support for
multiple languages and phonemic alphabets. The library is also designed to work with models trained using
phoonnx_train, which includes utilities for dataset preprocessing and for exporting models to the ONNX format.
It supports over 1,000 languages and voices from various frameworks (phoonnx, piper, mimic3, coqui, MMS, transformers). The full list can be found in VOICES.md.
- Efficient Inference: Leverages onnxruntime for fast and efficient TTS synthesis.
- Multilingual Support: Supports a wide range of languages and phonemic alphabets, including IPA, ARPA, Hangul (Korean), and Pinyin (Chinese).
- Multiple Phonemizers: Integrates with various phonemizers like eSpeak, Gruut, and Epitran to convert text to phonemes.
- Advanced Text Normalization: Includes robust utilities for expanding contractions and pronouncing numbers and dates.
- Dataset Preprocessing: Provides a command-line tool to prepare LJSpeech-style datasets for training.
- Model Export: A script is included to convert trained models into the ONNX format, ready for deployment.
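To make the text normalization feature above more concrete, here is a minimal sketch of what expanding contractions and spelling out numbers can look like. It is a toy illustration and not phoonnx's own implementation: the `CONTRACTIONS` table and the `number_to_words` and `normalize` helpers are hypothetical names, and the sketch only covers a handful of English contractions and the integers 0 to 99.

```python
import re

# Toy contraction table (assumption: a real normalizer covers far more cases).
CONTRACTIONS = {"don't": "do not", "it's": "it is", "i'm": "i am", "can't": "cannot"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0-99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text: str) -> str:
    """Lowercase the text, expand known contractions, spell out 1-2 digit numbers."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        # Word boundaries keep the replacement from firing inside longer words.
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    # Only standalone one- or two-digit numbers are replaced; longer ones are left as-is.
    return re.sub(r"\b\d{1,2}\b", lambda m: number_to_words(int(m.group())), text)


print(normalize("I can't wait 42 minutes"))
```

Normalizing before phonemization matters because most phonemizers expect spelled-out words rather than digits or abbreviations.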
phoonnx is available on PyPI, so you can install it with pip:
pip install phoonnx
The main component for inference is the TTSVoice class. You can load a model and synthesize speech from text as follows:
import wave
from phoonnx.config import VoiceConfig, SynthesisConfig
from phoonnx.voice import TTSVoice
# Load a pre-trained ONNX model and its configuration
voice = TTSVoice.load("model.onnx", "config.json")
# Configure the synthesis parameters (optional)
synthesis_config = SynthesisConfig(
    noise_scale=0.667,
    length_scale=1.0,
    noise_w_scale=0.8,
    enable_phonetic_spellings=True,  # apply pronunciation fixes, see the "locale" folder in this repo
    add_diacritics=False,  # for Arabic and Hebrew
)
# Synthesize audio from text
text = "Hello, this is a test of the phoonnx library."
slug = f"phoonnx_{voice.config.phoneme_type.value}_{voice.config.lang_code}"
with wave.open(f"{slug}.wav", "wb") as wav_file:
    voice.synthesize_wav(text, wav_file, synthesis_config)
phoonnx provides out-of-the-box integration for Open Voice OS and a powerful command-line interface for voice model management.
phoonnx includes a native OVOS TTS plugin ovos-tts-plugin-phoonnx which allows the library to work seamlessly within the Open Voice OS ecosystem.
Once installed, it can be configured as a standard TTS engine and automatically manages model fetching and loading.
"tts": {
"module": "ovos-tts-plugin-phoonnx",
"ovos-tts-plugin-phoonnx": {
"voice": "OpenVoiceOS/phoonnx_pt-PT_miro_tugaphone"
}
}
If "voice" is not provided, the first model that supports your language is selected.
Voice synthesis parameters usually come from the model's .json file, but you can override them globally in mycroft.conf:
"tts": {
"module": "ovos-tts-plugin-phoonnx",
"ovos-tts-plugin-phoonnx": {
"voice": "OpenVoiceOS/phoonnx_pt-PT_miro_tugaphone",
"enable_phonetic_spellings": true,
"noise_scale": 0.667,
"length_scale": 1,
"noise_w": 0.8,
"add_diacritics": false
}
}
phoonnx includes a command-line utility, phoonnx-voices, which provides a set of tools to manage and interact with the available TTS voice models.
This is particularly useful for pre-downloading models and viewing supported languages.
# Update the local cache of all available voices from upstream sources
phoonnx-voices update-cache
# List all supported languages
phoonnx-voices list-langs
# List all available voices (simple list)
phoonnx-voices list-voices
# List all voices with detailed info
phoonnx-voices list-voices --verbose
# List voices for a specific language (e.g., Portuguese)
phoonnx-voices list-voices --lang pt-PT
# Download the model files for a specific voice ID
phoonnx-voices download OpenVoiceOS/phoonnx_pt-PT_miro_tugaphone
For model training, see the dedicated training.md.
phoonnx leverages several external Grapheme-to-Phoneme (G2P) and text-processing libraries to provide flexible and
high-quality phonemization across many languages.
Where available, prefer phonemizers trained on full sentences over those trained on individual words.
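One reason sentence context matters is heteronyms: words with the same spelling but different pronunciations. The sketch below is a toy illustration only. The lexicon, the `PAST_MARKERS` rule, and both function names are hypothetical and are not part of phoonnx; the hand-written rule merely stands in for what a sentence-trained model can learn from context.

```python
# Toy word-level lexicon: exactly one IPA string per spelling, no context.
WORD_LEXICON = {
    "i": "aɪ",
    "will": "wɪl",
    "have": "hæv",
    "read": "ɹiːd",  # present-tense reading only
    "it": "ɪt",
}

# Auxiliaries that signal the past participle "read" (pronounced like "red").
PAST_MARKERS = {"have", "has", "had"}


def phonemize_word_level(sentence: str) -> str:
    """Look up each word in isolation, ignoring its neighbours."""
    return " ".join(WORD_LEXICON[w] for w in sentence.lower().split())


def phonemize_sentence_level(sentence: str) -> str:
    """Same lookup, but use the previous word to disambiguate 'read'."""
    words = sentence.lower().split()
    phonemes = []
    for i, word in enumerate(words):
        if word == "read" and i > 0 and words[i - 1] in PAST_MARKERS:
            phonemes.append("ɹɛd")
        else:
            phonemes.append(WORD_LEXICON[word])
    return " ".join(phonemes)


print(phonemize_word_level("I have read it"))      # word-level: "read" stays ɹiːd
print(phonemize_sentence_level("I have read it"))  # context-aware: "read" becomes ɹɛd
```

The word-level lookup can never produce the past-tense pronunciation, because it only ever sees the spelling.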
The core phonemizer classes are summarized in the table below, listing the supported languages, the source library they wrap, and the output alphabets they can generate.
| Language(s) | Phonemizer Class | Source/Library | Output Alphabets |
|---|---|---|---|
| Multilingual | ByT5Phonemizer | OpenVoiceOS ByT5 ONNX Models | IPA |
| Multilingual | CharsiuPhonemizer | Charsiu ByT5 ONNX Model | IPA |
| Multilingual | EspeakPhonemizer | espeak-ng command-line tool | IPA |
| Multilingual | GruutPhonemizer | gruut | IPA |
| Multilingual | MisakiPhonemizer | misaki | IPA |
| Multilingual | TransphonePhonemizer | transphone | IPA |
| Multilingual | EpitranPhonemizer | epitran | IPA |
| Mirandese (mwl) | MirandesePhonemizer | mwl_phonemizer | IPA |
| Arabic (ar) | MantoqPhonemizer | mantoq | BUCKWALTER, IPA |
| Chinese (zh) | JiebaPhonemizer | jieba | HANZI |
| Chinese (zh) | G2pMPhonemizer | g2pC | IPA, Pinyin |
| Chinese (zh) | G2pMPhonemizer | g2pm | IPA, Pinyin |
| Chinese (zh) | XpinyinPhonemizer | xpinyin | IPA, Pinyin |
| Chinese (zh) | PypinyinPhonemizer | pypinyin | IPA, Pinyin |
| English (en) | G2PEnPhonemizer | g2pE | IPA |
| English (en) | OpenPhonemizer | OpenPhonemizer | IPA |
| English (en) | DeepPhonemizer | DeepPhonemizer | IPA / ARPA |
| Galician (gl) | CotoviaPhonemizer | cotovia | IPA, Native Cotovia Phonemes |
| Hebrew (he) | PhonikudPhonemizer | phonikud | IPA |
| Japanese (ja) | OpenJTaklPhonemizer | pyopenjtalk | HEPBURN, KANA |
| Japanese (ja) | CutletPhonemizer | cutlet | HEPBURN, KUNREI, NIHON |
| Japanese (ja) | PyKakasiPhonemizer | pykakasi | HEPBURN, KANA, HIRA |
| Korean (ko) | G2PKPhonemizer | g2pK | IPA, HANGUL |
| Korean (ko) | KoG2PPhonemizer | KoG2P | IPA, HANGUL |
| Persian (fa) | PersianPhonemizer | persian_phonemizer | ERAAB, IPA |
| Vietnamese (vi) | VIPhonemePhonemizer | Viphoneme | IPA |
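The Output Alphabets column of the table above mixes several notations. As a rough illustration of how two of them relate, the sketch below converts a few ARPA (ARPAbet) symbols to IPA. It is a hypothetical, deliberately tiny mapping and not the converter phoonnx uses: `ARPA_TO_IPA` and `arpa_to_ipa` are illustrative names, only seven symbols are covered, and stress is dropped except for the common convention of rendering unstressed `AH0` as schwa.

```python
# Small subset of the ARPAbet-to-IPA correspondence (assumption: illustrative only).
ARPA_TO_IPA = {
    "HH": "h",
    "AH": "ʌ",
    "L": "l",
    "OW": "oʊ",
    "W": "w",
    "ER": "ɝ",
    "D": "d",
}


def arpa_to_ipa(arpa: str) -> str:
    """Convert a space-separated ARPAbet string to IPA, ignoring stress marks."""
    ipa = []
    for symbol in arpa.split():
        # Vowels may carry a trailing stress digit: 0 (none), 1 (primary), 2 (secondary).
        stress = symbol[-1] if symbol[-1].isdigit() else ""
        base = symbol.rstrip("012")
        if base == "AH" and stress == "0":
            ipa.append("ə")  # unstressed AH is conventionally written as schwa
        else:
            ipa.append(ARPA_TO_IPA[base])
    return "".join(ipa)


print(arpa_to_ipa("HH AH0 L OW1"))  # hello
print(arpa_to_ipa("W ER1 L D"))     # world
```

Whichever alphabet a phonemizer emits, it needs to match the alphabet the voice model was trained on.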
phoonnx is built on the shoulders of giants:
- jaywalnut310/vits - the original VITS implementation, the backbone architecture of phoonnx models
- MycroftAI/mimic3 and rhasspy/piper - for inspiration and a reference implementation of a phonemizer for pre-processing inputs
Individual languages benefit greatly from domain-specific knowledge. For convenience, phoonnx also bundles code from:
- uvigo/cotovia for Galician phonemization (pre-compiled binaries bundled)
- mush42/mantoq for Arabic phonemization
- mush42/libtashkeel for Arabic diacritics
- scarletcho/KoG2P for Korean phonemization
- stannam/hangul_to_ipa for Hangul-to-IPA conversion
- chorusai/arpa2ipa for Arpabet-to-IPA conversion
- PaddlePaddle/PaddleSpeech for Chinese number verbalization