Phoonnx

A Python library for multilingual phonemization and Text-to-Speech (TTS) using ONNX models.

Introduction

phoonnx is a toolkit for efficient, high-quality TTS inference with ONNX-compatible models. It provides a flexible framework for text normalization, phonemization, and speech synthesis, with built-in support for multiple languages and phonemic alphabets. The library is also designed to work with models trained using phoonnx_train, and ships utilities for dataset preprocessing and for exporting models to the ONNX format.

It supports over 1000 languages and voices from various frameworks (phoonnx, piper, mimic3, coqui, MMS, transformers). The full list can be found in VOICES.md.

Features

Efficient Inference: Leverages onnxruntime for fast and efficient TTS synthesis.
Multilingual Support: Supports a wide range of languages and phonemic alphabets, including IPA, ARPA, Hangul (Korean), and Pinyin (Chinese).
Multiple Phonemizers: Integrates with various phonemizers like eSpeak, Gruut, and Epitran to convert text to phonemes.
Advanced Text Normalization: Includes robust utilities for expanding contractions and pronouncing numbers and dates.
Dataset Preprocessing: Provides a command-line tool to prepare LJSpeech-style datasets for training.
Model Export: A script is included to convert trained models into the ONNX format, ready for deployment.

Installation

phoonnx is available on PyPI and can be installed with pip:

pip install phoonnx

Usage

Synthesizing Speech

The main component for inference is the TTSVoice class. You can load a model and synthesize speech from text as follows:

import wave

from phoonnx.config import SynthesisConfig
from phoonnx.voice import TTSVoice

# Load a pre-trained ONNX model and its configuration
voice = TTSVoice.load("model.onnx", "config.json")

# Configure the synthesis parameters (optional)
synthesis_config = SynthesisConfig(
    noise_scale=0.667,
    length_scale=1.0,
    noise_w_scale=0.8,
    enable_phonetic_spellings=True,  # apply pronunciation fixes; see the "locale" folder in this repo
    add_diacritics=False,  # for Arabic and Hebrew
)

# Synthesize audio from text
text = "Hello, this is a test of the phoonnx library."
slug = f"phoonnx_{voice.config.phoneme_type.value}_{voice.config.lang_code}"
with wave.open(f"{slug}.wav", "wb") as wav_file:
    voice.synthesize_wav(text, wav_file, synthesis_config)
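
To confirm that synthesis produced a usable file, the output can be inspected with Python's standard wave module. The following is a minimal sketch that does not use phoonnx at all; the helper name describe_wav is hypothetical and only reads the WAV header.

```python
import wave


def describe_wav(path):
    """Return basic properties of a WAV file using only the standard library."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate": rate,
            # Duration is the frame count divided by the sample rate
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Calling describe_wav(f"{slug}.wav") after the example above should report the sample rate and duration of the synthesized audio; the actual values depend on the model.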

Integration and Management

phoonnx provides out-of-the-box integration for Open Voice OS and a powerful command-line interface for voice model management.

Open Voice OS Plugin

phoonnx includes a native OVOS TTS plugin, ovos-tts-plugin-phoonnx, which lets the library work seamlessly within the Open Voice OS ecosystem.

Once installed, it can be configured as a standard TTS engine and automatically manages model fetching and loading.

  "tts": {
    "module": "ovos-tts-plugin-phoonnx",
    "ovos-tts-plugin-phoonnx": {
      "voice": "OpenVoiceOS/phoonnx_pt-PT_miro_tugaphone"
    }
  }

If "voice" is not provided, the first model that supports your language is selected.

Voice synthesis parameters usually come from the model's .json file, but you can override them globally in mycroft.conf:

  "tts": {
    "module": "ovos-tts-plugin-phoonnx",
    "ovos-tts-plugin-phoonnx": {
      "voice": "OpenVoiceOS/phoonnx_pt-PT_miro_tugaphone",
      "enable_phonetic_spellings": true,
      "noise_scale": 0.667,
      "length_scale": 1,
      "noise_w": 0.8,
      "add_diacritics": false
    }
  }
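
If the configuration is generated by a script, the same fragment can be built as a plain dictionary and serialized to JSON. This is a sketch only: build_tts_config is a hypothetical helper, the keys are the ones shown in the fragments above, and merging the result into an existing mycroft.conf is left to the reader.

```python
import json


def build_tts_config(voice=None, **overrides):
    """Build the "tts" section shown above as a plain dict."""
    plugin = "ovos-tts-plugin-phoonnx"
    plugin_config = {}
    if voice is not None:
        # Omitting "voice" leaves the choice of model to the plugin
        plugin_config["voice"] = voice
    # Any synthesis parameter overrides are passed through unchanged
    plugin_config.update(overrides)
    return {"tts": {"module": plugin, plugin: plugin_config}}


if __name__ == "__main__":
    config = build_tts_config(
        voice="OpenVoiceOS/phoonnx_pt-PT_miro_tugaphone",
        noise_scale=0.667,
        length_scale=1,
    )
    print(json.dumps(config, indent=2))
```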

Command Line Interface (CLI)

Phoonnx includes a command-line utility, phoonnx-voices, which provides a set of tools to manage and interact with the available TTS voice models.

This is particularly useful for pre-downloading models and viewing supported languages.

Usage

# Update the local cache of all available voices from upstream sources
phoonnx-voices update-cache

# List all supported languages
phoonnx-voices list-langs

# List all available voices (simple list)
phoonnx-voices list-voices

# List all voices with detailed info
phoonnx-voices list-voices --verbose

# List voices for a specific language (e.g., Portuguese)
phoonnx-voices list-voices --lang pt-PT

# Download the model files for a specific voice ID
phoonnx-voices download OpenVoiceOS/phoonnx_pt-PT_miro_tugaphone
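
The same commands can be driven from Python, for example in a provisioning script that pre-downloads models. The sketch below only wraps the subcommands listed above with subprocess; the helper names are hypothetical, and the format of the CLI's output is not assumed, so it is returned as raw text.

```python
import shutil
import subprocess


def voices_command(*args):
    """Build the argument list for a phoonnx-voices invocation."""
    return ["phoonnx-voices", *args]


def run_voices(*args):
    """Run phoonnx-voices and return its stdout as text."""
    if shutil.which("phoonnx-voices") is None:
        raise FileNotFoundError("phoonnx-voices not found on PATH; is phoonnx installed?")
    result = subprocess.run(
        voices_command(*args), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    run_voices("update-cache")
    print(run_voices("list-voices", "--lang", "pt-PT"))
```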

Training

See the dedicated TRAINING.md.

Supported Phonemizers

phoonnx leverages several external Grapheme-to-Phoneme (G2P) and text-processing libraries to provide flexible and high-quality phonemization across many languages.

If available, prefer phonemizers trained on full sentences over those trained on individual words.

The core phonemizer classes are summarized in the table below, listing the supported languages, the source library they wrap, and the output alphabets they can generate.

Language(s)	Phonemizer Class	Source/Library	Output Alphabets
Multilingual	`ByT5Phonemizer`	OpenVoiceOS ByT5 ONNX Models	IPA
Multilingual	`CharsiuPhonemizer`	Charsiu ByT5 ONNX Model	IPA
Multilingual	`EspeakPhonemizer`	`espeak-ng` command-line tool	IPA
Multilingual	`GruutPhonemizer`	gruut	IPA
Multilingual	`MisakiPhonemizer`	misaki	IPA
Multilingual	`TransphonePhonemizer`	transphone	IPA
Multilingual	`EpitranPhonemizer`	epitran	IPA
Mirandese (mwl)	`MirandesePhonemizer`	mwl_phonemizer	IPA
Arabic (ar)	`MantoqPhonemizer`	mantoq	BUCKWALTER, IPA
Chinese (zh)	`JiebaPhonemizer`	jieba	HANZI
Chinese (zh)	`G2pCPhonemizer`	g2pC	IPA, Pinyin
Chinese (zh)	`G2pMPhonemizer`	g2pm	IPA, Pinyin
Chinese (zh)	`XpinyinPhonemizer`	xpinyin	IPA, Pinyin
Chinese (zh)	`PypinyinPhonemizer`	pypinyin	IPA, Pinyin
English (en)	`G2PEnPhonemizer`	g2pE	IPA
English (en)	`OpenPhonemizer`	OpenPhonemizer	IPA
English (en)	`DeepPhonemizer`	DeepPhonemizer	IPA, ARPA
Galician (gl)	`CotoviaPhonemizer`	cotovia	IPA, Native Cotovia Phonemes
Hebrew (he)	`PhonikudPhonemizer`	phonikud	IPA
Japanese (ja)	`OpenJTaklPhonemizer`	pyopenjtalk	HEPBURN, KANA
Japanese (ja)	`CutletPhonemizer`	cutlet	HEPBURN, KUNREI, NIHON
Japanese (ja)	`PyKakasiPhonemizer`	pykakasi	HEPBURN, KANA, HIRA
Korean (ko)	`G2PKPhonemizer`	g2pK	IPA, HANGUL
Korean (ko)	`KoG2PPhonemizer`	KoG2P	IPA, HANGUL
Persian (fa)	`PersianPhonemizer`	persian_phonemizer	ERAAB, IPA
Vietnamese (vi)	`VIPhonemePhonemizer`	Viphoneme	IPA
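
When choosing a phonemizer programmatically, it can help to index the table by language code. The sketch below parses a few rows copied from the table (tab-separated, without the backticks) and lists language-specific classes before multilingual ones. That ordering is an assumption made for illustration, not the library's own selection logic, and the class names are handled as plain strings rather than imported.

```python
import re
from collections import defaultdict

# A few rows copied from the table above (tab-separated).
ROWS = """\
Multilingual\tEspeakPhonemizer\tespeak-ng\tIPA
Arabic (ar)\tMantoqPhonemizer\tmantoq\tBUCKWALTER, IPA
Korean (ko)\tG2PKPhonemizer\tg2pK\tIPA, HANGUL
Korean (ko)\tKoG2PPhonemizer\tKoG2P\tIPA, HANGUL
"""


def index_by_language(rows):
    """Map a language code (or "multilingual") to (class name, alphabets) pairs."""
    index = defaultdict(list)
    for line in rows.strip().splitlines():
        language, cls, _source, alphabets = line.split("\t")
        # The language code is the parenthesized part, e.g. "Korean (ko)" -> "ko"
        match = re.search(r"\((\w+)\)", language)
        code = match.group(1) if match else "multilingual"
        index[code].append((cls, [a.strip() for a in alphabets.split(",")]))
    return dict(index)


def candidates(index, lang_code):
    """Language-specific phonemizers first, then multilingual fallbacks."""
    return index.get(lang_code, []) + index.get("multilingual", [])


if __name__ == "__main__":
    for cls, alphabets in candidates(index_by_language(ROWS), "ko"):
        print(cls, alphabets)
```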

Credits

Phoonnx is built on the shoulders of giants:

jaywalnut310/vits - the original VITS implementation, the backbone architecture of phoonnx models
MycroftAI/mimic3 and rhasspy/piper - for inspiration and reference implementation of a phonemizer for pre-processing inputs

Individual languages benefit greatly from domain-specific knowledge. For convenience, phoonnx also bundles code from:

uvigo/cotovia for Galician phonemization (pre-compiled binaries bundled)
mush42/mantoq for Arabic phonemization
mush42/libtashkeel for Arabic diacritics
scarletcho/KoG2P for Korean phonemization
stannam/hangul_to_ipa for converting Hangul to IPA
chorusai/arpa2ipa for converting Arpabet to IPA
PaddlePaddle/PaddleSpeech for Chinese number verbalization
