TigreGotico/phoonnx

Phoonnx

A Python library for multilingual phonemization and Text-to-Speech (TTS) using ONNX models.


Introduction

phoonnx is a comprehensive toolkit for performing high-quality, efficient TTS inference using ONNX-compatible models. It provides a flexible framework for text normalization, phonemization, and speech synthesis, with built-in support for multiple languages and phonemic alphabets. The library is also designed to work with models trained using phoonnx_train, including utilities for dataset preprocessing and exporting models to the ONNX format.

It supports over 1000 languages and voices from various frameworks (phoonnx, piper, mimic3, coqui, MMS, transformers). The full list can be found in VOICES.md.


Features

  • Efficient Inference: Leverages onnxruntime for fast and efficient TTS synthesis.
  • Multilingual Support: Supports a wide range of languages and phonemic alphabets, including IPA, ARPA, Hangul (Korean), and Pinyin (Chinese).
  • Multiple Phonemizers: Integrates with various phonemizers like eSpeak, Gruut, and Epitran to convert text to phonemes.
  • Advanced Text Normalization: Includes robust utilities for expanding contractions and pronouncing numbers and dates.
  • Dataset Preprocessing: Provides a command-line tool to prepare LJSpeech-style datasets for training.
  • Model Export: A script is included to convert trained models into the ONNX format, ready for deployment.

Installation

As phoonnx is available on PyPI, you can install it using pip.

pip install phoonnx
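
To confirm that the package landed in the active environment, a minimal sketch using only the standard library is shown below. The helper name `installed_version` is hypothetical and not part of phoonnx; it simply asks the packaging metadata for the distribution's version.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if phoonnx is installed, otherwise None.
print(installed_version("phoonnx"))
```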

Usage

Synthesizing Speech

The main component for inference is the TTSVoice class. You can load a model and synthesize speech from text as follows:

import wave

from phoonnx.config import SynthesisConfig
from phoonnx.voice import TTSVoice

# Load a pre-trained ONNX model and its configuration
voice = TTSVoice.load("model.onnx", "config.json")

# Configure the synthesis parameters (optional)
synthesis_config = SynthesisConfig(
    noise_scale=0.667,
    length_scale=1.0,
    noise_w_scale=0.8,
    enable_phonetic_spellings=True,  # apply pronunciation fixes, see the "locale" folder in this repo
    add_diacritics=False  # for Arabic and Hebrew
)

# Synthesize audio from text
text = "Hello, this is a test of the phoonnx library."
slug = f"phoonnx_{voice.config.phoneme_type.value}_{voice.config.lang_code}"
with wave.open(f"{slug}.wav", "wb") as wav_file:
    voice.synthesize_wav(text, wav_file, synthesis_config)
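
After synthesis it can be useful to sanity-check the resulting file. The sketch below uses only the standard `wave` module and does not depend on phoonnx. The helper `wav_duration_seconds` is a hypothetical name, and the in-memory WAV is a stand-in for a synthesized file; its 22050 Hz sample rate is an assumption for the demo, not a documented phoonnx default.

```python
import io
import wave


def wav_duration_seconds(wav_source) -> float:
    """Return the duration in seconds of a WAV file (path or file-like object)."""
    with wave.open(wav_source, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


# Stand-in for a synthesized file: one second of 16-bit mono silence.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as demo:
    demo.setnchannels(1)       # mono
    demo.setsampwidth(2)       # 16-bit samples
    demo.setframerate(22050)   # assumed sample rate for the demo
    demo.writeframes(b"\x00\x00" * 22050)

buffer.seek(0)
print(f"{wav_duration_seconds(buffer):.2f} s")
```

For a real run, pass the path written by the example above (for instance `f"{slug}.wav"`) instead of the in-memory buffer.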

Integration and Management

phoonnx provides out-of-the-box integration for Open Voice OS and a powerful command-line interface for voice model management.

Open Voice OS Plugin

phoonnx includes a native OVOS TTS plugin, ovos-tts-plugin-phoonnx, which allows the library to work seamlessly within the Open Voice OS ecosystem.

Once installed, it can be configured as a standard TTS engine and automatically manages model fetching and loading.

  "tts": {
    "module": "ovos-tts-plugin-phoonnx",
    "ovos-tts-plugin-phoonnx": {
      "voice": "OpenVoiceOS/phoonnx_pt-PT_miro_tugaphone"
    }
  }

If "voice" is not provided, the first model that supports your language will be selected.
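
The fallback rule can be pictured with a small sketch. This is not the plugin's actual code: the catalog, the `pick_voice` helper, and the choice to match on the primary language subtag ("pt" matching "pt-PT") are all assumptions made for illustration. Only the first voice ID comes from this README; the others are invented placeholders.

```python
from typing import Optional

# Illustrative catalog of (voice_id, language) pairs in listing order.
# Only the first ID appears in this README; the rest are made-up placeholders.
VOICES = [
    ("OpenVoiceOS/phoonnx_pt-PT_miro_tugaphone", "pt-PT"),
    ("example/voice_en-US_demo", "en-US"),
    ("example/voice_pt-BR_demo", "pt-BR"),
]


def pick_voice(lang: str, requested: Optional[str] = None) -> Optional[str]:
    """Return the requested voice if given, else the first voice matching `lang`."""
    if requested:
        return requested
    primary = lang.split("-")[0].lower()  # assumed matching rule: primary subtag
    for voice_id, voice_lang in VOICES:
        if voice_lang.split("-")[0].lower() == primary:
            return voice_id
    return None


print(pick_voice("pt-PT"))
```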

Voice synthesis parameters usually come from the model's .json file, but you can override them globally in mycroft.conf:

  "tts": {
    "module": "ovos-tts-plugin-phoonnx",
    "ovos-tts-plugin-phoonnx": {
      "voice": "OpenVoiceOS/phoonnx_pt-PT_miro_tugaphone",
      "enable_phonetic_spellings": true,
      "noise_scale": 0.667,
      "length_scale": 1,
      "noise_w": 0.8,
      "add_diacritics": false
    }
  }
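
One way to think about the override behaviour is as a dictionary merge in which values from mycroft.conf win over the model defaults. The sketch below is an assumption about the semantics, not the plugin's implementation; `effective_params`, `SYNTHESIS_KEYS`, and the default values are hypothetical, while the key names are the ones shown in the configuration above.

```python
import json

# Defaults as they might appear in a model's .json file (illustrative values).
model_defaults = {"noise_scale": 0.667, "length_scale": 1.0, "noise_w": 0.8}

# Plugin section of mycroft.conf, overriding a single parameter.
plugin_conf = json.loads("""
{
  "voice": "OpenVoiceOS/phoonnx_pt-PT_miro_tugaphone",
  "length_scale": 1.2
}
""")

# Keys treated as synthesis parameters (taken from the example configuration).
SYNTHESIS_KEYS = {
    "noise_scale", "length_scale", "noise_w",
    "enable_phonetic_spellings", "add_diacritics",
}


def effective_params(defaults: dict, conf: dict) -> dict:
    """Merge model defaults with user overrides; user values take precedence."""
    overrides = {k: v for k, v in conf.items() if k in SYNTHESIS_KEYS}
    return {**defaults, **overrides}


print(effective_params(model_defaults, plugin_conf))
```

Non-synthesis keys such as "voice" are filtered out so that they do not leak into the synthesis parameters.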

Command Line Interface (CLI)

phoonnx includes a command-line utility, phoonnx-voices, which provides a set of tools to manage and interact with the available TTS voice models.

This is particularly useful for pre-downloading models and viewing supported languages.

Usage

# Update the local cache of all available voices from upstream sources
phoonnx-voices update-cache

# List all supported languages
phoonnx-voices list-langs

# List all available voices (simple list)
phoonnx-voices list-voices

# List all voices with detailed info
phoonnx-voices list-voices --verbose

# List voices for a specific language (e.g., Portuguese)
phoonnx-voices list-voices --lang pt-PT

# Download the model files for a specific voice ID
phoonnx-voices download OpenVoiceOS/phoonnx_pt-PT_miro_tugaphone
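
The same commands can be driven from Python, for example in a provisioning script. The sketch below wraps the CLI with `subprocess`; the helper names `voices_command` and `run_voices` are hypothetical, and because the output format of phoonnx-voices is not described here, the raw text is returned without parsing.

```python
import shutil
import subprocess
from typing import List, Optional


def voices_command(subcommand: str, *args: str) -> List[str]:
    """Build the argument list for a phoonnx-voices invocation."""
    return ["phoonnx-voices", subcommand, *args]


def run_voices(subcommand: str, *args: str) -> Optional[str]:
    """Run phoonnx-voices and return its stdout, or None if the CLI is not on PATH."""
    if shutil.which("phoonnx-voices") is None:
        return None
    result = subprocess.run(
        voices_command(subcommand, *args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `run_voices("list-voices", "--lang", "pt-PT")` mirrors the language-filtered listing shown above.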

Training

See the dedicated training.md.


Supported Phonemizers

phoonnx leverages several external Grapheme-to-Phoneme (G2P) and text-processing libraries to provide flexible and high-quality phonemization across many languages.

Where available, prefer phonemizers trained on full sentences over those trained on individual words.

The core phonemizer classes are summarized in the table below, listing the supported languages, the source library they wrap, and the output alphabets they can generate.


| Language(s) | Phonemizer Class | Source/Library | Output Alphabets |
|---|---|---|---|
| Multilingual | ByT5Phonemizer | OpenVoiceOS ByT5 ONNX Models | IPA |
| Multilingual | CharsiuPhonemizer | Charsiu ByT5 ONNX Model | IPA |
| Multilingual | EspeakPhonemizer | espeak-ng command-line tool | IPA |
| Multilingual | GruutPhonemizer | gruut | IPA |
| Multilingual | MisakiPhonemizer | misaki | IPA |
| Multilingual | TransphonePhonemizer | transphone | IPA |
| Multilingual | EpitranPhonemizer | epitran | IPA |
| Mirandese (mwl) | MirandesePhonemizer | mwl_phonemizer | IPA |
| Arabic (ar) | MantoqPhonemizer | mantoq | BUCKWALTER, IPA |
| Chinese (zh) | JiebaPhonemizer | jieba | HANZI |
| Chinese (zh) | G2pMPhonemizer | g2pC | IPA, Pinyin |
| Chinese (zh) | G2pMPhonemizer | g2pm | IPA, Pinyin |
| Chinese (zh) | XpinyinPhonemizer | xpinyin | IPA, Pinyin |
| Chinese (zh) | PypinyinPhonemizer | pypinyin | IPA, Pinyin |
| English (en) | G2PEnPhonemizer | g2pE | IPA |
| English (en) | OpenPhonemizer | OpenPhonemizer | IPA |
| English (en) | DeepPhonemizer | DeepPhonemizer | IPA, ARPA |
| Galician (gl) | CotoviaPhonemizer | cotovia | IPA, Native Cotovia Phonemes |
| Hebrew (he) | PhonikudPhonemizer | phonikud | IPA |
| Japanese (ja) | OpenJTaklPhonemizer | pyopenjtalk | HEPBURN, KANA |
| Japanese (ja) | CutletPhonemizer | cutlet | HEPBURN, KUNREI, NIHON |
| Japanese (ja) | PyKakasiPhonemizer | pykakasi | HEPBURN, KANA, HIRA |
| Korean (ko) | G2PKPhonemizer | g2pK | IPA, HANGUL |
| Korean (ko) | KoG2PPhonemizer | KoG2P | IPA, HANGUL |
| Persian (fa) | PersianPhonemizer | persian_phonemizer | ERAAB, IPA |
| Vietnamese (vi) | VIPhonemePhonemizer | Viphoneme | IPA |
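
To choose among these options programmatically, the table can be treated as a lookup from language code to candidate class names. The sketch below covers only a subset of the rows and stores class names as plain strings, since the import paths are not given here. The `candidates` helper and the ordering (language-specific entries before multilingual ones) are assumptions for illustration, not phoonnx behaviour.

```python
from typing import Dict, List

# Multilingual phonemizers from the table above (class names as strings).
MULTILINGUAL: List[str] = [
    "ByT5Phonemizer", "CharsiuPhonemizer", "EspeakPhonemizer", "GruutPhonemizer",
    "MisakiPhonemizer", "TransphonePhonemizer", "EpitranPhonemizer",
]

# Subset of the language-specific rows from the table above.
LANGUAGE_SPECIFIC: Dict[str, List[str]] = {
    "mwl": ["MirandesePhonemizer"],
    "ar": ["MantoqPhonemizer"],
    "gl": ["CotoviaPhonemizer"],
    "he": ["PhonikudPhonemizer"],
    "ko": ["G2PKPhonemizer", "KoG2PPhonemizer"],
    "fa": ["PersianPhonemizer"],
    "vi": ["VIPhonemePhonemizer"],
}


def candidates(lang: str) -> List[str]:
    """List candidate phonemizer class names for `lang`, language-specific first."""
    primary = lang.split("-")[0].lower()
    return LANGUAGE_SPECIFIC.get(primary, []) + MULTILINGUAL


print(candidates("ko"))
```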

Credits

phoonnx is built on the shoulders of giants.

Individual languages greatly benefit from domain-specific knowledge. For convenience, phoonnx also bundles code from
