BLEU: support tokenizer string alias and clarify input edge cases #730

Vangmay · 2026-01-28T01:01:06Z

This PR makes a few small, backward-compatible improvements to the BLEU metric implementation to improve usability and robustness.

Support a string alias for the default tokenizer (e.g. tokenizer="13a") in addition to passing a callable.
Add explicit validation for length mismatches and empty input lists.

These changes do not modify the default BLEU computation or scoring logic and are intended to make behavior clearer and easier to debug for users.

Fixes #729

Add minor improvements to bleu

6499a4d

Provide feedback