ditong

A multi-language lexicon toolkit for building cross-language word dictionaries with full metadata tracking.

Quick Start

# Install
cd go && go build -o ditong ./cmd

# Run interactive mode
./ditong

# Or with flags
./ditong --languages en,tr --min-length 5 --max-length 8 --parallel

Features

Multi-language normalization — Turkish, German, French, Spanish → ASCII
Parallel processing — Bounded worker pools with channel-based job distribution
Similarity search — BK-tree for fuzzy matching (~3.7µs per query)
IPA transcription — Rule-based phonetic transcription
Synthesis builder — Cross-language word unions with filtering
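The synthesis builder's core step can be sketched in Go: it merges several languages' word lists into one entry per normalized form and then applies the length filter. This is not the ditong source. `buildUnion`, `Entry`, and `Source` are hypothetical names, the normalizer is passed in as a function, and filtering on the normalized length (rather than the original spelling) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode/utf8"
)

// Source and Entry are hypothetical types for this sketch only.
type Source struct{ Language, Original string }

type Entry struct {
	Normalized string
	Sources    []Source
}

// buildUnion merges per-language word lists into entries keyed by normalized
// form, keeping only words whose normalized rune length lies in [minLen, maxLen].
func buildUnion(words map[string][]string, normalize func(string) string, minLen, maxLen int) []Entry {
	byKey := map[string]*Entry{}

	// Visit languages in sorted order so each entry's source order is stable.
	langs := make([]string, 0, len(words))
	for lang := range words {
		langs = append(langs, lang)
	}
	sort.Strings(langs)

	for _, lang := range langs {
		for _, w := range words[lang] {
			key := normalize(w)
			if n := utf8.RuneCountInString(key); n < minLen || n > maxLen {
				continue // outside the requested length window
			}
			e, ok := byKey[key]
			if !ok {
				e = &Entry{Normalized: key}
				byKey[key] = e
			}
			e.Sources = append(e.Sources, Source{lang, w})
		}
	}

	// Emit entries sorted by key for deterministic output.
	out := make([]Entry, 0, len(byKey))
	for _, e := range byKey {
		out = append(out, *e)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Normalized < out[j].Normalized })
	return out
}

func main() {
	// A toy normalizer covering only a few Turkish letters.
	fold := strings.NewReplacer("ç", "c", "ş", "s", "ğ", "g")
	normalize := func(w string) string { return fold.Replace(strings.ToLower(w)) }

	entries := buildUnion(map[string][]string{
		"en": {"care", "house"},
		"tr": {"çare", "şeker"},
	}, normalize, 4, 5)
	for _, e := range entries {
		fmt.Println(e.Normalized, e.Sources)
	}
}
```

Sorting both the language keys and the final entries keeps the output reproducible. Go randomizes map iteration order, so without the sorts, two runs over the same input could emit entries in different orders.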

Performance

BenchmarkLevenshteinDistance    11M ops    107 ns/op    128 B/op
BenchmarkBKTreeSearch           350K ops   3.7 µs/op    4.1 KB/op

The Levenshtein implementation keeps only two rows of the dynamic-programming matrix, so it needs O(min(n,m)) space instead of the O(n·m) full matrix. BK-tree queries use the triangle inequality to prune whole subtrees and typically visit fewer than 10% of nodes.
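The two-row technique can be sketched in Go. This is an illustrative version, not ditong's code. The function name `levenshtein` is hypothetical, and the sketch compares runes rather than bytes so that an accented letter like `ç` counts as one character.

```go
package main

import "fmt"

// min3 returns the smallest of three ints (avoids relying on the Go 1.21 min builtin).
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein computes edit distance over runes. Only two DP rows are kept,
// sized to the shorter string, so the rows take O(min(n, m)) space instead
// of a full n×m matrix.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	if len(ra) < len(rb) {
		ra, rb = rb, ra // make rb the shorter string
	}
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // cost of building rb[:j] from an empty prefix
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			// deletion, insertion, substitution (or match)
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev // the old row becomes scratch space
	}
	return prev[len(rb)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // 3
	fmt.Println(levenshtein("care", "çare"))      // 1
}
```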

CLI Options

Flag	Default	Description
`--languages`	`en`	Comma-separated language codes
`--min-length`	`5`	Minimum word length
`--max-length`	`8`	Maximum word length
`--parallel`	`true`	Enable parallel processing
`--workers`	`0` (auto)	Number of parallel workers
`--ipa`	`false`	Generate IPA transcriptions
`--cursewords`	`false`	Include profanity dictionaries
`--quiet`	`false`	Suppress progress output
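If the CLI were built on Go's standard `flag` package, the options above could be declared roughly as follows. This is a hypothetical sketch: the `Options` type and `parseOptions` function are illustrative, and ditong may use a different flag library. Only the flag names and defaults come from the table.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// Options collects the documented CLI settings (hypothetical type).
type Options struct {
	Languages            []string
	MinLength, MaxLength int
	Parallel             bool
	Workers              int
	IPA, Cursewords      bool
	Quiet                bool
}

// parseOptions declares each flag with the default listed in the table.
func parseOptions(args []string) (Options, error) {
	fs := flag.NewFlagSet("ditong", flag.ContinueOnError)
	var o Options
	langs := fs.String("languages", "en", "Comma-separated language codes")
	fs.IntVar(&o.MinLength, "min-length", 5, "Minimum word length")
	fs.IntVar(&o.MaxLength, "max-length", 8, "Maximum word length")
	fs.BoolVar(&o.Parallel, "parallel", true, "Enable parallel processing")
	fs.IntVar(&o.Workers, "workers", 0, "Number of parallel workers (0 = auto)")
	fs.BoolVar(&o.IPA, "ipa", false, "Generate IPA transcriptions")
	fs.BoolVar(&o.Cursewords, "cursewords", false, "Include profanity dictionaries")
	fs.BoolVar(&o.Quiet, "quiet", false, "Suppress progress output")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	o.Languages = strings.Split(*langs, ",")
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```

With the standard `flag` package, one or two leading dashes are equivalent. A default-true boolean must be switched off as `--parallel=false`, because `--parallel false` would leave `false` behind as a positional argument.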

How It Works

Normalization: Characters like ç, ş, ğ (Turkish) or ä, ö, ü (German) map to ASCII equivalents. This means care (EN) and çare (TR) become the same identifier, with both sources tracked.
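A minimal sketch of this folding step in Go follows, assuming a simple rune-to-ASCII lookup table applied after lowercasing. The table `foldMap` and the function `normalize` are hypothetical and only partial. ditong's real mapping may differ; for example, it might spell German ü as "ue" rather than "u".

```go
package main

import (
	"fmt"
	"strings"
)

// foldMap is a hypothetical, partial table mapping Turkish, German, French,
// and Spanish letters to ASCII. The real tool's choices may differ.
var foldMap = map[rune]string{
	'ç': "c", 'ş': "s", 'ğ': "g", 'ı': "i", 'ö': "o", 'ü': "u", // Turkish (ö, ü shared with German)
	'ä': "a",                                                   // German
	'é': "e", 'è': "e", 'ê': "e", 'à': "a", 'â': "a",           // French
	'ñ': "n", 'á': "a", 'í': "i", 'ó': "o", 'ú': "u",           // Spanish
}

// normalize lowercases a word and replaces mapped letters with ASCII,
// so words from different languages can share one key.
func normalize(word string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(word) {
		if s, ok := foldMap[r]; ok {
			b.WriteString(s)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(normalize("çare"), normalize("care")) // care care
}
```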

Parallel Build: Uses a bounded worker pool. A fixed number of goroutines pull jobs from a buffered channel instead of spawning one goroutine per job, which keeps memory usage predictable under load.
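The bounded worker-pool pattern can be sketched in Go as shown below. `processAll` is a hypothetical name. Treating `workers <= 0` as "use `runtime.NumCPU()`" is an assumption about what `--workers 0` (auto) means, and the channel buffer size is an arbitrary choice.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// processAll applies fn to every job using at most `workers` goroutines.
// Jobs flow through a buffered channel; workers pull from it until it is
// closed, so the goroutine count stays fixed no matter how many jobs exist.
func processAll(jobs []string, workers int, fn func(string) string) []string {
	if workers <= 0 {
		workers = runtime.NumCPU() // assumed meaning of "0 (auto)"
	}
	type job struct {
		idx  int
		word string
	}
	ch := make(chan job, workers*2) // small buffer keeps memory bounded
	results := make([]string, len(jobs))

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range ch {
				results[j.idx] = fn(j.word) // each index is written by exactly one worker
			}
		}()
	}
	for i, word := range jobs {
		ch <- job{i, word} // blocks when the buffer is full (backpressure)
	}
	close(ch)
	wg.Wait() // all writes to results happen before this returns
	return results
}

func main() {
	out := processAll([]string{"Care", "Çare", "Haus"}, 2, strings.ToLower)
	fmt.Println(out)
}
```

Writing results by index preserves input order without a mutex, because no two workers ever touch the same slice element.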

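Similarity Search: A BK-tree stores each word as a node whose children are keyed by their edit distance to that node. For a query, compute d = distance(query, node). By the triangle inequality, only children whose edge distance k satisfies |k - d| ≤ max_distance can contain matches, so every other subtree is skipped. For small tolerances this makes fuzzy lookup sublinear in practice.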
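A minimal BK-tree with this pruning rule can be sketched in Go. It is an illustration, not ditong's implementation. `BKTree`, `Insert`, and `Search` are hypothetical names, and the sketch bundles a small rune-based Levenshtein function so that it runs on its own.

```go
package main

import "fmt"

// levenshtein is a rune-based, two-row edit distance.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			best := prev[j-1] // substitution or match
			if ra[i-1] != rb[j-1] {
				best++
			}
			if prev[j]+1 < best {
				best = prev[j] + 1 // deletion
			}
			if curr[j-1]+1 < best {
				best = curr[j-1] + 1 // insertion
			}
			curr[j] = best
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// bkNode holds one word; children are keyed by their distance to it.
type bkNode struct {
	word     string
	children map[int]*bkNode
}

// BKTree is a minimal BK-tree over Levenshtein distance (hypothetical API).
type BKTree struct{ root *bkNode }

// Insert follows the edge labelled d(word, node) until it finds a free slot.
func (t *BKTree) Insert(word string) {
	if t.root == nil {
		t.root = &bkNode{word: word, children: map[int]*bkNode{}}
		return
	}
	for cur := t.root; ; {
		d := levenshtein(word, cur.word)
		if d == 0 {
			return // duplicate word
		}
		child, ok := cur.children[d]
		if !ok {
			cur.children[d] = &bkNode{word: word, children: map[int]*bkNode{}}
			return
		}
		cur = child
	}
}

// Search returns every stored word within maxDist of query.
// Result order follows map iteration and is not guaranteed.
func (t *BKTree) Search(query string, maxDist int) []string {
	var out []string
	if t.root == nil {
		return out
	}
	stack := []*bkNode{t.root}
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		d := levenshtein(query, n.word)
		if d <= maxDist {
			out = append(out, n.word)
		}
		// A word w under edge k has d(n, w) = k, so d(query, w) >= |k - d|.
		for k, child := range n.children {
			if k >= d-maxDist && k <= d+maxDist {
				stack = append(stack, child)
			}
		}
	}
	return out
}

func main() {
	var tree BKTree
	for _, w := range []string{"care", "core", "cart", "bare", "house"} {
		tree.Insert(w)
	}
	fmt.Println(len(tree.Search("care", 1))) // 4
}
```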

Output

{
  "normalized": "care",
  "length": 4,
  "sources": [
    {"language": "en", "original_form": "care"},
    {"language": "tr", "original_form": "çare"}
  ],
  "languages": ["en", "tr"],
  "ipa": "/kɛər/"
}
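The record above could be modeled in Go with struct tags that reproduce its field names. This is a hypothetical sketch: the JSON keys come from the sample, but the type names `Entry` and `Source` and the helper `parseEntry` are illustrative. Marking `ipa` as `omitempty` assumes that the field is left out when `--ipa` is off.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Source records where a normalized word came from.
type Source struct {
	Language     string `json:"language"`
	OriginalForm string `json:"original_form"`
}

// Entry mirrors one record of the output JSON.
type Entry struct {
	Normalized string   `json:"normalized"`
	Length     int      `json:"length"`
	Sources    []Source `json:"sources"`
	Languages  []string `json:"languages"`
	IPA        string   `json:"ipa,omitempty"` // assumed absent when --ipa is off
}

// parseEntry decodes one output record.
func parseEntry(data []byte) (Entry, error) {
	var e Entry
	err := json.Unmarshal(data, &e)
	return e, err
}

func main() {
	e := Entry{
		Normalized: "care",
		Length:     4,
		Sources:    []Source{{"en", "care"}, {"tr", "çare"}},
		Languages:  []string{"en", "tr"},
		IPA:        "/kɛər/",
	}
	out, err := json.MarshalIndent(e, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))

	back, err := parseEntry(out) // round-trip check
	if err != nil {
		panic(err)
	}
	fmt.Println(back.Sources[1].OriginalForm) // çare
}
```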

Supported Languages

en, tr, de, fr, es, it, pt, nl, pl, ru

Development

cd go
go test -v ./...
go test -bench=. ./internal/similarity/

License

GPL-3.0 — see LICENSE. Commercial licensing: sales@rahatol.com

sinanm89/ditong

Folders and files

Latest commit

History

Repository files navigation

ditong

Quick Start

Features

Performance

CLI Options

How It Works

Output

Supported Languages

Development

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages