Advanced Windows GUI for WhisperX with dynamic batch management, drag-and-drop queue, and speaker diarization support.
A powerful graphical user interface for WhisperX, designed for batch transcription and speaker diarization of audio files. The project is optimized for Windows and supports dynamic queue management.
- 📦 Batch Processing: Add any number of audio files to the queue.
- ⚡ Dynamic Queue: Add or remove files while processing is running.
- 🖱️ Drag-and-Drop: Easily reorder files in the queue by dragging (using the ☰ handle icon).
- 🗣️ Diarization: Automatic speaker identification (requires Hugging Face Token).
- 📝 Word-level Timestamps: Highly accurate timestamps for every single word.
- ⚙️ Flexible Settings:
  - Choose Whisper models (tiny, base, small, medium, large-v2, large-v3).
  - Configure diarization parameters (number of speakers, sensitivity threshold).
  - Manage batch size and chunk size to optimize for your VRAM.
- 📄 Multiple Output Formats: Save results in `.srt`, `.txt`, `.json`, and formatted text modes.
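
To illustrate the `.srt` output mentioned above, here is a minimal sketch that turns timed segments into SubRip text. It assumes each segment is a dict with `start` and `end` in seconds, a `text` string, and an optional `speaker` label. The function names `format_srt_timestamp` and `segments_to_srt` are hypothetical and are not taken from this project's code.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (assumed shape: start, end, text, optional speaker) as SRT."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        text = seg["text"].strip()
        speaker = seg.get("speaker")
        if speaker:
            # Prefix the line with the diarization label when one is present.
            text = f"[{speaker}] {text}"
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)
```

Calling `segments_to_srt` on a list of such dicts returns a string that can be written to a `.srt` file with UTF-8 encoding.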
The project uses uv for automatic Python version management and dependency handling.
- Download the repository.
- Install FFmpeg:
  - Download from ffmpeg.org.
  - Add the `bin` folder path to your system's `PATH` environment variable.
- Run the Installer:
  - Execute `install.bat`. It will automatically download the correct Python version (3.13), create a virtual environment, and install all dependencies.
- Launch the App:
  - Use `run.bat` for daily usage.
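
If the app cannot find FFmpeg after installation, the `PATH` step is the usual suspect. The following is a small, standalone check and is not part of the project's scripts. It uses only the Python standard library, and the helper names `find_tool` and `first_version_line` are hypothetical.

```python
import shutil
import subprocess


def find_tool(name="ffmpeg"):
    """Return the full path of `name` if it is reachable via PATH, else None."""
    return shutil.which(name)


def first_version_line(executable):
    """Run `<executable> -version` and return the first line of its output."""
    result = subprocess.run(
        [executable, "-version"],
        capture_output=True,
        text=True,
        check=False,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


ffmpeg_path = find_tool()
if ffmpeg_path is None:
    print("ffmpeg was not found on PATH; re-check the bin folder entry.")
else:
    print(f"ffmpeg found at {ffmpeg_path}")
    print(first_version_line(ffmpeg_path))
```

Run it from a freshly opened terminal, since changes to `PATH` are normally not picked up by windows that were already open.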
For diarization to work, you need to obtain an access token:
- Create an account on Hugging Face.
- Accept the model licenses (Accept License):
- Create a token in your profile settings (Access Tokens) and paste it into the application ("Diarization Settings" button).
The project includes an automated test suite to verify GUI logic and queue management.
To run tests:

    python -m unittest test_dynamic_queue.py

Note: Tests include mocks for heavy libraries (torch, whisperx), so they can be run even without ML dependencies installed.
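
One common way to mock heavy libraries like this is to place stand-in objects in `sys.modules` before the code under test imports them. The sketch below shows that general technique with `unittest.mock`. It is an assumption about how such tests can be written, not a copy of `test_dynamic_queue.py`, and the class name `HeavyImportTest` is hypothetical.

```python
import sys
import unittest
from unittest.mock import MagicMock, patch


class HeavyImportTest(unittest.TestCase):
    def test_import_with_mocked_libraries(self):
        # Stand-ins for the real packages; nothing is downloaded or loaded.
        fake_modules = {"torch": MagicMock(), "whisperx": MagicMock()}

        # patch.dict restores sys.modules when the block exits.
        with patch.dict(sys.modules, fake_modules):
            import torch      # resolves to the MagicMock above
            import whisperx   # likewise

            # Configure the mock to behave like a machine without a GPU.
            torch.cuda.is_available.return_value = False

            self.assertFalse(torch.cuda.is_available())
            self.assertIs(whisperx, fake_modules["whisperx"])
```

Saved as a test module, this runs with `python -m unittest` in the same way as the project's suite, whether or not torch and whisperx are installed.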
- PyTorch 2.6+ Compatibility: The application includes specific fixes to allow loading model weights in recent PyTorch versions.
- Multi-threading: Audio processing runs in a background thread, keeping the GUI responsive.
- Caching: Hugging Face models are stored inside the `.venv/cache` folder, making it easy to keep your system clean.
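
The combination of a background worker and a queue that can change during processing can be sketched as follows. This is a simplified illustration under stated assumptions, not the application's actual code: `DynamicQueue` and `run_worker` are hypothetical names, and `process` stands in for the transcription call.

```python
import threading


class DynamicQueue:
    """Thread-safe list of pending files that can change while a worker runs."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()

    def add(self, path):
        with self._lock:
            self._items.append(path)

    def remove(self, path):
        """Remove a pending file; return False if it was not in the queue."""
        with self._lock:
            if path in self._items:
                self._items.remove(path)
                return True
            return False

    def move(self, old_index, new_index):
        """Reorder one entry, as a drag-and-drop handler might."""
        with self._lock:
            self._items.insert(new_index, self._items.pop(old_index))

    def pop_next(self):
        """Take the next file to process, or None when the queue is empty."""
        with self._lock:
            return self._items.pop(0) if self._items else None

    def snapshot(self):
        with self._lock:
            return list(self._items)


def run_worker(queue, process, on_done):
    """Process queued files in a daemon thread so the caller is not blocked."""

    def loop():
        while True:
            path = queue.pop_next()
            if path is None:
                break
            on_done(path, process(path))

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Because the worker asks for the next file only when it is ready for it, additions, removals, and reordering made in the meantime take effect naturally. Note that `on_done` runs in the worker thread here. Most GUI toolkits expect widget updates to happen on the main thread, so a real application would typically hand results back through the toolkit's own scheduling mechanism.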
- oiik/win-gui-whisperx: for the original implementation ideas.
- Habr: Batch Transcription | Habr: WhisperX Guide