Optimization Guide

Choosing the Right Model

Tscribe uses Whisper models of various sizes. Choosing the right one depends on your audio quality, language, and how much time you have.

Quick Recommendations

For fast English transcription: use tiny-q5_1 or base-q5_1. They are optimized for English and process audio almost instantly.

For non-English or multilingual audio: use medium-q5_0. With Whisper, reliable multilingual accuracy starts at the medium size, so this is the best choice for Ukrainian, Spanish, and other non-English languages.

For the best overall accuracy: use large-v3-turbo-q5_0. It delivers the same quality as large-v3 at about twice the speed.

For noisy or complex audio: use large-v3. It is the largest and slowest model (~3.1 GB) but excels at deciphering noisy, complex, or multi-speaker recordings.
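The recommendations above can be read as a small decision rule. The sketch below is illustrative only: `recommend_model` and its parameters are hypothetical names, not part of Tscribe, and the order in which the rules are checked (noise first, then language, then speed) is an assumption.

```python
def recommend_model(language: str = "en",
                    noisy: bool = False,
                    prefer_speed: bool = False) -> str:
    """Map the quick recommendations to a model name.

    Hypothetical helper: the rule order is an assumption,
    not something Tscribe itself defines.
    """
    # Noisy, complex, or multi-speaker audio: largest model.
    if noisy:
        return "large-v3"
    # Non-English audio: multilingual accuracy starts at medium.
    if language.lower() not in ("en", "english"):
        return "medium-q5_0"
    # English where speed matters most (tiny-q5_1 is the other option).
    if prefer_speed:
        return "base-q5_1"
    # Otherwise, best overall accuracy.
    return "large-v3-turbo-q5_0"


if __name__ == "__main__":
    print(recommend_model("uk"))
    print(recommend_model("en", prefer_speed=True))
```

Checking noise first means a noisy Ukrainian recording is sent to large-v3 rather than medium-q5_0; swap the order if download size matters more than robustness.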

tiny-q5_1 (~32 MB, English Only)
Fast and small. Good for English voice commands and simple dictation.

base-q5_1 (~60 MB, English Focus)
Optimized for English. Very fast, but may struggle with non-English languages.

small (~190 MB, English Focus)
Good balance for English tasks. More accurate than base, but still not recommended for serious multilingual work.

medium-q5_0 (~539 MB, High Quality, DEFAULT)
Best for Ukrainian and other non-English audio. Well suited to podcasts and YouTube content.

large-v3-turbo-q5_0 (~574 MB, Pro Choice, FASTEST ACCURACY)
Same quality as large-v3 but 2x faster. Best overall accuracy for most tasks.

large-v3 (~3.1 GB / 3095 MB, Maximum Depth)
Maximum accuracy. Slowest, but best for noisy or complex audio with multiple speakers.
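When disk space or download size is the limiting factor, the sizes listed above can be used to filter the options. This is a minimal sketch under stated assumptions: `MODEL_SIZES_MB` and `largest_model_within` are hypothetical names, and the key "small" for the 190 MB model is an assumed name, since the listing does not spell it out.

```python
from typing import Optional

# Approximate sizes in MB, taken from the listing above.
MODEL_SIZES_MB = {
    "tiny-q5_1": 32,
    "base-q5_1": 60,
    "small": 190,  # assumed name for the 190 MB model
    "medium-q5_0": 539,
    "large-v3-turbo-q5_0": 574,
    "large-v3": 3095,
}


def largest_model_within(budget_mb: float) -> Optional[str]:
    """Return the largest listed model that fits the budget, or None."""
    fitting = {name: size for name, size in MODEL_SIZES_MB.items()
               if size <= budget_mb}
    if not fitting:
        return None
    # Pick the entry with the greatest size among those that fit.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    print(largest_model_within(600))
```

Size is only a rough proxy for quality: per the listing, large-v3-turbo-q5_0 matches large-v3 on most tasks at a fraction of the size, so a larger budget does not always justify the larger model.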