Optimization Guide

Choosing the Right Model

Tscribe uses Whisper models of various sizes. Choosing the right one depends on your audio quality, language, and how much time you have.

Quick Recommendations

Fastest for English

Use tiny-q5_1 or base-q5_1. They are the smallest models (about 32 MB and 60 MB), work best on English, and are the fastest to run.

True Multilingual

Use medium-q5_0. With Whisper, reliable multilingual accuracy starts at the medium size, which makes this the best choice for Ukrainian, Spanish, and other non-English audio.

Maximum Accuracy

Use large-v3-turbo-q5_0. It matches large-v3 in quality at about twice the speed, which gives the best overall accuracy for most tasks.

Noisy Audio

Use large-v3. It is the largest and slowest model (about 3.1 GB), but it handles noisy, complex, or multi-speaker recordings best.
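The quick recommendations above can be read as a small decision rule. The sketch below is only an illustration: `recommend_model` and its parameters are hypothetical names, not part of Tscribe, and the order in which the conditions are checked (noise first, then language, then speed) is an assumption rather than something the guide specifies.

```python
def recommend_model(language: str = "en", noisy: bool = False,
                    prefer_speed: bool = False) -> str:
    """Map the quick recommendations to a model name.

    Hypothetical helper for illustration; not a Tscribe API.
    """
    # Noisy or multi-speaker audio: the guide recommends the full large-v3.
    if noisy:
        return "large-v3"
    # Non-English audio: reliable multilingual accuracy starts at medium.
    if language.lower() != "en":
        return "medium-q5_0"
    # English audio where turnaround time matters most.
    if prefer_speed:
        return "tiny-q5_1"
    # Otherwise, the best overall accuracy for most tasks.
    return "large-v3-turbo-q5_0"


print(recommend_model(language="uk"))
print(recommend_model(language="en", prefer_speed=True))
```

A caller who wants non-English audio at the highest accuracy could still pick large-v3-turbo-q5_0 by hand; the rule above simply follows the guide's headline suggestions.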

Model Comparison

tiny-q5_1

32 MB. Fast and small. Good for English voice commands and simple dictation.

English Only

base-q5_1

60 MB. Optimized for English. Very fast but may struggle with non-English languages.

English Focus

small-q5_1

190 MB. Good balance for English tasks. More accurate than base, but still not recommended for deep multilingual work.

English Focus

medium-q5_0

539 MB (default model). Best for Ukrainian and other non-English audio. Well suited to podcasts and YouTube content.

High Quality

large-v3-turbo-q5_0

574 MB. Same quality as large-v3 but about 2x faster. Best overall accuracy for most tasks.

Pro Choice

large-v3

3095 MB (about 3.1 GB). Maximum accuracy. Slowest, but best for noisy or complex audio with multiple speakers.

Maximum Depth
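When disk space or download size is the limiting factor, the sizes listed in the comparison can drive the choice directly. The following is a minimal sketch: the sizes are copied from the comparison above, while `MODEL_SIZES_MB` and `largest_model_within` are hypothetical names, and treating a larger file as a proxy for higher accuracy is a simplifying assumption.

```python
from typing import Optional

# Sizes in MB, as listed in the model comparison.
MODEL_SIZES_MB = {
    "tiny-q5_1": 32,
    "base-q5_1": 60,
    "small-q5_1": 190,
    "medium-q5_0": 539,
    "large-v3-turbo-q5_0": 574,
    "large-v3": 3095,
}


def largest_model_within(budget_mb: int) -> Optional[str]:
    """Return the largest listed model that fits the budget, or None.

    Hypothetical helper; assumes a bigger model is the better pick.
    """
    fitting = {name: size for name, size in MODEL_SIZES_MB.items()
               if size <= budget_mb}
    if not fitting:
        return None
    # Pick the model name whose size is the greatest among those that fit.
    return max(fitting, key=fitting.get)


print(largest_model_within(600))
print(largest_model_within(100))
```

Size alone does not capture language support, so a result such as base-q5_1 should still be checked against the language of the audio.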