A Survey on Open-Source Turkish Text-to-Speech
Datasets, Models, and the Absence of Emotion
PART 1 OF 3
This survey is the first step toward our Turkish emotional TTS model.
We are building the dataset and model that this research shows is missing.
20 Papers Reviewed
10 Datasets Analyzed
21 Models Verified
0 Emotional Speech Datasets
Abstract
Turkish is an agglutinative language spoken by over 83 million people, yet it remains underserved in text-to-speech synthesis. This survey reviews 20 peer-reviewed papers, 10 community speech datasets on HuggingFace, and 21 open-source Turkish-specific TTS models. We evaluate each resource on three axes: dataset openness, model openness, and emotion support. Our findings show that no peer-reviewed work provides Turkish emotional speech data; the highest-quality system (MOS 4.39) is proprietary; 80% of HuggingFace datasets lack licenses; and community contributors have adopted modern architectures (F5-TTS, Orpheus) that have not appeared in academic Turkish TTS publications. We present these findings without prescriptive bias and let the evidence speak for itself.
Key Findings
No Emotional Speech Data
Across all 20 papers and 10 datasets, not a single resource provides Turkish speech annotated with emotion labels. Two community models (Orpheus) offer non-verbal cue tags (laugh, sigh), but no categorical emotion conditioning exists.
Best System is Closed
The highest reported MOS (4.39) belongs to Turkcell's proprietary system trained on 63 hours from a professional voice actress. No data, code, or weights were released. The best open effort achieves MOS 4.49 but with incomplete artifacts.
Licensing is Unreliable
Only 2 of 10 HuggingFace datasets declare any license. The largest dataset mislabels CC-BY-NC-SA-3.0 as CC-BY-SA-3.0. Multiple models have conflicting license metadata and model card restrictions.
80% Lack Documentation
Eight of ten datasets have empty READMEs with no methodology, source attribution, or quality metrics. Independent verification of data quality is impossible for most resources.
Academic vs. Community Gap
Academic papers use Tacotron 2 and FastSpeech 2 (2023), while community contributors deploy F5-TTS, Orpheus, LLaSA, and Dia (2025). However, none of the 21 community models reports a formal MOS score.
Turkish Lags Behind Tatar
TatarTTS provides 70 hours with MOS 4.54-4.65 under CC-BY-4.0. Turkish, with 15x more speakers, has no equivalent open resource combining dataset, model, and evaluation.
Open-Source Turkish Speech Datasets
| Dataset | Samples | Hours | Sample rate (kHz) | License | Speakers | Emotion labels | Notes |
|---|---|---|---|---|---|---|---|
| Appenlimited/700h-tr | 2,000 | 5.6 | 16 | Unknown | ~20 | No | Mislabeled (says 700h); commercial sample |
| Anilosan15/Turkish_TTS | 30,606 | ~33 | 48 | None | 1 | No | YouTube scrape; copyright concern |
| mukahraman/orpheus-tr | 1,000 | - | 24 | None | - | No | Pre-tokenized; Orpheus-locked |
| falan42/Bentropi (11 parts) | 2,595 | ~10 | 16 | None | 1 | No | YouTube extraction; copyright |
| falan42/Tunc_M | 4,552 | ~10 | 16 | None | 1 | No | YouTube math lectures |
| falan42/Mert-H | 837 | ~3 | 16 | None | 1 | No | YouTube educational; duplicated |
| afkfatih/combined-raw | 81,513 | ~130 | 24 | CC-BY-SA-3.0* | 100s | No | Largest; license mislabeled |
| afkfatih/snac-tokenized | 81,513 | ~130 | 24 | CC-BY-NC-SA | 100s | No | Orpheus-locked |
| Anilosan15/Synthetic | 13,000 | 29 | 16 | CC-BY-4.0 | 4 | No | Fully synthetic; TTS source undisclosed |
| yuserabv/turkish_tts | 30,915 | - | - | None | - | No | SpeechT5 features only; no raw audio |
*Effective license is CC-BY-NC-SA-3.0 due to Khan Academy source component.
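
As a minimal sketch of how the License column can be read, the Python snippet below re-encodes the table rows and lists the datasets whose effective license carries no non-commercial clause. The rows are transcribed from the table above, with the footnoted entry recorded under its effective license; the function names and the rule "an `NC` token means non-commercial" are illustrative assumptions, not part of the survey.

```python
# Rows transcribed from the dataset table above:
# (dataset, declared license, effective license).
# "None" and "Unknown" in the table are stored as Python None.
DATASETS = [
    ("Appenlimited/700h-tr", None, None),
    ("Anilosan15/Turkish_TTS", None, None),
    ("mukahraman/orpheus-tr", None, None),
    ("falan42/Bentropi", None, None),
    ("falan42/Tunc_M", None, None),
    ("falan42/Mert-H", None, None),
    ("afkfatih/combined-raw", "CC-BY-SA-3.0", "CC-BY-NC-SA-3.0"),
    ("afkfatih/snac-tokenized", "CC-BY-NC-SA", "CC-BY-NC-SA"),
    ("Anilosan15/Synthetic", "CC-BY-4.0", "CC-BY-4.0"),
    ("yuserabv/turkish_tts", None, None),
]


def is_non_commercial(license_id):
    """Assumed rule: an 'NC' token in the identifier means NonCommercial."""
    return "NC" in license_id.split("-")


def commercially_usable(rows):
    """Datasets with an effective license that has no NC clause."""
    return [
        name
        for name, _declared, effective in rows
        if effective is not None and not is_non_commercial(effective)
    ]


def mislabeled(rows):
    """Datasets whose declared license differs from the effective one."""
    return [name for name, declared, effective in rows if declared != effective]


print(commercially_usable(DATASETS))
print(mislabeled(DATASETS))
```

Under these assumptions, only Anilosan15/Synthetic passes the filter, and the table notes that this dataset is fully synthetic with an undisclosed TTS source.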
Verified Turkish TTS Models
| Model | Architecture | Size | License | Downloads/month | Emotion | Notes |
|---|---|---|---|---|---|---|
| F5-TTS (Flow Matching + DiT) | | | | | | |
| Orkhon-TTS | F5-TTS | 3.4 GB | Apache 2.0 | 77 | No | Alpha; voice cloning |
| marduk-ra/F5-Turkish | F5-TTS | 4.1 GB | CC-BY-NC | - | No | 3 checkpoints; 24 likes |
| Karayakar/F5-Turkish | F5-TTS | 4.1 GB | MIT | - | No | Demo Space available |
| Orpheus (Llama 3B + SNAC) | | | | | | |
| Karayakar/Orpheus-PT-5000 | Orpheus 3B | 13.2 GB | MIT | 204 | Yes | Emotion tags; most popular |
| Karayakar/Orpheus-GGUF | Orpheus Q5 | 2.4 GB | MIT | 69 | Yes | CPU quantized |
| Cosmobillian/turkish_orpheus | Orpheus F16 | 6.6 GB | Apache 2.0 | 5 | Yes | PT-5000 fine-tune |
| SpeechT5 (~100-145M params) | | | | | | |
| Omarrran/speecht5_tts | SpeechT5 | 0.5 GB | MIT | 107 | No | Intern exercise |
| deryauysal/cv_tr | SpeechT5 | 0.6 GB | MIT | 10 | No | Common Voice |
| Other | | | | | | |
| facebook/mms-tts-tur | VITS 36M | 145 MB | CC-BY-NC | 4,116 | No | Most downloaded overall |
| Anilosan15/kani-tts-400m | LFM2 370M | 740 MB | Apache 2.0 | 282 | No | Academic only |
| SalihHub/karagoz-hacivat | XTTS-v2 | 3.8 GB | CC-BY-NC-SA | 0 | No | Cultural theme |
| Piper tr_TR-dfki | VITS ONNX | 63 MB | MIT | - | No | Only Piper Turkish voice |
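
The model table can be queried in the same spirit. The sketch below, a rough illustration rather than the survey's own tooling, tallies the License column and selects the models that combine a permissive license with the "Yes" entries in the Emotion column. The rows are transcribed from the table above; treating Apache 2.0 and MIT as "permissive" is an assumption of this example, and license metadata alone may be unreliable where a model card adds restrictions (for instance, the Apache 2.0 entry annotated "Academic only").

```python
from collections import Counter

# Rows transcribed from the model table above:
# (model, license, emotion column is "Yes").
MODELS = [
    ("Orkhon-TTS", "Apache 2.0", False),
    ("marduk-ra/F5-Turkish", "CC-BY-NC", False),
    ("Karayakar/F5-Turkish", "MIT", False),
    ("Karayakar/Orpheus-PT-5000", "MIT", True),
    ("Karayakar/Orpheus-GGUF", "MIT", True),
    ("Cosmobillian/turkish_orpheus", "Apache 2.0", True),
    ("Omarrran/speecht5_tts", "MIT", False),
    ("deryauysal/cv_tr", "MIT", False),
    ("facebook/mms-tts-tur", "CC-BY-NC", False),
    ("Anilosan15/kani-tts-400m", "Apache 2.0", False),
    ("SalihHub/karagoz-hacivat", "CC-BY-NC-SA", False),
    ("Piper tr_TR-dfki", "MIT", False),
]

# Assumption of this sketch: these two licenses count as permissive.
PERMISSIVE = {"Apache 2.0", "MIT"}


def license_counts(rows):
    """Number of listed models per declared license."""
    return Counter(license_id for _name, license_id, _emotion in rows)


def permissive_with_tags(rows):
    """Models with a permissive license and a 'Yes' in the Emotion column."""
    return [
        name
        for name, license_id, emotion in rows
        if license_id in PERMISSIVE and emotion
    ]


print(license_counts(MODELS))
print(permissive_with_tags(MODELS))
```

All three models returned by the filter are Orpheus variants, which matches the table: emotion support in the listed models is limited to Orpheus-style tags.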
BibTeX
@article{roxas2026turkishtts,
  title={A Literature Survey on Open-Source Turkish Text-to-Speech: Datasets, Models, and Emotion},
  author={Roxas, Daniel Quillan},
  year={2026}
}