A Literature Survey on Open-Source Turkish Text-to-Speech: Datasets, Models, and Emotion

Roxas, Daniel Quillan

A Survey on Open-Source Turkish Text-to-Speech

Datasets, Models, and the Absence of Emotion

Daniel Quillan Roxas, Ali Bolat

2026

PART 1 OF 3

This survey is the first step toward our Turkish emotional TTS model.

We are building the dataset and model that this research shows are missing.

Papers Reviewed: 20

Datasets Analyzed: 10

Models Verified: 21

Emotional Speech Datasets: 0

Abstract

Turkish is an agglutinative language spoken by over 83 million people, yet it remains underserved in text-to-speech synthesis. This survey reviews 20 peer-reviewed papers, 10 community speech datasets on HuggingFace, and 21 open-source Turkish-specific TTS models. We evaluate each resource on three axes: dataset openness, model openness, and emotion support. Our findings show that no peer-reviewed work provides Turkish emotional speech data; the highest-quality system (MOS 4.39) is proprietary; 80% of HuggingFace datasets lack licenses; and community contributors have adopted modern architectures (F5-TTS, Orpheus) that have not appeared in academic Turkish TTS publications. We present these findings without prescriptive bias and let the evidence speak for itself.

Key Findings

No Emotional Speech Data

Across all 20 papers and 10 datasets, not a single resource provides Turkish speech annotated with emotion labels. Two community models (Orpheus) offer non-verbal cue tags (laugh, sigh), but no categorical emotion conditioning exists.

Best System is Closed

The highest reported MOS (4.39) belongs to Turkcell's proprietary system trained on 63 hours from a professional voice actress. No data, code, or weights were released. The best open effort achieves MOS 4.49 but with incomplete artifacts.

Licensing is Unreliable

Only 2 of 10 HuggingFace datasets declare any license. The largest dataset is labeled CC-BY-SA-3.0, although its effective license is CC-BY-NC-SA-3.0. Multiple models carry license metadata that conflicts with the restrictions stated in their model cards.

80% Lack Documentation

Eight of ten datasets have empty READMEs with no methodology, source attribution, or quality metrics. Independent verification of data quality is impossible for most resources.

Academic vs. Community Gap

Academic papers use Tacotron 2 and FastSpeech 2 (2023), while community contributors deploy F5-TTS, Orpheus, LLaSA, and Dia (2025). However, none of the 21 community models reports formal MOS scores.
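For context on what a formal MOS report involves: a mean opinion score is the average of listener ratings on a 1 to 5 scale, usually given with a confidence interval. The sketch below is only an illustration under stated assumptions. The ratings are hypothetical, the function name is ours, and the interval uses a normal approximation (z = 1.96); it is not the evaluation protocol of any surveyed paper.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes each rating is on the usual 1-5 scale and that a normal
    approximation is acceptable; small listener panels may call for a
    t-based interval instead.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    mos = statistics.fmean(ratings)
    # Standard error of the mean, scaled by z for the interval half-width.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from eight listeners for one synthesized utterance.
example_ratings = [5, 4, 4, 5, 3, 4, 5, 4]
mos, half_width = mean_opinion_score(example_ratings)
print(f"MOS {mos:.2f} ± {half_width:.2f}")  # MOS 4.25 ± 0.49
```

Reporting the interval alongside the mean matters when comparing systems: scores from different listener panels and test sets are not directly comparable, and a small panel can produce a wide interval.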

Turkish Lags Behind Tatar

TatarTTS provides 70 hours with MOS 4.54-4.65 under CC-BY-4.0. Turkish, with 15x more speakers, has no equivalent open resource combining dataset, model, and evaluation.

Open-Source Turkish Speech Datasets

Dataset	Samples	Hours	Sample rate (kHz)	License	Speakers	Emotion labels	Notes
Appenlimited/700h-tr	2,000	5.6	16	Unknown	~20	No	Mislabeled (says 700h); commercial sample
Anilosan15/Turkish_TTS	30,606	~33	48	None	1	No	YouTube scrape; copyright concern
mukahraman/orpheus-tr	1,000	-	24	None	-	No	Pre-tokenized; Orpheus-locked
falan42/Bentropi (11 parts)	2,595	~10	16	None	1	No	YouTube extraction; copyright
falan42/Tunc_M	4,552	~10	16	None	1	No	YouTube math lectures
falan42/Mert-H	837	~3	16	None	1	No	YouTube educational; duplicated
afkfatih/combined-raw	81,513	~130	24	CC-BY-SA-3.0*	100s	No	Largest; license mislabeled
afkfatih/snac-tokenized	81,513	~130	24	CC-BY-NC-SA	100s	No	Orpheus-locked
Anilosan15/Synthetic	13,000	29	16	CC-BY-4.0	4	No	Fully synthetic; TTS source undisclosed
yuserabv/turkish_tts	30,915	-	-	None	-	No	SpeechT5 features only; no raw audio

*The effective license is CC-BY-NC-SA-3.0 because one source component (Khan Academy) carries the non-commercial restriction.
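The reasoning behind the footnote, that a single non-commercial component restricts the whole combined dataset, can be sketched as a union of restriction flags. This is a deliberately simplified model: the `Terms` class, the `CC_TERMS` lookup, and the `effective_license` function are hypothetical names introduced here, CC-BY-3.0 is included only for illustration, and real license compatibility involves rules this sketch does not capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Terms:
    """Restriction flags of a Creative Commons license (simplified)."""
    attribution: bool
    share_alike: bool
    non_commercial: bool


# Simplified lookup covering the licenses in the footnote, plus CC-BY-3.0
# as an illustrative extra entry (not taken from the survey).
CC_TERMS = {
    "CC-BY-3.0": Terms(True, False, False),
    "CC-BY-SA-3.0": Terms(True, True, False),
    "CC-BY-NC-SA-3.0": Terms(True, True, True),
}


def effective_license(component_licenses):
    """Return the known license matching the union of all restrictions."""
    if not component_licenses:
        raise ValueError("at least one component license is required")
    terms = [CC_TERMS[name] for name in component_licenses]
    # A restriction applies to the combination if any component imposes it.
    combined = Terms(
        attribution=any(t.attribution for t in terms),
        share_alike=any(t.share_alike for t in terms),
        non_commercial=any(t.non_commercial for t in terms),
    )
    for name, candidate in CC_TERMS.items():
        if candidate == combined:
            return name
    raise ValueError("no known license matches the combined terms")


# One CC-BY-SA-3.0 component plus one CC-BY-NC-SA-3.0 component.
print(effective_license(["CC-BY-SA-3.0", "CC-BY-NC-SA-3.0"]))  # CC-BY-NC-SA-3.0
```

Under this model, labeling the combined dataset CC-BY-SA-3.0 drops the non-commercial flag that one of its sources requires, which is the mislabeling the table notes.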

Verified Turkish TTS Models

Model	Architecture	Size	License	Downloads/month	Emotion	Notes
F5-TTS (Flow Matching + DiT)
Orkhon-TTS	F5-TTS	3.4 GB	Apache 2.0	77	No	Alpha; voice cloning
marduk-ra/F5-Turkish	F5-TTS	4.1 GB	CC-BY-NC	-	No	3 checkpoints; 24 likes
Karayakar/F5-Turkish	F5-TTS	4.1 GB	MIT	-	No	Demo Space available
Orpheus (Llama 3B + SNAC)
Karayakar/Orpheus-PT-5000	Orpheus 3B	13.2 GB	MIT	204	Yes	Emotion tags; most popular
Karayakar/Orpheus-GGUF	Orpheus Q5	2.4 GB	MIT	69	Yes	CPU quantized
Cosmobillian/turkish_orpheus	Orpheus F16	6.6 GB	Apache 2.0	5	Yes	PT-5000 fine-tune
SpeechT5 (~100-145M params)
Omarrran/speecht5_tts	SpeechT5	0.5 GB	MIT	107	No	Intern exercise
deryauysal/cv_tr	SpeechT5	0.6 GB	MIT	10	No	Common Voice
Other
facebook/mms-tts-tur	VITS 36M	145 MB	CC-BY-NC	4,116	No	Most downloaded overall
Anilosan15/kani-tts-400m	LFM2 370M	740 MB	Apache 2.0	282	No	Academic only
SalihHub/karagoz-hacivat	XTTS-v2	3.8 GB	CC-BY-NC-SA	0	No	Cultural theme
Piper tr_TR-dfki	VITS ONNX	63 MB	MIT	-	No	Only Piper Turkish voice

BibTeX

@article{roxas2026turkishtts,
  title={A Literature Survey on Open-Source Turkish Text-to-Speech:
         Datasets, Models, and Emotion},
  author={Roxas, Daniel Quillan},
  year={2026}
}