
Fırat becomes Firat
A clean sign turns Fırat into Firat.
Turkish glyph benchmark for image models.
A glyph-level benchmark for Turkish text in AI-generated images. Human labels are the ground truth; AI-judge labels are auxiliary and serve only for scanning the corpus at scale.
The Hugging Face Dataset is the canonical artifact. This Space for fge-auto/dotting-benchmark is a lightweight browser for the leaderboard and example images.
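
To make "glyph-level" concrete: Turkish has two distinct letter pairs, dotless ı/I (U+0131/U+0049) and dotted i/İ (U+0069/U+0130), so Fırat and Firat differ by exactly one code point. The sketch below flags positions where a dotless target came out dotted. It assumes the rendered text has already been transcribed to a string of the same length as the target; the function name `dotting_errors` is hypothetical and is not part of the benchmark's tooling.

```python
import unicodedata

# Dotless target glyph -> the dotted glyph it is commonly confused with.
DOTTED_FOR = {
    "\u0131": "i",       # dotless small ı -> dotted small i
    "I": "\u0130",       # dotless capital I -> dotted capital İ
}


def dotting_errors(target: str, rendered: str) -> list[tuple[int, str, str]]:
    """Return (index, expected, got) for each dotless target glyph rendered dotted.

    Assumes a one-to-one character alignment between target and rendered text.
    """
    # NFC folds "I" + combining dot above (U+0307) into the single code point İ.
    target = unicodedata.normalize("NFC", target)
    rendered = unicodedata.normalize("NFC", rendered)
    if len(target) != len(rendered):
        raise ValueError("target and rendered text must have the same length")
    return [
        (i, t, r)
        for i, (t, r) in enumerate(zip(target, rendered))
        if DOTTED_FOR.get(t) == r
    ]
```

For example, `dotting_errors("Fırat", "Firat")` returns `[(1, "ı", "i")]`, while a faithful rendering returns an empty list.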
Gemini 3.5 Flash labels cover the full corpus and are useful for scanning trends. Final claims should use the human-labeled subset.
| Model | Images | Correct (AI-est.) | Dotted on legible dotless targets | Legible |
|---|---|---|---|---|
| GPT Image 2 | 210 | 97.1% | 1.8% | 100.0% |
| Nano Banana 2 | 210 | 93.3% | 5.4% | 100.0% |
| Nano Banana Pro | 210 | 86.7% | 6.0% | 100.0% |
| GPT Image 1.5 | 210 | 83.8% | 10.7% | 100.0% |
| GPT Image 1 Mini | 207 | 82.6% | 7.9% | 100.0% |
| GPT Image 1 | 210 | 82.4% | 10.1% | 100.0% |
| Grok Imagine Image Quality | 210 | 78.1% | 17.9% | 100.0% |
| Ideogram 4.0 | 210 | 75.2% | 13.9% | 98.6% |
| FLUX.2 [flex] | 210 | 65.7% | 25.0% | 100.0% |
| Krea 2 Medium | 210 | 63.3% | 19.2% | 99.5% |
Curated examples from the benchmark corpus. Human labels are shown where available.

A clean sign turns Fırat into Firat.

The same task is hard, but not impossible.

The sign looks confident. The machine cannot read what it wrote.

The diacritic did not disappear. It fell off.

It translated the word into a scene.

The lights are gorgeous. The middle letter is not.