Optical character recognition (OCR) of handwritten text remains a demanding task, particularly once the focus shifts beyond English. In this experiment, we assessed a range of generative AI models on three handwritten text samples—one each in English, French, and Hungarian—to examine cross-linguistic performance. While accuracy was consistently high for the English and, to a large extent, the French sample, the Hungarian text revealed significant variation across models, with only Copilot managing to reproduce it without substantive errors.
Performance Comparison
🏆 Champion as of October 2: Copilot, Gemini 2.5 Pro, Grok-4, Claude Sonnet 4.5, Mistral
Model | Grade |
---|---|
Copilot | A |
Gemini 2.5 Pro | B |
Grok-4 | B |
Claude Sonnet 4.5 | B |
Mistral | B |
DeepSeek-V3.2 | C |
GPT-5 | E |
Qwen3-Max | E |
The results show notable variation across languages. Copilot was the only model to reproduce English, French, and Hungarian texts accurately. Gemini 2.5 Pro, Grok-4, Claude Sonnet 4.5, and Mistral handled all three languages well overall, though with minor inaccuracies in Hungarian. DeepSeek-V3.2 produced acceptable outputs in English and French but substantially altered the Hungarian text, while GPT-5 and Qwen3-Max did not generate usable results for any language.
Input file
For this experiment, we prepared three separate handwritten PDFs, each containing the same short text about generative AI. The text was reproduced in English, French, and Hungarian versions, allowing us to test how different models handled OCR tasks across languages under comparable conditions.

Prompt
The attached PDF contains handwritten text. Please carefully interpret the handwriting and return the complete text in plain, readable form. Correct any obvious spelling errors caused by handwriting, so that words are accurately recognised. Do not summarise, shorten, or rephrase the content. Preserve the original meaning, order, and structure of the text.
The prompt instructed each model to reproduce the handwritten PDF text without summarising or rephrasing, while correcting obvious spelling errors. The explicit correction instruction was essential: without it, performance—particularly on the Hungarian handwriting—dropped significantly, with models misreading or fragmenting words that could otherwise be reconstructed.
Output
Copilot



Gemini 2.5 Pro, Grok-4, Claude Sonnet 4.5, Mistral

DeepSeek-V3.2

GPT-5


Qwen3-Max

Recommendations
This comparison shows that accuracy in handwritten OCR varies strongly by language. While most models produced acceptable results in English and French, Hungarian exposed more pronounced weaknesses. Copilot was the only model to deliver fully accurate outputs across all three texts. The file format also plays a role, and in a follow-up post we will test how performance changes when models are asked to process handwriting from image files rather than PDFs.
The authors used Copilot [Microsoft (2025) Copilot (accessed on 2 October 2025), Large language model (LLM), available at: https://copilot.microsoft.com] to generate the output.