Testing OCR on Handwritten PDFs: Comparing Model Accuracy on English, French, and Hungarian Samples

Oct 2, 2025

Header image source: Getty Images for Unsplash+

Optical character recognition (OCR) of handwritten text remains a demanding task, particularly once the focus shifts beyond English. In this experiment, we assessed eight generative AI models on three handwritten text samples (one each in English, French, and Hungarian) to examine cross-linguistic performance. Accuracy was consistently high for the English sample and, to a large extent, for the French one. The Hungarian text, by contrast, revealed significant variation across models, and only Copilot reproduced it without substantive errors.

Performance Comparison

🏆 Champion as of October 2: Copilot (grade A), followed by Gemini 2.5 Pro, Grok-4, Claude Sonnet 4.5, and Mistral (grade B)

Model	Grade
Copilot	A
Gemini 2.5 Pro	B
Grok-4	B
Claude Sonnet 4.5	B
Mistral	B
DeepSeek-V3.2	C
GPT-5	E
Qwen3-Max	E

The results show notable variation across languages. Copilot was the only model to reproduce English, French, and Hungarian texts accurately. Gemini 2.5 Pro, Grok-4, Claude Sonnet 4.5, and Mistral handled all three languages well overall, though with minor inaccuracies in Hungarian. DeepSeek-V3.2 produced acceptable outputs in English and French but substantially altered the Hungarian text, while GPT-5 and Qwen3-Max did not generate usable results for any language.
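The article does not state how the grades were derived. One common way to quantify OCR accuracy is the character error rate (CER): the edit distance between a model's output and a ground-truth transcription, divided by the length of the ground truth. The following is a minimal sketch under that assumption, not the authors' scoring method. The function names are illustrative, and the text is NFC-normalized first so that Hungarian letters such as "ő" compare equal whether they arrive precomposed or as a base letter plus a combining accent.

```python
import unicodedata


def normalize(text: str) -> str:
    """NFC-normalize and collapse whitespace so layout differences are not scored as errors."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of `hypothesis` against the ground-truth `reference`."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical example: a model drops the accents from two Hungarian vowels.
print(cer("gépi tanulás", "gepi tanulas"))
```

A lower CER means a closer match. Because the denominator is the reference length, the value can exceed 1.0 when the output is much longer than the ground truth.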

Input file

For this experiment, we prepared three separate handwritten PDFs, each containing the same short text about generative AI. The text was reproduced in English, French, and Hungarian versions, allowing us to test how different models handled OCR tasks across languages under comparable conditions.

ocr_handwriting_eng.pdf (149 KB)
ocr_handwriting_fr.pdf (369 KB)
ocr_handwriting_hu.pdf (280 KB)

Prompt

The attached PDF contains handwritten text. Please carefully interpret the handwriting and return the complete text in plain, readable form. Correct any obvious spelling errors caused by handwriting, so that words are accurately recognised. Do not summarise, shorten, or rephrase the content. Preserve the original meaning, order, and structure of the text.

The prompt instructed each model to reproduce the handwritten PDF text without summarising or rephrasing, while correcting obvious spelling errors. The explicit correction instruction was essential: without it, performance—particularly on the Hungarian handwriting—dropped significantly, with models misreading or fragmenting words that could otherwise be reconstructed.
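To check the effect of the correction instruction in a repeatable way, the prompt can be assembled from parts so that the same wording is sent with and without that one sentence. The sketch below is an illustration only: the constant and function names are hypothetical, and sending the prompt to a model is left out because it depends on each provider's interface.

```python
# The three parts of the prompt quoted above, split so one sentence can be toggled.
TASK = (
    "The attached PDF contains handwritten text. Please carefully interpret the "
    "handwriting and return the complete text in plain, readable form."
)
CORRECTION = (
    "Correct any obvious spelling errors caused by handwriting, so that words "
    "are accurately recognised."
)
CONSTRAINTS = (
    "Do not summarise, shorten, or rephrase the content. Preserve the original "
    "meaning, order, and structure of the text."
)


def build_prompt(with_correction: bool = True) -> str:
    """Return the OCR prompt, optionally omitting the spelling-correction sentence."""
    parts = [TASK]
    if with_correction:
        parts.append(CORRECTION)
    parts.append(CONSTRAINTS)
    return " ".join(parts)


full_prompt = build_prompt(with_correction=True)       # the prompt used in this experiment
ablated_prompt = build_prompt(with_correction=False)   # same prompt without the correction sentence
```

Running both variants against the same PDF and comparing the outputs isolates the effect of that single instruction.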

Output

Copilot

Copilot's performance (accessed on 2 October 2025)

Gemini 2.5 Pro, Grok-4, Claude Sonnet 4.5, Mistral

Gemini 2.5 Pro's performance (accessed on 2 October 2025)

DeepSeek-V3.2

GPT-5

Qwen3-Max

Recommendations

This comparison shows that accuracy in handwritten OCR varies strongly by language. While most models produced acceptable results in English and French, Hungarian exposed more pronounced weaknesses. Copilot was the only model to deliver fully accurate outputs across all three texts. The file format also plays a role, and in a follow-up post we will test how performance changes when models are asked to process handwriting from image files rather than PDFs.

The authors used Copilot [Microsoft (2025) Copilot (accessed on 2 October 2025), Large language model (LLM), available at: https://copilot.microsoft.com] to generate the output.