Image-Based Handwriting Recognition: Comparing AI Model Accuracy Across Languages

Image-Based Handwriting Recognition: Comparing AI Model Accuracy Across Languages
Source: Getty Images For Unsplash+

In this follow-up experiment, we tested how generative AI models handle handwritten text when provided as image input rather than PDF. Unlike in the earlier test, all models were able to process the images and return readable outputs.

Performance Comparison

🏆 Champion as of October 2: Copilot, Gemini 2.5 Pro, GPT-5

Model Grade
Copilot A
Gemini 2.5 Pro A
GPT-5 A
Claude Sonnet 4.5 B
Mistral B
DeepSeek-V3.2 B
Grok-4 C
Qwen3-Max C

Three models—GPT-5, Copilot, and Gemini 2.5 Pro—produced fully accurate results across English, French, and Hungarian, with Gemini showing a marked improvement compared to the PDF case. Claude Sonnet 4.5, Mistral, and DeepSeek-V3.2 performed consistently with their earlier results, handling English and French reliably but introducing errors in the Hungarian version. Qwen3-Max also advanced compared to the PDF test, correctly reproducing the English and French texts from the images, though still struggling with Hungarian. Grok-4, by contrast, showed weaker performance on image input than on PDFs, with more substantial rewriting of the Hungarian text.

Input

For this experiment, we used the same three handwritten samples as in the previous test—one each in English, French, and Hungarian—but provided them as image files rather than PDFs. The instruction given to each model was deliberately minimal: “Return the text from the attached image.”

Output

GPT-5

GPT-5's performance (accessed on 2 October 2025)

Gemini 2.5 Pro

Gemini 2.5 Pro's performance (accessed on 2 October 2025)

Claude Sonnet 4.5, Mistral, DeepSeek-V3.2, Qwen3-Max

Claude Sonnet 4.5's performance (accessed on 2 October 2025)

Grok-4

Grok-4's performance (accessed on 2 October 2025)

Recommendations

This comparison underlines that the format of the input file can shape OCR performance. While image-based input proved more universally accessible—enabling all models to return readable text—it also raises practical considerations. For short handwritten notes, supplying images is straightforward, but for longer or multi-page documents the need to scan or photograph each page makes this approach less efficient than extracting text directly from PDFs.

The authors used Copilot [Microsoft (2025) Copilot (accessed on 2 October 2025), Large language model (LLM), available at: https://copilot.microsoft.com] to generate the output.