PROMPT REVOLUTION

Testing AI Coding Abilities: How GPT 5.2, Opus 4.5, Gemini 3 Pro, and Grok 4.1 Handle Algorithm Problems

As AI models increasingly demonstrate coding capabilities, understanding the reliability and ease of use of AI-written code becomes essential for researchers and practitioners. In this benchmark study, we evaluated four recent models (GPT 5.2 Thinking, Opus 4.5, Gemini 3 Pro, and Grok 4.1 Expert) on two algorithm problems.

by Miklós Sebők - Rebeka Kiss - Dávid László • Dec 2, 2025