model comparison

Assessing GenAI Models in Retrieving Official NBA Game Data: Capabilities and Limitations

In this post, we present a comparative test of several generative AI models, focusing on their ability to retrieve official sports statistics. The task involved collecting NBA game data through a single prompt, with the expectation that the models would accurately locate, extract, and structure the requested information. While the

Can OpenAI Agent Support Academic Research? A Practical Comparison with Manus.ai and Perplexity

We tested the new OpenAI Agent to assess its usefulness in academic research tasks, comparing it directly with Manus.ai and Perplexity’s research mode. Our aim was to evaluate how effectively each tool finds relevant scholarly and policy sources, navigates restricted websites (including captchas and Cloudflare protections), and allows

Unlocking Large Language Models via API: Capabilities, Access, and Practical Considerations

Accessing large language models (LLMs) through APIs opens up research and development opportunities that go well beyond the limits of traditional chat interfaces. For researchers, API access enables language models to be built directly into bespoke workflows, analytical tools, or automated processes—supporting tasks such as large-scale text analysis, data
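
The workflow described above can be sketched in a few lines. This is a minimal, illustrative example only, assuming the OpenAI Python SDK and an API key in the environment; the model name and classification prompt are placeholders, not recommendations from the post:

```python
# Minimal sketch of driving an LLM through an API inside an analysis loop.
# Assumes the OpenAI Python SDK (`pip install openai`) and an OPENAI_API_KEY
# in the environment; the model name and prompt are illustrative.

def build_prompt(text: str) -> str:
    """Wrap a document in a fixed classification instruction."""
    return f"Reply with one word, positive or negative: {text}"

def classify_texts(texts, model="gpt-4o-mini"):
    """Label each text with one API call apiece (batch for real workloads)."""
    from openai import OpenAI  # deferred so build_prompt works without the SDK
    client = OpenAI()
    labels = []
    for text in texts:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": build_prompt(text)}],
        )
        labels.append(resp.choices[0].message.content.strip().lower())
    return labels
```

The same loop generalises to any of the tasks the excerpt mentions: swap the prompt template and you have large-scale coding, extraction, or summarisation instead of sentiment labelling.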

Benchmarking GenAI Models for Penguin Species Prediction: Grok 3, DeepSeek-V3, and Qwen2.5-Max Delivered Top Results

How well can today’s leading GenAI models classify real-world biodiversity data—without bespoke code or traditional machine learning pipelines? In this study, we benchmarked a range of large language models on the task of predicting penguin species from tabular ecological measurements, including both numerical and categorical variables. Using a

Comparing the FutureHouse Platform’s Falcon Agent and OpenAI’s o3 for Literature Search on Machine Coding for the Comparative Agendas Project

Having previously explored the FutureHouse Platform’s agents in tasks such as identifying tailor-made laws and generating a literature review on legislative backsliding, we now directly compare its Falcon agent and OpenAI’s o3. Our aim was to assess their performance on a focused literature search task: compiling a ranked

Identifying Primary Texts Where Deleuze and Foucault Critique Marxist Theory: A Literature Search Prompt

Can large language models return accurate results when asked to find real academic texts written by specific philosophers on a specific topic? In this case, the topic was Marxist theory — and the instruction was to list peer-reviewed publications in which Gilles Deleuze or Michel Foucault offer a critique of it.

Current Limitations of GenAI Models in Data Visualisation: Lessons from a Model Comparison Case Study

In earlier explorations, we identified the GenAI models that appeared most promising for data visualisation tasks—models that demonstrated strong code generation capabilities, basic support for data wrangling, and compatibility with popular Python libraries such as Matplotlib and Seaborn. In this follow-up case study, we examine a different dimension: rather

Claude 3.7 Sonnet, GPT-4o and GPT-4.5 as Top Choices for Data Visualisation with GenAI

As data visualisation becomes an increasingly integral part of communicating research and insights, the choice of generative AI models capable of handling both code and chart rendering has never been more important. In this post, we explore which models can be reliably used for visual tasks—such as generating Python-based