Understanding Model Confidence Through Log Probabilities: A Practical OpenAI API Implementation

Log probabilities (logprobs) provide a window into how confident a language model is about its predictions. In this technical implementation, we demonstrate how to access and interpret logprobs via the OpenAI API, using a series of increasingly difficult multiplication tasks. Our experiment reveals that declining confidence scores can effectively signal…
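As a minimal sketch of the idea behind this post: with the OpenAI Python SDK, per-token logprobs can be requested on a chat completion (`logprobs=True`) and converted to linear probabilities. The helper below assumes you already have the list of token logprobs; the SDK access path shown in the docstring is the standard response shape, and the sample values are hypothetical.

```python
import math

def token_confidences(token_logprobs):
    """Convert per-token log probabilities to linear probabilities in [0, 1].

    With the OpenAI Python SDK, these values come from a chat completion
    requested with logprobs=True, e.g.:
        resp = client.chat.completions.create(..., logprobs=True)
        token_logprobs = [t.logprob for t in resp.choices[0].logprobs.content]
    """
    return [math.exp(lp) for lp in token_logprobs]

def min_confidence(token_logprobs):
    """The least-confident token often flags where an answer may go wrong."""
    return min(token_confidences(token_logprobs))

# Hypothetical values: logprob 0.0 means certainty (p = 1.0);
# -2.3 corresponds to roughly p = 0.10.
print(round(min_confidence([0.0, -0.05, -2.3]), 3))
```

Tracking the minimum (or mean) token confidence across answers is one simple way to operationalise "declining confidence" on harder multiplication tasks.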

Testing Claude Projects' New RAG Feature: Fast Setup, Accurate Retrieval Across 113 Articles

Claude Projects recently introduced a Retrieval-Augmented Generation (RAG) feature that extends its capabilities beyond the standard context window. Unlike traditional approaches that require custom vector databases and API integration, Claude's implementation offers a streamlined, no-code solution for handling large knowledge bases directly within the Projects interface. In this…

Building Intelligent Tool Use with the OpenAI API: A Practical Implementation Guide

Tool use (also called function calling) is one of the most powerful capabilities available in modern language models. It allows models to extend beyond text generation by interacting with external systems, databases, or custom logic—making AI agents capable of real-world tasks like checking weather, querying APIs, or running calculations.
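To make the mechanism concrete: a tool is declared to the model as a JSON schema, and when the model requests a call, your code executes the matching local function with the parsed arguments. The sketch below uses the OpenAI function-calling schema shape; the `get_weather` tool and its stubbed return data are hypothetical, and in a real run the name and arguments would come from `resp.choices[0].message.tool_calls`.

```python
import json

# A tool declaration in the OpenAI function-calling schema (hypothetical tool).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Local implementations keyed by tool name (stubbed data for illustration).
IMPLEMENTATIONS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

def dispatch(name, arguments_json):
    """Execute the local function matching a model-requested tool call.

    With the OpenAI SDK, `name` and `arguments_json` would be taken from
    a tool_call object on the assistant message.
    """
    args = json.loads(arguments_json)  # model emits arguments as a JSON string
    return IMPLEMENTATIONS[name](**args)

print(dispatch("get_weather", '{"city": "Budapest"}'))
```

The result of `dispatch` is then sent back to the model as a `tool` role message, letting it compose a final natural-language answer from real data.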

Can AI Models Detect AI-Generated Text? Testing GPT-5.2, Claude Sonnet 4.5, and Gemini 3 on Cover Letters

To assess whether leading AI models can reliably distinguish between human-written and AI-generated text, we tested GPT-5.2, Claude Sonnet 4.5, and Gemini 3 on a set of cover letters. The models were asked to identify AI-generated content and then to rewrite letters to appear more human-written. The results…

Testing OCR on Handwritten PDFs: Comparing Model Accuracy on English, French, and Hungarian Samples

Optical character recognition (OCR) of handwritten text remains a demanding task, particularly once the focus shifts beyond English. In this experiment, we assessed a range of generative AI models on three handwritten text samples – one each in English, French, and Hungarian – to examine cross-linguistic performance. While accuracy was consistently high…

by Rebeka Kiss

Testing Claude Sonnet 4.5 for Academic Slide Design: From Research Papers to Conference Outlines

We tested the new Sonnet 4.5 model on a demanding academic task: transforming a research paper into a structured 15-slide outline for a conference presentation. The prompt required not only conceptual rigour and narrative coherence but also attention to visual communication, with clear guidance on where charts, tables, and…

by Rebeka Kiss

Does the Language of the Prompt Matter? Collecting Hungarian Population Data Using Claude Opus 4.1 and Sonnet 4

When using generative AI for structured data collection, the language of the prompt can make a real difference. In our test with Hungarian population statistics, both Claude Opus 4.1 and Sonnet 4 produced accurate outputs when prompted in Hungarian – but with an English prompt, Sonnet 4 generated rounded figures…

by Rebeka Kiss