Model Comparison

Unlocking Large Language Models via API: Capabilities, Access, and Practical Considerations

Accessing large language models (LLMs) through APIs opens up research and development opportunities that go well beyond the limits of traditional chat interfaces. For researchers, API access enables language models to be built directly into bespoke workflows, analytical tools, or automated processes—supporting tasks such as large-scale text analysis, data…

Benchmarking GenAI Models for Penguin Species Prediction: Grok 3, DeepSeek-V3, and Qwen2.5-Max Delivered Top Results

How well can today’s leading GenAI models classify real-world biodiversity data—without bespoke code or traditional machine learning pipelines? In this study, we benchmarked a range of large language models on the task of predicting penguin species from tabular ecological measurements, including both numerical and categorical variables. Using a…

Comparing the FutureHouse Platform’s Falcon Agent and OpenAI’s o3 for Literature Search on Machine Coding for the Comparative Agendas Project

Having previously explored the FutureHouse Platform’s agents in tasks such as identifying tailor-made laws and generating a literature review on legislative backsliding, we now directly compare its Falcon agent and OpenAI’s o3. Our aim was to assess their performance on a focused literature search task: compiling a ranked…

Identifying Primary Texts Where Deleuze and Foucault Critique Marxist Theory: A Literature Search Prompt

Can large language models return accurate results when asked to find real academic texts written by specific philosophers on a specific topic? In this case, the topic was Marxist theory — and the instruction was to list peer-reviewed publications in which Gilles Deleuze or Michel Foucault offer a critique of it.

Current Limitations of GenAI Models in Data Visualisation: Lessons from a Model Comparison Case Study

In earlier explorations, we identified the GenAI models that appeared most promising for data visualisation tasks—models that demonstrated strong code generation capabilities, basic support for data wrangling, and compatibility with popular Python libraries such as Matplotlib and Seaborn. In this follow-up case study, we examine a different dimension: rather…

Claude 3.7 Sonnet, GPT-4o and GPT-4.5 as Top Choices for Data Visualisation with GenAI

As data visualisation becomes an increasingly integral part of communicating research and insights, the choice of generative AI models capable of handling both code and chart rendering has never been more important. In this post, we explore which models can be reliably used for visual tasks—such as generating Python-based…
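To give a sense of the kind of output such prompts target, here is a minimal sketch of a Python chart of the sort these models are asked to generate. The data values and file name are purely illustrative, not taken from the study:

```python
# A minimal, self-contained chart sketch using Matplotlib.
# The species means below are made-up illustrative values, not study data.
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt

species = ["Adelie", "Chinstrap", "Gentoo"]
mean_flipper_mm = [190, 196, 217]  # illustrative example values

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(species, mean_flipper_mm, color="steelblue")
ax.set_xlabel("Species")
ax.set_ylabel("Mean flipper length (mm)")
ax.set_title("Illustrative bar chart")
fig.tight_layout()
fig.savefig("example_chart.png")
```

A model that handles both code generation and chart rendering should be able to produce and, ideally, preview a figure like this from a single natural-language prompt.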

Does the Language of the Prompt Matter? Exploring GenAI Response Differences by Query Language

The growing use of generative AI tools in research and professional contexts raises important questions about linguistic bias and consistency in model outputs. This short study explores whether the language of a prompt—specifically, English versus Hungarian—affects the content, tone, and legal precision of responses produced by a state-of-the-art…