Model Comparison

Comparing the FutureHouse Platform’s Falcon Agent and OpenAI’s o3 for Literature Search on Machine Coding for the Comparative Agendas Project

Having previously explored the FutureHouse Platform’s agents in tasks such as identifying tailor-made laws and generating a literature review on legislative backsliding, we now directly compare its Falcon agent and OpenAI’s o3. Our aim was to assess their performance on a focused literature search task: compiling a ranked…

Identifying Primary Texts Where Deleuze and Foucault Critique Marxist Theory: A Literature Search Prompt

Can large language models return accurate results when asked to find real academic texts written by specific philosophers on a specific topic? In this case, the topic was Marxist theory, and the instruction was to list peer-reviewed publications in which Gilles Deleuze or Michel Foucault offers a critique of it.

Current Limitations of GenAI Models in Data Visualisation: Lessons from a Model Comparison Case Study

In earlier explorations, we identified the GenAI models that appeared most promising for data visualisation tasks—models that demonstrated strong code generation capabilities, basic support for data wrangling, and compatibility with popular Python libraries such as Matplotlib and Seaborn. In this follow-up case study, we examine a different dimension: rather…

Claude 3.7 Sonnet, GPT-4o and GPT-4.5 as Top Choices for Data Visualisation with GenAI

As data visualisation becomes an increasingly integral part of communicating research and insights, the choice of generative AI models capable of handling both code and chart rendering has never been more important. In this post, we explore which models can be reliably used for visual tasks—such as generating Python-based…

Does the Language of the Prompt Matter? Exploring GenAI Response Differences by Query Language

The growing use of generative AI tools in research and professional contexts raises important questions about linguistic bias and consistency in model outputs. This short study explores whether the language of a prompt—specifically, English versus Hungarian—affects the content, tone, and legal precision of responses produced by a state-of-the-art…

Harnessing GenAI for Searching Literature: Current Limitations and Practical Considerations

Collecting and reviewing relevant literature is often one of the most time-consuming parts of academic research. We tested whether this process could be made faster using a single prompt with different generative AI models. To explore this, we prompted each model to return accurate academic articles on ‘legislative backsliding’. The outcome…