Using Falcon for Writing a Literature Review on the FutureHouse Platform: Useful for Broad Topics, Not for Niche Concepts

Source: FutureHouse Platform

The FutureHouse Platform, launched in May 2025, is a domain-specific AI environment designed to support various stages of scientific research. It provides researchers with access to four specialised agents — each tailored to a particular task in the knowledge production pipeline: concise information retrieval (Crow), deep literature synthesis (Falcon), precedent detection (Owl), and chemistry workflow support (Phoenix). In an earlier post, we evaluated Crow’s performance in retrieving targeted legal-political content from scientific literature. This time, we turn to Falcon, the platform’s agent dedicated to writing structured literature reviews. Our aim was to assess whether Falcon could construct a conceptually coherent and source-grounded overview of a specific emerging topic: legislative backsliding. While the topic connects to a broader field with growing literature (democratic backsliding), our evaluation revealed clear limitations in Falcon’s ability to capture conceptual specificity and integrate key contributions.

Choosing the Appropriate Agent for Literature Review Tasks

Among the four agents available on the FutureHouse Platform, Falcon – Deep Search appeared the most appropriate choice for our research objective, as it is explicitly designed to produce long, structured literature reviews and support hypothesis development. Our aim was to evaluate its ability to generate a coherent and well-sourced overview of a specific research area with emerging but distinct contours: legislative backsliding.

Prompt 1

To do so, we submitted the following prompt:

Write a comprehensive literature review on the topic of legislative backsliding. The review should define the concept, outline its theoretical foundations, and summarise the main scholarly debates surrounding it.

Please include key authors and landmark studies, categorise the literature based on analytical approaches (e.g. normative, empirical, comparative), and highlight the most relevant findings regarding causes, mechanisms, and consequences of legislative backsliding.

Where applicable, refer to specific regional or country case studies (e.g. Hungary, Poland, United States) and note any gaps or controversies in the current literature.

Use an academic tone and structure the review clearly with subheadings.

FutureHouse Platform's interface (accessed on 18 May 2025)

Output 1

Although Falcon’s output followed an academic structure and cited reputable sources, the review ultimately failed to address the specific task. Instead of providing a focused overview of legislative backsliding, the agent produced a general discussion of democratic backsliding, with only superficial mentions of the legislative dimension. The section titled “Definition of Legislative Backsliding” relied exclusively on Haggard & Kaufman’s (2021) framework, which does not define legislative backsliding as a distinct concept. Crucially, Falcon omitted our 2023 article in Parliamentary Affairs, which introduces and operationalises the term.

FutureHouse Platform's interface (accessed on 18 May 2025)
Source: https://academic.oup.com/pa/article/76/4/741/7194612

Prompt 2

In our follow-up prompt, we explicitly asked Falcon to revise its output to focus on legislative backsliding, rather than the broader category of democratic backsliding. We requested that the literature review centre specifically on the erosion of legislative functions and cite sources that define and analyse this concept directly.

Please revise the literature review to focus specifically on legislative backsliding, rather than democratic backsliding more broadly. The goal is to identify and synthesise scholarly work that defines, conceptualises, and analyses legislative backsliding as a distinct phenomenon — particularly in relation to the erosion of legislative norms, weakening of parliamentary oversight, or the instrumentalisation of law-making processes. Please prioritise sources that explicitly address this dimension, including theoretical contributions and empirical case studies (e.g. Hungary, Poland, United States). If the concept is treated as a subcategory of broader backsliding literature, that relationship should be clarified.

Output 2

Despite the instruction, Falcon’s revised output showed no meaningful improvement. It continued to rely almost exclusively on Haggard and Kaufman’s 2021 work on democratic backsliding, a relevant but insufficient source for our task. No additional or more appropriate references were introduced, and our own 2023 article defining and operationalising legislative backsliding was again omitted. The omission is notable because a previous test using Crow (documented in an earlier post) confirmed that the platform can in fact retrieve and cite this publication. The result remained thematically misaligned and did not satisfy the specific criteria of the prompt.

FutureHouse Platform's interface (accessed on 18 May 2025)

Recommendations

Despite being explicitly designed for in-depth literature reviews, Falcon failed to integrate core sources, misinterpreted the task’s conceptual focus, and repeatedly cited the same general works on democratic backsliding. In contrast, OpenAI’s GPT-4.5 DeepResearch feature, when given the same prompt, produced a more structured, thematically accurate, and well-organised overview.

Of course, GPT-4.5 has its own limitations: it cannot currently generate in-text or footnote-style citations properly and cannot correctly refer to specific page numbers from sources. Nonetheless, given Falcon’s present developmental stage and performance, we recommend relying on the DeepResearch function of GPT-4.5 or similar models, though these, too, should currently be treated as first-draft aids rather than final research products.

GPT-4.5's performance (accessed on 18 May 2025)
The authors used the Falcon agent via FutureHouse Platform [FutureHouse AI (2025), Falcon (accessed on 17 May 2025), General-purpose agent, available at: https://www.futurehouse.org/] to generate the output.