limitations

Evolving File Handling in GenAI Models: Stronger Input Support, Persistent Output Limitations

In a previous blog post, we examined the file handling capabilities of leading GenAI interfaces. That analysis detailed which formats they could process reliably and where they encountered difficulties—particularly with structured data and technical file types. Since then, the landscape has shifted. While downloadable file generation still faces notable constraints,

Can OpenAI Agent Support Academic Research? A Practical Comparison with Manus.ai and Perplexity

We tested the new OpenAI Agent to assess its usefulness in academic research tasks, comparing it directly with Manus.ai and Perplexity’s research mode. Our aim was to evaluate how effectively each tool finds relevant scholarly and policy sources, navigates restricted websites (including captchas and Cloudflare protections), and allows

Testing Academia.edu’s AI Reviewer: Technical Errors and Template-Based Feedback

Academia.edu has recently introduced an AI-based Reviewer tool, positioned as a solution for generating structured feedback on academic manuscripts. While the concept is promising, our evaluation revealed a number of significant limitations. We encountered recurring technical issues during both file uploads and Google Docs integration, often requiring multiple attempts

Testing the Limits of AI Peer Review: When Even Ian Goodfellow Gets Rejected by OpenReviewer

High-quality feedback is essential for researchers aiming to improve their work and navigate the peer review process more effectively. Ideally, such feedback would be available before formal submission—allowing authors to identify the strengths and weaknesses of their research early on. This is precisely the promise of OpenReviewer, an automated

Slide Generation from Scientific Articles: Putting Manus’s New Slide Generator to the Test

In this post, we examine the performance of Manus’s newly updated slide generation tool when applied to a peer-reviewed scientific article. The developers claim recent improvements focused on enhancing the tool’s ability to support academic communication. To test these capabilities, we selected a published study in political science

Testing Publisher AI Tools for Journal Selection: A Guide for Researchers

As AI-assisted tools become increasingly embedded in academic publishing, most major journal platforms now offer automated systems that claim to recommend suitable outlets based on a manuscript’s abstract. But how well do these tools perform in practice? To explore this, we tested five journal finder platforms — four developed by

Assessing the FutureHouse Owl Agent’s Ability to Detect Defined Concepts in Academic Research

Following our previous evaluations of the FutureHouse Platform’s research agents, this post turns to Owl, the platform’s tool for precedent and concept detection in academic literature. Owl is intended to help researchers determine whether a given concept has already been defined, thereby streamlining theoretical groundwork and avoiding redundant

Comparing the FutureHouse Platform’s Falcon Agent and OpenAI’s o3 for Literature Search on Machine Coding for the Comparative Agendas Project

Having previously explored the FutureHouse Platform’s agents in tasks such as identifying tailor-made laws and generating a literature review on legislative backsliding, we now directly compare its Falcon agent and OpenAI’s o3. Our aim was to assess their performance on a focused literature search task: compiling a ranked

Using Falcon for Writing a Literature Review on the FutureHouse Platform: Useful for Broad Topics, Not for Niche Concepts

The FutureHouse Platform, launched in May 2025, is a domain-specific AI environment designed to support various stages of scientific research. It provides researchers with access to four specialised agents — each tailored to a particular task in the knowledge production pipeline: concise information retrieval (Crow), deep literature synthesis (Falcon), precedent detection

Human- or AI-Generated Text? What AI Detection Tools Still Can’t Tell Us About the Originality of Written Content

Can we truly distinguish between text produced by artificial intelligence and that written by a human author? As large language models become increasingly sophisticated, the boundary between machine-generated and human-crafted writing is growing ever more elusive. Although a range of detection tools claim to identify AI-generated text with high precision,