Assessing GenAI Models in Retrieving Official NBA Game Data: Capabilities and Limitations

In this post we present a comparative test of several generative AI models, focusing on their ability to retrieve official sports statistics. The task involved collecting NBA game data through a single prompt, with the expectation that the models would accurately locate, extract, and structure the requested information. While the…

Searching for Literature with GenAI: Do the Latest Models Deliver Greater Accuracy?

In our earlier blog post, Harnessing GenAI for Searching Literature: Current Limitations and Practical Considerations, we examined the reliability of generative AI models for scholarly literature searching. To assess whether the newest releases represent any improvement, we tested them on the same narrowly defined academic topic. The results indicate modest…

Evolving File Handling in GenAI Models: Stronger Input Support, Persistent Output Limitations

In a previous blog post, we examined the file handling capabilities of leading GenAI interfaces. That analysis detailed which formats they could process reliably and where they encountered difficulties—particularly with structured data and technical file types. Since then, the landscape has shifted. While downloadable file generation still faces notable constraints…

Can OpenAI Agent Support Academic Research? A Practical Comparison with Manus.ai and Perplexity

We tested the new OpenAI Agent to assess its usefulness in academic research tasks, comparing it directly with Manus.ai and Perplexity’s research mode. Our aim was to evaluate how effectively each tool finds relevant scholarly and policy sources, navigates restricted websites (including captchas and Cloudflare protections), and allows…

Testing Academia.edu’s AI Reviewer: Technical Errors and Template-Based Feedback

Academia.edu has recently introduced an AI-based Reviewer tool, positioned as a solution for generating structured feedback on academic manuscripts. While the concept is promising, our evaluation revealed several significant limitations. We encountered recurring technical issues during both file uploads and Google Docs integration, often requiring multiple attempts…

Testing the Limits of AI Peer Review: When Even Ian Goodfellow Gets Rejected by OpenReviewer

High-quality feedback is essential for researchers aiming to improve their work and navigate the peer review process more effectively. Ideally, such feedback would be available before formal submission—allowing authors to identify the strengths and weaknesses of their research early on. This is precisely the promise of OpenReviewer, an automated…

Slide Generation from Scientific Articles: Putting Manus’s New Slide Generator to the Test

In this post, we examine the performance of Manus’s newly updated slide generation tool when applied to a peer-reviewed scientific article. The developers claim that recent improvements have focused on enhancing the tool’s ability to support academic communication. To test these capabilities, we selected a published study in political science…

Testing Publisher AI Tools for Journal Selection: A Guide for Researchers

As AI-assisted tools become increasingly embedded in academic publishing, most major journal platforms now offer automated systems that claim to recommend suitable outlets based on a manuscript’s abstract. But how well do these tools perform in practice? To explore this, we tested five journal finder platforms — four developed by…

Assessing the FutureHouse Owl Agent’s Ability to Detect Defined Concepts in Academic Research

Following our previous evaluations of the FutureHouse Platform’s research agents, this post turns to Owl, the platform’s tool for precedent and concept detection in academic literature. Owl is intended to help researchers determine whether a given concept has already been defined, thereby streamlining theoretical groundwork and avoiding redundant…

Comparing the FutureHouse Platform’s Falcon Agent and OpenAI’s o3 for Literature Search on Machine Coding for the Comparative Agendas Project

Having previously explored the FutureHouse Platform’s agents in tasks such as identifying tailor-made laws and generating a literature review on legislative backsliding, we now directly compare its Falcon agent and OpenAI’s o3. Our aim was to assess their performance on a focused literature search task: compiling a ranked…