Assessing GenAI Models in Retrieving Official NBA Game Data: Capabilities and Limitations

In this post we present a comparative test of several generative AI models, focusing on their ability to retrieve official sports statistics. The task involved collecting NBA game data through a single prompt, with the expectation that the models would accurately locate, extract, and structure the requested information. While the…

Searching for Literature with GenAI: Do the Latest Models Deliver Greater Accuracy?

In our earlier blog post, Harnessing GenAI for Searching Literature: Current Limitations and Practical Considerations, we examined the reliability of generative AI models for scholarly literature searching. To assess whether the newest releases represent any improvement, we tested them on the same narrowly defined academic topic. The results indicate modest…

Evolving File Handling in GenAI Models: Stronger Input Support, Persistent Output Limitations

In a previous blog post, we examined the file handling capabilities of leading GenAI interfaces. That analysis detailed which formats they could process reliably and where they encountered difficulties—particularly with structured data and technical file types. Since then, the landscape has shifted. While downloadable file generation still faces notable constraints…

Can OpenAI Agent Support Academic Research? A Practical Comparison with Manus.ai and Perplexity

We tested the new OpenAI Agent to assess its usefulness in academic research tasks, comparing it directly with Manus.ai and Perplexity’s research mode. Our aim was to evaluate how effectively each tool finds relevant scholarly and policy sources, navigates restricted websites (including captchas and Cloudflare protections), and allows…

Testing Academia.edu’s AI Reviewer: Technical Errors and Template-Based Feedback

Academia.edu has recently introduced an AI-based Reviewer tool, positioned as a solution for generating structured feedback on academic manuscripts. While the concept is promising, our evaluation revealed several significant limitations. We encountered recurring technical issues during both file uploads and Google Docs integration, often requiring multiple attempts…

Testing the Limits of AI Peer Review: When Even Ian Goodfellow Gets Rejected by OpenReviewer

High-quality feedback is essential for researchers aiming to improve their work and navigate the peer review process more effectively. Ideally, such feedback would be available before formal submission—allowing authors to identify the strengths and weaknesses of their research early on. This is precisely the promise of OpenReviewer, an automated…

Slide Generation from Scientific Articles: Putting Manus’s New Slide Generator to the Test

In this post, we examine the performance of Manus’s newly updated slide generation tool when applied to a peer-reviewed scientific article. The developers claim that recent improvements have focused on enhancing the tool’s ability to support academic communication. To test these capabilities, we selected a published study in political science…

Testing Publisher AI Tools for Journal Selection: A Guide for Researchers

As AI-assisted tools become increasingly embedded in academic publishing, most major journal platforms now offer automated systems that claim to recommend suitable outlets based on a manuscript’s abstract. But how well do these tools perform in practice? To explore this, we tested five journal finder platforms — four developed by…

Assessing the FutureHouse Owl Agent’s Ability to Detect Defined Concepts in Academic Research

Following our previous evaluations of the FutureHouse Platform’s research agents, this post turns to Owl, the platform’s tool for precedent and concept detection in academic literature. Owl is intended to help researchers determine whether a given concept has already been defined, thereby streamlining theoretical groundwork and avoiding redundant…

Comparing the FutureHouse Platform’s Falcon Agent and OpenAI’s o3 for Literature Search on Machine Coding for the Comparative Agendas Project

Having previously explored the FutureHouse Platform’s agents in tasks such as identifying tailor-made laws and generating a literature review on legislative backsliding, we now directly compare its Falcon agent and OpenAI’s o3. Our aim was to assess their performance on a focused literature search task: compiling a ranked…