Gemini’s ‘Audio Overview’ as a Tool for Open Science: Turning Scientific Papers into Accessible Audio

by Rebeka Kiss

Jun 27, 2025

3 min read

Gemini’s ‘Audio Overview’ as a Tool for Open Science: Turning Scientific Papers into Accessible Audio — Source: Getty Images For Unsplash+

Can artificial intelligence make academic research more accessible to non-specialist audiences—or even to busy researchers on the go? Gemini’s new ‘Audio Overview’ feature provides a novel way to experience scientific papers: through short, conversational audio summaries. Available even in the free version of Gemini 2.5 Flash, this tool can be used to support broader engagement with academic content by turning complex texts into plain, easily understandable language. In this post, we demonstrate how we tested the feature on real research articles and reflect on its usefulness for research dissemination in the context of open science—highlighting both its strengths and its limitations.

Prompt

Generate Audio Overview

To use the feature, the prompt must be entered exactly as shown above. At this stage, extending or customising the prompt is impossible — the model will only generate an audio summary when the request is phrased in this fixed form. If users attempt to tailor the content (e.g. by requesting a focus on methods, findings, or specific sections), the model does not return audio but instead provides a text-based response addressing those elements. While this limits interactivity, the default summaries generally offer a coherent and accessible overview of the source text.

Output

We uploaded one of our articles alongside the prompt, and within minutes Gemini produced an audio summary with a podcast-like tone and structure. The output could be listened to directly in the interface and downloaded as an MP3 file — a convenient option for offline use or sharing. A short notice also appears, warning that some content may be missing if the document is too long or complex. In our case, the summary did not attempt to cover every detail, but it successfully conveyed the main message in clear and accessible language. It avoided technical jargon and offered a generalised overview, but without oversimplifying the core argument. From a dissemination perspective, this does not seem to be a drawback — on the contrary, it makes the material more approachable for non-specialist audiences while retaining the substance of the original text. The generated overview can be listened to below.

Gemini 2.5 Flash's performance (accessed on 27 June 2025)

Tailor Made Tyranny The Rise of Hyper Specific Laws and the Erosion of Democracy

0:00

/389.36

In a second test, we attempted to guide the model by specifying which parts of the article we were most interested in — namely, the main findings and the impact section. However, this level of control is currently not supported: instead of generating an audio file, Gemini responded with a conventional text summary focused on the requested aspects. This shows that the Audio Overview feature cannot (yet) be prompted to target specific sections or analytical layers of a paper. At present, the tool only produces audio if the standard, fixed prompt is used without any further instruction. Although the model’s response explicitly stated that it was unable to generate an audio overview, we then re-entered the basic prompt shown above — and this time, as before, an audio version was successfully generated.

Recommendation

Overall, Gemini’s Audio Overview feature offers a simple yet effective means of supporting the broader dissemination of academic research. While it does not allow for customisation or targeted prompting, it reliably produces clear and concise plain language audio summaries that preserve the core message of a paper. This makes it particularly suitable for increasing accessibility among non-specialist audiences, for example in public-facing communication, social media outreach, or research highlights shared via institutional websites or project channels.

The authors used Gemini 2.5 Flash [Google DeepMind (2025) Gemini 2.5 Flash (accessed on 27 June 2025), Large language model (LLM), available at: https://deepmind.google/technologies/gemini/] to generate the output.