Identifying specific clinical findings in unstructured medical texts is a common challenge in healthcare data science. In this post, we benchmark Google’s Gemini 2.5 Flash language model on a zero-shot classification task: detecting the presence or absence of distal fibula fractures in real-world German radiology reports. Without any examples or domain-specific training, Gemini 2.5 Flash correctly classified all 50 reports, demonstrating that large language models can accurately process and label non-English clinical free text in a purely zero-shot setting.
Input file
For this experiment, we used a publicly available dataset of German radiology reports from Hannover Medical School. From the full collection, we drew a random sample of 50 reports and removed the original binary label column ("Fibula-Fraktur"). The resulting data for each case included five fields: an ID, the composite report text (“Befundtext”), the clinical information (“Klin_Angabe”), the clinical question (“Fragestellung”), and the radiologist’s findings (“Befund”).
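For readers who want to reproduce the sampling step, the sketch below shows one way to do it with pandas. It is only illustrative: the file names, the random seed, and the “ID” column name are assumptions, while “Fibula-Fraktur” matches the dataset’s label column described above.

import pandas as pd

# Load the full report collection (file name is hypothetical).
df = pd.read_csv("hannover_reports.csv")

# Draw a reproducible random sample of 50 reports.
sample = df.sample(n=50, random_state=42)

# Keep the ground truth aside for later evaluation, then drop the label
# column so the model only sees unlabelled free text.
# "ID" stands for the identifier field; the exact column name may differ.
ground_truth = sample[["ID", "Fibula-Fraktur"]]
unlabelled = sample.drop(columns=["Fibula-Fraktur"])

unlabelled.to_csv("sample_unlabelled.csv", index=False)
ground_truth.to_csv("ground_truth.csv", index=False)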

Prompt
The model was instructed to review each radiology report and determine whether the findings described a fracture of the distal fibula, assigning a binary label (“TRUE” or “FALSE”) to each entry. No examples or explanatory notes were provided; the model had to interpret the German clinical free text without prior domain adaptation or further guidance. Our aim was to test whether Gemini 2.5 Flash could accurately identify reports describing distal fibula fractures from the plain text alone, mirroring the typical challenge in biomedical research of working with unlabelled clinical data.
You are a medical expert in radiology report analysis. Below is a table with several columns, including free-text German radiology findings ("Befund").
For each row, assess whether the "Befund" text describes a distal fibula fracture.
If the report mentions a fracture of the distal fibula, label that row as "TRUE" in the last column.
If there is no mention of a distal fibula fracture, label the row as "FALSE" in the last column.
Return the original table with an additional column (F), filling it only with "TRUE" or "FALSE" for each row. Do not add any explanation or comments.
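For reference, a classification run like ours can be scripted against the Gemini API. The sketch below uses Google’s google-genai Python SDK; the file names are hypothetical and the API key is assumed to be set in the environment.

import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# The instruction shown above, sent together with the unlabelled table.
instruction = (
    'You are a medical expert in radiology report analysis. Below is a table '
    'with several columns, including free-text German radiology findings ("Befund"). '
    'For each row, assess whether the "Befund" text describes a distal fibula fracture. '
    'If the report mentions a fracture of the distal fibula, label that row as "TRUE" '
    'in the last column. If there is no mention of a distal fibula fracture, label the '
    'row as "FALSE" in the last column. Return the original table with an additional '
    'column (F), filling it only with "TRUE" or "FALSE" for each row. '
    'Do not add any explanation or comments.'
)

with open("sample_unlabelled.csv", encoding="utf-8") as f:
    table_text = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=instruction + "\n\n" + table_text,
)
print(response.text)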
Output
Gemini 2.5 Flash correctly classified all 50 German radiology reports for the presence or absence of a distal fibula fracture, matching the original ground-truth labels exactly (50/50). The output contained no ambiguous or malformed responses: every case received a clear binary label (“TRUE” or “FALSE”). This was achieved without prior exposure to the dataset or any language-specific examples, even though the free-text reports contained domain-specific terminology and varied phrasing.


The model returned its results in a tabular format that could be exported directly to a spreadsheet, allowing rapid review, sharing, and further analysis.
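To quantify agreement with the held-out labels, the returned table can be read back into pandas and compared row by row. A minimal sketch, assuming the model’s table was saved as a CSV, that both files share an “ID” column, and that labels are stored as “TRUE”/“FALSE” strings:

import pandas as pd

pred = pd.read_csv("model_output.csv")    # the table returned by the model
truth = pd.read_csv("ground_truth.csv")   # the labels removed before prompting

merged = pred.merge(truth, on="ID")

# Normalize both label columns to booleans before comparing.
y_pred = merged["F"].astype(str).str.strip().str.upper().eq("TRUE")
y_true = merged["Fibula-Fraktur"].astype(str).str.strip().str.upper().eq("TRUE")

accuracy = (y_pred == y_true).mean()
print(f"Accuracy on {len(merged)} reports: {accuracy:.2%}")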

Recommendations
Our findings suggest that large language models such as Gemini 2.5 Flash can accurately extract clinically relevant information from unstructured radiology reports, even when working with non-English datasets and without domain-specific training. For biomedical research and clinical audit projects involving free-text clinical records, generative AI models now offer a practical approach for initial data annotation or cohort selection. However, it remains important to validate model outputs against expert review, especially before integrating such results into downstream analyses or clinical workflows.
Note: Do not use GenAI models for clinical diagnosis or medical decision-making!
The authors used Gemini 2.5 Flash [Google DeepMind (2025). Gemini 2.5 Flash, large language model (LLM). Available at: https://deepmind.google/technologies/gemini/ (accessed 2 June 2025)] to generate the output.