When Even a Simple Bar Chart Fails: Testing Gemini 2.0 Flash for Data Visualisation

Apr 14, 2025

Image source: Freepik - EyeEm

With the exception of Claude 3.7 Sonnet, GPT-4o and GPT-4.5 (which we discussed in a separate post), every GenAI model we tested ran into problems when asked to generate a simple bar chart from a CSV file. The limitations varied widely, from difficulties in reading the file and missing visualisation interfaces to incomplete or inaccurate chart outputs. In some cases, models returned code instead of a chart, but even then, not all scripts ran successfully or produced the intended results. In this post, we focus on Gemini 2.0 Flash to illustrate these shortcomings and to better understand the current challenges in using most GenAI tools for basic data visualisation tasks.

Input file

The dataset used in this analysis was sourced from Kaggle and includes detailed information on Nobel Prize winners from 1901 to 2016. It contains variables such as name, gender, birth country, award category, and year of recognition. (See this related post.)

Prompt

The prompt used for this test asked each model to generate a simple bar chart based on the gender distribution of Nobel Prize winners, using a structured CSV file as input. The full version of the prompt, including step-by-step instructions and formatting details, is available in this previous post.
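For reference, the tabulation behind such a chart is small enough to reproduce locally. The following is a minimal sketch, not the prompt or the code used in the test: the column name `Sex`, the label `other/unspecified` for blank values, and the four sample rows are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the uploaded file; the real dataset
# has many more rows and columns. The column name "Sex" is an assumption.
SAMPLE_CSV = """Year,Category,Full Name,Sex
1901,Physics,Laureate A,Male
1903,Physics,Laureate B,Female
1904,Peace,Organisation C,
1905,Chemistry,Laureate D,Male
"""


def count_by_gender(csv_file, column="Sex"):
    """Count rows per gender; blank values are grouped as 'other/unspecified'."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip().lower()
        counts[value or "other/unspecified"] += 1
    return counts


def format_summary(counts):
    """Return one 'label: count' line per category, largest first."""
    return "\n".join(f"{label}: {n}" for label, n in counts.most_common())


counts = count_by_gender(io.StringIO(SAMPLE_CSV))
print(format_summary(counts))
```

To run this against a real file, the `io.StringIO` object can be replaced with a handle from `open(path, newline="", encoding="utf-8")`, adjusting `column` to whatever the file actually uses.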

Output

Gemini 2.0 Flash demonstrated a range of issues even when tackling a very simple visualisation task. In the first attempt, it generated a bar chart where the position for the "male" category appeared, but the corresponding bar was either not rendered or was displayed incorrectly—despite this being by far the most common group in the dataset.

Gemini 2.0 Flash's performance (accessed on 14 April 2025)

After we explicitly pointed this out and asked for a corrected version, the model acknowledged the problem and apologised, yet proceeded to regenerate the exact same flawed chart, still failing to display the "male" data properly.

In the same step, Gemini inserted an external image from ResearchGate, presenting it as the corrected result.

While the visual did relate to Nobel Prize winners, it was not generated from the uploaded CSV file and did not match the task described in the prompt.

Following the earlier issues, we tried an alternative approach: we first asked Gemini to return a plain-text summary of the number of Nobel Prize winners by gender, and then asked it to generate a bar chart based on that summary. The model provided the correct counts (896 male, 50 female, and 26 other or unspecified) in the expected text format. However, the subsequent visualisation step failed once again.

Despite accurately reporting the number of male laureates in the previous step, Gemini failed to represent this information correctly in the chart. The "male" category is absent from both the bar chart and the legend: it is not shown as zero, and it does not appear in the axis labels or annotations.

We also attempted to generate a pie chart based on the previously summarised gender data. However, the resulting chart was again incomplete — the segment representing male winners, who make up over 92% of the dataset, was missing or not rendered correctly.
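The share quoted above follows directly from the counts the model reported. As a quick check, here is a small sketch of the arithmetic; the function name `percentage_shares` and the label `other` are illustrative choices, while the three counts are the ones given in this post.

```python
def percentage_shares(counts):
    """Convert absolute counts into percentage shares of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no rows to summarise")
    return {label: 100 * n / total for label, n in counts.items()}


# Counts as reported by the model in its plain-text summary.
reported = {"male": 896, "female": 50, "other": 26}

shares = percentage_shares(reported)
for label, share in shares.items():
    print(f"{label}: {share:.1f}%")
```

With a total of 972 entries, the male share comes to roughly 92.2%, so a pie chart without that segment omits almost the entire dataset.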

To investigate, we reviewed the generated code, which clearly showed that the model had correctly parsed the data and computed the gender distribution: 896 male, 50 female, and 26 other laureates. This indicates that the issue lies not in the analysis, but in the chart rendering interface, which appears unable to visualise the full result correctly.

The code clearly specifies that the "male" category should be represented in navy blue, alongside "female" in pink and "org" in green. Despite this explicit instruction in the colour mapping, the resulting chart fails to render the "male" segment, indicating a breakdown between the correctly defined data and its visual output.
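For comparison, a chart with this colour mapping can be drawn locally in a few lines. The sketch below is not the code Gemini produced; it assumes matplotlib is installed, reuses the three labels and colours described above, and the function name, axis labels, and title are illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt

# Counts and colours as described in the post; everything else is illustrative.
counts = {"male": 896, "female": 50, "org": 26}
colours = {"male": "navy", "female": "pink", "org": "green"}


def draw_gender_bar_chart(counts, colours):
    """Draw one bar per category, coloured according to the mapping."""
    labels = list(counts)
    fig, ax = plt.subplots(figsize=(6, 4))
    bars = ax.bar(
        labels,
        [counts[label] for label in labels],
        color=[colours[label] for label in labels],
    )
    ax.bar_label(bars)  # print each count above its bar
    ax.set_xlabel("Gender")
    ax.set_ylabel("Number of laureates")
    ax.set_title("Nobel Prize winners by gender")
    return fig, ax, bars


fig, ax, bars = draw_gender_bar_chart(counts, colours)
```

Calling `fig.savefig("nobel_gender_bar.png")` afterwards writes the figure to disk (the file name is arbitrary). Labelling each bar with its count makes a missing or mis-scaled category easier to notice at a glance.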

Recommendation

The example used in this test was deliberately simple and highly visual: a basic gender distribution chart based on a well-structured dataset. This made it easy to detect when something was clearly missing or misrepresented—such as the absence of the dominant "male" category in the chart. However, this kind of failure becomes much harder to spot in more complex visualisations, where patterns are less obvious and data categories more nuanced. A model that misrepresents key elements without alerting the user can lead to misleading conclusions, especially when working with real-world, high-stakes datasets.

For this reason, we recommend treating GenAI-generated charts with caution—particularly when using models that do not offer transparent control over how uploaded data is interpreted and rendered. While models like Claude 3.7 Sonnet and GPT-4o/4.5 have shown reliable performance in these tasks (see this post), Gemini 2.0 Flash currently lacks the stability and accuracy needed even for basic, file-driven visualisation workflows.
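When the plotting code is available, one way to act on this recommendation is to compare what was actually drawn against the underlying counts before trusting the picture. The helper below is a hypothetical sketch: `find_rendering_gaps` is our own name, and the `rendered` dictionary stands in for values read back from a chart.

```python
def find_rendering_gaps(expected, rendered, tolerance=1e-9):
    """Return {label: (expected_value, rendered_value_or_None)} for every
    category that is missing from the chart or drawn with a different value."""
    gaps = {}
    for label, value in expected.items():
        drawn = rendered.get(label)
        if drawn is None or abs(drawn - value) > tolerance:
            gaps[label] = (value, drawn)
    return gaps


expected = {"male": 896, "female": 50, "org": 26}
# Hypothetical read-back from a chart in which the "male" bar was never drawn.
rendered = {"female": 50, "org": 26}

gaps = find_rendering_gaps(expected, rendered)
if gaps:
    print("Chart does not match the data:", gaps)
```

With matplotlib, for instance, the `rendered` dictionary could be built from the heights of the bar patches returned by `Axes.bar`. A check like this catches a silently dropped category even in charts too complex to verify by eye.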

The authors used Gemini 2.0 Flash [Google DeepMind (2025), Gemini 2.0 Flash (accessed on 14 April 2025), Large Language Model (LLM), available at: https://gemini.google.com] to generate the output.
