Benchmarking GenAI Models for Penguin Species Prediction: Grok 3, DeepSeek-V3, and Qwen2.5-Max Delivered Top Results
How well can today’s leading GenAI models classify real-world biodiversity data without bespoke code or traditional machine learning pipelines? In this study, we benchmarked a range of large language models on predicting penguin species from tabular ecological measurements that include both numerical and categorical variables. Using a