Benchmarking Claude 4 Sonnet and GPT-4o for Brain MRI Image Labelling: Comparing Chat Interface and API Results
We assessed the ability of Claude 4 Sonnet and GPT-4o to classify brain MRI images as healthy or tumorous for research labelling purposes, using both the chat interface (10 images) and the API (125 images). Claude 4 Sonnet achieved perfect accuracy (10/10) in chat, but its API refused to