Medical multimodal large language models (MLLMs) have been touted as a revolutionary technology, yet in image classification they are still outperformed by traditional deep learning models. This gap is a significant concern as MLLMs are increasingly applied to medical imaging analysis. Researchers have found that state-of-the-art medical MLLMs consistently underperform traditional models on medical image classification, a fundamental task in the field [1]. This sobering result challenges the hype surrounding MLLMs and raises questions about their effectiveness in real-world applications, with significant implications for the development and deployment of medical imaging analysis systems. For practitioners, the takeaway is that the performance of MLLMs must be carefully evaluated against traditional baselines before integrating them into critical applications.
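To make that evaluation concrete, here is a minimal sketch of a side-by-side accuracy comparison, assuming both models' predictions on a labeled test set have already been collected as class labels (the model names, classes, and values below are hypothetical, for illustration only):

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the ground-truth labels."""
    assert len(preds) == len(labels) and labels, "need matching, non-empty lists"
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

# Hypothetical predictions on a small labeled test set.
labels = ["pneumonia", "normal", "normal", "pneumonia", "normal"]
cnn_preds = ["pneumonia", "normal", "normal", "pneumonia", "pneumonia"]   # traditional CNN
mllm_preds = ["pneumonia", "normal", "pneumonia", "normal", "pneumonia"]  # MLLM, parsed from its text output

print(f"CNN accuracy:  {accuracy(cnn_preds, labels):.2f}")   # 0.80
print(f"MLLM accuracy: {accuracy(mllm_preds, labels):.2f}")  # 0.40
```

In practice the same comparison should be run on a held-out clinical test set with per-class metrics (sensitivity, specificity) rather than overall accuracy alone, since class imbalance is common in medical imaging.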