Researchers have introduced EG-VQA, a benchmarking framework for evaluating Video Large Language Models (Video-LLMs) on video question answering, with a focus on whether answers are grounded in temporal evidence. It addresses a gap in existing benchmarks, which mostly score answer correctness without checking whether the answer is supported by the relevant part of the video. Because it also assesses temporal evidence, EG-VQA gives a fuller picture of whether a Video-LLM's answers are actually grounded in the relevant video content [1]. For those developing and deploying Video-LLMs, the benchmark puts evidence-based answer generation on the same footing as correctness. More broadly, it shows why AI systems need more rigorous evaluation methods before they can be trusted in real applications.
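The following minimal Python sketch makes the difference between answer-only and evidence-grounded scoring concrete. It is not EG-VQA's actual protocol, which this summary does not describe. The `Prediction` and `Annotation` records, the `temporal_iou` and `score` functions, the exact-match answer comparison, and the 0.5 IoU threshold are all illustrative assumptions. In this sketch, a prediction counts as grounded only if its answer is correct and the time span it cites overlaps the annotated evidence span sufficiently.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical record types: an answer plus a (start_sec, end_sec) evidence span.
@dataclass
class Prediction:
    answer: str
    evidence: Tuple[float, float]  # span the model cites as support

@dataclass
class Annotation:
    answer: str
    evidence: Tuple[float, float]  # human-annotated supporting span


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection-over-union of two [start, end] time intervals, in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def score(preds, golds, iou_threshold=0.5):
    """Report plain answer accuracy alongside accuracy that also requires grounding.

    Assumption: answers are compared by case-insensitive exact match, and a
    correct answer is 'grounded' only if its evidence IoU meets the threshold.
    """
    n = len(golds)
    answer_correct = 0
    grounded_correct = 0
    for p, g in zip(preds, golds):
        correct = p.answer.strip().lower() == g.answer.strip().lower()
        answer_correct += int(correct)
        if correct and temporal_iou(p.evidence, g.evidence) >= iou_threshold:
            grounded_correct += 1
    return {"answer_acc": answer_correct / n, "grounded_acc": grounded_correct / n}


# Toy data: both answers are right, but only the first cites the right moment.
golds = [Annotation("red", (10.0, 14.0)), Annotation("two", (30.0, 35.0))]
preds = [Prediction("Red", (11.0, 14.0)), Prediction("two", (50.0, 55.0))]

print(score(preds, golds))  # {'answer_acc': 1.0, 'grounded_acc': 0.5}
```

Here both answers match, so answer-only accuracy is 1.0. The second prediction cites a segment that does not overlap the annotated evidence, so grounded accuracy drops to 0.5. Requiring both conditions is one simple way to penalize answers that are right for unsupported reasons.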