Large language models trained as autonomous agents typically rely on imitation learning, which teaches them to mimic expert actions without conveying the reasoning behind them. Because training never contrasts successful actions with suboptimal alternatives, the resulting agents cannot judge the quality of their own decisions. To address this limitation, researchers have introduced self-reflection supervision, which compares expert demonstrations against suboptimal actions to give agents a deeper understanding of why one decision is preferable to another. This approach, known as Agentic Critical Training, enables agents to develop a more nuanced awareness of their actions and the consequences of those actions. For practitioners, the payoff is agents that operate with greater autonomy and make better-informed decisions, with significant implications for building more advanced AI systems.
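The exact training objective behind this comparison is not spelled out above, but contrastive supervision of this kind is commonly implemented as a pairwise ranking loss over expert versus suboptimal actions. The sketch below is a minimal, hypothetical illustration of that idea (the function name, the `beta` temperature, and the Bradley-Terry-style form are assumptions, not the authors' stated method): the loss shrinks as the agent assigns higher log-probability to the expert action than to the suboptimal one.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def contrastive_action_loss(
    expert_logprob: float,
    suboptimal_logprob: float,
    beta: float = 1.0,
) -> float:
    """Pairwise loss rewarding the expert action over the suboptimal one.

    Hypothetical sketch of contrastive supervision: the loss is
    -log sigmoid(beta * margin), where margin is the gap between the
    agent's log-probability of the expert action and of the
    suboptimal alternative. A larger margin yields a smaller loss.
    """
    margin = beta * (expert_logprob - suboptimal_logprob)
    return -math.log(sigmoid(margin))


# Preferring the expert action (higher log-prob) gives a lower loss
# than preferring the suboptimal one.
good = contrastive_action_loss(expert_logprob=-1.0, suboptimal_logprob=-3.0)
bad = contrastive_action_loss(expert_logprob=-3.0, suboptimal_logprob=-1.0)
```

In practice the log-probabilities would come from the agent's policy over action tokens, and the loss would be averaged over many (expert, suboptimal) pairs; the scalar version here only shows the shape of the objective.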