Researchers have shed new light on on-policy distillation, a key technique for fine-tuning large language models, by dissecting its underlying mechanisms and dynamics. The study finds that successful on-policy distillation hinges on a crucial condition: the student and teacher models must exhibit compatible thinking patterns. Even when this condition is met, other factors can still impede the process. This understanding has significant implications for the development and deployment of large language models, as it can inform the design of more effective distillation protocols. By elucidating the intricacies of on-policy distillation, the research can help mitigate risks associated with large language models, such as biased or unstable behavior. For practitioners, these insights can be leveraged to build more robust and reliable language models, enhancing both their performance and their security.
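For readers unfamiliar with the technique itself, the core mechanism of on-policy distillation is that the student samples tokens from its *own* policy while the teacher scores those same contexts, and the student is trained to reduce the divergence between the two distributions. The sketch below is a minimal toy illustration of one such step, not the paper's setup: the `student` and `teacher` next-token distributions, the 3-token vocabulary, and the use of reverse KL as the per-token loss are all illustrative assumptions.

```python
import math
import random

def reverse_kl(student_probs, teacher_probs):
    """Per-token reverse KL D_KL(student || teacher), a common choice of
    distillation loss on the student's own samples (an assumption here)."""
    return sum(p * math.log(p / q)
               for p, q in zip(student_probs, teacher_probs) if p > 0)

def on_policy_distillation_step(student, teacher, context, rng):
    """One conceptual step: the STUDENT generates a token from its own
    distribution (this is what makes it 'on-policy'), and the loss compares
    the student's and teacher's distributions at that same context."""
    s_probs = student(context)   # student's next-token distribution
    t_probs = teacher(context)   # teacher scores the same context
    # Sample from the student's own policy, not the teacher's.
    token = rng.choices(range(len(s_probs)), weights=s_probs)[0]
    loss = reverse_kl(s_probs, t_probs)
    return token, loss

# Hypothetical toy "models": fixed distributions over a 3-token vocabulary.
student = lambda _ctx: [0.5, 0.3, 0.2]
teacher = lambda _ctx: [0.6, 0.3, 0.1]
rng = random.Random(0)

token, loss = on_policy_distillation_step(student, teacher, "prompt", rng)
print(token, round(loss, 4))
```

In a real training loop the loss would be backpropagated through the student; the point of the sketch is only the on-policy structure: the data the student learns from comes from its own sampling distribution, which is where student–teacher compatibility matters.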