Researchers have introduced a semi-supervised reinforcement learning approach that elicits medical reasoning through knowledge-enhanced data synthesis, aiming to overcome the scarcity of high-quality reasoning data that has hindered large language models in medical applications. The method departs from standard supervised fine-tuning and reinforcement learning pipelines, which have yielded limited gains in underrepresented domains such as medicine. By synthesizing data from medical knowledge sources, the approach generates high-quality reasoning traces that strengthen a model's ability to reason and make decisions in complex clinical scenarios. The security implications are significant: such techniques reshape both the capability surface and the risk surface of these models. As large language models spread into medical applications, the risks and benefits of their use must be weighed carefully, and approaches like this one may contribute to building models that are both more capable and more secure.
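The core idea of knowledge-enhanced synthesis with a correctness filter can be sketched in a few lines. This is a minimal illustrative toy, not the researchers' implementation: the knowledge base, the `generate_trace` stand-in for an LLM, and the 70% success rate are all assumptions made for demonstration. The semi-supervised element shown here is that candidate reasoning traces are kept only when their final answer agrees with a known label, so no human annotation of the traces themselves is needed.

```python
import random

# Hypothetical seed knowledge: questions paired with known answers.
KNOWLEDGE_BASE = {
    "Which vitamin deficiency causes scurvy?": "vitamin C",
    "Which organ produces insulin?": "pancreas",
}

def generate_trace(question, answer, seed):
    """Stand-in for an LLM drafting a reasoning trace that ends in an answer.
    For illustration it reaches the right answer ~70% of the time."""
    rng = random.Random(seed)
    final = answer if rng.random() < 0.7 else "unknown"
    steps = f"Recall relevant facts about: {question} -> conclude: {final}"
    return {"question": question, "trace": steps, "final_answer": final}

def synthesize_dataset(kb, samples_per_item=8):
    """Sample several candidate traces per item and keep only those whose
    final answer matches the known label (the correctness filter)."""
    kept = []
    for seed_base, (question, answer) in enumerate(kb.items()):
        for sample in range(samples_per_item):
            cand = generate_trace(question, answer, seed=seed_base * 100 + sample)
            if cand["final_answer"] == answer:
                kept.append(cand)
    return kept

dataset = synthesize_dataset(KNOWLEDGE_BASE)
```

In a real system, the surviving traces would then serve as training data for supervised fine-tuning or as reward signal for reinforcement learning; the filter is what lets unlabeled model generations be converted into high-quality reasoning data.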