Discriminative perception via anchored description is crucial for reasoning segmentation, because current reinforcement learning methods struggle to keep the model focused on the relevant context. Multimodal Large Language Models (MLLMs) are typically optimized with geometric rewards (such as mask or box IoU) to produce explanatory reasoning chains, but these rewards score only the final prediction and do not ensure that the reasoning itself stays anchored to the referred region. As a result, irrelevant context can leak into the chain and degrade segmentation accuracy. Researchers have identified this flaw and are developing more effective ways to guide MLLMs, including anchored descriptions that tie the reasoning to region-specific attributes and thereby improve discriminative perception. These advances also matter for the security of MLLMs: the ability to accurately guide the reasoning process can help mitigate emerging risks. For practitioners, the takeaway is that capability gains achieved through reinforcement learning must be weighed against the security risks those same developments introduce.
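To make the failure mode concrete, here is a minimal sketch of such a reward design, assuming an IoU-style geometric reward combined with a toy keyword-overlap anchoring term. The names (`Rollout`, `anchoring_reward`, `anchor_terms`) and the weights are illustrative assumptions, not the reward function of any specific method; a real anchored-description reward would score the generated description against the referred region rather than match keywords.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

@dataclass
class Rollout:
    """One RL rollout: the generated reasoning chain plus the predicted region."""
    reasoning: str
    pred_box: Box
    gt_box: Box
    anchor_terms: List[str]  # hypothetical: attribute words tied to the referred region

def iou(a: Box, b: Box) -> float:
    """Geometric reward: intersection-over-union of two axis-aligned boxes."""
    def area(r: Box) -> float:
        return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def anchoring_reward(reasoning: str, anchor_terms: List[str]) -> float:
    """Hypothetical anchoring term: the fraction of region-specific attribute
    words that actually appear in the reasoning chain. A chain that never
    mentions the referred region's attributes scores 0 here, even when the
    final mask happens to be correct."""
    if not anchor_terms:
        return 0.0
    text = reasoning.lower()
    return sum(t.lower() in text for t in anchor_terms) / len(anchor_terms)

def combined_reward(r: Rollout, w_geo: float = 0.7, w_anchor: float = 0.3) -> float:
    """Weighted sum: the geometric term alone lets the reasoning drift,
    so the anchoring term penalizes chains that ignore the referred region."""
    return w_geo * iou(r.pred_box, r.gt_box) + w_anchor * anchoring_reward(
        r.reasoning, r.anchor_terms
    )

# A drifting chain: the box is nearly right, but the reasoning never
# mentions the referred object, so the combined reward drops.
drifting = Rollout(
    reasoning="The image shows a busy street with many pedestrians crossing.",
    pred_box=(10, 20, 110, 120),
    gt_box=(12, 22, 108, 118),
    anchor_terms=["red", "car", "leftmost"],
)
print(f"geometric only: {iou(drifting.pred_box, drifting.gt_box):.3f}")  # ~0.922
print(f"with anchoring: {combined_reward(drifting):.3f}")                # ~0.645
```

The point of the example is the gap between the two printed scores: a reasoning chain that drifts away from the referred region can still earn a high geometric reward, which is exactly the failure that an anchoring term is meant to close.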