Researchers have made significant strides in enhancing the reasoning capabilities of large language models (LLMs) in formal domains, such as mathematics and code, through Reinforcement Learning with Verifiable Rewards (RLVR). However, LLMs still struggle with general reasoning tasks that demand complex capabilities like causal inference and temporal understanding. The SUPERNOVA approach aims to extend RLVR to general reasoning by applying reinforcement learning to natural instructions, with the potential to make LLM reasoning more human-like.

This development is consequential because advances in reinforcement learning are reshaping both the capabilities and the risk surfaces of LLMs, and security implications tend to follow the hype cycle. As LLMs become more sophisticated, their potential vulnerabilities and attack surfaces expand in parallel, so practitioners need to stay informed about the latest developments and what they mean for the security landscape.
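The core idea behind RLVR — granting reward only when an output passes a programmatic check — can be sketched as follows. This is a minimal illustration, not SUPERNOVA's actual implementation: the `#### <answer>` marker convention and the binary 0/1 reward values are assumptions chosen for clarity.

```python
import re

def verifiable_reward(model_output: str, ground_truth: str) -> float:
    """Illustrative RLVR-style reward: 1.0 if the model's final answer
    exactly matches a verifiable ground truth, else 0.0.

    Hypothetical convention: the model marks its final answer with
    '#### <value>', as in some math benchmarks. Real systems use
    task-specific verifiers (unit tests for code, symbolic checkers
    for math, etc.)."""
    match = re.search(r"####\s*(.+)", model_output)
    if match is None:
        return 0.0  # output with no extractable answer earns no reward
    predicted = match.group(1).strip()
    return 1.0 if predicted == ground_truth.strip() else 0.0

# The resulting binary signal would drive a policy-gradient update;
# extending RLVR to general reasoning requires replacing this exact-match
# check with verifiers for open-ended, natural-instruction tasks.
print(verifiable_reward("Step-by-step reasoning... #### 42", "42"))  # 1.0
print(verifiable_reward("I think the answer is 41", "42"))           # 0.0
```

The difficulty SUPERNOVA targets is precisely that tasks like causal inference lack a simple `verifiable_reward`-style checker, which is what makes extending RLVR beyond math and code nontrivial.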