Autonomous cyber-defense agents are being trained with reinforcement learning (RL) to counter sophisticated cyber-attacks. These agents take a neurosymbolic approach, combining behavior trees with learning-enabled components (LECs) so that they can adapt and apply security rules while keeping critical operations running. Researchers have proposed a method for learning the Red Agent (attacker) policy from observations, which supports the development of more effective autonomous cyber-defense agents [1]. Learning from observations lets the agents improve their decision-making and strengthens the overall security posture of modern networks. The work is motivated by a shift in threat models from criminal to geopolitical, driven by state-aligned activity, which calls for a different approach to cyber-defense and has made intelligent autonomous cyber-defense agents a critical priority. Because countering sophisticated attacks depends on adapting to new threats as they are observed, this research is a significant step forward for the field.
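To make the pairing of behavior trees with learning-enabled components more concrete, the following is a minimal sketch and not the researchers' implementation. It assumes a tiny behavior tree in which a hand-written symbolic condition guards a leaf whose action is chosen by a learned policy. Every name here is hypothetical and introduced only for illustration: the observation keys `alert` and `compromise_score`, the action names, and `stub_policy`, which stands in for a trained RL policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


# Hypothetical observation format: a flat dict of host-level signals.
Observation = Dict[str, float]


@dataclass
class Condition:
    """Symbolic leaf: succeeds when a hand-written predicate holds."""
    predicate: Callable[[Observation], bool]

    def tick(self, obs: Observation, actions: List[str]) -> Status:
        return Status.SUCCESS if self.predicate(obs) else Status.FAILURE


@dataclass
class Action:
    """Symbolic leaf: always emits one fixed action."""
    name: str

    def tick(self, obs: Observation, actions: List[str]) -> Status:
        actions.append(self.name)
        return Status.SUCCESS


@dataclass
class LearnedAction:
    """LEC leaf: delegates the choice of action to a learned policy."""
    policy: Callable[[Observation], str]

    def tick(self, obs: Observation, actions: List[str]) -> Status:
        actions.append(self.policy(obs))
        return Status.SUCCESS


@dataclass
class Sequence:
    """Composite: runs children in order and fails on the first failure."""
    children: list

    def tick(self, obs: Observation, actions: List[str]) -> Status:
        for child in self.children:
            if child.tick(obs, actions) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


@dataclass
class Selector:
    """Composite: tries children in order and succeeds on the first success."""
    children: list

    def tick(self, obs: Observation, actions: List[str]) -> Status:
        for child in self.children:
            if child.tick(obs, actions) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


def stub_policy(obs: Observation) -> str:
    # Assumption: a threshold rule standing in for a trained RL policy.
    if obs.get("compromise_score", 0.0) > 0.8:
        return "restore_host"
    return "analyse_host"


def build_tree(policy: Callable[[Observation], str]) -> Selector:
    # If an alert is raised, let the learned component pick the response;
    # otherwise fall back to a fixed symbolic default.
    alert_raised = Condition(lambda o: o.get("alert", 0.0) > 0.5)
    return Selector([
        Sequence([alert_raised, LearnedAction(policy)]),
        Action("monitor"),
    ])


def decide(tree: Selector, obs: Observation) -> List[str]:
    """Tick the tree once and return the actions it selected."""
    actions: List[str] = []
    tree.tick(obs, actions)
    return actions
```

In this sketch the symbolic structure decides when the learned component is consulted, so the fallback `monitor` action remains available whenever the guard condition does not hold. Calling `decide(build_tree(stub_policy), obs)` with a hypothetical observation dict returns the list of actions chosen on that tick.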