Researchers have introduced SocialOmni, a benchmarking framework designed to evaluate the audio-visual social interactivity of omni-modal large language models (OLMs). This development addresses a significant gap in existing benchmarks, which focus primarily on static, accuracy-centric tasks and neglect the complex dynamics of human-machine interaction. SocialOmni assesses an OLM's ability to navigate dynamic cues in natural dialogue, a crucial aspect of social interactivity. By integrating audio, vision, and text, OLMs have the potential to redefine human-machine interaction, but they also introduce new security risks, since recent advances in large language models have reshaped both capability and risk surfaces. The security implications of these advancements should therefore be carefully considered, making SocialOmni a valuable tool for evaluating the social interactivity of OLMs and for mitigating potential security threats.