Researchers have introduced 3DCity-LLM, a framework for extending multimodal large language models to 3D city-scale perception and understanding. These models currently struggle in outdoor, city-scale environments; 3DCity-LLM addresses this with a coarse-to-fine feature encoding strategy built on three parallel branches, enabling more effective vision-language processing and a more accurate, comprehensive understanding of complex urban scenes. The development could benefit applications such as urban planning, autonomous vehicles, and smart city infrastructure. For practitioners, it brings multimodal large language models closer to real-world deployment in large-scale, dynamic environments, where they can provide valuable insights and improve decision-making.
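The summary does not specify how the three branches are implemented, so the sketch below is only an illustration of the general idea: pooling point-cloud features at three voxel resolutions in parallel (coarse, medium, fine) and concatenating the results into a token sequence a language model could consume. All function names, resolutions, and dimensions are assumptions, not the authors' implementation.

```python
import numpy as np

def pool_features(points, feats, cell):
    """Average per-point features over a voxel grid with edge length `cell` (hypothetical helper)."""
    keys = np.floor(points / cell).astype(int)          # voxel index of each point
    uniq, inv = np.unique(keys, axis=0, return_inverse=True)
    pooled = np.zeros((len(uniq), feats.shape[1]))
    np.add.at(pooled, inv, feats)                       # sum features per occupied voxel
    counts = np.bincount(inv).reshape(-1, 1)
    return pooled / counts                              # mean feature per voxel

def coarse_to_fine_tokens(points, feats, cells=(32.0, 8.0, 2.0)):
    """Three parallel branches at decreasing voxel sizes; tokens concatenated coarse-to-fine."""
    branches = [pool_features(points, feats, c) for c in cells]
    return np.concatenate(branches, axis=0)

# Toy urban scene: 1000 points in a 100 m x 100 m x 20 m volume with 16-dim features.
rng = np.random.default_rng(0)
pts = rng.uniform([0, 0, 0], [100, 100, 20], size=(1000, 3))
fts = rng.normal(size=(1000, 16))
tokens = coarse_to_fine_tokens(pts, fts)
```

The coarse branch yields a handful of scene-level tokens while the fine branch preserves local detail, which is one plausible reading of "coarse-to-fine" encoding for city-scale input.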