Automated generation of structured layouts from natural language descriptions has become feasible with recent advances in Multimodal Large Language Models (MLLMs). However, existing methods that generate code to represent layouts cannot observe the visual outcome of the rendered result, and this lack of visual feedback makes it difficult to guarantee the quality and accuracy of the generated layouts. Researchers have proposed an approach that incorporates visual feedback into an iterative refinement loop for text layout generation: the model perceives the rendered layout and responds to it, improving the overall quality and coherence of the output. This matters to practitioners because it can substantially improve the accuracy and efficiency of automated layout generation, yielding more faithful visual representations of text-based data.
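
The iterative refinement loop described above can be sketched in outline. This is a minimal illustration, not the researchers' implementation: `generate_layout`, `render`, and `visual_critique` are hypothetical stand-ins for an MLLM call, a renderer, and a vision-based evaluator, here replaced by trivial stubs so the control flow is runnable.

```python
# Sketch of visual-feedback-driven layout refinement (hypothetical stubs,
# not a real API): generate a layout, render it, critique the rendered
# image, and feed the critique back into the next generation step.

def generate_layout(description, feedback=None):
    # Stub for an MLLM call; a real system would condition on the
    # description and any prior visual feedback. Here we just add a box
    # each round to simulate progressive refinement.
    n_boxes = 1 if feedback is None else feedback["boxes"] + 1
    return {"boxes": n_boxes}

def render(layout):
    # Stub renderer: a real system would produce an image from layout code.
    return {"image": "<rendered>", "boxes": layout["boxes"]}

def visual_critique(description, rendered):
    # Stub visual evaluator: accepts once the layout has three elements.
    accepted = rendered["boxes"] >= 3
    return accepted, rendered

def refine_layout(description, max_iters=5):
    feedback = None
    layout = None
    for _ in range(max_iters):
        layout = generate_layout(description, feedback)
        rendered = render(layout)
        accepted, feedback = visual_critique(description, rendered)
        if accepted:
            break
    return layout

result = refine_layout("a three-panel poster")
```

The key design point is that the critique operates on the *rendered* output rather than the layout code itself, which is what distinguishes this loop from purely text-based refinement.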