Test-time finetuning of large language models can be significantly accelerated through convex reconstruction and gradient caching, making it more viable for real-time applications. Test-time finetuning adapts a model to each individual prompt by retrieving relevant data and finetuning on it, and both steps add computational overhead at inference time. The proposed method targets both: convex reconstruction improves the retrieval step, and gradient caching reduces the computational cost of the finetuning step [1]. For practitioners, the takeaway is that this approach can potentially mitigate the trade-off between speed and quality in test-time finetuning, leading to more responsive and adaptable language model deployments across a range of natural language processing tasks.
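To make the two ideas concrete, here is a minimal toy sketch in pure Python. It is not the method from the cited work, whose details are not given here. It assumes that "convex reconstruction" means approximating the prompt embedding as a convex combination of candidate embeddings (solved here with Frank-Wolfe, a swapped-in generic solver), and that "gradient caching" means storing per-example gradients so repeated retrievals do not recompute them. The names `convex_reconstruction`, `GradientCache`, and `weighted_update` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def convex_reconstruction(query, candidates, steps=200):
    """Find simplex weights w minimizing ||sum_i w_i * c_i - query||^2.

    Assumption: Frank-Wolfe with exact line search stands in for whatever
    solver the actual method uses.
    """
    n, dim = len(candidates), len(query)
    w = [1.0 / n] * n  # start at the simplex center
    for _ in range(steps):
        recon = [sum(w[i] * candidates[i][k] for i in range(n)) for k in range(dim)]
        resid = [r - q for r, q in zip(recon, query)]
        grad = [2.0 * dot(c, resid) for c in candidates]
        j = min(range(n), key=grad.__getitem__)  # best simplex vertex
        # Moving w toward vertex j moves the reconstruction toward c_j.
        d = [candidates[j][k] - recon[k] for k in range(dim)]
        denom = dot(d, d)
        if denom == 0.0:
            break
        gamma = max(0.0, min(1.0, -dot(resid, d) / denom))  # exact line search
        if gamma == 0.0:
            break  # no descent direction left inside the simplex
        w = [(1.0 - gamma) * wi for wi in w]
        w[j] += gamma
    return w


class GradientCache:
    """Memoizes per-example gradients (hypothetical helper).

    Assumption: gradients are taken at the fixed base-model parameters, so a
    cached value is only valid for the first step away from the base model.
    """

    def __init__(self, grad_fn):
        self._grad_fn = grad_fn
        self._store = {}
        self.misses = 0

    def get(self, example_id):
        if example_id not in self._store:
            self._store[example_id] = self._grad_fn(example_id)
            self.misses += 1
        return self._store[example_id]


def weighted_update(theta, weights, cache, lr=0.1):
    """One gradient step using the reconstruction weights and cached gradients."""
    step = [0.0] * len(theta)
    for i, w in enumerate(weights):
        if w > 0:  # examples with zero weight are never touched
            g = cache.get(i)
            step = [s + w * gk for s, gk in zip(step, g)]
    return [t - lr * s for t, s in zip(theta, step)]


if __name__ == "__main__":
    cands = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights = convex_reconstruction([0.5, 0.5], cands)
    cache = GradientCache(lambda i: [float(i), 1.0])  # stand-in gradient function
    print(weights, weighted_update([0.0, 0.0], weights, cache))
```

In this sketch the cost saving comes from two places: examples that receive zero weight are skipped entirely, and any example retrieved for more than one prompt has its gradient computed only once.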