Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less
Researchers have found that using the same optimizer for both pretraining and full finetuning of large language models (LLMs) results in less forgetting of previously learned information, while achieving comparable or better performance on new tasks [1]. The finding suggests that optimizer-model consistency is crucial to balancing learning and forgetting: in the study's experiments, full finetuning with the pretraining optimizer outperformed finetuning with other optimizers on the learning-forgetting tradeoff. For practitioners, this points toward simpler and more effective training protocols, since reusing the pretraining optimizer is a low-cost default that can make finetuned LLMs more reliable in real-world applications.
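The protocol described above is simple to state: carry the pretraining optimizer, with its hyperparameters, into finetuning. As a minimal toy sketch (not the paper's experiment — the scalar "tasks", the SGD-with-momentum optimizer, and all hyperparameters here are hypothetical illustrations), one can picture pretraining and finetuning as two objectives optimized with one identical update rule:

```python
# Toy illustration of optimizer-consistent finetuning: both phases use the
# *same* optimizer (SGD with momentum) and the *same* hyperparameters.
# The quadratic "tasks" and all numbers are hypothetical, chosen only to
# make the protocol concrete.

def sgd_momentum_step(w, v, grad, lr=0.1, mu=0.9):
    """One SGD-with-momentum update; returns the new (parameter, velocity)."""
    v = mu * v - lr * grad
    return w + v, v

def train(w, target, steps, lr=0.1, mu=0.9):
    """Minimize the toy loss (w - target)^2 with SGD + momentum."""
    v = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - target)          # gradient of (w - target)^2
        w, v = sgd_momentum_step(w, v, grad, lr, mu)
    return w

# "Pretraining" drives the parameter toward task A's optimum at 2.0 ...
w = train(0.0, target=2.0, steps=200)
# ... and "finetuning" reuses the identical update rule and hyperparameters
# to adapt toward task B's optimum at 4.0.
w = train(w, target=4.0, steps=200)
print(round(w, 3))
```

The point of the sketch is the structure, not the arithmetic: the mismatched-optimizer baseline in the paper would correspond to swapping in a different update rule (or different hyperparameters) for the second `train` call.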
Why This Matters
Forgetting during finetuning is a central obstacle to adapting pretrained LLMs, and this result suggests a remedy that costs nothing extra: keep the pretraining optimizer and its hyperparameters when finetuning, rather than switching to a different one.
References
- Authors. (2026, May 7). Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less. *arXiv*. https://arxiv.org/abs/2605.06654v1