Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less
Researchers have found that using the same optimizer for both pretraining and full finetuning of large language models (LLMs) results in less forgetting of previously learned information, while achieving comparable or better performance on new tasks [1]. The finding suggests that optimizer-model consistency is crucial to balancing learning and forgetting: in the study's experiments, full finetuning with the pretraining optimizer outperformed finetuning with other optimizers on the learning-forgetting tradeoff. For practitioners, this points toward simpler and more effective training protocols, since reusing the pretraining optimizer is a low-cost default that can make finetuned LLMs more reliable in real-world applications.
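The protocol described above is simple to state: carry the pretraining optimizer, with its hyperparameters, into finetuning. As a minimal toy sketch (not the paper's experiment — the scalar "tasks", the SGD-with-momentum optimizer, and all hyperparameters here are hypothetical illustrations), one can picture pretraining and finetuning as two objectives optimized with one identical update rule:

```python
# Toy illustration of optimizer-consistent finetuning: both phases use the
# *same* optimizer (SGD with momentum) and the *same* hyperparameters.
# The quadratic "tasks" and all numbers are hypothetical, chosen only to
# make the protocol concrete.

def sgd_momentum_step(w, v, grad, lr=0.1, mu=0.9):
    """One SGD-with-momentum update; returns the new (parameter, velocity)."""
    v = mu * v - lr * grad
    return w + v, v

def train(w, target, steps, lr=0.1, mu=0.9):
    """Minimize the toy loss (w - target)^2 with SGD + momentum."""
    v = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - target)          # gradient of (w - target)^2
        w, v = sgd_momentum_step(w, v, grad, lr, mu)
    return w

# "Pretraining" drives the parameter toward task A's optimum at 2.0 ...
w = train(0.0, target=2.0, steps=200)
# ... and "finetuning" reuses the identical update rule and hyperparameters
# to adapt toward task B's optimum at 4.0.
w = train(w, target=4.0, steps=200)
print(round(w, 3))
```

The point of the sketch is the structure, not the arithmetic: the mismatched-optimizer baseline in the paper would correspond to swapping in a different update rule (or different hyperparameters) for the second `train` call.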
Why This Matters
Forgetting during finetuning is a central obstacle to adapting pretrained LLMs, and this result suggests a remedy that costs nothing extra: keep the pretraining optimizer and its hyperparameters when finetuning, rather than switching to a different one.
References
- Authors. (2026, May 7). Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less. *arXiv*. https://arxiv.org/abs/2605.06654v1