rsQLoRA: Fine-tune Llama 3 with Higher Ranks and QLoRA
Evaluating the impact of rank-stabilized LoRA on recent LLMs and when using QLoRA
With parameter-efficient fine-tuning methods such as LoRA, we can fine-tune LLMs on consumer hardware. For instance, LoRA makes it possible to fine-tune Llama 3 8B on a 16 GB GPU.
However, LoRA only approximates full fine-tuning. Previous work has shown that a LoRA adapter's capacity to learn is limited by the low-rank nature of the update it applies to the model's weights during fine-tuning.
Increasing the rank, a LoRA hyperparameter, sounds like an intuitive way to raise the rank of the update and potentially improve fine-tuning. In practice, however, increasing the rank alone is often ineffective, and various methods have been proposed to make higher LoRA ranks actually pay off. One such method is Rank-stabilized LoRA (rsLoRA), which is straightforward enough to be supported by many fine-tuning frameworks.
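The change rsLoRA makes is small: standard LoRA scales the low-rank update by lora_alpha / r, while rsLoRA scales it by lora_alpha / sqrt(r), so the update does not shrink as the rank grows. In Hugging Face PEFT, this is a single flag. Here is a minimal sketch, assuming a recent PEFT release that includes the `use_rslora` flag; the rank, alpha, and target modules are illustrative choices, not prescribed values:

```python
# A minimal sketch of enabling rsLoRA in Hugging Face PEFT.
# Rank, alpha, and target modules below are illustrative assumptions.
from peft import LoraConfig

lora_config = LoraConfig(
    r=256,                # a high rank, where standard LoRA scaling tends to underperform
    lora_alpha=256,
    use_rslora=True,      # scale the update by lora_alpha / sqrt(r) instead of lora_alpha / r
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```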
In this article, I explain rsLoRA and apply it to Llama 3 to verify whether it has a positive impact when fine-tuning high-rank adapters.
I’ve also implemented a notebook showing how to use rsLoRA in combination with QLoRA, here:
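As a preview of that setup, here is a minimal sketch of how the two combine: the frozen base model is quantized to 4-bit with bitsandbytes (QLoRA), and a rank-stabilized adapter is trained on top. The model name and hyperparameters are illustrative assumptions, not necessarily the notebook's exact configuration:

```python
# A minimal sketch of rsLoRA + QLoRA, assuming transformers, peft, and
# bitsandbytes are installed. All hyperparameters below are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Meta-Llama-3-8B"  # illustrative; any causal LM works

# QLoRA: load the frozen base model quantized to 4-bit (NF4)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# rsLoRA: a high-rank adapter with rank-stabilized scaling
peft_config = LoraConfig(
    r=256,
    lora_alpha=256,
    use_rslora=True,
    target_modules=["q_proj", "k_proj", "v_proj",
                    "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the adapter's parameters are trainable
```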