3 Comments

Why is QLoRA better than the base model? Because it has more training?


Nice piece btw, I’ve yet to dig into QA-LoRA and am def hoping it allows for decent bf16 models that can then be AWQ’d.

Author:

QLoRA is (slightly) better at least thanks to the superior quantization data type: in the QLoRA paper, they show that NF4 is more accurate than int4.
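If you want to see the difference yourself, here is a minimal sketch of loading a base model with NF4 via bitsandbytes. The model ID is just a placeholder; swap in whatever you're testing, and switch the quant type to "fp4" to compare against the other 4-bit type bitsandbytes offers.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 is the 4-bit data type introduced in the QLoRA paper.
# Set bnb_4bit_quant_type="fp4" to compare with the other 4-bit type.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,        # optional: also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Placeholder model ID: use whichever base model you want to test.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```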

----

QA-LoRA fine-tunes LoRA adapters on top of already quantized LLMs, so the base LLM can't be fp16 (or bf16). That said, it should be possible to dequantize the final merged model to fp16 and then AWQ it. It might work, and it's an interesting experiment to try.
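The dequantization step is the part you'd have to handle yourself, but once you have an fp16 checkpoint on disk, the AWQ half of the experiment would look roughly like this with AutoAWQ. The paths are placeholders, and this is only a sketch of the idea, not something I've run end to end.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Placeholder path: an fp16 checkpoint obtained by dequantizing
# the merged QA-LoRA model (that step is up to you).
model_path = "qa-lora-merged-fp16"
quant_path = "qa-lora-merged-awq"

quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# AWQ calibrates on a small dataset internally, then packs the 4-bit weights.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```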
