2 Comments

Why not target the attention q, k, and v in LoRA?

author

Meta targets q and v in their "recipes", but the Platypus team observed better results with the gate, down, and up projections. I didn't run a scientific evaluation of whether this is indeed better, but I can confirm it performs better than targeting only q and v.

Platypus motivates this choice by referring to the work of He et al. (2022, https://arxiv.org/pdf/2110.04366v3.pdf), who obtained better results with this configuration.
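For illustration, here is a minimal sketch of how such a configuration could be set up with Hugging Face PEFT. It assumes a LLaMA-style checkpoint whose MLP projections are named gate_proj, down_proj, and up_proj; the rank, alpha, and dropout values are illustrative, not the ones used in the post.

```python
# Minimal sketch: a LoRA config targeting the MLP projections
# (gate, down, up) instead of the attention q and v projections.
# Module names assume a LLaMA-style architecture; adjust for other models.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # example checkpoint

lora_config = LoraConfig(
    r=16,            # rank of the LoRA update matrices (illustrative value)
    lora_alpha=16,   # scaling factor (illustrative value)
    target_modules=["gate_proj", "down_proj", "up_proj"],  # MLP projections, as in Platypus
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows how few parameters are actually trained
```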
