2 Comments

Why not target the attention q, k, and v in LoRA?

author

Meta targets q and v in their "recipes", but the Platypus team observed better results with the gate, down, and up projections. I didn't run a scientific evaluation of whether this is indeed better, but I can confirm it performs better than targeting only q and v.

Platypus motivates this choice by referring to the work of He et al. (2022, https://arxiv.org/pdf/2110.04366v3.pdf), who obtained better results with this configuration.
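For illustration, here is a minimal sketch of how such a configuration could be set up with Hugging Face PEFT. It assumes a LLaMA-style checkpoint whose MLP projections are named gate_proj, down_proj, and up_proj; the rank, alpha, and dropout values are illustrative, not the ones used in the post.

```python
# Minimal sketch: a LoRA config targeting the MLP projections
# (gate, down, up) instead of the attention q and v projections.
# Module names assume a LLaMA-style architecture; adjust for other models.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # example checkpoint

lora_config = LoraConfig(
    r=16,            # rank of the LoRA update matrices (illustrative value)
    lora_alpha=16,   # scaling factor (illustrative value)
    target_modules=["gate_proj", "down_proj", "up_proj"],  # MLP projections, as in Platypus
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows how few parameters are actually trained
```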
