2 Comments
Jan 20 · Liked by Benjamin Marie

Pity DeepSeek 16B is weaker than a 13B model.

author

Not sure what to think about it. It uses as many parameters as Phi-2 during inference (2.8B parameters) but requires almost 6 times as much memory, since all of its parameters still have to be loaded.
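To make the "almost 6 times" figure concrete, here is a rough back-of-the-envelope sketch. It assumes 16-bit weights (2 bytes per parameter) and counts weights only, ignoring activations and the KV cache. The parameter counts are the rounded figures from this thread (16B total, 2.8B used during inference), and the helper name `weight_memory_gb` is made up for illustration.

```python
# Assumption: weights stored in 16-bit precision (2 bytes per parameter).
BYTES_PER_PARAM_FP16 = 2


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Approximate memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Rounded figures taken from the comments above, not exact model sizes.
moe_total_params = 16e9   # all parameters of the 16B model must be loaded
active_params = 2.8e9     # parameters actually used during inference

moe_gb = weight_memory_gb(moe_total_params)
dense_gb = weight_memory_gb(active_params)
ratio = moe_gb / dense_gb

print(f"16B model weights:  {moe_gb:.1f} GB")
print(f"2.8B model weights: {dense_gb:.1f} GB")
print(f"Memory ratio:       {ratio:.2f}x")
```

Under these assumptions the ratio depends only on the parameter counts, so it stays the same at any uniform precision (for example, if both models are quantized to 4-bit).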
