DeepSeekMoE - KTO is the best - Marlin - Easy data are enough
It's a pity that DeepSeekMoE 16B is weaker than a 13B model.
Not sure what to think about it. It activates about as many parameters as Phi-2 during inference (2.8B parameters) but requires almost 6 times more memory, since all 16B parameters still have to be loaded.
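To make the "almost 6 times" figure concrete, here is a rough back-of-the-envelope sketch. It assumes weights stored in 16-bit precision (2 bytes per parameter) and counts only the weights, ignoring the KV cache and activations. The function name and constants are illustrative, not taken from any library.

```python
# Rough weight-memory estimate: total parameters vs. active parameters.
# Assumption: 16-bit weights (2 bytes per parameter); KV cache and
# activations are ignored.

BYTES_PER_PARAM_FP16 = 2


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Approximate memory needed to hold the weights, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


total_params = 16e9    # every expert must be resident in memory
active_params = 2.8e9  # parameters actually used per token at inference

moe_gb = weight_memory_gb(total_params)
dense_gb = weight_memory_gb(active_params)
ratio = moe_gb / dense_gb

print(f"All 16B parameters loaded: {moe_gb:.1f} GB")
print(f"A dense 2.8B model:        {dense_gb:.1f} GB")
print(f"Ratio:                     {ratio:.2f}x")
```

The ratio is simply 16 / 2.8, so it does not depend on the precision chosen, as long as both models use the same one.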