Yi: Fine-tune and Run One of the Best Bilingual LLMs on Your Computer
How to use the Yi models on a budget
The first Yi models were released in December 2023. Since then, the family has been significantly improved and extended: the lineup now includes models with 6 billion, 9 billion, and 34 billion parameters, along with chat versions and long-context variants that can process up to 200,000 tokens.
Yi's LLMs are open and perform well across a wide range of tasks. Unlike most other open LLMs, the Yi models are bilingual: they can work in both English and Chinese.
In this article, I review the Yi models and the technical report describing how they were trained. Then, I show how to run, quantize, fine-tune, and benchmark them on consumer hardware. Even the 34B model can run on a single consumer GPU once quantized.
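A quick back-of-the-envelope estimate (my own, not a figure from the report) shows why: at 4-bit precision, each parameter takes half a byte, so the weights of a 34B model occupy roughly 34 × 0.5 ≈ 17 GB. That fits within the 24 GB of VRAM of an RTX 3090 or 4090, leaving headroom for activations and the KV cache.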
I made a notebook for the Yi LLMs that implements:
Inference with Transformers and vLLM (a Transformers sketch follows this list)
Quantization with bitsandbytes, AWQ, and GPTQ (bitsandbytes sketch below)
Fine-tuning with QLoRA (PEFT sketch below)
Benchmarking for performance and accuracy with the Evaluation Harness and Optimum Benchmark (Evaluation Harness sketch below)
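To give an idea of what the notebook does, here is a minimal inference sketch with Transformers. The model ID 01-ai/Yi-6B-Chat and the prompt are my choices for illustration; swap in another size or variant as needed.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-6B-Chat"  # assumed checkpoint for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit consumer VRAM
    device_map="auto",          # place the layers on the available GPU(s)
)

# Build a chat prompt with the tokenizer's chat template and generate.
messages = [{"role": "user", "content": "Explain what a bilingual LLM is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))

vLLM follows a similar pattern (vllm.LLM(model=model_id), then generate() over a list of prompts) and is noticeably faster for batched generation.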
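For quantization, the quickest option is Transformers' bitsandbytes integration, which quantizes the weights to 4-bit NF4 on the fly at load time, with no calibration step. A sketch, reusing the checkpoint assumed above:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization with double quantization and fp16 compute,
# the usual QLoRA-style configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-6B-Chat",  # assumed checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

AWQ and GPTQ work differently: they quantize the model once, using a calibration dataset, and save the quantized weights for faster inference later.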
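For QLoRA fine-tuning, the recipe is to load the model in 4-bit as above and attach trainable LoRA adapters with PEFT. A sketch under those assumptions; the rank, alpha, and target modules below are typical values for Llama-style architectures such as Yi, not necessarily the notebook's exact hyperparameters:

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Assumes `model` was loaded in 4-bit with the BitsAndBytesConfig above.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,             # adapter rank
    lora_alpha=32,    # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters are trainable

Training then proceeds with a standard Trainer or TRL's SFTTrainer. Since only the adapters receive gradients, this is what makes fine-tuning the 34B model feasible on a single GPU.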
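For accuracy, the Evaluation Harness (lm-evaluation-harness) can be driven from Python. A sketch, with the tasks picked arbitrarily for illustration:

import lm_eval

# Evaluate the (assumed) checkpoint on two standard benchmarks.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=01-ai/Yi-6B-Chat,dtype=float16",
    tasks=["arc_challenge", "hellaswag"],
    batch_size=8,
)
print(results["results"])

Optimum Benchmark covers the performance side (latency, throughput, memory); I leave its setup to the notebook.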