The Kaitchup's Book: LLMs on a Budget
Learn how to fine-tune, quantize, run, and serve LLMs on consumer hardware
The Kaitchup has published over 150 articles on fine-tuning and running large language models (LLMs) on consumer hardware. Each article explores a recent technique or model but doesn’t cover the basics of fine-tuning and running LLMs.
To fill this gap, I’m writing my first book, "LLMs on a Budget," which covers the fundamentals the articles don’t. The first chapter, focusing on parameter-efficient fine-tuning, is scheduled for release in late October. Subsequent chapters on quantization, inference, serving, and evaluation will follow approximately every month. You can find the complete outline of the book at the end of this page.
Pre-sales are now open. By registering early, you will get the book and all related materials (including notebooks) at a 30% discount with the promotion code “O4PW2F0”.
You'll receive the book chapter by chapter, in PDF, as each chapter is completed. The notebooks will be shared on Google Colab.
The first chapter is already available!
This book is included in the Kaitchup Pro subscription. Pro subscribers will have early access to the book’s chapters, regularly released as exclusive articles on The Kaitchup starting this month. Note: If you are a Pro subscriber, you don’t need to do anything. You are already on the list to receive the chapters.
Key Features of LLMs on a Budget
Learn everything you need to know, step by step, about fine-tuning, quantizing, running, and serving LLMs on consumer hardware.
Get exclusive access to detailed, fully-commented notebooks that walk you through every process.
Stay ahead of the curve with cutting-edge methods and continuous updates. This book will be regularly updated with the latest techniques through the end of 2025.
Book Description
Large language models (LLMs) are notoriously difficult to fine-tune and run on consumer hardware using standard techniques, often requiring professional-grade GPUs with substantial memory.
This book will show you advanced techniques for fine-tuning, quantizing, running, and serving LLMs on consumer hardware. It explains, in clear and accessible language, and with code, parameter-efficient fine-tuning methods like LoRA and its many variants (QLoRA, DoRA, VeRA, etc.), as well as effective quantization strategies.
You’ll learn how to select the right LLMs and configure hyperparameters to maximize your hardware's potential. The book includes practical examples in detailed Jupyter notebooks, demonstrating the use of models like Llama 3.1 8B (for 16 GB GPUs) and TinyLlama (for 6 GB GPUs).
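To give you a taste of the level of detail in the notebooks, here is a minimal sketch of LoRA fine-tuning with Hugging Face TRL and PEFT. The dataset, model, and hyperparameters below are illustrative choices only, not the book's exact configuration, and argument names may vary slightly between TRL versions.

```python
# A minimal sketch of LoRA fine-tuning with Hugging Face TRL and PEFT.
# The dataset, model, and hyperparameters are illustrative choices only;
# argument names may vary slightly between TRL versions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Any instruction dataset with a "text" column works; this one is a common demo set.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

# LoRA trains small low-rank adapter matrices instead of all the model's weights,
# which is what makes fine-tuning fit on a consumer GPU.
peft_config = LoraConfig(
    r=16,              # adapter rank
    lora_alpha=16,     # scaling factor applied to the adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # small enough for a ~6 GB GPU
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="tinyllama-lora", per_device_train_batch_size=2),
)
trainer.train()
```

Chapter 1 walks through each of these hyperparameters in detail and shows the equivalent workflows with Unsloth and Axolotl.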
By the end of this book, you will have mastered the art of running LLMs on a budget and will be fully equipped to adapt your skills to the latest LLMs as they emerge in the coming years.
What you will learn
Fine-tune LLMs using parameter-efficient techniques like LoRA, QLoRA, DoRA, VeRA, and others, with tools such as Hugging Face libraries, Unsloth, and Axolotl.
Prepare, format, and synthesize datasets for fine-tuning LLMs.
Quantize LLMs to significantly reduce memory consumption, enabling deployment on low-end hardware configurations (see the loading sketch after this list).
Efficiently run and serve LLMs using optimized inference frameworks like vLLM, NVIDIA’s TensorRT-LLM, and Hugging Face's TGI.
Master the evaluation of LLMs, including how to assess the credibility and value of published benchmark scores, an essential skill for choosing the right LLMs for your applications.
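To give a flavor of the quantization material, here is a minimal sketch of loading a model in 4-bit with bitsandbytes through Transformers, which cuts the memory needed for the weights roughly fourfold compared with 16-bit. The model name is just an example (and gated on the Hugging Face Hub); the chapters cover GPTQ, AWQ, GGUF, and the other algorithms listed in the outline below.

```python
# A minimal sketch of 4-bit quantization with bitsandbytes and Transformers.
# The model name is an example (and gated on the Hub); any causal LM works the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store the weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bfloat16 for speed and accuracy
)

model_name = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,  # weights are quantized on the fly while loading
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain LoRA in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```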
Who this book is for
This book is designed to be accessible to a wide range of learners, regardless of their prior experience in machine learning. It is written to be easily understood by beginners in AI and LLMs, while also providing valuable insights, tips, and tricks for readers with more LLM expertise.
No advanced mathematical background is required; all equations are explained in plain English. For those interested in the mathematical foundations, relevant scientific papers are provided for further reading.
A basic understanding of Python is necessary to follow the code examples. Familiarity with Hugging Face libraries (like transformers, TRL, and PEFT) is helpful but not required, and you won't need prior knowledge of deep learning frameworks such as PyTorch or TensorFlow to benefit from this book.
Outline
Each chapter contains more than 50 pages.
Chapter 1: Parameter-Efficient Fine-Tuning for Large Language Models
[Under review; Release: October 2024]
Understanding the Cost of Full Fine-tuning
LoRA Adapters: Basics, Cost, Hyperparameters, and Performance
How to Code LoRA Fine-tuning, Step by Step:
With Hugging Face Transformers, TRL and PEFT
With Unsloth
With Axolotl (if you don’t want to code)
Load and Merge your LoRA Adapters
Using Adapters for Inference with Transformers and vLLM
Chapter 2: Prepare Your Training Dataset
[Release: November 2024]
The Common Characteristics of a Good Training Dataset
Formatting a Dataset for Instruct Fine-tuning
Generate Your Own Synthetic Dataset
With GPT-4o mini
With an open LLM
Chapter 3: Quantization for LLMs
[Release: December 2024]
The Basics of Quantization
Quantization Aware Training vs. Post-training Quantization
Popular Quantization Algorithms
GPTQ
AWQ
GGUF: K-Quants and Imatrix
AutoRound
Bitsandbytes
AQLM
HQQ
How to Choose a Quantization Algorithm
Quantization to a Lower Precision vs. Using a Smaller Model
Chapter 4: Quantization in Fine-tuning
[Release: January 2025]
The Basics of QLoRA
How to Code QLoRA Fine-tuning, Step by Step:
With Hugging Face Transformers, TRL and PEFT
With Unsloth
With Axolotl
Quantization and Paging of the Optimizer States
Benchmarking QLoRA with Different Quantization Techniques
Advanced Techniques
GaLore and Q-GaLore
End-to-end FP8 Fine-tuning
rsLoRA
DoRA
VeRA
X-LoRA
Chapter 5: Running and Serving LLMs
[In preparation]
Chapter 6: Evaluating LLMs
[In preparation]