The Kaitchup's Book: LLMs on a Budget
Learn how to fine-tune, quantize, run, and serve LLMs on consumer hardware
The Kaitchup has published over 150 articles on fine-tuning and running large language models (LLMs) on consumer hardware. Each article explores a recent technique or model but doesn’t cover the basics of fine-tuning and running LLMs.
To fill this gap, I’m writing my first book, "LLMs on a Budget," which covers the fundamentals the articles don’t. The first chapter, focusing on parameter-efficient fine-tuning, is scheduled for release in late October. Subsequent chapters on quantization, inference, serving, and evaluation will follow approximately every month. You can find the complete outline of the book at the end of this page.
Pre-sales are now open. By registering early, you will get the book and all related materials (including notebooks) at a 30% discount with the promotion code “O4PW2F0”.
You'll receive the book chapter by chapter, in PDF, as each chapter is completed. The notebooks will be shared on Google Colab.
The first chapter is already available!
This book is included in the Kaitchup Pro subscription. Pro subscribers will have early access to the book’s chapters, regularly released as exclusive articles on The Kaitchup starting this month. Note: If you are a Pro subscriber, you don’t need to do anything. You are already on the list to receive the chapters.
Key Features of LLMs on a Budget
Learn everything you need to know, step by step, about fine-tuning, quantizing, running, and serving LLMs on consumer hardware.
Get exclusive access to detailed, fully-commented notebooks that walk you through every process.
Stay ahead of the curve with cutting-edge methods and continuous updates. This book will be regularly updated with the latest techniques through the end of 2025.
Book Description
Large language models (LLMs) are notoriously difficult to fine-tune and run on consumer hardware using standard techniques, often requiring professional-grade GPUs with substantial memory.
This book will show you advanced techniques for fine-tuning, quantizing, running, and serving LLMs on consumer hardware. It explains, in clear and accessible language, and with code, parameter-efficient fine-tuning methods like LoRA and its many variants (QLoRA, DoRA, VeRA, etc.), as well as effective quantization strategies.
You’ll learn how to select the right LLMs and configure hyperparameters to maximize your hardware's potential. The book includes practical examples in detailed Jupyter notebooks, demonstrating the use of models like Llama 3.1 8B (for 16 GB GPUs) and TinyLlama (for 6 GB GPUs).
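To give you a taste of the level of detail in the notebooks, here is a minimal sketch of LoRA fine-tuning with Hugging Face TRL and PEFT. The dataset, model, and hyperparameters below are illustrative choices only, not the book's exact configuration, and argument names may vary slightly between TRL versions.

```python
# A minimal sketch of LoRA fine-tuning with Hugging Face TRL and PEFT.
# The dataset, model, and hyperparameters are illustrative choices only;
# argument names may vary slightly between TRL versions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Any instruction dataset with a "text" column works; this one is a common demo set.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

# LoRA trains small low-rank adapter matrices instead of all the model's weights,
# which is what makes fine-tuning fit on a consumer GPU.
peft_config = LoraConfig(
    r=16,              # adapter rank
    lora_alpha=16,     # scaling factor applied to the adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # small enough for a ~6 GB GPU
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="tinyllama-lora", per_device_train_batch_size=2),
)
trainer.train()
```

Chapter 1 walks through each of these hyperparameters in detail and shows the equivalent workflows with Unsloth and Axolotl.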
By the end of this book, you will have mastered the art of running LLMs on a budget and will be fully equipped to adapt your skills to the latest LLMs as they emerge in the coming years.
What you will learn
Fine-tune LLMs using parameter-efficient techniques like LoRA, QLoRA, DoRA, VeRA, and others, with tools such as Hugging Face libraries, Unsloth, and Axolotl.
Prepare, format, and synthesize datasets for fine-tuning LLMs.
Quantize LLMs to significantly reduce memory consumption, enabling deployment on low-end hardware configurations (see the loading sketch after this list).
Efficiently run and serve LLMs using optimized inference frameworks like vLLM, NVIDIA’s TensorRT-LLM, and Hugging Face's TGI.
Master the evaluation of LLMs, including how to assess the credibility and value of published benchmark scores, an essential skill for choosing the right LLMs for your applications.
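To give a flavor of the quantization material, here is a minimal sketch of loading a model in 4-bit with bitsandbytes through Transformers, which cuts the memory needed for the weights roughly fourfold compared with 16-bit. The model name is just an example (and gated on the Hugging Face Hub); the chapters cover GPTQ, AWQ, GGUF, and the other algorithms listed in the outline below.

```python
# A minimal sketch of 4-bit quantization with bitsandbytes and Transformers.
# The model name is an example (and gated on the Hub); any causal LM works the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store the weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bfloat16 for speed and accuracy
)

model_name = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,  # weights are quantized on the fly while loading
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain LoRA in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```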
Who this book is for
This book is designed to be accessible to a wide range of learners, regardless of their prior experience in machine learning. It is written to be easily understood by beginners in AI and LLMs, while also providing valuable insights, tips, and tricks for readers with more LLM expertise.
No advanced mathematical background is required; all equations are explained in plain English. For those interested in the mathematical foundations, relevant scientific papers are provided for further reading.
A basic understanding of Python is necessary to follow the code examples. Familiarity with Hugging Face libraries (like transformers, TRL, and PEFT) is helpful but not required, and you won't need prior knowledge of deep learning frameworks such as PyTorch or TensorFlow to benefit from this book.
Outline
Each chapter contains more than 50 pages.
Chapter 1: Parameter-Efficient Fine-Tuning for Large Language Models
[Under review; Release: October 2024]
Understanding the Cost of Full Fine-tuning
LoRA Adapters: Basics, Cost, Hyperparameters, and Performance
How to Code LoRA Fine-tuning, Step by Step:
With Hugging Face Transformers, TRL and PEFT
With Unsloth
With Axolotl (if you don’t want to code)
Load and Merge your LoRA Adapters
Using Adapters for Inference with Transformers and vLLM
Chapter 2: Prepare Your Training Dataset
[Release: November 2024]
The Common Characteristics of a Good Training Dataset
Formatting a Dataset for Instruct Fine-tuning
Generate Your Own Synthetic Dataset
With GPT-4o mini
With an open LLM
Chapter 3: Quantization for LLMs
[Release: December 2024]
The Basics of Quantization
Quantization Aware Training vs. Post-training Quantization
Popular Quantization Algorithms
GPTQ
AWQ
GGUF: K-Quants and Imatrix
AutoRound
Bitsandbytes
AQLM
HQQ
How to Choose a Quantization Algorithm
Quantization to a Lower Precision vs. Using a Smaller Model
Chapter 4: Quantization in Fine-tuning
[Release: January 2025]
The Basics of QLoRA
How to Code QLoRA Fine-tuning, Step by Step:
With Hugging Face Transformers, TRL and PEFT
With Unsloth
With Axolotl
Quantization and Paging of the Optimizer States
Benchmarking QLoRA with Different Quantization Techniques
Advanced Techniques
GaLore and Q-GaLore
End-to-end FP8 Fine-tuning
rsLoRA
DoRA
VeRA
X-LoRA
Chapter 5: Running and Serving LLMs
[In preparation]
Chapter 6: Evaluating LLMs
[In preparation]