Sitemap - 2024 - The Kaitchup – AI on a Budget

Fixing Faulty Gradient Accumulation: Understanding the Issue and Its Resolution

The Weekly Kaitchup #63

Train and Serve an AI Chatbot Based on Llama 3.2

Fast Speculative Decoding with Llama 3.2 and vLLM

The Weekly Kaitchup #62

Generate Synthetic Data from Personas to Train AI Chatbots

Fine-tuning LLMs with 32-bit, 8-bit, and Paged AdamW Optimizers

The Weekly Kaitchup #61

The Unreasonable Impact of Gradient Checkpointing for Fine-tuning LLMs

Fine-Tuning Meta's Llama 3.2 1B & 3B Models on Budget GPUs

[Early Access] LLMs on a Budget, Chapter 1: Parameter-Efficient Fine-Tuning

The Weekly Kaitchup #60

How to Set Up a PEFT LoraConfig

transformers.js: Run Phi-3.5 & Llama 3.2 in Your Browser

Qwen2.5 QLoRA, LoRA, and Full Fine-tuning on Your Computer

The Weekly Kaitchup #59

Run and Serve Faster VLMs Like Pixtral and Phi-3.5 Vision with vLLM

Multimodal RAG with ColPali and Qwen2-VL on Your Computer

The Weekly Kaitchup #58

Introducing Minivoc: Faster and Memory-Efficient LLMs Through Vocabulary Reduction [WIP]

GuideLLM: Is Your Server Ready for LLM Deployment?

GGUF Quantization with Imatrix and K-Quantization to Run LLMs on Your CPU

The Weekly Kaitchup #57

Falcon Mamba, Jamba, RWKV... Can You Use Them on Your Computer?

The Kaitchup's Book: LLMs on a Budget

Run Qwen2-VL on Your Computer with Text, Images, and Video, Step by Step

The Weekly Kaitchup #56

Run Llama 3.1 70B Instruct on Your GPU with ExLlamaV2 (2.2-, 2.5-, 3.0-, and 4.0-bit)

Mistral-NeMo: 4.1x Smaller with Quantized Minitron

The Weekly Kaitchup #55

Fine-tuning Phi-3.5 MoE and Mini on Your Computer

QLoRA with AutoRound: Cheaper and Better LLM Fine-tuning on Your GPU

The Weekly Kaitchup #54

Fine-tuning Base LLMs vs. Fine-tuning Their Instruct Version

The Best Quantization Methods to Run Llama 3.1 on Your GPU

The Weekly Kaitchup #53

SmolLM: Full Fine-tuning and Aligning Tiny LLMs on Your Computer

Multi-GPU Fine-tuning for Llama 3.1 70B with FSDP and QLoRA

The Weekly Kaitchup #52

Serve Multiple LoRA Adapters with vLLM

Llama 3.1: Fine-tuning on Consumer Hardware — LoRA vs. QLoRA

The Weekly Kaitchup #51

Llama 3 405B: Can You Fine-tune It?

Function Calling: Fine-tuning Llama 3 and Qwen2 on xLAM

The Weekly Kaitchup #50

Launching The Kaitchup Pro

GPU Benchmarking: What Is the Best GPU for LoRA, QLoRA, and Inference?

The Kaitchup Pro

Fine-tune Gemma 2 on Your Computer with LoRA and QLoRA

The Weekly Kaitchup #49

Train Better Llama 3 Embeddings with Simple Contrastive Learning

Fine-tune a Multimodal Chat Model with Florence-2 on Your Computer

The Weekly Kaitchup #48

rsQLoRA: Fine-tune Llama 3 with Higher Ranks and QLoRA

Florence-2: Run Multitask Vision-Language Models on Your Computer

The Weekly Kaitchup #47

Intel AutoRound: Accurate Low-bit Quantization for LLMs

Simple QLoRA Fine-tuning with Axolotl

The Weekly Kaitchup #46

Continue Pre-training Llama 3 and Other LLMs on Your Computer

KV Cache Quantization for Memory-Efficient Inference with LLMs

The Weekly Kaitchup #45

Qwen2 vs. Llama 3: QLoRA Learning Curves and Quantization Performance

My LLM Can't Stop Generating: How to Fix It?

The Weekly Kaitchup #44

Fine-tune Tiny Adapters for Llama 3 with VeRA

Fine-tune Phi-3 Medium on Your Computer

The Weekly Kaitchup #43

1-bit and 2-bit Llama 3: Quantization with HQQ and Fine-tuning with HQQ+

Fine-tune the Token Embeddings and the Language Modeling Head of Llama 3

The Weekly Kaitchup #42

From Llama 3 70B to 120B: How to Self-Augment an LLM?

List of AI Notebooks

Fine-tuning LLMs with a Chat Template

The Weekly Kaitchup #41

Avoid Quantizing Llama 3 8B with GPTQ and Use BitsandBytes Instead

Fine-tune Llama 3 70B on Your GPU with AQLM 2-bit

The Weekly Kaitchup #40

Fine-tune Tiny Chat Models with Apple OpenELM and ORPO

Run Llama 3 70B on Your GPU with ExLlamaV2

The Weekly Kaitchup #39

Phi-3 mini: Fine-tuning and Quantization on Your Computer

Turn Llama 3 into an Embedding Model with LLM2Vec

The Weekly Kaitchup #38

Estimate the Memory Consumption of LLMs for Inference and Fine-tuning

Fine-tune Llama 3 on Your Computer

The Weekly Kaitchup #37

Training, Loading, and Merging QDoRA, QLoRA, and LoftQ Adapters

Neural Speed: Fast Inference on CPU for 4-bit Large Language Models

The Weekly Kaitchup #36

LoftQ: Better Initialization for a Quantization-Aware LoRA

ORPO: Preference Optimization without the Supervised Fine-tuning (SFT) Step

The Weekly Kaitchup #35

GaLore: Full Fine-tuning on Your GPU

A Guide on Hyperparameters and Training Arguments for Fine-tuning LLMs

The Weekly Kaitchup #34

Marlin: Nearly Ideal Inference Speed for 4-bit Models with vLLM (1k+ tokens/sec)

RAG for Mistral 7B Instruct with LlamaIndex and Transformers

The Weekly Kaitchup #33

Yi: Fine-tune and Run One of the Best Bilingual LLMs on Your Computer

Fine-tune a Better Google Gemma with Unsloth and Distilled DPO

The Weekly Kaitchup #32

Fine-tune Mixtral-8x7B Quantized with AQLM (2-bit) on Your GPU

DoRA vs. LoRA: Better and Faster than LoRA?

The Weekly Kaitchup #31

Speculative Decoding for Faster Inference with Mixtral-8x7B and Gemma

The Weekly Kaitchup #30

GGUF Quantization for Fast and Memory-Efficient Inference on Your CPU

Google's Gemma: Fine-tuning, Quantization, and Inference on Your Computer

The Weekly Kaitchup #29

Run a 7.7x Smaller Mixtral-8x7B on Your GPU with AQLM 2-bit Quantization

Fine-tuning and Quantization of Qwen1.5 LLMs on Your Computer

The Weekly Kaitchup #28

vLLM: Serve Fast Mistral 7B and Llama 2 Models from Your Computer

SqueezeLLM: Better 3-bit and 4-bit Quantization for Large Language Models

The Weekly Kaitchup #27

TinyLlama: Pre-training a Small Llama 2 from Scratch

The Weekly Kaitchup #26

From 16-bit to 2-bit: Finding the Best Trade-off Between Memory Efficiency and Accuracy

The Mayonnaise: Rank First on the Open LLM Leaderboard with TIES-Merging

The Weekly Kaitchup #25

Fine-tune a Mixture of Experts on Your Computer

The Weekly Kaitchup #24

Maixtchup: Make Your Own Mixture of Experts with Mergekit

Optimum-Benchmark: How Fast and Memory-Efficient Is Your LLM?

The Weekly Kaitchup #23

Support Us

Run Mixtral-8x7B on Consumer Hardware with Expert Offloading

The Weekly Kaitchup #22

Fine-tune LLMs on Your CPU with QLoRA

Phi-2: A Small Model Easy to Fine-tune on Your GPU