List of AI Notebooks

This page lists all the AI notebooks currently available to paid subscribers. To access the links, become a paid subscriber.

Then, return to this page and all the notebook links will appear below:



#105 Run VLMs with vLLM - Examples with Pixtral and Phi-3.5 Vision

#104 Multimodal RAG on Your Computer with ColPali and Qwen2-VL for PDF Documents

#103 GuideLLM: Is Your Server Ready for Your LLM Deployment?

#102 GGUF Quantization with an Importance Matrix (imatrix) and K-quantization -- Example with Gemma 2

#101 Fine-tuning and Quantization for SSM, RWKV, and Hybrid Models -- Examples with Falcon Mamba, RWKV-6, and Jamba-1.5-Mini

#100 Run Qwen2-VL on Your Computer with Text, Images, and Video

#99 Run Llama 3.1 70B Instruct with ExLlamaV2 on Your GPU, and Comparison with AWQ and GPTQ

#98 Quantize and Evaluate Mistral-NeMo-Minitron-8B-Base and Llama-3.1-Minitron-4B

#97 Fine-tuning Phi-3.5 MoE and Mini -- With Code for AutoRound and bitsandbytes Quantization

#96 Fine-tuning Llama 3.1 Quantized with AQLM, HQQ, GPTQ, and AutoRound -- Code and Training Logs

#95 Fine-tuning Base LLMs vs. Fine-tuning Their Instruct Versions

#94 Quantize and Evaluate Llama 3.1 Instruct with bitsandbytes, AWQ, GPTQ, and AutoRound

#93 Fine-tune SmolLM 135M and 360M with Distilled DPO

#92 Fine-tune Llama 3.1 70B with Two Consumer GPUs -- Using FSDP and QLoRA

#91 Serve Multiple LoRA Adapters with vLLM -- Example with Llama 3

#90 Fine-tune Llama 3.1 on Your Computer with QLoRA and LoRA -- Focus on the Padding Side

#89 Function Calling: Fine-tuning LLMs on xLAM -- Examples with Llama 3 and Qwen2

#88 GPU Benchmarking for LoRA, QLoRA Fine-tuning, and Inference with and without 4-bit Quantization

#87 Fine-tune Gemma 2 on Your Computer -- With Transformers and Unsloth

#86 Train LLM Embedding Models with SimCSE -- Examples with Llama 3

#85 Florence 2: Fine-tune a Multimodal Chat Model on Your Computer

#84 Fine-tune Llama 3 with Higher QLoRA Ranks (rsLoRA)

#83 Florence 2: Run a Vision-language Model on Your Computer

#82 Smaller LLMs with AutoRound Low-bit Quantization

#81 Easy Fine-tuning with Axolotl -- Example with Llama 3

#80 Continue Pre-training LLMs on Your Computer with Unsloth -- Examples with Llama 3 and Mistral 7B

#79 Quantize the KV Cache for Memory-Efficient Inference -- Example with Llama 3 8B

#78 Qwen2 QLoRA Fine-tuning and Quantization

#77 How to Identify and Fix Issues with the EOS Token to Prevent Endless Generation -- Examples with Llama 3

#76 VeRA: Fine-tuning Tiny Adapters for Llama 3 8B

#75 Fine-tune Phi-3 Medium on Your Computer -- With Code to Merge Adapters

#74 1-bit and 2-bit Llama 3: Quantization with HQQ and Fine-tuning with HQQ+

#73 Fine-tuning the Token Embeddings and the Language Modeling Head -- Example with Llama 3 8B

#72 Duplicate, Remove, and Reorder Layers of an LLM -- Example with Llama 3

#71 Fine-tuning LLMs with a Chat Template -- Turning Llama 3 into a Pirate

#70 Quantize and Evaluate Llama 3 8B

#69 Fine-tune Llama 3 70B on Your GPU with AQLM Quantization

#68 Fine-tune Tiny Chat Models with ORPO -- Example with Apple's OpenELM

#67 Llama 3 70B Quantization with ExLlamaV2

#66 Phi-3: Fine-tuning, Inference, and Quantization

#65 Turning Llama 3 into a Text Embedding Model with LLM2Vec

#64 Estimate the Memory Consumption for Fine-tuning and Running LLMs

#63 Fast and Small Llama 3 with Activation-Aware Quantization (AWQ)

#62 Fine-tune Llama 3 on Your Computer -- With Code to Merge Adapters and Quantize the Model

#61 Training, Loading, and Merging QDoRA, QLoRA, and LoftQ Adapters

#60 Neural Speed: Fast Inference for 4-bit LLMs on CPU

#59 LoftQ: A Better LoRA Adapter for Quantized LLMs -- Example with Mistral 7B

#58 Fine-tune Instruct LLMs with ORPO -- Example with Mistral 7B

#57 Full Fine-tuning with GaLore on a Consumer GPU -- Example with Mistral 7B

#56 vLLM Inference with Marlin for GPTQ

#55 RAG for Mistral 7B on Consumer Hardware with LlamaIndex and Transformers

#54 Yi: Fine-tuning, Inference, Quantization, and Benchmarking

#53 Fast Fine-tuning and DPO Training for Google Gemma with Unsloth (Zephyr Recipe)

#52 Fine-tune Mixtral-8x7B on a Single Consumer GPU with AQLM Quantization

#51 Fine-tune Mistral 7B with DoRA

#50 Speculative Decoding with Transformers for Mixtral-8x7B, Gemma, Llama 2, and Pythia

#49 GGUF Your LLM with llama.cpp and Run It on Your CPU

#48 Gemma: Fine-tuning, Inference, and Quantization

#47 Quantization and Inference with 2-bit AQLM -- Example with Mixtral-8x7B and TinyLlama

#46 Qwen1.5: Fine-tuning, Inference, Quantization, and Benchmarking

#45 vLLM: Serve AWQ and SqueezeLLM Models

#44 SqueezeLLM: Make Accurate Quantized LLMs

#43 TinyLlama: Fine-tuning, Inference, Quantization, and Benchmarking

#42 8-bit vs. 4-bit vs. 3-bit vs. 2-bit GPTQ Quantization of Mistral 7B and Llama 2 7B/13B: Benchmarking Memory-Efficiency and Accuracy

#41 TIES-Merging: Trim and Merge LLMs While Keeping the Same Number of Parameters

#40 Fine-tune Mixture of Experts on Consumer Hardware -- Examples with Maixtchup

#39 Make Your Own Mixture of Experts with Mergekit

#38 Benchmarking LLMs with Optimum-benchmark

#37 Quantize and Offload Experts to Run Mixtral-8x7B on Consumer Hardware

#36 Fine-tune Mistral 7B on Your CPU with Intel Extension for Transformers

#35 Phi-2: Fine-tuning, Quantization, and Inference

#34 Fast and Memory-Efficient QLoRA Fine-tuning of Mistral 7B with Unsloth

#33 The Evaluation Harness for Quantized LLMs and LoRA Adapters -- Examples with Llama 2

#32 Fine-tune Mixtral-8x7B on Your Computer (QLoRA)

#31 Fine-tune Mistral 7B with Distilled IPO

#30 Combine Multiple LoRA Adapters -- Examples with Llama 2

#29 Use AWQ with Hugging Face Transformers -- Examples with Mistral 7B

#28 QLoRA Fine-tuning with FlashAttention-2 -- Examples with Llama 2

#27 Evaluate the Impact of Merging a QLoRA Adapter into a 4-bit LLM

#26 Make a Cheap Zephyr 7B with Distilled DPO

#25 Fine-tune Llama 2 for Translation

#24 Fine-tune an Instruct Version of Mistral 7B with DPO

#23 Quantize Mistral Instruct 7B on Your Computer with bitsandbytes and AutoGPTQ

#22 Fine-tune Mistral 7B on Your Computer with QLoRA and TRL

#21 Fine-tuning Llama 2 with QA-LoRA

#20 Fast and Small Llama 2 with Activation-Aware Quantization

#19 Phi-1.5: Fine-tuning, Quantization, and Inference

#18 Mixed-Precision Quantization of Llama 2 70B and 13B with ExLlamaV2

#17 DeepSpeed Chat - Step #3: Reinforcement Learning with Human Feedback

#16 Loading, saving, and benchmarking safetensors with Llama 2

#15 DeepSpeed Chat - Step #2: Training a Reward Model

#14 Merge LoRA Adapters Fine-tuned with QLoRA: A Benchmark for Different Methods

#13 DeepSpeed Chat - Step #1: Supervised Fine-tuning with LoRA

#12 Quantize and Fine-tune LLMs with GPTQ Using Transformers and TRL - Examples with Llama 2

#11 GPTQ vs. bitsandbytes -- Examples with Llama 2

#10 Run Platypus 13B on Your Computer

#9 Fine-tune Like Platypus on Your Computer with QLoRA and TRL

#8 Padding Causal LLMs - Examples with Llama 2

#7 Fine-tune Llama 2 on Your Computer with QLoRA and TRL

#6 Quantization of Llama 2 with GPTQ

#5 Fine-tune a 20B-Parameter Chat Model with QLoRA: GPT-NeoX-20B on Alpaca

#4 Run Llama 2 Chat Models on Your Computer

#3 ReLoRA: Pre-train an LLM from Scratch with Low-Rank Networks

#2 Fine-tuning GPT-NeoX-20B with QLoRA

#1 Fine-tuning and Inference with Falcon-7B Using TRL and QLoRA