This page lists all the AI notebooks currently available to paid subscribers. To access the links, become a paid subscriber.
Then, go back to the AI Notebooks page and all the notebooks will appear:
#106 Qwen2.5 QLoRA, LoRA, and Full Fine-tuning
#105 Run VLMs with vLLM - Examples with Pixtral and Phi-3.5 Vision
#104 Multimodal RAG on Your Computer with ColPali and Qwen2-VL for PDF Documents
#103 GuideLLM: Is Your Server Ready for Your LLM Deployment?
#102 GGUF Quantization with an Importance Matrix (imatrix) and K-quantization -- Example with Gemma 2
#101 Fine-tuning and Quantization for SSM, RWKV, and Hybrid Models -- Examples with Falcon Mamba, RWKV-6, and Jamba-1.5-Mini
#100 Run Qwen2-VL on Your Computer with Text, Images, and Video
#99 Run Llama 3.1 70B Instruct with ExLlamaV2 on Your GPU, and Comparison with AWQ and GPTQ
#98 Quantize and Evaluate Mistral-NeMo-Minitron-8B-Base and Llama-3.1-Minitron-4B
#97 Fine-tuning Phi-3.5 MoE and Mini -- With Code for AutoRound and bitsandbytes Quantization
#96 Fine-tuning Llama 3.1 Quantized with AQLM, HQQ, GPTQ, and AutoRound -- Code and Training Logs
#95 Fine-tuning Base LLMs vs. Fine-tuning their Instruct Version
#94 Quantize and Evaluate Llama 3.1 Instruct with bitsandbytes, AWQ, GPTQ, and AutoRound
#93 Fine-tune SmolLM 135M and 370M with Distilled DPO
#92 Fine-tune Llama 3.1 70B with Two Consumer GPUs -- Using FSDP and QLoRA
#91 Serve Multiple LoRA Adapters with vLLM -- Example with Llama 3
#90 Fine-tune Llama 3.1 on Your Computer with QLoRA and LoRA -- Focus on the Padding Side
#89 Function Calling: Fine-tuning LLMs on xLAM -- Examples with Llama 3 and Qwen2
#88 GPU Benchmarking for LoRA, QLoRA Fine-tuning, and Inference with and without 4-bit Quantization
#87 Fine-tune Gemma 2 on Your Computer -- With Transformers and Unsloth
#86 Train LLM Embedding Models with SimCSE -- Examples with Llama 3
#85 Florence 2: Fine-tune a Multimodal Chat Model on Your Computer
#84 Fine-tune Llama 3 with Higher QLoRA Ranks (rsLoRA)
#83 Florence 2: Run a Vision-language Model on Your Computer
#82 Smaller LLMs with AutoRound Low-bit Quantization
#81 Easy Fine-tuning with Axolotl -- Example with Llama 3
#80 Continue Pre-training LLMs on Your Computer with Unsloth -- Examples with Llama 3 and Mistral 7B
#79 Quantize the KV Cache for Memory-Efficient Inference -- Example with Llama 3 8B
#78 Qwen2 QLoRA Fine-tuning and Quantization
#77 How to Identify and Fix Issues with the EOS Token to Prevent Endless Generation -- Examples with Llama 3
#76 VeRA: Fine-tuning Tiny Adapters for Llama 3 8B
#75 Fine-tune Phi-3 Medium on Your Computer -- With Code to Merge Adapters
#74 1-bit and 2-bit Llama 3: Quantization with HQQ and Fine-tuning with HQQ+
#73 Fine-tuning the Token Embeddings and the Language Modeling Head -- Example with Llama 3 8B
#72 Duplicate, Remove, and Reorder Layers of an LLM -- Example with Llama 3
#71 Fine-tuning LLMs with a Chat Template -- Turning Llama 3 into a Pirate
#70 Quantize and Evaluate Llama 3 8B
#69 Fine-tune Llama 3 70B on Your GPU with AQLM Quantization
#68 Fine-tune Tiny Chat Models with ORPO -- Example with Apple's OpenELM
#67 Llama 3 70B Quantization with ExLlamaV2
#66 Phi-3: Fine-tuning, Inference, and Quantization
#65 Turning Llama 3 into a Text Embedding Model with LLM2Vec
#64 Estimate the Memory Consumption for Fine-tuning and Running LLMs
#63 Fast and Small Llama 3 with Activation-Aware Quantization (AWQ)
#62 Fine-tune Llama 3 on Your Computer -- With Code to Merge Adapters and Quantize the Model
#61 Training, Loading, and Merging QDoRA, QLoRA, and LoftQ Adapters
#60 Neural Speed: Fast Inference for 4-bit LLMs on CPU
#59 LoftQ: A Better LoRA Adapter for Quantized LLMs -- Example with Mistral 7B
#58 Fine-tune Instruct LLMs with ORPO -- Example with Mistral 7B
#57 Full Fine-tuning with GaLore on a Consumer GPU -- Example with Mistral 7B
#56 vLLM Inference with Marlin for GPTQ
#55 RAG for Mistral 7B on Consumer Hardware with LlamaIndex and Transformers
#54 Yi: Fine-tuning, Inference, Quantization, and Benchmarking
#53 Fast Fine-tuning and DPO Training for Google Gemma with Unsloth (Zephyr Recipe)
#52 Fine-tune Mixtral-8x7B on a Single Consumer GPU with AQLM Quantization
#51 Fine-tune Mistral 7B with DoRA
#50 Speculative Decoding with Transformers for Mixtral-8x7B, Gemma, Llama 2, and Pythia
#49 GGUF Your LLM with llama.cpp and Run It on Your CPU
#48 Gemma: Fine-tuning, Inference, and Quantization
#47 Quantization and Inference with 2-bit AQLM -- Example with Mixtral-8x7B and TinyLlama
#46 Qwen1.5: Fine-tuning, Inference, Quantization, and Benchmarking
#45 vLLM: Serve AWQ and SqueezeLLM Models
#44 SqueezeLLM: Make Accurate Quantized LLMs
#43 TinyLlama: Fine-tuning, Inference, Quantization, and Benchmarking
#42 8-bit vs. 4-bit vs. 3-bit vs. 2-bit GPTQ Quantization of Mistral 7B and Llama 2 7B/13B: Benchmarking Memory-Efficiency and Accuracy
#41 TIES-Merging: Trim and Merge LLMs While Keeping the Same Number of Parameters
#40 Fine-tune Mixture of Experts on Consumer Hardware -- Examples with Maixtchup
#39 Make Your Own Mixture of Experts with Mergekit
#38 Benchmarking LLMs with Optimum-benchmark
#37 Quantize and Offload Experts to Run Mixtral-8x7B on Consumer Hardware
#36 Fine-tune Mistral 7B on Your CPU with Intel Extension for Transformers
#35 Phi-2: Fine-tuning, Quantization, and Inference
#34 Fast and Memory-Efficient QLoRA Fine-tuning of Mistral 7B with Unsloth
#33 The Evaluation Harness for Quantized LLMs and LoRA Adapters -- Examples with Llama 2
#32 Fine-tune Mixtral-8x7B on Your Computer (QLoRA)
#31 Fine-tune Mistral 7B with Distilled IPO
#30 Combine Multiple LoRA Adapters -- Examples with Llama 2
#29 Use AWQ with Hugging Face Transformers -- Examples with Mistral 7B
#28 QLoRA Fine-tuning with FlashAttention-2 -- Examples with Llama 2
#27 Evaluate the Impact of Merging a QLoRA Adapter into a 4-bit LLM
#26 Make a Cheap Zephyr 7B with Distilled DPO
#25 Fine-tune Llama 2 for Translation
#24 Fine-tune an Instruct Version of Mistral 7B with DPO
#23 Quantize Mistral Instruct 7B on Your Computer with bitsandbytes and AutoGPTQ
#22 Fine-tune Mistral 7B on Your Computer with QLoRA and TRL
#21 Fine-tuning Llama 2 with QA-LoRA
#20 Fast and Small Llama 2 with Activation-Aware Quantization
#19 Phi-1.5: Fine-tuning, Quantization, and Inference
#18 Mixed-Precision Quantization of Llama 2 70B and 13B with ExLlamaV2
#17 DeepSpeed Chat -- Step #3: Reinforcement Learning from Human Feedback
#16 Loading, Saving, and Benchmarking safetensors with Llama 2
#15 DeepSpeed Chat -- Step #2: Training a Reward Model
#14 Merge LoRA Adapters Fine-tuned with QLoRA: A Benchmark for Different Methods
#13 DeepSpeed Chat -- Step #1: Supervised Fine-tuning with LoRA
#12 Quantize and Fine-tune LLMs with GPTQ Using Transformers and TRL -- Examples with Llama 2
#11 GPTQ vs. bitsandbytes -- Examples with Llama 2
#10 Run Platypus 13B on Your Computer
#9 Fine-tune Like Platypus on Your Computer with QLoRA and TRL
#8 Padding Causal LLMs -- Llama 2 Examples
#7 Fine-tune Llama 2 on Your Computer with QLoRA and TRL
#6 Quantization of Llama 2 with GPTQ
#5 Fine-tune a 20B-Parameter Chat Model with QLoRA: GPT-NeoX-20B on Alpaca
#4 Run Llama 2 Chat Models on Your Computer
#3 ReLoRA: Pre-train an LLM from Scratch with Low-Rank Networks
#2 Fine-tuning GPT-NeoX-20B with QLoRA
#1 Fine-tuning and Inference with Falcon-7B Using TRL and QLoRA