The Kaitchup – AI on a Budget
Tutorials
Run and Serve Faster VLMs Like Pixtral and Phi-3.5 Vision with vLLM
Understanding how much memory you need to serve a VLM
Sep 19 • Benjamin Marie
Multimodal RAG with ColPali and Qwen2-VL on Your Computer
Retrieve and exploit information from PDFs without OCR
Sep 16 • Benjamin Marie
GuideLLM: Is Your Server Ready for LLM Deployment?
Simulate real-world inference workloads with GuideLLM
Sep 12 • Benjamin Marie
Falcon Mamba, Jamba, RWKV... Can You Use Them on Your Computer?
A close look at quantization and parameter-efficient fine-tuning (LoRA/QLoRA) for SSMs, RWKV, and hybrid models
Sep 5 • Benjamin Marie
Run Qwen2-VL on Your Computer with Text, Images, and Video, Step by Step
Your local multimodal chat model
Sep 2 • Benjamin Marie
Run Llama 3.1 70B Instruct on Your GPU with ExLlamaV2 (2.2, 2.5, 3.0, and 4.0-bit)
Is ExLlamaV2 Still Good Enough?
Aug 29 • Benjamin Marie
Mistral-NeMo: 4.1x Smaller with Quantized Minitron
How Pruning, Knowledge Distillation, and 4-Bit Quantization Can Make Advanced AI Models More Accessible and Cost-Effective
Aug 26 • Benjamin Marie
Fine-tuning Phi-3.5 MoE and Mini on Your Computer
With code to quantize the models with bitsandbytes and AutoRound
Aug 22 • Benjamin Marie
QLoRA with AutoRound: Cheaper and Better LLM Fine-tuning on Your GPU
Bitsandbytes is not your only option
Aug 19 • Benjamin Marie
SmolLM: Full Fine-tuning and Aligning Tiny LLMs on Your Computer
With supervised fine-tuning and distilled DPO
Aug 8 • Benjamin Marie
Multi-GPU Fine-tuning for Llama 3.1 70B with FSDP and QLoRA
What you can do with only 2x24 GB GPUs, and a lot of CPU RAM
Aug 5 • Benjamin Marie
Serve Multiple LoRA Adapters with vLLM
Without any increase in latency
Aug 1 • Benjamin Marie