Archive - The Kaitchup – AI on a Budget

The Weekly Kaitchup #59

Qwen2.5 - Ternary LLMs - Moshi

5 hrs ago

Run and Serve Faster VLMs Like Pixtral and Phi-3.5 Vision with vLLM

Understanding how much memory you need to serve a VLM

Sep 19

Multimodal RAG with ColPali and Qwen2-VL on Your Computer

Retrieve and exploit information from PDFs without OCR

Sep 16

The Weekly Kaitchup #58

AdEMAMix - FLUTE

Sep 13

Introducing Minivoc: Faster and Memory-Efficient LLMs Through Vocabulary Reduction [WIP]

From 128k to 32k tokens

Sep 13

GuideLLM: Is Your Server Ready for LLM Deployment?

Simulate real-world inference workloads with GuideLLM

Sep 12

GGUF Quantization with Imatrix and K-Quantization to Run LLMs on Your CPU

Fast and accurate GGUF models for your CPU

Sep 9

The Weekly Kaitchup #57

OLMoE - GuideLLM - vLLM 0.6.0

Sep 6

Falcon Mamba, Jamba, RWKV... Can You Use Them on Your Computer?

A close look at quantization and parameter-efficient fine-tuning (LoRA/QLoRA) for SSMs, RWKV, and hybrid models

Sep 5

Announcing The Kaitchup's Book: LLMs on a Budget [Pre-sales Open]

Learn how to fine-tune, quantize, run, and serve LLMs on consumer hardware

Sep 4

Run Qwen2-VL on Your Computer with Text, Images, and Video, Step by Step

Your local multimodal chat model

Sep 2

August 2024

The Weekly Kaitchup #56

NanoFlow - Comparison of LLM Inference Services (Accuracy) - Zamba2-1.2B

Aug 30
