LoRA / QLoRA Memory Calculator

What It Does

Pre-flight memory check before you launch a fine-tune. Pick:

Base model (or input custom dimensions)
Training mode: Full / LoRA / QLoRA
LoRA rank + target modules
Batch size × sequence length
Gradient checkpointing on/off
Flash Attention 2 on/off

Get back: total peak VRAM, per-component breakdown (weights / optimizer / gradients / activations), and a GPU compatibility table.

How The Math Works

Component	Formula
Model weights	`params × bytes_per_param` — bf16/fp16 = 2 bytes, NF4 (QLoRA) = ~0.5
LoRA adapter weights	`2 × rank × hidden × modules × layers × 2` bytes (bf16)
Optimizer (AdamW)	`8 bytes × trainable_params` (m + v in fp32)
Gradients	`4 bytes × trainable_params` (fp32)
Activations	`batch × seq × hidden × layers × act_factor`, where act_factor ≈ 34; multiply by 0.4 with gradient checkpointing and by 0.5 with FA2
Framework overhead	+10% on subtotal
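
The table can be read as a small function. The following Python sketch is one way to encode it, not the calculator's actual source: the names `lora_trainable_params` and `estimate_components_gb` are hypothetical, gigabytes are taken as 10^9 bytes, and the activation term is passed in as a number because the table does not state the unit of its formula.

```python
GB = 1e9  # assumption: decimal gigabytes

# Bytes per base-model parameter, from the "Model weights" row.
BYTES_PER_PARAM = {"full": 2.0, "lora": 2.0, "qlora": 0.5}


def lora_trainable_params(rank, hidden, modules, layers):
    """Adapter parameter count: two rank x hidden matrices per target module per layer."""
    return 2 * rank * hidden * modules * layers


def estimate_components_gb(total_params, mode, trainable_params,
                           activations_gb, overhead=0.10):
    """Return a per-component memory breakdown in GB for one training setup."""
    is_full = mode == "full"
    parts = {
        "weights": total_params * BYTES_PER_PARAM[mode] / GB,
        # Adapters are stored in bf16 (2 bytes each); full fine-tuning has none.
        "adapter": 0.0 if is_full else trainable_params * 2 / GB,
        "optimizer": trainable_params * 8 / GB,  # AdamW m + v in fp32
        "gradients": trainable_params * 4 / GB,  # fp32
        "activations": activations_gb,           # supplied by the caller
    }
    subtotal = sum(parts.values())
    parts["overhead"] = subtotal * overhead      # +10% on subtotal by default
    parts["peak"] = subtotal + parts["overhead"]
    return parts
```

For full fine-tuning, pass the total parameter count as `trainable_params`, since every weight then carries optimizer state and a gradient.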

LoRA freezes the base model — only adapter params (typically 0.1-2% of total) are trainable, so optimizer and gradient memory shrink ~100×. QLoRA additionally quantises the frozen base to NF4, shrinking weight memory ~4×.
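
To make the two savings concrete, this sketch compares training-state memory (optimizer plus gradients) and weight memory for a hypothetical 7B-parameter model. The dimensions used (hidden size 4096, 32 layers, 7 target modules, rank 16) are illustrative assumptions rather than figures from this guide, and the adapter count uses the simplified `2 × rank × hidden × modules × layers` formula from the table.

```python
total_params = 7e9                            # hypothetical 7B base model
rank, hidden, modules, layers = 16, 4096, 7, 32  # illustrative assumptions

# Adapter parameters are the only trainable ones under LoRA.
trainable = 2 * rank * hidden * modules * layers
trainable_fraction = trainable / total_params

# Optimizer (8 bytes) plus gradients (4 bytes) per trainable parameter.
STATE_BYTES = 8 + 4
full_state_gb = total_params * STATE_BYTES / 1e9
lora_state_gb = trainable * STATE_BYTES / 1e9
state_shrink = full_state_gb / lora_state_gb

# Frozen base weights: bf16 (2 bytes) versus NF4 (about 0.5 bytes).
bf16_weights_gb = total_params * 2 / 1e9
nf4_weights_gb = total_params * 0.5 / 1e9
weight_shrink = bf16_weights_gb / nf4_weights_gb

print(f"trainable fraction: {trainable_fraction:.2%}")
print(f"optimizer + gradients: {full_state_gb:.1f} GB -> {lora_state_gb:.2f} GB "
      f"({state_shrink:.0f}x smaller)")
print(f"weights: {bf16_weights_gb:.1f} GB -> {nf4_weights_gb:.1f} GB "
      f"({weight_shrink:.0f}x smaller)")
```

With these particular assumptions the trainable fraction is about 0.4%, so the optimizer and gradient memory shrinks by a factor of roughly 240; the exact factor depends on rank and on how many modules are targeted.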

Worked Example

Llama 3.3 70B with QLoRA, rank 16, batch 4 × seq 2048, gradient checkpointing + FA2:

Weights: 70B × 0.5 bytes ≈ 35 GB
Trainable params: ~167M (0.2% of total)
Optimizer: 167M × 8 ≈ 1.3 GB
Gradients: 167M × 4 ≈ 0.7 GB
Activations (with FA2 + checkpoint): ~4 GB
Peak ≈ 41-45 GB: fits on an A100 80GB, is tight on an L40S 48GB once headroom is reserved, and does not fit on an RTX 5090 32GB.
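
The arithmetic in this example can be reproduced in a few lines. This is a sketch under stated assumptions: a hidden size of 8192, 80 layers, and 8 target modules are used because they yield about 167M adapter parameters with the table's formula, but the example itself does not state the module count. The 4 GB activation figure is copied from the example rather than derived.

```python
GB = 1e9  # assumption: decimal gigabytes

base_params = 70e9
rank, hidden, layers = 16, 8192, 80
modules = 8  # assumption: chosen to reproduce the ~167M trainable parameters

trainable = 2 * rank * hidden * modules * layers

weights_gb = base_params * 0.5 / GB   # NF4 base weights
adapter_gb = trainable * 2 / GB       # bf16 adapter weights
optimizer_gb = trainable * 8 / GB     # AdamW m + v in fp32
gradients_gb = trainable * 4 / GB     # fp32 gradients
activations_gb = 4.0                  # taken from the example, not computed

subtotal_gb = (weights_gb + adapter_gb + optimizer_gb
               + gradients_gb + activations_gb)
peak_gb = subtotal_gb * 1.10          # +10% framework overhead

for name, value in [("weights", weights_gb), ("adapter", adapter_gb),
                    ("optimizer", optimizer_gb), ("gradients", gradients_gb),
                    ("activations", activations_gb),
                    ("subtotal", subtotal_gb), ("peak", peak_gb)]:
    print(f"{name:<12}{value:6.1f} GB")
```

Under these assumptions the subtotal lands near 41 GB and the figure with 10% overhead near 45 GB, which is one way to read the 41-45 GB range.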

By contrast, full fine-tune of the same model:

Weights: 140 GB
Optimizer: 564 GB
Gradients: 282 GB
Peak ≈ 1+ TB: needs DeepSpeed ZeRO-3 sharding across 8+ A100s (a single 8 × 80 GB node holds only 640 GB).
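
A rough lower bound on GPU count follows from dividing the total by per-GPU memory. The sketch below assumes ideal, perfectly even sharding of weights, optimizer state, and gradients, ignores activations and framework overhead, and takes 80 GB per A100. `min_gpus` and its `headroom` parameter are hypothetical names used for illustration only.

```python
import math


def min_gpus(total_gb, gpu_gb=80.0, headroom=0.0):
    """Smallest GPU count whose combined usable memory covers total_gb."""
    usable_gb = gpu_gb * (1.0 - headroom)
    return math.ceil(total_gb / usable_gb)


# Weights + optimizer + gradients, using the figures from the example.
state_gb = 140 + 564 + 282

print("total state:", state_gb, "GB")
print("per GPU on 8 cards:", state_gb / 8, "GB")
print("minimum 80 GB cards, no headroom:", min_gpus(state_gb))
print("minimum 80 GB cards, 15% headroom:", min_gpus(state_gb, headroom=0.15))
```

Real sharded runs also need room for activations, communication buffers, and fragmentation, so the practical count is higher than this bound unless CPU or NVMe offloading is used.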

What’s NOT Modelled

DeepSpeed / FSDP sharding — multi-GPU training reduces per-GPU memory
Unsloth optimizations — actual usage with Unsloth can be 30-50% lower than these estimates
Inference VRAM — see GPU VRAM Calculator
Mixed-precision edge cases — fp8 training, MX-FP4 etc.

Estimates carry ±15% error. Always reserve 10-20% headroom.
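
One conservative way to apply both numbers is to inflate the estimate by the error margin and shrink the card's capacity by the headroom before comparing the two. This is an interpretation, not the calculator's documented rule; the function name `fits` and its defaults are hypothetical.

```python
def fits(estimate_gb, gpu_gb, error=0.15, headroom=0.10):
    """Return True if the worst-case estimate fits in the card's usable memory."""
    worst_case_gb = estimate_gb * (1.0 + error)  # estimate may be 15% too low
    usable_gb = gpu_gb * (1.0 - headroom)        # keep 10% of VRAM in reserve
    return worst_case_gb <= usable_gb


gpus = {"A100 80GB": 80, "L40S 48GB": 48, "RTX 5090 32GB": 32}
for name, vram_gb in gpus.items():
    print(f"{name}: {'fits' if fits(45.0, vram_gb) else 'does not fit'}")
```

With a 45 GB estimate, this check passes for an 80 GB card and fails for 48 GB and 32 GB cards, even though 45 GB is nominally below 48 GB.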

Related Tools

GPU VRAM Calculator — inference VRAM
Reasoning Cost Calculator — if you can rent inference instead of training
Hyperscaler Pricing Comparison
