Pick a base model + training mode → see peak fine-tuning VRAM and which GPUs can fit the job.
Training mode
LoRA: base model frozen, train small adapter matrices. Saves optimizer + gradient memory.
LoRA settings
Rank controls adapter expressiveness: 8-32 is standard, 64+ for harder tasks. "All linear" targets every linear layer (PEFT: target_modules="all-linear").
As-is, no warranty. These apps are free under their listed license and run entirely in your browser. Use at your own risk — don't blame me if your PC catches fire, your dog runs away, or the math turns out wrong. Verify anything that actually matters. None of this is professional financial, medical, legal, or engineering advice.
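Pre-flight memory check before you launch a fine-tune. Pick a base model, a training mode, LoRA settings (rank and target modules), and batch size and sequence length.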
Get back: total peak VRAM, per-component breakdown (weights / optimizer / gradients / activations), and a GPU compatibility table.
| Component | Formula |
|---|---|
| Model weights | params × bytes_per_param (bf16/fp16 = 2 bytes, NF4 for QLoRA ≈ 0.5 bytes) |
| LoRA adapter weights | 2 × rank × hidden × modules × layers × 2 bytes (bf16) |
| Optimizer (AdamW) | 8 bytes × trainable_params (m + v in fp32) |
| Gradients | 4 bytes × trainable_params (fp32) |
| Activations | batch × seq × hidden × layers × act_factor, where act_factor ≈ 34 (×0.4 with checkpointing, ×0.5 with FA2) |
| Framework overhead | +10% on subtotal |
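The table translates fairly directly into code. The following is a minimal sketch, not the tool's actual implementation: the `Job` dataclass, the `estimate_vram_gb` function, the use of decimal gigabytes, the default of 7 adapted modules per layer, and the reading of the checkpointing and FA2 multipliers as cumulative are all assumptions made for illustration.

```python
from dataclasses import dataclass

GB = 1e9  # assumption: decimal gigabytes


@dataclass
class Job:
    params: float                 # total base-model parameters
    hidden: int                   # hidden size
    layers: int                   # number of transformer layers
    batch: int
    seq: int
    mode: str = "lora"            # "full", "lora", or "qlora"
    rank: int = 16
    modules: int = 7              # adapted linear modules per layer (assumed)
    checkpointing: bool = False
    flash_attn: bool = False


def estimate_vram_gb(job: Job) -> dict:
    """Per-component peak-VRAM estimate in GB, following the table's formulas."""
    bytes_per_param = 0.5 if job.mode == "qlora" else 2.0
    weights = job.params * bytes_per_param

    if job.mode == "full":
        trainable = job.params
        adapter = 0.0
    else:
        trainable = 2 * job.rank * job.hidden * job.modules * job.layers
        adapter = trainable * 2          # adapter weights stored in bf16

    optimizer = 8 * trainable            # AdamW m + v in fp32
    gradients = 4 * trainable            # fp32 gradients

    act_factor = 34.0
    if job.checkpointing:
        act_factor *= 0.4                # assumed cumulative with FA2
    if job.flash_attn:
        act_factor *= 0.5
    activations = job.batch * job.seq * job.hidden * job.layers * act_factor

    parts = {
        "weights": weights,
        "adapter": adapter,
        "optimizer": optimizer,
        "gradients": gradients,
        "activations": activations,
    }
    parts["overhead"] = 0.10 * sum(parts.values())   # +10% on subtotal
    result = {name: value / GB for name, value in parts.items()}
    result["total"] = sum(result.values())
    return result


# Hypothetical 7B-class model, LoRA rank 8, batch 1 x seq 1024.
example = estimate_vram_gb(
    Job(params=7e9, hidden=4096, layers=32, batch=1, seq=1024, rank=8)
)
for name, gb in example.items():
    print(f"{name:<12}{gb:8.2f} GB")
```

Returning the components as a dictionary keeps the per-component breakdown available alongside the total.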
LoRA freezes the base model: only the adapter parameters (typically 0.1-2% of the total) are trainable, so optimizer and gradient memory shrink by roughly the same ratio, 50-1000× (about 100× at 1% trainable). QLoRA additionally quantises the frozen base to NF4, shrinking weight memory about 4× relative to bf16.
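To see how the trainable fraction moves with rank, here is a small sketch using the adapter formula from the table. The model shape (8e9 parameters, hidden size 4096, 32 layers, 7 adapted modules per layer) is a hypothetical 8B-class configuration, and `lora_trainable_params` is an illustrative helper. The formula treats every adapted module as hidden × hidden, so real counts will differ somewhat.

```python
def lora_trainable_params(rank: int, hidden: int, modules: int, layers: int) -> int:
    """Adapter parameter count: one A and one B matrix per adapted module."""
    return 2 * rank * hidden * modules * layers


BASE_PARAMS = 8_000_000_000          # hypothetical 8B-class model
HIDDEN, LAYERS, MODULES = 4096, 32, 7

# Optimizer (8 bytes) + gradients (4 bytes) per trainable parameter.
full_gb = 12 * BASE_PARAMS / 1e9

for rank in (8, 16, 64):
    trainable = lora_trainable_params(rank, HIDDEN, MODULES, LAYERS)
    fraction = trainable / BASE_PARAMS
    lora_gb = 12 * trainable / 1e9
    print(
        f"rank {rank:>2}: {fraction:.2%} trainable, "
        f"optimizer+gradients {lora_gb:.2f} GB vs {full_gb:.0f} GB for full fine-tune"
    )
```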
Llama 3.3 70B with QLoRA, rank 16, batch 4 × seq 2048, gradient checkpointing + FA2:
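A rough sketch of this configuration, applying the component formulas from the table. These are not values taken from the tool. Assumptions: 70e9 parameters, hidden size 8192, 80 layers, 7 adapted modules per layer, decimal gigabytes, and the checkpointing and FA2 multipliers applied cumulatively.

```python
GB = 1e9  # assumption: decimal gigabytes

# Assumed architecture and LoRA settings for the 70B example.
params, hidden, layers = 70e9, 8192, 80
rank, modules = 16, 7
batch, seq = 4, 2048

weights = params * 0.5                                  # NF4 base, ~0.5 bytes/param
trainable = 2 * rank * hidden * modules * layers        # adapter parameters
adapter = trainable * 2                                 # bf16 adapter weights
optimizer = 8 * trainable                               # AdamW m + v in fp32
gradients = 4 * trainable                               # fp32 gradients
activations = batch * seq * hidden * layers * 34 * 0.4 * 0.5  # checkpointing + FA2

subtotal = weights + adapter + optimizer + gradients + activations
total = subtotal * 1.10                                 # +10% framework overhead

for name, value in [
    ("weights", weights),
    ("adapter", adapter),
    ("optimizer", optimizer),
    ("gradients", gradients),
    ("activations", activations),
    ("total", total),
]:
    print(f"{name:<12}{value / GB:8.2f} GB")
```

Under these assumptions the frozen base weights and the activations dominate, while the adapter, optimizer, and gradient terms together stay in the low single-digit GB range.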
By contrast, full fine-tune of the same model:
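The same table formulas can be applied to the full fine-tune case as a rough sketch. This assumes 70e9 parameters, hidden size 8192, 80 layers, the same batch 4 × seq 2048 with gradient checkpointing and FA2, decimal gigabytes, and cumulative activation multipliers. None of these figures come from the tool itself.

```python
GB = 1e9  # assumption: decimal gigabytes

# Assumed architecture and batch settings for the 70B example.
params, hidden, layers = 70e9, 8192, 80
batch, seq = 4, 2048

weights = params * 2            # bf16 weights
optimizer = params * 8          # AdamW m + v in fp32, every parameter trainable
gradients = params * 4          # fp32 gradients
activations = batch * seq * hidden * layers * 34 * 0.4 * 0.5  # checkpointing + FA2

subtotal = weights + optimizer + gradients + activations
total = subtotal * 1.10         # +10% framework overhead

for name, value in [
    ("weights", weights),
    ("optimizer", optimizer),
    ("gradients", gradients),
    ("activations", activations),
    ("total", total),
]:
    print(f"{name:<12}{value / GB:9.2f} GB")
```

Here the optimizer and gradient terms, which scale with all 70e9 parameters, account for most of the total.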
Estimates carry ±15% error. Always reserve 10-20% headroom.
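One way to apply the headroom rule when checking GPU fit is sketched below. The GPU list is illustrative rather than the tool's compatibility table, `gpus_that_fit` is a hypothetical helper, and headroom is interpreted here as a margin added on top of the estimate (an assumption; reserving a share of GPU capacity is an equally valid reading).

```python
# Illustrative GPU memory capacities in GB (not the tool's own table).
GPUS_GB = {
    "RTX 4090": 24,
    "L40S": 48,
    "A100 80GB": 80,
    "H100 80GB": 80,
}


def gpus_that_fit(estimate_gb: float, headroom: float = 0.15, gpus=GPUS_GB) -> list:
    """Return GPUs whose capacity covers the estimate plus the headroom margin."""
    required = estimate_gb * (1 + headroom)
    return [name for name, capacity in gpus.items() if required <= capacity]


print(gpus_that_fit(20.0))   # 20 GB estimate, 15% headroom
print(gpus_that_fit(22.0))   # slightly larger estimate no longer fits in 24 GB
```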