Filter 54+ open-weight LLMs by hardware + task + license + capability. See which fit your VRAM at your chosen quantization, with a clearly-labeled performance estimate.
📡 Filterable browser, not an oracle. Performance estimates are based on params × bits-per-weight and your hardware's memory bandwidth, so treat them as directional only. Real numbers come from running the model on your hardware. Catalog last refreshed 2026-06-01.
Filters: Hardware, Task, Must support, License, Min context window. Results appear under Models.
As-is, no warranty. These apps are free under their listed license and run entirely in your browser. Use at your own risk — don't blame me if your PC catches fire, your dog runs away, or the math turns out wrong. Verify anything that actually matters. None of this is professional financial, medical, legal, or engineering advice.
Pick your hardware on the left. The 54-model catalog is filtered on the right, and each match shows its weight size and a throughput estimate at your selected quantization.
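To make the filtering concrete, here is a minimal sketch of how a facet filter like this could work. It is not the site's actual code: the `Model` fields, the `matches` function, and the three catalog entries are all hypothetical, and the fit check compares only weight size to VRAM, ignoring KV cache and runtime overhead. The Task facet is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    # Hypothetical catalog record; field names are assumptions for illustration.
    name: str
    params: float                 # total parameter count
    license: str
    context_window: int           # tokens
    capabilities: set[str] = field(default_factory=set)


def weights_gb(params: float, bits_per_weight: float) -> float:
    # Same formula the page gives: params × bits / 8 / 1024³
    return params * bits_per_weight / 8 / 1024**3


def matches(m: Model, *, vram_gb: float, bits_per_weight: float,
            licenses=None, must_support=frozenset(), min_context: int = 0) -> bool:
    # A model passes only if every selected facet is satisfied.
    fits = weights_gb(m.params, bits_per_weight) <= vram_gb
    license_ok = licenses is None or m.license in licenses
    caps_ok = set(must_support) <= m.capabilities
    return fits and license_ok and caps_ok and m.context_window >= min_context


# Made-up entries, not real catalog data.
CATALOG = [
    Model("example-3b", 3e9, "apache-2.0", 32_768, {"tools"}),
    Model("example-8b", 8e9, "mit", 131_072, {"tools", "vision"}),
    Model("example-70b", 70e9, "custom", 131_072, {"tools"}),
]

hits = [m.name for m in CATALOG
        if matches(m, vram_gb=8, bits_per_weight=4,
                   must_support={"tools"}, min_context=65_536)]
print(hits)
```

With 8 GB of VRAM at 4 bits per weight, the 70B entry is dropped for size and the 3B entry for its context window, so only `example-8b` remains. Nothing is ranked; the filter only narrows the list.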
Each result card shows:
ollama pull command and HuggingFace link
This is the most honest version of model performance estimation I can give without measuring on your hardware:
weights_GB = (params × bits_per_weight) / 8 / 1024³
active_weights_GB = same, but using params_active for MoE
tokens_per_sec ≈ (memory_bandwidth_GB/s × 0.8) / active_weights_GB
The estimate is bandwidth-bound, not FLOPS-bound: for typical LLM inference at batch 1, the GPU or CPU spends most of its time waiting for weights to stream from memory. The 0.8 fudge factor is a generous bandwidth-utilization estimate. Real numbers will be lower with longer contexts (the KV cache grows and must also be read for every token), and they shift further depending on attention optimizations (FlashAttention) and quantization-aware kernels.
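As a worked example, the three formulas above can be sketched in a few lines of Python. The function names, the 1000 GB/s bandwidth figure, and both model sizes are assumptions chosen for illustration, not catalog entries. Note that dividing by 1024³ gives binary gigabytes while bandwidth specs are usually quoted in decimal GB/s; the roughly 7% gap is well inside the error of an estimate like this.

```python
def weights_gb(params: float, bits_per_weight: float) -> float:
    # params × bits / 8 gives bytes; dividing by 1024³ matches the page's formula.
    return params * bits_per_weight / 8 / 1024**3


def tokens_per_sec(params_active: float, bits_per_weight: float,
                   bandwidth_gb_s: float, utilization: float = 0.8) -> float:
    # Bandwidth-bound decode: each active weight is streamed once per token.
    return bandwidth_gb_s * utilization / weights_gb(params_active, bits_per_weight)


# Hypothetical dense model: 8B parameters, 4 bits per weight, 1000 GB/s device.
dense_size = weights_gb(8e9, 4)
dense_tps = tokens_per_sec(8e9, 4, 1000)

# Hypothetical MoE: 40B total parameters, 10B active per token.
# Total params decide whether it fits; active params decide the speed.
moe_size = weights_gb(40e9, 4)
moe_tps = tokens_per_sec(10e9, 4, 1000)

print(f"dense: {dense_size:.2f} GB weights, ~{dense_tps:.0f} tok/s")
print(f"moe:   {moe_size:.2f} GB weights, ~{moe_tps:.0f} tok/s")
```

The MoE case shows why the formula separates the two sizes: the hypothetical 40B model needs about five times the memory of the 8B one, yet its estimated throughput is only somewhat lower because just 10B parameters are read per token.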
Don’t make purchase decisions from this number — run llama-bench on the actual hardware for that. Do use it for “is this even feasible?” first-cut filtering.
The original spec was a recommender. We reframed during planning (2026-05-22): we don’t want to say “the best model for your Pi 5 is X.” Hardware-fit is just one dimension; quality on your specific task is another (and varies per benchmark, per prompt, per fine-tune). A facet browser lets you see all options at a glance and pick based on the tradeoffs you care about. That’s the honest framing.
llama-bench or Ollama’s bench
reasoning-models.json and hyperscaler-pricing.json
For informational purposes only. Not financial, medical, or legal advice. You are solely responsible for how you use these tools.