Sampling Playground

See exactly how temperature, top-p, top-k, min-p, and repetition penalty reshape a model's next-token probabilities

Prompt

The weather today is

12 candidate next-tokens with fixed logits. Move the sliders to see how each sampling stage reshapes the distribution.

Sampling parameters

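  • Temperature (default 0.80) - Flattens (high) or sharpens (low) the whole distribution.
  • Top-k (default off) - Keep only the k most-likely tokens. 0 = disabled.
  • Top-p (default off) - Keep the smallest set of most-likely tokens whose probabilities sum to at least p. 1.0 = disabled.
  • Min-p (default off) - Drop tokens below min_p × (the top token's probability). 0 = disabled.
  • Repetition penalty (default 1.00) - Demotes already-used tokens (tagged rep↓): sunny, warm. 1.00 = no penalty.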

Order of operations

  1. Repetition penalty (on used tokens)
  2. Temperature scaling
  3. Softmax → baseline probabilities
  4. Top-k filter
  5. Top-p (nucleus) filter
  6. Min-p filter
  7. Renormalize survivors → final probs
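Because every truncation filter in the order above runs after temperature, the same top-p setting can keep a different number of tokens as temperature changes. A minimal Python sketch shows this; the logits and the helper names `softmax` and `top_p_count` are hypothetical, not the playground's 12-token example or its code.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_count(probs, p):
    """How many tokens top-p keeps: the shortest most-probable prefix with mass >= p."""
    cum = 0.0
    for n, prob in enumerate(sorted(probs, reverse=True), start=1):
        cum += prob
        if cum >= p:
            return n
    return len(probs)

# Hypothetical logits, not the playground's example set.
logits = [3.0, 2.5, 2.0, 1.5, 1.0, -1.0]
for t in (0.5, 1.0, 1.5):
    print(f"T={t}: top-p 0.9 keeps {top_p_count(softmax(logits, t), 0.9)} tokens")
# T=0.5: top-p 0.9 keeps 3 tokens
# T=1.0: top-p 0.9 keeps 4 tokens
# T=1.5: top-p 0.9 keeps 5 tokens
```

Top-k is the exception: dividing by a positive temperature never changes the ranking of tokens, so top-k keeps the same tokens at any temperature and only their weights change.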

Next-token distribution

Bar width = final sampling probability after all filters. Filtered tokens are greyed.

🔒 Everything runs in your browser. Fixed example logits, nothing sent anywhere. Companion to the Token Probability Visualizer.

As-is, no warranty. These apps are free under their listed license and run entirely in your browser. Use at your own risk — don't blame me if your PC catches fire, your dog runs away, or the math turns out wrong. Verify anything that actually matters. None of this is professional financial, medical, legal, or engineering advice.

© 2026 Rohit Burani · MIT · Built at gekro.com

Guide

What It Does

When a language model generates text, it assigns a raw score (a logit) to every token in its vocabulary, and a softmax turns those scores into next-token probabilities. Sampling parameters decide which of those tokens are actually allowed to be picked, and how the odds are weighted. Most people treat them as vague vibes - “turn temperature down to make it less crazy.” This tool shows the actual mechanism.

The tool starts from a fixed example distribution (the candidate next-tokens after “The weather today is”). Move the sliders and watch the bars change in real time.

How to Use It

  1. Adjust any slider - temperature, top-k, top-p, min-p, or repetition penalty.
  2. Watch the bars: each token’s bar width is its final probability of being sampled. Tokens removed by a filter are greyed out and struck through.
  3. Two tokens are marked as “already used” so you can see repetition penalty push their probability down.
  4. Reset to defaults (temperature 0.8, everything else off) to see the plain softmax baseline with no filtering. Set temperature to 1.0 to see the logits without any scaling.

How the Knobs Work

The parameters are applied in a specific order, and order matters:

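  1. Repetition penalty - divides an already-used token's logit by the penalty if the logit is positive, and multiplies it by the penalty if negative, so either way the token becomes less likely to repeat.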
  2. Temperature - divides all logits. Below 1.0 sharpens the distribution (more confident, more repetitive); above 1.0 flattens it (more random).
  3. Softmax - turns logits into probabilities.
  4. Top-k - keeps only the k most probable tokens.
  5. Top-p (nucleus) - keeps the smallest set of most-probable tokens whose cumulative probability is at least p.
  6. Min-p - keeps only tokens at least as probable as min_p × (the top token's probability).
  7. Renormalize - the survivors are rescaled to sum to 1. Those are the real odds.
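The seven steps above can be sketched in plain Python. This is a minimal illustration, not the playground's source: the function name `sample_distribution`, its dict-of-logits input and the example logits are all hypothetical. One assumption worth flagging: top-p here measures mass over the tokens that survived top-k (renormalized), which is how chained Hugging Face logits warpers behave; this tool may compute it differently.

```python
import math

def sample_distribution(logits, used, temperature=0.8, top_k=0, top_p=1.0,
                        min_p=0.0, repetition_penalty=1.0):
    """Final sampling probability per token (0.0 = filtered out).

    logits: dict of token -> raw logit (hypothetical input format)
    used:   set of tokens already generated
    """
    # 1. Repetition penalty (Hugging Face / CTRL convention): divide positive
    #    logits, multiply negative ones, so a used token always loses ground.
    scores = {}
    for tok, z in logits.items():
        if tok in used:
            z = z / repetition_penalty if z > 0 else z * repetition_penalty
        scores[tok] = z

    # 2. Temperature: divide every logit by T (T must be > 0).
    scores = {tok: z / temperature for tok, z in scores.items()}

    # 3. Softmax; subtracting the max logit avoids overflow in exp().
    m = max(scores.values())
    exps = {tok: math.exp(z - m) for tok, z in scores.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Candidates ranked from most to least probable.
    ranked = sorted(probs, key=probs.get, reverse=True)
    alive = ranked

    # 4. Top-k: keep the k most probable tokens (0 = disabled).
    if top_k > 0:
        alive = alive[:top_k]

    # 5. Top-p: keep the shortest most-probable prefix whose mass, measured
    #    over the tokens still alive, reaches p (1.0 = disabled).
    if top_p < 1.0:
        mass = sum(probs[t] for t in alive)
        kept, cum = [], 0.0
        for t in alive:
            kept.append(t)
            cum += probs[t] / mass
            if cum >= top_p:
                break
        alive = kept

    # 6. Min-p: drop tokens below min_p * (top token's probability); 0 = disabled.
    if min_p > 0.0:
        threshold = min_p * probs[alive[0]]
        alive = [t for t in alive if probs[t] >= threshold]

    # 7. Renormalize the survivors so they sum to 1.
    keep = set(alive)
    mass = sum(probs[t] for t in keep)
    return {t: probs[t] / mass if t in keep else 0.0 for t in ranked}

# Hypothetical logits, not the playground's actual example set.
logits = {"sunny": 3.0, "warm": 2.5, "cold": 2.0,
          "rainy": 1.5, "nice": 1.0, "purple": -1.0}
final = sample_distribution(logits, used={"sunny", "warm"},
                            temperature=1.0, top_p=0.9)
print([t for t, p in final.items() if p > 0])  # ['sunny', 'warm', 'cold', 'rainy']
```

The filters only ever shrink the surviving set, and the single renormalization at the end is what turns the surviving probabilities into the odds a sampler would actually use.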

Why It’s Useful

  • Repetitive output - usually temperature too low, or no repetition penalty. Watch how a penalty redistributes mass to fresh tokens.
  • Incoherent output - usually temperature too high with no top-p/min-p cutoff, so the long tail of nonsense tokens stays sampleable.
  • top-p vs min-p - min-p is relative to the top token, so it adapts to how confident the model is on each step; top-p uses a fixed cumulative mass. Toggling both on the same distribution makes the difference obvious.
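The top-p vs min-p bullet can be checked with two hand-made distributions. This is a sketch: the helper names `top_p_keep` and `min_p_keep` and the probabilities are hypothetical, not values from the playground.

```python
def top_p_keep(probs, p):
    """Tokens kept by top-p: the shortest most-probable prefix with mass >= p."""
    cum = 0.0
    for n, prob in enumerate(sorted(probs, reverse=True), start=1):
        cum += prob
        if cum >= p:
            return n
    return len(probs)

def min_p_keep(probs, min_p):
    """Tokens kept by min-p: probability >= min_p * (top token's probability)."""
    threshold = min_p * max(probs)
    return sum(1 for prob in probs if prob >= threshold)

# Hypothetical post-softmax distributions (each sums to 1).
peaked = [0.60, 0.10] + [0.03] * 10                       # one clear favorite, long tail
flat = [0.20, 0.18, 0.16, 0.14, 0.12, 0.09, 0.07, 0.04]  # no clear favorite

for name, probs in (("peaked", peaked), ("flat", flat)):
    print(f"{name}: top-p 0.9 keeps {top_p_keep(probs, 0.9)}/{len(probs)}, "
          f"min-p 0.1 keeps {min_p_keep(probs, 0.1)}/{len(probs)}")
# peaked: top-p 0.9 keeps 9/12, min-p 0.1 keeps 2/12
# flat: top-p 0.9 keeps 7/8, min-p 0.1 keeps 8/8
```

On the peaked distribution top-p still has to admit seven tail tokens to reach 90% of the mass, while min-p keeps only the two plausible ones; on the flat distribution min-p relaxes and keeps everything.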

Limitations

  • Illustrative distribution - the token set and logits are a fixed, hand-built example, not a live model. The math is real; the numbers are a teaching set.
  • Greedy and beam search not shown - this covers the common sampling stack (temperature + truncation), not deterministic decoding.
  • Implementation details vary - some runtimes apply these filters in a slightly different order, and many expose additive presence and frequency penalties (subtracted from the logits) instead of, or alongside, the multiplicative repetition penalty shown here. This tool follows the most common Hugging Face conventions.
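The practical difference between a multiplicative repetition penalty and additive presence/frequency penalties (the style exposed, for example, by the OpenAI API's `presence_penalty` and `frequency_penalty` parameters) can be sketched as two small functions. Both helpers are hypothetical illustrations, and the additive formula is an assumed common form; check your runtime's documentation for its exact definition.

```python
def multiplicative_penalty(logit, penalty):
    """Repetition penalty for a token that has already appeared (HF / CTRL style).

    Positive logits are divided and negative ones multiplied, so the token is
    always pushed down; how many times it appeared does not matter.
    """
    return logit / penalty if logit > 0 else logit * penalty

def additive_penalty(logit, count, presence, frequency):
    """Presence/frequency style penalty (assumed form, hypothetical helper).

    Subtracts `frequency` per previous occurrence, plus a flat `presence`
    amount once the token has appeared at least once.
    """
    return logit - count * frequency - (presence if count > 0 else 0.0)

print(multiplicative_penalty(2.0, 1.25))   # 1.6
print(multiplicative_penalty(-2.0, 1.25))  # -2.5
print(additive_penalty(2.0, count=3, presence=0.5, frequency=0.25))  # 0.75
```

The multiplicative form scales with the size of the logit and ignores the repeat count, while the additive frequency term keeps growing each time the token is reused.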
