What It Does

When a language model generates text, it produces a probability for every possible next token. Sampling parameters decide which of those tokens are actually allowed to be picked, and how the odds are weighted. Most people treat them as vague vibes - “turn temperature down to make it less crazy.” This tool shows the actual mechanism.

Start from a fixed example distribution (the candidate next-tokens after “The weather today is”), then move the sliders and watch the bars change in real time.

How to Use It

Adjust any slider - temperature, top-k, top-p, min-p, or repetition penalty.
Watch the bars: each token’s bar width is its final probability of being sampled. Tokens removed by a filter are greyed out and struck through.
Two tokens are marked as “already used” so you can see repetition penalty push their probability down.
Reset to defaults (temperature 0.8, everything else off) to see the baseline: a softmax at temperature 0.8 with no filters and no penalty applied.

How the Knobs Work

The parameters are applied in a specific order, and order matters:

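Repetition penalty - scales down the logits of already-used tokens, making them less likely to repeat. In the Hugging Face convention, a positive logit is divided by the penalty and a negative logit is multiplied by it, so a penalized token always loses probability.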
Temperature - divides all logits. Below 1.0 sharpens the distribution (more confident, more repetitive); above 1.0 flattens it (more random).
Softmax - turns logits into probabilities.
Top-k - keeps only the k most probable tokens.
Top-p (nucleus) - keeps the smallest set of tokens whose cumulative probability reaches p.
Min-p - keeps only tokens at least as probable as min_p × (the top token’s probability).
Renormalize - the survivors are rescaled to sum to 1. Those are the real odds.
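
The order above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's source: the function name sample_probs, the EXAMPLE_LOGITS values, and the "0 or 1.0 means off" defaults are all assumptions made for this example. It also assumes every filter looks at the same softmax output and that renormalization happens once at the end, as the list describes; runtimes that mask logits one filter at a time effectively renormalize between filters, which can change which tokens top-p and min-p keep.

```python
import math

# Hypothetical logits for illustration only (not the tool's actual numbers).
EXAMPLE_LOGITS = {"sunny": 3.0, "cloudy": 2.5, "rainy": 2.0, "cold": 1.0, "purple": -1.0}


def sample_probs(logits, used=(), repetition_penalty=1.0, temperature=1.0,
                 top_k=0, top_p=1.0, min_p=0.0):
    """Return {token: final sampling probability}; filtered tokens get 0.0."""
    # 1. Repetition penalty: divide positive logits, multiply negative ones,
    #    so an already-used token always becomes less likely.
    z = {}
    for tok, logit in logits.items():
        if tok in used:
            logit = logit / repetition_penalty if logit > 0 else logit * repetition_penalty
        z[tok] = logit

    # 2. Temperature: divide every logit.
    z = {tok: v / temperature for tok, v in z.items()}

    # 3. Softmax (subtracting the max keeps exp() numerically stable).
    peak = max(z.values())
    exps = {tok: math.exp(v - peak) for tok, v in z.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    ranked = sorted(probs, key=probs.get, reverse=True)
    keep = set(ranked)

    # 4. Top-k: keep the k most probable tokens (0 means off).
    if top_k > 0:
        keep &= set(ranked[:top_k])

    # 5. Top-p: smallest top-ranked set whose cumulative probability reaches p.
    if top_p < 1.0:
        cumulative, nucleus = 0.0, []
        for tok in ranked:
            nucleus.append(tok)
            cumulative += probs[tok]
            if cumulative >= top_p:
                break
        keep &= set(nucleus)

    # 6. Min-p: keep tokens at least min_p times as probable as the top token.
    if min_p > 0.0:
        threshold = min_p * probs[ranked[0]]
        keep &= {tok for tok in ranked if probs[tok] >= threshold}

    # 7. Renormalize the survivors so they sum to 1.
    mass = sum(probs[tok] for tok in keep)
    return {tok: (probs[tok] / mass if tok in keep else 0.0) for tok in logits}


final = sample_probs(EXAMPLE_LOGITS, used={"sunny"}, repetition_penalty=1.3,
                     temperature=0.8, top_k=4, top_p=0.9, min_p=0.05)
for tok, p in final.items():
    print(f"{tok:>7}  {p:.3f}")
```

Tokens that print 0.000 correspond to the greyed-out bars; the remaining values are the bar widths after renormalization.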

Why It’s Useful

Repetitive output - usually temperature too low, or no repetition penalty. Watch how a penalty redistributes mass to fresh tokens.
Incoherent output - usually temperature too high with no top-p/min-p cutoff, so the long tail of nonsense tokens stays sampleable.
top-p vs min-p - min-p is relative to the top token, so it adapts to how confident the model is on each step; top-p uses a fixed cumulative mass. Toggling both on the same distribution makes the difference obvious.
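
The top-p vs min-p contrast can be checked with a small sketch. The two distributions below are invented for illustration (one confident step, one uncertain step), and the helper names top_p_keep and min_p_keep are hypothetical; each returns how many tokens survive the filter.

```python
def top_p_keep(probs, p):
    """Count tokens in the smallest top-ranked set whose cumulative probability reaches p."""
    cumulative = 0.0
    for n, q in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += q
        if cumulative >= p:
            return n
    return len(probs)


def min_p_keep(probs, min_p):
    """Count tokens at least min_p times as probable as the top token."""
    threshold = min_p * max(probs)
    return sum(1 for q in probs if q >= threshold)


# Invented example distributions; each sums to 1.
confident = [0.90, 0.05, 0.02, 0.01, 0.01, 0.01]
uncertain = [0.22, 0.20, 0.18, 0.16, 0.14, 0.10]

for name, dist in [("confident", confident), ("uncertain", uncertain)]:
    print(name, "top-p keeps", top_p_keep(dist, 0.96),
          "| min-p keeps", min_p_keep(dist, 0.1))
```

With these numbers, top-p at 0.96 keeps three tokens on the confident step because it has to reach into the tail to fill its mass target, while min-p at 0.1 keeps only the top token. On the uncertain step both keep all six, since every candidate is within a factor of ten of the leader.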

Limitations

Illustrative distribution - the token set and logits are a fixed, hand-built example, not a live model. The math is real; the numbers are a teaching set.
Greedy and beam search not shown - this covers the common sampling stack (temperature + truncation), not deterministic decoding.
Implementation details vary - some runtimes apply these filters in a slightly different order, or use additive presence and frequency penalties in place of a multiplicative repetition penalty. This uses the most common Hugging Face conventions.