Catch the silent failure modes in your system prompt - vague instructions, conflicting rules, missing output format, role drift - before they cost you tokens
As-is, no warranty. These apps are free under their listed license and run entirely in your browser. Use at your own risk — don't blame me if your PC catches fire, your dog runs away, or the math turns out wrong. Verify anything that actually matters. None of this is professional financial, medical, legal, or engineering advice.
System Prompt Linter runs your system prompt through approximately 20 heuristic checks based on documented prompt-engineering failure patterns. Every finding is severity-rated (high/medium/low) with an explanation and a concrete suggested fix. This is deterministic static analysis - no LLM is used to evaluate your prompt. Everything runs locally in your browser.
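To make "deterministic static analysis" concrete, here is a minimal sketch of how a rule-based prompt check could be structured. It is not the linter's actual implementation: the `Finding` shape, the rule names, and both regular expressions are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Finding:
    severity: str      # "high", "medium", or "low"
    rule: str          # hypothetical rule identifier
    explanation: str
    suggestion: str


# Hypothetical patterns; a real rule set would be much broader.
VAGUE_LENGTH = re.compile(r"\b(concise(ly)?|brief(ly)?|short)\b", re.IGNORECASE)
FORMAT_HINT = re.compile(r"\b(json|yaml|xml|csv|markdown|plain text)\b", re.IGNORECASE)


def check_vague_length(prompt: str) -> Optional[Finding]:
    """Flag length words that are never quantified."""
    if VAGUE_LENGTH.search(prompt):
        return Finding(
            "medium",
            "vague-length",
            "Length is requested but never quantified.",
            "State a limit, e.g. 'Answer in at most 3 sentences.'",
        )
    return None


def check_missing_format(prompt: str) -> Optional[Finding]:
    """Flag prompts that never mention an output format."""
    if FORMAT_HINT.search(prompt) is None:
        return Finding(
            "high",
            "missing-output-format",
            "No output format is specified.",
            "Describe the exact output format and include an example.",
        )
    return None


CHECKS: list[Callable[[str], Optional[Finding]]] = [
    check_vague_length,
    check_missing_format,
]


def lint(prompt: str) -> list[Finding]:
    """Run every check in order; same input always yields the same findings."""
    findings = []
    for check in CHECKS:
        finding = check(prompt)
        if finding is not None:
            findings.append(finding)
    return findings
```

Because each check is a pure function of the prompt text, the result is reproducible and needs no network call, which is the property the description above relies on.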
“Always do X” followed by “never do X-adjacent Y” creates undefined behavior when both conditions apply simultaneously.
The hardest thing about prompt engineering is that there’s no compiler. No error messages, no stack traces. A broken prompt produces output - just wrong output, and often in a way that only manifests under specific input patterns that don’t appear in your test cases.
The failure modes are categorical:
An ambiguous instruction can be interpreted differently on each inference. A prompt that says “respond concisely” without defining what concise means can produce 1-sentence answers sometimes and 3-paragraph answers other times, depending on factors you can’t control. This looks like non-determinism but is actually under-specification.
Conflicting rules create decision points where the model must choose between two instructions. The choice is model-version-dependent, temperature-dependent, and context-dependent. Your GPT-4 prod prompt may have worked because GPT-4 resolved the conflict one way; when you upgrade to a newer model, it resolves it differently. The behavior was never guaranteed.
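One rough way to surface candidate conflicts is to pair every "always" clause with every "never" clause and flag pairs that share content words. The sketch below is a crude lexical heuristic under that assumption, not the linter's actual rule; the regular expressions and the stopword list are hypothetical, and it will produce both false positives and misses.

```python
import re

# Capture the clause after "always"/"never" up to the next sentence break.
ALWAYS = re.compile(r"\balways\s+([^.;\n]+)", re.IGNORECASE)
NEVER = re.compile(r"\bnever\s+([^.;\n]+)", re.IGNORECASE)

# Hypothetical stopword list; words here do not count as overlap.
STOPWORDS = {"the", "a", "an", "to", "in", "of", "and", "or", "your", "you", "with", "for"}


def content_words(phrase: str) -> set[str]:
    """Lowercased alphabetic words of the phrase, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", phrase.lower()) if w not in STOPWORDS}


def find_conflicts(prompt: str) -> list[tuple[str, str, list[str]]]:
    """Return (always_clause, never_clause, shared_words) for overlapping pairs."""
    always = [m.group(1).strip() for m in ALWAYS.finditer(prompt)]
    never = [m.group(1).strip() for m in NEVER.finditer(prompt)]
    conflicts = []
    for a in always:
        for n in never:
            shared = content_words(a) & content_words(n)
            if shared:
                conflicts.append((a, n, sorted(shared)))
    return conflicts
```

For example, `find_conflicts("Always cite sources for factual claims. Never cite sources that are paywalled.")` reports one pair sharing `cite` and `sources`. A flagged pair is only a candidate for review: the two rules may be compatible, but the prompt should say which one wins when both apply.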
Missing output format is the most costly failure in production. If your downstream code expects a JSON object and the model sometimes produces markdown-wrapped JSON and sometimes produces prose, your code breaks intermittently. Specifying the exact output format, including an example, reduces this failure mode to near zero.
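The downstream side of this failure can be sketched as a defensive parser that accepts bare JSON, unwraps a markdown fence if one is present, and fails loudly on prose. The function name `parse_model_json` is hypothetical, and the fence handling assumes the common triple-backtick form with an optional `json` tag.

```python
import json
import re

TICKS = "`" * 3  # markdown code-fence delimiter, built here to keep this example readable

# Matches a response that consists solely of one fenced block.
FENCE = re.compile(
    r"^" + TICKS + r"(?:json)?\s*\n(.*?)\n" + TICKS + r"\s*$",
    re.DOTALL,
)


def parse_model_json(raw: str) -> dict:
    """Parse a model response that should contain exactly one JSON object."""
    text = raw.strip()
    match = FENCE.match(text)
    if match:
        text = match.group(1)  # drop the surrounding fence
    value = json.loads(text)   # raises json.JSONDecodeError on prose
    if not isinstance(value, dict):
        raise ValueError("expected a JSON object")
    return value
```

A parser like this only contains the damage. The cheaper fix is still upstream: state the format in the prompt and show one example of it.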
Instruction overload is a real phenomenon. LLMs under long-context pressure show recency bias: instructions at the end of the system prompt tend to be followed better than instructions in the middle. A prompt with 20 numbered rules may see rules 10–15 partially ignored. The linter flags this because the fix (consolidate rules, prioritize ruthlessly) is structural, not cosmetic.
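A check for this can be as simple as counting numbered rules and comparing against a threshold. The sketch below assumes rules are written one per line as `1.` or `1)`; the threshold of 10 is a hypothetical value chosen for illustration, not the linter's actual cutoff.

```python
import re

# A line that starts with a number followed by "." or ")" and some text.
NUMBERED_RULE = re.compile(r"^[ \t]*\d+[.)][ \t]+\S", re.MULTILINE)

MAX_RULES = 10  # hypothetical threshold


def count_numbered_rules(prompt: str) -> int:
    """Count lines that look like numbered rules."""
    return len(NUMBERED_RULE.findall(prompt))


def is_overloaded(prompt: str, max_rules: int = MAX_RULES) -> bool:
    """True when the prompt has more numbered rules than the threshold."""
    return count_numbered_rules(prompt) > max_rules
```

A count alone says nothing about which rules matter, so a finding of this kind is a cue to merge overlapping rules and move the critical ones to the start or end of the prompt.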
In a single-turn chatbot, a slightly ambiguous prompt produces a slightly wrong answer. The user asks again with clarification. The stakes are low.
In an agentic system - a model that calls tools, runs in a loop, and takes actions - a slightly ambiguous prompt compounds. The agent misinterprets step 2 because step 1’s result was ambiguous. By step 5, it’s operating in a completely wrong context, and the damage (files created, emails sent, API calls made) is difficult to reverse.
This is why the standard for system prompt quality in production agents is higher than in chatbots. Every ambiguity is a branch point in the agent’s decision tree. More branches means more failure modes. The linter catches the ambiguities before they become agent bugs.
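The compounding effect can be illustrated with a back-of-the-envelope model. It assumes every step is interpreted correctly with the same probability and that steps fail independently, which real agents will not satisfy exactly; the 0.95 figure is purely illustrative, not a measurement.

```python
def chain_success(p_step: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps
    with identical per-step success probability (a simplification)."""
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return p_step ** steps


# Illustrative only: a 5% chance of misreading each step, over 5 steps.
five_step = chain_success(0.95, 5)
```

Under these assumptions, `chain_success(0.95, 5)` is about 0.77, so an ambiguity that looks minor for a single turn leaves roughly a one-in-four chance that a five-step run goes wrong somewhere.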
Specific anti-patterns and their agent failure modes:
{action: 'escalate', reason: '...'} and stop”) gives the agent a safe fallback.
Think of this as a code review for prompts: it doesn’t catch every bug, but it catches the patterns that show up in post-mortems repeatedly.