Overview
Detect whether text was written by a human or generated by AI using a multi-layer approach:
- Rule-based analysis — linguistic patterns and statistical indicators
- LLM-as-judge — use Claude to score content against a detection ruleset
- External APIs — optional GPTZero or Originality.ai for corroboration
Instructions
Detection Ruleset
When analyzing text, evaluate these signals:
Strong AI Indicators (weight: high)
- Uniform sentence rhythm — sentences consistently similar in length and structure
- Stock hedges and transitions — "it's important to note", "furthermore", "additionally"
- Perfect paragraph structure — every paragraph follows intro-body-conclusion
- Generic examples — abstract or hypothetical, not from real experience
- No typos or informal language — unnaturally clean writing
- Overuse of em-dashes — especially common in Claude output
Moderate AI Indicators (weight: medium)
- Safe, diplomatic stance — never takes a controversial position
- Abstract nouns over verbs — "the utilization of" vs "using"
- No sensory details — descriptions lack taste, smell, texture
- Temporal vagueness — "in recent years" instead of specific dates
Human Indicators (reduce AI suspicion)
- Typos or self-corrections
- Specific dates, names, places from personal experience
- Unusual word choices or slang
- Strong opinions stated without hedging
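Phrase-based signals like those above can be checked locally with a simple scan. The phrase list below is an illustrative subset drawn from the indicators in this ruleset, not an exhaustive catalog:

```python
# Illustrative subset of the AI-indicator phrases from the ruleset above.
AI_PHRASES = [
    "it's important to note",
    "furthermore",
    "additionally",
    "in today's fast-paced world",
    "research shows",
]

def scan_ai_phrases(text: str) -> list[str]:
    """Return the known AI-indicator phrases present in the text."""
    lowered = text.lower()
    return [p for p in AI_PHRASES if p in lowered]
```

In practice the list would be much longer and weighted by the strong/moderate tiers above.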
Statistical Signals
- Burstiness — human text mixes long and short sentences (score > 0.5 = likely human, < 0.3 = likely AI)
- Vocabulary richness — type-token ratio (unique words / total words) tends to be lower in AI text
- Perplexity — AI text scores lower because its word choices are more predictable to a language model
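The first two signals are cheap to compute locally. The document does not fix a burstiness formula, so the sketch below assumes one common formulation: standard deviation of sentence lengths divided by the mean (higher means more varied, human-like rhythm):

```python
import re
import statistics

def burstiness(text: str) -> float:
    """Sentence-length variation: stdev / mean of words per sentence.
    Assumed formulation; the 0.3 / 0.5 thresholds in the ruleset are
    illustrative cutoffs, not calibrated values."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)

def type_token_ratio(text: str) -> float:
    """Vocabulary richness: unique words / total words, in [0, 1]."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0
```

Perplexity, by contrast, requires a language model and is usually delegated to the LLM or external-API layers.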
LLM-as-Judge Prompt
Use a structured prompt that lists the ruleset above and asks the LLM to return a JSON object with score (0-10), verdict, confidence, signals_found, reasoning, and suspicious_phrases.
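A minimal sketch of that prompt and a tolerant parser for the reply. The prompt wording is an assumption (the document only specifies the output fields), and the parser extracts the first JSON object so surrounding prose or code fences from the model do not break it:

```python
import json
import re

# Illustrative prompt; only the output keys are specified by the ruleset.
JUDGE_PROMPT = """Evaluate whether the TEXT below is human-written or AI-generated.
Apply this ruleset: uniform sentence rhythm, stock hedges and transitions,
perfect paragraph structure, generic examples, and temporal vagueness raise
the score; typos, specific names and dates, slang, and strong unhedged
opinions lower it.

Return ONLY a JSON object with these keys:
  score (0-10, 10 = certainly AI), verdict ("human" | "likely_ai"),
  confidence (0-1), signals_found (list of strings), reasoning (string),
  suspicious_phrases (list of strings).

TEXT:
{text}
"""

def build_judge_prompt(text: str) -> str:
    return JUDGE_PROMPT.format(text=text)

def parse_judge_response(raw: str) -> dict:
    """Pull the first JSON object out of the model's reply."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    return json.loads(match.group(0)) if match else {}
```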
Pipeline
- Run local analysis (burstiness, AI phrase detection) — free, instant
- Run LLM analysis with the detection prompt — costs API tokens
- Optionally call GPTZero or Originality.ai for corroboration
- Average scores from all sources for a combined verdict
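The final averaging step can be sketched as below; it reproduces the combined scores in the worked examples (5.0 and 8 average to 6.5, 0.0 and 1 to 0.5). The verdict cutoffs are illustrative assumptions, not calibrated thresholds:

```python
from typing import Optional

def combine_scores(local: float, llm: float,
                   external: Optional[float] = None) -> float:
    """Plain average of the available 0-10 scores; the external API
    score is included only when that layer was run."""
    scores = [local, llm] + ([external] if external is not None else [])
    return sum(scores) / len(scores)

def verdict_for(score: float) -> str:
    # Cutoffs are assumptions for illustration.
    if score >= 6.0:
        return "likely_ai: flag for human review"
    if score <= 2.0:
        return "human"
    return "uncertain"
```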
Examples
Example 1: Detecting AI-generated blog post
A content manager receives a freelance blog post titled "10 Ways to Boost Your Morning Routine". They paste the text into the detection pipeline:
Input text (excerpt):
"In today's fast-paced world, it's important to note that establishing a morning
routine can significantly enhance your productivity. Furthermore, research shows
that individuals who wake up early tend to be more successful. Additionally,
incorporating mindfulness practices into your morning can yield substantial benefits."
Local analysis:
Burstiness: 0.18 (low — sentences are uniform in length)
AI phrases found: ["it's important to note", "furthermore", "additionally", "research shows"]
Local score: 5.0
LLM analysis:
Score: 8/10
Verdict: "likely_ai"
Signals: ["uniform sentence rhythm", "excessive hedging phrases", "temporal vagueness", "no personal anecdotes"]
Suspicious phrases: ["In today's fast-paced world", "it's important to note", "can significantly enhance"]
Combined score: 6.5/10 — Likely AI. Flagged for human review.
Example 2: Confirming human-written article
An editor checks a personal essay from a regular contributor:
Input text (excerpt):
"I burned my toast again this morning — third time this week. My neighbor Dave,
who's been a barista at Groundwork Coffee on Rose Ave since 2019, once told me
the secret is to never trust the 'light' setting. He's wrong, obviously, but I
still think about it every time I smell that acrid char."
Local analysis:
Burstiness: 0.62 (high — varied sentence lengths)
AI phrases found: []
Local score: 0.0
LLM analysis:
Score: 1/10
Verdict: "human"
Signals: ["specific personal anecdote", "named person and place", "informal language", "humor and opinion"]
Combined score: 0.5/10 — Human-written. No flag.
Guidelines
- No detection method is 100% accurate — always route flagged content to a human reviewer
- Short texts (<200 words) produce unreliable results; skip automated scoring
- Non-native English writers may trigger false positives; consider raising thresholds
- Paraphrasing tools can fool detectors — use multiple detection layers
- Provenance metadata (e.g. C2PA manifests) or vendor watermarks are more reliable than statistical detection when available
- For batch processing, chunk long documents into ~1500-word sections and average scores
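The batch-processing guideline can be sketched as a word-level chunker plus an averaging wrapper around any per-chunk scorer:

```python
def chunk_words(text: str, size: int = 1500) -> list[str]:
    """Split a document into ~size-word chunks, per the guideline above."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def batch_score(text: str, score_fn) -> float:
    """Average a per-chunk scorer (e.g. the combined pipeline) over
    the whole document; returns 0.0 for empty input."""
    chunks = chunk_words(text)
    return sum(score_fn(c) for c in chunks) / len(chunks) if chunks else 0.0
```

Note that the last chunk may be short; per the guideline on texts under 200 words, a trailing fragment could also be merged into the previous chunk instead.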