Complete AI Detection Guide - How It Works

Detection Methods

1. Statistical Metrics Analysis (Client-Side)

The system calculates six linguistic metrics that help distinguish between AI-generated and human-written text:

Word Length

What it measures: Average character length of words in the text
AI indicator: Very long or very short average word lengths
Threshold: AI-like if > 5.5 or < 4.5 characters
Reasoning: AI often uses more formal vocabulary or overly simple words
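
As a rough illustration, the metric and its thresholds could be computed along these lines. This is a minimal sketch: the tokenization rule (runs of letters and apostrophes) and the function names are assumptions, not taken from the tool itself.

```python
import re

def average_word_length(text: str) -> float:
    """Mean number of characters per word; 0.0 if no words are found."""
    # Assumption: a "word" is a run of letters or apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)

def word_length_is_ai_like(avg: float) -> bool:
    """Apply the thresholds stated above: AI-like if > 5.5 or < 4.5."""
    return avg > 5.5 or avg < 4.5
```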

Sentence Length

What it measures: Average number of words per sentence
AI indicator: Very long sentences (> 20 words) or very short ones (< 10 words)
Reasoning: AI tends to generate either overly complex or overly simple sentence structures
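
A minimal sketch of this metric, assuming sentences end at a period, exclamation mark, or question mark and words are separated by whitespace. The real splitter may handle abbreviations and other edge cases differently, and the function names are illustrative.

```python
import re

def average_sentence_length(text: str) -> float:
    """Mean number of words per sentence; 0.0 if no sentences are found."""
    # Assumption: ., ! and ? are the only sentence terminators.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)

def sentence_length_is_ai_like(avg: float) -> bool:
    """Apply the indicator stated above: AI-like if > 20 or < 10 words."""
    return avg > 20 or avg < 10
```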

Burstiness

What it measures: Variation in sentence lengths (standard deviation / mean)
AI indicator: Low burstiness (< 0.5) suggests uniform structure
Human indicator: High burstiness (> 0.8) indicates natural variation
Reasoning: Humans naturally vary sentence length more than AI
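
The ratio described above is the coefficient of variation of sentence lengths. A minimal sketch follows, assuming the population standard deviation and the same simple sentence splitting on ., ! and ?. Both choices are assumptions, as are the function names.

```python
import re
from statistics import mean, pstdev

def burstiness(text: str) -> float:
    """Standard deviation of sentence lengths divided by their mean."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # variation is undefined for fewer than two sentences
    # Assumption: population standard deviation; a sample version would differ slightly.
    return pstdev(lengths) / mean(lengths)

def burstiness_signal(value: float) -> str:
    """Map a burstiness value to the indicators stated above."""
    if value < 0.5:
        return "ai"
    if value > 0.8:
        return "human"
    return "neutral"
```

For example, a one-word sentence followed by a six-word sentence gives 2.5 / 3.5, or about 0.71, which falls in the neutral band between the two thresholds.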

Vocabulary Richness

What it measures: Ratio of unique words to total words
AI indicator: Extremely high variety (> 0.8) may indicate AI generation
Human indicator: More repetition (< 0.4) is typical in human writing
Reasoning: AI can artificially inflate vocabulary diversity
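
This ratio is often called the type-token ratio. A minimal sketch, assuming case-insensitive matching and a simple letters-and-apostrophes tokenizer (both assumptions, along with the function names):

```python
import re

def vocabulary_richness(text: str) -> float:
    """Unique words divided by total words; 0.0 if no words are found."""
    # Assumption: words are compared case-insensitively.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)

def vocabulary_signal(value: float) -> str:
    """Map a richness value to the indicators stated above."""
    if value > 0.8:
        return "ai"
    if value < 0.4:
        return "human"
    return "neutral"
```

Note that this ratio tends to fall as a text gets longer, because common words repeat, so values from texts of very different lengths are not directly comparable.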

Perplexity

What it measures: Text predictability (simplified implementation)
AI indicator: Lower perplexity (< 30) suggests predictable patterns
Human indicator: Higher perplexity (> 60) indicates unpredictable patterns
Reasoning: AI-generated text often follows more predictable patterns
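
The guide does not say which simplification is used. True perplexity requires a language model, so the sketch below shows only one possible stand-in: unigram perplexity measured against the text's own word frequencies. The approach and the function name are assumptions made for illustration.

```python
import math
from collections import Counter

def unigram_perplexity(text: str) -> float:
    """Perplexity of the text under its own word-frequency distribution."""
    # Assumption: words are whitespace-separated and compared case-insensitively.
    words = text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    # Average negative log-probability of each word, then exponentiate.
    avg_neg_log_prob = -sum(math.log(counts[w] / total) for w in words) / total
    return math.exp(avg_neg_log_prob)
```

Because this variant scores the text against its own word counts, its value can never exceed the number of distinct words, so the thresholds of 30 and 60 above may not transfer to it directly.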

Entropy

What it measures: Character-level randomness (simplified implementation)
AI indicator: Lower entropy (< 3.5) suggests regular patterns
Human indicator: Higher entropy (> 4.5) indicates natural randomness
Reasoning: Human writing has more natural randomness in character usage
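
One common way to measure character-level randomness is Shannon entropy over character frequencies. The sketch below assumes that definition with a base-2 logarithm (bits per character). The guide only says the implementation is simplified, so treat this as an illustration rather than the tool's exact formula.

```python
import math
from collections import Counter

def character_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Assumption: every character counts, including spaces and punctuation.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_signal(value: float) -> str:
    """Map an entropy value to the indicators stated above."""
    if value < 3.5:
        return "ai"
    if value > 4.5:
        return "human"
    return "neutral"
```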

2. AI Similarity Analysis (Server-Side)

This advanced method sends the text to our servers, where an AI model rephrases it. How closely the rephrased version matches the original indicates how “AI-like” the text is.

How It Works

Text Preprocessing: Input is limited to ~300 words for cost efficiency
AI Rephrasing: The text is sent to an LLM, which produces a rephrased version
Similarity Calculation: The original and AI-rephrased versions are compared using word-overlap and sequence-based similarity measures
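
The preprocessing step can be sketched as a simple word-count cutoff. This assumes the limit is applied to whitespace-separated words, and the function name is hypothetical.

```python
def limit_words(text: str, max_words: int = 300) -> str:
    """Keep at most max_words whitespace-separated words."""
    # Note: joining with single spaces collapses the original whitespace and line breaks.
    words = text.split()
    return " ".join(words[:max_words])
```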

Data Processing Notice

Important: When you use the AI Similarity Analysis feature, your text is sent to our servers (via Cloudflare Workers AI) for processing. We do not store or share this data - it's only used for the analysis and then discarded. The statistical metrics are calculated entirely in your browser and never leave your device.

Similarity Calculation Methods

Jaccard Similarity: Measures word overlap (intersection/union)
Longest Common Subsequence: Measures sequence similarity
Combined Score: 40% Jaccard + 60% sequence similarity
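
The three measures above can be sketched as follows. Two details are assumptions, since the guide does not specify them: the longest common subsequence is computed over words rather than characters, and it is normalized by the length of the longer text. The function names are illustrative.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Word overlap: size of the intersection divided by size of the union."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)

def lcs_length(x: list, y: list) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, 1):
            curr.append(prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def sequence_similarity(a: str, b: str) -> float:
    """Word-level LCS length divided by the longer text's word count."""
    words_a, words_b = a.lower().split(), b.lower().split()
    longest = max(len(words_a), len(words_b))
    if longest == 0:
        return 1.0
    return lcs_length(words_a, words_b) / longest

def combined_similarity(a: str, b: str) -> float:
    """40% Jaccard + 60% sequence similarity, as stated above."""
    return 0.4 * jaccard_similarity(a, b) + 0.6 * sequence_similarity(a, b)
```

For "the quick brown fox" and "the slow brown fox", Jaccard is 3/5 = 0.60 and sequence similarity is 3/4 = 0.75, so the combined score is 0.4 × 0.60 + 0.6 × 0.75 = 0.69.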

Interpretation Logic

Low Similarity (< 30%): More human-like - the AI's rephrased version diverges noticeably from the original
High Similarity (> 70%): More AI-like - the AI's rephrased version stays close to the original
AI Detection Score: similarity × 100 (higher similarity = more AI-like, lower similarity = more human-like)

Confidence Levels

Very High AI Likelihood: 80%+ detection score
High AI Likelihood: 60-79% detection score
Moderate AI Likelihood: 40-59% detection score
Low AI Likelihood: 20-39% detection score
Very Low AI Likelihood: <20% detection score
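
The bands above translate directly into a lookup. In this sketch the function name is illustrative, and the handling of fractional scores (for example, 59.5 falling into the Moderate band) is an assumption.

```python
def likelihood_label(score: float) -> str:
    """Map a 0-100 detection score to the confidence levels listed above."""
    if score >= 80:
        return "Very High AI Likelihood"
    if score >= 60:
        return "High AI Likelihood"
    if score >= 40:
        return "Moderate AI Likelihood"
    if score >= 20:
        return "Low AI Likelihood"
    return "Very Low AI Likelihood"
```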

Analysis Process

Step 1: Text Input

The user enters text in the textarea. The system validates the input and shows metrics in real time.

Step 2: Statistical Analysis (Instant)

All enabled metrics are calculated client-side with visual indicators:

Green: Human-like indicators
Orange: AI-like indicators
Gray: Neutral/insufficient data
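
One way to picture the color logic is a small helper that takes a metric value and two threshold checks. The helper name, its signature, and the use of None for insufficient data are assumptions made for illustration.

```python
from typing import Callable, Optional

def indicator_color(
    value: Optional[float],
    is_ai_like: Callable[[float], bool],
    is_human_like: Callable[[float], bool],
) -> str:
    """Pick the indicator color for one metric."""
    if value is None:
        return "gray"    # insufficient data
    if is_ai_like(value):
        return "orange"  # AI-like indicator
    if is_human_like(value):
        return "green"   # human-like indicator
    return "gray"        # neutral

# Example with the entropy thresholds from this guide (< 3.5 AI-like, > 4.5 human-like).
entropy_color = indicator_color(3.2, lambda v: v < 3.5, lambda v: v > 4.5)
```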

Step 3: Overall Assessment

The system counts AI-like and human-like signals and reports one of the following conclusions:

Likely AI-generated: AI-like signals prevail with high confidence
Possibly AI-generated: AI-like signals prevail with lower confidence
Likely human-written: Human-like signals prevail with high confidence
Possibly human-written: Human-like signals prevail with lower confidence
Inconclusive: Too little text, or the signals conflict
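
The guide does not state how many signals are needed for each conclusion, so the cutoffs in this sketch are hypothetical: at least two non-neutral signals are required, and a lead of two or more counts as high confidence. Only the five output labels come from the list above.

```python
from typing import Sequence

def overall_assessment(
    signals: Sequence[str],
    min_signals: int = 2,    # hypothetical: fewer non-neutral signals is inconclusive
    strong_margin: int = 2,  # hypothetical: lead needed for a "Likely" verdict
) -> str:
    """Count 'ai' and 'human' signals and pick one of the five conclusions."""
    ai = signals.count("ai")
    human = signals.count("human")
    if ai + human < min_signals or ai == human:
        return "Inconclusive"
    verdict = "AI-generated" if ai > human else "human-written"
    strength = "Likely" if abs(ai - human) >= strong_margin else "Possibly"
    return f"{strength} {verdict}"
```

For example, three AI-like signals against one human-like signal yields "Likely AI-generated", while one against two yields "Possibly human-written".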

Step 4: AI Similarity Analysis (Optional)

The user clicks the “AI Similarity Analysis” button to get an advanced analysis with a confidence meter and a detailed explanation.

Privacy & Performance

Data Privacy

No Data Storage: Text is only used for analysis, never stored
Client-Side Processing: Statistical metrics calculated in browser
Limited Server Calls: Only AI similarity analysis hits the server
Token Limiting: Max ~300 words sent to AI model

Performance Optimizations

Efficient Model: Uses Qwen 1.5 (1.8B parameters) instead of larger models
Cost Control: Token limiting keeps API costs low
Real-time Updates: Statistical metrics update as user types
Cloudflare Workers: Fast edge computing for AI calls

Limitations & Accuracy

Important Notes

Probabilistic Analysis: Results are not definitive, only suggestive
Context Matters: Short texts may not provide enough signals
Model Limitations: AI detection is an evolving field with inherent limitations
False Positives/Negatives: System may misclassify some texts

Best Practices

Use Multiple Metrics: Don't rely on single indicators
Consider Context: Factor in the source and purpose of the text
Minimum Text Length: At least 100-200 words for reliable analysis
Combine Methods: Use both statistical and similarity analysis when possible

This guide covers the current implementation as of the latest version. The AI detection field is rapidly evolving, and methods may be updated as new research emerges.