Detection Methods
1. Statistical Metrics Analysis (Client-Side)
The system calculates six linguistic metrics that help distinguish between AI-generated and human-written text:
Word Length
- What it measures: Average character length of words in the text
- AI indicator: Very long or very short average word lengths
- Threshold: AI-like if > 5.5 or < 4.5 characters
- Reasoning: AI often uses more formal vocabulary or overly simple words
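As a rough illustration of the Word Length metric, the following minimal sketch computes the average and applies the thresholds listed above. The function names and the `\w+` tokenization rule are assumptions made for this example, not the tool's actual code.

```python
import re

def average_word_length(text: str) -> float:
    """Mean number of characters per word; 0.0 when the text has no words."""
    words = re.findall(r"\w+", text)  # assumed tokenization: runs of word characters
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)

def word_length_is_ai_like(avg: float) -> bool:
    """Apply the documented thresholds: above 5.5 or below 4.5 characters."""
    return avg > 5.5 or avg < 4.5

avg = average_word_length("the cat sat.")
print(avg, word_length_is_ai_like(avg))
```

Very short inputs such as the one above fall outside the 4.5 to 5.5 band easily, which is one reason a single metric should not be read in isolation.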
Sentence Length
- What it measures: Average number of words per sentence
- AI indicator: Very long sentences (> 20 words) or very short ones (< 10 words)
- Reasoning: AI tends to generate either overly complex or overly simple sentence structures
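The Sentence Length metric could be sketched as follows. Splitting sentences on `.`, `!` and `?` is an assumption for illustration; the real implementation may segment sentences differently.

```python
import re

def sentence_lengths(text: str) -> list[int]:
    """Word count of each sentence, splitting on ., ! and ? (assumed rule)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def average_sentence_length(text: str) -> float:
    """Mean words per sentence; 0.0 when no sentence is found."""
    lengths = sentence_lengths(text)
    return sum(lengths) / len(lengths) if lengths else 0.0

def sentence_length_is_ai_like(avg: float) -> bool:
    """Apply the documented thresholds: more than 20 or fewer than 10 words."""
    return avg > 20 or avg < 10

print(average_sentence_length("One two three. Four five!"))
```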
Burstiness
- What it measures: Variation in sentence lengths (standard deviation / mean)
- AI indicator: Low burstiness (< 0.5) suggests uniform structure
- Human indicator: High burstiness (> 0.8) indicates natural variation
- Reasoning: Humans naturally vary sentence length more than AI
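Burstiness as defined above (standard deviation divided by mean) can be sketched like this. Whether the tool uses the population or the sample standard deviation is not stated; this example assumes the population form, and the function names are hypothetical.

```python
import statistics

def burstiness(lengths: list[int]) -> float:
    """Standard deviation of sentence lengths divided by their mean."""
    if len(lengths) < 2:
        return 0.0  # variation is undefined for fewer than two sentences
    mean = statistics.mean(lengths)
    if mean == 0:
        return 0.0
    return statistics.pstdev(lengths) / mean  # population std dev (assumption)

def burstiness_signal(value: float) -> str:
    """Map a burstiness value to the documented bands."""
    if value < 0.5:
        return "ai"
    if value > 0.8:
        return "human"
    return "neutral"

print(burstiness([10, 10, 10]), burstiness_signal(burstiness([10, 10, 10])))
```

Values between 0.5 and 0.8 fall in neither documented band, so the sketch treats them as neutral.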
Vocabulary Richness
- What it measures: Ratio of unique words to total words
- AI indicator: Extremely high variety (> 0.8) may indicate AI generation
- Human indicator: More repetition (< 0.4) is typical in human writing
- Reasoning: AI can artificially inflate vocabulary diversity
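The unique-to-total ratio above is often called a type-token ratio. A minimal sketch, assuming case-insensitive matching and `\w+` tokenization (neither is confirmed by the description):

```python
import re

def vocabulary_richness(text: str) -> float:
    """Ratio of unique words to total words; 0.0 for empty text."""
    words = re.findall(r"\w+", text.lower())  # assumed: case-insensitive tokens
    if not words:
        return 0.0
    return len(set(words)) / len(words)

def vocabulary_signal(ratio: float) -> str:
    """Map the ratio to the documented bands."""
    if ratio > 0.8:
        return "ai"
    if ratio < 0.4:
        return "human"
    return "neutral"

print(vocabulary_richness("the cat and the dog"))
```

The ratio naturally falls as text gets longer, because common words repeat, so a comparison across texts of very different lengths should be treated with caution.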
Perplexity
- What it measures: Text predictability (simplified implementation)
- AI indicator: Lower perplexity (< 30) suggests predictable patterns
- Human indicator: Higher perplexity (> 60) indicates unpredictable patterns
- Reasoning: AI-generated text often follows more predictable patterns
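The description does not say how the simplified perplexity is computed. One common lightweight stand-in, shown here purely as an assumption, is the perplexity of a unigram model estimated from the text itself: 2 raised to the Shannon entropy of the word distribution. The function name and tokenization are hypothetical.

```python
import math
from collections import Counter

def unigram_perplexity(text: str) -> float:
    """Perplexity of the text under its own unigram word distribution.

    This is an assumed simplification, not the tool's confirmed method.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    # Entropy in bits per word of the empirical word distribution.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2 ** entropy

def perplexity_signal(value: float) -> str:
    """Map a perplexity value to the documented bands."""
    if value < 30:
        return "ai"
    if value > 60:
        return "human"
    return "neutral"

print(unigram_perplexity("a b c d"))
```

A true language-model perplexity would score the text against a model trained on other data; the self-estimated version above only captures how evenly the text spreads its own vocabulary.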
Entropy
- What it measures: Character-level randomness (simplified implementation)
- AI indicator: Lower entropy (< 3.5) suggests regular patterns
- Human indicator: Higher entropy (> 4.5) indicates natural randomness
- Reasoning: Human writing has more natural randomness in character usage
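Character-level randomness is commonly measured as Shannon entropy in bits per character, and the thresholds above are consistent with that scale. The sketch below assumes that definition; the tool's simplified implementation may differ.

```python
import math
from collections import Counter

def char_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    bits = sum((c / total) * math.log2(total / c) for c in counts.values())
    return bits

def entropy_signal(value: float) -> str:
    """Map an entropy value to the documented bands."""
    if value < 3.5:
        return "ai"
    if value > 4.5:
        return "human"
    return "neutral"

print(char_entropy("aabb"), char_entropy("abcd"))
```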
2. AI Similarity Analysis (Server-Side)
This advanced method sends the text to our servers, where an LLM rephrases it, and then measures how closely the rephrased version matches the original to estimate how “AI-like” the text is.
How It Works
- Text Preprocessing: Input is limited to ~300 words for cost efficiency
- AI Rephrasing: The text is sent to an LLM, which returns a rephrased version
- Similarity Calculation: The original and AI-rephrased versions are compared using mathematical algorithms
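The three steps above can be sketched as a small pipeline. The `rephrase` and `similarity` callables are placeholders supplied by the caller; they are hypothetical and stand in for the server-side LLM call and the comparison step, neither of which is specified in detail here.

```python
from typing import Callable

MAX_WORDS = 300  # approximate limit stated above

def truncate_words(text: str, max_words: int = MAX_WORDS) -> str:
    """Keep at most max_words whitespace-separated words."""
    return " ".join(text.split()[:max_words])

def analyze(
    text: str,
    rephrase: Callable[[str], str],           # hypothetical: wraps the LLM call
    similarity: Callable[[str, str], float],  # hypothetical: returns 0.0 to 1.0
) -> float:
    """Truncate, rephrase, then compare the original with the rephrased text."""
    original = truncate_words(text)
    rephrased = rephrase(original)
    return similarity(original, rephrased)

# Usage with stand-in functions: an "LLM" that echoes its input.
score = analyze("some sample text", lambda t: t, lambda a, b: float(a == b))
print(score)
```

Passing the two steps in as functions keeps the sketch runnable without network access and makes each step easy to replace.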
Data Processing Notice
Important: When you use the AI Similarity Analysis feature, your text is sent to our servers (via Cloudflare Workers AI) for processing. We do not store or share this data; it is used only for the analysis and then discarded. The statistical metrics are calculated entirely in your browser and never leave your device.
Similarity Calculation Methods
- Jaccard Similarity: Measures word overlap (intersection/union)
- Longest Common Subsequence: Measures sequence similarity
- Combined Score: 40% Jaccard + 60% sequence similarity
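A minimal sketch of the three calculations follows. It assumes word-level comparison, lowercased tokens, and an LCS score normalized by the longer text's word count; these details are not stated above and may differ from the actual implementation.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set overlap: size of intersection divided by size of union."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)

def lcs_length(x: list[str], y: list[str]) -> int:
    """Length of the longest common subsequence, using one rolling row."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs_similarity(a: str, b: str) -> float:
    """LCS length divided by the longer word count (assumed normalization)."""
    x, y = a.lower().split(), b.lower().split()
    if not x or not y:
        return 0.0
    return lcs_length(x, y) / max(len(x), len(y))

def combined_similarity(a: str, b: str) -> float:
    """Weighted blend from the text above: 40% Jaccard, 60% sequence."""
    return 0.4 * jaccard(a, b) + 0.6 * lcs_similarity(a, b)

print(combined_similarity("the cat sat", "the cat ran"))
```

Jaccard ignores word order while LCS rewards preserved order, so the blend penalizes a rephrasing that keeps the same words but rearranges them.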
Interpretation Logic
- Low Similarity (< 30%): More human-like - AI struggles to rephrase consistently
- High Similarity (> 70%): More AI-like - AI easily rephrases in predictable patterns
- AI Detection Score: similarity × 100 (the score rises with similarity because higher similarity = more AI-like)
Confidence Levels
- Very High AI Likelihood: 80%+ detection score
- High AI Likelihood: 60-79% detection score
- Moderate AI Likelihood: 40-59% detection score
- Low AI Likelihood: 20-39% detection score
- Very Low AI Likelihood: <20% detection score
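The bands above translate directly into a lookup. This sketch takes a detection score already expressed on the 0 to 100 scale; the function name is hypothetical.

```python
def likelihood_label(detection_score: float) -> str:
    """Map a 0-100 detection score to the documented likelihood bands."""
    if detection_score >= 80:
        return "Very High AI Likelihood"
    if detection_score >= 60:
        return "High AI Likelihood"
    if detection_score >= 40:
        return "Moderate AI Likelihood"
    if detection_score >= 20:
        return "Low AI Likelihood"
    return "Very Low AI Likelihood"

for score in (85, 60, 45, 20, 5):
    print(score, likelihood_label(score))
```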
Analysis Process
Step 1: Text Input
The user enters text in the textarea; the system validates the input and shows metrics in real time.
Step 2: Statistical Analysis (Instant)
All enabled metrics are calculated client-side with visual indicators:
- Green: Human-like indicators
- Orange: AI-like indicators
- Gray: Neutral/insufficient data
Step 3: Overall Assessment
The system counts AI-like and human-like signals and reports one of the following conclusions:
- Likely AI-generated: High confidence with AI signals
- Possibly AI-generated: Lower confidence with AI signals
- Likely human-written: High confidence with human signals
- Possibly human-written: Lower confidence with human signals
- Inconclusive: Need more text or conflicting signals
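The exact rules that separate "likely" from "possibly" are not given above. The sketch below shows one way the counting could work, with loudly assumed cut-offs: a lead of three or more signals counts as high confidence, and a tie or fewer than two decisive signals is inconclusive.

```python
def overall_assessment(signals: list[str]) -> str:
    """Summarize per-metric signals ("ai", "human", or "neutral").

    The cut-offs (lead of 3 for "Likely", minimum of 2 decisive signals)
    are assumptions for illustration, not the tool's documented rules.
    """
    ai = signals.count("ai")
    human = signals.count("human")
    if ai + human < 2 or ai == human:
        return "Inconclusive"
    strength = "Likely" if abs(ai - human) >= 3 else "Possibly"
    if ai > human:
        return f"{strength} AI-generated"
    return f"{strength} human-written"

print(overall_assessment(["ai", "ai", "ai", "ai", "neutral", "human"]))
print(overall_assessment(["human", "human", "ai"]))
```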
Step 4: AI Similarity Analysis (Optional)
The user clicks the “AI Similarity Analysis” button to get an advanced analysis with a confidence meter and a detailed explanation.
Privacy & Performance
Data Privacy
- No Data Storage: Text is only used for analysis, never stored
- Client-Side Processing: Statistical metrics calculated in browser
- Limited Server Calls: Only AI similarity analysis hits the server
- Token Limiting: Max ~300 words sent to AI model
Performance Optimizations
- Efficient Model: Uses Qwen 1.5 (1.8B parameters) instead of larger models
- Cost Control: Token limiting keeps API costs low
- Real-time Updates: Statistical metrics update as user types
- Cloudflare Workers: Fast edge computing for AI calls
Limitations & Accuracy
Important Notes
- Probabilistic Analysis: Results are not definitive, only suggestive
- Context Matters: Short texts may not provide enough signals
- Model Limitations: AI detection is an evolving field with inherent limitations
- False Positives/Negatives: System may misclassify some texts
Best Practices
- Use Multiple Metrics: Don't rely on single indicators
- Consider Context: Factor in the source and purpose of the text
- Minimum Text Length: At least 100-200 words for reliable analysis
- Combine Methods: Use both statistical and similarity analysis when possible
This guide covers the current implementation as of the latest version. The AI detection field is rapidly evolving, and methods may be updated as new research emerges.