How AI Detection Works

Understanding the science behind AI text detection and how our system identifies AI-generated content through linguistic analysis.

The Challenge of AI Detection

As AI language models become more sophisticated, distinguishing human-written text from AI-generated text has become increasingly difficult. Our detection system analyzes multiple linguistic dimensions to provide transparent, explainable results.

Key Detection Metrics

Our detection system uses six core linguistic metrics, each implemented with specific thresholds based on research into AI vs. human writing patterns. All analysis runs entirely in your browser for privacy.

1. Average Word Length Analysis

Implementation: We calculate the mean character count across all words in the text.

avgWordLength = totalCharacters / totalWords

Detection Logic:

  • AI indicator: > 5.5 characters per word (formal, complex vocabulary)
  • Human indicator: < 4.5 characters per word (natural, conversational language)
  • Reasoning: AI models often default to more formal, academic vocabulary even in casual contexts
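As a minimal sketch of this metric in JavaScript: the function names, the whitespace tokenizer, and the punctuation stripping below are illustrative assumptions rather than the tool's actual code. Only the formula and the two thresholds come from the description above.

```javascript
// Hypothetical helper: mean number of characters per word.
// Assumption: words are split on whitespace and stripped of punctuation.
function averageWordLength(text) {
  const words = text
    .split(/\s+/)
    .map((w) => w.replace(/[^\p{L}\p{N}'-]/gu, ""))
    .filter((w) => w.length > 0);
  if (words.length === 0) return 0; // avoid dividing by zero on empty input
  const totalCharacters = words.reduce((sum, w) => sum + w.length, 0);
  return totalCharacters / words.length;
}

// Thresholds taken from the detection logic above.
function classifyWordLength(avg) {
  if (avg > 5.5) return "ai";
  if (avg < 4.5) return "human";
  return "neutral"; // between the two thresholds
}
```

Values between 4.5 and 5.5 produce no signal in this sketch, which matches the neutral (gray) state described later in this article.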

2. Average Sentence Length Analysis

Implementation: We count words per sentence across the entire text.

avgSentenceLength = totalWords / totalSentences

Detection Logic:

  • AI indicator: > 20 words per sentence (overly complex structure)
  • Human indicator: < 10 words per sentence (natural, conversational flow)
  • Reasoning: AI tends to generate long, complex sentences; unnaturally uniform sentence lengths are measured separately by the burstiness metric
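The formula above could be sketched as follows. The sentence splitter (breaking on runs of ".", "!" and "?") is an assumption made for illustration, and it will miscount abbreviations such as "e.g."; the function names are hypothetical.

```javascript
// Assumption: a sentence ends at one or more of . ! ?
function splitIntoSentences(text) {
  return text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Mean number of words per sentence: totalWords / totalSentences.
function averageSentenceLength(text) {
  const sentences = splitIntoSentences(text);
  if (sentences.length === 0) return 0;
  const totalWords = sentences.reduce(
    (sum, s) => sum + s.split(/\s+/).filter((w) => w.length > 0).length,
    0
  );
  return totalWords / sentences.length;
}
```

For example, "I ran. She walked home slowly." has two sentences with two and four words, so the average is 3.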

3. Burstiness (Sentence Variation) Analysis

Implementation: We calculate the coefficient of variation of sentence lengths (measured in words): the population standard deviation divided by the mean.

burstiness = standardDeviation / meanSentenceLength
standardDeviation = √(Σ(length - mean)² / count)

Detection Logic:

  • AI indicator: < 0.5 (uniform, monotonous sentence lengths)
  • Human indicator: > 0.8 (high natural variation)
  • Reasoning: Human writers naturally mix short, punchy sentences with longer, complex ones for rhythm and emphasis
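A short sketch of the burstiness calculation, using the population standard deviation (dividing by count) exactly as in the formula above. The helper that derives sentence lengths from raw text is an assumption for illustration only.

```javascript
// Assumption: sentences end at . ! ? and words are whitespace-separated.
function sentenceLengths(text) {
  return text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((s) => s.split(/\s+/).length);
}

// Coefficient of variation: standard deviation divided by the mean.
function burstiness(lengths) {
  if (lengths.length === 0) return 0;
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  if (mean === 0) return 0;
  const variance =
    lengths.reduce((sum, len) => sum + (len - mean) ** 2, 0) / lengths.length;
  return Math.sqrt(variance) / mean;
}
```

A text whose sentences are all the same length scores 0, while a one-word sentence followed by a seven-word sentence gives a mean of 4, a standard deviation of 3, and a burstiness of 0.75.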

4. Vocabulary Richness (Type-Token Ratio)

Implementation: We measure the ratio of unique words to total words.

vocabularyRichness = uniqueWords / totalWords

Detection Logic:

  • AI indicator: > 0.8 (artificially high diversity)
  • Human indicator: < 0.4 (natural word repetition)
  • Reasoning: AI may avoid natural word repetition that humans use for emphasis and coherence
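The type-token ratio could be computed along these lines. Lowercasing before counting (so that "The" and "the" count as one word) and the tokenizer pattern are assumptions; the description above does not say how words are normalized.

```javascript
// Hypothetical helper: unique words divided by total words.
// Assumption: words are runs of letters, digits, and apostrophes, compared case-insensitively.
function vocabularyRichness(text) {
  const words = text.toLowerCase().match(/[\p{L}\p{N}']+/gu) || [];
  if (words.length === 0) return 0;
  const uniqueWords = new Set(words).size;
  return uniqueWords / words.length;
}
```

For "The cat and the dog" this yields 4 unique words out of 5, a ratio of 0.8.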

5. Perplexity Analysis

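Implementation: We measure how predictable the text is from word sequence probabilities, where N is the number of words and P(word|context) is the probability assigned to each word given its context.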

perplexity = 2^(-Σ log₂(P(word|context)) / N)

Detection Logic:

  • AI indicator: < 30 (highly predictable patterns)
  • Human indicator: > 60 (unexpected word choices)
  • Note: The current implementation uses a simplified calculation because of browser performance constraints
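The simplified calculation is not specified here, so the sketch below shows only the formula itself: given a list of per-word probabilities (however they were obtained), it returns the perplexity. The function name and the idea of passing probabilities in as an array are assumptions for illustration.

```javascript
// perplexity = 2^(-(sum of log2 P) / N), where N is the number of words.
// Assumption: `probabilities` holds one P(word|context) value per word, each in (0, 1].
function perplexity(probabilities) {
  const n = probabilities.length;
  if (n === 0) return 0;
  const sumLog2 = probabilities.reduce((sum, p) => sum + Math.log2(p), 0);
  // A probability of 0 makes sumLog2 -Infinity, so the result is Infinity.
  return Math.pow(2, -sumLog2 / n);
}
```

If every word has probability 0.25, the perplexity is 4: the model is as uncertain as if it were choosing uniformly among four words at each step.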

6. Entropy Analysis

Implementation: We measure information density and randomness in the word distribution, expressed in bits per word.

entropy = -Σ P(word) × log₂(P(word))

Detection Logic:

  • AI indicator: < 3.5 bits (regular, systematic patterns)
  • Human indicator: > 4.5 bits (natural randomness)
  • Note: The current implementation uses a simplified calculation because of browser performance constraints
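The entropy formula above corresponds to Shannon entropy over word frequencies. The sketch below is one plausible reading, assuming P(word) is each word's relative frequency within the analyzed text; the function name is hypothetical and the tool's simplified calculation may differ.

```javascript
// Shannon entropy in bits: -sum of P(word) * log2(P(word)).
// Assumption: P(word) is the word's count divided by the total number of words.
function wordEntropy(words) {
  if (words.length === 0) return 0;
  const counts = new Map();
  for (const w of words) {
    counts.set(w, (counts.get(w) || 0) + 1);
  }
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / words.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}
```

Under this reading the entropy can never exceed log₂ of the number of distinct words, so a text needs at least 23 distinct words before it can score above 4.5 bits.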

Analysis Algorithm and Scoring

Signal Detection Process

Our algorithm processes each metric and generates "signals" that indicate either AI-like or human-like characteristics:

1. Calculate each enabled metric
2. Compare against thresholds to generate signals
3. Count AI-like signals vs. human-like signals
4. Generate specific explanations for each signal
5. Calculate confidence based on signal strength
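Steps 1 to 3 can be sketched as a single pass over a threshold table. The thresholds below are the ones listed for each metric; the object shape, key names, and the `enabled` parameter are assumptions made for this illustration.

```javascript
// Thresholds from the six metrics above. "aiAbove" metrics signal AI when high,
// "aiBelow" metrics signal AI when low.
const THRESHOLDS = {
  avgWordLength: { aiAbove: 5.5, humanBelow: 4.5 },
  avgSentenceLength: { aiAbove: 20, humanBelow: 10 },
  burstiness: { aiBelow: 0.5, humanAbove: 0.8 },
  vocabularyRichness: { aiAbove: 0.8, humanBelow: 0.4 },
  perplexity: { aiBelow: 30, humanAbove: 60 },
  entropy: { aiBelow: 3.5, humanAbove: 4.5 },
};

// Hypothetical helper: returns one signal per metric that crosses a threshold.
// Metrics set to false in `enabled`, or missing from `metrics`, are skipped.
function generateSignals(metrics, enabled = {}) {
  const signals = [];
  for (const [name, t] of Object.entries(THRESHOLDS)) {
    if (enabled[name] === false || !(name in metrics)) continue;
    const value = metrics[name];
    let kind = null;
    if ("aiAbove" in t) {
      if (value > t.aiAbove) kind = "ai";
      else if (value < t.humanBelow) kind = "human";
    } else {
      if (value < t.aiBelow) kind = "ai";
      else if (value > t.humanAbove) kind = "human";
    }
    if (kind !== null) signals.push({ metric: name, kind, value });
  }
  return signals;
}
```

Counting the entries with kind "ai" and kind "human" then gives the two totals used for the confidence score. Metrics that fall between their thresholds contribute no signal.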

Confidence Score Calculation

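Implementation: Confidence is the share of all signals that belong to the dominant type (AI-like or human-like). When at least one signal is present, it ranges from 0.5 (an even split) to 1.0 (all signals agree).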

confidence = max(aiSignals, humanSignals) / totalSignals
totalSignals = aiSignals + humanSignals
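In code, the confidence formula might look like the sketch below. Returning 0 when there are no signals at all is an assumption added to avoid dividing by zero; the function name is hypothetical.

```javascript
// confidence = max(aiSignals, humanSignals) / (aiSignals + humanSignals)
function signalConfidence(aiSignals, humanSignals) {
  const totalSignals = aiSignals + humanSignals;
  if (totalSignals === 0) return 0; // assumption: no signals means no confidence
  return Math.max(aiSignals, humanSignals) / totalSignals;
}
```

Three AI-like signals against one human-like signal give 3 / 4 = 0.75, while a two-against-two split gives 0.5.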

Result Classification

The system classifies results based on confidence levels and signal counts:

  • "Likely AI" or "Likely Human": confidence > 70% and ≥ 2 total signals
  • "Possibly AI" or "Possibly Human": confidence ≤ 70% and ≥ 2 total signals
  • "Inconclusive": < 2 total signals, or no metrics enabled
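The classification rules can be sketched as a small function. The rules above do not say what happens when AI-like and human-like signals are tied, so resolving a tie toward "Human" is purely an assumption of this example, as is the function name.

```javascript
// Maps signal counts to one of the result labels described above.
function classifyResult(aiSignals, humanSignals) {
  const totalSignals = aiSignals + humanSignals;
  // Fewer than 2 signals (including the case of no enabled metrics) is inconclusive.
  if (totalSignals < 2) return "Inconclusive";
  const confidence = Math.max(aiSignals, humanSignals) / totalSignals;
  // Assumption: a tie is reported as "Human"; the source does not specify tie handling.
  const side = aiSignals > humanSignals ? "AI" : "Human";
  return (confidence > 0.7 ? "Likely " : "Possibly ") + side;
}
```

With three AI-like signals and one human-like signal the confidence is 75%, so the result is "Likely AI"; with two against one it drops to about 67% and the result becomes "Possibly AI".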

User Customization Features

Users can toggle individual metrics on/off, allowing for:

  • Focused analysis on specific linguistic dimensions
  • Adaptation to different text types and genres
  • Educational exploration of how each metric contributes
  • Debugging and understanding of detection results

Visual Feedback System

Each metric is displayed with:

  • Color coding: Green (human-like), red (AI-like), gray (neutral)
  • Numerical values: Exact measurements with units
  • Threshold indicators: Visual markers showing detection boundaries
  • Explanatory text: Context for what each result means

Privacy and Client-Side Processing

Complete Privacy: All text analysis happens entirely in your browser using JavaScript. No text is ever sent to our servers or any external services.

Technical Implementation:

  • Text processing using native JavaScript string methods
  • Statistical calculations performed client-side
  • No network requests during analysis
  • Results generated and displayed locally

Limitations and Considerations

It's important to understand that AI detection is not foolproof. Factors that can affect accuracy include:

  • Text length (shorter texts are harder to analyze)
  • Writing style and genre
  • Language and cultural context
  • Evolution of AI models

Continuous Improvement

Our detection algorithms are continuously updated to adapt to new AI models and writing patterns. We maintain transparency in our methodology while protecting against adversarial attempts to fool the system.