Analyzing Content Authenticity Through Linguistic & Statistical Evidence
A forensic analysis system that evaluates textual evidence using multiple statistical, linguistic, and semantic signals to assess content authenticity across education, publishing, hiring, and research domains.
Advanced technology meets practical application
Calibrated thresholds for Academic, Technical, Creative, and Casual content types with specialized analysis algorithms for each domain.
Combines perplexity, entropy, structural, linguistic, semantic, and perturbation-stability signals into a multi-angle forensic evidence profile.
Sentence-level highlighting with confidence scores and detailed forensic reasoning for each assessment.
Analyzes short texts in about 1.2 seconds and medium documents in about 3.5 seconds, with metrics computed in parallel.
Upload and analyze TXT, PDF, DOCX, DOC, and Markdown files with automatic text extraction.
Understanding the science behind the forensic evaluation
Measures how predictable the text is to a reference language model. Model-generated or algorithmically assisted text typically exhibits lower perplexity (it is more predictable) than human writing, which tends to be more varied and surprising.
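As a rough illustration of the perplexity signal, the sketch below computes perplexity from per-token log-probabilities. It assumes the reference language model has already scored the text and returned natural-log probabilities; the function name `perplexity` and its input format are hypothetical and are not taken from this system.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities.

    Assumption: token_logprobs come from some reference language model
    that is not specified here.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token, then exponentiate.
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)
```

For example, a text whose every token was assigned probability 0.25 has a perplexity of 4: the model is as uncertain as if it were choosing uniformly among four options at each step.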
Calculates token-level diversity and unpredictability in text sequences. Human writing tends to show higher entropy, with more varied word choices, while algorithmically generated text tends toward more repetitive, predictable token choices.
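One simple way to picture the entropy signal is Shannon entropy over the token frequency distribution. The sketch below is an assumption about how such a metric could be computed, using naive whitespace tokenization; the actual tokenizer and entropy variant used by the system are not stated here.

```python
import math
from collections import Counter

def token_entropy(text):
    """Shannon entropy (in bits) of the token frequency distribution.

    Assumption: lowercase whitespace tokenization, chosen for illustration.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    # H = -sum(p * log2(p)) over the observed token types.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Four distinct tokens used once each give 2 bits, while a single token repeated gives 0 bits, so more varied word choice yields a higher value.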
Analyzes sentence length variance, punctuation patterns, and lexical burstiness. Human writing exhibits more variation in sentence structure and rhythm compared to algorithmically generated text, which often shows more uniform patterns.
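The sentence-length part of the structural signal can be sketched with basic statistics. The example below assumes a naive sentence splitter (split on whitespace after `.`, `!`, or `?`) and reports the mean, population variance, and coefficient of variation of sentence lengths in words; the name `sentence_length_stats` is hypothetical, and punctuation patterns and lexical burstiness are not covered.

```python
import math
import re
import statistics

def sentence_length_stats(text):
    """Mean, population variance, and coefficient of variation of
    sentence lengths (in words), using a naive sentence splitter."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        # Variation is undefined for zero or one sentence; report zero.
        mean = float(lengths[0]) if lengths else 0.0
        return {"mean": mean, "variance": 0.0, "cv": 0.0}
    mean = float(statistics.mean(lengths))
    variance = float(statistics.pvariance(lengths))
    cv = math.sqrt(variance) / mean if mean else 0.0
    return {"mean": mean, "variance": variance, "cv": cv}
```

A higher coefficient of variation indicates a more uneven sentence rhythm, which is the kind of variation this metric looks for.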
Evaluates part-of-speech (POS) tag diversity, syntactic complexity, and grammatical patterns. Examines the richness of language structures and whether they match natural human linguistic variation.
Assesses semantic coherence, repetition patterns, and contextual consistency, which often differ between human-authored and algorithmically generated text.
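The repetition component of this signal could be approximated with an n-gram repetition ratio. The sketch below is only an illustrative assumption: it counts the share of word trigrams that occur more than once, and it does not attempt the semantic coherence or contextual consistency checks, which would need an embedding model that is not specified here.

```python
from collections import Counter

def repeated_ngram_ratio(text, n=3):
    """Fraction of word n-gram occurrences whose n-gram appears more than once.

    Assumption: lowercase whitespace tokenization and n=3, for illustration.
    """
    tokens = text.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Sum the occurrences of every n-gram that is seen at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)
```

A value near 0 means almost no phrase is reused, while a value near 1 means the text is dominated by repeated phrases.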
Tests text stability under random perturbations. Algorithmically generated text tends to maintain higher likelihood scores even when slightly modified, while human text shows more variation.
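A perturbation-stability check can be sketched as the gap between the score of the original text and the average score of randomly perturbed copies. Everything below is an assumption for illustration: the perturbation (swapping adjacent words), the parameter names, and the pluggable `score_fn`, which stands in for whatever likelihood score the system obtains from its reference model.

```python
import random

def perturb(tokens, rng, rate=0.15):
    """Return a copy of tokens with adjacent pairs swapped at the given rate."""
    out = list(tokens)
    for i in range(len(out) - 1):
        if rng.random() < rate:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

def stability_gap(text, score_fn, n_perturbations=20, rate=0.15, seed=0):
    """Original score minus the mean score of perturbed copies.

    score_fn is a hypothetical callable mapping a string to a float,
    for example a mean token log-likelihood from a language model.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    tokens = text.split()
    base = score_fn(text)
    perturbed = [
        score_fn(" ".join(perturb(tokens, rng, rate)))
        for _ in range(n_perturbations)
    ]
    return base - sum(perturbed) / len(perturbed)
```

A gap close to zero means the score barely changes under perturbation; a large gap means the original wording scores noticeably differently from its perturbed neighbors.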
Paste text or upload a document to begin evidence-based forensic analysis. Our multi-signal ensemble will provide detailed, explainable insights.
Run an analysis to see sentence-level highlighting
Run an analysis to see detailed metric breakdowns