Maxwell-Demon is a Python package and CLI toolkit for binary discrimination between human-authored and machine-generated text via a dual-reference entropy protocol.
The primary decision statistic is the window-wise entropy differential:

$$
\Delta H = H_{\text{human\_ref}} - H_{\text{synthetic\_ref}}
$$

where each entropy term is the mean token surprisal under a calibrated reference distribution.
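The differential can be illustrated with a minimal sketch. It assumes that a reference dictionary maps each token to a probability, that surprisal is measured in bits (base-2 logarithm), and that unseen tokens receive a small floor probability. The names `mean_surprisal`, `delta_h`, and `OOV_PROB` are hypothetical and are not taken from the package.

```python
import math
from collections.abc import Mapping, Sequence

# Hypothetical floor probability for tokens missing from a reference dictionary.
OOV_PROB = 1e-9


def mean_surprisal(tokens: Sequence[str], ref: Mapping[str, float]) -> float:
    """Mean of -log2 P_ref(token) over one window (assumed unit: bits)."""
    if not tokens:
        raise ValueError("window must contain at least one token")
    return sum(-math.log2(ref.get(t, OOV_PROB)) for t in tokens) / len(tokens)


def delta_h(
    tokens: Sequence[str],
    human_ref: Mapping[str, float],
    synthetic_ref: Mapping[str, float],
) -> float:
    """Entropy differential: H under the human reference minus H under the synthetic one."""
    return mean_surprisal(tokens, human_ref) - mean_surprisal(tokens, synthetic_ref)


# Toy reference distributions, made up for illustration only.
human_ref = {"the": 0.5, "cat": 0.25, "sat": 0.25}
synthetic_ref = {"the": 0.25, "cat": 0.25, "sat": 0.5}
window = ["the", "cat"]

print(delta_h(window, human_ref, synthetic_ref))  # -0.5
```

Here the window is less surprising under the toy human reference (1.5 bits per token) than under the toy synthetic reference (2.0 bits per token), so the differential is negative.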

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -e '.[dev]'

| Command | Role in the protocol |
|---|---|
| `python scripts/prepare_resources.py` | Reference calibration (human + synthetic dictionaries) |
| `maxwell-demon-tournament` | Dual-reference scoring and delta extraction |
| `maxwell-demon-report` | Standalone Markdown report generation from a tournament CSV |
| `maxwell-demon-phase` | Phase-space rendering (`delta_h`, `burstiness_paisa`) |

This is the shortest end-to-end path from raw files to interpretable outputs.
Expected structure:

    data/<dataset>/
      human/
        001_human.txt
      ai/
        001_ai.txt

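Before running the pipeline, it can help to confirm that a dataset directory matches this layout. The following is a minimal sketch, not part of the package: the helper `list_dataset_files` is hypothetical, and the demo builds a throwaway directory so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def list_dataset_files(dataset_dir):
    """Return the sorted .txt file names found under human/ and ai/."""
    root = Path(dataset_dir)
    return {
        label: sorted(p.name for p in (root / label).glob("*.txt"))
        for label in ("human", "ai")
    }


# Demo on a temporary directory that mimics data/<dataset>/.
with tempfile.TemporaryDirectory() as tmp:
    for label in ("human", "ai"):
        class_dir = Path(tmp) / label
        class_dir.mkdir()
        (class_dir / f"001_{label}.txt").write_text("sample text", encoding="utf-8")
    files = list_dataset_files(tmp)

print(files)  # {'human': ['001_human.txt'], 'ai': ['001_ai.txt']}
```

An empty list for either class is a quick signal that the directory names or file extensions differ from what the commands below expect.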

    python scripts/prepare_resources.py \
      --synthetic-input data/<dataset>/ai \
      --config config.example.toml

Required outputs:

- `data/reference/paisa_ref_dict.json`
- `data/reference/synthetic_ref_dict.json`


    maxwell-demon-tournament \
      --human-input data/<dataset>/human \
      --ai-input data/<dataset>/ai \
      --config config.example.toml

Default outputs:

- `results/<dataset>/data/final_delta.csv`
- `results/<dataset>/data/final_delta.md`


    maxwell-demon-phase \
      --input results/<dataset>/data \
      --config config.example.toml

Default output:

- `results/<dataset>/plot/phase_delta_h_vs_burstiness_paisa.html`

Core columns in tournament CSV:

- `delta_h = H_human_ref - H_synthetic_ref`
- `burstiness_paisa = Var(-log P_human_ref(token))`
- `label` (if present): expected class (`human`/`ai`)

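The burstiness column is defined as the variance of token surprisal under the human reference. A minimal sketch of that definition follows. It assumes a token-to-probability dictionary, base-2 logarithms, population (not sample) variance, and a floor probability for unseen tokens; none of these choices is confirmed by this section, and the function name `burstiness` is hypothetical.

```python
import math
from statistics import pvariance


def burstiness(tokens, human_ref, oov_prob=1e-9):
    """Population variance of -log2 P_human_ref(token) over one window."""
    surprisals = [-math.log2(human_ref.get(t, oov_prob)) for t in tokens]
    return pvariance(surprisals)


# Toy reference distribution, made up for illustration only.
human_ref = {"a": 0.5, "b": 0.125, "c": 0.375}

flat_window = ["a", "a"]    # surprisals: 1 bit, 1 bit
bursty_window = ["a", "b"]  # surprisals: 1 bit, 3 bits

print(burstiness(flat_window, human_ref))    # 0.0
print(burstiness(bursty_window, human_ref))  # 1.0
```

A window of uniformly predictable tokens yields zero variance, while mixing common and rare tokens raises it.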
Practical reading:

- lower `delta_h` means lower surprisal under the human reference relative to the synthetic reference;
- higher `delta_h` means lower surprisal under the synthetic reference relative to the human reference;
- higher `burstiness_paisa` means stronger local surprisal fluctuation under the human model;
- default decision rule (if a hard threshold is needed): `delta_h < 0 => human`.

Do not interpret single windows in isolation; inspect distributions per file and per class.

- `delta_h = 0` is a useful default boundary but not universally optimal; calibrate on a validation set.
- Window-level rows from the same document are not independent observations; avoid overconfident significance claims.
- Domain and genre shift can change token distributions and degrade discrimination quality.
- OOV/rare-token behavior depends on smoothing and tokenization settings.
- Keep tokenization and smoothing identical between reference building and runtime analysis.
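The advice to read distributions per file rather than single windows can be sketched as a per-file aggregation. The example below is illustrative only: the `filename` column name is an assumption (this section lists only `delta_h`, `burstiness_paisa`, and `label`), the CSV rows are made-up values, and the median is one possible summary, not necessarily the one the report tool uses.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Stand-in for a tournament CSV. The "filename" column and all values are
# assumptions made for illustration; they are not real results.
SAMPLE_CSV = """filename,delta_h,label
001_human.txt,-0.40,human
001_human.txt,0.10,human
001_human.txt,-0.20,human
001_ai.txt,0.30,ai
001_ai.txt,-0.05,ai
001_ai.txt,0.25,ai
"""


def per_file_median(csv_text):
    """Group window-level rows by file and summarize delta_h with the median."""
    by_file = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_file[row["filename"]].append(float(row["delta_h"]))
    return {name: median(values) for name, values in by_file.items()}


def classify(delta, threshold=0.0):
    """Default rule from the text: delta_h below the threshold means human."""
    return "human" if delta < threshold else "ai"


medians = per_file_median(SAMPLE_CSV)
for name, value in medians.items():
    print(name, value, classify(value))
```

In this toy data, one window of the human file (`0.10`) and one window of the AI file (`-0.05`) fall on the wrong side of zero, while the per-file medians (`-0.2` and `0.25`) do not. The `threshold` argument is where a boundary calibrated on a validation set would be supplied.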
Using local synthetic text:

    python scripts/prepare_resources.py \
      --synthetic-input data/dataset_it_01/ai \
      --config config.example.toml

Using remote synthetic text:

    python scripts/prepare_resources.py \
      --synthetic-url https://example.com/synthetic_corpus.txt.gz \
      --config config.example.toml

Human-only fallback (when no synthetic corpus is available):

    python scripts/prepare_resources.py \
      --only-human \
      --config config.example.toml

    maxwell-demon-tournament \
      --human-input data/dataset_it_01/human \
      --ai-input data/dataset_it_01/ai \
      --config config.example.toml

Default artifact:

- `results/dataset_it_01/data/final_delta.csv`
- `results/dataset_it_01/data/final_delta.md` (auto-generated report)

    maxwell-demon-phase \
      --input results/dataset_it_01/data \
      --config config.example.toml

Default artifact:

- `results/dataset_it_01/plot/phase_delta_h_vs_burstiness_paisa.html`

Protocol default: lzma.
Alternative codecs (gzip, bz2, zlib) are available for ablation and sensitivity analyses, but lzma is the operational baseline.
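For a codec sensitivity check, the four codecs named above are all available in the Python standard library. The sketch below compares compressed-size ratios on a sample string. It is an assumption about what such an ablation might look at; this section does not state how Maxwell-Demon uses the codec internally, and `compression_ratios` is a hypothetical helper.

```python
import bz2
import gzip
import lzma
import zlib

# Standard-library one-shot compressors for the codecs named in the text.
CODECS = {
    "lzma": lzma.compress,
    "gzip": gzip.compress,
    "bz2": bz2.compress,
    "zlib": zlib.compress,
}


def compression_ratios(text):
    """Return compressed size divided by raw UTF-8 size for each codec."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    return {name: len(compress(raw)) / len(raw) for name, compress in CODECS.items()}


# Highly repetitive sample, so every codec should shrink it substantially.
sample = "the quick brown fox jumps over the lazy dog " * 200
ratios = compression_ratios(sample)

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.4f}")
```

Ratios differ between codecs because of header overhead and algorithm differences, which is one reason to keep a single codec fixed as the baseline when comparing runs.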
Canonical template: config.example.toml.
Top-level sections:

- `[analysis]`
- `[compression]`
- `[tokenization]`
- `[reference]`
- `[output]`
- `[openai]`
- `[shadow_dataset]`

Tokenization defaults:

- `method = "tiktoken"` (recommended)
- `encoding_name = "cl100k_base"`
- `include_punctuation = true`
- `fallback_to_legacy_if_tiktoken_missing = true`

Backward-compatible mode is available with `method = "legacy"` (lowercase + regex punctuation stripping).
For statistical consistency, reference-dictionary construction and runtime analysis both use the same tokenization configuration.
If `method = "tiktoken"` and the `tiktoken` package is not available:

- with `fallback_to_legacy_if_tiktoken_missing = true` (default), the runtime falls back to `legacy` and emits a warning;
- with `fallback_to_legacy_if_tiktoken_missing = false`, execution fails with an explicit `ModuleNotFoundError`.

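The fallback behavior described above can be sketched as follows. This is an illustration of the documented rule, not the package's code: `resolve_method` and `legacy_tokenize` are hypothetical names, and the regex used for punctuation stripping is an assumption, since the actual legacy pattern is not given here.

```python
import importlib.util
import re
import warnings


def legacy_tokenize(text):
    """Lowercase, strip punctuation with a regex, then split on whitespace."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def resolve_method(method="tiktoken", fallback_to_legacy_if_tiktoken_missing=True):
    """Return the tokenization method that would actually be used."""
    if method != "tiktoken":
        return method
    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("tiktoken") is not None:
        return "tiktoken"
    if fallback_to_legacy_if_tiktoken_missing:
        warnings.warn("tiktoken is not installed; falling back to legacy tokenization")
        return "legacy"
    raise ModuleNotFoundError("tiktoken is required when method = 'tiktoken'")


print(legacy_tokenize("Ciao, mondo!"))  # ['ciao', 'mondo']
print(resolve_method("legacy"))         # legacy
```

Because the two methods produce different token inventories, a silent fallback on one machine and not another would break the requirement that reference building and runtime analysis share the same tokenization; the warning is what makes that visible.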
Output paths are dataset-aware through templating:

- data: `results/{dataset}/data`
- plots: `results/{dataset}/plot`

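As a small illustration of how such templates expand, the sketch below fills the `{dataset}` placeholder with Python's `str.format`. The helper `resolve_output_dirs` is hypothetical, and how the tool derives the dataset name from its inputs is not specified in this section.

```python
from pathlib import Path

# Templates as listed in the text.
DATA_TEMPLATE = "results/{dataset}/data"
PLOT_TEMPLATE = "results/{dataset}/plot"


def resolve_output_dirs(dataset):
    """Expand both templates for one dataset name."""
    return (
        Path(DATA_TEMPLATE.format(dataset=dataset)),
        Path(PLOT_TEMPLATE.format(dataset=dataset)),
    )


data_dir, plot_dir = resolve_output_dirs("dataset_it_01")
print((data_dir / "final_delta.csv").as_posix())
print((plot_dir / "phase_delta_h_vs_burstiness_paisa.html").as_posix())
```

For `dataset_it_01`, this reproduces the default artifact paths shown in the usage examples.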

- `maxwell-demon`: single-run diagnostics (`raw`, `diff`)
- `maxwell-demon-plot`: static PNG trajectory plot
- `maxwell-demon-plot-html`: interactive HTML trajectory plot
- `maxwell-demon-report`: standalone Markdown report tool (`--input`, `--output`)
- `scripts/run_analysis.py`: wrapper for single/tournament execution modes


    .venv/bin/ruff check .
    PYTHONPATH=src .venv/bin/python -m pytest tests

- `DOC/theoretical_framework.md`
- `DOC/docs.md`
- `DOC/guide.md`

MIT (LICENSE).