Parity Test Matrix
You carefully unroll the scroll of Parity. It reads: “To port a dungeon is to rebuild it stone by stone, trap by trap, random number by random number — and then prove that every adventurer who steps inside cannot tell which dungeon they are in.”
This is the canonical reference for what “parity” means operationally. It defines every test suite, session category, comparison channel, deterministic control, and quality gate that governs C-vs-JS fidelity.
For the testing dashboard, git-notes workflow, and enforcement infrastructure, see TESTING.md. For how to capture new C reference sessions, see COLLECTING_SESSIONS.md. For the session file format specification, see SESSION_FORMAT_V3.md.
Test Commands
| Command | Scope | What it runs | Gate |
|---|---|---|---|
npm run test:unit |
Unit tests | node --test test/unit/*.test.js (84 files) |
PR |
npm run test:session |
Session parity | node test/comparison/sessions.test.js — all session types |
PR |
npm test |
PR gate | test:unit + test:session |
PR |
npm run test:e2e |
Browser E2E | node --test --test-concurrency=1 test/e2e/*.test.js (3 files, Puppeteer) |
Release |
npm run test:all |
Release gate | test:unit + test:session + test:e2e |
Release |
The session test runner also accepts command-line options for targeted work:
node test/comparison/session_test_runner.js --verbose # detailed per-session output
node test/comparison/session_test_runner.js --type=chargen # single category
node test/comparison/session_test_runner.js --parallel # use all CPU cores
node test/comparison/session_test_runner.js --fail-fast # stop on first failure
node test/comparison/session_test_runner.js path/to/file.json # single session
Session Categories
You enter the Hall of Mirrors. Each mirror reflects the same dungeon — one rendered in C, the other in JavaScript. Your quest: make every reflection identical.
All 203 session files live in two directories (test/comparison/sessions/ and
test/comparison/maps/) and are classified into types by deriveType() in
session_loader.js. The test runner in sessions.test.js groups results by
type: chargen, gameplay, interface, map, special, other.
| Type | Count | Seeds | What’s checked | Example |
|---|---|---|---|---|
| chargen | 91 | 1, 42, 100, 200, 300 | Character creation for all 13 roles × 5 seeds, plus race and alignment variants at seed 42. Compares startup RNG sequence and initial map grid. | seed42_chargen_wizard |
| gameplay | 49 | 1–306 | Live play: selfplay traces (5–200 turns per role), wizard-mode sessions (13 roles), movement and combat prefixes, inventory, items, Gnomish Mines descent, option variants. Compares RNG, screen text, and ANSI color. | seed3_selfplay_100turns_gameplay |
| interface | 3 | 42 | Pregame UI: startup screen, options menu, name prompt. Screen text and color compared with normalization for version strings and box-drawing characters. | interface_startup |
| map | 10 | 16, 72, 119, 163, 306 | Dungeon level generation to multi-depth: typGrid cell-by-cell and per-level RNG. Includes paired JS and C reference files. | seed119_map |
| special | 50 | 1, 42, 100 | Special level generation: 16 level groups × 3 seeds. Validates that level grids have correct 21×80 dimensions. | seed42_special_mines |
Total: 203 session files (142 in sessions/, 61 in maps/).
Chargen inventory
Every role that can be created must be tested. Every seed that can diverge must be tried.
| Variant | Count | Seeds | Defined in seeds.json |
|---|---|---|---|
| Base roles (13 × 5 seeds) | 65 | 1, 42, 100, 200, 300 | chargen_seeds.seeds × chargen_seeds.sessions |
| Race variants | 15 | 42 | chargen_seeds.race_variants |
| Alignment variants | 10 | 42 | chargen_seeds.alignment_variants |
| Interface chargen | 1 | 42 | interface_seeds (typed as chargen by filename) |
Gameplay inventory
Gameplay sessions are the most diverse category — the dungeon has many rooms, and each must be checked.
| Subcategory | Count | Notes |
|---|---|---|
| Explicit gameplay traces | 16 | Movement, combat prefixes (F, G), multidigit counts, inventory, items, Gnomish Mines |
| Per-role selfplay (200 turns) | 13 | Seeds 101–113, one per role — enough turns to exercise pet AI, combat, and level features |
| Per-role wizard mode | 13 | Seeds 201–213, wish + explore per role — high-level items, spells, equipment |
| Option variants | 6 | Seeds 301–306: verbose on/off, DECgraphics on/off, time on/off |
| Castle (map dir) | 1 | seed42_castle.session.json |
Special level groups
Sixteen branches of the dungeon, each with its own rules, its own traps, its own way of killing you.
The 16 groups tested across 3 seeds (1, 42, 100): mines, sokoban, oracle, bigroom, castle, medusa, quest, gehennom, knox, valley, vlad, wizard, filler, rogue, tutorial, planes. Sokoban gets two extra seeds (200, 300) because its puzzle layouts are particularly sensitive to generation order.
Parity Dimensions
Five channels of truth. A session passes only when all five agree.
Session tests compare JS replay output against C reference captures across
five channels, implemented in test/comparison/comparators.js:
| Channel | What it checks | How mismatches are reported |
|---|---|---|
| RNG | PRNG call sequence (ISAAC64). Entries are normalized: source tags stripped, midlog and composite entries filtered. | First divergent call index with JS and session values, plus call-stack context (the >funcname entries preceding the divergence). |
| Screen | Terminal text output (24 lines). Trailing whitespace trimmed. Gameplay screens compare direct normalized rows (no column-shift padding heuristics); interface screens normalize box-drawing chars (┌→-) and version/copyright strings, then compare directly (no left-shift fallback). |
First mismatched row with both line contents. |
| Grid | Dungeon typGrid — the 21×80 integer array of tile types. Cell-by-cell comparison. | First differing cell: (x, y) coordinates, JS value, session value. |
| Color | ANSI terminal attributes per cell: foreground, background, bold, inverse, underline. Parses full SGR sequences and DEC special graphics (SO/SI mode). | First mismatched (row, col) with full attribute details for both sides. |
| Metrics | Aggregate counts: rng matched/total, screens matched/total, grids matched/total, colors matched/total. | Summary object on the result — the bird’s-eye view of how close you are. |
A session passes when every checked channel matches completely. The
firstDivergence field identifies the earliest mismatch; firstDivergences
gives the earliest mismatch per channel when multiple channels fail.
These are your trail markers when debugging — follow the first divergence,
and you will find the bug.
Deterministic Controls
The DevTeam thinks of everything. And so must we — every source of nondeterminism must be nailed down, or the mirrors will never align.
Parity testing requires bitwise-reproducible behavior across C and JS. Six controls eliminate environmental nondeterminism:
| Control | What it fixes | C enforcement | JS enforcement |
|---|---|---|---|
| Seed | PRNG initial state (ISAAC64) | Passed via run_session.py at harness startup |
Passed to game.init({ seed }) or replaySession(seed, ...) |
| Date/time | In-game clock, moon phase, Friday-the-13th | Harness patches gettimeofday / build constants |
replay_core.js uses fixed epoch; clock functions return deterministic values |
| Terminal size | Screen dimensions (80 columns × 24 lines) | Harness sets LINES=24 COLUMNS=80 |
HeadlessDisplay fixed at 80×24; comparators hard-code width 80 |
| Options | Game behavior flags (verbose, pickup, symset, color, etc.) | Harness writes .nethackrc or CLI equivalents |
replayFlags object merged with DEFAULT_FLAGS before replay |
| Sort stability | Tie-breaking order in object/monster lists | C harness uses stable qsort variant | JS Array.prototype.sort is stable per ES2019+ spec |
| Input replay | Keystroke-by-keystroke game input | Harness pipes recorded keystrokes via TTY | replaySession() feeds keys via input.pushKey() with per-step screen stabilization |
Quality Gates
Two altars guard the path to release. You must sacrifice at both.
PR gate (npm test)
Every pull request must pass before merge. No exceptions, no --no-verify.
| Requirement | Details |
|---|---|
| Unit tests green | All 84 unit test files pass |
| Session tests green | All 203 sessions pass across all types |
| No regressions | A previously-passing session may not regress to failing |
Release gate (npm run test:all + golden comparison)
The full suite. Run before any release.
| Requirement | Details |
|---|---|
| PR gate passes | Everything above |
| E2E tests green | 3 Puppeteer browser tests pass (startup, gameplay, game flow) |
| Golden comparison | --golden flag compares JS output against golden-branch C captures |
Session Recording
To test the JS dungeon, you must first record the C dungeon. The Oracle cannot compare what she has never seen.
New C reference sessions are generated by Python scripts in
test/comparison/c-harness/ that drive a patched C NetHack build:
| Script | What it captures |
|---|---|
gen_chargen_sessions.py |
Character creation from seeds.json chargen definitions |
gen_selfplay_trace.py |
Selfplay gameplay traces (scripted movement patterns) |
gen_selfplay_agent_trace.py |
Agent-driven selfplay traces (AI movement) |
gen_map_sessions.py |
Multi-depth map generation (typGrid + RNG per level) |
gen_special_sessions.py |
Special level grid captures (all 16 branch types) |
gen_interface_sessions.py |
Pregame UI interaction captures (startup, options, chargen) |
gen_option_sessions.py |
Option-variant gameplay (verbose, DECgraphics, time) |
create_wizard_sessions.py |
Wizard-mode per-role sessions (wish + explore) |
run_session.py |
Low-level single-session runner (building block for the above) |
setup.sh |
Builds the patched C NetHack binary for session capture |
All session definitions live in test/comparison/seeds.json. To add a new
session:
- Add the seed, moves, and options to the appropriate section of
seeds.json. - Run the corresponding
gen_*.pyscript fromc-harness/to capture the C reference. - Place the resulting
.session.jsoninsessions/ormaps/as appropriate. - Run
npm run test:sessionto verify JS matches.
For a full walkthrough of session capture, see COLLECTING_SESSIONS.md.