Parity Test Matrix

You carefully unroll the scroll of Parity. It reads: “To port a dungeon is to rebuild it stone by stone, trap by trap, random number by random number — and then prove that every adventurer who steps inside cannot tell which dungeon they are in.”

This is the canonical reference for what “parity” means operationally. It defines every test suite, session category, comparison channel, deterministic control, and quality gate that governs C-vs-JS fidelity.

For the testing dashboard, git-notes workflow, and enforcement infrastructure, see TESTING.md. For how to capture new C reference sessions, see COLLECTING_SESSIONS.md. For the session file format specification, see SESSION_FORMAT_V3.md.

Test Commands

Command	Scope	What it runs	Gate
`npm run test:unit`	Unit tests	`node --test test/unit/*.test.js` (84 files)	PR
`npm run test:session`	Session parity	`node test/comparison/sessions.test.js` — all session types	PR
`npm test`	PR gate	`test:unit` + `test:session`	PR
`npm run test:e2e`	Browser E2E	`node --test --test-concurrency=1 test/e2e/*.test.js` (3 files, Puppeteer)	Release
`npm run test:all`	Release gate	`test:unit` + `test:session` + `test:e2e`	Release

The session test runner also accepts command-line options for targeted work:

node test/comparison/session_test_runner.js --verbose          # detailed per-session output
node test/comparison/session_test_runner.js --type=chargen     # single category
node test/comparison/session_test_runner.js --parallel          # use all CPU cores
node test/comparison/session_test_runner.js --fail-fast         # stop on first failure
node test/comparison/session_test_runner.js path/to/file.json  # single session

Session Categories

You enter the Hall of Mirrors. Each mirror reflects the same dungeon — one rendered in C, the other in JavaScript. Your quest: make every reflection identical.

All 203 session files live in two directories (test/comparison/sessions/ and test/comparison/maps/) and are classified into types by deriveType() in session_loader.js. The test runner in sessions.test.js groups results by type: chargen, gameplay, interface, map, special, other.

Type	Count	Seeds	What’s checked	Example
chargen	91	1, 42, 100, 200, 300	Character creation for all 13 roles × 5 seeds, plus race and alignment variants at seed 42. Compares startup RNG sequence and initial map grid.	`seed42_chargen_wizard`
gameplay	49	1–306	Live play: selfplay traces (5–200 turns per role), wizard-mode sessions (13 roles), movement and combat prefixes, inventory, items, Gnomish Mines descent, option variants. Compares RNG, screen text, and ANSI color.	`seed3_selfplay_100turns_gameplay`
interface	3	42	Pregame UI: startup screen, options menu, name prompt. Screen text and color compared with normalization for version strings and box-drawing characters.	`interface_startup`
map	10	16, 72, 119, 163, 306	Dungeon level generation to multi-depth: typGrid cell-by-cell and per-level RNG. Includes paired JS and C reference files.	`seed119_map`
special	50	1, 42, 100	Special level generation: 16 level groups × 3 seeds. Validates that level grids have correct 21×80 dimensions.	`seed42_special_mines`

Total: 203 session files (142 in sessions/, 61 in maps/).

Chargen inventory

Every role that can be created must be tested. Every seed that can diverge must be tried.

Variant	Count	Seeds	Defined in `seeds.json`
Base roles (13 × 5 seeds)	65	1, 42, 100, 200, 300	`chargen_seeds.seeds` × `chargen_seeds.sessions`
Race variants	15	42	`chargen_seeds.race_variants`
Alignment variants	10	42	`chargen_seeds.alignment_variants`
Interface chargen	1	42	`interface_seeds` (typed as chargen by filename)

Gameplay inventory

Gameplay sessions are the most diverse category — the dungeon has many rooms, and each must be checked.

Subcategory	Count	Notes
Explicit gameplay traces	16	Movement, combat prefixes (`F`, `G`), multidigit counts, inventory, items, Gnomish Mines
Per-role selfplay (200 turns)	13	Seeds 101–113, one per role — enough turns to exercise pet AI, combat, and level features
Per-role wizard mode	13	Seeds 201–213, wish + explore per role — high-level items, spells, equipment
Option variants	6	Seeds 301–306: verbose on/off, DECgraphics on/off, time on/off
Castle (map dir)	1	`seed42_castle.session.json`

Special level groups

Sixteen branches of the dungeon, each with its own rules, its own traps, its own way of killing you.

The 16 groups tested across 3 seeds (1, 42, 100): mines, sokoban, oracle, bigroom, castle, medusa, quest, gehennom, knox, valley, vlad, wizard, filler, rogue, tutorial, planes. Sokoban gets two extra seeds (200, 300) because its puzzle layouts are particularly sensitive to generation order.

Parity Dimensions

Five channels of truth. A session passes only when all five agree.

Session tests compare JS replay output against C reference captures across five channels, implemented in test/comparison/comparators.js:

Channel	What it checks	How mismatches are reported
RNG	PRNG call sequence (ISAAC64). Entries are normalized: source tags stripped, midlog and composite entries filtered.	First divergent call index with JS and session values, plus call-stack context (the `>funcname` entries preceding the divergence).
Screen	Terminal text output (24 lines). Trailing whitespace trimmed. Gameplay screens compare direct normalized rows (no column-shift padding heuristics); interface screens normalize box-drawing chars (`┌→-`) and version/copyright strings, then compare directly (no left-shift fallback).	First mismatched row with both line contents.
Grid	Dungeon typGrid — the 21×80 integer array of tile types. Cell-by-cell comparison.	First differing cell: (x, y) coordinates, JS value, session value.
Color	ANSI terminal attributes per cell: foreground, background, bold, inverse, underline. Parses full SGR sequences and DEC special graphics (SO/SI mode).	First mismatched (row, col) with full attribute details for both sides.
Metrics	Aggregate counts: rng matched/total, screens matched/total, grids matched/total, colors matched/total.	Summary object on the result — the bird’s-eye view of how close you are.

A session passes when every checked channel matches completely. The firstDivergence field identifies the earliest mismatch; firstDivergences gives the earliest mismatch per channel when multiple channels fail. These are your trail markers when debugging — follow the first divergence, and you will find the bug.

Deterministic Controls

The DevTeam thinks of everything. And so must we — every source of nondeterminism must be nailed down, or the mirrors will never align.

Parity testing requires bitwise-reproducible behavior across C and JS. Six controls eliminate environmental nondeterminism:

Control	What it fixes	C enforcement	JS enforcement
Seed	PRNG initial state (ISAAC64)	Passed via `run_session.py` at harness startup	Passed to `game.init({ seed })` or `replaySession(seed, ...)`
Date/time	In-game clock, moon phase, Friday-the-13th	Harness patches `gettimeofday` / build constants	`replay_core.js` uses fixed epoch; clock functions return deterministic values
Terminal size	Screen dimensions (80 columns × 24 lines)	Harness sets `LINES=24 COLUMNS=80`	`HeadlessDisplay` fixed at 80×24; comparators hard-code width 80
Options	Game behavior flags (verbose, pickup, symset, color, etc.)	Harness writes `.nethackrc` or CLI equivalents	`replayFlags` object merged with `DEFAULT_FLAGS` before replay
Sort stability	Tie-breaking order in object/monster lists	C harness uses stable qsort variant	JS `Array.prototype.sort` is stable per ES2019+ spec
Input replay	Keystroke-by-keystroke game input	Harness pipes recorded keystrokes via TTY	`replaySession()` feeds keys via `input.pushKey()` with per-step screen stabilization

Quality Gates

Two altars guard the path to release. You must sacrifice at both.

PR gate (`npm test`)

Every pull request must pass before merge. No exceptions, no --no-verify.

Requirement	Details
Unit tests green	All 84 unit test files pass
Session tests green	All 203 sessions pass across all types
No regressions	A previously-passing session may not regress to failing

Release gate (`npm run test:all` + golden comparison)

The full suite. Run before any release.

Requirement	Details
PR gate passes	Everything above
E2E tests green	3 Puppeteer browser tests pass (startup, gameplay, game flow)
Golden comparison	`--golden` flag compares JS output against golden-branch C captures

Session Recording

To test the JS dungeon, you must first record the C dungeon. The Oracle cannot compare what she has never seen.

New C reference sessions are generated by Python scripts in test/comparison/c-harness/ that drive a patched C NetHack build:

Script	What it captures
`gen_chargen_sessions.py`	Character creation from `seeds.json` chargen definitions
`gen_selfplay_trace.py`	Selfplay gameplay traces (scripted movement patterns)
`gen_selfplay_agent_trace.py`	Agent-driven selfplay traces (AI movement)
`gen_map_sessions.py`	Multi-depth map generation (typGrid + RNG per level)
`gen_special_sessions.py`	Special level grid captures (all 16 branch types)
`gen_interface_sessions.py`	Pregame UI interaction captures (startup, options, chargen)
`gen_option_sessions.py`	Option-variant gameplay (verbose, DECgraphics, time)
`create_wizard_sessions.py`	Wizard-mode per-role sessions (wish + explore)
`run_session.py`	Low-level single-session runner (building block for the above)
`setup.sh`	Builds the patched C NetHack binary for session capture

All session definitions live in test/comparison/seeds.json. To add a new session:

Add the seed, moves, and options to the appropriate section of seeds.json.
Run the corresponding gen_*.py script from c-harness/ to capture the C reference.
Place the resulting .session.json in sessions/ or maps/ as appropriate.
Run npm run test:session to verify JS matches.

For a full walkthrough of session capture, see COLLECTING_SESSIONS.md.