Development Guide
“Welcome, strider, to the Mazes of Development! You fall through a trap door into a large room filled with source code.”
See also: DESIGN.md (architecture) | DECISIONS.md (trade-offs) | PARITY_TEST_MATRIX.md (test suites & gates) | COMPARISON_PIPELINE.md (recorder/comparator flow) | LORE.md (porting lessons) | TESTING.md (test dashboard & workflows)
Prerequisites
- Node.js 25+ (see `.nvmrc`)
- Python 3 (for `python3 -m http.server` and data generators)
- Puppeteer (`npm install` — used by E2E tests)
For C comparison testing (optional):
- gcc, make, bison, flex (build tools)
- ncurses-dev (`libncurses-dev` on Linux, Xcode command line tools on macOS)
- tmux (drives the C binary headlessly)
Practical Setup (This Repo Runtime)
Use this as the default command setup for development and translator work in this repository.
1. Check core tools:

   ```bash
   node -v
   npm -v
   python3 -V
   conda --version
   ```

2. Use conda Python for translator tooling:

   ```bash
   # Confirm conda base has pip + clang bindings
   conda run -n base python -m pip --version
   conda run -n base python -c "import clang, clang.cindex; print('clang ok')"
   ```

3. Run translator commands with conda Python:

   ```bash
   conda run -n base python tools/c_translator/main.py --help
   ```
Common translator pipeline invocations:

```bash
# Example parse summary
conda run -n base python tools/c_translator/main.py \
  --src nethack-c/src/hack.c \
  --emit parse-summary \
  --out /tmp/hack.parse.json

# File-wide translation capability summary
conda run -n base python tools/c_translator/main.py \
  --src nethack-c/src/hack.c \
  --emit capability-summary \
  --out /tmp/hack.capability.json

# Multi-file capability matrix (for scale planning)
conda run -n base python tools/c_translator/capability_matrix.py \
  --src nethack-c/src/hack.c \
  --src nethack-c/src/monmove.c \
  --src nethack-c/src/zap.c \
  --out /tmp/translator.capability.matrix.json

# Note: default excludes apply from
# tools/c_translator/rulesets/translation_scope_excluded_sources.json
# (tests/fixtures + non-gameplay C subsystems). Use --no-exclude-sources
# when intentionally running fixture-only translation checks.

# Batch emit-helper generation (hundreds-scale sweeps)
conda run -n base python tools/c_translator/batch_emit.py \
  --src nethack-c/src/hack.c \
  --src nethack-c/src/allmain.c \
  --src nethack-c/src/getpos.c \
  --out-dir /tmp/translator-batch \
  --summary-out /tmp/translator-batch-summary.json

# Select stitch-ready candidates from a batch summary
conda run -n base python tools/c_translator/select_candidates.py \
  --summary /tmp/translator-batch-summary.json \
  --out /tmp/translator-batch-candidates.json

# Optional: allow specific diag codes (for example CFG complexity review queue)
conda run -n base python tools/c_translator/select_candidates.py \
  --summary /tmp/translator-batch-summary.json \
  --allow-diag CFG_COMPLEXITY \
  --out /tmp/translator-batch-candidates-plus-cfg.json

# Find clean candidates that already map to exported runtime JS functions
conda run -n base python tools/c_translator/runtime_stitch_candidates.py \
  --summary /tmp/translator-batch-summary.json \
  --out /tmp/translator-runtime-stitch-candidates.json

# Optional: override default translator-scope excludes
conda run -n base python tools/c_translator/runtime_stitch_candidates.py \
  --summary /tmp/translator-batch-summary.json \
  --exclude-sources-file tools/c_translator/rulesets/translation_scope_excluded_sources.json \
  --out /tmp/translator-runtime-stitch-candidates-gameplay.json

# Heuristic safety lint for runtime candidates (unknown callee detection)
conda run -n base python tools/c_translator/runtime_candidate_safety.py \
  --candidates /tmp/translator-runtime-stitch-candidates.json \
  --out /tmp/translator-runtime-stitch-safety.json

# Apply runtime-safe candidates into JS modules (dry run by default)
conda run -n base python tools/c_translator/runtime_stitch_apply.py \
  --safety /tmp/translator-runtime-stitch-safety.json \
  --repo-root .

# Write stitched updates
conda run -n base python tools/c_translator/runtime_stitch_apply.py \
  --safety /tmp/translator-runtime-stitch-safety.json \
  --repo-root . \
  --write

# Optional: skip known-bad auto-translations while stitching
conda run -n base python tools/c_translator/runtime_stitch_apply.py \
  --safety /tmp/translator-runtime-stitch-safety.json \
  --repo-root . \
  --denylist tools/c_translator/runtime_stitch_denylist.json \
  --write

# Optional: strict allowlist stitch (only listed pairs will be applied)
conda run -n base python tools/c_translator/runtime_stitch_apply.py \
  --safety /tmp/translator-runtime-stitch-safety.json \
  --repo-root . \
  --allowlist /tmp/translator-allowlist.json \
  --write

# Build a refactor queue from rejected safety/signature candidates
# (capture stitch dry-run JSON first)
conda run -n base python tools/c_translator/runtime_stitch_apply.py \
  --safety /tmp/translator-runtime-stitch-safety.json \
  --repo-root . \
  --denylist tools/c_translator/runtime_stitch_denylist.json \
  > /tmp/translator-runtime-stitch-apply.json
conda run -n base python tools/c_translator/refactor_queue.py \
  --safety /tmp/translator-runtime-stitch-safety.json \
  --apply-summary /tmp/translator-runtime-stitch-apply.json \
  --out /tmp/translator-refactor-queue.json

# Hunt non-mechanical aliases and missing import/binding candidates
conda run -n base python tools/c_translator/identifier_hunt.py \
  --queue /tmp/translator-refactor-queue.json \
  --out /tmp/translator-identifier-hunt.json

# Audit currently-marked autotranslations against current pipeline categories
conda run -n base python tools/c_translator/audit_marked_autotranslations.py \
  --repo-root . \
  --summary /tmp/translator-batch-full-summary.json \
  --candidates /tmp/translator-runtime-stitch-candidates-full.json \
  --safety /tmp/translator-runtime-stitch-safety-full.json \
  --apply-summary /tmp/translator-runtime-stitch-apply-full.json \
  --out /tmp/marked-autotranslation-audit.json

# Audit C out-param and Sprintf/Snprintf patterns from batch metadata
conda run -n base python tools/c_translator/audit_outparams_and_formatting.py \
  --summary /tmp/translator-batch-full-summary.json \
  --safety /tmp/translator-runtime-stitch-safety-full.json \
  --out /tmp/outparam-format-audit.json
```
Notes:
- `runtime_candidate_safety.py` now auto-detects strict alias matches where a
C identifier differs only by case/underscore from an existing module symbol.
- It also consumes curated non-mechanical alias rules from
`tools/c_translator/rulesets/identifier_aliases.json`.
- It now also rejects known semantic trap patterns even when syntax is valid:
pointer-style truthy loops, NUL-sentinel scalar writes, and whole-string
`highc/lowc` rewrites.
- It also supports module-level semantic blocking via
`tools/c_translator/rulesets/semantic_block_modules.json` for files whose C
pointer/string idioms are not yet safely lowerable to JS.
- `refactor_queue.py` emits these as `rename_alias` tasks so we can prioritize
canonical renames separately from true missing identifiers.
4. Translator policy/annotation checks (Node scripts):

   ```bash
   npm run -s translator:check-policy
   npm run -s translator:check-annotations
   ```

- Core parity test loops:

  ```bash
  npm run -s test:unit
  npm run -s test:session
  # C-parity-session-only coverage report
  npm run -s coverage:session-parity
  ```
Coverage details:
- Session-parity-only coverage design and usage are documented in COVERAGE.md.
Notes:
- On some hosts, `/usr/bin/python3` may not include `pip` or `clang` bindings.
- Prefer `conda run -n base python ...` for `tools/c_translator/*` commands to ensure `clang.cindex` is available.
Quick Start
```bash
# Install dependencies
npm install

# Run the game locally
npm run serve
# Open http://localhost:8080

# Run all fast tests (unit + session)
npm test

# Run everything (unit + session + e2e)
npm run test:all
```
Manual Keylog Recording (Canonical)
For manual C keylog recordings used by parity fixtures, use the recorder wrapper
instead of launching nethack directly. The wrapper sets deterministic env,
creates a controlled .nethackrc, and cleans stale save/lock/bones state.
```bash
# Tutorial manual recording example
./record-manual-capture.sh \
  --seed=7 \
  --role=Wizard \
  --race=human \
  --gender=male \
  --align=neutral \
  --no-wizard \
  --tutorial \
  --tmux-socket=default \
  --keylog=test/comparison/keylogs/seed7_tutorial_manual_wizard.jsonl
```
Then convert to a v3 comparison session:
```bash
python3 test/comparison/c-harness/keylog_to_session.py \
  --in test/comparison/keylogs/seed7_tutorial_manual_wizard.jsonl \
  --out test/comparison/sessions/seed7_tutorial_manual_wizard_gameplay.session.json \
  --startup-mode auto \
  --tutorial on \
  --wizard auto
```
When conversion replay itself is under investigation, capture a v3 session directly during manual play (no second replay pass):
```bash
python3 test/comparison/c-harness/record_manual_session_v3.py \
  --seed=8 \
  --name=Tutes \
  --role=Wizard \
  --race=human \
  --gender=male \
  --align=neutral \
  --symset=DECgraphics \
  --tutorial-option=unset
```
For reproducible C-only capture from an existing keylog, use integrated autofeed mode (exact-byte tmux injection) instead of a separate feeder:
```bash
python3 test/comparison/c-harness/record_manual_session_v3.py \
  --autofeed \
  --seed=7 \
  --name=Tutes \
  --role=Wizard \
  --race=human \
  --gender=male \
  --align=neutral \
  --symset=DECgraphics \
  --tutorial-option=on \
  --autofeed-keylog=test/comparison/keylogs/seed7_tutorial_manual_wizard.jsonl \
  --keylog=test/comparison/keylogs/seed7_tutorial_autofeed_direct.jsonl \
  --output-session=test/comparison/sessions/seed7_tutorial_manual_wizard_gameplay.session.json
```
The command reports key integrity (`AUTOFEED_KEY_MISMATCH_AT=...`) so you can
verify captured keys match the source stream.
To verify tutorial coverage quickly:
```bash
node scripts/tutorial-coverage.mjs \
  test/comparison/sessions/seed7_tutorial_manual_wizard_gameplay.session.json
```
Project Structure
See DESIGN.md for the complete module architecture and C-to-JS correspondence mapping. This guide focuses on workflows and commands.
For a quick reference: the js/ directory contains 32 ES6 modules organized by subsystem (Core, Display & I/O, RNG, World Generation, Creatures, Objects, etc.), each with comments linking to C source files. The test/ directory contains unit tests, E2E browser tests, and C-comparison session tests.
Running Tests
“You feel as if someone is testing you.”
Test Tiers
```bash
npm test          # Core: ~12s — 20 unit files + 18 high-signal sessions
npm run test:all  # Full: ~90s — all 3700+ unit tests + 570 sessions
```
`npm test` is fast enough to run after every edit. Run `npm run test:all`
before pushing.
Individual Test Commands
```bash
# Unit tests — module-level correctness
npm run test:unit

# Core sessions only (no units)
npm run test:core

# E2E tests — browser rendering via Puppeteer
npm run test:e2e

# Session comparison — replay C reference sessions
npm run test:session

# Single session
node test/comparison/session_test_runner.js <session-path>

# Dump raw JS replay trace from a C gameplay session
npm run replay:dump -- test/comparison/sessions/<file>.session.json --out /tmp/<file>.js-replay.json

# Compare C gameplay session vs JS replay (generated or --js file)
npm run session:compare -- test/comparison/sessions/<file>.session.json
```
Timeout policy (hang detection):
- `npm run test:unit` enforces a `1000ms` timeout per unit test.
- Single-session replay runs (`node test/comparison/session_test_runner.js <session>`) enforce a `10000ms` timeout per session by default.
- `session_test_runner` runs the full selected set by default; add `--fail-fast` only when you explicitly want to stop on first failure.
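The timeout behavior described above can be approximated with a small Promise-race wrapper. This is an illustrative sketch, not the runners' actual implementation; `withTimeout` is a hypothetical name:

```javascript
// Hedged sketch of per-test hang detection: race the test body's promise
// against a timer and reject on timeout. Illustrative only.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A hung test body (a promise that never settles) rejects after `ms` milliseconds instead of stalling the whole run.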
Systematic Stall CPU Diagnosis
When a session times out and appears CPU-bound/live-locked, use the profiler wrapper:
```bash
node scripts/replay_stall_diagnose.mjs \
  --session seed325_knight_wizard_gameplay \
  --timeout-ms 12000 \
  --top 20
```
What it does:
- Runs `session_test_runner` under Node `--cpu-prof`.
- Stores artifacts in `tmp/stall-diagnose/<timestamp>/`.
- Writes: `run.log` (full replay/test output), `summary.txt` (top self-sample functions and files), `summary.json` (machine-readable summary).
This makes hotspot triage repeatable across agents and avoids ad-hoc manual profiling.
Fast Parity Triage Loop (Comparison Artifacts)
Use this loop when chasing a specific seed/session divergence.
- Reproduce one session and emit comparison artifacts:

  ```bash
  node test/comparison/session_test_runner.js --verbose \
    test/comparison/sessions/seed110_samurai_selfplay200_gameplay.session.json
  ```

- List artifacts in the latest run:

  ```bash
  node scripts/comparison-window.mjs --list
  ```

- Inspect first divergence windows (RNG/event):

  ```bash
  node scripts/comparison-window.mjs --session seed110_samurai_selfplay200_gameplay --channel rng --window 12
  node scripts/comparison-window.mjs --session seed110_samurai_selfplay200_gameplay --channel event --window 12
  ```

- Inspect an explicit index:

  ```bash
  node scripts/comparison-window.mjs --session seed110_samurai_selfplay200_gameplay \
    --channel event --index 1076 --window 10
  ```

- Inspect per-step turn-accounting drift (RNG/event counts):

  ```bash
  node scripts/comparison-window.mjs --session seed110_samurai_selfplay200_gameplay \
    --step-summary --step-from 186 --step-to 200
  ```
Notes:
- Artifacts are written under `tmp/session-comparisons/<run-id>/`.
- `tmp/session-comparisons/LATEST` points to the most recent run.
- `comparison-window` supports `--session` and `--file`. When `--dir` is not supplied, it searches recent runs for the requested target if it is not present in the latest run.
- Use these artifacts to localize real gameplay bugs in core `js/` modules; do not patch comparator/harness behavior to hide mismatches.
Timing-window triage:
- Strict parity (`screens`, `colors`) remains authoritative for pass/fail.
- Non-gating timing-window metrics (`screenWindow`, `colorWindow`) additionally check JS animation boundary frames for a match within the same step.
- If strict mismatch + window match occurs, the result includes `rerecordHint` with candidate steps. Treat this as a capture-timing signal, not as gameplay parity success.
- Re-record those sessions after adding per-turn delay overrides to the session (for C capture):

  ```json
  { "regen": { "key_delays_s": { "106": 0.25 } } }
  ```

- Or annotate the step directly (preferred for persistent per-step intent):

  ```json
  { "steps": [ ..., { "key": "h", "capture": { "key_delay_s": 0.25 } } ] }
  ```

- Then run:

  ```bash
  python3 test/comparison/c-harness/rerecord.py \
    test/comparison/sessions/seed110_samurai_selfplay200_gameplay.session.json
  ```

- For legacy sessions that omitted explicit `--More--` dismiss keys, enable migration mode to record them into the session transcript:

  ```bash
  python3 test/comparison/c-harness/run_session.py <seed> <out.json> <moves> --record-more-spaces
  ```
Replay Boundary (Core vs Harness)
Keep gameplay and UI semantics in core runtime modules (js/), not in
comparison orchestration.
- Core runtime owns command behavior, modal flows, rendering, and state transitions.
- Replay/comparison harness owns input driving, capture, normalization, and diff reporting.
- If replay needs a special case, prefer a generic capture policy (for example, display-only acknowledgement frames) over per-command behavior forks.
- For gameplay screen text diffs, prefer ANSI-cell-derived plain rows when ANSI capture is available; avoid comparator-side column-shift heuristics. Use plain-line DEC decoding only as a legacy fallback when ANSI is unavailable.
- For interface screen text diffs, compare normalized rows directly (no left-shift fallback matching).
Session Tests In Detail
The session runner auto-discovers all `*.session.json` files in
`test/comparison/sessions/` and `test/comparison/maps/` and verifies
JS output against C-captured reference data:
| Session Type | What It Tests | Example |
|---|---|---|
| `"map"` (source: `c`) | typGrid match + RNG traces + structural validation | `seed16_maps_c.session.json` |
| `"gameplay"` | Startup typGrid + per-step RNG traces + screen rendering | `seed42.session.json` |
| `"chargen"` | Character creation reference data (screens, RNG, inventory) | `seed42_chargen_valkyrie.session.json` |
Map sessions generate levels 1→5 sequentially on one RNG stream (matching C’s behavior). Each level is checked for:
- Cell-by-cell typGrid match against C
- RNG call count match (when `rngCalls` present)
- Per-call RNG trace match (when `rng` present)
- Wall completeness, corridor connectivity, stairs placement
Gameplay parity uses an explicit two-phase architecture:
- Recorder: run JS from C-captured input keys and capture raw JS trace.
- Comparator: compare recorded JS trace to C session using comparator policy.
See COMPARISON_PIPELINE.md for module-level details.
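The two-phase shape can be pictured with a minimal sketch. Function names here (`recordTrace`, `compareTrace`) are illustrative, not the repo's actual API:

```javascript
// Phase 1 (recorder): drive the game with C-captured keys, capturing raw
// JS output only — no comparison logic here.
function recordTrace(keys, step) {
  return keys.map(key => ({ key, frame: step(key) }));
}

// Phase 2 (comparator): diff the recorded JS trace against the C session
// and report the first divergent index.
function compareTrace(jsTrace, cTrace) {
  const n = Math.min(jsTrace.length, cTrace.length);
  for (let i = 0; i < n; i++) {
    if (jsTrace[i].frame !== cTrace[i].frame) {
      return { match: false, firstDivergence: i };
    }
  }
  return { match: true, firstDivergence: -1 };
}
```

Keeping recording and comparison separate means comparator policy can change without re-running the game.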
C Comparison (optional, slower)
```bash
# One-time setup: clone, patch, and build the C binary
bash test/comparison/c-harness/setup.sh

# Regenerate all C sessions and maps from seeds.json config
python3 test/comparison/c-harness/run_session.py --from-config
python3 test/comparison/c-harness/gen_map_sessions.py --from-config

# Generate character creation sessions for all 13 roles
python3 test/comparison/c-harness/gen_chargen_sessions.py --from-config

# Or capture a single seed/role manually
python3 test/comparison/c-harness/gen_map_sessions.py 42 5 --with-rng
python3 test/comparison/c-harness/gen_chargen_sessions.py 42 v h f n Valkyrie

# Session runner auto-discovers all C-captured files
npm run test:session
```
Common Development Tasks
Unified Backlog Intake
Use one project backlog for all work, with labels for classification.
- Capture candidate issues from:
- failing tests/sessions and CI regressions,
- C-to-JS audit/coverage gaps,
- manual playtesting findings,
- selfplay findings,
- release blockers and user/developer bug reports.
- Classify every new issue with labels.
  - Use `parity` for C-vs-JS divergence/parity work.
  - Add other domain labels as appropriate (`selfplay`, `infra`, `docs`, etc.).
- Keep new issues unowned by default.
  - Add `agent:<name>` only when an agent actively claims the issue.
- Use evidence-first issue bodies for `parity` issues.
  - Include seed/session/command, first mismatch point, and expected vs actual behavior.
Modal Guard (Single-Threaded Contract)
C NetHack is single-threaded. When more() calls wgetch(), the CPU
blocks and nothing else executes until the key is pressed. In JS,
await nhgetch() yields the event loop, allowing any pending Promise
continuation to fire — breaking the single-threaded contract.
js/modal_guard.js enforces the C contract with runtime assertions:
- Modal owners (`more`, `yn`, `getlin`, `getdir`, `menu`, `getobj`) call `enterModal(name)` before awaiting input and `exitModal(name)` after.
- Game mutation points (`moveloop_core`, `movemon`, `mattacku`, `domove_core`, `rhack`) call `assertNotInModal(name)` at entry.
- If game code runs while a modal is active, the guard throws with a diagnostic showing both the violation point and the modal entry stack.
The most common bug this catches: an async function called without
await. The orphaned Promise’s continuation fires during an unrelated
await yield later, violating execution ordering.
```bash
# Enabled by default. Disable for production if needed:
WEBHACK_MODAL_GUARD=0 node ...

# Enable entry stack traces for debugging (collects Error().stack on
# every enterModal call — expensive, off by default):
WEBHACK_MODAL_GUARD_TRACE=1 node ...
```
Nested modals are valid and supported (stack-based) — e.g., more()
inside putstr_message inside ynFunction. The guard catches only
non-modal game code running during a modal wait.
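The guard's core mechanism is a modal-owner stack plus an entry assertion, roughly as sketched below. This illustrates the contract described above; the real `js/modal_guard.js` adds env gating and richer diagnostics:

```javascript
// Stack of currently active modal owners, innermost last.
const modalStack = [];

function enterModal(name) {
  modalStack.push(name);
}

function exitModal(name) {
  const top = modalStack.pop();
  // Mismatched enter/exit pairs indicate a structural bug of their own.
  if (top !== name) throw new Error(`modal stack imbalance: ${top} vs ${name}`);
}

// Called at entry to game-mutation points; throws if any modal is waiting.
function assertNotInModal(site) {
  if (modalStack.length > 0) {
    throw new Error(
      `${site} ran while modal [${modalStack.join(' > ')}] is waiting for input`);
  }
}
```

Nesting works naturally: each nested modal pushes onto the stack, and only truly non-modal game code trips the assertion.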
C Parity Policy
When working on C-vs-JS parity, follow this rule:
- Use failing unit/session tests to decide what to work on next.
- Treat session replay results as the primary gameplay parity authority; use unit tests as focused guardrails.
- Use C source code (`nethack-c/src/*.c`) as the behavior spec.
- Do not “fix to the trace” with JS-only heuristics when C code disagrees.
- If a test reveals missing behavior, port the corresponding C logic path.
- Keep changes incremental and keep tests green after each port batch.
Iron Parity Campaign Workflow
For state-canonicalization and translator campaign work, use the Iron Parity docs as the planning authority:
- IRON_PARITY_PLAN.md
- STRUCTURES.md
- C_TRANSLATOR_ARCHITECTURE_SPEC.md
- C_TRANSLATOR_PARSER_IMPLEMENTATION_SPEC.md
Required gating expectations for campaign changes:
- policy classification remains complete for all `js/*.js` files:

  ```bash
  npm run -s translator:check-policy
  npm run -s translator:check-annotations
  ```

- parity/test gates still apply (session replay evidence remains authoritative).
- no harness-side suppression of gameplay mismatches.
- campaign milestone naming should use shared `M0..M6` IDs from `IRON_PARITY_PLAN.md` when documenting progress.
Iron Parity GitHub workflow:
- Create or maintain these issue tiers:
  - one campaign tracker epic (`IRON_PARITY: Campaign Tracker (M0-M6)`),
  - one milestone issue per `M0..M6`,
  - subsystem implementation issues linked to the milestone issue.
- Use labels:
  - required: `parity`, `campaign:iron-parity`,
  - one scope label: `state`, `translator`, `animation`, `parity-test`, `docs`, `infra`,
  - optional active owner: `agent:<name>`.
- Include evidence-first parity body fields: `Session/Seed`, `First mismatch (step/index/channel)`, `Expected C`, `Actual JS`, `Suspected origin (file:function)`.
- Use dependency links: `Blocked by #<milestone-or-prereq>`, `Blocks #<downstream>`.
Parity Backlog Intake Loop
Use this workflow whenever session tests are failing and backlog intake needs to be refreshed:
- Run the parity suites and capture failures:

  ```bash
  npm run test:session
  ```

- Group failures by shared first-divergence signature (same subsystem/caller).
- File one `parity` GitHub issue per systematic cluster with an evidence-first body:
  - session filename(s)
  - first mismatch point (step/index/row)
  - JS vs C expected behavior
- Prioritize issues by:
  - blast radius (how many sessions share it)
  - earliness (how soon divergence starts)
  - leverage (whether one fix likely collapses multiple failures)
- Repeat after each landed fix; close stale/superseded issues immediately.
Evidence template for each issue:
```text
Session: seedXYZ_*.session.json
First mismatch: rng step=<...> index=<...> OR screen step=<...> row=<...>
JS: <actual>
C: <expected>
Caller (if present): <function(file:line)>
```
Tutorial Parity Notes
Recent parity work on tutorial sessions established a few stable rules:
- Tutorial status rows should use `Tutorial:<level>` instead of `Dlvl:<level>`.
- Tutorial startup/replay should expose `Xp`-style status output for parity with captured interface sessions.
- `nh.parse_config("OPTIONS=...")` options used by tutorial scripts now feed map flags (`mention_walls`, `mention_decor`, `lit_corridor`) so movement/rendering behavior follows script intent rather than ad-hoc tutorial checks.
- Blocked wall movement now keys off `mention_walls` behavior and matches C tutorial captures (`It's a wall.`).
With those in place, tutorial interface screen matching is now complete in the
manual tutorial session. The remaining first mismatch is RNG-only: an early
`nhl_random` (`rn2(100)`) divergence immediately after the first tutorial
`mktrap` call.
Inventory-Letter Parity Notes
For C-faithful pickup lettering (`assigninvlet` behavior):
- A dropped item can keep its prior `invlet` when picked back up, as long as that letter is currently free in inventory.
- If that carried-over `invlet` collides with an in-use letter, inventory-letter assignment falls back to rotated `lastinvnr` allocation.
- This affects visible pickup messages (`<invlet> - <item>.`) even when RNG remains fully aligned.
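The rule above can be sketched as follows. `assignInvlet` is a hypothetical helper for illustration; the repo's actual `assigninvlet` port will differ in detail:

```javascript
const LETTERS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// priorLetter: the item's carried-over invlet (or null); inUse: Set of
// letters currently assigned in inventory; lastinvnr: rotation cursor.
function assignInvlet(priorLetter, inUse, lastinvnr) {
  // Keep the prior letter if it is still free.
  if (priorLetter && !inUse.has(priorLetter)) {
    return { invlet: priorLetter, lastinvnr };
  }
  // Otherwise fall back to a rotated scan starting after the cursor.
  for (let i = 1; i <= LETTERS.length; i++) {
    const idx = (lastinvnr + i) % LETTERS.length;
    if (!inUse.has(LETTERS[idx])) {
      return { invlet: LETTERS[idx], lastinvnr: idx };
    }
  }
  return { invlet: '#', lastinvnr };  // overflow: all 52 letters in use
}
```

The visible effect is exactly the parity concern above: which letter appears in the `<invlet> - <item>.` pickup message.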
Tourist Session Parity Notes (seed6, non-wizard)
Recent work on test/comparison/sessions/seed6_tourist_gameplay.session.json
established these practical replay/parity rules:
- Throw prompt `?/*` in this trace is not generic help; it opens an in-prompt inventory overlay (right-side menu) and keeps prompt flow pending until an explicit dismiss key.
- Apply prompt `?/*` in this trace also behaves as an in-prompt modal list: while the list is open, non-listed keys are ignored and only listed apply candidate letters should be accepted as selection keys.
- Outside that `?/*` list mode, apply prompt letter selection is broader than the suggestion list; selecting non-suggested inventory letters can still hit C-style fallback text (`Sorry, I don't know how to use that.`).
- `FLINT` should not be treated as apply-eligible in normal prompt candidate filtering; this was causing false-positive apply prompts in wizard sessions.
- Tourist credit-card apply path is direction-driven (`In what direction?`); invalid non-wizard direction input must report `What a strange direction! Never mind.`.
- In wizard mode, invalid direction input for that apply-direction path is silent (no `Never mind.` topline).
- `$` must route to C-style wallet reporting (`Your wallet contains N zorkmid(s).`).
- `:` on an empty square should report `You see no objects here.` in this trace.
- Throw prompt suggestion letters follow C’s class-filtered set (coins always; weapons when not slinging; gems/stones when slinging; exclude worn/equipped items). This only affects prompt text; manual letter entry is still allowed.
- For throw/inventory overlay parity, cap right-side overlay offset at column `41` (`offx <= 41`) rather than pure `cols - maxcol - 2`; C tty commonly clamps here for these menu windows.
- For unresolved `i` inventory-menu steps, use captured screen frames as authoritative in replay; JS-only re-rendering can shift overlay columns when item detail text differs (`(being worn)`, tin contents, etc.).
- Overlay dismiss must clear the right-side menu region before re-showing the throw prompt, or stale menu rows leak into later captured frames.
- Read prompt `?/*` is a modal `--More--` listing flow; non-dismiss keys keep the listing frame until `space`/`enter`/`esc` returns to the prompt.
- In AT_WEAP melee flow, monsters can spend a turn wielding a carried weapon (`The goblin wields a crude dagger!`) before the first hit roll.
- In AT_WEAP melee hit/miss messaging, session parity expects the C-style pre-hit weapon phrase on the same topline as the hit result (for example `The goblin thrusts her crude dagger. The goblin hits!`), so replay must preserve this pre-hit text in weapon attack flows.
- Correct AT_WEAP possessive phrasing depends on monster sex state from creation (`mon.female`), plus C-style object naming (`xname`) for the wielded weapon’s appearance name (`crude dagger` vs discovered object name).
- AT_WEAP melee damage must include wielded-weapon `dmgval` (`rnd(sdam)`) after base `d(1,4)` damage; omitting that call shifts later knockback/runmode RNG.
- In AT_WEAP ranged flow, monster projectiles must consume `minvent` stacks and land on floor squares; otherwise later pet `dog_goal` object scans miss `dogfood()` -> `obj_resists` calls and RNG diverges downstream.
- Potion quaff healing must follow C `healup()` overflow semantics: when healing exceeds current max HP, increase max HP by potion-specific `nxtra` and clamp current HP to the new max. Without this, full-HP `extra healing` quaffs show transient status-row HP drift even when message/RNG flow matches.
- `--More--`-split steps and extended-command (`#...`) typing frames in this session are best handled as capture-authoritative replay frames (screen parity first) when they carry no gameplay state progression.
- `session_test_runner` gameplay divergence `step` values are 1-based (same indexing used by `rng_step_diff --step N`), so step numbers can be copied directly between tools without adding/subtracting 1.
- Extended-command shorthand Enter synthesis should only apply to letter keys; treating control keys (for example `Esc`) as shorthand can leak a stray `Enter` into the input queue and misalign subsequent command prompts.
- Double `m` prefix should cancel silently (clear `menuRequested` with no message) to match C command-prefix behavior.
- Some C captures mix left-side map glyphs and right-side overlay text on the same row (for example inventory category headers). Preserve raw column alignment from core rendering; do not apply tmux col-0 compensation in the comparator.
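The `healup()` overflow rule in the list above can be sketched like this. Illustrative only; parameter names follow the C description (`nxtra` for the potion-specific max-HP bonus), not necessarily the repo's port:

```javascript
// C healup() overflow semantics for potion quaffing: if healing would
// exceed the current max HP, raise max HP by nxtra, then clamp.
function healup(hp, hpmax, heal, nxtra) {
  hp += heal;
  if (hp > hpmax) {
    hpmax += nxtra;               // potion-specific max-HP bonus on overflow
    if (hp > hpmax) hp = hpmax;   // clamp current HP to the new max
  }
  return { hp, hpmax };
}
```

Without the clamp-after-bonus ordering, a full-HP `extra healing` quaff shows transient status-row HP drift even when RNG stays aligned.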
Measured progress in the latest pass:
- First divergence moved from early AT_WEAP messaging drift (step `605`) to a late monster-turn RNG boundary (step `760`, `distfleeck` context).
- Current metrics: `rng=10447/14063`, `screens=1071/1284`, `colors=29988/30776`.
- Current frontier is late-turn monster/replay boundary alignment in the tourist non-wizard session (first visible map drift at step `761`).
Modifying the dungeon generator
- Make your changes in `js/dungeon.js` (or related modules)
- Run `npm run test:session` — failures show exactly which cells changed and at which seed/depth
- If the change is intentional and matches C, the C reference data doesn’t change. If the C binary also changed, regenerate:

  ```bash
  python3 test/comparison/c-harness/run_session.py --from-config
  python3 test/comparison/c-harness/gen_map_sessions.py --from-config
  ```
Debugging C-vs-JS divergence
“You are hit by a divergent RNG stream! You feel disoriented.”
C map sessions with RNG traces are pre-generated for difficult seeds
(configured in test/comparison/seeds.json). The traces include caller
function names for readability:
```text
rn2(2)=0 @ randomize_gem_colors(o_init.c:88)
rn2(11)=9 @ shuffle(o_init.c:128)
```
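When triaging traces ad hoc, it can help to split lines of this format into fields. A hedged sketch (`parseTraceLine` is a hypothetical helper, not a repo tool):

```javascript
// Matches "rn2(11)=9 @ shuffle(o_init.c:128)"; the caller tag is optional.
const TRACE_RE = /^(\w+)\((\d+)\)=(-?\d+)(?: @ (\w+)\(([^:]+):(\d+)\))?$/;

function parseTraceLine(line) {
  const m = TRACE_RE.exec(line.trim());
  if (!m) return null;
  return {
    fn: m[1],                         // RNG entry point, e.g. rn2
    arg: Number(m[2]),                // bound argument
    result: Number(m[3]),             // drawn value
    caller: m[4] || null,             // caller tag when present
    file: m[5] || null,
    line: m[6] ? Number(m[6]) : null,
  };
}
```

Parsed fields make it easy to diff C and JS streams on `fn`/`arg`/`result` while grouping by `caller`.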
When a map doesn’t match C:
```bash
# The session runner compares per-call and reports the first mismatch:
# RNG diverges at call 1449: JS="rn2(100)=37" session="rn2(1000)=377"
npm run test:session

# To regenerate C traces after a patch change:
python3 test/comparison/c-harness/gen_map_sessions.py --from-config
```
Diagnostic Tools for RNG Divergence
Two specialized tools help isolate RNG divergence at specific game turns:
First-response workflow (recommended):
```bash
# 1) Reproduce one failing session with caller context on JS RNG entries.
node test/comparison/session_test_runner.js --verbose \
  test/comparison/sessions/seed202_barbarian_wizard.session.json

# 2) Drill into the exact divergent step with a local windowed diff.
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/seed202_barbarian_wizard.session.json \
  --step 16 --window 8
```
Notes:
- Caller tags are on by default in replay/session tooling (`@ caller(file:line)`).
- Parent/grandparent context (`<= ... <= ...`) is on by default with caller tags.
- Set `RNG_LOG_TAGS=0` to disable caller tags (faster, shorter logs).
- Set `RNG_LOG_PARENT=0` to disable parent/grandparent context for shorter lines.
- `rng_step_diff.js` already forces caller tags; export `RNG_LOG_TAGS=1` explicitly only when using other runners that override it.
test/comparison/rng_step_diff.js — Step-level C-vs-JS RNG caller diff
Replays a session in JS and compares the RNG stream against captured C data. By
default it compares a specific step; use `--phase startup` to compare startup
RNG (useful when the first step already starts divergent).
```bash
# Inspect first divergence on tutorial accept step
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/manual/interface_tutorial.session.json \
  --step 1 --window 3

# Inspect startup-phase divergence (pre-step RNG drift)
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/manual/interface_tutorial.session.json \
  --phase startup --window 5

# Example output:
# first divergence index=5
# >> [5] JS=rn2(100)=27 | C=rn2(100)=97
#    JS raw: rn2(100)=27 @ percent(sp_lev.js:6607)
#    C raw: rn2(100)=97 @ nhl_random(nhlua.c:948)
```
Use when: session_test_runner reports a mismatch and you need exact
call-site context at the first divergent RNG call within a specific step.
For tutorial-specific RNG drift, two debug env flags are available:
```bash
# Log non-counted raw PRNG advances in JS RNG log output.
WEBHACK_LOG_RAW_ADVANCES=1 \
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/manual/interface_tutorial.session.json \
  --step 1 --window 8

# Override raw draws before first tutorial percent() call.
# Default is N=2; set this env var to compare other values.
WEBHACK_TUT_EXTRA_RAW_BEFORE_PERCENT=0 \
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/manual/interface_tutorial.session.json \
  --step 1 --window 3
```
selfplay/runner/pet_rng_probe.js — Per-turn RNG delta comparison
Compares RNG call counts between C and JS implementations on a per-turn basis, with filtering for specific subsystems (e.g., dog movement). Runs both C (via tmux) and JS (headless) simultaneously and shows where RNG consumption diverges.
```bash
# Compare first 9 turns for seed 13296
node selfplay/runner/pet_rng_probe.js --seed 13296 --turns 9

# Show detailed RNG logs for specific turns
node selfplay/runner/pet_rng_probe.js --seed 13296 --turns 20 --show-turn 7 --show-turn 8

# Output shows per-turn RNG call counts and dog_move specific calls:
# Turn | C rng calls | JS rng calls
#    1 |          37 |           37
#    7 |          12 |           16   <- divergence detected
```
Use when: RNG traces show divergence but you need to pinpoint exactly which turn and which subsystem (monster movement, item generation, etc.) is responsible.
Throw-Replay Lore (Pet/Throw Parity)
For monster thrown-weapon parity, do not assume one thrwmu() is fully
resolved inside one captured input step.
- In C sessions, a throw often appears as a multi-step sequence:
  - an initial step with top line `"The <monster> throws ..."` and one `rn2(5)` at `m_throw(mthrowu.c:772)`,
  - later key steps that continue the projectile (`rn2(5)` again, and sometimes `thitu()`/`dmgval()` rolls).
- This pattern is easy to see in `seed110_samurai_selfplay200.session.json` and `seed206_monk_wizard.session.json`.
- Practical implication: if JS resolves full projectile flight/hit/drop in a single turn, it can create false-looking RNG and map glyph drift even when the message text seems close.
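One quick way to triage this is to scan each captured step's RNG lines for `m_throw` call sites. A hedged sketch (the per-step array-of-lines shape is assumed from the session JSON format):

```javascript
// Hypothetical helper: given one array of RNG-log lines per input step,
// report which steps contain m_throw calls. Hits spread across several
// steps indicate a throw resolving over multiple captured steps.
function stepsTouchingThrow(stepRngLines) {
  return stepRngLines
    .map((lines, i) => ({
      step: i,
      hits: lines.filter((l) => l.includes('m_throw')).length,
    }))
    .filter((e) => e.hits > 0);
}
```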
selfplay/runner/trace_compare.js — C trace vs JS behavior comparison
Replays a captured C selfplay trace in JS headless mode and compares turn-by-turn behavior (actions, position, HP, dungeon level). Supports position offsets for cases where maps differ slightly but gameplay is similar.
# Compare C trace against JS headless replay
node selfplay/runner/trace_compare.js --trace traces/captured/trace_13296_valkyrie_score43.json
# Compare with position offset adjustment
node selfplay/runner/trace_compare.js --trace traces/captured/trace_79.json --dx 1 --dy 0
# Ignore position mismatches (focus on actions/HP)
node selfplay/runner/trace_compare.js --trace traces/captured/trace_79.json --ignore-position
# Save JS trace for later inspection
node selfplay/runner/trace_compare.js --trace traces/captured/trace_79.json --output /tmp/js_trace.json
# Output shows first 20 mismatches:
# turn 7: diffs=action,position C={"action":"explore","position":{"x":40,"y":11},"hp":14,"hpmax":14,"dlvl":1} JS={"action":"rest","position":{"x":40,"y":12},"hp":14,"hpmax":14,"dlvl":1}
Use when: You have a C selfplay trace showing interesting behavior (combat, prayer, item usage) and want to verify JS reproduces the same decision-making and outcomes.
Typical workflow:
- Capture C selfplay trace showing the divergence or interesting behavior
- Run
trace_compare.jsto see when JS behavior diverges from C - Use
pet_rng_probe.jsto identify which turn RNG consumption differs - Add targeted RNG logging around the suspicious code path
- Compare RNG logs to find the extra/missing call
ANSI/Color Parity Gotchas
Recent gameplay color parity work surfaced a few high-impact pitfalls:
- For session comparisons, preserve true ANSI source lines.
  - `test/comparison/session_loader.js:getSessionScreenAnsiLines()` must prefer `screenAnsi` when both `screen` and `screenAnsi` are present.
  - If this regresses, color checks silently compare against plain text and report misleading `fg=7` mismatches.
- Headless ANSI export must map color index `8` (NO_COLOR) to SGR `90` (and bg `100`), not fall back to `37`.
  - Missing this mapping produces persistent `7 -> 8` color deltas even when the in-memory screen color grid is correct.
- Overlay inventory category headers are inverse-video in C captures.
  - Render `Weapons`/`Armor`/... heading rows with `attr=1` in overlay menus.
- Up-stairs (`<`) use yellow/gold color in captures, while down-stairs (`>`) remain gray in these flows.
- Remembered room floor cells are compared as NO_COLOR tone, while remembered walls/doors retain terrain colors.
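The NO_COLOR pitfall boils down to one index-to-SGR mapping. A minimal sketch under assumed conventions (the repo's actual export code may structure this differently):

```javascript
// Map a 0-15 color index to an SGR color parameter. Index 8 is NO_COLOR
// and must become bright-black 90 (bg 100), not fall back to white 37.
function colorToSgr(idx, isBg = false) {
  let code;
  if (idx === 8) code = 90;            // NO_COLOR -> bright black
  else if (idx < 8) code = 30 + idx;   // normal colors -> 30-37
  else code = 90 + (idx - 8);          // bright colors -> 90-97
  return isBg ? code + 10 : code;      // backgrounds are fg code + 10
}
```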
CORE vs DISP RNG Audits
For display-only RNG investigations (C rn2_on_display_rng / newsym_rn2 paths),
follow the focused playbook in:
docs/plans/RNG_DISPRNG_AUDIT_PLAN.md
Use that workflow before adding any new RNG infrastructure. The default policy is:
- port C logic first;
- add DISP-specific tracing only when repeated first-divergence evidence points to display paths.
Adding a new test seed
- Add the seed to `test/comparison/seeds.json`:
  - `map_seeds.with_rng.c` for C map sessions with RNG traces
  - `session_seeds.sessions` for full gameplay sessions
  - `chargen_seeds.sessions` for character creation sessions
- Regenerate: `python3 test/comparison/c-harness/gen_map_sessions.py --from-config`
- `npm run test:session` auto-discovers the new file
Character creation sessions
Chargen sessions capture the full interactive character creation sequence
for all 13 roles, recording every keystroke (including --More-- as space),
screen state, RNG traces, and starting inventory. Configuration is in
`seeds.json` under `chargen_seeds`:
# Generate all 13 roles
python3 test/comparison/c-harness/gen_chargen_sessions.py --from-config
# Or a single role
python3 test/comparison/c-harness/gen_chargen_sessions.py 42 v h f n Valkyrie
The script adaptively navigates the character creation menus, handling cases where menus are auto-skipped (e.g., Knight has only one valid race and alignment). Each session includes the typGrid and inventory display for comparison with JS.
Regenerating monster/object data
The monster and object tables are auto-generated from C headers:
python3 scripts/generators/gen_monsters.py
python3 scripts/generators/gen_objects.py
python3 scripts/generators/gen_artifacts.py
python3 scripts/generators/gen_constants.py
# Inspect unresolved/deferred header macros with missing dependency details
python3 scripts/generators/gen_constants.py --report-deferred
# Same deferred report as machine-readable JSON
python3 scripts/generators/gen_constants.py --report-deferred-json
# JSON report includes:
# - details[] with missingDeps/rootMissingDeps per deferred macro
# - rootBlockers[] with ownerHint (likely leaf owner)
# - ownerSummary[] and unknownOwnerBlockers[] for ownership coverage checks
# Export report JSON to docs/metrics/deferred_constants_report_latest.json
python3 scripts/generators/export_deferred_constants_report.py
# npm aliases
npm run constants:report
npm run constants:report:write
Converting Lua special levels to JavaScript
NetHack 3.7 uses Lua scripts for special level generation (Castle, Asmodeus, Oracle,
etc.). The `tools/lua_to_js.py` converter translates these to JavaScript modules:
# Convert a single level
python3 tools/lua_to_js.py nethack-c/dat/asmodeus.lua > js/levels/asmodeus.js
# Regenerate all converted levels (131 Lua files → 38 active JS files)
for lua_file in nethack-c/dat/*.lua; do
base=$(basename "$lua_file" .lua)
# Convert names: bigrm-XX to bigroom-XX
js_name=$(echo "$base" | sed 's/^bigrm-/bigroom-/')
python3 tools/lua_to_js.py "$lua_file" > "js/levels/$js_name.js"
done
What the converter handles
The converter performs careful syntax translation to preserve game semantics:
String handling:
- Lua multiline strings `[[ ... ]]` → JavaScript template literals `` `...` ``
- Backticks inside multiline strings are escaped: `` `liberated` `` → `` \`liberated\` ``
- Template literals are protected from regex replacements during expression conversion
- Regular quoted strings (`"..."`, `'...'`) are preserved as-is
Comments:
- Lua comments `--` → JavaScript comments `//`
- Comment detection uses string tracking to avoid false matches inside strings
Expression conversion:
- String concatenation: `..` → `+`
- Logical operators: `and` → `&&`, `or` → `||`, `not` → `!`
- Method calls: `obj:method()` → `obj.method()`
- Inequality: `~=` → `!==`
- Equality: `==` → `===`
- Table length: `#tbl` → `tbl.length`
- Boolean/null: `nil` → `null`
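The operator substitutions above can be sketched as a chain of regex passes. This is a toy model only: unlike the real converter, it does no string or template-literal tracking, so it is safe solely on bare expressions:

```javascript
// Toy Lua -> JS expression conversion. Order matters: `~=` must become
// `!==` before `==` is widened, and the lookbehind keeps the `==` inside
// an already-emitted `!==` from being touched.
function convertExpr(lua) {
  return lua
    .replace(/~=/g, '!==')                    // inequality
    .replace(/(?<![!=<>])==(?!=)/g, '===')    // equality (skips !==)
    .replace(/\s*\.\.\s*/g, ' + ')            // string concatenation
    .replace(/\band\b/g, '&&')
    .replace(/\bor\b/g, '||')
    .replace(/\bnot\b/g, '!')
    .replace(/\bnil\b/g, 'null')
    .replace(/#(\w+)/g, '$1.length')          // table length operator
    .replace(/(\w+):(\w+)\(/g, '$1.$2(');     // method-call sugar
}
```

For example, `convertExpr('#inv ~= 0 and pet:eat(food)')` yields `inv.length !== 0 && pet.eat(food)`.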
Control flow:
for i = 1, n do ... end→for (let i = 1; i <= n; i++) { ... }if ... then ... end→if (...) { ... }function name() ... end→function name() { ... }
Data structures:
- Arrays:
{ 1, 2, 3 }→[ 1, 2, 3 ](simple arrays only) - Objects:
{ key = value }→{ key: value }
Special level DSL:
- Preserves
des.*calls as-is (same API between Lua and JS) - Handles nested
des.map({ ..., contents: function() { ... } })structures - Maintains proper statement boundaries with depth tracking
Known limitations
- Template literals with
${}interpolation syntax would break (none found in NetHack Lua) - Complex nested table expressions may need manual adjustment
- Assumes
des.*functions have identical signatures between Lua and JS
Debugging converter issues
When a converted file has problems:
- Check ASCII maps — dots becoming `+` means template literal protection failed
- Check comments — comments eating code means statement splitting is wrong
- Check syntax errors — unbalanced braces usually means multiline collection broke
- Run all Lua files — `for f in nethack-c/dat/*.lua; do python3 tools/lua_to_js.py "$f" > /tmp/test.js || echo "FAILED: $f"; done`
The converter tracks several state machines simultaneously:
- String tracking (single/double quote detection)
- Brace/paren depth (for multiline call collection)
- Template literal extraction (to protect from regex corruption)
- Comment context (to avoid converting `--` inside strings)
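The string-tracking idea behind the comment state machine fits in a few lines. A simplified model (the real converter in `tools/lua_to_js.py` also handles multiline `[[ ... ]]` strings):

```javascript
// Find where a Lua `--` comment starts on one line, ignoring any `--`
// that occurs inside single- or double-quoted strings. Returns -1 if none.
function commentStart(line) {
  let quote = null;                        // active string delimiter, or null
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (quote) {
      if (ch === '\\') i++;                // skip escaped character in string
      else if (ch === quote) quote = null; // string closed
    } else if (ch === '"' || ch === "'") {
      quote = ch;                          // string opened
    } else if (ch === '-' && line[i + 1] === '-') {
      return i;                            // genuine comment marker
    }
  }
  return -1;
}
```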
Adding a new C patch
Patches live in `test/comparison/c-harness/patches/` and are applied by
`setup.sh`. To add one:
- Make changes under `nethack-c/`.
- Export a numbered patch into `test/comparison/c-harness/patches/`, e.g. `cd nethack-c && git diff > ../test/comparison/c-harness/patches/017-your-patch.patch` (use the next free number after the existing 001–016 series).
- Run `bash test/comparison/c-harness/setup.sh` to verify apply/build/install.
The C Harness
“You hear the rumble of distant compilation.”
The C harness builds a patched NetHack 3.7 binary for ground-truth comparison.
The C source is frozen at commit 79c688cc6 and never modified directly —
only numbered patches in test/comparison/c-harness/patches/ are applied on top
(001 through 016 as of 2026-03-02).
Core harness capabilities come from:
001-deterministic-seed.patch — Seed control via NETHACK_SEED.
002-fixed-datetime-for-replay.patch and
011-fix-ubirthday-with-getnow.patch — Fixed datetime support for replay
determinism, including shopkeeper-name ubirthday parity.
003-map-dumper.patch — #dumpmap wizard command for raw typ grids.
004-prng-logging.patch, 009-midlog-infrastructure.patch, and
010-lua-rnglog-caller-context.patch — high-fidelity RNG tracing with
caller context.
005-obj-dumper.patch and 008-checkpoint-snapshots.patch — object
and full-checkpoint state dumps for step-local divergence debugging.
Why raw terrain grids instead of terminal output?
A | on screen could be VWALL, TLWALL, TRWALL, or GRAVE. The raw typ integers
are unambiguous. Terminal output also depends on FOV (the player can’t see most
of the map), and requires ANSI escape stripping. Integer grids are faster,
simpler, and definitive.
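Comparing integer grids is then a trivial cell-by-cell scan. An illustrative sketch (row-major arrays of typ integers are assumed as the grid shape):

```javascript
// Diff two typ grids, returning every cell where the C and JS terrain
// codes disagree. Zero diffs means the maps are identical.
function diffTypGrids(cGrid, jsGrid) {
  const diffs = [];
  for (let y = 0; y < cGrid.length; y++) {
    for (let x = 0; x < cGrid[y].length; x++) {
      if (cGrid[y][x] !== jsGrid[y][x]) {
        diffs.push({ x, y, c: cGrid[y][x], js: jsGrid[y][x] });
      }
    }
  }
  return diffs;
}
```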
Setup gotchas
- Lua is required — NetHack 3.7 embeds Lua. `setup.sh` runs `make fetch-lua`.
- Wizard mode — `sysconf` must have `WIZARDS=*`. The script sets this.
- Stale game state — Lock files (`501wizard.0`), saves, and bones from crashed tmux sessions cause "Destroy old game?" prompts. All harness scripts clean these up before each run.
- Parallel Lua build race — `make -j` can race on `liblua.a`. The script builds Lua separately first.
Event Logging
“You hear a distant clanking sound.”
Event logging tracks game-state mutations (object placement, monster death,
pickup/drop, engravings, traps) on both C and JS sides for divergence
diagnosis. Events are ^-prefixed lines interleaved with the RNG log.
How it works
C side: The 012-event-logging.patch adds event_log() calls at
centralized bottleneck functions (mondead, mpickobj, mdrop_obj,
place_object, mkcorpstat, dog_eat, maketrap, deltrap,
make_engr_at, del_engr, wipe_engr_at). These write ^event[args]
lines to the RNG log file.
JS side: The same bottleneck functions call
pushRngLogEntry('^event[args]') to append event entries to the step’s
RNG array. The centralized functions live in:
| Function | File | Purpose |
|---|---|---|
| `mondead(mon, map)` | `js/monutil.js` | Monster death — logs `^die`, drops inventory |
| `mpickobj(mon, obj)` | `js/monutil.js` | Monster pickup — logs `^pickup` |
| `mdrop_obj(mon, obj, map)` | `js/monutil.js` | Monster drop — logs `^drop` |
| `placeFloorObject(map, obj)` | `js/floor_objects.js` | Object on floor — logs `^place` |
| `removeFloorObject(map, obj)` | `js/floor_objects.js` | Object off floor — logs `^remove` |
| `make_engr_at(...)` | `js/engrave.js` | Engraving created — logs `^engr` |
| `del_engr(...)` | `js/engrave.js` | Engraving deleted — logs `^dengr` |
| `wipe_engr_at(...)` | `js/engrave.js` | Engraving eroded — logs `^wipe` |
Event types
| Event | Format | Meaning |
|---|---|---|
| `^die[mndx@x,y]` | monster index, position | Monster died |
| `^pickup[mndx@x,y,otyp]` | monster, position, object type | Monster picked up object |
| `^drop[mndx@x,y,otyp]` | monster, position, object type | Monster dropped object |
| `^place[otyp,x,y]` | object type, position | Object placed on floor |
| `^remove[otyp,x,y]` | object type, position | Object removed from floor |
| `^corpse[corpsenm,x,y]` | corpse monster, position | Corpse created |
| `^eat[mndx@x,y,otyp]` | monster, position, object type | Monster ate object |
| `^trap[ttyp,x,y]` | trap type, position | Trap created |
| `^dtrap[ttyp,x,y]` | trap type, position | Trap deleted |
| `^engr[type,x,y]` | engrave type, position | Engraving created |
| `^dengr[x,y]` | position | Engraving deleted |
| `^wipe[x,y]` | position | Engraving wiped/eroded |
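A comparator only needs the `^` prefix plus the bracketed argument list to classify these lines. An illustrative parse (the real comparator's internals may differ):

```javascript
// Classify one RNG-log line: event entries start with '^'; everything
// else is treated as a normal RNG call. Event args sit inside [...].
function parseLogLine(line) {
  if (!line.startsWith('^')) return { kind: 'rng', raw: line };
  const m = line.match(/^\^(\w+)\[([^\]]*)\]$/);
  if (!m) return { kind: 'event', raw: line };  // unrecognized event shape
  return { kind: 'event', type: m[1], args: m[2].split(',') };
}
```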
Using events for debugging
Events help diagnose state drift — when C and JS RNG diverge because game objects or monsters ended up in different positions. Instead of guessing where state went wrong, compare event sequences to see exactly which object placement or monster action differed.
# Run a session and look at event comparison
node test/comparison/session_test_runner.js --verbose \
test/comparison/sessions/seed42_gameplay.session.json
# Event mismatches appear in firstDivergences alongside rng/screen channels
Event comparison is informational only — mismatches don’t fail the test. This is intentional: events track state changes that JS may not yet implement identically (e.g., missing monster behaviors), and blocking on them would make RNG parity work harder to iterate on.
Adding new event types
To add a new event type:
- C side: Add `event_log("newevent[%d,%d]", x, y);` at the centralized function in the relevant `.c` file, and add it to `012-event-logging.patch`.
- JS side: Add `pushRngLogEntry('^newevent[...]')` at the corresponding centralized JS function.
- Regenerate sessions: `python3 test/comparison/c-harness/run_session.py --from-config`
- Events are automatically recognized by the comparator (any line starting with `^` is treated as an event).
Key design principle: centralized bottlenecks
All event logging happens in centralized bottleneck functions, never at
call sites. This mirrors C’s architecture where mondead(), mpickobj(),
and mdrop_obj() are the single points through which all deaths, pickups,
and drops flow. In JS:
- All 10 monster death sites call `mondead(mon, map)` — never set `mon.dead = true` directly.
- All monster pickup sites call `mpickobj(mon, obj)` — never call `addToMonsterInventory` directly for gameplay pickups.
- All monster drop sites call `mdrop_obj(mon, obj, map)` — never splice from `minvent` directly for gameplay drops.
Note: mondead drops inventory via placeFloorObject (producing ^place
events), NOT via mdrop_obj (which would produce ^drop events). This
matches C where relobj() calls place_object() directly.
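A stripped-down model of that bottleneck, with the collaborators injected so the sketch stays self-contained (names come from the table above; the real `mondead` in `js/monutil.js` does considerably more):

```javascript
// Minimal model of the centralized death bottleneck: log the ^die event,
// drop inventory via placeFloorObject (which emits its own ^place events,
// mirroring C's relobj() -> place_object() path), then mark the monster dead.
function mondead(mon, map, { pushRngLogEntry, placeFloorObject }) {
  pushRngLogEntry(`^die[${mon.mndx}@${mon.x},${mon.y}]`);
  for (const obj of mon.minvent.splice(0)) {  // empty minvent in place
    obj.x = mon.x;
    obj.y = mon.y;
    placeFloorObject(map, obj);
  }
  mon.dead = true;
}
```

Because every death site funnels through one function, the `^die`/`^place` ordering in the log is the same no matter which code path killed the monster.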
Architecture in 60 Seconds
“You read a blessed scroll of enlightenment.”
Game loop: js/nethack.js runs an async loop. Player input uses
await getChar() which yields to the browser event loop. Every function
that might need input must be async.
PRNG: js/isaac64.js produces bit-identical uint64 sequences to C.
js/rng.js wraps it with rn2(), rnd(), d(), etc. The RNG log
(enableRngLog() / getRngLog()) captures every call for comparison.
Level generation: initRng(seed) → initLevelGeneration() → makelevel(depth) → wallification(map). One continuous RNG stream across depths.
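The "one continuous stream" property can be demonstrated with a toy PRNG (the real generator is ISAAC64 in `js/isaac64.js`; this LCG only illustrates why reseeding between depths, or any extra or missing draw, would break parity):

```javascript
// Toy 64-bit LCG standing in for the real PRNG. Every depth draws from the
// same ongoing stream, so level N's layout depends on exactly how many
// draws levels 1..N-1 consumed; one stray call shifts everything after it.
function makeRng(seed) {
  let s = BigInt(seed);
  return () => {
    s = (s * 6364136223846793005n + 1442695040888963407n) & 0xffffffffffffffffn;
    return Number(s >> 33n);   // upper bits as a plain number
  };
}
const rng = makeRng(42);
const perDepthDraws = [1, 2, 3].map((depth) => ({ depth, firstDraw: rng() }));
```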
Display: <pre> element with per-cell <span> tags. DEC graphics
symbols mapped to Unicode box-drawing characters. No canvas, no WebGL.
Testing philosophy: Two layers of truth —
- ISAAC64 produces identical sequences (golden reference files)
- JS matches C cell-for-cell (C-captured sessions with RNG traces)
Code Conventions
- C references: Every ported function has `// C ref: filename.c:function_name()`
- ES6 modules: No build step, no bundler. Import directly in browser.
- No frameworks: Vanilla JS, vanilla DOM. The game ran in 1987 without React.
- Constants match C: `STONE`, `VWALL`, `ROOM`, etc. are identical values. See `js/const.js`.
Current Parity Findings (2026-02-18)
- `npm run test:session` currently reports a concentrated gameplay/wizard failure set (26 gameplay-session failures in the latest intake pass).
- Initial backlog intake from that pass is tracked in parity issues:
- #6 wizard command-flow prompt cancellation/modal consumption
- #7 wait/search safety counted no-op timing and messaging
- #8 pet combat sequencing/messages/RNG (dog_move/mattackm)
- #9 special-level generation RNG drift (dig_corridor/somex/makelevel)
- #10 object generation RNG ordering (rnd_attr/mksobj/mkobj/m_initweap)
- #11 gameplay map/glyph drift tied to pet/interactions
- Working rule: treat each issue above as a cluster root; avoid ad-hoc one-session fixes unless evidence shows it is truly isolated.
Further Reading
- DESIGN.md — Detailed architecture and module design
- DECISIONS.md — Design decision log with rationale
- SESSION_FORMAT.md — Session JSON format specification
- COLLECTING_SESSIONS.md — How to capture C reference sessions
- PHASE_1_PRNG_ALIGNMENT.md — The story of achieving bit-exact C-JS parity
- PHASE_2_GAMEPLAY_ALIGNMENT.md — Gameplay session alignment goals & progress
“You ascend to the next level of understanding. The strident call of a test suite echoes through the Mazes of Development. All tests pass!”