Development Guide

“Welcome, strider, to the Mazes of Development! You fall through a trap door into a large room filled with source code.”

See also: DESIGN.md (architecture) | DECISIONS.md (trade-offs) | PARITY_TEST_MATRIX.md (test suites & gates) | COMPARISON_PIPELINE.md (recorder/comparator flow) | LORE.md (porting lessons) | TESTING.md (test dashboard & workflows)

Prerequisites

For C comparison testing (optional):

Practical Setup (This Repo Runtime)

Use this as the default command setup for development and translator work in this repository.

  1. Check core tools:
    node -v
    npm -v
    python3 -V
    conda --version
    
  2. Use conda Python for translator tooling:
    # Confirm conda base has pip + clang bindings
    conda run -n base python -m pip --version
    conda run -n base python -c "import clang, clang.cindex; print('clang ok')"
    
  3. Run translator commands with conda Python:
    conda run -n base python tools/c_translator/main.py --help
    
    # Example parse summary
    conda run -n base python tools/c_translator/main.py \
      --src nethack-c/src/hack.c \
      --emit parse-summary \
      --out /tmp/hack.parse.json
    
    # File-wide translation capability summary
    conda run -n base python tools/c_translator/main.py \
      --src nethack-c/src/hack.c \
      --emit capability-summary \
      --out /tmp/hack.capability.json
    
    # Multi-file capability matrix (for scale planning)
    conda run -n base python tools/c_translator/capability_matrix.py \
      --src nethack-c/src/hack.c \
      --src nethack-c/src/monmove.c \
      --src nethack-c/src/zap.c \
      --out /tmp/translator.capability.matrix.json
    
    # Note: default excludes apply from
    #   tools/c_translator/rulesets/translation_scope_excluded_sources.json
    # (tests/fixtures + non-gameplay C subsystems). Use --no-exclude-sources
    # when intentionally running fixture-only translation checks.
    
    # Batch emit-helper generation (hundreds-scale sweeps)
    conda run -n base python tools/c_translator/batch_emit.py \
      --src nethack-c/src/hack.c \
      --src nethack-c/src/allmain.c \
      --src nethack-c/src/getpos.c \
      --out-dir /tmp/translator-batch \
      --summary-out /tmp/translator-batch-summary.json
    
    # Select stitch-ready candidates from a batch summary
    conda run -n base python tools/c_translator/select_candidates.py \
      --summary /tmp/translator-batch-summary.json \
      --out /tmp/translator-batch-candidates.json
    
    # Optional: allow specific diag codes (for example CFG complexity review queue)
    conda run -n base python tools/c_translator/select_candidates.py \
      --summary /tmp/translator-batch-summary.json \
      --allow-diag CFG_COMPLEXITY \
      --out /tmp/translator-batch-candidates-plus-cfg.json
    
    # Find clean candidates that already map to exported runtime JS functions
    conda run -n base python tools/c_translator/runtime_stitch_candidates.py \
      --summary /tmp/translator-batch-summary.json \
      --out /tmp/translator-runtime-stitch-candidates.json
    
    # Optional: override default translator-scope excludes
    conda run -n base python tools/c_translator/runtime_stitch_candidates.py \
      --summary /tmp/translator-batch-summary.json \
      --exclude-sources-file tools/c_translator/rulesets/translation_scope_excluded_sources.json \
      --out /tmp/translator-runtime-stitch-candidates-gameplay.json
    
    # Heuristic safety lint for runtime candidates (unknown callee detection)
    conda run -n base python tools/c_translator/runtime_candidate_safety.py \
      --candidates /tmp/translator-runtime-stitch-candidates.json \
      --out /tmp/translator-runtime-stitch-safety.json
    
    # Apply runtime-safe candidates into JS modules (dry run by default)
    conda run -n base python tools/c_translator/runtime_stitch_apply.py \
      --safety /tmp/translator-runtime-stitch-safety.json \
      --repo-root .
    
    # Write stitched updates
    conda run -n base python tools/c_translator/runtime_stitch_apply.py \
      --safety /tmp/translator-runtime-stitch-safety.json \
      --repo-root . \
      --write
    
    # Optional: skip known-bad auto-translations while stitching
    conda run -n base python tools/c_translator/runtime_stitch_apply.py \
      --safety /tmp/translator-runtime-stitch-safety.json \
      --repo-root . \
      --denylist tools/c_translator/runtime_stitch_denylist.json \
      --write
    
    # Optional: strict allowlist stitch (only listed pairs will be applied)
    conda run -n base python tools/c_translator/runtime_stitch_apply.py \
      --safety /tmp/translator-runtime-stitch-safety.json \
      --repo-root . \
      --allowlist /tmp/translator-allowlist.json \
      --write
    
    # Build a refactor queue from rejected safety/signature candidates
    # (capture stitch dry-run JSON first)
    conda run -n base python tools/c_translator/runtime_stitch_apply.py \
      --safety /tmp/translator-runtime-stitch-safety.json \
      --repo-root . \
      --denylist tools/c_translator/runtime_stitch_denylist.json \
      > /tmp/translator-runtime-stitch-apply.json
    conda run -n base python tools/c_translator/refactor_queue.py \
      --safety /tmp/translator-runtime-stitch-safety.json \
      --apply-summary /tmp/translator-runtime-stitch-apply.json \
      --out /tmp/translator-refactor-queue.json
    
    # Hunt non-mechanical aliases and missing import/binding candidates
    conda run -n base python tools/c_translator/identifier_hunt.py \
      --queue /tmp/translator-refactor-queue.json \
      --out /tmp/translator-identifier-hunt.json
    
    # Audit currently-marked autotranslations against current pipeline categories
    conda run -n base python tools/c_translator/audit_marked_autotranslations.py \
      --repo-root . \
      --summary /tmp/translator-batch-full-summary.json \
      --candidates /tmp/translator-runtime-stitch-candidates-full.json \
      --safety /tmp/translator-runtime-stitch-safety-full.json \
      --apply-summary /tmp/translator-runtime-stitch-apply-full.json \
      --out /tmp/marked-autotranslation-audit.json
    
    # Audit C out-param and Sprintf/Snprintf patterns from batch metadata
    conda run -n base python tools/c_translator/audit_outparams_and_formatting.py \
      --summary /tmp/translator-batch-full-summary.json \
      --safety /tmp/translator-runtime-stitch-safety-full.json \
      --out /tmp/outparam-format-audit.json

Notes:
- `runtime_candidate_safety.py` now auto-detects strict alias matches where a
  C identifier differs only by case/underscore from an existing module symbol.
- It also consumes curated non-mechanical alias rules from
  `tools/c_translator/rulesets/identifier_aliases.json`.
- It now also rejects known semantic trap patterns even when syntax is valid:
  pointer-style truthy loops, NUL-sentinel scalar writes, and whole-string
  `highc/lowc` rewrites.
- It also supports module-level semantic blocking via
  `tools/c_translator/rulesets/semantic_block_modules.json` for files whose C
  pointer/string idioms are not yet safely lowerable to JS.
- `refactor_queue.py` emits these as `rename_alias` tasks so we can prioritize
  canonical renames separately from true missing identifiers.

  4. Translator policy/annotation checks (Node scripts):
    npm run -s translator:check-policy
    npm run -s translator:check-annotations
    
  5. Core parity test loops:
    npm run -s test:unit
    npm run -s test:session
    # C-parity-session-only coverage report
    npm run -s coverage:session-parity
    

Coverage details:

Notes:

Quick Start

# Install dependencies
npm install

# Run the game locally
npm run serve
# Open http://localhost:8080

# Run all fast tests (unit + session)
npm test

# Run everything (unit + session + e2e)
npm run test:all

Manual Keylog Recording (Canonical)

For manual C keylog recordings used by parity fixtures, use the recorder wrapper instead of launching nethack directly. The wrapper sets deterministic env, creates a controlled .nethackrc, and cleans stale save/lock/bones state.

# Tutorial manual recording example
./record-manual-capture.sh \
  --seed=7 \
  --role=Wizard \
  --race=human \
  --gender=male \
  --align=neutral \
  --no-wizard \
  --tutorial \
  --tmux-socket=default \
  --keylog=test/comparison/keylogs/seed7_tutorial_manual_wizard.jsonl

Then convert to a v3 comparison session:

python3 test/comparison/c-harness/keylog_to_session.py \
  --in test/comparison/keylogs/seed7_tutorial_manual_wizard.jsonl \
  --out test/comparison/sessions/seed7_tutorial_manual_wizard_gameplay.session.json \
  --startup-mode auto \
  --tutorial on \
  --wizard auto

When conversion replay itself is under investigation, capture a v3 session directly during manual play (no second replay pass):

python3 test/comparison/c-harness/record_manual_session_v3.py \
  --seed=8 \
  --name=Tutes \
  --role=Wizard \
  --race=human \
  --gender=male \
  --align=neutral \
  --symset=DECgraphics \
  --tutorial-option=unset

For reproducible C-only capture from an existing keylog, use integrated autofeed mode (exact-byte tmux injection) instead of a separate feeder:

python3 test/comparison/c-harness/record_manual_session_v3.py \
  --autofeed \
  --seed=7 \
  --name=Tutes \
  --role=Wizard \
  --race=human \
  --gender=male \
  --align=neutral \
  --symset=DECgraphics \
  --tutorial-option=on \
  --autofeed-keylog=test/comparison/keylogs/seed7_tutorial_manual_wizard.jsonl \
  --keylog=test/comparison/keylogs/seed7_tutorial_autofeed_direct.jsonl \
  --output-session=test/comparison/sessions/seed7_tutorial_manual_wizard_gameplay.session.json

The command reports key integrity (AUTOFEED_KEY_MISMATCH_AT=...) so you can verify captured keys match the source stream.

To verify tutorial coverage quickly:

node scripts/tutorial-coverage.mjs \
  test/comparison/sessions/seed7_tutorial_manual_wizard_gameplay.session.json

Project Structure

See DESIGN.md for the complete module architecture and C-to-JS correspondence mapping. This guide focuses on workflows and commands.

For a quick reference: the js/ directory contains 32 ES6 modules organized by subsystem (Core, Display & I/O, RNG, World Generation, Creatures, Objects, etc.), each with comments linking to C source files. The test/ directory contains unit tests, E2E browser tests, and C-comparison session tests.

Running Tests

“You feel as if someone is testing you.”

Test Tiers

npm test            # Core: ~12s — 20 unit files + 18 high-signal sessions
npm run test:all    # Full: ~90s — all 3700+ unit tests + 570 sessions

`npm test` is fast enough to run after every edit; run `npm run test:all` before pushing.

Individual Test Commands

# Unit tests — module-level correctness
npm run test:unit

# Core sessions only (no units)
npm run test:core

# E2E tests — browser rendering via Puppeteer
npm run test:e2e

# Session comparison — replay C reference sessions
npm run test:session

# Single session
node test/comparison/session_test_runner.js <session-path>

# Dump raw JS replay trace from a C gameplay session
npm run replay:dump -- test/comparison/sessions/<file>.session.json --out /tmp/<file>.js-replay.json

# Compare C gameplay session vs JS replay (generated or --js file)
npm run session:compare -- test/comparison/sessions/<file>.session.json

Timeout policy (hang detection):

Systematic Stall CPU Diagnosis

When a session times out and appears CPU-bound/live-locked, use the profiler wrapper:

node scripts/replay_stall_diagnose.mjs \
  --session seed325_knight_wizard_gameplay \
  --timeout-ms 12000 \
  --top 20

What it does:

This makes hotspot triage repeatable across agents and avoids ad-hoc manual profiling.

Fast Parity Triage Loop (Comparison Artifacts)

Use this loop when chasing a specific seed/session divergence.

  1. Reproduce one session and emit comparison artifacts:
    node test/comparison/session_test_runner.js --verbose \
      test/comparison/sessions/seed110_samurai_selfplay200_gameplay.session.json
    
  2. List artifacts in the latest run:
    node scripts/comparison-window.mjs --list
    
  3. Inspect first divergence windows (RNG/event):
    node scripts/comparison-window.mjs --session seed110_samurai_selfplay200_gameplay --channel rng --window 12
    node scripts/comparison-window.mjs --session seed110_samurai_selfplay200_gameplay --channel event --window 12
    
  4. Inspect an explicit index:
    node scripts/comparison-window.mjs --session seed110_samurai_selfplay200_gameplay \
      --channel event --index 1076 --window 10
    
  5. Inspect per-step turn-accounting drift (RNG/event counts):
    node scripts/comparison-window.mjs --session seed110_samurai_selfplay200_gameplay \
      --step-summary --step-from 186 --step-to 200
    

Notes:

Timing-window triage:

Replay Boundary (Core vs Harness)

Keep gameplay and UI semantics in core runtime modules (js/), not in comparison orchestration.

Session Tests In Detail

The session runner auto-discovers all *.session.json files in test/comparison/sessions/ and test/comparison/maps/ and verifies JS output against C-captured reference data:

| Session Type | What It Tests | Example |
| --- | --- | --- |
| "map" (source: c) | typGrid match + RNG traces + structural validation | seed16_maps_c.session.json |
| "gameplay" | Startup typGrid + per-step RNG traces + screen rendering | seed42.session.json |
| "chargen" | Character creation reference data (screens, RNG, inventory) | seed42_chargen_valkyrie.session.json |

Map sessions generate levels 1→5 sequentially on one RNG stream (matching C’s behavior). Each level is checked for:

Gameplay parity uses an explicit two-phase architecture:

  1. Recorder: run JS from C-captured input keys and capture raw JS trace.
  2. Comparator: compare recorded JS trace to C session using comparator policy.

See COMPARISON_PIPELINE.md for module-level details.
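The comparator's core check can be sketched as a first-divergence scan over the two recorded trace streams. Everything below is illustrative (function name, trace shape); the actual comparator policy lives under test/comparison/:

```javascript
// Minimal sketch: find the first index where two flat trace streams
// (arrays of strings like "rn2(100)=37") disagree.
function firstDivergence(cTrace, jsTrace) {
  const len = Math.max(cTrace.length, jsTrace.length);
  for (let i = 0; i < len; i++) {
    if (cTrace[i] !== jsTrace[i]) {
      // '<missing>' marks a stream that ended early
      return { index: i, c: cTrace[i] ?? '<missing>', js: jsTrace[i] ?? '<missing>' };
    }
  }
  return null; // streams match end-to-end
}

const d = firstDivergence(
  ['rn2(2)=0', 'rn2(1000)=377'], // C side
  ['rn2(2)=0', 'rn2(100)=37'],   // JS side
);
console.log(d); // { index: 1, c: 'rn2(1000)=377', js: 'rn2(100)=37' }
```

This is also the shape of the runner's report line ("RNG diverges at call N: JS=... session=..."): one index, one entry from each side.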

C Comparison (optional, slower)

# One-time setup: clone, patch, and build the C binary
bash test/comparison/c-harness/setup.sh

# Regenerate all C sessions and maps from seeds.json config
python3 test/comparison/c-harness/run_session.py --from-config
python3 test/comparison/c-harness/gen_map_sessions.py --from-config

# Generate character creation sessions for all 13 roles
python3 test/comparison/c-harness/gen_chargen_sessions.py --from-config

# Or capture a single seed/role manually
python3 test/comparison/c-harness/gen_map_sessions.py 42 5 --with-rng
python3 test/comparison/c-harness/gen_chargen_sessions.py 42 v h f n Valkyrie

# Session runner auto-discovers all C-captured files
npm run test:session

Common Development Tasks

Unified Backlog Intake

Use one project backlog for all work, with labels for classification.

  1. Capture candidate issues from:
    • failing tests/sessions and CI regressions,
    • C-to-JS audit/coverage gaps,
    • manual playtesting findings,
    • selfplay findings,
    • release blockers and user/developer bug reports.
  2. Classify every new issue with labels.
    • Use parity for C-vs-JS divergence/parity work.
    • Add other domain labels as appropriate (selfplay, infra, docs, etc.).
  3. Keep new issues unowned by default.
    • Add agent:<name> only when an agent actively claims the issue.
  4. Use evidence-first issue bodies for parity issues.
    • Include seed/session/command, first mismatch point, and expected vs actual behavior.

C NetHack is single-threaded. When more() calls wgetch(), the process blocks and nothing else executes until a key is pressed. In JS, await nhgetch() yields to the event loop, allowing any pending Promise continuation to fire — breaking the single-threaded contract.

js/modal_guard.js enforces the C contract with runtime assertions:

The most common bug this catches: an async function called without await. The orphaned Promise’s continuation fires during an unrelated await yield later, violating execution ordering.

# Enabled by default. Disable for production if needed:
WEBHACK_MODAL_GUARD=0 node ...

# Enable entry stack traces for debugging (collects Error().stack on
# every enterModal call — expensive, off by default):
WEBHACK_MODAL_GUARD_TRACE=1 node ...

Nested modals are valid and supported (stack-based) — e.g., more() inside putstr_message inside ynFunction. The guard catches only non-modal game code running during a modal wait.
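The stack-based idea can be sketched in a few lines. This is a hypothetical shape, not the actual js/modal_guard.js API (names and error format are invented):

```javascript
// Sketch of a stack-based modal guard: modal waits push/pop a stack,
// and non-modal game code asserts the stack is empty before running.
const modalStack = [];

function enterModal(name) { modalStack.push(name); }
function exitModal() { modalStack.pop(); }

function assertNoModalWait(context) {
  if (modalStack.length > 0) {
    throw new Error(
      `${context} ran during modal wait: ${modalStack.join(' > ')}`);
  }
}

// Nested modals are fine: e.g. more() inside ynFunction.
enterModal('ynFunction');
enterModal('more');
exitModal();
exitModal();
assertNoModalWait('moveMonsters'); // ok — stack is empty again
```

An orphaned un-awaited async continuation that calls into game code while the stack is non-empty would trip the assertion, which is exactly the bug class described above.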

C Parity Policy

When working on C-vs-JS parity, follow this rule:

Iron Parity Campaign Workflow

For state-canonicalization and translator campaign work, use the Iron Parity docs as the planning authority:

  1. IRON_PARITY_PLAN.md
  2. STRUCTURES.md
  3. C_TRANSLATOR_ARCHITECTURE_SPEC.md
  4. C_TRANSLATOR_PARSER_IMPLEMENTATION_SPEC.md

Required gating expectations for campaign changes:

  1. policy classification remains complete for all js/*.js files:
    npm run -s translator:check-policy
    npm run -s translator:check-annotations
    
  2. parity/test gates still apply (session replay evidence remains authoritative).
  3. no harness-side suppression of gameplay mismatches.
  4. campaign milestone naming should use shared M0..M6 IDs from IRON_PARITY_PLAN.md when documenting progress.

Iron Parity GitHub workflow:

  1. Create or maintain these issue tiers:
    • one campaign tracker epic (IRON_PARITY: Campaign Tracker (M0-M6)),
    • one milestone issue per M0..M6,
    • subsystem implementation issues linked to the milestone issue.
  2. Use labels:
    • required: parity, campaign:iron-parity,
    • one scope label: state translator animation parity-test docs infra,
    • optional active owner: agent:<name>.
  3. Include evidence-first parity body fields:
    • Session/Seed,
    • First mismatch (step/index/channel),
    • Expected C,
    • Actual JS,
    • Suspected origin (file:function).
  4. Use dependency links:
    • Blocked by #<milestone-or-prereq>
    • Blocks #<downstream>

Parity Backlog Intake Loop

Use this workflow whenever session tests are failing and backlog intake needs to be refreshed:

  1. Run the parity suites and capture failures:
    npm run test:session
    
  2. Group failures by shared first-divergence signature (same subsystem/caller).
  3. File one parity GitHub issue per systematic cluster with evidence-first body:
    • session filename(s)
    • first mismatch point (step/index/row)
    • JS vs C expected behavior
  4. Prioritize issues by:
    • blast radius (how many sessions share it)
    • earliness (how soon divergence starts)
    • leverage (whether one fix likely collapses multiple failures)
  5. Repeat after each landed fix; close stale/superseded issues immediately.

Evidence template for each issue:

Session: seedXYZ_*.session.json
First mismatch: rng step=<...> index=<...> OR screen step=<...> row=<...>
JS: <actual>
C:  <expected>
Caller (if present): <function(file:line)>

Tutorial Parity Notes

Recent parity work on tutorial sessions established a few stable rules:

With those in place, tutorial interface screen matching is now complete in the manual tutorial session. The remaining first mismatch is RNG-only: an early nhl_random (rn2(100)) divergence immediately after the first tutorial mktrap call.

Inventory-Letter Parity Notes

For C-faithful pickup lettering (assigninvlet behavior):

Tourist Session Parity Notes (seed6, non-wizard)

Recent work on test/comparison/sessions/seed6_tourist_gameplay.session.json established these practical replay/parity rules:

Measured progress in the latest pass:

Modifying the dungeon generator

  1. Make your changes in js/dungeon.js (or related modules)
  2. Run npm run test:session — failures show exactly which cells changed and at which seed/depth
  3. If the change is intentional and matches C, the C reference data doesn’t change. If the C binary also changed, regenerate:
    python3 test/comparison/c-harness/run_session.py --from-config
    python3 test/comparison/c-harness/gen_map_sessions.py --from-config
    

Debugging C-vs-JS divergence

“You are hit by a divergent RNG stream! You feel disoriented.”

C map sessions with RNG traces are pre-generated for difficult seeds (configured in test/comparison/seeds.json). The traces include caller function names for readability:

rn2(2)=0 @ randomize_gem_colors(o_init.c:88)
rn2(11)=9 @ shuffle(o_init.c:128)

When a map doesn’t match C:

# The session runner compares per-call and reports the first mismatch:
#   RNG diverges at call 1449: JS="rn2(100)=37" session="rn2(1000)=377"
npm run test:session

# To regenerate C traces after a patch change:
python3 test/comparison/c-harness/gen_map_sessions.py --from-config

Diagnostic Tools for RNG Divergence

Two specialized tools help isolate RNG divergence at specific game turns:

First-response workflow (recommended):

# 1) Reproduce one failing session with caller context on JS RNG entries.
node test/comparison/session_test_runner.js --verbose \
  test/comparison/sessions/seed202_barbarian_wizard.session.json

# 2) Drill into the exact divergent step with a local windowed diff.
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/seed202_barbarian_wizard.session.json \
  --step 16 --window 8

Notes:

test/comparison/rng_step_diff.js — Step-level C-vs-JS RNG caller diff

Replays a session in JS and compares RNG stream against captured C data. By default it compares a specific step; use --phase startup to compare startup RNG (useful when the first step already starts divergent).

# Inspect first divergence on tutorial accept step
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/manual/interface_tutorial.session.json \
  --step 1 --window 3

# Inspect startup-phase divergence (pre-step RNG drift)
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/manual/interface_tutorial.session.json \
  --phase startup --window 5

# Example output:
# first divergence index=5
# >> [5] JS=rn2(100)=27 | C=rn2(100)=97
#      JS raw: rn2(100)=27 @ percent(sp_lev.js:6607)
#      C  raw: rn2(100)=97 @ nhl_random(nhlua.c:948)

Use when: session_test_runner reports a mismatch and you need exact call-site context at the first divergent RNG call within a specific step.
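The entry lines above have a regular shape (`fn(arg)=result @ caller(file:line)`), which makes ad-hoc filtering easy. A sketch of splitting one apart — the helper name is made up, not a repo API:

```javascript
// Parse one RNG-log entry like "rn2(100)=27 @ percent(sp_lev.js:6607)".
// The "@ caller(file:line)" suffix is optional.
function parseRngEntry(line) {
  const m = line.match(
    /^(\w+)\((\d+)\)=(-?\d+)(?:\s+@\s+(\w+)\(([^:]+):(\d+)\))?$/);
  if (!m) return null;
  return {
    fn: m[1], arg: Number(m[2]), result: Number(m[3]),
    caller: m[4] ?? null, file: m[5] ?? null,
    line: m[6] ? Number(m[6]) : null,
  };
}

console.log(parseRngEntry('rn2(100)=27 @ percent(sp_lev.js:6607)'));
// { fn: 'rn2', arg: 100, result: 27,
//   caller: 'percent', file: 'sp_lev.js', line: 6607 }
```

Useful for one-off triage scripts, e.g. grouping a divergence window's entries by caller before filing an evidence-first issue.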

For tutorial-specific RNG drift, two debug env flags are available:

# Log non-counted raw PRNG advances in JS RNG log output.
WEBHACK_LOG_RAW_ADVANCES=1 \
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/manual/interface_tutorial.session.json \
  --step 1 --window 8

# Override raw draws before first tutorial percent() call.
# Default is N=2; set this env var to compare other values.
WEBHACK_TUT_EXTRA_RAW_BEFORE_PERCENT=0 \
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/manual/interface_tutorial.session.json \
  --step 1 --window 3

selfplay/runner/pet_rng_probe.js — Per-turn RNG delta comparison

Compares RNG call counts between C and JS implementations on a per-turn basis, with filtering for specific subsystems (e.g., dog movement). Runs both C (via tmux) and JS (headless) simultaneously and shows where RNG consumption diverges.

# Compare first 9 turns for seed 13296
node selfplay/runner/pet_rng_probe.js --seed 13296 --turns 9

# Show detailed RNG logs for specific turns
node selfplay/runner/pet_rng_probe.js --seed 13296 --turns 20 --show-turn 7 --show-turn 8

# Output shows per-turn RNG call counts and dog_move specific calls:
# Turn | C rng calls | JS rng calls
#    1 |   37         |   37
#    7 |   12         |   16    <- divergence detected

Use when: RNG traces show divergence but you need to pinpoint exactly which turn and which subsystem (monster movement, item generation, etc.) is responsible.

Throw-Replay Lore (Pet/Throw Parity)

For monster thrown-weapon parity, do not assume one thrwmu() is fully resolved inside one captured input step.

selfplay/runner/trace_compare.js — C trace vs JS behavior comparison

Replays a captured C selfplay trace in JS headless mode and compares turn-by-turn behavior (actions, position, HP, dungeon level). Supports position offsets for cases where maps differ slightly but gameplay is similar.

# Compare C trace against JS headless replay
node selfplay/runner/trace_compare.js --trace traces/captured/trace_13296_valkyrie_score43.json

# Compare with position offset adjustment
node selfplay/runner/trace_compare.js --trace traces/captured/trace_79.json --dx 1 --dy 0

# Ignore position mismatches (focus on actions/HP)
node selfplay/runner/trace_compare.js --trace traces/captured/trace_79.json --ignore-position

# Save JS trace for later inspection
node selfplay/runner/trace_compare.js --trace traces/captured/trace_79.json --output /tmp/js_trace.json

# Output shows first 20 mismatches:
# turn 7: diffs=action,position C={"action":"explore","position":{"x":40,"y":11},"hp":14,"hpmax":14,"dlvl":1} JS={"action":"rest","position":{"x":40,"y":12},"hp":14,"hpmax":14,"dlvl":1}

Use when: You have a C selfplay trace showing interesting behavior (combat, prayer, item usage) and want to verify JS reproduces the same decision-making and outcomes.

Typical workflow:

  1. Capture C selfplay trace showing the divergence or interesting behavior
  2. Run trace_compare.js to see when JS behavior diverges from C
  3. Use pet_rng_probe.js to identify which turn RNG consumption differs
  4. Add targeted RNG logging around the suspicious code path
  5. Compare RNG logs to find the extra/missing call

ANSI/Color Parity Gotchas

Recent gameplay color parity work surfaced a few high-impact pitfalls:

CORE vs DISP RNG Audits

For display-only RNG investigations (C rn2_on_display_rng / newsym_rn2 paths), follow the focused playbook in:

Use that workflow before adding any new RNG infrastructure. The default policy is:

Adding a new test seed

  1. Add the seed to test/comparison/seeds.json:
    • map_seeds.with_rng.c for C map sessions with RNG traces
    • session_seeds.sessions for full gameplay sessions
    • chargen_seeds.sessions for character creation sessions
  2. Regenerate: python3 test/comparison/c-harness/gen_map_sessions.py --from-config
  3. npm run test:session auto-discovers the new file

Character creation sessions

Chargen sessions capture the full interactive character creation sequence for all 13 roles, recording every keystroke (including --More-- as space), screen state, RNG traces, and starting inventory. Configuration is in seeds.json under chargen_seeds:

# Generate all 13 roles
python3 test/comparison/c-harness/gen_chargen_sessions.py --from-config

# Or a single role
python3 test/comparison/c-harness/gen_chargen_sessions.py 42 v h f n Valkyrie

The script adaptively navigates the character creation menus, handling cases where menus are auto-skipped (e.g., Knight has only one valid race and alignment). Each session includes the typGrid and inventory display for comparison with JS.

Regenerating monster/object data

The monster and object tables are auto-generated from C headers:

python3 scripts/generators/gen_monsters.py
python3 scripts/generators/gen_objects.py
python3 scripts/generators/gen_artifacts.py
python3 scripts/generators/gen_constants.py

# Inspect unresolved/deferred header macros with missing dependency details
python3 scripts/generators/gen_constants.py --report-deferred

# Same deferred report as machine-readable JSON
python3 scripts/generators/gen_constants.py --report-deferred-json

# JSON report includes:
# - details[] with missingDeps/rootMissingDeps per deferred macro
# - rootBlockers[] with ownerHint (likely leaf owner)
# - ownerSummary[] and unknownOwnerBlockers[] for ownership coverage checks

# Export report JSON to docs/metrics/deferred_constants_report_latest.json
python3 scripts/generators/export_deferred_constants_report.py

# npm aliases
npm run constants:report
npm run constants:report:write

Converting Lua special levels to JavaScript

NetHack 3.7 uses Lua scripts for special level generation (Castle, Asmodeus, Oracle, etc.). The tools/lua_to_js.py converter translates these to JavaScript modules:

# Convert a single level
python3 tools/lua_to_js.py nethack-c/dat/asmodeus.lua > js/levels/asmodeus.js

# Regenerate all converted levels (131 Lua files → 38 active JS files)
for lua_file in nethack-c/dat/*.lua; do
    base=$(basename "$lua_file" .lua)
    # Convert names: bigrm-XX to bigroom-XX
    js_name=$(echo "$base" | sed 's/^bigrm-/bigroom-/')
    python3 tools/lua_to_js.py "$lua_file" > "js/levels/$js_name.js"
done

What the converter handles

The converter performs careful syntax translation to preserve game semantics:

String handling:

Comments:

Expression conversion:

Control flow:

Data structures:

Special level DSL:

Known limitations

Debugging converter issues

When a converted file has problems:

  1. Check ASCII maps — dots becoming + means template literal protection failed
  2. Check comments — comments eating code means statement splitting is wrong
  3. Check syntax errors — unbalanced braces usually means multiline collection broke
  4. Run all Lua files: for f in nethack-c/dat/*.lua; do python3 tools/lua_to_js.py "$f" > /tmp/test.js || echo "FAILED: $f"; done

The converter tracks several state machines simultaneously:

Adding a new C patch

Patches live in test/comparison/c-harness/patches/ and are applied by setup.sh. To add one:

  1. Make changes under nethack-c/.
  2. Export a numbered patch into test/comparison/c-harness/patches/, e.g. cd nethack-c && git diff > ../test/comparison/c-harness/patches/012-your-patch.patch
  3. Run bash test/comparison/c-harness/setup.sh to verify apply/build/install.

The C Harness

“You hear the rumble of distant compilation.”

The C harness builds a patched NetHack 3.7 binary for ground-truth comparison. The C source is frozen at commit 79c688cc6 and never modified directly — only numbered patches in test/comparison/c-harness/patches/ are applied on top (001 through 016 as of 2026-03-02).

Core harness capabilities come from:

001-deterministic-seed.patch — Seed control via NETHACK_SEED.

002-fixed-datetime-for-replay.patch and 011-fix-ubirthday-with-getnow.patch — Fixed datetime support for replay determinism, including shopkeeper-name ubirthday parity.

003-map-dumper.patch — #dumpmap wizard command for raw typ grids.

004-prng-logging.patch, 009-midlog-infrastructure.patch, and 010-lua-rnglog-caller-context.patch — high-fidelity RNG tracing with caller context.

005-obj-dumper.patch and 008-checkpoint-snapshots.patch — object and full-checkpoint state dumps for step-local divergence debugging.

Why raw terrain grids instead of terminal output?

A | on screen could be VWALL, TLWALL, TRWALL, or GRAVE. The raw typ integers are unambiguous. Terminal output also depends on FOV (the player can’t see most of the map), and requires ANSI escape stripping. Integer grids are faster, simpler, and definitive.
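The cell-for-cell check on raw typ grids reduces to a first-differing-cell scan. A minimal sketch (grids assumed equal size; the integer values here are invented, not real typ constants):

```javascript
// Find the first cell whose integer terrain type differs between
// the C-captured grid and the JS-generated grid.
function firstCellDiff(cGrid, jsGrid) {
  for (let y = 0; y < cGrid.length; y++) {
    for (let x = 0; x < cGrid[y].length; x++) {
      if (cGrid[y][x] !== jsGrid[y][x]) {
        return { x, y, c: cGrid[y][x], js: jsGrid[y][x] };
      }
    }
  }
  return null; // grids identical
}

const cGrid = [[0, 1], [2, 3]];
const jsGrid = [[0, 1], [2, 4]];
console.log(firstCellDiff(cGrid, jsGrid)); // { x: 1, y: 1, c: 3, js: 4 }
```

Because the grids hold unambiguous typ integers rather than rendered glyphs, a single differing cell pinpoints the terrain mismatch directly.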

Setup gotchas

Event Logging

“You hear a distant clanking sound.”

Event logging tracks game-state mutations (object placement, monster death, pickup/drop, engravings, traps) on both C and JS sides for divergence diagnosis. Events are ^-prefixed lines interleaved with the RNG log.

How it works

C side: The 012-event-logging.patch adds event_log() calls at centralized bottleneck functions (mondead, mpickobj, mdrop_obj, place_object, mkcorpstat, dog_eat, maketrap, deltrap, make_engr_at, del_engr, wipe_engr_at). These write ^event[args] lines to the RNG log file.

JS side: The same bottleneck functions call pushRngLogEntry('^event[args]') to append event entries to the step’s RNG array. The centralized functions live in:

| Function | File | Purpose |
| --- | --- | --- |
| mondead(mon, map) | js/monutil.js | Monster death — logs ^die, drops inventory |
| mpickobj(mon, obj) | js/monutil.js | Monster pickup — logs ^pickup |
| mdrop_obj(mon, obj, map) | js/monutil.js | Monster drop — logs ^drop |
| placeFloorObject(map, obj) | js/floor_objects.js | Object on floor — logs ^place |
| removeFloorObject(map, obj) | js/floor_objects.js | Object off floor — logs ^remove |
| make_engr_at(...) | js/engrave.js | Engraving created — logs ^engr |
| del_engr(...) | js/engrave.js | Engraving deleted — logs ^dengr |
| wipe_engr_at(...) | js/engrave.js | Engraving eroded — logs ^wipe |

Event types

| Event | Format | Meaning |
| --- | --- | --- |
| ^die[mndx@x,y] | monster index, position | Monster died |
| ^pickup[mndx@x,y,otyp] | monster, position, object type | Monster picked up object |
| ^drop[mndx@x,y,otyp] | monster, position, object type | Monster dropped object |
| ^place[otyp,x,y] | object type, position | Object placed on floor |
| ^remove[otyp,x,y] | object type, position | Object removed from floor |
| ^corpse[corpsenm,x,y] | corpse monster, position | Corpse created |
| ^eat[mndx@x,y,otyp] | monster, position, object type | Monster ate object |
| ^trap[ttyp,x,y] | trap type, position | Trap created |
| ^dtrap[ttyp,x,y] | trap type, position | Trap deleted |
| ^engr[type,x,y] | engrave type, position | Engraving created |
| ^dengr[x,y] | position | Engraving deleted |
| ^wipe[x,y] | position | Engraving wiped/eroded |
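Any log line starting with ^ is an event; a sketch of splitting one into type and arguments (the helper name is illustrative, not a repo function):

```javascript
// Split a ^-prefixed event line like "^pickup[5@12,7,204]" into its
// event type and argument list; both @ and , act as separators.
function parseEventLine(line) {
  const m = line.match(/^\^(\w+)\[([^\]]*)\]$/);
  if (!m) return null;
  return { type: m[1], args: m[2].split(/[@,]/) };
}

console.log(parseEventLine('^pickup[5@12,7,204]'));
// { type: 'pickup', args: [ '5', '12', '7', '204' ] }
console.log(parseEventLine('rn2(2)=0')); // null — a plain RNG line, not an event
```

Splitting events out of an interleaved RNG log this way is handy when comparing C and JS event sequences by hand.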

Using events for debugging

Events help diagnose state drift — when C and JS RNG diverge because game objects or monsters ended up in different positions. Instead of guessing where state went wrong, compare event sequences to see exactly which object placement or monster action differed.

# Run a session and look at event comparison
node test/comparison/session_test_runner.js --verbose \
  test/comparison/sessions/seed42_gameplay.session.json
# Event mismatches appear in firstDivergences alongside rng/screen channels

Event comparison is informational only — mismatches don’t fail the test. This is intentional: events track state changes that JS may not yet implement identically (e.g., missing monster behaviors), and blocking on them would make RNG parity work harder to iterate on.

Adding new event types

To add a new event type:

  1. C side: Add event_log("newevent[%d,%d]", x, y); at the centralized function in the relevant .c file, and add it to 012-event-logging.patch.
  2. JS side: Add pushRngLogEntry('^newevent[...]') at the corresponding centralized JS function.
  3. Regenerate sessions: python3 test/comparison/c-harness/run_session.py --from-config
  4. Events are automatically recognized by the comparator (any line starting with ^ is treated as an event).

Key design principle: centralized bottlenecks

All event logging happens in centralized bottleneck functions, never at call sites. This mirrors C’s architecture where mondead(), mpickobj(), and mdrop_obj() are the single points through which all deaths, pickups, and drops flow. In JS:

Note: mondead drops inventory via placeFloorObject (producing ^place events), NOT via mdrop_obj (which would produce ^drop events). This matches C where relobj() calls place_object() directly.

Architecture in 60 Seconds

“You read a blessed scroll of enlightenment.”

Game loop: js/nethack.js runs an async loop. Player input uses await getChar() which yields to the browser event loop. Every function that might need input must be async.

PRNG: js/isaac64.js produces bit-identical uint64 sequences to C. js/rng.js wraps it with rn2(), rnd(), d(), etc. The RNG log (enableRngLog() / getRngLog()) captures every call for comparison.
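The wrapper layering follows C's conventions: rn2(x) is uniform in [0, x), rnd(x) in [1, x], and d(n, x) sums n rolls of rnd(x). A sketch over a stand-in generator — the LCG below only illustrates the layering and is NOT bit-compatible with ISAAC64 or the real js/rng.js:

```javascript
// Stand-in 64-bit PRNG (an LCG); the real code draws from js/isaac64.js.
let state = 42n;
function rawRandom() {
  state = (state * 6364136223846793005n + 1442695040888963407n)
    & 0xffffffffffffffffn;
  return state;
}

const rn2 = (x) => Number(rawRandom() % BigInt(x)); // 0 .. x-1
const rnd = (x) => rn2(x) + 1;                      // 1 .. x
const d = (n, x) => {                               // roll n x-sided dice
  let total = 0;
  for (let i = 0; i < n; i++) total += rnd(x);
  return total;
};

console.log(rn2(100), rnd(6), d(2, 6));
```

Keeping rnd and d defined strictly in terms of rn2 is what lets one raw draw stream account for every RNG-log entry.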

Level generation: initRng(seed) → initLevelGeneration() → makelevel(depth) → wallification(map). One continuous RNG stream across depths.

Display: <pre> element with per-cell <span> tags. DEC graphics symbols mapped to Unicode box-drawing characters. No canvas, no WebGL.

Testing philosophy: Two layers of truth —

  1. ISAAC64 produces identical sequences (golden reference files)
  2. JS matches C cell-for-cell (C-captured sessions with RNG traces)

Code Conventions

Current Parity Findings (2026-02-18)

Further Reading


“You ascend to the next level of understanding. The strident call of a test suite echoes through the Mazes of Development. All tests pass!”