Development Guide
“Welcome, strider, to the Mazes of Development! You fall through a trap door into a large room filled with source code.”
See also: DESIGN.md (architecture) | DECISIONS.md (trade-offs) | PARITY_TEST_MATRIX.md (test suites & gates) | COMPARISON_PIPELINE.md (recorder/comparator flow) | LORE.md (porting lessons) | TESTING.md (test dashboard & workflows)
Prerequisites
- Node.js 25+ (see `.nvmrc`)
- Python 3 (for `python3 -m http.server` and data generators)
- Puppeteer (`npm install` — used by E2E tests)
For C comparison testing (optional):
- gcc, make, bison, flex (build tools)
- ncurses-dev (`libncurses-dev` on Linux, Xcode command line tools on macOS)
- tmux (drives the C binary headlessly)
Practical Setup (This Repo Runtime)
Use this as the default command setup for development and translator work in this repository.
1. Check core tools:

   ```bash
   node -v
   npm -v
   python3 -V
   conda --version
   ```

2. Use conda Python for translator tooling:

   ```bash
   # Confirm conda base has pip + clang bindings
   conda run -n base python -m pip --version
   conda run -n base python -c "import clang, clang.cindex; print('clang ok')"
   ```

3. Run translator commands with conda Python:

   ```bash
   conda run -n base python tools/c_translator/main.py --help
   ```
Common translator pipeline invocations:

```bash
# Example parse summary
conda run -n base python tools/c_translator/main.py \
  --src nethack-c/src/hack.c \
  --emit parse-summary \
  --out /tmp/hack.parse.json

# File-wide translation capability summary
conda run -n base python tools/c_translator/main.py \
  --src nethack-c/src/hack.c \
  --emit capability-summary \
  --out /tmp/hack.capability.json

# Multi-file capability matrix (for scale planning)
conda run -n base python tools/c_translator/capability_matrix.py \
  --src nethack-c/src/hack.c \
  --src nethack-c/src/monmove.c \
  --src nethack-c/src/zap.c \
  --out /tmp/translator.capability.matrix.json

# Note: default excludes apply from
# tools/c_translator/rulesets/translation_scope_excluded_sources.json
# (tests/fixtures + non-gameplay C subsystems). Use --no-exclude-sources
# when intentionally running fixture-only translation checks.

# Batch emit-helper generation (hundreds-scale sweeps)
conda run -n base python tools/c_translator/batch_emit.py \
  --src nethack-c/src/hack.c \
  --src nethack-c/src/allmain.c \
  --src nethack-c/src/getpos.c \
  --out-dir /tmp/translator-batch \
  --summary-out /tmp/translator-batch-summary.json

# Select stitch-ready candidates from a batch summary
conda run -n base python tools/c_translator/select_candidates.py \
  --summary /tmp/translator-batch-summary.json \
  --out /tmp/translator-batch-candidates.json

# Optional: allow specific diag codes (for example CFG complexity review queue)
conda run -n base python tools/c_translator/select_candidates.py \
  --summary /tmp/translator-batch-summary.json \
  --allow-diag CFG_COMPLEXITY \
  --out /tmp/translator-batch-candidates-plus-cfg.json

# Find clean candidates that already map to exported runtime JS functions
conda run -n base python tools/c_translator/runtime_stitch_candidates.py \
  --summary /tmp/translator-batch-summary.json \
  --out /tmp/translator-runtime-stitch-candidates.json

# Optional: override default translator-scope excludes
conda run -n base python tools/c_translator/runtime_stitch_candidates.py \
  --summary /tmp/translator-batch-summary.json \
  --exclude-sources-file tools/c_translator/rulesets/translation_scope_excluded_sources.json \
  --out /tmp/translator-runtime-stitch-candidates-gameplay.json

# Heuristic safety lint for runtime candidates (unknown callee detection)
conda run -n base python tools/c_translator/runtime_candidate_safety.py \
  --candidates /tmp/translator-runtime-stitch-candidates.json \
  --out /tmp/translator-runtime-stitch-safety.json

# Apply runtime-safe candidates into JS modules (dry run by default)
conda run -n base python tools/c_translator/runtime_stitch_apply.py \
  --safety /tmp/translator-runtime-stitch-safety.json \
  --repo-root .

# Write stitched updates
conda run -n base python tools/c_translator/runtime_stitch_apply.py \
  --safety /tmp/translator-runtime-stitch-safety.json \
  --repo-root . \
  --write

# Optional: skip known-bad auto-translations while stitching
conda run -n base python tools/c_translator/runtime_stitch_apply.py \
  --safety /tmp/translator-runtime-stitch-safety.json \
  --repo-root . \
  --denylist tools/c_translator/runtime_stitch_denylist.json \
  --write

# Optional: strict allowlist stitch (only listed pairs will be applied)
conda run -n base python tools/c_translator/runtime_stitch_apply.py \
  --safety /tmp/translator-runtime-stitch-safety.json \
  --repo-root . \
  --allowlist /tmp/translator-allowlist.json \
  --write

# Build a refactor queue from rejected safety/signature candidates
# (capture stitch dry-run JSON first)
conda run -n base python tools/c_translator/runtime_stitch_apply.py \
  --safety /tmp/translator-runtime-stitch-safety.json \
  --repo-root . \
  --denylist tools/c_translator/runtime_stitch_denylist.json \
  > /tmp/translator-runtime-stitch-apply.json
conda run -n base python tools/c_translator/refactor_queue.py \
  --safety /tmp/translator-runtime-stitch-safety.json \
  --apply-summary /tmp/translator-runtime-stitch-apply.json \
  --out /tmp/translator-refactor-queue.json

# Hunt non-mechanical aliases and missing import/binding candidates
conda run -n base python tools/c_translator/identifier_hunt.py \
  --queue /tmp/translator-refactor-queue.json \
  --out /tmp/translator-identifier-hunt.json

# Audit currently-marked autotranslations against current pipeline categories
conda run -n base python tools/c_translator/audit_marked_autotranslations.py \
  --repo-root . \
  --summary /tmp/translator-batch-full-summary.json \
  --candidates /tmp/translator-runtime-stitch-candidates-full.json \
  --safety /tmp/translator-runtime-stitch-safety-full.json \
  --apply-summary /tmp/translator-runtime-stitch-apply-full.json \
  --out /tmp/marked-autotranslation-audit.json

# Audit C out-param and Sprintf/Snprintf patterns from batch metadata
conda run -n base python tools/c_translator/audit_outparams_and_formatting.py \
  --summary /tmp/translator-batch-full-summary.json \
  --safety /tmp/translator-runtime-stitch-safety-full.json \
  --out /tmp/outparam-format-audit.json
```
Notes:
- `runtime_candidate_safety.py` now auto-detects strict alias matches where a
C identifier differs only by case/underscore from an existing module symbol.
- It also consumes curated non-mechanical alias rules from
`tools/c_translator/rulesets/identifier_aliases.json`.
- It now also rejects known semantic trap patterns even when syntax is valid:
pointer-style truthy loops, NUL-sentinel scalar writes, and whole-string
`highc/lowc` rewrites.
- It also supports module-level semantic blocking via
`tools/c_translator/rulesets/semantic_block_modules.json` for files whose C
pointer/string idioms are not yet safely lowerable to JS.
- `refactor_queue.py` emits these as `rename_alias` tasks so we can prioritize
canonical renames separately from true missing identifiers.
4. Translator policy/annotation checks (Node scripts):

   ```bash
   npm run -s translator:check-policy
   npm run -s translator:check-annotations
   ```

- Core parity test loops:

  ```bash
  npm run -s test:unit
  npm run -s test:session
  # C-parity-session-only coverage report
  npm run -s coverage:session-parity
  ```
Coverage details:
- Session-parity-only coverage design and usage are documented in COVERAGE.md.
Notes:
- On some hosts, `/usr/bin/python3` may not include `pip` or `clang` bindings.
- Prefer `conda run -n base python ...` for `tools/c_translator/*` commands to ensure `clang.cindex` is available.
Quick Start
```bash
# Install dependencies
npm install

# Run the game locally
npm run serve
# Open http://localhost:8080

# Run all fast tests (unit + session)
npm test

# Run everything (unit + session + e2e)
npm run test:all
```
Manual Keylog Recording (Canonical)
For manual C keylog recordings used by parity fixtures, use the recorder wrapper
instead of launching nethack directly. The wrapper sets deterministic env,
creates a controlled .nethackrc, and cleans stale save/lock/bones state.
```bash
# Tutorial manual recording example
./record-manual-capture.sh \
  --seed=7 \
  --role=Wizard \
  --race=human \
  --gender=male \
  --align=neutral \
  --no-wizard \
  --tutorial \
  --tmux-socket=default \
  --keylog=test/comparison/keylogs/seed7_tutorial_manual_wizard.jsonl
```
Then convert to a v3 comparison session:
```bash
python3 test/comparison/c-harness/keylog_to_session.py \
  --in test/comparison/keylogs/seed7_tutorial_manual_wizard.jsonl \
  --out test/comparison/sessions/seed7_tutorial_manual_wizard_gameplay.session.json \
  --startup-mode auto \
  --tutorial on \
  --wizard auto
```
When conversion replay itself is under investigation, capture a v3 session directly during manual play (no second replay pass):
```bash
python3 test/comparison/c-harness/record_manual_session_v3.py \
  --seed=8 \
  --name=Tutes \
  --role=Wizard \
  --race=human \
  --gender=male \
  --align=neutral \
  --symset=DECgraphics \
  --tutorial-option=unset
```
For reproducible C-only capture from an existing keylog, use integrated autofeed mode (exact-byte tmux injection) instead of a separate feeder:
```bash
python3 test/comparison/c-harness/record_manual_session_v3.py \
  --autofeed \
  --seed=7 \
  --name=Tutes \
  --role=Wizard \
  --race=human \
  --gender=male \
  --align=neutral \
  --symset=DECgraphics \
  --tutorial-option=on \
  --autofeed-keylog=test/comparison/keylogs/seed7_tutorial_manual_wizard.jsonl \
  --keylog=test/comparison/keylogs/seed7_tutorial_autofeed_direct.jsonl \
  --output-session=test/comparison/sessions/seed7_tutorial_manual_wizard_gameplay.session.json
```
The command reports key integrity (`AUTOFEED_KEY_MISMATCH_AT=...`) so you can
verify captured keys match the source stream.
To verify tutorial coverage quickly:
```bash
node scripts/tutorial-coverage.mjs \
  test/comparison/sessions/seed7_tutorial_manual_wizard_gameplay.session.json
```
Project Structure
See DESIGN.md for the complete module architecture and C-to-JS correspondence mapping. This guide focuses on workflows and commands.
For a quick reference: the js/ directory contains 32 ES6 modules organized by subsystem (Core, Display & I/O, RNG, World Generation, Creatures, Objects, etc.), each with comments linking to C source files. The test/ directory contains unit tests, E2E browser tests, and C-comparison session tests.
Running Tests
“You feel as if someone is testing you.”
Test Tiers
```bash
npm test          # Core: ~12s — 20 unit files + 18 high-signal sessions
npm run test:all  # Full: ~90s — all 3700+ unit tests + 570 sessions
```
`npm test` is fast enough to run after every edit. Run `npm run test:all`
before pushing.
Individual Test Commands
```bash
# Unit tests — module-level correctness
npm run test:unit

# Core sessions only (no units)
npm run test:core

# E2E tests — browser rendering via Puppeteer
npm run test:e2e

# Session comparison — replay C reference sessions
npm run test:session

# Single session
node test/comparison/session_test_runner.js <session-path>

# Dump raw JS replay trace from a C gameplay session
npm run replay:dump -- test/comparison/sessions/<file>.session.json --out /tmp/<file>.js-replay.json

# Compare C gameplay session vs JS replay (generated or --js file)
npm run session:compare -- test/comparison/sessions/<file>.session.json
```
Timeout policy (hang detection):
- `npm run test:unit` enforces a `1000ms` timeout per unit test.
- Single-session replay runs (`node test/comparison/session_test_runner.js <session>`) enforce a `10000ms` timeout per session by default.
- `session_test_runner` runs the full selected set by default; add `--fail-fast` only when you explicitly want to stop on first failure.
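The timeout behavior described above can be approximated with a small Promise-race wrapper. This is an illustrative sketch, not the runners' actual implementation; `withTimeout` is a hypothetical name:

```javascript
// Hedged sketch of per-test hang detection: race the test body's promise
// against a timer and reject on timeout. Illustrative only.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A hung test body (a promise that never settles) rejects after `ms` milliseconds instead of stalling the whole run.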
Systematic Stall CPU Diagnosis
When a session times out and appears CPU-bound/live-locked, use the profiler wrapper:
```bash
node scripts/replay_stall_diagnose.mjs \
  --session seed325_knight_wizard_gameplay \
  --timeout-ms 12000 \
  --top 20
```
What it does:
- Runs `session_test_runner` under Node `--cpu-prof`.
- Stores artifacts in `tmp/stall-diagnose/<timestamp>/`.
- Writes: `run.log` (full replay/test output), `summary.txt` (top self-sample functions and files), `summary.json` (machine-readable summary).
This makes hotspot triage repeatable across agents and avoids ad-hoc manual profiling.
Fast Parity Triage Loop (Comparison Artifacts)
Use this loop when chasing a specific seed/session divergence.
- Reproduce one session and emit comparison artifacts:

  ```bash
  node test/comparison/session_test_runner.js --verbose \
    test/comparison/sessions/seed110_samurai_selfplay200_gameplay.session.json
  ```

- List artifacts in the latest run:

  ```bash
  node scripts/comparison-window.mjs --list
  ```

- Inspect first divergence windows (RNG/event):

  ```bash
  node scripts/comparison-window.mjs --session seed110_samurai_selfplay200_gameplay --channel rng --window 12
  node scripts/comparison-window.mjs --session seed110_samurai_selfplay200_gameplay --channel event --window 12
  ```

- Inspect an explicit index:

  ```bash
  node scripts/comparison-window.mjs --session seed110_samurai_selfplay200_gameplay \
    --channel event --index 1076 --window 10
  ```

- Inspect per-step turn-accounting drift (RNG/event counts):

  ```bash
  node scripts/comparison-window.mjs --session seed110_samurai_selfplay200_gameplay \
    --step-summary --step-from 186 --step-to 200
  ```
Notes:
- Artifacts are written under `tmp/session-comparisons/<run-id>/`.
- `tmp/session-comparisons/LATEST` points to the most recent run.
- `comparison-window` supports `--session` and `--file`. When `--dir` is not supplied, it searches recent runs for the requested target if it is not present in the latest run.
- Use these artifacts to localize real gameplay bugs in core `js/` modules; do not patch comparator/harness behavior to hide mismatches.
Timing-window triage:
- Strict parity (`screens`, `colors`) remains authoritative for pass/fail.
- Non-gating timing-window metrics (`screenWindow`, `colorWindow`) additionally check JS animation boundary frames for a match within the same step.
- If strict mismatch + window match occurs, the result includes `rerecordHint` with candidate steps. Treat this as a capture-timing signal, not as gameplay parity success.
- Re-record those sessions after adding per-turn delay overrides to the session (for C capture):

  ```json
  { "regen": { "key_delays_s": { "106": 0.25 } } }
  ```

- Or annotate the step directly (preferred for persistent per-step intent):

  ```json
  { "steps": [ ..., { "key": "h", "capture": { "key_delay_s": 0.25 } } ] }
  ```

- Then run:

  ```bash
  python3 test/comparison/c-harness/rerecord.py \
    test/comparison/sessions/seed110_samurai_selfplay200_gameplay.session.json
  ```

- For legacy sessions that omitted explicit `--More--` dismiss keys, enable migration mode to record them into the session transcript:

  ```bash
  python3 test/comparison/c-harness/run_session.py <seed> <out.json> <moves> --record-more-spaces
  ```
Replay Boundary (Core vs Harness)
Keep gameplay and UI semantics in core runtime modules (js/), not in
comparison orchestration.
- Core runtime owns command behavior, modal flows, rendering, and state transitions.
- Replay/comparison harness owns input driving, capture, normalization, and diff reporting.
- If replay needs a special case, prefer a generic capture policy (for example, display-only acknowledgement frames) over per-command behavior forks.
- For gameplay screen text diffs, prefer ANSI-cell-derived plain rows when ANSI capture is available; avoid comparator-side column-shift heuristics. Use plain-line DEC decoding only as a legacy fallback when ANSI is unavailable.
- For interface screen text diffs, compare normalized rows directly (no left-shift fallback matching).
Session Tests In Detail
The session runner auto-discovers all `*.session.json` files in
`test/comparison/sessions/` and `test/comparison/maps/` and verifies
JS output against C-captured reference data:
| Session Type | What It Tests | Example |
|---|---|---|
| `"map"` (source: `c`) | typGrid match + RNG traces + structural validation | `seed16_maps_c.session.json` |
| `"gameplay"` | Startup typGrid + per-step RNG traces + screen rendering | `seed42.session.json` |
| `"chargen"` | Character creation reference data (screens, RNG, inventory) | `seed42_chargen_valkyrie.session.json` |
Map sessions generate levels 1→5 sequentially on one RNG stream (matching C’s behavior). Each level is checked for:
- Cell-by-cell typGrid match against C
- RNG call count match (when `rngCalls` present)
- Per-call RNG trace match (when `rng` present)
- Wall completeness, corridor connectivity, stairs placement
Gameplay parity uses an explicit two-phase architecture:
- Recorder: run JS from C-captured input keys and capture raw JS trace.
- Comparator: compare recorded JS trace to C session using comparator policy.
See COMPARISON_PIPELINE.md for module-level details.
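The two-phase shape can be pictured with a minimal sketch. Function names here (`recordTrace`, `compareTrace`) are illustrative, not the repo's actual API:

```javascript
// Phase 1 (recorder): drive the game with C-captured keys, capturing raw
// JS output only — no comparison logic here.
function recordTrace(keys, step) {
  return keys.map(key => ({ key, frame: step(key) }));
}

// Phase 2 (comparator): diff the recorded JS trace against the C session
// and report the first divergent index.
function compareTrace(jsTrace, cTrace) {
  const n = Math.min(jsTrace.length, cTrace.length);
  for (let i = 0; i < n; i++) {
    if (jsTrace[i].frame !== cTrace[i].frame) {
      return { match: false, firstDivergence: i };
    }
  }
  return { match: true, firstDivergence: -1 };
}
```

Keeping recording and comparison separate means comparator policy can change without re-running the game.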
C Comparison (optional, slower)
```bash
# One-time setup: clone, patch, and build the C binary
bash test/comparison/c-harness/setup.sh

# Regenerate all C sessions and maps from seeds.json config
python3 test/comparison/c-harness/run_session.py --from-config
python3 test/comparison/c-harness/gen_map_sessions.py --from-config

# Generate character creation sessions for all 13 roles
python3 test/comparison/c-harness/gen_chargen_sessions.py --from-config

# Or capture a single seed/role manually
python3 test/comparison/c-harness/gen_map_sessions.py 42 5 --with-rng
python3 test/comparison/c-harness/gen_chargen_sessions.py 42 v h f n Valkyrie

# Session runner auto-discovers all C-captured files
npm run test:session
```
Common Development Tasks
Unified Backlog Intake
Use one project backlog for all work, with labels for classification.
- Capture candidate issues from:
- failing tests/sessions and CI regressions,
- C-to-JS audit/coverage gaps,
- manual playtesting findings,
- selfplay findings,
- release blockers and user/developer bug reports.
- Classify every new issue with labels.
  - Use `parity` for C-vs-JS divergence/parity work.
  - Add other domain labels as appropriate (`selfplay`, `infra`, `docs`, etc.).
- Keep new issues unowned by default.
  - Add `agent:<name>` only when an agent actively claims the issue.
- Use evidence-first issue bodies for `parity` issues.
  - Include seed/session/command, first mismatch point, and expected vs actual behavior.
Modal Guard (Single-Threaded Contract)
C NetHack is single-threaded. When more() calls wgetch(), the CPU
blocks and nothing else executes until the key is pressed. In JS,
await nhgetch() yields the event loop, allowing any pending Promise
continuation to fire — breaking the single-threaded contract.
js/modal_guard.js enforces the C contract with runtime assertions:
- Modal owners (`more`, `yn`, `getlin`, `getdir`, `menu`, `getobj`) call `enterModal(name)` before awaiting input and `exitModal(name)` after.
- Game mutation points (`moveloop_core`, `movemon`, `mattacku`, `domove_core`, `rhack`) call `assertNotInModal(name)` at entry.
- If game code runs while a modal is active, the guard throws with a diagnostic showing both the violation point and the modal entry stack.
The most common bug this catches: an async function called without
await. The orphaned Promise’s continuation fires during an unrelated
await yield later, violating execution ordering.
```bash
# Enabled by default. Disable for production if needed:
WEBHACK_MODAL_GUARD=0 node ...

# Enable entry stack traces for debugging (collects Error().stack on
# every enterModal call — expensive, off by default):
WEBHACK_MODAL_GUARD_TRACE=1 node ...
```
Nested modals are valid and supported (stack-based) — e.g., more()
inside putstr_message inside ynFunction. The guard catches only
non-modal game code running during a modal wait.
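The guard's core mechanism is a modal-owner stack plus an entry assertion, roughly as sketched below. This illustrates the contract described above; the real `js/modal_guard.js` adds env gating and richer diagnostics:

```javascript
// Stack of currently active modal owners, innermost last.
const modalStack = [];

function enterModal(name) {
  modalStack.push(name);
}

function exitModal(name) {
  const top = modalStack.pop();
  // Mismatched enter/exit pairs indicate a structural bug of their own.
  if (top !== name) throw new Error(`modal stack imbalance: ${top} vs ${name}`);
}

// Called at entry to game-mutation points; throws if any modal is waiting.
function assertNotInModal(site) {
  if (modalStack.length > 0) {
    throw new Error(
      `${site} ran while modal [${modalStack.join(' > ')}] is waiting for input`);
  }
}
```

Nesting works naturally: each nested modal pushes onto the stack, and only truly non-modal game code trips the assertion.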
C Parity Policy
When working on C-vs-JS parity, follow this rule:
- Use failing unit/session tests to decide what to work on next.
- Treat session replay results as the primary gameplay parity authority; use unit tests as focused guardrails.
- Use C source code (`nethack-c/src/*.c`) as the behavior spec.
- Do not “fix to the trace” with JS-only heuristics when C code disagrees.
- If a test reveals missing behavior, port the corresponding C logic path.
- Keep changes incremental and keep tests green after each port batch.
Iron Parity Campaign Workflow
For state-canonicalization and translator campaign work, use the Iron Parity docs as the planning authority:
- IRON_PARITY_PLAN.md
- STRUCTURES.md
- C_TRANSLATOR_ARCHITECTURE_SPEC.md
- C_TRANSLATOR_PARSER_IMPLEMENTATION_SPEC.md
Required gating expectations for campaign changes:
- policy classification remains complete for all `js/*.js` files:

  ```bash
  npm run -s translator:check-policy
  npm run -s translator:check-annotations
  ```

- parity/test gates still apply (session replay evidence remains authoritative).
- no harness-side suppression of gameplay mismatches.
- campaign milestone naming should use shared `M0..M6` IDs from `IRON_PARITY_PLAN.md` when documenting progress.
Iron Parity GitHub workflow:
- Create or maintain these issue tiers:
  - one campaign tracker epic (`IRON_PARITY: Campaign Tracker (M0-M6)`),
  - one milestone issue per `M0..M6`,
  - subsystem implementation issues linked to the milestone issue.
- Use labels:
  - required: `parity`, `campaign:iron-parity`,
  - one scope label: `state`, `translator`, `animation`, `parity-test`, `docs`, `infra`,
  - optional active owner: `agent:<name>`.
- Include evidence-first parity body fields: `Session/Seed`, `First mismatch (step/index/channel)`, `Expected C`, `Actual JS`, `Suspected origin (file:function)`.
- Use dependency links: `Blocked by #<milestone-or-prereq>`, `Blocks #<downstream>`.
Parity Backlog Intake Loop
Use this workflow whenever session tests are failing and backlog intake needs to be refreshed:
- Run the parity suites and capture failures:

  ```bash
  npm run test:session
  ```

- Group failures by shared first-divergence signature (same subsystem/caller).
- File one `parity` GitHub issue per systematic cluster with an evidence-first body:
  - session filename(s)
  - first mismatch point (step/index/row)
  - JS vs C expected behavior
- Prioritize issues by:
  - blast radius (how many sessions share it)
  - earliness (how soon divergence starts)
  - leverage (whether one fix likely collapses multiple failures)
- Repeat after each landed fix; close stale/superseded issues immediately.
Evidence template for each issue:
```text
Session: seedXYZ_*.session.json
First mismatch: rng step=<...> index=<...> OR screen step=<...> row=<...>
JS: <actual>
C: <expected>
Caller (if present): <function(file:line)>
```
Tutorial Parity Notes
Recent parity work on tutorial sessions established a few stable rules:
- Tutorial status rows should use `Tutorial:<level>` instead of `Dlvl:<level>`.
- Tutorial startup/replay should expose `Xp`-style status output for parity with captured interface sessions.
- `nh.parse_config("OPTIONS=...")` options used by tutorial scripts now feed map flags (`mention_walls`, `mention_decor`, `lit_corridor`) so movement/rendering behavior follows script intent rather than ad-hoc tutorial checks.
- Blocked wall movement now keys off `mention_walls` behavior and matches C tutorial captures (`It's a wall.`).
With those in place, tutorial interface screen matching is now complete in the
manual tutorial session. The remaining first mismatch is RNG-only: an early
`nhl_random` (`rn2(100)`) divergence immediately after the first tutorial
`mktrap` call.
Inventory-Letter Parity Notes
For C-faithful pickup lettering (`assigninvlet` behavior):
- A dropped item can keep its prior `invlet` when picked back up, as long as that letter is currently free in inventory.
- If that carried-over `invlet` collides with an in-use letter, inventory-letter assignment falls back to rotated `lastinvnr` allocation.
- This affects visible pickup messages (`<invlet> - <item>.`) even when RNG remains fully aligned.
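The rule above can be sketched as follows. `assignInvlet` is a hypothetical helper for illustration; the repo's actual `assigninvlet` port will differ in detail:

```javascript
const LETTERS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// priorLetter: the item's carried-over invlet (or null); inUse: Set of
// letters currently assigned in inventory; lastinvnr: rotation cursor.
function assignInvlet(priorLetter, inUse, lastinvnr) {
  // Keep the prior letter if it is still free.
  if (priorLetter && !inUse.has(priorLetter)) {
    return { invlet: priorLetter, lastinvnr };
  }
  // Otherwise fall back to a rotated scan starting after the cursor.
  for (let i = 1; i <= LETTERS.length; i++) {
    const idx = (lastinvnr + i) % LETTERS.length;
    if (!inUse.has(LETTERS[idx])) {
      return { invlet: LETTERS[idx], lastinvnr: idx };
    }
  }
  return { invlet: '#', lastinvnr };  // overflow: all 52 letters in use
}
```

The visible effect is exactly the parity concern above: which letter appears in the `<invlet> - <item>.` pickup message.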
Tourist Session Parity Notes (seed6, non-wizard)
Recent work on test/comparison/sessions/seed6_tourist_gameplay.session.json
established these practical replay/parity rules:
- Throw prompt `?/*` in this trace is not generic help; it opens an in-prompt inventory overlay (right-side menu) and keeps prompt flow pending until an explicit dismiss key.
- Apply prompt `?/*` in this trace also behaves as an in-prompt modal list: while the list is open, non-listed keys are ignored and only listed apply candidate letters should be accepted as selection keys.
- Outside that `?/*` list mode, apply prompt letter selection is broader than the suggestion list; selecting non-suggested inventory letters can still hit C-style fallback text (`Sorry, I don't know how to use that.`).
- `FLINT` should not be treated as apply-eligible in normal prompt candidate filtering; this was causing false-positive apply prompts in wizard sessions.
- Tourist credit-card apply path is direction-driven (`In what direction?`); invalid non-wizard direction input must report `What a strange direction! Never mind.`.
- In wizard mode, invalid direction input for that apply-direction path is silent (no `Never mind.` topline).
- `$` must route to C-style wallet reporting (`Your wallet contains N zorkmid(s).`).
- `:` on an empty square should report `You see no objects here.` in this trace.
- Throw prompt suggestion letters follow C’s class-filtered set (coins always; weapons when not slinging; gems/stones when slinging; exclude worn/equipped items). This only affects prompt text; manual letter entry is still allowed.
- For throw/inventory overlay parity, cap right-side overlay offset at column `41` (`offx <= 41`) rather than pure `cols - maxcol - 2`; C tty commonly clamps here for these menu windows.
- For unresolved `i` inventory-menu steps, use captured screen frames as authoritative in replay; JS-only re-rendering can shift overlay columns when item detail text differs (`(being worn)`, tin contents, etc.).
- Overlay dismiss must clear the right-side menu region before re-showing the throw prompt, or stale menu rows leak into later captured frames.
- Read prompt `?/*` is a modal `--More--` listing flow; non-dismiss keys keep the listing frame until `space`/`enter`/`esc` returns to the prompt.
- In AT_WEAP melee flow, monsters can spend a turn wielding a carried weapon (`The goblin wields a crude dagger!`) before the first hit roll.
- In AT_WEAP melee hit/miss messaging, session parity expects the C-style pre-hit weapon phrase on the same topline as the hit result (for example `The goblin thrusts her crude dagger. The goblin hits!`), so replay must preserve this pre-hit text in weapon attack flows.
- Correct AT_WEAP possessive phrasing depends on monster sex state from creation (`mon.female`), plus C-style object naming (`xname`) for the wielded weapon’s appearance name (`crude dagger` vs discovered object name).
- AT_WEAP melee damage must include wielded-weapon `dmgval` (`rnd(sdam)`) after base `d(1,4)` damage; omitting that call shifts later knockback/runmode RNG.
- In AT_WEAP ranged flow, monster projectiles must consume `minvent` stacks and land on floor squares; otherwise later pet `dog_goal` object scans miss `dogfood()` -> `obj_resists` calls and RNG diverges downstream.
- Potion quaff healing must follow C `healup()` overflow semantics: when healing exceeds current max HP, increase max HP by potion-specific `nxtra` and clamp current HP to the new max. Without this, full-HP `extra healing` quaffs show transient status-row HP drift even when message/RNG flow matches.
- `--More--`-split steps and extended-command (`#...`) typing frames in this session are best handled as capture-authoritative replay frames (screen parity first) when they carry no gameplay state progression.
- `session_test_runner` gameplay divergence `step` values are 1-based (same indexing used by `rng_step_diff --step N`), so step numbers can be copied directly between tools without adding/subtracting 1.
- Extended-command shorthand Enter synthesis should only apply to letter keys; treating control keys (for example `Esc`) as shorthand can leak a stray `Enter` into the input queue and misalign subsequent command prompts.
- Double `m` prefix should cancel silently (clear `menuRequested` with no message) to match C command-prefix behavior.
- Some C captures mix left-side map glyphs and right-side overlay text on the same row (for example inventory category headers). Preserve raw column alignment from core rendering; do not apply tmux col-0 compensation in the comparator.
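The `healup()` overflow rule in the list above can be sketched like this. Illustrative only; parameter names follow the C description (`nxtra` for the potion-specific max-HP bonus), not necessarily the repo's port:

```javascript
// C healup() overflow semantics for potion quaffing: if healing would
// exceed the current max HP, raise max HP by nxtra, then clamp.
function healup(hp, hpmax, heal, nxtra) {
  hp += heal;
  if (hp > hpmax) {
    hpmax += nxtra;               // potion-specific max-HP bonus on overflow
    if (hp > hpmax) hp = hpmax;   // clamp current HP to the new max
  }
  return { hp, hpmax };
}
```

Without the clamp-after-bonus ordering, a full-HP `extra healing` quaff shows transient status-row HP drift even when RNG stays aligned.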
Measured progress in the latest pass:
- First divergence moved from early AT_WEAP messaging drift (step `605`) to a late monster-turn RNG boundary (step `760`, `distfleeck` context).
- Current metrics: `rng=10447/14063`, `screens=1071/1284`, `colors=29988/30776`.
- Current frontier is late-turn monster/replay boundary alignment in the tourist non-wizard session (first visible map drift at step `761`).
Modifying the dungeon generator
- Make your changes in `js/dungeon.js` (or related modules)
- Run `npm run test:session` — failures show exactly which cells changed and at which seed/depth
- If the change is intentional and matches C, the C reference data doesn’t change. If the C binary also changed, regenerate:

  ```bash
  python3 test/comparison/c-harness/run_session.py --from-config
  python3 test/comparison/c-harness/gen_map_sessions.py --from-config
  ```
Debugging C-vs-JS divergence
“You are hit by a divergent RNG stream! You feel disoriented.”
C map sessions with RNG traces are pre-generated for difficult seeds
(configured in test/comparison/seeds.json). The traces include caller
function names for readability:
```text
rn2(2)=0 @ randomize_gem_colors(o_init.c:88)
rn2(11)=9 @ shuffle(o_init.c:128)
```
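When triaging traces ad hoc, it can help to split lines of this format into fields. A hedged sketch (`parseTraceLine` is a hypothetical helper, not a repo tool):

```javascript
// Matches "rn2(11)=9 @ shuffle(o_init.c:128)"; the caller tag is optional.
const TRACE_RE = /^(\w+)\((\d+)\)=(-?\d+)(?: @ (\w+)\(([^:]+):(\d+)\))?$/;

function parseTraceLine(line) {
  const m = TRACE_RE.exec(line.trim());
  if (!m) return null;
  return {
    fn: m[1],                         // RNG entry point, e.g. rn2
    arg: Number(m[2]),                // bound argument
    result: Number(m[3]),             // drawn value
    caller: m[4] || null,             // caller tag when present
    file: m[5] || null,
    line: m[6] ? Number(m[6]) : null,
  };
}
```

Parsed fields make it easy to diff C and JS streams on `fn`/`arg`/`result` while grouping by `caller`.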
When a map doesn’t match C:
```bash
# The session runner compares per-call and reports the first mismatch:
# RNG diverges at call 1449: JS="rn2(100)=37" session="rn2(1000)=377"
npm run test:session

# To regenerate C traces after a patch change:
python3 test/comparison/c-harness/gen_map_sessions.py --from-config
```
Diagnostic Tools for RNG Divergence
Two specialized tools help isolate RNG divergence at specific game turns:
First-response workflow (recommended):
```bash
# 1) Reproduce one failing session with caller context on JS RNG entries.
node test/comparison/session_test_runner.js --verbose \
  test/comparison/sessions/seed202_barbarian_wizard.session.json

# 2) Drill into the exact divergent step with a local windowed diff.
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/seed202_barbarian_wizard.session.json \
  --step 16 --window 8
```
Notes:
- Caller tags are on by default in replay/session tooling (`@ caller(file:line)`).
- Parent/grandparent context (`<= ... <= ...`) is on by default with caller tags.
- Set `RNG_LOG_TAGS=0` to disable caller tags (faster, shorter logs).
- Set `RNG_LOG_PARENT=0` to disable parent/grandparent context for shorter lines.
- `rng_step_diff.js` already forces caller tags; export `RNG_LOG_TAGS=1` explicitly only when using other runners that override it.
test/comparison/rng_step_diff.js — Step-level C-vs-JS RNG caller diff
Replays a session in JS and compares the RNG stream against captured C data. By
default it compares a specific step; use `--phase startup` to compare startup
RNG (useful when the first step already starts divergent).
```bash
# Inspect first divergence on tutorial accept step
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/manual/interface_tutorial.session.json \
  --step 1 --window 3

# Inspect startup-phase divergence (pre-step RNG drift)
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/manual/interface_tutorial.session.json \
  --phase startup --window 5

# Example output:
# first divergence index=5
# >> [5] JS=rn2(100)=27 | C=rn2(100)=97
#    JS raw: rn2(100)=27 @ percent(sp_lev.js:6607)
#    C raw: rn2(100)=97 @ nhl_random(nhlua.c:948)
```
Use when: session_test_runner reports a mismatch and you need exact
call-site context at the first divergent RNG call within a specific step.
For tutorial-specific RNG drift, two debug env flags are available:
```bash
# Log non-counted raw PRNG advances in JS RNG log output.
WEBHACK_LOG_RAW_ADVANCES=1 \
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/manual/interface_tutorial.session.json \
  --step 1 --window 8

# Override raw draws before first tutorial percent() call.
# Default is N=2; set this env var to compare other values.
WEBHACK_TUT_EXTRA_RAW_BEFORE_PERCENT=0 \
node test/comparison/rng_step_diff.js \
  test/comparison/sessions/manual/interface_tutorial.session.json \
  --step 1 --window 3
```
selfplay/runner/pet_rng_probe.js — Per-turn RNG delta comparison
Compares RNG call counts between C and JS implementations on a per-turn basis, with filtering for specific subsystems (e.g., dog movement). Runs both C (via tmux) and JS (headless) simultaneously and shows where RNG consumption diverges.
```bash
# Compare first 9 turns for seed 13296
node selfplay/runner/pet_rng_probe.js --seed 13296 --turns 9

# Show detailed RNG logs for specific turns
node selfplay/runner/pet_rng_probe.js --seed 13296 --turns 20 --show-turn 7 --show-turn 8

# Output shows per-turn RNG call counts and dog_move specific calls:
# Turn | C rng calls | JS rng calls
#    1 |          37 |           37
#    7 |          12 |           16   <- divergence detected
```
Use when: RNG traces show divergence but you need to pinpoint exactly which turn and which subsystem (monster movement, item generation, etc.) is responsible.
Throw-Replay Lore (Pet/Throw Parity)
For monster thrown-weapon parity, do not assume one thrwmu() is fully
resolved inside one captured input step.
- In C sessions, a throw often appears as a multi-step sequence:
  - an initial step with top line `"The <monster> throws ..."` and one `rn2(5)` at `m_throw(mthrowu.c:772)`,
  - later key steps that continue the projectile (`rn2(5)` again, and sometimes `thitu()`/`dmgval()` rolls).
- This pattern is easy to see in `seed110_samurai_selfplay200.session.json` and `seed206_monk_wizard.session.json`.
- Practical implication: if JS resolves full projectile flight/hit/drop in a single turn, it can create false-looking RNG and map glyph drift even when the message text seems close.
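One quick way to triage this is to scan each captured step's RNG lines for `m_throw` call sites. A hedged sketch (the per-step array-of-lines shape is assumed from the session JSON format):

```javascript
// Hypothetical helper: given one array of RNG-log lines per input step,
// report which steps contain m_throw calls. Hits spread across several
// steps indicate a throw resolving over multiple captured steps.
function stepsTouchingThrow(stepRngLines) {
  return stepRngLines
    .map((lines, i) => ({
      step: i,
      hits: lines.filter((l) => l.includes('m_throw')).length,
    }))
    .filter((e) => e.hits > 0);
}
```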
selfplay/runner/trace_compare.js — C trace vs JS behavior comparison
Replays a captured C selfplay trace in JS headless mode and compares turn-by-turn behavior (actions, position, HP, dungeon level). Supports position offsets for cases where maps differ slightly but gameplay is similar.
# Compare C trace against JS headless replay
node selfplay/runner/trace_compare.js --trace traces/captured/trace_13296_valkyrie_score43.json
# Compare with position offset adjustment
node selfplay/runner/trace_compare.js --trace traces/captured/trace_79.json --dx 1 --dy 0
# Ignore position mismatches (focus on actions/HP)
node selfplay/runner/trace_compare.js --trace traces/captured/trace_79.json --ignore-position
# Save JS trace for later inspection
node selfplay/runner/trace_compare.js --trace traces/captured/trace_79.json --output /tmp/js_trace.json
# Output shows first 20 mismatches:
# turn 7: diffs=action,position C={"action":"explore","position":{"x":40,"y":11},"hp":14,"hpmax":14,"dlvl":1} JS={"action":"rest","position":{"x":40,"y":12},"hp":14,"hpmax":14,"dlvl":1}
Use when: You have a C selfplay trace showing interesting behavior (combat, prayer, item usage) and want to verify JS reproduces the same decision-making and outcomes.
Typical workflow:
- Capture C selfplay trace showing the divergence or interesting behavior
- Run
trace_compare.jsto see when JS behavior diverges from C - Use
pet_rng_probe.jsto identify which turn RNG consumption differs - Add targeted RNG logging around the suspicious code path
- Compare RNG logs to find the extra/missing call
ANSI/Color Parity Gotchas
Recent gameplay color parity work surfaced a few high-impact pitfalls:
- For session comparisons, preserve true ANSI source lines.
  - `test/comparison/session_loader.js:getSessionScreenAnsiLines()` must prefer `screenAnsi` when both `screen` and `screenAnsi` are present.
  - If this regresses, color checks silently compare against plain text and report misleading `fg=7` mismatches.
- Headless ANSI export must map color index `8` (NO_COLOR) to SGR `90` (and bg `100`), not fall back to `37`.
  - Missing this mapping produces persistent `7 -> 8` color deltas even when the in-memory screen color grid is correct.
- Overlay inventory category headers are inverse-video in C captures.
  - Render `Weapons`/`Armor`/... heading rows with `attr=1` in overlay menus.
- Up-stairs (`<`) use yellow/gold color in captures, while down-stairs (`>`) remain gray in these flows.
- Remembered room floor cells are compared as NO_COLOR tone, while remembered walls/doors retain terrain colors.
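The NO_COLOR pitfall boils down to one index-to-SGR mapping. A minimal sketch under assumed conventions (the repo's actual export code may structure this differently):

```javascript
// Map a 0-15 color index to an SGR color parameter. Index 8 is NO_COLOR
// and must become bright-black 90 (bg 100), not fall back to white 37.
function colorToSgr(idx, isBg = false) {
  let code;
  if (idx === 8) code = 90;            // NO_COLOR -> bright black
  else if (idx < 8) code = 30 + idx;   // normal colors -> 30-37
  else code = 90 + (idx - 8);          // bright colors -> 90-97
  return isBg ? code + 10 : code;      // backgrounds are fg code + 10
}
```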
CORE vs DISP RNG Audits
For display-only RNG investigations (C rn2_on_display_rng / newsym_rn2 paths),
follow the focused playbook in:
docs/plans/RNG_DISPRNG_AUDIT_PLAN.md
Use that workflow before adding any new RNG infrastructure. The default policy is:
- port C logic first;
- add DISP-specific tracing only when repeated first-divergence evidence points to display paths.
Adding a new test seed
- Add the seed to `test/comparison/seeds.json`:
  - `map_seeds.with_rng.c` for C map sessions with RNG traces
  - `session_seeds.sessions` for full gameplay sessions
  - `chargen_seeds.sessions` for character creation sessions
- Regenerate: `python3 test/comparison/c-harness/gen_map_sessions.py --from-config`
- `npm run test:session` auto-discovers the new file
Character creation sessions
Chargen sessions capture the full interactive character creation sequence
for all 13 roles, recording every keystroke (including --More-- as space),
screen state, RNG traces, and starting inventory. Configuration is in
`seeds.json` under `chargen_seeds`:
# Generate all 13 roles
python3 test/comparison/c-harness/gen_chargen_sessions.py --from-config
# Or a single role
python3 test/comparison/c-harness/gen_chargen_sessions.py 42 v h f n Valkyrie
The script adaptively navigates the character creation menus, handling cases where menus are auto-skipped (e.g., Knight has only one valid race and alignment). Each session includes the typGrid and inventory display for comparison with JS.
Regenerating monster/object data
The monster and object tables are auto-generated from C headers:
python3 scripts/generators/gen_monsters.py
python3 scripts/generators/gen_objects.py
python3 scripts/generators/gen_artifacts.py
python3 scripts/generators/gen_constants.py
# Inspect unresolved/deferred header macros with missing dependency details
python3 scripts/generators/gen_constants.py --report-deferred
# Same deferred report as machine-readable JSON
python3 scripts/generators/gen_constants.py --report-deferred-json
# JSON report includes:
# - details[] with missingDeps/rootMissingDeps per deferred macro
# - rootBlockers[] with ownerHint (likely leaf owner)
# - ownerSummary[] and unknownOwnerBlockers[] for ownership coverage checks
# Export report JSON to docs/metrics/deferred_constants_report_latest.json
python3 scripts/generators/export_deferred_constants_report.py
# npm aliases
npm run constants:report
npm run constants:report:write
Converting Lua special levels to JavaScript
NetHack 3.7 uses Lua scripts for special level generation (Castle, Asmodeus, Oracle,
etc.). The `tools/lua_to_js.py` converter translates these to JavaScript modules:
# Convert a single level
python3 tools/lua_to_js.py nethack-c/dat/asmodeus.lua > js/levels/asmodeus.js
# Regenerate all converted levels (131 Lua files → 38 active JS files)
for lua_file in nethack-c/dat/*.lua; do
base=$(basename "$lua_file" .lua)
# Convert names: bigrm-XX to bigroom-XX
js_name=$(echo "$base" | sed 's/^bigrm-/bigroom-/')
python3 tools/lua_to_js.py "$lua_file" > "js/levels/$js_name.js"
done
What the converter handles
The converter performs careful syntax translation to preserve game semantics:
String handling:
- Lua multiline strings `[[ ... ]]` → JavaScript template literals `` `...` ``
- Backticks inside multiline strings are escaped: `` `liberated` `` → `` \`liberated\` ``
- Template literals are protected from regex replacements during expression conversion
- Regular quoted strings (`"..."`, `'...'`) are preserved as-is
Comments:
- Lua comments `--` → JavaScript comments `//`
- Comment detection uses string tracking to avoid false matches inside strings
Expression conversion:
- String concatenation: `..` → `+`
- Logical operators: `and` → `&&`, `or` → `||`, `not` → `!`
- Method calls: `obj:method()` → `obj.method()`
- Inequality: `~=` → `!==`
- Equality: `==` → `===`
- Table length: `#tbl` → `tbl.length`
- Boolean/null: `nil` → `null`
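The operator substitutions above can be sketched as a chain of regex passes. This is a toy model only: unlike the real converter, it does no string or template-literal tracking, so it is safe solely on bare expressions:

```javascript
// Toy Lua -> JS expression conversion. Order matters: `~=` must become
// `!==` before `==` is widened, and the lookbehind keeps the `==` inside
// an already-emitted `!==` from being touched.
function convertExpr(lua) {
  return lua
    .replace(/~=/g, '!==')                    // inequality
    .replace(/(?<![!=<>])==(?!=)/g, '===')    // equality (skips !==)
    .replace(/\s*\.\.\s*/g, ' + ')            // string concatenation
    .replace(/\band\b/g, '&&')
    .replace(/\bor\b/g, '||')
    .replace(/\bnot\b/g, '!')
    .replace(/\bnil\b/g, 'null')
    .replace(/#(\w+)/g, '$1.length')          // table length operator
    .replace(/(\w+):(\w+)\(/g, '$1.$2(');     // method-call sugar
}
```

For example, `convertExpr('#inv ~= 0 and pet:eat(food)')` yields `inv.length !== 0 && pet.eat(food)`.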
Control flow:
for i = 1, n do ... end→for (let i = 1; i <= n; i++) { ... }if ... then ... end→if (...) { ... }function name() ... end→function name() { ... }
Data structures:
- Arrays:
{ 1, 2, 3 }→[ 1, 2, 3 ](simple arrays only) - Objects:
{ key = value }→{ key: value }
Special level DSL:
- Preserves
des.*calls as-is (same API between Lua and JS) - Handles nested
des.map({ ..., contents: function() { ... } })structures - Maintains proper statement boundaries with depth tracking
Known limitations
- Template literals with
${}interpolation syntax would break (none found in NetHack Lua) - Complex nested table expressions may need manual adjustment
- Assumes
des.*functions have identical signatures between Lua and JS
Debugging converter issues
When a converted file has problems:
- Check ASCII maps — dots becoming `+` means template literal protection failed
- Check comments — comments eating code means statement splitting is wrong
- Check syntax errors — unbalanced braces usually means multiline collection broke
- Run all Lua files — `for f in nethack-c/dat/*.lua; do python3 tools/lua_to_js.py "$f" > /tmp/test.js || echo "FAILED: $f"; done`
The converter tracks several state machines simultaneously:
- String tracking (single/double quote detection)
- Brace/paren depth (for multiline call collection)
- Template literal extraction (to protect from regex corruption)
- Comment context (to avoid converting `--` inside strings)
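The string-tracking idea behind the comment state machine fits in a few lines. A simplified model (the real converter in `tools/lua_to_js.py` also handles multiline `[[ ... ]]` strings):

```javascript
// Find where a Lua `--` comment starts on one line, ignoring any `--`
// that occurs inside single- or double-quoted strings. Returns -1 if none.
function commentStart(line) {
  let quote = null;                        // active string delimiter, or null
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (quote) {
      if (ch === '\\') i++;                // skip escaped character in string
      else if (ch === quote) quote = null; // string closed
    } else if (ch === '"' || ch === "'") {
      quote = ch;                          // string opened
    } else if (ch === '-' && line[i + 1] === '-') {
      return i;                            // genuine comment marker
    }
  }
  return -1;
}
```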
Adding a new C patch
Patches live in `test/comparison/c-harness/patches/` and are applied by
`setup.sh`. To add one:
- Make changes under `nethack-c/`.
- Export a numbered patch into `test/comparison/c-harness/patches/`, e.g. `cd nethack-c && git diff > ../test/comparison/c-harness/patches/017-your-patch.patch` (use the next free number after the existing 001–016 series).
- Run `bash test/comparison/c-harness/setup.sh` to verify apply/build/install.
The C Harness
“You hear the rumble of distant compilation.”
The C harness builds a patched NetHack 3.7 binary for ground-truth comparison.
The C source is frozen at commit 79c688cc6 and never modified directly —
only numbered patches in test/comparison/c-harness/patches/ are applied on top
(001 through 016 as of 2026-03-02).
Core harness capabilities come from:
001-deterministic-seed.patch — Seed control via NETHACK_SEED.
002-fixed-datetime-for-replay.patch and
011-fix-ubirthday-with-getnow.patch — Fixed datetime support for replay
determinism, including shopkeeper-name ubirthday parity.
003-map-dumper.patch — #dumpmap wizard command for raw typ grids.
004-prng-logging.patch, 009-midlog-infrastructure.patch, and
010-lua-rnglog-caller-context.patch — high-fidelity RNG tracing with
caller context.
005-obj-dumper.patch and 008-checkpoint-snapshots.patch — object
and full-checkpoint state dumps for step-local divergence debugging.
Why raw terrain grids instead of terminal output?
A | on screen could be VWALL, TLWALL, TRWALL, or GRAVE. The raw typ integers
are unambiguous. Terminal output also depends on FOV (the player can’t see most
of the map), and requires ANSI escape stripping. Integer grids are faster,
simpler, and definitive.
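Comparing integer grids is then a trivial cell-by-cell scan. An illustrative sketch (row-major arrays of typ integers are assumed as the grid shape):

```javascript
// Diff two typ grids, returning every cell where the C and JS terrain
// codes disagree. Zero diffs means the maps are identical.
function diffTypGrids(cGrid, jsGrid) {
  const diffs = [];
  for (let y = 0; y < cGrid.length; y++) {
    for (let x = 0; x < cGrid[y].length; x++) {
      if (cGrid[y][x] !== jsGrid[y][x]) {
        diffs.push({ x, y, c: cGrid[y][x], js: jsGrid[y][x] });
      }
    }
  }
  return diffs;
}
```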
Setup gotchas
- Lua is required — NetHack 3.7 embeds Lua. `setup.sh` runs `make fetch-lua`.
- Wizard mode — `sysconf` must have `WIZARDS=*`. The script sets this.
- Stale game state — Lock files (`501wizard.0`), saves, and bones from crashed tmux sessions cause "Destroy old game?" prompts. All harness scripts clean these up before each run.
- Parallel Lua build race — `make -j` can race on `liblua.a`. The script builds Lua separately first.
Event Logging
“You hear a distant clanking sound.”
Event logging tracks game-state mutations (object placement, monster death,
pickup/drop, engravings, traps) on both C and JS sides for divergence
diagnosis. Events are ^-prefixed lines interleaved with the RNG log.
How it works
C side: The 012-event-logging.patch adds event_log() calls at
centralized bottleneck functions (mondead, mpickobj, mdrop_obj,
place_object, mkcorpstat, dog_eat, maketrap, deltrap,
make_engr_at, del_engr, wipe_engr_at). These write ^event[args]
lines to the RNG log file.
JS side: The same bottleneck functions call
pushRngLogEntry('^event[args]') to append event entries to the step’s
RNG array. The centralized functions live in:
| Function | File | Purpose |
|---|---|---|
| `mondead(mon, map)` | `js/monutil.js` | Monster death — logs `^die`, drops inventory |
| `mpickobj(mon, obj)` | `js/monutil.js` | Monster pickup — logs `^pickup` |
| `mdrop_obj(mon, obj, map)` | `js/monutil.js` | Monster drop — logs `^drop` |
| `placeFloorObject(map, obj)` | `js/floor_objects.js` | Object on floor — logs `^place` |
| `removeFloorObject(map, obj)` | `js/floor_objects.js` | Object off floor — logs `^remove` |
| `make_engr_at(...)` | `js/engrave.js` | Engraving created — logs `^engr` |
| `del_engr(...)` | `js/engrave.js` | Engraving deleted — logs `^dengr` |
| `wipe_engr_at(...)` | `js/engrave.js` | Engraving eroded — logs `^wipe` |
Event types
| Event | Format | Meaning |
|---|---|---|
| `^die[mndx@x,y]` | monster index, position | Monster died |
| `^pickup[mndx@x,y,otyp]` | monster, position, object type | Monster picked up object |
| `^drop[mndx@x,y,otyp]` | monster, position, object type | Monster dropped object |
| `^place[otyp,x,y]` | object type, position | Object placed on floor |
| `^remove[otyp,x,y]` | object type, position | Object removed from floor |
| `^corpse[corpsenm,x,y]` | corpse monster, position | Corpse created |
| `^eat[mndx@x,y,otyp]` | monster, position, object type | Monster ate object |
| `^trap[ttyp,x,y]` | trap type, position | Trap created |
| `^dtrap[ttyp,x,y]` | trap type, position | Trap deleted |
| `^engr[type,x,y]` | engrave type, position | Engraving created |
| `^dengr[x,y]` | position | Engraving deleted |
| `^wipe[x,y]` | position | Engraving wiped/eroded |
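A comparator only needs the `^` prefix plus the bracketed argument list to classify these lines. An illustrative parse (the real comparator's internals may differ):

```javascript
// Classify one RNG-log line: event entries start with '^'; everything
// else is treated as a normal RNG call. Event args sit inside [...].
function parseLogLine(line) {
  if (!line.startsWith('^')) return { kind: 'rng', raw: line };
  const m = line.match(/^\^(\w+)\[([^\]]*)\]$/);
  if (!m) return { kind: 'event', raw: line };  // unrecognized event shape
  return { kind: 'event', type: m[1], args: m[2].split(',') };
}
```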
Using events for debugging
Events help diagnose state drift — when C and JS RNG diverge because game objects or monsters ended up in different positions. Instead of guessing where state went wrong, compare event sequences to see exactly which object placement or monster action differed.
# Run a session and look at event comparison
node test/comparison/session_test_runner.js --verbose \
test/comparison/sessions/seed42_gameplay.session.json
# Event mismatches appear in firstDivergences alongside rng/screen channels
Event comparison is informational only — mismatches don’t fail the test. This is intentional: events track state changes that JS may not yet implement identically (e.g., missing monster behaviors), and blocking on them would make RNG parity work harder to iterate on.
Adding new event types
To add a new event type:
- C side: Add `event_log("newevent[%d,%d]", x, y);` at the centralized function in the relevant `.c` file, and add it to `012-event-logging.patch`.
- JS side: Add `pushRngLogEntry('^newevent[...]')` at the corresponding centralized JS function.
- Regenerate sessions: `python3 test/comparison/c-harness/run_session.py --from-config`
- Events are automatically recognized by the comparator (any line starting with `^` is treated as an event).
Key design principle: centralized bottlenecks
All event logging happens in centralized bottleneck functions, never at
call sites. This mirrors C’s architecture where mondead(), mpickobj(),
and mdrop_obj() are the single points through which all deaths, pickups,
and drops flow. In JS:
- All 10 monster death sites call `mondead(mon, map)` — never set `mon.dead = true` directly.
- All monster pickup sites call `mpickobj(mon, obj)` — never call `addToMonsterInventory` directly for gameplay pickups.
- All monster drop sites call `mdrop_obj(mon, obj, map)` — never splice from `minvent` directly for gameplay drops.
Note: mondead drops inventory via placeFloorObject (producing ^place
events), NOT via mdrop_obj (which would produce ^drop events). This
matches C where relobj() calls place_object() directly.
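A stripped-down model of that bottleneck, with the collaborators injected so the sketch stays self-contained (names come from the table above; the real `mondead` in `js/monutil.js` does considerably more):

```javascript
// Minimal model of the centralized death bottleneck: log the ^die event,
// drop inventory via placeFloorObject (which emits its own ^place events,
// mirroring C's relobj() -> place_object() path), then mark the monster dead.
function mondead(mon, map, { pushRngLogEntry, placeFloorObject }) {
  pushRngLogEntry(`^die[${mon.mndx}@${mon.x},${mon.y}]`);
  for (const obj of mon.minvent.splice(0)) {  // empty minvent in place
    obj.x = mon.x;
    obj.y = mon.y;
    placeFloorObject(map, obj);
  }
  mon.dead = true;
}
```

Because every death site funnels through one function, the `^die`/`^place` ordering in the log is the same no matter which code path killed the monster.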
Architecture in 60 Seconds
“You read a blessed scroll of enlightenment.”
Game loop: js/nethack.js runs an async loop. Player input uses
await getChar() which yields to the browser event loop. Every function
that might need input must be async.
PRNG: js/isaac64.js produces bit-identical uint64 sequences to C.
js/rng.js wraps it with rn2(), rnd(), d(), etc. The RNG log
(enableRngLog() / getRngLog()) captures every call for comparison.
Level generation: initRng(seed) → initLevelGeneration() → makelevel(depth) → wallification(map). One continuous RNG stream across depths.
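The "one continuous stream" property can be demonstrated with a toy PRNG (the real generator is ISAAC64 in `js/isaac64.js`; this LCG only illustrates why reseeding between depths, or any extra or missing draw, would break parity):

```javascript
// Toy 64-bit LCG standing in for the real PRNG. Every depth draws from the
// same ongoing stream, so level N's layout depends on exactly how many
// draws levels 1..N-1 consumed; one stray call shifts everything after it.
function makeRng(seed) {
  let s = BigInt(seed);
  return () => {
    s = (s * 6364136223846793005n + 1442695040888963407n) & 0xffffffffffffffffn;
    return Number(s >> 33n);   // upper bits as a plain number
  };
}
const rng = makeRng(42);
const perDepthDraws = [1, 2, 3].map((depth) => ({ depth, firstDraw: rng() }));
```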
Display: <pre> element with per-cell <span> tags. DEC graphics
symbols mapped to Unicode box-drawing characters. No canvas, no WebGL.
Testing philosophy: Two layers of truth —
- ISAAC64 produces identical sequences (golden reference files)
- JS matches C cell-for-cell (C-captured sessions with RNG traces)
Code Conventions
- C references: Every ported function has `// C ref: filename.c:function_name()`
- ES6 modules: No build step, no bundler. Import directly in browser.
- No frameworks: Vanilla JS, vanilla DOM. The game ran in 1987 without React.
- Constants match C: `STONE`, `VWALL`, `ROOM`, etc. are identical values. See `js/const.js`.
Current Parity Findings (2026-02-18)
- `npm run test:session` currently reports a concentrated gameplay/wizard failure set (26 gameplay-session failures in the latest intake pass).
- Initial backlog intake from that pass is tracked in parity issues:
- #6 wizard command-flow prompt cancellation/modal consumption
- #7 wait/search safety counted no-op timing and messaging
- #8 pet combat sequencing/messages/RNG (dog_move/mattackm)
- #9 special-level generation RNG drift (dig_corridor/somex/makelevel)
- #10 object generation RNG ordering (rnd_attr/mksobj/mkobj/m_initweap)
- #11 gameplay map/glyph drift tied to pet/interactions
- Working rule: treat each issue above as a cluster root; avoid ad-hoc one-session fixes unless evidence shows it is truly isolated.
Further Reading
- DESIGN.md — Detailed architecture and module design
- DECISIONS.md — Design decision log with rationale
- SESSION_FORMAT.md — Session JSON format specification
- COLLECTING_SESSIONS.md — How to capture C reference sessions
- PHASE_1_PRNG_ALIGNMENT.md — The story of achieving bit-exact C-JS parity
- PHASE_2_GAMEPLAY_ALIGNMENT.md — Gameplay session alignment goals & progress
“You ascend to the next level of understanding. The strident call of a test suite echoes through the Mazes of Development. All tests pass!”