Phase 3: Multi-Depth Map Generation

“We conquered depth 1. But NetHack is a 50-level dungeon.”

Phase 1 achieved perfect map generation at depth 1: every seed produced identical terrain grids. Phase 2 extended this to gameplay: 66 turns of a human-played session with every RNG call matching the C reference.

But both phases focused on a single dungeon level. Real NetHack is about descending deeper: going downstairs, generating level 2, then level 3, then deeper still, each level building on the RNG state left by the previous one. A single misaligned call at depth 1 cascades into total divergence by depth 3.

Phase 3 extends bit-exact alignment from single levels to multi-depth dungeon generation: generating levels 1 through 5 for multiple seeds, with every RNG call at every depth matching the C reference. The work revealed subtle issues with test isolation, deterministic replay, and depth-dependent monster selection that affected testing infrastructure and core game logic alike.

Table of Contents

The Challenge

  1. The Goal – depth 1 → depth 5, all seeds
  2. The Test Infrastructure – map sessions, determinism
  3. The Baseline – where we started

The Bugs

  1. Test Isolation Failures – vision state, identity counters
  2. The Statue Bug – depth parameter missing
  3. Remaining Issues – seed1 level 2, depths 4-5

The Craft

  1. Final Result – 90.7% pass rate
  2. Lessons Learned – test infrastructure matters
  3. The Art of Binary Search Debugging

1. The Goal

Generate dungeon levels 1 through 5 for multiple test seeds (seed16, seed72, seed119, seed163, seed306), each using wizard mode level teleport to jump to each depth in sequence, and verify:

  1. Terrain grid matches (typGrid): Every wall, door, room, corridor identical to C
  2. RNG call count matches (rngCalls): Same number of random decisions per level
  3. RNG sequence matches (rng): Every call function, parameter, and result identical

The challenge: depth 3 consumes ~2400 RNG calls, depth 5 over 4000. A single divergence at depth 2 makes depth 3+ unmatchable. Debugging requires tracing through thousands of calls to find the one that’s wrong.

2. The Test Infrastructure

Map Sessions (.session.json files)

Python harness runs C NetHack in wizard mode with tmux:

# Generate map session with RNG logging
NETHACK_RNGLOG=rnglog.txt nethack -u Wizard -D
# Teleport to each depth 1-5 via wizard commands
# Capture terrain grid + RNG trace per level

Each session file contains:

{
  "version": 2,
  "seed": 16,
  "type": "map",
  "levels": [
    {
      "depth": 1,
      "typGrid": [[1,1,0,...], ...],  // 21x80 terrain
      "rngCalls": 198,
      "rng": ["rn2(5)=3 @ makelevel", ...]
    },
    ...
  ]
}

Determinism Tests

Critical requirement: regenerating the same seed must produce identical results. Module-level state leaking between tests breaks this:

// WRONG: Shared state between test runs
let _identCounter = 0;  // Never reset!

// RIGHT: Reset in initialization
export function resetIdentCounter() { _identCounter = 0; }

Test Structure

Initially used it() tests for generation, which broke when filtered:

// WRONG: Result undefined when test filtered out
it('generate maps', () => { result = generateMaps(...); });
it('check grid', () => { expect(result.grid).toEqual(...); });

// RIGHT: Use before() hook which always runs
before(() => { result = generateMaps(...); });
it('check grid', () => { expect(result.grid).toEqual(...); });

3. The Baseline

Starting point (commit c3fa9c3):

After composite filtering (commit a81ff42):

After depth 2 fixes (commits through a083820):

4. Test Isolation Failures

Problem: Running full test suite produced different results than running tests individually. Tests were bleeding state into each other.

Vision System State Leak (commit 05cbd1b)

Symptom: seed42 tests passed individually but failed after seed2_wizard_fountains in full suite.

Root cause: vision.js module-level state:

// These persisted between tests!
let viz_clear = null;
let right_ptrs_arr = null;
let left_ptrs_arr = null;

// do_clear_area() set them to instance arrays
function do_clear_area(instance) {
  viz_clear = instance.viz_clear;
  right_ptrs_arr = instance.right_ptrs_arr;
  left_ptrs_arr = instance.left_ptrs_arr;
  // ... never cleaned up!
}

Functions like couldsee() used these stale references from previous tests.

Fix: Clear module-level state after use:

function do_clear_area(instance) {
  viz_clear = instance.viz_clear;
  right_ptrs_arr = instance.right_ptrs_arr;
  left_ptrs_arr = instance.left_ptrs_arr;

  // ... do work ...

  // Clean up module state
  viz_clear = null;
  right_ptrs_arr = null;
  left_ptrs_arr = null;
}

Identity Counter Leak (commit 05cbd1b)

Symptom: Determinism tests failed - regenerating same seed produced different results.

Root cause: mkobj.js module-level counter never reset:

let _identCounter = 0;  // Monotonic ID counter for objects/monsters

export function next_ident() {
  const res = _identCounter;
  _identCounter += rnd(2);  // Consumes RNG!
  if (_identCounter === 0) _identCounter = rnd(2) + 1;
  return res;
}

Second test run started with _identCounter at 800+ instead of 0, consuming RNG calls at different times.

Fix: Reset in initialization:

export function resetIdentCounter() { _identCounter = 0; }

// Called from init_objects() in o_init.js
export function init_objects() {
  initObjectData();
  resetIdentCounter();  // Reset for each generation
  randomize_gem_colors();
  shuffle_all();
  // ...
}

Test Structure Fix (commit 05cbd1b)

Symptom: “Cannot read properties of undefined (reading ‘grid’)” when running with test filters.

Root cause: Map generation in it() test that Node’s test runner could skip:

let result;
it('generate maps', () => {
  result = generateMapsWithRng(seed, maxDepth);  // Skipped when filtered!
});
it('check depth 1 typGrid', () => {
  expect(result.grids[1]).toEqual(...);  // result is undefined!
});

Fix: Use before() hook which always runs:

let result;
before(() => {
  // Always runs, even when individual tests filtered
  result = generateMapsWithRng(seed, maxDepth);
});
it('check depth 1 typGrid', () => {
  expect(result.grids[1]).toEqual(...);  // result exists
});

Impact of test isolation fixes: +134 tests passing (1097 → 1231)

5. The Statue Bug

The Investigation (2026-02-09, ~5 hours)

Depth 3 diverged at RNG call 1094:

Calls 0-1093 matched perfectly. Then complete divergence.

Hypothesis 1: somexyspace returns null

Looking at fill_ordinary_room statue placement:

if (!rn2(20)) {  // Statue check (1/20 chance)
  const pos = somexyspace(map, croom);  // Find position
  if (pos) {
    const statue = mksobj(STATUE, true, false);  // Create statue
  }
}

Theory: JS’s somexyspace returns null while C’s succeeds, so JS never calls mksobj.

Evidence against: Debug logging showed both consumed identical RNG:

If somexyspace failed, it would retry with more RNG calls. It didn’t.

Hypothesis 2: mksobj doesn’t call rndmonnum

Theory: JS’s mksobj is broken and doesn’t select a monster for the statue.

Testing:

// Isolated test
const statue = mksobj(STATUE, true, false);
// Result: 11 RNG calls consumed including rndmonst_adj!
// JS[198]: rnd(2)=2 @ next_ident
// JS[199]: rn2(3)=2 @ rndmonst_adj
// JS[200]: rn2(4)=3 @ rndmonst_adj
// ...

Evidence against: mksobj works perfectly in isolation. The bug is elsewhere.

The Breakthrough

Comparing isolated test vs actual game:

Why different? Different monster pools! Let me check what depth rndmonnum uses…

// js/makemon.js
export function rndmonnum(depth) {
  return rndmonst_adj(0, 0, depth || 1);  // defaults to 1!
}

// js/mkobj.js - mksobj_init
case ROCK_CLASS:
  if (od.name === 'statue') {
    obj.corpsenm = rndmonnum();  // ← NO DEPTH PARAMETER!
  }

Root cause found: Statue creation calls rndmonnum() with no depth, defaulting to depth=1. At depth=1, the monster pool is smaller (fewer eligible monsters at low difficulty). This produces a different RNG sequence.

The Fix (commit d683995)

One line change:

  case ROCK_CLASS:
    if (od.name === 'statue') {
-     obj.corpsenm = rndmonnum();
+     obj.corpsenm = rndmonnum(_levelDepth);
    }

The _levelDepth module variable tracks current depth. Just pass it through.

The Result

Before fix:

Last matching call: 1093
First divergence at: 1094
JS[1094]: rn2(3)=2
C[1102]: rn2(2)=0 @ rndmonst_adj

After fix:

Last matching call: 1094
First divergence at: 1095
JS[1094]: rn2(2)=0      ← NOW MATCHES!
JS[1095]: rn2(5)=1
C[1103]: rn2(5)=1 @ rndmonst_adj

The “divergence” at 1095 is just formatting (JS logs don’t include source location). The function calls and values are identical.

Impact: +2 tests passing (1231 → 1233), seed163 depth 3 achieves 100% RNG alignment

6. Remaining Issues

seed16 depth 3

Still shows divergence after statue fix. Different issue than seed163. Needs separate investigation.

seed1 level 2 (interface-4i6)

Symptom: Steps 0-66 all pass (descent + level 2 generation RNG matches perfectly). Step 67 fails with completely different level layout.

Analysis:

This is an algorithmic divergence, not RNG mismatch. The same RNG sequence is interpreted differently:

Likely: theme room selection logic differs, or shop placement logic differs. Needs investigation of themeroom assignment and shop creation.

Depths 4-5

Blocked by depth 3 issues. Now unblocked for most seeds. Need to investigate remaining divergences.

7. Final Result

Test Results (as of 2026-02-09):

Map Generation Alignment:

Determinism: All 5 map sessions pass determinism validation (regenerating same seed produces identical results).

Gameplay: seed2_wizard_fountains (37 steps) and seed42 (12 steps) fully passing.

8. Lessons Learned

Test Infrastructure is Critical

Without proper test isolation, you’re debugging phantoms. The vision state leak made tests fail mysteriously depending on run order. Hours wasted debugging the wrong problem.

Rule: Module-level mutable state must be reset between tests.

Determinism Tests Catch State Leaks

The identity counter leak was invisible in single test runs. Only the determinism test (regenerate same seed twice, compare results) caught it.

Rule: Always test that seed N produces identical results on runs 1 and 2.

Default Parameters Hide Bugs

depth || 1 seemed reasonable for a default, but it was wrong. The bug lived silently for months because depth 1 testing didn’t expose it.

Rule: Be suspicious of defaults. Pass explicit parameters.

Binary Search is Your Best Friend

With 2400+ RNG calls per level, linear search is hopeless. Binary search finds the exact divergence call in seconds:

// Check if calls 0-mid all match
// If yes, search mid+1 to end
// If no, search 0 to mid-1

Rule: Build binary search into your debugging tools from day one.

RNG Traces are Gold

Without C’s annotated RNG logs showing function name and source location for every call, debugging would be impossible. You’d just see “numbers don’t match” with no clue why.

Rule: Invest in trace infrastructure early. It pays for itself 100x over.

One-Line Bugs Can Take Days

The statue bug was literally one missing parameter. Finding it took 5 hours of investigation:

Rule: Persistence pays off. Keep eliminating hypotheses until you find the truth.

9. The Art of Binary Search Debugging

When you have 2400 RNG calls and one is wrong, manual inspection is hopeless. Binary search is mandatory.

The Algorithm

Given two RNG traces (JS and C, filtered to remove composite/midlog entries):

let left = 0, right = jsRng.length - 1, lastMatch = -1;

while (left <= right) {
  const mid = Math.floor((left + right) / 2);

  // Check if calls 0 to mid all match
  let allMatch = true;
  for (let i = 0; i <= mid; i++) {
    if (jsRng[i] !== cRng[i]) {
      allMatch = false;
      break;
    }
  }

  if (allMatch) {
    lastMatch = mid;
    left = mid + 1;  // Search upper half
  } else {
    right = mid - 1;  // Search lower half
  }
}

console.log('Last match:', lastMatch);
console.log('First divergence:', lastMatch + 1);

Result: Pinpoints exact divergence in O(log N) comparisons.

The Context Window

Once you find the divergence point, show context:

console.log('Last 3 matching calls:');
for (let i = lastMatch - 2; i <= lastMatch; i++) {
  console.log(`JS[${i}]: ${jsRng[i]}`);
}

console.log('\nFirst diverging call:');
console.log(`JS[${lastMatch + 1}]: ${jsRng[lastMatch + 1]}`);
console.log(`C[${lastMatch + 1}]: ${cRng[lastMatch + 1]}`);

console.log('\nNext 5 calls in each:');
for (let i = lastMatch + 1; i < lastMatch + 6; i++) {
  console.log(`JS[${i}]: ${jsRng[i]}`);
  console.log(`C[${i}]: ${cRng[i]}`);
  console.log();
}

This shows:

Real Example: The Statue Bug

Last matching calls:
JS[1091]: rn2(7)=2
JS[1092]: rn2(3)=1
JS[1093]: rnd(2)=1

First diverging call:
JS[1094]: rn2(3)=2
C[1102]: rn2(2)=0 @ rndmonst_adj(makemon.c:1712)

Analysis:

Conclusion: JS used wrong depth, producing different monster pool, different RNG sequence.

The source annotations (@ rndmonst_adj(makemon.c:1712)) were the key. Without them, you’d just see “rn2(3) != rn2(2)” with no idea what code was involved.


Looking Forward

Phase 3 established:

Next challenges:

The foundation is solid. The methodology works. The hard problems are behind us. Now it’s “just” a matter of finding and fixing the remaining edge cases.

One bug at a time, one depth at a time, one RNG call at a time.