344 lines
12 KiB
ReStructuredText
344 lines
12 KiB
ReStructuredText
Z-Machine Garbled Output Investigation
|
|
========================================
|
|
|
|
Context
|
|
-------
|
|
|
|
Branch: ``zmachine-zork1-integration``
|
|
|
|
The hybrid z-machine interpreter runs Zork 1 but produces garbled output:
|
|
|
|
- Room name "West of House" prints correctly
|
|
- Room description shows "er " instead of full text
|
|
- After "look" command, output starts correctly then becomes garbage with ? characters
|
|
- "There is a small mailbox here." prints correctly
|
|
|
|
Investigation
|
|
-------------
|
|
|
|
The z-string decoder (ZStringTranslator, ZCharTranslator, ZsciiTranslator in
|
|
zstring.py) is correct. The abbreviation table loads correctly. All 96
|
|
abbreviations decode to the right text.
|
|
|
|
The diagnostic script (``scripts/debug_zstrings.py``) revealed the problem:
|
|
|
|
- ``op_print_paddr paddr=0x0051 (byte=0x00a2)`` decodes to "er "
|
|
- Address 0x00a2 is in the abbreviation table area, not where the room
|
|
description should be
|
|
- The z-string at 0x00a2 really is "er " - the decoder is right, the address
|
|
is wrong
|
|
|
|
Root cause: bugs in ``zobjectparser.py`` property parsing.
|
|
|
|
Bugs Found
|
|
----------
|
|
|
|
All bugs are in ``zobjectparser.py``.
|
|
|
|
Bug 1: Off-by-one in shortname skip (ROOT CAUSE)
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
In ``get_prop_addr_len()`` line 392::
|
|
|
|
addr += 2 * self._memory[addr]
|
|
|
|
This skips the shortname data but not the length byte. Compare with
|
|
``get_all_properties()`` which does it correctly::
|
|
|
|
shortname_length = self._memory[addr]
|
|
addr += 1 # skip length byte
|
|
addr += 2 * shortname_length # skip shortname data
|
|
|
|
The off-by-one means the property table scan starts 1 byte too early, reading
|
|
the last byte of the shortname z-string as if it were a property header. This
|
|
corrupts every subsequent property lookup.
|
|
|
|
Bug 2: BitField slice wrong width for V3 property number
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
In ``get_prop_addr_len()`` line 399 and ``get_all_properties()`` line 446::
|
|
|
|
pnum = bf[4:0]
|
|
|
|
BitField slice ``[4:0]`` has start > stop, so it swaps to ``[0:4]`` = 4 bits.
|
|
But V3 property numbers are 5 bits (bits 0-4). Should be ``bf[0:5]``.
|
|
|
|
This means property numbers 16-31 are read as 0-15 (bit 4 is lost).
|
|
|
|
Bug 3: BitField slice wrong width for V3 property size
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
In ``get_prop_addr_len()`` line 400, ``get_all_properties()`` line 447, and
|
|
``get_property_length()`` line 555::
|
|
|
|
size = bf[7:5] + 1
|
|
|
|
BitField slice ``[7:5]`` swaps to ``[5:7]`` = 2 bits. But V3 property sizes are
|
|
3 bits (bits 5-7). Should be ``bf[5:8]``.
|
|
|
|
This means properties with size > 4 bytes are read with the wrong size (e.g.,
|
|
size 5 reads as size 1), causing the property iterator to skip too few bytes.
|
|
|
|
Bug 4: Same BitField slice issue in V5 path
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
In ``get_prop_addr_len()`` lines 409, 413::
|
|
|
|
pnum = bf[5:0] # swaps to [0:5] = 5 bits, should be [0:6] = 6 bits
|
|
size = bf2[5:0] # swaps to [0:5] = 5 bits, should be [0:6] = 6 bits
|
|
|
|
Compare with ``get_all_properties()`` V5 path which correctly uses ``bf[0:6]``.
|
|
|
|
Also in ``get_property_length()`` line 563: same ``bf2[5:0]`` issue.
|
|
|
|
How to Fix
|
|
----------
|
|
|
|
All fixes are in ``zobjectparser.py``:
|
|
|
|
1. Line 392: change ``addr += 2 * self._memory[addr]`` to::
|
|
|
|
addr += 1 + 2 * self._memory[addr]
|
|
|
|
2. Lines 399, 446: change ``bf[4:0]`` to ``bf[0:5]``
|
|
|
|
3. Lines 400, 447, 555: change ``bf[7:5]`` to ``bf[5:8]``
|
|
|
|
4. Lines 409, 413, 563: change ``bf[5:0]`` to ``bf[0:6]``
|
|
|
|
The pattern: BitField uses Python slice semantics ``[start:stop)`` where the
|
|
number of bits extracted is ``stop - start``. The buggy code used ``[high:low]``
|
|
notation thinking it would extract bits high through low inclusive, but the swap
|
|
logic makes it extract fewer bits than intended.
|
|
|
|
This is the same class of bug as the write_global fix in commit e5329d6
|
|
(BitField slice extracting 7 bits not 8).
|
|
|
|
Diagnostic Tools
|
|
----------------
|
|
|
|
``scripts/debug_zstrings.py`` traces all z-string decoding with caller info.
|
|
|
|
Run with::
|
|
|
|
echo -e "look\nquit\ny" | python3 scripts/debug_zstrings.py 2>/tmp/trace.log
|
|
|
|
Trace goes to stderr, game output to stdout.
|
|
|
|
Session 2: Additional Bugs Found and Fixed
|
|
-------------------------------------------
|
|
|
|
Bug 5: Same off-by-one in get_next_property
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
``get_next_property()`` had the same shortname skip bug as Bug 1::
|
|
|
|
addr += 2 * self._memory[addr]
|
|
|
|
Same fix: ``addr += 1 + 2 * self._memory[addr]``
|
|
|
|
Also had the same BitField slice bugs (Bugs 2, 4) for the V3/V5
|
|
property number return values.
|
|
|
|
Bug 6: get_property_length reads wrong byte for V5 two-byte headers
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
In the V5 two-byte property header path, the code read the size from
|
|
the first header byte (which contains the property number) instead of
|
|
the second header byte (which contains the size)::
|
|
|
|
bf2 = BitField(self._memory[data_address - 2]) # WRONG: first byte
|
|
size = bf2[0:6]
|
|
|
|
The second byte was already read into ``bf`` at ``data_address - 1``.
|
|
Fix: use ``bf[0:6]`` directly instead of creating ``bf2``.
|
|
|
|
Bug 7: op_dec_chk and op_inc_chk use unsigned comparison
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
In ``zcpu.py``, both ``op_dec_chk`` and ``op_inc_chk`` compared 16-bit
|
|
values as unsigned integers instead of signed. The Z-machine spec
|
|
requires signed comparison.
|
|
|
|
Effect: when a counter decremented from 0 to 65535 (wrapping), the check
|
|
``65535 < 0`` evaluated false (unsigned), causing infinite loops in Z-code
|
|
that counted down past zero. This made Zork 1's word-printing loop run
|
|
forever, dumping raw memory bytes to the screen as characters.
|
|
|
|
Fix: wrap both operands with ``_make_signed()`` before comparing.
|
|
|
|
What's Fixed Now
|
|
-----------------
|
|
|
|
After all fixes:
|
|
|
|
- Room descriptions display correctly on startup
|
|
- The infinite garbage output loop is fixed (dec_chk now works with signed)
|
|
- ``just check`` passes (516 tests)
|
|
|
|
What's Still Broken
|
|
--------------------
|
|
|
|
Commands like "look", "open mailbox", "quit" all produce::
|
|
|
|
You used the word "look" in a way that I don't understand.
|
|
|
|
The parser finds words in the dictionary (dict_addr is correct, e.g.
|
|
"look" at 0x44f2) and stores them in the parse buffer correctly. But the
|
|
game's Z-code parser can't match them to valid verb syntax.
|
|
|
|
Instruction tracing after ``op_sread`` shows the game's parser:
|
|
|
|
1. Reads parse buffer correctly (loadw(0x2551, 1) -> dict_addr 17650)
|
|
2. Checks dict_addr against specific word addresses (je comparisons) -
|
|
these are special-word checks, none match
|
|
3. Calls a function with the dict_addr that reads ``loadb(17650, 4)`` to
|
|
get the word's flags byte (0x41) from the dictionary entry
|
|
4. Tests individual flag bits (16, 64, 8, 32, 128, 4) against 0x41
|
|
5. Bit 6 (64) matches - the function extracts data from byte 5 (0xc2)
|
|
6. Despite finding the word type, the parser still rejects the command
|
|
|
|
The bug is likely in how the parser uses the extracted word-type data to
|
|
look up verb syntax. Possible areas:
|
|
|
|
- The ``and`` / ``je`` sequence that extracts a sub-field from the flags
|
|
byte may produce the wrong value
|
|
- The verb number extracted from byte 5 (0xc2 = 194) may need signed
|
|
interpretation or bit masking
|
|
- Another opcode (``op_jl``, ``op_jg``, ``op_sub``) may have a similar
|
|
unsigned-vs-signed bug
|
|
- The ``call`` opcode may have incorrect return value handling
|
|
|
|
Debugging approach for next session:
|
|
|
|
1. Extend the instruction trace past 100 steps to see what happens after
|
|
the word-type function returns
|
|
2. Compare the execution trace against dfrotz instruction trace (add
|
|
``-t`` flag to dfrotz for tracing)
|
|
3. Check ``op_jl`` and ``op_jg`` for the same unsigned comparison bug
|
|
4. Check ``op_sub`` and ``op_mul`` for signed arithmetic issues
|
|
5. Look at how the ``and`` result (step 21 in trace) flows through to
|
|
the verb syntax lookup
|
|
|
|
Diagnostic Tools
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
``scripts/debug_zstrings.py`` traces all z-string decoding with caller info.
|
|
|
|
Run with::
|
|
|
|
echo -e "look\nquit\ny" | python3 scripts/debug_zstrings.py 2>/tmp/trace.log
|
|
|
|
Trace goes to stderr, game output to stdout.
|
|
|
|
To add instruction tracing after sread, add to ``zcpu.py``::
|
|
|
|
# In step(), after decoding:
|
|
print(f"PC={pc:#06x} {handler_name}({', '.join(str(a) for a in args)})",
|
|
file=sys.stderr)
|
|
|
|
Session 3: Parser Fixed - Interpreter Now Playable
|
|
---------------------------------------------------
|
|
|
|
Bug 8: op_jl uses unsigned comparison
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
In ``zcpu.py``, ``op_jl`` (jump if less-than) compared raw 16-bit values
|
|
without ``_make_signed()``, the same class of bug as Bug 7 (op_dec_chk
|
|
and op_inc_chk).
|
|
|
|
Additionally, ``op_jl`` had a non-standard signature::
|
|
|
|
def op_jl(self, a, *others):
|
|
for b in others:
|
|
if a < b:
|
|
...
|
|
|
|
The Z-machine spec says JL takes exactly 2 operands. Compare with
|
|
``op_jg`` which correctly uses ``def op_jg(self, a, b)``.
|
|
|
|
Fix: add ``_make_signed()`` calls to both operands, normalize signature
|
|
to ``(self, a, b)`` matching ``op_jg``.
|
|
|
|
This alone did not fix the parser - needed in combination with Bug 9.
|
|
|
|
Bug 9: finish_routine return value storage - THE PARSER BUG
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Two sub-bugs in ``finish_routine()`` in ``zstackmanager.py``:
|
|
|
|
**Range check used decimal instead of hex**::
|
|
|
|
if result_variable < 10: # WRONG: should be 0x10 (16)
|
|
self.set_local_variable(result_variable, result)
|
|
|
|
Z-machine variable numbering: 0 = stack push, 1-15 = locals, 16+ = globals.
|
|
The check ``< 10`` meant local variables 10-15 were written as globals.
|
|
|
|
**Missing index adjustment for 1-based variable numbering**::
|
|
|
|
self.set_local_variable(result_variable, result) # WRONG
|
|
|
|
Z-machine variables are 1-based (variable 1 = first local), but
|
|
``set_local_variable()`` uses 0-based indexing (index 0 = first local).
|
|
Should be ``set_local_variable(result_variable - 1, result)``.
|
|
|
|
Both ``_read_variable()`` and ``_write_result()`` in ``zcpu.py``
|
|
correctly used ``addr - 1`` and ``< 0x10``, but ``finish_routine()``
|
|
in ``zstackmanager.py`` did not.
|
|
|
|
Effect: when a function returned a value to a local variable, it went
|
|
to the wrong slot. The caller read the correct slot and got stale data
|
|
(0 or the initialization value from the function call).
|
|
|
|
This is why every verb was recognized by the dictionary but rejected by
|
|
the parser. The word-type checker function returned the correct type,
|
|
but the return value landed in the wrong local variable. The parser saw
|
|
0 and could not match any syntax pattern.
|
|
|
|
Room descriptions worked because those code paths used stack returns
|
|
(variable 0) or global returns, not local variable returns.
|
|
|
|
Fix: change ``< 10`` to ``< 0x10``, add ``- 1`` to the
|
|
``set_local_variable`` call.
|
|
|
|
Bug 10: zlexer does not truncate words for dictionary lookup
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
In ``zlexer.py``, V3 dictionary entries store words truncated to 6
|
|
characters (4 bytes of Z-string = 2 Z-words). V4+ uses 9 characters.
|
|
|
|
``parse_input()`` looked up full-length input words, so "mailbox" (7
|
|
characters) did not match "mailbo" (6 characters) in the dictionary.
|
|
|
|
Fix: truncate lookup key to ``6 if version <= 3 else 9`` before
|
|
``dict.get()``. The original word is preserved in the return list for
|
|
correct parse buffer positions.
|
|
|
|
What's Fixed Now
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
After all fixes:
|
|
|
|
- All Zork 1 commands work: look, open mailbox, read leaflet, go north,
|
|
inventory, quit, take, drop
|
|
- Navigation, object manipulation, multi-word commands, game logic all
|
|
functional
|
|
- The interpreter is playable
|
|
|
|
Diagnostic Tools
|
|
----------------
|
|
|
|
``scripts/debug_zstrings.py`` traces all z-string decoding with caller info.
|
|
|
|
Run with::
|
|
|
|
echo -e "look\nquit\ny" | python3 scripts/debug_zstrings.py 2>/tmp/trace.log
|
|
|
|
Trace goes to stderr, game output to stdout.
|
|
|
|
To add instruction tracing after sread, add to ``zcpu.py``::
|
|
|
|
# In step(), after decoding:
|
|
print(f"PC={pc:#06x} {handler_name}({', '.join(str(a) for a in args)})",
|
|
file=sys.stderr)
|