
=======================================================
zvm embedding audit — Z-machine interpreter feasibility
=======================================================

zvm is a Python Z-machine interpreter (by Ben Collins-Sussman) being evaluated
for embedding in mudlib to run interactive fiction games over telnet. this audit
covers architecture, isolation requirements, and modification paths. each
section closes with a comparison against the viola audit, so the two
interpreters are judged on the same criteria.


1. global state — mostly clean, two leaks
==========================================

state is instance-based. ``ZMachine.__init__`` wires everything together::

  ZMemory(story)         -- memory
  ZStringFactory(mem)    -- string decoding
  ZObjectParser(mem)     -- object tree
  ZStackManager(mem)     -- call/data stacks
  ZOpDecoder(mem, stack) -- instruction decoder
  ZStreamManager(mem, ui) -- I/O streams
  ZCpu(mem, opdecoder, stack, objects, string, streams, ui) -- CPU

dependency graph flows one way: ZCpu depends on everything else, everything
else depends on ZMemory. no circular dependencies. clean layering.

two global leaks:

- zlogging.py — executes at import time. opens debug.log and disasm.log in cwd,
  sets root logger to DEBUG. all instances share the same loggers
- zcpu.py uses ``random.seed()`` / ``random.randint()`` — global PRNG state.
  multiple instances would interfere with each other's randomness

both fixable with ~20 lines. the logging can be made instance-scoped, the PRNG
replaced with ``random.Random()`` instances.
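
a minimal sketch of both fixes, assuming a hypothetical ``InstanceScoped``
holder (not a zvm class) and an illustrative ``zvm.<name>`` logger naming
scheme. each instance owns its own ``random.Random`` and a named logger, and
nothing runs at import time:

.. code-block:: python

   import logging
   import random


   class InstanceScoped:
       """Hypothetical per-instance PRNG and logger holder (not zvm code)."""

       def __init__(self, name, seed=None):
           # private PRNG: seeding or drawing here never touches the global
           # random module state shared with other instances
           self.rng = random.Random(seed)
           # named logger with no handlers attached here; the embedding
           # application decides where records go
           self.log = logging.getLogger(f"zvm.{name}")

       def op_random(self, upper):
           """Return an integer in 1..upper, like the Z-machine random opcode."""
           return self.rng.randint(1, upper)


   a = InstanceScoped("game-a", seed=7)
   b = InstanceScoped("game-b", seed=7)
   # interleaved draws stay in lockstep because neither touches shared state
   pairs = [(a.op_random(100), b.op_random(100)) for _ in range(5)]

where the equivalent change would land in zcpu.py and zlogging.py is left to
whoever writes the patch.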

compared to viola's 13/18 modules with mutable globals, this is dramatically
better. multiple ZMachine instances in one process is structurally possible.


2. IO boundary — excellent, purpose-built for embedding
========================================================

this is zvm's strongest feature. the README explicitly states the design goal:
"no user interface. meant to be used as the backend in other programs."

IO is fully abstracted via a four-component ``ZUI`` object (zui.py)::

  ZUI(audio, screen, keyboard_input, filesystem)

each component is an abstract base class with NotImplementedError stubs:

- ZScreen (zscreen.py) — write(), split_window(), select_window(),
  set_cursor_position(), erase_window(), set_text_style(), set_text_color()
- ZInputStream (zstream.py) — read_line(), read_char() with full signatures
  for timed input, max length, terminating characters
- ZFilesystem (zfilesystem.py) — save_game(), restore_game(),
  open_transcript_file_for_writing/reading()
- ZAudio (zaudio.py) — play_bleep(), play_sound_effect()

trivialzui.py is the reference stdio implementation showing how to subclass.

for MUD embedding: implement ZScreen.write() to push to telnet session,
ZInputStream.read_line() to receive from telnet reader, ZFilesystem to store
saves in SQLite. natural fit.
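
a rough sketch of the adapter shape, using stand-in base classes so the example
is self-contained. the ``read_line`` parameter names are copied from this
audit's notes and have not been checked against zvm's source. the
queue-per-session transport is an assumption about how mudlib would feed the
interpreter:

.. code-block:: python

   import queue


   class ZScreen:
       """Stand-in for zvm's abstract screen; only write() is sketched."""

       def write(self, string):
           raise NotImplementedError


   class ZInputStream:
       """Stand-in for zvm's abstract input stream; only read_line() is sketched."""

       def read_line(self, original_text=None, max_length=0,
                     terminating_characters=None,
                     timed_input_routine=None, timed_input_interval=0):
           raise NotImplementedError


   class TelnetScreen(ZScreen):
       """Pushes interpreter output onto a per-session outbound queue."""

       def __init__(self, outbound):
           self._outbound = outbound

       def write(self, string):
           # assumes the interpreter emits bare "\n"; telnet wants CRLF
           self._outbound.put(string.replace("\n", "\r\n"))


   class TelnetInput(ZInputStream):
       """Blocks the interpreter thread until the session delivers a line."""

       def __init__(self, inbound):
           self._inbound = inbound

       def read_line(self, original_text=None, max_length=0,
                     terminating_characters=None,
                     timed_input_routine=None, timed_input_interval=0):
           line = self._inbound.get()  # blocks until a line arrives
           if max_length:
               line = line[:max_length]
           return line


   outbound, inbound = queue.Queue(), queue.Queue()
   screen, keyboard = TelnetScreen(outbound), TelnetInput(inbound)
   screen.write("West of House\n")
   inbound.put("open mailbox")

the blocking ``get()`` implies one interpreter thread per session. an async
design would need a different input path.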

compared to viola where pygame is hardwired and you'd need to create a
TelnetInput class from scratch, zvm hands you the interface contract.


3. error paths — server-friendly
=================================

zero sys.exit() calls in the zvm/ package. the only sys.exit() is in the CLI
runner run_story.py, which is appropriate.

clean exception hierarchy:

- ZMachineError (zmachine.py)
- ZCpuError -> ZCpuIllegalInstruction, ZCpuDivideByZero, ZCpuNotImplemented
- ZMemoryError -> ZMemoryIllegalWrite, ZMemoryOutOfBounds, ZMemoryBadMemoryLayout
- ZObjectError -> ZObjectIllegalObjectNumber, ZObjectIllegalAttributeNumber
- ZStackError -> ZStackNoRoutine, ZStackNoSuchVariable, ZStackPopError
- QuetzalError -> QuetzalMalformedChunk, QuetzalMismatchedFile

CPU run loop does not catch exceptions — errors propagate to the caller.
correct behavior for embedding.

one weakness: some opcodes have bare ``assert`` statements that would be
stripped with ``python -O``.
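
a sketch of the replacement for such asserts, using stand-in exception classes
named after zvm's. ``check_operand_count`` and ``run_guarded`` are hypothetical
helpers, not zvm functions:

.. code-block:: python

   class ZCpuError(Exception):
       """Stand-in for zvm's exception of the same name."""


   class ZCpuIllegalInstruction(ZCpuError):
       """Stand-in for zvm's exception of the same name."""


   def check_operand_count(operands, expected):
       # an `assert len(operands) == expected` would vanish under -O;
       # an explicit raise survives and stays inside the ZCpuError hierarchy
       if len(operands) != expected:
           raise ZCpuIllegalInstruction(
               f"expected {expected} operands, got {len(operands)}"
           )


   def run_guarded(run):
       """Hypothetical server-side wrapper: one game's failure ends one session."""
       try:
           run()
       except ZCpuError as exc:
           return f"game halted: {exc}"
       return "game finished"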

compared to viola's 8+ sys.exit() calls via error.fatal(), this is exactly
what you want for a server.


4. version coverage — V1-V5 declared, only V5 tested
======================================================

zmemory.py handles v1-v5 with version-switched code throughout. v6-v8 are
not supported (raises ZMemoryUnsupportedVersion).

opcode table (zcpu.py) has version annotations like ``(op_call_2s, 4)``
meaning "available from v4 onward". systematic and well-done.
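
the version-gating idea can be sketched as follows. the table shape, the
``lookup`` helper and the placeholder handler bodies are illustrative
assumptions, not zcpu.py's actual structure:

.. code-block:: python

   def op_je(cpu, a, b):
       """Placeholder body; a real handler would branch."""
       return "je"


   def op_call_2s(cpu, routine, arg):
       """Placeholder body; a real handler would push a call frame."""
       return "call_2s"


   # hypothetical table keyed by (opcode class, number); each value pairs the
   # handler with the first Z-machine version in which the opcode exists
   OPCODES = {
       ("2OP", 1): (op_je, 1),
       ("2OP", 25): (op_call_2s, 4),
   }


   def lookup(opcode_class, opcode_number, version):
       """Return the handler, or None if the opcode is unknown in this version."""
       entry = OPCODES.get((opcode_class, opcode_number))
       if entry is None:
           return None
       handler, min_version = entry
       return handler if version >= min_version else None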

v1-v3 object model uses 1-byte pointers (max 255 objects), v4-v5 uses
2-byte (max 65535). properly handled in zobjectparser.py.
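
the size difference follows from the object table layout in the Z-machine
standard: v1-v3 entries are 9 bytes after a 31-word property defaults table,
v4-v5 entries are 14 bytes after a 63-word table. a sketch of the address
arithmetic (``object_entry_address`` is a hypothetical helper, not
zobjectparser.py's code):

.. code-block:: python

   def object_entry_address(table_base, objectnum, version):
       """Byte address of an object's entry, given the object table base."""
       if 1 <= version <= 3:
           # 4 attribute bytes + 3 one-byte links + 2-byte property pointer
           defaults_words, entry_size, max_objects = 31, 9, 255
       elif 4 <= version <= 5:
           # 6 attribute bytes + 3 two-byte links + 2-byte property pointer
           defaults_words, entry_size, max_objects = 63, 14, 65535
       else:
           raise ValueError(f"unsupported version {version}")
       if not 1 <= objectnum <= max_objects:
           raise ValueError(f"illegal object number {objectnum}")
       # the property defaults table (2 bytes per word) precedes the entries,
       # and objects are numbered from 1
       return table_base + defaults_words * 2 + (objectnum - 1) * entry_size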

test suite only uses curses.z5 — v1-v3 support exists structurally but has
untested code paths.

compared to viola's solid V1-V5 plus partial V6 and theoretical V7-V8,
zvm is narrower but the architecture is cleaner where it does exist.


5. input model — abstracted but unimplemented
===============================================

line input (read_line):
  interface complete — full signature with original_text, max_length,
  terminating_characters, timed_input_routine, timed_input_interval.
  trivial implementation handles basic line editing, ignores timed input.

single char (read_char):
  interface complete — signature includes timed_input_routine and
  timed_input_interval. CPU explicitly raises ZCpuNotImplemented if
  time/routine are nonzero.

timed input:
  feature flag system exists (``features["has_timed_input"] = False``).
  completely unimplemented at CPU level.

**critical problem**: op_sread (v1-v3) has an empty body — silently does
nothing. op_sread_v4 and op_aread (v5) both raise ZCpuNotImplemented.
the interpreter cannot accept text input. it cannot reach the first prompt
of any game.

compared to viola where input works and needs an adapter, zvm has the
better interface design but no working implementation behind it.


6. step execution mode — not available, easy refactor
======================================================

run loop in zcpu.py::

  def run(self):
      while True:
          (opcode_class, opcode_number, operands) = self._opdecoder.get_next_instruction()
          implemented, func = self._get_handler(opcode_class, opcode_number)
          if not implemented:
              break
          func(self, *operands)

tight while-True with no yielding. no step() method, no async support, no
callbacks between instructions, no instruction count limit.

the loop body is clean and self-contained though. extracting step() is a ~5
line change — pull the body into a method, call it from run().

for async MUD embedding, options are:

1. extract step() and call from an async loop with awaits at IO points
2. run each ZMachine in a thread (viable since state is instance-based)
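
option 1 can be sketched with a toy CPU standing in for ZCpu. ``ToyCpu``,
``run_cooperatively`` and the ``budget`` parameter are all assumptions made for
illustration. the real step() would wrap the decode-and-dispatch body of the
loop above:

.. code-block:: python

   import asyncio


   class ToyCpu:
       """Stand-in for ZCpu: 'instructions' are just callables in a list."""

       def __init__(self, program):
           self._program = program
           self._pc = 0
           self.executed = 0

       def step(self):
           """Execute one instruction; return False when nothing is left."""
           if self._pc >= len(self._program):
               return False
           instruction = self._program[self._pc]
           self._pc += 1
           instruction()
           self.executed += 1
           return True

       def run(self):
           # the original blocking behaviour, now expressed via step()
           while self.step():
               pass


   async def run_cooperatively(cpu, budget=1000):
       """Run cpu.step() in slices, yielding to the event loop between slices."""
       while True:
           for _ in range(budget):
               if not cpu.step():
                   return
           await asyncio.sleep(0)  # let other sessions run


   output = []
   cpu = ToyCpu([lambda: output.append("tick")] * 2500)
   asyncio.run(run_cooperatively(cpu))

the instruction budget also gives a natural place to cap runaway games, which
the current loop cannot do.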

compared to viola's similar tight loop, the refactor here is actually easier
because the body is simpler (no interrupt handling, no recursive execloop).


7. memory/cleanup — clean, no leak risk
=========================================

memory is a bytearray, fixed size, bounded by story file.
ZMachine.__init__ creates two copies (pristine + working).

ZStackManager uses a Python list as call stack. frames are popped in
finish_routine(). bounded by Z-machine stack design.

ZStringFactory, ZCharTranslator, ZLexer all pre-load from story file at init.
bounded by file content.

ZObjectParser does not cache anything — every access reads directly from
ZMemory.

one minor resource leak (a file handle, not memory): quetzal.py QuetzalParser
stores self._file and closes it manually instead of using a context manager,
so an exception during parsing would leak the handle.
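
a sketch of the context-manager shape, using a hypothetical
``read_form_length`` helper that only checks the IFF header of a Quetzal file
(``FORM`` container, ``IFZS`` form type). it is not QuetzalParser's code:

.. code-block:: python

   def read_form_length(path):
       """Return the FORM length; the file is closed even if validation raises."""
       with open(path, "rb") as f:
           header = f.read(12)
           if len(header) < 12 or header[:4] != b"FORM" or header[8:12] != b"IFZS":
               raise ValueError("not a Quetzal file")
           # IFF chunk lengths are 4-byte big-endian integers
           return int.from_bytes(header[4:8], "big")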

compared to viola's 5+ unbounded growth patterns (undo stack, command history,
routine cache, word cache, object caches), zvm is dramatically cleaner.


8. object tree API — complete read, mostly complete write
==========================================================

zobjectparser.py public API:

read:

- get_attribute(objectnum, attrnum) — single attribute (0/1)
- get_all_attributes(objectnum) — list of all set attribute numbers
- get_parent(objectnum) — parent object number
- get_child(objectnum) — first child
- get_sibling(objectnum) — next sibling
- get_shortname(objectnum) — object's short name as string
- get_prop(objectnum, propnum) — property value
- get_prop_addr_len(objectnum, propnum) — property address and length
- get_all_properties(objectnum) — dict of all properties
- describe_object(objectnum) — debug pretty-printer

write:

- set_parent(objectnum, new_parent_num)
- set_child(objectnum, new_child_num)
- set_sibling(objectnum, new_sibling_num)
- insert_object(parent_object, new_child) — handles unlinking
- set_property(objectnum, propnum, value)

missing: set_attribute() and clear_attribute(). the parser can read attributes
but cannot set them. would need to be added.
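
a sketch of what the missing writers would do at the byte level, assuming the
caller already has the address of the object's entry. the function names and
signatures are hypothetical. the bit order follows the Z-machine standard:
attribute 0 is the top bit of the first attribute byte:

.. code-block:: python

   def _attr_location(entry_addr, attrnum, version):
       """Return (byte address, bit mask) for an attribute flag."""
       max_attr = 31 if version <= 3 else 47
       if not 0 <= attrnum <= max_attr:
           raise ValueError(f"illegal attribute number {attrnum}")
       return entry_addr + attrnum // 8, 0x80 >> (attrnum % 8)


   def get_attribute(memory, entry_addr, attrnum, version):
       addr, mask = _attr_location(entry_addr, attrnum, version)
       return 1 if memory[addr] & mask else 0


   def set_attribute(memory, entry_addr, attrnum, version):
       addr, mask = _attr_location(entry_addr, attrnum, version)
       memory[addr] |= mask


   def clear_attribute(memory, entry_addr, attrnum, version):
       addr, mask = _attr_location(entry_addr, attrnum, version)
       memory[addr] &= ~mask & 0xFF  # keep the result inside one byte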

bug in insert_object(): sibling walk loop at line 273 never advances current
or prev — would infinite-loop. needs fix.
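
the intended unlink logic, sketched on a toy tree rather than ZObjectParser
(``ToyTree`` and its dict storage are assumptions for illustration). the key
line is the one that advances the cursor on every pass:

.. code-block:: python

   class ToyTree:
       """Stand-in object tree (not ZObjectParser); 0 means 'no object'."""

       def __init__(self, count):
           numbers = range(1, count + 1)
           self.parent = dict.fromkeys(numbers, 0)
           self.child = dict.fromkeys(numbers, 0)
           self.sibling = dict.fromkeys(numbers, 0)

       def remove_object(self, objectnum):
           """Unlink objectnum from its parent's child chain."""
           parent = self.parent[objectnum]
           if parent == 0:
               return
           if self.child[parent] == objectnum:
               self.child[parent] = self.sibling[objectnum]
           else:
               prev = self.child[parent]
               while prev != 0 and self.sibling[prev] != objectnum:
                   prev = self.sibling[prev]  # advance, or the loop never ends
               if prev == 0:
                   raise ValueError(f"object {objectnum} not in parent's chain")
               self.sibling[prev] = self.sibling[objectnum]
           self.parent[objectnum] = 0
           self.sibling[objectnum] = 0

       def insert_object(self, parent_object, new_child):
           """Make new_child the first child of parent_object."""
           self.remove_object(new_child)
           self.sibling[new_child] = self.child[parent_object]
           self.child[parent_object] = new_child
           self.parent[new_child] = parent_object


   tree = ToyTree(5)
   for obj in (2, 3, 4):
       tree.insert_object(1, obj)  # chain under 1 is now 4 -> 3 -> 2
   tree.insert_object(5, 2)        # 2 is last in the chain, so the walk runs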

**critical for MUD embedding**: many CPU opcodes that USE the parser are
unimplemented at the CPU level:

- op_test_attr, op_set_attr, op_clear_attr — not wired up
- op_get_sibling — not wired up (parser method exists)
- op_jin (test parent) — not wired up
- op_remove_obj, op_print_obj — not wired up
- op_get_prop_addr, op_get_next_prop, op_get_prop_len — not wired up

the parser infrastructure is there and correct. the CPU just doesn't call it.

compared to viola's zcode/objects.py which has working accessors wired through
all opcodes, zvm has a better parser design but the plumbing is incomplete.


9. save/restore — parse works, write is stubbed
=================================================

quetzal.py QuetzalParser can parse Quetzal save files:

- IFhd chunks (metadata) — working
- CMem chunks (compressed memory) — working
- UMem chunks (uncompressed memory) — has a bug (wrong attribute name)
- Stks chunks (stack frames) — working

QuetzalWriter is almost entirely stubbed. all three data methods return "0".
file writing logic exists but writes nonsense.

all save/restore CPU opcodes (op_save, op_restore, op_save_v4, op_restore_v4,
op_save_v5, op_restore_v5, op_save_undo, op_restore_undo) raise
ZCpuNotImplemented.

alternative for MUD: since ZMemory is a bytearray, could snapshot/restore
raw memory + stack state directly without Quetzal format.
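
a sketch of that alternative on a toy machine. ``ToyMachine`` and its three
fields are assumptions. a real snapshot would have to capture whatever state
ZStackManager and ZCpu actually hold, and the result is not portable to other
interpreters the way a Quetzal file is:

.. code-block:: python

   import copy


   class ToyMachine:
       """Stand-in for a ZMachine: memory, a list of stack frames, and a pc."""

       def __init__(self, story):
           self.memory = bytearray(story)
           self.stack = []
           self.pc = 0


   def snapshot(machine):
       """Capture an independent copy of the mutable interpreter state."""
       return {
           "memory": bytes(machine.memory),          # immutable copy
           "stack": copy.deepcopy(machine.stack),    # frames may nest lists
           "pc": machine.pc,
       }


   def restore(machine, snap):
       """Overwrite the machine's state in place from a snapshot."""
       machine.memory[:] = snap["memory"]
       machine.stack = copy.deepcopy(snap["stack"])
       machine.pc = snap["pc"]

the snapshot dict holds only bytes, lists, dicts and ints in this sketch, so it
could be serialised into the same SQLite store as other session data.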

compared to viola's working quetzal implementation, zvm's save system needs
significant work.


10. test suite — minimal
=========================

5 registered test modules:

- bitfield_tests.py — 6 tests, BitField bit manipulation
- zscii_tests.py — 4 tests, string encoding/decoding
- lexer_tests.py — 3 tests, dictionary parsing
- quetzal_tests.py — 2 tests, save file parsing
- glk_tests.py — 7 tests, requires compiled CheapGlk .so

not tested at all: ZCpu (no opcode tests), ZMemory, ZObjectParser,
ZStackManager, ZOpDecoder, ZStreamManager.

all tests that need a story file use stories/curses.z5 with hardcoded paths.

compared to viola which has no tests at all, zvm has some but they don't
cover the critical subsystems.


11. dependencies — zero
========================

pure stdlib. setup.py declares no dependencies. python >= 3.6.

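uses: logging, random, time, itertools, re, chunk, os, sys.
ctypes only for optional Glk native integration.

caveat: ``chunk`` was deprecated in Python 3.11 and removed in 3.13 (PEP 594),
so on current Python the module that imports it needs a replacement IFF reader.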

both viola and zvm are pure stdlib. no advantage either way.


12. completeness — the dealbreaker
====================================

of ~108 Z-machine opcodes, zvm implements approximately 46. the remaining ~62
are stubbed with ZCpuNotImplemented or have empty bodies.

unimplemented opcodes include fundamentals:

- ALL line-input opcodes (op_sread, op_sread_v4, op_aread) — cannot accept player input
- ALL save/restore opcodes — cannot save or load games
- critical object opcodes (test_attr, set_attr, clear_attr, get_sibling,
  remove_obj, print_obj at CPU level)
- many branch/comparison ops
- string printing variants

the interpreter cannot execute any real interactive fiction game to
completion. it cannot reach the first prompt of Zork.


verdict — comparison with viola
================================

+---------------------+-----------------------+----------------+
| criterion           | zvm                   | viola          |
+---------------------+-----------------------+----------------+
| IO abstraction      | excellent             | needs adapter  |
| global state        | mostly clean          | deeply tangled |
| multi-instance      | structurally possible | process-only   |
| error handling      | exceptions            | sys.exit()     |
| memory leaks        | none                  | 5+ patterns    |
| object tree parser  | mostly complete       | complete       |
| object tree opcodes | ~half wired           | all wired      |
| opcode coverage     | ~46/108               | all V1-V5      |
| can run a game      | NO                    | YES            |
| input handling      | interface only        | working        |
| save/restore        | parse only            | working        |
| dependencies        | zero                  | zero           |
| tests               | minimal               | none           |
| maintenance         | abandoned             | abandoned      |
+---------------------+-----------------------+----------------+

zvm has the architecture you'd want. viola has the implementation you'd need.

zvm is a well-designed skeleton — clean IO abstraction, instance-based state,
proper exceptions, no memory leaks. but it's roughly half-built: ~46 of ~108
opcodes. finishing the ~62 missing opcodes is weeks of work, comparable to
writing a new interpreter, except you're also debugging someone else's partial
implementation.

viola is a working interpreter with terrible architecture for embedding —
global state everywhere, sys.exit() in error paths, pygame hardwired. but it
can run Zork right now. the refactoring targets are known and bounded.


pragmatic path
===============

the "moldable world" vision (levels 3-5 from the design discussion) requires
being inside the interpreter with access to the object tree. both interpreters
have the parser infrastructure for this.

option A — fix viola's embedding problems:
  1. patch error.fatal() to raise (small)
  2. swap pygame IO for telnet adapter (medium)
  3. add cleanup hooks for caches (small)
  4. subprocess isolation handles global state (free)

  total: working IF in a MUD, with known limitations on multi-instance

option B — finish zvm's implementation:
  1. implement ~62 missing opcodes (large)
  2. fix insert_object bug (small)
  3. add set_attribute/clear_attribute to parser (small)
  4. complete save writer (medium)

  total: clean embeddable interpreter, but weeks of opcode work first

option C — write our own interpreter:
  designed for embedding from day one. state is a first-class object. object
  tree is an API. multiple games in one process. but it's the longest path and
  testing against real games is the hard part.

option D — hybrid:
  use zvm's architecture (ZUI interface, exception model, instance-based state)
  as the skeleton. port viola's working opcode implementations into it. gets
  the clean design with the working code. medium effort, high reward.

the hybrid path is probably the most interesting. zvm got the hard design
decisions right. viola got the hard implementation work done. merging the two
is less work than either finishing zvm or refactoring viola.