Research Synthesis - Editor-Agnostic CLI Collaboration

THE CORE PROBLEM:

Zed and VS Code (via Live Share) offer polished real-time collaboration, but each locks you into its own editor. If you use vim, helix, or kakoune and want to pair program with a friend, neither of you should have to switch editors. The goal: divorce collaborative editing from any specific editor.

EXISTING APPROACHES ANALYZED:

1. Terminal Multiplexing (upterm, tmate, tmux sharing)

   How it works: Share a PTY over the network. Everyone sees the same terminal output, keystrokes forwarded to the shell.

   Upterm specifically: Reverse SSH tunnel to a central server, clients connect through it. MultiWriter pattern broadcasts output to all connected clients.

   Pros: Works TODAY with any CLI editor. Zero editor integration needed. Good for "let me show you something" pair programming.

   Cons: No concurrent editing (everyone's keystrokes interleave into the same shell, so two people typing at once corrupt each other's input). No offline work. No semantic awareness of the document. Not true collaborative editing.

   Verdict: Great for terminal screenshare, not for document collaboration.
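
   The broadcast half of this model is small. The sketch below is not upterm's Go code; it is a hypothetical Python stand-in (the class and method names are made up here) showing the idea of a MultiWriter: one stream of PTY output is copied to every attached client.

```python
import io


class MultiWriter:
    """Fan one stream of terminal output out to every attached client."""

    def __init__(self):
        self._clients = []

    def attach(self, writer):
        self._clients.append(writer)

    def detach(self, writer):
        self._clients.remove(writer)

    def write(self, data: bytes) -> int:
        # Every client receives the same bytes, in the same order.
        for client in list(self._clients):
            client.write(data)
        return len(data)


# In-memory buffers stand in for network connections.
host_view, guest_view = io.BytesIO(), io.BytesIO()
pty_output = MultiWriter()
pty_output.attach(host_view)
pty_output.attach(guest_view)
pty_output.write(b"$ vim main.rs\r\n")
```

   Note what is missing: there is no document model at all, only bytes on a screen, which is why this approach cannot merge concurrent edits.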

2. File-Level Sync (VS Code Live Share style)

   How it works: Host owns the workspace. Guests get proxied file access. SSH protocol with relay fallback.

   Not actually CRDT-based - more like remote desktop for code.

   Sessions expire after 24 hours. P2P when possible, Microsoft relay otherwise.

   Verdict: Doesn't solve the editor-agnostic problem. Guests are still locked to the host's environment.

3. CRDT-Based Document Sync (Zed, instant.nvim)

   How it works: Each character gets a unique ID. Operations are "insert after ID xyz" not "insert at position 5". Concurrent edits automatically merge correctly.

   Zed's architecture: Anchors (logical positions), tombstone deletions, Lamport timestamps, version vectors, per-user undo maps. Server for auth/discovery, CRDT for document state.

   instant.nvim: Pure Lua implementation for Neovim. WebSocket server routes messages. Position IDs (tombstone vector clocks) for conflict-free ordering.

   Key insight from instant.nvim: roughly 70% of the code is editor-agnostic (transport + CRDT algorithm). Only about 30% is neovim-specific (buffer events, manipulation, cursor display).
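
   To make "insert after ID xyz" concrete, here is a minimal sketch of an RGA-style sequence CRDT. It is not Zed's or instant.nvim's actual algorithm, and all names are invented for illustration. It assumes operations arrive in causal order (an insert is seen before any op that refers to it) and uses (Lamport counter, replica ID) pairs as character IDs, with tombstones for deletions.

```python
from dataclasses import dataclass

ROOT = (0, "")  # sentinel ID meaning "start of document"


@dataclass
class Element:
    id: tuple              # (lamport_counter, replica_id), unique per character
    char: str
    deleted: bool = False  # tombstone: deleted characters stay as anchors


class Replica:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.clock = 0  # Lamport clock
        self.elements = [Element(ROOT, "")]

    def text(self):
        return "".join(e.char for e in self.elements if not e.deleted)

    def _visible_ids(self):
        return [e.id for e in self.elements[1:] if not e.deleted]

    def _index_of(self, elem_id):
        return next(i for i, e in enumerate(self.elements) if e.id == elem_id)

    def insert(self, pos, char):
        """Local insert at visible position `pos`; returns the op to broadcast."""
        parent = self._visible_ids()[pos - 1] if pos > 0 else ROOT
        self.clock += 1
        op = ("ins", (self.clock, self.replica_id), parent, char)
        self.apply(op)
        return op

    def delete(self, pos):
        """Local delete of the visible character at `pos`; returns the op."""
        op = ("del", self._visible_ids()[pos])
        self.apply(op)
        return op

    def apply(self, op):
        """Integrate a local or remote op. Assumes causal delivery."""
        if op[0] == "del":
            self.elements[self._index_of(op[1])].deleted = True
            return
        _, new_id, parent, char = op
        self.clock = max(self.clock, new_id[0])
        i = self._index_of(parent) + 1
        # Concurrent inserts after the same parent: larger IDs go first.
        while i < len(self.elements) and self.elements[i].id > new_id:
            i += 1
        self.elements.insert(i, Element(new_id, char))


a, b = Replica("a"), Replica("b")
for op in [a.insert(0, "h"), a.insert(1, "i")]:
    b.apply(op)

# Concurrent edits: each replica edits before seeing the other's op.
op_a = a.insert(2, "!")
op_b = b.insert(0, ">")
a.apply(op_b)
b.apply(op_a)
print(a.text(), b.text())  # >hi! >hi!
```

   Because each op names its neighbour by ID rather than by index, both replicas converge no matter which op they apply first. Production CRDTs such as cola or diamond-types replace the linear scans with far more efficient structures.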

THE PROPOSED ARCHITECTURE:

CRDT Daemon + Thin Editor Adapters

The daemon handles all the hard parts:
- CRDT text buffer (using cola or diamond-types)
- Network sync (WebSocket for remote, Unix socket for local)
- Session management
- Peer discovery/auth

Each editor gets a minimal adapter that:
1. Hooks into buffer change events
2. Serializes changes as (offset, length, text), with one agreed offset unit (bytes, code points, or UTF-16 units) across all adapters
3. Sends to daemon
4. Receives remote operations from daemon
5. Applies changes to local buffer
6. Optionally: displays peer cursors
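
Steps 2 through 5 can be sketched as a tiny message type. This is an illustration only: the `Edit` name, the JSON encoding, and the choice of Unicode code points as the offset unit are all assumptions, not a settled wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Edit:
    offset: int  # start of the replaced range, in code points from buffer start
    length: int  # number of code points removed (0 for a pure insert)
    text: str    # replacement text ("" for a pure delete)


def encode(edit: Edit) -> str:
    """Serialize an edit for sending to the daemon (step 2 and 3)."""
    return json.dumps(asdict(edit))


def decode(line: str) -> Edit:
    """Parse an edit received from the daemon (step 4)."""
    return Edit(**json.loads(line))


def apply_edit(buffer: str, edit: Edit) -> str:
    """Apply an edit to a local buffer (step 5)."""
    end = edit.offset + edit.length
    return buffer[:edit.offset] + edit.text + buffer[end:]


wire = encode(Edit(offset=6, length=5, text="there"))
result = apply_edit("hello world", decode(wire))
print(result)  # hello there
```

The same triple covers inserts, deletes, and replacements, which keeps the adapter side of the protocol to one message shape.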

Why this split works:
- Solving CRDT correctly is hard. Do it once in the daemon.
- Each editor's adapter is simple. Just event hooks and buffer manipulation.
- Adding new editors is cheap. Write a small plugin, done.
- Multiple different editors can collaborate simultaneously.

THE EDITOR ADAPTER REQUIREMENTS:

For any CLI editor to participate, the adapter needs:

1. Change event hook - Know when user edits the buffer
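   - Neovim: nvim_buf_attach with the on_lines callback (or on_bytes for byte-level ranges)
   - Helix: no stable plugin API yet, so LSP-based or custom events
   - Kakoune: hooks plus its shell/FIFO-based extension system
   - Vim: listener_add() in a plugin, with +channel or +clientserver to reach the daemon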

2. Buffer manipulation - Apply remote changes
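   - Neovim: nvim_buf_set_lines (or nvim_buf_set_text for sub-line edits)
   - Others: Similar APIs exist
   - All editors: applying a remote change fires the change hook again, so the adapter must ignore events it caused itself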

3. Cursor visualization (optional but nice) - Show where peers are editing
   - Neovim: nvim_buf_set_extmark with virtual text
   - Others: Editor-specific
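
One wrinkle in requirement 1: Neovim's on_lines callback reports a changed line range, while the daemon wants (offset, length, text). A rough sketch of the conversion follows. It assumes the adapter keeps a shadow copy of the buffer's previous lines, that every line ends in "\n", and that offsets count code points; the function names are hypothetical.

```python
def shrink(offset, removed, inserted):
    """Trim the common prefix and suffix so only the real change is sent."""
    p = 0
    while p < min(len(removed), len(inserted)) and removed[p] == inserted[p]:
        p += 1
    removed, inserted = removed[p:], inserted[p:]
    s = 0
    while s < min(len(removed), len(inserted)) and removed[-1 - s] == inserted[-1 - s]:
        s += 1
    removed = removed[:len(removed) - s]
    inserted = inserted[:len(inserted) - s]
    return offset + p, len(removed), inserted


def line_change_to_edit(old_lines, firstline, lastline, new_lines):
    """Lines [firstline, lastline) of old_lines were replaced by new_lines.

    Returns (offset, length, text) relative to the old buffer contents.
    """
    offset = sum(len(line) + 1 for line in old_lines[:firstline])
    removed = "".join(line + "\n" for line in old_lines[firstline:lastline])
    inserted = "".join(line + "\n" for line in new_lines)
    return shrink(offset, removed, inserted)


old = ["fn main() {", "    foo();", "}"]
edit = line_change_to_edit(old, 1, 2, ["    bar();"])
print(edit)  # (16, 3, 'bar')
```

Shrinking matters for a CRDT backend: sending the whole line as delete-plus-insert would make two people typing on the same line clobber each other's characters.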

THE LSP ANGLE:

Many CLI editors already speak LSP (Language Server Protocol). This is interesting because:

- textDocument/didChange already notifies of edits
- textDocument/didOpen and didClose handle lifecycle
- workspace/executeCommand can carry custom operations

A "collaboration language server" could:
1. Receive didChange notifications
2. Run them through CRDT
3. Push remote changes back to the editor via workspace/applyEdit requests

This would reduce per-editor work to almost zero - editors already have LSP clients. Worth exploring.
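
One detail a collaboration language server would have to handle: LSP positions are (line, character) pairs, and unless another encoding is negotiated the character is counted in UTF-16 code units. The sketch below converts one incremental didChange content change into (offset, length, text) in code points. The helper names are made up, and it assumes "\n" line endings and a server-side copy of the document text.

```python
def utf16_col_to_index(line, col):
    """Convert a UTF-16 code-unit column into a Python string index."""
    units = 0
    for i, ch in enumerate(line):
        if units >= col:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1  # astral chars take two units
    return len(line)


def position_to_offset(text, pos):
    """Convert an LSP Position dict into a code-point offset into text."""
    lines = text.split("\n")
    offset = sum(len(line) + 1 for line in lines[:pos["line"]])
    return offset + utf16_col_to_index(lines[pos["line"]], pos["character"])


def change_to_edit(text, change):
    """Convert one incremental content change into (offset, length, text)."""
    start = position_to_offset(text, change["range"]["start"])
    end = position_to_offset(text, change["range"]["end"])
    return start, end - start, change["text"]


doc = 'let s = "\U0001F600";\nok\n'
change = {
    "range": {
        "start": {"line": 1, "character": 0},
        "end": {"line": 1, "character": 2},
    },
    "text": "done",
}
print(change_to_edit(doc, change))  # (13, 2, 'done')
```

Clients that only support full-document sync send the whole text on every change, in which case the server would need to diff against its previous copy instead.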

CRDT LIBRARY CHOICE:

Cola (https://github.com/nomad/cola):
- Operation-based CRDT for text
- Buffer-agnostic: doesn't store text, just manages coordinates
- Clean API: Replica, Insertion, Deletion
- Real-time P2P focus
- Serialization via serde or custom encode
- Handles out-of-order delivery via backlog
- Its published benchmarks report it as 1.4-2x faster than diamond-types in some cases
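
The backlog idea can be illustrated independently of cola. The sketch below is not cola's implementation; it is a simplified stand-in that assumes each sender numbers its operations 1, 2, 3, ... and holds back any operation that arrives before its predecessors. The `Backlog` class and its methods are hypothetical names.

```python
from collections import defaultdict


class Backlog:
    """Hold early operations until everything before them has been applied."""

    def __init__(self, apply):
        self._apply = apply
        self._next = defaultdict(lambda: 1)  # next expected seq per sender
        self._held = defaultdict(dict)       # sender -> {seq: op}

    def receive(self, sender, seq, op):
        if seq < self._next[sender]:
            return  # duplicate of an op that was already applied
        self._held[sender][seq] = op
        # Drain every op that is now in order.
        while self._next[sender] in self._held[sender]:
            self._apply(self._held[sender].pop(self._next[sender]))
            self._next[sender] += 1

    def pending(self):
        return sum(len(held) for held in self._held.values())


applied = []
backlog = Backlog(applied.append)
backlog.receive("alice", 2, "insert 'b'")  # arrives early, held back
backlog.receive("alice", 1, "insert 'a'")  # unblocks both
print(applied)  # ["insert 'a'", "insert 'b'"]
```

With a library that does this internally, the transport layer is free to deliver messages in any order, which simplifies P2P and reconnect handling.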

Diamond-types (https://github.com/josephg/diamond-types):
- "World's fastest CRDT"
- Reported 5000x-80000x speedup through aggressive run-length encoding (RLE)
- Stores full history (temporal DAG + spatial state)
- More complex (OpLog, Branch, CausalGraph concepts)
- Great for: large documents, offline-first, audit trails
- WASM support for browser

For our use case: Cola wins.
- Simpler API, easier to integrate
- Real-time focus matches our needs
- We don't need full history storage
- Less cognitive overhead to work with

Diamond-types is overkill for initial prototyping. Could revisit for optimization later.

COMMUNICATION PROTOCOL OPTIONS:

1. Unix socket - Simple, local only. Good for same-machine testing.

2. WebSocket - Works remote. Browser-friendly if we ever want web UI. Good default.

3. stdio pipe - Simplest for CLI tools. Editor spawns daemon, communicates via stdin/stdout.

4. LSP protocol - Leverage existing infrastructure. Interesting, but might be an awkward fit.

Recommendation: WebSocket as primary (works local and remote), Unix socket as fast local alternative.
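
WebSocket frames messages for you; a raw Unix socket or stdio pipe does not, so those transports need their own framing. A minimal sketch, assuming a 4-byte big-endian length prefix followed by a JSON payload (this framing and the message fields are illustrative choices, not a decided protocol):

```python
import json
import socket
import struct


def send_msg(sock, msg):
    """Send one JSON message with a 4-byte big-endian length prefix."""
    payload = json.dumps(msg).encode("utf-8")
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes; a stream socket may return fewer per recv."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock):
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length))


# A connected socket pair stands in for adapter <-> daemon.
adapter, daemon = socket.socketpair()
message = {"doc": "src/main.rs", "offset": 16, "length": 3, "text": "bar"}
send_msg(adapter, message)
received = recv_msg(daemon)
adapter.close()
daemon.close()
```

Keeping the payload format identical across transports means an adapter can switch between a Unix socket and WebSocket without touching its edit logic.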

REFERENCE IMPLEMENTATIONS:

repos/cola/
- src/replica.rs: Main API, 1200+ lines of docs
- src/insertion.rs, deletion.rs: Operation types
- examples/basic.rs: Simple Document wrapper pattern
- Key pattern: editor maintains buffer + Replica, calls inserted/deleted for local ops, integrate_* for remote ops

repos/instant.nvim/
- lua/instant.lua: Main logic, mixed nvim + algorithm
- lua/instant/websocket_*.lua: Transport layer (portable)
- Position ID generation (genPID): Tombstone vector clocks
- Shows exactly what adapters need to do

repos/upterm/
- host/host.go: Session lifecycle
- io/writer.go: MultiWriter for output broadcast
- Different paradigm but useful for understanding terminal collaboration UX

repos/diamond-types/
- Complex internals, good for understanding CRDT optimization
- INTERNALS.md, BINARY.md explain the RLE approach

NEXT STEPS TO PROTOTYPE:

Phase 1: Minimal daemon
- Rust binary using cola
- Single document support
- WebSocket server
- Two clients can connect, edits sync

Phase 2: Neovim adapter
- Lua plugin
- Connects to daemon via WebSocket
- Hooks nvim_buf_attach for changes
- Applies remote changes via nvim_buf_set_lines
- Test: two neovim instances editing same file

Phase 3: Multi-document
- Session management
- File path mapping
- Join/leave notifications

Phase 4: Second editor
- Helix adapter (or kakoune, or vim)
- Prove the architecture works across editors

Phase 5: Polish
- Peer cursors
- User presence indicators
- Better auth (SSH keys, GitHub)
- Discovery service

OPEN QUESTIONS:

1. Where does the daemon run?
   - Local daemon per machine? Central server? Hybrid?
   - For local-first: daemon on each machine, P2P sync
   - For easy setup: central server handles routing

2. How to handle file paths?
   - Relative to project root? Absolute? UUID-based?
   - Need consistent naming across different machines

3. Undo/redo coordination?
   - Per-user undo (like Zed) or global?
   - Cola doesn't handle this - need to build on top

4. Cursor/selection sync?
   - Nice to have, not essential for MVP
   - Adds complexity (need to track peer positions)

5. Permissions?
   - Can anyone edit anything? Read-only viewers?
   - Future concern, not MVP
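
For question 2, one candidate answer is to identify documents by their POSIX-style path relative to the project root. The sketch below shows that option only; the function names are hypothetical, and it assumes every peer has the same project layout under its own root.

```python
from pathlib import Path, PurePosixPath


def document_id(project_root, file_path):
    """Map a local file to a machine-independent ID: its POSIX-style
    path relative to the project root."""
    root = Path(project_root).resolve()
    # relative_to raises ValueError if the file is outside the root.
    rel = Path(file_path).resolve().relative_to(root)
    return PurePosixPath(*rel.parts).as_posix()


def local_path(project_root, doc_id):
    """Map a document ID from a peer back to a path on this machine."""
    doc = PurePosixPath(doc_id)
    if doc.is_absolute() or ".." in doc.parts:
        # Never let a remote peer point outside the project.
        raise ValueError(f"unsafe document ID: {doc_id!r}")
    return Path(project_root).joinpath(*doc.parts)


doc_id = document_id("/tmp/proj", "/tmp/proj/src/main.rs")
print(doc_id)  # src/main.rs
```

UUID-based IDs would survive renames better, at the cost of a shared mapping table; relative paths need no extra state, which suits an MVP.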

THE DREAM:

You're in helix. Friend is in neovim. Another friend is in kakoune. You all open the same project, connect to a session, and just... edit together. Changes flow seamlessly. Each person uses their preferred editor with their preferred config. No one has to switch to an editor they don't normally use.

That's the goal.