colabbd/docs/synthesis.txt
2026-01-27 16:02:36 -05:00

Research Synthesis - Editor-Agnostic CLI Collaboration
THE CORE PROBLEM:
Zed and VSCode have beautiful real-time collaboration. But they lock you into their editors. If you're a vim/helix/kakoune user and want to pair program with a friend, you shouldn't have to make them switch editors. The goal: divorce collaborative editing from any specific editor.
EXISTING APPROACHES ANALYZED:
1. Terminal Multiplexing (upterm, tmate, tmux sharing)
How it works: Share a PTY over the network. Everyone sees the same terminal output, and everyone's keystrokes are forwarded to the same shell.
Upterm specifically: Reverse SSH tunnel to a central server, clients connect through it. MultiWriter pattern broadcasts output to all connected clients.
Pros: Works TODAY with any CLI editor. Zero editor integration needed. Good for "let me show you something" pair programming.
Cons: No concurrent editing (everyone's typing goes to the same shell). No offline editing. No semantic awareness of the document. Keystrokes simply interleave in arrival order, so this is not true collaborative editing.
Verdict: Great for terminal screenshare, not for document collaboration.
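To make the broadcast pattern concrete, here is a minimal Python sketch of a MultiWriter-style fan-out. It is not upterm's code (upterm is written in Go). The class and method names are hypothetical, and in-memory streams stand in for client connections.

```python
import io


class MultiWriter:
    """Fan one output stream out to every attached client (hypothetical sketch)."""

    def __init__(self):
        self._clients = {}

    def attach(self, name, stream):
        self._clients[name] = stream

    def detach(self, name):
        self._clients.pop(name, None)

    def write(self, data: bytes) -> int:
        # Copy the items so a failing client can be detached while iterating.
        for name, stream in list(self._clients.items()):
            try:
                stream.write(data)
            except (OSError, ValueError):
                # A broken or closed client must not stall the others.
                self.detach(name)
        return len(data)


# Stand-ins for two connected clients.
alice, bob = io.BytesIO(), io.BytesIO()
mw = MultiWriter()
mw.attach("alice", alice)
mw.attach("bob", bob)

mw.write(b"$ ls\n")
bob.close()               # simulate bob disconnecting
mw.write(b"main.rs\n")    # alice still receives this; bob is dropped
```

Note what the sketch does not have: any notion of a document. Every byte goes to every client, which is why this approach stays at the screenshare level.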
2. File-Level Sync (VSCode LiveShare style)
How it works: Host owns the workspace. Guests get proxied file access. SSH protocol with relay fallback.
Not actually CRDT-based - more like remote desktop for code.
Sessions expire after 24 hours. P2P when possible, Microsoft relay otherwise.
Verdict: Doesn't solve the editor-agnostic problem. Guests are still locked to the host's environment.
3. CRDT-Based Document Sync (Zed, instant.nvim)
How it works: Each character gets a unique ID. Operations are "insert after ID xyz" not "insert at position 5". Concurrent edits automatically merge correctly.
Zed's architecture: Anchors (logical positions), tombstone deletions, Lamport timestamps, version vectors, per-user undo maps. Server for auth/discovery, CRDT for document state.
instant.nvim: Pure Lua implementation for Neovim. WebSocket server routes messages. Position IDs (tombstone vector clocks) for conflict-free ordering.
Key insight from instant.nvim: 70% of the code is editor-agnostic (transport + CRDT algorithm). Only 30% is neovim-specific (buffer events, manipulation, cursor display).
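The "insert after ID xyz" idea can be sketched in a few dozen lines of Python. This is an RGA-style toy written for illustration, not the algorithm used by Zed, instant.nvim, or cola. It assumes every operation is delivered exactly once and only after the operations it depends on. All names here (Char, Replica, the op tuples) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Char:
    id: tuple              # (lamport_counter, site_id), unique per character
    value: str
    deleted: bool = False  # tombstone: deleted characters stay in the list


class Replica:
    def __init__(self, site):
        self.site = site
        self.clock = 0     # Lamport counter
        self.chars = []    # every Char ever inserted, in document order

    def text(self):
        return "".join(c.value for c in self.chars if not c.deleted)

    def _index_of(self, char_id):
        for i, c in enumerate(self.chars):
            if c.id == char_id:
                return i
        raise KeyError(char_id)

    def _visible_id_before(self, pos):
        """ID of the visible character left of visible position pos (None = start)."""
        if pos == 0:
            return None
        seen = 0
        for c in self.chars:
            if not c.deleted:
                seen += 1
                if seen == pos:
                    return c.id
        raise IndexError(pos)

    def local_insert(self, pos, value):
        self.clock += 1
        op = ("ins", (self.clock, self.site), self._visible_id_before(pos), value)
        self.apply(op)
        return op          # send this to the other replicas

    def local_delete(self, pos):
        op = ("del", self._visible_id_before(pos + 1))
        self.apply(op)
        return op

    def apply(self, op):
        if op[0] == "del":
            self.chars[self._index_of(op[1])].deleted = True
            return
        _, new_id, after_id, value = op
        self.clock = max(self.clock, new_id[0])
        i = 0 if after_id is None else self._index_of(after_id) + 1
        # Concurrent inserts after the same character: larger IDs go first,
        # so every replica picks the same order.
        while i < len(self.chars) and self.chars[i].id > new_id:
            i += 1
        self.chars.insert(i, Char(new_id, value))


a, b = Replica("a"), Replica("b")
for op in (a.local_insert(0, "h"), a.local_insert(1, "i")):
    b.apply(op)

# Concurrent edits: neither replica has seen the other's operations yet.
op_a = a.local_insert(2, "!")
ops_b = [b.local_insert(2, "?"), b.local_delete(0)]

b.apply(op_a)
for op in ops_b:
    a.apply(op)
```

Both replicas end up with the same text even though they applied the operations in different orders. A production CRDT adds run-length encoding, out-of-order buffering, and much faster lookups on top of this idea.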
THE PROPOSED ARCHITECTURE:
CRDT Daemon + Thin Editor Adapters
The daemon handles all the hard parts:
- CRDT text buffer (using cola or diamond-types)
- Network sync (WebSocket for remote, Unix socket for local)
- Session management
- Peer discovery/auth
Each editor gets a minimal adapter that:
1. Hooks into buffer change events
2. Serializes changes as (offset, length, text)
3. Sends to daemon
4. Receives remote operations from daemon
5. Applies changes to local buffer
6. Optionally: displays peer cursors
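Steps 2 through 5 can be sketched as follows. The JSON wire format and the names (Change, encode, decode, diff_to_change) are assumptions made for illustration, not a settled protocol. Offsets are Python string indices here; a real adapter would have to agree with the daemon on bytes vs. code points.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Change:
    offset: int  # where the edit starts
    length: int  # how many characters are removed at that offset
    text: str    # what is inserted in their place


def encode(change):
    # One JSON object per line (newline-delimited JSON).
    return json.dumps(asdict(change)) + "\n"


def decode(line):
    return Change(**json.loads(line))


def apply_change(buffer, change):
    end = change.offset + change.length
    return buffer[:change.offset] + change.text + buffer[end:]


def diff_to_change(old, new):
    """Derive one Change from two buffer snapshots via common prefix and suffix.

    Useful for editors that only report "buffer changed" without details.
    """
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return Change(prefix, len(old) - prefix - suffix, new[prefix:len(new) - suffix])


old = "hello world"
new = "hello brave world"
change = diff_to_change(old, new)   # what the local adapter sends
wire = encode(change)
remote = decode(wire)               # what a peer's adapter receives
result = apply_change(old, remote)
```

The adapter never needs to understand the CRDT. It only translates between editor events and (offset, length, text) triples.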
Why this split works:
- Solving CRDT correctly is hard. Do it once in the daemon.
- Each editor's adapter is simple. Just event hooks and buffer manipulation.
- Adding new editors is cheap. Write a small plugin, done.
- Multiple different editors can collaborate simultaneously.
THE EDITOR ADAPTER REQUIREMENTS:
For any CLI editor to participate, the adapter needs:
1. Change event hook - Know when user edits the buffer
- Neovim: nvim_buf_attach with on_lines callback
- Helix: LSP-based or custom events
- Kakoune: FIFO-based extension system
- Vim: +clientserver or plugin
2. Buffer manipulation - Apply remote changes
- Neovim: nvim_buf_set_lines
- Others: Similar APIs exist
3. Cursor visualization (optional but nice) - Show where peers are editing
- Neovim: nvim_buf_set_extmark with virtual text
- Others: Editor-specific
THE LSP ANGLE:
Many CLI editors already speak LSP (Language Server Protocol). This is interesting because:
- textDocument/didChange already notifies of edits
- textDocument/didOpen and didClose handle lifecycle
- workspace/executeCommand can carry custom operations
A "collaboration language server" could:
1. Receive didChange notifications
2. Run them through CRDT
3. Push remote changes back via workspace edits (workspace/applyEdit requests)
This would reduce per-editor work to almost zero - editors already have LSP clients. Worth exploring.
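A rough sketch of step 1, turning a didChange content change into the (offset, length, text) form the daemon would consume. It makes two simplifying assumptions: lines end in "\n" only, and every character is one UTF-16 code unit (LSP counts positions in UTF-16 code units by default). The function names are hypothetical.

```python
def position_to_offset(text, line, character):
    """Convert an LSP (line, character) position to a flat string offset."""
    lines = text.split("\n")
    if line >= len(lines):
        return len(text)
    start_of_line = sum(len(l) + 1 for l in lines[:line])
    # Clamp the column to the line length.
    return start_of_line + min(character, len(lines[line]))


def to_offset_change(text, content_change):
    """Map one entry of didChange's contentChanges to (offset, length, text)."""
    rng = content_change.get("range")
    if rng is None:
        # Full-document sync: the whole buffer is replaced.
        return (0, len(text), content_change["text"])
    start = position_to_offset(text, rng["start"]["line"], rng["start"]["character"])
    end = position_to_offset(text, rng["end"]["line"], rng["end"]["character"])
    return (start, end - start, content_change["text"])


def apply_change(text, change):
    offset, length, new_text = change
    return text[:offset] + new_text + text[offset + length:]


doc = "fn main() {\n}\n"
event = {
    "range": {
        "start": {"line": 1, "character": 0},
        "end": {"line": 1, "character": 0},
    },
    "text": "    println!();\n",
}
change = to_offset_change(doc, event)
updated = apply_change(doc, change)
```

The reverse direction (offset back to line/character for a workspace edit) is the mirror image. The encoding mismatch between UTF-16 positions and whatever the CRDT counts in is probably where most of the awkwardness would show up.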
CRDT LIBRARY CHOICE:
Cola (https://github.com/nomad/cola):
- Operation-based CRDT for text
- Buffer-agnostic: doesn't store text, just manages coordinates
- Clean API: Replica, Insertion, Deletion
- Real-time P2P focus
- Serialization via serde or custom encode
- Handles out-of-order delivery via backlog
- Benchmarks show 1.4-2x faster than diamond-types in some cases
Diamond-types (https://github.com/josephg/diamond-types):
- "World's fastest CRDT"
- Reported 5000x-80000x speedup, attributed to aggressive run-length encoding (RLE)
- Stores full history (temporal DAG + spatial state)
- More complex (OpLog, Branch, CausalGraph concepts)
- Great for: large documents, offline-first, audit trails
- WASM support for browser
For our use case: Cola wins.
- Simpler API, easier to integrate
- Real-time focus matches our needs
- We don't need full history storage
- Less cognitive overhead to work with
Diamond-types is overkill for initial prototyping. Could revisit for optimization later.
COMMUNICATION PROTOCOL OPTIONS:
1. Unix socket - Simple, local only. Good for same-machine testing.
2. WebSocket - Works remote. Browser-friendly if we ever want web UI. Good default.
3. stdio pipe - Simplest for CLI tools. Editor spawns daemon, communicates via stdin/stdout.
4. LSP protocol - Leverage existing infrastructure. Interesting but might be an awkward fit.
Recommendation: WebSocket as primary (works local and remote), Unix socket as fast local alternative.
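Whichever transport wins, byte-stream transports (Unix socket, stdio) need message framing, which WebSocket provides for free. A minimal sketch, assuming length-prefixed JSON messages. The message fields are made up for illustration, and socket.socketpair() stands in for the editor-to-daemon connection.

```python
import json
import socket
import struct


def send_msg(sock, obj):
    payload = json.dumps(obj).encode("utf-8")
    # 4-byte big-endian length prefix, then the JSON payload.
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def _recv_exact(sock, n):
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    (size,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, size).decode("utf-8"))


# Connected pair of sockets: one end plays the editor adapter, the other the daemon.
editor_side, daemon_side = socket.socketpair()
send_msg(editor_side, {"doc": "src/main.rs", "offset": 6, "length": 0, "text": "brave "})
send_msg(editor_side, {"doc": "src/main.rs", "offset": 0, "length": 1, "text": ""})
first = recv_msg(daemon_side)
second = recv_msg(daemon_side)
editor_side.close()
daemon_side.close()
```

Keeping the message schema identical across transports would let an adapter switch between Unix socket and WebSocket without touching its editor-facing code.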
REFERENCE IMPLEMENTATIONS:
repos/cola/
- src/replica.rs: Main API, 1200+ lines of docs
- src/insertion.rs, src/deletion.rs: Operation types
- examples/basic.rs: Simple Document wrapper pattern
- Key pattern: editor maintains buffer + Replica, calls inserted/deleted for local ops, integrate_* for remote ops
repos/instant.nvim/
- lua/instant.lua: Main logic, mixed nvim + algorithm
- lua/instant/websocket_*.lua: Transport layer (portable)
- Position ID generation (genPID): Tombstone vector clocks
- Shows exactly what adapters need to do
repos/upterm/
- host/host.go: Session lifecycle
- io/writer.go: MultiWriter for output broadcast
- Different paradigm but useful for understanding terminal collaboration UX
repos/diamond-types/
- Complex internals, good for understanding CRDT optimization
- INTERNALS.md, BINARY.md explain the RLE approach
NEXT STEPS TO PROTOTYPE:
Phase 1: Minimal daemon
- Rust binary using cola
- Single document support
- WebSocket server
- Two clients can connect, edits sync
Phase 2: Neovim adapter
- Lua plugin
- Connects to daemon via WebSocket
- Hooks nvim_buf_attach for changes
- Applies remote changes via nvim_buf_set_lines
- Test: two neovim instances editing same file
Phase 3: Multi-document
- Session management
- File path mapping
- Join/leave notifications
Phase 4: Second editor
- Helix adapter (or kakoune, or vim)
- Prove the architecture works across editors
Phase 5: Polish
- Peer cursors
- User presence indicators
- Better auth (SSH keys, GitHub)
- Discovery service
OPEN QUESTIONS:
1. Where does the daemon run?
- Local daemon per machine? Central server? Hybrid?
- For local-first: daemon on each machine, P2P sync
- For easy setup: central server handles routing
2. How to handle file paths?
- Relative to project root? Absolute? UUID-based?
- Need consistent naming across different machines
3. Undo/redo coordination?
- Per-user undo (like Zed) or global?
- Cola doesn't handle this - need to build on top
4. Cursor/selection sync?
- Nice to have, not essential for MVP
- Adds complexity (need to track peer positions)
5. Permissions?
- Can anyone edit anything? Read-only viewers?
- Future concern, not MVP
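On question 2, one candidate answer is to key each document by its path relative to the project root, normalized to forward slashes. A small sketch of that option, using only pathlib. The function name document_key is hypothetical, and this does not settle the question (symlinks, renames, and files outside the root are all unhandled).

```python
from pathlib import PurePosixPath, PureWindowsPath


def document_key(project_root, file_path, windows=False):
    """Return a machine-independent key for a file inside a project.

    Raises ValueError if file_path is not under project_root.
    """
    cls = PureWindowsPath if windows else PurePosixPath
    relative = cls(file_path).relative_to(cls(project_root))
    return relative.as_posix()


# Two peers with different checkouts of the same project.
key_linux = document_key("/home/al/proj", "/home/al/proj/src/main.rs")
key_windows = document_key(
    r"C:\Users\bo\proj", r"C:\Users\bo\proj\src\main.rs", windows=True
)
```

Both peers derive the same key, so the daemon can match their buffers without knowing where each checkout lives.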
THE DREAM:
You're in helix. Friend is in neovim. Another friend is in kakoune. You all open the same project, connect to a session, and just... edit together. Changes flow seamlessly. Each person uses their preferred editor with their preferred config. No one has to switch to an editor they don't normally use.
That's the goal.