lofivor - build roadmap

survivor-like optimized for weak hardware. finding the performance ceiling first, then building the game.

phase 1: sandbox stress test

  • create sandbox.zig (separate from existing game code)
  • entity struct (x, y, vx, vy, color)
  • flat array storage for entities
  • spawn entities at random screen edges
  • update loop: move toward center, respawn on arrival
  • render: filled circles (4px radius, cyan)
  • metrics overlay (entity count, frame time, update time, render time)
  • controls: +/- to add/remove 100 entities, shift for steps of 1000, space to pause, r to reset

phase 2: find the ceiling

  • test on i5-6500T / HD 530 @ 1280x1024
  • record entity count where 60fps breaks
  • identify bottleneck (CPU update vs GPU render)
  • document findings

findings (AMD Radeon test):

  • 60fps breaks at ~5000 entities
  • render-bound: update stays <1ms even at 30k entities, render time dominates
  • individual drawCircle calls are the bottleneck

phase 3: optimization experiments

based on phase 2 results:

  • batch rendering via texture blitting (10x improvement)
  • rlgl quad batching (2x improvement on top)
  • if cpu-bound: SIMD, struct-of-arrays, multithreading (not needed)
  • re-test after each change

findings:

  • texture blitting: pre-render circle to texture, drawTexture() per entity
  • rlgl batching: submit vertices directly via rl.gl, bypass drawTexture overhead
  • baseline: 60fps @ ~5k entities
  • after texture blitting: 60fps @ ~50k entities
  • after rlgl batching: 60fps @ ~100k entities
  • total: ~20x improvement from baseline
  • see journal.txt for detailed benchmarks
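what the rlgl batching boils down to on the CPU side - a pure-C sketch of filling one flat vertex buffer, 4 verts per entity quad, which then goes out in a single flush (positions only; texcoords and colors omitted, `half` = half the quad size):

```c
/* CPU side of quad batching: append 4 vertices per entity into one flat
   buffer, then submit the whole buffer in a single flush. in raylib this is
   roughly what rl.gl's rlVertex* calls accumulate before a draw. */
static int fill_quads(const float *xs, const float *ys, int n,
                      float half, float *out /* capacity >= n * 8 floats */) {
    int v = 0;
    for (int i = 0; i < n; i++) {
        out[v++] = xs[i] - half; out[v++] = ys[i] - half; /* top-left */
        out[v++] = xs[i] - half; out[v++] = ys[i] + half; /* bottom-left */
        out[v++] = xs[i] + half; out[v++] = ys[i] + half; /* bottom-right */
        out[v++] = xs[i] + half; out[v++] = ys[i] - half; /* top-right */
    }
    return v / 2; /* vertices queued for this flush */
}
```

one loop, one buffer, one draw - versus one drawTexture() (and its per-call state checks) per entity.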

further options (if needed):

  • increase raylib batch buffer (currently 8192 vertices = 2048 quads per flush)
  • GPU instancing (single draw call for all entities)
  • or just move on - 100k @ 60fps is a solid ceiling

phase 4: spatial partitioning

  • uniform grid collision
  • quadtree comparison
  • measure ceiling with n² collision checks enabled
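a minimal uniform-grid sketch in C, built with a counting sort (cell size and grid dimensions are assumptions for 1280x1024) - two passes, no per-cell allocations, rebuilt each frame:

```c
#include <string.h>

/* uniform grid via counting sort: start[] holds per-cell offsets into
   order[], order[] holds entity indices grouped by cell. cell size and
   grid dims below are assumptions for a 1280x1024 screen. */
enum { CELL = 80, GRID_W = 16, GRID_H = 13, CELLS = GRID_W * GRID_H };

static int cell_of(float x, float y) {
    int cx = (int)(x / CELL), cy = (int)(y / CELL);
    if (cx < 0) cx = 0; if (cx >= GRID_W) cx = GRID_W - 1;
    if (cy < 0) cy = 0; if (cy >= GRID_H) cy = GRID_H - 1;
    return cy * GRID_W + cx;
}

static void grid_build(const float *xs, const float *ys, int n,
                       int start[CELLS + 1], int order[/* n */]) {
    memset(start, 0, sizeof(int) * (CELLS + 1));
    for (int i = 0; i < n; i++) start[cell_of(xs[i], ys[i]) + 1]++; /* count */
    for (int c = 0; c < CELLS; c++) start[c + 1] += start[c];       /* prefix sum */
    int cursor[CELLS];
    memcpy(cursor, start, sizeof(cursor));
    for (int i = 0; i < n; i++)                                     /* scatter */
        order[cursor[cell_of(xs[i], ys[i])]++] = i;
    /* a collision pass then checks entity i only against
       order[start[c]..start[c+1]) for the 9 cells around it,
       instead of all n entities */
}
```

this is the baseline the quadtree comparison has to beat; for densely packed, uniformly sized entities the grid usually wins.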

phase 5: rendering experiments

  • increase raylib batch buffer (currently 8192 vertices = 2048 quads)
  • GPU instancing (single draw call for all entities)
  • SSBO instance data (12 bytes vs 64-byte matrices)
  • compute shader entity updates (raylib supports via rlgl)
  • compare OpenGL vs Vulkan backend
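rough arithmetic behind the 12-byte vs 64-byte instance data point (assumed packing: x, y as floats plus a packed rgba color):

```c
/* why small instances matter when the bottleneck is shared-RAM bandwidth:
   per-frame upload size of the instance buffer, in MiB */
static double upload_mib(int entities, int bytes_per_instance) {
    return (double)entities * (double)bytes_per_instance / (1024.0 * 1024.0);
}
/* at 150k entities @ 60fps:
   64 B mat4 transform  -> ~9.2 MiB/frame, ~549 MiB/s
   12 B x,y,rgba        -> ~1.7 MiB/frame, ~103 MiB/s */
```

on integrated graphics that upload competes with the CPU for the same memory bus, so shrinking it pays off directly.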

findings (i5-6500T / HD 530):

  • batch buffer increase: ~140k @ 60fps (was ~100k)
  • GPU instancing: ~150k @ 60fps - negligible gain over rlgl batching
  • instancing doesn't help on integrated graphics: instance data already sits in shared system RAM, so there's no PCIe transfer to save
  • bottleneck is memory bandwidth, not draw call overhead
  • rlgl batching is already near-optimal for this hardware
  • compute shaders: update time ~5ms → ~0ms at 150k entities (CPU freed entirely)

future optimization concepts (GPU-focused)

  • GPU-side frustum culling in compute shader
  • point sprites for distant/small entities (4 verts → 1)
  • indirect draw calls (glDrawArraysIndirect)

future optimization concepts (CPU - not currently bottleneck)

  • SIMD / SoA / multithreading (if game logic makes CPU hot again)

other ideas that aren't about optimization

  • scanline shader