lofivor optimization journal
=============================
goal: maximize entity count at 60fps for survivor-like game
baseline: individual drawCircle calls
-------------------------------------
technique: rl.drawCircle() per entity in loop
code: sandbox_main.zig:144-151
bottleneck: render-bound (update <1ms even at 30k entities)
benchmark1.log results (AMD Radeon):
- 60fps stable: ~4000 entities
- 60fps breaks: ~5000 entities (19.9ms frame)
- 10k entities: ~43ms frame
- 20k entities: ~77ms frame
- 25k entities: ~97ms frame
analysis: linear scaling, each drawCircle = separate GPU draw call
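sketch: a quick check of the linear-scaling claim, using only the benchmark1.log numbers above. it is not taken from sandbox_main.zig; the function names are made up for illustration, and it assumes frame time is purely proportional to entity count (no fixed overhead).

```python
# (entities, frame_ms) pairs copied from the benchmark1.log results above
SAMPLES = [(5000, 19.9), (10000, 43.0), (20000, 77.0), (25000, 97.0)]

FRAME_BUDGET_MS = 1000.0 / 60.0  # time available per frame at 60fps


def per_entity_us(samples):
    """average cost per entity in microseconds, assuming purely linear cost"""
    costs = [frame_ms * 1000.0 / count for count, frame_ms in samples]
    return sum(costs) / len(costs)


def entity_ceiling(cost_us, budget_ms=FRAME_BUDGET_MS):
    """largest entity count whose estimated frame time fits in the budget"""
    return int(budget_ms * 1000.0 / cost_us)


if __name__ == "__main__":
    cost = per_entity_us(SAMPLES)
    print(f"~{cost:.2f} us per entity, ceiling ~{entity_ceiling(cost)} entities")
```

the per-entity cost comes out at roughly 4 us for every sample, and the estimated ceiling lands a little above 4,000 entities, which matches the measured ~4000 stable figure.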
---
optimization 1: texture blitting
--------------------------------
technique: pre-render circle to 16x16 texture, drawTexture() per entity
code: sandbox_main.zig:109-124, 170-177
benchmark2.log results:
- 60fps stable: ~50,000 entities
- 60fps breaks: ~52,000-55,000 entities (18-21ms frame)
- 23k entities: 16.7ms frame (still vsync-locked)
- 59k entities: 20.6ms frame
extended benchmark (benchmark3):
- 50k entities: 16.7ms (vsync-locked, briefly touches 19ms)
- 60k entities: 20.7ms
- 70k entities: 23.7ms
- 80k entities: 30.1ms
- 100k entities: 33-37ms (~30fps)
comparison to baseline:
- baseline broke 60fps at ~5,000 entities
- texture blitting breaks 60fps at ~52,000-55,000 entities (stable at ~50,000)
- ~10x improvement in entity ceiling
analysis: raylib batches texture draws internally when using same texture.
individual drawCircle() = separate draw call each. drawTexture() with same
texture = batched into fewer GPU calls.
notes: render_ms stays ~16-18ms up to ~50k, then scales roughly linearly.
at 100k entities we're at ~30fps which is still playable. update loop
remains negligible (<0.6ms even at 100k).
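sketch: converting the frame times above to fps, plus a simple model of the vsync lock. the helper names are hypothetical, and the model assumes vsync only clamps the frame time to one 60Hz refresh interval (it ignores any snapping to multiples of the interval).

```python
VSYNC_MS = 1000.0 / 60.0  # one refresh interval at 60Hz (~16.7ms)


def fps(frame_ms):
    """frames per second for a given frame time in milliseconds"""
    return 1000.0 / frame_ms


def presented_ms(work_ms, vsync_ms=VSYNC_MS):
    """with vsync on, a frame is never presented faster than one refresh interval"""
    return max(work_ms, vsync_ms)


if __name__ == "__main__":
    # frame times from the benchmark3 list above
    for ms in (16.7, 20.7, 23.7, 30.1, 33.0, 37.0):
        print(f"{ms:5.1f} ms -> {fps(presented_ms(ms)):4.1f} fps")
```

this is why every reading below the ceiling shows up as 16.7ms: the real render cost is hidden until it exceeds one refresh interval.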
---
optimization 2: rlgl quad batching
-----------------------------------
technique: bypass drawTexture(), submit vertices directly via rlgl
code: sandbox_main.zig:175-197
- rl.gl.rlSetTexture() once
- rl.gl.rlBegin(rl_quads)
- loop: rlTexCoord2f + rlVertex2f for 4 vertices per entity
- rl.gl.rlEnd()
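sketch: the per-entity data the loop above would submit, modeled in plain python so it runs without raylib. this is not the code in sandbox_main.zig. assumptions: the entity position is the quad center, the whole 16x16 texture is mapped, and the vertex order is top-left, bottom-left, bottom-right, top-right. in the real loop each tuple would become one rlTexCoord2f + rlVertex2f pair between rlBegin and rlEnd.

```python
def quad_vertices(x, y, size=16.0):
    """return four (u, v, px, py) tuples for one textured quad.

    assumption: (x, y) is the entity center; order is
    top-left, bottom-left, bottom-right, top-right.
    """
    half = size / 2.0
    left, right = x - half, x + half
    top, bottom = y - half, y + half
    return [
        (0.0, 0.0, left, top),
        (0.0, 1.0, left, bottom),
        (1.0, 1.0, right, bottom),
        (1.0, 0.0, right, top),
    ]


def build_batch(positions, size=16.0):
    """flatten all entities into one vertex list: 4 vertices per entity"""
    verts = []
    for x, y in positions:
        verts.extend(quad_vertices(x, y, size))
    return verts


if __name__ == "__main__":
    batch = build_batch([(8.0, 8.0), (100.0, 50.0)])
    print(len(batch), "vertices for 2 entities")
```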
benchmark3.log results:
- 40k entities: 16.7ms (vsync-locked)
- 100k entities: 16.7-19.2ms (~55-60fps)
comparison to optimization 1:
- texture blitting: 100k @ 33-37ms (~30fps)
- rlgl batching: 100k @ 16.7-19ms (~55-60fps)
- ~2x improvement
total improvement from baseline:
- baseline: 60fps breaks at ~5k entities
- final: ~55-60fps @ ~100k entities
- ~20x improvement overall
analysis: drawTexture() has per-call overhead (type conversions, batch state
checks). rlgl submits vertices directly to GPU buffer. raylib's internal batch
(8192 vertices = ~2048 quads) auto-flushes, so 100k entities = ~49 draw calls
vs 100k drawTexture calls with their overhead.
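sketch: the draw-call arithmetic from the analysis above. the 8192-vertex batch size is taken from this journal as given and kept as a parameter, since raylib's batch capacity depends on build configuration; the function names are hypothetical.

```python
VERTICES_PER_QUAD = 4


def quads_per_batch(batch_vertices=8192):
    """how many quads fit in one batch before it auto-flushes"""
    return batch_vertices // VERTICES_PER_QUAD


def draw_calls(entities, batch_vertices=8192):
    """number of batch flushes needed for one quad per entity (ceiling division)"""
    per_batch = quads_per_batch(batch_vertices)
    return -(-entities // per_batch)


if __name__ == "__main__":
    print(quads_per_batch(), "quads per batch")
    print(draw_calls(100_000), "draw calls for 100k entities")
```

with these numbers, 100k entities need 49 flushes (48 full batches of 2048 quads plus one partial batch).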
---
optimization 3: [pending]
-------------------------
technique:
results:
notes: