Clean up todo and optimizations doc

Jared Miller 2025-12-17 20:58:31 -05:00
parent 516b4af458
commit 0568204cb7
2 changed files with 41 additions and 11 deletions


@@ -82,8 +82,8 @@ these target the rendering bottleneck since update loop is already fast.
 | technique | description | expected gain |
 | ---------------------- | -------------------------------------------------------------------- | ------------------------------- |
-| ~~SSBO instance data~~ | ~~pack (x, y, color) = 12 bytes instead of 64-byte matrices~~ | **done** - see optimization 5 |
-| compute shader updates | move entity positions to GPU entirely, avoid CPU→GPU sync | significant |
+| SSBO instance data | pack (x, y, color) = 12 bytes instead of 64-byte matrices | done - see optimization 5 |
+| compute shader updates | move entity positions to GPU entirely, avoid CPU→GPU sync | done - see optimization 6 |
 | OpenGL vs Vulkan | test raylib's Vulkan backend | unknown |
 | discrete GPU testing | test on dedicated GPU where instancing/SSBO shine | significant (different hw) |
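The 12-byte vs 64-byte figure in the SSBO row is easy to sanity-check, together with what it means for upload volume. A minimal Python sketch, assuming each instance is two 32-bit floats plus one 32-bit packed RGBA color, and that the GLSL side declares three scalar members (`float x; float y; uint color;`) rather than a `vec2`: under std430 a `vec2` member raises the struct alignment to 8 and pads the array stride to 16 bytes. `INSTANCE_FMT`, `pack_color`, `pack_instances`, and `upload_bytes` are hypothetical names, not taken from the project.

```python
import struct

# Hypothetical per-instance record: float x, float y, uint32 packed RGBA color.
# "<" means little-endian (as on x86) with no implicit padding, matching a
# std430 struct whose members are all 4-byte scalars.
INSTANCE_FMT = "<ffI"
# The previous approach: one 4x4 float transform matrix per instance.
MATRIX_FMT = "<16f"

INSTANCE_SIZE = struct.calcsize(INSTANCE_FMT)  # 12 bytes
MATRIX_SIZE = struct.calcsize(MATRIX_FMT)      # 64 bytes


def pack_color(r: int, g: int, b: int, a: int = 255) -> int:
    """Pack 8-bit RGBA channels into one uint32 with red in the lowest byte."""
    return r | (g << 8) | (b << 16) | (a << 24)


def pack_instances(instances) -> bytes:
    """Serialize (x, y, color) tuples into one contiguous buffer for an SSBO upload."""
    return b"".join(struct.pack(INSTANCE_FMT, x, y, c) for x, y, c in instances)


def upload_bytes(entity_count: int, bytes_per_entity: int) -> int:
    """Bytes copied CPU->GPU if every entity's record is uploaded once."""
    return entity_count * bytes_per_entity


if __name__ == "__main__":
    n = 1_400_000  # entity count from the notes' ~1.4M benchmark
    print(f"matrices: {upload_bytes(n, MATRIX_SIZE) / 1e6:.1f} MB")    # matrices: 89.6 MB
    print(f"packed:   {upload_bytes(n, INSTANCE_SIZE) / 1e6:.1f} MB")  # packed:   16.8 MB
```

With red in the low byte, GLSL's `unpackUnorm4x8(color)` returns the channels as `(r, g, b, a)` in the 0..1 range. If positions stay GPU-resident (the compute shader row), this upload is paid when the buffer is filled rather than every frame.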
@@ -126,6 +126,33 @@ currently not the bottleneck - update stays <1ms at 100k. these become relevant
 | entity pools | pre-allocated, reusable entity slots | reduces allocation overhead |
 | component packing | minimize struct padding | better cache utilization |
+#### estimated gains summary
+| Optimization | Expected Gain | Why |
+|------------------------|---------------|---------------------------------------------------|
+| SIMD updates | 0% | Update already on GPU |
+| Multithreaded update | 0% | Update already on GPU |
+| Cache-friendly layouts | 0% | CPU doesn't iterate entities |
+| Fixed-point math | 0% or worse | GPUs are optimized for float |
+| SoA vs AoS | ~5% | Only helps data upload, not bottleneck |
+| Frustum culling | 5-15% | Most entities converge to center anyway |
+| LOD rendering | 20-40% | Real gains - fewer fragments for distant entities |
+| Temporal techniques | ~50% | But with visual artifacts (flickering) |
+Realistic total if you did everything: ~30-50% improvement
+That'd take you from ~1.4M @ 38fps to maybe ~1.8-2M @ 38fps, or ~1.4M @ 50-55fps.
+What would actually move the needle:
+- GPU-side frustum culling in compute shader (cull before render, not after)
+- Point sprites instead of quads for distant entities (4 vertices → 1)
+- Indirect draw calls (GPU decides what to render, CPU never touches entity data)
+Your real bottleneck is fill rate and vertex throughput on HD 530 integrated
+graphics. The CPU side is already essentially free.
 ---
 ## testing methodology
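The projection in the gains summary is straight scaling of the baseline. A minimal sketch, assuming the combined 30-50% gain applies linearly either to entity count at a fixed 38 fps or to frame rate at a fixed ~1.4M entities (fill-rate-bound workloads rarely scale this cleanly); `project` is a hypothetical helper name.

```python
def project(base: float, gain_low: float, gain_high: float) -> tuple[float, float]:
    """Scale a baseline by a fractional gain range (0.30, 0.50 means +30% to +50%)."""
    return base * (1 + gain_low), base * (1 + gain_high)


# Baseline from the notes: ~1.4M entities at ~38 fps on the HD 530.
BASE_ENTITIES = 1.4e6
BASE_FPS = 38.0

# Option A: hold the frame rate, spend the gain on more entities.
entities_lo, entities_hi = project(BASE_ENTITIES, 0.30, 0.50)
# Option B: hold the entity count, spend the gain on frame rate.
fps_lo, fps_hi = project(BASE_FPS, 0.30, 0.50)

print(f"{entities_lo / 1e6:.2f}M-{entities_hi / 1e6:.2f}M entities @ 38 fps")  # 1.82M-2.10M entities @ 38 fps
print(f"1.4M entities @ {fps_lo:.1f}-{fps_hi:.1f} fps")  # 1.4M entities @ 49.4-57.0 fps
```

That lands at roughly 1.8-2.1M entities or 49-57 fps, consistent with the rounded ~1.8-2M and ~50-55fps figures in the summary.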

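GPU-side frustum culling means each compute invocation tests one entity and, if it survives, appends its index to a compact visible list whose final length becomes the instance count for the draw, so culling happens before render instead of after. The sketch below emulates that per-invocation logic on the CPU in Python so it can be tested. It assumes a 2D orthographic camera (the frustum reduces to an axis-aligned view rectangle, with y growing downward as in raylib's 2D screen space) and one bounding radius for all entities; `ViewRect`, `is_visible`, `cull`, and the GLSL counter name `visibleCount` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewRect:
    """Axis-aligned camera bounds in world units (2D orthographic view, y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


def is_visible(x: float, y: float, radius: float, view: ViewRect) -> bool:
    """Conservative test: keep the entity if its bounding box overlaps the view at all."""
    return (x + radius >= view.left and x - radius <= view.right
            and y + radius >= view.top and y - radius <= view.bottom)


def cull(positions, radius: float, view: ViewRect):
    """Emulate one culling dispatch over all entities.

    Each loop iteration plays the role of one compute-shader invocation: it tests
    entity i and, if visible, reserves an output slot from a shared counter and
    writes i there. Returns (visible_indices, instance_count).
    """
    visible = [0] * len(positions)  # sized for the worst case: everything visible
    counter = 0
    for i, (x, y) in enumerate(positions):
        if is_visible(x, y, radius, view):
            # GLSL equivalent (hypothetical buffer): uint slot = atomicAdd(visibleCount, 1u);
            # On the GPU, slot order across invocations is nondeterministic.
            slot = counter
            counter += 1
            visible[slot] = i
    return visible[:counter], counter


if __name__ == "__main__":
    screen = ViewRect(left=0, top=0, right=1280, bottom=720)
    entities = [(10, 10), (-50, 300), (1300, 700), (640, 360)]
    indices, count = cull(entities, radius=8, view=screen)
    print(indices, count)  # [0, 3] 2
```

The summary table notes that most entities converge to the center, so in that scene the visible count may stay close to the total; the structural win of doing this on the GPU is that the count never has to round-trip through the CPU.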
TODO.md (21 lines changed)

@@ -70,13 +70,16 @@ findings (i5-6500T / HD 530):
 - rlgl batching is already near-optimal for this hardware
 - compute shaders: update time ~5ms → ~0ms at 150k entities (CPU freed entirely)
-## future optimization concepts
-- [ ] SIMD entity updates (AVX2/SSE)
-- [ ] struct-of-arrays vs array-of-structs benchmark
-- [ ] multithreaded update loop (thread pool)
-- [ ] cache-friendly memory layouts
-- [ ] LOD rendering (skip distant entities or reduce detail)
-- [ ] frustum culling (only render visible)
-- [ ] temporal techniques (update subset per frame)
-- [ ] fixed-point vs floating-point math
+## future optimization concepts (GPU-focused)
+- [ ] GPU-side frustum culling in compute shader
+- [ ] point sprites for distant/small entities (4 verts → 1)
+- [ ] indirect draw calls (glDrawArraysIndirect)
+
+## future optimization concepts (CPU - not currently bottleneck)
+
+- [ ] SIMD / SoA / multithreading (if game logic makes CPU hot again)
+
+## other ideas that aren't about optimization
+- [ ] scanline shader
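For the indirect-draw and point-sprite items: `glDrawArraysIndirect` reads its parameters from a buffer bound to `GL_DRAW_INDIRECT_BUFFER`, laid out as four 32-bit unsigned integers in the order `count`, `instanceCount`, `first`, `baseInstance` (the last field is only meaningful from OpenGL 4.2 on). The Python sketch below just builds those 16 bytes and counts vertex-shader invocations, assuming an instanced, non-indexed draw where a compute pass has already produced the visible instance count; switching from a 4-vertex triangle-strip quad to a 1-vertex `GL_POINTS` sprite changes only `count`. `draw_command` and `vertex_invocations` are hypothetical helpers.

```python
import struct

# DrawArraysIndirectCommand: count, instanceCount, first, baseInstance,
# each a 32-bit GLuint, tightly packed (16 bytes per command).
# "<" assumes a little-endian host such as x86.
INDIRECT_FMT = "<4I"

VERTS_PER_QUAD = 4   # quad drawn as a GL_TRIANGLE_STRIP
VERTS_PER_POINT = 1  # point sprite drawn as GL_POINTS


def draw_command(verts_per_instance: int, instance_count: int,
                 first: int = 0, base_instance: int = 0) -> bytes:
    """Pack one indirect draw command. In a GPU-driven setup the culling compute
    pass would write instance_count into this buffer; here it is passed in."""
    return struct.pack(INDIRECT_FMT, verts_per_instance, instance_count,
                       first, base_instance)


def vertex_invocations(verts_per_instance: int, instance_count: int) -> int:
    """Vertex-shader invocations for a non-indexed instanced draw:
    every vertex of every instance is shaded once."""
    return verts_per_instance * instance_count


if __name__ == "__main__":
    visible = 1_400_000
    quad_cmd = draw_command(VERTS_PER_QUAD, visible)
    point_cmd = draw_command(VERTS_PER_POINT, visible)
    print(len(quad_cmd), len(point_cmd))                  # 16 16
    print(vertex_invocations(VERTS_PER_QUAD, visible))   # 5600000
    print(vertex_invocations(VERTS_PER_POINT, visible))  # 1400000
```

On desktop GL, a point sprite's size comes from `gl_PointSize` in the vertex shader only when `GL_PROGRAM_POINT_SIZE` is enabled, and the maximum point size is implementation-limited, which fits the note's restriction to distant/small entities.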