Clean up todo and optimizations doc

Jared Miller 2025-12-17 20:58:31 -05:00
parent 516b4af458
commit 0568204cb7
2 changed files with 41 additions and 11 deletions


@ -82,8 +82,8 @@ these target the rendering bottleneck since update loop is already fast.
| technique | description | expected gain |
| ---------------------- | -------------------------------------------------------------------- | ------------------------------- |
| ~~SSBO instance data~~ | ~~pack (x, y, color) = 12 bytes instead of 64-byte matrices~~ | **done** - see optimization 5 |
| compute shader updates | move entity positions to GPU entirely, avoid CPU→GPU sync | significant |
| SSBO instance data | pack (x, y, color) = 12 bytes instead of 64-byte matrices | done - see optimization 5 |
| compute shader updates | move entity positions to GPU entirely, avoid CPU→GPU sync | done - see optimization 6 |
| OpenGL vs Vulkan | test raylib's Vulkan backend | unknown |
| discrete GPU testing | test on dedicated GPU where instancing/SSBO shine | significant (different hw) |
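
The "SSBO instance data" row comes down to a per-instance size change. A minimal sketch of what a 12-byte record could look like next to a 64-byte matrix follows. The struct names, the field types, and the RGBA8 packing order are assumptions for illustration; the doc only states "(x, y, color) = 12 bytes".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 12-byte instance record: two floats plus a packed color. */
typedef struct {
    float    x;      /* 4 bytes */
    float    y;      /* 4 bytes */
    uint32_t color;  /* 4 bytes, assumed to be packed RGBA8 */
} PackedInstance;

/* The per-instance transform it replaces: 16 floats = 64 bytes. */
typedef struct {
    float m[16];
} MatrixInstance;

static_assert(sizeof(PackedInstance) == 12, "packed instance should be 12 bytes");
static_assert(sizeof(MatrixInstance) == 64, "matrix instance should be 64 bytes");

/* Pack four 8-bit channels into one 32-bit word (r in the low byte).
   The channel order is an assumption; it must match whatever the shader unpacks. */
static uint32_t pack_rgba8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint32_t)r | ((uint32_t)g << 8) | ((uint32_t)b << 16) | ((uint32_t)a << 24);
}

/* Bytes uploaded per frame for n instances of a given stride. */
static size_t upload_bytes(size_t n, size_t stride)
{
    return n * stride;
}
```

At 100k entities this works out to 1.2 MB per frame for the packed layout versus 6.4 MB for matrices, which is where the upload saving comes from.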
@ -126,6 +126,33 @@ currently not the bottleneck - update stays <1ms at 100k. these become relevant
| entity pools | pre-allocated, reusable entity slots | reduces allocation overhead |
| component packing | minimize struct padding | better cache utilization |
#### estimated gains summary
| Optimization | Expected Gain | Why |
|------------------------|---------------|---------------------------------------------------|
| SIMD updates | 0% | Update already on GPU |
| Multithreaded update | 0% | Update already on GPU |
| Cache-friendly layouts | 0% | CPU doesn't iterate entities |
| Fixed-point math | 0% or worse | GPUs are optimized for float |
| SoA vs AoS | ~5% | Only helps data upload, not bottleneck |
| Frustum culling | 5-15% | Most entities converge to center anyway |
| LOD rendering | 20-40% | Real gains - fewer fragments for distant entities |
| Temporal techniques | ~50% | But with visual artifacts (flickering) |
Realistic total if you did everything: ~30-50% improvement.
That would take you from ~1.4M entities @ 38fps to roughly 1.8-2M @ 38fps, or ~1.4M @ 50-55fps.
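
The projected figures can be checked with simple scaling. This sketch assumes the whole gain converts linearly into either entity count or frame rate, which is a simplification; `apply_gain` is a hypothetical helper, not part of the project.

```c
#include <assert.h>

/* Apply a fractional gain to a baseline, e.g. gain = 0.30 for +30%. */
static double apply_gain(double baseline, double gain)
{
    assert(gain >= 0.0);
    return baseline * (1.0 + gain);
}
```

With a baseline of 1.4M entities at 38fps, a 30% gain gives about 1.82M entities or 49.4fps, and a 50% gain gives about 2.1M entities or 57fps, which is close to the rounded ranges quoted above.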
What would actually move the needle:
- GPU-side frustum culling in compute shader (drop off-screen entities before they are submitted for drawing)
- Point sprites instead of quads for distant entities (4 vertices → 1)
- Indirect draw calls (GPU decides what to render, CPU never touches entity data)
Your real bottleneck is fill rate and vertex throughput on HD 530 integrated
graphics. The CPU side is already essentially free.
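
The culling and indirect-draw ideas above fit together: a culling pass compacts the visible entity indices and writes the instance count into a command buffer that the draw call reads. Below is a CPU reference sketch of that logic, not the project's shader. `Entity`, `ViewRect`, and `cull_to_indirect` are hypothetical names; only the four-field command layout follows what `glDrawArraysIndirect` expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field order matches the command struct consumed by glDrawArraysIndirect. */
typedef struct {
    uint32_t count;         /* vertices per instance */
    uint32_t instanceCount; /* written by the culling pass */
    uint32_t first;         /* first vertex */
    uint32_t baseInstance;  /* first instance */
} DrawArraysIndirectCommand;

typedef struct { float x, y; } Entity;                     /* hypothetical */
typedef struct { float minX, minY, maxX, maxY; } ViewRect; /* hypothetical */

/* CPU reference of a culling pass: keep entities whose bounding square
   (half-size `radius`) overlaps the view, compact their indices into
   `visible`, and fill the indirect command. In a compute shader the
   `kept++` below would be an atomic counter increment. */
static void cull_to_indirect(const Entity *entities, size_t n, ViewRect view,
                             float radius, uint32_t vertsPerInstance,
                             uint32_t *visible, DrawArraysIndirectCommand *cmd)
{
    assert(entities != NULL && visible != NULL && cmd != NULL);

    uint32_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        const Entity *e = &entities[i];
        int inside = e->x + radius >= view.minX && e->x - radius <= view.maxX &&
                     e->y + radius >= view.minY && e->y - radius <= view.maxY;
        if (inside) {
            visible[kept++] = (uint32_t)i;
        }
    }

    cmd->count = vertsPerInstance; /* 4 for a quad, 1 for a point sprite */
    cmd->instanceCount = kept;
    cmd->first = 0;
    cmd->baseInstance = 0;
}
```

On the GPU the command would live in a buffer bound as the indirect draw buffer, so the CPU never reads the visible count back. Passing `vertsPerInstance` as 1 instead of 4 is how the point-sprite variant would differ in this sketch.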
---
## testing methodology

21
TODO.md

@ -70,13 +70,16 @@ findings (i5-6500T / HD 530):
- rlgl batching is already near-optimal for this hardware
- compute shaders: update time ~5ms → ~0ms at 150k entities (CPU freed entirely)
## future optimization concepts
## future optimization concepts (GPU-focused)
- [ ] SIMD entity updates (AVX2/SSE)
- [ ] struct-of-arrays vs array-of-structs benchmark
- [ ] multithreaded update loop (thread pool)
- [ ] cache-friendly memory layouts
- [ ] LOD rendering (skip distant entities or reduce detail)
- [ ] frustum culling (only render visible)
- [ ] temporal techniques (update subset per frame)
- [ ] fixed-point vs floating-point math
- [ ] GPU-side frustum culling in compute shader
- [ ] point sprites for distant/small entities (4 verts → 1)
- [ ] indirect draw calls (glDrawArraysIndirect)
## future optimization concepts (CPU - not currently bottleneck)
- [ ] SIMD / SoA / multithreading (if game logic makes CPU hot again)
## other ideas that aren't about optimization
- [ ] scanline shader