Clean up todo and optimizations doc

2025-12-17 20:58:31 -05:00
commit 0568204cb7
parent 516b4af458
2 changed files with 41 additions and 11 deletions
--- a/OPTIMIZATIONS.md
+++ b/OPTIMIZATIONS.md
@@ -82,8 +82,8 @@ these target the rendering bottleneck since update loop is already fast.

 | technique              | description                                                          | expected gain                   |
 | ---------------------- | -------------------------------------------------------------------- | ------------------------------- |
-| ~~SSBO instance data~~ | ~~pack (x, y, color) = 12 bytes instead of 64-byte matrices~~        | **done** - see optimization 5   |
-| compute shader updates | move entity positions to GPU entirely, avoid CPU→GPU sync            | significant                     |
+| SSBO instance data     | pack (x, y, color) = 12 bytes instead of 64-byte matrices            | done - see optimization 5       |
+| compute shader updates | move entity positions to GPU entirely, avoid CPU→GPU sync            | done - see optimization 6       |
 | OpenGL vs Vulkan       | test raylib's Vulkan backend                                         | unknown                         |
 | discrete GPU testing   | test on dedicated GPU where instancing/SSBO shine                    | significant (different hw)      |

@@ -126,6 +126,33 @@ currently not the bottleneck - update stays <1ms at 100k. these become relevant
 | entity pools          | pre-allocated, reusable entity slots  | reduces allocation overhead |
 | component packing     | minimize struct padding               | better cache utilization    |

+#### estimated gains summary
+
+| Optimization           | Expected Gain | Why                                               |
+|------------------------|---------------|---------------------------------------------------|
+| SIMD updates           | 0%            | Update already on GPU                             |
+| Multithreaded update   | 0%            | Update already on GPU                             |
+| Cache-friendly layouts | 0%            | CPU doesn't iterate entities                      |
+| Fixed-point math       | 0% or worse   | GPUs are optimized for float                      |
+| SoA vs AoS             | ~5%           | Only helps data upload, not bottleneck            |
+| Frustum culling        | 5-15%         | Most entities converge to center anyway           |
+| LOD rendering          | 20-40%        | Real gains - fewer fragments for distant entities |
+| Temporal techniques    | ~50%          | But with visual artifacts (flickering)            |
+
+Realistic total if you did everything: ~30-50% improvement
+
+That would take you from ~1.4M entities @ 38fps to roughly 1.8-2M entities @ 38fps, or ~1.4M entities @ 50-55fps.
+
+What would actually move the needle:
+- GPU-side frustum culling in compute shader (cull before render, not after)
+- Point sprites instead of quads for distant entities (4 vertices → 1)
+- Indirect draw calls (GPU decides what to render, CPU never touches entity data)
+
+Your real bottleneck is fill rate and vertex throughput on HD 530 integrated
+graphics. The CPU side is already essentially free.
+
+
+
 ---

 ## testing methodology
--- a/TODO.md
+++ b/TODO.md
@@ -70,13 +70,16 @@ findings (i5-6500T / HD 530):
 - rlgl batching is already near-optimal for this hardware
 - compute shaders: update time ~5ms → ~0ms at 150k entities (CPU freed entirely)

-## future optimization concepts
+## future optimization concepts (GPU-focused)

-- [ ] SIMD entity updates (AVX2/SSE)
-- [ ] struct-of-arrays vs array-of-structs benchmark
-- [ ] multithreaded update loop (thread pool)
-- [ ] cache-friendly memory layouts
-- [ ] LOD rendering (skip distant entities or reduce detail)
-- [ ] frustum culling (only render visible)
-- [ ] temporal techniques (update subset per frame)
-- [ ] fixed-point vs floating-point math
+- [ ] GPU-side frustum culling in compute shader
+- [ ] point sprites for distant/small entities (4 verts → 1)
+- [ ] indirect draw calls (glDrawArraysIndirect)
+
+## future optimization concepts (CPU - not currently bottleneck)
+
+- [ ] SIMD / SoA / multithreading (if game logic makes CPU hot again)
+
+## other ideas that aren't about optimization
+
+- [ ] scanline shader
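
A note on the new GPU-focused TODO items: GPU-side culling and indirect draws fit together, because the culling pass can write the instance count that `glDrawArraysIndirect` later reads. The sketch below is not code from this repository. It runs the culling loop on the CPU in plain C only to show the data flow; in the design the TODO describes, that loop would presumably live in a compute shader and write into GPU buffers. `Entity`, `ViewRect`, and `cull_and_build_command` are hypothetical names, and the 12-byte entity layout is an assumption based on the "(x, y, color) = 12 bytes" row in OPTIMIZATIONS.md. The four-field command struct follows the layout documented for `glDrawArraysIndirect`.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Layout of the command that glDrawArraysIndirect reads from the
 * buffer bound to GL_DRAW_INDIRECT_BUFFER: four 32-bit unsigned fields. */
typedef struct {
    uint32_t count;          /* vertices per instance (e.g. 4 for a quad) */
    uint32_t instanceCount;  /* number of instances to draw */
    uint32_t first;          /* index of the first vertex */
    uint32_t baseInstance;   /* index of the first instance */
} DrawArraysIndirectCommand;

static_assert(sizeof(DrawArraysIndirectCommand) == 16,
              "command must be four tightly packed 32-bit fields");

/* Hypothetical packed entity: (x, y, color) = 12 bytes (assumed layout). */
typedef struct { float x, y; uint32_t color; } Entity;

/* Hypothetical axis-aligned view rectangle used as the 2D "frustum". */
typedef struct { float min_x, min_y, max_x, max_y; } ViewRect;

/* An entity is visible if its position lies inside the rectangle (inclusive). */
static int entity_visible(const Entity *e, const ViewRect *v) {
    return e->x >= v->min_x && e->x <= v->max_x &&
           e->y >= v->min_y && e->y <= v->max_y;
}

/* Copies visible entities to the front of `visible` (which must hold at
 * least n entries) and returns a draw command whose instanceCount equals
 * the number of survivors. A compute shader version would do the same
 * compaction with an atomic counter instead of this serial loop. */
DrawArraysIndirectCommand cull_and_build_command(const Entity *all, size_t n,
                                                 const ViewRect *view,
                                                 Entity *visible,
                                                 uint32_t verts_per_entity) {
    DrawArraysIndirectCommand cmd = { verts_per_entity, 0, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        if (entity_visible(&all[i], view)) {
            visible[cmd.instanceCount++] = all[i];
        }
    }
    return cmd;
}
```

If this ran on the GPU, the CPU would never read `instanceCount` back. It would only bind the buffers and issue the indirect draw, which is what "CPU never touches entity data" in OPTIMIZATIONS.md refers to.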