Clean up todo and optimizations doc

2025-12-17 20:58:31 -05:00
commit 0568204cb7
parent 516b4af458
2 changed files with 41 additions and 11 deletions
--- a/OPTIMIZATIONS.md
+++ b/OPTIMIZATIONS.md
@@ -82,8 +82,8 @@ these target the rendering bottleneck since update loop is already fast.
 | technique              | description                                                          | expected gain                   |
 | ---------------------- | -------------------------------------------------------------------- | ------------------------------- |
-| ~~SSBO instance data~~ | ~~pack (x, y, color) = 12 bytes instead of 64-byte matrices~~        | **done** - see optimization 5   |
+| SSBO instance data     | pack (x, y, color) = 12 bytes instead of 64-byte matrices            | done - see optimization 5       |
-| compute shader updates | move entity positions to GPU entirely, avoid CPU→GPU sync            | significant                     |
+| compute shader updates | move entity positions to GPU entirely, avoid CPU→GPU sync            | done - see optimization 6       |
 | OpenGL vs Vulkan       | test raylib's Vulkan backend                                         | unknown                         |
 | discrete GPU testing   | test on dedicated GPU where instancing/SSBO shine                    | significant (different hw)      |
@@ -126,6 +126,33 @@ currently not the bottleneck - update stays <1ms at 100k. these become relevant
 | entity pools          | pre-allocated, reusable entity slots  | reduces allocation overhead |
 | component packing     | minimize struct padding               | better cache utilization    |
 #### estimated gains summary
 | Optimization           | Expected Gain | Why                                               |
 |------------------------|---------------|---------------------------------------------------|
 | SIMD updates           | 0%            | Update already on GPU                             |
 | Multithreaded update   | 0%            | Update already on GPU                             |
 | Cache-friendly layouts | 0%            | CPU doesn't iterate entities                      |
 | Fixed-point math       | 0% or worse   | GPUs are optimized for float                      |
 | SoA vs AoS             | ~5%           | Only helps data upload, not bottleneck            |
 | Frustum culling        | 5-15%         | Most entities converge to center anyway           |
 | LOD rendering          | 20-40%        | Real gains - fewer fragments for distant entities |
 | Temporal techniques    | ~50%          | But with visual artifacts (flickering)            |
 Realistic total if you did everything: ~30-50% improvement
 That'd take you from ~1.4M @ 38fps to maybe ~1.8-2M @ 38fps, or ~1.4M @ 50-55fps.
 What would actually move the needle:
 - GPU-side frustum culling in compute shader (cull before render, not after)
 - Point sprites instead of quads for distant entities (4 vertices → 1)
 - Indirect draw calls (GPU decides what to render, CPU never touches entity data)
 Your real bottleneck is fill rate and vertex throughput on HD 530 integrated
 graphics. The CPU side is already essentially free.
 ---
 ## testing methodology
--- a/TODO.md
+++ b/TODO.md
@@ -70,13 +70,16 @@ findings (i5-6500T / HD 530):
 - rlgl batching is already near-optimal for this hardware
 - compute shaders: update time ~5ms → ~0ms at 150k entities (CPU freed entirely)
-## future optimization concepts
+## future optimization concepts (GPU-focused)
- [ ] SIMD entity updates (AVX2/SSE)
+- [ ] GPU-side frustum culling in compute shader
- [ ] struct-of-arrays vs array-of-structs benchmark
+- [ ] point sprites for distant/small entities (4 verts → 1)
- [ ] multithreaded update loop (thread pool)
+- [ ] indirect draw calls (glDrawArraysIndirect)
- [ ] cache-friendly memory layouts
+
- [ ] LOD rendering (skip distant entities or reduce detail)
+## future optimization concepts (CPU - not currently bottleneck)
- [ ] frustum culling (only render visible)
+
- [ ] temporal techniques (update subset per frame)
+- [ ] SIMD / SoA / multithreading (if game logic makes CPU hot again)
- [ ] fixed-point vs floating-point math
+
 ## other ideas that aren't about optimization
 - [ ] scanline shader
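
Note on the GPU-focused TODO items: "GPU-side frustum culling in compute shader" and "indirect draw calls (glDrawArraysIndirect)" would probably be implemented together, since the culling pass is what produces the instance count the indirect draw consumes. The sketch below is a CPU reference for that data flow, not code from this repository. `Entity`, `ViewRect`, and `cull_entities` are hypothetical names, and the 2D rectangle test is an assumption about how visibility would be decided. Only the four-field command layout is the one `glDrawArraysIndirect` reads from the bound `GL_DRAW_INDIRECT_BUFFER`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command layout read by glDrawArraysIndirect: four packed 32-bit uints. */
typedef struct {
    uint32_t count;         /* vertices per instance (4 for a quad, 1 for a point) */
    uint32_t instanceCount; /* filled in by the culling pass */
    uint32_t first;         /* first vertex */
    uint32_t baseInstance;  /* first instance */
} DrawArraysIndirectCommand;

/* Hypothetical types for illustration. Entity mirrors the 12-byte
   (x, y, color) packing mentioned in OPTIMIZATIONS.md. */
typedef struct { float x, y; uint32_t color; } Entity;
typedef struct { float minX, minY, maxX, maxY; } ViewRect;

/* CPU reference for the culling pass: append the index of every entity
   whose square (centre +/- halfSize) overlaps the view rectangle, then
   record how many survived. In a compute shader each invocation would
   test one entity, and the append would use an atomic counter. */
static uint32_t cull_entities(const Entity *entities, uint32_t n,
                              ViewRect view, float halfSize,
                              uint32_t vertsPerInstance,
                              uint32_t *out_visible,
                              DrawArraysIndirectCommand *cmd)
{
    assert(entities != NULL && out_visible != NULL && cmd != NULL);

    uint32_t visible = 0;
    for (uint32_t i = 0; i < n; i++) {
        const Entity *e = &entities[i];
        int inside = e->x + halfSize >= view.minX &&
                     e->x - halfSize <= view.maxX &&
                     e->y + halfSize >= view.minY &&
                     e->y - halfSize <= view.maxY;
        if (inside) {
            out_visible[visible++] = i;
        }
    }

    cmd->count = vertsPerInstance;
    cmd->instanceCount = visible;
    cmd->first = 0;
    cmd->baseInstance = 0;
    return visible;
}
```

On the GPU, `out_visible` and the command would both live in buffers written by the compute pass, so the CPU never reads entity data back. Passing 1 instead of 4 for `vertsPerInstance` corresponds to the point-sprite item (4 verts → 1).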