From 0568204cb7ec61504b35e8be4964862ea43bf0d6 Mon Sep 17 00:00:00 2001
From: Jared Miller <jared@smell.flowers>
Date: Wed, 17 Dec 2025 20:58:31 -0500
Subject: [PATCH] Clean up todo and optimizations doc

---
 OPTIMIZATIONS.md | 31 +++++++++++++++++++++++++++++--
 TODO.md          | 21 ++++++++++++---------
 2 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/OPTIMIZATIONS.md b/OPTIMIZATIONS.md
index 911110f..a2f70f7 100644
--- a/OPTIMIZATIONS.md
+++ b/OPTIMIZATIONS.md
@@ -82,8 +82,8 @@ these target the rendering bottleneck since update loop is already fast.
 
 | technique              | description                                                          | expected gain                   |
 | ---------------------- | -------------------------------------------------------------------- | ------------------------------- |
-| ~~SSBO instance data~~ | ~~pack (x, y, color) = 12 bytes instead of 64-byte matrices~~        | **done** - see optimization 5   |
-| compute shader updates | move entity positions to GPU entirely, avoid CPU→GPU sync            | significant                     |
+| SSBO instance data     | pack (x, y, color) = 12 bytes instead of 64-byte matrices            | done - see optimization 5       |
+| compute shader updates | move entity positions to GPU entirely, avoid CPU→GPU sync            | done - see optimization 6       |
 | OpenGL vs Vulkan       | test raylib's Vulkan backend                                         | unknown                         |
 | discrete GPU testing   | test on dedicated GPU where instancing/SSBO shine                    | significant (different hw)      |
 
@@ -126,6 +126,33 @@ currently not the bottleneck - update stays <1ms at 100k. these become relevant
 | entity pools          | pre-allocated, reusable entity slots  | reduces allocation overhead |
 | component packing     | minimize struct padding               | better cache utilization    |
 
+#### estimated gains summary
+
+| optimization           | expected gain | why                                               |
+|------------------------|---------------|---------------------------------------------------|
+| SIMD updates           | 0%            | update already on GPU                             |
+| multithreaded update   | 0%            | update already on GPU                             |
+| cache-friendly layouts | 0%            | CPU doesn't iterate entities                      |
+| fixed-point math       | 0% or worse   | GPUs are optimized for float                      |
+| SoA vs AoS             | ~5%           | only helps data upload, not the bottleneck        |
+| frustum culling        | 5-15%         | most entities converge to center, few off-screen  |
+| LOD rendering          | 20-40%        | real gains - fewer fragments for distant entities |
+| temporal techniques    | ~50%          | but with visual artifacts (flickering)            |
+
+realistic total with everything applied: ~30-50% improvement.
+
+that would go from ~1.4M entities @ 38fps to roughly 1.8-2M @ 38fps, or ~1.4M @ 50-55fps.
+
+what would actually move the needle:
+- GPU-side frustum culling in compute shader (cull before render, not after)
+- point sprites instead of quads for distant entities (4 vertices → 1)
+- indirect draw calls (GPU decides what to render, CPU never touches entity data)
+
+the real bottleneck is fill rate and vertex throughput on the HD 530 integrated
+graphics. the CPU side is already essentially free.
+
+
+
 ---
 
 ## testing methodology
diff --git a/TODO.md b/TODO.md
index ef6e726..b2261bb 100644
--- a/TODO.md
+++ b/TODO.md
@@ -70,13 +70,16 @@ findings (i5-6500T / HD 530):
 - rlgl batching is already near-optimal for this hardware
 - compute shaders: update time ~5ms → ~0ms at 150k entities (CPU freed entirely)
 
-## future optimization concepts
+## future optimization concepts (GPU-focused)
 
-- [ ] SIMD entity updates (AVX2/SSE)
-- [ ] struct-of-arrays vs array-of-structs benchmark
-- [ ] multithreaded update loop (thread pool)
-- [ ] cache-friendly memory layouts
-- [ ] LOD rendering (skip distant entities or reduce detail)
-- [ ] frustum culling (only render visible)
-- [ ] temporal techniques (update subset per frame)
-- [ ] fixed-point vs floating-point math
+- [ ] GPU-side frustum culling in compute shader
+- [ ] point sprites for distant/small entities (4 verts → 1)
+- [ ] indirect draw calls (glDrawArraysIndirect)
+
+## future optimization concepts (CPU - not currently the bottleneck)
+
+- [ ] SIMD / SoA / multithreading (if game logic makes CPU hot again)
+
+## other ideas that aren't about optimization
+
+- [ ] scanline shader