From 0568204cb7ec61504b35e8be4964862ea43bf0d6 Mon Sep 17 00:00:00 2001
From: Jared Miller
Date: Wed, 17 Dec 2025 20:58:31 -0500
Subject: [PATCH] Clean up todo and optimizations doc

---
 OPTIMIZATIONS.md | 31 +++++++++++++++++++++++++++++--
 TODO.md          | 21 ++++++++++++---------
 2 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/OPTIMIZATIONS.md b/OPTIMIZATIONS.md
index 911110f..a2f70f7 100644
--- a/OPTIMIZATIONS.md
+++ b/OPTIMIZATIONS.md
@@ -82,8 +82,8 @@ these target the rendering bottleneck since update loop is already fast.
 
 | technique | description | expected gain |
 | ---------------------- | -------------------------------------------------------------------- | ------------------------------- |
-| ~~SSBO instance data~~ | ~~pack (x, y, color) = 12 bytes instead of 64-byte matrices~~ | **done** - see optimization 5 |
-| compute shader updates | move entity positions to GPU entirely, avoid CPU→GPU sync | significant |
+| SSBO instance data | pack (x, y, color) = 12 bytes instead of 64-byte matrices | done - see optimization 5 |
+| compute shader updates | move entity positions to GPU entirely, avoid CPU→GPU sync | done - see optimization 6 |
 | OpenGL vs Vulkan | test raylib's Vulkan backend | unknown |
 | discrete GPU testing | test on dedicated GPU where instancing/SSBO shine | significant (different hw) |
 
@@ -126,6 +126,33 @@ currently not the bottleneck - update stays <1ms at 100k. these become relevant
 | entity pools | pre-allocated, reusable entity slots | reduces allocation overhead |
 | component packing | minimize struct padding | better cache utilization |
 
+#### estimated gains summary
+
+| Optimization | Expected Gain | Why |
+|------------------------|---------------|---------------------------------------------------|
+| SIMD updates | 0% | Update already on GPU |
+| Multithreaded update | 0% | Update already on GPU |
+| Cache-friendly layouts | 0% | CPU doesn't iterate entities |
+| Fixed-point math | 0% or worse | GPUs are optimized for float |
+| SoA vs AoS | ~5% | Only helps data upload, not bottleneck |
+| Frustum culling | 5-15% | Most entities converge to center anyway |
+| LOD rendering | 20-40% | Real gains - fewer fragments for distant entities |
+| Temporal techniques | ~50% | But with visual artifacts (flickering) |
+
+Realistic total if you did everything: ~30-50% improvement
+
+That'd take you from ~1.4M @ 38fps to maybe ~1.8-2M @ 38fps, or ~1.4M @ 50-55fps.
+
+What would actually move the needle:
+- GPU-side frustum culling in compute shader (cull before render, not after)
+- Point sprites instead of quads for distant entities (4 vertices → 1)
+- Indirect draw calls (GPU decides what to render, CPU never touches entity data)
+
+Your real bottleneck is fill rate and vertex throughput on HD 530 integrated
+graphics. The CPU side is already essentially free.
+
+
+
 ---
 
 ## testing methodology
diff --git a/TODO.md b/TODO.md
index ef6e726..b2261bb 100644
--- a/TODO.md
+++ b/TODO.md
@@ -70,13 +70,16 @@ findings (i5-6500T / HD 530):
 - rlgl batching is already near-optimal for this hardware
 - compute shaders: update time ~5ms → ~0ms at 150k entities (CPU freed entirely)
 
-## future optimization concepts
+## future optimization concepts (GPU-focused)
 
-- [ ] SIMD entity updates (AVX2/SSE)
-- [ ] struct-of-arrays vs array-of-structs benchmark
-- [ ] multithreaded update loop (thread pool)
-- [ ] cache-friendly memory layouts
-- [ ] LOD rendering (skip distant entities or reduce detail)
-- [ ] frustum culling (only render visible)
-- [ ] temporal techniques (update subset per frame)
-- [ ] fixed-point vs floating-point math
+- [ ] GPU-side frustum culling in compute shader
+- [ ] point sprites for distant/small entities (4 verts → 1)
+- [ ] indirect draw calls (glDrawArraysIndirect)
+
+## future optimization concepts (CPU - not currently bottleneck)
+
+- [ ] SIMD / SoA / multithreading (if game logic makes CPU hot again)
+
+## other ideas that aren't about optimization
+
+- [ ] scanline shader
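
Review note on the new GPU-focused TODO items: the cull-then-indirect-draw idea can be sketched on the CPU to make the data flow concrete. This is only an illustration under assumptions, not the project's code. `PackedInstance` guesses at the 12-byte (x, y, color) layout as two floats plus one packed 32-bit color, `ViewRect` and `cull_instances` are hypothetical names, and the 4-vertex quad is an assumption. The command struct follows the four-field layout that `glDrawArraysIndirect` reads from its buffer.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Assumed packed layout: two floats plus one RGBA8 color = 12 bytes per entity. */
typedef struct {
    float    x, y;
    uint32_t color;
} PackedInstance;
static_assert(sizeof(PackedInstance) == 12, "instance should pack to 12 bytes");

/* Field layout of the command consumed by glDrawArraysIndirect. */
typedef struct {
    uint32_t count;          /* vertices per instance */
    uint32_t instanceCount;  /* written by the cull pass */
    uint32_t first;
    uint32_t baseInstance;
} DrawArraysIndirectCommand;

/* Hypothetical 2D view bounds standing in for the frustum. */
typedef struct { float minX, minY, maxX, maxY; } ViewRect;

/* CPU stand-in for the compute pass: copy visible instances into a compacted
 * buffer and record how many survived. In a compute shader the `kept++` would
 * be an atomic add on a counter shared by all invocations. */
static void cull_instances(const PackedInstance *in, size_t n, ViewRect view,
                           PackedInstance *visible,
                           DrawArraysIndirectCommand *cmd)
{
    uint32_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        const PackedInstance *e = &in[i];
        if (e->x >= view.minX && e->x <= view.maxX &&
            e->y >= view.minY && e->y <= view.maxY) {
            visible[kept++] = *e;
        }
    }
    cmd->count = 4;             /* assumption: one quad drawn as a 4-vertex strip */
    cmd->instanceCount = kept;  /* the draw call reads this without CPU readback */
    cmd->first = 0;
    cmd->baseInstance = 0;
}
```

On the GPU version, both `visible` and `cmd` would live in buffers the compute shader writes, so the CPU issues one indirect draw without ever reading the surviving count back.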