Update TODO and OPTIMIZATIONS with gpu discovery
parent 1c0f552032
commit 9e926e0646
2 changed files with 38 additions and 11 deletions
@@ -4,9 +4,9 @@ organized by performance goal. see journal.txt for detailed benchmarks.
 
 ## current ceiling
 
-- **100k entities @ 60fps** (AMD Radeon)
-- **50k entities @ 60fps** (i5-6500T integrated)
-- bottleneck: GPU-bound (update loop stays <1ms even at 100k)
+- **~150k entities @ 60fps** (i5-6500T / HD 530 integrated)
+- **~260k entities @ 60fps** (AMD Radeon discrete)
+- bottleneck: GPU-bound (update loop stays <1ms even at 200k+)
 
 ---
 
@@ -34,6 +34,25 @@ organized by performance goal. see journal.txt for detailed benchmarks.
 - improvement: **2x** over texture blitting, **20x** total
 - why it works: eliminates per-call overhead, vertices go straight to GPU buffer
 
+#### optimization 3: increased batch buffer
+
+- technique: increase raylib batch buffer from 8192 to 32768 vertices
+- result: ~140k entities @ 60fps (i5-6500T)
+- improvement: **~40%** over default buffer
+- why it works: fewer GPU flushes per frame
+
+#### optimization 4: GPU instancing (tested, minimal gain)
+
+- technique: `drawMeshInstanced()` with per-entity transform matrices
+- result: ~150k entities @ 60fps (i5-6500T) - similar to rlgl batching
+- improvement: **negligible** on integrated graphics
+- why it didn't help:
+  - integrated GPU shares system RAM (no PCIe transfer savings)
+  - 64-byte Matrix per entity vs ~80 bytes for rlgl vertices (similar bandwidth)
+  - bottleneck is memory bandwidth, not draw call overhead
+  - rlgl batching already minimizes draw calls effectively
+- note: may help more on discrete GPUs with dedicated VRAM
+
 ---
 
 ## future optimizations
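The "fewer GPU flushes per frame" reasoning in optimization 3 can be checked with a little arithmetic. This is a rough sketch, not the project's code: it assumes one quad (4 vertices) per entity, following the notes' own "8192 vertices = 2048 quads" figure, and that a flush happens only when the buffer fills. `flushes_per_frame` is a hypothetical helper name.

```python
import math

# Assumption: each entity is drawn as one quad of 4 vertices,
# matching the notes' "8192 vertices = 2048 quads".
VERTICES_PER_QUAD = 4


def flushes_per_frame(entities: int, buffer_vertices: int) -> int:
    """Estimate batch flushes per frame if the buffer is flushed only when full."""
    total_vertices = entities * VERTICES_PER_QUAD
    return math.ceil(total_vertices / buffer_vertices)


if __name__ == "__main__":
    for buffer_vertices in (8192, 32768):
        count = flushes_per_frame(140_000, buffer_vertices)
        print(f"{buffer_vertices} vertices: {count} flushes per frame")
```

Real flush counts can be higher, since a batch renderer may also flush on texture or state changes; the sketch only captures the buffer-size effect.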
@@ -42,12 +61,12 @@ organized by performance goal. see journal.txt for detailed benchmarks.
 
 these target the rendering bottleneck since update loop is already fast.
 
 | technique | description | expected gain |
-| ---------------------- | -------------------------------------------------------------------- | ------------- |
-| increase batch buffer | raylib default is 8192 vertices (2048 quads). larger = fewer flushes | moderate |
-| GPU instancing | single draw call for all entities, GPU handles transforms | significant |
-| compute shader updates | move entity positions to GPU entirely | significant |
-| OpenGL vs Vulkan | test raylib's Vulkan backend | unknown |
+| ---------------------- | -------------------------------------------------------------------- | ------------------------------- |
+| SSBO instance data | pack (x, y, color) = 12 bytes instead of 64-byte matrices | moderate (less bandwidth) |
+| compute shader updates | move entity positions to GPU entirely, avoid CPU→GPU sync | significant |
+| OpenGL vs Vulkan | test raylib's Vulkan backend | unknown |
+| discrete GPU testing | test on dedicated GPU where instancing/SSBO shine | significant (different hw) |
 
 #### rendering culling
 
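To put the per-entity sizes from these notes side by side, the sketch below estimates per-instance upload traffic at the measured ~150k entities @ 60fps. The byte counts (64, ~80, 12) come from the notes. `upload_mb_per_s` is a hypothetical helper, and the estimate ignores everything else that uses memory bandwidth (texture reads, framebuffer writes), so it is not a full bandwidth model.

```python
FPS = 60
ENTITIES = 150_000  # approximate ceiling reported for the i5-6500T / HD 530

# Per-entity payload sizes as stated in the notes (bytes).
PAYLOADS = {
    "instancing (4x4 float matrix)": 64,
    "rlgl batched vertices (approx.)": 80,
    "SSBO (x, y, color)": 12,
}


def upload_mb_per_s(bytes_per_entity: int, entities: int, fps: int = FPS) -> float:
    """Per-instance data uploaded per second, in MB (10^6 bytes)."""
    return bytes_per_entity * entities * fps / 1_000_000


if __name__ == "__main__":
    for name, size in PAYLOADS.items():
        print(f"{name}: {upload_mb_per_s(size, ENTITIES):.0f} MB/s")
```

Under these assumptions the matrix and rlgl paths land within about 25% of each other, which is consistent with the "similar bandwidth" note, while the 12-byte layout would cut instance traffic by roughly 5x.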
TODO.md (12 changed lines)
@@ -56,11 +56,19 @@ further options (if needed):
 
 ## phase 5: rendering experiments
 
-- [ ] increase raylib batch buffer (currently 8192 vertices = 2048 quads)
-- [ ] GPU instancing (single draw call for all entities)
+- [x] increase raylib batch buffer (currently 8192 vertices = 2048 quads)
+- [x] GPU instancing (single draw call for all entities)
+- [ ] SSBO instance data (12 bytes vs 64-byte matrices)
 - [ ] compute shader entity updates (if raylib supports)
 - [ ] compare OpenGL vs Vulkan backend
 
+findings (i5-6500T / HD 530):
+- batch buffer increase: ~140k @ 60fps (was ~100k)
+- GPU instancing: ~150k @ 60fps - negligible gain over rlgl batching
+- instancing doesn't help on integrated graphics (shared RAM, no PCIe savings)
+- bottleneck is memory bandwidth, not draw call overhead
+- rlgl batching is already near-optimal for this hardware
+
 ## future optimization concepts
 
 - [ ] SIMD entity updates (AVX2/SSE)
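The open "SSBO instance data (12 bytes vs 64-byte matrices)" item can be illustrated with a CPU-side packing sketch. The layout here is an assumption, not something the notes specify: two little-endian float32 values for x and y, plus one uint32 holding RGBA8 color. `pack_rgba` and `pack_instances` are hypothetical helpers, and the GPU upload itself is left out.

```python
import struct

# Assumed layout: x, y as float32, color as one packed RGBA8 uint32 = 12 bytes.
INSTANCE = struct.Struct("<ffI")
# For comparison: a 4x4 float32 transform matrix = 64 bytes.
MATRIX = struct.Struct("<16f")


def pack_rgba(r: int, g: int, b: int, a: int) -> int:
    """Pack four 8-bit channels into one 32-bit integer (R in the low byte)."""
    return r | (g << 8) | (b << 16) | (a << 24)


def pack_instances(entities) -> bytes:
    """Pack an iterable of (x, y, color) tuples into one contiguous buffer."""
    entities = list(entities)
    buf = bytearray(INSTANCE.size * len(entities))
    for i, (x, y, color) in enumerate(entities):
        INSTANCE.pack_into(buf, i * INSTANCE.size, x, y, color)
    return bytes(buf)


if __name__ == "__main__":
    red = pack_rgba(255, 0, 0, 255)
    data = pack_instances([(1.0, 2.0, red), (3.0, 4.0, red)])
    print(f"{INSTANCE.size} bytes per instance vs {MATRIX.size} per matrix")
    print(f"{len(data)} bytes for 2 entities")
```

A shader reading this buffer would need to unpack the color and build the transform itself; whether that trade pays off on integrated graphics is exactly what the TODO item leaves open.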