lofivor/TODO.md

Jared Miller 0568204cb7

Clean up todo and optimizations doc

2025-12-17 20:58:31 -05:00

lofivor - build roadmap

survivor-like optimized for weak hardware. finding the performance ceiling first, then building the game.

phase 1: sandbox stress test

create sandbox.zig (separate from existing game code)
entity struct (x, y, vx, vy, color)
flat array storage for entities
spawn entities at random screen edges
update loop: move toward center, respawn on arrival
render: filled circles (4px radius, cyan)
metrics overlay (entity count, frame time, update time, render time)
controls: +/- 100, shift +/- 1000, space pause, r reset
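
a minimal sketch of the phase 1 data layout and update loop, written in plain C with no raylib dependency so it runs standalone (the project itself is Zig). the capacity, the fixed travel time, the packed color format, and every function name here are assumptions for illustration, not taken from sandbox.zig. rendering and the metrics overlay are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SCREEN_W 1280.0f
#define SCREEN_H 1024.0f
#define MAX_ENTITIES 200000     /* assumed capacity */
#define TRAVEL_SECONDS 5.0f     /* assumed: every entity reaches center in 5s */

typedef struct {
    float x, y;      /* position in pixels */
    float vx, vy;    /* velocity in pixels per second */
    uint32_t color;  /* packed 0xRRGGBBAA (assumed format) */
} Entity;

static Entity entities[MAX_ENTITIES];  /* flat array storage */
static int entity_count = 0;

static float frand(void) { return (float)rand() / (float)RAND_MAX; }

/* Place e on a random screen edge and aim it at the screen center. */
static void spawn_at_edge(Entity *e) {
    switch (rand() % 4) {
    case 0:  e->x = frand() * SCREEN_W; e->y = 0.0f;     break; /* top */
    case 1:  e->x = frand() * SCREEN_W; e->y = SCREEN_H; break; /* bottom */
    case 2:  e->x = 0.0f;     e->y = frand() * SCREEN_H; break; /* left */
    default: e->x = SCREEN_W; e->y = frand() * SCREEN_H; break; /* right */
    }
    e->vx = (SCREEN_W * 0.5f - e->x) / TRAVEL_SECONDS;
    e->vy = (SCREEN_H * 0.5f - e->y) / TRAVEL_SECONDS;
    e->color = 0x00FFFFFFu;  /* cyan */
}

/* Append up to n entities; silently stops at capacity. */
static void add_entities(int n) {
    assert(n >= 0);
    for (int i = 0; i < n && entity_count < MAX_ENTITIES; i++)
        spawn_at_edge(&entities[entity_count++]);
}

/* Move every entity; respawn once it has reached or passed the center. */
static void update(float dt) {
    const float cx = SCREEN_W * 0.5f, cy = SCREEN_H * 0.5f;
    for (int i = 0; i < entity_count; i++) {
        Entity *e = &entities[i];
        e->x += e->vx * dt;
        e->y += e->vy * dt;
        /* Dot product of (center - pos) with velocity turns non-positive
           exactly when the entity is at or beyond the center. */
        if ((cx - e->x) * e->vx + (cy - e->y) * e->vy <= 0.0f)
            spawn_at_edge(e);
    }
}
```

the +/- controls would map to `add_entities(100)` or `add_entities(1000)`, and pause simply skips the `update(dt)` call. the dot-product arrival test is one choice among several; it avoids a square root and cannot overshoot on a long frame.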

phase 2: find the ceiling

test on i5-6500T / HD 530 @ 1280x1024
record entity count where 60fps breaks
identify bottleneck (CPU update vs GPU render)
document findings

findings (AMD Radeon test):

60fps breaks at ~5000 entities
render-bound: update stays <1ms even at 30k entities, render time dominates
individual drawCircle calls are the bottleneck

phase 3: optimization experiments

based on phase 2 results:

batch rendering via texture blitting (10x improvement)
rlgl quad batching (2x improvement on top)
~~if cpu-bound: SIMD, struct-of-arrays, multithreading~~ (not needed)
re-test after each change

findings:

texture blitting: pre-render circle to texture, drawTexture() per entity
rlgl batching: submit vertices directly via rl.gl, bypass drawTexture overhead
baseline: 60fps @ ~5k entities
after texture blitting: 60fps @ ~50k entities
after rlgl batching: 60fps @ ~100k entities
total: ~20x improvement from baseline
see journal.txt for detailed benchmarks

further options (if needed):

increase raylib batch buffer (currently 8192 vertices = 2048 quads per flush)
GPU instancing (single draw call for all entities)
or just move on - 100k @ 60fps is a solid ceiling
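
to make the batch buffer numbers concrete, here is a CPU-side sketch of quad batching in plain C. it does not call raylib or rlgl; `batch_flush` is a stand-in for the real GPU submit, and the `Vertex` layout and all names are assumptions. the 8192-vertex figure is the one quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_VERTICES 8192   /* buffer size quoted in the roadmap */
#define VERTS_PER_QUAD 4
#define QUADS_PER_FLUSH (BATCH_VERTICES / VERTS_PER_QUAD)  /* 2048 */

typedef struct { float x, y, u, v; } Vertex;  /* assumed layout */

typedef struct {
    Vertex verts[BATCH_VERTICES];
    size_t used;     /* vertices currently queued */
    size_t flushes;  /* number of submits so far */
} Batch;

/* Stand-in for the GPU submit: a real renderer would draw here. */
static void batch_flush(Batch *b) {
    if (b->used == 0) return;
    b->flushes++;
    b->used = 0;
}

/* Queue one textured quad centered on (cx, cy) with half-size r,
   flushing first if the buffer cannot hold four more vertices. */
static void batch_push_quad(Batch *b, float cx, float cy, float r) {
    assert(r > 0.0f);
    if (b->used + VERTS_PER_QUAD > BATCH_VERTICES) batch_flush(b);
    Vertex *v = &b->verts[b->used];
    v[0] = (Vertex){cx - r, cy - r, 0.0f, 0.0f};
    v[1] = (Vertex){cx - r, cy + r, 0.0f, 1.0f};
    v[2] = (Vertex){cx + r, cy + r, 1.0f, 1.0f};
    v[3] = (Vertex){cx + r, cy - r, 1.0f, 0.0f};
    b->used += VERTS_PER_QUAD;
}

/* Flushes needed per frame for n quads: ceil(n / quads_per_flush). */
static size_t flushes_per_frame(size_t n, size_t quads_per_flush) {
    assert(quads_per_flush > 0);
    return (n + quads_per_flush - 1) / quads_per_flush;
}
```

with 2048 quads per flush, 100k entities need 49 flushes per frame. a larger buffer lowers that count, which is the motivation for the first option in the list.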

phase 4: spatial partitioning

uniform grid collision
quadtree comparison
measure ceiling with n² collision checks enabled
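
a rough sketch of what the uniform grid buys over the n² baseline, in plain C. it counts overlapping pairs both ways so the two can be checked against each other. the counting-sort cell layout, the cell size equal to the collision radius, and all names are assumptions, not the planned implementation.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { float x, y; } Point;

/* Baseline: test every pair, O(n^2). */
static long count_pairs_brute(const Point *p, int n, float r) {
    long hits = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            float dx = p[i].x - p[j].x, dy = p[i].y - p[j].y;
            if (dx * dx + dy * dy <= r * r) hits++;
        }
    return hits;
}

static int clampi(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Uniform grid with cell size r: bucket points by cell, then test each
   point only against the 3x3 block of cells around it.
   Returns -1 if allocation fails. */
static long count_pairs_grid(const Point *p, int n, float w, float h, float r) {
    assert(n >= 0 && w > 0.0f && h > 0.0f && r > 0.0f);
    int cols = (int)(w / r) + 1, rows = (int)(h / r) + 1;
    int cells = cols * rows;
    int *start   = calloc((size_t)cells + 1, sizeof *start);
    int *cursor  = calloc((size_t)cells + 1, sizeof *cursor);
    int *cell_of = malloc(((size_t)n + 1) * sizeof *cell_of);
    int *order   = malloc(((size_t)n + 1) * sizeof *order);
    long hits = -1;
    if (start && cursor && cell_of && order) {
        /* Count points per cell, then prefix-sum into start offsets. */
        for (int i = 0; i < n; i++) {
            int cx = clampi((int)(p[i].x / r), 0, cols - 1);
            int cy = clampi((int)(p[i].y / r), 0, rows - 1);
            cell_of[i] = cy * cols + cx;
            start[cell_of[i] + 1]++;
        }
        for (int c = 0; c < cells; c++) start[c + 1] += start[c];
        /* Scatter point indices so each cell's points are contiguous. */
        for (int i = 0; i < n; i++) {
            int c = cell_of[i];
            order[start[c] + cursor[c]++] = i;
        }
        hits = 0;
        for (int i = 0; i < n; i++) {
            int cx = cell_of[i] % cols, cy = cell_of[i] / cols;
            for (int ny = cy - 1; ny <= cy + 1; ny++) {
                if (ny < 0 || ny >= rows) continue;
                for (int nx = cx - 1; nx <= cx + 1; nx++) {
                    if (nx < 0 || nx >= cols) continue;
                    int c = ny * cols + nx;
                    for (int k = start[c]; k < start[c + 1]; k++) {
                        int j = order[k];
                        if (j <= i) continue;  /* count each pair once */
                        float dx = p[i].x - p[j].x, dy = p[i].y - p[j].y;
                        if (dx * dx + dy * dy <= r * r) hits++;
                    }
                }
            }
        }
    }
    free(start); free(cursor); free(cell_of); free(order);
    return hits;
}
```

both functions return the same count for the same input, so the brute-force version doubles as a correctness check while timing the grid. a quadtree would slot in behind the same signature for the comparison.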

phase 5: rendering experiments

increase raylib batch buffer (currently 8192 vertices = 2048 quads)
GPU instancing (single draw call for all entities)
SSBO instance data (12 bytes vs 64-byte matrices)
compute shader entity updates (raylib supports via rlgl)
compare OpenGL vs Vulkan backend

findings (i5-6500T / HD 530):

batch buffer increase: ~140k @ 60fps (was ~100k)
GPU instancing: ~150k @ 60fps - negligible gain over rlgl batching with the larger buffer (~140k)
instancing doesn't help on integrated graphics (shared RAM, no PCIe savings)
bottleneck is memory bandwidth, not draw call overhead
rlgl batching is already near-optimal for this hardware
compute shaders: update time ~5ms → ~0ms at 150k entities (CPU freed entirely)
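
since the bottleneck is memory bandwidth, the "12 bytes vs 64-byte matrices" item can be put in numbers. the sketch below is plain C; the roadmap gives only the sizes, so the 12-byte field layout (position plus packed color) is an assumption, as is the premise that instance data is re-uploaded every frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed compact layout: 2D position plus packed RGBA. */
typedef struct {
    float x, y;
    uint32_t color;
} InstanceCompact;

/* Per-instance 4x4 float transform, as used by matrix-based instancing. */
typedef struct {
    float m[16];
} InstanceMatrix;

_Static_assert(sizeof(InstanceCompact) == 12, "compact instance is 12 bytes");
_Static_assert(sizeof(InstanceMatrix) == 64, "matrix instance is 64 bytes");

/* Bytes moved per second if every instance is rewritten each frame. */
static size_t upload_bytes_per_second(size_t entities,
                                      size_t bytes_per_instance,
                                      size_t fps) {
    assert(fps > 0);
    return entities * bytes_per_instance * fps;
}
```

at 150k entities and 60fps this works out to 108,000,000 bytes/s for the compact layout against 576,000,000 bytes/s for matrices, a bit over 5x less traffic.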

future optimization concepts (GPU-focused)

GPU-side frustum culling in compute shader
point sprites for distant/small entities (4 verts → 1)
indirect draw calls (glDrawArraysIndirect)

future optimization concepts (CPU - not currently bottleneck)

SIMD / SoA / multithreading (if game logic makes CPU hot again)

other ideas that aren't about optimization

scanline shader
