rops: render output units
=========================

what they are, where they came from, and what yours can do.


what is a rop?
--------------

ROP = Render Output Unit (originally "Raster Operations Pipeline")

it's the final stage of the GPU pipeline. after all the fancy shader
math is done, the ROP is the unit that actually writes pixels to memory.

think of it as the bottleneck between "calculated" and "visible."

a ROP does:
- depth testing (is this pixel in front of what's already there?)
- stencil testing (mask operations)
- blending (alpha, additive, etc)
- anti-aliasing resolve
- writing the final color to the framebuffer

one ROP can write one pixel per clock cycle (roughly).
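
to make that list concrete, here's a rough software model of what a ROP does to one fragment. everything in it (the `Fragment` class, `make_buffers`, `rop_write`, the list-of-lists buffers) is made up for illustration. real ROPs are fixed-function hardware, and this sketch skips stencil testing and anti-aliasing resolve entirely.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    x: int
    y: int
    depth: float   # smaller = closer to the camera
    rgba: tuple    # (r, g, b, a), each in 0.0..1.0

def make_buffers(width, height):
    """Color buffer cleared to black, depth buffer cleared to 'infinitely far'."""
    color = [[(0.0, 0.0, 0.0) for _ in range(width)] for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    return color, depth

def rop_write(frag, color, depth):
    """Depth-test one fragment, alpha-blend it, write it. Returns True if written."""
    # depth test: reject fragments at or behind what's already stored
    if frag.depth >= depth[frag.y][frag.x]:
        return False
    # blending: classic "source over" alpha blend against the stored color
    r, g, b, a = frag.rgba
    dr, dg, db = color[frag.y][frag.x]
    color[frag.y][frag.x] = (r * a + dr * (1 - a),
                             g * a + dg * (1 - a),
                             b * a + db * (1 - a))
    # framebuffer write: store the new depth alongside the new color
    depth[frag.y][frag.x] = frag.depth
    return True

color, depth = make_buffers(4, 4)
print(rop_write(Fragment(1, 1, 0.5, (1.0, 0.0, 0.0, 1.0)), color, depth))  # True: opaque red lands
print(rop_write(Fragment(1, 1, 0.9, (0.0, 1.0, 0.0, 1.0)), color, depth))  # False: green is behind it
print(color[1][1])                                                         # (1.0, 0.0, 0.0)
```

note that the blend is a read-modify-write: the ROP has to fetch the old color before it can store the new one, which is part of why blending costs more than a plain write.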


the first rop
-------------

the name goes back to "raster operations": bitwise operations on pixels
(AND, OR, XOR). on the PC, the IBM 8514/A (1987) did them in dedicated
hardware. this was a big deal because before this, the CPU did all the
pixel math.

but the modern ROP as we know it emerged with:

NVIDIA NV1 (1995)
  one of the first chips with dedicated pixel output hardware
  could do ~1 million textured pixels/second

3dfx Voodoo (1996)
  the card that defined the modern GPU pipeline
  had 1 TMU + 1 pixel pipeline (essentially 1 ROP)
  could push 45 million pixels/second
  that ONE pipeline ran Quake at 640x480

NVIDIA GeForce 256 (1999)
  "the first GPU" - NVIDIA marketed it with that term
  4 pixel pipelines = 4 ROPs
  480 million pixels/second

so the original consumer 3D cards had... 1 ROP. and they ran Quake.


what one rop can do
-------------------

let's do the math.

one ROP at 100 MHz (3dfx Voodoo era; the original Voodoo itself ran at 50 MHz):
  100 million cycles/second
  ~1 pixel per cycle
  = 100 megapixels/second

at 640x480 @ 60fps:
  640 * 480 * 60 = 18.4 megapixels/second needed

so ONE ROP at 100MHz could handle 640x480 with ~5x headroom for overdraw.

at 1024x768 @ 60fps:
  1024 * 768 * 60 = 47 megapixels/second

now you're at 2x overdraw max. still playable, but tight.
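
the same arithmetic as a few lines of python, in case you want to plug in other clocks or resolutions. the function names (`fill_rate`, `pixels_needed`, `overdraw_headroom`) are just illustrative, and the "1 pixel per clock" default is the same rough assumption used above.

```python
def fill_rate(rops, clock_hz, pixels_per_clock=1):
    """Peak pixels/second: each ROP writes ~pixels_per_clock pixels per cycle."""
    return rops * clock_hz * pixels_per_clock

def pixels_needed(width, height, fps):
    """Pixels/second needed to touch every screen pixel exactly once per frame."""
    return width * height * fps

def overdraw_headroom(rops, clock_hz, width, height, fps):
    """How many times each screen pixel could be written per frame."""
    return fill_rate(rops, clock_hz) / pixels_needed(width, height, fps)

for w, h in [(640, 480), (1024, 768)]:
    need = pixels_needed(w, h, 60)
    room = overdraw_headroom(1, 100e6, w, h, 60)
    print(f"{w}x{h}@60: {need / 1e6:.1f} MP/s needed, {room:.1f}x headroom")
# 640x480@60: 18.4 MP/s needed, 5.4x headroom
# 1024x768@60: 47.2 MP/s needed, 2.1x headroom
```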


one modern rop
--------------

a single modern ROP runs at ~1-2 GHz and can do more per cycle:
- multiple color outputs (MRT)
- 64-bit or 128-bit color formats
- compressed writes

rough estimate for one ROP at 1.5 GHz:
  ~1.5 billion pixels/second base throughput

at 1920x1080 @ 60fps:
  1920 * 1080 * 60 = 124 megapixels/second

one ROP could handle 1080p with 12x overdraw headroom.

at 4K @ 60fps:
  3840 * 2160 * 60 = 498 megapixels/second

one ROP could handle 4K with 3x overdraw. tight, but possible.
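
you can also flip the question around: given a fill rate and an average overdraw, what frame rate can you sustain? a small sketch, assuming the same ~1 pixel per clock as the estimate above (`max_fps` and `ONE_ROP` are names invented here):

```python
def max_fps(fill_rate_px_s, width, height, overdraw):
    """Highest frame rate a given fill rate sustains at a given average overdraw."""
    return fill_rate_px_s / (width * height * overdraw)

ONE_ROP = 1.5e9  # one ROP at 1.5 GHz, ~1 pixel/clock

for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
    for overdraw in (1, 3, 12):
        fps = max_fps(ONE_ROP, w, h, overdraw)
        print(f"{name} at {overdraw}x overdraw: {fps:.0f} fps")
```

1080p at 12x overdraw and 4K at 3x overdraw both land at about 60 fps, which is the same result as the headroom figures above, just read from the other direction.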


your three rops (intel hd 530)
------------------------------

HD 530 specs:
- 3 ROPs
- ~950 MHz boost clock
- theoretical: 3 * 950 MHz = 2.85 GPixels/second

let's break that down:

at 1080p @ 60fps (124 MP/s needed):
  2850 / 124 = 23x overdraw budget

that's actually generous! you could draw each pixel 23 times.

so why does lofivor struggle at 1M entities?

because 1M entities at 4x4 pixels = 16M pixels per frame minimum.
but with overlap? let's say average 10x overdraw (a rough guess):
  160M pixels/frame
  at 60fps = 9.6 billion pixels/second

your ceiling is 2.85 billion.

so you're 3.4x over budget. that's why you top out around 300k-400k
entities before frame drops (which matches empirical testing).
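
here's that budget as code, so you can see where the ~300k ceiling falls out. this is only the simple model from the text (pixels per entity times an assumed overdraw factor times fps), and `pixel_demand`, `max_entities` and `HD530_FILL` are names made up for the sketch.

```python
def pixel_demand(entities, px_per_entity, overdraw, fps):
    """Pixels/second the ROPs are asked to write under this simple model."""
    return entities * px_per_entity * overdraw * fps

def max_entities(fill_rate_px_s, px_per_entity, overdraw, fps):
    """Entity count at which demand equals the fill-rate ceiling."""
    return fill_rate_px_s / (px_per_entity * overdraw * fps)

HD530_FILL = 3 * 950e6  # 3 ROPs * 950 MHz = 2.85 GPixels/s

demand = pixel_demand(1_000_000, 4 * 4, 10, 60)
print(f"demand at 1M entities: {demand / 1e9:.1f} GPixels/s")               # 9.6
print(f"over budget by: {demand / HD530_FILL:.1f}x")                        # 3.4x
print(f"break-even: {max_entities(HD530_FILL, 16, 10, 60):,.0f} entities")  # 296,875
```

the break-even lands just under 300k, at the low end of the observed range. the 10x overdraw factor is the shakiest input, so treat the output as a ballpark.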


the real constraint
-------------------

ROPs don't work in isolation. they're limited by:

1. MEMORY BANDWIDTH
   each pixel write = memory access
   HD 530 shares DDR4 with CPU (~30 GB/s)
   at 32-bit color: 30GB/s / 4 bytes = 7.5 billion pixels/second max
   but you're competing with CPU, texture reads, etc.
   realistic: maybe 2-3 billion pixels/second for framebuffer writes

2. TEXTURE SAMPLING
   if fragment shader samples textures, TMUs must keep up
   HD 530 has 24 TMUs, so this isn't the bottleneck

3. SHADER EXECUTION
   ROPs wait for fragments to be shaded
   if shaders are slow, ROPs starve
   lofivor's shaders are trivial, so this isn't the bottleneck
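
the memory bandwidth point can be sketched the same way. the `framebuffer_share` parameter below is an assumption (nobody hands you that number; 0.3 is picked only because it lands inside the 2-3 billion range quoted above), and the function names are illustrative.

```python
def bandwidth_pixel_ceiling(bandwidth_bytes_s, bytes_per_pixel, framebuffer_share=1.0):
    """Pixels/second memory can absorb, given the slice of bandwidth
    left over for framebuffer writes (framebuffer_share is a guess, 0..1)."""
    return bandwidth_bytes_s * framebuffer_share / bytes_per_pixel

def effective_fill_rate(rop_rate, bandwidth_rate):
    """Whichever limit is lower wins."""
    return min(rop_rate, bandwidth_rate)

DDR4 = 30e9            # ~30 GB/s, shared with the CPU
ROP_RATE = 3 * 950e6   # 2.85 GPixels/s

best_case = bandwidth_pixel_ceiling(DDR4, 4)     # all bandwidth, 32-bit color
shared = bandwidth_pixel_ceiling(DDR4, 4, 0.3)   # assume ~30% is left for writes
print(f"{best_case / 1e9:.2f} GPixels/s best case")                          # 7.50
print(f"{shared / 1e9:.2f} GPixels/s with a 30% share")                      # 2.25
print(f"{effective_fill_rate(ROP_RATE, shared) / 1e9:.2f} GPixels/s effective")  # 2.25
```

with a share that small, memory and the ROPs end up limiting you at about the same point, so on this hardware the two ceilings are hard to tell apart.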
for lofivor specifically: your 3 ROPs are THE ceiling.


what could you do with more rops?
---------------------------------

comparison:

  Intel HD 530:    3 ROPs,  2.85 GPixels/s
  GTX 1060:       48 ROPs,    72 GPixels/s
  RTX 3080:       96 ROPs,   164 GPixels/s
  RTX 4090:      176 ROPs,   443 GPixels/s

with a GTX 1060 (25x your fill rate):
  lofivor could probably hit 5-10 million entities

with an RTX 4090 (155x your fill rate):
  tens of millions, limited by other factors
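
the multipliers come straight from dividing the fill rates in the comparison. a tiny sketch (the dict and `relative_fill` are just for illustration):

```python
FILL_RATES = {  # GPixels/s, from the comparison
    "Intel HD 530": 2.85,
    "GTX 1060": 72,
    "RTX 3080": 164,
    "RTX 4090": 443,
}

def relative_fill(card, baseline="Intel HD 530"):
    """Fill rate of `card` as a multiple of the baseline card's fill rate."""
    return FILL_RATES[card] / FILL_RATES[baseline]

for card in FILL_RATES:
    print(f"{card:<13} {relative_fill(card):6.1f}x")
```

scaling a 300k-400k ceiling by 25x gives 7.5-10 million, the upper end of the 5-10 million guess. that assumes entity count scales linearly with fill rate, which stops being true once something else becomes the bottleneck.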


perspective: what 3 rops means historically
-------------------------------------------

your HD 530 has roughly the fill rate of:
- GeForce 4 Ti 4600 (2002): 4 ROPs, 1.2 GPixels/s
- Radeon 9700 Pro (2002): 8 ROPs, 2.6 GPixels/s

you're running hardware that, in raw pixel output, matches GPUs from
20+ years ago. but with modern features (compute shaders, SSBO, etc).

this is why lofivor is interesting: you're achieving 700k+ entities
on fill-rate-equivalent hardware that originally ran games with
maybe 10,000 triangles on screen.

the difference is technique. those 2002 games did complex per-pixel
lighting, shadows, multiple texture passes. lofivor does one texture
sample and one blend. same fill rate, 100x the entities.


the lesson
----------

ROPs are simple: they write pixels.

the number you have (times the clock) determines your pixel budget.
everything else (shaders, vertices, CPU logic) only matters if
the ROPs aren't your bottleneck.

with 3 ROPs at ~950 MHz, you have roughly 2.85 billion pixels/second.
spend them wisely:
- cull what's offscreen (don't spend pixels on invisible things)
- shrink distant objects (LOD saves pixels)
- reduce overlap (spatial organization)
- keep shaders simple (don't starve the ROPs)
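
as an example of the first item, here's a minimal offscreen cull. it's a CPU-side illustration only: the `(x, y, size)` tuples and the `on_screen` / `cull` helpers are hypothetical, not how lofivor actually stores or filters entities.

```python
def on_screen(x, y, size, width, height):
    """True if a size*size square with top-left corner (x, y) overlaps the screen."""
    return x + size > 0 and y + size > 0 and x < width and y < height

def cull(entities, width, height):
    """Keep only the entities that can produce visible pixels."""
    return [e for e in entities if on_screen(*e, width, height)]

entities = [
    (100, 100, 4),    # fully on screen
    (-2, 50, 4),      # straddles the left edge, still partly visible
    (-10, 50, 4),     # entirely off the left side
    (1920, 500, 4),   # starts exactly at the right edge, nothing visible
]
visible = cull(entities, 1920, 1080)
print(len(visible))   # 2
```

every entity dropped here is 16 pixel writes the ROPs never have to do.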
your 3 ROPs can do remarkable things. Quake ran on 1.