F3DEX3
|
The base version of F3DEX3 was created for RDP bound games like OoT, where new visual effects are desired and increasing the RSP time a bit does not affect the overall performance. If your game is RSP bound, using the base version of F3DEX3 will make it slower.
Conversely, F3DEX3_LVP_NOC matches or beats the RSP performance of F3DEX2 on all critical paths in the microcode, including command dispatch, vertex processing, and triangle processing. Then, the RDP and memory traffic performance improvements of F3DEX3–56 vertex buffer, auto-batched rendering, etc.–should further improve performance from there. This means that switching from F3DEX2 to F3DEX3_LVP_NOC should always improve performance regardless of whether your game is RSP bound or RDP bound.
These are cycle counts for many key paths in the microcode. Lower numbers are better. The timings are hand-counted taking into account all pipeline stalls and all dual-issue conditions. Instruction alignment after branches is sometimes taken into account, otherwise assumed to be optimal.
Vertex / lighting numbers assume no special features (texgen, packed normals, etc.) Tri numbers assume texture, shade, and Z, and not flushing the buffer. All numbers assume default profiling configuration. Empty cells are "not measured yet".
F3DEX2 | F3DEX3_LVP_NOC | F3DEX3_LVP | F3DEX3_NOC | F3DEX3 | |
---|---|---|---|---|---|
Command dispatch | 12 | 12 | 12 | 12 | 12 |
Small RDP command | 14 | 5 | 5 | 5 | 5 |
Vtx before DMA start | 16 | 17 | 17 | 17 | 17 |
Vtx pair, no lighting | 54 | 54 | 81 | 79 | 98 |
Vtx pair, 0 dir lts | Can't | 64 | |||
Vtx pair, 1 dir lt | 73 | 70 | 96 | 182 | 201 |
Vtx pair, 2 dir lts | 76 | 77 | 103 | 211 | 230 |
Vtx pair, 3 dir lts | 88 | 84 | 110 | 240 | 259 |
Vtx pair, 4 dir lts | 91 | 91 | 117 | 269 | 288 |
Vtx pair, 5 dir lts | 103 | 98 | 124 | 298 | 317 |
Vtx pair, 6 dir lts | 106 | 105 | 131 | 327 | 346 |
Vtx pair, 7 dir lts | 118 | 112 | 138 | 356 | 375 |
Vtx pair, 8 dir lts | Can't | 119 | 145 | 385 | 404 |
Vtx pair, 9 dir lts | Can't | 126 | 152 | 414 | 433 |
Light dir xfrm, 0 dir lts | Can't | 95 | 95 | None | None |
Light dir xfrm, 1 dir lt | 141 | 95 | 95 | None | None |
Light dir xfrm, 2 dir lts | 180 | 96 | 96 | None | None |
Light dir xfrm, 3 dir lts | 219 | 121 | 121 | None | None |
Light dir xfrm, 4 dir lts | 258 | 122 | 122 | None | None |
Light dir xfrm, 5 dir lts | 297 | 147 | 147 | None | None |
Light dir xfrm, 6 dir lts | 336 | 148 | 148 | None | None |
Light dir xfrm, 7 dir lts | 375 | 173 | 173 | None | None |
Light dir xfrm, 8 dir lts | Can't | 174 | 174 | None | None |
Light dir xfrm, 9 dir lts | Can't | 199 | 199 | None | None |
Only/2nd tri to offscreen | 27 | 26 | 26 | 26 | 26 |
1st tri to offscreen | 28 | 27 | 27 | 27 | 27 |
Only/2nd tri to clip | 32 | 31 | 31 | 31 | 31 |
1st tri to clip | 33 | 32 | 32 | 32 | 32 |
Only/2nd tri to backface | 38 | 38 | 38 | 38 | 38 |
1st tri to backface | 39 | 39 | 39 | 39 | 39 |
Only/2nd tri to degenerate | 42 | 40 | 40 | 40 | 40 |
1st tri to degenerate | 43 | 41 | 41 | 41 | 41 |
Only/2nd tri to occluded | Can't | Can't | 49 | Can't | 49 |
1st tri to occluded | Can't | Can't | 50 | Can't | 50 |
Only/2nd tri to draw | 172 | 160 | 163 | 160 | 163 |
1st tri to draw | 173 | 160 | 163 | 160 | 163 |
Tri numbers are measured from the first cycle of the command handler inclusive, to the first cycle of whatever is after $ra exclusive. This is in order to capture the extra latency and stalls in F3DEX2.
Vertex processing time as reported by the performance counter in the PA
configuration.
Microcode | Scene 1 | Scene 2 | Scene 3 |
---|---|---|---|
F3DEX3 | 7.41ms | 2.99ms | 2.22ms |
F3DEX3_NOC | 6.85ms | 2.75ms | 1.98ms |
F3DEX3_LVP | 4.12ms | 1.59ms | 1.48ms |
F3DEX3_LVP_NOC | 3.34ms | 1.27ms | 1.16ms |
F3DEX2 | Can't* | Can't* | Can't* |
Vertex count | 3557 | 1548 | 1548 |
*F3DEX2 does not contain performance counters, so the portion of the RSP time taken for vertex processing cannot be measured.