F3DEX3
There are several selectable configuration settings when building F3DEX3, which can be enabled in any combination. With a couple of minor exceptions, none of these settings affect the GBI; in fact, you can swap between the microcode versions on a per-frame basis if you build multiple versions into your romhack.
If you are not using the occlusion plane feature in your romhack, you can use the NOC configuration, which removes the computation of the occlusion plane from the vertex processing pipeline, saving some RSP time.
If you care about performance, please do consider using the occlusion plane! RDP time savings of 3-4 ms are common in scenes with reasonable occlusion planes, and occasionally as much as a third of the total RDP time is saved. Furthermore, when even a small percentage of the triangles drawn are occluded, not only is RDP time saved (which is the point), but RSP time is also saved because those tris no longer have to be processed. This can offset the extra RSP time needed to compute the occlusion plane for all vertices.
You can also build both the NOC and base microcodes into your ROM and switch between them on a per-frame basis. If there is no occlusion plane active or the best occlusion plane candidate would be very small on screen, you can use the NOC microcode and save RSP time. If there is a significant occlusion plane, you can use the base microcode and reduce the RDP time. You could also decide which version to use based on the profiling results from the previous frame: if the RSP is the bottleneck (e.g. the RDP CLK - CMD value is high), use the NOC version, and otherwise use the base version.
The primary tradeoff for all the new lighting features in F3DEX3 is increased RSP time for vertex processing. The base version of F3DEX3 takes about 2-2.5x more RSP time for vertex processing than F3DEX2 (see the Performance Results section below), assuming no lighting or directional lights only. You should use the F3DEX3 performance counters (see below) to determine whether your game is usually RSP or RDP bound.
If your game is usually RDP bound (like OoT), this generally will not affect the game's overall framerate, so you should stick with base F3DEX3:
However, for RSP-bound or extremely optimized games (such as Kaze Emanuar's), base F3DEX3 can become a bottleneck, so the Legacy Vertex Pipeline (LVP) configuration has been introduced.
This configuration replaces F3DEX3's native vertex and lighting code with a faster version based on the same algorithms as F3DEX2. This removes:
However, it retains all other F3DEX3 features:
With both LVP and NOC enabled, F3DEX3 is faster on the RSP than F3DEX2 (see Performance Results).
As mentioned above, F3DEX3 includes many performance counters. There are far too many counters for a single microcode to maintain, so multiple configurations of the microcode can be built, each containing a different set of performance counters. These can be swapped while the game is running so the full set of counters can be effectively accessed over multiple frames.
There are a total of 21 performance counters, including:
The default configuration of F3DEX3 provides a few of the most basic counters. The additional profiling configurations, called A, B, and C (for example F3DEX3_BrZ_PA), provide additional counters, but have two default features removed to make space for the extra profiling. These two features were selected because their removal does not affect the RDP render time.
SPLightToRDP commands are removed (they become no-ops).
Flat shading (!G_SHADING_SMOOTH) is removed (all tris are smooth).
BrZ / BrW
Use BrZ if the microcode is replacing F3DEX2 or an earlier F3D version (e.g. SM64), or BrW if the microcode is replacing F3DZEX (e.g. OoT or MM). This controls whether SPBranchLessZ* uses the vertex's screen Z coordinate (BrZ) or its W coordinate (BrW).
Debug Normals (dbgN)
Debug Normals has been moved out of the Makefile, as it is not a microcode version intended to be shipped. It can still be enabled by changing CFG_DEBUG_NORMALS equ 0 to CFG_DEBUG_NORMALS equ 1 in the microcode.
To help debug lighting issues when integrating F3DEX3 into your romhack, this feature causes the vertex colors of any material with lighting enabled to be set to the transformed, normalized world space normals. The X, Y, and Z components map to R, G, and B, with each dimension's conceptual (-1.0 ... 1.0) range mapped to (0 ... 255). This is not compatible with LVP, as world space normals do not exist in that pipeline. It also breaks vertex alpha and texgen / lookat.
Some ways to use Debug Normals for debugging are: