Philosophy

The base version of F3DEX3 was created for RDP-bound games like OoT, where new visual effects are desired and a small increase in RSP time does not affect overall performance. If your game is RSP-bound, using the base version of F3DEX3 will make it slower.

Conversely, F3DEX3_LVP_NOC matches or beats the RSP performance of F3DEX2 on all critical paths in the microcode, including command dispatch, vertex processing, and triangle processing. On top of that, the RDP and memory traffic improvements of F3DEX3 (56 vertex buffer, auto-batched rendering, etc.) should improve performance further. This means that switching from F3DEX2 to F3DEX3_LVP_NOC should always improve performance, regardless of whether your game is RSP-bound or RDP-bound.

Performance Results

Cycle Counts

These are cycle counts for many key paths in the microcode. Lower numbers are better. The timings are hand-counted, taking into account all pipeline stalls and all dual-issue conditions. Instruction alignment after branches is taken into account in some cases and is otherwise assumed to be optimal.

Vertex / lighting numbers assume no special features (texgen, packed normals, etc.). Tri numbers assume texture, shade, and Z, and not flushing the buffer. All numbers assume the default profiling configuration. Empty cells have not been measured yet.

	F3DEX2	F3DEX3_LVP_NOC	F3DEX3_LVP	F3DEX3_NOC	F3DEX3
Command dispatch	12	12	12	12	12
Small RDP command	14	5	5	5	5
Vtx before DMA start	16	17	17	17	17
Vtx pair, no lighting	54	54	81	79	98
Vtx pair, 0 dir lts	Can't	64
Vtx pair, 1 dir lt	73	70	96	182	201
Vtx pair, 2 dir lts	76	77	103	211	230
Vtx pair, 3 dir lts	88	84	110	240	259
Vtx pair, 4 dir lts	91	91	117	269	288
Vtx pair, 5 dir lts	103	98	124	298	317
Vtx pair, 6 dir lts	106	105	131	327	346
Vtx pair, 7 dir lts	118	112	138	356	375
Vtx pair, 8 dir lts	Can't	119	145	385	404
Vtx pair, 9 dir lts	Can't	126	152	414	433
Light dir xfrm, 0 dir lts	Can't	95	95	None	None
Light dir xfrm, 1 dir lt	141	95	95	None	None
Light dir xfrm, 2 dir lts	180	96	96	None	None
Light dir xfrm, 3 dir lts	219	121	121	None	None
Light dir xfrm, 4 dir lts	258	122	122	None	None
Light dir xfrm, 5 dir lts	297	147	147	None	None
Light dir xfrm, 6 dir lts	336	148	148	None	None
Light dir xfrm, 7 dir lts	375	173	173	None	None
Light dir xfrm, 8 dir lts	Can't	174	174	None	None
Light dir xfrm, 9 dir lts	Can't	199	199	None	None
Only/2nd tri to offscreen	27	26	26	26	26
1st tri to offscreen	28	27	27	27	27
Only/2nd tri to clip	32	31	31	31	31
1st tri to clip	33	32	32	32	32
Only/2nd tri to backface	38	38	38	38	38
1st tri to backface	39	39	39	39	39
Only/2nd tri to degenerate	42	40	40	40	40
1st tri to degenerate	43	41	41	41	41
Only/2nd tri to occluded	Can't	Can't	49	Can't	49
1st tri to occluded	Can't	Can't	50	Can't	50
Only/2nd tri to draw	172	160	163	160	163
1st tri to draw	173	160	163	160	163

Tri numbers are measured from the first cycle of the command handler inclusive, to the first cycle of whatever is after $ra exclusive. This is in order to capture the extra latency and stalls in F3DEX2.
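The marginal cost of each additional directional light can be read off the "Vtx pair" rows of the table above by differencing adjacent rows. The following is a minimal sketch of that arithmetic, not part of F3DEX3 itself; the names `VTX_PAIR_CYCLES` and `marginal_costs` are hypothetical, and only the 1 to 7 light rows are used because every version in the table supports that range.

```python
# Cycles per vertex pair for 1..7 directional lights, copied from the
# "Vtx pair, N dir lts" rows of the cycle count table.
# The dictionary and function names here are illustrative only.
VTX_PAIR_CYCLES = {
    "F3DEX2":         [73, 76, 88, 91, 103, 106, 118],
    "F3DEX3_LVP_NOC": [70, 77, 84, 91, 98, 105, 112],
    "F3DEX3_LVP":     [96, 103, 110, 117, 124, 131, 138],
    "F3DEX3_NOC":     [182, 211, 240, 269, 298, 327, 356],
    "F3DEX3":         [201, 230, 259, 288, 317, 346, 375],
}


def marginal_costs(cycles):
    """Return the extra cycles added by each additional light."""
    return [later - earlier for earlier, later in zip(cycles, cycles[1:])]


if __name__ == "__main__":
    for name, cycles in VTX_PAIR_CYCLES.items():
        print(f"{name:15s} {marginal_costs(cycles)}")
```

With the table's numbers, both LVP versions add a constant 7 cycles per extra light, the non-LVP versions add a constant 29, and F3DEX2 alternates between 3 and 12.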

Measurements

These are vertex processing times as reported by the performance counter in the PA configuration.

Scene 1: Kakariko, adult day, from DMT entrance
Scene 2: Custom empty scene with Suzanne monkey head with 1 dir light
Scene 3: Same as Scene 2, but Suzanne has vertex colors instead of lighting (Link is still on screen and has lighting)

Microcode	Scene 1	Scene 2	Scene 3
F3DEX3	7.41ms	2.99ms	2.22ms
F3DEX3_NOC	6.85ms	2.75ms	1.98ms
F3DEX3_LVP	4.12ms	1.59ms	1.48ms
F3DEX3_LVP_NOC	3.34ms	1.27ms	1.16ms
F3DEX2	Can't*	Can't*	Can't*
Vertex count	3557	1548	1548

*F3DEX2 does not contain performance counters, so the portion of the RSP time taken for vertex processing cannot be measured.
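To compare the measured results across scenes with different vertex counts, it can help to normalize them. The sketch below, which uses only the numbers from the measurement table, computes the average time per vertex and the speedup of each version relative to base F3DEX3. The helper names (`us_per_vertex`, `speedup_vs_base`) are hypothetical and chosen for illustration.

```python
# Vertex processing times in milliseconds and vertex counts,
# copied from the measurement table. Names are illustrative only.
VERTEX_COUNT = {"Scene 1": 3557, "Scene 2": 1548, "Scene 3": 1548}

VERTEX_TIME_MS = {
    "F3DEX3":         {"Scene 1": 7.41, "Scene 2": 2.99, "Scene 3": 2.22},
    "F3DEX3_NOC":     {"Scene 1": 6.85, "Scene 2": 2.75, "Scene 3": 1.98},
    "F3DEX3_LVP":     {"Scene 1": 4.12, "Scene 2": 1.59, "Scene 3": 1.48},
    "F3DEX3_LVP_NOC": {"Scene 1": 3.34, "Scene 2": 1.27, "Scene 3": 1.16},
}


def us_per_vertex(microcode, scene):
    """Average vertex processing time in microseconds per vertex."""
    return VERTEX_TIME_MS[microcode][scene] * 1000.0 / VERTEX_COUNT[scene]


def speedup_vs_base(microcode, scene):
    """How many times faster than base F3DEX3 this version is."""
    return VERTEX_TIME_MS["F3DEX3"][scene] / VERTEX_TIME_MS[microcode][scene]


if __name__ == "__main__":
    for microcode in VERTEX_TIME_MS:
        for scene in VERTEX_COUNT:
            print(f"{microcode:15s} {scene}: "
                  f"{us_per_vertex(microcode, scene):.2f} us/vtx, "
                  f"{speedup_vs_base(microcode, scene):.2f}x vs F3DEX3")
```

These are averages over a whole scene, so they fold in whatever the performance counter attributes to vertex processing and are not directly comparable to the hand-counted per-pair cycle counts.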