What are the tradeoffs for all these new features?

In other words, when is F3DEX3 worse than F3DEX2?

Vertex processing RSP time for occlusion plane

In the occlusion plane F3DEX3 configuration, vertex processing is slower than in F3DEX2. If using this configuration and there is no occlusion plane or it is occluding almost nothing, the RSP will be slower with no other benefit.

However, when the occlusion plane is occluding even a few percent of the triangles in the scene, the situation changes. Every occluded triangle saves RDP time, and most games are RDP bound, so this trades RSP time for RDP time and makes the game faster overall. Plus, RSP time is also saved for the tris which are not drawn, which can approximately cancel out the extra RSP time spent computing the occlusion plane for all vertices.

Functionality in Overlay 3

The following commands are moved to Overlay 3 in F3DEX3 to save IMEM space. This means that code will have to be loaded from DRAM to run them if Overlays 2 or 4 (for lighting) happen to be loaded already.

Push and multiply codepaths for SPMatrix
SPPopMatrix*
SPDma*
SPMemset

However:

Multiplying, pushing, and popping matrices are not recommended for performance or accuracy reasons, and these operations are not used for most 3D objects in SM64 or OoT.
SPDma* is rarely used except at startup for HLE detection.
SPMemset is a new F3DEX3 command which can improve performance. Plus, it is typically run shortly after render start, when Overlay 3 is already in IMEM.

So there is not a significant practical performance impact from these changes.

Far clipping removal

Far clipping is completely removed in F3DEX3. Far clipping is not intentionally used for performance or aesthetic reasons in levels in vanilla SM64 or OoT, though it can be seen in certain extreme cases. However, it is used on the SM64 title screen for the zoom-in on Mario's face, so that zoom-in will look slightly different in F3DEX3.

Far clipping can be used to cull tris which are fully "fogged out" if the background color (no skybox) is also the fog color, for performance benefits. This effect has a bad reputation from '90s-era games, where it was used as a cheap trick to hide performance problems, though it's occasionally used in "spooky" levels in romhacks. In F3DEX3, SPAlphaCompareCull can be used instead of far clipping to cull these triangles which are fully in fog.
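As a rough illustration of the idea behind culling fully fogged tris, here is a minimal C model of a per-triangle alpha threshold test. It is a sketch under assumptions, not the microcode's implementation: the `CullMode` enum, the function name `tri_is_culled`, and the exact comparison rules are hypothetical, and it assumes that with fog enabled each vertex's alpha holds its fog amount (255 meaning fully fogged).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical modes for this sketch; not the real GBI constants. */
typedef enum { CULL_DISABLED, CULL_ALL_BELOW, CULL_ALL_AT_OR_ABOVE } CullMode;

/* Returns true if the triangle would be skipped: all three vertex alphas
 * must be on the same side of the threshold (assumed semantics). */
static bool tri_is_culled(CullMode mode, uint8_t thresh, const uint8_t alpha[3]) {
    assert(alpha != NULL);
    if (mode == CULL_DISABLED) {
        return false;
    }
    for (int i = 0; i < 3; i++) {
        bool below = alpha[i] < thresh;
        if (mode == CULL_ALL_BELOW && !below) return false;
        if (mode == CULL_ALL_AT_OR_ABOVE && below) return false;
    }
    return true;
}
```

In this model, a tri whose three vertices all have fog alpha 255 is culled with `CULL_ALL_AT_OR_ABOVE` and a threshold of 255, while a tri with even one partially fogged vertex is still drawn, which is the behavior needed to replace far clipping against a fog-colored background.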

The removal of far clipping saved a bunch of DMEM space, and enabled other changes to the clipping implementation which saved even more DMEM space.

NoN (No Nearclipping) is also mandatory in F3DEX3, though this was already the microcode option used in OoT. Note that tris are still clipped at the camera plane; nearclipping means they are clipped at the nearplane, which is a short distance in front of the camera plane.

Removal of scaled vertex normals

A few clever romhackers figured out that you could shrink the normals on verts in your mesh (so their length is less than "1") to make the lighting on those verts dimmer and create a version of ambient occlusion. In the "advanced" lighting codepath, F3DEX3 normalizes vertex normals after transforming them, which is required for point lights, specular, and Fresnel, so this no longer works. However, F3DEX3 has support for ambient occlusion via vertex alpha, which accomplishes the same goal with some extra benefits:

Much easier to create: just paint the vertex alpha in Blender / fast64. The scaled normals approach was not supported in fast64 and had to be done with scripts or by hand.
The amount of ambient occlusion in F3DEX3 can be set at runtime based on variable scene lighting, whereas the scaled normals approach is baked into the mesh.
F3DEX3 can have the vertex alpha affect ambient, directional, and point lights by different amounts, which is not possible with scaled normals. In fact, scaled normals never affect the ambient light, contrary to the concept of ambient occlusion.

The only case where scaled normals work but F3DEX3 AO doesn't work is for meshes with vertex alpha actually used for transparency (therefore also no fog).

Note that in the "basic" lighting codepath in F3DEX3, vertex normals are treated the same way as in F3DEX2, so scaled normals are supported there. Ambient occlusion is also supported there.
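The difference between the two approaches can be sketched with some simplified lighting arithmetic in C. This is only an illustration: the function names are hypothetical, there is one directional light with intensities as floats in [0, 1], and the ambient occlusion formula (each light term scaled by `1 - factor * (1 - alpha)`) is an assumption made for this sketch rather than a statement of the exact microcode math.

```c
#include <assert.h>

/* Scaled normals: the normal is used as-is, so its length (normal_scale)
 * dims only the directional term. The ambient term is untouched. */
static float shade_scaled_normal(float ambient, float directional,
                                 float unit_n_dot_l, float normal_scale) {
    assert(normal_scale >= 0.0f && normal_scale <= 1.0f);
    float diffuse = unit_n_dot_l * normal_scale;
    if (diffuse < 0.0f) diffuse = 0.0f;
    return ambient + directional * diffuse;
}

/* "Advanced" path: the normal is renormalized after transformation,
 * so any scale baked into the mesh is lost. */
static float shade_normalized(float ambient, float directional,
                              float unit_n_dot_l, float normal_scale) {
    (void)normal_scale; /* removed by normalization */
    float diffuse = unit_n_dot_l < 0.0f ? 0.0f : unit_n_dot_l;
    return ambient + directional * diffuse;
}

/* Vertex-alpha AO (assumed formula): alpha = 1 means no occlusion; the
 * per-light-type factors choose how strongly each term is dimmed. */
static float shade_vertex_alpha_ao(float ambient, float directional,
                                   float unit_n_dot_l, float alpha,
                                   float ao_ambient, float ao_directional) {
    assert(alpha >= 0.0f && alpha <= 1.0f);
    float diffuse = unit_n_dot_l < 0.0f ? 0.0f : unit_n_dot_l;
    float amb_scale = 1.0f - ao_ambient * (1.0f - alpha);
    float dir_scale = 1.0f - ao_directional * (1.0f - alpha);
    return ambient * amb_scale + directional * diffuse * dir_scale;
}
```

With ambient 0.25, directional 0.5, and the light facing the surface head-on, a normal scaled to 0.5 gives 0.5 in the first function and 0.75 in the second, showing that the dimming disappears after normalization. The third function with alpha 0.5 and both factors at 1.0 gives 0.375, dimming the ambient term as well, which scaled normals cannot do.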

RDP temporary buffers shrinking

In FIFO versions of F3DEX2, there are two DMEM buffers to hold RDP commands generated by the microcode, which are swapped and copied to the FIFO in DRAM. These each had the capacity of two-and-a-fraction full-size triangle commands (i.e. triangles with shade, texture, and Z-buffer). For short commands (e.g. texture loads, color combiner, etc.) there is a slight performance gain from having longer buffers in DMEM which are swapped to DRAM less frequently. And, if a substantial portion of triangles were rendered without shade or texture such that three tris could fit per buffer, being able to fit the three tris would also slightly improve performance. However, in practice, the vast majority of the FIFO is occupied by full-size tris, so the buffers are effectively only two tris in size because a third tri can't fit. So, their size has been reduced to two tris, saving a substantial amount of DMEM.
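The buffer sizing argument can be checked with some quick arithmetic. The sketch below uses the RDP triangle command layout (32 bytes of edge coefficients, plus 64 for shade, 64 for texture, and 16 for Z-buffer when present), so a full-size tri is 176 bytes. The 400-byte "old" capacity is a hypothetical stand-in for "two-and-a-fraction tris", not the real F3DEX2 buffer size, and the simple division ignores how the microcode actually decides when to flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sizes of the pieces of an RDP triangle command, in bytes. */
enum { EDGE_BYTES = 32, SHADE_BYTES = 64, TEXTURE_BYTES = 64, ZBUF_BYTES = 16 };

/* Hypothetical capacities for illustration only. */
enum { HYPOTHETICAL_OLD_BUFFER_BYTES = 400, TWO_FULL_TRIS_BYTES = 2 * 176 };

/* Size of one triangle command given which coefficient blocks it carries. */
static size_t tri_cmd_bytes(bool shade, bool texture, bool zbuf) {
    return EDGE_BYTES + (shade ? SHADE_BYTES : 0)
                      + (texture ? TEXTURE_BYTES : 0)
                      + (zbuf ? ZBUF_BYTES : 0);
}

/* How many whole triangle commands of a given size fit in a buffer. */
static size_t tris_per_buffer(size_t buffer_bytes, size_t tri_bytes) {
    assert(tri_bytes > 0);
    return buffer_bytes / tri_bytes;
}
```

Under these assumptions, `tris_per_buffer(HYPOTHETICAL_OLD_BUFFER_BYTES, tri_cmd_bytes(true, true, true))` and `tris_per_buffer(TWO_FULL_TRIS_BYTES, tri_cmd_bytes(true, true, true))` both give 2: the extra fraction of a tri in the larger buffer goes unused whenever the stream is dominated by full-size tris.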

Segment 0

Segment 0 is now reserved: ensure segment 0 is never set to anything but 0x00000000. In F3DEX2 and prior this was only a good idea (and SM64 and OoT always follow this); in F3DEX3 segmented addresses are now resolved relative to other segments. That is, gsSPSegment(0x08, 0x07001000) sets segment 8 to the base address of segment 7 with an additional offset of 0x1000. So for correct behavior when supplying a direct-mapped or physical address such as 0x80101000, segment 0 must always be 0x00000000 so that this address resolves to e.g. 0x101000 as expected in this example.
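To make the resolution rule concrete, here is a small C model of a segment table where setting a segment resolves the supplied address through the existing table first, as described above. The `SegmentTable` type and function names are hypothetical, and masking results to 24 bits is a simplification for this sketch; it is not the microcode's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SEGMENTS 16

/* Hypothetical model of the RSP segment table. */
typedef struct {
    uint32_t base[NUM_SEGMENTS];
} SegmentTable;

/* Resolve an address: the low nibble of the top byte selects the segment,
 * and the low 24 bits are an offset added to that segment's base. */
static uint32_t resolve(const SegmentTable *t, uint32_t addr) {
    uint32_t seg = (addr >> 24) & 0x0Fu;
    uint32_t offset = addr & 0x00FFFFFFu;
    return (t->base[seg] + offset) & 0x00FFFFFFu;
}

/* F3DEX3-style behavior as described in the text: the new base is itself
 * resolved relative to whatever segment the supplied address names. */
static void set_segment_f3dex3(SegmentTable *t, uint32_t seg, uint32_t value) {
    assert(seg < NUM_SEGMENTS);
    t->base[seg] = resolve(t, value);
}
```

In this model, with segment 0 left at 0x00000000 and segment 7 based at 0x200000, setting segment 8 to 0x07001000 gives it a base of 0x201000, and the address 0x80101000 resolves to 0x101000 because its top byte selects segment 0. If segment 0 were set to a nonzero base, that same address would be shifted by the base and point at the wrong memory.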

Non-textured tris

In F3DEX2, the RSP time for drawing non-textured tris was significantly lower than for textured tris, because a chunk of computation for the texture coefficients was skipped when textures were disabled. In F3DEX3, no computation is skipped when textures are disabled. However, almost all materials use textures, and F3DEX3 is a little faster at drawing textured tris than F3DEX2. Plus, F3DEX3 still does not send the texture coefficients if they are disabled, saving DRAM access time for RSP -> FIFO and FIFO -> RDP. RDP time savings from avoiding loading a texture are, of course, unaffected.

Obscure semantic differences from F3DEX2 that should never matter in practice

In F3DEX3, changing fog settings (i.e. enabling or disabling G_FOG in the geometry mode, or executing SPFogFactor or SPFogPosition) between loading verts and drawing tris with those verts will lead to incorrect fog values for those tris. In F3DEX2, the fog settings at vertex load time would always be used, even if they were changed before drawing tris.
Drawing tris overwrites the 4 bytes stored with G_RDPHALF_1, which is used to hold state during some display list macros which are actually two 8-byte commands. This change is not noticeable when using standard GBI commands, only if something highly custom has been set up.