32bit images weren't used very often, and neither was the 32bit->24bit->21bit video output.
Where you'll see 32bit images used is when you need to preserve a gradient, when a color image needs more than a single level of transparency and you don't want to xor it against an I8 image (that trick is actually more common than using 32bit images), or when you're layering images that would otherwise pixelate heavily (such as gradients). The other 98% of the time you're using 16bit color. A rough sketch of the format difference is below.
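The alpha channel is the usual reason to reach for 32bit textures: RGBA16 texels only carry a single on/off alpha bit, while RGBA32 texels carry a full 8bit alpha. A minimal sketch of the two texel layouts (these pack helpers are just illustrative, not part of any SDK):

```c
#include <stdint.h>

/* Pack an RGBA16 (5/5/5/1) texel -- alpha is literally one bit. */
static inline uint16_t pack_rgba16(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((r >> 3) << 11) |
                      ((g >> 3) <<  6) |
                      ((b >> 3) <<  1) |
                      (a ? 1 : 0));
}

/* Pack an RGBA32 (8/8/8/8) texel -- 256 alpha levels,
 * at twice the TMEM/RDRAM cost per pixel. */
static inline uint32_t pack_rgba32(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) |
           ((uint32_t)b <<  8) |  (uint32_t)a;
}
```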
That additional degree of precision is usually preserved (though it doesn't have to be) until the RDP starts filling; the framebuffer depth determines the point at which that precision is lost.
Actual N64 output is always 21bits: 7bits per channel. In 16bit output mode, each channel starts with 5bits of color, with the two additional bits provided (optionally) by noise dither and gamma correction.
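As a rough sketch (not a description of the actual VI circuitry), a 5bit channel can still span the full 7bit DAC range by scaling up, and the missing low bits are what the noise dither tries to reconstruct statistically across neighboring pixels:

```c
#include <stdint.h>

/* Naive 5bit -> 7bit channel expansion: 0..31 maps onto 0..127 by
 * shifting and replicating the top bits into the bottom. Dithering
 * is what recovers apparent intermediate levels between these steps. */
static inline uint8_t expand_5_to_7(uint8_t c5)
{
    return (uint8_t)((c5 << 2) | (c5 >> 3));
}
```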
In 32bit output mode, the alpha channel is ignored, each channel's LSBs are removed, and VI effects are applied. There aren't a lot of reasons to use 32bit out, but it's useful if you're dumping framebuffers for screenshots (the framebuffers themselves are still 32bit), and you can also get away with using only one framebuffer. Honestly, triple-buffering 16bit ones is more practical and looks better.
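Back-of-the-envelope RDRAM cost at 320x240 (illustrative numbers only; exact sizes depend on your resolution and alignment) shows why a single 32bit buffer is tempting but triple-buffered 16bit is still the usual pick:

```c
/* One 32bit framebuffer:    320*240*4     = 307200 bytes (300 KiB)
 * Three 16bit framebuffers: 320*240*2 * 3 = 460800 bytes (450 KiB)
 * The single 32bit buffer is cheaper in memory, but triple-buffered
 * 16bit avoids tearing/stalls and, dithered, tends to look better. */
enum {
    FB_W = 320, FB_H = 240,
    FB32_SINGLE = FB_W * FB_H * 4,
    FB16_TRIPLE = FB_W * FB_H * 2 * 3,
};
```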
Incidentally, most fonts in commercial titles are greyscale images or IA images. You can set a "foreground" and "background" color to colorize white and black to colors of your choosing, including transparency. Fonts with palettes are far rarer.
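A software sketch of what that colorizing step amounts to (on hardware this is the job of the color combiner blending between two constant colors; the function below is purely illustrative): the IA texel's intensity picks between the chosen background and foreground colors, and its alpha carries the transparency.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } color_t;

/* Lerp between background and foreground by the texel's intensity,
 * and pass the texel's alpha straight through for transparency. */
static color_t colorize_ia(uint8_t intensity, uint8_t alpha,
                           color_t bg, color_t fg)
{
    color_t out;
    out.r = (uint8_t)((bg.r * (255 - intensity) + fg.r * intensity) / 255);
    out.g = (uint8_t)((bg.g * (255 - intensity) + fg.g * intensity) / 255);
    out.b = (uint8_t)((bg.b * (255 - intensity) + fg.b * intensity) / 255);
    out.a = alpha;
    return out;
}
```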