Most N64 titles don't use "sprites", even the non-3D games. They texture a rectangle (at the low level, actually two triangles) with an image to display it. It's much like stretching an image onto a canvas and then setting that canvas in a room, except your image is painted on rubber.
There are exceptions: Bangaioh is a decent example, as is anything that draws directly to the frame buffer (many of which aren't emulated).
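To make the textured-rectangle idea concrete, here's a rough sketch in Python. The function name `textured_quad` and the vertex layout `(x, y, u, v)` are my own choices for illustration, not anything from an actual N64 library.

```python
def textured_quad(x, y, w, h):
    """Return two triangles covering a w-by-h rectangle at (x, y).

    Each vertex is (x, y, u, v). The u, v texture coordinates always
    run from 0.0 to 1.0, so the same image is stretched over the
    rectangle whatever its on-screen size (the "rubber" part).
    """
    top_left = (x, y, 0.0, 0.0)
    top_right = (x + w, y, 1.0, 0.0)
    bottom_left = (x, y + h, 0.0, 1.0)
    bottom_right = (x + w, y + h, 1.0, 1.0)
    # Two triangles sharing the top-right / bottom-left diagonal.
    return [
        (top_left, top_right, bottom_left),
        (top_right, bottom_right, bottom_left),
    ]
```

Doubling `w` and `h` changes the positions but not the texture coordinates, which is why the source image just gets stretched rather than gaining detail.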
Paper Mario isn't compressed, and although PAL is larger (64MB unless I'm mistaken) you're still talking about needing more than six times the storage? Are you completely certain your video card is displaying the source images you're using at their full resolution?
Console won't work for this.
For simplicity's sake I'll just state that at the resolution you're trying to achieve, you won't be able to run this on hardware at all. On an actual N64 you can't exceed the display depth or TMEM; the best quality possible is one pixel in, one pixel out. A higher-quality source does factor into the computations for each element, but this only goes so far.
TMEM limitations can be exceeded by applying the texture across a greater number of surfaces (multiple texrects versus just one) at the cost of longer RSP/RCP processing times. That cost doesn't scale in direct proportion either. Depending on whether the depth filter is in use or how collisions are being detected, you could wind up with, worst case, quadratic growth.
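As a rough illustration of the multiple-texrect approach, the sketch below splits an image into tiles that each fit a byte budget. The 4096-byte figure is the N64's TMEM size; everything else (the function name `split_into_tiles`, the simple strip layout) is an assumption for the example, and it ignores real hardware constraints such as line alignment and per-format limits.

```python
TMEM_BYTES = 4096  # size of N64 texture memory

def split_into_tiles(width, height, bytes_per_texel, budget=TMEM_BYTES):
    """Return a list of (x, y, w, h) tiles, each fitting in `budget` bytes.

    Hypothetical helper: prefers full-width strips and only splits
    columns when a single row is already too big for the budget.
    """
    max_texels = budget // bytes_per_texel
    if max_texels < 1:
        raise ValueError("budget is smaller than one texel")
    tile_w = min(width, max_texels)
    tile_h = min(height, max_texels // tile_w)
    tiles = []
    for y in range(0, height, tile_h):
        for x in range(0, width, tile_w):
            tiles.append((x, y, min(tile_w, width - x), min(tile_h, height - y)))
    return tiles
```

With 16-bit texels, `split_into_tiles(64, 64, 2)` gives 2 tiles and `split_into_tiles(320, 240, 2)` gives 40. Each tile is a separate load plus a separate rectangle, which is where the extra processing time comes from.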
Image sizes overrunning RDRAM are pretty easy to work around, but you'd have to implement a load-on-demand scheme. It would be problematic if you're DMAing stupid amounts of data while trying to display it, but it's not impossible.
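One possible shape for a load-on-demand scheme is a byte-budgeted cache that evicts the least recently used image. This is only a sketch: the class name `TextureCache` and the `loader` callback (standing in for whatever actually fetches the data, such as a DMA from cartridge) are hypothetical.

```python
from collections import OrderedDict

class TextureCache:
    """Keep recently used textures in memory, within a fixed byte budget."""

    def __init__(self, budget_bytes, loader):
        self.budget = budget_bytes
        self.loader = loader          # hypothetical: key -> bytes
        self.used = 0
        self.loads = 0                # counts how often we had to fetch
        self.entries = OrderedDict()  # oldest entry first

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        data = self.loader(key)
        self.loads += 1
        if len(data) > self.budget:
            raise ValueError("texture larger than the whole budget")
        # Evict least recently used textures until the new one fits.
        while self.used + len(data) > self.budget:
            _, old = self.entries.popitem(last=False)
            self.used -= len(old)
        self.entries[key] = data
        self.used += len(data)
        return data
```

The `loads` counter is the thing to watch. If it climbs every frame, you're in the "DMAing stupid amounts of data" situation and the scheme is thrashing.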
The problem with console does spread to emulation though!
The problem with depth on console is the problem with all graphics cards, and hi-def image replacement in emulation potentially has it too. You cannot exceed the maximum depth currently displayed by the card, which implies there's a depth level past which an image no longer gains quality. In fact, depending on how that case is treated, the card could dither the image in a way that looks worse than a less complex source.
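The depth ceiling can be shown with a small sketch. It assumes the simplest possible treatment, truncating the low bits of a colour channel, and doesn't model dithering at all; the function names are mine.

```python
def quantize(value, src_bits, dst_bits):
    """Reduce a channel value from src_bits to dst_bits by dropping low bits."""
    if dst_bits >= src_bits:
        return value  # the display can already show every source level
    return value >> (src_bits - dst_bits)

def distinct_levels(src_bits, dst_bits):
    """Count how many different values survive on a dst_bits display."""
    return len({quantize(v, src_bits, dst_bits) for v in range(1 << src_bits)})
```

Under that assumption an 8-bit channel shown at 5 bits per channel keeps only 32 distinct levels, and a 10-bit source on an 8-bit display keeps 256, exactly what an 8-bit source would have given you.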
If you're talking about needing more than 4GB, your images would be, what? Increased by a factor of 100? You seriously need to check if that's truly what's being displayed or not.
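For a sanity check on those numbers, here's the back-of-envelope arithmetic as a sketch. The 64MB and 4GB figures come from this thread; treating the whole ROM as image data and the default bytes-per-texel values are simplifying assumptions of mine.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3

def storage_factor(linear_scale, src_bytes_per_texel=2, dst_bytes_per_texel=4):
    """How much storage grows when every image is scaled up in both axes."""
    return linear_scale ** 2 * dst_bytes_per_texel / src_bytes_per_texel

def linear_scale_for(target_bytes, source_bytes):
    """Per-axis scale implied by a storage increase at the same texel format."""
    return math.sqrt(target_bytes / source_bytes)
```

Going from 64MB to 4GB is a 64-fold increase, which `linear_scale_for(4 * GB, 64 * MB)` puts at 8x in each axis if the format stays the same. A 100-fold increase would mean 10x per axis, which is worth comparing against what's actually reaching the screen.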