Your statement is rather vague, but I generate samples one at a time, each at its correct time. So the S-DSP routine is called ~32,000 times a second and generates exactly one sample per call. On the real SNES, of course, the S-DSP runs much faster than 32 kHz and uses its extra time to perform memory fetches and internal calculations (the output rate is still 32 kHz). Nobody in the SNES emulation scene has the skill necessary to determine what the S-DSP is fetching and when, so current emulation as in bsnes is about as accurate as you're going to get. anomie wrote the core, and he basically read back the mixed results and compared them to his emulation to get it to where it is today.
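To make "one sample at a time, at the correct time" concrete, here is a rough sketch of that kind of scheduling. It is not bsnes code: the class and callback names are made up, and it assumes the commonly cited nominal figures of a 1.024 MHz S-SMP clock and a 32 kHz output rate, which works out to one sample every 32 S-SMP clocks.

```python
# Hypothetical sketch, not taken from any emulator's source.
SMP_CLOCK_HZ = 1_024_000   # assumed nominal S-SMP clock
SAMPLE_RATE_HZ = 32_000    # S-DSP output rate
CLOCKS_PER_SAMPLE = SMP_CLOCK_HZ // SAMPLE_RATE_HZ  # 32


class SampleScheduler:
    """Calls the DSP routine once per output sample as S-SMP time advances."""

    def __init__(self, generate_sample):
        self.generate_sample = generate_sample  # hypothetical DSP callback
        self.pending_clocks = 0
        self.samples = []

    def step(self, smp_clocks):
        """Advance by the clocks one S-SMP opcode consumed."""
        self.pending_clocks += smp_clocks
        # Emit exactly one sample each time a sample period has elapsed,
        # rather than batching several samples and catching up later.
        while self.pending_clocks >= CLOCKS_PER_SAMPLE:
            self.pending_clocks -= CLOCKS_PER_SAMPLE
            self.samples.append(self.generate_sample())
```

Stepping this with the real cycle count of each opcode means any S-SMP write to a DSP register lands between the right two samples, which is the whole point of not batching.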
I should also mention that any "is this an emulator?" checks that rely on more than 32 kHz precision of the S-DSP would be extremely unreliable on real hardware as well. The S-SMP isn't fast enough, and it lacks the true precision timers necessary to accurately gauge interactions between it and the S-DSP at the clock level.
SNES9x and ZSNES generate several samples at a time, but they tend to invalidate their caches under certain conditions anyway, so the results are pretty close. You should be more shocked that ZSNES doesn't even have opcode timing for its S-SMP (SPC700) emulation. It treats nop and div as taking the same amount of time to complete.
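To show why flat opcode timing matters, here is a small sketch comparing the two approaches. The cycle counts are the documented SPC700 ones (nop is 2 cycles, div ya,x is 12, mul ya is 9); the function and its `flat_cost` parameter are made up for illustration and are not how ZSNES is actually written.

```python
# Documented SPC700 cycle counts for a few opcodes.
OPCODE_CYCLES = {
    0x00: 2,   # nop
    0x9E: 12,  # div ya,x
    0xCF: 9,   # mul ya
}


def elapsed_cycles(opcodes, flat_cost=None):
    """Total cycles for a run of opcodes.

    flat_cost is a hypothetical knob: when set, every opcode is charged the
    same amount, which is the behavior described above.
    """
    if flat_cost is not None:
        return flat_cost * len(opcodes)
    return sum(OPCODE_CYCLES[op] for op in opcodes)


program = [0x00, 0x9E]                       # nop, then div ya,x
print(elapsed_cycles(program))               # per-opcode timing
print(elapsed_cycles(program, flat_cost=2))  # flat timing
```

With per-opcode timing this short run costs 14 cycles; with a flat 2 cycles per opcode it costs 4, so anything that depends on how long the S-SMP has been running drifts quickly.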
As for your VRAM question, perhaps you're not taking into account that each tilemap entry has a 10-bit character index, giving you 1,024 tiles to work with per BG. I often stick 8x8 variable-width tiledata after the 8x8 font in VRAM. This space is usually reserved for textbox data or something similar. You can use the extra memory around it, or use all of it when the textbox isn't onscreen anyway. And if d4s were so inclined, moving the location of the tiledata in VRAM is doable, too (e.g. if there is background data immediately after the 256 characters used by the 8x8 font), but that is way more work.
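For reference, here is a quick sketch of how a 16-bit BG tilemap entry breaks down, using the standard layout (bits 0-9 character, 10-12 palette, 13 priority, 14 h-flip, 15 v-flip). The function name and the `FONT_TILES` / `first_vwf_tile` names are just for illustration.

```python
def decode_tilemap_entry(entry):
    """Split a 16-bit SNES BG tilemap entry into its fields."""
    return {
        "character": entry & 0x03FF,      # 10 bits: tiles 0..1023
        "palette": (entry >> 10) & 0x7,   # 3 bits
        "priority": (entry >> 13) & 0x1,
        "hflip": (entry >> 14) & 0x1,
        "vflip": (entry >> 15) & 0x1,
    }


FONT_TILES = 256              # the 8x8 font occupies characters 0..255
first_vwf_tile = FONT_TILES   # hypothetical: variable-width tiles start right after

print(decode_tilemap_entry(0x2400 | first_vwf_tile))
```

Characters 256 through 1,023 are still addressable from the same BG, which is where the variable-width tiledata can live.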