Palette RAM, in particular, is on the same bus as the rest of the VRAM.
It's not that obvious, in fact. You access palette RAM the same way you access the other parts of VRAM (I mean the name, attribute and pattern tables), but the address and data do not physically show up on the bus: the PPU simply stores the data internally and blocks those reads/writes from happening on the bus. Because the nametable chip is not entirely address decoded, its data is mirrored at $3000-$3FFF, so if any access to those addresses did show up on the bus, it would hit mirrors of the name/attribute tables. I know it's strange, but that's what they did.
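To make the mirroring concrete, here is a rough Python sketch of how a PPU address could be classified. The function names are made up for illustration, and the extra mirroring inside palette RAM itself is ignored, so treat it as a picture of the address ranges rather than an exact model of the hardware.

```python
def decode_ppu_address(addr):
    """Classify a 14-bit PPU address (hypothetical helper, simplified)."""
    addr &= 0x3FFF                      # the PPU address space is 16 KB
    if addr < 0x2000:
        return ("pattern", addr)        # pattern tables
    if addr < 0x3F00:
        # $3000-$3EFF lands on the same nametable data as $2000-$2EFF
        return ("nametable", addr & 0x0FFF)
    # $3F00-$3FFF: handled by the PPU's internal palette RAM (32 entries)
    return ("palette", addr & 0x1F)


def bus_mirror_of_palette(addr):
    """Nametable offset that a $3F00-$3FFF access would hit
    if it were allowed to reach the external bus."""
    return addr & 0x0FFF
```

For example, `decode_ppu_address(0x3000)` gives the same nametable offset as `decode_ppu_address(0x2000)`, and `bus_mirror_of_palette(0x3F00)` points at offset `0xF00` in nametable space.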
For SNES, GBA and probably many later Nintendo systems, the palette is separate from the other VRAM (tilemap and tile data).
This I would like to hear more about, but I think "quite complex"/"dirty trick" is the difference between "possible" and "suitable". The only possibility I can think of is building pre-rendered bits of VWF and pasting them together.
Well, there are two things I had in mind:
1) If a dialog is very small and uses fewer than 8 letters per line, a VWF can be made with sprites.
2) If a dialog uses the entire screen width, the main game uses only a single nametable, and the total VWF area (not including whitespace) is no more than 30 pixels in height, it is doable with CHR-ROM and raster timing. With a 4k CHR-ROM page where each tile is simply its own number shown as graphics (I'm not sure how to put this: tile #0 would be blank, tile #1 would have only a single white pixel, and tile #255 would have an entire horizontal line), it's possible to draw arbitrary graphics by changing the vertical scroll every line.
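Here is a rough Python sketch of the data side of idea 2, just to show how the pieces fit. It assumes that every pixel row of tile #N repeats the bit pattern of N (so the fine Y scroll doesn't matter) and that only bitplane 0 is used; all names are made up for illustration. The actual mid-frame scroll writes on the NES are not shown.

```python
def build_chr_page():
    """Build a 4 KB pattern table where tile N draws the bits of N."""
    page = bytearray()
    for tile in range(256):
        page += bytes([tile] * 8)   # plane 0: all 8 rows equal the tile number
        page += bytes(8)            # plane 1: left empty (assumption)
    return bytes(page)


def bitmap_to_nametable(rows):
    """Turn up to 30 rows of 256 one-bit pixels into nametable rows.

    Each nametable byte selects a tile whose pixels are that byte's bits,
    so one nametable row of 32 bytes describes one 256-pixel scanline.
    """
    if len(rows) > 30:
        raise ValueError("a nametable only has 30 tile rows")
    nametable = []
    for row in rows:
        if len(row) != 256:
            raise ValueError("each row must be 256 pixels wide")
        line = []
        for col in range(32):
            byte = 0
            for bit in row[col * 8:col * 8 + 8]:
                byte = (byte << 1) | (bit & 1)   # leftmost pixel is bit 7
            line.append(byte)
        nametable.append(line)
    return nametable


def scroll_for_line(k):
    """Vertical position the PPU should fetch from on output line k,
    so that line k of the text shows nametable row k."""
    return k * 8
```

The 30-pixel limit comes straight from the nametable having 30 tile rows: each one supplies exactly one distinct scanline of the text area.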