First: look at the layers and tilesets to see how the text is drawn.
A possible scenario is that the game reserves a block of tiles in VRAM for the textbox layer, so if you change those tiles directly (say, using the debugger), the text shown on screen changes with them.
Let's say that's the case. Most likely, there's some function that writes to those tiles when decoding the string from the game data, probably copying the font into the tileset, one letter at a time.
If so, you'd want to set a write breakpoint on that video memory BEFORE the text appears or changes, and then make the text show or change. The break will land you near the code that modifies that memory. You might have to step out a level or two if you find yourself in a generic utility function that just copies memory based on its parameters; the caller is the part you care about.
At some point, you'll likely find a loop, decoding one character from the string at a time and translating that to a font index, then copying the respective font character image to the tileset.
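To make that loop concrete, here's a rough model of it in Python rather than 65816 assembly. Everything in it is an assumption for illustration: the names (draw_string, char_to_glyph), the 2bpp tile size of 16 bytes, and the idea that the destination simply advances one tile per character. The real game may use a different bit depth, a different lookup, or build the tiles in a RAM buffer first.

```python
TILE_BYTES = 16  # assumed: one 8x8 tile at 2bpp (8 rows x 2 bitplane bytes)

def draw_string(text_bytes, char_to_glyph, font, vram, dest):
    """Hypothetical text loop: copy one 8x8 glyph per character.

    text_bytes    -- the encoded string from the game data
    char_to_glyph -- mapping from character code to font index
    font          -- font graphics, TILE_BYTES per glyph
    vram          -- bytearray standing in for the textbox tiles
    dest          -- byte offset of the first tile to write
    Returns the offset just past the last tile written.
    """
    for code in text_bytes:
        glyph = char_to_glyph[code]           # decode character -> font index
        src = glyph * TILE_BYTES              # locate the glyph image
        vram[dest:dest + TILE_BYTES] = font[src:src + TILE_BYTES]
        dest += TILE_BYTES                    # advance to the next tile
    return dest
```

The line to look for in the real code is the equivalent of `dest += TILE_BYTES`, since that's what decides where the next letter lands.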
You'd essentially just change how it updates this destination position to account for an 8x16 font (or even a variable-width font, VWF). Depending on how the tiles are formatted, you may also have to adjust the stride it uses when copying.
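As a sketch of what the 8x16 change could look like, here's the same kind of loop modeled in Python. The layout is an assumption: each glyph is stored as two consecutive 2bpp tiles (top half, then bottom half), and the bottom tile of each letter lives a fixed distance (bottom_offset, a made-up parameter) after the top one in the tileset. Your game may arrange its tiles differently, in which case the offsets change but the idea stays the same.

```python
TILE_BYTES = 16               # assumed: one 8x8 tile at 2bpp
GLYPH_BYTES = 2 * TILE_BYTES  # assumed: 8x16 glyph = top tile + bottom tile

def draw_string_8x16(glyphs, font, vram, dest, bottom_offset):
    """Hypothetical 8x16 text loop: write two tiles per character.

    glyphs        -- font indices, already decoded from the string
    font          -- font graphics, GLYPH_BYTES per glyph (top half first)
    vram          -- bytearray standing in for the textbox tiles
    dest          -- byte offset of the first top tile
    bottom_offset -- distance in bytes from a top tile to its bottom tile
    Returns the offset of the next free top tile.
    """
    for glyph in glyphs:
        src = glyph * GLYPH_BYTES
        # Top half goes where the 8x8 glyph used to go.
        vram[dest:dest + TILE_BYTES] = font[src:src + TILE_BYTES]
        # Bottom half goes to the matching tile one row down.
        bottom = dest + bottom_offset
        vram[bottom:bottom + TILE_BYTES] = font[src + TILE_BYTES:src + GLYPH_BYTES]
        dest += TILE_BYTES    # still one tile column per character
    return dest
```

For a VWF you'd track the position in pixels instead of tiles and shift and OR each glyph row into the destination, which is more work but follows the same structure.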
"Exactly how" really depends on exactly how the game works, but the above is a guideline. It should work pretty much the same way for most tile-based systems, not just the SNES.