I'm not too familiar with PCE, but usually a VWF doesn't take up so much space that you have to expand the ROM. If there's limited space though, you may have to.
It seems as if the game displays two rows of tiles per row of Japanese text, and the extra row adds the plosive marks.
The best thing to do here would probably be to remove the code that draws the extra row and modify the code that increments the row position so it advances by one row instead of two. Then you can add the extra VWF code (there might even be enough extra space from just the code you've removed).
The general technique I use when adding a VWF is this: (note: assuming the font is 1x1 tile, the process is similar if it's bigger)
1) Find an area in RAM with 2 bytes free. You'll be storing two values there: the carry (the amount the next letter needs to be shifted) and the total width (to see when to do a newline - optional if you want to do this manually).
2) Modify the code that prints one letter so that it prints to two tiles at once - the proper tile and the next tile. This is necessary because if, say, the letter and the tile are both 8 pixels wide, shifting the letter even 1 pixel over puts 7 pixels of it on one tile and 1 on the next.
3) Now add shifting into the code. I usually just start with a constant value (like 4) to see if everything's working before I make it load from a width table. For the first tile, you want to bitshift the letter right by the carry amount (which should be initialized to zero). For the next tile, you want to bitshift the same graphics left by (tile width - carry), so if the carry is 5 and your tile width is 8, you'd be shifting it 3. This makes it so the letter is split with its first 3 pixels on the first tile and its remaining 5 on the second tile.
Note: There are a few issues that make this a little more complicated. a) If there is a background color built into the tiles, shifting the letters around will sometimes put "holes" in the background. This is fixable by shifting a block of background color and ORing it with the shifted letter graphics. I can explain more if needed. b) If your font isn't 1bpp (which is probably the case), the shifting can be a bit more complicated. This usually isn't too difficult to figure out: for example, in 4bpp games with packed pixels you just shift 4*carry bits instead of just carry, while with planar graphics you shift each bitplane by carry.
4) Calculate the new carry value. I usually just do (carry + letter width) % tile_width. If carry + letter width reaches or passes tile_width, the next letter starts in the following tile, so move the tile position forward by one as well. Note that apart from the shift in step 3, this is the only place the width of the tile is used - everywhere else just uses the carry value. Note: only do this after the second half of the letter is drawn. Also, be sure to zero the carry value every time there is a newline or a new window of text or you'll get weird results.
5) Once you have all that working, add a width table - this is just a table of the widths of the letters in the font. You can look up width values by loading from [width_table_address + font_value] since each entry is only one byte. This takes up the most space out of anything, but depending on the system you should have more freedom in where to place it. Something you can do in a lot of games is place it where the kanji were in the Japanese font, since you don't need those anymore.
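Here's a rough sketch of steps 2 to 5 in Python rather than assembly, just to show the arithmetic. It assumes a 1bpp font on 8x8 tiles with bit 7 as the leftmost pixel and no background color in the tiles. FONT, WIDTHS, draw_glyph and draw_text are made-up names for illustration, not taken from any game, and a real hack would do the same thing in the console's own assembly.

```python
TILE_WIDTH = 8  # pixels per tile row; 1bpp, so one byte per row (assumption)

# Hypothetical 1bpp glyphs: 8 rows each, bit 7 is the leftmost pixel.
FONT = {
    "I": [0b11100000] * 8,  # 3 pixels of ink
    "M": [0b11111000] * 8,  # 5 pixels of ink
}
# Hypothetical width table: pixels to advance per glyph (ink plus 1 of spacing).
WIDTHS = {"I": 4, "M": 6}


def draw_glyph(tiles, tile_index, carry, glyph, width):
    """OR one glyph into tiles[tile_index] and tiles[tile_index + 1].

    Returns the (tile_index, carry) pair to use for the next glyph.
    """
    for row, bits in enumerate(glyph):
        # First tile: shift right by the carry.
        tiles[tile_index][row] |= bits >> carry
        # Next tile: shift left by (tile width - carry), keep the low 8 bits.
        tiles[tile_index + 1][row] |= (bits << (TILE_WIDTH - carry)) & 0xFF
    # New carry is computed only after both halves are drawn.
    total = carry + width
    # Move to the next tile if the glyph ended at or past the tile edge.
    return tile_index + total // TILE_WIDTH, total % TILE_WIDTH


def draw_text(text, num_tiles):
    """Draw one line of text into a fresh tile buffer and return the state."""
    tiles = [[0] * 8 for _ in range(num_tiles + 1)]  # +1 spare tile for spill
    tile_index, carry = 0, 0  # carry is zeroed at the start of every line
    for ch in text:
        tile_index, carry = draw_glyph(
            tiles, tile_index, carry, FONT[ch], WIDTHS[ch]
        )
    return tiles, tile_index, carry
```

With the widths above, drawing "IM" goes (0 + 4) % 8 = 4 after the "I", then 4 + 6 = 10 after the "M", which crosses into the next tile and leaves a carry of 10 % 8 = 2.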
There are a few examples of VWF code in the documents section: http://www.romhacking.net/?page=documents&title=vwf
And here's one for a GBA game I coded: https://github.com/moozilla/Telefang-2-English-Translation-Project/blob/master/asm/VariableWidthFont.asm