My goal is to make the most amazing script extraction/insertion utility EVAR so that nobody will need to write their own custom script extraction/insertion utility ever again...
I'm happy to discuss this elsewhere since it's off topic, but I feel it's necessary to mention something that almost no one seems to take into consideration when dumping Japanese scripts.
Most 8-bit games that deal with Japanese voiced characters (those with diacritics) place the diacritical mark on a separate font tile. In the script, these marks normally sit directly before or directly after the unvoiced character, and the text routine prints them as a separate tile. There's a really easy way to deal with this when making a table file, though I've never seen anyone other than myself make their tables this way.
The above is an example of a script that places the diacritical mark ゛, here represented with 3A, after the unvoiced character. When the script is dumped with this table file, the character appears properly as が, as opposed to か゛.
This is mostly useful for copy/paste dictionary look-ups, but I can't imagine that even the most experienced professional translator doesn't at least occasionally need to look up vocab. The fact that voiced characters are almost never dumped properly needlessly wastes translators' time, IMO.
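For anyone who wants to try this, here's a rough Python sketch of one way a table can handle it: give each voiced character its own multi-byte entry and have the dumper try the longest match first. Only 3A for ゛ comes from this post; the other byte values and the names `parse_table` and `dump` are made up for illustration, and this isn't necessarily how any existing tool does it.

```python
# Hypothetical table file. Only 3A=゛ is taken from the post;
# 0A and 0B are invented byte values for illustration.
TABLE_TEXT = """\
0A=か
0B=き
3A=゛
0A3A=が
0B3A=ぎ
"""


def parse_table(text):
    """Turn 'HEX=text' lines into a dict keyed by the raw byte sequence."""
    table = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        hex_key, value = line.split("=", 1)
        table[bytes.fromhex(hex_key)] = value
    return table


def dump(data, table):
    """Decode bytes, always preferring the longest table entry that matches."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(data):
        for n in range(max_len, 0, -1):
            chunk = data[i:i + n]
            if len(chunk) == n and chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # Unknown byte: keep it visible as a hex tag.
            out.append("[%02X]" % data[i])
            i += 1
    return "".join(out)


table = parse_table(TABLE_TEXT)
print(dump(bytes([0x0A, 0x3A, 0x0B]), table))  # prints がき
```

Because the two-byte entries are tried first, 0A 3A comes out as one が instead of か followed by a loose ゛. For marks that come before the character, the same idea should work with the bytes in the other order (3A0A=が in this made-up table).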
Another common instance is fixed-length text that places diacritics on the preceding line.
// ゛ ゛
The above text from Magic Knight Rayearth on the Game Gear is 18 characters long, and places diacritics on the line immediately preceding the unvoiced characters.
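A minimal sketch of how a dumper could fold a diacritic line into the text line below it, assuming both lines have already been decoded to Unicode and line up column for column. It leans on Unicode normalization: appending the combining mark U+3099 or U+309A and normalizing to NFC yields the precomposed kana. The function name `merge_mark_row` is made up for illustration.

```python
import unicodedata

# Standalone marks as they appear in a dump, mapped to their combining forms.
COMBINING = {
    "\u309B": "\u3099",  # ゛ voiced mark      -> combining dakuten
    "\u309C": "\u309A",  # ゜ semi-voiced mark -> combining handakuten
}


def merge_mark_row(marks, text):
    """Apply each mark in `marks` to the character in the same column of `text`."""
    marks = marks.ljust(len(text))  # pad so every text column has a slot
    combined = "".join(ch + COMBINING.get(m, "") for m, ch in zip(marks, text))
    # NFC turns e.g. カ + U+3099 into the single code point ガ.
    return unicodedata.normalize("NFC", combined)


print(merge_mark_row("\u309B \u309C", "カキハ"))  # prints ガキパ
```

A mark sitting over a character with no voiced form is left in the output as a combining mark, which at least makes the oddity visible.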
I recently wrote a dumper in PHP for Saint Tail on the Game Gear, mostly because the game uses some weird padding. I wrote it to deal with these use cases, including diacritics that appear two lines above the unvoiced characters, as indicated below.
It dumps the text properly as follows:
ミニゲーム １ [BR]
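Not my PHP dumper, but here's a rough Python sketch of the general idea for a block of rows, assuming a row that contains nothing but marks and blanks applies to the row `offset` lines below it. The row layout in the example is invented for illustration rather than taken from the game, and `is_mark_row` and `fold_mark_rows` are hypothetical names.

```python
import unicodedata

COMBINING = {
    "\u309B": "\u3099",  # ゛ -> combining dakuten
    "\u309C": "\u309A",  # ゜ -> combining handakuten
}
BLANKS = " \u3000"  # ASCII space and full-width space


def is_mark_row(row):
    """True if the row holds at least one mark and nothing but marks and blanks."""
    visible = [c for c in row if c not in BLANKS]
    return bool(visible) and all(c in COMBINING for c in visible)


def fold_mark_rows(rows, offset=1):
    """Merge each mark row into the row `offset` lines below it, then drop it."""
    rows = list(rows)
    merged, consumed = {}, set()
    for i, row in enumerate(rows):
        target = i + offset
        if is_mark_row(row) and target < len(rows):
            base = rows[target]
            combined = "".join(
                ch + COMBINING.get(m, "")
                for m, ch in zip(row.ljust(len(base)), base)
            )
            merged[target] = unicodedata.normalize("NFC", combined)
            consumed.add(i)
    return [merged.get(i, r) for i, r in enumerate(rows) if i not in consumed]


# Invented layout: the mark sits two rows above the kana it belongs to.
rows = ["  \u309B", "", "ミニケーム"]
print(fold_mark_rows(rows, offset=2))  # prints ['', 'ミニゲーム']
```

Keeping `offset` as a parameter means the same routine covers the one-line-above case and the two-lines-above case.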
I'd expect a dumper to be able to handle these cases, and others that arise, concerning voiced characters.