So, I finally got around to writing a proper script conversion tool and ran my compression algorithms on its output. I do not know which table you used; depending on where the letters are placed, it could affect the compression ratio a little. I made up some values just to make things fit (punctuation symbols, which differ from the Japanese ones anyway).
I assumed all messages are terminated by a $00 value, and that an empty message is just a single $00. By making the empty messages point to an existing $00, those could be optimized out (but I haven't done that yet).
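To make the "point to an existing $00" idea concrete, here is a minimal sketch. It assumes the game finds messages through a table of offsets, which I have not verified; the function name and the list-of-bytes input format are made up for the example.

```python
def build_pointer_table(messages):
    """Pack $00-terminated messages; empty ones reuse an existing $00.

    `messages` is a list of bytes objects WITHOUT their terminator
    (hypothetical input format). Returns (blob, offsets), where
    offsets[i] is the position in blob where message i starts.
    """
    blob = bytearray()
    offsets = [None] * len(messages)
    shared_zero = None  # offset of a $00 byte already present in blob

    # First pass: store every non-empty message with its terminator.
    for i, msg in enumerate(messages):
        if msg:
            offsets[i] = len(blob)
            blob += msg + b"\x00"
            if shared_zero is None:
                shared_zero = len(blob) - 1  # the terminator just written

    # Second pass: every empty message points at the shared $00.
    for i, msg in enumerate(messages):
        if not msg:
            if shared_zero is None:  # only empty messages exist
                shared_zero = len(blob)
                blob.append(0)
            offsets[i] = shared_zero

    return bytes(blob), offsets
```

With this layout each empty message costs only its table entry and no script byte at all.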
The script contains, exactly once, a weird [*C] hyphen whose meaning I have no idea about (bank2, message 74), so I just used [0C] instead as a placeholder value.
For the whole script I have 96548 bytes. The best compression comes from the Huffman algorithm, which reduces the script to 61532 bytes. This figure includes a length table, however, and since all messages are $00-terminated, the length table can be optimized out. There are 1732 messages in total, so dropping the 16-bit length table saves 3464 bytes and brings the compressed script down to 58068 bytes, a good 40% savings over the uncompressed script.
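For anyone who wants to check the numbers, the arithmetic is just this (the constant names are mine, the values are the ones quoted above):

```python
UNCOMPRESSED = 96548        # whole script, in bytes
HUFFMAN_WITH_TABLE = 61532  # Huffman output including the length table
MESSAGES = 1732             # number of messages in the script
LENGTH_ENTRY_BYTES = 2      # one 16-bit length per message

table_bytes = MESSAGES * LENGTH_ENTRY_BYTES          # 3464
huffman_no_table = HUFFMAN_WITH_TABLE - table_bytes  # 58068
savings = 1 - huffman_no_table / UNCOMPRESSED        # just under 0.40

print(table_bytes, huffman_no_table, f"{savings:.1%}")
```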
Huffman is, however, not the most practical compression to implement. It is simple enough, but decoding would be slow for rare symbols, since it requires going through the entire table of frequent symbols first, only to figure out that we're not decoding any of them. This slowdown could potentially cause problems if implemented as a romhack in a game that does not expect a slowdown here.
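To illustrate why rare symbols are the expensive ones, here is a small sketch that computes Huffman code lengths for a byte string. This is the textbook construction using Python's heapq, not my actual tool, and the function name is made up. A symbol with a longer code needs more steps from a bit-by-bit decoder.

```python
import heapq
from collections import Counter

def huffman_code_lengths(data):
    """Return {symbol: code length in bits} for the bytes in `data`."""
    freq = Counter(data)
    if len(freq) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}

    # Heap entries are (weight, tiebreak, {symbol: depth}); the unique
    # tiebreak integer keeps Python from ever comparing the dicts.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)

    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes all their symbols one level deeper.
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1

    return heap[0][2]

lengths = huffman_code_lengths(b"a" * 8 + b"b" * 4 + b"c" * 2 + b"d" + b"e")
print(lengths[ord("a")], lengths[ord("e")])  # most frequent vs. rarest
```

In this toy input the most frequent symbol gets a 1-bit code and the rarest ones get 4 bits, and the gap only grows with a real script's symbol set.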
Byte pair/DTE/dictionary-based encodings work poorly because some messages in bank 2 use symbols in the $30-$4F range, blocking the range that would normally be used for the pair codes. So my temporary estimate assumes bank 2 is left uncompressed and the other banks are compressed; I then add the uncompressed size of bank 2 to get the total.
The most interesting option is recursive byte pair encoding (basically it's exactly the compression used in AW Jackson's translation, except that a byte pair can point to another byte pair, which makes the algorithm more powerful). Unlike Huffman, this compression algorithm is almost free: it takes about 20 assembly instructions to get it working. Using this we could fit everything in 65783 bytes, a good ~31% savings on the original size.
This ratio could be improved by altering the table to "pack" all used symbols as close to the end of the table as possible, as there'd be room for more byte pairs. Right now I have byte pairs between $15 and $60. I'm fairly sure the range could be enlarged to cover at least $15 to $80; if that can be done, we might reach a compression ratio almost as good as Huffman's, but it might require some hacking. For example, the tiles used to make window borders or other things not present in the script should be moved "up", and all used symbols should be moved "down".
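For reference, here is roughly what a decoder for the recursive byte pair scheme looks like, as a Python sketch rather than the actual assembly. The dict-based pair table and the function name are assumptions for the example; the only thing taken from the numbers above is that pair codes live somewhere in $15-$60. The table must not contain cycles, otherwise the expansion never ends.

```python
def decode_recursive_pairs(data, pairs):
    """Expand recursive byte pairs.

    `pairs` maps a pair code to (left, right); either half may itself be
    another pair code (hypothetical table format). Any byte that is not
    a key of `pairs` is emitted as-is, including the $00 terminator.
    """
    out = bytearray()
    stack = []  # bytes still waiting to be expanded
    for byte in data:
        stack.append(byte)
        while stack:
            code = stack.pop()
            if code in pairs:
                left, right = pairs[code]
                stack.append(right)  # pushed first so left pops first
                stack.append(left)
            else:
                out.append(code)
    return bytes(out)

# Made-up example table: $16 expands to two $15 pairs, $17 nests deeper.
pairs = {0x15: (0x80, 0x81), 0x16: (0x15, 0x15), 0x17: (0x16, 0x82)}
decoded = decode_recursive_pairs(bytes([0x17, 0x90, 0x00]), pairs)
print(decoded.hex())
```

Here the single byte $17 expands to five symbols, which is where the extra power over plain DTE comes from. On the console the explicit stack would map naturally onto the hardware stack.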
Please tell me if any of those sizes would make the script fit in a 512kb ROM. If not, then I'll have to look at whether it's possible to compress the graphics.