Double-byte vs. single-byte text encoding

Started by Pokeytax, October 07, 2013, 09:32:41 PM

Pokeytax

Couldn't easily find a thread but please feel free to link one if you can.

In past translations I've always converted the 16-bit text (Shift-JIS) to an 8-bit custom encoding, to avoid expanding files/RAM allocation and everything else like that.

I'm working on a relatively modern DS game for the first time, and I'm tempted to try using 16-bit text instead, in order to avoid the obstacles I'm used to dealing with.  Does anyone have advice on the tradeoffs involved?  I can see a lot of things that would go much more smoothly, but a lot of things that could be awful with approximately twice the bytes needing to fit everywhere.

Ryusui

You might not have to convert anything if the game uses Shift-JIS encoding. SJIS is basically ASCII plus double-byte codes for the full Japanese character set, so single-byte English text may already be valid.

FAST6191

Though that is the case when you are dealing with a full Shift-JIS implementation, many of the DS ones skip the ASCII side of things (though you will often still have the full-width duplicates in the 82?? region). Think fixed 16-bit Unicode vs. proper Unicode.
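To make the 82?? duplicates concrete, here is a minimal Python sketch that turns plain English text into the full-width Shift-JIS forms, for a game whose text routine only accepts double-byte characters. The function name `to_fullwidth_sjis` is made up for illustration, and it leans on Python's built-in `shift_jis` codec; a game's own table may differ, so check it against the actual font.

```python
def to_fullwidth_sjis(text: str) -> bytes:
    """Encode ASCII letters, digits and spaces as double-byte Shift-JIS.

    Hypothetical helper: only A-Z, a-z, 0-9 and space are widened.
    Punctuation is left alone because its mapping differs between
    Shift-JIS variants.
    """
    out = []
    for ch in text:
        if ch == " ":
            out.append("\u3000")  # ideographic space, 81 40 in Shift-JIS
        elif ch.isascii() and ch.isalnum():
            # Full-width forms sit at a fixed offset from ASCII in Unicode.
            out.append(chr(ord(ch) + 0xFEE0))
        else:
            out.append(ch)
    return "".join(out).encode("shift_jis")


def hexdump(data: bytes) -> str:
    return " ".join(f"{b:02X}" for b in data)


print(hexdump("A1 a".encode("shift_jis")))   # 41 31 20 61
print(hexdump(to_fullwidth_sjis("A1 a")))    # 82 60 82 50 81 40 82 81
```

The same four characters go from 4 bytes to 8, which is the doubling the original question is worried about.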

As for the OP... try it. I have encountered games where it is absolutely fine to do that. Likewise, I have encountered games that have to copy the entire file into memory, or otherwise have memory limits far lower than the pointers of the format or the DS itself would seem to indicate, and the usual issue of English being slightly more verbose can trip them up. Whether you then excise sections to bring the size back down I leave to you to consider. Many games manage to stream things properly, allocate memory properly (or allocate for a single large file and swap things out) and so on, so you will probably encounter screen real estate issues long before you encounter memory issues.

The DS is not quite the land of unlimited memory/resources, but where it is near unthinkable to waste 100 bytes on a NES/SNES game, on the DS it is probably only something I would note when it rose up to bite me, which is far from a certainty.

Likewise, where memory is the problem I would probably consider fixing the streaming/allocation before I attempted an 8-bit conversion of the encoding. That said, some of the font/encoding systems are almost defined at file/format level rather than being a mess of assembly/binary includes (poke around some of the NFTR stuff), or could take a DTE/MTE mod far more easily.

Pokeytax

Yeah, I've never found one that honors 00-7F ASCII... that space seems to generally be occupied by control codes.

Thanks for the advice, I'll give it a try first.  If it's not going to require massive intervention to get skill names, etc. to 16 characters instead of 8 then it's definitely worth a shot, and it's true that 8-bit conversion is still an option if it doesn't work.  There are relative pointers all over the place, though, so I'm not looking forward to fixing those when the text size blows up.
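For the relative pointers, the usual approach is to rebuild the whole table from the new string lengths rather than patch offsets by hand. A minimal Python sketch, assuming a purely hypothetical layout (a table of 16-bit little-endian offsets measured from the start of the block, followed by the string data); the real game's layout will need to be worked out first:

```python
import struct


def build_text_block(strings):
    """Pack already-encoded strings behind a freshly computed offset table.

    Hypothetical layout: [one u16 offset per string][string data],
    with each offset relative to the start of the block.
    """
    table_size = 2 * len(strings)
    offsets = []
    body = bytearray()
    for s in strings:
        offsets.append(table_size + len(body))
        body += s
    # 16-bit offsets stop working once the text grows past 64 KiB.
    if offsets and offsets[-1] > 0xFFFF:
        raise ValueError("text block too large for 16-bit offsets")
    return struct.pack("<%dH" % len(strings), *offsets) + bytes(body)


block = build_text_block([b"AB\x00", b"CDE\x00"])
print(block.hex(" "))  # 04 00 07 00 41 42 00 43 44 45 00
```

Because every offset is recomputed on each build, the text can grow or shrink without any manual pointer edits, as long as it still fits the offset width.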

Bisqwit

If you are starved for space, the best solution still might be to create a custom encoding that uses single-byte encodings for the most common characters.
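A rough Python sketch of that idea: count how often each character appears in the script, give the most common ones the available single-byte codes, and push everything else into two-byte codes. The lead byte `0xF0` and the list of free single-byte codes are assumptions for illustration only; in a real hack they depend on which values the game's text routine leaves unused.

```python
from collections import Counter

ESCAPE_LEAD = 0xF0  # hypothetical lead byte for the rarer, two-byte characters


def build_table(script, single_codes):
    """Map each character in the script to a 1- or 2-byte code."""
    counts = Counter(script)
    # Most frequent first; ties broken by character so the result is stable.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    table = {}
    for ch, code in zip(ranked, single_codes):
        table[ch] = bytes([code])
    for idx, ch in enumerate(ranked[len(single_codes):]):
        if idx > 0xFF:
            raise ValueError("too many rare characters for one lead byte")
        table[ch] = bytes([ESCAPE_LEAD, idx])
    return table


def encode(text, table):
    return b"".join(table[ch] for ch in text)


table = build_table("aaabbc", [0x80, 0x81])
print(encode("abc", table).hex(" "))  # 80 81 f0 00
```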

Zoinkity

It isn't DS, but the N64 AKI fighters used Shift-JIS that recognized ASCII as well.  It also accepted extended single-byte kana mapped to the unused banks.

Of course, just because it displays right doesn't mean that the game can use it in every case.  Here they encoded saved name strings by converting 16-bit wide chars into 12-bit indices in the font map, and that encoding breaks on ASCII.  They also used several widechar-exclusive snippets of code.  Come to think of it, just assume anything that had to do with text entry was widechar-exclusive.

That's the real trick: you don't so much need single versus wide as single+wide.  As far as printing goes that usually isn't an issue, but things like string lengths and anything having to do with string manipulation get needlessly complicated.
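As a small illustration of why manipulation gets complicated with single+wide text, here is a Python sketch that cuts a Shift-JIS string to a byte budget without splitting a double-byte character. The lead-byte ranges are the standard Shift-JIS ones (0x81-0x9F and 0xE0-0xFC); a game with its own variant may use different ranges, and the function names are made up for the example.

```python
def is_lead_byte(b: int) -> bool:
    """True if b starts a double-byte character in standard Shift-JIS."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def truncate_sjis(data: bytes, max_bytes: int) -> bytes:
    """Keep as many whole characters as fit in max_bytes."""
    i = 0
    while i < len(data):
        step = 2 if is_lead_byte(data[i]) else 1
        if i + step > max_bytes:
            break
        i += step
    return data[:i]


mixed = "AB\uff21C".encode("shift_jis")   # 41 42 82 60 43
print(mixed[:3].hex(" "))                  # 41 42 82  (dangling lead byte)
print(truncate_sjis(mixed, 3).hex(" "))    # 41 42
```

A plain byte slice leaves half a character behind, which is the kind of bug that shows up in name entry and fixed-width fields.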

Pokeytax

It turns out in this case that the game miraculously had a single-byte kana font already mapped to 27-7A. So, it goes to show... always check what you've got before you start taking out walls!