I recently wrote a PHP script I call "table helper", which I talk about a little here: https://youtu.be/ThpHBoxK3Ss
It's really just a time saver, but it relies on, and exposes, this issue of Japanese kana and kanji order that you're talking about.
I made this because I'm working on a bunch of 8-bit games right now, mostly Game Gear but also Famicom, and I just want to dump their scripts so I can translate them. Nothing fancy. I use the following character sets:
You can check the video to see how I use these to save some time typing out hex values, but suffice it to say these character sets cover many of the orders these characters show up in, but not all. In your first Japanese class you'll learn the kana alphabet, and it has an order. You'll also learn a way to help remember the order: a, ka, sa, ta, na, ha, ma, ya, ra, wa.
Here is hiragana in order:
No text encoding in a video game has to follow this order, and many don't. The main places where a game's kana order tends to differ from it are as follows.
Sometimes there is a significant difference in the last 3 characters. The most common one I see is をん transposed as んを. Sometimes を is somewhere else, like at the beginning or mixed in with other characters, so the end is just わん. This order can be pretty variable.
Also, the voiced characters, i.e. the ones with diacritics, are sometimes on their own in the encoding and sometimes mixed in with their unvoiced counterparts. KingMike mentioned this in the other thread. It will look as follows.
Also there are small kana:
They are also sometimes inline with their large counterparts:
But normally they are on their own. The order can vary. I've seen:
and other variations.
What this all means is that the "safest" kana to search for when relative searching on Japanese characters are the ones with no extra voicing, and no large/small versions.
If you can find a sequence of 3-4 of these characters in a row, your chances of a relative search hit are high. A variation in the "や" characters like やゃゆゅよょ could alter the relationship of the な and ま characters to the ら characters, so the absolute safest would be a series of 3-4 なにぬねのまみむめも characters together.
The best way to search for these that I've found is to tag their order:
Then use TranslHextion's "value scan relative" function. So a search for なにも would be "010209".
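For anyone curious what a relative search does with those tag values, here is a minimal Python sketch. It is not how TranslHextion implements it; the function name `relative_search` and the sample bytes are made up for illustration, and it assumes a one-byte-per-character encoding.

```python
def relative_search(data: bytes, tags: list[int]) -> list[int]:
    """Return offsets where consecutive bytes differ by the same amounts as `tags`."""
    if len(tags) < 2:
        raise ValueError("need at least two tag values to compare differences")
    # Differences between neighbouring tags, e.g. [0x01, 0x02, 0x09] -> [1, 7].
    wanted = [b - a for a, b in zip(tags, tags[1:])]
    hits = []
    for start in range(len(data) - len(tags) + 1):
        window = data[start:start + len(tags)]
        found = [b - a for a, b in zip(window, window[1:])]
        if found == wanted:
            hits.append(start)
    return hits


# Made-up sample data (not from a real ROM): the bytes at offset 2
# are 0x81, 0x82, 0x89, i.e. the tags 01 02 09 shifted by some base.
rom = bytes([0x00, 0x10, 0x81, 0x82, 0x89, 0xFF, 0x20])
hits = relative_search(rom, [0x01, 0x02, 0x09])

# The shift between the tag and the byte actually found in the data.
base = rom[hits[0]] - 0x01
```

Only the gaps between values matter, not the values themselves, which is why the search still works when you don't know where the kana start in the encoding. Once you have a hit, the shift (`base` here) tells you the real byte for every other tagged character.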
All of this is relative. I've encountered encodings where the whole alphabet is backward. I've also found ones where kana characters are simply missing. It just depends on the game, how much space they had, etc. Frequently just seeing the font stored in the ROM can help you sort this out. If there is a non-standard order, you can adjust your relative search to compensate.
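Adjusting for a non-standard order mostly means re-tagging the characters to match what you see in the font. A small Python sketch of that idea follows; `tags_for`, `as_hex`, and the 1-based numbering are my own assumptions for illustration, so use whatever numbering your search tool expects.

```python
def tags_for(text: str, order: str) -> list[int]:
    """Map each character of `text` to its 1-based position in `order`."""
    position = {ch: i for i, ch in enumerate(order, start=1)}
    return [position[ch] for ch in text]


def as_hex(tags: list[int]) -> str:
    """Format tag values as a hex string for pasting into a search box."""
    return "".join(f"{t:02X}" for t in tags)


SAFE_KANA = "なにぬねのまみむめも"

# Hypothetical font where the kana are stored backward.
backward = SAFE_KANA[::-1]
# Hypothetical font where ぬ was left out to save space.
missing_nu = SAFE_KANA.replace("ぬ", "")

backward_tags = tags_for("ねのま", backward)
missing_tags = tags_for("ねのま", missing_nu)
search_string = as_hex(backward_tags)
```

The search word is the same in both cases, but the tag values change with the order, so the order string should be typed in the way the font tiles actually appear.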
With kanji, it gets more complicated. I myself have said that there isn't necessarily an order, but I've had the chance to make a few kanji table files recently, and there is an order. A lot of characters are missing, at least in 8-bit games, to save space, but when IDing the kanji, just following the JIS order can be a huge help: the games I've worked with follow this order for the kanji, but only include the kanji used in the game script. So when IDing, you can scan the JIS kanji set and pick out the characters that you see in the font tiles. I don't know if Unicode follows the same kanji order that JIS does, but I know it has more kanji in it.
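If you already have a guess at which kanji a game uses, you can put them in JIS order with a few lines of Python and compare that against the font tiles. This is only a sketch: `in_jis_order` is a made-up helper, the sample characters are arbitrary, and it assumes every character is in JIS X 0208, where the two EUC-JP bytes sort the same way as the JIS row and cell numbers.

```python
def jis_sort_key(ch: str) -> bytes:
    """EUC-JP bytes of a character; for JIS X 0208 these sort in row/cell order."""
    # Raises UnicodeEncodeError if the character is not encodable in EUC-JP.
    return ch.encode("euc_jp")


def in_jis_order(chars: str) -> list[str]:
    """Return the distinct characters of `chars` sorted by JIS position."""
    return sorted(set(chars), key=jis_sort_key)


# Arbitrary sample characters, not taken from any particular game.
script_kanji = "日本語亜"

jis_order = in_jis_order(script_kanji)
# Plain sorting of Python strings compares Unicode code points.
unicode_order = sorted(set(script_kanji))

print("JIS order:    ", "".join(jis_order))
print("Unicode order:", "".join(unicode_order))
```

Printing both lists side by side is a quick way to check whether Unicode order would have worked for your set of characters. For this small sample the two orders come out different, so I wouldn't rely on Unicode order as a stand-in for JIS order.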
EDIT: To add a bit of a conclusion, relative searching is not going to help you find the text encoding in every instance, but it can be helpful, and there is definitely an order to Japanese kana and kanji.