Technically, isn't a Shift-JIS 'blank' just 20? (As in, it's backwards compatible with ASCII: a byte below 80 is a single-byte ASCII character, and a byte of 81 or above is normally the first byte of a two-byte Japanese character, apart from the single-byte half-width katakana at A1 to DF?)
However, of course, games might not adhere to that and might expect 2 bytes for every character.
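To make the single-byte versus two-byte rule concrete, here is a minimal Python sketch that walks a byte string and reports how long each character is. The function name sjis_char_lengths is made up for illustration, and the lead-byte ranges (81 to 9F and E0 to FC, which includes vendor extensions) are an assumption about the encoding the game uses; a game with its own custom table would need different ranges.

```python
def sjis_char_lengths(data: bytes) -> list[int]:
    """Return the byte length (1 or 2) of each character in a Shift-JIS byte string.

    Hypothetical helper for illustration; it does not validate trail bytes.
    """
    lengths = []
    i = 0
    while i < len(data):
        b = data[i]
        # Assumed lead bytes of two-byte characters: 0x81-0x9F and 0xE0-0xFC.
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            step = 2
        else:
            # 0x00-0x7F (ASCII-like) and 0xA1-0xDF (half-width katakana) are single bytes.
            step = 1
        # A truncated two-byte character at the very end counts as 1 byte.
        lengths.append(min(step, len(data) - i))
        i += step
    return lengths
```

For example, sjis_char_lengths(b"\x20\x81\x48A") walks a space, the two-byte question mark 8148, and the letter A.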
Ah~, I see! I'm pretty slow at figuring this stuff out; it wasn't until I started asking Google that I learnt what all the '0A' bytes prepended or appended to some strings meant. Of course, it means a newline.
So far I've only located eight strings, which is a really poor effort for two days of work. And the last two of those were found by manually matching the squiggles I can see in the emulator to a table of Shift-JIS characters, which is very tedious.
A new approach is surely required.
I looked at the bytes surrounding the strings I'd located in the RAM/CD file, to try and notice a pattern. '00' often surrounds strings; sometimes '0020' (or '20', as I've now learnt) is used, meaning a newline; it's not unusual to see an '8148' for a question mark, which is also a good way to decide where a string should end; and sometimes there are unexplained bytes that could start or terminate a string for many reasons, as users who helped me earlier have mentioned.
We can now begin to automate. So I wrote a program, which takes the binary file on the Saturn's CD, and breaks apart the stream of bytes by the characters I mention above - 00s, 20s, 8148s, etc.
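For anyone wanting to try the same idea, a rough Python sketch of this kind of splitter might look like the following. It is not my actual tool: the names DELIMITERS, candidate_strings and _keep are hypothetical, the delimiter list is just the bytes mentioned in this thread (00, 0A, 20, 8148), and it assumes the text is plain Shift-JIS that Python's "shift_jis" codec can decode.

```python
import re

# Delimiters taken from this thread: 00, 0A, 20 and the two-byte question mark 81 48.
QUESTION = b"\x81\x48"
DELIMITERS = re.compile(rb"\x00|\x0a|\x20|\x81\x48")


def _keep(data, start, end, min_len, results):
    """Append (offset, text) if the chunk is long enough and decodes as Shift-JIS."""
    chunk = data[start:end]
    if len(chunk) < min_len:
        return
    try:
        text = chunk.decode("shift_jis")
    except UnicodeDecodeError:
        return  # not valid Shift-JIS, so probably not text
    results.append((start, text))


def candidate_strings(data: bytes, min_len: int = 4):
    """Return a list of (offset, text) for chunks between delimiters."""
    results = []
    start = 0
    for m in DELIMITERS.finditer(data):
        # Keep a trailing question mark as part of the string; drop other delimiters.
        end = m.end() if m.group() == QUESTION else m.start()
        _keep(data, start, end, min_len, results)
        start = m.end()
    _keep(data, start, len(data), min_len, results)
    return results
```

Note that a chunk which merely happens to decode is still reported, so stray bytes in front of a real string can shift the decoding and produce garbage rather than being rejected.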
You can see here, that it works very nicely, because the green boxes show where it has detected a string which is one of the few that I can verify is correct:
Bigger version: http://imgur.com/oWqL8l4
But the seventh string, highlighted in my spreadsheet, has the bytes '06 01 21 E4' before the string shown in the game UI. So how would the extraction tool know where the string begins, if I can't tell the tool whether or not it should separate lines on those bytes? Like below: http://imgur.com/lLtanf8
Because the extraction tool detects the character(s) to separate lines on, it is only as smart as I am, and that is not that smart, so this tool seems really not that useful.
Of the eight strings I've manually found, three of them have seemingly random bytes prepended to the actual string (the one that is shown in the game's UI).
Like this: E5 32
And this: 06 03 29 DC
And this one I mentioned: 06 01 21 E4
What the heck do they mean? I can't detect any kind of significance or pattern to these random bytes, so the extraction tool is useless! Blast it~...
August 27, 2017, 04:04:09 pm - (Auto Merged - Double Posts are not allowed before 7 days.)
After taking a day off from this, and perhaps thinking it was time to stop, I took a closer look at the strange bytes that are prepended to the genuine byte-strings, which is what stops my extraction tool from being useful.
So, it goes like this: 00/0A --> weird bytes --> genuine string --> 00/0A.
These are the bytes that precede three genuine strings:
It's strange how there's a pattern to them. The byte '06' is often seen, with either '01', '03', or '05' directly after it. Followed by two random bytes. And then the '060N' appears again.
I decided to fiddle with one of the three, repeatedly, in the emulator, to observe possible changes that might indicate what the bytes do. Of course, I had to document the 'experiment':
Bigger version: https://imgur.com/a/rfsPv
Not very useful information. In fact, I would even say, this experiment was a complete waste of time.
But, you know what just dawned on me? Those bytes before the genuine strings look like they could be memory addresses.
All genuine strings I have found so far start at '06' in the game's RAM; the furthest genuine string I have found in RAM, so far, is at '0641'.
So, I think that these patterns of bytes going '0605NNNN0605NNNN', '0601NNNN0601NNNN0601NNNN0601NNNN0603NNNN', and '0601NNNN0601NNNN' are pointing to locations in the game's memory.
What do the two-byte pairs after the addresses do? I do not know. And why are these address pointers placed directly before a real string? I also do not know. BUT~
A quick Google has revealed that this is not unusual, and that there is information from other people on the Internet and on this forum about it, so further reading will shed light on this matter for me, I bet.
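If the pointer guess is right, one way to test it is to peel any leading address-looking values off a record before decoding the rest. The Python sketch below is only that, a sketch: the name leading_pointers is hypothetical, and it assumes each four-byte group (for example 06 01 21 E4) is a single big-endian 32-bit value, big-endian because the Saturn's SH-2 CPUs are, and that anything whose top byte is 06 is an address, which is just my observation from this game rather than a confirmed rule.

```python
import struct


def leading_pointers(record: bytes) -> tuple[list[int], bytes]:
    """Split a record into leading 4-byte big-endian '06......' values and the rest.

    Hypothetical helper; the 'top byte is 0x06' test is an assumption from this thread.
    """
    pointers = []
    pos = 0
    while pos + 4 <= len(record):
        (value,) = struct.unpack_from(">I", record, pos)
        if value >> 24 != 0x06:
            break  # first group that does not look like an address: stop here
        pointers.append(value)
        pos += 4
    return pointers, record[pos:]
```

A prefix like 'E5 32' does not fit this pattern, so it is left in front of the string untouched and still needs a separate explanation.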