Romhacking.net

Romhacking => Newcomer's Board => Topic started by: naxis on March 01, 2014, 03:32:51 pm

Title: Game has English words already in ASCII format. What about the Japanese?
Post by: naxis on March 01, 2014, 03:32:51 pm
I opened up a Super Famicom game in WindHex and discovered what looks like all the English text in the game readily available without making a table.

My question is if the Japanese text is also already in ASCII format, wouldn't I see the Japanese words just as I see the English ones?
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: Gideon Zhi on March 01, 2014, 03:59:27 pm
Maybe, except Japanese is never, ever stored as ASCII. There aren't even 100 available characters in the ASCII codepage (it goes from 0x20-0x7E, roughly) which, if you stretch, might barely be enough for a full katakana and hiragana set depending on how diacriticals are handled, but it still wouldn't be "ASCII." To my knowledge there is no 8-bit Japanese text encoding standard.
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: Scio on March 01, 2014, 05:22:49 pm
Some games have the game ID written in Japanese letters (the game ID is obligatory, and it must be in ASCII), but I'm almost sure it's a custom encoding of the console, not really ASCII. An example is Hero Senki (Hero Chronicle), which shows up as ヘロ センキ.

Anyway, just because some of the text is in ASCII doesn't mean all the text in the game is. Sometimes you need a different table for different instances of dialogue (opening, ending, in-battle dialogue). The most I've found (consistently) is credits written in ASCII.
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: Zoinkity on March 01, 2014, 05:46:09 pm
Some very, very old Japanese PCs used a 7-bit encoding. In those cases diacriticals are separate characters. 0x80 was used as a flag to denote that an EOL follows the char. It's not something you'd call a standard though, and we're talking systems like '80s-era microcomputers.
For the most part you'll only see either wide chars (2-byte chars) or multi-byte encodings--unless they omit all kanji and use something "custom".
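
To illustrate the high-bit flag described above, here is a minimal sketch. It assumes the low 7 bits are the character code and that a set 0x80 bit marks the last character of a line; `split_lines` is a hypothetical helper name, and plain ASCII stands in for whatever 7-bit character set a real machine used.

```python
def split_lines(data: bytes) -> list:
    """Split a byte stream into lines, where bit 7 marks a line's last char."""
    lines = []
    current = bytearray()
    for b in data:
        current.append(b & 0x7F)   # low 7 bits: the character code itself
        if b & 0x80:               # high bit set: an end-of-line follows
            lines.append(bytes(current))
            current.clear()
    if current:                    # trailing characters with no flag
        lines.append(bytes(current))
    return lines


# 0xA1 is '!' (0x21) with the high bit set, so it ends the first line.
sample_lines = split_lines(b"HI\xa1OK")
print(sample_lines)
```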

Internal text, like notes, debug strings, included files, etc. may be encoded in an entirely different method than the rest of the game.  If you're lucky, they may have already implemented a multibyte setup so you can switch between ASCII and whatever without effort.  (Really stinking lucky, more like it.)
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: tryphon on March 01, 2014, 06:49:16 pm
First, the question makes no sense, since ASCII doesn't contain any Japanese characters (the 'A' is for 'American', so it's not even sufficient to encode accented letters in other languages that use the Latin alphabet).

That said, many standard encodings wider than 7 bits extend ASCII; that is, if 1-byte characters are allowed in the standard, they are usually the same as in ASCII. IIRC, that's the case for S-JIS, which is likely the Japanese encoding most often used in games.
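
A quick way to see the "extends ASCII" point is with Python's built-in `shift_jis` codec. This is only a sketch; the sample strings are arbitrary and not taken from any game.

```python
# Plain Latin letters encode to the same bytes as ASCII,
# while each kana takes two bytes in Shift-JIS.
ascii_text = "HERO"
kana_text = "あア"

ascii_bytes = ascii_text.encode("shift_jis")
kana_bytes = kana_text.encode("shift_jis")

print(ascii_bytes.hex(" "))  # 48 45 52 4f
print(kana_bytes.hex(" "))   # 82 a0 83 41
```

Note that the second byte of ア is 0x41, which a hex editor in ASCII mode would display as "A". Stray Latin letters in a hex view can therefore be halves of two-byte characters rather than real English text.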

One-byte Japanese encodings exist, but as already said, they are not standard, and you have to make a table.
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: VicVergil on March 01, 2014, 10:36:59 pm
As an example, Marvelous for the Super Famicom uses the 00-4F range for hiragana, 50-9F for katakana, A0-EF for latin characters, numbers, punctuation and then a handful of kanji, F0-FF are various control codes.
F5 followed by a byte gives you one of 255 kanji.
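
The byte ranges described for Marvelous can be sketched as a small classifier. The ranges come from the post above; everything else is an assumption for illustration: `classify` is a hypothetical name, and control codes other than F5 are treated as taking no argument, which may not match the real game.

```python
def classify(data: bytes) -> list:
    """Label each code in a byte string using the ranges given in the post."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0xF5 and i + 1 < len(data):
            # F5 is followed by one byte that selects a kanji.
            out.append(("kanji", data[i:i + 2]))
            i += 2
            continue
        if b <= 0x4F:
            kind = "hiragana"
        elif b <= 0x9F:
            kind = "katakana"
        elif b <= 0xEF:
            kind = "latin/punct/kanji"
        else:
            kind = "control"  # assumption: no argument bytes
        out.append((kind, data[i:i + 1]))
        i += 1
    return out


for kind, raw in classify(bytes([0x00, 0x50, 0xA0, 0xF0, 0xF5, 0x12])):
    print(kind, raw.hex())
```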

Terranigma uses single-byte values from 00-7F for English characters in the English version, but in the JP version a switch control code before the text selects between katakana/hiragana/Latin, and 80/81/82... followed by another byte gives a kanji.
Secret of Mana manages to use only single-byte characters in its Japanese version: it has the katakana/hiragana switch and uses a minimal set of kanji in the 80-FF range.
Games like Bushi Seiriyuden and Chaos Seed even have more than a thousand kanji (using dual byte, obviously).

Hearing about Lagrange Point... its text encoding is an ugly mess.
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: Scio on March 01, 2014, 11:12:50 pm
Speaking of switches, how do you guys deal with them when you make table files? I was working on SRW1, and the whole dump was in katakana because of that. I just put a switch before the text (<kata> and <hira>) to improve readability.

You all probably use a custom dumper/inserter, but I would like to hear what your takes on this are.
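
One possible take on the question above is to track the current mode in the dumper and emit tags like `<kata>` and `<hira>` when the switch codes appear. This is a minimal sketch, not how any particular tool works: the switch values 0xF0/0xF1 and the three-entry tables are made up for illustration and do not come from SRW1 or any real game.

```python
HIRA_SWITCH = 0xF0  # hypothetical control code: switch to hiragana
KATA_SWITCH = 0xF1  # hypothetical control code: switch to katakana

# Tiny stand-in tables; a real game would have a full kana set in each.
HIRAGANA = {0x00: "あ", 0x01: "い", 0x02: "う"}
KATAKANA = {0x00: "ア", 0x01: "イ", 0x02: "ウ"}


def dump(data: bytes, tag_switches: bool = True) -> str:
    """Decode bytes, swapping tables whenever a switch code is seen."""
    table = HIRAGANA  # assumption: text starts in hiragana mode
    parts = []
    for b in data:
        if b == HIRA_SWITCH:
            table = HIRAGANA
            if tag_switches:
                parts.append("<hira>")
        elif b == KATA_SWITCH:
            table = KATAKANA
            if tag_switches:
                parts.append("<kata>")
        else:
            # Unknown bytes are kept as a hex tag so nothing is lost.
            parts.append(table.get(b, f"<${b:02X}>"))
    return "".join(parts)


print(dump(bytes([0xF1, 0x00, 0x01, 0xF0, 0x02, 0x7F])))
```

Keeping the tags in the dump makes it possible for an inserter to put the switch bytes back; dropping them (`tag_switches=False`) reads better but loses that information.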
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: naxis on March 02, 2014, 08:02:32 am
Ok, I definitely see most if not all the English text from the game, but no Japanese characters. So it looks like I have to make a table for the Japanese dialogue after all.
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: VicVergil on March 02, 2014, 09:22:18 am
Ok, I definitely see most if not all the English text from the game, but no Japanese characters. So it looks like I have to make a table for the Japanese dialogue after all.

Of course you have to. What did you think? :P
Shift-JIS wasn't really used in games until consoles with more storage came along.
The earlier ones would almost always use custom values for Japanese characters.
I recommend Monkey-Moore.
Try to find the visuals for the font in the ROM or VRAM to get the order the game uses
(sometimes あぁいぃうぅえぇおぉ・・・ or あいうえお・・・わをんぁぃぅぇぉゃゅょ 
かきくけこさ・・・がぎぐげご・・ or かがきぎくぐけげこごさちす・・・ )
Once you have the order, enter it in Monkey-Moore.
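
Once the font order is known, the table itself is mechanical to write. Here is a sketch that produces lines in the common `HH=character` table format, assuming the glyphs sit at consecutive one-byte codes; `build_table` and the starting value 0x20 are hypothetical and would need to match what the game actually uses.

```python
def build_table(glyphs: str, first_code: int = 0x00) -> str:
    """Assign consecutive one-byte codes to glyphs, in font order."""
    lines = []
    for offset, glyph in enumerate(glyphs):
        lines.append(f"{first_code + offset:02X}={glyph}")
    return "\n".join(lines)


# Font order as seen in a tile viewer; the starting code is a guess to verify.
table_text = build_table("あいうえお", 0x20)
print(table_text)
```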
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: gadesx on March 02, 2014, 12:44:38 pm
try this:

sjis tables for hexadecimal editors
http://dl.dropboxusercontent.com/u/22524283/tablas%20sjis.rar
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: naxis on March 04, 2014, 10:21:31 pm
I tried looking in a few different tile editors. I can't find the font. I did use VSNES however and found this in the VRAM of the memory from a savestate:

(http://farm8.staticflickr.com/7450/13013594933_7c2c412c82_c.jpg) (http://www.flickr.com/photos/119847626@N08/13013594933/)

I can see a lot of the kanji clearly, but the hiragana looks torn apart or something. How am I to make a table of the hiragana if I can't see the order it's placed in properly?
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: VicVergil on March 04, 2014, 11:40:43 pm
Neither the kanji nor the kana are showing "correctly", they're just divided in two.
To see it properly, extract the RAM's contents and open it with your favorite tile editor on the correct mode, with two tiles per line.
Well, that should be useful for the table building part...
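
To see why the view width matters, here is a small sketch of how a tile viewer lays out sequential 8x8 tiles. It assumes, purely for illustration, that each 16x16 glyph is stored as four consecutive tiles (top-left, top-right, bottom-left, bottom-right); the actual arrangement in this game may differ. `layout` is a hypothetical helper.

```python
def layout(tile_count: int, tiles_per_row: int) -> list:
    """Return rows of tile indices as a viewer would draw them."""
    return [
        list(range(start, min(start + tiles_per_row, tile_count)))
        for start in range(0, tile_count, tiles_per_row)
    ]


# 16 tiles per row: tiles 0-3 of one glyph all land side by side on one row,
# so the top and bottom halves never stack and the glyph looks cut up.
print(layout(8, 16))

# 2 tiles per row: tiles 0,1 sit above tiles 2,3, so the glyph appears whole.
print(layout(8, 2))
```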

Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: Scio on March 04, 2014, 11:47:04 pm
I can't see your screenshot very well, but I think the Roman alphabet is an 8x8 font, and the kana is 16x16. From what I'm seeing, two Roman letters occupy the same width as one kana.
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: naxis on March 05, 2014, 07:52:34 pm
Neither the kanji nor the kana are showing "correctly", they're just divided in two.
To see it properly, extract the RAM's contents and open it with your favorite tile editor on the correct mode, with two tiles per line.
Well, that should be useful for the table building part...

I'm sorry Ghanmi. What do you mean by extracting the RAM's contents? How do I do that?
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: Scio on March 05, 2014, 08:33:44 pm
I'm sorry Ghanmi. What do you mean extract the ram's contents? How do I do that?
It's the same thing you did with VSNES. Save a state and open it with a tile editor. If you didn't know, a Save State is just a memory dump (RAM, in this case).
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: naxis on March 05, 2014, 09:00:02 pm
I thought that is what it might have meant. Looking at the save state in Tile Layer Pro and YY-CHR in different formats doesn't show the font clearly.

What is meant by "two tiles per line?"
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: Scio on March 05, 2014, 09:42:45 pm
In YY-CHR, change both Width and Height to 16. A "typical" tile is 8x8, so two tiles per line is 16x16... or is it 8x16? I honestly don't remember.
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: naxis on March 08, 2014, 01:05:13 pm
I'm using YY-CHR to find the Japanese font in the game. I tried the 16x16 option and this is what I got:

(http://farm4.staticflickr.com/3031/13014219823_6ff5e7c432_c.jpg) (http://www.flickr.com/photos/119847626@N08/13014219823/)

Can't see the Japanese font clearly and it doesn't put them back as one either. There is no 8x16 option.

I tried all the options and the 2 bits per pixel came up most clearly:


(http://farm3.staticflickr.com/2303/13014068995_d0beb5310c_c.jpg) (http://www.flickr.com/photos/119847626@N08/13014068995/)

But still the Japanese font is in half. I'm still not sure how to make the Japanese characters become whole again when looking at them in a tile editor.

What am I missing in this scenario?
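
For reference, this is roughly what the "2 bits per pixel" mode does with the data. It is a sketch of the planar 2bpp layout used for SNES tiles: each 8x8 tile is 16 bytes, and each row is a pair of bytes holding bitplane 0 and bitplane 1. The sample bytes are made up, not taken from the game, and `decode_2bpp_tile` is a hypothetical name.

```python
def decode_2bpp_tile(tile: bytes) -> list:
    """Decode one 16-byte planar 2bpp tile into 8 rows of 8 color indices."""
    if len(tile) != 16:
        raise ValueError("a 2bpp 8x8 tile is exactly 16 bytes")
    rows = []
    for r in range(8):
        plane0 = tile[2 * r]      # low bit of each pixel in this row
        plane1 = tile[2 * r + 1]  # high bit of each pixel in this row
        row = []
        for x in range(8):
            shift = 7 - x         # leftmost pixel is the most significant bit
            low = (plane0 >> shift) & 1
            high = (plane1 >> shift) & 1
            row.append((high << 1) | low)
        rows.append(row)
    return rows


sample_tile = bytes([0xFF, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0x80, 0x01] + [0] * 8)
for row in decode_2bpp_tile(sample_tile):
    print("".join(str(p) for p in row))
```

Decoding only recovers individual 8x8 tiles. Whether a 16x16 glyph looks whole still depends on how the viewer arranges those tiles on screen.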
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: Scio on March 08, 2014, 01:28:45 pm
Which part of the game did you take that savestate in? I took one right before a stage (when the stage title appears), and it shows up pretty cleanly here in YY-CHR. Are there any other parts with Japanese text?
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: KingMike on March 08, 2014, 02:48:27 pm
Doesn't Tile Molester have a Row Interleaved option or something that might help?
I'm too lazy to install Java at the moment to check.

Also, you shouldn't be hacking the [h1-C] version, as that means it's a bad dump or something. Look for a ROM with [!] in the filename.
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: naxis on March 09, 2014, 12:54:50 am
Which part of the game you took that savestate on? I took one right before a stage (when the stage title appears), and it shows pretty cleanly here on YY-CHR. Are there any other parts with japanese text?

If you let the game run, there is an intro story that comes on. I took a screenshot of the following that resulted in the cut in half Japanese font:

(http://farm3.staticflickr.com/2041/13025919593_6efef3f3f4_c.jpg) (http://www.flickr.com/photos/119847626@N08/13025919593/)

The savestate from that screenshot looks like it may have the entire kana in it. Only it's cut in half.

I did try taking a screenshot of the screen that introduces the first stage here:

(http://farm8.staticflickr.com/7428/13025870245_69d61f7222_c.jpg) (http://www.flickr.com/photos/119847626@N08/13025870245/)

The entire kana font is not included in there though but it is clearer to see. I thought the font was supposed to be in a specific order. Looking at the stage intro screenshot in YY-CHR, it looks like the font is just the phrases for the different level introductions.

I was hoping I could see the entire kana font for the game clearly so I could make a table.
I'm tempted to try Tile Molester, but doesn't Java have vulnerabilities? Is it safe?
Title: Re: Game has English words already in ASCII format. What about the Japanese?
Post by: Scio on March 09, 2014, 01:01:23 am
About the stage title, it looks like just a few kanji, not a complete font. That's pretty common. It looks like the game decompresses only the necessary characters and puts them in memory, so you won't see all the kanji at once.

About Java, if you just run the program and then close the Java VM afterwards, there's no problem. The warnings are usually aimed at servers and people who let Java run 100% of the time.

EDIT: I think I've found it, but like I thought, the game only decompresses the characters printed on screen, not the whole font at once (it decompresses the whole Roman alphabet, though).
(http://thumbnails109.imagebam.com/31310/d5545a313094920.jpg) (http://www.imagebam.com/image/d5545a313094920)

As you can see, some letters are properly aligned, while others are not. The kanji all look aligned, though. I don't know if this is what KingMike referred to as "interleaving", but I recall seeing a similar question on the board before. I think it was about how some tile tools deal with width values. I've never looked at this kind of font before, so I'll let someone more knowledgeable help you.