
Author Topic: Generating a Text Table for Kanji?  (Read 4242 times)

Kafke

  • Jr. Member
  • **
  • Posts: 42
    • View Profile
Generating a Text Table for Kanji?
« on: October 29, 2014, 06:18:39 pm »
Wow, I haven't actually posted/been here in a while. Anyway, I recently picked up romhacking again, due to a recent interest in Pokemon Puzzle League. That interest spread to the source material: Panel De Pon. I find it insulting that Nintendo slapped Yoshi/Pokemon over the original game, and eventually removed the characters altogether.

Anyway, I found that the first game (for the SNES) had a translation patch. But my interest was in the GameCube game: Nintendo Puzzle Collection. The game didn't receive a localization at all! The included Panel De Pon game is pretty similar to Pokemon Puzzle League, but with story/menu differences (the same game modes are available), and it's for GameCube instead of N64.

So I dumped the files from the game, and ran into some .BIN files related to the game in question (I don't care for the Dr. Mario and Yoshi games). I found that they were some sort of archive, which I wrote a program to unpack.

Which finally brings me to my question. The BIN files include a menu/script type of file called "SBF", which I'm currently working on decoding to figure out the text and such. Progress is going well: I've managed to add/remove text, and replace the Japanese with English.

My problem lies in creating a conversion table. The text is made of two-byte characters of the format 0C 23, with the first byte being the character type (kanji, English caps, English lowercase, katakana, hiragana, etc). So I went through by hand and found the values for the English and kana. That was simple enough as it was all in order (A-Z, standard kana order, etc).
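
To make the layout above concrete, here is a minimal Python sketch of dumping such text through a lookup table. The type bytes 04, 08 and 0C are the ones mentioned in this thread; the table entries, the name `dump_text` and the `<TT:II>` placeholder syntax are made up for illustration, and the real index-to-character mapping still has to be filled in by hand.

```python
# Hypothetical table: (type byte, index byte) -> character.
# These three entries are placeholders, NOT values taken from the game.
TABLE = {
    (0x0C, 0x01): "あ",
    (0x0C, 0x02): "い",
    (0x08, 0x01): "ア",
}

def dump_text(data: bytes, table: dict) -> str:
    """Decode two-byte characters; unknown pairs become <TT:II> tags."""
    out = []
    # Step through the data two bytes at a time: type byte, then index byte.
    # A trailing odd byte, if any, is ignored.
    for pos in range(0, len(data) - 1, 2):
        key = (data[pos], data[pos + 1])
        out.append(table.get(key, "<%02X:%02X>" % key))
    return "".join(out)

# The 04 23 pair is not in the table, so it is dumped as a tag.
print(dump_text(bytes.fromhex("0C01 0C02 0423 0801"), TABLE))
```

Dumping unknown pairs as tags instead of dropping them means the script can be extracted right away, and the tags can be replaced later as each kanji gets identified.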

But my problem is with the kanji. I haven't done much digging yet, but I was wondering if I'd have to go through all 2000 possible kanji in order to dump the script, or if there's a faster way of doing this. Kanji seem to be mostly marked with 04, followed by a "kanji number", but I'm not sure if there's any standard order of kanji that makes this easier.

Thoughts?

Also, while I'm at it, I might as well ask a couple other things (unrelated to the kanji problem).

1. There is a file format (pretty common in the archive) of "BIF". These seem to be some animated image format, with a 'subfile' of type "BNNF". I was wondering how you'd go about converting/viewing/editing these image files. My approach was going to be simply mess around until I figured out the format of colors/width/height. But if there's an easier way to do this, I'm all ears.

2. The English text seems to be spaced out a bit more than the Japanese. Do you think this is an issue with the font, or some sort of coding thing? What's the general approach to fix this?

This is my first time messing with Japanese text dumping, and messing around with more advanced romhacking than just simple hex editing/pointers. So I apologize if these are kind of stupid questions. Thanks for the help!

Seihen

  • Sr. Member
  • ****
  • Posts: 404
    • View Profile
Re: Generating a Text Table for Kanji?
« Reply #1 on: October 29, 2014, 07:02:23 pm »
Hmm.. unfortunately, you may be dealing with a custom kanji table (a GBA game I worked on was like that), and probably severely pared down to include only the kanji they need to use.

Unless I'm doing my math wrong, if it's only two bytes and the first byte is always 04, that leaves you with only 00-FF for the second byte. That's only 256 values -- nearly an order of magnitude fewer than you need for the full 2000 standard (joyo) kanji. Since it's a relatively simple puzzle game with (I assume) a short script, they might have simply included only the kanji they actually use. Either that, or there's more than one possible value for the initial byte (and not only 04).

So, to make a long story short, I think you might have your work cut out for you if it's an abbreviated file, unfortunately. Maybe you can try to find a font or something and see if it shows them in order?

Kafke

Re: Generating a Text Table for Kanji?
« Reply #2 on: October 29, 2014, 07:13:45 pm »
I actually don't know if it's always 04. As I said, I haven't dug around much. But that seems to be the way things are going. From what it looks like, the 2-byte value is just referencing some sort of font table, seeing as I found two different values for the lowercase English characters (one was an extension of the capital letters, but cut off the last few letters; the other had its own type byte).

From what it looks like (I'll have to look more into it), the kanji is either just 04 (which would only give 256 of them, as you mentioned), or it might stretch into 05-07 as well, seeing as katakana is 08 and hiragana is 0C. All of the examples I've seen so far are just 04.

"Since it's a relatively simple puzzle game with (I assume) a short script, I assume they might have simply included only the kanji they actually use. Either that or there's more than one possible value for the initial byte (and not only 04)."

This is what I'm suspecting. 256 kanji would probably be more than enough, but I don't speak Japanese so I don't know  ;D.

Thanks for the help. I'll have to go dig through the script file (I've mostly been using the menu to test) and see if there are any other kanji markers. If there are, I guess it's time to dig through those image files.
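
A quick way to do that check is to count the leading bytes instead of eyeballing the hex. This is only a sketch: it assumes the input has already been cut down to pure two-byte text (no headers, pointers or control codes mixed in), and the function names are invented here.

```python
from collections import Counter

def type_byte_histogram(text_bytes: bytes) -> Counter:
    """Count how often each leading (type) byte occurs in two-byte text."""
    pairs = zip(text_bytes[0::2], text_bytes[1::2])
    return Counter(typ for typ, _idx in pairs)

def distinct_indices(text_bytes: bytes, type_byte: int) -> list:
    """Sorted list of the second bytes seen after a given type byte."""
    pairs = zip(text_bytes[0::2], text_bytes[1::2])
    return sorted({idx for typ, idx in pairs if typ == type_byte})

# Made-up sample bytes, only to show the calls.
sample = bytes.fromhex("0401 0402 0401 0C23 0801")
print(type_byte_histogram(sample))
print(distinct_indices(sample, 0x04))
```

If anything in the 05-07 range shows up in the histogram, the kanji set is larger than one 04 page; if not, the list of distinct indices is the full set of kanji the script actually uses.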

Sucks though, that there isn't really a fast way of getting through them.

Do you have any advice for the other two questions?

FAST6191

  • Hero Member
  • *****
  • Posts: 2848
    • View Profile
Re: Generating a Text Table for Kanji?
« Reply #3 on: October 29, 2014, 07:20:20 pm »
What Seihen said. Get the font and see if the order follows, see if there is a pattern (the order they appear in the game script, most common first, the inverse of the previous two....), see if it follows the same order as another game from the same dev, see if it follows the same order in a known encoding ( http://www.rikai.com/library/kanjitables/kanji_codes.sjis.shtml http://www.rikai.com/library/kanjitables/kanji_codes.euc.shtml ), see if it follows the order as seen in the various learning lists/dictionaries....
Edit. Forgot to mention stroke order, radicals and moji. All sometimes used as soft groupings.
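
As a rough illustration of the "known encoding" check: once a handful of glyphs have been identified in font order, something like the Python below can tell whether they ascend in Shift-JIS or EUC-JP. The function name is made up; `shift_jis` and `euc_jp` are standard Python codec names. A character that does not exist in the chosen encoding will raise `UnicodeEncodeError`, which is left unhandled in this sketch.

```python
def follows_encoding_order(glyphs: str, encoding: str = "shift_jis") -> bool:
    """True if the characters' code values strictly increase in `encoding`."""
    # Turn each character into its numeric code in the target encoding.
    codes = [int.from_bytes(ch.encode(encoding), "big") for ch in glyphs]
    # Compare each code with the one that follows it.
    return all(a < b for a, b in zip(codes, codes[1:]))

# Kana in standard order ascend in both encodings.
print(follows_encoding_order("あいう"))
print(follows_encoding_order("あいう", "euc_jp"))
# A shuffled sequence does not.
print(follows_encoding_order("うあ"))
```

A few dozen identified glyphs are usually enough to tell; if they keep jumping backwards, the font is most likely in a custom order.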

There is a very limited amount of OCR available that can easily be twisted/used for ROM hacking. A program called crystaltile2 has it, but it is not great at all: far worse than anything you may have used with a scanner, and not even close to the tools used to rip subs from DVDs and other bitmap or hardcoded text in video. It is available from the tools menu when you are in the graphics viewer; get it down to one character/glyph per tile and then give it a go. In fact you would probably be better off using a scanner.

Not sure about the BIF format, you might want to look at some of the gamecube, wii or DS stuff though as it could well share some similarities.

As for character spacing, it is typically a font handling thing. The best option would be to do a proper variable width font replacement; that is hard, though, so most settle for just making the font narrower. You are probably not going to have a proper font format on the GameCube, so it is going to involve getting your hands slightly dirty with PowerPC assembly.
« Last Edit: October 29, 2014, 09:22:21 pm by FAST6191 »

BlackDog61

  • Hero Member
  • *****
  • Posts: 784
    • View Profile
    • Super Robot Wars A Portable translation thread
Re: Generating a Text Table for Kanji?
« Reply #4 on: October 30, 2014, 02:50:46 pm »
1. There is a file format (pretty common in the archive) of "BIF". These seem to be some animated image format, with a 'subfile' of type "BNNF". I was wondering how you'd go about converting/viewing/editing these image files. My approach was going to be simply mess around until I figured out the format of colors/width/height. But if there's an easier way to do this, I'm all ears.

This might be wishful thinking, but "BIF" sounds an awful lot like "Bmp animated gIF" to me. Have you checked for similarities with that?
http://en.wikipedia.org/wiki/Graphics_Interchange_Format#Animated_GIF

Also just a quick note that the team having a go at SRW GC sees the "yagcd" as a good source of info for that platform:
http://hitmen.c02.at/files/yagcd/yagcd/
I didn't see any mention of BIF there, though.

Good luck! You seem to be handling your work pretty well.

Pennywise

  • Hero Member
  • *****
  • Posts: 2316
  • I'm curious
    • View Profile
    • Yojimbo's Translations
Re: Generating a Text Table for Kanji?
« Reply #5 on: October 30, 2014, 04:16:06 pm »
The best advice I can give you is that you need to somehow extract the font and have someone ID the kanji for you. Some kind soul will do it for you if you post the kanji on the script help board.

Kafke

Re: Generating a Text Table for Kanji?
« Reply #6 on: October 30, 2014, 05:47:26 pm »
Alright, I'm back with more info. The kanji do indeed seem to be limited to just the 04 marker, with 254 kanji in total (ranging from 01-FF). There doesn't appear to be any standard/particular order, as the kanji are mostly just numbered as they appear (earlier kanji have lower values, later kanji have higher ones). Which means I'm probably off to go find the font file (no luck yet, but I think I remember seeing something related at one point), and probably to figure out how to read/edit those BIF files (with internal codes FIB3 and BNFF).

As far as Kanji recognition goes, my Mac has a thing where I can write the kanji (stroke order doesn't matter), and it types the character, which is really handy for creating a table. The only problem being that some of the kanji can get pretty blurry with the font (example: http://i.imgur.com/4OMxU2Z.png). There was even one that just looked like a pink solid square! Hopefully finding the font makes it clearer.

The BIF files don't seem to be any popular/known format. Looking around online gets nothing.

Seihen

Re: Generating a Text Table for Kanji?
« Reply #7 on: October 30, 2014, 06:38:05 pm »
The only problem being that some of the kanji can get pretty blurry with the font (example: http://i.imgur.com/4OMxU2Z.png). There was even one that just looked like a pink solid square! Hopefully finding the font makes it clearer.

The kanji in the image is 設定 (せってい, settei, settings).  As for the solid square, that actually is included in Japanese fonts and may really just be a solid square (■, for example).  But as Pennywise suggested, the fastest way would probably be to post it to the board once you find the font file. Though there is a certain satisfaction to doing it yourself!

KingMike

  • Forum Moderator
  • Hero Member
  • *****
  • Posts: 6975
  • *sigh* A changed avatar. Big deal.
    • View Profile
Re: Generating a Text Table for Kanji?
« Reply #8 on: October 30, 2014, 10:56:23 pm »
The best advice I can give you is that you need to somehow extract the font and have someone ID the kanji for you. Some kind soul will do it for you if you post the kanji on the script help board.

I heard Nintendo Puzzle Collection was one of the games to use the BIOS font (which is suggested by videos I've seen with mostly glitched text, likely as a result of being played on a western console.)

Kafke

Re: Generating a Text Table for Kanji?
« Reply #9 on: October 31, 2014, 01:11:19 am »
The good news is that I've found the font files. There are 8 of them (2 for English, 2 for each kana, and 2 for kanji). The bad news is that I can't figure out the image format.

I've managed to extract one image: http://i.imgur.com/2YDnx4x.png

That particular image has a color table, followed by each pixel referencing a certain color. However, other images that seemingly use the same format, come out somewhat recognizable, but are clearly wrong: http://i.imgur.com/JwYSvlK.png

And worse, there are some images (like the font tables!) that seem to not follow this format, and instead do something else.

Instead of going to the color table (of RGBA format, with an indicator for how many colors), it goes to something that looks like this:
F7 F0 F0 F0 F0 F0 F0 F0 F0 F6 F0 F0 F0 F0 F0 F0 F0 F0 82 F0 82 00 F7 F0 F0 F0 F0 F0 F0 F0 F0 F6 F0 F0 F0 F0

It's F0's all the way down pretty much. I thought that it might simply be the RGBA (or some variant) format, just without a color table, so I tried that and this was the result: http://i.imgur.com/cvBGxdb.png

Note, it's not just all F0-FF (which I thought it might be). As another section of the image looks like this:
0F A5 83 F0 08 0F 2D 5A 78 78 96 2D 0F B4 82 F0 09 78 0F 0F 1E 5A A5 B4 96 78 E1 F7

Clearly there are problems. There's no discernible palette file for these either. Any thoughts? Am I forgetting something about image formats that may be helpful here?

"I heard Nintendo Puzzle Collection was one of the games to use the BIOS font (which is suggested by videos I've seen with mostly glitched text, likely as a result of being played on a western console.)"

As far as Panel De Pon goes, the font is included in the "MENU.BIN" file. The game runs fine on my US Wii (through USB launcher). I can't say for the other two included games.

"and may actually really just be a solid square (■, for example). "

Haha, though it's not actually just a square. This is actually what I was referring to: http://i.imgur.com/2MCQsSb.png. That screenshot is actually a bit clearer than when I was looking at it (it moves around too). I'd imagine if you were familiar with the language, it's easy to know which it is. But honestly, I can't tell where the lines are, and it just looks like a big blob with an L next to it.

Seihen

Re: Generating a Text Table for Kanji?
« Reply #10 on: October 31, 2014, 01:28:29 am »
The good news is that I've found the font files. There's 8 of them (2 for english, 2 for each kana, and 2 for kanji). The bad news is that I can't figure out the image format. Haha, though it's not actually just a square. This is actually what I was referring to: http://i.imgur.com/2MCQsSb.png. That screenshot is actually a bit clearer than when I was looking at it (it moves around too). I'd imagine if you were familiar with the language, it's easy to know which it is. But honestly, I can't tell where the lines are, and it just looks like a big blob with an L next to it.

Aah, I gotcha. Yeah, if you don't know the language (and even when you're learning), it's really hard to tell one thing from the next!
That one seems to be 道 (road/way/path/what-have-you).

As for the image file formats, unfortunately I'm afraid I'll have to leave that to others. Seems like you're making good progress, though!

Kafke

Re: Generating a Text Table for Kanji?
« Reply #11 on: November 02, 2014, 03:26:46 am »
I've almost got these image files figured out. But I seem to be running into a problem with certain little bytes thrown around the file. It's obviously some sort of compression, but it doesn't seem to have any rhyme or reason to the compression. The "mystery" format actually ended up being single byte values, with the first four bits being used to determine a 0-15 value for black/white value, and the other four bits being used for the alpha.
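
For reference, splitting such a byte into its two halves looks roughly like this in Python. It assumes "first four bits" means the high nibble, and it scales each 0-15 value to 0-255 by multiplying by 17; both choices, and the function name, are assumptions rather than anything confirmed against the game.

```python
def unpack_pixels(raw: bytes) -> list:
    """Split each byte into (intensity, alpha), both scaled to 0-255."""
    pixels = []
    for b in raw:
        intensity = b >> 4   # high nibble: 0-15 black/white value (assumed)
        alpha = b & 0x0F     # low nibble: 0-15 alpha (assumed)
        # 15 * 17 == 255, so multiplying by 17 maps 0-15 onto 0-255 evenly.
        pixels.append((intensity * 17, alpha * 17))
    return pixels

print(unpack_pixels(bytes([0xF0, 0x0F, 0xA5])))
```

Under that reading, the long runs of F0 in the font data would be fully transparent white, which at least fits an empty glyph background.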

The problem I'm having now is that some of the bytes make the next few repeat, to compress the file size. I handled something like this back when I was working on gameboy games (where map data was compressed in a similar way). But I can't figure out how the exact compression works.

It seems that if the leading byte for a "group" is 80 (hex) or higher, then the next few bytes (seemingly a random number of them) repeat for the image. I have a sneaking suspicion it's more bit stuff, but I can't seem to figure out what determines how many times the bytes repeat.

I dunno if anyone would be able to help with this without actually looking at the game/file. So I think I'm on my own at this point. But I'll keep this thread updated, in case anyone else stumbles along in the future (or if any of you are curious).

November 04, 2014, 12:40:27 am - (Auto Merged - Double Posts are not allowed before 7 days.)
I figured it out !  ;D

Here are the two successfully extracted kanji files:
http://i.imgur.com/30cIk3S.png

http://i.imgur.com/TO5MbhL.png

There appear to be 355 kanji in the font (354 in the font file that is missing one). If anyone could help create a table, that'd be awesome.

The compression byte thing I mentioned in my last comment was exactly that. If the control byte is below 80 (hex), it simply says how many of the following bytes are interpreted normally. If it is between 80 and BF, it loops a single byte that many times (from 1 to 64), and if it is greater than that, it loops a group of multiple bytes anywhere from 1-8 times. The second four bits are taken modulo 8; if they are 8 or higher, that adds one to how many pixels are in the looped group, so a group is 2-9 pixels long. The code doesn't seem to work on some of the image files, which means there's still more to dig through. But my goal was to extract the kanji, so I dunno if I'm going to dig much further into the images.
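
For anyone following along, here is a rough Python sketch of that scheme. The three ranges come from the description above; the exact bit split for C0 and up and the "+1"/"+2" offsets are assumptions inferred from the sample bytes posted earlier in the thread (F7 followed by eight F0 bytes, 08 followed by nine plain bytes), not something verified against every file. The function name is made up.

```python
def decode_rle(data: bytes) -> bytes:
    """Expand the control-byte compression into raw pixel bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        i += 1
        if c < 0x80:
            # Literal run: the next c + 1 bytes are copied as-is (assumed offset).
            n = c + 1
            out += data[i:i + n]
            i += n
        elif c < 0xC0:
            # Single-byte run: repeat the next byte 1-64 times.
            n = (c & 0x3F) + 1
            out += bytes([data[i]]) * n
            i += 1
        else:
            # Group run: bits 5-3 give the group length (2-9 bytes),
            # bits 2-0 give the repeat count (1-8). Assumed layout.
            length = ((c >> 3) & 0x07) + 2
            times = (c & 0x07) + 1
            out += data[i:i + length] * times
            i += length
    return bytes(out)

# 82 F0 -> F0 three times; F7 + eight F0 bytes -> 64 bytes of F0.
print(decode_rle(bytes.fromhex("82F0")).hex())
print(len(decode_rle(bytes.fromhex("F7" + "F0" * 8))))
```

Since some image files still come out wrong, those files may use a different mode or header, so treat this as a starting point only.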

The next steps are to get the kanji into a table, and finish up the script extraction. Then pass off the script to a translator and begin work on re-insertion. And possibly looking into fixing the font width.
« Last Edit: November 04, 2014, 12:40:27 am by Kafke »