Author Topic: Where to start with Japanese text extraction from GBC games.  (Read 1962 times)

pichuscute

Hey guys! I'd like to make a Japanese to English translation for a GBC game (Jisedai Begoma Battle Beyblade), since I'm already going to be playing through it and translating parts of it personally. I can read/type in hiragana/katakana just fine, have read through the apparently outdated definitive guide to romhacking tables on this website, and have BGB downloaded.

So, essentially my question is:

What's the current software I should have in order to build a JP text table and then dump the game text so I can get on with translating it? Since the guide uses the NES as an example and is outdated, I'd like to check whether there's a simpler way to do things on GBC these days. Essentially, I'm looking for the straightest shot to my goal of acquiring the main portion of the game's Japanese text.

I appreciate any help given! I've found it surprisingly difficult to search for information about this type of thing (I guess I have no clue how to search for it or something), so that's why I've found myself here, lol. Thanks!

Psyklax

Re: Where to start with Japanese text extraction from GBC games.
« Reply #1 on: June 07, 2019, 02:52:14 am »
Welcome to the forum! Having looked at the game quickly on YouTube, I'll be quite happy to help you dump the text. Since the game only uses hiragana and katakana, it shouldn't be too big of a job.

A guide for the NES will usually suffice for the GBC since the principles remain the same, though some games differ because of how the graphics work. On the NES, the entire alphabet or kana set usually has to be available in video memory in one piece, because on many cartridges the CPU can't write to the graphics memory directly. This isn't the case on the GBC, so in some games only the characters which are necessary are loaded into the VRAM, and this can make finding the text a little more tricky.

Anyway, give me a bit of time and I imagine I'll be able to dump the text and explain how I did it. ;)

EDIT: Okay, here's what I've done so far. :)

The first thing I always do is open the ROM in Tile Molester to see what graphics are visible. If there's no compression, looking directly at the ROM can be a neat shortcut, and GBC games often skip compression because the ROMs are so large. A quick rush through the ROM leads me to a couple of character sets at $442A4 and $452A4 (pressing plus and minus shifts the view byte-by-byte, which is necessary here because the data starts at $A4, not $A0). This is a good sign, so I make a table file with Tabular that corresponds to what I see.
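For anyone curious what a tile viewer is doing at offsets like these, here is a minimal sketch. It assumes the font is stored as uncompressed, standard Game Boy 2bpp tiles (16 bytes per 8x8 tile, two bitplane bytes per row); some games store fonts differently, in which case this would just show noise. The function names and the `game.gbc` filename are made up for illustration.

```python
def decode_2bpp_tile(tile_bytes):
    """Decode one 16-byte Game Boy tile into 8 rows of 8 pixel values (0-3)."""
    if len(tile_bytes) != 16:
        raise ValueError("a 2bpp tile is exactly 16 bytes")
    rows = []
    for y in range(8):
        low = tile_bytes[2 * y]        # first byte of the row: low bit of each pixel
        high = tile_bytes[2 * y + 1]   # second byte of the row: high bit of each pixel
        row = []
        for bit in range(7, -1, -1):   # bit 7 is the leftmost pixel
            row.append((((high >> bit) & 1) << 1) | ((low >> bit) & 1))
        rows.append(row)
    return rows


def render_tile(tile_bytes, shades=" .+#"):
    """Render a tile as 8 lines of ASCII art, one character per pixel."""
    return "\n".join(
        "".join(shades[pixel] for pixel in row)
        for row in decode_2bpp_tile(tile_bytes)
    )


def tiles_at(rom, offset, count):
    """Yield `count` consecutive 16-byte tiles starting at `offset`."""
    for i in range(count):
        start = offset + 16 * i
        yield rom[start:start + 16]


# Hypothetical usage (filename is an assumption; the offset is the one quoted above):
# rom = open("game.gbc", "rb").read()
# for tile in tiles_at(rom, 0x442A4, 4):
#     print(render_tile(tile))
#     print("-" * 8)
```

If the output looks like recognisable kana, the offset and format guess are probably right; if not, try a different offset or format.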

So Tabular makes things pretty easy: you can insert Romaji for kana and alphabets with a click of a button. You can mess around with the table file later, but it's a good start. Save it as a .tbl file and open up WindHex32 EX, as this is a hex editor that supports table files. Open the ROM and the table file in the File menu. Now, we don't actually know where the text is, but if we were right about our table file, then we can actually do a search for the first line of dialogue.
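A table file is just plain text with one `HEX=text` pair per line, so it's easy to read in a script too. Here's a rough sketch, assuming single-byte entries only; `load_table`, `decode` and `SAMPLE_TBL` are names I've made up, and the four sample entries are only the "omatase" codes from this game, not a full table.

```python
def load_table(text):
    """Parse 'HEX=text' lines into a {byte_value: text} dict (single-byte entries only)."""
    table = {}
    for line in text.splitlines():
        if "=" not in line:
            continue                      # skip blank lines and anything that isn't an entry
        hex_part, value = line.split("=", 1)
        try:
            code = int(hex_part.strip(), 16)
        except ValueError:
            continue                      # skip lines whose left side isn't plain hex
        if 0 <= code <= 0xFF:
            table[code] = value           # keep the value as-is so "20= " can map to a space
    return table


def decode(data, table, unknown="#"):
    """Translate raw bytes to text, marking bytes that are missing from the table."""
    return "".join(table.get(byte, unknown) for byte in data)


# Sample entries only (Romaji instead of kana, for ease of reading).
SAMPLE_TBL = "65=o\n7F=ma\n70=ta\n6E=se\n"

if __name__ == "__main__":
    table = load_table(SAMPLE_TBL)
    print(decode(bytes([0x65, 0x7F, 0x70, 0x6E]), table))
```

For a real table you would read the saved .tbl file with the right text encoding (Japanese tables are often Shift-JIS or UTF-8, depending on the tool) and pass its contents to `load_table`.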

The first line includes "omatase" (I'm using Romaji for ease of reading), but WH32EX sucks at searching, so let's open HxD instead, since it's a much more comprehensive hex editor all round. We'll search for the hex: open the table file and look up the four hex codes for "omatase". They're 65 7F 70 6E. Ooh, one match: $6C2AB. Let's go there in WH32EX...
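The same search can be done in a few lines of script if you prefer. This is only a sketch: the table holds just the four codes mentioned above, and `encode`, `find_all` and the `game.gbc` filename are hypothetical.

```python
# Only the four codes quoted above; a real table would hold the whole character set.
TABLE = {0x65: "o", 0x7F: "ma", 0x70: "ta", 0x6E: "se"}


def encode(syllables, table):
    """Turn a list of table entries (e.g. ["o", "ma"]) into the bytes to search for."""
    reverse = {text: code for code, text in table.items()}
    return bytes(reverse[s] for s in syllables)


def find_all(data, pattern):
    """Return every offset at which `pattern` occurs in `data`."""
    hits = []
    pos = data.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = data.find(pattern, pos + 1)
    return hits


# Hypothetical usage (filename is an assumption):
# rom = open("game.gbc", "rb").read()
# needle = encode(["o", "ma", "ta", "se"], TABLE)   # bytes 65 7F 70 6E
# print([f"${offset:X}" for offset in find_all(rom, needle)])
```

A single hit is a strong hint that the table is right; zero hits usually means the text is compressed or the table is off, and lots of hits means the search string is too short.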

Well look at that: looks like we have the actual dialogue. :) Let's try changing "omatase" to a word using the English alphabet already in the ROM...



Lovely. :) But hold on, it's not as simple as that. You see, there are control codes to say when to go to the next line, when to wait for a button push, and so on. Those are usually kind of easy to get to grips with, but here there's an extra complication. If you look at what's there, you'll notice some weird characters in the middle of the dialogue, so you get things like "haya#ku Masuda-san no # ie ni #kou yo", with the #s being things that really shouldn't be there. This indicates something a little more complex, so I'll have to take a closer look.
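One way to get a handle on stray bytes like these is to dump a stretch of text with every unmapped byte shown as its hex value, then count which ones turn up most. The sketch below assumes a single-byte table; the table only has the four "omatase" codes, the `$01` in the sample data is an invented stand-in for a stray byte (not a value from this game), and the function names are made up.

```python
from collections import Counter

# Only the four codes quoted earlier; a real table would cover the full character set.
TABLE = {0x65: "o", 0x7F: "ma", 0x70: "ta", 0x6E: "se"}


def dump_text(data, table):
    """Decode bytes with the table, writing unmapped bytes as <$XX> so they stand out."""
    pieces = []
    for byte in data:
        pieces.append(table.get(byte, f"<${byte:02X}>"))
    return "".join(pieces)


def unmapped_counts(data, table):
    """Count how often each byte that is missing from the table appears."""
    return Counter(byte for byte in data if byte not in table)


if __name__ == "__main__":
    # Invented sample: "omatase" with a made-up stray byte in the middle.
    sample = bytes([0x65, 0x7F, 0x01, 0x70, 0x6E])
    print(dump_text(sample, TABLE))
    for byte, count in unmapped_counts(sample, TABLE).most_common():
        print(f"${byte:02X}: {count}")
```

Bytes that show up at regular positions (end of line, end of box) are likely control codes; ones scattered mid-word need a closer look.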

Still it's a start, and it should give you some idea of how to do this in the future. :) Just give me a little more time, and hopefully it won't be difficult to figure out what's going on with these weird characters.

(I should put a disclaimer that the method I used here is certainly NOT going to work every time - if a game uses compression of either the graphics or the text, you'll need some more advanced skills)
« Last Edit: June 07, 2019, 05:24:07 am by Psyklax »

pichuscute

Re: Where to start with Japanese text extraction from GBC games.
« Reply #2 on: June 07, 2019, 04:27:52 am »
Well, that would certainly solve my problem then if I don't have to do it myself haha. I figured the general idea was about the same, just was unsure what actual software is around these days to pull it off for GBC.

In any case, I'd be happy to wait if you think you can do it. :)

June 09, 2019, 07:37:26 pm - (Auto Merged - Double Posts are not allowed before 7 days.)
Just saw your edit. Great to see such an in-depth explanation so far, thank you. This will be very helpful in the future, since I'll probably end up translating more than just this GBC game haha.

Let me know if you have an update on those weird characters! I really appreciate it!
« Last Edit: June 09, 2019, 07:37:26 pm by pichuscute »

filler

Re: Where to start with Japanese text extraction from GBC games.
« Reply #3 on: June 12, 2019, 12:33:11 am »
I'll also direct you to my series of short videos on dumping a script. My approach also won't work for all games, but I've managed to dump a number of scripts using this method. Incidentally, I should probably redo them since my editing software wasn't working at the time and I ended up posting the raw videos. I make some factual misstatements, but on the plus side you get to hear me get frustrated and swear so maybe it's more entertaining in the long run. https://www.youtube.com/playlist?list=PLkybU1NLulWjcsr1mN6rKcVJBnMsF2Efs
« Last Edit: June 12, 2019, 12:40:11 am by filler »

pichuscute

Re: Where to start with Japanese text extraction from GBC games.
« Reply #4 on: June 12, 2019, 09:21:23 am »
Looks like a good resource. I'll definitely check it out and reference it while I work! Thanks!

Edit: I went ahead and started building my own table file through watching these videos (so much easier to figure out what's going on with someone actually showing me). It looks like I should be well on my way now! I'm not sure if the guy from earlier will be faster at getting the text than I am, but I have started the process now. Definitely unsure whether I fully understand what's going on with some of the text here (like Katakana), though.
« Last Edit: June 13, 2019, 01:44:51 am by pichuscute »

Psyklax

Re: Where to start with Japanese text extraction from GBC games.
« Reply #5 on: June 14, 2019, 11:09:07 am »
I kinda forgot about this after doing it last week, been busy with other things. I haven't done any more research into it, but regarding the weird stuff in the text, my theory is that it's some kind of run-length encoding or some such. The unusual bytes are possibly headers to signify something, but without actually checking, I can't say for sure. If I get any free time, I'll have a look.