
Topic: How do I extract text into something usable from gameboy color roms?

comicmaster140

I am a newcomer to GBC ROM hacking and want to start an English translation of Sakura Wars GB with a team. I have experienced translators, but we have no idea where to start programming-wise. How do I do things like extract text into something usable from Game Boy Color ROMs?

FAST6191

Same as you do for anything else. Given that the CPU can only see a 64 KB address space at once (https://gbdev.io/pandocs/#memory-map), with larger ROMs swapped in through banks, you might find yourself dealing with things more commonly seen on the NES than on modern PC games, modern consoles or even the GBA, but that is getting ahead of things.
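
For reference, a minimal Python sketch that names the memory-map region a 16-bit address falls in, following the ranges in the Pan Docs memory map linked above. The function name is made up for illustration, and it ignores finer details such as which banks happen to be switched in at the time:

```python
def region_of(addr: int) -> str:
    """Name the Game Boy memory-map region a 16-bit CPU address falls in."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("GB addresses are 16-bit (0x0000-0xFFFF)")
    # (last address of region, region name), in ascending order
    regions = [
        (0x3FFF, "ROM bank 0 (fixed)"),
        (0x7FFF, "ROM bank 1-N (switchable)"),
        (0x9FFF, "VRAM"),
        (0xBFFF, "external (cartridge) RAM"),
        (0xCFFF, "WRAM bank 0"),
        (0xDFFF, "WRAM bank 1-7 (switchable on GBC)"),
        (0xFDFF, "echo RAM"),
        (0xFE9F, "OAM"),
        (0xFEFF, "unusable"),
        (0xFF7F, "I/O registers"),
        (0xFFFE, "HRAM"),
        (0xFFFF, "interrupt enable register"),
    ]
    for end, name in regions:
        if addr <= end:
            return name
```

For example, a pointer value of 0x4A10 found in the ROM points into the switchable ROM bank area, so you would also need to know which bank was loaded to find the data it refers to.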

http://www.romhacking.net/start/ is there for a reason, but for a quick overview:

That leaves you finding the text encoding, the font and the text location, and then figuring out what you need to change to make things work for your translation. Japanese games do not always include English characters, and Japanese tends to be a less verbose language than English, so you may need to find more space, and also work out what to do when the limit is not just storage space but on-screen display space as well (assuming it is).

Anyway, most people start by making a text decode table.
Games overwhelmingly don't use common encodings like Shift-JIS (http://rikai.com/library/kanjitables/kanji_codes.sjis.shtml) and the others on that site; their encodings are usually quite custom, and as the ROM hacker you get to figure this out.
For Japanese this can be harder, as simpler tools like relative search are tricky to use. If there is an English phrase somewhere in the game, you might try searching for it with something like Monkey Moore, or you can try relative searching based on the font (the order of the characters in the game's font is quite often the order of the encoding too).
Most will then use some combination of corruption, tracing (https://www.romhacking.net/documents/361/), pointer inference (more on that later) and elimination (whatever is graphics is not text, unless it is literally a bitmap image of some text, which can and does happen quite often, and in that case you have found your text).
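
Decode tables are usually kept as plain-text "table files" with one hex=character entry per line (the format used by Thingy and similar tools). A minimal Python sketch that parses such a file and decodes single bytes with it; the entries below are made up for illustration and are not the real Sakura Wars GB encoding:

```python
def parse_table(text: str) -> dict[int, str]:
    """Parse 'XX=char' lines (hex code = character) into a lookup dict."""
    table = {}
    for line in text.splitlines():
        code, sep, char = line.partition("=")
        if not sep:
            continue  # blank line or not an entry
        try:
            table[int(code, 16)] = char
        except ValueError:
            continue  # control-code or special lines this sketch doesn't handle
    return table


def decode(data: bytes, table: dict[int, str]) -> str:
    """Decode with a 1-byte table; unknown bytes are shown as <XX>."""
    return "".join(table.get(b, f"<{b:02X}>") for b in data)


# Hypothetical entries, NOT the real encoding of any particular game.
TABLE = parse_table("""
00=あ
01=い
02=う
0A=A
0B=B
""")
print(decode(bytes([0x00, 0x01, 0x0A, 0x7F]), TABLE))  # あいA<7F>
```

Leaving unknown bytes visible as <XX> makes it easy to spot the codes you still need to identify.
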
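
Relative search compares the differences between consecutive characters rather than their actual values, so it works without knowing the table as long as the letters are encoded in alphabetical order (which is the assumption that tends to break down for kana and kanji). A minimal Python sketch; `relative_search` is a hypothetical helper, not any particular tool's API:

```python
def relative_search(rom: bytes, word: str) -> list[tuple[int, int]]:
    """Find offsets where byte differences match the letter differences of `word`.

    Returns (offset, base) pairs, where base is the guessed byte value for 'A'
    (only meaningful if the game keeps A-Z in order).
    """
    word = word.upper()
    deltas = [ord(b) - ord(a) for a, b in zip(word, word[1:])]
    hits = []
    for off in range(len(rom) - len(word) + 1):
        window = rom[off:off + len(word)]
        if all(window[i + 1] - window[i] == d for i, d in enumerate(deltas)):
            base = (window[0] - (ord(word[0]) - ord("A"))) % 256
            hits.append((off, base))
    return hits
```

For example, with a made-up encoding where A is 0x80, searching bytes([0x00, 0x82, 0x80, 0x93, 0xFF]) for "cat" finds a hit at offset 1 with base 0x80.
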
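
Corruption is as simple as it sounds: change some bytes in a copy of the ROM, run it in an emulator and see what changed on screen. A minimal Python sketch; the offset range and the "add 1" change are arbitrary choices for illustration (if the encoding follows the font order, bumping a text byte tends to turn one character into its neighbour in the font, which also hints at the encoding):

```python
def corrupt(rom: bytes, start: int, end: int, step: int = 1) -> bytes:
    """Return a copy of `rom` with bytes in [start, end) shifted by `step` (mod 256).

    Keep the range clear of the cartridge header at 0x0100-0x014F.
    """
    out = bytearray(rom)
    for i in range(start, min(end, len(out))):
        out[i] = (out[i] + step) % 256
    return bytes(out)


# Usage (file names and range are placeholders):
# with open("original.gbc", "rb") as f:
#     rom = f.read()
# with open("corrupt.gbc", "wb") as f:
#     f.write(corrupt(rom, 0x20000, 0x20100))
```
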

Fonts. There is nothing wrong with starting out either by finding the font in RAM and searching the ROM for that (assuming it is not compressed, how it appears in RAM is often how it appears in the ROM), or by opening the ROM in a tile editor (Tiled2002, CrystalTile2, TiledGGD, Tile Molester and all manner of things on http://www.romhacking.net/?page=utilities&category=10&platform=&game=&author=&os=&level=&perpage=20&title=&desc=&utilsearch=Go ), setting it to the GB/GBC graphics modes (most consoles have their own graphics formats, based on their hardware, and most tile editors support them) and scanning through the ROM to see what you find.
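
In GB mode the tile editor is decoding the standard Game Boy 2bpp tile format described in the Pan Docs: each 8x8 tile is 16 bytes, two bytes per row, the first holding the low bit of each pixel and the second the high bit, with bit 7 being the leftmost pixel. A small Python sketch that renders a tile as text, which can be handy for eyeballing font candidates from a script (the function names are just for illustration):

```python
def decode_tile(tile: bytes) -> list[list[int]]:
    """Decode one 16-byte GB 2bpp tile into 8 rows of 8 colour indices (0-3)."""
    if len(tile) != 16:
        raise ValueError("a GB tile is 16 bytes")
    rows = []
    for y in range(8):
        lo, hi = tile[2 * y], tile[2 * y + 1]  # low bitplane, then high bitplane
        rows.append([((hi >> (7 - x)) & 1) << 1 | ((lo >> (7 - x)) & 1)
                     for x in range(8)])
    return rows


def show_tile(tile: bytes) -> str:
    """Render a tile as text, one character per colour index."""
    return "\n".join("".join(" .+#"[c] for c in row) for row in decode_tile(tile))
```
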

If it has Roman characters plus whatever punctuation you need, great; if not, you get to make them. For older systems like this, most will suggest overwriting existing data rather than adding new things, and I would go with that (only add new things when absolutely necessary, or if it is a nice modern font format like we started to see on the DS).
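
Going the other way, when you draw replacement letters over spare tiles the pixels have to be turned back into 2bpp bytes and written over the old tile data in place. A minimal Python sketch using a text notation (" .+#" for colour indices 0-3); the "i" glyph and the helper names are made up for illustration:

```python
SHADES = " .+#"  # colour index 0-3 (the actual shade depends on the palette)


def encode_tile(rows: list[str]) -> bytes:
    """Encode 8 strings of 8 shade characters into a 16-byte GB 2bpp tile."""
    if len(rows) != 8 or any(len(r) != 8 for r in rows):
        raise ValueError("need 8 rows of 8 pixels")
    out = bytearray()
    for row in rows:
        lo = hi = 0
        for x, ch in enumerate(row):
            c = SHADES.index(ch)
            lo |= (c & 1) << (7 - x)   # low bit of each pixel
            hi |= (c >> 1) << (7 - x)  # high bit of each pixel
        out += bytes([lo, hi])
    return bytes(out)


def patch_tile(rom: bytes, offset: int, tile: bytes) -> bytes:
    """Overwrite the bytes at `offset` with `tile` (no insertion, size unchanged)."""
    return rom[:offset] + tile + rom[offset + len(tile):]


GLYPH_I = [
    "   ##   ",
    "        ",
    "  ###   ",
    "   ##   ",
    "   ##   ",
    "   ##   ",
    "  ####  ",
    "        ",
]
```
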

Location also means pointers to the data. Games don't know where things start and end by magic; instead they have something like the contents page of a book telling them where everything is. If you add a bunch of extra data, remove things, or indeed change the effective location of the text (which you often end up doing, as there is rarely room to simply grow data in place on an all-in-one ROM like most things on the GB/GBC), those pointers need updating to match. Chances are the pointers will be to where the data appears in the CPU's memory map rather than to file offsets (see the memory map in the Pan Docs linked at the start).
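
On a typical GB/GBC cartridge with a memory bank controller, bank 0 is always at 0x0000-0x3FFF and whichever 16 KB bank is currently switched in appears at 0x4000-0x7FFF. So a 2-byte pointer such as 0x4A10 only becomes a file offset once you know which bank the game has loaded, which usually means tracing or inferring it from context. A minimal Python sketch of the conversion, assuming that ordinary banked layout (helper names are hypothetical):

```python
BANK_SIZE = 0x4000  # 16 KB ROM banks


def to_file_offset(bank: int, addr: int) -> int:
    """Convert a bank number plus CPU address (0x0000-0x7FFF) to a ROM file offset."""
    if addr < 0x4000:
        return addr  # bank 0 is fixed at 0x0000-0x3FFF
    if addr < 0x8000:
        return bank * BANK_SIZE + (addr - 0x4000)
    raise ValueError("not a ROM address")


def to_bank_address(offset: int) -> tuple[int, int]:
    """Convert a ROM file offset to (bank, CPU address) as the game would see it."""
    bank, rel = divmod(offset, BANK_SIZE)
    return (0, rel) if bank == 0 else (bank, 0x4000 + rel)


def read_pointer(rom: bytes, offset: int) -> int:
    """Read a 16-bit pointer; the GB CPU stores them little-endian."""
    return int.from_bytes(rom[offset:offset + 2], "little")
```

For example, pointer bytes 10 4A read as 0x4A10, and if bank 5 is loaded that is file offset 0x14A10.
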
Pointers can also be useful for finding data: if you see a bunch of pointers, consider following where they go and how big the differences between them are (text sentences tend to be of random length, while graphics data tends to be fairly consistent). GB/GBC pointers are not as obvious as pointers on some other systems, where any value pointing into a given memory range stands out and is very likely to be worth a look; a GB/GBC pointer is just two bytes, so it blends in with everything around it.
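
Following that idea, a rough heuristic scan: walk the ROM two bytes at a time looking for runs of little-endian values that all fall in 0x4000-0x7FFF and keep increasing, then look at the gaps between them. The thresholds (min_run, the strictly increasing rule, 2-byte alignment) are arbitrary assumptions for this sketch; real pointer tables are not always sorted, aligned or pointing into the switchable bank range.

```python
from collections.abc import Iterator


def find_pointer_runs(rom: bytes, min_run: int = 8,
                      align: int = 0) -> Iterator[tuple[int, list[int]]]:
    """Yield (offset, values) for runs of increasing 16-bit LE values in 0x4000-0x7FFF.

    Heuristic only: steps 2 bytes at a time from `align` (try both 0 and 1).
    """
    run_start, run = 0, []
    for off in range(align, len(rom) - 1, 2):
        value = rom[off] | (rom[off + 1] << 8)  # little-endian 16-bit value
        in_rom_bank = 0x4000 <= value < 0x8000
        if in_rom_bank and (not run or value > run[-1]):
            if not run:
                run_start = off
            run.append(value)
            continue
        if len(run) >= min_run:
            yield run_start, run
        # The value that broke the run may itself start a new one.
        run_start, run = (off, [value]) if in_rom_bank else (0, [])
    if len(run) >= min_run:
        yield run_start, run


def gaps(values: list[int]) -> list[int]:
    """Differences between consecutive pointers: uneven gaps suggest text."""
    return [b - a for a, b in zip(values, values[1:])]
```
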

What you need to change, and what you want to change, will become apparent once you start trying to insert the text back in. Usually it is a matter of making things fit within limited memory (hopefully your translators can constrain themselves and trim things to fit; if not, you get to find out how to get more data in) or making the fonts look nicer (Japanese characters are generally all the same width, whereas Roman characters vary a lot, compare ijlf with WXM...).
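
For the "does it fit" side, a check you can run over a whole translated script is to word-wrap each string to the text box and compare its encoded size with the original slot. The box width (16 characters), the 3-line box and the 1 byte per character plus an end code used below are assumptions for illustration; the real limits have to be worked out from the game itself.

```python
import textwrap


def fits(translation: str, original_size: int,
         box_width: int = 16, box_lines: int = 3) -> list[str]:
    """Return a list of problems with a translated string (empty if it fits)."""
    problems = []
    lines = textwrap.wrap(translation, width=box_width)
    if len(lines) > box_lines:
        problems.append(f"needs {len(lines)} lines, box has {box_lines}")
    size = len(translation) + 1  # assumes 1 byte per character plus an end code
    if size > original_size:
        problems.append(f"{size} bytes, original slot is {original_size}")
    return problems
```

Anything flagged goes back to the translators to trim, or becomes a case for moving the text somewhere with more room and repointing it.
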

4l3j4ndr0

If you open that game in a tile editor you can see the font. It isn't compressed at all and has all the English characters.

This game has a 1-byte table and 4 tables full of 2-byte kanji.
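
To decode a mix like that, one approach is a greedy longest match against one combined table: try a 2-byte code first, then fall back to a single byte. Purely for illustration, this sketch assumes the kanji codes are 2 bytes whose first byte is one of four lead bytes (0xF0-0xF3 here, one per kanji table), so 1-byte codes never collide with them; the real values have to be worked out from the game.

```python
def decode_mixed(data: bytes, table: dict[bytes, str]) -> str:
    """Greedy longest-match decode: try 2-byte codes, then 1-byte codes."""
    out, i = [], 0
    while i < len(data):
        for size in (2, 1):
            chunk = data[i:i + size]
            if len(chunk) == size and chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            out.append(f"<{data[i]:02X}>")  # unknown byte, keep it visible
            i += 1
    return "".join(out)


# Hypothetical table: 1-byte kana/Roman entries, 2-byte kanji with lead bytes F0-F3.
TABLE = {
    b"\x00": "あ",
    b"\x0A": "A",
    b"\xF0\x00": "桜",
    b"\xF3\x10": "大",
}
print(decode_mixed(b"\x00\xF0\x00\x0A\xF3\x10\x7F", TABLE))  # あ桜A大<7F>
```
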

FAST6191

Wow, that lower case... not that the upper case is much better. Anyway, if that is the whole thing, then maybe when adding some punctuation (it looks like there is a whole bunch of spare kana there to overwrite) it can be tweaked to be nicer on the eye.