Romhacking.net

Romhacking => Newcomer's Board => Topic started by: zalor on May 17, 2021, 06:17:52 am

Title: How to find compressed text in a ROM
Post by: zalor on May 17, 2021, 06:17:52 am
I have been trying to hack Riviera: The Promised Land for the GBA. My first step was to convert a word used in the dialogue into its hex values under ASCII encoding and search the ROM for it. I did this for multiple words and nothing came up. I also used the program Monkey-Moore to do a relative search, and again I didn't get any useful information about the text. I also know that the game is both text heavy and graphics heavy (lots of CGs and BGs), so the fact that it is compressed isn't a huge surprise. However, I'm not sure what to do next.

From other threads I managed to find, it seems that my only other option is to trace the assembly code, find the text, figure out how to decompress it, and create a program that will decompress it for me. The issue with that, however, is that I'm not a programmer. I did try looking at the assembly with no$gba, and I set breakpoints. I successfully managed to get the game to stop right before writing the text, which indicates that I might be close. However, assembly is basically gibberish to me, so even if I'm close, I have no idea how to interpret that information in any useful way. Is tracing really my only option? Are there other methods I can try before going back to assembly? And if not, what is the most newbie friendly way to get comfortable enough with assembly to accomplish my particular goal?
Title: Re: How to find compressed text in a ROM
Post by: Jorpho on May 17, 2021, 10:25:03 am
Quote
I have been trying to hack Riviera: The Promised Land for the GBA.
First question: are you quite sure no one has done work on this game before?

Quote
The issue with that however is that I'm not a programmer.
Then you have much to learn.

Have you read about the use of LZ77 in GBA games? There have been similar threads in the past.
https://www.romhacking.net/forum/index.php?topic=30470.0
Title: Re: How to find compressed text in a ROM
Post by: FAST6191 on May 17, 2021, 04:24:50 pm
You more or less have it. If it is not plain ASCII (plain encodings are not unheard of on the GBA, but still rare) and a relative search did not yield anything, then for an English/European language game you are probably dealing with compression. It could still be a non-relative encoding, but hey. For the relative search, do make sure you searched for less than a line: if the game has a new line indicator in the text then it might have confused the search. It is also worth trying some of the earliest words in the game by themselves. Most compression has to start somewhere to build up a base of things it can reference back to, so the earlier stuff will tend to be uncompressed.
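
For anyone wondering what a relative search actually does under the hood, here is a minimal sketch in Python. It is not how Monkey-Moore is implemented, just the basic idea: compare the differences between neighbouring bytes with the differences between neighbouring letters, so the match works whatever value the game uses for "a". The function name and the example encoding are made up for illustration.

```python
def relative_search(data, word):
    """Return (offset, shift) pairs where the byte-to-byte differences in
    `data` match the letter-to-letter differences in `word`.

    `shift` is the stored byte minus the ASCII code of the first letter,
    which hints at how the game's table is offset from ASCII.
    Use a word that is all lower case (or all upper case) and shorter
    than one line of dialogue.
    """
    diffs = [ord(b) - ord(a) for a, b in zip(word, word[1:])]
    hits = []
    for i in range(len(data) - len(word) + 1):
        if all(data[i + k + 1] - data[i + k] == d for k, d in enumerate(diffs)):
            hits.append((i, data[i] - ord(word[0])))
    return hits


# Hypothetical encoding where "a" is stored as 0x80, so "river" becomes
# 91 88 95 84 91.
sample = b"\x00\x01" + bytes([0x91, 0x88, 0x95, 0x84, 0x91]) + b"\x02"
print(relative_search(sample, "river"))
```

On a real ROM you would read the file with open(path, "rb").read() and pass that in as data. Short words give lots of false hits, so five letters or more is a reasonable place to start.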

The GBA has compression formats built into the BIOS, accessed via so-called SWI (software interrupt) calls, which most games will use if they use compression (some custom stuff exists but it is rare). There are various emulators that will log these calls for you to look at later, and some tools (though mostly graphics ones) even read the logs and act accordingly. The formats also have some tells, so some tools will search the ROM for those and try to decode accordingly.
http://members.iinet.net.au/~freeaxs/gbacomp/#BIOS%20Decompression%20Functions
https://ece.uwaterloo.ca/~ece611/LempelZiv.pdf
http://problemkaputt.de/gbatek.htm#biosdecompressionfunctions
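
As an example of the "tells" mentioned above: per GBATEK, data for the BIOS LZ77 functions starts with a 4 byte header whose first byte is 10h, followed by the decompressed size in the next three bytes (little endian), and the source has to be 4 byte aligned. A rough scanner along those lines might look like the sketch below. The function name and the size limits are my own assumptions (256 KB is simply the size of the larger WRAM), and plenty of hits will be false positives.

```python
def find_lz77_candidates(rom, min_size=16, max_size=0x40000):
    """Return (offset, decompressed_size) for every 4 byte aligned spot
    that looks like a GBA BIOS LZ77 header (type byte 0x10 plus a
    plausible 24 bit size). The size limits are guesses, not rules."""
    hits = []
    for off in range(0, len(rom) - 3, 4):
        if rom[off] != 0x10:
            continue
        size = int.from_bytes(rom[off + 1:off + 4], "little")
        if min_size <= size <= max_size:
            hits.append((off, size))
    return hits


# Tiny fake "ROM": one header at offset 8 claiming 0x100 bytes of output.
fake_rom = bytearray(32)
fake_rom[8:12] = b"\x10\x00\x01\x00"
print(find_lz77_candidates(bytes(fake_rom)))
```

Every candidate still needs checking by actually decompressing it, since a 10h byte followed by a smallish number turns up by chance all over a 32 meg ROM.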

There are other techniques I like to use. GBA pointers are usually pretty plain: most games are 16 megs or less and only use the one read location, so their pointers have the form 08??????. Riviera though is 32 megs and thus will use the higher range, so expect 09?????? as well. If you see a whole bunch of 4 byte long entries of that form then you probably have some pointers (they are stored little endian, so in a hex editor the 08 or 09 is the last byte of each entry). They could point at anything, but they could also point at something useful. Even if the encoding is not relative there will probably be some obvious tells. Most text barely goes 8 characters without a space and rarely will it be more than 13 or so, and the space also tends to be the most common character. So if you see a section where the most common value recurs at irregular intervals, usually 8 or fewer bytes apart and rarely more than 13, then do have a look. On the subject of common characters, most text sections of any length will follow the usual letter frequency ordering (Scrabble has the ordering actually: https://en.wikipedia.org/wiki/Scrabble_letter_distributions ), and just as nearly every word is followed by a space, nearly every word will have a vowel or a y. Compression can get in the way of a lot of these.
Corruption is also a thing -- alter parts of the ROM to be something else, run the game and see what changed. It is crude but it will work eventually.

Though I would probably consider playing with the debugger ( https://www.romhacking.net/documents/361/ ) as that can tell you what it is quite quickly, especially if you already have it. If it in turn tells you that the game uses this SWI on this location in the ROM, then you don't need to know all the various ins and outs of the ARM and THUMB instructions that the GBA otherwise uses, as you have the location and how it is encoded. If it is not compression and instead a DMA (or maybe CPU) read then you have that too.
Title: Re: How to find compressed text in a ROM
Post by: zalor on May 18, 2021, 03:18:46 am
First of all, thank you so much for your thorough reply! You hit the nail on the head with the SWI command. And judging from that same line of code, it looks like the compression is LZ77 as Jorpho mentioned. This is a screenshot I took

Spoiler:
(https://imgur.com/8C0KoJT.png)

I'm assuming the code in that area is all focused on decompressing graphics (and text) data. What I'm confused by is how to deduce from this where the text (and graphics) data is.

One other thing I should note is that when I check the Map and Tile viewer, I don't see any dialogue text. However, when I view the OAM viewer I am able to find the text. From what I understand, OAM controls sprites and is in the 07000000 area. Would it therefore be safe to assume that the text is near where the sprite data is, or am I misunderstanding something?
Title: Re: How to find compressed text in a ROM
Post by: FAST6191 on May 19, 2021, 07:45:08 pm
Hmm. Most text is on the BG layers, though OAM can be useful for small and highly animated things.


If it is SWI 11h then chances are it is text. 11h is the LZ77 decompress that targets WRAM, as opposed to 12h which targets V(ideo)RAM. It is not impossible that the game copies from the ROM to WRAM, decompresses, and then sends the result to VRAM, but most likely not. Either way, when the SWI executes (if you have the no$gba debug version then you probably want a break on execute, others might do run to line or something similar) whatever is in R0 should be the source location.
Anyway, if you have the call then it inherently includes a source and a destination (R1 holds the destination). You presumably mostly care about the source, but the destination is also good stuff if you want to compare things to figure something out.
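
Once you have the R0 value, turning it into a file offset and decompressing is not much code. Below is a minimal sketch of the BIOS LZ77 format as documented in GBATEK: a 4 byte header, then flag bytes read from the most significant bit down, where 0 means one literal byte and 1 means a two byte back reference. The function names are mine, and this assumes the data really is the standard BIOS format rather than something custom.

```python
ROM_BASE = 0x08000000


def gba_address_to_offset(address):
    """Convert a ROM address such as the R0 value (08?????? or 09??????)
    into an offset within the ROM file."""
    if not ROM_BASE <= address <= 0x09FFFFFF:
        raise ValueError("address is not in the 08000000-09FFFFFF ROM range")
    return address - ROM_BASE


def lz77_decompress(data, offset=0):
    """Decompress GBA BIOS style LZ77 data starting at `offset`."""
    header = int.from_bytes(data[offset:offset + 4], "little")
    if header & 0xF0 != 0x10:
        raise ValueError("no LZ77 header (type 1) at this offset")
    size = header >> 8  # decompressed size in bytes
    out = bytearray()
    pos = offset + 4
    while len(out) < size:
        flags = data[pos]
        pos += 1
        for bit in range(7, -1, -1):
            if len(out) >= size:
                break
            if flags & (1 << bit):
                # Back reference: 4 bit length, 12 bit distance.
                b0, b1 = data[pos], data[pos + 1]
                pos += 2
                length = (b0 >> 4) + 3
                distance = (((b0 & 0x0F) << 8) | b1) + 1
                for _ in range(length):
                    out.append(out[-distance])
            else:
                # Literal byte, copied as is.
                out.append(data[pos])
                pos += 1
    return bytes(out[:size])


# Hand made stream: literals "A", "B", then "copy 6 bytes from 2 back".
stream = bytes([0x10, 0x08, 0x00, 0x00, 0x20, 0x41, 0x42, 0x30, 0x01])
print(lz77_decompress(stream))
print(hex(gba_address_to_offset(0x09123450)))
```

With a real ROM loaded as rom = open(path, "rb").read(), the call would be lz77_decompress(rom, gba_address_to_offset(r0_value)). The output is still in whatever text encoding the game uses, so a table file may be needed on top of this.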

Some OAM stuff is found in the 07?????? range ( http://problemkaputt.de/gbatek.htm#lcdobjoverview http://www.coranac.com/tonc/text/regobj.htm#sec-oam though for the latter you might want to start from graphics in general) but that is merely the control data for each sprite (screen location, colours used in some modes, flipping...). The tile graphics the sprites display live in VRAM, not in OAM, so the text is not stored near 07??????.
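
To make the "merely the control" point concrete, here is a small sketch that decodes one 8 byte OAM entry using the attribute layout from GBATEK. The function name and dictionary keys are made up for illustration. Note that the flip bits only mean flipping when rotation/scaling is off for that sprite, and the palette number only applies in 16 colour mode.

```python
import struct

OBJ_TILE_BASE = 0x06010000  # start of sprite tile data in VRAM


def decode_oam_entry(entry):
    """Decode attributes 0-2 of a single 8 byte OAM entry."""
    attr0, attr1, attr2 = struct.unpack_from("<3H", entry, 0)
    tile = attr2 & 0x3FF
    return {
        "y": attr0 & 0xFF,
        "x": attr1 & 0x1FF,
        "h_flip": bool(attr1 & 0x1000),  # only valid without rotation/scaling
        "v_flip": bool(attr1 & 0x2000),  # only valid without rotation/scaling
        "tile": tile,
        "palette": attr2 >> 12,          # only used in 16 colour mode
        "tile_address": OBJ_TILE_BASE + tile * 32,
    }


# Made up entry: y=80, x=120, horizontally flipped, tile 0x123, palette 2.
entry = struct.pack("<4H", 0x0050, 0x1078, 0x2123, 0)
print(decode_oam_entry(entry))
```

The tile_address is where the already decompressed letter graphics sit in VRAM while the game runs. It says nothing directly about where the compressed source is in the ROM, which is what the R0 value from the SWI gives you.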
Title: Re: How to find compressed text in a ROM
Post by: Jorpho on May 19, 2021, 08:43:15 pm
Quote
First question: are you quite sure no one has done work on this game before?
Before you go tearing your hair out, have you at least tried to contact whoever made https://www.romhacking.net/translations/2248/ ?