Help with Disassembling GB/GBC ROMs with a Better Approach

Started by wiseguy12855, July 27, 2010, 11:49:46 AM


wiseguy12855

As most people should know, ROMs obviously contain more than opcodes and immediates: they contain pictures, sound, and other information that shows up as garbage opcodes that make no sense. I need help with a ROM disassembler for the GB/GBC that disassembles the correct way, by doing what the Z80 chip would do: following each function call and jump instead of starting at the top of the ROM and working toward the bottom, and pulling out binary sections along the way.

The result is an assembly file that is not only recompilable but also extremely easy to edit and understand.
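For reference, here is a minimal Python sketch (not the C++ tool discussed in this thread) of where such a trace would begin. It assumes the ROM is loaded as a `bytes` object and that the header at 0100-0103 holds the usual `nop; jp a16`. The function names are made up for illustration. The five interrupt vectors are added as extra starting points because the CPU can enter code there without any visible call.

```python
def entry_point(rom: bytes) -> int:
    """Return the jump target stored in the cartridge header's entry point.

    Bytes 0100-0103 of a GB/GBC ROM are the entry point. Most games put
    'nop; jp a16' there (00 C3 lo hi); this sketch only handles that layout.
    """
    code = rom[0x100:0x104]
    if len(code) == 4 and code[0] == 0x00 and code[1] == 0xC3:
        return code[2] | (code[3] << 8)  # a16 is stored little-endian
    raise ValueError("entry point is not 'nop; jp a16'")


# Interrupt handlers are reached by hardware, not by a call, so they are extra roots.
INTERRUPT_VECTORS = {0x40: "VBlank", 0x48: "LCD STAT", 0x50: "Timer", 0x58: "Serial", 0x60: "Joypad"}


def initial_roots(rom: bytes) -> list:
    """Addresses a tracer would start from before following any jumps or calls."""
    return [entry_point(rom)] + sorted(INTERRUPT_VECTORS)
```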

Tauwasser

So what do you already have? What programming languages do you use? What are the hurdles you need help to overcome?

Generally, it is nigh impossible to disassemble a ROM without prior knowledge of the inner workings of the assembled game. You can probably write code that finds pointer tables while running through code and that flags data as data and code as code... The really hard part is getting things to still be right when they come out of your wonder tool. As for jumping and bank switching: that can be a real pain in the ass, so yeah. You will probably end up having to build custom logic for the specific ROM you are disassembling, unless it uses stupendously generic code from some SDK.
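To make the bank-switching pain a bit more concrete: the CPU always sees bank 0 at 0000-3FFF and the currently selected bank at 4000-7FFF, so the same CPU address means a different ROM file offset depending on which bank is mapped. A minimal Python sketch of the conversion, assuming 16 KiB banks and ignoring mapper-specific quirks (the function names are hypothetical):

```python
BANK_SIZE = 0x4000  # 16 KiB ROM banks


def to_file_offset(bank: int, addr: int) -> int:
    """Map a CPU address to a ROM file offset, given the bank selected for 4000-7FFF."""
    if addr < 0x4000:
        return addr  # 0000-3FFF is always bank 0, whatever is selected
    if addr < 0x8000:
        return bank * BANK_SIZE + (addr - 0x4000)
    raise ValueError(f"{addr:#06x} is not a ROM address")


def from_file_offset(offset: int) -> tuple:
    """Inverse mapping: return (bank, cpu_address) for a ROM file offset."""
    bank, rel = divmod(offset, BANK_SIZE)
    return (0, rel) if bank == 0 else (bank, 0x4000 + rel)
```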

cYa,

Tauwasser

wiseguy12855

For a base game I'm using Pokemon Red, which I chose specifically because of its complexity. I have a program written in C++ that can already decode any opcode and value quickly and correctly, using a special binary file format I made. Currently all this program can do is start from the top and work to the bottom; getting it to follow execution jumps and calls was more of a pain than I thought.

I need help making it build a list of all the code blocks, each starting at the entrance jump/call and ending with another jump or a return, basically tracing through the game. Then it should go through, disassemble the opcodes, and cut out the other parts (which could be redundant code or binary) so they can be placed back in on recompilation.
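Here is a minimal sketch of that tracing step, written in Python for brevity rather than the C++ of the original tool. It is a worklist-based recursive-descent traversal: decode linearly from each entry point, queue every static jump/call target, and stop a run at an unconditional `jp`, `jr` or `ret`. Assumptions: a flat address space with no bank switching, and a hypothetical opcode table covering only a handful of Game Boy CPU instructions. A real tool needs the full table (including `rst` and the CB-prefixed opcodes), and `jp hl` cannot be followed statically at all.

```python
# Hypothetical subset of the Game Boy CPU opcode table: opcode -> length in bytes.
LENGTHS = {
    0x00: 1,  # nop
    0x18: 2,  # jr r8
    0x20: 2,  # jr nz, r8
    0x21: 3,  # ld hl, d16
    0x3E: 2,  # ld a, d8
    0xAF: 1,  # xor a
    0xC3: 3,  # jp a16
    0xC8: 1,  # ret z
    0xC9: 1,  # ret
    0xCA: 3,  # jp z, a16
    0xCD: 3,  # call a16
}
# Instructions after which execution never falls through to the next byte.
NO_FALLTHROUGH = {0x18, 0xC3, 0xC9}


def branch_target(rom: bytes, addr: int, op: int):
    """Static destination of a jump/call at addr, or None if op does not branch."""
    if op in (0xC3, 0xCA, 0xCD):           # absolute a16, little-endian
        return rom[addr + 1] | (rom[addr + 2] << 8)
    if op in (0x18, 0x20):                 # relative to the next instruction
        offset = rom[addr + 1]
        if offset >= 0x80:
            offset -= 0x100                # r8 is signed
        return addr + 2 + offset
    return None


def trace(rom: bytes, entry: int) -> set:
    """Return the start address of every instruction reachable from entry."""
    visited = set()
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        while addr not in visited:         # stop once we reach already-traced code
            op = rom[addr]
            if op not in LENGTHS:
                raise ValueError(f"unhandled opcode {op:#04x} at {addr:#06x}")
            visited.add(addr)
            target = branch_target(rom, addr, op)
            if target is not None:
                worklist.append(target)
            if op in NO_FALLTHROUGH:
                break
            addr += LENGTHS[op]
    return visited
```

Any byte that is never reached this way is a candidate for the "cut out as binary" pile.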

It's supposed to produce a disassembly that can be reassembled with rgbasm.

The Game Boy doesn't understand the binary garbage either; the code tells the processor what to do with it and where to put it. Whatever the code tells the Game Boy, it also tells anyone else, so this should be very possible.

tcaudilllg

The way the GB uses banks, I don't think it should be that difficult. You might actually be able to get away with scanning the banks for invalid opcodes... that might be enough in a lot of cases (it would certainly work with Another Bible). If you scan a bank and find an invalid opcode, then it probably doesn't have any program data in it.
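Here is a sketch of the suggested scan in Python. It assumes 16 KiB banks and uses the eleven byte values that have no instruction on the Game Boy CPU. It only looks at raw bytes and does not decode instructions, so a hit does not prove a bank holds no code. In the test, the flagged byte is the operand of `ld a, $D3`, which is perfectly valid code. This is one reason the heuristic gets disputed later in the thread.

```python
# The eleven opcode bytes with no instruction on the Game Boy CPU.
UNDEFINED_OPCODES = {0xD3, 0xDB, 0xDD, 0xE3, 0xE4, 0xEB, 0xEC, 0xED, 0xF4, 0xFC, 0xFD}


def undefined_opcode_hits(rom: bytes, bank: int, bank_size: int = 0x4000) -> list:
    """File offsets in one bank whose raw byte is an undefined opcode.

    Raw byte scan only: operand bytes and data bytes are counted too.
    """
    start = bank * bank_size
    chunk = rom[start:start + bank_size]
    return [start + i for i, b in enumerate(chunk) if b in UNDEFINED_OPCODES]
```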

wiseguy12855

Actually, that wouldn't work with the games I chose. Pokemon Red/Blue are known for plopping data sections all throughout the ROM. They're known for their disorganization, and all the data is highly compressed, meaning there is a very large number of data sections. To add to the mess, Pokemon games have several other data types besides pictures, text, and music that are specific to Pokemon games. If you're the type of person who likes a tough challenge, then you'll see why I chose Pokemon Red/Blue specifically instead of the others, which were simplified.

I realize that the point of this discussion is a bit foggy, so let me clear it up, hopefully to get more involvement from the community.

I'm trying to make a very advanced tracer that traces every function call and jump, disassembles the opcodes, and breaks it up into function blocks. The leftover data is placed in a separate file to be linked in the exact same position as the original game. Each function block will also be linked into the exact same position. Comments will be placed for the obvious lines like changes to a Game Boy register. All variables will have their own file named by their starting position in hexadecimal.
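One way to keep leftover data "in the exact same position" when reassembling with rgbasm is to emit a section pinned to the original address and bank, which pulls the original bytes straight out of an unmodified ROM with `INCBIN`. Here is a minimal Python sketch. The `baserom.gbc` file name and the `Data_...` section naming (hex start position, as proposed above) are assumptions for illustration.

```python
def incbin_section(bank: int, start_addr: int, length: int, baserom: str = "baserom.gbc") -> str:
    """Emit an rgbasm section that re-inserts untouched ROM bytes at their original place."""
    if bank == 0:
        header = f'SECTION "Data_{start_addr:04X}", ROM0[${start_addr:04X}]'
        offset = start_addr  # bank 0 addresses equal file offsets
    else:
        header = (f'SECTION "Data_{bank:02X}_{start_addr:04X}", '
                  f'ROMX[${start_addr:04X}], BANK[${bank:02X}]')
        offset = bank * 0x4000 + (start_addr - 0x4000)  # 16 KiB switchable banks
    return f'{header}\nINCBIN "{baserom}", ${offset:X}, ${length:X}\n'
```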

The purpose of this project is to fill the need for one; there is no program like this as far as I know. In case you're wondering why you would ever need something like this: it greatly, if not admirably, simplifies ROM hacking. Basically everything is done for you; all you have to do is skim through the disassembly and make sense of it, with the added bonus that you have the full disassembly. No more spending days and weeks working out a memory map, how the sprites and graphics are generated, and where they are located.

Ideally I would like this to be a group project, but that's not the point of this discussion.

Tauwasser

Quote from: tcaudilllg on July 27, 2010, 11:05:39 PM
If you scan a bank and find an invalid opcode, then it probably doesn't have any program data in it.
Lolwut? Err, no. Please actually look at a GB game once in a while before giving out this kind of misinformation.

Quote from: wiseguy12855 on July 28, 2010, 01:19:04 AM
Actually, that wouldn't work with the games I chose. Pokemon Red/Blue are known for plopping data sections all throughout the ROM. They're known for their disorganization, and all the data is highly compressed, meaning there is a very large number of data sections. To add to the mess, Pokemon games have several other data types besides pictures, text, and music that are specific to Pokemon games. If you're the type of person who likes a tough challenge, then you'll see why I chose Pokemon Red/Blue specifically instead of the others, which were simplified.
While RBGY feature programming abomination after abomination, they are not particularly different from other games in their data management, meaning the data isn't scattered just to be scattered. It just seems the compiler/programmer did a really bad job of organizing things every once in a while.

Quote from: wiseguy12855 on July 28, 2010, 01:19:04 AM
I'm trying to make a very advanced tracer that traces every function call and jump, disassembles the opcodes, and breaks it up into function blocks.
So another IDA Pro?

Quote from: wiseguy12855 on July 28, 2010, 01:19:04 AM
The leftover data is placed in a separate file to be linked in the exact same position as the original game.
Won't work. Once you edit it and some banks cannot hold data anymore, what do you do? You will have to move data and update all references, some of which might be circumstantial base offset trickery with multiplication etc...

Quote from: wiseguy12855 on July 28, 2010, 01:19:04 AM
Comments will be placed for the obvious lines like changes to a Game Boy register.
Shouldn't it work the other way around? Place comments for things that aren't obvious?

Quote from: wiseguy12855 on July 28, 2010, 01:19:04 AM
All variables will have their own file named by their starting position in hexadecimal.
I think you mean variable name and not file name. Or are you going to extract RAM maps, too?

Quote from: wiseguy12855 on July 28, 2010, 01:19:04 AM
The purpose of this project is to fill the need for one; there is no program like this as far as I know. In case you're wondering why you would ever need something like this: it greatly, if not admirably, simplifies ROM hacking. Basically everything is done for you; all you have to do is skim through the disassembly and make sense of it, with the added bonus that you have the full disassembly. No more spending days and weeks working out a memory map, how the sprites and graphics are generated, and where they are located.
This still requires much effort on the part of the hacker. You won't find a program that will tell you the how and the why of most things. IMO you should strive to be very good at identifying all pointers etc., since those needn't be in any native format at all, nor need they be immediate values anywhere. Instead, the code can take an ID, multiply it by some factor, add an offset, flip some bits and voila, pointer ready. You will have to account for those things... or at least you should.
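To show why such pointers are hard to find statically, here are two ways a game might derive an address, sketched in Python. The first reads a little-endian pointer table. The second computes an address as base + index × record size, with no pointer stored anywhere in the ROM. The table layout and the 28-byte record size in the test are made up for illustration.

```python
def read_pointer(rom: bytes, table_offset: int, index: int) -> int:
    """Entry 'index' of a table of 16-bit little-endian pointers at a ROM file offset."""
    entry = table_offset + 2 * index
    return rom[entry] | (rom[entry + 1] << 8)


def computed_address(base: int, index: int, record_size: int) -> int:
    """An address that is never stored anywhere: the code derives it as base + index * size,
    so a static scan for immediate values will not find it."""
    return base + index * record_size
```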



So the basic thing you need now is something that traverses the init routine, marks all calls for traversal as well, and actually expands them before traversing the init code and subsequent code further... Doesn't sound that hard to me, actually. The pitfalls will clearly be stack operations, custom pointer mechanisms, pointers into data tables with added base offset shifts (so you might expand the same data block twice by mistake...) and maybe custom mapper functions [I'm currently working on PKMN Crystal's mobile phone code. A disassembler wouldn't know where those registers are set and might even think they're invalid. If that works, the compiler will surely optimize stuff that shan't be optimized...].
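One piece of bookkeeping helps with the "expand the same data block twice" pitfall. Keep every data range you discover as a half-open [start, end) interval, and merge overlapping or touching ones before emitting anything. Here is a minimal Python sketch, not taken from the thread, that assumes ranges are already expressed as ROM file offsets:

```python
def merge_ranges(ranges) -> list:
    """Merge overlapping or touching half-open [start, end) ranges so that a table
    reached through two different base offsets is only emitted once."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous range
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]
```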

cYa,

Tauwasser

wiseguy12855

As much as I'm for working on this project, I don't want to reinvent the wheel. I wasn't aware of IDA Pro; it looks very impressive but very expensive. Can it actually do everything I'm talking about above? I know it may not extract everything into separate files, but if it does this much, heck, I could do that myself. Testing the demo on a Windows application I made produced very impressive results.

tcaudilllg

I think you should offer a number of options. As Tauwasser argues, one size does not fit all. You might want to aim for a "toolkit" approach that offers a number of style-specific modifications. For example, if you do an invalid opcode scan on the ROM for Another Bible, you'll probably manage to distinguish the code banks from the non-code banks. Or you might want to separate the banks that have invalid opcodes from those that have none, and just work from there. But as mentioned, this wouldn't work with the Pokemon ROM, so you'd want to use a different technique in that case.

Community research could, over time, lead to a build-up of information about which modification strategies work for which ROMs.

phire

IDA can only really extract code; it's next to useless for data.

I've been exploring the Pokemon Red rom as well, and I've had a few of the same thoughts as you.
One of the reasons data and code are so intermixed in Pokemon Red is the dialog event interpreter.

It's very limited and is always falling back to assembly code to do its work. It can't even pop up the dialog box at the bottom of the screen without falling back to assembly. Assembly code blocks are prefixed with a 0x08 byte and always must load the value of the next opcode into hl before jumping to 247d.

Because these assembly blocks are accessed dynamically, there is no way to get their addresses automatically, short of emulating the ROM and playing through the entire game while catching every single jp hl instruction.
To make sense of it all you will have to decode every single script event, and to get all the script events you need to work out how to extract them from the map data.
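If you do go the emulator route, the hook itself can be small. Here is a Python sketch that assumes a hypothetical trace format of `(rom_bank, pc, opcode, hl)` tuples logged for every executed instruction. `0xE9` is the `jp hl` opcode. Destinations below 4000 are always in bank 0, while destinations in 4000-7FFF belong to whatever bank was mapped at the time. This sketch skips targets in RAM.

```python
JP_HL = 0xE9  # opcode of 'jp hl'


def collect_jp_hl_targets(trace) -> set:
    """trace: iterable of (rom_bank, pc, opcode, hl) tuples from an emulator hook.
    Returns the set of (bank, address) ROM destinations reached through 'jp hl'."""
    targets = set()
    for bank, pc, opcode, hl in trace:
        if opcode != JP_HL:
            continue
        if hl >= 0x8000:
            continue  # code running from RAM needs separate handling
        targets.add((0, hl) if hl < 0x4000 else (bank, hl))
    return targets
```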

So once you finish, you will have a program that can extract the Pokemon Red ROM, and possibly the Pokemon Blue and Green ROMs too, but your program will need to be adapted before it can decode another game. Still, I think it's a worthy goal, just for the Pokemon ROM. Any chance you want to collaborate?

wiseguy12855

Sorry to revive this dead thread, but this is a continuation of this topic rather than a new one, so all the posts in it are relevant.

Also, phire, I know it's been a while but I'd be interested in collaborating if you haven't changed your mind.

====

Basically, I trashed the project and am starting over with a clean slate, since it had gotten quite messy. I'm still faced with the tracing problem: the biggest difficulty is determining where code blocks start and end, and where data blocks start and end.

On paper, I proposed treating every function call and jump as the start of a new code block, since code would never jump into a data block. However, due to heavy optimization and compression, code blocks sometimes jump into the middle of another code block, or even of the same code block, so this method won't work.
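A common fix for jumps into the middle of a block is to stop deciding block boundaries while tracing and to compute them afterwards from "leaders". A leader is every entry point, every known branch target, and every instruction that follows a jump or return. A jump into the middle of existing code then simply splits it. Here is a minimal Python sketch that assumes the tracer has already produced decoded instructions; the `Insn` record is hypothetical. Calls mark their target as a leader but do not end the current block, since execution returns to the next instruction.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    addr: int
    length: int
    targets: tuple = ()       # statically known jump/call destinations
    ends_block: bool = False  # True for jp/jr (conditional or not), ret, reti


def basic_blocks(insns, entries) -> list:
    """Split decoded instructions into half-open [start, end) basic blocks."""
    by_addr = {i.addr: i for i in insns}
    leaders = set(entries)
    for i in insns:
        leaders.update(i.targets)              # every branch/call target starts a block
        if i.ends_block:
            leaders.add(i.addr + i.length)     # so does the instruction after a jump/ret
    blocks = []
    for start in sorted(a for a in leaders if a in by_addr):
        end = start
        while True:
            insn = by_addr[end]
            end += insn.length
            if insn.ends_block or end in leaders or end not in by_addr:
                break
        blocks.append((start, end))
    return blocks
```

In the test, the `jr nz` at 0005 jumps back to 0002, so the run 0000-0007 is split into 0000-0002 and 0002-0007 instead of producing overlapping blocks.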

This is my biggest hold-up: recursively scanning through tons of jumps and calls, then piecing together the bigger picture with all the code separated from all the data, and, while at it, determining the start and end of each code block.

As for determining the purpose of Pokemon Red's data blocks, my philosophy (stated very vaguely before, but now more clearly) is that the disassembler can only know as much as the Game Boy CPU. To the Game Boy, data blocks are gibberish as well. The code takes gibberish data from various parts of the game and tells the Game Boy CPU what to do with it. Whatever the code tells the Game Boy, it also tells the disassembler. If one code block takes a bunch of numbers from some part of the game and starts loading them into video memory, then, even if they're compressed in a custom way, it's pretty obvious that piece of data is graphics, and the code explains how the compression and decompression are done. Therefore, with a little logic, it's possible to decipher many data blocks.
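Here is a rough Python sketch of that idea. It assumes the tracer or an emulator can report `(rom_offset, destination_address)` pairs whenever code copies bytes out of ROM. The labels come from the destination region in the Game Boy memory map. For compressed data, the destination only tells you what the decompressed result is, not where the compressed source ends.

```python
def classify_destination(addr: int) -> str:
    """Guess what a block of data is from where the code writes it (GB memory map)."""
    if 0x8000 <= addr <= 0x97FF:
        return "tile graphics"
    if 0x9800 <= addr <= 0x9FFF:
        return "tile map"
    if 0xFE00 <= addr <= 0xFE9F:
        return "sprite attributes (OAM)"
    if 0xFF30 <= addr <= 0xFF3F:
        return "wave pattern"
    if 0xFF10 <= addr <= 0xFF26:
        return "sound register values"
    return "unknown"


def label_sources(copies) -> dict:
    """copies: iterable of (rom_offset, destination_address) pairs seen while tracing."""
    return {src: classify_destination(dst) for src, dst in copies}
```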

Can anybody help me with separating code from data (which seems the hardest part), before performing some basic logic on the data blocks as mentioned above?

January 04, 2012, 03:24:21 AM - (Auto Merged - Double Posts are not allowed before 7 days.)

I hate to double post, but I've started a Google Document where I'm gathering all the plans in one place, if anyone wants to look or make a comment: https://docs.google.com/document/d/1H0dNVn3pPp3w5tMH73RfGCl8k3it1n7WRCdo1qlKdLw/comment

phire

I'm not the most reliable person to collaborate with at the moment, but I will at least try.
I've made some comments on your Google Doc (you have already seen them, but I'm mentioning it for the benefit of anyone else reading through this).

Last year, I experimented with making an interactive disassembler/decompiler. So far it only supports one architecture, but usefully for this case, that architecture just so happens to be Game Boy-flavored Z80.

The idea is to describe each processor in such a way that the core doesn't need to know anything about any processor. It doesn't need to know how the call instruction works on each CPU; it just gets a mathematical description of what the instruction does to each register (including the program counter) and works it out from there.
It is very much a prototype and doesn't run on my computer at the moment, but when it was working it could follow jumps and calls and could work out which conditional jump would be taken based on the starting state of the gameboy hardware. But returns confused it, because it didn't know wtf a stack was.
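To make `ret` resolvable, the description of `call` and `ret` has to cover SP and memory, not just the program counter. The following is not phire's description format, which is in the gist linked below. It is a minimal imperative Python sketch of the state a tracer would need to track, and it handles only `nop`, `call a16` and `ret`. On the Game Boy, `call` pushes the return address high byte first, so the low byte ends up at the lower address.

```python
def step(state: dict, mem: bytearray, rom: bytes) -> None:
    """Execute one instruction from a tiny subset, updating pc, sp and the stack in mem."""
    pc = state["pc"]
    op = rom[pc]
    if op == 0x00:                      # nop
        state["pc"] = pc + 1
    elif op == 0xCD:                    # call a16: push return address, then jump
        ret_addr = pc + 3
        sp = (state["sp"] - 2) & 0xFFFF
        mem[sp] = ret_addr & 0xFF       # low byte at the lower address
        mem[sp + 1] = ret_addr >> 8     # high byte (pushed first) above it
        state["sp"] = sp
        state["pc"] = rom[pc + 1] | (rom[pc + 2] << 8)
    elif op == 0xC9:                    # ret: pop the return address into pc
        sp = state["sp"]
        state["pc"] = mem[sp] | (mem[sp + 1] << 8)
        state["sp"] = (sp + 2) & 0xFFFF
    else:
        raise NotImplementedError(f"opcode {op:#04x} not modelled")
```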

Here is the description of the z80 processor: https://gist.github.com/1563516
I'll get the rest running later, and maybe teach it about the stack.


Nagato

Deleted.

phire

Quote from: Nagato on January 05, 2012, 06:24:20 PM
I suggest studying the halting problem long and hard before going any further.

The halting problem is a software engineering vs. computer science issue.

The computer scientist knows that it is impossible, so he saves a lot of time by not even starting.

The software engineer doesn't care if it's impossible to write a program that is always correct (when have you ever seen a program free of bugs?) and writes a program that provides the correct answer 90% of the time. The other 10% of the time it will hopefully bail with an error message. With enough time, each deficiency will be patched with another algorithm or workaround until the program works for all commonly supplied inputs.
Sure, there might be some cleverly handcrafted input that causes an issue, but the software engineer doesn't care because it works 99.9% of the time, and that's good enough.

In the end, the computer scientist was right but the software engineer ended up with an awesome program that works well enough.

Nagato

Deleted.

IIMarckus

I don't know if you're interested in Pokémon specifically or just using it as an example, but I and a few others have been disassembling Pokémon Red. So if you want to contribute, or use it for some examples of Game Boy data/code, feel free.

FAST6191

Others have already mentioned the halting problem and the fun maths related to it. I have not spent enough time pulling apart GB/GBC ROMs to know, but I would also wonder whether GBC games ever did any in-game opcode modification (I do not want to use the term dynamic recompilation, but I will at least say it). If there is one thing I have learned about 16-bit and older consoles, it is that memory is as tight as you like; if there are two, the second is that games on these systems do love their little tricks.

Back on topic, so to speak: what I would point you at instead is something we saw on the DS via tools like no$gba and crystaltile2 (although it stems from the NitroSDK itself), namely a kind of IO abstraction/commenting and function mapping tool (NEF files would be a fair starting point for a search, although http://gbatemp.net/topic/73394-gbatemp-rom-hacking-documentation-project-wip/page__view__findpost__p__2782288 has something). From learning ASM both in general programming and for ROM hacking, seeing the better guides to it (got to love HLA / http://homepage.mac.com/randyhyde/webster.cs.ucr.edu/index.html ) and watching others do the same, I find it is usually not very hard to teach people the basics of opcode construction. If you were to, say, find the subtract-lives routine and decide to change it into an add-lives routine, that would be absolutely fine too. For another example, consider the GBA: the first line of the ROM is a jump to the binary, but after that we look for the first reference back to the ROM location (usually 08XXXXXX), as that marks the end of the IO setup, stack pointer setup and so on.

I also want to link http://www.youtube.com/watch?v=_MBWTqfCFd0 as it might well be relevant to what you are trying to do.

For my money this would be nearly as useful and spares you running up against the likes of the halting problem.

Whipon

Quote from: Hulta on April 01, 2022, 12:39:45 PM
Is anyone else playing GB/GBC games in 2022 who could suggest a good emulator for Win7, or maybe an emulator that supports all kinds of retro console games?

I don't know any GB/GBC emulators, but have you tried RetroArch?
It supports dozens of systems.
The only downside is that it might be a bit tricky to configure.

Regards.

Hulta

Quote from: Whipon on April 01, 2022, 03:06:38 PM
I don't know any GB/GBC emulators, but have you tried RetroArch?
It supports dozens of systems.
The only downside is that it might be a bit tricky to configure.

Regards.
I read about RetroArch, a universal emulator that supports all kinds of retro consoles, but I didn't understand the part about needing some codes or cores for the emulator to work with different types of games. For example, do I need one patch for GBC ROMs and a different code/patch for GB ROMs?

Whipon

Quote from: Hulta on April 02, 2022, 07:52:31 AM
I read about RetroArch, a universal emulator that supports all kinds of retro consoles, but I didn't understand the part about needing some codes or cores for the emulator to work with different types of games. For example, do I need one patch for GBC ROMs and a different code/patch for GB ROMs?

You are correct; it's basically a frontend. You need the required core for each system.

Here's a tutorial in English:

https://www.fantasyanime.com/emuhelp/retroarch-windows

Good luck ;).

TiCo.KH

Try Ghidra.
I'm sure it has a base Z80 processor module; you just need to update it with the GB/GBC-specific instructions, and even the decompiler will work OK.
Check out the HuC6280 module I posted in Personal Project.

It was a bit confusing at first how SLEIGH (Ghidra's own processor description language) works. But after 2 months the module compiled, and it was worth every day I spent on it. It sped up my PC-Engine translation reverse-engineering tasks about 10x.