I've always had this special nostalgia for the original Zelda Link's Awakening and after making an emulator few years ago, I wanted to go further.
So I started this project to automate ROM reverse engineering.
Here is a screenshot of the setup. Code at top of page is with all optimizations, code below is without most.
As you can see, loops are reconstructed and call parameters are annotated next to call instructions.
The rightmost image is the current state of Zelda analysis. Green means known code, blue is obvious data (text and sprites) and dark blue is 0xFF padding.
- static decompilation (no emulation/tracing, all code paths treated equally)
- expression reconstruction - sequences of instructions are replaced with c-style assignments.
For example: LD A, 0x30; LD [0xFF00], A is replaced with [0xFF00] = 0x30. This also works across jump boundaries.
- register lifetime analysis - this enables two things: unneeded flag calculations are stripped away and also I get all ins and outs of a procedure, which helps analysis at global scope.
- bank switching analysis - direct consequence of the above. I treat rombank as just another register and can recover ambiguous calls this way. In Zelda, bank switching occurs very often, so I get nice results from this. For 2000 procedures analysed (that's much, but I got this number with aggressive procedure splitting) only around 15 contain calls that couldn't be followed.
- control structure recovery - probably the coolest. Finds loops and forks and uses some heuristics to generate nice if/else and do/while. Some gotos remain, but usually the code is nice.
- switch statement/jumptables - Zelda really abuses this construct - I found more than 200 jumptables. Heuristics decide on jumptable sizes, sometimes giving false positives, but overall it is helpful.
- procedure references - For each procedure I get a list of callers. This can also output call graph to GraphViz, but at the moment it is too big to be useful.
- data references - most recent feature, currently works only for fixed addresses. For each global variable I get a list of procedures that access it. Ultimate helper. Refs to 0xFF00? Yeah, here must be the joypad routine.
- written in Python, GUI is served as HTML. Allows assigning names to procedures, etc.
- output is not really C at the moment, but gives an idea of what's going on in the code.
- recognize array indexing
- obtain full code coverage with no more ambiguous calls. I believe I've got around 90% of procedures now.
- find a way to cope with so many small procedures.
- a lot of other work...
I would like to release it one day, but it is very custom right now and requires coding Python to do stuff. Maybe when I improve the algorithms I can drop some hardcoded Zelda stuff and it will be useful for other Roms too.
Also, source: https://github.com/devdri/awake