ROM Decompiler - how to decode NES ROM files?

Started by addemup, June 01, 2015, 08:33:20 PM

addemup

Hello!  :)
I am working on a Super Mario Bros. decompiler, and at the moment the output text file shows nothing but garbled UTF-8 characters.

Here is my code:


public class Decompiler {

   public void decompile() throws IOException {
      StringBuilder sb = new StringBuilder(2 ^ 36 - 5);
      File tstFl = new File(Main.textField_1.getText() + "/test.txt");
      PrintWriter pw = new PrintWriter(tstFl);
      BufferedReader br = new BufferedReader(new FileReader(Main.romFile));
      int c = 0;
      while((c = br.read()) != -1) {
         sb.append((char) c);
      }
      pw.println(sb.toString());
      br.close();
      pw.close();
   }

}


Thanks in advance! :)

STARWIN

what do you want the output text file to contain?

Disch

Errr... you're not actually disassembling anything.  You're just taking the data from the ROM and putting it in a text file.  You'd get the same effect by opening the ROM in Notepad.

Assuming you are working with assembly code in the ROM, each instruction starts with a one-byte "opcode" that identifies the specific instruction, followed by 0-2 bytes of parameters for that instruction.

For example, the opcode for LDA immediate is $A9, and the opcode for STA absolute is $8D.

So if you were to come across the following code in the ROM:
A9 00 8D 15 40

That would disassemble to:

LDA #$00
STA $4015


A full list of opcodes can be found here:
http://www.obelisk.demon.co.uk/6502/reference.html

henke37

Oh, and there is a tiny problem with just translating data to assembly blindly: not all data is instructions. You are going to have to figure out which data is code, which is graphics and which is "other" data.

omega_rugal

Also you must strip the iNES header... and split the PRG and CHR data...
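Stripping the header and splitting PRG from CHR could look roughly like the sketch below. It assumes a plain iNES 1.0 file: a 16-byte header starting with "NES" and $1A, byte 4 giving the PRG size in 16 KiB units, byte 5 giving the CHR size in 8 KiB units, and bit 2 of byte 6 flagging an optional 512-byte trainer ahead of the PRG data. The class and field names (InesRom, prg, chr) are made up for illustration, and NES 2.0 size extensions are not handled.

```java
import java.util.Arrays;

class InesRom {
    static final int HEADER_SIZE = 16;
    static final int TRAINER_SIZE = 512;

    final byte[] prg; // program code and data
    final byte[] chr; // tile graphics (empty if the cart uses CHR RAM)

    private InesRom(byte[] prg, byte[] chr) {
        this.prg = prg;
        this.chr = chr;
    }

    // Splits a whole .nes file (already read as raw bytes) into PRG and CHR.
    static InesRom parse(byte[] file) {
        if (file.length < HEADER_SIZE
                || file[0] != 'N' || file[1] != 'E' || file[2] != 'S' || file[3] != 0x1A) {
            throw new IllegalArgumentException("not an iNES file");
        }
        int prgSize = (file[4] & 0xFF) * 16 * 1024;
        int chrSize = (file[5] & 0xFF) * 8 * 1024;
        boolean hasTrainer = (file[6] & 0x04) != 0;

        int prgStart = HEADER_SIZE + (hasTrainer ? TRAINER_SIZE : 0);
        int chrStart = prgStart + prgSize;
        if (file.length < chrStart + chrSize) {
            throw new IllegalArgumentException("file is shorter than the header claims");
        }
        return new InesRom(
                Arrays.copyOfRange(file, prgStart, chrStart),
                Arrays.copyOfRange(file, chrStart, chrStart + chrSize));
    }
}
```

Only the prg array should be fed to a disassembler; chr holds tile data and would disassemble to nonsense.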

addemup

Ok, so I'm successfully getting 6502 opcodes from output.

However, some of the opcodes I'm getting are only one byte instead of two.

Is this normal? Or is it not even PRG data?

Code:
public class Decompiler {
   
   public void decompile() throws IOException {
      try {
         BufferedReader br = new BufferedReader(new FileReader(Main.romFile));
         for(int read = 0; read < 10; read++) {
            System.out.print(br.read() + " ");
         }
      } catch (FileNotFoundException e) {
         e.printStackTrace();
      }
   }

}
Output:
78 69 83 26 2 1 1 0 0 0

Disch

FileReader/BufferedReader are intended for reading text files, and therefore they are going to mangle the data and give you incorrect results.  You do not want to use them for reading a ROM.  You want to use something designed for reading binary files.  I'm not familiar enough with Java to know exactly which class to use, but maybe something like BinaryFileReader or something?
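In Java, raw bytes can be read with java.nio.file.Files.readAllBytes (or a FileInputStream) instead of a Reader. A minimal sketch; the class and helper names (RomBytes, hexDump) are made up for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class RomBytes {
    // Reads the whole file as raw bytes; no character decoding is applied.
    static byte[] readAll(Path romPath) throws IOException {
        return Files.readAllBytes(romPath);
    }

    // Java bytes are signed (-128..127); masking with 0xFF gives 0..255.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    // Formats up to `count` leading bytes as two-digit hex, space separated.
    static String hexDump(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, data.length); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", unsigned(data[i])));
        }
        return sb.toString();
    }
}
```

Printing in hex makes the dump easier to compare against a hex editor. Incidentally, the ten values printed above (78 69 83 26 ...) are 4E 45 53 1A in hex, that is "NES" followed by $1A: the start of the iNES header, not PRG code.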

Quote
However, some of the opcodes I'm getting are only one byte instead of two.

Every opcode is exactly 1 byte.  Always.
Each instruction consists of an opcode byte, followed by 0-2 additional bytes of data for the operand.  How many bytes follow the opcode depends on the opcode itself.

For a disassembler, you'll want to read 1 byte as the opcode, then check which opcode it is to see how many bytes are supposed to follow it.

For example, if you are given the bytes below:


A9 00 18 6D 31 03 85 80


It should disassemble to the below:


LDA #$00    ; A9 00     'A9' opcode is "LDA immediate".. so 1 byte follows the opcode (2 bytes total)
CLC         ; 18        '18' opcode is "CLC implied"..   so 0 bytes after opcode (1 byte total)
ADC $0331   ; 6D 31 03  '6D' opcode is "ADC absolute"..  so 2 bytes following (3 total)
STA $80     ; 85 80     '85' opcode is "STA zero page".. so 1 byte following (2 total)
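The read-opcode, look-up-length, read-operand loop described above can be sketched in Java as follows. This is only an illustration: the table holds just the five opcodes mentioned in this thread (a real one needs every 6502 opcode from the reference), the names MiniDisassembler, Mode and Op are made up, and the choice to emit unknown or truncated bytes as .byte lines is an assumption, not something from the posts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MiniDisassembler {
    // Addressing mode decides how many operand bytes follow the opcode.
    enum Mode {
        IMPLIED(0), IMMEDIATE(1), ZERO_PAGE(1), ABSOLUTE(2);
        final int operandBytes;
        Mode(int operandBytes) { this.operandBytes = operandBytes; }
    }

    static final class Op {
        final String mnemonic;
        final Mode mode;
        Op(String mnemonic, Mode mode) { this.mnemonic = mnemonic; this.mode = mode; }
    }

    // Tiny opcode table: only the opcodes used in the examples above.
    static final Map<Integer, Op> OPCODES = new HashMap<>();
    static {
        OPCODES.put(0xA9, new Op("LDA", Mode.IMMEDIATE));
        OPCODES.put(0x18, new Op("CLC", Mode.IMPLIED));
        OPCODES.put(0x6D, new Op("ADC", Mode.ABSOLUTE));
        OPCODES.put(0x85, new Op("STA", Mode.ZERO_PAGE));
        OPCODES.put(0x8D, new Op("STA", Mode.ABSOLUTE));
    }

    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            Op op = OPCODES.get(opcode);
            // Unknown opcode, or operand runs past the end: emit the raw byte.
            if (op == null || pc + op.mode.operandBytes >= code.length) {
                lines.add(String.format(".byte $%02X", opcode));
                pc++;
                continue;
            }
            int lo = op.mode.operandBytes >= 1 ? code[pc + 1] & 0xFF : 0;
            int hi = op.mode.operandBytes == 2 ? code[pc + 2] & 0xFF : 0;
            switch (op.mode) {
                case IMPLIED:   lines.add(op.mnemonic); break;
                case IMMEDIATE: lines.add(String.format("%s #$%02X", op.mnemonic, lo)); break;
                case ZERO_PAGE: lines.add(String.format("%s $%02X", op.mnemonic, lo)); break;
                // 16-bit operands are stored low byte first (little-endian).
                case ABSOLUTE:  lines.add(String.format("%s $%04X", op.mnemonic, (hi << 8) | lo)); break;
            }
            pc += 1 + op.mode.operandBytes;
        }
        return lines;
    }
}
```

Passing the eight bytes A9 00 18 6D 31 03 85 80 to disassemble should give back the four lines listed above, without the comments.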


A reference for all opcodes and what they mean can be found here:
http://www.obelisk.demon.co.uk/6502/reference.html


To make sure you are reading bytes properly from the file -- get a hex editor and look at the file.  A hex editor allows you to view every single byte of the file in raw binary form.  It's invaluable for doing any kind of binary work because you can see the exact layout and know exactly what data you're dealing with.

HxD is my favorite free hex editor:
http://mh-nexus.de/en/hxd/

Again -- when programming, you do not want to read the file as text.  You want to read it as binary data.

snarfblam

I feel the need to point out that if you don't understand the following (among other things)...

  • How to write meaningful 6502 ASM
  • How an assembler converts this to machine code
  • How to manually convert machine code back into ASM

...you will never be able to write any kind of software that reverse engineers code. It's like trying to drive without knowing how to shift. You'll never get anywhere.

Disch

I don't know.  I think writing a disassembler is a good way to learn those kind of things.

The whole idea of educational projects is to attempt something that's out of your reach, and learn as you go along.  Hell, the way I learned 6502 was by writing an NSF player.

oziphantom

What does the NES memory map look like? Is it like SNES LoROM: 8000+ = ROM, < 2000 = RAM, and the rest is registers?

Basically you want Regenerator. There is a C64 version (which will do Atari 8-bit and Apple ][ with some fancy bin manipulation) which will handle the base ROM fine as long as you clip the area it decodes, but it will only handle 1 bank of 64K data. It doesn't do N banks of data from X to Y. My Super Regenerator handles SNES LoROM, which means it pulls all the data from the ROM as if it is at 8000-FFFF, deals with banks, and puts the labels, pointers, etc. with respect to banks. The 65816 is binary compatible with the 6502, so it will read the opcodes correctly, but the NES doesn't have a 24-bit address space, so it might not work so well off the bat. But a NES version could be made.

addemup

Quote from: oziphantom on August 14, 2015, 05:19:29 AM
Basically you want Regenerator.

Apparently, you're getting what I'm going for.

My end result is going to be something similar to Earthbound's CoilSnake Hacking program that outputs the code as .yml and .png formats.

snarfblam

Quote from: Disch on August 13, 2015, 06:56:20 PM
I don't know.  I think writing a disassembler is a good way to learn those kind of things.

That's a good point. Maybe there's a better way to word what I'm trying to get at. You have to be willing to learn all those things. If you've successfully written a disassembler, then you've advanced to the point where you have a very solid foundation for understanding assembly. You've got to do the work to get there.

oziphantom

You do realise that it can't be done generically for every ROM, right? The best you can do generically is make an ASM file and a bunch of .byte <hex values> as output. Graphics to PNG could be done: you could let people mark which parts of the ROM are in which tile format and then export them, which would be fine as long as there is no compression on the tile data.

Which ROM/game are you thinking of making it for?
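For the ".byte <hex values>" part, a small Java sketch of how a range of bytes could be dumped as data directives. The helper name ByteDirectives.format is made up, and the exact directive spelling depends on the target assembler (some use .db instead of .byte), so treat the output format as an assumption.

```java
class ByteDirectives {
    // Formats data[start..end) as ".byte $xx, $yy" lines, perLine values each.
    static String format(byte[] data, int start, int end, int perLine) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; i++) {
            int column = (i - start) % perLine;
            if (column == 0) {
                if (i > start) sb.append('\n');
                sb.append(".byte ");
            } else {
                sb.append(", ");
            }
            // Mask to 0..255 because Java bytes are signed.
            sb.append(String.format("$%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }
}
```

A tool could call this for every region the user has not marked as code or graphics, so the generated ASM file still reassembles to the original bytes.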

Gemini

Just use IDA with Python scripts to fix stuff around.

addemup


Disch


addemup

Quote from: Disch on August 16, 2015, 06:00:23 PM
Just FYI -- SMB has already been fully disassembled:

Darn... So, I guess I can't make something like CoilSnake, but for SMB1?

Disch

No, you still can.  I thought you were trying to make an intelligent game-specific disassembler -- but it appears I misunderstood.

At any rate, you can use the disassembly as a reference.