Topic: How do I use this disassembled code in an assembler?  (Read 13588 times)

Videogamer555

  • Jr. Member
  • Posts: 24
How do I use this disassembled code in an assembler?
« on: March 22, 2011, 08:41:45 pm »
I used DisPel to disassemble a ROM and got some code. Here's a snippet of the disassembled output:

Code:
C0/0000: 0707    ORA [$07]
C0/0002: 0F091F12 ORA $121F09
C0/0006: 3F32527F AND $7F5232,X
C0/000A: 5F501614 EOR $141650,X
C0/000E: 7F7D0707 ADC $07077D,X
C0/0012: 090912  ORA #$1209
C0/0015: 1232    ORA ($32)
C0/0017: 3272    AND ($72)
C0/0019: 7259    ADC ($59)
C0/001B: 501D    BVC $003A

Notice a problem? Each line lists the address, then the raw machine code, and only then the actual instructions. Sadly, that's completely unusable as-is. I need something where ONLY the opcodes are visible, so that I can run it back through a compiler and get the EXACT rom back out of it. A disassembled ROM that is then reassembled should be 100% identical to the original, and should therefore play in an emulator with 100% of the original's functionality. How do I systematically get rid of these stupid addresses and raw machine-code bytes on EVERY LINE of the output, so I have code that can go straight back through an assembler and produce the same ROM file I started with?

Gideon Zhi

  • Discord Staff
  • Hero Member
  • Posts: 3532
    • Aeon Genesis
Re: How do I use this disassembled code in an assembler?
« Reply #1 on: March 22, 2011, 08:53:28 pm »
Well, you could use a text editor like Textpad that has a block select mode, which would make it easy to just select and delete a few columns from your output. But that's not your only problem - that code you've pasted very likely isn't proper code from whatever game you're trying to disassemble.
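A minimal Python sketch of the column-stripping step, as an alternative to block select. It assumes every line follows the layout of the DisPel snippet above (`BB/AAAA:`, the raw hex bytes, then the instruction); other tools or DisPel options may lay lines out differently, so treat the regular expression as an assumption. It only strips text: it cannot make the result reassemble into the original ROM.

```python
import re

# One DisPel-style line, e.g. "C0/0002: 0F091F12 ORA $121F09":
# bank/address, colon, raw hex bytes, then the instruction text.
LINE = re.compile(r"^[0-9A-F]{2}/[0-9A-F]{4}:\s+[0-9A-F]+\s+(.*)$")


def strip_columns(listing: str) -> list[str]:
    """Keep only the instruction column of a DisPel-style listing."""
    out = []
    for line in listing.splitlines():
        match = LINE.match(line.strip())
        if match:  # lines that don't fit the layout are dropped
            out.append(match.group(1).rstrip())
    return out


sample = """\
C0/0000: 0707    ORA [$07]
C0/0002: 0F091F12 ORA $121F09
C0/0012: 090912  ORA #$1209"""

for instruction in strip_columns(sample):
    print(instruction)
```

If a byte-identical rebuild is the goal, the hex column is the part worth keeping rather than the mnemonic column: bytes emitted verbatim through an assembler's data directive reassemble to themselves, whether they were code or data.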

Ryusui

  • Hero Member
  • Posts: 4989
  • It's the greatest day.
    • Tumblr
Re: How do I use this disassembled code in an assembler?
« Reply #2 on: March 22, 2011, 09:02:34 pm »
Yeah, it's just junk.
In the event of a firestorm, the salad bar will remain open.

KingMike

  • Forum Moderator
  • Hero Member
  • Posts: 7069
  • *sigh* A changed avatar. Big deal.
Re: How do I use this disassembled code in an assembler?
« Reply #3 on: March 22, 2011, 09:06:39 pm »
Quote from: Videogamer555
I need something where ONLY the opcodes are visible, so that I can run it back through a compiler and get the EXACT rom back out of it.
The simple answer: you can't.
Disassemblers aren't smart enough to tell code from data, so they'll try to turn the data into junk opcodes.
There have been other topics explaining why building a disassembler that IS smart enough is practically impossible.

Disassembly is only useful if you have a small piece of the ROM you know is code (like a subroutine) and you don't want to have to trace every possible branch in an emulator.
"My watch says 30 chickens" Google, 2018

Videogamer555

Re: How do I use this disassembled code in an assembler?
« Reply #4 on: March 22, 2011, 09:46:52 pm »
Quote from: KingMike
The simple answer: you can't.
Disassemblers aren't smart enough to tell code from data, so they'll try to turn the data into junk opcodes...

I want to generate the full disassembly of a game (let's say Super Mario World, so I can hack some cheats into it), then reassemble the ROM so I can play it on an emulator. How do I do that? Say I want to disassemble SMW starting at $103C, but I don't know whether $103C is RIGHT IN THE MIDDLE OF AN INSTRUCTION or at the border BETWEEN two instructions. How do I tell which it is? If I insert my own code IN THE MIDDLE OF AN INSTRUCTION, I'm going to break the ROM. If I insert it BETWEEN two instructions, I'm fine, EXCEPT that absolute JMPs may no longer point to the right place (a relative branch would NOT have this problem, as long as it doesn't cross the insertion point).
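To see why the starting point matters, here is a toy linear decoder in Python that knows only five 65816 opcodes ($20 JSR absolute, $21 AND (dp,X), $60 RTS, $8D STA absolute, $A9 LDA immediate); a real disassembler needs the full table plus tracking of the M and X flags. The same six bytes decode into completely different instructions depending on whether you start at offset 0 or offset 1, and even the length of LDA # depends on the accumulator width (the M flag) at run time, which the bytes alone don't reveal.

```python
# A deliberately tiny, hypothetical subset of the 65816 opcode table:
# opcode -> (mnemonic, operand template, operand size in bytes).
# A size of None means "1 byte if the accumulator is 8-bit (M=1), else 2".
OPCODES = {
    0x20: ("JSR", "{}", 2),      # JSR absolute
    0x21: ("AND", "({},X)", 1),  # AND (direct page,X)
    0x60: ("RTS", "", 0),        # return from subroutine
    0x8D: ("STA", "{}", 2),      # STA absolute
    0xA9: ("LDA", "#{}", None),  # LDA immediate
}


def decode(code: bytes, start: int, m8: bool = True) -> list[str]:
    """Decode `code` linearly from offset `start`, one instruction after another."""
    out = []
    pc = start
    while pc < len(code):
        op = code[pc]
        if op not in OPCODES:
            out.append(f"??? ${op:02X}")  # not in this toy table
            pc += 1
            continue
        name, template, size = OPCODES[op]
        if size is None:
            size = 1 if m8 else 2
        operand = code[pc + 1 : pc + 1 + size]
        if len(operand) < size:
            out.append(f"{name} <truncated>")
            break
        value = int.from_bytes(operand, "little")  # operands are little-endian
        text = template.format(f"${value:0{size * 2}X}") if size else ""
        out.append(f"{name} {text}".rstrip())
        pc += 1 + size
    return out


code = bytes([0xA9, 0x20, 0x8D, 0x00, 0x21, 0x60])
print(decode(code, 0))            # ['LDA #$20', 'STA $2100', 'RTS']
print(decode(code, 1))            # ['JSR $008D', 'AND ($60,X)']
print(decode(code, 0, m8=False))  # ['LDA #$8D20', '??? $00', 'AND ($60,X)']
```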


As for disassembling the full code, that should be easy.
One clue for telling code from data is this. According to some info I found online, ROM can be accessed through these memory locations:
$8000-$FFFF                   ROM    Data that is mapped here depends on the cartridge.
Banks $40-$7D, $0000-$FFFF    ROM    Data that is mapped here depends on the cartridge.

In other words, raw data present in the ROM can be read from here.
So if code earlier in the ROM reads from these locations, those locations are being used for data, NOT code. The ROM regions used as data can then be mapped out and stored in a separate data file (which WILL be needed at reassembly, so that the game's data can be reinserted), and therefore NOT mistaken for code. Let's say the disassembler read this from an earlier point in the ROM:

    LDX #$00            ; X = counter, starts at 0 (assumes 8-bit index registers)
MyLoop:
    LDA $8000,X         ; read the ROM byte at $8000 + counter
    STA $0001,X         ; copy it to RAM at $0001 + counter
    INX                 ; counter = counter + 1
    CPX #$FF            ; stop once $FF bytes ($8000-$80FE) have been copied
    BNE MyLoop
SomewhereOutsideThisLoop:
    ; more ASM code down here...


then the disassembler would know, from the fact that memory locations $8000 to $80FE were read, that ROM offsets $0000 to $00FE contain GAME DATA and NOT program code, so that region of the ROM should be sent to a separate data file and IGNORED by the disassembler.

It's as simple as that.
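A minimal Python sketch of the bookkeeping this idea needs, assuming LoROM mapping (each bank's $8000-$FFFF maps to a 32KB chunk of a headerless ROM file) and assuming you already have the list of ROM addresses the code reads. That second assumption is the hard part: a loop like the one above computes its addresses at run time, so a static disassembler would have to simulate the code, or an emulator would have to log the reads, to produce this list.

```python
def lorom_to_offset(bank: int, addr: int) -> int:
    """Map a LoROM CPU address (bank, $8000-$FFFF) to a file offset.

    Assumes a headerless ROM (no 512-byte copier header)."""
    if addr < 0x8000:
        raise ValueError("only $8000-$FFFF is ROM in this LoROM sketch")
    return (bank & 0x7F) * 0x8000 + (addr & 0x7FFF)


def data_ranges(reads: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Collapse (bank, addr) reads into sorted, inclusive (start, end) file-offset ranges."""
    offsets = sorted({lorom_to_offset(bank, addr) for bank, addr in reads})
    ranges: list[tuple[int, int]] = []
    for off in offsets:
        if ranges and off == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], off)  # extend the current run
        else:
            ranges.append((off, off))  # start a new run
    return ranges


# The loop above reads $00:8000 through $00:80FE.
reads = [(0x00, 0x8000 + i) for i in range(0xFF)]
print([(hex(a), hex(b)) for a, b in data_ranges(reads)])  # [('0x0', '0xfe')]
```

Reads alone can't prove a byte is never executed, though: under this rule, a hypothetical checksum routine that reads the whole ROM would mark every byte, code included, as data.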



By the way, the game I'm disassembling isn't a standard SNES game in this case. I've been trying to disassemble a Satellaview game, BS Zelda.
« Last Edit: March 22, 2011, 09:52:07 pm by Videogamer555 »

Ryusui

Re: How do I use this disassembled code in an assembler?
« Reply #5 on: March 22, 2011, 10:45:24 pm »
You don't need a full disassembly. Just compile your hacks into the game with xkas or WLA. That's what we do.

Videogamer555

Re: How do I use this disassembled code in an assembler?
« Reply #6 on: March 22, 2011, 11:18:15 pm »
Quote from: Ryusui
You don't need a full disassembly. Just compile your hacks into the game with xkas or WLA. That's what we do.

OK, say I write a hack and expand the ROM so I have somewhere to put it.
Now how do I make the game JMP to the correct spot in the ROM and, more importantly, after running the hack, JMP back to the CORRECT spot in the game's code (the place it would originally have jumped to before I hacked the game)?

Ryusui

Re: How do I use this disassembled code in an assembler?
« Reply #7 on: March 22, 2011, 11:23:20 pm »
Why would you want to use JMP? You use JSR/JSL to branch execution to your hacks, and RTS/RTL to return.
« Last Edit: March 23, 2011, 04:54:58 am by Ryusui »

Videogamer555

Re: How do I use this disassembled code in an assembler?
« Reply #8 on: March 23, 2011, 12:22:20 am »
Quote from: Ryusui
Why would you want to use JMP? You use JSR/JSL to branch execution to your hacks, and RTS/RTL to return.

Well, for one thing, the instruction I'm working with may already be a JMP (say I start by editing a JMP that was already in the game's original code). For example, let's say I locate a JMP in the original code and edit its pointer to go to my hack. Then I'll need another JMP at the end of the hack, pointing back to the original JMP's destination, to make sure the rest of the program runs as it should.
« Last Edit: March 23, 2011, 12:33:16 am by Videogamer555 »
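A minimal Python sketch of that redirect, assuming the existing instruction is a JMP absolute (opcode $4C, 3 bytes) and that the hook lives in the same bank, since JMP absolute carries only a 16-bit address. The function name and the file-offset argument are illustrative, not part of any tool. If the hook sits in another bank (for example in expanded space), you would need a 4-byte JML ($5C) instead, which no longer fits in the 3 bytes the original JMP occupied.

```python
def redirect_jmp(rom: bytearray, jmp_offset: int, hook_addr: int) -> bytes:
    """Point an existing JMP absolute at `hook_addr` (same bank) and return the
    3-byte JMP the hook should end with to resume at the original target."""
    if rom[jmp_offset] != 0x4C:  # $4C = JMP absolute
        raise ValueError(f"byte at {jmp_offset:#x} is not JMP absolute")
    # The 16-bit operand is stored little-endian right after the opcode.
    original = int.from_bytes(rom[jmp_offset + 1 : jmp_offset + 3], "little")
    rom[jmp_offset + 1 : jmp_offset + 3] = hook_addr.to_bytes(2, "little")
    return bytes([0x4C]) + original.to_bytes(2, "little")


rom = bytearray(b"\xEA\x4C\x34\x12\xEA")  # NOP; JMP $1234; NOP
jump_back = redirect_jmp(rom, 1, 0xF000)
print(rom.hex(" "))        # ea 4c 00 f0 ea
print(jump_back.hex(" "))  # 4c 34 12
```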

Ryusui

Re: How do I use this disassembled code in an assembler?
« Reply #9 on: March 23, 2011, 12:53:55 am »
I fail to see the problem, then. You already understand what needs to be done. Just insert a new JMP over the existing one and make sure your hack ends with a JMP back to the appropriate ROM location.

Videogamer555

Re: How do I use this disassembled code in an assembler?
« Reply #10 on: March 23, 2011, 03:47:17 am »
What exactly is the difference between JMP and JML? I know the one ending in L means a "long jump" but what does that really mean?

Ryusui

Re: How do I use this disassembled code in an assembler?
« Reply #11 on: March 23, 2011, 04:54:46 am »
The SNES CPU (a 65816) works in 64KB banks: its program counter is only 16 bits wide, and a separate bank byte says which bank it is running in. The cartridge is mapped into these banks in one of two common layouts: LoROM (32KB of ROM per bank) and HiROM (64KB per bank).

JMP and JSR both take a two-byte (16-bit) address, so they can only jump to a new location in the same bank. JML and JSL take a three-byte address that includes the bank; they are "long" jumps because they can reach code in a different bank. Note that JSR and JSL also push a return address onto the stack (the address of the call instruction's last byte, which RTS and RTL increment by one to resume at the next instruction), and RTS and RTL pull it back into the program counter (RTL also restores the bank). Since this return address is two bytes for JSR and three bytes for JSL, it is imperative that you use RTS exclusively with JSR and RTL exclusively with JSL.
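The byte counts are what make mismatched returns dangerous. Below is a toy Python model of just the return-address bytes on the 65816 stack; it ignores everything else the real CPU tracks, and it assumes the caller had already pushed one byte of its own ($42) before the call. The class and method names are illustrative only.

```python
class ReturnStack:
    """Toy model of the 65816 stack that tracks only return-address bytes."""

    def __init__(self, existing: bytes = b"") -> None:
        self.mem = list(existing)  # top of the stack is the end of the list

    def _push(self, byte: int) -> None:
        self.mem.append(byte & 0xFF)

    def _pull(self) -> int:
        return self.mem.pop()

    def jsr(self, addr: int) -> None:
        """JSR abs at `addr` is 3 bytes long: push addr + 2, high byte first."""
        ret = (addr + 2) & 0xFFFF
        self._push(ret >> 8)
        self._push(ret)

    def jsl(self, bank: int, addr: int) -> None:
        """JSL long at bank:addr is 4 bytes long: push the bank, then addr + 3."""
        ret = (addr + 3) & 0xFFFF
        self._push(bank)
        self._push(ret >> 8)
        self._push(ret)

    def rts(self, current_bank: int) -> tuple[int, int]:
        """Pull 2 bytes and add 1; the program bank stays where it is."""
        lo = self._pull()
        hi = self._pull()
        return current_bank, ((hi << 8 | lo) + 1) & 0xFFFF

    def rtl(self) -> tuple[int, int]:
        """Pull 3 bytes (low, high, bank) and add 1 to the 16-bit address."""
        lo = self._pull()
        hi = self._pull()
        bank = self._pull()
        return bank, ((hi << 8 | lo) + 1) & 0xFFFF


def fmt(bank_addr: tuple[int, int]) -> str:
    bank, addr = bank_addr
    return f"${bank:02X}:{addr:04X}"


s = ReturnStack(b"\x42")
s.jsr(0x8000)
print(fmt(s.rts(0x00)))  # $00:8003  matched: back to the next instruction

s = ReturnStack(b"\x42")
s.jsl(0x00, 0x8000)
print(fmt(s.rtl()))      # $00:8004  matched

s = ReturnStack(b"\x42")
s.jsr(0x8000)
print(fmt(s.rtl()))      # $42:8003  mismatched: the caller's $42 becomes the bank
```

A JSL paired with RTS fails the other way: RTS pulls only two bytes, so the bank byte is left behind on the stack and the caller's bank is never restored.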

RyanfaeScotland

  • Sr. Member
  • Posts: 366
    • My Brill Game Site
Re: How do I use this disassembled code in an assembler?
« Reply #12 on: April 16, 2011, 05:06:40 am »
Hey KingMike (and anyone else really),

We studied formal languages at uni for a while, and the idea of a program that parses one language and translates it into another didn't seem that impossible. Granted, we were looking at machine translation, but the ideas are really the same: translating from one computer language to another is basically what compiling/decompiling is. As long as you're working with a fixed, short grammar and a finite alphabet, it should be possible. Anyway, I'm not really looking to debate this, since you said there are other discussions on it which I haven't read. [edit: have read up on the few topics I can find here at RHDN]

What I'm wondering is whether one of the debugging emulators out there could be modded to write out disassembled output files, or alternatively whether a program could be written that processes the standard output of one of the debugging emulators into a complete disassembly (although it would still need to be analysed to be understood).

An example from my experience is using Gens Tracer and Hacking Gens to alter code for infinite lives. As I'm sure you know, this process generates long text files of disassembled code as the game plays. Couldn't something be written to analyse this output and put it into some sort of sensible order?

EDIT: Let me make it clear that this isn't a 'hey, here's a revolutionary idea for you to implement' post. I'm genuinely interested in this myself and would like to know more about what's been done and what people believe can and can't be done. It may be something I have a stab at myself, but that really depends on what's been tried in the past and on the answers to some of my questions.
« Last Edit: April 16, 2011, 02:37:09 pm by RyanfaeScotland »

Ryusui

Re: How do I use this disassembled code in an assembler?
« Reply #13 on: April 17, 2011, 05:36:44 am »
The problem with using a trace to take apart and reconstruct a game is that the code would need to go through every possible path in order to generate a complete disassembly.

Think of it like transcribing a Choose Your Own Adventure book. You need to read through the whole thing several times before you see every page. For a game, multiply that by a factor of a hundred thousand. There are also things that might never be executed, like debugging functions and dummied-out code.

You might get something runnable, but it wouldn't exactly be complete. Not by a long shot.

BRPXQZME

  • Hero Member
  • Posts: 4572
  • じー
    • The BRPXQZME Network
Re: How do I use this disassembled code in an assembler?
« Reply #14 on: April 17, 2011, 05:55:11 am »
The concept of full decompilation generally simplifies to the halting problem; that technically makes it an unsolvable problem—in theory-world.

What we do in the real world is to put certain limits on decompiling to make it practical. We can reduce the scope of what we look at (e.g. a trace, a real example of code that actually happens) or we limit what the program itself does (e.g. code for Google Native Client is, by design, sandboxed and analyzable). Certainly, “some” programs can be fully decompiled. But not “all”, and frankly, it isn’t even “most” when you use certain platforms.
we are in a horrible and deadly danger

RyanfaeScotland

Re: How do I use this disassembled code in an assembler?
« Reply #15 on: April 17, 2011, 07:36:33 am »
Quote from: Ryusui
The problem with using a trace to take apart and reconstruct a game is that the code would need to go through every possible path...

Think of it like transcribing a Choose Your Own Adventure book. You need to read through the whole thing several times before you see every page...

See, this is the approach I was thinking you could take. One idea, the long one, would be to play through the game with the tracer on and then feed the trace log into a program that combines repeated passes through the same code. Now, I'm no fool: I know this would take a long time and, as mentioned, might still not uncover everything, but presumably it would reveal enough to be of use.
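The merging step itself is the easy part. Here's a minimal Python sketch that assumes a hypothetical SNES-style trace format in which each line starts with a `BB:AAAA` address followed by the instruction (real tracers such as Gens Tracer each have their own format, so the regular expression would need adjusting). It keeps each address once, however many times it was executed, and lists the results in address order. It keeps the first decoding it sees for an address; on the 65816 the same bytes can decode differently under different M/X flag settings, which this sketch ignores.

```python
import re

# Hypothetical trace-line format: "00:8000  LDA $10" (bank:address, then the instruction).
TRACE_LINE = re.compile(r"^([0-9A-F]{2}):([0-9A-F]{4})\s+(.+)$")


def merge_traces(*logs: str) -> list[str]:
    """Combine trace logs into one listing: one line per executed address, in address order."""
    seen: dict[tuple[int, int], str] = {}
    for log in logs:
        for line in log.splitlines():
            match = TRACE_LINE.match(line.strip())
            if not match:
                continue  # skip register dumps, blank lines, etc.
            key = (int(match.group(1), 16), int(match.group(2), 16))
            seen.setdefault(key, match.group(3).strip())  # keep the first decoding seen
    return [f"{bank:02X}:{addr:04X}  {text}" for (bank, addr), text in sorted(seen.items())]


# Two passes over the same routine: the branch is taken in run1 and not taken in run2.
run1 = "00:8000  LDA $10\n00:8002  BEQ $8006\n00:8006  RTS"
run2 = "00:8000  LDA $10\n00:8002  BEQ $8006\n00:8004  INC $10\n00:8006  RTS"
for line in merge_traces(run1, run2):
    print(line)
```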

The other idea, the better one, would be a recursive algorithm that goes through the code, splitting at each decision point, following one track, then returning and following the other. It doesn't need to be recursion, but that's what springs to mind.

To use your metaphor: we enter the book at the start, page 1, and read through, recording everything as we go. Eventually we get to a branch, either page 34 or 62, so we go with 34 for no reason other than, say, it's the closest. We repeat this over and over, recording and branching as we go. Once we reach a certain point, we return to our most recent branch, take the other option, and repeat. This should eventually lead to every reachable passage in the book. It won't give us the things that aren't normally reachable (in a book: instruction pages, publisher info, pictures and so on), but it would give us everything else needed to play the game. We could even compare the sizes of the original and the output to see how much we're missing, then use traditional methods to find the rest.
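What's described here is usually called recursive-traversal (or recursive-descent) disassembly: start at an entry point, decode, and follow every static branch target as well as the fall-through. Below is a minimal Python sketch over a hypothetical toy program rather than real 65816 bytes; each instruction is pre-described as (length, kind, target), which sidesteps decoding entirely. It uses a worklist instead of recursion and a set of already-visited addresses in place of loop counters, so a loop ends the moment a path reaches code it has already seen. Jumps whose target depends on run-time values (the "indirect" kind here) can only be flagged, not followed.

```python
from collections import deque
from typing import Optional

# Toy program: address -> (length in bytes, kind, static target or None).
# "seq" falls through, "branch" may fall through or jump, "jump" always jumps,
# "return" ends the path, "indirect" jumps somewhere the bytes alone don't reveal.
Program = dict[int, tuple[int, str, Optional[int]]]


def walk(program: Program, entry: int) -> tuple[set[int], set[int]]:
    """Return (addresses reached as code, addresses of unresolved indirect jumps)."""
    code: set[int] = set()
    unresolved: set[int] = set()
    todo = deque([entry])
    while todo:
        addr = todo.popleft()
        if addr in code or addr not in program:
            continue  # already decoded (this is what ends loops), or not in the toy map
        code.add(addr)
        length, kind, target = program[addr]
        if kind in ("seq", "branch"):
            todo.append(addr + length)  # fall-through path
        if kind in ("branch", "jump") and target is not None:
            todo.append(target)  # taken path
        if kind == "indirect":
            unresolved.add(addr)  # e.g. a jump table: the target is unknown here
    return code, unresolved


demo: Program = {
    0x00: (2, "seq", None),
    0x02: (2, "branch", 0x08),    # like a BEQ to $08
    0x04: (3, "jump", 0x02),      # like a JMP back to $02, forming a loop
    0x07: (1, "return", None),    # never reached: nothing falls into or jumps here
    0x08: (3, "indirect", None),  # like JMP ($xxxx,X)
}
reached, stuck = walk(demo, 0x00)
print(sorted(reached), sorted(stuck))  # [0, 2, 4, 8] [8]
```

The unreached entry at $07 stands in for the dummied-out code mentioned earlier in the thread: a walk like this never sees it.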

tl;dr - We go through the code, branching at every decision point and recording the results, then take the other branch and do the same.

Quote from: BRPXQZME
The concept of full decompilation generally simplifies to the halting problem...

What we do in the real world is to put certain limits on decompiling to make it practical...

Yes, this is something we discussed in depth on the course I mentioned above. I'll need to refresh myself, but if I remember correctly it's along the lines of 'there is no way for one program to know whether another will stop or not'. Please feel free to correct me. But IMHO you are giving the solution to your own problem! We don't need a nice algorithm that can spot halting; we can put in artificial limits and still have something perfectly usable. Using the book example from above, let's say we go page 1, then 34, then 56, which loops back to 34 (34 -> 56 -> 34). We could put in safeguards that say if we loop some 100, 1000 or 10000 times, we break out of that recursion and take the other branch. We could add another condition where, if we haven't seen any new code in X length of time or X iterations, we assume we've seen everything. Again, yes, this wouldn't reveal everything, but it would still give enough to be usable.

On a side note, I'm pretty sure the halting problem is about a single algorithm that works on any program, for any input, in any language. That isn't what we're working with: we have a fixed domain (for me it would be the 68K), so we should be able to write a program that can analyse it. As mentioned, I do need to refresh on things :)

tl;dr - The halting problem doesn't affect us, because we're in a fixed domain and can put in artificial limits (we aren't creating an academic proof here!)

BRPXQZME

Re: How do I use this disassembled code in an assembler?
« Reply #16 on: April 17, 2011, 08:12:02 am »
The halting problem is defined on a Turing machine, which simply doesn't and can't exist (it has nothing to do with “any” language, since the theory applies to real code reasonably well; halting is undecidable because a Turing machine has infinite memory). However, there are things that make it relevant when you try to apply it for real. I bombed out of third-year CS, so here's an article someone else wrote listing the bad news, because I couldn't write one that well. And here's a complementary article on why it is not completely impossible.

In short, it is not completely impossible. That does not mean it is not a complete son of a bitch.

RyanfaeScotland

Re: How do I use this disassembled code in an assembler?
« Reply #17 on: April 17, 2011, 08:48:18 am »
Cheers for the links. I've read through the one that reckons it is not completely impossible and will read through the rest when I've a bit more time.

I'll need to pull out my third-year notes too; that's where we covered this stuff :)

Nightcrawler

  • Hero Member
  • Posts: 5787
    • Nightcrawler's Translation Corporation
Re: How do I use this disassembled code in an assembler?
« Reply #18 on: April 18, 2011, 10:53:41 am »
Since you can't disassemble all code (we've established that on a technical level), what useful purpose does this serve? It comes back to just trace-logging the relevant section of code instead, which you can do in a few minutes.

By the way, platforms such as the SNES can do this:
JMP ($1234,X)

It's primarily used as a function jump table. Which branch is taken depends on the value in the X register, which is determined by the current game state. Even if you tried to follow all possible branches, what would you do here? Try all 256 possible X values? Nope, that doesn't work, because the game won't have 256 valid entries. It'll only use a small handful (and there's no way to determine in advance how many). You're going to end up disassembling junk.
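A small Python sketch of the jump-table problem, using made-up bytes. JMP ($1234,X) reads a little-endian 16-bit pointer from $1234 + X, so a table of 2-byte entries is indexed with X = 0, 2, 4, and so on. Nothing in the bytes marks where the table ends: the sketch places a three-entry table directly in front of some code, and asking for five entries quietly returns two "pointers" that are really the code's bytes. The function name is illustrative only.

```python
def read_jump_table(rom: bytes, table: int, count: int) -> list[int]:
    """Read `count` little-endian 16-bit pointers starting at file offset `table`."""
    return [
        int.from_bytes(rom[table + 2 * i : table + 2 * i + 2], "little")
        for i in range(count)
    ]


# Made-up bytes: three real entries ($8100, $8140, $8180),
# then code that happens to follow the table (LDA #$20; STA $2100).
rom = bytes([0x00, 0x81, 0x40, 0x81, 0x80, 0x81,
             0xA9, 0x20, 0x8D, 0x00, 0x21])
print([f"${p:04X}" for p in read_jump_table(rom, 0, 3)])
# ['$8100', '$8140', '$8180']
print([f"${p:04X}" for p in read_jump_table(rom, 0, 5)])
# ['$8100', '$8140', '$8180', '$20A9', '$008D']
```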

I'm sure there are plenty of other gotchas. This topic has been discussed many times and always ends the same way. On platforms where disassembling all code isn't possible, the nature of the beast means the effort buys little advantage over simply tracing the relevant code when needed.
TransCorp - Over 20 years of community dedication.
Dual Orb 2, Wozz, Emerald Dragon, Tenshi No Uta, Glory of Heracles IV SFC/SNES Translations

Ryusui

Re: How do I use this disassembled code in an assembler?
« Reply #19 on: April 18, 2011, 01:04:29 pm »
Quote from: Nightcrawler
By the way, platforms such as the SNES can do this:
JMP ($1234,X)

It's primarily used as a function jump table...

GB and GBA games do it too. I've seen it myself.