Topic: Controlled Disassembly

zonk47

  • Sr. Member
  • ****
  • Posts: 343
Controlled Disassembly
« on: April 01, 2016, 06:31:46 pm »
Beta release 0.2.1 can now be downloaded here: http://s000.tinyupload.com/index.php?file_id=37867285792111098461

This is a project to create a controlled disassembler using the TreeView Windows GUI control.

The TreeView control is an ethereal beast: there is substantial documentation on it, but it is generally pretty confusing. MS says that adding keys to nodes is optional, but as far as I can tell the only way to add children ("branches") to the tree is to reference the key of the parent. Also, the TreeView control's methods must be used directly... the tree nodes themselves have no methods that can be used to create children (the documented node methods don't actually change the tree).

Preliminary tests suggest that the TreeView control has bugs... for example, it creates negative index values for its nodes for some reason. I got an out-of-bounds error after making a branch of thousands of nodes that had hundreds of children...



The disassembler starts from a given point and disassembles until it reaches either an unconditional jump or the end of the file. When it encounters a subroutine call or conditional branch, it creates a tree branch with a "dummy" node under it, which keeps the branch real (expandable) until the user expands it; at that point it disassembles the subroutine in question into the branch. In practical terms, the current design only works with small programs, particularly with the Z80 core I'm using in the prototype. Once I've worked out a decent method for "mapping", it'll be more useful.
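
The expand-on-demand scheme described above can be sketched roughly as follows. This is only an illustration in Python, not the VB6 source: the `Node` class, the `disassemble_run` and `expand` names, the "(dummy)" placeholder text, and the assumption that the file is loaded at address 0 are all made up for the example. The opcode table is a tiny subset of real Z80 opcodes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# opcode -> (text template, length in bytes, kind)
# kind "branch" = call or conditional jump, "stop" = flow never falls through
OPCODES = {
    0x00: ("NOP", 1, "plain"),
    0xC2: ("JP NZ,{:04X}", 3, "branch"),
    0xCD: ("CALL {:04X}", 3, "branch"),
    0xC3: ("JP {:04X}", 3, "stop"),
    0xC9: ("RET", 1, "stop"),
}

@dataclass
class Node:
    text: str
    target: Optional[int] = None          # where to disassemble on expansion
    children: List["Node"] = field(default_factory=list)
    expanded: bool = False

def disassemble_run(rom: bytes, start: int) -> List[Node]:
    """Disassemble from start until an unconditional jump/return or end of file."""
    nodes = []
    pc = start
    while pc < len(rom):
        # Unknown bytes are shown as data (DB) in this sketch.
        template, length, kind = OPCODES.get(rom[pc], ("DB {:02X}", 1, "plain"))
        if pc + length > len(rom):
            break                          # operand runs past the end of the file
        if length == 3:
            operand = rom[pc + 1] | (rom[pc + 2] << 8)   # low byte first
        else:
            operand = rom[pc]
        node = Node(f"{pc:04X}: " + template.format(operand))
        if kind == "branch":
            node.target = operand
            node.children.append(Node("(dummy)"))        # placeholder child
        nodes.append(node)
        if kind == "stop":
            break
        pc += length
    return nodes

def expand(rom: bytes, node: Node) -> None:
    """Replace the dummy child with the real disassembly of the target."""
    if node.target is not None and not node.expanded:
        node.children = disassemble_run(rom, node.target)
        node.expanded = True
```

Because a branch is only disassembled when it is expanded, a loop in the code does not make the tree infinite; it just shows up as another expandable branch.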

The primary advantage of tree-guided "controlled" disassembly is that it gives form to the logical structure of a program. This makes the code seem less nebulous and easier to follow. The only real alternative is to use hyperlinks, but even then the brain struggles to cope when there is a great deal of ambiguity (as in the case of raw disassembly). This prototype makes use of low-grade decompilation (to BASIC) to further assist the programmer in comprehending the disassembly. Ideally it would be integrated into an emulator, to allow for a more detailed analysis. (For those wondering, the file disassembled in the screenshot is not a Z80 program but a random text file, so the disassembly/decompilation looks "buggy" in that it's branching without prior compares.)

April 02, 2016, 08:35:26 pm - (Auto Merged - Double Posts are not allowed before 7 days.)
I realized that I could deal with the mappers by just creating branches for all the banks. This way the user can disassemble banks as needed. Never mind, I realized that won't work, because I don't know the start addresses of the routines except in the last bank. The only way is some kind of multiple document interface scheme so that every bank can have its own window... unless I had a scheme where you set the text of the branch and that text is used as the start offset of the disassembly on expansion. That might work... but it might also be confusing.

The MS TreeView control may not be cut out for this kind of work... may have to make a new version of it for the modern era. Will ask around about that.

I'm just about ready for a release with all the features I had planned in place (including address labeling for hardware and software)... or so I thought. Looks like I'll have to look up MDI for VB6.

v.0.2.1 - fixes in the labeling, added GB proj + binaries

v.0.2 - fixed endianness

v.0.1 - initial release


« Last Edit: April 06, 2016, 10:24:56 pm by zonk47 »
A good slave does not realize he is one; the best slave will not accept that he has become one.

henke37

  • Hero Member
  • *****
  • Posts: 643
Re: Controlled Disassembly
« Reply #1 on: April 03, 2016, 04:33:37 am »
This isn't going to work out. Trees are poor for visualizing very complex structures. And in particular, they are horrible at dealing with structures that aren't strictly trees. Things such as loops and nodes being in multiple places at the same time really make things ugly.

zonk47

  • Sr. Member
  • ****
  • Posts: 343
Re: Controlled Disassembly
« Reply #2 on: April 03, 2016, 03:10:31 pm »
It actually helps me a lot, because it helps me build my knowledge of the code in encapsulated terms. I disagree that trees are poor for visualizing complex structures... can you provide evidence for this claim? We certainly have no other means for visualizing complex structures. Flow charts are possible, but the last (only?) flow chart for a game disassembly I saw is the Galaga one... even something that simple was very hard to follow, and few programmers have the patience to make one.

I've reworked the design to include MDI. The new design allows the user to specify the machine's bank size and the number of program banks in the ROM. The user then disassembles banks as needed, with each bank getting its own window.
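
As a rough illustration of the bank setup described above, the Python sketch below slices a ROM image into fixed-size banks from a user-supplied bank size and bank count. The function name `split_banks` is hypothetical and this is not the VB6 code; it only shows the slicing arithmetic.

```python
from typing import List

def split_banks(rom: bytes, bank_size: int, bank_count: int) -> List[bytes]:
    """Return bank_count slices of bank_size bytes each, in file order."""
    if bank_size <= 0 or bank_count <= 0:
        raise ValueError("bank size and bank count must be positive")
    if bank_size * bank_count > len(rom):
        raise ValueError("ROM is smaller than bank_size * bank_count")
    # Bank n covers file offsets [n * bank_size, (n + 1) * bank_size).
    return [rom[n * bank_size:(n + 1) * bank_size] for n in range(bank_count)]
```

Each slice could then be handed to its own window, so a bank is only disassembled when the user asks for it.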

Tenkarider

  • Jr. Member
  • **
  • Posts: 35
Re: Controlled Disassembly
« Reply #3 on: April 03, 2016, 03:24:14 pm »
how do you handle the bytes that are used as table values instead of being used directly as Assembly? does your code recognize those situations?

Gemini

  • Hero Member
  • *****
  • Posts: 2026
  • Let us cross time, and return to her side
Re: Controlled Disassembly
« Reply #4 on: April 03, 2016, 03:31:40 pm »
If you're looking for better representations:

IDA does pretty much a majestic job.
I am the lord, you all know my name, now. I got it all: cash, money, and fame.

zonk47

  • Sr. Member
  • ****
  • Posts: 343
Re: Controlled Disassembly
« Reply #5 on: April 03, 2016, 08:54:37 pm »
how do you handle the bytes that are used as table values instead of being used directly as Assembly? does your code recognize those situations?

You mean situations where the routine jumped to is an offset calculated by adding register contents to the contents of a place in memory? Well there's only so much that can be done outside of an emulator... even if you do register emulation, sooner or later you run into a situation where you can't get data unless you step through the program. An emulator is outside the scope of this project, although it's certainly feasible to integrate this project into one.

I'll have to do some extra testing to make sure it's NOT trying to do this.

Gemini, I'd love to make a clone of IDA, but I don't have the time. Maybe we could all pitch in and make something. I'm certain that most people don't have the money for IDA (particularly ROM hackers), and with such high sums involved I'd hate to tempt the fates.

Rotwang

  • Full Member
  • ***
  • Posts: 170
Re: Controlled Disassembly
« Reply #6 on: April 04, 2016, 01:07:18 am »
The entire foundation of this community is unauthorized copies of game software, do we really have to keep pretending like we actually paid for IDA Pro?

DackR

  • Full Member
  • ***
  • Posts: 130
  • Mo~
Re: Controlled Disassembly
« Reply #7 on: April 04, 2016, 02:34:49 am »
It really looks useful, but DAMN, that's expensive!

zonk47

  • Sr. Member
  • ****
  • Posts: 343
Re: Controlled Disassembly
« Reply #8 on: April 04, 2016, 03:44:43 pm »
Beta released (link at top) with source. This version has MDI to support disassemblies of multiple banks. I went through the Z80 instruction set and it doesn't seem like there are any index + accumulator regs... or it may be I've left out a mode and the instruction set is incomplete?

Notes:
- you can define labels for hardware addresses (memory addresses only for now) by creating a file called "machine.txt" in the same directory as the disassembler and defining the addresses/labels as key/value pairs. You can also add comment lines by starting them with an apostrophe (')

Ex:
'Video Hdwr
2000:Pixel_Processor

- when you open a file for disassembly, the disassembler looks in its own directory for a file with the same name but a "txt" extension. It treats this as an additional set of labels. However, for accurate mapping these key/value pairs must be preceded with the bank number at which the label is relevant.

Ex:
'Lives at start
1:C045:PLAYER_LIVES
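
For anyone who wants to generate or check these label files with a script, here is a minimal Python sketch of a parser for the two formats shown above. It is not the disassembler's own code: the `parse_labels` name is hypothetical, and it assumes addresses are hexadecimal and bank numbers are decimal, which the examples suggest but do not state.

```python
from typing import Dict

def parse_labels(text: str, banked: bool) -> Dict:
    """Parse ADDR:LABEL lines (banked=False) or BANK:ADDR:LABEL lines (banked=True)."""
    labels = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("'"):
            continue                      # blank line or comment line
        if banked:
            # Assumption: bank is decimal, address is hex.
            bank_s, addr_s, name = line.split(":", 2)
            key = (int(bank_s), int(addr_s, 16))
        else:
            addr_s, name = line.split(":", 1)
            key = int(addr_s, 16)
        labels[key] = name
    return labels
```

With the examples above, the "machine.txt" sample maps address 2000 to Pixel_Processor, and the per-ROM sample maps bank 1, address C045 to PLAYER_LIVES. A malformed line raises ValueError in this sketch rather than being skipped.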


Let me know if you find any bugs.
« Last Edit: April 04, 2016, 04:23:00 pm by zonk47 »

Calindro

  • Jr. Member
  • **
  • Posts: 50
Re: Controlled Disassembly
« Reply #9 on: April 04, 2016, 04:27:44 pm »
Which system(s) do you plan to target with this? Or do you plan to target the Z80 in general?
I tried answering that myself by looking for hints in your posts and in your source code, but your download link doesn't work for me (it says the file doesn't exist).

zonk47

  • Sr. Member
  • ****
  • Posts: 343
Re: Controlled Disassembly
« Reply #10 on: April 04, 2016, 04:50:32 pm »
Which system(s) do you plan to target with this? Or do you plan to target the Z80 in general?
I tried answering that myself by looking for hints in your posts and in your source code, but your download link doesn't work for me (it says the file doesn't exist).

Try now. I had to do some extra work/corrections.

It targets any machine which uses the Z80. Create "machine.txt" and populate it with appropriate labels to customize it to a particular machine.

While it can be used on its own (it's particularly good for tracking down compression routines), it would ideally be integrated into an emulator. It could be modified so that, instead of dumping on demand, it dumps along with the emulator while avoiding repeat dumps. This would make game cartography much easier. I've been looking around and there are apparently VB6-based emulators for the x86, Z80, and 6502 in existence (and there may be others), so it's certainly feasible to make the integration. The disassembler is designed with modularity in mind, so you can easily swap out CPU cores at source level.

Update: found a bug... Memory mapping isn't working right... working on it. Fixed.

April 06, 2016, 01:26:11 am - (Auto Merged - Double Posts are not allowed before 7 days.)
OK I tried disassembling FF Legend. A few weird things:
- the disassembly is inaccurate because of the uniqueness of the GB CPU. (the overflow test jumps in particular seem either not to exist or are remapped to different opcodes).
- I've heard that the Gameboy is little endian, but the jump addresses appear to be big endian(?)
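
On the addressing question: the Game Boy CPU, like the Z80, stores 16-bit operands low byte first, so the bytes C3 50 01 encode JP 0150. A short Python sketch for checking a jump target both ways follows; the `read_u16` helper is a hypothetical name for this example, not part of the disassembler.

```python
def read_u16(data: bytes, offset: int, little_endian: bool = True) -> int:
    """Read a 16-bit operand at offset in the chosen byte order."""
    pair = data[offset:offset + 2]
    if len(pair) != 2:
        raise ValueError("operand runs past the end of the data")
    return int.from_bytes(pair, "little" if little_endian else "big")

code = bytes([0xC3, 0x50, 0x01])            # JP with a 16-bit operand
target_le = read_u16(code, 1)               # read low byte first
target_be = read_u16(code, 1, False)        # read high byte first
```

If the targets only make sense when read one way round, that points to which byte order the disassembler should be using.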

Would like some feedback about the addressing. As for the disassembly inaccuracy, I will make an alternative core for use with the GB CPU.

I'm using Game Lad to investigate the accuracy of the disassembly.

It disassembles Gameboy games correctly now (mostly... CB mode hasn't been implemented yet. No ETA on that... I'm taking a break).
« Last Edit: April 06, 2016, 10:26:58 pm by zonk47 »