Topic: Extracting assembly as diagrams/tables

Dugongue

  • Jr. Member
  • Posts: 6
Extracting assembly as diagrams/tables
« on: May 31, 2019, 12:36:15 pm »
I have been having fun trying to reverse-engineer NES code, and I thought it would be cool to present it in an easy-to-comprehend format on my website. So I had the idea of writing a rudimentary disassembler that formats data/code based on classes assigned in an ini file.
For functions I used a JS tool called mermaid, whose text syntax renders into auto-formatted SVG flowcharts.
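For anyone wanting to try the mermaid route, here is a minimal Python sketch (not Dugongue's actual JS code) that turns a set of basic blocks and edges into mermaid flowchart text. The names `to_mermaid` and `mermaid_escape` and the input shapes are made up for illustration, and escaping `#` and `"` with mermaid's `#35;`/`#quot;` entity codes is an assumption about what keeps operands like `#$5B` from confusing the parser.

```python
def mermaid_escape(text):
    """Escape characters that could confuse mermaid inside a quoted node label."""
    # '#' introduces mermaid entity codes, so encode it first, then double quotes.
    return text.replace("#", "#35;").replace('"', "#quot;")


def to_mermaid(blocks, edges):
    """blocks: {node_id: [instruction strings]}; edges: [(src, dst, label or None)].
    Returns mermaid flowchart source text."""
    lines = ["flowchart TD"]
    for node_id, instrs in blocks.items():
        body = "<br/>".join(mermaid_escape(i) for i in instrs)
        lines.append(f'    {node_id}["{body}"]')
    for src, dst, label in edges:
        if label:
            lines.append(f"    {src} -->|{mermaid_escape(label)}| {dst}")
        else:
            lines.append(f"    {src} --> {dst}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample addresses only; node IDs are kept as simple alphanumeric names.
    blocks = {
        "L9BA6": ["LDA #$5B", "JSR $A369", "CMP #$08", "BCS $9BBA"],
        "L9BAF": ["JSR $FA2E"],
        "L9BBA": ["JSR $FA2E"],
    }
    edges = [("L9BA6", "L9BBA", "carry set"), ("L9BA6", "L9BAF", None)]
    print(to_mermaid(blocks, edges))
```

Pasting the printed text into a mermaid diagram should give one box per block, with the conditional branch drawn as a labelled arrow.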

Here are some of my WIPs:
* http://dugongue.com/WIP/mermaid/test.html (first attempt)
* http://dugongue.com/WIP/mermaid/test2.html
* http://dugongue.com/WIP/mermaid/index.html
* http://dugongue.com/newDisassembly/Nintendo%20Entertainment%20System/Chip%20'N%20Dale%20Rescue%20Rangers/index.html

This current setup is a pretty impractical solution for several reasons, but I just wanted to know whether my basic concept is a good idea or whether I should direct my efforts elsewhere.

I did see one other document on the site that used flowcharting:
http://www.romhacking.net/documents/%5B432%5Drs1re.pdf

Mauron

  • Submission Reviewer
  • Hero Member
  • Posts: 551
Re: Extracting assembly as diagrams/tables
« Reply #1 on: May 31, 2019, 01:12:52 pm »
That looks interesting. I'd love to play with it some myself.
Mauron wuz here.

FAST6191

  • Hero Member
  • Posts: 3358
Re: Extracting assembly as diagrams/tables
« Reply #2 on: May 31, 2019, 04:36:59 pm »
Don't think I have seen disassembly as a flow chart before. Fairly obvious now that I think about it.

I wonder if you could auto-generate these using a variation on FCEUX's find function stuff. That would really set this off.

Dugongue

  • Jr. Member
  • Posts: 6
Re: Extracting assembly as diagrams/tables
« Reply #3 on: May 31, 2019, 06:00:26 pm »
Yes, you could probably do something like that. My current JS implementation requires the page number and the start/end addresses of a function to map it fully, but you could give it logic to auto-detect the end of a function (by keeping track of all the branch/jump destinations as it prints).
Some things would confuse it, like data in the middle of the code (e.g. stack trickery after subroutine calls), or cases where a branch is always taken and used like a jump. But you could note these instances in the ini file to adjust the flowchart accordingly.
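Here is one way the end-of-function detection described above could work, as a Python sketch. It assumes the code has already been decoded into a hypothetical `{address: (mnemonic, length, target)}` dict, and it only lets conditional branches extend the function: an RTS, RTI or JMP ends it unless some earlier branch still points past it. Inline data after a JSR or an always-taken branch would fool it, which is exactly where ini-file overrides would come in.

```python
# Conditional branches have a limited range, so a forward branch target is assumed
# to still belong to the current function.
BRANCHES = {"BPL", "BMI", "BVC", "BVS", "BCC", "BCS", "BNE", "BEQ"}
TERMINATORS = {"RTS", "RTI", "JMP"}


def find_function_end(instrs, start):
    """instrs maps address -> (mnemonic, length, target or None).
    Returns the address of the last byte of the function that begins at start."""
    furthest = start  # furthest forward branch target seen so far
    addr = start
    while addr in instrs:
        mnem, length, target = instrs[addr]
        if mnem in BRANCHES and target is not None and target > furthest:
            furthest = target
        # A terminator only ends the function if no branch still points past it.
        if mnem in TERMINATORS and furthest <= addr:
            return addr + length - 1
        addr += length
    raise ValueError(f"ran out of decoded code at ${addr:04X}")


if __name__ == "__main__":
    code = {
        0x8000: ("LDA", 2, None),
        0x8002: ("BEQ", 2, 0x8007),  # skips the early RTS
        0x8004: ("LDA", 2, None),
        0x8006: ("RTS", 1, None),    # not the end: $8007 is still pending
        0x8007: ("LDA", 2, None),
        0x8009: ("RTS", 1, None),    # real end of the function
    }
    print(f"${find_function_end(code, 0x8000):04X}")  # $8009
```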

I'm also working on a way to have it format in-game data tables.
If you tag the tables with what they're used for, you can have them formatted accordingly. For example, once you identify the pointer table for the game's palettes, code I wrote will format it into a table displaying the exact colors.
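A rough Python sketch of the palette-table idea. It assumes an iNES file with a 16-byte header and no trainer, 16 KiB PRG banks mapped at $8000, a table of little-endian 16-bit pointers whose targets sit in the same bank, and 16 bytes (four 4-color sub-palettes) per palette; all of those, and the function names, are assumptions. It prints NES color indices in hex; turning them into actual colors would need a lookup into whichever NES RGB palette you choose, which is deliberately left out here.

```python
import struct

HEADER_SIZE = 16    # iNES header, assuming no trainer
BANK_SIZE = 0x4000  # assuming 16 KiB PRG banks


def cpu_to_file_offset(bank, cpu_addr, window_start=0x8000):
    """File offset of bank:cpu_addr, assuming the bank is mapped at window_start."""
    return HEADER_SIZE + bank * BANK_SIZE + (cpu_addr - window_start)


def read_pointer_table(rom, bank, table_addr, count):
    """Read count little-endian 16-bit pointers starting at bank:table_addr."""
    return list(struct.unpack_from(f"<{count}H", rom, cpu_to_file_offset(bank, table_addr)))


def format_palettes(rom, bank, table_addr, count, entry_size=16):
    """One text row per palette, grouped into 4-color sub-palettes of NES color indices.
    Assumes every pointer targets data in the same bank as the table."""
    rows = []
    for i, ptr in enumerate(read_pointer_table(rom, bank, table_addr, count)):
        off = cpu_to_file_offset(bank, ptr)
        data = rom[off:off + entry_size]
        groups = [" ".join(f"{b:02X}" for b in data[j:j + 4]) for j in range(0, entry_size, 4)]
        rows.append(f"palette {i} @ ${ptr:04X}: " + " | ".join(groups))
    return rows
```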

Also, I've been looking at the Koei games, and they use a special format for most of their compressed "higher-level" functions (I nicknamed them "k-functions"); I could use the same basic code structure to extract those as well.

My intent isn't necessarily to 100% automate the process, but just to make it easier to begin disassembling a game from scratch.
I also want to be able to share this output with others so that multiple people can contribute.

June 02, 2019, 01:06:10 pm - (Auto Merged - Double Posts are not allowed before 7 days.)
Been experimenting with rendering these in yEd which has insanely good automatic layout algorithms.
« Last Edit: June 02, 2019, 01:06:10 pm by Dugongue »

Cyneprepou4uk

  • Hero Member
  • Posts: 734
  • I am the baldest romhacker
Re: Extracting assembly as diagrams/tables
« Reply #4 on: June 03, 2019, 02:31:49 pm »
Last screenshot is great. I was just looking for a program that draws diagrams of NES code for my guide, and here you are.

So how does it work exactly? You copy some bytes of code from the ROM, paste them somewhere, and this picture appears?

Also, can this program read RAM .nl files from the FCEUX emulator and show the comments for RAM addresses in the picture? And I wonder about Cyrillic characters as well.

Dugongue

  • Jr. Member
  • Posts: 6
Re: Extracting assembly as diagrams/tables
« Reply #5 on: June 03, 2019, 11:25:12 pm »
To be honest, the last picture was not done automatically: I generated a basic flowchart with my JavaScript utility and then manually built and formatted it in the flowchart program yEd. yEd accepts a format called GraphML, though, so you could possibly have my code generate GraphML files and then use yEd to perform the auto-layout.
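Generating GraphML for yEd could look something like this Python sketch, which uses only the standard library. It writes plain GraphML (one node per block, with the instructions stored in a `label` data key); yEd's own files add yWorks-specific extensions for label styling, so whether these labels show up directly in yEd or need to be mapped after import is an assumption to check. The function name and input shapes are hypothetical.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(blocks, edges):
    """blocks: {node_id: [instruction strings]}; edges: [(src, dst)].
    Returns a plain GraphML document as a string."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare a string attribute that carries each node's disassembly text.
    ET.SubElement(root, "key", {"id": "d0", "for": "node",
                                "attr.name": "label", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "directed"})
    for node_id, instrs in blocks.items():
        node = ET.SubElement(graph, "node", {"id": node_id})
        ET.SubElement(node, "data", {"key": "d0"}).text = "\n".join(instrs)
    for i, (src, dst) in enumerate(edges):
        ET.SubElement(graph, "edge", {"id": f"e{i}", "source": src, "target": dst})
    return ET.tostring(root, encoding="unicode")
```

Once the file is open in yEd, its automatic layout can take over the arrangement of the nodes.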
Applying custom names to RAM addresses is something I've built in too.
Cyrillic characters shouldn't be a problem at all.
That said, I'm still a beginner, so I don't really have a finished product to share with anyone yet.
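For the .nl question, a small Python parser along these lines could pull RAM names and comments into the output. The `$ADDR#Name#Comment` line format assumed here should be checked against the FCEUX debugger documentation; array entries (`$0300/20`), multi-line comments, and the file encoding (UTF-8 by default, with an `encoding` parameter in case Cyrillic names were saved as something like cp1251) are all simplified or assumed. The function names are hypothetical.

```python
def parse_nl(text):
    """Parse name-list lines of the assumed form $ADDR#Name#Comment.
    Returns {address: (name, comment)}; array suffixes like /20 are dropped."""
    names = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("$"):
            continue  # skip blank or unrecognised lines
        parts = line.split("#", 2)
        addr = int(parts[0][1:].split("/")[0], 16)
        name = parts[1] if len(parts) > 1 else ""
        comment = parts[2].rstrip("#") if len(parts) > 2 else ""
        names[addr] = (name, comment)
    return names


def load_nl(path, encoding="utf-8"):
    """Read a .nl file; the encoding is an assumption, so it can be overridden."""
    with open(path, encoding=encoding) as f:
        return parse_nl(f.read())


def annotate(addr, names):
    """Comment text for an operand address, or an empty string if it has no name."""
    name, comment = names.get(addr, ("", ""))
    if not name:
        return ""
    return f"; {name} - {comment}" if comment else f"; {name}"
```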

abw

  • Hero Member
  • Posts: 592
Re: Extracting assembly as diagrams/tables
« Reply #6 on: June 04, 2019, 08:50:25 am »
Yeah, that last screenshot is pretty nice. I'm also interested in hearing more details about what you're doing. Automated analysis of NES ROMs is problematic for a number of reasons, and handling the full range of craziness that exists in the wild requires a pretty sophisticated process. How fancy have you gotten so far?

Dugongue

  • Jr. Member
  • Posts: 6
Re: Extracting assembly as diagrams/tables
« Reply #7 on: June 04, 2019, 03:51:11 pm »
Well, for a lot of functions the issue is that it's hard to work out the best way to divide the code into discrete functions, since there are often cases of the same section of code having multiple "entry points", or of a function ending with a JMP (though that is, for the most part, functionally identical to a JSR followed by an RTS).
It probably does require some human intuition to guide the process.
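To make the entry-point/JMP problem concrete, here is a Python sketch of a few simple heuristics: every JSR target is treated as a function entry, a JMP to a known entry is labelled a tail call (the JSR+RTS equivalent), and an entry that the previous instruction falls straight into is flagged as a second way into shared code. The decoded-instruction dict and the function names are hypothetical, and the results are only hints for a human to confirm.

```python
TERMINATORS = {"RTS", "RTI", "JMP"}


def find_entries(instrs):
    """Treat every JSR target as a function entry point."""
    return sorted({t for mnem, _, t in instrs.values() if mnem == "JSR" and t is not None})


def classify_jumps(instrs, entries):
    """Label each absolute JMP as a tail call (target is a known entry) or a local jump."""
    entry_set = set(entries)
    return {addr: ("tail call" if target in entry_set else "local jump")
            for addr, (mnem, _, target) in sorted(instrs.items())
            if mnem == "JMP" and target is not None}


def fallthrough_entries(instrs, entries):
    """Entries that the preceding instruction runs straight into: shared code
    with more than one way in."""
    ends_at = {addr + length: mnem for addr, (mnem, length, _) in instrs.items()}
    return [e for e in entries if e in ends_at and ends_at[e] not in TERMINATORS]
```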

Here is another partially-labeled mockup of a more complex function.

The main structure of the flowchart can be constructed using the start/end addresses as input, and labels/comments/subgraphs could be defined by the user.
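Building the flowchart skeleton from a start/end address pair is essentially the classic basic-block ("leaders") split. A Python sketch, again assuming a hypothetical pre-decoded `{address: (mnemonic, length, target)}` dict: a new block starts at the start address, at every in-range branch/jump target, and right after every branch, JMP, RTS or RTI; edges follow branch targets and fall-throughs. The blocks and edges are what a flowchart generator would draw as boxes and arrows, with user labels layered on top.

```python
BRANCHES = {"BPL", "BMI", "BVC", "BVS", "BCC", "BCS", "BNE", "BEQ"}
NO_FALLTHROUGH = {"JMP", "RTS", "RTI"}


def basic_blocks(instrs, start, end):
    """Split decoded instructions in [start, end] into basic blocks.
    Returns (blocks, edges): blocks maps leader address -> instruction addresses,
    edges is a list of (from_leader, to_leader) pairs."""
    addrs = sorted(a for a in instrs if start <= a <= end)
    leaders = {start}
    for a in addrs:
        mnem, length, target = instrs[a]
        if mnem in BRANCHES or mnem in NO_FALLTHROUGH:
            if target is not None and start <= target <= end:
                leaders.add(target)  # a jump target starts a block
            if a + length <= end:
                leaders.add(a + length)  # so does the instruction after a jump
    blocks, current = {}, None
    for a in addrs:
        if a in leaders:
            current = a
            blocks[current] = []
        blocks[current].append(a)
    edges = []
    for leader, body in blocks.items():
        mnem, length, target = instrs[body[-1]]
        if target is not None and target in blocks and (mnem in BRANCHES or mnem == "JMP"):
            edges.append((leader, target))
        nxt = body[-1] + length
        if mnem not in NO_FALLTHROUGH and nxt in blocks:
            edges.append((leader, nxt))  # fall-through edge
    return blocks, edges
```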
Color-coded text would also help to differentiate things, though this doesn't seem possible purely within yEd (it would probably require exporting the graph to SVG and altering it from there). The same goes for additional features like hyperlinks on function calls or extra data shown on hover (the Mesen debugger is an excellent example of these features).
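Post-processing an exported SVG for color coding, call hyperlinks and hover text might look like this Python sketch using `xml.etree.ElementTree`. It wraps each `<text>` element containing `JSR $xxxx` or `JMP $xxxx` in an SVG `<a>` link to a hypothetical `#sub_XXXX` anchor, colors it, and adds a `<title>` child, which browsers show as a tooltip. How yEd actually structures its SVG export (for example, text split into `<tspan>` elements or converted to paths) is not known here, so the matching is an assumption; the colors and anchor naming are arbitrary.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without ns0: prefixes
CALL_RE = re.compile(r"\b(JSR|JMP)\s+\$([0-9A-Fa-f]{4})\b")
COLORS = {"JSR": "#1f5fbf", "JMP": "#b03030"}  # arbitrary example colors


def decorate_svg(svg_text, descriptions):
    """Color call/jump lines, link them to a per-target anchor and add hover text.
    descriptions maps target address (int) -> tooltip string."""
    root = ET.fromstring(svg_text)
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for text_el in list(root.iter(f"{{{SVG_NS}}}text")):
        m = CALL_RE.search(text_el.text or "")
        if not m:
            continue
        op, target = m.group(1), int(m.group(2), 16)
        text_el.set("fill", COLORS[op])
        link = ET.Element(f"{{{SVG_NS}}}a", {"href": f"#sub_{target:04X}"})
        if target in descriptions:
            ET.SubElement(link, f"{{{SVG_NS}}}title").text = descriptions[target]
        # Replace the <text> with <a><title/><text/></a> at the same position.
        parent = parent_of[text_el]
        idx = list(parent).index(text_el)
        parent.remove(text_el)
        link.append(text_el)
        parent.insert(idx, link)
    return ET.tostring(root, encoding="unicode")
```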
« Last Edit: June 04, 2019, 03:57:55 pm by Dugongue »

abw

  • Hero Member
  • Posts: 592
Re: Extracting assembly as diagrams/tables
« Reply #8 on: June 06, 2019, 11:15:15 pm »
Yup, that's one issue. Given a block of code, you could eventually trace all of its entry points backwards until they reach a common point and call that or its nearest entry point the start of the function, but that gets tricky in the general case too. I really like the visualization aspect of this, but I'm actually more interested in what's going on behind the diagrams.
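The backwards-tracing idea above can be sketched as a nearest-common-ancestor search over the predecessor graph: walk back from each entry point with a breadth-first search and pick the shared node whose worst-case distance is smallest. This is a simplification under stated assumptions (a hypothetical `preds` dict mapping each block to the blocks that reach it directly, and made-up function names); a more robust tool would probably want proper dominator analysis instead.

```python
from collections import deque


def ancestors(preds, node):
    """Backward BFS: map every node that can reach `node` to its distance from it."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for p in preds.get(cur, ()):
            if p not in dist:  # also stops loops from running forever
                dist[p] = dist[cur] + 1
                queue.append(p)
    return dist


def common_start(preds, entries):
    """Nearest node that every entry point can be traced back to, or None."""
    maps = [ancestors(preds, e) for e in entries]
    common = set(maps[0]).intersection(*maps[1:])
    if not common:
        return None
    # Prefer the node closest to all entries; break ties by lowest address.
    return min(common, key=lambda n: (max(m[n] for m in maps), n))
```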

One of my pet projects has been a NES disassembler that leverages a CDL (Code/Data Logger) file to provide code/data separation (with many caveats), detects and labels intra-bank absolute and indexed control flow and data load targets, and uses a separate file to augment that with various other information, such as assumptions for unknown bytes, targeting info for inter-bank operations and indirect jump/pointer tables, and comments for blocks and lines of code/data. The end goal is a fully labelled and commented, re-assemblable disassembly. Most of the time it works out pretty well, but there are some annoying edge cases to deal with.
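For readers unfamiliar with CDL-based separation, a minimal Python sketch follows; it is not abw's code. It assumes the FCEUX PRG flag layout as commonly documented (bit 0 = accessed as code, bit 1 = accessed as data, bit 5 = accessed indirectly as data), which is worth double-checking against the FCEUX Code/Data Logger help, and that the PRG portion has already been sliced off the front of the .cdl file. Bytes flagged as both code and data are treated as code, and never-touched bytes come out as `unknown`, giving transition comments similar to the `; code -> data` lines in the listings in the next post.

```python
CODE = 0x01           # bit 0: executed as code
DATA = 0x02           # bit 1: read as data
INDIRECT_DATA = 0x20  # bit 5: read indirectly as data, e.g. via LDA ($nn),Y


def classify(flags):
    """Classify one PRG CDL byte; bytes logged as both code and data count as code."""
    if flags & CODE:
        return "code"
    if flags & DATA:
        return "data"
    return "unknown"


def runs(cdl_prg):
    """Group consecutive PRG offsets with the same class: [(first, last, kind), ...]."""
    result = []
    for off, flags in enumerate(cdl_prg):
        kind = classify(flags)
        if result and result[-1][2] == kind:
            result[-1] = (result[-1][0], off, kind)
        else:
            result.append((off, off, kind))
    return result


def transitions(cdl_prg):
    """Map each run boundary's PRG offset to a '; from -> to' comment."""
    r = runs(cdl_prg)
    return {b[0]: f"; {a[2]} -> {b[2]}" for a, b in zip(r, r[1:])}


def indirect_data_targets(cdl_prg):
    """PRG offsets flagged as indirectly accessed data."""
    return [off for off, flags in enumerate(cdl_prg) if flags & INDIRECT_DATA]
```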

Anyway, it sounds like you're making progress - keep us updated :).

Dugongue

  • Jr. Member
  • Posts: 6
Re: Extracting assembly as diagrams/tables
« Reply #9 on: June 07, 2019, 10:13:12 pm »
Do you have any examples of the output of that?

abw

  • Hero Member
  • Posts: 592
Re: Extracting assembly as diagrams/tables
« Reply #10 on: June 09, 2019, 11:50:38 pm »
The whole thing is still very much a work in progress (in particular, I haven't gotten around to actually putting labels in the output yet), but here's an example of what I'm playing with at the moment, taken from Dragon Warrior II (U) [!].nes.

Version 1: this is the basic output, leveraging FCEUX's CDL file. In addition to the stuff you usually get from a disassembler, I've included the ROM address in the output, and this section highlights control flow and indirect data load targets as well as code/data/unknown transitions. For unknown bytes, I supply the hypothetical disassembly as inline comments to help decide whether they could plausibly be code. Note the calls to $FA2E: they're always followed by 1 byte of data, which would tend to mess up most disassemblers. DW2 has several functions like that; I've come across about a dozen so far, some of them taking up to 4 bytes of inline data read by manipulating the stack.
Code:
; control flow target (from $9B88)
0x019BB6|$06:$9BA6:A9 5B    LDA #$5B
0x019BB8|$06:$9BA8:20 69 A3 JSR $A369
0x019BBB|$06:$9BAB:C9 08    CMP #$08
0x019BBD|$06:$9BAD:B0 0B    BCS $9BBA
; call to code in a different bank ($0F:$FA2E)
0x019BBF|$06:$9BAF:20 2E FA JSR $FA2E

; code -> data
; indirect data load target
0x019BC2|$06:$9BB2:05

; data -> code
; call to code in a different bank ($0F:$FA2E)
0x019BC3|$06:$9BB3:20 2E FA JSR $FA2E

; code -> data
; indirect data load target
0x019BC6|$06:$9BB6:4B

; data -> code
0x019BC7|$06:$9BB7:4C 48 95 JMP $9548

; control flow target (from $9BAD)
; call to code in a different bank ($0F:$FA2E)
0x019BCA|$06:$9BBA:20 2E FA JSR $FA2E

; code -> data
; indirect data load target
0x019BCD|$06:$9BBD:4C

; data -> code
; call to code in a different bank ($0F:$C3AB)
0x019BCE|$06:$9BBE:20 AB C3 JSR $C3AB
0x019BD1|$06:$9BC1:A5 32    LDA $32
0x019BD3|$06:$9BC3:30 2D    BMI $9BF2

; code -> unknown
0x019BD5|$06:$9BC5:20 4F 9D ; JSR $9D4F
0x019BD6|$06:$9BC6:4F      ; INVALID OPCODE
0x019BD7|$06:$9BC7:9D 18 69 ; STA $6918,X
0x019BD8|$06:$9BC8:18      ; CLC
0x019BD9|$06:$9BC9:69 0E    ; ADC #$0E
0x019BDA|$06:$9BCA:0E AA BD ; ASL $BDAA
0x019BDB|$06:$9BCB:AA      ; TAX
0x019BDC|$06:$9BCC:BD 2D 06 ; LDA $062D,X
0x019BDD|$06:$9BCD:2D 06 85 ; AND $8506
0x019BDE|$06:$9BCE:06 85    ; ASL $85
0x019BDF|$06:$9BCF:85 8F    ; STA $8F
0x019BE0|$06:$9BD0:8F      ; INVALID OPCODE
0x019BE1|$06:$9BD1:BD 2E 06 ; LDA $062E,X
0x019BE2|$06:$9BD2:2E 06 85 ; ROL $8506
0x019BE3|$06:$9BD3:06 85    ; ASL $85
0x019BE4|$06:$9BD4:85 90    ; STA $90
0x019BE5|$06:$9BD5:90 46    ; BCC $9C1D
0x019BE6|$06:$9BD6:46 90    ; LSR $90
0x019BE7|$06:$9BD7:90 66    ; BCC $9C3F
0x019BE8|$06:$9BD8:66 8F    ; ROR $8F
0x019BE9|$06:$9BD9:8F      ; INVALID OPCODE
0x019BEA|$06:$9BDA:BD 2D 06 ; LDA $062D,X
0x019BEB|$06:$9BDB:2D 06 38 ; AND $3806
0x019BEC|$06:$9BDC:06 38    ; ASL $38
0x019BED|$06:$9BDD:38      ; SEC
0x019BEE|$06:$9BDE:E5 8F    ; SBC $8F
0x019BEF|$06:$9BDF:8F      ; INVALID OPCODE
0x019BF0|$06:$9BE0:9D 2D 06 ; STA $062D,X
0x019BF1|$06:$9BE1:2D 06 BD ; AND $BD06
0x019BF2|$06:$9BE2:06 BD    ; ASL $BD
0x019BF3|$06:$9BE3:BD 2E 06 ; LDA $062E,X
0x019BF4|$06:$9BE4:2E 06 E5 ; ROL $E506
0x019BF5|$06:$9BE5:06 E5    ; ASL $E5
0x019BF6|$06:$9BE6:E5 90    ; SBC $90
0x019BF7|$06:$9BE7:90 9D    ; BCC $9B86
0x019BF8|$06:$9BE8:9D 2E 06 ; STA $062E,X
0x019BF9|$06:$9BE9:2E 06 20 ; ROL $2006
0x019BFA|$06:$9BEA:06 20    ; ASL $20
0x019BFB|$06:$9BEB:20 2A FA ; JSR $FA2A
0x019BFC|$06:$9BEC:2A      ; ROL
0x019BFD|$06:$9BED:FA      ; INVALID OPCODE
0x019BFE|$06:$9BEE:0C      ; INVALID OPCODE
0x019BFF|$06:$9BEF:4C 48 95 ; JMP $9548
0x019C00|$06:$9BF0:48      ; PHA
0x019C01|$06:$9BF1:95 20    ; STA $20,X

; unknown -> code
; control flow target (from $9BC3)
0x019C02|$06:$9BF2:20 4F 9D JSR $9D4F
0x019C05|$06:$9BF5:AA      TAX
0x019C06|$06:$9BF6:A9 20    LDA #$20
0x019C08|$06:$9BF8:1D 2D 06 ORA $062D,X
0x019C0B|$06:$9BFB:9D 2D 06 STA $062D,X
; call to code in a different bank ($0F:$FA2E)
0x019C0E|$06:$9BFE:20 2E FA JSR $FA2E

; code -> data
; indirect data load target
0x019C11|$06:$9C01:4D

; data -> code
0x019C12|$06:$9C02:4C 48 95 JMP $9548
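The inline-argument problem noted above ($FA2E always followed by one data byte) is easy to handle once such subroutines are known: after decoding a JSR to one of them, emit the next N bytes as data and resume decoding afterwards. Below is a Python sketch of that which also reproduces the `0xOFFSET|$BB:$ADDR:` prefix, assuming a 16-byte iNES header and 16 KiB banks with bank $06 mapped at $8000 (which matches the offsets in the listing). The opcode table covers only the handful of opcodes needed here, and `INLINE_ARGS` is a hypothetical stand-in for what would come from an annotation file.

```python
# Only the opcodes this demo needs; a real disassembler needs the full 6502 table.
OPCODES = {0x20: ("JSR", "abs"), 0x4C: ("JMP", "abs"), 0xA9: ("LDA", "imm"), 0x60: ("RTS", "impl")}
SIZES = {"abs": 3, "imm": 2, "impl": 1}
# Hypothetical: subroutines known to read N inline bytes following their JSR.
INLINE_ARGS = {0xFA2E: 1, 0xFA2A: 1}


def rom_offset(bank, addr, window=0x8000, header=16, bank_size=0x4000):
    """File offset of bank:addr, assuming 16 KiB banks and a 16-byte iNES header."""
    return header + bank * bank_size + (addr - window)


def sweep(rom, bank, start, end, window=0x8000):
    """Linear sweep over bank:start..end that skips inline arguments of known subroutines."""
    lines, addr = [], start
    while addr <= end:
        off = rom_offset(bank, addr, window)
        mnem, mode = OPCODES[rom[off]]
        raw = rom[off:off + SIZES[mode]]
        target = None
        if mode == "abs":
            target = raw[1] | (raw[2] << 8)  # little-endian operand
            operand = f" ${target:04X}"
        elif mode == "imm":
            operand = f" #${raw[1]:02X}"
        else:
            operand = ""
        hexbytes = " ".join(f"{b:02X}" for b in raw)
        lines.append(f"0x{off:06X}|${bank:02X}:${addr:04X}:{hexbytes:<9}{mnem}{operand}")
        addr += len(raw)
        if mnem == "JSR" and target in INLINE_ARGS:
            # Emit the inline argument bytes as data, then keep decoding after them.
            for _ in range(INLINE_ARGS[target]):
                off = rom_offset(bank, addr, window)
                lines.append(f"0x{off:06X}|${bank:02X}:${addr:04X}:{rom[off]:02X} ; inline data")
                addr += 1
    return lines
```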

Version 2: this is the output after incorporating assumptions about the unknown bytes, subroutine/RAM address descriptions, and inline comments from a second file. It's still far from perfect, but it's becoming good enough to be useful.
Code:
; if Midenhall has equipped the Armour of Erdrick, chest is empty, otherwise it's a trap with 50/50 chance for the party leader losing half their current HP or getting poisoned
; control flow target (from $9B88)
0x019BB6|$06:$9BA6:A9 5B    LDA #$5B ; Item ID #$5B: Armor of Erdrick (equipped)
0x019BB8|$06:$9BA8:20 69 A3 JSR $A369 ; check for item A in party inventory, returning inventory index of item in A/X if found, #$FF if not
0x019BBB|$06:$9BAB:C9 08    CMP #$08 ; only Midenhall can equip the Armour of Erdrick, so inventory index >= #$08 means he does not have it equipped
0x019BBD|$06:$9BAD:B0 0B    BCS $9BBA ; if you don't have it or it isn't equipped, the treasure chest is a trap :(
; call to code in a different bank ($0F:$FA2E)
0x019BBF|$06:$9BAF:20 2E FA JSR $FA2E ; display string ID specified by next byte + #$0100

; code -> data
; indirect data load target
0x019BC2|$06:$9BB2:05 ; String ID #$0105: Seeing a treasure chest, [name] opened it.[wait][end-FC]

; data -> code
; call to code in a different bank ($0F:$FA2E)
0x019BC3|$06:$9BB3:20 2E FA JSR $FA2E ; display string ID specified by next byte + #$0100

; code -> data
; indirect data load target
0x019BC6|$06:$9BB6:4B ; String ID #$014B: But it was empty.[end-FC]

; data -> code
0x019BC7|$06:$9BB7:4C 48 95 JMP $9548

; control flow target (from $9BAD)
; call to code in a different bank ($0F:$FA2E)
0x019BCA|$06:$9BBA:20 2E FA JSR $FA2E ; display string ID specified by next byte + #$0100

; code -> data
; indirect data load target
0x019BCD|$06:$9BBD:4C ; String ID #$014C: The treasure chest was a trap![end-FC]

; data -> code
; call to code in a different bank ($0F:$C3AB)
0x019BCE|$06:$9BBE:20 AB C3 JSR $C3AB ; generate a random number and store it in $32-$33 (two passes)
0x019BD1|$06:$9BC1:A5 32    LDA $32 ; RNG byte 0
0x019BD3|$06:$9BC3:30 2D    BMI $9BF2 ; 50% chance to branch and get poisoned or not branch and lose half your current HP
0x019BD5|$06:$9BC5:20 4F 9D JSR $9D4F ; given hero ID in $97, set A to the offset of that hero's data in $062D
0x019BD8|$06:$9BC8:18      CLC
0x019BD9|$06:$9BC9:69 0E    ADC #$0E ; offset for hero's current HP, low byte
0x019BDB|$06:$9BCB:AA      TAX
0x019BDC|$06:$9BCC:BD 2D 06 LDA $062D,X ; Midenhall status (80 = Alive, 40 = Sleep, 20 = Poison, 10 = ?, 08 = ?, 04 = In Party, 02 = ?, 01 = Silence)
0x019BDF|$06:$9BCF:85 8F    STA $8F ; hero's current HP, low byte
0x019BE1|$06:$9BD1:BD 2E 06 LDA $062E,X ; Midenhall ???
0x019BE4|$06:$9BD4:85 90    STA $90 ; hero's current HP, high byte
0x019BE6|$06:$9BD6:46 90    LSR $90 ; divide 16-bit current HP by 2 (round down)
0x019BE8|$06:$9BD8:66 8F    ROR $8F
0x019BEA|$06:$9BDA:BD 2D 06 LDA $062D,X ; Midenhall status (80 = Alive, 40 = Sleep, 20 = Poison, 10 = ?, 08 = ?, 04 = In Party, 02 = ?, 01 = Silence)
0x019BED|$06:$9BDD:38      SEC ; set 16-bit current HP to 1/2 current HP, rounded up
0x019BEE|$06:$9BDE:E5 8F    SBC $8F
0x019BF0|$06:$9BE0:9D 2D 06 STA $062D,X ; Midenhall status (80 = Alive, 40 = Sleep, 20 = Poison, 10 = ?, 08 = ?, 04 = In Party, 02 = ?, 01 = Silence)
0x019BF3|$06:$9BE3:BD 2E 06 LDA $062E,X ; Midenhall ???
0x019BF6|$06:$9BE6:E5 90    SBC $90
0x019BF8|$06:$9BE8:9D 2E 06 STA $062E,X ; Midenhall ???
; call to code in a different bank ($0F:$FA2A)
0x019BFB|$06:$9BEB:20 2A FA JSR $FA2A ; display string ID specified by next byte

; code -> data
0x019BFE|$06:$9BEE:0C ; String ID #$000C: [name]'s HP is reduced by [number].[end-FC]

; data -> code
0x019BFF|$06:$9BEF:4C 48 95 JMP $9548

; control flow target (from $9BC3)
0x019C02|$06:$9BF2:20 4F 9D JSR $9D4F ; given hero ID in $97, set A to the offset of that hero's data in $062D
0x019C05|$06:$9BF5:AA      TAX
0x019C06|$06:$9BF6:A9 20    LDA #$20 ; Poison
0x019C08|$06:$9BF8:1D 2D 06 ORA $062D,X ; Midenhall status (80 = Alive, 40 = Sleep, 20 = Poison, 10 = ?, 08 = ?, 04 = In Party, 02 = ?, 01 = Silence)
0x019C0B|$06:$9BFB:9D 2D 06 STA $062D,X ; Midenhall status (80 = Alive, 40 = Sleep, 20 = Poison, 10 = ?, 08 = ?, 04 = In Party, 02 = ?, 01 = Silence)
; call to code in a different bank ($0F:$FA2E)
0x019C0E|$06:$9BFE:20 2E FA JSR $FA2E ; display string ID specified by next byte + #$0100

; code -> data
; indirect data load target
0x019C11|$06:$9C01:4D ; String ID #$014D: The poison weakened [name].[end-FC]

; data -> code
0x019C12|$06:$9C02:4C 48 95 JMP $9548
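A sketch of how comments from a second file could be merged into a Version 1 style listing to get something like Version 2. abw's actual annotation format isn't shown in the thread, so the one-entry-per-line `<addr> <block|line> <text>` format and the function names are purely hypothetical; the CPU address is pulled out of each listing line's `|$BB:$ADDR:` prefix.

```python
import re

ADDR_RE = re.compile(r"\|\$[0-9A-F]{2}:\$([0-9A-F]{4}):")


def parse_annotations(text):
    """Hypothetical format, one per line: '<cpu addr hex> <block|line> <comment text>'."""
    block, line = {}, {}
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and '#' comment lines in the annotation file
        addr, kind, comment = raw.split(maxsplit=2)
        target = block if kind == "block" else line
        target.setdefault(int(addr, 16), []).append(comment)
    return block, line


def merge(listing_lines, block, line):
    """Insert block comments above matching lines and append line comments to them."""
    out = []
    for text in listing_lines:
        m = ADDR_RE.search(text)
        addr = int(m.group(1), 16) if m else None
        out.extend(f"; {c}" for c in block.get(addr, []))
        extra = line.get(addr)
        out.append(f"{text} ; {' / '.join(extra)}" if extra else text)
    return out
```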