
Author Topic: Code Naturalizer  (Read 26026 times)

UglyJoe

  • Hero Member
  • *****
  • Posts: 869
  • smoke and mirrors
    • View Profile
    • ximwix.net/xb
Re: Code Naturalizer
« Reply #20 on: September 01, 2010, 07:56:28 pm »
UPDATE: I just tried loading the entire code page for the Bomb Sweeper ROM. Firefox began to choke. I only have 512M memory so that might have had something to do with it. Does anyone else have problems?

It does work for me, but the CPU usage spikes to 100%.  Taking a closer look with Firebug shows me that the processing done by naturalizeCode is very fast and not a problem.  The CPU usage cranks up once the script dumps the naturalized code into the textarea.  The reason for this, I suspect, is that the textarea form element is not really intended to hold 1.2 megabytes of text :P

A better solution is to fill a "pre" element with the output.  That is, add this to the bottom of the page:

Code: [Select]
<pre id="fillit"></pre>
And then, at the end of naturalizeCode, do this:

Code: [Select]
document.getElementById('fillit').innerHTML = output;
instead of this:

Code: [Select]
ReadoutBox.value = output;
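
One caveat with innerHTML: any "<" or "&" in the output is parsed as markup rather than shown as text. A minimal sketch of escaping the text first, assuming the naturalized output may contain such characters. escapeHtml and fillPre are hypothetical helper names, not part of the naturalizer:

```javascript
// Hypothetical helper: escape the characters that innerHTML would parse as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Fill the pre element, but only when running in a page that actually has it.
function fillPre(output) {
  if (typeof document === 'undefined') return false;
  var pre = document.getElementById('fillit');
  if (!pre) return false;
  pre.innerHTML = escapeHtml(output);
  return true;
}
```

With this, the last line of naturalizeCode would call fillPre(output) instead of assigning innerHTML directly.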
« Last Edit: September 01, 2010, 08:01:58 pm by UglyJoe »

tcaudilllg

  • Sr. Member
  • ****
  • Posts: 431
    • View Profile
Re: Code Naturalizer
« Reply #21 on: September 01, 2010, 08:24:10 pm »
Yeah I think you're right.

However, I think this complicates matters greatly, because now users have to scroll the entire page to read the code. Although there are workarounds, they aren't easily implemented.

UPDATE #1: I put the code at the bottom, like you suggested, UglyJoe.

UPDATE #2: OK, I'm realizing a bunch of complications with the design. For one, the naturalizer isn't distinguishing between reads and writes when it labels addresses.

Second, I'm beginning to see the full implications of "mapper hell". Locations in ROM have precise offsets, but the program only refers to these in the context of pages. Somehow I must simulate the mapper's bank switching.
« Last Edit: September 02, 2010, 07:09:16 am by tcaudilllg »

UglyJoe

  • Hero Member
  • *****
  • Posts: 869
  • smoke and mirrors
    • View Profile
    • ximwix.net/xb
Re: Code Naturalizer
« Reply #22 on: September 02, 2010, 08:26:44 pm »
However, I think this complicates matters greatly, because now users have to scroll the entire page to read the code. Although there are workarounds, they aren't easily implemented.

If I'm understanding you correctly, it's actually very easily implemented.  Just use CSS to make the pre element act more like a textarea.

Ditch the pre element at the bottom of the page.  Remove the textarea (ReadoutBox) altogether and, in its place, drop in this:

Code: [Select]
<pre style="width: 600px; height: 150px; overflow: auto; border: 1px solid black; float: left;" id="fillit"></pre>
That'll give you a scrolling pre element.  Tweak the height/width/border to your liking.

tcaudilllg

  • Sr. Member
  • ****
  • Posts: 431
    • View Profile
Re: Code Naturalizer
« Reply #23 on: September 03, 2010, 04:52:53 am »
There's a complication, though. See, I already put it at the bottom and used the top space to make room for the documentation function. The user can load and save label/address correspondences by typing them in a textbox.

Ah I think I've got an idea. I'll use the display property to hide the boxes when not in use, and use a switch up top to switch between them.
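
A minimal sketch of that switch, assuming each box is an element (or any object with a style property) and that the names used as keys are made up for illustration. showOnly is a hypothetical function, not something already in the page:

```javascript
// Show exactly one panel and hide the others by toggling style.display.
// `panels` maps a hypothetical panel name to an element.
function showOnly(panels, activeName) {
  for (var name in panels) {
    if (Object.prototype.hasOwnProperty.call(panels, name)) {
      panels[name].style.display = (name === activeName) ? 'block' : 'none';
    }
  }
}
```

In the page, the switch up top would call something like showOnly({ readout: readoutElement, docs: docsElement }, 'docs'), where both elements come from document.getElementById.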

UPDATE:
I've made a decision about the line-based user doc system. It won't be implemented; it's best to document only addresses that have a specific, stable purpose, and determining that is the user's responsibility. I may implement the line-based doc system as an extension of the non-line-based system at some point in the future. There will obviously be a need for mappers, the implementation of which is my focus for the next version.

I aim to make a list of mappers to be tested and read in when the program loads. This file will be in the same directory as the program. If not present, it won't be used.

UglyJoe, more people should know about the power of CSS scrollbars.
« Last Edit: September 13, 2010, 02:59:42 pm by tcaudilllg »

tcaudilllg

  • Sr. Member
  • ****
  • Posts: 431
    • View Profile
Re: Code Naturalizer
« Reply #24 on: October 07, 2010, 02:51:40 am »
Adding basic mapper support. The naturalizer can now determine which ROM address an indirect or absolute address is referring to. (This only applies when the address is greater than 32,767 (0x7FFF).) Coming feature: naturalize by bank.
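
The general idea can be sketched as follows. This is not the naturalizer's actual routine: it assumes 16 KiB program banks, a known current bank number, and offsets counted from the start of the program data (no file header). PRG_BANK_SIZE and cpuToPrgOffset are hypothetical names:

```javascript
var PRG_BANK_SIZE = 0x4000; // assumption: 16 KiB program banks

// Map a CPU address to an offset inside the program data, given the bank
// assumed to be switched in. Addresses at or below 0x7FFF are not ROM here.
function cpuToPrgOffset(cpuAddress, bank) {
  if (cpuAddress <= 0x7FFF || cpuAddress > 0xFFFF) return null;
  return bank * PRG_BANK_SIZE + (cpuAddress & (PRG_BANK_SIZE - 1));
}
```

Under those assumptions cpuToPrgOffset(0xC690, 0) gives 0x690. Real mappers differ in bank size and in which address window each bank occupies, so a per-mapper table would have to replace the single constant.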

(The bank feature was inspired by Silenthal's GBRead).

User doc support is not perfect because it doesn't distinguish between writes and reads yet. (although I don't see how that could be a problem)

UPDATE:
I experimented with decompilation. Only two commands were involved, LDA and STA. LDA is omitted from the readout, and STA is replaced with an assignment operator.

The non-decompiled naturalization of BombSweeper's first 100 program bytes.
Code: [Select]
1621|$655: Disable decimal mode
1622|$656: Load value at 0 into X-index
1624|$658: Move X-index to address 8192 / 2000 [Picture Processor Control 1]
1627|$65B: Move X-index to address 8193 / 2001 [Picture Processor Control 2]
1630|$65E: X-index - 1
1631|$65F: Set stack pointer address to Index-X
1632|$660: Load value at 8194 / 2002 [Picture Processor Status] into Accumulator
1635|$663: Jump to instruction at address 1384 if last operation returned positive
1637|$665: Load value at 8194 / 2002 [Picture Processor Status] into Accumulator
1640|$668: Jump to instruction at address 1389 if last operation returned positive
1642|$66A: Load value at 192 into Accumulator
1644|$66C: Move Accumulator to address 16407 / 4017 [Control Pad 2 / Expansion Slot]
1647|$66F: Call subroutine at address 51813 / ca65 [ROM address 51813, bank 0] (program counter address saved to stack)
1650|$672: Call subroutine at address 49472 / c140 [ROM address 49472, bank 0] (program counter address saved to stack)
1653|$675: Load value at 6 into Accumulator
1655|$677: Move Accumulator to address 3
1657|$679: Move Accumulator to address 8193 / 2001 [Picture Processor Control 2]
1660|$67C: Load value at 136 into Accumulator
1662|$67E: Move Accumulator to address 2
1664|$680: Move Accumulator to address 8192 / 2000 [Picture Processor Control 1]
1667|$683: Load value at 0 into Accumulator
1669|$685: Move Accumulator to address 5
1671|$687: Jump to instruction at address 50832 / c690 [ROM address 50832, bank 0]
1674|$68A: Call subroutine at address 51547 / c95b [ROM address 51547, bank 0] (program counter address saved to stack)
1677|$68D: Call subroutine at address 51868 / ca9c [ROM address 51868, bank 0] (program counter address saved to stack)
1680|$690: Call subroutine at address 51530 / c94a [ROM address 51530, bank 0] (program counter address saved to stack)
1683|$693: Jump to instruction at address 50826 / c68a [ROM address 50826, bank 0]
1686|$696: Move X-index to address 16
1688|$698: Move Y-index to address 17
1690|$69A: Shift the value in Accumulator left one bit
1691|$69B: Move Accumulator to X-index
1692|$69C: Load value at 56204 / db8c [ROM address 56204, bank 0] into Accumulator
1695|$69F: Move Accumulator to address 248
1697|$6A1: Load value at 56205 / db8d [ROM address 56205, bank 0] into Accumulator
1700|$6A4: Move Accumulator to address 249
1702|$6A6: Load value at 0 into Y-index
1704|$6A8: Load value at 248 into Accumulator
1706|$6AA: Y-index + 1
1707|$6AB: Move Accumulator to address 20
1709|$6AD: Load value at 19 into X-index
1711|$6AF: Load value at 248 into Accumulator
1713|$6B1: Y-index + 1
1714|$6B2: Set carry bit to 0
1715|$6B3: Add value at address 17 to Accumulator (+ 1 if carry set)
1717|$6B5: Move Accumulator to address 512
1720|$6B8: Load value at 248 into Accumulator

The decompiled version:
Code: [Select]
1621|$655: Disable decimal mode
1622|$656: Load value at 0 into X-index
1624|$658: Move X-index to address 8192 / 2000 [Picture Processor Control 1]
1627|$65B: Move X-index to address 8193 / 2001 [Picture Processor Control 2]
1630|$65E: X-index - 1
1631|$65F: Set stack pointer address to Index-X
1635|$663: Jump to instruction at address 1384 if last operation returned positive
1640|$668: Jump to instruction at address 1389 if last operation returned positive
1644|$66C: 16407 / 4017 [Control Pad 2 / Expansion Slot] = 192
1647|$66F: Call subroutine at address 51813 / ca65 [ROM address 51813, bank 0] (program counter address saved to stack)
1650|$672: Call subroutine at address 49472 / c140 [ROM address 49472, bank 0] (program counter address saved to stack)
1655|$677: 3 = 6
1657|$679: 8193 / 2001 [Picture Processor Control 2] = 6
1662|$67E: 2 = 136
1664|$680: 8192 / 2000 [Picture Processor Control 1] = 136
1669|$685: 5 = 0
1671|$687: Jump to instruction at address 50832 / c690 [ROM address 50832, bank 0]
1674|$68A: Call subroutine at address 51547 / c95b [ROM address 51547, bank 0] (program counter address saved to stack)
1677|$68D: Call subroutine at address 51868 / ca9c [ROM address 51868, bank 0] (program counter address saved to stack)
1680|$690: Call subroutine at address 51530 / c94a [ROM address 51530, bank 0] (program counter address saved to stack)
1683|$693: Jump to instruction at address 50826 / c68a [ROM address 50826, bank 0]
1686|$696: Move X-index to address 16
1688|$698: Move Y-index to address 17
1690|$69A: Shift the value in Accumulator left one bit
1691|$69B: Move Accumulator to X-index
1695|$69F: 248 = 56204 / db8c [ROM address 56204, bank 0]
1700|$6A4: 249 = 56205 / db8d [ROM address 56205, bank 0]
1702|$6A6: Load value at 0 into Y-index
1706|$6AA: Y-index + 1
1707|$6AB: 20 = 248
1709|$6AD: Load value at 19 into X-index
1713|$6B1: Y-index + 1
1714|$6B2: Set carry bit to 0
1715|$6B3: Add value at address 17 to Accumulator (+ 1 if carry set)
1717|$6B5: 512 = 248

Clearly there are some bugs to work out. The decompiler doesn't take into account the accumulator's addressing mode, leading to some confusion over what values are actually being used. It would probably be better to include the LDAs so as to prevent confusion when branching instructions read the accumulator.
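
A related thing that may be worth checking is how the branch targets are computed. On the 6502, a relative branch lands at the address of the next instruction (branch address + 2) plus the operand read as a signed 8-bit value. A minimal sketch, assuming the operand byte of the two branches above is 251 ($FB), which is only inferred from 1635 - 1384 = 251 in the listing. branchTarget is a hypothetical name:

```javascript
// 6502 relative branch: target = (branch address + 2) + signed 8-bit operand.
function branchTarget(branchAddress, operandByte) {
  // Bytes 0x80..0xFF represent negative offsets -128..-1.
  var offset = operandByte < 0x80 ? operandByte : operandByte - 0x100;
  return branchAddress + 2 + offset;
}
```

Under that assumption the branches at 1635 and 1640 would target 1632 and 1637, the status loads just before them, rather than 1384 and 1389.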

I want to emphasize that the point of the decompiler is not to port the code, but to make it more readable and less intimidating.
« Last Edit: October 13, 2010, 09:24:09 pm by tcaudilllg »

tcaudilllg

  • Sr. Member
  • ****
  • Posts: 431
    • View Profile
Re: Code Naturalizer
« Reply #25 on: October 14, 2010, 09:32:29 am »
I completed the decompiler (mostly, no memory model yet) and distinguished between addresses and immediate operands. As such, this tool may be ready for regular use (I would like your opinion on whether that is true or not). The mode distinguishing function also enabled me to clean up the naturalizer's output a little.
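
For illustration, the mode distinction can be sketched as a lookup from opcode to addressing mode. The opcode values are standard 6502 ones, but the table is only a small excerpt, and OPCODES and formatOperand are hypothetical names rather than the naturalizer's own:

```javascript
// Excerpt of the 6502 opcode table: mnemonic plus addressing mode.
var OPCODES = {
  0xA9: { mnemonic: 'LDA', mode: 'immediate' },
  0xA5: { mnemonic: 'LDA', mode: 'zeropage' },
  0xAD: { mnemonic: 'LDA', mode: 'absolute' },
  0xA2: { mnemonic: 'LDX', mode: 'immediate' },
  0xA6: { mnemonic: 'LDX', mode: 'zeropage' },
  0x85: { mnemonic: 'STA', mode: 'zeropage' },
  0x8D: { mnemonic: 'STA', mode: 'absolute' }
};

// Immediate operands print as plain numbers; anything else is an address
// and gets the @ prefix. Unknown opcodes return null.
function formatOperand(opcode, operand) {
  var entry = OPCODES[opcode];
  if (!entry) return null;
  return entry.mode === 'immediate' ? String(operand) : '@' + operand;
}
```

So an LDA with opcode 0xA9 and operand 192 prints as 192, while an LDX with opcode 0xA6 and operand 19 prints as @19.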

Here's the naturalized output for Bombsweeper:
Code: [Select]
1621|$655: Disable decimal mode
1622|$656: Load 0 into X-index
1624|$658: Move X-index to @8192 / 2000 [Picture Processor Control 1]
1627|$65B: Move X-index to @8193 / 2001 [Picture Processor Control 2]
1630|$65E: X-index - 1
1631|$65F: Set stack pointer address to Index-X
1632|$660: Load @8194 / 2002 [Picture Processor Status] into Accumulator
1635|$663: Jump to instruction at @1384 if last operation returned positive
1637|$665: Load @8194 / 2002 [Picture Processor Status] into Accumulator
1640|$668: Jump to instruction at @1389 if last operation returned positive
1642|$66A: Load 192 into Accumulator
1644|$66C: Move Accumulator to @16407 / 4017 [Control Pad 2 / Expansion Slot]
1647|$66F: Call subroutine at @51813 / ca65 [ROM address 51813, bank 0] (program counter address saved to stack)
1650|$672: Call subroutine at @49472 / c140 [ROM address 49472, bank 0] (program counter address saved to stack)
1653|$675: Load 6 into Accumulator
1655|$677: Move Accumulator to @3
1657|$679: Move Accumulator to @8193 / 2001 [Picture Processor Control 2]
1660|$67C: Load 136 into Accumulator
1662|$67E: Move Accumulator to @2
1664|$680: Move Accumulator to @8192 / 2000 [Picture Processor Control 1]
1667|$683: Load 0 into Accumulator
1669|$685: Move Accumulator to @5
1671|$687: Jump to instruction at @50832 / c690 [ROM address 50832, bank 0]
1674|$68A: Call subroutine at @51547 / c95b [ROM address 51547, bank 0] (program counter address saved to stack)
1677|$68D: Call subroutine at @51868 / ca9c [ROM address 51868, bank 0] (program counter address saved to stack)
1680|$690: Call subroutine at @51530 / c94a [ROM address 51530, bank 0] (program counter address saved to stack)
1683|$693: Jump to instruction at @50826 / c68a [ROM address 50826, bank 0]
1686|$696: Move X-index to @16
1688|$698: Move Y-index to @17
1690|$69A: Shift Accumulator left one bit
1691|$69B: Move Accumulator to X-index
1692|$69C: Load @56204 / db8c [ROM address 56204, bank 0] into Accumulator
1695|$69F: Move Accumulator to @248
1697|$6A1: Load @56205 / db8d [ROM address 56205, bank 0] into Accumulator
1700|$6A4: Move Accumulator to @249
1702|$6A6: Load 0 into Y-index
1704|$6A8: Load @248 into Accumulator
1706|$6AA: Y-index + 1
1707|$6AB: Move Accumulator to @20
1709|$6AD: Load @19 into X-index
1711|$6AF: Load @248 into Accumulator
1713|$6B1: Y-index + 1
1714|$6B2: Set carry bit to 0
1715|$6B3: Add @17 to Accumulator (+ 1 if carry set)
1717|$6B5: Move Accumulator to @512
1720|$6B8: Load @248 into Accumulator

The decompiled version:
Code: [Select]
1621|$655: Decimals OFF
1624|$658: @8192 / 2000 [Picture Processor Control 1] = X (0)
1627|$65B: @8193 / 2001 [Picture Processor Control 2] = X (0)
1630|$65E: X (0) - 1
1631|$65F: Set stack pointer address to Index-X
1635|$663: if Accumulator (@8194 / 2002 [Picture Processor Status]) >= 0 then goto @1384
1640|$668: if Accumulator (@8194 / 2002 [Picture Processor Status]) >= 0 then goto @1389
1644|$66C: @16407 / 4017 [Control Pad 2 / Expansion Slot] = Accumulator (192)
1647|$66F: Call @51813 / ca65 [ROM address 51813, bank 0]
1650|$672: Call @49472 / c140 [ROM address 49472, bank 0]
1655|$677: @3 = Accumulator (6)
1657|$679: @8193 / 2001 [Picture Processor Control 2] = Accumulator (6)
1662|$67E: @2 = Accumulator (136)
1664|$680: @8192 / 2000 [Picture Processor Control 1] = Accumulator (136)
1669|$685: @5 = Accumulator (0)
1671|$687: Goto @50832 / c690 [ROM address 50832, bank 0]
1674|$68A: Call @51547 / c95b [ROM address 51547, bank 0]
1677|$68D: Call @51868 / ca9c [ROM address 51868, bank 0]
1680|$690: Call @51530 / c94a [ROM address 51530, bank 0]
1683|$693: Goto @50826 / c68a [ROM address 50826, bank 0]
1686|$696: @16 = X (0)
1688|$698: @17 = Y (0)
1690|$69A: Shift Accumulator left one bit
1691|$69B: X = Accumulator (0)
1695|$69F: @248 = Accumulator (@56204 / db8c [ROM address 56204, bank 0])
1700|$6A4: @249 = Accumulator (@56205 / db8d [ROM address 56205, bank 0])
1706|$6AA: Y (0) + 1
1707|$6AB: @20 = Accumulator (@248)
1713|$6B1: Y (0) + 1
1714|$6B2: RESET Carry
1715|$6B3: Add @17 to Accumulator (+ 1 if carry set)
1717|$6B5: @512 = Accumulator (@248)

I removed output for all the loading instructions. Instead, the values of the registers are referenced after the registers in parentheses when operations involving them are invoked.
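
A minimal sketch of that bookkeeping, assuming a made-up instruction record of the form { op, immediate, value } and handling only loads and stores. decompile is a hypothetical function, not the actual routine:

```javascript
// Loads produce no output; they only update a description of what each
// register holds. Stores print that description in parentheses.
function decompile(instructions) {
  var regs = { A: '?', X: '?', Y: '?' }; // '?' means not yet known
  var names = { A: 'Accumulator', X: 'X', Y: 'Y' };
  var lines = [];
  instructions.forEach(function (ins) {
    var reg = ins.op.charAt(2); // 'LDA' -> 'A', 'STX' -> 'X'
    var operand = ins.immediate ? String(ins.value) : '@' + ins.value;
    if (ins.op.slice(0, 2) === 'LD') {
      regs[reg] = operand;
    } else if (ins.op.slice(0, 2) === 'ST') {
      lines.push(operand + ' = ' + names[reg] + ' (' + regs[reg] + ')');
    }
  });
  return lines;
}
```

Feeding it a load of the immediate value 6 followed by a store to address 3 yields "@3 = Accumulator (6)". A register that is stored before anything was loaded shows up as "?".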

tcaudilllg

  • Sr. Member
  • ****
  • Posts: 431
    • View Profile
Re: Code Naturalizer
« Reply #26 on: October 24, 2010, 09:19:55 pm »
So what are the opinions on this project?

Valendian

  • Jr. Member
  • **
  • Posts: 68
    • View Profile
Re: Code Naturalizer
« Reply #27 on: October 24, 2010, 10:10:50 pm »
So is this gonna be a tool that generates the type of comments an amateur asm coder would write or will it actually be more of a decompiler?

If it's gonna be a decompiler I'd suggest you forget about the natural language interface and just focus on decompilation. The ideal user of such a tool would be someone well versed in coding asm and some form of HLL. Cater to this person's needs, let the person who's only learning asm pick things up in their own good time.

The difference is a tool that looks at a push opcode and says "a value is being pushed on top of the stack" rather than a tool that analyzes further and finds that this is a local variable that is kept on the stack and it appears to be an unsigned int, or a pointer to a structure which is made up of 5 ints and 2 chars. It's this kind of feedback that is needed IMO.

http://www.backerstreet.com/decompiler/introduction.htm

tcaudilllg

  • Sr. Member
  • ****
  • Posts: 431
    • View Profile
Re: Code Naturalizer
« Reply #28 on: October 24, 2010, 10:48:16 pm »
Well the naturalizer is pretty well finished. It uses most of the same routines as the decompiler so improvements to the decompiler relevant to the naturalizer are also improvements to the naturalizer.

I hadn't thought about figuring out the types of the data.

Honestly, seeing code output by the naturalizer makes the program more understandable. I'd rather see the naturalized code than a simple disassembly, but I'd rather see the decompiled code than either. Seeing the naturalized version did assist in the development of the decompiler, however.

Right now I'm trying to figure out a routine that distinguishes between writes and reads. (Memory writes are, of course, the means by which mapper functions are invoked.)
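
One way to approach it is to classify by mnemonic and addressing mode. A minimal sketch under that assumption; the mode strings and accessKind are hypothetical names, and the mnemonic lists cover only the standard 6502 instructions:

```javascript
var STORES = ['STA', 'STX', 'STY'];
var READ_MODIFY_WRITE = ['INC', 'DEC', 'ASL', 'LSR', 'ROL', 'ROR'];
var CONTROL_FLOW = ['JMP', 'JSR', 'BPL', 'BMI', 'BVC', 'BVS', 'BCC', 'BCS', 'BNE', 'BEQ'];

// Classify how an instruction touches the memory operand it names.
function accessKind(mnemonic, mode) {
  // These modes have no memory operand at all.
  if (mode === 'immediate' || mode === 'implied' || mode === 'accumulator') return 'none';
  if (CONTROL_FLOW.indexOf(mnemonic) !== -1) return 'jump';
  if (STORES.indexOf(mnemonic) !== -1) return 'write';
  if (READ_MODIFY_WRITE.indexOf(mnemonic) !== -1) return 'read-write';
  return 'read';
}
```

Only the 'write' and 'read-write' results would need to be passed on to a mapper simulation; 'read' results could keep the existing address labels.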


UPDATE: I finished the mapper routine by trashing most of the output from the descriptor attachment function when the instruction is a memory write (STA/STX/STY). Wasteful, true, but irrelevant. I'm also removing the NES-specific bits (register labels) from the main routine, so as to make the program capable of supporting configurations for other 6502 systems. This marks a move to a general 6502 naturalizer/decompiler.

I've not updated the program since the last (stable) update, so the version accessible from my site is the same as before.

The new version (retitled "6502 Naturalizing Decompiler") uses a script-based plug-in format. Once I get the NES config completed, I'll move on to a PC-Engine configuration.
« Last Edit: October 25, 2010, 09:37:24 pm by tcaudilllg »

PolishedTurd

  • Full Member
  • ***
  • Posts: 214
    • View Profile
Re: Code Naturalizer
« Reply #29 on: October 27, 2010, 06:22:13 pm »
I just wanted to chime in and say I think the naturalizer is a very good idea. I've worked with assembly somewhat before, but I still find it helpful to have verbose clarification about what is happening, especially as things go into the wee hours and the likelihood of dumb errors on my part increases. If I'm stuck somewhere, it's just one more tool to help me get through. Can't turn that down. An offline version would be great, too.

tcaudilllg

  • Sr. Member
  • ****
  • Posts: 431
    • View Profile
Re: Code Naturalizer
« Reply #30 on: October 31, 2010, 03:27:23 am »
Thank you PT. I'll have the next version of the naturalizer, which includes full RAM and mapper simulation, available soon.

I agree that the naturalizer is a good idea and I think it would be helpful to see them for other CPUs and systems. My chief reason for doing an NES naturalizer is to illustrate the method by which such a program would be designed. Maybe no one will use the naturalizer... but if its design is implemented in a disassembler then all the better.

For the next version, I'm going to try something new. I was thinking about the advantages of RAM simulation and realized that for purposes of reverse engineering, the ability to conduct the simulation is not necessarily as important as the ability to make the overall calculation processed through the operations as transparent as possible.

Consider:

LDA 4
ADC ACC, $3 (234)
SBC ACC, $10F5 (3)

which, computed, yields:

4 + 234 = 238 (assume carry clear)
238 - 3 = 235 (assume carry set, i.e. no borrow)

Now let's imagine instead that we took the documentation approach.

Accumulator = 4
Accumulator = Accumulator [4] + $3 (234)
Accumulator = Accumulator [4 + $3 (234)] - $10F5 (3)

[ Accumulator = 4 + 234 ($3) - 3 ($10F5) ]

And of course the two approaches could be mixed to offer both excellent simulation and documentation.

The most obvious application is the study of compression routines; however, math formulas could be exposed as well, and even identified.
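
A minimal sketch of mixing the two approaches, tracking the numeric value and the readable expression side by side. It assumes a made-up step record { op, value, address } and ignores the carry flag (that is, carry clear before ADC and set before SBC). traceAccumulator is a hypothetical name:

```javascript
// Follow LDA/ADC/SBC steps, keeping both the simulated 8-bit value and
// a running text expression that documents where each term came from.
function traceAccumulator(steps) {
  var value = 0;
  var expr = '';
  steps.forEach(function (s) {
    // Terms read from memory are annotated with their address.
    var term = s.address ? s.value + ' (' + s.address + ')' : String(s.value);
    if (s.op === 'LDA') {
      value = s.value;
      expr = term;
    } else if (s.op === 'ADC') {
      value = (value + s.value) & 0xFF; // carry ignored (assumption)
      expr += ' + ' + term;
    } else if (s.op === 'SBC') {
      value = (value - s.value) & 0xFF; // borrow ignored (assumption)
      expr += ' - ' + term;
    }
  });
  return { value: value, text: 'Accumulator = ' + expr };
}
```

For the three instructions in the example above this returns the value 235 together with the text "Accumulator = 4 + 234 ($3) - 3 ($10F5)".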
« Last Edit: November 10, 2010, 12:16:16 pm by tcaudilllg »

tcaudilllg

  • Sr. Member
  • ****
  • Posts: 431
    • View Profile
Re: Code Naturalizer
« Reply #31 on: February 03, 2011, 06:43:13 pm »
I've not worked on this for a while. I recently underwent a lifestyle change (working part-time + school) and with that happening my enthusiasm for this project has diminished. I'd like to work with someone to bring out the next version, but on the matter of the technical issues of the design I am nonplussed.

Vanya

  • Hero Member
  • *****
  • Posts: 1914
    • View Profile
Re: Code Naturalizer
« Reply #32 on: February 03, 2011, 11:09:32 pm »
I hope things work out for you. I would definitely like to have this tool for my remake projects.
It would be very useful to be able to study NES code without having to spend hours translating ASM.

tcaudilllg

  • Sr. Member
  • ****
  • Posts: 431
    • View Profile
Re: Code Naturalizer
« Reply #33 on: February 09, 2011, 04:25:34 am »
Quote
Right now I'm trying to figure a routine that distinguishes between writes and reads. (memory writes are, of course, the means by which mapper functions are invoked).

This is where I had trouble.

tcaudilllg

  • Sr. Member
  • ****
  • Posts: 431
    • View Profile
Re: Code Naturalizer
« Reply #34 on: March 29, 2011, 11:14:21 am »
I submitted a zip of the naturalizer.