Tracing where graphics come from in SNES debugger

Started by Ansarya, June 03, 2019, 10:22:50 PM

Ansarya

I know I saw (not long ago) a thread where someone walked through how to trace backwards through a debugger to find out where graphics were in the ROM for an SNES game. I am trying to start learning SNES debugging and would like to know how to use the tools to accomplish this sort of thing.

But I can't seem to find a guide of this type anywhere, and Google isn't helping me find the posts I saw.

Could someone point me to the thread or a good tutorial on the subject?

Thank you!

chillyfeez

Are you trying to find the actual graphic tiles? If so, it's usually much easier to open the ROM in a graphics editor and scroll through until you see what looks like images (or pieces of images).
That may not serve to familiarize you with tracing in a debugger, but tracing for graphical data isn't really entry-level stuff either. A lot of complicated stuff goes into displaying an image, and often the SNES does several of the steps concurrently, so it can be difficult to pick out what's relevant.
Ongoing project: "Final Fantasy IV: A Threat From Within"

Latest Demo

Psyklax

On the SNES, it's often not helpful to open the ROM in a graphics editor due to how often compression is used. The SNES is much more suited to graphics compression than its predecessor, as the extra RAM, speed, and DMA ability make it a logical option. The NES usually had to just leave everything uncompressed, unless the cartridge had its own RAM.

Anyway, the SNES. What usually happens is the graphics data is read in compressed form from the ROM, assembled byte-by-byte in RAM into its uncompressed form, then blasted via DMA (direct memory access) to the VRAM. What we want to do is reverse the process. I use bsnes-plus and its debugger.

Get to a point where the graphics you want are on screen, and export the memory in the memory editor. The .vram file is the contents of the VRAM at that moment, and is viewable in an editor like Tile Molester, or a regular hex editor. The latter is important because we'll want to compare it to the ROM.

So as I said, usually the game goes from the RAM to the VRAM, so we want to know where in RAM the graphics are decompressed. What I'd do is set a breakpoint for the address in VRAM that you're interested in and find the moment that it's written to. Go a second or so before it and do the "write log" or whatever option in the debugger (don't remember the name, going from memory). Make sure this is literally a second before, because the text file log is BIG. But the breakpoint will hit, so you can then turn off the log.

The log file could be 50MB or more potentially, but it's no big deal since Textpad (which I use) opens massive files without issue. This is where it starts to get technical and I'll have to take a break from my explanation. If you're unfamiliar with assembly, this'll be tough. Basically, we need to look at the registers for DMA to see where the VRAM is being loaded from, and going back, see where the RAM is being loaded from.
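Since the log is far too big to read top to bottom, one way to find the interesting part is to filter it for stores to the DMA registers. Here is a minimal Python sketch of that idea. It assumes the log lines look like the bsnes-plus trace lines quoted later in this thread (PC, mnemonic, operand, optional effective address in brackets, then the registers); the regex and the function name `dma_register_stores` are my own and not part of any tool.

```python
import re

# Matches trace lines shaped like:
# 018459 sta $420b     [01420b] A:0201 X:2000 ...
TRACE_RE = re.compile(
    r"^(?P<pc>[0-9a-f]{6}) (?P<op>[a-z]{3}) (?P<operand>\S*)\s+"
    r"(?:\[(?P<ea>[0-9a-f]{6})\]\s+)?A:"
)
STORES = {"sta", "stx", "sty", "stz"}


def dma_register_stores(lines):
    """Return (pc, register) for every store that hits $420b or $43xx."""
    hits = []
    for line in lines:
        m = TRACE_RE.match(line.strip())
        if m is None or m.group("ea") is None or m.group("op") not in STORES:
            continue
        ea = int(m.group("ea"), 16)
        bank, offset = ea >> 16, ea & 0xFFFF
        # I/O registers are only visible in banks $00-$3f and $80-$bf.
        if (bank & 0x7F) > 0x3F:
            continue
        if offset == 0x420B or 0x4300 <= offset <= 0x437F:
            hits.append((m.group("pc"), offset))
    return hits


sample = [
    "018446 ldx #$d274             A:0280 X:1801 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:198 F:55",
    "018449 stx $4302     [014302] A:0280 X:d274 Y:2000 S:1fc8 D:0000 DB:01 NvMxdizc V:221 H:204 F:55",
    "018459 sta $420b     [01420b] A:0201 X:2000 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:243 F:55",
    "01845c rts                    A:0201 X:2000 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:251 F:55",
]
for pc, reg in dma_register_stores(sample):
    print(f"{pc}: store to ${reg:04x}")
```

On a real log you would pass the open file object instead of `sample`, so the file is read line by line rather than loaded whole.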

Hopefully you'll find the address in ROM where the compressed graphics are, and by comparing the ROM data with the resulting VRAM data, and going step by step through the decompression routine in the log... you'll have your answer.

If this sounds tricky to you, I don't blame you, but hey, us ROM hackers like a challenge. ;)

chillyfeez

Granted, I've only hacked a handful of games on SNES and several of them are Final Fantasy-something, and I'm aware that graphics compression is a common roadblock, but I have never hacked a game that used compressed graphics. So a relevant question might be, which game?

phonymike

Hello, would you perhaps be talking about my post here about Mystical Ninja, or this one about Super Formation Soccer? I like Psyklax's description of how graphics are stored uncompressed in rom, or decompressed from rom into work ram, and then "blasted via DMA" into VRAM, that's a great way to put it.

I have had great luck in simply bypassing compressed graphics by inserting the uncompressed data at the end of the rom, then inserting an assembly jump to DMA the graphics into ram myself, instead of the game decompressing then DMAing them to vram. This requires a log to see when the game jumps to the decompression routine, so you can jump to your own. I get the uncompressed data by letting the game decompress the data to work ram, then exporting the data before it copies it to vram. I use the snes9x debugger or Vsnes (using a savestate). The Super Formation Soccer is an example with some assembly code as well. With that I didn't use DMA, but looping assembly code to copy byte by byte. DMA is way faster but hey I'm still learning too.

Oh, and you can go and reverse engineer the decompression routine. That's not fun to me, but if you'd like to see the results I did the NBA Jam TE graphics decompression routine. I wrote a little program to decompress the data from rom and it doesn't even work on 100% of the graphics. Maybe like 26 out of 28 graphics it'll work lol.

Ansarya

Thanks guys!

I am currently using Final Fantasy Mystic Quest as my learning game.

My goal isn't to just find the tiles and edit them, but rather to learn how to trace the data back to its origin and understand the code that moved/transformed it into its current state. I figured I'd start with tile data and then trace back where the tilemap data comes from then move onto how monsters and chest and doors work on the map (down the line, after a lot of practice).

I did poke around with Tile Molester but it seemed even the 2bpp basic text tiles have odd things between sections so they're not lined up neatly like nes char data (starting addresses aren't 16 byte aligned).

Last night I started tracing right before entering a town and stopped right after and have been poking around. I'll follow your helpful steps now, Psyklax, thanks.

I've been playing around with all this stuff since like '98~2000 (back in the Whirlpool and Dejap days) but I haven't gone all in and I'm so ready to do so, I need a new programming project. I haven't used assembly since the early GBA days (lots of fun switching between ARM and THUMB! good times) but I've read a lot of docs and used a lot of utilities over the years and have been programming since I was six so I think I can figure this out, with a little help from my friends :)

I'm gonna have a million more questions and thank you again and here I go, let's see if I can backtrace some DMA transfers.

---------

Thanks, phonymike! I'm not sure if those posts are what I saw but they're what I'm looking for, a quick walkthrough to solve a task. I'll try recreating the scenario and then applying it to FFMQ.

Vehek

So, on Geiger's snes9x debugger. Under the "special tracing" section, there's the option "DMA". When active, it shows every DMA write: which memory address it came from, and where it was written to. So, toggle it, and look for VRAM writes to the rough location of the graphics in VRAM. Keep in mind that the VRAM addresses it lists are given in words (two-byte entries rather than single-byte), so you'll have to divide whatever VRAM addresses you got in another tool by 2.
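The word/byte conversion is just a shift by one bit. A tiny Python sketch, with function names of my own choosing:

```python
def vram_byte_to_word(byte_addr: int) -> int:
    """Convert a byte offset (e.g. in an exported VRAM dump) to a VRAM word address."""
    return byte_addr >> 1


def vram_word_to_byte(word_addr: int) -> int:
    """Convert a VRAM word address back to a byte offset."""
    return word_addr << 1


# A tile found at byte offset $2000 in a VRAM dump is word address $1000.
print(hex(vram_byte_to_word(0x2000)))
print(hex(vram_word_to_byte(0x1000)))
```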

Ansarya

@Vehek Thanks! I'll check that out next, very useful.
-----

I figured out the loading of the background tiles into vram and learned a whole lot in the process. This is incredibly fun.

I have one thing I can't figure out yet:
I found the raw data is loaded from $058c80 (bank 5) in the S-CPU Bus (and can see it in the memory viewer there) but the raw data in the ROM (file) starts at $028c80 (bank 2).
How are these addresses mapped?

So here's what I did (everything is way over annotated since this is my first real try):

Resources:
SNES registers: http://baltimorebarcams.com/eb/snes/docs/65816/SNES%20Registers.html
DMA example: https://wiki.superfamicom.org/grog's-guide-to-dma-and-hdma-on-the-snes
SNES ASM reference: https://wiki.superfamicom.org/65816-reference#toc-1
SNES info (used for additional register information, memory, etc): http://patrickjohnston.org/ASM/ROM%20data/snestek.htm
SNES ASM reference: http://6502.org/tutorials/65c816opcodes.html

Loaded "Final Fantasy - Mystic Quest (U) (V1.0) [!].smc" in bsnes-plus and entered the first real map, "Level Forest"

In the debugger I paused the game (with the "break" button), checked the Tile Viewer and the tiles are plainly visible, with background graphics starting at $0000
"tile viewer image"

I opened the Memory Viewer and hit "export", which saved all the RAM to files in the same folder as the ROM. I then copied the created files into a new folder so I could reference them later. "Final Fantasy - Mystic Quest (U) (V1.0) [!]-vram.bin" is the VRAM data with the tiles we're looking at.

Resumed the game, left the map, and paused again. At this point we can reenter the map by pressing A, so we're right before the loading of the graphics.

I set a Write breakpoint on $0 to $400 for "S-PPU VRAM". Some of the tiles are being updated for animation every few frames so I limited the range to $400 to not encompass those tiles, although we'll be breaking on $0.
"breakpoint image"

I hit "run" and pressed A to enter the map, and the breakpoint triggered. The trace log was ~50MB

The last 15 lines are the DMA setup/kickoff subroutine. I used the resource documents above to figure out what it's doing.


; Start of routine to load from WRAM to VRAM through DMA (tiles for first map "Level Forest")
018435 ldx #$0000             A:0200 X:023c Y:2000 S:1fc8 D:0000 DB:01 nvMxdiZc V:221 H:156 F:55 ; --destination address in VRAM is $0000 (start at beginning of VRAM)
018438 stx $2116     [012116] A:0200 X:0000 Y:2000 S:1fc8 D:0000 DB:01 nvMxdiZc V:221 H:162 F:55 ; set Video port address [VMADDL/VMADDH]
01843b lda #$80               A:0200 X:0000 Y:2000 S:1fc8 D:0000 DB:01 nvMxdiZc V:221 H:171 F:55 ; --80 == 1000_0000, bit 7==1 so "Addr-inc after writing to $2119 or reading from $213A." and bits 0-3==0000 so "Address increments 1x1"
01843d sta $2115     [012115] A:0280 X:0000 Y:2000 S:1fc8 D:0000 DB:01 NvMxdizc V:221 H:175 F:55 ; set Video port control [VMAIN]
018440 ldx #$1801             A:0280 X:0000 Y:2000 S:1fc8 D:0000 DB:01 NvMxdizc V:221 H:183 F:55 ; --the 18 goes into 4301/bbadx which means VRAM, the 01 goes in 4300/DMAPx (01 means cpu to ppu, auto increment address, Transfer 2 bytes xx, xx+1 Low High)
018443 stx $4300     [014300] A:0280 X:1801 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:189 F:55 ; set [BBADX] (4301 to $18==vram) and [DMAPx] (4300 to $01)
018446 ldx #$d274             A:0280 X:1801 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:198 F:55 ; --put lower 16 bits of source address in X
018449 stx $4302     [014302] A:0280 X:d274 Y:2000 S:1fc8 D:0000 DB:01 NvMxdizc V:221 H:204 F:55 ; Set source offset to $d274
01844c lda #$7f               A:0280 X:d274 Y:2000 S:1fc8 D:0000 DB:01 NvMxdizc V:221 H:213 F:55 ; --put the bank byte (upper 8 bits) of the source address in A
01844e sta $4304     [014304] A:027f X:d274 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:217 F:55 ; Set source bank to $7F
018451 ldx #$2000             A:027f X:d274 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:224 F:55 ; --we're going to load $2000 bytes
018454 stx $4305     [014305] A:027f X:2000 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:230 F:55 ; set dma transfer size of $2000 bytes
018457 lda #$01               A:027f X:2000 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:239 F:55 ; --put 1 in A so we can start channel 0
018459 sta $420b     [01420b] A:0201 X:2000 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:243 F:55 ; Start DMA transfer on channel 0
01845c rts                    A:0201 X:2000 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:251 F:55 ; Return from sub



That RTS is where it hits the breakpoint because the DMA transfer is updating the VRAM. The source data is at $7fd274, and we're loading $2000 bytes.
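To double-check the reading of those registers, here is a small Python sketch that replays the stores from the trace into a dictionary and decodes DMA channel 0 from it. The helper names `apply_store` and `describe_dma` are made up for this example; the register layout ($43x0 DMAP, $43x1 B-bus address, $43x2-$43x4 source, $43x5-$43x6 size) follows the register docs linked above.

```python
def apply_store(regs, addr, value, width):
    """Record a store byte by byte, low byte first, as the 65816 does."""
    for i in range(width):
        regs[addr + i] = (value >> (8 * i)) & 0xFF


def describe_dma(regs, channel=0):
    """Decode one DMA channel from the recorded register bytes."""
    base = 0x4300 + 0x10 * channel
    dmap = regs[base]
    size = regs[base + 5] | (regs[base + 6] << 8)
    return {
        "direction": "B-bus to CPU bus" if dmap & 0x80 else "CPU bus to B-bus",
        "mode": dmap & 0x07,
        "b_bus_register": 0x2100 | regs[base + 1],
        "source": regs[base + 2] | (regs[base + 3] << 8) | (regs[base + 4] << 16),
        "size": size or 0x10000,  # a size of 0 means $10000 bytes
        "vram_word_address": regs[0x2116] | (regs[0x2117] << 8),
    }


# (address, value, width in bytes) taken from the trace above
regs = {}
for addr, value, width in [
    (0x2116, 0x0000, 2),  # stx $2116
    (0x2115, 0x80, 1),    # sta $2115
    (0x4300, 0x1801, 2),  # stx $4300
    (0x4302, 0xD274, 2),  # stx $4302
    (0x4304, 0x7F, 1),    # sta $4304
    (0x4305, 0x2000, 2),  # stx $4305
]:
    apply_store(regs, addr, value, width)

dma = describe_dma(regs)
print({k: (hex(v) if isinstance(v, int) else v) for k, v in dma.items()})
```

The 16-bit store of $1801 lands as $01 at $4300 and $18 at $4301, which is the behaviour asked about in the question just below.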

Question: Why does "ldx #$1801" / "stx $4300" put $01 at $4300 and $18 at $4301? (Oh I think I see, little endian in memory, big endian in code [in asm I mean, in the rom I see the bytecode stores the value little endian as well]? so like when the source address is loaded, $4302 to $4304 are $74d27f which is the little endian version of the address $7fd274)

So I opened the Memory Viewer and exported again (and saved those files elsewhere for later). The WRAM at $7fd274 and the VRAM saved earlier matched, so that's confirmed.
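The same comparison can be done with a few lines of Python instead of by eye. This is only a sketch: it assumes the WRAM export is the full 128 KB starting at $7e0000 (so $7fd274 sits at offset $1d274 in the file), and the data below is synthetic rather than read from the real dump files.

```python
WRAM_BASE = 0x7E0000  # assumption: the WRAM dump starts at CPU address $7e0000


def wram_offset(cpu_addr: int) -> int:
    """Offset of a $7e/$7f CPU address inside a full WRAM dump."""
    if not WRAM_BASE <= cpu_addr <= 0x7FFFFF:
        raise ValueError("not a WRAM address in banks $7e-$7f")
    return cpu_addr - WRAM_BASE


def dumps_match(wram: bytes, vram: bytes, cpu_addr: int,
                vram_byte_addr: int, length: int) -> bool:
    """True if `length` bytes at cpu_addr in WRAM equal those at vram_byte_addr."""
    start = wram_offset(cpu_addr)
    return wram[start:start + length] == vram[vram_byte_addr:vram_byte_addr + length]


# Synthetic stand-ins for the two exported files.
tiles = bytes(range(256)) * 0x20                      # $2000 bytes of fake tile data
wram = bytes(0x1D274) + tiles + bytes(0x20000 - 0x1D274 - 0x2000)
vram = tiles + bytes(0x10000 - 0x2000)

print(hex(wram_offset(0x7FD274)))
print(dumps_match(wram, vram, 0x7FD274, 0x0000, 0x2000))
```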
"wram/vram matching image"

Next up, in the game I left the map again to get ready to re-enter, and I set a breakpoint at $7fd274 for "S-CPU bus" so that we'll break when our uncompressed tile data is written to WRAM.

Looking through this code shows it starts copying data from $058c80 to $7fd274. It copies $10 bytes, then for each of the next $8 bytes it writes the byte followed by a zero ($00), for a total of $20 bytes, which is one 8x8 tile at 4bpp. It then repeats with the next tile, and so on for all $100 tiles.


; Copy one tile to WRAM
; At this point the bank is set and Y contains the starting address of the source data (lower 16 bits)
; and $2181-$2183 are setup with the WRAM destination address, etc
; $18 bytes from source => $20 bytes destination
; copy the first $10 bytes then copy each of the next $8 bytes followed by a zero byte (so AABBCC... becomes AA00BB00CC00...)
01e90c phd                  ; save Direct Page Register to stack
01e90d phx                  ; save X to stack
01e90e pea $2100            ; push the value $2100 to the stack
01e911 pld                  ; pull that $2100 into the Direct Page Register
01e912 ldx #$0010            ; setup X as a counter starting at $10
; start of loop - copy first $10 bytes {
    01e915 lda $0000,y   [058c80] ; load the source byte (address = bank:y + $0, so bank = 05 and y = 8c80 so address = 058c80)
                                  ; (range I am looking at: first = 058c80, last = 059577)
    01e918 iny                    ; Y += 1 (increment source address for next byte)
    01e919 sta $80       [002180] ; write byte (lower A) to WRAM through WMDATA (destination address is auto incremented)
    01e91b dex                    ; X -= 1 (decrement our counter)
    01e91c bne $e915     [01e915] ; if X > 0, go back to top of loop
}
01e91e ldx #$0008               ; setup X as a counter starting at $8
; start of loop - copy last $8 bytes mixed with zeros (3 byte example: AABBCC => AA00BB00CC00) {
    01e921 lda $0000,y   [058c90] ; load the source byte (address = bank:y + $0, so bank = 05 and y = 8c90 so address = 058c90)
                                  ; (range I am looking at: first = 058c90, last = 05957f)
    01e924 iny                    ; Y += 1 (increment source address for next byte)
    01e925 sta $80       [002180] ; write byte (lower A) to WRAM
    01e927 stz $80       [002180] ; write $00 to WRAM
    01e929 dex                    ; X -= 1 (decrement our counter)
    01e92a bne $e921     [01e921] ; if X > 0, go back to top of loop
}
01e92c plx                        ; restore X from the stack
01e92d pld                        ; restore direct page from stack
01e92e rts                        ; exit routine
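
For anyone who wants to reproduce this step outside the emulator, here is a Python sketch of what the routine above does to one tile: $18 source bytes in, $20 bytes out. The function name `expand_tile` is mine. The layout looks like 3bpp graphics being padded out to 4bpp, but that is my reading of the routine, not something confirmed elsewhere in this thread.

```python
def expand_tile(src: bytes) -> bytes:
    """Expand one $18-byte source tile into the $20-byte form written to WRAM."""
    if len(src) != 0x18:
        raise ValueError("expected exactly $18 source bytes")
    out = bytearray(src[:0x10])   # first loop: $10 bytes copied as-is
    for b in src[0x10:]:          # second loop: each byte followed by $00
        out.append(b)
        out.append(0x00)
    return bytes(out)


tile = expand_tile(bytes(range(0x18)))
print(len(tile), tile[0x10:0x16].hex())
```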


Tracing further back, and a lot more digging, triggering breakpoints, watching registers (like following the M7A M7B multiplication stuff into $2134), etc. I figured out the additional code that calls the copy tile routine:


; This is the function call that starts everything
019161 jsr $fd7b     [01fd7b] A:ff01 X:0006 Y:d274 S:1fcb D:0000 DB:01 nvMxdizC V:116 H:178 F: 0

; Main function ------ Copy tile data to WRAM (for certain maps, like first map "Level Forest")
; Note that the M flag is set so A is 8bit
                                                                                ; set bank to 05
01fd7b phb                    A:ff01 X:0006 Y:d274 S:1fc9 D:0000 DB:01 nvMxdizC ; save bank to stack
01fd7c lda #$05               A:ff01 X:0006 Y:d274 S:1fc8 D:0000 DB:01 nvMxdizC ; set A to $5
01fd7e pha                    A:ff05 X:0006 Y:d274 S:1fc8 D:0000 DB:01 nvMxdizC ; push the $5 to the stack
01fd7f plb                    A:ff05 X:0006 Y:d274 S:1fc7 D:0000 DB:01 nvMxdizC ; set the bank to $5 from the stack
                                                                                ; setup destination address
01fd80 ldx #$d274             A:ff05 X:0006 Y:d274 S:1fc8 D:0000 DB:05 nvMxdizC ; set X to the lower 16 bits of the destination WRAM address
01fd83 stx $2181     [052181] A:ff05 X:d274 Y:d274 S:1fc8 D:0000 DB:05 NvMxdizC ; store X into $2181-2182, WMADDL and WMADDM
01fd86 lda #$7f               A:ff05 X:d274 Y:d274 S:1fc8 D:0000 DB:05 NvMxdizC ; set A to the upper byte of the destination WRAM address
01fd88 sta $2183     [052183] A:ff7f X:d274 Y:d274 S:1fc8 D:0000 DB:05 nvMxdizC ; store A into $2183, WMADDH
01fd8b ldx #$0000             A:ff7f X:d274 Y:d274 S:1fc8 D:0000 DB:05 nvMxdizC ; setup X as a counter starting at $0000
; start of loop - copy $8 blocks of $20 tiles {
    01fd8e lda $191a,x   [05191a] A:ff7f X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvMxdiZC ; set A to the value in $05191a (lowram) (which in this case is $00), TODO: trace where this comes from
    01fd91 bpl $fd9e     [01fd9e] A:ff00 X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvMxdiZC ; if A isn't negative (if n flag = 0), branch to next section
   
    ; opcodes from $01fd94 to $01fd9d are not in trace log (would run if A had been negative)
   
    01fd9e xba                    A:ff00 X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvMxdiZC ; swap the high and low bytes of A (save that $00 for later)
    01fd9f stz $211b     [05211b] A:00ff X:0000 Y:d274 S:1fc8 D:0000 DB:05 NvMxdizC ; write $00 to $211b (low byte of [M7A]; A is 8bit here, so only one byte is written)
                                                                                    ; mpy* = 01 00 00 ($000001)
    01fda2 lda #$03               A:00ff X:0000 Y:d274 S:1fc8 D:0000 DB:05 NvMxdizC ; set A to $3
    01fda4 sta $211b     [05211b] A:0003 X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvMxdizC ; write $03 to $211b again (high byte of [M7A], which is now $0300)
                                                                                    ; mpy* = 00 03 00 ($000300)
    01fda7 xba                    A:0003 X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvMxdizC ; swap the high and low bytes of A (bring back the $00 from earlier)
    01fda8 sta $211c     [05211c] A:0300 X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvMxdiZC ; write the low byte of A (the value from $191a,x) to [M7B]; the product is $0300 * that value
                                                                                    ; mpy* = 00 00 00 ($000000)
    01fdab rep #$20               A:0300 X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvMxdiZC ; clear the M flag, so A is now 16bit
    01fdad lda #$8c80             A:0300 X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvmxdiZC ; set A to $8c80
    01fdb0 clc                    A:8c80 X:0000 Y:d274 S:1fc8 D:0000 DB:05 NvmxdizC ; clear the carry flag
    01fdb1 adc $2134     [052134] A:8c80 X:0000 Y:d274 S:1fc8 D:0000 DB:05 Nvmxdizc ; add the multiplication result at $2134-2135 ([MPYL] and [MPYM]) to A
                                                                                    ; so mpy* is a source address offset
    01fdb4 tay                    A:8c80 X:0000 Y:d274 S:1fc8 D:0000 DB:05 Nvmxdizc ; copy A to Y
    01fdb5 sep #$20               A:8c80 X:0000 Y:8c80 S:1fc8 D:0000 DB:05 Nvmxdizc ; set the M flag, so A is now 8bit
    01fdb7 phx                    A:8c80 X:0000 Y:8c80 S:1fc8 D:0000 DB:05 NvMxdizc ; push X to the stack (save our loop counter)
    01fdb8 ldx #$0020             A:8c80 X:0000 Y:8c80 S:1fc6 D:0000 DB:05 NvMxdizc ; setup X as a counter starting at $20
    ; start of loop - copy $20 tiles to WRAM {
        01fdbb jsr $e90c     [01e90c] A:8c80 X:0020 Y:8c80 S:1fc6 D:0000 DB:05 nvMxdizc ; jump to the "Copy one tile to WRAM" routine

        01fdbe dex                    A:8c00 X:0020 Y:8c98 S:1fc6 D:0000 DB:05 nvMxdiZc ; X -= 1 (decrement our counter)
        01fdbf bne $fdbb     [01fdbb] A:8c00 X:001f Y:8c98 S:1fc6 D:0000 DB:05 nvMxdizc ; if X > 0, go back to top of loop
    }
    01fdc1 plx                    A:8c00 X:0000 Y:8f80 S:1fc6 D:0000 DB:05 nvMxdiZc ; pull our saved counter from the stack into X
    01fdc2 inx                    A:8c00 X:0000 Y:8f80 S:1fc8 D:0000 DB:05 nvMxdiZc ; X += 1 (increment our counter)
    01fdc3 cpx #$0008             A:8c00 X:0001 Y:8f80 S:1fc8 D:0000 DB:05 nvMxdizc ; compare X to $0008
    01fdc6 bne $fd8e     [01fd8e] A:8c00 X:0001 Y:8f80 S:1fc8 D:0000 DB:05 NvMxdizc ; if x != 8, go back to top of loop
}


The routine continues further to load other graphics into WRAM, but this is the end of the bg tiles we were looking at.
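
The address math in that loop can be sketched in Python. The multiplier $0300 is one block's worth of source data ($20 tiles of $18 bytes each), so the value read from $191a,x acts as a block index into the data starting at $058c80. The function name `block_source` is hypothetical, only non-negative index values are covered (the negative path never ran in my trace), and the index values 1 and 2 below are illustrative since the trace only shows 0.

```python
SRC_TILE_BYTES = 0x18    # bytes read per tile
DST_TILE_BYTES = 0x20    # bytes written per tile
TILES_PER_BLOCK = 0x20
BLOCKS = 0x08


def block_source(index: int, base: int = 0x8C80, bank: int = 0x05) -> int:
    """CPU address the routine reads from for a given (non-negative) block index."""
    product = (0x0300 * index) & 0xFFFF   # only $2134-$2135 are read
    return (bank << 16) | ((base + product) & 0xFFFF)


for index in (0, 1, 2):
    print(index, hex(block_source(index)))

# Total bytes written to WRAM: matches the $2000-byte DMA seen earlier.
print(hex(BLOCKS * TILES_PER_BLOCK * DST_TILE_BYTES))
```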

However, the data is not in the ROM (file) at $058c80 but rather at $028c80 (found it by searching in HxD). In the memory viewer it is the other way around: there is other, unrelated data at $028c80 and our data is at $058c80.

"data in the rom in hex editor"

So again, one last question: Why is the data in the ROM file at $028c80 while the code is reading from $058c80?

Thanks everybody!

mziab

Quote from: Ansarya on June 07, 2019, 12:30:46 AM
I found the raw data is loaded from $058c80 (bank 5) in the S-CPU Bus (and can see it in the memory viewer there) but the raw data in the ROM (file) starts at $028c80 (bank 2).
How are these addresses mapped?

028c80 is the ROM address, i.e. the offset inside the file; it's not what the CPU uses internally. Simplifying a bit, there are two main types of mapping: LoROM and HiROM. Mystic Quest uses the former. I recommend you take a look at Lunar Address, which allows for easy conversion both ways and will even detect which mapping needs to be used.
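
For this particular case the conversion can also be done by hand. In LoROM each bank exposes $8000 bytes of ROM at $8000-$ffff, so the file offset is bank * $8000 plus the low 15 bits of the address. Below is a rough Python sketch of that, assuming a headerless ROM file (if the file has a 512-byte copier header, pass `header=0x200`) and ignoring the finer points of mirroring. The function names are made up for this example.

```python
def lorom_to_file(cpu_addr: int, header: int = 0) -> int:
    """Convert a LoROM CPU address ($8000-$ffff in a bank) to a file offset."""
    bank = (cpu_addr >> 16) & 0x7F      # banks $80+ mirror $00+
    offset = cpu_addr & 0xFFFF
    if offset < 0x8000:
        raise ValueError("LoROM data lives at $8000-$ffff within a bank")
    return ((bank << 15) | (offset & 0x7FFF)) + header


def file_to_lorom(file_offset: int, header: int = 0) -> int:
    """Convert a file offset back to a LoROM CPU address."""
    rom_offset = file_offset - header
    bank = rom_offset >> 15
    return (bank << 16) | 0x8000 | (rom_offset & 0x7FFF)


print(hex(lorom_to_file(0x058C80)))
print(hex(file_to_lorom(0x028C80)))
```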

Quote from: Ansarya on June 07, 2019, 12:30:46 AM
Why does "ldx #$1801" / "stx $4300" put $01 at $4300 and $18 at $4301? (Oh I think I see, little endian in memory, big endian in code [in asm I mean, in the rom I see the bytecode stores the value little endian as well]? so like when the source address is loaded, $4302 to $4304 are $74d27f which is the little endian version of the address $7fd274)

Yes, it's because of the SNES CPU being little-endian. Any 16-bit or 24-bit value/address in RAM or ROM (opcode operands, pointers, data) will be stored this way. Trace logs and assembly code you write use big-endian to make it easier for us humans, but the internal representation is always little-endian. It's just as you deduced.
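
If you want to check values like these quickly, Python's built-in `int.to_bytes` and `int.from_bytes` do the byte swapping for you. A short sketch using the two values from the question:

```python
# The 16-bit value $1801 as it is laid out in memory at $4300-$4301.
stored = (0x1801).to_bytes(2, "little")
print(stored.hex())

# The three bytes at $4302-$4304 read back as one 24-bit address.
address = int.from_bytes(bytes.fromhex("74d27f"), "little")
print(hex(address))
```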