@Vehek Thanks! I'll check that out next, very useful.
-----
I figured out the loading of the background tiles into vram and learned a whole lot in the process. This is incredibly fun.
I have one thing I can't figure out yet: I found that the raw data is loaded from $058c80 (bank 5) on the S-CPU bus (and I can see it in the memory viewer there), but the raw data in the ROM (file) starts at $028c80 (bank 2).
How are these addresses mapped?
So here's what I did (everything is heavily over-annotated since this is my first real try):
Resources:
SNES registers:
http://baltimorebarcams.com/eb/snes/docs/65816/SNES%20Registers.html
DMA example:
https://wiki.superfamicom.org/grog's-guide-to-dma-and-hdma-on-the-snes
SNES ASM reference:
https://wiki.superfamicom.org/65816-reference#toc-1
SNES info (used for additional register information, memory, etc):
http://patrickjohnston.org/ASM/ROM%20data/snestek.htm
SNES ASM reference:
http://6502.org/tutorials/65c816opcodes.html

I loaded "Final Fantasy - Mystic Quest (U) (V1.0) [!].smc" in bsnes-plus and entered the first real map, "Level Forest".
In the debugger I paused the game (with the "break" button), checked the Tile Viewer and the tiles are plainly visible, with background graphics starting at $0000

I opened the Memory Viewer and hit "export" which saved all the ram to files in the same folder as the rom. Then saved off the created files in a new folder so I could reference it later. "Final Fantasy - Mystic Quest (U) (V1.0) [!]-vram.bin" is the VRAM data with the tiles we're looking at.
Resumed the game, left the map, and paused again. At this point we can reenter the map by pressing A, so we're right before the loading of the graphics.
I set a Write breakpoint on $0 to $400 for "S-PPU VRAM". Some of the tiles are updated for animation every few frames, so I ended the range at $400 to avoid those tiles; the break we care about is the write to $0.

I hit "run" and pressed A to enter the map, and the breakpoint triggered. The trace log was ~50MB.
The last 15 lines are the DMA setup/kickoff subroutine. I used the resource documents above to figure out what it's doing.
; Start of routine to load from WRAM to VRAM through DMA (tiles for first map "Level Forest")
018435 ldx #$0000 A:0200 X:023c Y:2000 S:1fc8 D:0000 DB:01 nvMxdiZc V:221 H:156 F:55 ; --destination address in VRAM is $0000 (start at beginning of VRAM)
018438 stx $2116 [012116] A:0200 X:0000 Y:2000 S:1fc8 D:0000 DB:01 nvMxdiZc V:221 H:162 F:55 ; set Video port address [VMADDL/VMADDH]
01843b lda #$80 A:0200 X:0000 Y:2000 S:1fc8 D:0000 DB:01 nvMxdiZc V:221 H:171 F:55 ; --80 == 1000_0000, bit 7==1 so "Addr-inc after writing to $2119 or reading from $213A." and bits 0-3==0000 so "Address increments 1x1"
01843d sta $2115 [012115] A:0280 X:0000 Y:2000 S:1fc8 D:0000 DB:01 NvMxdizc V:221 H:175 F:55 ; set Video port control [VMAIN]
018440 ldx #$1801 A:0280 X:0000 Y:2000 S:1fc8 D:0000 DB:01 NvMxdizc V:221 H:183 F:55 ; --the 18 goes into 4301/bbadx which means VRAM, the 01 goes in 4300/DMAPx (01 means cpu to ppu, auto increment address, Transfer 2 bytes xx, xx+1 Low High)
018443 stx $4300 [014300] A:0280 X:1801 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:189 F:55 ; set [BBADX] (4301 to $18==vram) and [DMAPx] (4300 to $01)
018446 ldx #$d274 A:0280 X:1801 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:198 F:55 ; --put lower 16 bits of source address in X
018449 stx $4302 [014302] A:0280 X:d274 Y:2000 S:1fc8 D:0000 DB:01 NvMxdizc V:221 H:204 F:55 ; Set source offset to $d274
01844c lda #$7f A:0280 X:d274 Y:2000 S:1fc8 D:0000 DB:01 NvMxdizc V:221 H:213 F:55 ; --put upper 8 bits (the bank byte) of source address in A
01844e sta $4304 [014304] A:027f X:d274 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:217 F:55 ; Set source bank to $7F
018451 ldx #$2000 A:027f X:d274 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:224 F:55 ; --we're going to load $2000 bytes
018454 stx $4305 [014305] A:027f X:2000 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:230 F:55 ; set dma transfer size of $2000 bytes
018457 lda #$01 A:027f X:2000 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:239 F:55 ; --put 1 in A so we can start channel 0
018459 sta $420b [01420b] A:0201 X:2000 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:243 F:55 ; Start DMA transfer on channel 0
01845c rts A:0201 X:2000 Y:2000 S:1fc8 D:0000 DB:01 nvMxdizc V:221 H:251 F:55 ; Return from sub
That RTS is where it hits the breakpoint because the DMA transfer is updating the VRAM. The source data is at $7fd274, and we're loading $2000 bytes.
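To double-check my reading of the registers, here is a small Python sketch that replays the writes from the trace and decodes them. The helper names (`store16`, `decode_dma`) are made up for illustration, and it only covers the fields used in this trace, so treat it as a sanity check rather than a full DMA model.

```python
# Hypothetical helpers: replay the channel-0 register writes from the trace
# and decode them into something readable.

def store16(regs, addr, value):
    """Mimic a 16-bit store (like stx): low byte first, then high byte."""
    regs[addr] = value & 0xFF
    regs[addr + 1] = (value >> 8) & 0xFF

def decode_dma(regs):
    """regs maps a register address to the byte last written to it."""
    dmap = regs[0x4300]
    return {
        # DMAP bit 7: 0 = A-bus (CPU side) to B-bus (PPU side)
        "direction": "B-bus to A-bus" if dmap & 0x80 else "A-bus to B-bus",
        "transfer_mode": dmap & 0x07,             # 1 = two registers, xx then xx+1
        "b_bus_register": 0x2100 | regs[0x4301],  # $18 -> $2118 (VRAM data port)
        "source": (regs[0x4304] << 16) | (regs[0x4303] << 8) | regs[0x4302],
        "size": (regs[0x4306] << 8) | regs[0x4305],
    }

regs = {}
store16(regs, 0x4300, 0x1801)  # ldx #$1801 / stx $4300
store16(regs, 0x4302, 0xD274)  # ldx #$d274 / stx $4302
regs[0x4304] = 0x7F            # lda #$7f   / sta $4304
store16(regs, 0x4305, 0x2000)  # ldx #$2000 / stx $4305

info = decode_dma(regs)
print({k: (hex(v) if isinstance(v, int) else v) for k, v in info.items()})
```

With the values from the trace this gives a source of $7fd274, a destination register of $2118, and a size of $2000, which agrees with the annotations.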
Question: Why does "ldx #$1801" / "stx $4300" put $01 at $4300 and $18 at $4301? (Oh, I think I see: values are stored little endian in memory, while the assembly listing shows them most significant byte first. In the ROM, the instruction's operand bytes are stored little endian as well. So when the source address is loaded, the bytes at $4302 to $4304 are 74 d2 7f, which is the little-endian form of the address $7fd274.)
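A quick way to convince myself of the byte order is to pack the same values in Python. This is only an illustration of little-endian layout using the standard `struct` module; the function names are my own.

```python
import struct

def bytes_written_16(value):
    """Bytes a 16-bit store puts in memory, lowest address first."""
    return struct.pack("<H", value)  # "<H" = little-endian unsigned 16-bit

def read_24(data):
    """Rebuild a 24-bit address from three bytes in memory order."""
    return int.from_bytes(data, "little")

print(bytes_written_16(0x1801).hex())           # byte for $4300, then $4301
print(hex(read_24(bytes([0x74, 0xD2, 0x7F]))))  # bytes at $4302, $4303, $4304
```

So the low byte ($01) lands at the lower address ($4300) and the high byte ($18) at the next one ($4301).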
So I opened the Memory Viewer and exported again (and saved those files elsewhere for later). The WRAM at $7fd274 and the VRAM saved earlier matched, so that's confirmed.
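The comparison can also be scripted instead of done by eye. A minimal sketch, assuming the exported WRAM dump is a raw 128KB image of $7e0000-$7fffff and the VRAM dump is a raw 64KB image addressed in bytes (both are assumptions about the export format); the function name is hypothetical.

```python
WRAM_BASE = 0x7E0000  # WRAM occupies banks $7e-$7f on the S-CPU bus

def wram_matches_vram(wram, vram, wram_addr=0x7FD274,
                      vram_byte_offset=0x0000, size=0x2000):
    """Return True if `size` bytes at wram_addr equal the VRAM bytes."""
    start = wram_addr - WRAM_BASE  # $7fd274 -> offset $1d274 in the dump
    return (wram[start:start + size]
            == vram[vram_byte_offset:vram_byte_offset + size])
```

To use it, read the two exported files with `open(path, "rb").read()` and pass the resulting bytes in. Only the "-vram.bin" name is known from the export above; the WRAM file name would need to be filled in.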

Next up, in the game I left the map again to get ready to re-enter, and I set a breakpoint at $7fd274 for "S-CPU bus" so that we'll break when our uncompressed tile data is written to WRAM.
Looking through this code shows it copies data from $058c80 to $7fd274. It copies $10 bytes as-is, then for each of the next $8 bytes it writes the byte followed by a zero ($00), for a total of $20 bytes, which is one 8x8 tile at 4bpp. It then repeats with the next tile, and so on for all $100 tiles.
; Copy one tile to WRAM
; At this point the bank is set and Y contains the starting address of the source data (lower 16 bits)
; and $2181-$2183 are setup with the WRAM destination address, etc
; $18 bytes from source => $20 bytes destination
; copy the first $10 bytes then copy each of the next $8 bytes followed by a zero byte (so AABBCC... becomes AA00BB00CC00...)
01e90c phd ; save Direct Page Register to stack
01e90d phx ; save X to stack
01e90e pea $2100 ; push the value $2100 to the stack
01e911 pld ; pull that $2100 into the Direct Page Register
01e912 ldx #$0010 ; setup X as a counter starting at $10
; start of loop - copy first $10 bytes {
01e915 lda $0000,y [058c80] ; load the source byte (address = bank:y + $0, so bank = 05 and y = 8c80 so address = 058c80)
; (range I am looking at: first = 058c80, last = 059577)
01e918 iny ; Y += 1 (increment source address for next byte)
01e919 sta $80 [002180] ; write byte (lower A) to WRAM through WMDATA (destination address is auto incremented)
01e91b dex ; X -= 1 (decrement our counter)
01e91c bne $e915 [01e915] ; if X > 0, go back to top of loop
}
01e91e ldx #$0008 ; setup X as a counter starting at $8
; start of loop - copy last $8 bytes mixed with zeros (3 byte example: AABBCC => AA00BB00CC00) {
01e921 lda $0000,y [058c90] ; load the source byte (address = bank:y + $0, so bank = 05 and y = 8c90 so address = 058c90)
; (range I am looking at: first = 058c90, last = 05957f)
01e924 iny ; Y += 1 (increment source address for next byte)
01e925 sta $80 [002180] ; write byte (lower A) to WRAM
01e927 stz $80 [002180] ; write $00 to WRAM
01e929 dex ; X -= 1 (decrement our counter)
01e92a bne $e921 [01e921] ; if X > 0, go back to top of loop
}
01e92c plx ; restore X from the stack
01e92d pld ; restore direct page from stack
01e92e rts ; exit routine
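To make sure I understood the routine, I rewrote it as a small Python function. It is only a sketch of the byte shuffling shown above (`expand_tile` is my own name). My guess is that the source is 3bpp data padded out to 4bpp, but that part is an assumption.

```python
def expand_tile(src):
    """Expand one $18-byte source tile to the $20-byte form written to WRAM.

    The first $10 bytes are copied unchanged; each of the last $8 bytes
    is followed by a $00 byte (AABBCC... becomes AA00BB00CC00...).
    """
    if len(src) != 0x18:
        raise ValueError("expected exactly $18 source bytes")
    out = bytearray(src[:0x10])  # first loop: straight copy
    for byte in src[0x10:]:      # second loop: sta $80 then stz $80
        out.append(byte)
        out.append(0x00)
    return bytes(out)

tile = expand_tile(bytes(range(0x18)))
print(len(tile), tile[0x10:0x16].hex())
```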
Tracing further back, with a lot more digging, triggering breakpoints, and watching registers (like following the M7A/M7B multiplication into $2134), I figured out the additional code that calls the copy-tile routine:
; This is the function call that starts everything
019161 jsr $fd7b [01fd7b] A:ff01 X:0006 Y:d274 S:1fcb D:0000 DB:01 nvMxdizC V:116 H:178 F: 0
; Main function ------ Copy tile data to WRAM (for certain maps, like first map "Level Forest")
; Note that the M flag is set so A is 8bit
; set bank to 05
01fd7b phb A:ff01 X:0006 Y:d274 S:1fc9 D:0000 DB:01 nvMxdizC ; save bank to stack
01fd7c lda #$05 A:ff01 X:0006 Y:d274 S:1fc8 D:0000 DB:01 nvMxdizC ; set A to $5
01fd7e pha A:ff05 X:0006 Y:d274 S:1fc8 D:0000 DB:01 nvMxdizC ; push the $5 to the stack
01fd7f plb A:ff05 X:0006 Y:d274 S:1fc7 D:0000 DB:01 nvMxdizC ; set the bank to $5 from the stack
; setup destination address
01fd80 ldx #$d274 A:ff05 X:0006 Y:d274 S:1fc8 D:0000 DB:05 nvMxdizC ; set x to the lower 16 bits of the destination wram address
01fd83 stx $2181 [052181] A:ff05 X:d274 Y:d274 S:1fc8 D:0000 DB:05 NvMxdizC ; store X into $2181-2182, WMADDL and WMADDM
01fd86 lda #$7f A:ff05 X:d274 Y:d274 S:1fc8 D:0000 DB:05 NvMxdizC ; set A to the upper byte of the destination wram address
01fd88 sta $2183 [052183] A:ff7f X:d274 Y:d274 S:1fc8 D:0000 DB:05 nvMxdizC ; store A into $2183, WMADDH
01fd8b ldx #$0000 A:ff7f X:d274 Y:d274 S:1fc8 D:0000 DB:05 nvMxdizC ; setup X as a counter starting at $0000
; start of loop - copy $8 blocks of $20 tiles {
01fd8e lda $191a,x [05191a] A:ff7f X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvMxdiZC ; set A to the value in $05191a (lowram) (which in this case is $00), TODO: trace where this comes from
01fd91 bpl $fd9e [01fd9e] A:ff00 X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvMxdiZC ; if A isn't negative (if n flag = 0), branch to next section
; opcodes from $01fd94 to $01fd9d are not in trace log (would run if A had been negative)
01fd9e xba A:ff00 X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvMxdiZC ; swap the high and low bytes of A (save that $00 for later)
01fd9f stz $211b [05211b] A:00ff X:0000 Y:d274 S:1fc8 D:0000 DB:05 NvMxdizC ; write $00 to $211b (A is 8-bit here, so this is only the low byte of the write-twice [M7A] register)
; mpy* = 01 00 00 ($000001)
01fda2 lda #$03 A:00ff X:0000 Y:d274 S:1fc8 D:0000 DB:05 NvMxdizC ; set A to $3
01fda4 sta $211b [05211b] A:0003 X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvMxdizC ; write $03 to [M7A] again (the second write is the high byte, so M7A = $0300)
; mpy* = 00 03 00 ($000300)
01fda7 xba A:0003 X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvMxdizC ; swap the high and low bytes of A (bring back the $00 from earlier)
01fda8 sta $211c [05211c] A:0300 X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvMxdiZC ; copy A to [M7B]
; mpy* = 00 00 00 ($000000)
01fdab rep #$20 A:0300 X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvMxdiZC ; clear the M flag, so A is now 16bit
01fdad lda #$8c80 A:0300 X:0000 Y:d274 S:1fc8 D:0000 DB:05 nvmxdiZC ; set A to $8c80
01fdb0 clc A:8c80 X:0000 Y:d274 S:1fc8 D:0000 DB:05 NvmxdizC ; clear the carry flag
01fdb1 adc $2134 [052134] A:8c80 X:0000 Y:d274 S:1fc8 D:0000 DB:05 Nvmxdizc ; add the multiplication result at $2134-2135 ([MPYL] and [MPYM]) to A
; so mpy* is a source address offset
01fdb4 tay A:8c80 X:0000 Y:d274 S:1fc8 D:0000 DB:05 Nvmxdizc ; copy A to Y
01fdb5 sep #$20 A:8c80 X:0000 Y:8c80 S:1fc8 D:0000 DB:05 Nvmxdizc ; set the M flag, so A is now 8bit
01fdb7 phx A:8c80 X:0000 Y:8c80 S:1fc8 D:0000 DB:05 NvMxdizc ; push X to the stack (save our loop counter)
01fdb8 ldx #$0020 A:8c80 X:0000 Y:8c80 S:1fc6 D:0000 DB:05 NvMxdizc ; setup X as a counter starting at $20
; start of loop - copy $20 tiles to WRAM {
01fdbb jsr $e90c [01e90c] A:8c80 X:0020 Y:8c80 S:1fc6 D:0000 DB:05 nvMxdizc ; jump to the "Copy one tile to WRAM" routine
01fdbe dex A:8c00 X:0020 Y:8c98 S:1fc6 D:0000 DB:05 nvMxdiZc ; X -= 1 (decrement our counter)
01fdbf bne $fdbb [01fdbb] A:8c00 X:001f Y:8c98 S:1fc6 D:0000 DB:05 nvMxdizc ; if X > 0, go back to top of loop
}
01fdc1 plx A:8c00 X:0000 Y:8f80 S:1fc6 D:0000 DB:05 nvMxdiZc ; pull our saved counter from the stack into X
01fdc2 inx A:8c00 X:0000 Y:8f80 S:1fc8 D:0000 DB:05 nvMxdiZc ; X += 1 (increment our counter)
01fdc3 cpx #$0008 A:8c00 X:0001 Y:8f80 S:1fc8 D:0000 DB:05 nvMxdizc ; compare X to $0008
01fdc6 bne $fd8e [01fd8e] A:8c00 X:0001 Y:8f80 S:1fc8 D:0000 DB:05 NvMxdizc ; if x != 8, go back to top of loop
}
The routine continues further to load other graphics into WRAM, but this is the end of the bg tiles we were looking at.
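Putting the outer loops together, this is how I currently understand the whole background tile load, written as a Python sketch. The names are my own. `source` is assumed to be the bytes starting at $058c80, and `block_ids` stands in for the 8 values read from $191a,x, which I haven't traced yet. The branch for negative values is not in my trace log, so it is left out.

```python
def expand_tile(src):
    """$18 source bytes -> $20 bytes: copy $10, then interleave $8 with zeros."""
    out = bytearray(src[:0x10])
    for byte in src[0x10:0x18]:
        out += bytes([byte, 0x00])
    return bytes(out)

def build_tile_buffer(source, block_ids):
    """Rebuild the $2000-byte WRAM buffer from 8 blocks of $20 tiles each."""
    out = bytearray()
    for block_id in block_ids:            # outer loop: X = 0..7
        if block_id & 0x80:
            raise NotImplementedError("negative path was not in the trace")
        offset = block_id * 0x300         # M7A ($0300) times M7B (block id)
        for tile in range(0x20):          # inner loop: $20 tiles per block
            start = offset + tile * 0x18  # Y advances $18 bytes per tile
            out += expand_tile(source[start:start + 0x18])
    return bytes(out)
```

Each block covers $20 * $18 = $300 source bytes, which matches Y going from $8c80 to $8f80 after the first block, and 8 blocks of $20 tiles at $20 bytes each gives the $2000 bytes that the DMA later sends to VRAM.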
However, in the ROM (file) the data is not at $058c80 but at $028c80 (I found it by searching in HxD). In the memory viewer it is the other way around: there is unrelated data at $028c80 and our data is at $058c80.
So again, one last question: Why is the data in the ROM file at $028c80 while the code is reading from $058c80?
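For what it's worth, the two numbers are at least consistent with a LoROM-style layout, where each bank exposes one $8000-byte slice of the file at $8000-$ffff. Here is a minimal Python sketch of that arithmetic, assuming LoROM mapping and a ROM file without a 512-byte copier header. Both are assumptions I haven't confirmed from the cartridge header, and the function name is my own.

```python
def lorom_to_file_offset(snes_addr, header_size=0):
    """Convert an S-CPU bus address to a ROM file offset, assuming LoROM.

    header_size would be 0x200 if the .smc file carried a copier header.
    """
    bank = (snes_addr >> 16) & 0x7F  # banks $80-$ff mirror $00-$7f
    offset = snes_addr & 0xFFFF
    if offset < 0x8000:
        raise ValueError("in LoROM, ROM is only visible at $8000-$ffff")
    return bank * 0x8000 + (offset - 0x8000) + header_size

print(hex(lorom_to_file_offset(0x058C80)))
```

Under those assumptions, $058c80 works out to 5 * $8000 + $0c80 = $028c80, which is where HxD found the data.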
Thanks everybody!