Finding texts and pointers table through Debugger

Started by apraxiumRum, March 08, 2018, 11:42:00 AM

apraxiumRum

After skimming through ASM documents, I think I can start the slow crawl toward hacking SNES roms.
I am trying my hand at Geiger's Debugger, viewing the RAM as I play, but there is so much going on at such a fast pace!  :o
I am still trying to figure out the character table for the ROM I am researching, so I don't know if there is compression involved.
I know a bit about pointer tables and little-endian format, so what I want to know is:
-How do you "isolate" which part of RAM is responsible for calling up the "text" that appears whenever you talk to NPCs, so I can track it in the ROM from its address?
-How do you find the pointer table that controls the length of the text?
I view the RAM using Show Hex.

RedComet

Start with what you know: the text is on-screen, which means the tiles for the font are loaded into VRAM and one or more tilemaps contain references to those tiles.

First figure out where in VRAM the tilemap and tiles are stored. With bsnes (not sure about with Geiger's debugger), you can set write breakpoints on the VRAM addresses for the tilemap. From there it's just a matter of working backwards until you find where the text originated from in the ROM.

Always start with what you know and you can eventually work your way back to what you want to find.  :thumbsup:
Twilight Translations - More than just Dragonball Z. :P

Psyklax

I echo RedComet's comment: my hacking generally involves working backwards. There's a reason they call it "reverse engineering". :D

I've not done much on the SNES, but it's kinda like the NES with extra bells and whistles, so I can briefly mention the typical way I do that. On the NES you have a tilemap with one background layer, and text goes there. Since you know that the game put that letter on the screen somehow, you use the debugger to find out where that letter came from. Often on the NES it came from the CPU's RAM before being sent off to the PPU, so you then find how it got to that part of RAM. Sometimes it got there from the zero page of RAM, so you see how it got THERE, and eventually you find the part of the ROM that it came from.

I've never dealt with compression on the NES aside from simple dictionary compression, but generally the principle remains the same: you see the data in the VRAM, so you follow the trail backwards until you find the source. If it's compressed, it might take a bit of practice with assembly before you can figure it out, but that's the way to go.

Now I can't believe I ever did relative searches... :D

KingMike

If you are trying to find just the text on SNES, there's another hint for games which use kanji.
Often games with kanji need to draw the textbox as a graphic (since the SNES doesn't have enough VRAM to store huge kanji fonts). Often that window will be drawn somewhere in RAM.

I would use ZSNES (pretty much only for this sort of thing) to make a savestate with text on screen, then load it into WindHex, which recognizes ZSNES savestates and hides the header automatically. The first 0x20000 bytes that WindHex shows are RAM (7E:0000-7F:FFFF). Looking in the tile editor mode, see if the window text shows up in that area.
If so, you can trace it.
(Tiles are MUCH easier to trace when they are in RAM rather than in VRAM, since with RAM addresses you can just search for them in a trace log. To find VRAM access in a trace log, you would instead have to search for every write to $2116 (00:2116), the register that sets the VRAM word address (VRAM byte address / 2) which the following writes, usually done through the DMA channels, will go to.)
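As a rough illustration of the trace-log search described above, this Python sketch scans a log for store instructions that target $2116. The line format is an assumption (trace logs differ between emulators), and `find_vram_address_writes` and `vram_byte_address` are hypothetical helper names, not part of any tool.

```python
import re

# Assumed format: each trace line contains a mnemonic followed by its operand,
# e.g. "80a1b5 sta $2116 ...". Real emulator logs may look different.
STORE_TO_2116 = re.compile(r"\b(sta|stx|sty|stz)\s+\$(?:00)?2116\b", re.IGNORECASE)

def find_vram_address_writes(lines):
    """Return (line_number, text) for every line that stores to $2116."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if STORE_TO_2116.search(text):
            hits.append((number, text.rstrip("\n")))
    return hits

def vram_byte_address(word_address):
    """$2116 holds a word address; the byte address is twice that value."""
    return word_address * 2
```

Each hit is only a starting point: the value written there says where in VRAM the next transfer goes, so you would still read the surrounding lines to see which source address feeds it.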
"My watch says 30 chickens" Google, 2018

apraxiumRum

Well...looks like luck decided to pay me a short visit.
Turned out the game has some "extremely rare" instances of using ASCII characters and numbers in the ROM for dialog, which prompted me to try the hard way of finding the characters' values.
And after some painful hours of hex editing, talking to the same NPC, and asking for help to identify some troublesome kanji, it happened  :happy::

The Character Table is complete!!
Strange that the characters' values follow the same convention (ASCII=1 Byte, Shift-JIS=2 Bytes), but with different values from the actual Shift-JIS. Lots of Japanese characters were skipped.
Funny enough, the way the game handles its text is fairly reminiscent of the game I am still working on (same franchise, different game, different platform), in the way it uses the same bytes to color the text, insert linebreaks, etc.
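A minimal sketch of how a finished table like this could drive a script dumper. Everything specific here is made up for illustration: the table entries, the rule that a byte of 0x80 or above starts a two-byte character, and the 0x00 end byte. The real game's values would replace them.

```python
# Hypothetical table: single-byte entries are ASCII-like, two-byte entries
# stand in for kana/kanji. None of these values come from the actual game.
TABLE = {
    b"\x41": "A",
    b"\x42": "B",
    b"\x0a": "\n",          # assumed line-break code
    b"\x88\x01": "王",
    b"\x88\x02": "国",
}
END_BYTE = 0x00             # assumed terminator

def decode_string(data, start):
    """Decode one string beginning at `start`; return (text, next_offset)."""
    out = []
    pos = start
    while pos < len(data) and data[pos] != END_BYTE:
        # Assumption: a lead byte >= 0x80 means a two-byte character.
        width = 2 if data[pos] >= 0x80 else 1
        key = bytes(data[pos:pos + width])
        # Unknown codes are printed as hex so gaps in the table stay visible.
        out.append(TABLE.get(key, "<" + key.hex().upper() + ">"))
        pos += width
    return "".join(out), pos + 1
```

Printing unknown codes as `<7F>`-style tags keeps control bytes (colors, names, titles) visible in the dump instead of silently dropping them.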
Now that I can "see" the text in the ROM and figure out its offset, I am facing the bigger problem: figuring out the pointer table.
Adding to my confusion, the game stores the names of NPCs, main characters, monsters, items, etc. in very separate "groups". It even stores the NPC's "title" (like Emperor, King, etc.) and calls it using 2 bytes as a variable.

The good news: the game uses different yet easy to figure out end bytes. The bad news: I'll definitely need to change pointers to make the text coherent.
What should I do to find the Pointer Table?


Psyklax

Quote from: apraxiumRum on March 19, 2018, 11:24:19 AM
Strange that the characters' values follow the same convention (ASCII=1 Byte, Shift-JIS=2 Bytes), but with different values from the actual Shift-JIS. Lots of Japanese characters were skipped.

But of course: console games need to include the graphics for text inside the ROM itself, rather than relying on the machine they're played on (as with computers like the PC-98). So 8- and 16-bit games rarely use real Shift-JIS, because there would be little point: you'd never include every single kanji in a ROM. Not only that, but since Shift-JIS spends two bytes on every kana and kanji, it's not the most efficient system in terms of keeping the script small. One game I tried on the SNES had the kanji in four banks, and would have a two-byte control code to switch banks, then use one byte per character within that bank. Naturally, the kana, Roman alphabet, numbers and punctuation were in the first bank, which really saved space.

Quote from: apraxiumRum on March 19, 2018, 11:24:19 AM
The good news: the game uses different yet easy to figure out end bytes. The bad news: I'll definitely need to change pointers to make the text coherent.
What should I do to find the Pointer Table?

Same method I mentioned before: reverse engineering! :) If you now know where the dialogue is, just set a read breakpoint for that general area (just in case the first important byte isn't the one you think it is) and see where the game gets the address from. So if a particular line is at $21F10, the game has to get that address from somewhere.

Some games with little dialogue use what are called 'hardcoded' pointers. This means the pointer is just an instruction within the code: load the Accumulator with a number, store it in RAM, load the next byte, store that in RAM. Games with large amounts of text don't bother with that because it's very cumbersome, so they use pointer tables: one instruction can just say "add X to this address, load the Accumulator with the number stored there, and put THAT in RAM". That way each line of dialogue can simply be a number to load into the X register, for example.
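To make the table lookup concrete, here is a small Python sketch of what "add X to this address and load the number stored there" amounts to when done on a ROM dump. It assumes 2-byte little-endian entries, which is common but not guaranteed (some games store 3-byte pointers that include the bank). `read_pointer` and `read_table` are hypothetical helper names.

```python
import struct

def read_pointer(rom, table_offset, index):
    """Return the 16-bit little-endian pointer stored at entry `index`."""
    # Each entry is assumed to be 2 bytes wide, so entry N sits at table + 2*N.
    (value,) = struct.unpack_from("<H", rom, table_offset + 2 * index)
    return value

def read_table(rom, table_offset, count):
    """Read `count` consecutive pointers starting at `table_offset`."""
    return [read_pointer(rom, table_offset, i) for i in range(count)]
```

For instance, the two bytes `10 9F` in the table decode to the address $9F10, because the low byte is stored first.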

Your game almost certainly uses the latter, so when the debugger breaks on a read of the dialogue, what the game is most likely doing is getting the address from somewhere in RAM, and that value in RAM was loaded from the table. Just keep following the trail, but remember that the address you see may not be the same as the offset in the ROM file, because the SNES uses banks.
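A sketch of the bank arithmetic, under two loud assumptions: that the game uses the simple LoROM layout (32 KB of ROM per bank, visible at $8000-$FFFF) and that the ROM file has no 512-byte copier header. A HiROM game maps differently. The last function is a brute-force complement to the breakpoint method, not something described in the post: it lists every place in the file where the 16-bit address of a string appears, as candidates for table entries. All function names are hypothetical.

```python
def lorom_offset_to_address(offset):
    """Map a headerless file offset to (bank, 16-bit address) under LoROM."""
    bank = offset >> 15                      # one bank per 0x8000 bytes of file
    address = (offset & 0x7FFF) | 0x8000     # ROM sits in the upper half
    return bank, address

def lorom_address_to_offset(bank, address):
    """Inverse mapping; banks $80+ mirror banks $00+ in this simple model."""
    if address < 0x8000:
        raise ValueError("this simplified LoROM model only maps $8000-$FFFF")
    return ((bank & 0x7F) << 15) | (address & 0x7FFF)

def find_pointer_candidates(rom, offset):
    """List file offsets whose two bytes equal the string's address, low byte first."""
    _, address = lorom_offset_to_address(offset)
    needle = address.to_bytes(2, "little")
    hits = []
    pos = rom.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = rom.find(needle, pos + 1)
    return hits
```

If $21F10 were a file offset in such a ROM, it would show up in the debugger as bank $04 (or its mirror $84), address $9F10, and a 2-byte pointer to it would appear in the file as the bytes `10 9F`. Many hits will be coincidences; a real table tends to show up as a run of hits with steadily increasing values around them.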