
Author Topic: Understanding NES text routines  (Read 2848 times)

Gyroballer

  • Jr. Member
  • **
  • Posts: 27
    • View Profile
Understanding NES text routines
« on: February 20, 2019, 11:12:05 pm »
Hey there,

I recently translated DBZ II for the Famicom, and while I do know some stuff about computer science, I'm having trouble ROM hacking a lot of other games.

If the text is stored like in DBZ II, where it's mostly character for character in the ROM and there are few dictionary entries, I don't have much trouble. But I don't know where to begin, or how to understand what's going on, if I can't find a string of characters the way the basic tutorials describe.

I'm assuming that means the text is somehow compressed. Is DTE really common on the NES or are there other things for me to look out for? How do I know which one I'm dealing with and from there, how do I go about editing the in-game text?

Thanks for your help with this noob question.

Psyklax

  • Hero Member
  • *****
  • Posts: 956
    • View Profile
    • Psyklax Translations
Re: Understanding NES text routines
« Reply #1 on: February 21, 2019, 04:55:51 am »
Having translated a bunch of Famicom games, I think I'm well placed to answer this. Some games do simply have basic uncompressed text strings, but due to the lack of storage space, any game with more than a few sentences is likely to use compression.

I've worked on Time Stranger, which uses dictionary compression. The US localisation of Dragon Warrior uses a kind of DTE. So it's quite common on the 8-bit machines, less so on the 16-bit ones.
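For anyone unsure what DTE (dual-tile encoding) means in practice, here is a minimal Python sketch. The byte values and letter pairs are made up for illustration and do not come from any real game; the point is only that some bytes expand to two characters, which is why searching the ROM for a one-byte-per-character string finds nothing.

```python
# Hypothetical tables: these values are invented for illustration only.
DTE_PAIRS = {0x80: "th", 0x81: "e ", 0x82: "in"}      # one byte -> two characters
SINGLE_CHARS = {0x00: "a", 0x01: "c", 0x02: "t", 0x03: " "}


def decode_dte(data: bytes) -> str:
    """Expand a DTE-encoded byte string into readable text."""
    out = []
    for b in data:
        if b in DTE_PAIRS:
            out.append(DTE_PAIRS[b])       # pair entry: emits two characters
        elif b in SINGLE_CHARS:
            out.append(SINGLE_CHARS[b])    # ordinary one-byte character
        else:
            out.append(f"[{b:02X}]")       # unknown byte, shown as hex
    return "".join(out)


print(decode_dte(bytes([0x80, 0x81, 0x01, 0x00, 0x02])))  # prints "the cat"
```

Five stored bytes produce seven characters here, so the on-screen text and the ROM bytes no longer line up one to one.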

The problem is that I feel the newbie guides have got it wrong: relative searching is just fumbling around in the dark. People ignore the much more effective option, which is learning how assembly works and properly reverse engineering the text.

See, you start with what you know - the RAM, what's on the screen. After all, RAM ain't compressed: the text on the screen is stored in VRAM and it got there somehow. Your task is to trace its path from the VRAM back to the part of the ROM where it originated. To do this, you don't need to be a programmer, you just need to know a few principles about how assembly works, and use the tools in an emulator such as FCEUX, like the debugger and memory editor. With a bit of perseverance, you can crack any game.

In some cases it's ludicrously easy: the game just reads a byte from the ROM and sends it directly to the VRAM. In many cases, though, the text is assembled in RAM first, then transferred to VRAM at an appropriate time. Often that's because it needs to be decompressed into work RAM first. If so, understanding the compression routine can be anywhere from straightforward to a headache.

It may sound daunting, yet I can't even program in BASIC let alone assembly, and I've cracked everything I've attempted. In a Game Gear translation, I even reverse engineered a graphics compression scheme, decompressed the graphics, changed them, and recompressed them with a rewritten routine, all without being a programmer. So if I can, you can. ;)

Gyroballer

  • Jr. Member
  • **
  • Posts: 27
    • View Profile
Re: Understanding NES text routines
« Reply #2 on: February 21, 2019, 01:32:48 pm »
Thank you very much, Psyklax. I was looking for some direction/motivation and you gave both.

I'll try again on some stuff and see what I can figure out.

Detective Conan is awesome, by the way.

Pennywise

  • Hero Member
  • *****
  • Posts: 2229
  • I'm curious
    • View Profile
    • Yojimbo's Translations
Re: Understanding NES text routines
« Reply #3 on: February 21, 2019, 02:39:14 pm »
Basically you find the location where the text is being written on-screen, set a write breakpoint on PPU memory of that location and trace your way backwards. The process typically involves screen -> RAM -> ROM.

You can find the screen offset with the nametable viewer in FCEUX or the PPU viewer in Mesen.

Gyroballer

  • Jr. Member
  • **
  • Posts: 27
    • View Profile
Re: Understanding NES text routines
« Reply #4 on: February 21, 2019, 03:13:29 pm »
I used this* last night and was able to get to the point where the CPU/RAM was snapping during writing text, but if I picked a specific value, it was snapping 1 tile before or after (can't remember which, sorry), and then it was hard to find an instruction like:
Code: [Select]
LDA ($40),Y @ $0123 = #$A0
*http://archive.rpgclassics.com/subsites/twit/docs/text/

Psyklax

  • Hero Member
  • *****
  • Posts: 956
    • View Profile
    • Psyklax Translations
Re: Understanding NES text routines
« Reply #5 on: February 21, 2019, 05:24:35 pm »
Quote from: Gyroballer
I used this* last night and was able to get to the point where the CPU/RAM was snapping during writing text, but if I picked a specific value, it was snapping 1 tile before or after

Not sure I understand the problem you're having. The document describes the process pretty well, and it's the method I'd use. I would say that you shouldn't bother with specifying exact values: the time spent typing in a condition is time you could spend just running the game and hitting the right breakpoint. If you use save states to reach the moment just before the breakpoint, you won't need to be so specific. I never use conditions.

All in all, though, the doc works for me (though I'm just reading rather than trying it myself).

Gyroballer

  • Jr. Member
  • **
  • Posts: 27
    • View Profile
Re: Understanding NES text routines
« Reply #6 on: February 21, 2019, 05:48:36 pm »
Quote from: Psyklax
Not sure I understand the problem you're having.



Like in this photo, the PPU's snapped before the first text tile is written, since the breakpoint is any write to that particular space on the PPU (first text tile in the window, in this case). I know it's going to be A0, but I don't see that anywhere. But if I have it snap at the tile after that, I see A0 one of the times it breaks.



Interestingly, neither character's been drawn yet, but breaking at the spot where the 2nd character would go shows me an A0 that I was anticipating.

KingMike

  • Forum Moderator
  • Hero Member
  • *****
  • Posts: 6813
  • *sigh* A changed avatar. Big deal.
    • View Profile
Re: Understanding NES text routines
« Reply #7 on: February 21, 2019, 06:54:53 pm »
This is definitely the harder way to go about finding text.

But from what I gather of that code, stored at RAM $021C (the LDX $0200,X @ $021C part) is the string length, $21E-21F is the VRAM address data is written to, and from $220 on is the data to be written (such as the text string).
So how did that data get into the area at $220?
Set a write breakpoint for CPU $220 and find when data gets written into that point.
(the instruction that writes to $220 will probably be some kind of STA instruction. It's the LDA before that we'd probably be interested in)
(note that if this is a standard routine for copying data to VRAM, it is probably going to be called A LOT. So you will probably only want to enable the breakpoint close to your test point in the game for finding text).
Repeat until you get a hit in the ROM ($8000+) area.

(note that the @ $xxxx values in the originally posted code could change as different strings get loaded. I guess I would savestate right before my test point.)

Gyroballer

  • Jr. Member
  • **
  • Posts: 27
    • View Profile
Re: Understanding NES text routines
« Reply #8 on: February 21, 2019, 07:58:04 pm »
This is the part where it's really confusing for me. I didn't realize that the 21C part is the string length, but it definitely is, after going through the text. I also didn't know the 21E-F part was the VRAM address written to.
I sort of get, at a basic level, what each instruction does, but I'm not getting the big picture at all and I'm not sure how to reason out what you did.

Worse, I get really confused when I break at 220. It seems like the tile values get written to 204 and 205, maybe others, and yet 220 has something to do with this? If I break at 220 and try to continue, it will break a few more times and then go on with the entire string.

I can't figure out what to get from the parts before 220 is written to.

I did see that these instructions are stored really early in the ROM, like $1536, but I'm just seeing the assembly instructions that I see on the debugger.




EDIT:


These two instructions will bounce back and forth (with "Run"). 204 in RAM just keeps getting $01 stored to it by an STX.
205 is always the tile value printed to screen.

LDA $0441,Y @ $044D <- that address goes up by 1 each time

But the value doesn't match up with anything I'm familiar with in the nametable, so it must do something else?

If I "Step Into" these functions, they jump around to different subroutines and do all sorts of stuff that I have trouble comprehending, but probably necessary stuff in order to draw a character.

What's going on at 440 is the current character of the current line of the string, 0-indexed. So $00 at 440 is the first char and $17 at 440 is the last character on that line (aka before a line break character or something like that is reached).


EDIT2:


This image shows that while 440 is incrementing, so is 202, although from something like 60 or 70 instead of 00
« Last Edit: February 21, 2019, 08:42:38 pm by Gyroballer »

Psyklax

  • Hero Member
  • *****
  • Posts: 956
    • View Profile
    • Psyklax Translations
Re: Understanding NES text routines
« Reply #9 on: February 22, 2019, 06:03:39 am »
Okay, I'm completely lost at this point. What exactly are you trying to do? Are you making a table file? Are you just experimenting? Clearly the $200-ish region in the internal RAM is being used as a temporary thing for the text before it's sent off to the VRAM, and the different numbers refer to different things. I just don't understand your end goal here, and why it's so confusing.

If I knew what you wanted to achieve then maybe I could help, but as it stands I haven't a clue.

Okay, let me open up Bio Senshi Dan and do it myself, ignoring the tutorial...

So the first letter of the prologue appears at $21A4, and it comes from $785. Just before $785 I see 84 21 FF 1D.... which tells me the game takes $2184 as the base address in the PPU, puts FF (black) in that position (which I assume is where the dakuten markers go), adds $20 to it and puts 1D (which is 'se'). Sure enough, I test the theory with the 'zo' at $21B5, which has 43 (the dakuten marker) at $2195. This is from experience, of course, but you can learn these things when you keep experimenting.
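That reading of the four bytes can be sketched in Python. The layout below (low address byte, high address byte, marker tile, letter tile) is only the interpretation given in this post, and the function name is made up for the sketch.

```python
def parse_text_record(record: bytes):
    """Interpret 4 bytes as: address low, address high, marker tile, letter tile.

    Returns a list of (ppu_address, tile) writes under that assumed layout.
    """
    base = record[0] | (record[1] << 8)   # 84 21 -> $2184 (low byte first)
    marker_tile = record[2]               # tile for the dakuten row above the letter
    letter_tile = record[3]               # the letter itself
    # $20 is 32 tiles, i.e. exactly one nametable row further down.
    return [(base, marker_tile), (base + 0x20, letter_tile)]


for addr, tile in parse_text_record(bytes([0x84, 0x21, 0xFF, 0x1D])):
    print(f"PPU ${addr:04X} <- tile ${tile:02X}")
```

With the bytes quoted above this gives a write of FF to $2184 and 1D to $21A4, which matches where the first letter shows up on screen.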

So looking before a write to $785, I see it comes from $167, so let's go there... aha, the text comes from around the $EEAB area. Where's that in ROM: $16EBB. So there's the text, I make a quick table file in Tabular based on what the PPU viewer shows me, and it's all laid out there. I also notice that to make the dakuten ones like 'zo' they write 'so' then the dakuten marker, and the game takes them both and figures out to put them on top of one another. If the next character doesn't match that value, it'll just put FF (black) in that space instead.
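The jump from CPU address $EEAB to ROM offset $16EBB is plain arithmetic, roughly as in the Python sketch below. The bank number and bank size are assumptions: a 32KB bank mapped at $8000, bank 2, and a 16-byte iNES header were chosen only because they reproduce the numbers in this post. The debugger shows which bank is actually mapped in, and other games will differ.

```python
INES_HEADER_SIZE = 0x10  # the iNES header sits in front of the PRG data in the file


def cpu_to_rom_offset(cpu_addr, bank, bank_size=0x8000, window_base=0x8000):
    """Convert a CPU address inside a switchable PRG window to a file offset.

    bank, bank_size and window_base are assumptions about the mapper setup.
    """
    if not window_base <= cpu_addr < window_base + bank_size:
        raise ValueError(f"${cpu_addr:04X} is outside the PRG window")
    return INES_HEADER_SIZE + bank * bank_size + (cpu_addr - window_base)


print(hex(cpu_to_rom_offset(0xEEAB, bank=2)))  # prints 0x16ebb
```

The same assumed bank also maps $EB38 to $16B48, the other pair of numbers quoted in this post.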

So, if I were translating the game, this is about as much as I'd need to know, so I don't know what else you need to know. Anything unclear? :)

EDIT: another five minutes of investigating $40 (the indirect address location) and $169 (where the address came from) points me to a hard-coded pointer at $EB38 ($16B48 in ROM) that loads the A and X registers with the location of the text ($EE8B). Change that to something else and the game will look elsewhere for the prologue. Nice. :)
« Last Edit: February 22, 2019, 06:16:54 am by Psyklax »

Gyroballer

  • Jr. Member
  • **
  • Posts: 27
    • View Profile
Re: Understanding NES text routines
« Reply #10 on: February 22, 2019, 11:57:18 am »
Sorry about that. I'm working with Dragon Ball 3 - Gokuu Den. The text seems like it must be compressed in some way because I've made accurate tables but can't find any of the strings except for the stats page.

If I try to search for any consecutive hex values in the dialogue, I don't find them.

I translated DBZ2 so I'm familiar with tables and how hex editing roms work, but I'm completely unfamiliar with what to do when the text isn't just stored normally.

I'm guessing it could be DTE or some other compression method because I can't even find dictionary entries for something like Gokuu.

nesrocks

  • Hero Member
  • *****
  • Posts: 565
    • View Profile
    • nesrocks.com
Re: Understanding NES text routines
« Reply #11 on: February 22, 2019, 12:02:46 pm »
You're gonna have to look in the PPU memory, find the address for one specific letter, then set a breakpoint to a write to that PPU address and work your way back from there to reverse engineer what it is doing.

https://www.youtube.com/watch?v=d2XkJQFs0OQ

This video explains some of that (the text in dr. mario isn't compressed, but you can see how to use the PPU mem view and the debugger to find it).

Gyroballer

  • Jr. Member
  • **
  • Posts: 27
    • View Profile
Re: Understanding NES text routines
« Reply #12 on: February 22, 2019, 12:12:23 pm »
Quote from: nesrocks
You're gonna have to look in the PPU memory, find the address for one specific letter, then set a breakpoint to a write to that PPU address and work your way back from there to reverse engineer what it is doing.

The last few words are the confusing part for me. I figured out the values using the PPU viewer and then figured out that, ignoring mirroring, the dialogue is written starting at $2084 in PPU MEM.

It seems like $204 or $205 corresponds with that in the RAM, but that's about as far as I get. I know that the value in 205 for instance gets changed all the time and thrown to the next spot around the 2085 area until the string is complete.

My problem is that if I wanna go into the ROM and change what dialogue is there, and therefore gets stored in $205 and then PPU $2085-$20XX, I have no idea how to do that.

cccmar

  • Full Member
  • ***
  • Posts: 228
    • View Profile
    • Nebulous Translations site
Re: Understanding NES text routines
« Reply #13 on: February 22, 2019, 01:37:12 pm »
I believe that the text in DB 3 is stored in CHR-ROM, kinda like what Culture Brain tend to do with their games, so the pointers are not normal. If that's the case, even just finding and dumping it will be very difficult. Maybe DvD or some other hacker who's done that could give you some tips there.

Gyroballer

  • Jr. Member
  • **
  • Posts: 27
    • View Profile
Re: Understanding NES text routines
« Reply #14 on: February 22, 2019, 01:43:08 pm »
Quote from: cccmar
I believe that the text in DB 3 is stored in CHR-ROM, kinda like what Culture Brain tend to do with their games, so the pointers are not normal. If that's the case, even just finding and dumping it will be very difficult. Maybe DvD or some other hacker who's done that could give you some tips there.

Thanks for this lead. I thought maybe it was just compressed like DTE and I didn't understand how to handle that (probably still don't, but maybe I'd figure it out eventually, whereas with this, maybe it makes sense I'm so confused).

nesrocks

  • Hero Member
  • *****
  • Posts: 565
    • View Profile
    • nesrocks.com
Re: Understanding NES text routines
« Reply #15 on: February 22, 2019, 01:56:48 pm »
There are four nametables in PPU memory: 0x2000-0x23ff, 0x2400-0x27ff, 0x2800-0x2bff and 0x2c00-0x2fff. If it's writing to 0x2084, it's in the first one (top left).
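A small Python sketch of how a PPU address breaks down into nametable, row and column. It relies on each nametable being 32 tiles wide and 30 tiles tall, with the last 64 bytes of each range holding attribute data instead of tiles; the function name is made up for the sketch.

```python
def locate_tile(ppu_addr):
    """Return (nametable index, row, column) for a nametable tile address."""
    if not 0x2000 <= ppu_addr <= 0x2FFF:
        raise ValueError("not a nametable address")
    offset = ppu_addr - 0x2000
    nametable = offset // 0x400          # 0 = $2000, 1 = $2400, 2 = $2800, 3 = $2C00
    within = offset % 0x400
    if within >= 0x3C0:
        raise ValueError("attribute table byte, not a tile")
    row, col = divmod(within, 32)        # 32 tiles per row
    return nametable, row, col


print(locate_tile(0x2084))  # prints (0, 4, 4)
```

So $2084 is the fifth tile of the fifth row in the first nametable, counting from zero.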

By reverse engineer I mean that you need to set a breakpoint to a write to that address and then read the assembly code and understand what it is doing and where the value is coming from. I admit I hadn't read the thread and Psyklax had already said basically exactly the same thing! Sorry :)

$204 and $203 are the temporary bytes that hold the 16-bit address to write to in the nametable (addresses are 16 bits and each byte is 8 bits, so it needs two bytes).
$205 is the data to write there.

That data is coming from $2007, the PPU data register, which is used for reading PPU memory as well as writing it. I don't know exactly what is going on there, but this is the line that does it:
07:E0F9:AD 07 20  LDA PPU_DATA = #$B7 (A becomes 02 after this read)
07:E0FC:9D 64 02  STA $0264,X @ $026D = #$93

Maybe it has to do with this, but I'm not sure http://wiki.nesdev.com/w/index.php/Reading_2007_during_rendering

Gyroballer

  • Jr. Member
  • **
  • Posts: 27
    • View Profile
Re: Understanding NES text routines
« Reply #16 on: February 22, 2019, 02:46:18 pm »
Awesome, thank you! I've got a lot of stuff to look at now. I'll try this out over the next couple of days.

Everyone's been a big help.

Pennywise

  • Hero Member
  • *****
  • Posts: 2229
  • I'm curious
    • View Profile
    • Yojimbo's Translations
Re: Understanding NES text routines
« Reply #17 on: February 22, 2019, 02:59:53 pm »
Oh, DB3. The script for that is stored in the CHR-ROM, which is a very special case. My plan was to tackle that after I did a new translation of the first Dragon Ball game, but that project's been in limbo for so long, it kind of fell by the wayside.

Anyhow, I ripped the scripts for the game years back and mulled over how I would approach the hacking.

AFAIK, the CHR-ROM is maxed out at 256KB, but the PRG-ROM can be expanded from 128KB to 256KB. My plan was to move the text to the PRG-ROM, split it between multiple banks, and rewrite the code to support loading from the PRG-ROM. This would effectively remove any space constraints, but would require a bit of work to pull off.

The text isn't compressed, and the pointers are your typical NES pointers, just with wacky offsets. From a basic standpoint, the game isn't hard to manage. It's only when you start digging in that the difficulty increases exponentially.
« Last Edit: February 22, 2019, 03:16:21 pm by Pennywise »

Psyklax

  • Hero Member
  • *****
  • Posts: 956
    • View Profile
    • Psyklax Translations
Re: Understanding NES text routines
« Reply #18 on: February 22, 2019, 03:32:13 pm »
NOTE: I started this before Pennywise posted.

Okay, I was using the game from that tutorial because it wasn't clear that you had moved on to DB3 - the two games aren't going to do things exactly the same.

I can see why you had trouble, though, because I just had a look. The good news is that it's not compressed at all. :D To find it, I started with VRAM $2084 for the first character and saw that $203-205 is where the screen location and the character come from, and I used the trace logger to go from there. I reset the game, got to the dialogue screen and started the trace logger: the log was an 11MB file by the time I reached my $2084 breakpoint. Trace logging is usually much quicker when you want to reverse engineer, because you can just keep going back through the log until you find the root.

In this case, I found that the value in $205 goes through an ORA #$80 process (which changes it to the value it needs), and before that it comes from $441... and THAT comes from $264 (sigh). So then, the real revelation: $264 is loaded from a PPU register, as nesrocks suggested. That's pretty weird, but this is a later game with an advanced MMC chip so the screen drawing is a bit weird.
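For anyone who doesn't read assembly, ORA #$80 just sets the top bit of the value in the accumulator. A minimal Python equivalent (the function name is made up for the sketch):

```python
def to_nametable_tile(value):
    """Mimic ORA #$80: force bit 7 on, leaving the other bits untouched."""
    return value | 0x80


for v in (0x00, 0x20, 0x85):
    print(f"${v:02X} -> ${to_nametable_tile(v):02X}")
```

A stored $00 comes out as $80 and a stored $20 comes out as $A0, which would explain why searching the ROM for the tile numbers seen in the nametable turns up nothing. Values that already have the top bit set, like $85, pass through unchanged.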

Anyway, I actually don't know how the hell it's getting into the PPU data at this point, but I don't care, because I have the actual bytes from the ROM. So I do a simple search in the ROM for the first four bytes (20 05 0B B7) and what do you know, they're at $4DEA2. :) So I made a simple Romaji table file and there you go. Here it is:

Code: [Select]
00=a
01=i
02=u
03=e
04=o
05=ka
06=ki
07=ku
08=ke
09=ko
0A=sa
0B=shi
0C=su
0D=se
0E=so
0F=ta
10=chi
11=tsu
12=te
13=to
14=na
15=ni
16=nu
17=ne
18=no
19=ha
1A=hi
1B=fu
1C=he
1D=ho
1E=ma
1F=mi
20=mu
21=me
22=mo
23=ya
24=yu
25=yo
26=ra
27=ri
28=ru
29=re
2A=ro
2B=wa
2C=wo
2D=nn
2E=xa
2F=xi
30=xu
31=xe
32=xo
33=tt
34=xya
35=xyu
36=xyo
37="
38=*
39=-
3A=_
3B=?
3C=!
3D=0
3E=.
40=A
41=I
42=U
43=E
44=O
45=KA
46=KI
47=KU
48=KE
49=KO
4A=SA
4B=SHI
4C=SU
4D=SE
4E=SO
4F=TA
50=CHI
51=TSU
52=TE
53=TO
54=NA
55=NI
56=NU
57=NE
58=NO
59=HA
5A=HI
5B=FU
5C=HE
5D=HO
5E=MA
5F=MI
60=MU
61=ME
62=MO
63=YA
64=YU
65=YO
66=RA
67=RI
68=RU
69=RE
6A=RO
6B=WA
6C=WO
6D=NN
6E=XA
6F=XI
70=XU
71=XE
72=XO
73=TT
74=XYA
75=XYU
76=XYO
77=1
78=2
79=3
7A=4
7B=5
7C=6
7D=7
7E=8
7F=9
85=ga
86=gi
87=gu
88=ge
89=go
8A=za
8B=ji
8C=zu
8D=ze
8E=zo
8F=da
90=di
91=du
92=de
93=do
99=ba
9A=bi
9B=bu
9C=be
9D=bo
B7=
B8=pa
B9=pi
BA=pu
BB=pe
BC=po
C5=GA
C6=GI
C7=GU
C8=GE
C9=GO
CA=ZA
CB=JI
CC=ZU
CD=ZE
CE=ZO
CF=DA
D0=DI
D1=DU
D2=DE
D3=DO
D9=BA
DA=BI
DB=BU
DC=BE
DD=BO
E8=PA
E9=PI
EA=PU
EB=PE
EC=PO
FE=[BRK]
FF=[END]

I might have missed something but I think that's everything in there. Just go to $4DEA2 in your favourite editor like WindHex32 EX and use that table file, and it should all be quite clear. :)
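The "simple search in the ROM" step can also be done with a few lines of Python instead of a hex editor. This is only a sketch: the buffer below is a synthetic stand-in so the example runs by itself, and a real run would read the ROM file's bytes instead.

```python
def find_all(data: bytes, pattern: bytes):
    """Return every offset at which pattern occurs in data."""
    hits = []
    start = data.find(pattern)
    while start != -1:
        hits.append(start)
        start = data.find(pattern, start + 1)
    return hits


pattern = bytes([0x20, 0x05, 0x0B, 0xB7])

# Synthetic stand-in for a ROM image; with a real file you would use
# something like: rom = open(path_to_rom, "rb").read()
rom = bytes(0x30) + pattern + bytes(0x10) + pattern

print([hex(offset) for offset in find_all(rom, pattern)])  # prints ['0x30', '0x44']
```

Offsets found this way count from the start of the file, header included, so they match what a hex editor shows.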

Gyroballer

  • Jr. Member
  • **
  • Posts: 27
    • View Profile
Re: Understanding NES text routines
« Reply #19 on: February 22, 2019, 04:02:06 pm »
Oh my goodness, that seemed like magic, Psyklax. When I was looking at the PPU and nametables on FCEUX, it was suggesting that there were 3 main tables where one starts at 40 with another at C0, and then another starts at 80, which was what the story dialogue seemed to be referencing, but from what you said, if I understand correctly, it's that ORA #$80 that was causing my main confusion. It shows up as 80, but it comes from 00. Wow.

And to Pennywise, if you still end up doing it, I'm sure it'll end up better than mine, but I'll probably still go for it anyway if I can make something decent.

EDIT:
Is there a good doc for a "work smarter not harder" way to like dump and reinsert scripts? I'll be honest and say that I manually edited each line and pointer in DBZ II because I didn't quite understand how to use Cartographer and Atlas, but I'd be happy to learn how if there's a good tutorial for me to read or something since you guys already helped a bunch with this  :D
« Last Edit: February 22, 2019, 04:12:02 pm by Gyroballer »
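As a rough idea of what a script dumper does, here is a minimal Python sketch. It is not how Cartographer or Atlas work internally; it only assumes a table file in the `XX=text` format posted above and strings terminated by FF, as that table's `[END]` entry suggests. The sample table and bytes are cut down for illustration.

```python
def parse_table(text):
    """Parse 'XX=text' lines into a dict of byte value -> string."""
    table = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        table[int(key.strip(), 16)] = value
    return table


def dump_strings(data, table, start, end_code=0xFF):
    """Return (offset, text) for each end_code-terminated string from start on.

    Bytes missing from the table are shown as [XX]. Data after the last
    end_code is ignored.
    """
    strings = []
    current = []
    string_start = start
    for offset in range(start, len(data)):
        byte = data[offset]
        if byte == end_code:
            strings.append((string_start, "".join(current)))
            current = []
            string_start = offset + 1
        else:
            current.append(table.get(byte, f"[{byte:02X}]"))
    return strings


# A few entries taken from the table above, plus made-up sample data.
sample_table = parse_table("20=mu\n05=ka\n0B=shi\nFE=[BRK]\nFF=[END]")
sample_data = bytes([0x20, 0x05, 0x0B, 0xFE, 0x99, 0xFF, 0x05, 0xFF])

for offset, text in dump_strings(sample_data, sample_table, start=0):
    print(f"{offset:06X}: {text}")
```

Keeping the offset with each string matters because reinsertion has to update the pointers whenever a translated string changes length, and that pointer bookkeeping is the part the dedicated tools automate.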