Author Topic: Dragon Warrior 1, 2 & 3 Hacking Discussion  (Read 43813 times)

abw

  • Sr. Member
  • ****
  • Posts: 287
Re: Dragon Warrior 1, 2 & 3 Hacking Discussion
« Reply #120 on: October 09, 2019, 08:41:44 pm »
The work is now available at Data Crystal, enjoy!
Great work! We may find some value in this for the NES projects, and it certainly looks like a nice win for the Dragon Quest hacking community in general.
+1 :cookie:! It'll be interesting to see how well the GBC RAM map matches up to the NES RAM map.

Therefore we decided that we would need to do an extraction of the DQ3 Japanese rom's text.
Wait, if you didn't have a script dump of either the English or Japanese text, what were you translating from?

1, any thoughts about my kana viewing issues in a hexeditor?
Saving your table file encoded in Shift-JIS, loading it in WindHex, and then enabling the "View Text Data As Unicode" menu option does the trick for me. If you're going with romaji and want/need to differentiate between the kana systems, one thing I've done in the past is to use e.g. lowercase for hiragana and uppercase for katakana.
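For anyone who would rather script the Shift-JIS step than re-save the file in an editor, here is a minimal Python sketch. It assumes the usual "hex=character" table file line format, and the two sample entries ($0B for あ, $3D for ア) are the PPU tile values mentioned later in this thread; the function name is made up for illustration.

```python
# Minimal sketch: build table-file text and encode it as Shift-JIS.
# Assumption: one "HEX=char" entry per line, as in common .tbl files.

def build_table_bytes(entries):
    """Return the table file contents as Shift-JIS encoded bytes."""
    text = "".join(f"{code:02X}={char}\n" for code, char in entries.items())
    return text.encode("shift_jis")

# Sample entries only; a real table needs every tile value the game uses.
sample_entries = {0x0B: "あ", 0x3D: "ア"}
table_bytes = build_table_bytes(sample_entries)

# To save it for loading in a hex editor (the file name is hypothetical):
# with open("dq3_kana.tbl", "wb") as f:
#     f.write(table_bytes)
```

Writing the bytes in binary mode avoids any newline or encoding translation by the operating system.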

2, do you think my suspicions of compression are correct? And 3, if compression, could the solution be relatively simple like the bit based table file we created for DQ2?
Yes, DQ3's text encoding appears to be quite similar to DQ4's from a structural point of view. It's a 6-bit encoding with hiragana <=> katakana switches on a $3C (a.k.a. %111100) and dictionary switches on $3D/3E/3F. The major obvious difference between the two encodings is the dictionary contents and some of the kana being different/reordered. For table file building, you can use the DQ4 example that ships with abcde as a basis; DQ3's 6-bit -> 8-bit hiragana lookup table starts at 0x3BB8A, its katakana lookup table starts at 0x3BBC6 (these two mostly follow the order of tiles in the PPU, but not exactly), and the 6 dictionary pointers are at 0x3BC74, pointing to $FE-terminated dictionary entries in ROM bank $10 (i.e. starting at 0x28B4B). Once you've got that sorted out, it looks like the main script pointer table starts at 0x3BC02.


In other news, I did a little bit of poking around the text engine code for DW3, and adding in DTE was pretty easy; the $80 - $AF byte range was apparently unused and had its own little code path, so I stole that for the new DTE entries, and $30 DTE entries is enough to compress the original script by over 27%, so if you're only 5% - 10% larger than the original, you should have plenty of room to spare. Assuming Choppasmith is eventually going to want to insert a script that's somewhere around 100% larger than the original, I also looked at the code that sets up the bank and pointer for a given string ID, and rewrote it to make using extra banks easy; in combination with the DTE compression, using 3 of the existing empty ROM banks should be enough space to hold double the original text.
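To illustrate the general idea of DTE (one spare byte value standing in for a common pair of bytes), here is a minimal Python sketch. It is not the ASM change described above and makes no claim about how the real DTE entries were chosen; it simply picks the most frequent adjacent pairs and assigns them to $80 - $AF. The function names and the ASCII sample text are made up, and it assumes the input never contains bytes in the $80 - $AF range.

```python
from collections import Counter

def build_dte(script, first_code=0x80, count=0x30):
    """Map the most frequent adjacent byte pairs to codes first_code.."""
    pairs = Counter(script[i:i + 2] for i in range(len(script) - 1))
    return {pair: first_code + n
            for n, (pair, _) in enumerate(pairs.most_common(count))}

def dte_encode(script, table):
    """Greedy left-to-right encoding: use a DTE code whenever a pair matches."""
    out = bytearray()
    i = 0
    while i < len(script):
        pair = script[i:i + 2]
        if len(pair) == 2 and pair in table:
            out.append(table[pair])
            i += 2
        else:
            out.append(script[i])
            i += 1
    return bytes(out)

def dte_decode(data, table):
    """Expand DTE codes back into their byte pairs."""
    reverse = {code: pair for pair, code in table.items()}
    out = bytearray()
    for b in data:
        out += reverse.get(b, bytes([b]))
    return bytes(out)

sample = b"the hero went to the castle"   # stand-in text, not game data
dte_table = build_dte(sample)
packed = dte_encode(sample, dte_table)
```

A frequency-only pick like this is the simplest possible selection; a real tool would likely re-count after each choice, since pairs overlap.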

Chicken Knife

  • Sr. Member
  • ****
  • Posts: 295
Re: Dragon Warrior 1, 2 & 3 Hacking Discussion
« Reply #121 on: October 10, 2019, 03:32:16 pm »
Quote
Wait, if you didn't have a script dump of either the English or Japanese text, what were you translating from?
Well, I said above that I had an English dump so we are definitely using that. :P It probably would have been more efficient for us to crack our heads getting the Japanese dump at the beginning of the process, but instead we obtained *most* of the Japanese script through various resources online and a Japanese playthrough. I have a bad habit of choosing the longer and less technically difficult path, which is quite foolish when here I am struggling with the technicalities of a Japanese extraction regardless.

Quote
Saving your table file encoded in Shift-JIS, loading it in WindHex, and then enabling the "View Text Data As Unicode" menu option does the trick for me. If you're going with romaji and want/need to differentiate between the kana systems, one thing I've done in the past is to use e.g. lowercase for hiragana and uppercase for katakana.
I ended up downloading the Notepad++ software after investigating Shift-JIS and kind of fell in love with it in general. As far as viewing kana in WindHex, I'm stuck on your instruction of "View Text Data As Unicode". I do see the option of "View Text Data As Japanese" under options, but I've scoured the tabs at the top several times and can't find the Unicode option.

Quote
Yes, DQ3's text encoding appears to be quite similar to DQ4's from a structural point of view. It's a 6-bit encoding with hiragana <=> katakana switches on a $3C (a.k.a. %111100) and dictionary switches on $3D/3E/3F.
Ok, the concepts of 3 table files and switches are making my head hurt. This reminds me of the numerous table files used for fixing the SHILD issue in DW2 but I'd be lying if I said I actually understood that incredible web of files you sent me. Now I'll try to comprehend this (since 3 is better than 20).

So if I understand correctly, every time the game alternates between displaying a word in hiragana vs katakana, a switch byte ($3C) appears and causes the game to pull from a different character table. Does this $3C switch appear in the code that tells the game to display the text or does it appear in the text itself? I assume that my Cartographer instructions would have to load all the table files and then tell abcde to perform the same switches between tables based on the presence of that byte. If you could point out the instructions that do that in your DQ4 cartographer doc that would be helpful.

Quote
DQ3's 6-bit -> 8-bit hiragana lookup table starts at 0x3BB8A, its katakana lookup table starts at 0x3BBC6 (these two mostly follow the order of tiles in the PPU, but not exactly), and the 6 dictionary pointers are at 0x3BC74, pointing to $FE-terminated dictionary entries in ROM bank $10 (i.e. starting at 0x28B4B). Once you've got that sorted out, it looks like the main script pointer table starts at 0x3BC02.
This is another very confusing element here. In fact, I'm so confused I can hardly even articulate an appropriate question. I did see in the table files that the characters with diacritics used 8 bits instead of 6. Wouldn't they just be pairs of bytes at this point and could be saved in the table file as such? And therefore wouldn't they essentially be uncompressed and I could rely on the byte information showing in a PPU viewer?

Quote
so if you're only 5% - 10% larger than the original, you should have plenty of room to spare.
During my second big round of editing as I've been adding our script lines into the insertion file one at a time, I've found several opportunities to reduce redundancies and make for punchier language. Nothing was compromised, and in fact I strongly prefer dense, to-the-point writing (in spite of what my RHDN forum activity would probably indicate). That all said, it would seem likely that I may not have a problem with text space. But we shall see. Dealing with that problem can't be any more daunting than these obstacles around extracting DQ3's Japanese script.
« Last Edit: October 10, 2019, 03:41:14 pm by Chicken Knife »

filler

  • RHDN Patreon Supporter!
  • Hero Member
  • *****
  • Posts: 896
  • "WINNERS DON'T SELL REPROS"
    • Filler's Translation Projects
Re: Dragon Warrior 1, 2 & 3 Hacking Discussion
« Reply #122 on: October 10, 2019, 06:57:30 pm »
As far as viewing kana in WindHex, I'm stuck on your instruction of "View Text Data As Unicode". I do see the option of "View Text Data As Japanese" under options, but I've scoured the tabs at the top several times and can't find the Unicode option.

Those are both the same command, essentially telling WindHex to display bytes that appear in a table file as their corresponding Japanese characters. The reason for the change in wording is likely because "View Text as Unicode" was not accurate since it reads table files in S-JIS format and that setting simply renders the characters in Japanese. It's more accurate to say "View Text Data as Japanese" and Bongo` changed the wording in subsequent version(s).

Chicken Knife

  • Sr. Member
  • ****
  • Posts: 295
Re: Dragon Warrior 1, 2 & 3 Hacking Discussion
« Reply #123 on: October 10, 2019, 09:58:39 pm »
Those are both the same command, essentially telling WindHex to display bytes that appear in a table file as their corresponding Japanese characters. The reason for the change in wording is likely because "View Text as Unicode" was not accurate since it reads table files in S-JIS format and that setting simply renders the characters in Japanese. It's more accurate to say "View Text Data as Japanese" and Bongo` changed the wording in subsequent version(s).
Thanks for the clarification. I did experiment with that and tried it again just now. I loaded the Japanese Dragon Quest 2 rom, loaded a hiragana table file that has been converted to SHIFT-JIS, and changed to the mode of View Text Data as Japanese.

It showed basically all the data as kanji characters, but I didn't see any of the hiragana. I used that same table file several months ago to do a text dump of the kana and it came out more or less correctly. So that all still doesn't solve the issue. hmm..

abw

  • Sr. Member
  • ****
  • Posts: 287
Re: Dragon Warrior 1, 2 & 3 Hacking Discussion
« Reply #124 on: October 10, 2019, 10:19:38 pm »
I have a bad habit of choosing the longer and less technically difficult path, which is quite foolish when here I am struggling with the technicalities of a Japanese extraction regardless.
Sometimes it can be very tricky indeed to tell whether the path that looks easy will actually end up being any faster than the path that looks difficult. That's a pain I'm sure most of us here can sympathize with ;).

Ok, the concepts of 3 table files and switches are making my head hurt.
Alright, how about an example? If you talk to the old man standing by the pool in the lower left corner of Aliahan castle at the start of a new game, he says:
Quote
とうぞくバコタの つくった
カギは かんたんなドアを
すべて あけたそうじゃ

That text starts partway through the byte at 0x2D3D2:

      E7 F8 BC DD 01 8A F1 8D 91 1E E3 FA F0 3D
C4 F1 9D 85 B4 FB 54 F3 73 00 F2 CE 8C DF D3 D2
D8 02 0F FA 5E 7E

Which in binary is:

                  11100111 11111000 10111100 11011101 00000001 10001010 11110001 10001101 10010001 00011110 11100011 11111010 11110000 00111101
11000100 11110001 10011101 10000101 10110100 11111011 01010100 11110011 01110011 00000000 11110010 11001110 10001100 11011111 11010011 11010010
11011000 00000010 00001111 11111010 01011110 01111110


More specifically, the old man's text starts at the second-last bit of 0x2D3D2, so if we ignore the first 6 bits (which are the end token for the previous string), we get

                        11 11111000 10111100 11011101 00000001 10001010 11110001 10001101 10010001 00011110 11100011 11111010 11110000 00111101
11000100 11110001 10011101 10000101 10110100 11111011 01010100 11110011 01110011 00000000 11110010 11001110 10001100 11011111 11010011 11010010
11011000 00000010 00001111 11111010 01011110 01111110


DQ3 uses 6-bit tokens instead of 8-bit, so considering that string of bits in groups of 6 gives us:

111111 100010 111100 110111 010000 000110 001010 111100 011000 110110 010001 000111 101110 001111 111010 111100
000011 110111 000100 111100 011001 110110 000101 101101 001111 101101 010100 111100 110111 001100 000000 111100
101100 111010 001100 110111 111101 001111 010010 110110 000000 001000 001111 111110 100101 111001 111110

which tokenizes as:

Table File   Token   Text/Effect
(hiragana)   111111   [switch to dictionary $3F for 1 token]
(dict_$3F)   100010   とう[add dakuten to next token]そく
(hiragana)   111100   [switch to katakana]
(katakana)   110111   [add dakuten to next token]
(katakana)   010000   ハ
(katakana)   000110   コ
(katakana)   001010   タ
(katakana)   111100   [switch to hiragana]
(hiragana)   011000   の
(hiragana)   110110   [space]
(hiragana)   010001   つ
(hiragana)   000111   く
(hiragana)   101110   っ
(hiragana)   001111   た
(hiragana)   111010   [line]
(hiragana)   111100   [switch to katakana]
(katakana)   000011   カ
(katakana)   110111   [add dakuten to next token]
(katakana)   000100   キ
(katakana)   111100   [switch to hiragana]
(hiragana)   011001   は
(hiragana)   110110   [space]
(hiragana)   000101   か
(hiragana)   101101   ん
(hiragana)   001111   た
(hiragana)   101101   ん
(hiragana)   010100   な
(hiragana)   111100   [switch to katakana]
(katakana)   110111   [add dakuten to next token]
(katakana)   001100   ト
(katakana)   000000   ア
(katakana)   111100   [switch to hiragana]
(hiragana)   101100   を
(hiragana)   111010   [line]
(hiragana)   001100   す
(hiragana)   110111   [add dakuten to next token]
(hiragana)   111101   [switch to dictionary $3D for 1 token]
(dict_$3D)   001111   へ
(hiragana)   010010   て
(hiragana)   110110   [space]
(hiragana)   000000   あ
(hiragana)   001000   け
(hiragana)   001111   た
(hiragana)   111110   [switch to dictionary $3E for 1 token]
(dict_$3E)   100101   そう[add dakuten to next token]しゃ
(hiragana)   111001   [end]

(those extra 6 bits at the end are the start of the next string)
So, every time the game reads a $3C (%111100), it toggles between hiragana and katakana and stays in the new table until the next time it reads a switch token; when it reads a $3D, $3E, or $3F (%111101/%111110/%111111), it switches to the corresponding dictionary for 1 token (or at least that's the high level effect; the actual ASM divides the bits up differently and treats the dictionary as 6 parts with 32 entries each rather than 3 parts with 64 entries each).
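As a rough illustration of the walkthrough above, here is a minimal Python sketch that regroups the old man's bytes into 6-bit tokens and tracks which table each token is read from. It is not the game's routine (which, as noted, slices the bits differently). It assumes the switch tokens behave the same in both kana tables, that a dictionary switch falls back to the previous table after one token, and that %111001 ends the string only when read from the hiragana table. The names six_bit_tokens and walk are made up for this example.

```python
# Bytes starting at 0x2D3D2, copied from the hex dump above.
HEX_DUMP = """
E7 F8 BC DD 01 8A F1 8D 91 1E E3 FA F0 3D
C4 F1 9D 85 B4 FB 54 F3 73 00 F2 CE 8C DF D3 D2
D8 02 0F FA 5E 7E
"""

KANA_SWITCH = 0x3C                  # %111100
DICT_SWITCHES = (0x3D, 0x3E, 0x3F)  # %111101 / %111110 / %111111
END_TOKEN = 0x39                    # %111001, seen as [end] in the hiragana table

def six_bit_tokens(data, skip_bits=0):
    """Split bytes into 6-bit tokens after skipping skip_bits leading bits."""
    bits = "".join(f"{b:08b}" for b in data)[skip_bits:]
    usable = len(bits) - len(bits) % 6
    return [int(bits[i:i + 6], 2) for i in range(0, usable, 6)]

def walk(tokens):
    """Return (table name, token) pairs up to and including the end token."""
    table = "hiragana"
    pending_dict = None
    rows = []
    for tok in tokens:
        if pending_dict is not None:
            # The single token after a dictionary switch is a dictionary index.
            rows.append((pending_dict, tok))
            pending_dict = None
            continue
        rows.append((table, tok))
        if tok == KANA_SWITCH:
            table = "katakana" if table == "hiragana" else "hiragana"
        elif tok in DICT_SWITCHES:
            pending_dict = f"dict_${tok:02X}"
        elif tok == END_TOKEN and table == "hiragana":
            break
    return rows

data = bytes.fromhex("".join(HEX_DUMP.split()))
tokens = six_bit_tokens(data, skip_bits=6)  # first 6 bits end the previous string
rows = walk(tokens)

for table, tok in rows:
    print(f"({table})\t{tok:06b}")
```

Mapping each token to an actual character still needs the kana tables and dictionary contents, which this sketch leaves out.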

In the sample DQ4 table files, DQ4's table switches happen on the
Quote
!%111100=,<@katakana>:%111100
!%1111=,<@dictionary>:1
lines. You'll notice that I split the 12 bits for the dictionary switch + dictionary entry into 4 bits for switching and 8 bits (1 byte) for the entry; keeping them as 6 bits each will give you the exact same effect, so which form you prefer is entirely up to you.
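A small worked example of why the two splits are equivalent, using the $3D/$3E/$3F dictionary switch values described for DQ3 earlier in this thread (assuming the same arithmetic applies to the DQ4 table files). The function name is made up for illustration.

```python
def split_4_8(switch, entry):
    """Regroup a 6-bit switch + 6-bit entry as a 4-bit prefix + 8-bit byte."""
    word = (switch << 6) | entry   # the same 12 bits, just packed together
    return word >> 8, word & 0xFF

# Collect the byte values each dictionary switch can produce in the 4+8 view.
byte_ranges = {}
for switch in (0x3D, 0x3E, 0x3F):
    produced = [split_4_8(switch, entry) for entry in range(64)]
    assert all(prefix == 0b1111 for prefix, _ in produced)
    values = [byte for _, byte in produced]
    byte_ranges[switch] = (min(values), max(values))

example = split_4_8(0x3D, 0b001111)  # the "dict_$3D 001111" token pair above
```

In the 4+8 view, the low two bits of the switch token become the high two bits of the entry byte, so the three dictionaries land in three separate 64-value ranges of one byte.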

This is another very confusing element here. In fact, I'm so confused I can hardly even articulate an appropriate question. I did see in the table files that the characters with diacritics used 8 bits instead of 6. Wouldn't they just be pairs of bytes at this point and could be saved in the table file as such? And therefore wouldn't they essentially be uncompressed and I could rely on the byte information showing in a PPU viewer?
DQ3's dictionary entries are a series of 8-bit values, so they get to use the full 8-bit range, but the hiragana and katakana entries are only 6-bit values and they get translated into 8-bit values via a pair of lookup tables, one for hiragana (e.g. hiragana %000000 translates to $0B, which is あ in the PPU viewer) and one for katakana (e.g. katakana %000000 translates to $3D, which is ア in the PPU viewer). You should be able to read the dictionary entries and 6-to-8-bit lookup tables with a byte-based table file, but the script itself is still encoded as a series of 6-bit tokens, so you won't be able to see that as easily.
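A minimal Python sketch of the 6-bit to 8-bit lookup, for illustration only. It assumes the 0x3BB8A and 0x3BBC6 addresses are plain file offsets and that each table holds 60 entries (inferred from the gap between the two addresses, which also matches tokens $00 - $3B being the non-switch values). The stand-in ROM below is blank apart from the two example values, so nothing here is real game data.

```python
HIRAGANA_LUT = 0x3BB8A
KATAKANA_LUT = 0x3BBC6
LUT_SIZE = KATAKANA_LUT - HIRAGANA_LUT   # 0x3C entries, an inference

def tile_for(rom, table, token):
    """Translate a 6-bit kana token into its 8-bit tile value."""
    if not 0 <= token < LUT_SIZE:
        raise ValueError("switch tokens have no lookup table entry")
    base = HIRAGANA_LUT if table == "hiragana" else KATAKANA_LUT
    return rom[base + token]

# Stand-in ROM containing only the two example values from the post.
fake_rom = bytearray(0x40000)
fake_rom[HIRAGANA_LUT] = 0x0B   # hiragana %000000 -> $0B (あ)
fake_rom[KATAKANA_LUT] = 0x3D   # katakana %000000 -> $3D (ア)

hira_tile = tile_for(fake_rom, "hiragana", 0b000000)
kata_tile = tile_for(fake_rom, "katakana", 0b000000)
```

With a real ROM, the bytes would come from reading the file in binary mode instead of fake_rom.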

Choppasmith

  • Full Member
  • ***
  • Posts: 131
Re: Dragon Warrior 1, 2 & 3 Hacking Discussion
« Reply #125 on: October 13, 2019, 01:40:17 am »
Assuming Choppasmith is eventually going to want to insert a script that's somewhere around 100% larger than the original, I also looked at the code that sets up the bank and pointer for a given string ID, and rewrote it to make using extra banks easy; in combination with the DTE compression, using 3 of the existing empty ROM banks should be enough space to hold double the original text.

Thank you for this! From earlier posts I was looking at the amount of space going, with the first two games so far, I'll probably need DTE at least.

For control codes, here's what I had (I think the A* ones were not used in the main script, but you can confirm that):

I had heard about the mostly gender neutral, but infamous "boy" dialog in the Japanese version as well. This is great for me because the mobile script just gives the female hero a separate string, while there's some like below where they ADD dialog for Female Hero, there's a lot of duplicate lines.

Code: [Select]
My father was always telling me stories about the mighty hero Ortega.<10
And now here I am working in my dad's place, and there you are adventuring in yours.<10
It's true what they say, isn't it? Like father, like son!<41<63 Well, daughter in your case!<64<0F

I think it was something the Japanese script did to help correct said dialog problem in the original.

Also I just saw as I'm typing this there ARE cases where the gender is adjusted for strings. Weird

Code: [Select]
We expected no less of you, <41son<63daughter<64 of Ortega! We have witnessed the birth of a true hero!
So I finally translated the biggest RAM mapping work I have done for these games, the DQ3 GBC version. I have been stocking this here for more than half a decade, always forget to post this massive stuff but when I saw this topic, flashes happened! I was going to do the NES and SNES versions too, but time is short plus it shares many traits with them.

The work is now available at Data Crystal, enjoy! Off: it is about time to update those templates!


Hey thanks for this! I still plan on tackling the GBC remakes down the line. So, the more info on III, the better!



So abw, sorry to bring up DW2, I really hope this is the last thing. But I allllmost got the code right for changing battle dialog to:

1: Ignore counting the last few bosses
2: Replace the multiple enemy counts with "Some" and "A/An"
3: Change the extra groups that appear to "And/AND some [monster]"

here's my code

Code: [Select]
norom ; stop Asar from trying to apply SNES memory mapping to this NES code
org $00BF00 ; set the ROM file insertion point
base $BEF0 ; set the starting RAM address

LDA $0161 ; monster ID for the current group
CMP #$4E ; bosses have IDs >= #$4E (so does the "Enemies" monster, but that's not a monster ID you can encounter)
BCS no_change
LDY $8F    ; number of monsters in group
DEY        ; count from 0 instead of 1
BEQ one     ; 0 => only one monster => handle "A" vs "An"
LDX #$00    ; read index
LDY #$00    ; write index
loop:
LDA some,X ; Monster Counts text
STA $60F1,Y ; start of text variable buffer
INY       
INX       
CMP #$FA    ; [end-FA]
BNE loop  ; if not end token, keep copying
done:
SEC        ; SEC to trigger read of [end-FA]-terminated string from $60F1, CLC to use A
RTS       
some:
db $36,$18,$16,$0E,$FA ;"Some" not using a table here
one:
; at this point we know Y = 0
LDA #$24 ; "A"
STA $60F1,Y ; start of text variable buffer
LDA $6119 ; first letter of monster name
CMP #$24 ; "A"
BEQ an
CMP #$28 ; "E"
BEQ an
CMP #$2C ; "I"
BEQ an
CMP #$32 ; "O"
BEQ an
CMP #$38 ; "U"
BNE no_change
an:
LDA #$17 ; "n"
INY
STA $60F1,Y ; start of text variable buffer
no_change:
LDA #$FA ; [end-FA]
STA $60F1,Y ; start of text variable buffer
BNE done
no_cardinal:
LDA #$FA ; [end-FA] the game will handle trimming the empty space from the
STA $60F1
BNE done

The thing that has me stumped is the A/an part. In battle, if I get a single enemy whose name starts with a consonant, I get, for example, "_ Dracky", where the underscore is an extra space where the A should go. Yet getting a single Iron Ant gives me "A Iron Ant".

(to my credit, I was getting nothing but garbage before, so this has come a long way before this post)