Romhacking.net

Romhacking => ROM Hacking Discussion => Topic started by: Majin3 on May 19, 2011, 11:12:51 am

Title: Text hacking Cross of Venus (NDS)
Post by: Majin3 on May 19, 2011, 11:12:51 am
Hello,
I hope this is the right section.

I want to try translating the Nintendo DS game "Dengeki Gakuen RPG: Cross of Venus Special".
The problem is, I have no clue where the text is... I tried memory hacking, which worked perfectly. The text is plain Shift-JIS (SJIS), but the game files are compressed/encrypted/whatever, so I don't even know which file the text is stored in.
The game files have a .pack extension. THIS (http://s1.directupload.net/images/110518/8tj8jo76.png) is what the header looks like and THIS (http://s7.directupload.net/images/110518/vr8ykxbi.png) is what the content looks like.
Does anyone have any clues about this pack format?

Thanks in advance.
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Ryusui on May 19, 2011, 03:25:01 pm
Ah. Archive files. You'll need some programming skills to work with these, but it should be trivial.

The first thing you'll need to work out is where the "start" and "end" (or "stored size") data in each header is stored. Once you know that, you can dismantle the archive into its constituent files. Then we can make a reasoned guess as to its compression format.

But if you want to jump the gun in that regard, odds are it's the DS's built-in LZ compression, which is well-documented, or a close cousin. Keep an eye out for permutations in the header entries: one of the bytes is likely a compression flag, i.e. not all the files stored in an archive are necessarily compressed.
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Majin3 on May 19, 2011, 06:06:13 pm
Thanks for your response. I'm lacking programming skills though...

This is what I've found out so far about the header:
XX 00 [file name] 00 00 02 00 00 00 00 YY YY 00 ZZ ZZ 00 00
XX is the file name length, YY YY is the stored size and ZZ ZZ seems to be some sort of hash (is there such a thing as a 2-byte hash?)
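
A minimal Python sketch of reading one entry with the layout posted above. Everything beyond that layout is an assumption: the helper name `parse_entry`, the little-endian byte order of YY YY and ZZ ZZ, and the idea that the fields are exactly two bytes wide (they could just as well be parts of wider fields).

```python
def parse_entry(data: bytes, offset: int = 0) -> dict:
    """Parse one header entry laid out as
    XX 00 [file name] 00 00 02 00 00 00 00 YY YY 00 ZZ ZZ 00 00.
    Hypothetical helper; field meanings follow the post above."""
    name_len = data[offset]                 # XX: file name length
    name_start = offset + 2                 # skip XX and the 00 after it
    name_end = name_start + name_len
    name = data[name_start:name_end].decode("ascii")

    tail = data[name_end:name_end + 14]     # the 14 bytes after the name
    if len(tail) < 14:
        raise ValueError("truncated header entry")

    return {
        "name": name,
        "flag": tail[2],                                      # the 02 byte
        "stored_size": int.from_bytes(tail[7:9], "little"),   # YY YY (byte order assumed)
        "hash": int.from_bytes(tail[10:12], "little"),        # ZZ ZZ (byte order assumed)
        "next_offset": name_end + 14,                         # where the next entry would start
    }


# Synthetic entry for illustration only; the size and hash values are made up.
entry = bytes([12, 0]) + b"bunsyou1.bin" + bytes.fromhex("00 00 02 00 00 00 00 34 12 00 CD AB 00 00")
info = parse_entry(entry)
```

Walking `next_offset` through the header would list every file, provided the entries really are packed back to back.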

But I still have no clue about what compression it is... I've tried to run DSDecmp on an extracted file but it returned "no matching compression method found".
(http://s7.directupload.net/images/110519/temp/ovyni7h3.png) (http://s7.directupload.net/file/d/2529/ovyni7h3_png.htm)

The first byte of nearly every file is 8F (I found one file with 8D, though), and the next 4 seem to identify the file format (.bin files have 4D 4F 4A 49, as seen in the screenshot). Any clues?
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Ryusui on May 19, 2011, 07:06:27 pm
MOJI is 文字, or "letters". A lot of the file names are in romanized Japanese: "bunsyou" is 文章 ("bunshou"), meaning "text". "Hissatu" is 必殺 ("hissatsu"), literally "certain death" but usually used to denote a special or super attack. "Syujinkou" is 主人公 ("shujinkou"), or "protagonist".

Anyway, I'm looking at the data, and it certainly doesn't look like LZ-compressed SJIS. Which file is this?
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Majin3 on May 20, 2011, 08:19:40 am
I already suspected it was 文字, but then I thought it could be a coincidence: all .bin files have MOJI, but .iba files, for example, mostly have 03 03 0F 01, which makes no sense.
But you're most likely right, those 4 bytes do sometimes vary even between the same file extension, so I guess they're simply the compressed beginning of the file.

It wouldn't surprise me if it's not SJIS. There are 640 .pack files and each of them contains a lot of files as well. That file was the first one in System.pack, bunsyou1.bin (picked as a test, since without decompressing the files I can't find out where the text is...).
Maybe this one is more helpful since you can recognize some words: (System.pack: script.ifb)
(http://s7.directupload.net/images/110520/temp/hk4ay8km.png) (http://s7.directupload.net/file/d/2530/hk4ay8km_png.htm)
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Ryusui on May 20, 2011, 04:50:11 pm
It might help if you tried opening the file using one of the Japanese tables we have on hand - here's Shift-JIS (http://www.romhacking.net/docs/442/), and here's EUC (http://www.romhacking.net/docs/178/).

The data starting at $8D2 looks promising:

45 FF 46 46 45 43 54 5F 54 41 A2 47

This comes out to:

E($FF)FFECT_TA($A2)G

And there, ladies and gentlemen, is your smoking gun.

You'll also see it earlier at $314:

FF 5A 59 55 55 4E 49 4E 4E 7A

Which is:

($FF)ZYUUNINN($7A) ("Juunin", likely 住人 or "resident"; 獣人, "beastman", would be read "juujin")

See how "FF" precedes strings of precisely eight non-gibberish characters? It's compression flags.

Let's have a look at that 7A. In binary, that's 01111010. Each bit, read from the most significant bit down, tells the decompressor whether the corresponding item in the sequence that follows is plaintext or a compression code. In FF, the bits are all set (i.e. 1) and all eight items are plaintext, so we can assume that 1 indicates plaintext and 0 indicates a compression code.

After it comes this sequence:

B2 1E 41 52 49 41 F7 0F 45 BC 2E 05

Or:

($B2)($1E)ARIA($F7)($0F)E($BC)($2E)($05)

If every compression code is two bytes, then this pairs up nicely with our compression flags:

0 - ($B2)($1E)
1 - A
1 - R
1 - I
1 - A
0 - ($F7)($0F)
1 - E
0 - ($BC)($2E)

And what of that mysterious 05 at the end? That's the compression flags for the next block!
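
The pairing above can be sketched in Python. This assumes, as described, that flag bits are read from the most significant bit down, that 1 means a one-byte plaintext and 0 means a two-byte compression code; the function name `split_block` is made up for illustration.

```python
def split_block(flags: int, data: bytes):
    """Split the bytes that follow a flag byte into up to eight tokens.

    Returns (tokens, bytes_used). Each token is ("literal", b) or
    ("code", two_bytes). Bit order and bit meaning are assumptions
    taken from the discussion above.
    """
    tokens = []
    pos = 0
    for bit in range(7, -1, -1):            # most significant bit first
        if pos >= len(data):
            break                           # ran out of data mid-block
        if (flags >> bit) & 1:
            tokens.append(("literal", data[pos:pos + 1]))
            pos += 1
        else:
            tokens.append(("code", data[pos:pos + 2]))
            pos += 2
    return tokens, pos


data = bytes.fromhex("B2 1E 41 52 49 41 F7 0F 45 BC 2E 05")
tokens, used = split_block(0x7A, data)
for kind, raw in tokens:
    print(kind, raw.hex(" ").upper())
```

The block consumes 11 bytes, so the byte left over at `data[used]` is the 05, i.e. the flag byte of the next block.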

Now. Without seeing what all this decompresses to, I really can't help you much further than this, but each of those compression codes comprises a "length/distance" pair - that is, they encode one value that tells the decompressor how far to look back for the next bit of data, and another that tells it how many bytes from that point to copy. Hopefully you can puzzle it out from here.
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Majin3 on May 20, 2011, 06:22:19 pm
Wow, nice reverse engineering. I think I got the main idea, more or less.
So this is a completely new compression? If there is no (de)compressor, I guess I'm out of luck.

Using SJIS I do get some results like 有力者 but looking through the memory seems to be a better solution:
(http://s1.directupload.net/images/110520/temp/uwopqydp.png) (http://s1.directupload.net/file/d/2530/uwopqydp_png.htm)
The file was twice in memory: compressed and decompressed. I hope this helps.

One hypothetical thought, would it work to fetch the data from memory, change it, put FF every 8 bytes (and 05 every block?), place it back into the .pack files and adjust the stored size and the hash (however it works)? It'd be subpar, but at least I could skip compressing & decompressing.
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Ryusui on May 20, 2011, 08:08:49 pm
If you could get the game to do all the decompression work for you, yeah, that might work, but it'll probably take less time and effort overall to simply code together a decompressor. The first step is, as you've described, to get a memory dump of the decompressed output so you can compare the compressed and uncompressed files; then, it should be trivial to figure out the exact details of the compression codes.

The good news is that the game might flag which files are compressed/decompressed. You should check to see if every file entry has a "02" in that spot - it might be a flag indicating which files are compressed and how.

Also, I think you misunderstood me - when I said "05 is the compression flags for the next block", I meant the next sequence of eight "codes". I say "codes", not "bytes", because each "code" might be a plaintext byte or a two-byte compression code. 05 is 00000101 in binary, so the next sequence (without looking at it myself) will look like this:

0 - Two-Byte Compression Code
0 - Two-Byte Compression Code
0 - Two-Byte Compression Code
0 - Two-Byte Compression Code
0 - Two-Byte Compression Code
1 - One-Byte Plaintext
0 - Two-Byte Compression Code
1 - One-Byte Plaintext
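
Since each flag bit stands for either one or two bytes, the flag byte also tells you how many bytes the block occupies. A small Python sketch, assuming a full block of eight codes and the bit meaning described above (the names `flag_bits` and `block_size` are made up here):

```python
def flag_bits(flags: int) -> list:
    """Flag byte as a list of eight bits, most significant first."""
    return [(flags >> bit) & 1 for bit in range(7, -1, -1)]


def block_size(flags: int) -> int:
    """Bytes following the flag byte, assuming all eight codes are present:
    one byte per plaintext (1 bit), two bytes per compression code (0 bit)."""
    literals = sum(flag_bits(flags))
    return literals + 2 * (8 - literals)


print(flag_bits(0x05))
print(block_size(0x05), block_size(0x7A), block_size(0xFF))
```

So a block flagged 05 would span 14 bytes, the 7A block spans 11, and an all-plaintext FF block spans 8.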
Title: Re: Text hacking Cross of Venus (NDS)
Post by: jjjewel on May 20, 2011, 08:58:07 pm
If it might help, the conversation when the game starts is in
Chapter_Ep01.pack.

The compression is probably as Ryusui explained.
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Ryusui on May 20, 2011, 10:04:52 pm
There's some similar plaintext, but I don't think the compressed and decompressed data you've got side-by-side there match up - the first eight bytes after the initial "FF" are plaintext and therefore should appear in the decompressed output, but the last two bytes in the sequence (26 77) are missing.

However, this still tells us something important. They may not match up, but we can still tell enough from the decompressed sample that we can puzzle out the compression code.

At address $25, we see "get($FF)OnceInst($BA)a($F8)($1F)". In the decompressed output, we can see this should be "getOnceInstance". Where did the "nce" go, then? The question we should ask is "where did it come from": the most likely place is the last three letters of "Once".

LZ uses compression codes to determine where in the decompressed output to look for the next snippet of data and how many bytes to copy from there. Since we know where it's looking and how many bytes it had to copy, we can figure out the compression code format.

The next compression flag is $BA, or 10111010; we know that the "a" is plaintext (giving us "getOnceInsta"), and the 0 tells us that what comes next is a compression code: F8 1F, or likely 1F F8 (if these two-byte codes are being read in little-endian format). 1F F8 is 0001111111111000  - quite a whopper (and a palindrome to boot). We know that the sequence is eight (1000) bytes back and three (11) bytes long...it certainly seems to match with what we know, but it's not enough information to go on. We only know what six out of those sixteen bits are likely to mean, and we're bound to run into trouble if we go on partial information.

This appears to be where any useful similarities between the compressed and uncompressed data end, so you'll have to dig up a different comparison if we're going to make any headway on this.
Title: Re: Text hacking Cross of Venus (NDS)
Post by: jjjewel on May 20, 2011, 11:39:29 pm

In this case of F8 1F

F8 tells you how many bytes (in the decompressed output) you have to read back
1F tells you how many bytes you will copy to the output

For F8, subtract it from FF and add 1. So FF - F8 + 1 = 8. You read back 8 bytes from wherever your last byte of output is.
1F tells you to copy 3 bytes from where you read back. (0F = 2 bytes, 1F = 3 bytes, 2F = 4 bytes, and so on.)
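
The byte-wise rule above, written out as a Python sketch. It only covers the cases listed (second byte 0F, 1F, 2F, ...), so it assumes the low nibble of the second byte is always F, which limits it to short distances; the function name is made up.

```python
def read_code_bytewise(first: int, second: int):
    """Apply the rule above to one two-byte code.

    distance = FF - first + 1
    length   = high nibble of second byte + 2  (0F = 2, 1F = 3, 2F = 4, ...)
    Only valid while the low nibble of the second byte is F (an assumption
    based on the examples given).
    """
    if second & 0x0F != 0x0F:
        raise ValueError("rule only covers second bytes ending in F")
    distance = 0xFF - first + 1
    length = (second >> 4) + 2
    return distance, length


print(read_code_bytewise(0xF8, 0x1F))
```

For F8 1F this gives a distance of 8 and a length of 3, matching the "nce" copy.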


This is what I tried with Chapter_Ep01.pack. (I did it manually so some bytes might be a bit off.)

(http://img51.imageshack.us/img51/2508/555sp.th.png) (http://imageshack.us/photo/my-images/51/555sp.png/)
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Ryusui on May 20, 2011, 11:52:38 pm
Brilliant. Still not perfect, mind, but I think it's safe to say you've cracked the case.

The value is little-endian: 1FF8. The first nibble (four bits) indicates how many bytes to copy; the rest is a signed twelve-bit value. I.e., FF8 is actually "-8", as in "subtract 8 from the current output address to get the source address". As for why the length value doesn't match up with the actual number of bytes to copy, that's common practice: since the shortest possible match is two bytes (though really, the shortest useful match is three), the length value stored in the compression code usually gets an offset added to it. So you take the 1, add 2 to it, and get 3.

So here's how to read a two-byte compression code!

Step 1. Take the two bytes and swap them around to get the little-endian value.
Step 2. Take the first four bits and add 2. That's the length value.
Step 3. The remaining 12 bits are the signed distance value. Subtract it from $FFF to get the positive equivalent.

Or, for those of a more technical mindset:

Compression Code (Little-Endian)
XXXX YYYY YYYY YYYY
X = Length (add 2 to get actual value)
Y = Distance (signed)

Congratulations! That's all you need to know to write up a decompression program!

Here are a couple more examples:

($B2)($1E)ARIA($F7)($0F)E($BC)($2E)($05)

So the compression codes here are 1EB2, 0FF7, and 2EBC.

1EB2
XYYY

1 + 2 = 3 (length)
EB2 = -14D (distance)

0FF7
XYYY

0 + 2 = 2 (length)
FF7 = -9 (distance)

2EBC
XYYY
2 + 2 = 4 (length)
EBC = -143 (distance)

I hope this explains it well enough!
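
The three steps can be sketched in Python as below. This reads the 12-bit field as a two's-complement negative number, so FF8 is -8 and FF7 is -9; under that reading EB2 and EBC come out as $14E and $144, one more than the figures quoted above, which fits the off-by-one that gets corrected later in the thread. The function name and the handling of a zero field are assumptions.

```python
def decode_code(first: int, second: int):
    """Decode a two-byte compression code as it appears in the file.

    Returns (length, distance), with distance as a positive number of
    bytes to look back in the decompressed output.
    """
    value = first | (second << 8)        # step 1: little-endian 16-bit value
    length = (value >> 12) + 2           # step 2: top four bits, plus 2
    raw = value & 0x0FFF                 # step 3: low twelve bits
    distance = 0x1000 - raw              # two's complement: FF8 -> 8
    # Assumption: raw == 0 is treated as distance $1000, not 0.
    return length, distance


for pair in [(0xF8, 0x1F), (0xB2, 0x1E), (0xF7, 0x0F), (0xBC, 0x2E)]:
    length, distance = decode_code(*pair)
    print(f"{pair[1]:02X}{pair[0]:02X}: length {length}, distance ${distance:X}")
```

Which reading is right for the longer distances can only be confirmed by checking against a memory dump of the decompressed file.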
Title: Re: Text hacking Cross of Venus (NDS)
Post by: jjjewel on May 21, 2011, 12:15:02 am
^
^
Wow. I never knew the significance of that two-byte indicator.
There are a few NDS games that use similar compression, but not exactly the same,
and I've been trying to hack them. Now I'll give them another try. :D

Thank you so much. I'm glad I dropped by this thread.
(The data format just looked familiar so I gave it a try.
It's different from the games I'm hacking but the concept is very similar.)
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Ryusui on May 21, 2011, 12:51:06 am
You get familiar with LZ compression when you've played around with GBA and DS for a while.
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Majin3 on May 21, 2011, 11:02:43 am
Thanks, guys. I didn't understand it completely, though...
The first example (1FF8): Why is it -8? Isn't FFF-FF8=7? And by looking at the compressed file, the distance to "nce" is even 9.
The third example (0FF7): Once again, isn't it -8? But -9 is right here.
And as for the other 2, I don't get where they should be pointing at all. The second one points to B6 0F 8F (should be 00 4B 02) and the fourth one to 75 B7 1F 08 (should be 00 00 00 00).
But I think I got the rest now at least.

jjjewel, are you planning to write a (de)compressor for those "NDS games that use similar compression"?
Or I guess I could ask the writer of DSDecmp or something to implement this one...
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Ryusui on May 21, 2011, 03:30:18 pm
*headdesks* Yeah, it looks like I fudged my math. Add 1 to get the actual distance.

Also, you misunderstand. It looks in the uncompressed data for the snippets, not the compressed. "nce" will be eight characters back from the output address at that time.

Let's look at that "getOnceInstance" sequence again.

get($FF)OnceInst($BA)a($F8)($1F)

When it hits the compression code, the output looks like this:

getOnceInsta

And "length 3, distance 8" tells it that the next snippet is:

getOnceInstance

Get it now?
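
The same step as a short Python sketch: the copy reads from the decompressed output built so far, counting back from its end. The helper name `copy_backref` is made up for illustration.

```python
def copy_backref(output: bytearray, distance: int, length: int) -> None:
    """Append `length` bytes to `output`, copied from `distance` bytes
    before its current end. Bytes are copied one at a time so the source
    may overlap the bytes being written."""
    start = len(output) - distance
    if distance <= 0 or start < 0:
        raise ValueError("distance points outside the output so far")
    for i in range(length):
        output.append(output[start + i])


out = bytearray(b"getOnceInsta")      # output when the code F8 1F is reached
copy_backref(out, distance=8, length=3)
print(out.decode("ascii"))            # prints getOnceInstance
```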
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Majin3 on May 21, 2011, 05:24:43 pm
Hmm, not yet.
Shouldn't I subtract 1 to get the actual value?
Yes, "nce" is 8 characters back (uncompressed data) but FFF-FF8 is 7, isn't it?

And it looks in the uncompressed data? In the third example (0FF7) that would be 00 4B instead of "NN" which is back at 13.
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Ryusui on May 21, 2011, 05:36:20 pm
Subtract, add, whichever. You understand the math better than I can explain it.

Also, I don't see your problem. The compression code in the middle of "ARIANNE" points right back to the "NN" of "ZYUUNINN".

At that point, the uncompressed data should look something like this:

ZYUUNINN($??)($??)($??)ARIA

I know that the compression code in the middle indicates a three-byte sequence, but I don't know what the actual bytes are, so they're marked in with question marks.

"Length 2, Distance 9" therefore gives us...

ZYUUNINN($??)($??)($??)ARIANNE
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Majin3 on May 21, 2011, 06:12:47 pm
Oh, I forgot I didn't upload any screenshot of that part uncompressed.
Here it is:
(http://s7.directupload.net/images/110522/temp/nm6iqpe2.png) (http://s7.directupload.net/file/d/2532/nm6iqpe2_png.htm)
As you can see the distance is 13.

And the distance from getOnceInsta to nce is also 8, while FFF-FF8 gets us 7.
Either there is something wrong there or I didn't understand it yet.
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Ryusui on May 21, 2011, 06:23:43 pm
Like I said: you have it right. Take the negative number and add (or subtract, whichever) 1. It makes sense there'd be an offset value, since a distance of 0 means it would be looking at empty space.

Note that a distance of 1 and an arbitrary length can be a valuable space-saving trick: the end result is similar to RLE, where compression codes are used to indicate long sequences of repeating bytes.
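
Putting the pieces from this thread together, a decompressor might look like the Python sketch below. It is not a tested tool for the game's files: it assumes the stream starts directly with a flag byte (so any per-file header, such as the leading 8F byte mentioned earlier, would have to be skipped first), that the stream simply ends when the data runs out, and that the distance is the two's-complement reading of the 12-bit field.

```python
def decompress(stream: bytes) -> bytes:
    """Decompress a flag-byte LZ stream as described in this thread.

    Flag bits are read most significant first: 1 = plaintext byte,
    0 = two-byte code (little-endian; top 4 bits = length - 2,
    low 12 bits = negative distance).
    """
    out = bytearray()
    pos = 0
    while pos < len(stream):
        flags = stream[pos]
        pos += 1
        for bit in range(7, -1, -1):
            if pos >= len(stream):
                break                            # assumed end of data
            if (flags >> bit) & 1:
                out.append(stream[pos])          # plaintext byte
                pos += 1
            else:
                if pos + 1 >= len(stream):
                    raise ValueError("truncated compression code")
                value = stream[pos] | (stream[pos + 1] << 8)
                pos += 2
                length = (value >> 12) + 2
                distance = 0x1000 - (value & 0x0FFF)
                start = len(out) - distance
                if start < 0:
                    raise ValueError("code points before start of output")
                for i in range(length):          # byte by byte, so overlap works
                    out.append(out[start + i])
    return bytes(out)


# Distance 1, length 6: one literal "A" followed by a code that repeats it.
rle = bytes([0x80, 0x41, 0xFF, 0x4F])
print(decompress(rle))
```

The last example shows the RLE-like trick: because the copy runs byte by byte, a distance of 1 keeps re-reading the byte that was just written.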
Title: Re: Text hacking Cross of Venus (NDS)
Post by: jjjewel on May 21, 2011, 09:49:42 pm
I've been working on a (de)compressor for those games, but I'm not getting very far. Recompression can be quite complicated. I may have to ask for help with the programs later too.

Anyway, if anyone is interested, the other NDS games I mentioned are Nana DS (graphic compression in .R00 archive), Duel Love DS (text and graphic compression in .ptd archive), and Oshare Princess DS (text compression in mes.bin). They all use different kinds of compression, but the concepts are pretty similar to what we're discussing here. ^_^
Title: Re: Text hacking Cross of Venus (NDS)
Post by: Ryusui on May 22, 2011, 02:56:30 am
Recompression isn't hard. It's just tedious.

Search the preceding data for any match between 2 and $11 (17 dec) bytes long that starts less than $1000 bytes back from the current position. If you find one, take the longest, calculate its compression code and output that; otherwise output the current byte as plaintext. Track the compression flag for each code, and once you've got a full byte of flags, output that. (Note that you'll need to do things out of order: the flag byte goes before its eight codes in the file, but you can only write it after you've calculated them.)
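
A brute-force Python sketch of that procedure, with a matching decompressor so the round trip can be checked. It is slow and makes assumptions the thread has not confirmed: the two's-complement distance, flag bits filled from the most significant bit, unused flag bits in the last block left as 0, and no file header or checksum handling.

```python
MIN_MATCH, MAX_MATCH, MAX_DISTANCE = 2, 0x11, 0xFFF


def find_match(data: bytes, pos: int):
    """Longest earlier match for data[pos:]; returns (length, distance)."""
    best_len, best_dist = 0, 0
    limit = min(MAX_MATCH, len(data) - pos)
    for start in range(max(0, pos - MAX_DISTANCE), pos):
        length = 0
        while length < limit and data[start + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_len, best_dist = length, pos - start
    return best_len, best_dist


def compress(data: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(data):
        flag_pos = len(out)
        out.append(0)                      # placeholder, filled in after the block
        flags = 0
        for bit in range(7, -1, -1):
            if pos >= len(data):
                break
            length, distance = find_match(data, pos)
            if length >= MIN_MATCH:
                value = ((length - 2) << 12) | (0x1000 - distance)
                out += value.to_bytes(2, "little")
                pos += length
            else:
                flags |= 1 << bit          # 1 = plaintext byte
                out.append(data[pos])
                pos += 1
        out[flag_pos] = flags
    return bytes(out)


def decompress(stream: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(stream):
        flags = stream[pos]
        pos += 1
        for bit in range(7, -1, -1):
            if pos >= len(stream):
                break
            if (flags >> bit) & 1:
                out.append(stream[pos])
                pos += 1
            else:
                value = stream[pos] | (stream[pos + 1] << 8)
                pos += 2
                start = len(out) - (0x1000 - (value & 0x0FFF))
                for i in range((value >> 12) + 2):
                    out.append(out[start + i])
    return bytes(out)


packed = compress(b"getOnceInstance")
print(packed.hex(" ").upper())
```

Under these assumptions, compressing "getOnceInstance" ends in F8 1F, the same code seen in the game's data. A hash table or suffix structure would speed up the match search, but the output format stays the same.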
Title: Re: Text hacking Cross of Venus (NDS)
Post by: henke37 on May 31, 2011, 06:34:56 am
It helps that the DS comes with built-in decompression routines in the BIOS.

Check if you are using the variant compression that has support for variable length backreferences. It has some flag bits used to indicate how long the backreference code itself is.