Topic: Text hacking Cross of Venus (NDS)

Majin3

Text hacking Cross of Venus (NDS)
« on: May 19, 2011, 11:12:51 am »
Hello,
I hope this is the right section.

I want to try translating the Nintendo DS game "Dengeki Gakuen RPG: Cross of Venus Special".
The problem is, I have no clue where the text is... I tried memory hacking, which worked perfectly. The text is simply Shift-JIS (SJIS), but the game files are compressed/encrypted/whatever, so I don't even know which file the text is stored in.
The game files have a .pack extension. THIS is what the header looks like and THIS is what the content looks like.
Does anyone have any clues about this pack format?

Thanks in advance.

Ryusui

Re: Text hacking Cross of Venus (NDS)
« Reply #1 on: May 19, 2011, 03:25:01 pm »
Ah. Archive files. You'll need some programming skills to work with these, but it should be trivial.

The first thing you'll need to work out is where the "start" and "end" (or "stored size") data in each header is stored. Once you know that, you can dismantle the archive into its constituent files. Then we can make a reasoned guess as to its compression format.

But if you want to jump the gun in that regard, odds are it's the DS's built-in LZ compression, which is well-documented, or a close cousin. Keep an eye out for permutations in the header entries: one of the bytes is likely a compression flag, i.e. not all the files stored in an archive are necessarily compressed.
In the event of a firestorm, the salad bar will remain open.

Majin3

Re: Text hacking Cross of Venus (NDS)
« Reply #2 on: May 19, 2011, 06:06:13 pm »
Thanks for your response. I'm lacking programming skills though...

This is what I've found out so far about the header:
XX 00 [file name] 00 00 02 00 00 00 00 YY YY 00 ZZ ZZ 00 00
XX is the file name length, YY YY is the stored size, and ZZ ZZ seems to be some sort of hash (is there such a thing as a two-byte hash?)
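
As a rough illustration of the layout above, here is a minimal Python sketch that reads one entry. It takes the byte pattern literally; the function name `parse_entry`, the little-endian reading of YY YY and ZZ ZZ, and the treatment of the fixed bytes as opaque padding are all assumptions, not something confirmed in this thread.

```python
import struct

def parse_entry(buf, pos=0):
    """Read one header entry following the pattern
    XX 00 [file name] 00 00 02 00 00 00 00 YY YY 00 ZZ ZZ 00 00.
    Field meanings beyond XX and YY YY are guesses."""
    name_len = buf[pos]                         # XX: file name length
    name_start = pos + 2                        # skip XX and the 00 after it
    name = buf[name_start:name_start + name_len].decode("ascii")
    p = name_start + name_len
    fixed = bytes(buf[p:p + 7])                 # 00 00 02 00 00 00 00 (meaning unknown)
    stored_size = struct.unpack_from("<H", buf, p + 7)[0]   # YY YY (assumed little-endian)
    unknown = struct.unpack_from("<H", buf, p + 10)[0]      # ZZ ZZ (purpose unknown)
    next_pos = p + 14                           # 7 + 2 + 1 + 2 + 2 bytes follow the name
    entry = {"name": name, "fixed": fixed,
             "stored_size": stored_size, "unknown": unknown}
    return entry, next_pos

# Made-up entry for illustration only (the size and ZZ ZZ values are invented).
sample = (bytes([12, 0]) + b"bunsyou1.bin" + bytes.fromhex("00000200000000")
          + bytes.fromhex("3412") + b"\x00" + bytes.fromhex("cdab") + b"\x00\x00")
entry, next_pos = parse_entry(sample)
```

If the real archives store the size in more than two bytes, or big-endian, the offsets here would need adjusting.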

But I still have no clue about what compression it is... I've tried to run DSDecmp on an extracted file but it returned "no matching compression method found".


The first byte of every file is mostly 8F (I found 1 file with 8D though) and the next 4 seem to be the file format (.bin has 4D 4F 4A 49 as seen in the screenshot). Any clues?

Ryusui

Re: Text hacking Cross of Venus (NDS)
« Reply #3 on: May 19, 2011, 07:06:27 pm »
MOJI is 文字, or "letters". A lot of the file names are in romanized Japanese: "bunsyou" is 文章 ("bunshou"), meaning "text". "Hissatu" is 必殺 ("hissatsu"), literally "certain death" but usually used to denote a special or super attack. "Syujinkou" is 主人公 ("shujinkou"), or "protagonist".

Anyway, I'm looking at the data, and it certainly doesn't look like LZ-compressed SJIS. Which file is this?

Majin3

Re: Text hacking Cross of Venus (NDS)
« Reply #4 on: May 20, 2011, 08:19:40 am »
I already suspected it was 文字, but then I thought it could be a coincidence: all .bin files have MOJI, but .iba, for example, mostly has 03 03 0F 01, which makes no sense.
But you're most likely right, those 4 bytes do sometimes vary even between the same file extension, so I guess they're simply the compressed beginning of the file.

It wouldn't surprise me if it's not SJIS. There are 640 .pack files, and each of them contains a lot of files as well. That file was the first one from System.pack: bunsyou1.bin (as a test, since without decompressing them I can't find out where the text is...)
Maybe this one is more helpful since you can recognize some words: (System.pack: script.ifb)
« Last Edit: May 20, 2011, 11:47:40 am by Majin3 »

Ryusui

Re: Text hacking Cross of Venus (NDS)
« Reply #5 on: May 20, 2011, 04:50:11 pm »
It might help if you tried opening the file using one of the Japanese tables we have on hand - here's Shift-JIS, and here's EUC.

The data starting at $8D2 looks promising:

45 FF 46 46 45 43 54 5F 54 41 A2 47

This comes out to:

E($FF)FFECT_TA($A2)G

And there, ladies and gentlemen, is your smoking gun.

You'll also see it earlier at $314:

FF 5A 59 55 55 4E 49 4E 4E 7A

Which is:

($FF)ZYUUNINN($7A) ("juunin"; 獣人, "beastman", would be read "juujin", so this is more likely 住人, "resident")

See how "FF" precedes strings of precisely eight non-gibberish characters? Those are compression flags.

Let's have a look at that 7A. In binary, that's 01111010. Each bit tells the decompressor whether the corresponding item in the following sequence should be treated as plaintext or as a compression code. In FF, the bits are all set (i.e. 1) and eight plaintext bytes follow, so we can assume that 1 indicates plaintext and 0 indicates a compression code.

After it comes this sequence:

B2 1E 41 52 49 41 F7 0F 45 BC 2E 05

Or:

($B2)($1E)ARIA($F7)($0F)E($BC)($2E)($05)

If every compression code is two bytes, then this pairs up nicely with our compression flags:

0 - ($B2)($1E)
1 - A
1 - R
1 - I
1 - A
0 - ($F7)($0F)
1 - E
0 - ($BC)($2E)

And what of that mysterious 05 at the end? That's the compression flags for the next block!
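
The pairing above can be sketched in a few lines of Python. The helper names `flag_bits` and `split_block` are made up for this illustration, and the sketch assumes (as the pairing suggests) that flag bits are read from the most significant bit down, with 1 meaning one plaintext byte and 0 meaning a two-byte code.

```python
def flag_bits(flag):
    """Return the eight bits of a flag byte, most significant bit first."""
    return [(flag >> shift) & 1 for shift in range(7, -1, -1)]

def split_block(flag, data):
    """Split the bytes that follow a flag byte into eight tokens.

    A 1 bit is taken as one plaintext byte, a 0 bit as a two-byte code.
    Returns the token list and the number of bytes consumed.
    """
    tokens, pos = [], 0
    for bit in flag_bits(flag):
        size = 1 if bit else 2
        tokens.append(data[pos:pos + size])
        pos += size
    return tokens, pos

# The bytes that follow the 7A flag in the dump quoted above.
block = bytes.fromhex("B2 1E 41 52 49 41 F7 0F 45 BC 2E 05")
tokens, used = split_block(0x7A, block)
for bit, token in zip(flag_bits(0x7A), tokens):
    print(bit, token.hex(" "))
```

After the eight tokens, `used` is 11, so the one byte left over (05) is where the next flag byte would sit.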

Now. Without seeing what all this decompresses to, I really can't help you much further than this, but each of those compression codes comprises a "length/distance" pair: one value that tells the decompressor how far to look back for the next piece of data, and another that tells it how many bytes to copy from that point. Hopefully you can puzzle it out from here.

Majin3

Re: Text hacking Cross of Venus (NDS)
« Reply #6 on: May 20, 2011, 06:22:19 pm »
Wow, nice reverse engineering. I think I got the main idea, more or less.
So this is a completely new compression? If there is no (de)compressor, I guess I'm out of luck.

Using SJIS I do get some results like 有力者 but looking through the memory seems to be a better solution:

The file was in memory twice: compressed and decompressed. I hope this helps.

One hypothetical thought: would it work to fetch the data from memory, change it, put FF every 8 bytes (and 05 every block?), place it back into the .pack files, and adjust the stored size and the hash (however it works)? It'd be subpar, but at least I could skip compressing & decompressing.

Ryusui

Re: Text hacking Cross of Venus (NDS)
« Reply #7 on: May 20, 2011, 08:08:49 pm »
If you could get the game to do all the decompression work for you, yeah, that might work, but it'll probably take less time and effort overall to simply code together a decompressor. The first step is, as you've described, to get a memory dump of the decompressed output so you can compare the compressed and uncompressed files; then, it should be trivial to figure out the exact details of the compression codes.

The good news is that the game might flag which files are compressed and which aren't. You should check to see if every file entry has a "02" in that spot - it might be a flag indicating which files are compressed and how.

Also, I think you misunderstood me - when I said "05 is the compression flags for the next block", I mean the next sequence of eight "codes". I say "codes", not "bytes", because each "code" might be a plaintext byte or a two-byte compression code. 05 is 00000101 in binary, so the next sequence (without looking at it myself) will look like this:

0 - Two-Byte Compression Code
0 - Two-Byte Compression Code
0 - Two-Byte Compression Code
0 - Two-Byte Compression Code
0 - Two-Byte Compression Code
1 - One-Byte Plaintext
0 - Two-Byte Compression Code
1 - One-Byte Plaintext

jjjewel

Re: Text hacking Cross of Venus (NDS)
« Reply #8 on: May 20, 2011, 08:58:07 pm »
If it might help, the conversation when the game starts is in
Chapter_Ep01.pack.

The compression is probably as Ryusui explained.

Ryusui

Re: Text hacking Cross of Venus (NDS)
« Reply #9 on: May 20, 2011, 10:04:52 pm »
There's some similar plaintext, but I don't think the compressed and decompressed data you've got side-by-side there match up - the first eight bytes after the initial "FF" are plaintext and therefore should appear in the decompressed output, but the last two bytes in the sequence (26 77) are missing.

However, this still tells us something important. They may not match up, but we can still tell enough from the decompressed sample that we can puzzle out the compression code.

At address $25, we see "get($FF)OnceInst($BA)a($F8)($1F)". In the decompressed output, we can see this should be "getOnceInstance". Where did the "nce" go, then? The question we should ask is "where did it come from": the most likely place is the last three letters of "Once".

LZ uses compression codes to determine where in the decompressed output to look for the next snippet of data and how many bytes to copy from there. Since we know where it's looking and how many bytes it had to copy, we can figure out the compression code format.

The next compression flag is $BA, or 10111010; we know that the "a" is plaintext (giving us "getOnceInsta"), and the 0 tells us that what comes next is a compression code: F8 1F, or likely 1F F8 (if these two-byte codes are being read in little-endian format). 1F F8 is 0001111111111000  - quite a whopper (and a palindrome to boot). We know that the sequence is eight (1000) bytes back and three (11) bytes long...it certainly seems to match with what we know, but it's not enough information to go on. We only know what six out of those sixteen bits are likely to mean, and we're bound to run into trouble if we go on partial information.

This appears to be where any useful similarities between the compressed and uncompressed data end, so you'll have to dig up a different comparison if we're going to make any headway on this.

jjjewel

Re: Text hacking Cross of Venus (NDS)
« Reply #10 on: May 20, 2011, 11:39:29 pm »

The next compression flag is $BA, or 10111010; we know that the "a" is plaintext (giving us "getOnceInsta"), and the 0 tells us that what comes next is a compression code: F8 1F, or likely 1F F8 (if these two-byte codes are being read in little-endian format). 1F F8 is 0001111111111000  - quite a whopper (and a palindrome to boot).

In the case of F8 1F:

F8 tells you how many bytes (in the decompressed output) you have to read back
1F tells you how many bytes you will copy to the output

For F8, subtract it from FF and add 1. So FF - F8 + 1 = 8. You read back 8 bytes from wherever your last byte of output is.
1F tells you to copy 3 bytes from where you read back. (0F = 2 bytes, 1F = 3 bytes, 2F = 4 bytes, and so on.)
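
That byte-by-byte reading can be written down as a tiny Python helper. The name `read_code_bytewise` is made up here, and the sketch only covers the cases listed above, where the second byte ends in F (so the distance fits in the first byte).

```python
def read_code_bytewise(first, second):
    """Read a two-byte code the way it is described above.

    Only meant for codes whose second byte ends in F (0F, 1F, 2F, ...).
    """
    distance = 0xFF - first + 1     # F8 -> read back 8 bytes
    length = (second >> 4) + 2      # 0F -> 2 bytes, 1F -> 3 bytes, 2F -> 4 bytes
    return distance, length

print(read_code_bytewise(0xF8, 0x1F))   # (8, 3)
```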


This is what I tried with Chapter_Ep01.pack. (I did it manually so some bytes might be a bit off.)


Ryusui

Re: Text hacking Cross of Venus (NDS)
« Reply #11 on: May 20, 2011, 11:52:38 pm »
Brilliant. Still not perfect, mind, but I think it's safe to say you've cracked the case.

The value is little-endian: 1FF8. The first nibble (four bits) indicates how many bytes to copy; the rest is a signed twelve-bit value. I.e., FF8 is actually "-8", as in "subtract 8 from the current output address to get the source address". As for why the length value doesn't match up with the actual number of bytes to copy, that's common practice: since the shortest possible match is two bytes (though really, the shortest useful match is three), the length value stored in the compression code usually gets an offset added to it. So you take the 1, add 2 to it, and get 3.

So here's how to read a two-byte compression code!

Step 1. Take the two bytes and swap them around to get the little-endian value.
Step 2. Take the first four bits and add 2. That's the length value.
Step 3. The remaining 12 bits is the signed distance value. Subtract it from $FFF to get the positive equivalent.

Or, for those of a more technical mindset:

Compression Code (Little-Endian)
XXXX YYYY YYYY YYYY
X = Length (add 2 to get actual value)
Y = Distance (signed)

Congratulations! That's all you need to know to write up a decompression program!
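
For anyone who prefers code to bit diagrams, here is a minimal Python sketch of reading one code. The name `decode_code` is made up for this illustration. One assumption to flag: the sketch negates the 12-bit field as a two's-complement value (0x1000 minus the field), which gives the eight-byte distance observed for F8 1F; subtracting from $FFF instead would come out one short.

```python
def decode_code(low_byte, high_byte):
    """Split a two-byte compression code into (length, distance).

    low_byte is the first byte as stored in the file, high_byte the second.
    """
    value = low_byte | (high_byte << 8)   # swap into the little-endian value
    length = (value >> 12) + 2            # top four bits, plus the offset of 2
    field = value & 0x0FFF                # remaining twelve bits
    distance = 0x1000 - field             # assumed: field is a negative 12-bit number
    return length, distance

print(decode_code(0xF8, 0x1F))   # (3, 8): copy 3 bytes from 8 bytes back
```

The sketch does not try to handle a field of 0 or a positive field; the thread has no example of either.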

Here are a couple more examples:

($B2)($1E)ARIA($F7)($0F)E($BC)($2E)($05)

So the compression codes here are 1EB2, 0FF7, and 2EBC.

1EB2
XYYY

1 + 2 = 3 (length)
EB2 = -14D (distance)

0FF7
XYYY

0 + 2 = 2 (length)
FF7 = -9 (distance)

2EBC
XYYY
2 + 2 = 4 (length)
EBC = -143 (distance)

I hope this explains it well enough!
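
Putting the pieces from this thread together, a decompressor could look roughly like the Python sketch below. It is only a sketch under stated assumptions: flag bits are read most significant bit first, 1 means a plaintext byte, 0 means a two-byte little-endian code, the length is the top nibble plus 2, and the distance is 0x1000 minus the low twelve bits. How the real files mark their start (the leading 8F/8D byte) and end is not established here, so the function simply runs until the input is exhausted. The test stream is constructed by hand from the "getOnceInstance" example, not taken from the game.

```python
def decompress(data):
    """Decode a flag-byte LZ stream as pieced together in this thread (sketch)."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        flags = data[pos]
        pos += 1
        for shift in range(7, -1, -1):            # most significant bit first
            if pos >= len(data):
                break                             # assumed: stream just ends
            if (flags >> shift) & 1:              # 1: one plaintext byte
                out.append(data[pos])
                pos += 1
            else:                                 # 0: two-byte compression code
                if pos + 1 >= len(data):
                    raise ValueError("truncated compression code")
                value = data[pos] | (data[pos + 1] << 8)
                pos += 2
                length = (value >> 12) + 2
                distance = 0x1000 - (value & 0x0FFF)
                if distance > len(out):
                    raise ValueError("code points before the start of the output")
                for _ in range(length):           # byte by byte, so overlaps work
                    out.append(out[-distance])
    return bytes(out)

# Hand-built stream: 8 literals, then 4 literals followed by the code F8 1F.
stream = bytes([0xFF]) + b"getOnceI" + bytes([0xF0]) + b"nsta" + bytes([0xF8, 0x1F])
print(decompress(stream))   # b'getOnceInstance'
```

Real files would need the leading header byte(s) stripped first, and possibly a different end condition.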

jjjewel

Re: Text hacking Cross of Venus (NDS)
« Reply #12 on: May 21, 2011, 12:15:02 am »
^
^
Wow. I never knew the significance of those two-byte indicators.
There are a few NDS games that use similar compression, but not exactly the same,
and I've been trying to hack them. Now I'll give them another try. :D

Thank you so much. I'm glad I dropped by this thread.
(The data format just looked familiar so I gave it a try.
It's different from the games I'm hacking but the concept is very similar.)

Ryusui

Re: Text hacking Cross of Venus (NDS)
« Reply #13 on: May 21, 2011, 12:51:06 am »
You get familiar with LZ compression when you've played around with GBA and DS for a while.

Majin3

Re: Text hacking Cross of Venus (NDS)
« Reply #14 on: May 21, 2011, 11:02:43 am »
Thanks, guys. I didn't understand it completely, though...
The first example (1FF8): Why is it -8? Isn't FFF-FF8=7? And by looking at the compressed file, the distance to "nce" is even 9.
The third example (0FF7): Once again, isn't it -8? But -9 is right here.
And as for the other 2, I don't get where they should be pointing at all. The second one points to B6 0F 8F (should be 00 4B 02) and the fourth one to 75 B7 1F 08 (should be 00 00 00 00).
But I think I got the rest now at least.

jjjewel, are you planning to write a (de)compressor for that "NDS games that use similar compression"?
Or I guess I could ask the writer of DSDecmp or something to implement this one...

Ryusui

Re: Text hacking Cross of Venus (NDS)
« Reply #15 on: May 21, 2011, 03:30:18 pm »
*headdesks* Yeah, it looks like I fudged my math. Add 1 to get the actual distance.

Also, you misunderstand. It looks in the uncompressed data for the snippets, not the compressed. "nce" will be eight characters back from the output address at that time.

Let's look at that "getOnceInstance" sequence again.

get($FF)OnceInst($BA)a($F8)($1F)

When it hits the compression code, the output looks like this:

getOnceInsta

And "length 3, distance 8" tells it that the next snippet is:

getOnceInstance

Get it now?
« Last Edit: May 21, 2011, 05:36:36 pm by Ryusui »

Majin3

Re: Text hacking Cross of Venus (NDS)
« Reply #16 on: May 21, 2011, 05:24:43 pm »
Hmm, not yet.
Shouldn't I subtract 1 to get the actual value?
Yes, "nce" is 8 characters back (uncompressed data) but FFF-FF8 is 7, isn't it?

And it looks in the uncompressed data? In the third example (0FF7) that would be 00 4B instead of "NN" which is back at 13.

Ryusui

Re: Text hacking Cross of Venus (NDS)
« Reply #17 on: May 21, 2011, 05:36:20 pm »
Subtract, add, whichever. You understand the math better than I can explain it.

Also, I don't see your problem. The compression code in the middle of "ARIANNE" points right back to the "NN" of "ZYUUNINN".

At that point, the uncompressed data should look something like this:

ZYUUNINN($??)($??)($??)ARIA

I know that the compression code in the middle indicates a three-byte sequence, but I don't know what the actual bytes are, so they're marked in with question marks.

"Length 2, Distance 9" therefore gives us...

ZYUUNINN($??)($??)($??)ARIANNE

Majin3

Re: Text hacking Cross of Venus (NDS)
« Reply #18 on: May 21, 2011, 06:12:47 pm »
Oh, I forgot I didn't upload any screenshot of that part uncompressed.
Here it is:

As you can see the distance is 13.

And the distance from getOnceInsta to nce is also 8, while FFF-FF8 gets us 7.
Either there is something wrong there or I don't understand it yet.

Ryusui

Re: Text hacking Cross of Venus (NDS)
« Reply #19 on: May 21, 2011, 06:23:43 pm »
Like I said: you have it right. Take the negative number and add (or subtract, whichever) 1. It makes sense there'd be an offset value, since a distance of 0 means it would be looking at empty space.

Note that a distance of 1 and an arbitrary length can be a valuable space-saving trick: the end result is similar to RLE, where compression codes are used to indicate long sequences of repeating bytes.
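
The distance-1 trick is easy to see in a short Python sketch. The helper name `copy_back` is made up for this illustration. The point is that the copy runs one byte at a time, so the source may overlap the bytes being written.

```python
def copy_back(out, distance, length):
    """Append `length` bytes, each read `distance` bytes behind the end of `out`."""
    for _ in range(length):
        out.append(out[-distance])   # re-read after every append, so overlap is fine

run = bytearray(b"A")
copy_back(run, distance=1, length=5)
print(run)       # bytearray(b'AAAAAA')

pattern = bytearray(b"ab")
copy_back(pattern, distance=2, length=4)
print(pattern)   # bytearray(b'ababab')
```

With the four-bit length field described earlier in the thread (stored value plus 2), a single code would top out at 17 bytes, so a long run would take several codes in a row.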