Compression in NGC's PAK files?

Started by Dashman, February 25, 2014, 09:14:47 AM

Dashman

I'm trying to figure out how the text is compressed in Super Robot Wars GC. Who knows, maybe I can actually make a patch for the game instead of a bazillion texture files to translate it.

Looking at the PAK files in hex, I've kinda figured out how some of the ones with textures work, and now I want to see if I can get to the text. The ISO has a lot of rXXXX.pak files, and these are the main suspects for holding it.

A PAK file has this inside:

Bytes 0 - 3: Number of files inside the PAK (there are PAK files that contain 8 other PAK files, for example). In the rXXXX.pak ones, it's always 1.
Bytes 4 - 7: Seems to indicate the type of file. All the rXXXX.pak files and the font.pak have it set to 9.
Bytes 8 - 15: Name of the file
Byte 16: What seems to be a delimiter. Its value is always 5f
Bytes 17 - 19: Size of the file minus 25 (written in reverse, i.e. little-endian; for example 'da 4b 07' should be read as 0x74bda). Might indicate the header is 25 bytes long.
Bytes 20 - 23: Unknown. Some key to decompress the text, maybe?
Bytes 24 - 27: Always 0.
Bytes 28 onwards: what seems to be compressed text.
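
The "written in reverse" byte order in the size field is little-endian. A minimal Python sketch using only the example bytes quoted above; the variable names are just for illustration:

```python
# The three size bytes exactly as they appear in the file, per the example above.
raw = bytes.fromhex("da4b07")

# Little-endian: the least significant byte is stored first.
size = int.from_bytes(raw, byteorder="little")

print(hex(size))  # 0x74bda
```

The same call works for a 4-byte field, which is what you would read if the size turns out to be a full 32-bit value.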

Here's a couple of example files:

https://dl.dropboxusercontent.com/u/144016034/r001.pak
https://dl.dropboxusercontent.com/u/144016034/r002.pak

As you may have guessed, I'm pretty clueless when it comes to compression (and a lot of other things, to be honest). Can anybody help me with this, please? It might not even be text after all, but it's worth a try.

Zoinkity

The header is probably more like:
0x0    4    number of files
0x4    4    length of filename
0x8    var  filename
var    4    compressed size (size of data)
var    4    decompressed size
var    var  data
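
A rough Python sketch of reading that layout, assuming all fields are little-endian (as the 'da 4b 07' example suggests) and that the PAK holds a single file. The function name `parse_pak_header` and the sample bytes are made up for illustration and are not taken from a real game file:

```python
import struct

def parse_pak_header(data: bytes) -> dict:
    """Parse a single-file PAK using the layout proposed above (little-endian assumed)."""
    file_count, name_len = struct.unpack_from("<II", data, 0)
    name_end = 8 + name_len
    name = data[8:name_end]
    comp_size, decomp_size = struct.unpack_from("<II", data, name_end)
    payload_start = name_end + 8
    return {
        "file_count": file_count,
        "name": name,
        "compressed_size": comp_size,
        "decompressed_size": decomp_size,
        "payload": data[payload_start:payload_start + comp_size],
    }

# Synthetic header for illustration only: 1 file, 9-byte name ending in 0x5f ('_').
sample = (struct.pack("<II", 1, 9) + b"r001.pak_"
          + struct.pack("<II", 4, 16) + b"\x01\x02\x03\x04")
info = parse_pak_header(sample)
print(info["name"], info["compressed_size"], info["decompressed_size"])
```

With a 9-byte filename the header comes to 4 + 4 + 9 + 4 + 4 = 25 bytes, which would explain why the size field equals the file size minus 25.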

Without either tracing the code or a decompressed sample it would be hard to say exactly what compression format they're using.  It looks similar to what you see with lzo1x, and it's easy enough to rule out the other common types (not zlib, not LZ77 derivatives, not lzh or arithmetic, etc.).  With a decompressed sample it would be obvious if it were LZO or not.

Dashman

I'll be damned, you're right about the length of the filename (including the '5f' end delimiter). I was confused because font.pak also had it set to 9, but that's after indexing (files inside have names like fXXXX.pak). Thanks, Zoinkity!

Getting my hands on a decompressed sample would be a bit difficult, and I don't think I can trace the code in Dolphin. I read in another thread that there's some DOL reader out there, but I can't seem to find any DOL file in the game's filesystem.

Isn't it possible to pass the data through several decoders and discard those that return garbage?
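
Something like the following is what I have in mind: a rough Python sketch that runs the payload through the decoders in the standard library and keeps only those whose output matches the decompressed size from the header. `try_decoders` is a made-up name, and LZO is not in the standard library, so it is not covered here:

```python
import bz2
import lzma
import zlib

def try_decoders(payload: bytes, expected_size: int) -> list:
    """Return the names of decoders whose output length matches expected_size."""
    decoders = {
        "zlib": lambda d: zlib.decompress(d),
        "raw deflate": lambda d: zlib.decompress(d, -15),  # no header or checksum
        "gzip": lambda d: zlib.decompress(d, 31),          # 16 + 15 selects gzip framing
        "bz2": bz2.decompress,
        "lzma/xz": lzma.decompress,
    }
    hits = []
    for name, decode in decoders.items():
        try:
            out = decode(payload)
        except Exception:
            continue  # decoder rejected the data
        if len(out) == expected_size:
            hits.append(name)
    return hits

# Self-check on data we compress ourselves (not a game file).
blob = zlib.compress(b"hello" * 20)
print(try_decoders(blob, 100))
```

A matching size is only a hint, not proof, so any hit would still need to be compared against a known-good sample.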

RetroHelix

I can only think of Signsrch 0.2.2 from http://aluigi.altervista.org/mytoolz.htm, but somehow I don't think it will find anything.

Zoinkity

Well, the typical thing would be to use a debugger and clip the decompressed file out of memory.

Without a good sample it's difficult to know if the file decompressed properly.  Plus, it may in fact be the right method but with an annoying nuance that causes decompression to fail.  For instance, on the N64, Polaris SnoCross, The New Tetris, and a couple of others use lzo1x but invert a flag used to backtrack -0x8000.  That makes them incompatible with the official tool, but after changing that one test decompression works fine.

Turns out LZO is common on the GC though, so I'll try the data part through some existing code.

Dashman

Thanks for the suggestion, RetroHelix. You were right, it didn't find anything (or I'm not using the tool properly, who knows), but at least this guy has some interesting tools.

I'll have to seriously consider getting one of those memory debuggers. Thanks for the help, Zoinkity.

Dashman

I hope this doesn't count as necroposting...

Anyway, there's been a series of developments lately and we've managed to find most of the game's text uncompressed in some files I failed to notice before. However, decompressing pak files still feels necessary, as one of those most probably holds the missing text and who knows what else.

I finally learned that Dolphin has a debugger mode, so I used it to make a memory dump and found the uncompressed font.pak:

Compressed: https://dl.dropboxusercontent.com/u/144016034/font.pak

Uncompressed: https://dl.dropboxusercontent.com/u/144016034/font.bin

I'm positive it's the uncompressed contents of font.pak because its size is the same as indicated in the "uncompressed size" field in font.pak's header and its end was right before the contents of another uncompressed file containing text. Besides, it contains the font, as the name implied.

Now, going through the game's memory, I found this line:



It proves that at least something uses zlib compression in the game. So I downloaded the zlib library and made a little C program to try uncompressing font.pak based on the examples given on the official site. The program crashed (and most probably leaked memory) when it entered the uncompress section. I googled around and found what's supposed to be a working C++ example of a decompressor. It didn't crash, but it doesn't accept the data I give it as valid no matter how I call the function.
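
A quick sanity check before blaming the code: a zlib stream starts with a 2-byte header that has a fixed structure (RFC 1950), so the first two bytes of the payload can be tested directly. A minimal Python sketch; `looks_like_zlib` is a made-up helper name, and a raw deflate stream without the zlib wrapper would not pass this test:

```python
import zlib

def looks_like_zlib(data: bytes) -> bool:
    """Check the 2-byte zlib header defined in RFC 1950."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    return (
        (cmf & 0x0F) == 8                   # compression method 8 = deflate
        and (cmf >> 4) <= 7                 # window size field must be 7 or less
        and ((cmf << 8) | flg) % 31 == 0    # header check value
    )

print(looks_like_zlib(zlib.compress(b"example")))  # True
print(looks_like_zlib(b"\x00\x00\x00\x00"))        # False
```

If the bytes right after the PAK header fail this check, the data is either not zlib-wrapped or does not start where the header layout says it does.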

Here's the code of the C++ file: https://dl.dropboxusercontent.com/u/144016034/zpipe_custom2.zip

So now I'm in doubt. Am I doing something really wrong in this decompression code, or am I simply trying to decompress a file that isn't zlib-compressed?

RetroHelix

If you just want to test decompressing, you can use Quickbms and a simple script (Tutorial: http://forum.xentax.com/viewtopic.php?p=29963, Example: http://aluigi.altervista.org/papers/bms/zip.bms). Quickbms supports zlib by Mark Adler and also many other compression types. Just use the comtype command to specify the compression you want to use. By default it's zlib.

This quickbms script would decompress a file using zlib:

#comtype deflate #if commented out the default zlib compression is used.
get ZSIZE long #compressed size
get SIZE long #uncompressed size
get OFFSET long #offset of where to start
clog NAME OFFSET ZSIZE SIZE

Dashman

I had no idea something like this existed! :o Thanks for showing me :)

I tried your script on font.pak and it didn't work - taking the offset value left 4 bytes out of the file and ZSIZE became 4 bytes too big. I adjusted the script to take everything and I got the same errors I was getting in my little program, so I guess I should discard zlib on this one. I'll see if I can find some other scripts to work around other known compression types like lzop.

I've been reading around the quickbms homepage and I've found somebody had already made a script that extracts the files listed inside indexed PAK files. It works wonderfully well, I must say.

Edit:
I've run the script that tries to decompress a file against 500 known methods for different versions of the file (basically taking parts of the header out) and none of those have worked. I guess there's a trick to it that I'm not seeing.
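
For reference, the "taking parts of the header out" idea can be sketched in Python for the zlib case only: try every start offset in the first few dozen bytes, with and without the zlib wrapper, and report where a stream decodes to completion. `sweep_offsets` and the 64-byte limit are arbitrary choices for illustration, not what quickbms does internally:

```python
import zlib

def sweep_offsets(data: bytes, max_offset: int = 64) -> list:
    """Return (offset, wbits, output_length) for every start that decodes fully."""
    hits = []
    for offset in range(min(max_offset, len(data))):
        for wbits in (15, -15):  # 15 = zlib wrapper, -15 = raw deflate
            d = zlib.decompressobj(wbits)
            try:
                out = d.decompress(data[offset:])
            except zlib.error:
                continue
            if d.eof:  # the stream reached its end marker
                hits.append((offset, wbits, len(out)))
    return hits

# Self-check: 25 bytes of fake header followed by a stream we compressed ourselves.
blob = b"\x01" * 25 + zlib.compress(b"text" * 50)
print(sweep_offsets(blob))
```

Comparing the reported output length with the decompressed size in the header helps weed out accidental hits.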

RetroHelix

Quote from: Dashman on June 08, 2014, 08:20:54 AM
I've been reading around the quickbms homepage and I've found somebody had already made a script that extracts the files listed inside indexed PAK files. It works wonderfully well, I must say.
Good to hear :) Does this mean you can extract your font from font.pak? Or is this another kind of pak file?

Dashman

It's another kind of pak, sadly. Some pak files like font.pak only have one file listed inside (indicated by the first 4 bytes - "01 00 00 00" in font.pak), but there are others with several files. These ones typically have an index with name, offset and file size per entry. The script I've found (it has to be Aluigi's) recognizes that structure and extracts the files inside that pak, although it does nothing about their compression.

I'll probably not use it though, since the pak file I needed to split (bpilot.pak) has a couple of dummy files that are repeated several times without following a recognizable pattern, so I'll need a list of extracted files in order for when I merge the modified files back together... aaaand now I'm wondering if quickbms can actually do that. I'll have to read more about how to make scripts for it.
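
The ordered-list idea could look roughly like this in Python. It is only a sketch and assumes the entries have already been pulled out of the pak as (name, data) pairs in their original order; the function names and the sample entries are made up:

```python
import hashlib

def split_with_manifest(entries):
    """entries: list of (name, data) in archive order.
    Returns (blobs, manifest): unique contents keyed by hash, plus the ordered list."""
    blobs = {}
    manifest = []
    for name, data in entries:
        digest = hashlib.sha1(data).hexdigest()
        blobs.setdefault(digest, data)   # repeated dummy files are stored once
        manifest.append((name, digest))  # but every occurrence keeps its position
    return blobs, manifest

def rebuild(blobs, manifest):
    """Recreate the original ordered entry list from the manifest."""
    return [(name, blobs[digest]) for name, digest in manifest]

# Illustrative entries only, with a dummy repeated out of pattern.
entries = [("a", b"A"), ("dummy", b"D"), ("b", b"B"), ("dummy", b"D")]
blobs, manifest = split_with_manifest(entries)
print(len(blobs), rebuild(blobs, manifest) == entries)
```

Editing one blob and calling `rebuild` again puts the modified data back in every slot that referenced it, which is the part that matters for the repeated dummies.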