Nihon Falcom PC Game File Compression

Started by Joshua, February 03, 2014, 08:44:09 PM

Joshua

I've been picking apart a certain Nihon Falcom PC game, potentially to start a translation project for it. The biggest hurdle that I've hit so far is the compression that it uses.

The game's EXE makes references to zlib, but the files don't seem to be compressed with it. I'm thinking the zlib mention might be connected to libpng, which the game does use.

I ran one of the files through QuickBMS's compression scan, and all of its algorithm tests turned up negative.

There is a tool that can decompress the files (not compress, though) that was made by a Japanese programmer (http://www.pokanchan.com/dokuwiki/software/falcnvrt/start).

I've been combing through the files, comparing the compressed version with the decompressed version. I haven't managed to find anything beyond the fact that it seems to be a light compression. By that I mean that a good portion of the beginning of each file, and smaller portions scattered throughout it, contain partially visible text.

I'm going to keep poking at the game to see if I can figure it out. In the meantime I'm wondering if anyone on here with more experience dealing with compression algorithms might be able to see something that I haven't... Any help is appreciated.

Here is one of the smaller files with text (compressed and decompressed versions included):
https://mega.co.nz/#!x9BEmYoR!wYEoxKJafGqhH0U21lh-eliZrilu48pk6p6VD9fat4U

CUE

#1
That seems like a simple LZSS compression starting @ offset 9

Joshua

Quote from: CUE on February 04, 2014, 03:11:24 AM
That seems like a simple LZSS compression starting @ offset 9

I did some research on LZSS and put together a simple program to decompress, but I can't seem to get it to work.

http://pastebin.com/03M8ZDiG

I'm a little stumped at the moment. 

Pokeytax

CUE is right that this looks like a simple LZSS. You can tell because the compressed file starts out much like the decompressed one and gradually deteriorates, and because original bytes are clearly present throughout. If you could see original bytes but it was messy throughout, it might be DTE, and if it were just a mess of random noise, you would guess some kind of entropy coding (e.g. Huffman... or Huffman on top of LZSS!).
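To put a rough number on "legible" versus "random noise", one option is to measure byte entropy over fixed-size windows of the file. This is only a sketch: the function names and the 256-byte window are arbitrary choices for illustration, not something any of the tools in this thread use.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def entropy_by_window(data: bytes, window: int = 256) -> list:
    """Entropy of each consecutive window; the window size is an arbitrary pick."""
    return [byte_entropy(data[i:i + window]) for i in range(0, len(data), window)]
```

Plain text and lightly compressed data tend to score well below 8 bits per byte, while entropy-coded data sits close to 8 nearly everywhere. Treat the result as a hint, not proof.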

I can't debug your code for you, but it looks like it might be designed for bitwise LZSS compression. From a general computer science perspective this is more common, but video games tend to use bytewise LZSS compression. You can tell that this is bytewise because the original bytes are still legible. The difference is that bitwise LZSS puts a single flag bit in front of every item to tell the decompressor whether a literal or a compression code follows, while bytewise LZSS collects those flags into whole bytes, each covering the next several codes, for easier processing. If it were bitwise, the file would very quickly dissolve into noise as soon as the first flag bit shifted everything off byte alignment and we got "out of sync" with the original ordering.

Bytewise LZSS compression frequently is split up into three kinds of "codes":

literals, chunks of hex that appear verbatim in the decompressed file;
bitfields, one or two byte segments that tell the decompressor how to treat the next chunk of following codes;
compression, codes that tell the decompressor how far to go back and/or how long a string of hex to copy
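As a rough illustration of how those three kinds of codes fit together, here is a minimal sketch of a generic bytewise LZSS decompressor. The layout is an assumption made up for the example (one flag byte per 8 items, least significant bit first, 1 = literal, 0 = a two-byte code holding a 12-bit distance and a 4-bit length stored minus 3). It is not Falcom's actual format, which clearly differs.

```python
def lzss_decompress(src: bytes) -> bytes:
    """Decompress a generic bytewise LZSS stream (hypothetical layout, see above)."""
    out = bytearray()
    i = 0
    while i < len(src):
        flags = src[i]          # bitfield: describes the next 8 items
        i += 1
        for bit in range(8):
            if i >= len(src):
                break
            if flags & (1 << bit):
                out.append(src[i])   # literal: copied verbatim
                i += 1
            else:
                if i + 1 >= len(src):
                    raise ValueError("truncated compression code")
                b0, b1 = src[i], src[i + 1]
                i += 2
                distance = b0 | ((b1 & 0xF0) << 4)   # how far back to go
                length = (b1 & 0x0F) + 3             # how many bytes to copy
                if distance == 0 or distance > len(out):
                    raise ValueError("distance points outside the output")
                for _ in range(length):
                    # Byte-by-byte so a copy may overlap what it is writing.
                    out.append(out[-distance])
    return bytes(out)
```

For example, the six bytes `07 61 62 63 03 03` decode to `abcabcabc`: three literals, then one code that goes back 3 bytes and copies 6.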

Someone else may have a better way, but generally I have attacked these schemes like ciphers. For example, here's my attempt to parse your file:



On the left is the compressed version, and on the right is the decompressed version. It can be hard at first, and you'll need to start just by separating it into "literals" and "non-literals". (In fact, examining this, it's clear that the last three lines should be "bitfield compressed literal", not "compressed bitfield literal".) But with this you can start to guess at what your bitfield codes mean.

If you're comfortable with assembly, examining the decompression routine in action with a debugger might be faster... but to be honest I kind of like puzzling it out this way.

My impression is that it looks like the compression code just tells the decompressor how far back to go ("distance"), not how long to copy ("length"), so the "length" may be buried in the bitfields.

If that's confusing, sorry. It's just a complicated subject. But maybe that will give you some basis to go on.

esperknight

#4
I'll tell you right now this is not a simple LZSS compression :D  Falcom are the devils!  Overall, yes it is LZSS but they do some tricky stuff depending on certain flags and such.  I actually have a working recompressor that I'll gladly share : http://www.mediafire.com/view/iboe6if362ik1gi/falcom_compress.cpp  This is geared towards Xanadu 2 for PCE but they haven't changed their compression scheme at all since then (I can verify this because the decompressor I wrote works on a certain PSP game of theirs that came out not too long ago ;) )

This is the decompressor here : http://www.mediafire.com/view/v0xkd9jhb6yzaxx/x2_data_uncomp.cpp  This works as well on all of them too (this has been tested on multiple Falcom games for PC and the one PSP :) although geared towards Xanadu 2 PCE as well.  Be warned my uncomp code sucks as I wrote it a LONG time ago when I was still learning so yeah.... my compression code is more recent so much better (although still messy of course... too lazy to clean it up :) )

The only real difference with Xanadu 2, if I recall right, was that it started with a leading 0 or some such, but otherwise the compression is the same overall.

Can I ask what game you are looking at?  I've looked at few on PC, Dinosaur Resurrection and some others (I forget...) so I may be able to help.

Edit : Thinking more on it, I think Dinosaur Resurrection used a normal LZSS scheme while the others did not... I recall just hacking out the header from it and feeding it into the original lzss decompressor (http://dev.gameres.com/Program/Other/LZSS.C) and it looking good.  The others on the other hand used the custom one they wrote.  I could be wrong as my memory is hazy for the PC ones I looked at as it's been quite a while.

Edit: Eiyū Densetsu III is the one I looked at a while ago that used it and the PSP game as well (not Trails in the Sky :) )

Joshua

Quote from: esperknight on February 06, 2014, 09:46:34 PM

Thanks for those.

I haven't tried the decompression tool yet, because your program seems to want scripts.txt and unc.txt (I'm not very familiar with C++; I program in C# myself). I'll probably end up seeing if I can convert it over to C# to integrate it into the tool I've been making for the game (only if you don't mind; I'd give you proper credit, of course).

I did try the compression tool on a decompressed file. The resulting file was similar to the original compressed file, but a little larger in file size. For the file that I included in the first post (T_TITLE._DT), the original compressed size = 764 bytes, re-compressed size = 832 bytes.

Anyway, the game I've been working on is The Legend of Heroes: Trails in the Sky the 3rd (Eiyū Densetsu Sora no Kiseki The 3rd).

So far, really the only two hurdles I have are the compression and converting the graphic files to an editable format (BMP or PNG) and then back to the original format. That Japanese-developed tool that I mentioned before can convert them to BMP and PNG, but not back again.

My tool, at the moment, can dump all of the files contained in the .dat files and then re-insert them. So I've gotta get the compression/decompression implemented, then I can add in text extraction/insertion and image file conversion.

esperknight

Been a while since I've looked at the decompressor. scripts.txt is the list of files you want it to try to decompress, and unc.txt is just the list of what it uncompressed them to, which was later fed into my script extraction tool.  I code in C# too so any help I can provide let me know.  I interchange between the two depending on what I feel like coding in :)

And feel free to do what you like with them as well.  If you wouldn't mind sharing the C# version that could help others as well.

Also far as size difference, it's always a pain to get it exact.  My rule of thumb is as long as it works and it's decent in compressing, I'm good ;)  That one was a pain to redo due to all the flags and such needing to be set... but it works great so far *crosses fingers*

Also I'm thinking I didn't add anything particular to Xanadu 2 to the compressor...  I think I left that to my packer since this tool was going to be used for others... but I could be wrong so hey ;)

Converting them back could be interesting... do you have some samples I could look at?  Graphics I suck at but I may be able to help out.  If nothing else, I can help look at how the game is doing it and help disassemble it and everything.

Joshua

It seems like the decompression and compression is going to need some work.

Decompression seemed to go somewhat OK. The text became fully readable, but there were portions of the file that came out differently compared to the other decompressed file. Decompression also seems to add a bunch of empty padding.

Still, I changed a string of text and then used the compression tool. This is where it hit a problem. I was working on a larger text file that has the items in it. Original compressed size = 24,615 bytes / New compressed size = 37,326 bytes. Regardless, I thought I'd give it a shot anyways. So I re-inserted the file into the .dat and I updated the hex value before the file that is the compressed file size.

The game simply closes immediately when launched. The same thing happened back when I experimented with re-inserting decompressed data. It's not that there isn't enough room in the .dat. There is plenty of padding, so the .dat remains the same size. Either the game doesn't like the individual files being larger than they were originally, or the compression is outputting a slightly different format and the game has trouble decompressing the file when it is launched.

And sure, I'll post some of the graphic files tomorrow.

esperknight

Figures :)  Mine could have a bug in it, although it did work well with Xanadu 2.  Very possible though.

I can help with this although I'll need to debug the game and take a look at the routine and see what's up.  Shouldn't be too bad though (heh I say this now...)  If you're interested in doing it as well I can provide tips and what not.  Normally I use OllyDbg for this but IDA works nicely as well and there is a free version of that that you can use too.

Far as reinsertion, I'd go with the latter. I'd say if my decompressor is borking out on it then the recompressor will as well since it's coded for Xanadu 2 specifically.  I'm wondering now if I tweaked it for the other games I looked at...  It's very possible I did and forgot... I can't find all my code for them so not quite sure.  But no worries at least, I'm betting it is the compression being off.  The only bad thing is updating my decompression code will be a pain as it's written pretty bad but ah well.  Could give me an excuse to make it a bit better :)

Edit: Forgot to ask, do you have a translator lined up already or will you be doing it?  Curious :)

Joshua

#9
Yeah, the compression/decompression programs you gave me are definitely close to being right in that it makes the files more legible. I've been poking at the decompression program, trying to figure out what does what.

I did spend a good deal of time in IDA looking for the decompression routine. I'm new to assembly, so I wasn't able to find it. I even ran that tool that I linked to in the original post through IDA to see if I could find out how their program does it. I didn't have any luck with that route, either. And that tool is for a large variety of Falcom games, so that might not be the easiest way.

I don't have a translator yet. I was waiting until I knew for sure that there wouldn't be any hurdles I couldn't get past.

I do know it's going to be a very big translation effort for whoever does do it. This series is extremely text heavy.  :P

Edit: Here are those graphic file samples that I promised. I included two decompressed files in their original format and their bitmap versions, as well: https://mega.co.nz/#!o9IAmA6B!m3jIzYuSlSX7055vuHd3rChlwKRDCn6OWiGWkLJvuZo

esperknight

#10
The way to do it in general is to have the debugger show you all the function calls the game makes, such as CreateFile, ReadFile, and others.  From there I'd place a breakpoint on CreateFile and have it run until it opens the file you know it's going to read from.  Then place a breakpoint on ReadFile, jump in memory to the buffer it reads into, step over the call, put a memory breakpoint on that buffer, and have it run until it hits that breakpoint.  Then you'll be at the start or middle of the routine depending on where you set the breakpoints.  This is just in general.  With IDA you can do this but I usually use OllyDbg (http://www.ollydbg.de/) but it's about the same.  OllyDbg I can probably guide you through easier if you want to try it out :)

Far as CreateFile, ReadFile and etc, those are all Windows library calls, which you can look up in that same list. Another good one is SetFilePointer, which sets the file position to read from (I believe... I forgot 0_0  Easier when looking at the list in OllyDbg :) ) You need to remember too that parameters are pushed backwards, so the last one to be pushed is the function's first parameter.  OllyDbg (at least v2.x which is what I use mostly) will mark them for you though so it's not a huge deal, IDA should too.

Far as the graphic samples, I suspect those are raw graphics.  The question then is, where does it get the Width and Height from?  What file did they come from?  The reason I suspect raw graphics is the BMPs aren't indexed (i.e. 256 colors) and they're about the same size as the raw ones.  So I suspect the raw ones are most likely A, R, G, B or some such (mix those how you like).  And then the question is, does it place them top to bottom or left to right or even backwards?  The easiest way to figure that out is to mess up one of the graphics nicely, say with a line of 0xFFs, and see how it looks in game.  Make sure to pick an easy one to get to.  You should see a line of white either going up and down or left and right and that'll give you an idea of where to start.  You'll still need to figure out how to get at least the height or the width, as then you can calculate the other one from the file size.  I have a program that may help with this in C# but I need to look where it's at.  I used it for Shikigami no Shiro as ALL of the text was graphical and I had to use the same technique I described here.
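A small sketch of both ideas: listing the width/height pairs that fit a raw file's size, and painting a run of 0xFF bytes to spot in game. The 4 bytes per pixel default follows the A, R, G, B guess above and is an assumption, as are the function names.

```python
import math

def candidate_dimensions(byte_count: int, bytes_per_pixel: int = 4) -> list:
    """All (width, height) pairs whose pixel count matches the raw data size."""
    if byte_count <= 0 or byte_count % bytes_per_pixel:
        return []
    pixels = byte_count // bytes_per_pixel
    pairs = set()
    for w in range(1, math.isqrt(pixels) + 1):
        if pixels % w == 0:
            pairs.add((w, pixels // w))
            pairs.add((pixels // w, w))
    return sorted(pairs)

def paint_test_line(raw: bytes, offset: int, count: int) -> bytes:
    """Return a copy of the raw data with `count` bytes set to 0xFF at `offset`."""
    if offset < 0 or count < 0 or offset + count > len(raw):
        raise ValueError("test line does not fit inside the data")
    patched = bytearray(raw)
    patched[offset:offset + count] = b"\xff" * count
    return bytes(patched)
```

Once the white line shows up in game, its direction and length narrow the candidate list down, usually to a single pair.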

Of course I'll help as I can as now I can do some looking into the game now too :)

Edit :  Found the code for Shiki : http://www.mediafire.com/view/gomfjoya4a0l4mn/shiki_text_png_out.cs  Bit of a mess as I only needed it for one graphic :)  (Well, this particular one as I just needed it for the title screen.  The text I did in a different but wrong way... but it worked well enough ;) )  Shiki used palette files but I think yours won't so you can instead read from the file itself for the bytes to use for the colors and use that to set your pixels.

Joshua

#11
I'll definitely look into those things more tomorrow. I've been working on converting the decompression program to C#, so I can then mess around with it in a more familiar language.  :)

Edit: Sooo... I decided to take a look in IDA again and I was scrolling down the list of functions and came across a function named "decomp_". I'm not sure what else that could be, other than the decompression routine. I'm gonna say I was tired and that's my excuse for not seeing it the last time that I looked. lol

I'll definitely be looking into this more.

Pennywise

I actually bought this game from Falcom when it first came out and incredibly I played through most of the game which is something I hardly do with imports. I even recorded videos of the characters doing their special moves.

Anyhow, I hope something can happen with this project and consequently justify my purchase of the game. Good luck!

esperknight

#13
Quote from: Joshua on February 07, 2014, 11:35:26 PM
Edit: Sooo... I decided to take a look in IDA again and I was scrolling down the list of functions and came across a function named "decomp_". I'm not sure what else that could be, other than the decompression routine. I'm gonna say I was tired and that's my excuse for not seeing it the last time that I looked. lol

You never know, you could get lucky :)  It's possible debug symbols were left on when doing the compilation so some function names could be left in there.  IDA seems to pick up on these better than OllyDbg does or maybe I just don't pay attention enough...

So did a bit of quick debugging and noticed it doesn't decompress the text till after it shows the location title, the Airship one.  That next file open (I think it was DT21?) is when it then reads in the compressed text and decompresses it.  Course you'll know which file it is based on where you extracted it from :)  The way I figured this out (and I forgot to mention it) is I use TSearch (note, I didn't download it from here so hopefully it's good).  TSearch is an easy to use tool for memory searches.  OllyDbg can do it and so can IDA but I started using TSearch and just stuck with it.  Just pop in the text you're looking for (and it does support SJIS) or the hex bytes and search for it and voila!  I usually do it after certain parts to get an idea of where it loads such as before a title screen, after, during a loading screen, after some text shows up (never know if it loads a bit at a time or some such), etc.  Very useful :)
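The same kind of search can be run offline on a memory dump or a decompressed file. A minimal sketch, assuming the text is stored as Shift-JIS (the `find_text` name is made up for the example and has nothing to do with TSearch itself):

```python
def find_text(dump: bytes, text: str, encoding: str = "shift_jis") -> list:
    """Return every offset in the dump where the encoded text occurs."""
    needle = text.encode(encoding)
    if not needle:
        raise ValueError("search text must not be empty")
    hits = []
    pos = dump.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = dump.find(needle, pos + 1)
    return hits
```

Usage would look something like `find_text(open("dump.bin", "rb").read(), "テスト")`, where `dump.bin` is a placeholder file name.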

And taking a look at ED3... I'm thinking I have to be wrong about my decompressor working on it as it should be the same setup as this one... but never know.  Then again Dinosaur Resurrection does too (at least I'm pretty sure.... :) ) and I recall that distinctly using a regular LZSS decompressor.  So we'll see :)

Edit : Scratch that, ED3 is different.  For ED3, the first 4 bytes of a block are the size of the compressed block while ED6 doesn't have that (as I suspect it's stored in the corresponding DIR file).  So it may be that ED3 uses an older version while ED6 uses a newer version.
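A sketch of peeling one size-prefixed block off the front of a file, in the spirit of the ED3 layout described above. Two things are assumptions and not confirmed here: that the 4-byte size is little-endian, and that it counts only the bytes after the prefix.

```python
import struct

def read_block(data: bytes, pos: int = 0):
    """Return (payload, next_pos) for one size-prefixed block starting at pos."""
    if pos < 0 or pos + 4 > len(data):
        raise ValueError("no room for a 4-byte size field")
    # Assumption: little-endian, size excludes the 4-byte prefix itself.
    (size,) = struct.unpack_from("<I", data, pos)
    start = pos + 4
    end = start + size
    if end > len(data):
        raise ValueError("block size runs past the end of the data")
    return data[start:end], end
```

If the sizes that come out look absurd, trying big-endian (`">I"`) or subtracting 4 from the size are the obvious next guesses.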

Joshua

So I ran into a problem with converting the decompression program from C++ to C#. I'll need to deduce how to go about doing what the "uncompress" method does in a way that is valid for C#. At the moment, it uses negative array indexes and C# simply doesn't allow that.
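One common way around this, sketched here in Python though the idea carries over to C# unchanged, is to keep an integer position into the buffer and add the negative offset to it, instead of indexing relative to a moving pointer. The helper name is hypothetical, and this is not necessarily how the "uncompress" method should end up looking.

```python
def read_relative(buf: bytes, pos: int, offset: int) -> int:
    """Read buf[pos + offset], the equivalent of C++ `p[offset]` with p = buf + pos."""
    index = pos + offset
    # Explicit check: Python would silently wrap a negative index around,
    # and C# would throw, so neither matches the C++ pointer behaviour.
    if index < 0 or index >= len(buf):
        raise IndexError("relative read falls outside the buffer")
    return buf[index]
```

So a C++ expression like `p[-3]` becomes `read_relative(buf, pos, -3)`, with `pos` advanced wherever the original code advanced `p`.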

I'm not convinced that the "decomp_" function that I found is actually the decompression routine. So far, it doesn't seem connected to CreateFileA, CreateFileW, ReadFile, or SetFilePointer and placing a breakpoint in the function has no effect.

And yeah, the first text is when the guy on the airship starts talking (DT21 contains all of the script files). The "text" that shows up on the screen before that is image-based. Decompression would kick in for that, too. I believe the only uncompressed data is (of course) the EXE (which has a sizable amount of text) and the movies which are AVI format.

flame

This one is my own work.
I got a good hint from Kelebek at GBAtemp forum to help me solve it.
The hint from Kelebek was to copy the MIPS code but in Python, so you will see register names in the code. The other guy's code is better because it has actual descriptive names for the variables, etc....

I made a similar program that works with Nayuta no Kiseki.
This format is a little more simple, so I simplified the program I wrote for Nayuta to work with this.
It will decompress your T_TITLE._DT file without any problem; not sure about other files.
Written in Python 2:
http://pastebin.com/mNEYS5Cb

You should try it with some other files to see if it will work.
Do you need help re-compressing (fake compressing)?
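For anyone unfamiliar with the term, "fake compressing" means writing a stream that a decompressor accepts but that contains only literals, so nothing actually shrinks. A minimal sketch for a made-up flag-byte layout (one flag byte per up to 8 items, least significant bit first, 1 = literal); the real Falcom flags and header are different and would need to be matched to the game:

```python
def fake_compress(data: bytes) -> bytes:
    """Wrap data as literal-only groups: one flag byte, then up to 8 raw bytes."""
    out = bytearray()
    for i in range(0, len(data), 8):
        chunk = data[i:i + 8]
        # One set bit per literal in this group (0xFF for a full group of 8).
        out.append((1 << len(chunk)) - 1)
        out += chunk
    return bytes(out)
```

The output is about one eighth larger than the input, so it only helps if the game accepts files bigger than the originals.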

Please mark it SOLVED if your problem has been taken care of.